Companies are in for a major change to how they operate, manage and leverage data in the coming years. Data is quickly becoming the new currency and leading businesses are looking for ways to capitalize on this change.
The Data Deluge Problem
A recent IDC report suggests that the amount of data generated doubles every two years. By 2020, the total amount of data will reach 40,000 exabytes, or 40 trillion gigabytes. To put that in perspective, it is more than 5,200 gigabytes for every man, woman and child in 2020.
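The figures above can be checked with a few lines of arithmetic. The following is a minimal sketch, assuming decimal units (1 exabyte = 1 billion gigabytes) and a steady two-year doubling. The function name `projected_volume` and the constants are illustrative and do not come from the IDC report.

```python
GB_PER_EXABYTE = 10**9  # assumption: decimal units, 1 EB = 1,000,000,000 GB

total_exabytes_2020 = 40_000
total_gb_2020 = total_exabytes_2020 * GB_PER_EXABYTE  # 40 trillion GB

per_person_gb = 5_200
# Population implied by dividing the total by the per-person figure
implied_population = total_gb_2020 / per_person_gb


def projected_volume(base_volume, years, doubling_period_years=2):
    """Volume after `years`, assuming it doubles every `doubling_period_years`."""
    return base_volume * 2 ** (years / doubling_period_years)


print(f"Total in 2020: {total_gb_2020:,} GB")
print(f"Implied population: {implied_population / 1e9:.1f} billion")
print(f"Growth factor over 10 years: {projected_volume(1, 10):.0f}x")
```

Under these assumptions, the per-person figure implies a world population of roughly 7.7 billion, and doubling every two years compounds to about a 32-fold increase per decade.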
However, the problem is not the data itself. The problem lies in deciding what to do with the data and how to do it. To complicate matters, much of the data generated comes from new sources such as wearable devices, mobile devices, social media and machine data.
Social Data Streams
The impact from social media is significant on its own:
Twitter: 400 million tweets per day
Facebook: 4.75 billion content items shared per day
According to one whitepaper, Facebook currently houses more than 250 petabytes of data, with 0.5 petabytes of new data arriving every day. Facebook and Twitter represent only two of the more popular social data sources, and there are many more.
IoT and Machine Data
A relatively recent source of data is the Internet of Things (IoT). IoT represents a collection of uniquely identifiable items, each of which generates its own set of data. Data may also come in the form of ‘machine data’ or industrial data, which is generated through the use of equipment.
For example, GE’s GEnx next-generation turbofan engines, found on Boeing 787 and 747-8 aircraft, contain some 5,000 data points that are analyzed every second. To put that into perspective, Wipro research indicates that a single cross-country flight across the United States generates 240TB of data, and that the average Boeing 737 engine generates 10 terabytes every 30 minutes of flight.
With a bit of math, the scale of the problem becomes fairly apparent. Applying the per-engine rate above (10 terabytes every 30 minutes, or 20TB per hour) to data from MIT’s Airline Data Project and the total number of Boeing 787s in use as of December 31, 2013, the calculation becomes:
Total data generated every day by the global 787 fleet in operation today:
(20TB/hr x 9hr average operation per day) x 2 engines x 114 787 aircraft = 41,040 terabytes (about 40 petabytes)
For Southwest Airlines alone, the data challenge is even larger:
Total data generated every day by Southwest Airlines’ fleet of 607 Boeing 737 aircraft:
(20TB/hr x 10.8hr average operation per day) x 2 engines x 607 737 aircraft = 262,224 terabytes (about 256 petabytes)
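The two fleet calculations above can be reproduced with a short script. This is a minimal sketch that only restates the article's own inputs. The function name `daily_fleet_data_tb` is illustrative, and the conversion of 1,024 terabytes per petabyte is an assumption chosen because it matches the rounded petabyte figures.

```python
TB_PER_PB = 1024  # assumption: binary convention, matching the rounded PB figures


def daily_fleet_data_tb(tb_per_engine_hour, hours_per_day,
                        engines_per_aircraft, aircraft_count):
    """Terabytes generated per day by an entire fleet."""
    return (tb_per_engine_hour * hours_per_day
            * engines_per_aircraft * aircraft_count)


# 10 TB per 30 minutes of flight is 20 TB per engine-hour
TB_PER_ENGINE_HOUR = 10 / 0.5

# Inputs taken from the calculations in the text
fleets = {
    "Global 787 fleet": {"hours_per_day": 9, "aircraft_count": 114},
    "Southwest 737 fleet": {"hours_per_day": 10.8, "aircraft_count": 607},
}

results = {}
for name, fleet in fleets.items():
    tb = daily_fleet_data_tb(TB_PER_ENGINE_HOUR, fleet["hours_per_day"],
                             2, fleet["aircraft_count"])
    results[name] = tb
    print(f"{name}: {tb:,.0f} TB per day (about {tb / TB_PER_PB:,.0f} PB)")
```

Keeping the inputs in one table makes it easy to see how sensitive the totals are to the per-engine rate, which is the least certain number in the estimate.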
256 petabytes is a lot of data. GE included a couple more examples of industrial data in “The Case for an Industrial Big Data Platform.” From these examples, the sheer amount of data from IoT and machine sources becomes apparent. And these examples highlight only a small, specific use case that does not take into account other aspects of the airline industry.
Not All Data is Equal
Unlike traditional enterprise data, which is structured in nature, these new sources of data come in many forms and are typically unstructured. This presents a challenge to traditional data warehouses, which are accustomed to consuming and managing structured data.
When thinking about how to ‘consume’ these new sources of data, several key considerations rest with the data itself. Much of the data, in essence, has a half-life that drives its value over time. An important consideration is which data to keep and for how long. The challenge is knowing now what data might be needed in the future. That is easier said than done.
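One way to reason about a data half-life is to model value as exponential decay. The sketch below is purely illustrative: the decay model, the function names `remaining_value` and `retention_days`, and all of the numbers are assumptions introduced here, not a method described by any of the sources above.

```python
import math


def remaining_value(initial_value, age_days, half_life_days):
    """Value left after `age_days`, assuming it halves every `half_life_days`."""
    return initial_value * 0.5 ** (age_days / half_life_days)


def retention_days(initial_value, half_life_days, minimum_value):
    """Days until the value decays to `minimum_value` (which must be positive).

    Returns 0.0 if the data starts at or below the minimum.
    """
    if initial_value <= minimum_value:
        return 0.0
    return half_life_days * math.log2(initial_value / minimum_value)


# Hypothetical numbers, chosen only for illustration
print(remaining_value(100, age_days=30, half_life_days=30))
print(retention_days(100, half_life_days=30, minimum_value=25))
```

In practice the hard part is the one the model takes as given: estimating the half-life and the minimum useful value for each kind of data before knowing how it will be used.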
The default action taken by many enterprises today is to simply keep all data, which is costly for the storage alone. Unfortunately, this is leading to ‘data landfills’ of mixed data with varying degrees of value. As the volume of data increases, so will the landfills unless a different approach is taken.
The Holy Grail of Data Correlation
Beyond stockpiling data, the real value for many will come in the form of correlation. Leveraging a single data stream provides valuable insight. However, when that stream is paired or correlated with other data streams, a much clearer picture becomes visible.
Think of the value to a company when it can compare social data, operational data and transaction data. For many, marrying these data streams presents multiple challenges. Now imagine that the number of streams (sources), along with the volume of data, is increasing. It becomes clear how the problem gets complicated very quickly.
Consumer vs. Corporate
From the increase in consumer adoption of devices and services over the past few years, it is clear that consumers are ready to generate more data. Enterprises need to prepare for the oncoming onslaught.
As consumers, we want enterprises to succeed in leveraging the data we provide. Take healthcare, for example. Imagine if healthcare providers could correlate lab results, pharmacy data, claims data and social media streams. The outcome might be pre-emptive diagnosis based on trends in epidemics and illness across the globe. In addition, the results would be highly personalized and would lower the overall cost of healthcare. If achieved, this would bring significant economic and social improvements.
The Data Driven Economy
In summary, the onslaught of data is both concerning and exciting. The potential information generated from the data presents major opportunities across industries, from providing greater work efficiency to saving lives. Business, as a whole, is becoming even more reliant on information and therefore more data-driven. Data ultimately provides greater insight, personalization, and accuracy for business decisions.
It is important for enterprises to quickly evaluate new methods for data consumption and management. The success or failure of companies may very well reside in their ability to address the data tsunami.