Big data is an overused term nowadays, although its context and importance remain unclear to most of the audience, and even to a large percentage of SMEs. In the Fourth Industrial Revolution, data-driven decisions are more critical than ever for businesses; at the same time, the difficulty of collecting, analysing, and drawing conclusions from massive amounts of data has reached new heights.
According to the 6th edition of Domo's report, 'Data Never Sleeps 6.0', 2.5 quintillion (2.5 × 10^18) bytes of data are created each day. The report also predicts that by 2020, 1.7 MB of data per second will be generated for every person on the planet. These stratospheric, vertigo-inducing numbers are not manageable with conventional techniques and methods, and it is at this point that big data technologies step in to cope with these challenges to the fullest extent possible.
Big data refers to enormous and complex data sets that cannot be handled with conventional data processing techniques and software. Big data analytics combines multiple methods of analysis, such as predictive models, user behaviour analysis, statistical algorithms, and scenario analysis, through highly intelligent computing systems and software.
The analysis of both structured and unstructured data helps business analysts reveal patterns, correlations and market trends, gain insights into user/customer behaviour, and answer "what if" questions that allow companies to make highly educated business decisions. Over the last decade, big data analytics has been adopted by almost every major field, from public affairs, manufacturing and healthcare to insurance and IT. The following chart, which predicts the growth of the big data market over the next eight years, confirms the broad recognition and effectiveness of big data analytics.
To understand and better define the framework of big data, we need to look at some key features that shape the concept. Below, we will delve into the 5 Vs of big data. It is important to note that there are different approaches to the number of factors that make up the big data concept.
Volume refers to the amount of data coming from different sources and channels, such as loyalty cards, credit cards, social media platforms, video, analytical tools, and so on. In the era of big data (mainly unstructured data formats), data sets can be stored in different locations. The use of robust data management software such as Apache Hadoop allows these data to be analysed and correlated with each other in a timely and effective way.
Today, the speed at which data is produced is outstanding, and business decisions must keep the same pace. The essence of big data lies not only in the analysis of massive amounts of data but also in speed: where possible, processing happens in real time, even without first saving the data to databases, by analysing complex and large data sets at the moment they are created.
There are two components regarding velocity:
1. The frequency of data generation.
2. The frequency of processing, recording and publishing data.
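The second component is where real-time analytics differs from batch reporting: metrics are updated as events arrive rather than after the fact. A minimal single-machine sketch in plain Python (with a hypothetical sensor stream; a production system would use a streaming framework) illustrates processing events at the moment they are created, without persisting them:

```python
from collections import deque

class RollingAverage:
    """Keep a rolling average over the last `window` events.
    Raw events that fall outside the window are discarded, not persisted."""

    def __init__(self, window=1000):
        self.buffer = deque(maxlen=window)

    def update(self, value):
        # Called the moment an event arrives; returns the current metric.
        self.buffer.append(value)
        return sum(self.buffer) / len(self.buffer)

# Hypothetical sensor stream, processed as each reading is generated.
ra = RollingAverage(window=3)
for reading in [12.0, 15.0, 9.0, 18.0]:
    current = ra.update(reading)
print(current)  # 14.0 -- the average of the last three readings only
```

The design choice to cap the buffer with `maxlen` is what keeps memory constant no matter how fast, or for how long, events keep arriving.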
The rapid progress of new technologies alters the nature and formats of data. Depending on the source or research one looks at, the share of unstructured data (images, videos, social media updates, reviews, etc.) among all the data generated ranges between 80% and 95% of the aggregate. Big data offers analysts the ability to identify correlations, patterns and deeper insights between sets of data that seemed to have nothing to do with each other.
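As a minimal illustration of handling variety (the record formats below are hypothetical), heterogeneous inputs can be normalised onto a single schema before any joint analysis:

```python
def normalise(record):
    """Map heterogeneous inputs (hypothetical formats) onto one common schema."""
    if record.get("type") == "review":
        return {"source": "review", "text": record["body"], "score": record.get("stars")}
    if record.get("type") == "social_post":
        return {"source": "social", "text": record["message"], "score": None}
    # Anything unrecognised is kept, but flagged, rather than silently dropped.
    return {"source": "unknown", "text": str(record), "score": None}

records = [
    {"type": "review", "body": "Great product", "stars": 5},
    {"type": "social_post", "message": "Just tried it, loving it!"},
]
normalised = [normalise(r) for r in records]
```

Once every record shares the same fields, correlations across otherwise unrelated sources become a straightforward query.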
All the above Vs add value to the results derived from the analysis of the data. The opportunity to collect and analyse more extensive data sets from different sources (volume) offers companies and organisations the chance to make more targeted and accurate business decisions, by uncovering patterns and hidden correlations that support existing initiatives or generate new business ideas.
In terms of velocity, the value does not derive from the quantity but from the fact that technological advances in the fields of AI and machine learning enable companies to extract the necessary insights precisely at the time they need them to shape their business decisions.
Last but not least, variety is also critical. Different types of data coming from various sources enable business analysts to identify new approaches and form different perspectives on a business problem or decision.
Veracity refers to the reliability of the data. The degree of credibility is defined by the assessment of the specific data sources, their context, and the extent to which the data are likely to be relevant to the analysis performed. Veracity concerns not only the accuracy of the sources but also their trustworthiness. Because of this, steps such as eliminating bias, inconsistencies and overlaps are just a few of the actions to be taken.
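A minimal sketch of such cleaning steps, using hypothetical customer records in plain Python, might drop inconsistent entries and remove overlapping duplicates before analysis:

```python
def clean(records):
    """Drop records with missing or implausible fields, then deduplicate by id,
    keeping the first occurrence (a simplistic stand-in for real veracity checks)."""
    seen = set()
    cleaned = []
    for r in records:
        if not r.get("id") or r.get("age", 0) < 0:  # inconsistency / missing-field check
            continue
        if r["id"] in seen:                         # overlap / duplicate check
            continue
        seen.add(r["id"])
        cleaned.append(r)
    return cleaned

raw = [
    {"id": "a1", "age": 34},
    {"id": "a1", "age": 34},   # duplicate entry
    {"id": "b2", "age": -5},   # inconsistent value
    {"id": None, "age": 28},   # missing identifier
]
print(clean(raw))  # only the first record survives
```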
In just the opposite direction to value, veracity suffers the side effects of the above characteristics (volume, velocity and variety): vast amounts of data that differ both in nature and in format must be treated in real time, making the analysts' mission very tough.
We have already pointed out, through several statistics in the introductory part of the article, that the big data industry is growing explosively. To better understand the broader context of this industry and the importance of big data nowadays, we need to go through some of the developments that drive its ever-increasing momentum.
The technological achievements of recent years have sharply changed, and continue to alter, the nature and potential of computing systems. In the early 2000s, only large companies were able to invest in and take advantage of the value of computing, as the cost of acquiring or renting the necessary infrastructure was too high. Now, however, the landscape has changed dramatically, as computer systems and software have moved to a whole new level.
Computers are now more powerful than ever, equipped with high-speed processors and the ability to store massive amounts of data. Personal computers and smaller-screen devices have moved from hard drives that could store megabytes and gigabytes to terabyte storage systems, and corporate systems from terabytes to petabytes and even exabytes of capacity, resulting in a gradual reduction in the investment businesses need in order to store vast volumes of data.
Another factor that reduces infrastructure costs stems from the development of cloud computing and especially cloud storage services. Many cloud storage providers offer services that include the security, availability and accessibility of their customer data. Companies and organisations have the opportunity to rent these cloud infrastructures for specific periods or on demand at a relatively low price per gigabyte.
Technological progress also has beneficial effects on the development of new analytical software, tools and databases that allow unstructured data to be analysed in depth, in a timely manner and at massive scale. Open source tools like Apache Hadoop, HPCC Systems ("Thor" data refinery and "Roxie" query clusters) and Apache Spark, along with NoSQL databases such as MongoDB, Cassandra and Amazon DynamoDB, are some of the products that power big data analytics.
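The programming model popularised by Hadoop, MapReduce, can be sketched in a few lines of plain Python. This is a toy single-machine version for illustration only; real frameworks distribute the map and reduce phases across a cluster:

```python
from collections import defaultdict
from itertools import chain

def map_phase(docs):
    # Map: emit a (word, 1) pair for every word in every document.
    return chain.from_iterable(((w.lower(), 1) for w in d.split()) for d in docs)

def reduce_phase(pairs):
    # Reduce: group the pairs by key and sum the counts per word.
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

docs = ["big data", "data never sleeps"]
word_counts = reduce_phase(map_phase(docs))
print(word_counts["data"])  # 2
```

The appeal of the model is that both phases parallelise naturally: documents can be mapped on different machines, and the pairs can be partitioned by key before reducing.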
Furthermore, progress in analytical algorithms, mainly through machine learning, helps the big data industry strengthen further. The broad possibilities it offers, its ability to fulfil tasks without explicit instructions (autonomy) and, above all, its predictive capabilities are some of the reasons why machine learning has been adopted by, and continues to disrupt, several verticals such as finance, retail, insurance and weather forecasting.
As already emphasised earlier in the infographic, the rate at which data is generated is staggering. Two primary sources feed these data rates: IoT and social media.
In recent years, the Internet of Things has been one of the hottest topics. Although the industry has not yet reached its full potential, it certainly adds value to the concept of big data analysis. IoT leads the big data concept, and its importance, into a new era, as there will be an explosion in the volume of data created. From smartphones, smart buildings and cars to smart cities and transport, everything around us will produce and transmit real-time data carrying more information than ever before.
IoT is already here, and it is a game changer for every vertical. To understand the momentum the IoT industry already has, and the forecast for its growth in the coming years, consider one fascinating figure: according to Statista, there were around 23 billion connected devices worldwide in 2018, and the number is predicted to exceed 75 billion by 2025.
On the other hand, the impact of social media is already well known to businesses and big data analysts. Social media platforms have contributed significantly to the vast amount of data created over the last decade, and through the variety of data they produce (images, videos, comments, etc.), analysts can acquire in-depth insights into different audiences and reveal new patterns and trends on a regular basis.
Big data is here to stay. Data production rates are expected to increase exponentially in the years to come, and every segment of our society contributes to this. Big data analysis is valuable to every industry, offering the opportunity to grow through the best possible insights. As these industries grow, even more big data will be produced, and big data analytics will become even more critical.