February 3, 2020 at 2:00 pm #85069 Participant @oluwole
Odds are that you have come across the term “Big Data” at some point. Have you heard it thrown around and never bothered about it? Or did you stumble upon it somewhere and never quite understand what it meant? No matter, you’re in the right place! Together we will demystify “Big Data”. But first, I must assure you that it is not as mysterious as Bigfoot. Let’s go!
Data and Big Data
According to Webopedia, in the context of computing:
“Data is distinct pieces of digital information, usually formatted in a special way”
Digital information refers to data stored on computers and in digital media as a series of ones and zeros (hence the word “digital”, from “digits”). Such information is formatted in specific ways, most often as numbers and text.
Everything you are looking at as you read this is data. It’s that simple.
What then is Big Data?
Big data essentially refers to very large data. It describes a large and complex collection of data sets, usually received from many new data sources.
Because of its size and complexity, big data needs modern management tools for efficient processing and storage. It can also be analysed for insights that enhance decision making and process automation.
We will delve deeper into the usefulness of Big Data, but first, a little bit on its properties.
The Three Vs
Big data is characterized by these three Vs:
- volume;
- velocity; and
- variety.
Volume describes the amount of data. For a data set to be considered Big Data, it must have an enormous volume. Sometimes the volume is not known in advance, as with the number of clicks in a mobile application. It could run to hundreds of terabytes, like the amount of new data ingested into Facebook’s database daily. Or it could reach tens of petabytes, like the daily flight data of jet engines.
Velocity refers to the speed of data generation; that is, how fast the data is received and processed. Today, data streams into businesses from sources like application logs, social media sites, mobile devices, etc. at unprecedented speeds. It must be handled promptly to realize its real potential.
Big Data flow is massive and continuous. As such, a delay in processing may render the data obsolete or less useful than it could have been. This property is therefore especially important in fields such as artificial intelligence and machine learning.
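As a rough illustration of handling data while it is still fresh, here is a minimal Python sketch of a sliding-window counter. The class name, the 60-second window and the click timestamps are all made up for this example, and it assumes events arrive in time order.

```python
from collections import deque


class SlidingWindowCounter:
    """Counts events seen in the last `window_seconds` seconds."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self.timestamps = deque()

    def record(self, timestamp):
        """Register one event and drop events that fell out of the window."""
        self.timestamps.append(timestamp)
        self._evict(timestamp)

    def count(self, now):
        """Return how many events arrived in the window ending at `now`."""
        self._evict(now)
        return len(self.timestamps)

    def _evict(self, now):
        # Events at or before the start of the window are stale, so discard them.
        while self.timestamps and self.timestamps[0] <= now - self.window_seconds:
            self.timestamps.popleft()


counter = SlidingWindowCounter(window_seconds=60)
for t in [0, 10, 30, 65, 70]:  # hypothetical click timestamps, in seconds
    counter.record(t)

recent_clicks = counter.count(now=70)
print(recent_clicks)
```

Only the events from the last minute are counted; older ones are thrown away as soon as they stop being useful, which is the idea behind processing fast-moving data as it arrives.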
Variety is the heterogeneous nature of the available data. In times past, data was structured neatly in spreadsheets and databases. Nowadays, other “unstructured” formats such as emails, pictures, videos, audio, financial data, PDFs, etc. are commonplace. This variety demands additional technologies for preprocessing, storing, mining and analyzing the data.
Some other properties of Big Data are its variability (how unpredictable it is) and veracity (the quality of the data).
Now that we’ve established what Big Data looks like, let’s check out its types.
Types of Big Data
Big Data exists in three forms, namely:
- structured;
- unstructured; and
- semi-structured.
Structured data refers to any data that follows a pre-defined data model and can be processed in a fixed format. Its pattern, e.g. a tabular format where the rows and columns have a relationship, makes it easily searchable.
Examples of structured data are the numbers, dates, and strings (groups of words) commonly found in SQL databases, Excel spreadsheets, and data warehouses. Such data can be either machine-generated or human-generated.
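To make “easily searchable” concrete, here is a small sketch using Python’s built-in sqlite3 module. The orders table and its rows are invented for illustration. Because every row follows the same model, one short query is enough to answer a question about the data.

```python
import sqlite3

# An in-memory database with a fixed, pre-defined model (hypothetical table).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, order_date TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        ("ada", "2020-01-15", 120.0),
        ("ben", "2020-01-20", 80.0),
        ("ada", "2020-02-01", 45.5),
    ],
)

# Total spend per customer, in alphabetical order.
rows = conn.execute(
    "SELECT customer, SUM(amount) FROM orders GROUP BY customer ORDER BY customer"
).fetchall()
conn.close()
print(rows)
```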
Unstructured data refers to information that has no pre-defined model and is not organized in any set way. It is usually text-heavy and contains irregularities and ambiguities that make it harder to process than structured data.
The analysis of unstructured data is relevant in the context of Big Data, especially as a large part of the data available in organizations is unstructured. It could be in picture, video, or document formats. The ability to process and derive value from unstructured data is therefore critical.
A good example of unstructured data is the output returned by a Google search.
Semi-structured data is a combination of the two forms.
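As a rough illustration of why unstructured text takes extra work, the sketch below pulls email addresses and dates out of a free-form message with regular expressions. The message and both patterns are assumptions made for this example; real-world text would need far more robust handling.

```python
import re

# A hypothetical free-text message with no fixed model.
message = (
    "Hi team, the shipment arrived late on 2020-02-01. "
    "Please email ada@example.com or ben@example.org for the invoice."
)

# Simplified patterns, good enough for this message only.
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
DATE_PATTERN = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")

emails = EMAIL_PATTERN.findall(message)
dates = DATE_PATTERN.findall(message)
print(emails, dates)
```

Nothing in the text says where the addresses or dates are, so the program has to guess at their shape. With structured data, that information comes from the data model itself.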
How Does Big Data Work?
Big data operates through three key actions:
- Integration: This involves the collection of data from varied sources and applications. Here, the data is received, processed, and formatted in a manner that allows for proper analysis.
- Management: Big data must be stored properly. This could be done in the cloud, on local hardware, or both. Many businesses are opting for cloud storage because it supports their current compute requirements and lets them spin up resources when needed.
- Analysis: The crucial aspect of big data is its analysis. This could mean building data models with artificial intelligence, exploring the data for new insights, or visually analysing varied data sets for clarity.
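The three actions above can be sketched in miniature with Python’s standard library. This is only a toy under stated assumptions: the two sources, their field names and the in-memory database are hypothetical, and a real big data system would use distributed tools rather than a single script.

```python
import csv
import io
import json
import sqlite3

# Two hypothetical sources that describe the same kind of event differently.
csv_source = "user,clicks\nada,3\nben,5\n"
json_source = '[{"name": "ada", "click_count": 4}, {"name": "cy", "click_count": 1}]'


def integrate(csv_text, json_text):
    """Integration: bring both sources into one (username, click_count) format."""
    reader = csv.DictReader(io.StringIO(csv_text))
    records = [(row["user"], int(row["clicks"])) for row in reader]
    records += [(item["name"], item["click_count"]) for item in json.loads(json_text)]
    return records


def store(records):
    """Management: keep the records in a database (in memory here)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (username TEXT, click_count INTEGER)")
    conn.executemany("INSERT INTO events VALUES (?, ?)", records)
    return conn


def analyze(conn):
    """Analysis: total clicks per user, highest first."""
    query = (
        "SELECT username, SUM(click_count) AS total_clicks "
        "FROM events GROUP BY username ORDER BY total_clicks DESC, username"
    )
    return conn.execute(query).fetchall()


conn = store(integrate(csv_source, json_source))
totals = analyze(conn)
conn.close()
print(totals)
```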
Why Big Data is Important
The importance of big data is contingent on what is done with it, rather than its size. In other words, how big the data is matters less than how well it is stored, processed and analyzed.
The biggest advantage of big data is perhaps its influence on decision making. Companies that hold a lot of customer data can tailor their products, services and marketing to create the highest level of consumer satisfaction. Big data gives businesses the opportunity to conduct a richer and more complete analysis, which assists in making the right decisions.
Analysis of big data also provides solutions that enable cost reductions, time reductions, product development, optimized processes, and fraud detection.
Big Data Challenges
For all the praise it gets, big data does pose several challenges to businesses. Among these are its size, noisiness, cost, timeliness and security.
Data Size: The International Data Corporation (IDC) estimates that the data stored in the world’s IT systems is doubling about every two years. The majority of that data is unstructured; in other words, it does not reside in any database. Managing such a volume of data is, therefore, a growing challenge, and although there are different technologies (hardware and processes) to tackle it, they are usually expensive and cumbersome.
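Reading the IDC estimate as a doubling roughly every two years (my interpretation of the figure above), a quick sketch shows how fast storage needs would grow. The 10-petabyte starting point and the function name are hypothetical.

```python
def projected_volume(current_volume, years, doubling_period_years=2.0):
    """Project data volume, assuming it doubles every `doubling_period_years`."""
    return current_volume * 2 ** (years / doubling_period_years)


# Hypothetical starting point: 10 petabytes today.
in_four_years = projected_volume(10, years=4)   # two doublings
in_ten_years = projected_volume(10, years=10)   # five doublings
print(in_four_years, in_ten_years)
```

Under that assumption, the same store grows fourfold in four years and thirty-two-fold in ten, which is why storage cost keeps coming up as a challenge.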
Noisy Data: Businesses must be careful to distinguish between signal and noise in the data available. The signal is the meaningful information or patterns inherent in the data. Noise, however, refers to the random, unwanted variation that interferes with the signal. Isolating the signal from the noise determines the relevance of the data, and the process is much more difficult when the data is very noisy.
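One simple way to see the difference between signal and noise is a moving average, sketched below. This is just one smoothing technique among many, and the numbers are made up: the noise here alternates neatly so the result is easy to check by hand, whereas real noise is random and never cancels out this cleanly.

```python
def moving_average(values, window):
    """Smooth `values` by averaging each run of `window` consecutive points."""
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


# Hypothetical data: a steady upward trend (the signal) plus alternating noise.
signal = [10, 11, 12, 13, 14, 15, 16, 17]
noise = [2, -2, 2, -2, 2, -2, 2, -2]
observed = [s + n for s, n in zip(signal, noise)]

smoothed = moving_average(observed, window=2)
print(observed)
print(smoothed)
```

The observed values jump up and down, while the smoothed values rise steadily by one each step, which is the underlying trend.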
Timeliness: You don’t just want to store big data; you want to use it to achieve your business goals. Businesses can use big data effectively only if the insights obtained from it are acted upon quickly. Timely decision making is especially crucial in the finance, healthcare and insurance industries. Stock markets, for instance, are prone to minute-by-minute fluctuations, and businesses that are slow to react can end up losing valuable trading opportunities.
Security: The internet of things has made big data stores, and indeed every internet user, targets for hackers. Without good data security measures, sensitive user information and company data can be accessed by attackers. Some of these security measures include data encryption, data segregation, and identity and access control.
What we’ve learnt so far:
- Big data refers to large-sized and complex data such as Facebook’s database.
- It is essentially characterized by three Vs – velocity, variety and volume.
- Big data could be structured (follows a set model), unstructured (doesn’t follow any model) or semi-structured (a combination of the two forms).
- Integration, management and analysis are the three actions involved in the use of big data.
- Big data is important to businesses in making good decisions and optimizing processes.
- Some of the challenges big data poses include its size, noisiness, timeliness and security.
The world today is data-driven. As populations increase and demographics become more dynamic, businesses will expand accordingly to meet new needs and demands. Big Data will, therefore, keep getting bigger and more complex. While this poses some challenges that technology and infrastructure will need to address, it is certain that the role of Big Data in the success of organizations will not diminish in the foreseeable future.