In the previous article, we gave a brief introduction to the definition of Big Data. Now that we have a basic understanding of what big data is and the driving force it represents for every industry, the next step is to understand which types of data are commonly analysed. This will give us a better picture of the kinds of data sets people deal with in data preparation and data analysis.
In the previous article we mentioned that data can come in structured, unstructured and semi-structured formats. In this article, we explain what that means in more depth.
Structured data is far easier for Big Data programs to understand and process, while the numerous formats of unstructured data create a greater challenge. Yet both types of data play a key role in effective data analysis.
To understand why there are different types of data, we should first understand that once data is generated, it is stored in a specific format on a storage device or server. This data format, commonly referred to as its data structure, determines how quickly a computer can analyse and query it.
The Types of Big Data
Big data can be classified by its structure, which depends on how the data can be organised; in other words, whether it can be formatted into tables of rows and columns. Classified this way, data falls into three types once it is generated: structured, unstructured and semi-structured.
Structured data is data that follows a pre-defined data model and is straightforward to analyse. It consists of data types whose patterns make them easily searchable. Examples of structured data include names, social security numbers and phone numbers, which can easily be organised into an Excel spreadsheet with rows and columns that can be sorted. In summary, this type of data has the advantage of being easy to enter, store, query and analyse.
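To make "easily queried" concrete, here is a minimal sketch using Python's built-in `sqlite3` module with an in-memory database. The `contacts` table, its columns and its rows are hypothetical examples invented for illustration. Because every row follows the same pre-defined schema, a query can target a column by name instead of searching through the content.

```python
import sqlite3

# Hypothetical contacts table: a fixed schema, so every row has the same columns.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contacts (name TEXT, phone TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO contacts VALUES (?, ?, ?)",
    [
        ("Alice Smith", "555-0101", "London"),
        ("Bob Jones", "555-0102", "Leeds"),
        ("Carol White", "555-0103", "London"),
    ],
)

# The structure is known in advance, so we can filter and sort by column directly.
rows = conn.execute(
    "SELECT name, phone FROM contacts WHERE city = ? ORDER BY name", ("London",)
).fetchall()
print(rows)  # [('Alice Smith', '555-0101'), ('Carol White', '555-0103')]
conn.close()
```

The same filter-and-sort operation is what a spreadsheet user does with column filters; a database simply performs it at a much larger scale.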
Unstructured data is, as you might expect, the opposite of structured data. This type of data is far more difficult to process and analyse, because it consists of information that is not easy to search or to organise into a readable format such as an Excel sheet. The ability to store and process unstructured data has grown considerably in recent years, with many new technologies and tools developed to store specialised types of unstructured data.
Examples of unstructured data include audio, video, word processing documents and presentations. Although these files may have an internal structure, they are still considered unstructured because the information they contain does not fit neatly into the rows and columns of a database. The irregularities and ambiguities in these files make them difficult for traditional programs to interpret, compared with data stored in a structured database.
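As a small illustration of why free-form content is harder to work with, the sketch below tries to pull phone numbers out of a few notes written as plain text. The notes are hypothetical, and a regular expression is just one simple extraction technique chosen for the example. With no columns to query, the program must search the text itself, and any variation in how the information is written can cause it to be missed.

```python
import re

# Hypothetical free-text notes: the same kind of fact appears in different
# places and formats, and there is no "phone" column to query.
notes = [
    "Call Alice back on 555-0101 about the invoice.",
    "Bob said his new number is (555) 0102, or email him instead.",
    "Meeting moved to Thursday; no contact details given.",
]

# A pattern written for one format ("555-0101") silently misses another ("(555) 0102").
pattern = re.compile(r"\b\d{3}-\d{4}\b")
found = [match for note in notes for match in pattern.findall(note)]
print(found)  # ['555-0101']
```

Handling every variation would require more patterns (or more sophisticated tools), which is the extra processing effort unstructured data demands.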
It should be noted that the term ‘big data’ is closely associated with unstructured data. This is because big data refers to very large data sets that are difficult to analyse using traditional methods. Big data can include both structured and unstructured data, but the IDC estimates that 90 percent of big data is unstructured data. Many of the tools designed to analyse big data can handle unstructured data.
Semi-structured data, as you might guess from the name, refers to data that contains a mix of structured and unstructured elements. It is a form of structured data that contains tags or internal markings that allow its contents to be sorted into groups and hierarchies. Both documents and databases can be semi-structured. A common example of semi-structured data is an email. The body of the email, such as its text and any audio or video attachments, is considered unstructured data, while fields such as the date and time, names and addresses can be sorted and are therefore structured data.
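The email example can be sketched with Python's standard `email` package. The message below is a hypothetical sample. The header fields act as the "tags" that can be read and sorted by name, while the body is returned as a single block of free text.

```python
from email import message_from_string
from email.utils import parsedate_to_datetime

# Hypothetical raw email: header fields (the "tags") followed by a free-text body.
raw = """From: Alice Smith <alice@example.com>
To: Bob Jones <bob@example.com>
Date: Mon, 02 Jan 2023 09:30:00 +0000
Subject: Quarterly figures

Hi Bob, the numbers look better than last quarter. Let's talk on Friday.
"""

msg = message_from_string(raw)

# Structured part: named fields that can be sorted, filtered and grouped.
header_fields = {
    "from": msg["From"],
    "to": msg["To"],
    "date": parsedate_to_datetime(msg["Date"]),  # parsed into a datetime object
    "subject": msg["Subject"],
}

# Unstructured part: the body is just text with no fixed fields inside it.
body = msg.get_payload()

print(header_fields["date"].year)  # 2023
print(body.strip())
```

A mailbox of such messages could be sorted by date or grouped by sender using the headers alone, while understanding what the bodies say would still require text analysis.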
The last category of data is metadata. From a technical point of view, metadata is not a separate data structure, but it is one of the most important elements of Big Data analysis and Big Data solutions. Metadata is data about data: it provides additional information about a specific set of data. In a set of photographs, for example, metadata could describe when and where the photos were taken. The metadata then provides fields for dates and locations which, by themselves, can be considered structured data. For this reason, Big Data solutions frequently use metadata for initial analysis.
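The photograph example can be sketched in a few lines of Python. The `Photo` class, its fields and the sample values are hypothetical, and the image bytes are placeholders. The point is that the date and location fields are ordinary structured values, so a first pass of analysis can filter and count photos without ever decoding the images themselves.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass
class Photo:
    filename: str
    pixels: bytes   # the image content itself (unstructured); placeholder bytes here
    taken_on: date  # metadata: when the photo was taken
    location: str   # metadata: where the photo was taken


photos = [
    Photo("IMG_001.jpg", b"...", date(2023, 6, 1), "Paris"),
    Photo("IMG_002.jpg", b"...", date(2023, 6, 3), "Lyon"),
    Photo("IMG_003.jpg", b"...", date(2023, 7, 15), "Paris"),
]

# Initial analysis runs on the structured metadata fields only.
june_paris = [
    p.filename
    for p in photos
    if p.location == "Paris" and p.taken_on.year == 2023 and p.taken_on.month == 6
]
per_location = Counter(p.location for p in photos)

print(june_paris)               # ['IMG_001.jpg']
print(per_location["Paris"])    # 2
```

Only the photos that pass this cheap metadata filter would then need the more expensive processing of their unstructured image content.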
Now that you’ve gained a greater understanding of the different types of big data, you might be wondering about the value of big data itself to businesses and organizations. How can it be used, and how does it increase an organization's capabilities and revenues?