If a vendor is trying to sell you a big data solution, you have probably heard the pitch: “If you buy our big data solution, you won’t need that data warehouse anymore.” The claim sounds plausible because the two do have similarities. Both hold a lot of data. Both can be used for reporting. Both are managed on electronic storage devices. But can one truly replace the other?
What is Big Data?
To test the truth (or lack thereof) in this line of thinking, let’s start with the basics. First, what is big data? There are many different forms of big data. The most widely understood form is the one found in Hadoop, Cloudera, et al.
A good working definition of big data solutions is:
There are probably other ramifications and features, but these basic characteristics are a good working description of what most people mean when they talk about a big data solution. (To verify this working definition, refer to the websites of Cloudera or Hortonworks.)
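Because Hadoop is the most familiar form of big data technology, it may help to see the programming model it popularized. The following is a minimal, single-process sketch of the map, shuffle and reduce steps behind Hadoop MapReduce, written in plain Python rather than against any Hadoop API; the function names are hypothetical. On a real cluster, the map and reduce calls would run in parallel on the machines that hold the data.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record):
    """Emit (word, 1) pairs for one input record (here, a line of text)."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle_phase(pairs):
    """Group intermediate values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)


def map_reduce(records):
    # On a cluster, each map_phase call would run on the node storing that record.
    intermediate = chain.from_iterable(map_phase(r) for r in records)
    grouped = shuffle_phase(intermediate)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


lines = ["big data is a technology", "a data warehouse is an architecture"]
print(map_reduce(lines))
```

Nothing in this sketch says what the data means or how it should be organized; it is purely a way to process a lot of it. That is the sense in which big data is a technology.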
What is a Data Warehouse?
Just as there are different understandings of what is meant by big data, there are different understandings of what is meant by data warehousing. Broadly, there are two approaches: the Kimball approach and the Inmon approach. This article discusses the Inmon approach, which centers on a definition of the data warehouse given many years ago: a data warehouse is a subject-oriented, non-volatile, integrated, time-variant collection of data created for the purpose of management’s decision making. Another way of saying the same thing is that a data warehouse provides a “single version of the truth” for decision making in the corporation: an integrated, granular, historical single point of reference for the corporation’s data.
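To make the Inmon properties more concrete, here is a minimal sketch, assuming a hypothetical customer subject area fed by two hypothetical source systems (“billing” and “crm”) that encode gender differently. It illustrates the definition and is not an implementation of any particular product: rows are organized around one subject (the customer), translated to one corporate code (integrated), stamped with the date they describe (time-variant), and only ever appended (non-volatile).

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical source systems with conflicting encodings for the same attribute.
GENDER_CODES = {
    "billing": {"M": "male", "F": "female"},
    "crm": {"1": "male", "0": "female"},
}


@dataclass(frozen=True)  # frozen: a loaded row cannot be modified afterwards
class CustomerSnapshot:
    customer_id: str  # subject-oriented: every row describes a customer
    gender: str
    city: str
    as_of: date  # time-variant: the date this row describes


class Warehouse:
    def __init__(self):
        self._rows = []  # non-volatile: rows are appended, never updated or deleted

    def load(self, source, customer_id, raw_gender, city, as_of):
        # Integrated: translate the source-specific code into the corporate standard.
        gender = GENDER_CODES[source][raw_gender]
        self._rows.append(CustomerSnapshot(customer_id, gender, city, as_of))

    def snapshot_at(self, customer_id, when):
        """Return the latest snapshot of a customer on or before `when`, or None."""
        history = [r for r in self._rows
                   if r.customer_id == customer_id and r.as_of <= when]
        return max(history, key=lambda r: r.as_of, default=None)


wh = Warehouse()
wh.load("billing", "C1", "F", "Austin", date(2020, 1, 1))
wh.load("crm", "C1", "0", "Denver", date(2022, 6, 1))
print(wh.snapshot_at("C1", date(2021, 1, 1)).city)
print(wh.snapshot_at("C1", date(2023, 1, 1)).city)
```

Because history is never overwritten, the question “where did this customer live at the start of 2021?” still has an answer after the customer moves, and both sources report gender in the same terms.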
So why do people want a big data solution? Because many corporations have a lot of data, and that data, if unlocked properly, can contain valuable information that leads to better decisions, which in turn can lead to more revenue, more profitability and more customers. That is what most corporations want.
And why do people need a data warehouse? To make informed decisions. To really know what is going on in your corporation, you need data that is reliable, believable and accessible to everyone.
Comparing Big Data Solutions to a Data Warehouse
So when we compare a big data solution to a data warehouse, what do we find? We find that a big data solution is a technology and that data warehousing is an architecture. They are two very different things. A technology is just that: a means to store and manage large amounts of data. A data warehouse is a way of organizing data so that it has corporate credibility and integrity. When someone takes data from a data warehouse, that person knows that other people are using the same data for other purposes. A data warehouse therefore provides a basis for reconciling data across the corporation.
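Here is a small sketch of what “a basis for reconciling data” can mean in practice, assuming hypothetical integrated sales rows of the form (region, month, amount). Two departments ask different questions of the same rows, and their answers roll up to the same grand total. If each department instead kept its own extract, nothing would guarantee that agreement.

```python
from collections import defaultdict

# Hypothetical integrated sales rows: (region, month, amount in whole dollars).
SALES = [
    ("east", "2024-01", 120),
    ("east", "2024-02", 80),
    ("west", "2024-01", 200),
    ("west", "2024-02", 50),
]


def total_by(rows, key_index):
    """Sum the amount column, grouped by the column at key_index."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[key_index]] += row[2]
    return dict(totals)


marketing_view = total_by(SALES, 0)  # revenue by region
finance_view = total_by(SALES, 1)    # revenue by month

# Different questions, same foundation: both views reconcile to one total.
assert sum(marketing_view.values()) == sum(finance_view.values())
print(marketing_view, finance_view)
```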
“The difference between a technology and an architecture is the difference between hammers and nails and Santa Fe, New Mexico. Hammers and nails can be used to build many different things. You can build houses, tables, bridges, desks and many things with hammers and nails. The houses in Santa Fe are all of a distinctive architecture. In Santa Fe you find adobe, exposed beams and vigas. When you are in Santa Fe, you know that you are nowhere else. Santa Fe has its own architecture. And it is true that the homes and buildings in Santa Fe have been built from hammers and nails. But go to Santa Fe, and the difference between a technology and an architecture will be very clear to you.”
Source: Bill Inmon, BeyeNETWORK
According to an answer on Stack Overflow:
“Big Data is a term applied to data sets whose size is beyond the ability of commonly used tools to capture, manage and process the data within a tolerable elapsed time. But a data warehouse is a collection of data marts representing historical data from different operations in the company.
It means Big Data is a collection of large data in a particular manner, but a data warehouse collects data from different departments of an organization. However, a data warehouse requires efficient management techniques. Conceptually, they are the same in only one respect: they both collect large amounts of information.”