What Is “Big Data”?

One of the buzzwords in the enterprise software industry today is Big Data. But what is it exactly, and how does it differ from traditional business intelligence (BI) or data analysis?

Big Data is not only about the size of the data; it is also about the nature of the data and its purpose. At a high level, Big Data involves establishing connections between various unstructured data elements. These can include machine-generated data (such as server logs) as well as human-generated content such as blog posts, newsletters, social network posts, and more.

Because these disparate elements vary so widely in format and arrive so quickly, it is extremely difficult to mine them in any meaningful way for the trends and patterns that connect them. After all, how do you capture, store, search, analyze, and visualize unstructured data in every possible format in a unified manner?

By contrast, traditional BI looks at a subset of data. For instance, it defines a report or data model for Marketing or Sales and then populates that report or data model with the relevant data. Big Data, however, establishes connections across a much wider range of data types. Rather than populating predetermined data models, Big Data analysis looks at the data itself and surfaces its own trends and patterns.

What is Big Data: Examples in the Field

Take invoicing, for example. Receiving a purchase order, creating an invoice, shipping and receiving goods, settling payments, and so on creates a lot of transactional data. What you end up with is a record of line items, dollar totals, posting dates, and so forth. But what if you want to report on a wide range of transactions over an extended period of time, covering not only the transactional data but also the accompanying email threads, documentation, comment logs, and more?
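As a rough illustration of that kind of connection, the sketch below attaches free-text emails to the invoice records they mention. Everything in it is hypothetical: the record fields, the sample emails, the `INV-` numbering convention, and the `link_emails_to_invoices` helper. It runs on a single machine in plain Python and is not how any particular Big Data product works; it only shows structured and unstructured data being joined on a shared key.

```python
import re
from collections import defaultdict

# Hypothetical structured records, like rows from an invoicing system.
invoices = [
    {"invoice_id": "INV-1001", "total": 2500.00, "posted": "2012-03-01"},
    {"invoice_id": "INV-1002", "total": 980.50, "posted": "2012-03-04"},
]

# Hypothetical unstructured text: emails that mention invoices in free form.
emails = [
    "Customer disputes the freight charge on INV-1001, please review.",
    "Payment for inv-1002 will be late; see attached note.",
    "Lunch on Friday?",
]

# Assumed convention: invoice numbers are "INV-" followed by digits, any case.
INVOICE_REF = re.compile(r"\bINV-(\d+)\b", re.IGNORECASE)


def link_emails_to_invoices(invoices, emails):
    """Attach every email that mentions a known invoice ID to that invoice."""
    known = {inv["invoice_id"] for inv in invoices}
    mentions = defaultdict(list)
    for text in emails:
        for digits in INVOICE_REF.findall(text):
            invoice_id = f"INV-{digits}"  # normalize "inv-" to "INV-"
            if invoice_id in known:
                mentions[invoice_id].append(text)
    # Combine each structured record with its related free text.
    return [
        {**inv, "related_emails": mentions.get(inv["invoice_id"], [])}
        for inv in invoices
    ]


for row in link_emails_to_invoices(invoices, emails):
    print(row["invoice_id"], row["total"], len(row["related_emails"]))
```

A regular expression is the simplest possible way to find the shared key; real systems dealing with messy text would need far more robust extraction, and at Big Data scale the join itself would be spread across many machines.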

Big Data isn’t new, but it is quickly gaining popularity because technology is increasingly able to handle large data sets. As in any market segment, however, the exact offerings of Big Data solution providers differ in fine ways. The main divide is between vendors that offer analytical tools (such as 1010data, ParAccel, and Quantivo) and those that handle distribution and support (including Cloudera, Sybase, 10gen, and Infobright).

Hadoop

A key component of many of these solutions is Hadoop, an open-source framework for storing and processing large, varied data sets across many servers; the tools built on top of it turn that data into actionable results for the user. For example, in finance, you can create complex data models for risk analysis that do not fit into the regular charts and tables of a BI solution.

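Hadoop runs on a cluster of many machines that share neither memory nor disks. It splits data into blocks, spreads those blocks across different servers, and keeps copies of each block on more than one server so that a single failure does not lose data. Because each server has its own processors, every server can work on its local share of the data at the same time, which is what lets Hadoop process very large data sets quickly. There is no single machine where all the data is stored, but when you run a query, Hadoop processes each piece on the server that holds it and combines the partial results into a unified whole.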
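The split-process-combine idea can be sketched in a few lines of Python. This is a single-process simulation under stated assumptions, not Hadoop's API: the `partitions` list stands in for blocks stored on separate servers, threads stand in for independent machines, and the hypothetical `map_partition` and `reduce_counts` functions mirror the map and reduce steps of the MapReduce pattern Hadoop uses. Replication and failure handling are left out.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data set split into partitions, as if each lived on a
# different server's local disk. No partition reads another's data.
partitions = [
    ["error disk full", "login ok", "error timeout"],
    ["login ok", "login failed"],
    ["error timeout", "logout"],
]


def map_partition(lines):
    """Runs 'on the server' holding these lines: counts words locally."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts


def reduce_counts(partials):
    """Merges the partial results from every server into one answer."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


# Threads stand in for independent machines; each sees only its own partition.
with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
    partial_results = list(pool.map(map_partition, partitions))

word_counts = reduce_counts(partial_results)
print(word_counts.most_common(3))
```

The design point is that the expensive work (reading and counting) happens where each partition lives, and only the small per-partition summaries are gathered and merged.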

Big Data has been getting a major push recently with the release of SAP’s HANA product, an in-memory platform for real-time data analysis. This makes sense: the majority of the largest corporations in the world (which, one would assume, have the most data to analyze) run SAP. But other software providers aren’t far behind. Oracle offers its Big Data Appliance, and many other major vendors have already followed suit.

Do you offer a Big Data solution that you’d like us to review? We’d love to hear from you. Please email us.

Want more insider-perspective posts on big data? Check out our side-by-side comparison of leading intelligence platforms on the Top 10 Business Intelligence Software report.
