Needle in the Hay Stack

Big Data Techniques

Most big data techniques have been around for many years. What’s new is their availability to more people, the speed with which they run (so that many variations can be processed), the variety of data they can process (to provide richer and deeper context), and the volume of data they can handle.

The main types of analytics in big data are:

  1. Descriptive analytics
  2. Diagnostic analytics
  3. Predictive analytics
  4. Prescriptive analytics

Descriptive analytics – What happened?

Descriptive analytics aims to provide insight into what has happened. The methods and technologies involved include A/B testing, dashboards, business activity monitoring, complex event processing, content analytics, geospatial analytics, graph analytics, pattern/anomaly detection and clustering/classification.
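To make the idea concrete, here is a minimal sketch of two descriptive steps: summarizing a series and flagging anomalies with a simple z-score rule. The data (`daily_tickets`), the function names and the threshold of 2.0 are all assumptions chosen for illustration; real anomaly detection usually needs something more robust.

```python
from statistics import mean, pstdev


def describe(values):
    """Return basic summary statistics for a list of numbers."""
    return {
        "count": len(values),
        "mean": mean(values),
        "min": min(values),
        "max": max(values),
    }


def find_anomalies(values, threshold=2.0):
    """Return (index, value) pairs whose z-score exceeds the threshold."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        return []  # all values identical, nothing stands out
    return [
        (i, v) for i, v in enumerate(values)
        if abs(v - mu) / sigma > threshold
    ]


# Hypothetical number of support tickets raised per day over one week.
daily_tickets = [42, 39, 45, 41, 40, 120, 43]

summary = describe(daily_tickets)
anomalies = find_anomalies(daily_tickets)

print(summary)
print(anomalies)
```

With this made-up week, only the sixth day (120 tickets) is flagged. That is the "what happened" answer; it says nothing yet about why.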

Diagnostic analytics – Why did it happen?

Diagnostic analytics focuses on analyzing data to find the causes of an event, and is closely related to root-cause analysis. The methods and technologies include online analytical processing (OLAP), data mining and interactive visualization.
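A rough sketch of the roll-up and drill-down style of analysis that OLAP tools offer is shown below. It uses plain Python dictionaries rather than any real OLAP engine, and the records, field names and the `roll_up` helper are hypothetical.

```python
from collections import defaultdict


def roll_up(records, dimension, measure):
    """Sum a measure for each distinct value of one dimension."""
    totals = defaultdict(int)
    for record in records:
        totals[record[dimension]] += record[measure]
    return dict(totals)


# Hypothetical failure counts per region and product.
records = [
    {"region": "north", "product": "router", "failures": 3},
    {"region": "north", "product": "switch", "failures": 2},
    {"region": "south", "product": "router", "failures": 14},
    {"region": "south", "product": "switch", "failures": 1},
]

# Roll up: which region has the most failures?
by_region = roll_up(records, "region", "failures")
worst_region = max(by_region, key=by_region.get)

# Drill down: within that region, which product is responsible?
worst_records = [r for r in records if r["region"] == worst_region]
by_product = roll_up(worst_records, "product", "failures")

print(by_region)
print(worst_region, by_product)
```

Moving from the regional total to the product behind it is the "why did it happen" step. A real investigation would drill through many more dimensions.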

Predictive analytics – What will happen?

Predictive analytics helps model and forecast what might happen. It involves technologies like crowdsourcing, data mining, forecasting, machine learning and simulations.
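As a small illustration of forecasting, the sketch below fits a straight line to past values by ordinary least squares and extends it one step ahead. The series (`monthly_volume`) and the function names are assumptions. A linear trend is only one of many possible models and is rarely enough on its own.

```python
def fit_line(ys):
    """Least-squares fit of y = intercept + slope * x, with x = 0, 1, 2, ..."""
    n = len(ys)
    if n < 2:
        raise ValueError("need at least two observations to fit a trend")
    xs = range(n)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return slope, intercept


def forecast(ys, steps_ahead=1):
    """Extend the fitted trend line beyond the last observation."""
    slope, intercept = fit_line(ys)
    return intercept + slope * (len(ys) - 1 + steps_ahead)


# Hypothetical data volume (in GB) collected in each of the last five months.
monthly_volume = [100, 110, 120, 130, 140]

next_month = forecast(monthly_volume)
print(next_month)
```

Because the made-up series grows by exactly 10 each month, the fitted line continues that pattern. Real data is noisier and the forecast would carry uncertainty.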

Prescriptive analytics – Make it happen

Prescriptive analytics seeks to determine the best solution or outcome among various choices, given the known parameters. The methods and technologies include fuzzy logic, optimization, rules engines and decision analysis.
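One simple form of decision analysis is to pick the option with the highest expected payoff across a set of scenarios. The sketch below assumes three hypothetical demand scenarios with known probabilities and invented payoffs for each option. All names and numbers are illustrative only.

```python
def expected_value(payoffs, probabilities):
    """Probability-weighted average payoff across scenarios."""
    return sum(p * v for p, v in zip(probabilities, payoffs))


def best_choice(options, probabilities):
    """Return the name of the option with the highest expected payoff."""
    return max(
        options,
        key=lambda name: expected_value(options[name], probabilities),
    )


# Assumed probabilities of low, medium and high demand.
probabilities = [0.5, 0.25, 0.25]

# Hypothetical payoff of each option under low, medium and high demand.
options = {
    "add_servers": [-10, 20, 60],
    "do_nothing": [0, 0, -40],
    "use_cloud_burst": [-2, 15, 40],
}

best = best_choice(options, probabilities)
print(best, expected_value(options[best], probabilities))
```

The known parameters here are the probabilities and payoffs. Changing either can change the recommended option, which is why prescriptive results are only as good as their inputs.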

Needle in the Haystack

Big data is all about analyzing millions of data points and making meaningful sense out of them. You can compare it to “finding the needles in the haystack”. The trick here is that the haystack is ever growing (Volume), and at a fast pace (Velocity). Wikipedia defines hay as “grass, legumes or other herbaceous plants that have been cut, dried, and stored” (Variety). The needles are the useful tools (predictions) hidden inside.

While working with our clients on various projects, we see a lot of data being generated and probably stored but never used. Customer interactions, customer requests (ticketing systems), auto-generated system data, logs and so on: this data is mostly captured and archived, left to decay and then discarded. It’s time we recognized the importance of this data.

The sudden interest in Big Data is the result of behavioral changes that we have seen in the recent past. The Internet, which had been a mere static source of information, has now become a medium for immense information exchange, due to increased activity on social networks and the availability of mobile internet. This is the transformation of the Internet from an ugly cocoon to a beautiful butterfly… “It’s mobile now”. It’s also important to note how extreme compute and storage, once within the reach of only large enterprises or research establishments, are now available to end users, thanks to cloud computing.

Keep an eye on this space for more about Big Data.