Needle in the Hay Stack


Rag picker or Gold miner

Data is exploding at an astounding rate. While it took from the dawn of civilization to 2003 to create 5 exabytes of information, we now create that same volume in just two days!

The gold rush has begun, and this time data is the gold mine. Organizations have slowly started to realize the importance of data. It will still be some time before things reach a feverish pitch, but unlike earlier gold rushes, when there was little gold and many miners, it is the other way around now: there is an immense amount of data and very few miners.


Finding skilled personnel is one of the major challenges associated with big data analytics. Successful big data analytics initiatives involve close collaboration between IT, business users, and “data scientists” to identify and implement the analytics that will solve the right business problems. As a data scientist, you may sometimes wonder whether you are working on the right data: is it worth anything, or is it just garbage?

One question that might pop up is “Are you a gold miner or a rag picker?”, and the answer is “It depends on which pile you are working on … garbage or gold?”

One of the ‘V’s that characterize big data is ‘Variety’, which mostly refers to data in unstructured formats. Unstructured data is heterogeneous and variable in nature and comes in many formats, including text, documents, images, video and more. Unstructured data is growing faster than structured data; according to an IDC study, it will account for 90 percent of all data created in the next decade. As a data analyst, it is important to judge the usefulness of the data you are working on; however, sometimes it is better not to restrict yourself. It is the uncertainty that makes the work more interesting.

Never forget: “the greatest things in the world are found in the most unusual places.”