3. WHAT IS BIG DATA?
• There is no single standard definition…
• A common description: data sets whose size is beyond the ability of commonly used software
tools to capture, curate, manage and process within a tolerable elapsed time.
• In 2012, Gartner updated its definition to: "Big data is high volume, high velocity, and/or high
variety information assets that require new forms of processing to enable enhanced decision
making, insight discovery and process optimization."
4. THE 3V’S CLASSIFICATION OF BIG DATA
• Volume
The quantity of data generated.
• Velocity
The rate at which data is generated and needs to be processed.
• Variety
The different types of data that have to be stored.
5. VOLUME
• Every day…
  • More than 1.5 billion shares are traded on the NYSE.
  • Facebook stores more than 2.6 billion likes and comments.
• Every minute…
  • McDonald's serves 2,000 customers.
  • A new user is registered on Gmail.
• Every second…
  • Banks process more than 10,000 transactions.
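To compare the per-minute and per-second figures above with the daily ones, they can be converted to a common per-day unit. The short Python sketch below does only that arithmetic; the variable names are illustrative, and the input rates are the ones quoted on this slide.

```python
# Convert the per-minute and per-second rates quoted above to per-day totals.
MINUTES_PER_DAY = 24 * 60                # 1,440 minutes in a day
SECONDS_PER_DAY = MINUTES_PER_DAY * 60   # 86,400 seconds in a day

# 2,000 customers served every minute (figure from the slide).
customers_per_day = 2_000 * MINUTES_PER_DAY

# 10,000 bank transactions every second (the slide's lower bound).
transactions_per_day = 10_000 * SECONDS_PER_DAY

print(f"Customers per day:    {customers_per_day:,}")     # 2,880,000
print(f"Transactions per day: {transactions_per_day:,}")  # 864,000,000
```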
6. VELOCITY
• Data is being generated fast and needs to be processed fast.
• Late decisions → missed opportunities
Examples
• E-promotions: based on your location, your purchase history and what you like → send
promotions right now for the store next to you.
• Healthcare monitoring: sensors monitor your activities and body → any abnormal
measurement requires an immediate reaction.
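The healthcare example can be sketched as a check that runs on every reading as it arrives, rather than in a later batch job. This is a minimal illustration only: the function name `monitor`, the 50 to 120 bpm range and the sample readings are assumptions made for the sketch, not values from a real monitoring system.

```python
from typing import Iterable, Iterator, Tuple

# Hypothetical "normal" heart-rate range in beats per minute (an assumption).
LOW_BPM = 50
HIGH_BPM = 120


def monitor(readings: Iterable[Tuple[int, int]]) -> Iterator[Tuple[int, int]]:
    """Yield (timestamp, bpm) for each abnormal reading as soon as it is seen."""
    for timestamp, bpm in readings:
        # Each reading is checked on arrival, so a reaction can follow at once.
        if bpm < LOW_BPM or bpm > HIGH_BPM:
            yield timestamp, bpm


# Made-up sample stream: (seconds since start, heart rate in bpm).
sample = [(0, 72), (1, 75), (2, 134), (3, 70), (4, 45)]
alerts = list(monitor(sample))
print(alerts)  # [(2, 134), (4, 45)]
```

Because `monitor` is a generator, it also works on an unbounded stream: each alert is produced before the next reading is consumed.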
7. VARIETY
• Various formats, types and structures.
• Text, numerical data, images, audio, video, sequences, time series, social media data,
multi-dimensional arrays, etc.
• A single application can generate or collect many types of data.
8. ADVANTAGES AND LIMITATIONS
Advantages
• Ability to make better decisions and take meaningful actions at the right time.
• Cost reduction.
• Technologies such as MapReduce, Hive and Impala make it possible to run queries without
changing the underlying data structures.
Limitations
• Big risks to security and privacy.
• Difficult to learn; requires expert training to use in an organization.
• Establishing relationships in the data and applying algorithms is very difficult.
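For readers unfamiliar with MapReduce, the programming model can be sketched as a word count in plain Python. This is a single-process toy and not the Hadoop API: the function names `map_phase`, `shuffle_phase` and `reduce_phase` and the sample input are assumptions chosen for illustration. Note that the raw text lines are processed as they are, without first being loaded into a predefined schema.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_phase(lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in every input line."""
    pairs = []
    for line in lines:
        for word in line.lower().split():
            pairs.append((word, 1))
    return pairs


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for word, count in pairs:
        groups[word].append(count)
    return dict(groups)


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce: combine the values of each key into a single result."""
    return {word: sum(counts) for word, counts in groups.items()}


# Made-up input: raw, unstructured text lines.
lines = ["big data is big", "data is everywhere"]
counts = reduce_phase(shuffle_phase(map_phase(lines)))
print(counts)  # {'big': 2, 'data': 2, 'is': 2, 'everywhere': 1}
```

In a real cluster the map and reduce steps run in parallel on many machines and the framework performs the shuffle; only the structure of the computation is shown here.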