This document discusses big data, defining it as large amounts of data holding meaningful insights that require analysis beyond traditional methods due to their volume, variety, and velocity. It describes challenges such as a shorter time to react to incoming data and data-economics considerations. It then introduces the big data approach of using NoSQL databases and analytical databases to capture, process, and analyze large, complex data in real time. Examples of big data use cases are given, such as preprocessing and storing incoming data and enabling real-time actions. Sears is discussed as a company that competes through big data analysis of customer information. The future of big data is predicted to bring continued high growth rates and new career opportunities.
A few months back I spoke with some graduate students about "what is data warehousing". The talk covered the past, present, and likely future of data warehousing and how it can add value to a company.
Data is produced at a phenomenal rate
Our ability to store has grown
Users expect more sophisticated information
How?
Objective: Fit data to a model
Potential Result: Higher-level meta information that may not be obvious when looking at raw data
Similar terms
Exploratory data analysis
Data driven discovery
Deductive learning
This presentation contains a broad introduction to big data and its technologies.
Big data is a term that describes the large volume of data – both structured and unstructured – that inundates a business on a day-to-day basis.
Big Data is a phrase used to mean a massive volume of both structured and unstructured data that is so large it is difficult to process using traditional database and software techniques. In most enterprise scenarios the volume of data is too big or it moves too fast or it exceeds current processing capacity.
Common BI/Big Data Challenges and Solutions, presented by seasoned experts Andriy Zabavskyy (BI Architect) and Serhiy Haziyev (Director of Software Architecture).
This was a complimentary workshop where attendees had the opportunity to learn, network and share knowledge during the lunch and education session.
Hadoop Data Lake vs. classical Data Warehouse: How to utilize the best of both worlds (Comsysto Reply GmbH)
Looking at the IT landscape of big and medium-sized companies, Hadoop Data Lakes are no rarity anymore, and classical Data Warehouses stay on the map as well. So we usually have a hybrid landscape, historically grown and more or less loosely coupled. Gaining value from this setup requires a holistic, use-case-oriented approach. This session presents a best-practice architecture. We will illustrate the strengths and shortcomings of its components. On the basis of a real project example, we will discuss which challenge can be tackled best by which part.
Kolja:
Kolja works with Woodmark Consulting (based in Munich) on solving customers' data challenges. In consulting projects he typically designs architectures and frameworks for data integration. Currently Kolja focuses on aspects of hybrid architectures: he studies how established components from classical Data Warehouses and those from modern Hadoop environments can be smartly combined. Kolja holds an M.Sc. in Computer Science from TU Munich with a focus on databases and information systems.
Data Warehousing and Business Intelligence is one of the hottest skills today, and is the cornerstone for reporting, data science, and analytics. This course teaches the fundamentals with examples plus a project to fully illustrate the concepts.
Recently, in the fields of Business Intelligence and Data Management, everybody is talking about data science, machine learning, predictive analytics, and many other "clever" terms, with promises to turn your data into gold. In these slides, we present the big picture of data science and machine learning. First, we define the context for data mining from a BI perspective and try to clarify various buzzwords in this field. Then we give an overview of the machine learning paradigms. After that, we discuss, at a high level, the various data mining tasks, techniques, and applications. Next, we take a quick tour through the Knowledge Discovery Process. Screenshots from demos are shown, and finally we conclude with some takeaway points.
The Data Lake and Getting Businesses the Big Data Insights They Need (Dunn Solutions Group)
Do terms like "Data Lake" confuse you? You're not alone. With all of the technology buzzwords flying around today, it can become a task to keep up with and clearly understand each of them. However, a data lake is definitely something worth dedicating the time to understand. Leveraging data lake technology, companies are finally able to keep all of their disparate information and streams of data in one secure location, ready for consumption at any time – this includes structured, unstructured, and semi-structured data. For more information on our Big Data Consulting Services, visit us online at: http://bit.ly/2fvV5rR
Big data primarily refers to data sets that are too large or complex to be dealt with by traditional data-processing application software. Data with many entries (rows) offer greater statistical power, while data with higher complexity (more attributes or columns) may lead to a higher false discovery rate.[2] Though sometimes used loosely, partly due to a lack of formal definition, the best interpretation is that it is a large body of information that cannot be comprehended when used in small amounts only.
Presentation at Data Summit 2015 in NYC.
Elliott Cordo shared real-world insights across a range of topics, including the evolving best practices for building a data warehouse on Hadoop that also coexists with multiple processing frameworks and additional non-Hadoop storage platforms, the place for massively parallel-processing and relational databases in analytic architectures, and the ways in which the cloud offers the ability to quickly and cost-effectively establish a scalable platform for your Big Data warehouse.
For more information, visit www.casertaconcepts.com
Types of database processing: OLTP vs. Data Warehouses (OLAP)
Data warehouse characteristics:
• Subject-oriented
• Integrated
• Time-variant
• Non-volatile
Functionalities of a Data Warehouse:
• Roll-up (consolidation)
• Drill-down
• Slicing
• Dicing
• Pivot
The KDD Process; applications of Data Mining
This seminar is about data warehousing. In it, we discuss what data warehousing is, compare databases and data warehouses, cover the different data warehouse models and data marts, and look at the disadvantages of data warehousing.
Low-Latency Analytics with NoSQL – Introduction to Storm and Cassandra (Caserta)
Businesses are generating and ingesting an unprecedented volume of structured and unstructured data to be analyzed. What is needed is a scalable Big Data infrastructure that processes and parses extremely high volumes in real time and calculates aggregations and statistics. Banking trade data, where volumes can exceed billions of messages a day, is a perfect example.
Firms are fast approaching "the wall" in terms of scalability with relational databases. They must stop imposing relational structure on analytics data and instead map raw trade data to a data model at low latency, persist the mapped data to disk, and handle ad hoc requests for data analytics.
Joe discusses and introduces NoSQL databases, describing how they are capable of scaling far beyond relational databases while maintaining performance, and shares a real-world case study that details the architecture and technologies needed to ingest high-volume data for real-time analytics.
For more information, visit www.casertaconcepts.com
Incorporating the Data Lake into Your Analytic Architecture (Caserta)
Joe Caserta, President at Caserta Concepts presented at the 3rd Annual Enterprise DATAVERSITY conference. The emphasis of this year's agenda is on the key strategies and architecture necessary to create a successful, modern data analytics organization.
Joe Caserta presented Incorporating the Data Lake into Your Analytics Architecture.
For more information on the services offered by Caserta Concepts, visit our website at http://casertaconcepts.com/.
3. What is Big Data?
• There is a humongous amount of data available that holds meaningful insights – it needs to be analysed.
• Existing Online Transaction Processing (OLTP) and Business Intelligence (BI) systems are not easily scalable when cost, effort, and manageability are considered.
• It is not just the volume, but also the variety and velocity of data.
• Big data is a term that refers to the challenges we face due to the exponential volume, variety, and velocity of data.
9. Shorter Time to React
• Some data enters your organization with value that lasts only a limited window of time.
• This window usually shuts well before the data has been transformed and loaded into a data warehouse for deeper analysis.
• The higher the volume of data entering your organization per second, the bigger your challenge.
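The reaction-window idea above can be sketched in a few lines. This is a minimal, hypothetical example (the 60-second window and the event shape are assumptions, not from the original deck): events that have outlived their window are dropped rather than queued for later warehouse loading.

```python
import time

# Assumed reaction window: an event is only worth acting on for
# this many seconds after it arrives. Tune per use case.
WINDOW_SECONDS = 60

def actionable(events, now=None):
    """Return only the events still inside their reaction window."""
    now = now if now is not None else time.time()
    return [e for e in events if now - e["arrived_at"] <= WINDOW_SECONDS]

events = [
    {"id": 1, "arrived_at": 1000.0},  # 100 s old at now=1100 -> expired
    {"id": 2, "arrived_at": 1090.0},  # 10 s old -> still fresh
]
fresh = actionable(events, now=1100.0)
```

The point of the sketch: as per-second volume grows, the filter must run continuously on the stream, not after a batch load.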
10. Data Economics
• Why is volume good?
– No individual record is particularly valuable.
– Having every record is incredibly valuable.
• Why is the storage decision important?
– How much value can I extract from every byte of data versus the cost of saving that data?
– If value > cost: keep it online, in a database or on a filer.
– If cost > value: discard it or archive it on tape (it is expensive to throw data away).
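The keep-vs-archive rule above reduces to a one-line comparison. Here is a toy sketch with made-up per-gigabyte numbers (the figures are purely illustrative, not from the deck):

```python
# Hypothetical per-GB figures to illustrate the value-vs-cost rule.
def storage_decision(value_per_gb, cost_per_gb):
    """Keep data online when its extractable value exceeds storage cost."""
    return "keep online" if value_per_gb > cost_per_gb else "archive or discard"

clickstream = storage_decision(value_per_gb=0.50, cost_per_gb=0.02)
raw_logs = storage_decision(value_per_gb=0.01, cost_per_gb=0.02)
```

In practice the hard part is estimating `value_per_gb`; the slide's point is that big data platforms push `cost_per_gb` low enough that "keep online" becomes the answer for far more data.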
11. Data Storage

  Schema                   Structured                   Unstructured
  Storage medium           RDBMS                        Filers
  Storage reliability      Very reliable                Very reliable
  Processing ability       Very reliable                Unstructured schema poses challenges
  Location of processing   SQL queries pull data        Random means to retrieve
                           to the server
  Impact of data increase  Cost increases linearly      Cost increases linearly
  Support for Big Data     No                           No
13. Big Data Approach
Big data also refers to technologies that can capture, process, and analyze such data.
14. NoSQL Database Types
• Key-value store
– The key can be custom or auto-generated.
– The value can be a complex object such as XML, a BLOB, JSON, etc.
– Popular examples: DynamoDB, Azure Table Storage (ATS), Riak
• Column store
– Data is stored as families of columns; high scalability with a very high-performance architecture.
– Examples: HBase, Cassandra, Vertica, and Hypertable
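The key-value model described above can be sketched in a few lines of Python. This is a deliberately minimal in-memory stand-in, not the API of any real store: the key is opaque, and the value is serialized to a blob, mirroring how stores like DynamoDB or Riak treat the value as uninterpreted data.

```python
import json

class KeyValueStore:
    """Toy in-memory key-value store: opaque keys map to serialized blobs."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        # Values can be complex objects (dicts, lists); store as a JSON blob.
        self._data[key] = json.dumps(value)

    def get(self, key):
        raw = self._data.get(key)
        return json.loads(raw) if raw is not None else None

store = KeyValueStore()
store.put("user:42", {"name": "Ada", "tags": ["vip"]})
profile = store.get("user:42")
```

Because the store never inspects the value, reads and writes stay O(1) lookups regardless of how complex the stored object is – the property that lets key-value stores scale horizontally by partitioning on the key.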
15. NoSQL Database Types
• Document database
– Designed to store, retrieve, and manage document-oriented information; expands on the key-value store.
– Examples: MongoDB, CouchDB
• Graph database
– Designed for data whose relations are well represented as graphs, with nodes connected by edges.
– Examples: Neo4j and Polyglot
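The graph model above – nodes connected by labeled edges – can be sketched as an adjacency list. This is an illustrative toy, not the query interface of Neo4j or any real graph database:

```python
from collections import defaultdict

class Graph:
    """Toy property-graph sketch: nodes connected by labeled, directed edges."""

    def __init__(self):
        self.edges = defaultdict(list)

    def add_edge(self, src, label, dst):
        self.edges[src].append((label, dst))

    def neighbors(self, node, label=None):
        """Nodes reachable from `node`, optionally filtered by edge label."""
        return [d for (l, d) in self.edges[node] if label is None or l == label]

g = Graph()
g.add_edge("alice", "FOLLOWS", "bob")
g.add_edge("alice", "LIKES", "post:1")
follows = g.neighbors("alice", "FOLLOWS")
```

Traversing a relationship here is a direct list lookup on the node, which is the design point of graph databases: relationship hops cost the same regardless of total data size, where a relational join would grow with table size.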
16. Analytical Database
• An analytical database is a type of database built to store, manage, and consume big data.
• Optimized for advanced analytics involving highly complex queries over terabytes of data, complex statistical processing, data mining, and natural language processing (NLP).
• Examples of analytical databases are Vertica (acquired by HP), Aster Data (acquired by Teradata), Greenplum (acquired by EMC), and so on.
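The workload an analytical database is optimized for is essentially large-scale aggregation. As a stand-in for the GROUP-BY-over-terabytes queries that engines like Vertica or Greenplum run, here is the same shape of computation over a tiny made-up in-memory sample (the data is purely illustrative):

```python
from collections import defaultdict

# Tiny illustrative sample; a real analytical DB would scan billions
# of such rows, typically column-by-column rather than row-by-row.
sales = [
    {"region": "east", "amount": 120.0},
    {"region": "west", "amount": 80.0},
    {"region": "east", "amount": 50.0},
]

# Equivalent of: SELECT region, SUM(amount) FROM sales GROUP BY region
totals = defaultdict(float)
for row in sales:
    totals[row["region"]] += row["amount"]
```

Analytical databases typically accelerate exactly this pattern with columnar storage and massive parallelism: only the `region` and `amount` columns need to be read, and partial sums can be computed independently on each node.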
21. Sears – Competes on Big Data
• Sears has data on over 100 million customers, which it analyses to make real-time, relevant offers to those customers.
• The solution was three years in the making, and included programming to capture, analyze, and report on customer activity at the individual level, across all 4,000 locations.
• Sears has a 300-node Hadoop cluster populated with over 2 petabytes of structured customer transaction data, sales data, and supply chain data.
• Results: Sears achieved an active member base in the eight digits, exceeding the projected 36-month membership target in 17 months.