3. What is Big Data?
A new generation of technologies and architectures designed to economically extract value
from very large volumes of a wide variety of data, by enabling high velocity capture,
discovery and/or analysis.
4. Volume
• Terabytes
• Records
• Tables
• Files
Variety
• Structured
• Unstructured
• Semi-Structured
• All the above
Velocity
• Batch
• Realtime
• Streams
• Near Realtime
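To make the velocity axis a little more concrete, here is a minimal sketch that computes the same average in two ways: in batch, once all the data is available, and as a stream, updating the result as each record arrives. The function names and the sample readings are made up for illustration and do not come from the slides.

```python
from __future__ import annotations

from typing import Iterable, Iterator


def batch_average(values: list[float]) -> float:
    """Batch: the full data set is available before processing starts."""
    return sum(values) / len(values)


def streaming_average(values: Iterable[float]) -> Iterator[float]:
    """Stream: emit an updated average after every incoming record."""
    count = 0
    total = 0.0
    for value in values:
        count += 1
        total += value
        yield total / count  # running average so far


readings = [10.0, 20.0, 30.0, 40.0]  # hypothetical sample data
batch_result = batch_average(readings)
stream_results = list(streaming_average(iter(readings)))
```

The batch version gives one answer at the end; the streaming version gives an answer after every record, and its last value matches the batch result.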
9. Database evolution
NoSQL
● no ACID transactions
● sharded indexes
● restricted joins
● support for columnar storage
In-memory DB
● real-time transactions
● not fully geared for enterprise-level data
● variety of indexes
● complex joins
HDFS
GDF
HBASE
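One way to picture the "sharded indexes" bullet is a key-value index split across several partitions, with a stable hash deciding which partition owns each key. The sketch below is only an illustration under that assumption: `ShardedIndex`, `shard_for` and the shard count are hypothetical names, not the API of any particular NoSQL product.

```python
import zlib

NUM_SHARDS = 4  # hypothetical cluster size


def shard_for(key: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a key to a shard number using a stable hash (CRC32)."""
    return zlib.crc32(key.encode("utf-8")) % num_shards


class ShardedIndex:
    """Toy index: each shard is a plain dict standing in for a node."""

    def __init__(self, num_shards: int = NUM_SHARDS) -> None:
        self.shards = [dict() for _ in range(num_shards)]

    def put(self, key: str, value: object) -> None:
        self.shards[shard_for(key, len(self.shards))][key] = value

    def get(self, key: str) -> object:
        # A lookup by key only has to touch the one shard that owns it.
        return self.shards[shard_for(key, len(self.shards))].get(key)


index = ShardedIndex()
for user in ["alice", "bob", "carol", "dave"]:
    index.put(user, {"name": user})
```

A lookup by key is cheap because it goes to a single shard. An operation that needs keys from many shards, such as a join, has to visit several of them, which is one plausible reason joins tend to be restricted in such systems.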
22. OLAP (Online Analytical Processing)
SELECT SUM(s.dollar_cost),
s.product_key,
p.description
FROM SALES_FACT s
…
…
…
GROUP BY s.product_key, p.description
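The query above is elided on the slide, so the following is only a sketch of how an aggregate of that shape could be run end to end. It uses Python's built-in `sqlite3` module with an in-memory database. The `PRODUCT` table, the join condition and the sample rows are assumptions added for the example; they are not the original query's missing lines.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical star schema: one fact table and one dimension table.
conn.executescript("""
CREATE TABLE PRODUCT (product_key INTEGER PRIMARY KEY, description TEXT);
CREATE TABLE SALES_FACT (product_key INTEGER, dollar_cost REAL);
INSERT INTO PRODUCT VALUES (1, 'widget'), (2, 'gadget');
INSERT INTO SALES_FACT VALUES (1, 10.0), (1, 15.0), (2, 7.5);
""")

# Assumed join between fact and dimension; the slide does not show it.
rows = conn.execute("""
SELECT SUM(s.dollar_cost), s.product_key, p.description
FROM SALES_FACT s
JOIN PRODUCT p ON p.product_key = s.product_key
GROUP BY s.product_key, p.description
ORDER BY s.product_key
""").fetchall()

for total, product_key, description in rows:
    print(product_key, description, total)
```

The query reads every fact row and returns one summary row per product, which is the typical OLAP access pattern: few writes, large scans, aggregated results.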
23. Why Databases?
● Transaction processing (ACID properties)
● SQL - indexes and queries
OLAP
● Transaction processing is not needed for analytics
o Data is moved in via ETL
● At large data volumes, indexes become irrelevant
● Schema on write vs. schema on read
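The last bullet contrasts two places where structure can be enforced. A minimal sketch, assuming a toy `events` data set: with schema on write, the store rejects records that do not fit the declared schema at insert time; with schema on read, raw records are stored as they are and interpreted only when queried. The table, field names and sample records are hypothetical.

```python
import json
import sqlite3

records = [
    {"user_id": 1, "amount": 9.5},
    {"user_id": 2},                                # missing "amount"
    {"user_id": 3, "amount": 0.5, "coupon": "X"},  # extra field
]

# Schema on write: structure is enforced when the data is stored.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (user_id INTEGER NOT NULL, amount REAL NOT NULL)"
)


def write_validated(record: dict) -> bool:
    """Insert a record; return False if it violates the table schema."""
    try:
        conn.execute(
            "INSERT INTO events VALUES (?, ?)",
            (record.get("user_id"), record.get("amount")),
        )
        return True
    except sqlite3.IntegrityError:  # raised by the NOT NULL constraint
        return False


accepted = [write_validated(r) for r in records]

# Schema on read: store raw lines, apply structure only when reading.
raw_store = [json.dumps(r) for r in records]  # stand-in for raw files


def read_total_amount() -> float:
    """Interpret each raw record at query time, tolerating missing fields."""
    total = 0.0
    for line in raw_store:
        record = json.loads(line)
        total += float(record.get("amount", 0.0))
    return total


stored_rows = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
raw_total = read_total_amount()
```

In this sketch the schema-on-write path silently drops the extra `coupon` field and refuses the incomplete record, while the schema-on-read path keeps everything and leaves the decision about missing or extra fields to each reader.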
27. Summary
1. Variety - How do we deal with different kinds of data?
2. Volume - How do we cope with large volumes of data?
3. Velocity - How do we solve realtime problems?
4. Value - What is our value?