SlideShare a Scribd company logo
1 of 25
Download to read offline
WHAT IS BIG DATA?
What is Big Data?
• There are humungous amount of data, available which have a
lot of meaningful insights – they need to be analysed
• Existing Online Transaction Processing (OLTP) and Business
Intelligence (BI) are not easily scalable considering cost, effort,
and manageability aspect.
• It is not just volume, but also the variety and velocity of data.
• Big data is a terminology that refers to challenges that we are
facing due to exponential volume, variety and velocity of data.
Three V’s of Big Data
Three V’s of Big Data
THE CHALLENGE
Background
Shorter Time to React
• Data that enters your organization and has some kind of value
for a limited window of time
• This window usually shuts well before the data has been
transformed and loaded into a data warehouse for deeper
analysis.
• The higher the volumes of data entering your organization per
second, the bigger your challenge.
Data Economics
• Why Volume is good ?
– No individual record is particularly valuable
– Having every record is incredibly valuable
• Why storage decision is important ?
• How much value can I extract from every byte of data verses
the cost to save that data?
– If value > cost – then keep it online, on DB or filer
– If cost > value – I discard it or archive on tape (expensive to
throw data)
Data Storage
Schema Structured Un Structured
Storage Medium RDBMS Filers
Storage Reliability Very reliable Very reliable
Processing ability Very reliable unstructured schema
poses challenges
Location of
processing
SQL queries pull data
to server
Random means to
retrieve sense
Impact of data
increase
Cost increases
linearly
Cost increases
linearly
Support for Big Data No No
BIG DATA’S APPROACH
Big Data Approach
Big Data refer to
technologies that
can capture, process
and analyze data.
No SQL Database Types
• Key-value store
– Key can be custom or auto generated
– Value can be complex objects like XML, BLOB, JSON
etc
– Popular : DynamoDB, Azure Table Store (ATS), Riak
• Column store
– Data is stored as families of columns; high scalability
with very high performance architecture
– Examples : HBase, Cassandra, Vertica and Hypertable
No SQL Database Types
• Document database
– Designed to store, retrieve & manage document
oriented information; expands on key-value store
– Example: MongoDB, CouchDB
• Graph database
– Designed for data that whose relations are well
represented in graphs, usually with nodes
connected to edges
– Examples : Neo4J and Polyglot
Analytical Database
• An analytical database is a type of database built to store,
manage, and consume big data.
• Optimized for processing advanced analytics that involves
highly complex queries on terabytes of data and complex
statistical processing, data mining, and NLP (natural language
processing).
• Examples of analytical databases are Vertica (acquired by HP),
Aster Data (acquired by Teradata), Greenplum (acquired by
EMC), and so on.
BIG DATA USE CASE
PATTERNS
Preprocess & Store
• Scenario
– Data getting continuously generated in large volume
– Need to pre-process before loading into target systems
Real Time Actions
• Scenario
– Manage actions to be taken
on continuously changing
data in real time
Credit Card Issuer
Sears – Competes on Big Data
• They have data of over 100 million customers, which they
analyse to offer real-time, relevant offers to their customers.
• The solution was 3 years in the making, which included
programming that would capture, analyze, and report on
customer activity at an individual level, across all 4,000
locations.
• Sears has a Hadoop cluster of 300-nodes that is populated
with over 2 petabytes of structure customer transaction data,
sales data and supply chain data.
• Results: Sears achieved an active member base in the 8 digits,
exceeding the projected 36 month membership target in 17
months.
THE FUTURE OF BIG
DATA
Compound Annual Growth Rate
IDC Report Analysis
Careers in Big Data
THE END
Next : Hadoop

More Related Content

What's hot

data warehousing and data mining
data warehousing and data mining data warehousing and data mining
data warehousing and data mining E2MATRIX
 
Data mining techniques unit 1
Data mining techniques  unit 1Data mining techniques  unit 1
Data mining techniques unit 1malathieswaran29
 
Data Warehouse and Data Mining
Data Warehouse and Data MiningData Warehouse and Data Mining
Data Warehouse and Data MiningRanak Ghosh
 
WSO2 BAM - Your Big Data Toolbox
WSO2 BAM - Your Big Data ToolboxWSO2 BAM - Your Big Data Toolbox
WSO2 BAM - Your Big Data ToolboxWSO2
 
Introduction to BIG DATA
Introduction to BIG DATA Introduction to BIG DATA
Introduction to BIG DATA Zeeshan Khan
 
DATA MART APPROCHES TO ARCHITECTURE
DATA MART APPROCHES TO ARCHITECTUREDATA MART APPROCHES TO ARCHITECTURE
DATA MART APPROCHES TO ARCHITECTURESachin Batham
 
Data Warehousing and Mining
Data Warehousing and MiningData Warehousing and Mining
Data Warehousing and Miningethantelaviv
 
"Hadoop Data Lake vs classical Data Warehouse: How to utilize best of both wo...
"Hadoop Data Lake vs classical Data Warehouse: How to utilize best of both wo..."Hadoop Data Lake vs classical Data Warehouse: How to utilize best of both wo...
"Hadoop Data Lake vs classical Data Warehouse: How to utilize best of both wo...Comsysto Reply GmbH
 
The European Conference on Software Architecture (ECSA) 14 - IBM BigData Refe...
The European Conference on Software Architecture (ECSA) 14 - IBM BigData Refe...The European Conference on Software Architecture (ECSA) 14 - IBM BigData Refe...
The European Conference on Software Architecture (ECSA) 14 - IBM BigData Refe...Romeo Kienzler
 
1 PSUT Big Data Class, introduction
1 PSUT Big Data Class,  introduction1 PSUT Big Data Class,  introduction
1 PSUT Big Data Class, introductionAkram Al-Kouz
 
Gulabs Ppt On Data Warehousing And Mining
Gulabs Ppt On Data Warehousing And MiningGulabs Ppt On Data Warehousing And Mining
Gulabs Ppt On Data Warehousing And Mininggulab sharma
 
Data warehouse inmon versus kimball 2
Data warehouse inmon versus kimball 2Data warehouse inmon versus kimball 2
Data warehouse inmon versus kimball 2Mike Frampton
 
Data Mining and Data Warehousing
Data Mining and Data WarehousingData Mining and Data Warehousing
Data Mining and Data WarehousingAmdocs
 
What is bi analytics and big data
What is bi analytics and big dataWhat is bi analytics and big data
What is bi analytics and big datagaliasisense
 
Intro to Data warehousing lecture 16
Intro to Data warehousing   lecture 16Intro to Data warehousing   lecture 16
Intro to Data warehousing lecture 16AnwarrChaudary
 
Data Mining - The Big Picture!
Data Mining - The Big Picture!Data Mining - The Big Picture!
Data Mining - The Big Picture!Khalid Salama
 

What's hot (20)

data warehousing and data mining
data warehousing and data mining data warehousing and data mining
data warehousing and data mining
 
Data Warehousing
Data WarehousingData Warehousing
Data Warehousing
 
Data mining techniques unit 1
Data mining techniques  unit 1Data mining techniques  unit 1
Data mining techniques unit 1
 
Data Warehouse and Data Mining
Data Warehouse and Data MiningData Warehouse and Data Mining
Data Warehouse and Data Mining
 
WSO2 BAM - Your Big Data Toolbox
WSO2 BAM - Your Big Data ToolboxWSO2 BAM - Your Big Data Toolbox
WSO2 BAM - Your Big Data Toolbox
 
Introduction to BIG DATA
Introduction to BIG DATA Introduction to BIG DATA
Introduction to BIG DATA
 
DATA MART APPROCHES TO ARCHITECTURE
DATA MART APPROCHES TO ARCHITECTUREDATA MART APPROCHES TO ARCHITECTURE
DATA MART APPROCHES TO ARCHITECTURE
 
Data Warehousing and Mining
Data Warehousing and MiningData Warehousing and Mining
Data Warehousing and Mining
 
SoftServe BI/BigData Workshop in Utah
SoftServe BI/BigData Workshop in UtahSoftServe BI/BigData Workshop in Utah
SoftServe BI/BigData Workshop in Utah
 
"Hadoop Data Lake vs classical Data Warehouse: How to utilize best of both wo...
"Hadoop Data Lake vs classical Data Warehouse: How to utilize best of both wo..."Hadoop Data Lake vs classical Data Warehouse: How to utilize best of both wo...
"Hadoop Data Lake vs classical Data Warehouse: How to utilize best of both wo...
 
The European Conference on Software Architecture (ECSA) 14 - IBM BigData Refe...
The European Conference on Software Architecture (ECSA) 14 - IBM BigData Refe...The European Conference on Software Architecture (ECSA) 14 - IBM BigData Refe...
The European Conference on Software Architecture (ECSA) 14 - IBM BigData Refe...
 
Data warehousing
Data warehousingData warehousing
Data warehousing
 
1 PSUT Big Data Class, introduction
1 PSUT Big Data Class,  introduction1 PSUT Big Data Class,  introduction
1 PSUT Big Data Class, introduction
 
Gulabs Ppt On Data Warehousing And Mining
Gulabs Ppt On Data Warehousing And MiningGulabs Ppt On Data Warehousing And Mining
Gulabs Ppt On Data Warehousing And Mining
 
Data warehouse inmon versus kimball 2
Data warehouse inmon versus kimball 2Data warehouse inmon versus kimball 2
Data warehouse inmon versus kimball 2
 
Data Mining and Data Warehousing
Data Mining and Data WarehousingData Mining and Data Warehousing
Data Mining and Data Warehousing
 
DATA WAREHOUSING AND DATA MINING
DATA WAREHOUSING AND DATA MININGDATA WAREHOUSING AND DATA MINING
DATA WAREHOUSING AND DATA MINING
 
What is bi analytics and big data
What is bi analytics and big dataWhat is bi analytics and big data
What is bi analytics and big data
 
Intro to Data warehousing lecture 16
Intro to Data warehousing   lecture 16Intro to Data warehousing   lecture 16
Intro to Data warehousing lecture 16
 
Data Mining - The Big Picture!
Data Mining - The Big Picture!Data Mining - The Big Picture!
Data Mining - The Big Picture!
 

Similar to Big Data - Module 1

Big data unit 2
Big data unit 2Big data unit 2
Big data unit 2RojaT4
 
Auxilion - The Implications of Big Data on the Roadmap Towards Business Intel...
Auxilion - The Implications of Big Data on the Roadmap Towards Business Intel...Auxilion - The Implications of Big Data on the Roadmap Towards Business Intel...
Auxilion - The Implications of Big Data on the Roadmap Towards Business Intel...Pedro Mac Dowell Innecco
 
The Data Lake and Getting Buisnesses the Big Data Insights They Need
The Data Lake and Getting Buisnesses the Big Data Insights They NeedThe Data Lake and Getting Buisnesses the Big Data Insights They Need
The Data Lake and Getting Buisnesses the Big Data Insights They NeedDunn Solutions Group
 
MariaDB AX: Solución analítica con ColumnStore
MariaDB AX: Solución analítica con ColumnStoreMariaDB AX: Solución analítica con ColumnStore
MariaDB AX: Solución analítica con ColumnStoreMariaDB plc
 
MariaDB AX: Analytics with MariaDB ColumnStore
MariaDB AX: Analytics with MariaDB ColumnStoreMariaDB AX: Analytics with MariaDB ColumnStore
MariaDB AX: Analytics with MariaDB ColumnStoreMariaDB plc
 
Data Science Machine Lerning Bigdat.pptx
Data Science Machine Lerning Bigdat.pptxData Science Machine Lerning Bigdat.pptx
Data Science Machine Lerning Bigdat.pptxPriyadarshini648418
 
Hadoop and Your Data Warehouse
Hadoop and Your Data WarehouseHadoop and Your Data Warehouse
Hadoop and Your Data WarehouseCaserta
 
Introduction to data mining and data warehousing
Introduction to data mining and data warehousingIntroduction to data mining and data warehousing
Introduction to data mining and data warehousingEr. Nawaraj Bhandari
 
TOPIC.pptx
TOPIC.pptxTOPIC.pptx
TOPIC.pptxinfinix8
 
Big data by Mithlesh sadh
Big data by Mithlesh sadhBig data by Mithlesh sadh
Big data by Mithlesh sadhMithlesh Sadh
 
Low-Latency Analytics with NoSQL – Introduction to Storm and Cassandra
Low-Latency Analytics with NoSQL – Introduction to Storm and CassandraLow-Latency Analytics with NoSQL – Introduction to Storm and Cassandra
Low-Latency Analytics with NoSQL – Introduction to Storm and CassandraCaserta
 
Incorporating the Data Lake into Your Analytic Architecture
Incorporating the Data Lake into Your Analytic ArchitectureIncorporating the Data Lake into Your Analytic Architecture
Incorporating the Data Lake into Your Analytic ArchitectureCaserta
 

Similar to Big Data - Module 1 (20)

Big data unit 2
Big data unit 2Big data unit 2
Big data unit 2
 
Auxilion - The Implications of Big Data on the Roadmap Towards Business Intel...
Auxilion - The Implications of Big Data on the Roadmap Towards Business Intel...Auxilion - The Implications of Big Data on the Roadmap Towards Business Intel...
Auxilion - The Implications of Big Data on the Roadmap Towards Business Intel...
 
The Data Lake and Getting Buisnesses the Big Data Insights They Need
The Data Lake and Getting Buisnesses the Big Data Insights They NeedThe Data Lake and Getting Buisnesses the Big Data Insights They Need
The Data Lake and Getting Buisnesses the Big Data Insights They Need
 
MariaDB AX: Solución analítica con ColumnStore
MariaDB AX: Solución analítica con ColumnStoreMariaDB AX: Solución analítica con ColumnStore
MariaDB AX: Solución analítica con ColumnStore
 
MariaDB AX: Analytics with MariaDB ColumnStore
MariaDB AX: Analytics with MariaDB ColumnStoreMariaDB AX: Analytics with MariaDB ColumnStore
MariaDB AX: Analytics with MariaDB ColumnStore
 
Data Science Machine Lerning Bigdat.pptx
Data Science Machine Lerning Bigdat.pptxData Science Machine Lerning Bigdat.pptx
Data Science Machine Lerning Bigdat.pptx
 
Hadoop and Your Data Warehouse
Hadoop and Your Data WarehouseHadoop and Your Data Warehouse
Hadoop and Your Data Warehouse
 
Lecture1
Lecture1Lecture1
Lecture1
 
Datawarehouse
DatawarehouseDatawarehouse
Datawarehouse
 
Dw 07032018-dr pl pradhan
Dw 07032018-dr pl pradhanDw 07032018-dr pl pradhan
Dw 07032018-dr pl pradhan
 
Introduction to data mining and data warehousing
Introduction to data mining and data warehousingIntroduction to data mining and data warehousing
Introduction to data mining and data warehousing
 
TOPIC.pptx
TOPIC.pptxTOPIC.pptx
TOPIC.pptx
 
Data Warehouse
Data WarehouseData Warehouse
Data Warehouse
 
kalyani.ppt
kalyani.pptkalyani.ppt
kalyani.ppt
 
Data Warehouse
Data WarehouseData Warehouse
Data Warehouse
 
kalyani.ppt
kalyani.pptkalyani.ppt
kalyani.ppt
 
DATA WAREHOUSING
DATA WAREHOUSINGDATA WAREHOUSING
DATA WAREHOUSING
 
Big data by Mithlesh sadh
Big data by Mithlesh sadhBig data by Mithlesh sadh
Big data by Mithlesh sadh
 
Low-Latency Analytics with NoSQL – Introduction to Storm and Cassandra
Low-Latency Analytics with NoSQL – Introduction to Storm and CassandraLow-Latency Analytics with NoSQL – Introduction to Storm and Cassandra
Low-Latency Analytics with NoSQL – Introduction to Storm and Cassandra
 
Incorporating the Data Lake into Your Analytic Architecture
Incorporating the Data Lake into Your Analytic ArchitectureIncorporating the Data Lake into Your Analytic Architecture
Incorporating the Data Lake into Your Analytic Architecture
 

Big Data - Module 1

  • 1.
  • 2. WHAT IS BIG DATA?
  • 3. What is Big Data? • There are humungous amount of data, available which have a lot of meaningful insights – they need to be analysed • Existing Online Transaction Processing (OLTP) and Business Intelligence (BI) are not easily scalable considering cost, effort, and manageability aspect. • It is not just volume, but also the variety and velocity of data. • Big data is a terminology that refers to challenges that we are facing due to exponential volume, variety and velocity of data.
  • 4. Three V’s of Big Data
  • 5. Three V’s of Big Data
  • 6.
  • 9. Shorter Time to React • Data that enters your organization and has some kind of value for a limited window of time • This window usually shuts well before the data has been transformed and loaded into a data warehouse for deeper analysis. • The higher the volumes of data entering your organization per second, the bigger your challenge.
  • 10. Data Economics • Why Volume is good ? – No individual record is particularly valuable – Having every record is incredibly valuable • Why storage decision is important ? • How much value can I extract from every byte of data verses the cost to save that data? – If value > cost – then keep it online, on DB or filer – If cost > value – I discard it or archive on tape (expensive to throw data)
  • 11. Data Storage Schema Structured Un Structured Storage Medium RDBMS Filers Storage Reliability Very reliable Very reliable Processing ability Very reliable unstructured schema poses challenges Location of processing SQL queries pull data to server Random means to retrieve sense Impact of data increase Cost increases linearly Cost increases linearly Support for Big Data No No
  • 13. Big Data Approach Big Data refer to technologies that can capture, process and analyze data.
  • 14. No SQL Database Types • Key-value store – Key can be custom or auto generated – Value can be complex objects like XML, BLOB, JSON etc – Popular : DynamoDB, Azure Table Store (ATS), Riak • Column store – Data is stored as families of columns; high scalability with very high performance architecture – Examples : HBase, Cassandra, Vertica and Hypertable
  • 15. No SQL Database Types • Document database – Designed to store, retrieve & manage document oriented information; expands on key-value store – Example: MongoDB, CouchDB • Graph database – Designed for data that whose relations are well represented in graphs, usually with nodes connected to edges – Examples : Neo4J and Polyglot
  • 16. Analytical Database • An analytical database is a type of database built to store, manage, and consume big data. • Optimized for processing advanced analytics that involves highly complex queries on terabytes of data and complex statistical processing, data mining, and NLP (natural language processing). • Examples of analytical databases are Vertica (acquired by HP), Aster Data (acquired by Teradata), Greenplum (acquired by EMC), and so on.
  • 17. BIG DATA USE CASE PATTERNS
  • 18. Preprocess & Store • Scenario – Data getting continuously generated in large volume – Need to pre-process before loading into target systems
  • 19. Real Time Actions • Scenario – Manage actions to be taken on continuously changing data in real time
  • 21. Sears – Competes on Big Data • They have data of over 100 million customers, which they analyse to offer real-time, relevant offers to their customers. • The solution was 3 years in the making, which included programming that would capture, analyze, and report on customer activity at an individual level, across all 4,000 locations. • Sears has a Hadoop cluster of 300-nodes that is populated with over 2 petabytes of structure customer transaction data, sales data and supply chain data. • Results: Sears achieved an active member base in the 8 digits, exceeding the projected 36 month membership target in 17 months.
  • 22. THE FUTURE OF BIG DATA
  • 23. Compound Annual Growth Rate IDC Report Analysis
  • 25. THE END Next : Hadoop