Big Data and Hadoop Training
Introduction to Big Data and Hadoop
Page 2Classification: Restricted
Agenda
• Importance of Data
• ESG Report on Data Analytics
• What is BigData?
• Structured vs. Unstructured Data
• Challenges of BigData
• Why Distributed Processing?
• BigData & its Hype
Importance of Data
• “Data is the new oil,” said Andreas Weigend, social data guru and former chief scientist at Amazon.com. “Oil needs to be refined before it can be useful.”
• To say that data analysis is important to businesses would be an understatement. In fact, no business can survive without analyzing available data.
ESG Report on Data Analytics
• A majority of organizations view data analytics as a top-five business and IT priority
• Reduced costs and process improvement are the top data analytics platform benefits
• No leading data analytics platform has emerged yet; nearly one-third of the organizations surveyed are using a custom-developed solution
• Big data is driving changes in analytics tools, infrastructure, and processes
Figure: Meaning of the term Big Data
Figure: Size of the largest data set for processing
Figure: Number of Data Sources to integrate
Figure: Update frequency of the largest data set
Figure: Challenges while processing data
Figure: Key benefits from processing data
What is BigData?
• Lots of data (on the order of terabytes or petabytes)
• A term applied to data sets whose size is beyond the ability of commonly used software tools to capture, manage, and process within a tolerable elapsed time
• Systems and enterprises generate huge amounts of data, from terabytes to even petabytes
Structured vs. Unstructured Data
Definition
• Big data is defined by 3 Vs (commonly given as volume, velocity, and variety)
Quiz Time
• For each of the given file formats, identify which category of data it belongs to:
• Word docs, PDFs, text files
• Email body
• XML files
• Data generated by ERPs, CRMs, etc.
Challenges of BigData
• Problem #1: Slow disk reads/writes
• Problem #2: Hardware failures
• Problem #3: Data integration and transfer
Why Distributed Processing?
To read 1 TB of data:
• Disk transfer rate: 100 MB/sec per disk
• Time to process with one disk: 1 TB / (100 MB/sec) ≈ 10,485 sec, or about 175 min
• Time to process with five such disks read in parallel: 1 TB / (5 × 100 MB/sec) ≈ 2,097 sec, or about 35 min
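The read-time arithmetic above can be reproduced with a short Python sketch. It assumes binary units (1 TB = 1,048,576 MB), which is what the slide's 10,485-second figure implies, and a sustained 100 MB/sec per reader with perfect parallel scaling. The function name read_time_seconds and its parameters are illustrative and do not come from the deck.

```python
# Assumption: binary units, so 1 TB = 1024 * 1024 MB (matches the slide's figures).
MB_PER_TB = 1024 * 1024


def read_time_seconds(size_tb: float, rate_mb_per_sec: float, parallel_readers: int = 1) -> float:
    """Seconds needed to read size_tb terabytes when each of parallel_readers
    sustains rate_mb_per_sec and the work is split evenly among them."""
    if rate_mb_per_sec <= 0 or parallel_readers < 1:
        raise ValueError("rate must be positive and there must be at least one reader")
    total_mb = size_tb * MB_PER_TB
    return total_mb / (rate_mb_per_sec * parallel_readers)


# The two scenarios from the slide: one reader vs. five readers in parallel.
single = read_time_seconds(1, 100)
parallel = read_time_seconds(1, 100, parallel_readers=5)

print(f"1 reader:  {single:.2f} sec ({single / 60:.1f} min)")
print(f"5 readers: {parallel:.2f} sec ({parallel / 60:.1f} min)")
```

In practice the speed-up is rarely this clean, since the sketch ignores seek time, network transfer, and coordination overhead, but it shows why spreading reads across several disks or machines shortens the elapsed time roughly in proportion to the number of readers.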
BigData & its Hype
• Gartner: Hadoop will be in two-thirds of advanced analytics products by 2015
• Livemint.com: SMAC (social, mobile, analytics, cloud) is the new flavour of IT companies; SMAC will allow the IT industry to offer more value to its clients
• Offshore Insights: Growth of IT companies will be dictated by cloud, mobile, analytics, big data, and social media services, according to a survey of 410 global IT decision-makers released in February by research firm Offshore Insights
Case Studies
Thank You
