Page 1Classification: Restricted
Hadoop Training
Introduction To Big Data and Hadoop
Agenda
•Importance of Data
•ESG Report on Data Analytics
•What is Big Data?
•Structured vs. Unstructured Data
•Definition
•Challenges of Big Data
•Why Distributed Processing?
•Big Data & Its Hype
•Case Studies
Importance of Data
• “Data is the new oil,” said Andreas Weigend, social data
guru and former chief scientist at Amazon.com.
• “Oil needs to be refined before it can be useful.”
• To say that data analysis is important to businesses would be an
understatement. In fact, no business can survive without
analyzing its available data.
ESG Report on Data Analytics
•The Hadoop market is forecast to grow at a compound
annual growth rate (CAGR) of 58%, surpassing $1 billion by
2020.
•A majority of organizations view data analytics as a top-five
business and IT priority
•Reduced costs and process improvement are top data
analytics platform benefits
•No leading data analytics platform has emerged yet.
Nearly one-third of the organizations surveyed are
using a custom-developed solution
•Big data is driving changes in analytics tools,
infrastructure, and processes
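To get a feel for what a 58% CAGR implies, the compounding can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `growth_multiple` is hypothetical, and it reports a growth multiple rather than dollar figures, because the starting market size is not stated in these slides.

```python
def growth_multiple(cagr: float, years: int) -> float:
    """Factor by which a quantity grows after `years` of compounding at rate `cagr`."""
    if years < 0:
        raise ValueError("years must be non-negative")
    # Compound growth: each year multiplies the previous value by (1 + cagr).
    return (1.0 + cagr) ** years


# 0.58 is the CAGR quoted on the slide; the year range is an arbitrary illustration.
for years in range(1, 6):
    print(f"after {years} year(s): x{growth_multiple(0.58, years):.2f}")
```

At this rate a market grows roughly tenfold in five years, which is why even a modest starting base can pass a large threshold quickly.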
Figure: Meaning of the term Big Data
Figure: Size of the largest data set for processing
Figure: Number of data sources to integrate
Figure: Update frequency of the largest data set
Figure: Challenges while processing data
Figure: Key benefits from processing data
What is Big Data?
• Huge data (on the order of terabytes or petabytes)
• A term applied to data sets whose size is beyond
the ability of commonly used software tools to
capture, manage, and process within a tolerable
elapsed time
Structured vs. Unstructured Data
Definition
Big Data is defined by 3 Vs: Volume, Velocity, and Variety.
Quiz Time
For each of the following data sources, identify which
category of data it belongs to:
A. Word Docs, PDFs, Text files
B. email body
C. XML files
D. Data generated by ERPs, CRMs etc.
Challenges of Big Data
Problem #1 : Slow Disk Reads/Writes
Problem #2 : Hardware Failures
Problem #3 : Data integration & Transfer
Why Distributed Processing?
To read 1 TB of data at a disk transfer rate of 100 MB/sec
(taking 1 TB = 1,048,576 MB):
• One disk: 1 TB / (100 MB/sec) = 10485 sec, or about 175 min.
• Five disks read in parallel: 1 TB / (5 × 100 MB/sec) =
2097 sec, or about 35 min.
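The read-time arithmetic above can be reproduced with a short Python sketch. It assumes binary units (1 TB = 1024 × 1024 MB), which is what yields the slide's 10485-second figure, and it assumes the data is split evenly across disks that all stream at the same rate with no coordination overhead. The function name `read_time_seconds` is hypothetical.

```python
MB_PER_TB = 1024 * 1024  # assumption: binary units, matching the slide's 10485 sec


def read_time_seconds(data_tb: float, mb_per_sec: float = 100.0, disks: int = 1) -> float:
    """Seconds to read `data_tb` terabytes spread evenly over `disks` drives,
    each streaming at `mb_per_sec`, with all drives reading in parallel."""
    if disks < 1:
        raise ValueError("disks must be at least 1")
    return data_tb * MB_PER_TB / (mb_per_sec * disks)


# One disk versus five disks, as on the slide.
for disks in (1, 5):
    seconds = read_time_seconds(1, disks=disks)
    print(f"{disks} disk(s): {seconds:.0f} sec, about {seconds / 60:.0f} min")
```

In this idealized model the read time falls in direct proportion to the number of disks, which is the basic motivation for distributing both storage and processing.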
Big Data & Its Hype
• Gartner: Hadoop will be in two-thirds of advanced
analytics products by 2015
• Livemint.com: SMAC (social, mobile, analytics, cloud) is
the new flavour of IT companies; SMAC will allow the IT
industry to offer more value to its clients
• Offshore Insights: Growth of IT companies will be
dictated by cloud, mobile, analytics, big data and
social media services, according to a survey of 410
global IT decision-makers by research firm Offshore
Insights, released in February
Case Studies
Thank You!
