Solving Big Data problems using Hadoop
Ravi Chaturvedi
Slides from the internal talk given at Morgan Stanley, at the end of 2013.
1.
Solving Big Data problems using Hadoop
Ravi Chaturvedi
2.
© COPYRIGHT 2013 SAPIENT CORPORATION | CONFIDENTIAL
Background
3.
Why Do Big Data Problems Even Exist?
4.
Matrix Multiplication Problem
Multiplying two $n \times n$ matrices naively costs $O(n^3) = O(n^2) \times O(n)$: the result has $O(n^2)$ entries, and each entry is a dot product that takes $O(n)$ operations.
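A minimal sketch of the naive algorithm makes the cubic cost visible. It assumes square matrices stored as Python lists of lists (an illustrative choice, not something the slide specifies) and counts the scalar multiplications performed: the two outer loops visit the $n^2$ output entries, and the inner loop computes an $n$-term dot product for each.

```python
def matmul(a, b):
    """Naive product of two n x n matrices (lists of lists).

    Returns the product and the number of scalar multiplications,
    which is exactly n**3 for this algorithm.
    """
    n = len(a)
    c = [[0] * n for _ in range(n)]
    mults = 0
    for i in range(n):          # n rows of the result
        for j in range(n):      # n columns -> n^2 entries in total
            for k in range(n):  # each entry is an n-term dot product
                c[i][j] += a[i][k] * b[k][j]
                mults += 1
    return c, mults

c, mults = matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])
print(c)      # [[19, 22], [43, 50]]
print(mults)  # 8, i.e. 2**3
```

Doubling $n$ multiplies the work by eight, which is why this kind of computation quickly outgrows a single machine.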
5.
Google Search Architecture
[Diagram: the query "Britney Spears" is split into the terms ['Britney', 'Spears']]
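The slide only shows the query being split into terms. As a hedged illustration of what a search engine typically does with those terms, the toy sketch below builds an inverted index (term to set of document IDs) over a few made-up documents and answers a query by intersecting the postings of its terms. The documents, the lowercase whitespace tokenizer, and the AND semantics are all assumptions; this is not a description of Google's actual pipeline.

```python
from collections import defaultdict

def tokenize(text):
    # Hypothetical tokenizer: lowercase and split on whitespace.
    return text.lower().split()

def build_index(docs):
    """Map each term to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index

def search(index, query):
    """Return IDs of documents containing every query term (AND semantics)."""
    postings = [index.get(term, set()) for term in tokenize(query)]
    if not postings:
        return set()
    return set.intersection(*postings)

docs = {  # made-up example documents
    1: "Britney Spears tour dates",
    2: "Spears and javelins in history",
    3: "Britney Spears new album",
}
index = build_index(docs)
print(sorted(search(index, "Britney Spears")))  # [1, 3]
```

Building this index for the whole web is exactly the batch job whose size the next slide estimates.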
6.
Search Index Creation Problem
- 50+ billion web pages × 20 KB = 1000+ terabytes (1+ PB)
- One computer can read 50 MB/sec from disk
- At that rate, one computer needs 7+ months just to read the web
Source: http://googleblog.blogspot.in/2008/07/we-knew-web-was-big.html
Storage:
- GFS (Google File System paper)
- HDFS (open-source implementation)
Computation:
- MapReduce (paper)
- Hadoop MapReduce (open-source implementation)
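The back-of-envelope estimate above can be checked directly. The sketch assumes decimal units (1 KB = 10^3 bytes, 1 MB = 10^6 bytes, 1 PB = 10^15 bytes) and a 30-day month; the 1,000-machine figure at the end is a hypothetical cluster size added only to show why the work has to be spread out.

```python
pages = 50e9             # 50+ billion web pages
bytes_per_page = 20e3    # ~20 KB per page (decimal units assumed)
read_rate = 50e6         # one disk reads ~50 MB/s

total_bytes = pages * bytes_per_page
seconds = total_bytes / read_rate
days = seconds / 86_400
months = days / 30       # 30-day month, a simplification

print(f"{total_bytes / 1e15:.1f} PB")           # 1.0 PB
print(f"{days:.0f} days, {months:.1f} months")  # 231 days, 7.7 months

# Hypothetical: the same data spread evenly over 1,000 machines reading in parallel.
machines = 1_000
print(f"{seconds / machines / 3600:.1f} hours")  # 5.6 hours
```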
7.
Hadoop Distributed File System (HDFS)
8.
Hadoop Distributed File System (HDFS) – Goals
- Hardware failure (MTBF)
- Streaming data access (throughput)
- Large data sets (TB to PB)
- Simple coherency model (write-once, read-many)
- Portability across heterogeneous hardware and software platforms
- Moving computation is cheaper than moving data (data locality)
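The last goal, moving computation to the data, can be sketched as a locality-aware task assignment. In this hypothetical model the scheduler knows which hosts hold a replica of each block (the kind of information HDFS tracks) and greedily sends each block's task to the least-loaded host that already stores it, falling back to any host only when no replica holder is available. The function name, block IDs, and host names are illustrative assumptions; this is not Hadoop's actual scheduler.

```python
def assign_tasks(block_locations, hosts):
    """Greedy locality-aware assignment of one task per block.

    block_locations: dict block_id -> list of hosts holding a replica.
    hosts: hosts that can run tasks.
    Returns dict block_id -> (host, is_local).
    """
    load = {h: 0 for h in hosts}
    plan = {}
    for block, replicas in block_locations.items():
        # Prefer the least-loaded host that already stores the block.
        local = [h for h in replicas if h in load]
        if local:
            host = min(local, key=lambda h: load[h])
            plan[block] = (host, True)
        else:
            # No replica holder can run the task: the block must cross the network.
            host = min(hosts, key=lambda h: load[h])
            plan[block] = (host, False)
        load[host] += 1
    return plan

blocks = {"b1": ["n1", "n2", "n3"], "b2": ["n2", "n3", "n4"], "b3": ["n5"]}
print(assign_tasks(blocks, ["n1", "n2", "n3", "n4"]))
# {'b1': ('n1', True), 'b2': ('n2', True), 'b3': ('n3', False)}
```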
9.
GFS – Architecture
10.
HDFS – Architecture
11.
HDFS – Cluster Architecture
12.
HDFS – Rack Failure
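Surviving a rack failure depends on spreading a block's replicas across racks. The sketch below is modelled loosely on the default three-replica placement described in the Apache HDFS architecture guide (first replica on the writer's node, the other two on different nodes of one remote rack); it is a simplification, and the topology, node names, and helper functions are hypothetical. Because the replicas always span two racks, losing any single rack leaves at least one copy.

```python
import random

def place_replicas(topology, writer, rng):
    """Choose 3 replica nodes for a new block (simplified rack-aware policy).

    topology: dict rack -> list of node names (remote racks need >= 2 nodes).
    writer: the datanode writing the block.
    """
    rack_of = {n: r for r, nodes in topology.items() for n in nodes}
    local_rack = rack_of[writer]
    remote_rack = rng.choice([r for r in topology if r != local_rack])
    second, third = rng.sample(topology[remote_rack], 2)
    return [writer, second, third]

def survives_rack_failure(replicas, topology, failed_rack):
    """True if at least one replica lives outside the failed rack."""
    return any(n not in topology[failed_rack] for n in replicas)

topology = {
    "rack1": ["n1", "n2", "n3"],
    "rack2": ["n4", "n5", "n6"],
    "rack3": ["n7", "n8", "n9"],
}
rng = random.Random(42)
replicas = place_replicas(topology, "n2", rng)
print(replicas)
print(all(survives_rack_failure(replicas, topology, r) for r in topology))  # True
```

Keeping two replicas on the same remote rack trades a little fault isolation for less cross-rack write traffic, which is the usual justification given for this layout.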
13.
Functional Programming Review
14.
Map Function
[Diagram: func applied independently to each element of a list]
15.
Map Function Properties
- Does not change the existing data structure
- Idempotence: re-running it on the same input produces the same output
- Order of operation doesn't matter
- Each application is independent of the others
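Python's built-in `map` is a close analogue of the map function on these slides. The sketch below uses a hypothetical pure function `square` to check the listed properties: the input list is left untouched, computing the elements in a shuffled order gives the same per-element results, and re-running the map on the same input reproduces the output.

```python
import random

def square(x):
    # Pure function: the output depends only on the argument; no side effects.
    return x * x

data = [1, 2, 3, 4, 5]
result = list(map(square, data))
print(result)  # [1, 4, 9, 16, 25]
print(data)    # [1, 2, 3, 4, 5]  (input unchanged)

# Order independence: compute elements in a shuffled order, then reassemble by index.
order = list(range(len(data)))
random.Random(0).shuffle(order)
shuffled = [None] * len(data)
for i in order:
    shuffled[i] = square(data[i])
print(shuffled == result)  # True

# Re-running on the same input gives the same output, so a retry is safe.
print(list(map(square, data)) == result)  # True
```

These properties are what let a framework hand different elements to different machines, in any order, and retry any of them.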
16.
Reduce Function
[Diagram: func combines an initial value with each list element in turn to produce a single result]
17.
Reduce Function Properties
- Does not change the existing data structure
- Order of operation doesn't matter if the operation is commutative and associative
Commutative law: you can swap the operands and still get the same answer.
$a + b = b + a$, $a \times b = b \times a$
Associative law: it doesn't matter how you group the operands (i.e. which you calculate first).
$a + (b + c) = (a + b) + c$, $a \times (b \times c) = (a \times b) \times c$
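`functools.reduce` folds a list into one value starting from an initial value, as in the reduce diagram. The sketch below, using made-up data, splits the input into chunks (as parallel workers might), reduces each chunk, then combines the partial results in reverse order. For addition, which is commutative and associative, the answer matches the sequential reduce; for subtraction, which is neither, it does not. The helper `chunked_reduce` is hypothetical, and it also assumes the initial value is an identity for the operation (0 for addition).

```python
from functools import reduce
import operator

def chunked_reduce(op, data, initial, chunk_size):
    """Reduce each chunk independently, then combine the partials in reverse order.

    Assumes `initial` is an identity element for `op`, since it is reused per chunk.
    """
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    partials = [reduce(op, chunk, initial) for chunk in chunks]
    return reduce(op, reversed(partials), initial)

data = [1, 2, 3, 4, 5, 6]
print(reduce(operator.add, data, 0))             # 21
print(chunked_reduce(operator.add, data, 0, 2))  # 21  (same answer)
print(reduce(operator.sub, data, 0))             # -21
print(chunked_reduce(operator.sub, data, 0, 2))  # 21  (different answer)
```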
18.
Hadoop Map-Reduce
19.
Hadoop Map Reduce – Problem Fitment
- Lazy convergence / eventual consistency: no ordering guarantees
- Idempotence: the same operation can safely be executed multiple times
- Straightforward partial restart: tasks hold no state
- Process isolation: shared-nothing
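These fitment properties are what make re-execution safe. The toy in-process simulation below (hypothetical helper names, not Hadoop's API) runs word count as map, shuffle, and reduce phases over three input splits, injects a single crash into one map task, and simply re-runs that task. Because each map task is stateless and deterministic, and its output is recorded only after it completes, the retried job produces the same counts as a failure-free run.

```python
from collections import defaultdict

def map_task(split):
    """Emit (word, 1) for each word; stateless and deterministic."""
    return [(word, 1) for word in split.lower().split()]

def reduce_task(word, counts):
    return word, sum(counts)

def run_job(splits, fail_once=None):
    """Run map -> shuffle -> reduce, retrying any map task that crashes."""
    crashed = set()
    map_outputs = []
    for i, split in enumerate(splits):
        while True:
            try:
                if i == fail_once and i not in crashed:
                    crashed.add(i)
                    raise RuntimeError(f"simulated crash of map task {i}")
                map_outputs.append(map_task(split))  # recorded only on success
                break
            except RuntimeError:
                continue  # partial restart: re-run just this task
    groups = defaultdict(list)  # shuffle: group values by key
    for output in map_outputs:
        for word, one in output:
            groups[word].append(one)
    return dict(reduce_task(w, c) for w, c in sorted(groups.items()))

splits = ["the quick fox", "the lazy dog", "the end"]
clean = run_job(splits)
retried = run_job(splits, fail_once=1)
print(clean["the"])      # 3
print(clean == retried)  # True
```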
20.
Hadoop Map Reduce – Properties
- Automatic parallelization and distribution
- Fault tolerance
- Status monitoring tools
- Clean abstraction for the programmer
21.
Hadoop Map Reduce - Architecture
22.
Problems in Finance
23.
Real-time P&L Calculation
24.
Risk Calculation
25.
Thank You!