re:Introduce Big Data and Hadoop Eco-system
Presented By:
Mohammed Shakir Ali
Oct 21st 2015.
What is Big Data?
Big data is a popular term used to describe the exponential growth and availability of
data, both structured and unstructured. [Ref: www.sas.com]
Big data is a broad term for data sets so large or complex that traditional data processing
applications are inadequate. [Ref: www.wikipedia.com]
Every day, we create 2.5 quintillion bytes of data, so much that 90% of the data in the world today has been
created in the last two years alone. [Ref: www.ibm.com/software/au/data/bigdata/]
1 quintillion (10^18) bytes = 1,000 petabytes, so 2.5 quintillion bytes = 2,500 petabytes.
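The unit conversion on this slide can be checked with a few lines of Python. This is a minimal sketch that assumes decimal (SI) units, where 1 petabyte = 10^15 bytes; the function and constant names are illustrative and not part of the original deck.

```python
# Decimal (SI) byte units, matching the slide: 1 petabyte = 10**15 bytes.
BYTES_PER_PETABYTE = 10**15


def bytes_to_petabytes(n_bytes: int) -> float:
    """Convert a byte count to petabytes (SI, powers of ten)."""
    return n_bytes / BYTES_PER_PETABYTE


QUINTILLION = 10**18
# 2.5 quintillion bytes, kept as an exact integer.
daily_bytes = 25 * QUINTILLION // 10

print(bytes_to_petabytes(QUINTILLION))  # 1000.0
print(bytes_to_petabytes(daily_bytes))  # 2500.0
```

With binary units (1 PiB = 2^50 bytes) the figures would come out somewhat smaller, so the slide's numbers hold only for the decimal convention.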
Characteristics of Big Data.
● Volume
● Variety
● Velocity
● Veracity
Is Big Data really new?
Let's check Google search trends for "Big Data" vs. "Data Analysis" and "Business Intelligence" (BI):
https://www.google.com/trends/explore#q=Big%20Data%2C%20Data%20Analysis%2C%20Business%20Intelligence&geo=US&date=1%2F2005%20121m&cmpt=q&tz=Etc%2FGMT-10
Big Data Management Challenges.
Big Data keeps growing. According to Forrester Research:
–The average organization will grow its data by 50 percent in the coming year.
–Overall corporate data will grow by a staggering 94 percent.
–Database systems will grow by 97 percent.
–Server backups for disaster recovery and continuity will expand by 89 percent.
Big Data Management Challenges.
Use case of a leading medical research facility:
–Generates 100 terabytes of data from various instruments.
–The data is copied by 10 different research departments.
–Each department further processes the data and adds 5 terabytes of synthesized data.
–The facility must now manage over a petabyte of data, of which less than 150 terabytes is unique.
–The entire petabyte is backed up and moved to a disaster recovery site, consuming additional power and storage space.
In total, the medical center has used over 10 petabytes of storage to manage less than 150 terabytes of unique
data.
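The arithmetic behind this use case can be worked through as follows. This is a rough sketch using only the figures quoted on the slide; the variable names are illustrative, and it assumes every department keeps a full copy of the raw data. The 10-petabyte total is not derived here because the slide does not say how many backup and disaster-recovery copies are kept.

```python
# Figures taken from the slide; names are illustrative.
RAW_TB = 100         # instrument data generated
DEPARTMENTS = 10     # each department is assumed to keep a full copy
SYNTHESIZED_TB = 5   # additional derived data per department

copied_tb = RAW_TB * DEPARTMENTS            # duplicated raw data
derived_tb = SYNTHESIZED_TB * DEPARTMENTS   # newly synthesized data
managed_tb = copied_tb + derived_tb         # what must be managed day to day
unique_tb = RAW_TB + derived_tb             # upper bound on unique data
duplication_ratio = managed_tb / unique_tb

print(f"managed: {managed_tb} TB, unique: at most {unique_tb} TB")
print(f"each unique terabyte is stored about {duplication_ratio:.0f} times")
```

Under these assumptions the facility manages 1,050 TB (just over a petabyte) for at most 150 TB of unique data, which matches the slide's "over a petabyte" and "less than 150 terabytes" figures.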
Big Data Management Challenges.
Three basic challenges:
–Storing,
–Processing, and
–Managing it efficiently.
Reference:
http://www.forbes.com/sites/ciocentral/2012/07/05/best-practices-for-managing-big-data/
Possible Solutions:
–Scale-out architectures to manage large data sets.
–Reduce the data to its unique set.
–Data virtualization to centralize management of the data set.
–Reuse the same data footprint to reduce data duplication.
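One common way to "reduce the data to its unique set" is content-based deduplication: identical content is stored once and every copy becomes a reference to it. The sketch below illustrates the idea with SHA-256 digests from Python's standard `hashlib` module. It is an assumption about how such a solution might work, not the method described in the referenced article, and the `deduplicate` helper and sample data are hypothetical.

```python
import hashlib


def deduplicate(blobs):
    """Keep one copy of each distinct byte string, keyed by its SHA-256 digest."""
    store = {}        # digest -> content, stored once
    references = []   # one digest per input blob, in input order
    for blob in blobs:
        digest = hashlib.sha256(blob).hexdigest()
        store.setdefault(digest, blob)   # only the first copy is kept
        references.append(digest)
    return store, references


# Hypothetical data: four submissions, only two distinct contents.
copies = [b"scan-001", b"scan-002", b"scan-001", b"scan-001"]
store, references = deduplicate(copies)
print(len(copies), "blobs submitted,", len(store), "stored")
```

Every original blob can still be recovered through its reference, so departments keep their logical copies while the physical footprint shrinks.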
Project Open Data
● Several governments around the world are making data available to the public.
● Data is a valuable national resource and a strategic asset to the U.S.
Government, its partners, and the public.
● Managing this data as an asset and making it available, discoverable, and
usable – in a word, open – not only strengthens our democracy and
promotes efficiency and effectiveness in government, but also has the
potential to create economic opportunity and improve citizens’ quality of life.
● For example, when the U.S. Government released weather and GPS data to
the public, it fueled an industry that today is valued at tens of billions of
dollars per year.
Reference: https://project-open-data.cio.gov/
Benefits of Big Data.
● Cost reduction
Big data technologies like Hadoop and cloud-based analytics can provide substantial cost
advantages.
● Faster, better decision making
Analytics has always involved attempts to improve decision making. With the speed of
Hadoop and in-memory analytics, several organizations have sped up their decision-making
processes.
● New products and services
Big data analytics can be used to create new products and services for customers.
Several organizations have introduced new products and services with the help of Big Data.
Reference: https://www.sas.com/fr_fr/news/sascom/2014q3/Big-data-davenport.html
Conclusion
● Interest in Big Data and the Hadoop eco-system has
increased in recent years.
● Recent data growth has created new challenges
for data management, along with new opportunities.
● Several software products and solutions are available to
manage Big Data effectively.
Hadoop architecture Eco-system
What is Apache Hadoop?
Apache Hadoop is an open-source software framework written in Java for distributed
storage and distributed processing of very large data sets.
- It runs on computer clusters built from commodity hardware.
- All the modules in Hadoop are designed to withstand hardware failures.
Apache Hadoop Framework.
The Apache Hadoop framework is composed of the following modules:
1) Hadoop Distributed File System (HDFS) – a distributed file system that stores data on
commodity machines, providing very high aggregate bandwidth across the cluster;
2) Hadoop MapReduce – a programming model for large-scale data processing;
3) Hadoop YARN – a resource-management platform responsible for managing computing
resources in clusters and using them to schedule users' applications; and
4) Hadoop Common – the libraries and utilities needed by the other Hadoop modules.
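To give a feel for the MapReduce programming model named above, here is a minimal single-process word count in plain Python. It only mimics the three conceptual stages (map, shuffle, reduce); it does not use the Hadoop API, and the function names are illustrative. On a real cluster, the map and reduce calls would run in parallel on different machines and the framework would perform the shuffle.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run the map, shuffle, and reduce stages in sequence."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


counts = word_count(["big data", "Big Data and Hadoop"])
print(counts)  # {'big': 2, 'data': 2, 'and': 1, 'hadoop': 1}
```

The map and reduce functions never share state, which is what lets a framework distribute them across many nodes and rerun them after a failure.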
Apache Hadoop Adoption
On February 19, 2008, Yahoo! Inc. launched a large Hadoop deployment running on a Linux
cluster with more than 10,000 cores; it produced data that was used in every Yahoo!
web search query.
In 2010, Facebook claimed to have the largest Hadoop cluster in the world, with 21
PB of storage.
As of 2013, Hadoop adoption had become widespread.
For example, more than half of the Fortune 50 used Hadoop.
Search trends about Big Data.
HPC vs Hadoop search trends:
https://www.google.com/trends/explore#q=HPC%2C%20Hadoop&geo=US&date=1%2F2005%20121m&cmpt=q&tz=Etc%2FGMT-10
Big Data and Hadoop Architecture
Apache Hadoop Architecture
Hadoop Cluster Setup
Apache Hadoop Projects
● Apache Pig: a high-level platform for creating MapReduce programs used with Hadoop.
● Apache Hive: a data warehouse infrastructure built on top of Hadoop.
● Apache Spark: an open-source cluster computing framework originally developed in the AMPLab at UC Berkeley.
● Apache Storm: a distributed computation framework written predominantly in the Clojure programming language.
● Apache HBase: an open-source, non-relational, distributed database modeled after Google's Bigtable and written in Java.
● Others: Apache ZooKeeper, Impala, Flume, Sqoop, and more.
Search trends about Big Data.
Apache Hadoop vs Apache Spark search trends:
https://www.google.com/trends/explore#q=Hadoop%2C%20Apache%20Spark&geo=US&date=1%2F2005%20121m&cmpt=q&tz=Etc%2FGMT-10
Prominent Hadoop Distributors
● Cloudera
● Hortonworks
● MapR
Hadoop preview:
Cloudera QuickStart VM:
http://www.cloudera.com/content/www/en-us/documentation/enterprise/latest/topics/cloudera_quickstart_vm.html
Big Data workflow:
http://insightdataengineering.com/blog/pipeline_map.html
