re:Introduce Big Data and Hadoop Eco-system
Presented By:
Mohammed Shakir Ali
Oct 21st 2015.
What is Big Data?
Big data is a popular term used to describe the exponential growth and availability of
data, both structured and unstructured. [Ref : www.sas.com]
Big data is a broad term for data sets so large or complex that traditional data processing
applications are inadequate. [Ref: www.wikipedia.com]
Every day, we create 2.5 quintillion bytes of data, so much that 90% of the data in the world today has been
created in the last two years alone. [Ref: www.ibm.com/software/au/data/bigdata/]
Since 10^18 bytes = 1000 petabytes, 2.5 quintillion bytes = 2500 petabytes.
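The unit conversion above can be checked with a few lines of Python. This is only a sketch: it assumes decimal (SI) units, where one petabyte is 10^15 bytes, and the short-scale quintillion (10^18). The constant and function names are illustrative and not part of the original slides.

```python
# Decimal (SI) storage units, expressed in bytes (assumption: not binary PiB).
PETABYTE = 10**15
QUINTILLION = 10**18  # short-scale quintillion, which is also one exabyte


def bytes_to_petabytes(n_bytes: int) -> float:
    """Convert a byte count to petabytes using decimal units."""
    return n_bytes / PETABYTE


# 2.5 quintillion bytes, kept as an exact integer.
daily_bytes = 25 * QUINTILLION // 10
daily_petabytes = bytes_to_petabytes(daily_bytes)

print(f"{daily_bytes} bytes per day = {daily_petabytes} PB per day")
```

With binary units (1 PiB = 2^50 bytes) the result would be somewhat smaller, so the choice of unit should be stated whenever such figures are quoted.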
Characteristics of Big Data.
● Volume
● Variety
● Velocity
● Veracity
Is Big Data really new?
Let's check the Google search trends for Big Data vs. Data Analysis and Business Intelligence (BI).
https://www.google.com/trends/explore#q=Big%20Data%2C%20Data%20Analysis%2C%20Business%20Intelligence&geo=US&date=1%2F2005%20121m&cmpt=q&tz=Etc%2FGMT-10
Big Data Management Challenges.
Big Data just keeps growing. According to Forrester Research:
- The average organization will grow its data by 50 percent in the coming year.
- Overall corporate data will grow by a staggering 94 percent.
- Database systems will grow by 97 percent.
- Server backups for disaster recovery and continuity will expand by 89 percent.
Big Data Management Challenges.
Use case of a leading medical research facility:
- The facility generates 100 terabytes of data from various instruments.
- The data is copied by 10 different research departments.
- Each department further processes the data and adds 5 terabytes of synthesized data.
- The facility must now manage over a petabyte of data, of which less than 150 terabytes is unique.
- The entire petabyte is backed up and moved to a disaster recovery site, consuming additional power and
storage space.
In total, the medical center uses over 10 petabytes of storage to manage less than 150 terabytes of unique
data.
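The arithmetic behind this use case can be sketched as follows. The input figures (100 TB raw, 10 departments, 5 TB added per department) come from the slide; the variable names, and the simplifying assumption that each department holds one full copy and that none of the synthesized data overlaps, are illustrative only. The 10-petabyte total after backups and disaster recovery is the slide's own figure and is not derived here.

```python
# Figures taken from the use case above.
raw_tb = 100                       # instrument data, in terabytes
departments = 10                   # each department keeps its own copy
synthesized_tb_per_department = 5  # extra data each department derives

# Assumption: every department stores one full copy of the raw data.
copied_tb = raw_tb * departments
synthesized_tb = synthesized_tb_per_department * departments

managed_tb = copied_tb + synthesized_tb  # what has to be stored and managed
unique_tb = raw_tb + synthesized_tb      # upper bound on truly unique data
duplication_ratio = managed_tb / unique_tb

print(f"managed: {managed_tb} TB, unique: at most {unique_tb} TB")
print(f"stored volume is {duplication_ratio:.1f}x the unique volume")
```

Under these assumptions the facility stores about seven times more primary data than is unique, before any backup or disaster-recovery copies are counted.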
Big Data Management Challenges.
Three basic challenges:
- Storing,
- Processing, and
- Managing it efficiently.
Possible solutions:
- Scale-out architectures to manage large data sets.
- Reduce the data to a unique set of data.
- Data virtualization to centralize management of the data set.
- Reuse the same data footprint to reduce data duplication.
Reference:
http://www.forbes.com/sites/ciocentral/2012/07/05/best-practices-for-managing-big-data/
Project Open Data
● Several governments around the world are making data available to the public.
● Data is a valuable national resource and a strategic asset to the U.S.
Government, its partners, and the public.
● Managing this data as an asset and making it available, discoverable, and
usable – in a word, open – not only strengthens our democracy and
promotes efficiency and effectiveness in government, but also has the
potential to create economic opportunity and improve citizens’ quality of life.
● For example, when the U.S. Government released weather and GPS data to
the public, it fueled an industry that today is valued at tens of billions of
dollars per year.
Reference: https://project-open-data.cio.gov/
Benefits of Big Data.
● Cost reduction
Big data technologies like Hadoop and cloud-based analytics can provide substantial cost
advantages.
● Faster, better decision making
Analytics has always involved attempts to improve decision making. With the high speed of
Hadoop and in-memory analytics, several organizations have sped up their decision
processes.
● New products and services
Big data analytics can be used to create new products and services for customers.
Several organizations have developed new products and services with the help of Big Data.
Reference: https://www.sas.com/fr_fr/news/sascom/2014q3/Big-data-davenport.html
Conclusion
● Interest in Big Data and the Hadoop ecosystem has increased in
recent years.
● The recent trend in data growth has created new challenges
for data management, along with new opportunities.
● Several software products and solutions are available to
manage Big Data effectively.
Hadoop architecture Eco-system
What is Apache Hadoop?
Apache Hadoop is an open-source software framework written in Java for distributed
storage and distributed processing of very large data sets.
- It runs on computer clusters built from commodity hardware.
- All the modules in Hadoop are designed to withstand hardware failures.
Apache Hadoop Framework.
The Apache Hadoop framework is composed of the following modules:
1) Hadoop Distributed File System (HDFS) – a distributed file system that stores data on
commodity machines, providing very high aggregate bandwidth across the cluster.
2) Hadoop MapReduce – a programming model for large-scale data processing.
3) Hadoop YARN – a resource-management platform responsible for managing computing
resources in clusters and using them to schedule users' applications.
4) Hadoop Common – the libraries and utilities needed by the other Hadoop modules.
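To make the MapReduce programming model more concrete, here is a minimal single-process sketch of the classic word-count job in plain Python. It does not use the Hadoop API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical and only illustrate the three conceptual steps: map each record to key/value pairs, group the values by key, then reduce each group.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle step: group all emitted values by their key."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce step: combine all values for one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in sequence inside one process."""
    pairs = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = word_count(["big data", "big clusters"])
print(counts)
```

In a real Hadoop cluster the map and reduce steps run in parallel on many machines, with input read from HDFS and resources scheduled by YARN; this sketch only shows the data flow.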
Apache Hadoop Adoption
On February 19, 2008, Yahoo! Inc. launched a large Hadoop deployment running on a Linux
cluster with more than 10,000 cores, which produced data that was used in every Yahoo!
web search query.
In 2010, Facebook claimed that they had the largest Hadoop cluster in the world with 21
PB of storage.
As of 2013, Hadoop adoption is widespread.
For example, more than half of the Fortune 50 use Hadoop.
Search trends about Big Data.
HPC vs Hadoop search trends:
https://www.google.com/trends/explore#q=HPC%2C%20Hadoop&geo=US&date=1%2F2005%20121m&cmpt=q&tz=Etc%2FGMT-10
Big Data and Hadoop Architecture
Apache Hadoop Architecture
Hadoop Cluster Setup
Apache Hadoop Projects
● Apache Pig: a high-level platform for creating MapReduce programs used with Hadoop.
● Apache Hive: a data warehouse infrastructure built on top of Hadoop.
● Apache Spark: an open source cluster computing framework originally
developed in the AMPLab at UC Berkeley.
● Apache Storm: a distributed computation framework written
predominantly in the Clojure programming language.
● Apache HBase: an open source, non-relational, distributed database modeled after
Google's BigTable and written in Java.
● Others: Apache ZooKeeper, Impala, Flume, Sqoop, and more.
Search trends about Big Data.
Apache Hadoop vs Apache Spark search trends:
https://www.google.com/trends/explore#q=Hadoop%2C%20Apache%20Spark&geo=US&date=1%2F2005%20121m&cmpt=q&tz=Etc%2FGMT-10
Prominent Hadoop Distributors
● Cloudera
● Hortonworks
● MapR
Hadoop preview:
Cloudera Quickstart VM:
http://www.cloudera.com/content/www/en-us/documentation/enterprise/latest/topics/cloudera_quickstart_vm.html
Big Data work flow.
http://insightdataengineering.com/blog/pipeline_map.html
