re:Introduce Big Data and Hadoop Eco-system
Presented By:
Mohammed Shakir Ali
Oct 21st 2015.
2
What is Big Data?
Big data is a popular term used to describe the exponential growth and availability of
data, both structured and unstructured. [Ref: www.sas.com]
Big data is a broad term for data sets so large or complex that traditional data processing
applications are inadequate. [Ref: www.wikipedia.com]
Every day, we create 2.5 quintillion bytes of data, so much that 90% of the data in the world today has been
created in the last two years alone. (1 quintillion bytes = 10^18 bytes = 1000 petabytes.)
2.5 quintillion bytes = 2500 petabytes. [Ref: www.ibm.com/software/au/data/bigdata/]
3
Characteristics of Big Data.
● Volume
● Variety
● Velocity
● Veracity
5
Is Big Data really new?
Let's check the Google search trends for Big Data vs. Data Analysis and Business Intelligence (BI).
https://www.google.com/trends/explore#q=Big%20Data%2C%20Data%20Analysis%2C%20Business%20Intelligence&geo=US&date=1%2F2005%20121m&cmpt=q&tz=Etc%2FGMT-10
7
Big Data Management Challenges.
Big Data just keeps growing. According to Forrester Research:
–The average organization will grow its data by 50 percent in the coming year.
–Overall corporate data will grow by a staggering 94 percent.
–Database systems will grow by 97 percent.
–Server backups for disaster recovery and continuity will expand by 89 percent.
8
Big Data Management Challenges.
Use case of a leading medical research facility:
- Generates 100 terabytes of data from various instruments.
- The data is copied by 10 different research departments.
- Each department further processes the data and adds 5 terabytes of synthesized data.
- The facility must now manage over a petabyte of data, of which less than 150 terabytes is unique.
- The entire petabyte is backed up and moved to a disaster recovery site, consuming additional power and storage space.
The medical center has now used over 10 petabytes of storage to manage less than 150 terabytes of unique data.
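The figures in this use case can be checked with a few lines of arithmetic. The sketch below is only an illustration: it assumes decimal units (1 PB = 1000 TB) and that the synthesized data of the departments does not overlap, so the unique total is an upper bound. The variable names are hypothetical, and the backup and disaster-recovery copies are left out because the slide does not say how many there are.

```python
# Storage arithmetic for the medical research example (figures from the slide).
TB_PER_PB = 1000  # assumption: decimal units

instrument_data_tb = 100      # raw data generated by the instruments
departments = 10              # each department keeps its own copy
synthesized_per_dept_tb = 5   # extra data each department derives

# Every department holds a full copy of the raw data plus its own synthesized data.
managed_tb = departments * (instrument_data_tb + synthesized_per_dept_tb)

# Unique data: one copy of the raw data plus all synthesized data (upper bound).
unique_tb = instrument_data_tb + departments * synthesized_per_dept_tb

duplication_ratio = managed_tb / unique_tb

print(f"managed: {managed_tb} TB ({managed_tb / TB_PER_PB:.2f} PB)")
print(f"unique:  at most {unique_tb} TB")
print(f"ratio:   {duplication_ratio:.1f}x before any backups")
```

Under these assumptions the departments alone hold about seven times more data than is unique; backups and disaster-recovery copies multiply that further.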
9
Big Data Management Challenges.
Three basic challenges:
– Storing,
– Processing, and
– Managing it efficiently.
Reference:
http://www.forbes.com/sites/ciocentral/2012/07/05/best-practices-for-managing-big-data/
Possible solutions:
– Scale-out architectures to manage large data sets.
– Reduce the data to a unique set of data.
– Data virtualization to centralize management of data sets.
– Reuse the same data footprint to reduce data duplication.
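One common way to reduce data to a unique set is to store each piece of content once, keyed by a hash of its bytes, and keep only references for the copies. The sketch below is a minimal illustration of that idea and not a method described in the referenced article; the function name `deduplicate` and the sample blobs are hypothetical.

```python
import hashlib


def deduplicate(blobs):
    """Return (store, refs): unique contents keyed by SHA-256 digest,
    plus one digest per input blob so every copy can be rebuilt."""
    store = {}
    refs = []
    for blob in blobs:
        digest = hashlib.sha256(blob).hexdigest()
        store.setdefault(digest, blob)  # keep only the first copy of each content
        refs.append(digest)
    return store, refs


# Hypothetical data: four stored objects, two distinct contents.
copies = [b"scan-001", b"scan-002", b"scan-001", b"scan-001"]
store, refs = deduplicate(copies)
print(len(copies), "blobs stored as", len(store), "unique objects")
```

Each department would then hold a list of references rather than its own full copy, which is the "same data footprint" idea from the list above.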
Project Open Data
● Several governments around the world are making data available to the public.
● Data is a valuable national resource and a strategic asset to the U.S.
Government, its partners, and the public.
● Managing this data as an asset and making it available, discoverable, and
usable – in a word, open – not only strengthens our democracy and
promotes efficiency and effectiveness in government, but also has the
potential to create economic opportunity and improve citizens’ quality of life.
● For example, when the U.S. Government released weather and GPS data to
the public, it fueled an industry that today is valued at tens of billions of
dollars per year.
Reference: https://project-open-data.cio.gov/
Benefits of Big Data.
● Cost reduction
Big data technologies like Hadoop and cloud-based analytics can provide substantial cost
advantages.
● Faster, better decision making
Analytics has always involved attempts to improve decision making. With the high speed of
Hadoop and in-memory analytics, several organizations have sped up their decision-making
processes.
● New products and services
Big data analytics can be used to create new products and services for customers.
Several organizations have come up with new products and services with the help of Big Data.
Reference: https://www.sas.com/fr_fr/news/sascom/2014q3/Big-data-davenport.html
Conclusion
● Interest in Big Data and the Hadoop eco-system has
increased in recent years.
● The recent growth in data has created new challenges
for data management, along with new opportunities.
● Several software products and solutions are available to
manage Big Data effectively.
Hadoop architecture Eco-system
14
What is Apache Hadoop?
Apache Hadoop is an open-source software framework written in Java for distributed
storage and distributed processing of very large data sets.
- It runs on computer clusters built from commodity hardware.
- All the modules in Hadoop are designed to withstand hardware failures.
15
Apache Hadoop Framework.
The Apache Hadoop framework is composed of the following modules:
1) Hadoop Distributed File System (HDFS) – a distributed file system that stores data on
commodity machines, providing very high aggregate bandwidth across the cluster.
2) Hadoop MapReduce – a programming model for large-scale data processing.
3) Hadoop YARN – a resource-management platform responsible for managing computing
resources in clusters and using them to schedule users' applications.
4) Hadoop Common – the libraries and utilities needed by the other Hadoop modules.
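To make the MapReduce programming model concrete, here is a single-process word count sketch in plain Python. It does not use Hadoop or any Hadoop API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical and only mirror the three stages a MapReduce framework would distribute across a cluster.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all values by key, as a framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence over the input lines."""
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = word_count(["big data", "Big Data and Hadoop"])
print(counts)
```

In a real cluster the map and reduce functions run in parallel on many machines and the framework handles the grouping step; here everything runs in one process purely to show the data flow.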
16
Apache Hadoop Adoption
On February 19, 2008, Yahoo! Inc. launched a large Hadoop cluster running on Linux
with more than 10,000 cores, which produced data that was used in every Yahoo!
web search query.
In 2010, Facebook claimed that they had the largest Hadoop cluster in the world with 21
PB of storage.
As of 2013, Hadoop adoption was widespread.
For example, more than half of the Fortune 50 used Hadoop.
19
Search trends about Big Data.
HPC vs Hadoop search trends:
https://www.google.com/trends/explore#q=HPC%2C%20Hadoop&geo=US&date=1%2F2005%20121m&cmpt=q&tz=Etc%2FGMT-10
20
Big Data and Hadoop Architecture
21
Apache Hadoop Architecture
22
Hadoop Cluster Setup
23
Apache Hadoop Projects
● Apache Pig: a high-level platform for creating MapReduce programs used with Hadoop.
● Apache Hive: a data warehouse infrastructure built on top of Hadoop.
● Apache Spark: an open-source cluster computing framework originally developed in the AMPLab at UC Berkeley.
● Apache Storm: a distributed computation framework written predominantly in the Clojure programming language.
● Apache HBase: an open-source, non-relational, distributed database modeled after Google's Bigtable and written in Java.
● Others: Apache ZooKeeper, Impala, Flume, Sqoop, and more.
24
Search trends about Big Data.
Apache Hadoop vs Apache Spark search trends:
https://www.google.com/trends/explore#q=Hadoop%2C%20Apache%20Spark&geo=US&date=1%2F2005%20121m&cmpt=q&tz=Etc%2FGMT-10
25
Prominent Hadoop Distributors
● Cloudera
● Hortonworks
● MapR
26
Hadoop preview:
Cloudera Quickstart VM:
http://www.cloudera.com/content/www/en-us/documentation/enterprise/latest/topics/cloudera_quickstart_vm.html
Big Data workflow:
http://insightdataengineering.com/blog/pipeline_map.html
