Structured, Unstructured and Complex Data Management

Amit Chaudhary 11MCA03
Karthik Iyer 11MCA05
Hadoop
 What is this?
 Structure of this
 Is this unknown thing right for me?
 Where is this used?
   Any idea? (Idea SIM card)
What is Hadoop?
 It is an open source project of the
  Apache Software Foundation for
  processing large data sets
 It was inspired by Google’s MapReduce
  and Google File System (GFS) papers
 It was originally conceived by Doug
  Cutting
 It is named after his son’s toy elephant
Large Data Means?
   1000 kilobytes = 1 Megabyte
   1000 Megabytes = 1 Gigabyte
   1000 Gigabytes = 1 Terabyte
   1000 Terabytes = 1 Petabyte
   1000 Petabytes = 1 Exabyte
   1000 Exabytes = 1 Zettabyte
   1000 Zettabytes = 1 Yottabyte
   1000 Yottabytes = 1 Brontobyte (informal name; the SI term is ronnabyte)
   1000 Brontobytes = 1 Geopbyte (informal name; the SI term is quettabyte)
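
The ladder above is plain powers of 1000, so it can be checked with a few lines of arithmetic. The following is a minimal Python sketch, not part of the original deck: the `UNITS` list and the helper names `to_bytes` and `humanize` are illustrative assumptions, and the list stops at yottabyte because the names beyond it are informal. It converts the 1800 exabytes figure quoted in the editor's notes into the next unit up.

```python
# Decimal (power-of-1000) unit names, following the slide's ladder up to yottabyte.
# The list and helper names are assumptions made for this illustration.
UNITS = ["byte", "kilobyte", "megabyte", "gigabyte", "terabyte",
         "petabyte", "exabyte", "zettabyte", "yottabyte"]


def to_bytes(value, unit):
    """Convert a quantity expressed in a decimal unit into bytes."""
    return value * 1000 ** UNITS.index(unit)


def humanize(num_bytes):
    """Express a byte count in the largest unit that keeps the value >= 1."""
    index = 0
    value = float(num_bytes)
    while value >= 1000 and index < len(UNITS) - 1:
        value /= 1000
        index += 1
    return value, UNITS[index]


if __name__ == "__main__":
    total = to_bytes(1800, "exabyte")
    value, unit = humanize(total)
    print(f"1800 exabytes is about {value:.1f} {unit}s")
```

Floating-point division is used for simplicity, so the printed value is rounded to one decimal place rather than treated as exact.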
So what’s the big deal?
 Scalable: New nodes can be added as
  needed, without changing the formats
 Flexible: It is schema-less, and can
  absorb any type of data, structured or
  not, from any number of sources
 Fault tolerant: System redirects work to
  another location if a node fails
Hadoop = HDFS + MapReduce
 HDFS: For storing massive datasets
  using low-cost storage
 MapReduce: A parallel data-processing
  framework based on the programming
  model Google published in 2004
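
The editor's notes describe MapReduce with a Roman census analogy: count people per city in parallel, then combine the counts. A minimal single-process Python sketch of that idea follows. It does not use any Hadoop API; the function names `map_phase`, `shuffle` and `reduce_phase` and the `CENSUS` records are assumptions invented for illustration.

```python
from collections import defaultdict


def map_phase(records):
    """Map: turn each (person, city) record into a (city, 1) key/value pair."""
    for _person, city in records:
        yield city, 1


def shuffle(pairs):
    """Group values by key, standing in for the step between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: combine the values collected for each key into one count."""
    return {key: sum(values) for key, values in grouped.items()}


# Hypothetical census records, invented for this example.
CENSUS = [("Gaius", "Rome"), ("Livia", "Rome"), ("Marcus", "Pompeii"),
          ("Julia", "Ostia"), ("Titus", "Rome"), ("Claudia", "Ostia")]

if __name__ == "__main__":
    per_city = reduce_phase(shuffle(map_phase(CENSUS)))
    print(per_city)                 # population per city
    print(sum(per_city.values()))   # population of the whole "empire"
```

In a real cluster the map calls would run on many machines at once, each over its own share of the records; here they run in one loop only to keep the sketch short.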
HDFS
 It is a fault-tolerant storage system
 Able to store huge amounts of
  information
 It creates clusters of machines and
  coordinates work among them
 If one fails, it continues to operate the
  cluster without losing data or interrupting
  work, by shifting work to the remaining
  machines in the cluster
HDFS
 It manages storage on the cluster by
  breaking incoming files into
  pieces, called blocks
 Stores each of the blocks redundantly
  across the pool of servers
 By default, it keeps three copies of each
  file’s data by copying each block to three
  different servers
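
The block and replica idea can be sketched in a few lines of Python. This is a toy model under stated assumptions, not HDFS code: the round-robin placement, the tiny block size, the server names and the helper names are all invented for illustration, and real HDFS chooses replica locations with its own placement policy. The sketch splits a file into blocks, puts each block on three distinct servers, and checks that the file stays readable when any two servers fail.

```python
from itertools import combinations


def split_into_blocks(data, block_size):
    """Break a byte string into fixed-size blocks (the last may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks, servers, replication=3):
    """Assign each block to `replication` distinct servers, round-robin.

    Round-robin is an assumption of this sketch, chosen for simplicity.
    """
    if replication > len(servers):
        raise ValueError("need at least as many servers as replicas")
    placement = {}
    for block_id in range(num_blocks):
        placement[block_id] = [servers[(block_id + r) % len(servers)]
                               for r in range(replication)]
    return placement


def file_available(placement, failed):
    """A file is readable if every block has a replica on a live server."""
    return all(any(server not in failed for server in replicas)
               for replicas in placement.values())


if __name__ == "__main__":
    servers = ["s1", "s2", "s3", "s4", "s5"]          # hypothetical names
    blocks = split_into_blocks(b"x" * 1000, block_size=256)
    placement = place_replicas(len(blocks), servers)
    survives_any_two = all(file_available(placement, set(pair))
                           for pair in combinations(servers, 2))
    print(len(blocks), "blocks; survives any two failures:", survives_any_two)
```

With three replicas on three different servers, losing two servers always leaves at least one copy of every block, which is why the check passes for every pair.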
How does this work?
Which companies are using Hadoop?
 LinkedIn
 Walt Disney
 Wal-mart
 General Electric
 Nokia
 Bank of America
 Foursquare
Hadoop at Foursquare
   Foursquare: Mobile + Location + Social
    Networking
Is this unknown thing right for me?

Editor's Notes

  1. Hadoop is only one of the projects under the Apache Software Foundation. According to IDC, the amount of digital information produced in 2012 will be ten times that produced in 2006: 1800 exabytes. The majority of this data will be “unstructured”: complex data poorly suited to management by structured storage systems like relational databases.
  2. 1 Petabyte [where most SME corporations are?]; 1 Exabyte [where most large corporations are?]; 1 Zettabyte [where leaders like Facebook and Google are]
  3. Flexible: Data from multiple sources can be joined and aggregated in arbitrary ways, enabling deeper analyses than any one system can provide. By a commonly cited estimate, 80% of the world’s data is unstructured, and most businesses don’t even attempt to use this data to their advantage. Imagine if you had a way to analyze that data.
  4. HDFS assumes nodes will fail, so it achieves reliability by replicating data across multiple nodes.
     MapReduce refers to two separate and distinct tasks that Hadoop programs perform. The first is the map job, which takes a set of data and converts it into another set of data, where individual elements are broken down into tuples (key/value pairs). The reduce job takes the output from a map as input and combines those data tuples into a smaller set of tuples. As the sequence of the name MapReduce implies, the reduce job is always performed after the map job.
     MapReduce was first presented to the world via a 2004 paper by Google that laid out its key ideas. The technique was then re-implemented as open source under the Apache Software Foundation, with Yahoo as a major early contributor.
     As an analogy, you can think of map and reduce tasks as the way a census was conducted in Roman times, where the census bureau would dispatch its people to each city in the empire. Each census taker in each city would be tasked to count the number of people in that city and then return their results to the capital city. There, the results from each city would be reduced to a single count (sum of all cities) to determine the overall population of the empire. This mapping of people to cities, in parallel, and then combining the results (reducing) is much more efficient than sending a single person to count every person in the empire in a serial fashion.
     Large volumes of complex data can hide important insights. Are there buying patterns in point-of-sale data that can forecast demand for products at particular stores? Do user logs from a website, or calling records in a mobile network, contain information about relationships among individual customers? Companies that can extract facts like these from the huge volume of data can better control processes and costs, better predict demand, and build better products.
  5. HDFS: Hadoop Distributed File System
     MapReduce: Parallel data-processing framework
     Hadoop Common: A set of utilities that support the Hadoop subprojects
     HBase: Hadoop database for random read/write access
     Hive: SQL-like queries and tables on large datasets
     Pig: Data flow language and compiler
     Oozie: Workflow for interdependent Hadoop jobs
     Sqoop: Integration of databases and data warehouses with Hadoop
     Flume: Configurable streaming data collection
     ZooKeeper: Coordination service for distributed applications
     Hue: User interface framework and SDK for visual Hadoop applications
  6. In the very simple example shown, any two servers can fail, and the entire file will still be available. HDFS notices when a block or a node is lost, and creates a new copy of missing data from the replicas it manages. Because the cluster stores several copies of every block, more clients can read them at the same time without creating bottlenecks.
  7. Each of the servers runs the analysis on its own block from the file. Results are collated and digested into a single result after each piece has been analyzed. Running the analysis on the nodes that actually store the data delivers much better performance than reading data over the network from a single centralized server. Hadoop monitors jobs during execution, and will restart work lost due to node failure if necessary. In fact, if a particular node is running very slowly, it will restart that node's work on another server with a copy of the data.
  8. All of the above companies use Hadoop for a variety of tasks such as marketing, advertising, and sentiment and risk analysis. IBM used the software as part of its Watson computer, which competed with the champions of the TV game show Jeopardy!
  9. Foursquare aims to let your friends in almost every country know where you are, and to let you figure out where they are. As a platform, it is now aware of 25+ million venues worldwide, each of which can be described by unique signals about who is coming to these places, when, and for how long. To reward and motivate users, Foursquare lets frequent users collect points, prize “badges,” and eventually coupons for check-ins.
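
The scheduling behaviour described in note 7 (run each piece of work on a server that already stores the block, and restart work lost to a node failure) can be sketched as follows. This is a toy Python model under stated assumptions, not Hadoop's scheduler: the `PLACEMENT` table, the server names, the least-loaded tie-breaking rule and both function names are invented for illustration.

```python
def schedule(placement, live_nodes):
    """Assign each block's task to the least-loaded live node storing the block."""
    load = {node: 0 for node in live_nodes}
    assignment = {}
    for block_id, replicas in sorted(placement.items()):
        candidates = [node for node in replicas if node in load]
        if not candidates:
            raise RuntimeError(f"no live replica for block {block_id}")
        # Prefer the node with the fewest tasks; break ties by name.
        chosen = min(candidates, key=lambda node: (load[node], node))
        assignment[block_id] = chosen
        load[chosen] += 1
    return assignment


def reschedule_after_failure(placement, assignment, live_nodes, failed_node):
    """Restart only the tasks that ran on the failed node, on other replicas."""
    survivors = [node for node in live_nodes if node != failed_node]
    lost = {b: placement[b] for b, node in assignment.items() if node == failed_node}
    kept = {b: node for b, node in assignment.items() if node != failed_node}
    # Simplification: existing load on survivors is ignored for the restarted tasks.
    kept.update(schedule(lost, survivors))
    return kept


# Hypothetical block-to-replica table: four blocks, three replicas each.
PLACEMENT = {
    0: ["s1", "s2", "s3"],
    1: ["s2", "s3", "s4"],
    2: ["s3", "s4", "s1"],
    3: ["s4", "s1", "s2"],
}

if __name__ == "__main__":
    nodes = ["s1", "s2", "s3", "s4"]
    first = schedule(PLACEMENT, nodes)
    print(first)
    print(reschedule_after_failure(PLACEMENT, first, nodes, failed_node="s1"))
```

Because every task is placed on a node that holds a replica of its block, no block has to be read over the network, and a failed node's tasks can move to another server that already has the data.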