©copyright Ankur Raina 2012
• 3 million lines of code track your
  checked baggage.
• A billion lines of code go into running
  the latest Airbus plane.
• A billion transistors per person.
• 4 billion mobile phone subscribers.
• St. Anthony Falls Bridge (Minneapolis) is fitted
  with 200 embedded sensors.
• 2001
  800,000 petabytes (8 lakh PB) of data

• 2020
  35 zettabytes of data (projected)
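To put the two figures on the same scale, a quick unit conversion helps. The sketch below assumes decimal (SI) units, where 1 zettabyte equals 1,000,000 petabytes; the variable names are illustrative only.

```python
# Assumption: decimal (SI) units, 1 ZB = 10**21 bytes and 1 PB = 10**15 bytes.
PB_PER_ZB = 10**21 // 10**15   # 1,000,000 PB in one ZB

data_2001_pb = 800_000         # 8 lakh petabytes, from the slide
data_2020_zb = 35              # zettabytes, from the slide

# Express the 2001 figure in zettabytes so both values share a unit.
data_2001_zb = data_2001_pb / PB_PER_ZB
growth_factor = data_2020_zb / data_2001_zb

print(f"2001 volume: {data_2001_zb} ZB")
print(f"Growth from 2001 to 2020: roughly {growth_factor:.0f}x")
```

In other words, the 2001 figure is a little under one zettabyte, so the 2020 figure represents growth of more than forty times.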

7 TB/day
10 TB/day
The Trouble begins here…
• 80% of the world’s information is
  unstructured.

• Unstructured information is growing at 15
  times the rate of structured information.



Contents
• What is Big Data?
• The 3Vs.
• What is a Big Data platform?
• Needle in a haystack problem.
• Big Data & Social Media.
• The Call Centre mantra.
• ABCs of Hadoop.
Big Data
Information that cannot be processed or
  analyzed using traditional processes or tools.
• Instrumentation
• Interconnection
• M2M interconnectivity
• Intelligent machines
Big Data Platform
• Lets you store data in its native business
  object format & get value out of it through
  massive parallelism on readily available
  components.

• It is not a replacement for a data warehouse.


Service Oriented Architecture (SOA)
This is what I need!!!




Social Media
We know…
• What people are saying.

But…
• Why are people saying what they are saying &
  behaving the way they are behaving?


• Super Bowl 2011: 4,064 tweets per second (Tps), Feb 2011
• Bin Laden’s death (5,106 Tps)
• Japan earthquake (6,939 Tps)
• Paraguay’s football penalty shootout win over
  Brazil in the Copa America quarter-final
  peaked at 7,166 Tps
• The U.S. win the same day in the FIFA Women’s
  World Cup -> 7,196 Tps
• Singer Beyonce’s pregnancy announcement
  (8,868 Tps)
• In-Motion Analytics (Streams Computing)
• At-Rest Analytics (BigInsights)


HADOOP
• Creator: Doug Cutting
• Top-level Apache Project.
• Inspired by Google’s work on its Google File
  System (GFS).
• Function-to-data model, not a
  data-to-function model.

Hadoop
• HDFS
• Map Reduce
• Hadoop Common Components


Hadoop Distributed File System
• Data is broken into blocks & distributed throughout the
  cluster.
• Data locality.
• Mean Time To Failure (MTTF)
• Block size (64 MB default)
• Larger block sizes are available for larger files to reduce
  the amount of metadata (BigInsights: 128 MB).
• Redundancy
• NameNode server
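The effect of block size on metadata can be illustrated with a little arithmetic. This is a rough sketch, not HDFS code: the function name `hdfs_block_plan` is hypothetical, and the replication factor of 3 is an assumption for illustration (the slide only says that blocks are stored redundantly).

```python
MB = 1024 * 1024

def hdfs_block_plan(file_size_bytes, block_size_bytes=64 * MB, replication=3):
    """Estimate how one file is split into blocks and replicated.

    replication=3 is an assumed value, not a figure from the slides.
    """
    if file_size_bytes < 0 or block_size_bytes <= 0 or replication < 1:
        raise ValueError("sizes must be non-negative and replication >= 1")
    # Integer ceiling division: a partly filled last block still counts.
    blocks = -(-file_size_bytes // block_size_bytes)
    return {
        "blocks": blocks,                          # entries the NameNode tracks
        "block_replicas": blocks * replication,    # copies spread over the cluster
        "raw_bytes": file_size_bytes * replication # total disk space consumed
    }

one_gb = 1024 * MB
print(hdfs_block_plan(one_gb))                              # 64 MB blocks
print(hdfs_block_plan(one_gb, block_size_bytes=128 * MB))   # 128 MB blocks
```

Doubling the block size halves the number of blocks the NameNode has to track for the same file, which is the metadata saving the slide refers to.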
Map Reduce
• The map job takes a set of data and
  converts it into another set of data in which
  individual elements are broken down into
  tuples (key/value pairs).
• The reduce job takes the output from a map as
  input & combines those data tuples into a
  smaller set of tuples.
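The map and reduce steps described above can be sketched with the classic word-count example. This is a single-process illustration in plain Python, not Hadoop's API; the function names are hypothetical.

```python
from collections import defaultdict

def map_phase(line):
    """Map: break one input line into (word, 1) tuples."""
    return [(word.lower(), 1) for word in line.split()]

def group_by_key(pairs):
    """Collect all values emitted for the same key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Reduce: combine the tuples for one key into a single tuple."""
    return key, sum(values)

def word_count(lines):
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = group_by_key(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())

print(word_count(["big data", "Big Hadoop"]))
```

The map output has one tuple per word occurrence, while the reduce output has one tuple per distinct word, which is the "smaller set of tuples" in the slide.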

Map Reduce
• Job
• Tasks
• JobTracker
• TaskTracker agents
• Shuffle
• Combiner
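How these pieces fit together can be shown with a toy simulation. It is a minimal single-process sketch with hypothetical function names: one job is split into map tasks (one per input split), an optional combiner pre-aggregates each task's output, and the shuffle groups values by key for the reduce step. Scheduling by the JobTracker and TaskTracker agents is not modeled.

```python
from collections import Counter, defaultdict

def run_map_task(split, use_combiner):
    """One map task: emit (word, 1) pairs for the lines in its split."""
    pairs = [(word, 1) for line in split for word in line.split()]
    if use_combiner:
        # Combiner: sum counts locally before anything leaves the task.
        local = Counter()
        for word, count in pairs:
            local[word] += count
        pairs = list(local.items())
    return pairs

def run_job(splits, use_combiner=True):
    """Run one job; return the result and the number of shuffled records."""
    map_outputs = [run_map_task(split, use_combiner) for split in splits]
    shuffled_records = sum(len(output) for output in map_outputs)

    # Shuffle: bring all values for the same key together.
    grouped = defaultdict(list)
    for output in map_outputs:
        for key, value in output:
            grouped[key].append(value)

    # Reduce: one total per key.
    result = {key: sum(values) for key, values in grouped.items()}
    return result, shuffled_records

splits = [["a a b"], ["a b b"]]
print(run_job(splits, use_combiner=False))
print(run_job(splits, use_combiner=True))
```

Both runs produce the same totals; the combiner only reduces how many records pass through the shuffle.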

Hadoop Common Components
• Set of libraries that support various Hadoop
  subprojects.
• /bin/hdfs dfs <args>
Command         Function
chmod           Changes the permissions for reading & writing a given file or
                set of files.
chown           Changes the owner of a given file or set of files.
copyFromLocal   Copies a file from the local file system into HDFS.
copyToLocal     Copies a file from HDFS to the local file system.
cp              Copies HDFS files from one directory to another.
expunge         Empties the trash.
cat             Copies file contents to standard output.
ls              Displays a listing of files in a given directory.
mkdir           Creates a directory in HDFS.
mv              Moves files from one directory to another.
rm              Deletes a file & sends it to the trash (use the -skipTrash
                option to delete permanently).
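A typical sequence of these commands might look like the sketch below. It only builds and prints the command lines, so it runs without a cluster; the helper name `hdfs_dfs` and the paths are hypothetical, and it assumes the `hdfs` launcher is on the PATH when the commands are actually executed.

```python
import shlex

def hdfs_dfs(command, *args):
    """Build the argument list for `hdfs dfs -<command> <args>`."""
    return ["hdfs", "dfs", f"-{command}", *args]

# Hypothetical paths, for illustration only.
commands = [
    hdfs_dfs("mkdir", "/user/demo"),
    hdfs_dfs("copyFromLocal", "report.txt", "/user/demo/report.txt"),
    hdfs_dfs("ls", "/user/demo"),
    hdfs_dfs("cat", "/user/demo/report.txt"),
    hdfs_dfs("rm", "-skipTrash", "/user/demo/report.txt"),
]

for argv in commands:
    # Print a shell-safe command line; nothing is executed here.
    print(shlex.join(argv))
```

To run a command for real on a machine with Hadoop installed, each list can be passed to `subprocess.run(argv, check=True)`.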
References
• www.ibm.com
• www.hadoop.apache.org
• Understanding Big Data by Chris, Dirk, Tom,
  George & Paul ( McGraw Hill )
• Oracle Magazine



