©copyright Ankur Raina 2012
• 3 million lines of code track your checked
  baggage.
• A billion lines of code go into the workings of
  the latest Airbus plane.
• A billion transistors per person.
• 4 billion mobile phone subscribers.
• The St. Anthony Falls Bridge (Minneapolis) is
  fitted with 200 embedded sensors.
• 2001
 8 lakh petabytes (800,000 PB) of data

• 2020
 35 zettabytes of data
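
As a rough sanity check on the two figures above, the sketch below puts both in petabytes and computes the implied growth. It assumes decimal (SI) units and the Indian-numbering meaning of lakh (100,000); the variable names are illustrative only.

```python
# Assumptions: SI units (1 zettabyte = 1,000,000 petabytes) and 1 lakh = 100,000.
LAKH = 100_000
PB_PER_ZB = 1_000_000

data_2001_pb = 8 * LAKH          # 8 lakh petabytes
data_2020_pb = 35 * PB_PER_ZB    # 35 zettabytes expressed in petabytes

growth_factor = data_2020_pb / data_2001_pb

print(f"2001: {data_2001_pb:,} PB")
print(f"2020: {data_2020_pb:,} PB")
print(f"Growth: about {growth_factor:.2f}x")
```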

7 TB/day
10 TB/day
The Trouble begins here…
• 80% of the world’s information is
  unstructured.

• Unstructured information is growing at 15
  times the rate of structured information.



Contents
•   What is Big Data ?
•   The 3Vs.
•   What is a Big Data platform ?
•   Needle in a haystack problem.
•   Big Data & Social Media.
•   The Call Centre mantra.
•   ABCs of Hadoop.
Big Data
Information that cannot be processed or analyzed
  using traditional processes or tools.
• Instrumentation
• Interconnection
• M2M interconnectivity
• Intelligent machines
Big Data Platform
• Lets you store data in its native business
  object format and get value out of it through
  massive parallelism on readily available
  components.

• It is not a replacement for a data warehouse.


Service-Oriented Architecture (SOA)
This is what I need!




Social Media
We know…
• What are people saying?

But…
• Why are people saying what they are saying and
  behaving the way they are behaving?


• Super Bowl 2011 (4,064 tweets per second (TPS), Feb 2011)
• Bin Laden’s death (5,106 TPS)
• Japan earthquake (6,939 TPS)
• Paraguay’s football penalty shootout win over
  Brazil in the Copa America quarter-final
  peaked at 7,166 TPS
• Same-day U.S. match win in the FIFA Women’s
  World Cup: 7,196 TPS
• Singer Beyonce’s pregnancy announcement
  (8,868 TPS)
• In-Motion Analytics (Streams Computing)
• At-Rest Analytics (BigInsights)


HADOOP
• Creator: Doug Cutting
• Top-level Apache Project.
• Inspired by Google’s work on its GFS (Google
  File System).
• Function-to-data model, not a data-to-function
  model.

Hadoop
• HDFS
• Map Reduce
• Hadoop Common Components


Hadoop Distributed File System
• Data broken into blocks & distributed throughout the
  cluster.
• Data locality.
• Mean Time To Failure (MTTF)
• Block size (64 MB default)
• Larger block sizes are available for larger files to reduce
  the amount of metadata (BigInsights: 128 MB).
• Redundancy
• Name Node server
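
The effect of block size on metadata can be sketched with simple arithmetic. The example below is only an illustration: the 1 GB file size and the `block_count` helper are assumptions, and it counts blocks only, ignoring the redundant copies of each block.

```python
MB = 1024 * 1024

def block_count(file_size_bytes: int, block_size_bytes: int) -> int:
    """Number of blocks needed to hold a file (the last block may be partly full)."""
    # Integer ceiling division avoids floating-point rounding.
    return -(-file_size_bytes // block_size_bytes)

file_size = 1024 * MB  # a hypothetical 1 GB file

for block_mb in (64, 128):
    n = block_count(file_size, block_mb * MB)
    print(f"{block_mb} MB blocks: {n} block entries to track for this file")
```

Doubling the block size halves the number of block entries for the same file, which is the metadata saving the slide refers to.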
Map Reduce
• The map job takes a set of data and
  converts it into another set of data in which
  individual elements are broken down into
  tuples.
• The reduce job takes the output from a map as
  input and combines those data tuples into a
  smaller set of tuples.
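
The two jobs can be illustrated with a word count in plain Python. This is a minimal single-process sketch, not Hadoop code: the function names `map_phase` and `reduce_phase` and the sample lines are assumptions made for the example.

```python
from collections import defaultdict

def map_phase(line):
    """Map: break one input line into (word, 1) tuples."""
    return [(word.lower(), 1) for word in line.split()]

def reduce_phase(word, counts):
    """Reduce: combine all tuples for one key into a single tuple."""
    return (word, sum(counts))

lines = ["big data is big", "hadoop stores big data"]

# Map every line, then group the emitted tuples by key.
grouped = defaultdict(list)
for line in lines:
    for word, count in map_phase(line):
        grouped[word].append(count)

# Reduce each group to one (word, total) tuple.
result = dict(reduce_phase(w, c) for w, c in grouped.items())
print(result)
```

Eight input words produce eight map tuples, which the reduce step collapses into five tuples, one per distinct word.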

Map Reduce
•   Job
•   Tasks
•   Job Tracker
•   Task Tracker Agents
•   Shuffle
•   Combiner
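
How tasks, the combiner, and the shuffle fit together can be sketched in a single Python process. Everything here is a simplified assumption for illustration: `NUM_REDUCERS`, the helper names, and the crc32-based partitioning are not taken from Hadoop, and the Job Tracker and Task Tracker scheduling roles are not modeled.

```python
import zlib
from collections import Counter, defaultdict

NUM_REDUCERS = 2  # hypothetical setting for this sketch

def map_task(lines):
    """One map task: emit a (word, 1) tuple for every word in its input split."""
    return [(w, 1) for line in lines for w in line.split()]

def combine(pairs):
    """Combiner: pre-aggregate one map task's output before the shuffle."""
    totals = Counter()
    for word, n in pairs:
        totals[word] += n
    return list(totals.items())

def partition(word):
    """Pick the reducer for a key; crc32 keeps the choice stable across runs."""
    return zlib.crc32(word.encode("utf-8")) % NUM_REDUCERS

def shuffle(map_outputs):
    """Shuffle: route every tuple to its reducer and group values by key."""
    reducers = [defaultdict(list) for _ in range(NUM_REDUCERS)]
    for pairs in map_outputs:
        for word, n in pairs:
            reducers[partition(word)][word].append(n)
    return reducers

splits = [["big data big data"], ["big hadoop"]]  # one input split per map task
raw = [map_task(s) for s in splits]
combined = [combine(p) for p in raw]

print("tuples shuffled without combiner:", sum(len(p) for p in raw))
print("tuples shuffled with combiner:", sum(len(p) for p in combined))

final = {}
for reducer_input in shuffle(combined):
    for word, counts in reducer_input.items():
        final[word] = sum(counts)  # the reduce task for this key
print(final)
```

The combiner does not change the final counts; it only reduces how many tuples have to move between tasks during the shuffle.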

Hadoop Common Components
• Set of libraries that support various Hadoop
  subprojects.
• /bin/hdfs dfs <args>
Command         Function
chmod           Changes the permissions for reading & writing to a given file/set
                of files.
chown           Changes the owner of a given file/set of files.
copyFromLocal   Copies a file from the local file system into HDFS.
copyToLocal     Copies a file from HDFS to the local file system.
cp              Copies HDFS files from one directory to another.
expunge         Empties the trash.
cat             Copies the files to standard output.
ls              Displays a listing of files in a given directory.
mkdir           Creates a directory in HDFS.
mv              Moves files from one directory to another.
rm              Deletes a file & sends it to the trash (use the -skipTrash option
                to delete permanently).
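
A short session with these commands might look like the sketch below. It only builds and prints the command lines, so it runs without a cluster; the `hdfs_dfs` helper and all paths are hypothetical, and it assumes the `hdfs` executable is on the PATH where the commands are eventually run.

```python
import shlex

def hdfs_dfs(command, *args):
    """Build the argument list for one `hdfs dfs` call (nothing is executed)."""
    return ["hdfs", "dfs", f"-{command}", *args]

# Hypothetical paths, chosen only for illustration.
session = [
    hdfs_dfs("mkdir", "/user/demo"),
    hdfs_dfs("copyFromLocal", "sales.csv", "/user/demo/sales.csv"),
    hdfs_dfs("ls", "/user/demo"),
    hdfs_dfs("cat", "/user/demo/sales.csv"),
    hdfs_dfs("rm", "-skipTrash", "/user/demo/sales.csv"),
]

for argv in session:
    print(shlex.join(argv))
```

On a machine with a Hadoop client installed, each list could be passed to `subprocess.run(argv, check=True)` to execute it.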
References
• www.ibm.com
• www.hadoop.apache.org
• Understanding Big Data by Chris, Dirk, Tom,
  George & Paul ( McGraw Hill )
• Oracle Magazine



