What is Big Data?
What makes data “Big” Data?
 No single standard definition…
“Big Data” is data whose scale, diversity, and
complexity require new architectures, techniques,
algorithms, and analytics to manage it and to extract
value and hidden knowledge from it.
 “Big Data” is similar to “small data”, but bigger in size.
 Because the data is bigger, it requires a different approach:
new techniques, tools, and architecture.
 It aims to solve new problems, or old problems in a
better way.
 Big Data generates value from the storage and
processing of very large quantities of digital
information that can’t be analyzed with traditional
computing techniques.
 A typical PC might have had 10 GB of storage in 2000.
 There are around 6000 tweets every second, which calculates to
over 350,000 tweets per minute and roughly 500 million tweets per day.
 Facebook has over 1.55 billion active users per month and around
1.39 billion mobile active users. Every minute on Facebook, 510,000
comments are posted, 293,000 statuses are updated and 136,000
photos are uploaded.
 Smartphones and the data they create and consume, together with
sensors embedded in everyday objects, will soon produce billions of new,
constantly updated data feeds containing environmental, location,
and other information, including video.
 Clickstreams and ad impressions capture user behavior at
millions of events per second.
 High-frequency stock trading algorithms reflect market changes
within microseconds.
 Machine-to-machine processes exchange data between billions of
devices.
 Infrastructure and sensors generate massive log data in real time.
 Online gaming systems support millions of concurrent users,
each producing multiple inputs per second.
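The per-second rates quoted above translate into per-minute and per-day volumes by simple multiplication. A minimal sketch, taking the figure of about 6,000 tweets per second from the slides as its only input (the function name `scale_rate` is illustrative, not from the original deck):

```python
SECONDS_PER_MINUTE = 60
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def scale_rate(events_per_second):
    """Convert a per-second event rate into per-minute and per-day totals."""
    return {
        "per_minute": events_per_second * SECONDS_PER_MINUTE,
        "per_day": events_per_second * SECONDS_PER_DAY,
    }


tweets = scale_rate(6000)    # rate quoted in the slides
print(tweets["per_minute"])  # 360000
print(tweets["per_day"])     # 518400000
```

The same helper applies to any of the event rates listed in these bullets.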
 Big Data isn't just numbers, dates and strings. Big data is
also geospatial data, audio and video and unstructured
text, including log files and social media.
 Traditional database systems were designed to address
smaller volumes of structured data, fewer updates, and a
predictable, consistent data structure.
 Big data analysis includes different types of data.
• Automatically generated by machines
(e.g., a sensor embedded in an engine)
• Typically an entirely new source of data
(e.g., use of the internet)
• Not designed to be user-friendly
(e.g., text streams)
• May not have much value
(need to focus on the important parts)
Analysis of data is a process of
inspecting, cleaning, transforming,
and modeling data with the goal of
discovering useful information,
suggesting conclusions, and
supporting decision-making.
Data analysis has multiple facets
and approaches, encompassing
diverse techniques under a variety
of names, in different business,
science, and social science
domains.
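The four steps named above (inspecting, cleaning, transforming, modeling) can be sketched on a tiny data set. This is only an illustration under stated assumptions: the readings are made up, and a plain mean stands in for a real model.

```python
from statistics import mean

# Hypothetical raw sensor readings in degrees Celsius.
# None and "n/a" stand in for unusable records.
raw = [21.5, None, 22.0, "n/a", 23.5, 22.5]

# Inspect: count how many records are not numeric.
bad_records = sum(1 for r in raw if not isinstance(r, (int, float)))

# Clean: keep only the numeric readings.
clean = [r for r in raw if isinstance(r, (int, float))]

# Transform: convert Celsius to Fahrenheit.
fahrenheit = [c * 9 / 5 + 32 for c in clean]

# Model: summarise with a mean, the simplest possible "model".
summary = {
    "bad_records": bad_records,
    "usable_records": len(clean),
    "mean_fahrenheit": mean(fahrenheit),
}
print(summary)
```

At big-data scale each step would run in a distributed framework rather than in a single process, but the sequence of steps is the same.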
Traditional ETL
Vs
Big Data ETL
Analysis on
Big Data
HADOOP
Introduction
• Hadoop was created by Doug Cutting and Mike Cafarella in 2005. Cutting,
who was working at Yahoo! at the time, named it after his son’s toy
elephant.
• It was originally developed to support distribution for the Nutch search
engine project.
• After six years of gestation, Hadoop reached 1.0.0. This release includes
support for:
 security
 HBase (append/hsync/hflush, and security)
 webhdfs (with full support for security)
 performance-enhanced access to local files for HBase
 other performance enhancements, bug fixes, and features
 Apache Hadoop is an open-source software framework written in
Java, with some native code in C and command-line utilities written
as shell scripts, for distributed storage and distributed processing
of very large data sets.
 It runs on computer clusters built from commodity hardware.
 The core of Apache Hadoop consists of a storage part, known as
Hadoop Distributed File System (HDFS), and a processing part
called MapReduce.
• On July 6, 2015, Apache Hadoop released 2.7.1, its latest stable release
at the time.
• Hadoop 2.7.1 incorporates 131 bug fixes and patches made since the
previous release, 2.7.0. As the first stable release of the 2.7.x line, it
also carries the enhancements introduced in 2.7.0. See the release notes:
http://hadoop.apache.org/docs/r2.7.1/hadoop-project-dist/hadoop-common/releasenotes.html
HADOOP
Architecture
A cluster is a set of commodity hardware machines, or nodes. Multiple nodes form a rack. This is the
hardware part of the infrastructure.
HDFS (Hadoop Distributed File System) is Hadoop's file system. It provides the storage space for data
and keeps multiple copies of each block according to a configurable replication factor.
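To make the replication factor concrete, here is a toy sketch of splitting a file into blocks and assigning each block to several distinct nodes. It is an assumption-laden illustration, not HDFS code: the round-robin placement, the node names, and the function `place_blocks` are hypothetical, and real HDFS placement is rack-aware. The defaults of 128 MB blocks and three replicas mirror common Hadoop 2.x settings.

```python
import math


def place_blocks(file_size_mb, nodes, block_size_mb=128, replication=3):
    """Toy placement: split a file into fixed-size blocks and assign each
    block to `replication` distinct nodes, round-robin.
    Real HDFS placement is rack-aware and is NOT modelled here."""
    if replication > len(nodes):
        raise ValueError("replication factor exceeds number of nodes")
    n_blocks = math.ceil(file_size_mb / block_size_mb)
    placement = {}
    for b in range(n_blocks):
        # Consecutive nodes (wrapping around) hold the replicas of block b.
        placement[b] = [nodes[(b + r) % len(nodes)] for r in range(replication)]
    return placement


nodes = ["node1", "node2", "node3", "node4"]  # hypothetical node names
layout = place_blocks(300, nodes)             # a 300 MB file needs 3 blocks
for block, replicas in layout.items():
    print(block, replicas)
```

Because every block lives on three different nodes in this sketch, losing any single node still leaves two copies of each block.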
MapReduce is a programming model for processing large data sets. It has the following phases:
Map Phase
Partitioner Phase
Sort Phase
Shuffle Phase
Combiner Phase
Reducer Phase
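The phases listed above can be imitated in a single process with a word-count example. This is a plain-Python sketch of the idea, not the Hadoop API: all function names are illustrative, and the phases are run in the order a job typically applies them (map, combine, partition, shuffle, sort, reduce).

```python
from collections import defaultdict


def map_phase(line):
    # Map: emit a (word, 1) pair for every word in the input line.
    return [(word.lower(), 1) for word in line.split()]


def combine(pairs):
    # Combiner: pre-aggregate counts per mapper to reduce data sent onward.
    local = defaultdict(int)
    for key, value in pairs:
        local[key] += value
    return list(local.items())


def partition(key, n_reducers):
    # Partitioner: decide which reducer receives a key.
    # A simple character sum is used here; Hadoop's default hashes the key.
    return sum(ord(c) for c in key) % n_reducers


def shuffle_and_sort(mapper_outputs, n_reducers):
    # Shuffle: route pairs to reducers. Sort: order each reducer's keys.
    buckets = [defaultdict(list) for _ in range(n_reducers)]
    for pairs in mapper_outputs:
        for key, value in pairs:
            buckets[partition(key, n_reducers)][key].append(value)
    return [sorted(bucket.items()) for bucket in buckets]


def reduce_phase(key, values):
    # Reduce: collapse all values for one key into a final count.
    return key, sum(values)


def word_count(lines, n_reducers=2):
    # Each input line plays the role of one mapper's input split.
    mapper_outputs = [combine(map_phase(line)) for line in lines]
    result = {}
    for bucket in shuffle_and_sort(mapper_outputs, n_reducers):
        for key, values in bucket:
            word, total = reduce_phase(key, values)
            result[word] = total
    return result


counts = word_count(["big data is big", "hadoop processes big data"])
print(counts)
```

In a real cluster the mappers and reducers run on different nodes, and the shuffle moves data over the network; here everything happens in memory to keep the flow visible.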
YARN (Yet Another Resource Negotiator) is the framework responsible for
providing the computational resources (e.g., CPU and memory) needed to execute
applications. Two important elements are:
Big data and hadoop introduction