Bigdata and Hadoop Tutorial
By www.ITJobZone.biz
Audience
Students
Working Professionals
Developers
Database experts
Administrators
Those looking for a career change.
Topics Covered
What is Bigdata?
Characteristics of Bigdata.
Use Cases and Need of Hadoop.
Apache Hadoop Ecosystem and its components
Brief History of Hadoop.
Hadoop Architecture.
Who Uses Hadoop
What is Bigdata?
Lots of data (terabytes or petabytes)
Bigdata is a term for a collection of datasets so large and
complex that it becomes difficult to process using on-hand
database tools or traditional data processing
applications.
The challenges include capture, curation, storage, search,
sharing, transfer, analysis and visualization.
Systems and enterprises generate huge amounts of data, from terabytes to petabytes of information.
The stock market generates around one terabyte of new data per day, which is used to
perform stock trading analysis and determine trends for optimal trades.
Bigdata Characteristics
Bigdata - Types of Data
● Structured Data
- Data from enterprise (ERP, CRM)
● Semi structured Data
- xml, json, csv, log files
● Unstructured Data
- audio, video, image, archive documents
Unstructured Data is Exploding
Bigdata Scenarios
Web and e-tailing
Recommendation Engines
Ad Targeting
Search Quality
Abuse and Click Fraud Detection
Telecommunication
Customer Churn Analysis and Prevention
Network Performance Optimization
Call Data Record (CDR) Analysis
Analysing Network to Predict Failure
Bigdata Scenarios (Cont…)
Government
Fraud Detection and Cyber Security
Welfare Schemes
Justice
Healthcare
Health Information Exchange
Gene Sequencing
Serialization
Healthcare service Quality Improvements
Drug Safety
Brief history of Hadoop
Designed to answer the question
"How to process big data with reasonable cost and time?"
Google (published white papers)
GFS 2003
MapReduce 2004
Yahoo (implementation by Doug Cutting)
HDFS 2006-07
MapReduce 2007-08
What is Hadoop
An Apache top-level project: an open-source implementation of
frameworks for reliable, scalable, distributed computing
and storage.
It is a flexible and highly-available architecture for large
scale computation and data processing on a network of
commodity hardware.
Goals / Requirements
Abstract & facilitate the storage and processing of large and rapidly growing data.
• Structured and unstructured datasets
• Simple programming model
High scalability and availability
Use commodity hardware with little redundancy
Fault tolerance
Move computation rather than data
Hardware will fail.
Processing will be run in batches. Thus there is an emphasis on
high throughput as opposed to low latency.
Applications that run on HDFS have large data sets. A typical
file in HDFS is gigabytes to terabytes in size.
It should provide high aggregate data bandwidth and scale to
hundreds of nodes in a single cluster. It should support tens
of millions of files in a single instance.
Applications need a write-once-read-many access model.
Architecture
Distributed, with some centralization
The main nodes of the cluster are where most of the computational power and storage
of the system lie
Main nodes run a TaskTracker to accept and reply to MapReduce tasks, and
also a DataNode to store the needed blocks as closely as possible
The central control node runs the NameNode to keep track of HDFS directories &
files, and the JobTracker to dispatch compute tasks to TaskTrackers
Written in Java; other languages such as Python and Ruby are supported through Hadoop Streaming
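The MapReduce tasks that the JobTracker hands to TaskTrackers follow a map, shuffle, reduce pattern. The sketch below imitates that pattern for a word count in a single Python process. It is only an illustration of the programming model, not the Hadoop API; the function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group emitted values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: sum the counts collected for one word."""
    return key, sum(values)


def word_count(lines):
    """Run map over every line, shuffle the pairs, then reduce each group."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    # Two input "splits"; on a real cluster each could be mapped on a different node.
    print(word_count(["big data big cluster", "data node"]))
```

On a real cluster the map calls would run on the nodes that already hold the input blocks, which is the "move computation rather than data" goal in practice.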
Hadoop Distributed Filesystem
Tailored to needs of MapReduce
Targeted towards many reads of filestreams
Writes are more costly
High degree of data replication (3x by default)
No need for RAID on normal nodes
Large block size (64 MB by default in Hadoop 1.x; 128 MB in Hadoop 2.x and later)
Location awareness of DataNodes in network
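To make the block size and replication figures concrete, the following sketch estimates how many blocks a file is split into and how much raw disk it consumes. It assumes the 64 MB block size and 3x replication quoted above; the function name hdfs_footprint is hypothetical, and real clusters may be configured differently.

```python
import math

BLOCK_SIZE_MB = 64   # block size quoted on the slide (assumed here)
REPLICATION = 3      # default replication factor quoted on the slide


def hdfs_footprint(file_size_mb, block_size_mb=BLOCK_SIZE_MB, replication=REPLICATION):
    """Return (blocks, block replicas, raw storage in MB) for one file."""
    blocks = math.ceil(file_size_mb / block_size_mb)
    replicas = blocks * replication
    # A final partial block only occupies its actual size, so raw usage
    # is the file size times the replication factor.
    raw_mb = file_size_mb * replication
    return blocks, replicas, raw_mb


if __name__ == "__main__":
    print(hdfs_footprint(1024))  # a 1 GB file -> (16, 48, 3072)
```

A large block size keeps the number of blocks, and therefore the amount of metadata the NameNode must hold, small even for very large files.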
Architecture - NameNode:
Stores metadata for the files, like the directory structure of a typical FS.
The server holding the NameNode instance is quite crucial, as there is
only one.
Transaction log for file deletes/adds, etc. Does not use transactions for
whole blocks or file-streams, only metadata.
Handles creation of more replica blocks when necessary after a
DataNode failure
Architecture - DataNode:
Stores the actual data in HDFS
Can run on any underlying filesystem (ext3/4, NTFS, etc)
Notifies NameNode of what blocks it has
NameNode replicates blocks 2x in local rack, 1x elsewhere
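The interplay between block reports and re-replication can be sketched with a toy model. This is a simplified illustration, not Hadoop code: the class ToyNameNode, its methods, and the node and block names are all hypothetical, and rack placement is ignored.

```python
from collections import defaultdict


class ToyNameNode:
    """Holds metadata only: which DataNodes report holding which blocks."""

    def __init__(self, replication=3):
        self.replication = replication
        self.block_locations = defaultdict(set)  # block id -> set of DataNode names

    def block_report(self, datanode, block_ids):
        """A DataNode notifies the NameNode of the blocks it stores."""
        for block_id in block_ids:
            self.block_locations[block_id].add(datanode)

    def datanode_failed(self, datanode):
        """Forget every replica that lived on a failed DataNode."""
        for holders in self.block_locations.values():
            holders.discard(datanode)

    def under_replicated(self):
        """Map each block below the target to the number of replicas still needed."""
        return {
            block_id: self.replication - len(holders)
            for block_id, holders in self.block_locations.items()
            if len(holders) < self.replication
        }


if __name__ == "__main__":
    nn = ToyNameNode()
    for dn in ("dn1", "dn2", "dn3"):
        nn.block_report(dn, ["blk_1", "blk_2"])
    nn.datanode_failed("dn2")
    print(nn.under_replicated())  # each block now needs one more replica
```

In this model the NameNode never touches file contents; it only decides, from the reports, which blocks need new replicas after a failure.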
Some examples
•Yahoo!: More than 100,000 CPUs in ~20,000 computers running Hadoop;
biggest cluster: 2,000 nodes (2*4 CPU boxes with 4 TB disk each); used to support
research for Ad Systems and Web Search.
•AOL: Used for a variety of things ranging from statistics generation to running
advanced algorithms for behavioral analysis and targeting; cluster size is
50 machines, Intel Xeon, dual processors, dual core, each with 16 GB RAM and
800 GB hard disk, giving a total of 37 TB HDFS capacity.
•Facebook: To store copies of internal log and dimension data sources and use
them as a source for reporting/analytics and machine learning; 320-machine cluster
with 2,560 cores and about 1.3 PB raw storage.
Join Our Hadoop Online Training
http://www.itjobzone.biz/Big-Data-Hadoop-training.html
Get More Hadoop Tutorials on YouTube
Big Data and Hadoop Introduction Training Demo
Click - https://youtu.be/SHxf-v8ePk4
Apache Hadoop Standalone Installation Using Virtual box Training Demo
Click - https://youtu.be/u-YhaIkqubU

Transparency, Recognition and the role of eSealing - Ildiko Mazar and Koen No...
 
VAMOS CUIDAR DO NOSSO PLANETA! .
VAMOS CUIDAR DO NOSSO PLANETA!                    .VAMOS CUIDAR DO NOSSO PLANETA!                    .
VAMOS CUIDAR DO NOSSO PLANETA! .
 
Play hard learn harder: The Serious Business of Play
Play hard learn harder:  The Serious Business of PlayPlay hard learn harder:  The Serious Business of Play
Play hard learn harder: The Serious Business of Play
 
Introduction to TechSoup’s Digital Marketing Services and Use Cases
Introduction to TechSoup’s Digital Marketing  Services and Use CasesIntroduction to TechSoup’s Digital Marketing  Services and Use Cases
Introduction to TechSoup’s Digital Marketing Services and Use Cases
 

Introduction to Big Data Hadoop Training Online by www.itjobzone.biz

  • 1. Bigdata and Hadoop Tutorial By www.ITJobZone.biz
  • 3. Topics Covered: What is Bigdata? Characteristics of Bigdata. Use cases and the need for Hadoop. The Apache Hadoop ecosystem and its components. A brief history of Hadoop. Hadoop architecture. Who uses Hadoop.
  • 4. What is Bigdata? Lots of data (terabytes or petabytes). Bigdata is a term for a collection of datasets so large and complex that it becomes difficult to process using on-hand database tools or traditional data processing applications. The challenges include capture, curation, storage, search, sharing, transfer, analysis, and visualization.
  • 5. What is Bigdata? Systems and enterprises generate huge amounts of data, from terabytes to petabytes. The stock market generates around one terabyte of new data per day, which is used to perform stock trading analysis and determine trends for optimal trades.
  • 7. Bigdata - Types of Data: Structured data: data from enterprise systems (ERP, CRM). Semi-structured data: XML, JSON, CSV, log files. Unstructured data: audio, video, images, archived documents.
  • 9. Bigdata Scenarios: Web and e-tailing: recommendation engines, ad targeting, search quality, abuse and click fraud detection. Telecommunication: customer churn analysis and prevention, network performance optimization, call data record (CDR) analysis, analysing the network to predict failure.
  • 10. Bigdata Scenarios (Cont…): Government: fraud detection and cyber security, welfare schemes, justice. Healthcare: health information exchange, gene sequencing, serialization, healthcare service quality improvements, drug safety.
  • 11. Brief history of Hadoop: Designed to answer the question “How to process big data with reasonable cost and time?” Google (published white papers): GFS, 2003; MapReduce, 2004. Yahoo (implementation by Doug Cutting): HDFS, 2006-07; MapReduce, 2007-08.
  • 12. What is Hadoop: An Apache top-level project and an open-source implementation of frameworks for reliable, scalable, distributed computing and storage. It is a flexible and highly available architecture for large-scale computation and data processing on a network of commodity hardware.
  • 13. Goals / Requirements: Abstract and facilitate the storage and processing of large and rapidly growing data (structured and unstructured datasets, simple programming model). High scalability and availability. Use commodity hardware with little redundancy. Fault tolerance. Move computation rather than data.
  • 14. Goals / Requirements: Hardware will fail. Processing will be run in batches, so there is an emphasis on high throughput as opposed to low latency. Applications that run on HDFS have large data sets. A typical file in HDFS is gigabytes to terabytes in size. It should provide high aggregate data bandwidth and scale to hundreds of nodes in a single cluster. It should support tens of millions of files in a single instance. Applications need a write-once-read-many access model.
  • 15. Architecture: Distributed, with some centralization. The main nodes of the cluster are where most of the computational power and storage of the system lie. Main nodes run a TaskTracker to accept and reply to MapReduce tasks, and also a DataNode to store needed blocks as closely as possible. The central control node runs the NameNode to keep track of HDFS directories and files, and the JobTracker to dispatch compute tasks to the TaskTrackers. Written in Java; also supports Python and Ruby.
  • 18. Architecture - Hadoop Distributed Filesystem: Tailored to the needs of MapReduce. Targeted towards many reads of filestreams; writes are more costly. High degree of data replication (3x by default). No need for RAID on normal nodes. Large block size (64 MB). Location awareness of DataNodes in the network.
  • 19. Architecture - NameNode: Stores metadata for the files, like the directory structure of a typical FS. The server holding the NameNode instance is quite crucial, as there is only one. Keeps a transaction log for file deletes, adds, etc. Does not use transactions for whole blocks or file streams, only metadata. Handles creation of more replica blocks when necessary after a DataNode failure.
  • 20. Architecture - DataNode: Stores the actual data in HDFS. Can run on any underlying filesystem (ext3/4, NTFS, etc.). Notifies the NameNode of what blocks it has. The NameNode replicates blocks 2x in the local rack, 1x elsewhere.
  • 24. Some examples: Yahoo!: more than 100,000 CPUs in ~20,000 computers running Hadoop; biggest cluster: 2000 nodes (2*4cpu boxes with 4TB disk each); used to support research for Ad Systems and Web Search. AOL: used for a variety of things, ranging from statistics generation to running advanced algorithms for behavioral analysis and targeting; cluster size is 50 machines (Intel Xeon, dual processors, dual core, each with 16 GB RAM and 800 GB hard disk), giving a total of 37 TB HDFS capacity. Facebook: used to store copies of internal log and dimension data sources and use them as a source for reporting/analytics and machine learning; 320-machine cluster with 2,560 cores and about 1.3 PB raw storage.
  • 25. Join our Hadoop online training: http://www.itjobzone.biz/Big-Data-Hadoop-training.html Get more Hadoop tutorials on YouTube. Big Data and Hadoop Introduction Training Demo: https://youtu.be/SHxf-v8ePk4 Apache Hadoop Standalone Installation Using VirtualBox Training Demo: https://youtu.be/u-YhaIkqubU
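
To make the HDFS figures from slides 18 and 20 more concrete, here is a minimal Python sketch of the arithmetic they imply: how many 64 MB blocks a file splits into, how much raw disk 3x replication consumes, and where three replicas would land under the "2x in local rack, 1x elsewhere" rule. This is an illustration only, not Hadoop code. The function names, rack names, and node names are hypothetical, and choosing the first nodes in each rack is a simplifying assumption; a real NameNode also weighs node load and free space.

```python
import math

BLOCK_SIZE_MB = 64   # block size quoted on slide 18
REPLICATION = 3      # default replication factor quoted on slide 18


def block_count(file_size_mb, block_size_mb=BLOCK_SIZE_MB):
    """Number of blocks a file of the given size is split into."""
    return math.ceil(file_size_mb / block_size_mb)


def raw_storage_mb(file_size_mb, replication=REPLICATION):
    """Raw disk consumed once every block has been replicated.

    The last block only occupies the bytes it actually holds, so the
    total is file size times replication, not block count times block size.
    """
    return file_size_mb * replication


def place_replicas(local_rack, racks):
    """Pick three nodes: two in the local rack, one in another rack.

    `racks` maps a (hypothetical) rack name to a list of node names.
    Taking the first nodes of each rack is a simplification.
    """
    local_nodes = racks[local_rack][:2]
    remote_rack = next(name for name in sorted(racks) if name != local_rack)
    return local_nodes + racks[remote_rack][:1]


if __name__ == "__main__":
    one_gb = 1024  # file size in MB
    print(block_count(one_gb))      # 16
    print(raw_storage_mb(one_gb))   # 3072
    racks = {"rack-a": ["a1", "a2", "a3"], "rack-b": ["b1", "b2"]}
    print(place_replicas("rack-a", racks))  # ['a1', 'a2', 'b1']
```

With these numbers, a 1 GB file occupies 16 blocks and about 3 GB of raw disk, which is why replication is said to remove the need for RAID on normal nodes: losing a single DataNode still leaves two copies of each of its blocks.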