Introduction to Hadoop
• Tarjei Romtveit
• Co-founder of Monokkel AS
• Former CTO – Integrasco AS
• My story with Hadoop
www.monokkel.io
• Managing director of Monokkel AS
• Former COO of Integrasco AS
• Persistence, processing and presentation of data
Persistence – Processing – Presentation
Bombshell
If you work with data today and do not start learning the Hadoop ecosystem, you may soon be unemployed
Agenda
• Context – Big Data and how to handle it
• What is Hadoop?
• Demo
• Distributions and/or demo
• “Deep dive” into Hadoop architecture
– HDFS
– YARN
– MapReduce
• Languages and ecosystem
What we will not cover
• Security
• Integrations with database X or system Y
• Running Hadoop in production
Big Data
Big Data – hype and hipsters
Big Data
Big Data – Let’s add some letters
• Volume
• Variety
• Velocity
• Variability
• Veracity / Data quality
and the step-brother
• Complexity
Big Data – Example
The Nordic Hotel Tycoon
1600 Hotels in 5 countries
I am a digital champion:
The website
I am a digital champion:
The desk
I am a digital champion:
The external provider
I am a digital champion:
The IoT case
I am a digital champion:
Social
Houston, we have a problem
• Sales are declining and my stock price is tumbling
The CEO
How can the CEO manage his problem?
• Get control over the data
• Implement analytical processes to aid sales
The data he needs to handle
• Volume – gigabytes/terabytes
• Variety – clickstream, voice, emails, sensor data, social data, different languages, timestamp data, transactional data, third-party data
• Variability – varying quality
• Velocity – MB per second
The data he needs to handle
• Veracity / Data quality – Inconsistent data quality
• Complexity – Many legacy domain models
How to handle?
[Diagram: Web, Emails, Sensors, Social; Processing; RDBMS; Search]
How to understand?
[Diagram: Web, Emails, Sensors, Social; Processing; RDBMS; Search]
So what does Hadoop solve?
Processing
What is Hadoop?
An operating system for data
An OS needs software on top
Distributions
• “Stable” compilation of the Hadoop ecosystem
• Operational tools
• Integration tools and frameworks
• Data governance and data management tools
• Security
Distributions
HADOOP
An operating system for data
Layman’s terms
• Store huge files (unstructured) on many
machines
• Query and modify data
• Can run sophisticated analytics on top
How to start:
Option 1
• https://hadoop.apache.org/
• Getting Started
• Download
• Unzip
• bin/hadoop <commandline arguments>
Option 2
• http://hortonworks.com/products/hortonworks-sandbox/#install
• Install VMware Player or VirtualBox
• Download image (6 GB)
• Install and run (give it lots of memory)
DEMO
– Transform and modify data
– Machine learning with Spark
– Integrate with ElasticSearch
NEXT: ARCHITECTURE AND HOW IT WORKS
DEMO
• Hortonworks Sandbox
• Hortonworks Ambari
• Hortonworks Hue
Hadoop architecture (2.x.x)
• Hadoop Distributed File System (HDFS)
• YARN (Yet Another Resource Negotiator)
• MapReduce
HDFS
[Diagram: Client; Name Node with a failover Name Node; data nodes D1, D2 … DX]
HDFS
Block index kept by the Name Node, mapping each block (B) to a data node (D1, D2, D3):
B: 1, D1
B: 2, D2
B: 3, D3
B: 4, D1
B: 5, D2
B: 6, D3
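The block index on this slide can be sketched as a plain map from block number to data node. This is a minimal single-process illustration, not HDFS code: `BlockIndex`, `assign` and `blockCount` are hypothetical names, and the round-robin assignment over D1 to D3 only reproduces the slide; it is not HDFS's real placement policy. The 128 MB block size is taken as the Hadoop 2.x default value of `dfs.blocksize`; clusters can configure it differently.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of the Name Node's block index (not HDFS code).
class BlockIndex {
    // Hadoop 2.x default dfs.blocksize (128 MB); configurable per cluster or file.
    static final long BLOCK_SIZE = 128L * 1024 * 1024;

    // Number of blocks a file occupies: ceil(fileSize / blockSize).
    static long blockCount(long fileSize, long blockSize) {
        return (fileSize + blockSize - 1) / blockSize;
    }

    // Map block numbers 1..blocks to data nodes round-robin, as drawn on the slide.
    static Map<Integer, String> assign(int blocks, List<String> dataNodes) {
        Map<Integer, String> index = new LinkedHashMap<>();
        for (int b = 1; b <= blocks; b++) {
            index.put(b, dataNodes.get((b - 1) % dataNodes.size()));
        }
        return index;
    }

    public static void main(String[] args) {
        Map<Integer, String> index = assign(6, List.of("D1", "D2", "D3"));
        index.forEach((block, node) -> System.out.println("B: " + block + ", " + node));
        // A 700 MB file needs 6 blocks of 128 MB; the last block is only partly filled.
        System.out.println(blockCount(700L * 1024 * 1024, BLOCK_SIZE));
    }
}
```

Running `main` prints the six `B: n, Dx` entries from the slide, followed by the block count for a 700 MB file.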
HDFS Write
Client → Name Node: “I need to write a document!”
Name Node entries (path, replication factor R, blocks B):
/path/to/document1, R:2, B:{1,2}
/path/to/document1, R:2, B:{3,4}
/path/to/document1, R:2, B:{5,6}
HDFS Write
Name Node → Client: “You can write to D1, D2, D3”
Data Nodes: D1, D2, D3
HDFS Write
Split and write: the client splits the document into blocks and writes them to the data nodes
B:{D1:1,D2:2}
B:{D3:3,D1:4}
B:{D2:5,D3:6}
HDFS Write
Replicate: the data nodes copy blocks between themselves (block 1 on D1 is replicated to D2 as block 2)
Success
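The write sequence on these slides (split the file into blocks, put each block's replicas on different data nodes, let the data nodes forward copies along a pipeline) can be sketched in plain Java. This is a hypothetical model, not the HDFS client: `ReplicaPlacement` and `place` are illustrative names, and replicas are handed out round-robin so that the result reproduces the slides (B:{D1:1,D2:2}, B:{D3:3,D1:4}, B:{D2:5,D3:6} with R:2). Real HDFS uses a rack-aware default placement policy instead.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the write flow on the slides (not the real HDFS client).
class ReplicaPlacement {
    // For each logical block, return the data nodes that receive its replicas.
    // Replicas are numbered 1, 2, 3, ... and assigned round-robin, so the replicas
    // of one block land on distinct nodes as long as replication <= nodes.size().
    static List<List<String>> place(int logicalBlocks, int replication, List<String> nodes) {
        if (replication > nodes.size()) {
            throw new IllegalArgumentException("replication factor exceeds number of data nodes");
        }
        List<List<String>> placement = new ArrayList<>();
        int replica = 0;
        for (int b = 0; b < logicalBlocks; b++) {
            List<String> targets = new ArrayList<>();
            for (int r = 0; r < replication; r++) {
                targets.add(nodes.get(replica % nodes.size()));
                replica++;
            }
            placement.add(targets);
        }
        return placement;
    }

    public static void main(String[] args) {
        List<List<String>> p = place(3, 2, List.of("D1", "D2", "D3"));
        for (int b = 0; b < p.size(); b++) {
            // The client sends the block to the first node, which forwards a copy to
            // the next node in the pipeline ("Replicate B:1 to D2:2").
            System.out.println("block " + (b + 1) + ": client -> " + String.join(" -> ", p.get(b)));
        }
    }
}
```

Pipelining means the client uploads each block only once; the extra copies travel between data nodes.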
HDFS Read
The client gets the block locations from the Name Node and reads the blocks from the data nodes:
B:{D2:2,D2:5}
B:{D3:3,D3:6}
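For a read, the client first asks the Name Node where each block lives and then fetches every block from one of its replicas. The sketch below assumes the replica locations from the write slides and a simple "first live replica wins" rule; `BlockReader` and `chooseSources` are hypothetical names, and the real HDFS client additionally orders replicas by network distance.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of an HDFS-style read (not the real HDFS client API).
class BlockReader {
    // blockLocations: for each block in file order, the data nodes holding a replica
    // (what the client learns from the Name Node). Picks the first replica on a live node.
    static List<String> chooseSources(List<List<String>> blockLocations, Set<String> liveNodes) {
        List<String> sources = new ArrayList<>();
        for (int b = 0; b < blockLocations.size(); b++) {
            String chosen = null;
            for (String node : blockLocations.get(b)) {
                if (liveNodes.contains(node)) {
                    chosen = node;
                    break;
                }
            }
            if (chosen == null) {
                throw new IllegalStateException("no live replica for block " + (b + 1));
            }
            sources.add(chosen);
        }
        return sources;
    }

    public static void main(String[] args) {
        // Replica locations from the write slides: {D1,D2}, {D3,D1}, {D2,D3}.
        List<List<String>> locations = List.of(
                List.of("D1", "D2"), List.of("D3", "D1"), List.of("D2", "D3"));
        System.out.println(chooseSources(locations, Set.of("D1", "D2", "D3")));
        // If D1 is down, every block can still be read from its other replica.
        System.out.println(chooseSources(locations, Set.of("D2", "D3")));
    }
}
```

This is also why R:2 matters: losing one data node does not make any block unreadable.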
HDFS Delete/Update
• HDFS blocks are immutable: you cannot change them!
• Deletes and updates are written as new blocks
• The name node takes care of overwriting deleted blocks
• Small files consume a lot of name node memory
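Why small files hurt can be shown with back-of-the-envelope arithmetic. The sketch assumes the commonly quoted rule of thumb that each file, directory and block costs the Name Node roughly 150 bytes of heap, plus a 128 MB block size; both numbers are approximations, directories are ignored, every file is treated as having at least one block, and `NameNodeMemory` is a hypothetical helper.

```java
// Back-of-the-envelope estimate of Name Node heap use (illustrative numbers only).
class NameNodeMemory {
    static final long BYTES_PER_OBJECT = 150;          // commonly quoted rule of thumb
    static final long BLOCK_SIZE = 128L * 1024 * 1024; // assumed 128 MB blocks

    // One namespace object per file plus one per block (at least one block per file).
    static long objects(long numFiles, long fileSize) {
        long blocksPerFile = Math.max(1, (fileSize + BLOCK_SIZE - 1) / BLOCK_SIZE);
        return numFiles * (1 + blocksPerFile);
    }

    static long heapBytes(long numFiles, long fileSize) {
        return objects(numFiles, fileSize) * BYTES_PER_OBJECT;
    }

    public static void main(String[] args) {
        long oneGiB = 1024L * 1024 * 1024;
        // The same 1 GiB of data stored as one large file or as 16,384 files of 64 KiB.
        System.out.println("1 x 1 GiB:       " + objects(1, oneGiB) + " objects, ~"
                + heapBytes(1, oneGiB) + " bytes");
        System.out.println("16,384 x 64 KiB: " + objects(16_384, 64L * 1024) + " objects, ~"
                + heapBytes(16_384, 64L * 1024) + " bytes");
    }
}
```

With the same 1 GiB of data, one large file costs 9 namespace objects (1 file + 8 blocks), while 16,384 files of 64 KiB cost 32,768 objects, over 3,600 times as many.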
HDFS Scalability
[Diagram: Name Node with a failover Name Node; data nodes D1, D2 … DX]
YARN
HOW DOES HADOOP PROCESS THE DATA STORED IN HDFS?
YARN
Client → Resource Manager (Scheduler, Applications Manager): “I want to process file ‘document1’ with my-app.jar”
YARN
Resource Manager (Scheduler, Applications Manager) → Client: “You can process on D1!”
YARN
Resource Manager (Scheduler, Applications Manager) → Node Manager on D1: “Start my-app.jar”
The Node Manager starts the Application Master.
[Node Managers on D1 and D2]
YARN
Application Master → Resource Manager: “document1 is located on D1 and D2, and I need X GB of RAM”
[Node Managers on D1 and D2]
YARN
Application Master → Node Manager: “Start my-app.jar”
Container: “my-app.jar is running here!”
[Node Managers on D1 and D2; Resource Manager with Scheduler and Applications Manager]
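The negotiation on the YARN slides can be modelled as a toy Resource Manager that tracks free memory per node (as reported by the Node Managers) and grants containers from it. This is a hypothetical sketch, not the YARN API: `ToyResourceManager`, `report` and `allocate` are illustrative names, and the scheduler is a simple first-fit over memory only, whereas real YARN schedulers (e.g. the Capacity or Fair Scheduler) also account for CPU cores, queues and locality.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

// Toy model of the YARN flow on the slides: Node Managers report free memory to
// the Resource Manager, whose scheduler grants containers. Not the YARN API.
class ToyResourceManager {
    private final Map<String, Integer> freeMemoryMb = new LinkedHashMap<>();

    // Heartbeat from a Node Manager: "this is how much memory I have free".
    void report(String node, int freeMb) {
        freeMemoryMb.put(node, freeMb);
    }

    // Scheduler: grant a container on the first node with enough free memory
    // and reserve that memory; empty if no node can fit the request right now.
    Optional<String> allocate(int requestMb) {
        for (Map.Entry<String, Integer> e : freeMemoryMb.entrySet()) {
            if (e.getValue() >= requestMb) {
                e.setValue(e.getValue() - requestMb);
                return Optional.of(e.getKey());
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        ToyResourceManager rm = new ToyResourceManager();
        rm.report("D1", 4096);
        rm.report("D2", 8192);
        // 1) The first container hosts the Application Master for my-app.jar.
        String amNode = rm.allocate(1024).orElseThrow();
        // 2) The Application Master asks for more containers to do the actual work.
        String worker1 = rm.allocate(4096).orElseThrow();
        String worker2 = rm.allocate(4096).orElseThrow();
        System.out.println("AM on " + amNode + ", workers on " + worker1 + " and " + worker2);
        System.out.println("another 4096 MB container granted? " + rm.allocate(4096).isPresent());
    }
}
```

In `main`, the last request is refused because no node has 4096 MB left; in YARN such a request simply waits until resources are freed.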
YARN + HDFS
[Diagram: several clients; Name Node; data nodes D1, D2, D3]
• YARN will try to make sure data is processed where it is stored
• … data locality
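Data locality can be illustrated with a toy task placer: for each block, run the task on a node that stores a replica if that node has a free slot, and otherwise fall back to any free node (which then has to read the block over the network). The slot counts and the names `LocalityScheduler` and `placeTask` are hypothetical; YARN expresses the same preference through node and rack hints in container requests.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy locality-aware task placement (not YARN code).
class LocalityScheduler {
    // Returns the node chosen for a task that reads one block and reserves a slot there.
    static String placeTask(List<String> replicaNodes, Map<String, Integer> freeSlots) {
        // First choice: a node that already stores the block on its local disk.
        for (String node : replicaNodes) {
            if (freeSlots.getOrDefault(node, 0) > 0) {
                freeSlots.merge(node, -1, Integer::sum);
                return node;
            }
        }
        // Fallback: any node with a free slot, in the map's iteration order.
        for (Map.Entry<String, Integer> e : freeSlots.entrySet()) {
            if (e.getValue() > 0) {
                e.setValue(e.getValue() - 1);
                return e.getKey();
            }
        }
        throw new IllegalStateException("no free slots in the cluster");
    }

    public static void main(String[] args) {
        Map<String, Integer> slots = new LinkedHashMap<>();
        slots.put("D1", 1);
        slots.put("D2", 1);
        slots.put("D3", 1);
        // Replica locations from the HDFS write slides: {D1,D2}, {D3,D1}, {D2,D3}.
        List<List<String>> blocks = List.of(
                List.of("D1", "D2"), List.of("D3", "D1"), List.of("D2", "D3"));
        for (int b = 0; b < blocks.size(); b++) {
            String node = placeTask(blocks.get(b), slots);
            boolean local = blocks.get(b).contains(node);
            System.out.println("block " + (b + 1) + " -> " + node
                    + (local ? " (data-local)" : " (remote read)"));
        }
    }
}
```

With one free slot per node, all three tasks end up data-local; remove a slot and the fallback path shows the cost of a remote read.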
YARN + HDFS
• Blocks are immutable. This enables high write speeds
• Data is schema free! You can store any data you want
• Data locality is what differentiates HDFS from other data
storage
• You can read massive amounts of data, limited only by disk read speeds
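“Schema free” in practice means schema-on-read: raw records are stored as-is, and a structure is imposed only when a job reads them. The sketch below is a hypothetical illustration; the comma-separated clickstream layout (`timestamp,userId,url`) and the names `SchemaOnRead`, `Click` and `readClicks` are assumptions, not anything HDFS defines. Records that do not fit the schema are skipped at read time instead of being rejected at write time.

```java
import java.util.ArrayList;
import java.util.List;

// Schema-on-read sketch: storage accepts any line; structure is applied when reading.
class SchemaOnRead {
    // Hypothetical clickstream record layout: timestamp,userId,url
    static final class Click {
        final long timestamp;
        final String userId;
        final String url;

        Click(long timestamp, String userId, String url) {
            this.timestamp = timestamp;
            this.userId = userId;
            this.url = url;
        }
    }

    // Apply the schema at read time; lines that do not fit are skipped.
    static List<Click> readClicks(List<String> rawLines) {
        List<Click> clicks = new ArrayList<>();
        for (String line : rawLines) {
            String[] fields = line.split(",", -1);
            if (fields.length != 3) {
                continue; // wrong shape for this schema
            }
            try {
                clicks.add(new Click(Long.parseLong(fields[0].trim()), fields[1].trim(), fields[2].trim()));
            } catch (NumberFormatException e) {
                // timestamp is not a number: skip this record
            }
        }
        return clicks;
    }

    public static void main(String[] args) {
        // The "stored" data mixes clickstream rows, an email header and a sensor reading.
        List<String> stored = List.of(
                "1420070400,u42,/hotels/oslo",
                "From: guest@example.com Subject: booking",
                "1420070460,u7,/hotels/bergen",
                "sensor-12;21.5C");
        List<Click> clicks = readClicks(stored);
        System.out.println(clicks.size() + " of " + stored.size() + " lines match the click schema");
    }
}
```

The same stored lines could be read again later with a different schema (for example one for sensor readings) without rewriting the data.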
MapReduce and others
OK… BUT HOW DO I PROCESS?
YARN
Tez MapReduce <Name here>
Libraries: Mahout, MLlib, GraphX, Oryx
Languages: Hive, Pig, R, Spark SQL, Stinger
YARN
Tez <Name here>
Languages: Hive, Pig, R, Spark SQL, Stinger
Libraries: Mahout, Crunch, MLlib, GraphX, Oryx
MapReduce
MapReduce
Document stored in HDFS:
Deer Bear River
Car Car River
Deer Car Bear
Splitting (one split per line): “Deer Bear River” | “Car Car River” | “Deer Car Bear”
Mapping (each split emits a (word, 1) pair per word):
Deer 1, Bear 1, River 1 | Car 1, Car 1, River 1 | Deer 1, Car 1, Bear 1
Shuffling (pairs grouped by word):
Deer 1, 1 | Bear 1, 1 | Car 1, 1, 1 | River 1, 1
Reduce (values summed per word), result written to HDFS:
Deer 2
Bear 2
Car 3
River 2
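The word-count flow on these slides can be simulated in a single Java process. The sketch mirrors the roles of the Mapper and Reducer interfaces but does not use the Hadoop API; `WordCountSimulation` and its methods are hypothetical names, each input line stands in for one input split, and a `TreeMap` plays the part of the shuffle-and-sort phase.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Single-process simulation of the word-count flow: split -> map -> shuffle -> reduce.
// Mirrors the roles of Hadoop's Mapper and Reducer without using the Hadoop API.
class WordCountSimulation {
    // Map: one input split (here one line of text) -> (word, 1) pairs.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(new SimpleEntry<>(word, 1));
            }
        }
        return pairs;
    }

    // Shuffle: group all values by key; TreeMap keeps keys sorted, like the sort phase.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> groups = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            groups.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return groups;
    }

    // Reduce: sum the values collected for one word.
    static int reduce(List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    static Map<String, Integer> wordCount(List<String> splits) {
        List<Map.Entry<String, Integer>> mapped = new ArrayList<>();
        for (String split : splits) {
            mapped.addAll(map(split));
        }
        Map<String, Integer> counts = new TreeMap<>();
        shuffle(mapped).forEach((word, ones) -> counts.put(word, reduce(ones)));
        return counts;
    }

    public static void main(String[] args) {
        List<String> document = List.of("Deer Bear River", "Car Car River", "Deer Car Bear");
        System.out.println(wordCount(document)); // {Bear=2, Car=3, Deer=2, River=2}
    }
}
```

In a real cluster the map calls run in parallel in separate YARN containers, and the shuffle moves data over the network; the logic per record is the same.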
API: Mapper interface
API: Reducer interface
API: Main
How to run
$ bin/hadoop jar wc.jar WordCount /hdfs/dir/in /hdfs/dir/out
MapReduce
• Mappers and reducers run distributed in YARN containers
• Chaining MapReduce jobs makes them slow
• Easy to scale but difficult to code
• … so use the higher-level data languages (DSLs) instead
Languages
YARN
Tez MapReduce <Name here>
Languages: Hive, Pig, R, Spark SQL, Stinger
Libraries: Mahout, Crunch, MLlib, GraphX, Oryx
“Languages”
Pig
• Procedural language
• Executes on YARN
• Great for
• Structuring
• Moving
• Transforming
Hive/Drill/Spark SQL
• Declarative / SQL-like languages
• Great for
• Column data / Database dumps
• Aggregations
• Connect BI tools and Dashboards
• Data Warehouse for Hadoop++
Spark
• Core processing engine (runs on YARN or standalone)
• Great for
• Anything that MapReduce can do
• Analytics, Machine Learning
• In-memory processing, with APIs in Java, Scala and Python
Summary
• Hadoop is designed to handle and process massive amounts of data through HDFS and/or YARN
• The data does not need to be structured before it is stored in HDFS
• Hadoop is an ecosystem with languages/frameworks for data extraction, data management, data analysis and data integration
• The easiest way to begin with Hadoop is to try out a distribution, e.g. Hortonworks, Cloudera or MapR
• Learn MapReduce, and learn to understand the languages and a few integration tools
Is it a fad?
Introduction to hadoop V2

Editor's Notes

  1. Started out building distributed systems for large amounts of data. Jumped on Hadoop around 2009/2010, when it was supposed to solve every problem. Jumped off again. Jumping back on now.
  2. How many of you work with data? How many have worked with Hadoop? How many have worked with ElasticSearch? How many are consultants? How many consultants/employees work in industry/oil/manufacturing? How many in commerce/trade/service/IT? How many in the public sector?
  3. Anyone with experience with Hadoop?
  4. This is what I associate with Big Data right now. A lot of buzz… but let's look at what it is at its core and where Hadoop fits into the picture.
  5. Doug Laney, who described big data in terms of volume, velocity and variety back in 2001
  6. LOTS OF BORING WORDS… LET'S TRY TO LOOK AT AN EXAMPLE. Volume. Variety – many datasets. Velocity – the speed at which data is generated. Variability – data can be inconsistent and come in various forms. Veracity – quality of data. Complexity.
  7. Doug Laney, who described big data in terms of volume, velocity and variety back in 2001
  8. Doug Laney, who described big data in terms of volume, velocity and variety back in 2001
  9. Clickstream data, ratings
  10. Clickstream data, ratings
  11. External agreements on ratings and traffic
  12. The chambermaid registers all activities. IoT.
  13. The chambermaid registers all activities. IoT.
  14. Doug Laney, who described big data in terms of volume, velocity and variety back in 2001
  15. An open-source operating system for data
  16. 2002: Doug Cutting and Mike Cafarella build the open-source web crawler Nutch. The web was at most 1 billion pages in size. Limited scalability. 2003: Google releases its GFS paper on a massively distributed file system; Cutting and Cafarella incorporate the file system into Nutch. 2004: Google releases its MapReduce paper on massively parallel computing; this is incorporated into Nutch as well. 2006: Yahoo hires Doug Cutting, and the file system and MapReduce components are extracted from Nutch into the Hadoop project.
  17. 2002: Doug Cutting and Mike Cafarella build the open-source web crawler Nutch. The web was at most 1 billion pages in size. Limited scalability. 2003: Google releases its GFS paper on a massively distributed file system; Cutting and Cafarella incorporate the file system into Nutch. 2004: Google releases its MapReduce paper on massively parallel computing; this is incorporated into Nutch as well. 2006: Yahoo hires Doug Cutting, and the file system and MapReduce components are extracted from Nutch into the Hadoop project.
  18. 2008: Hadoop was storing all data; even financial data was trusted to Hadoop. 2008: Cloudera was the first commercial company to support Hadoop. 2011: 42,000 nodes storing petabytes of data. 2011: Hortonworks was spun out of Yahoo as a Hadoop company; it focuses only on the open-source software that originated at Yahoo.
  19. 2011: first feature-complete 1.0 version of Hadoop. MapReduce and HDFS are tightly integrated in 1.0 and earlier versions. 2013: first large refactoring of the operating system. MapReduce is detached, and Hadoop is generalized to handle different processing paradigms.
  20. Data nodes contain disks only
  21. Data nodes contain disks only
  22. The Scheduler allocates resources based on information available from the nodes. The ApplicationsManager tracks the state of all applications (application masters) in the cluster.
  23. Node Managers constantly update the ResourceManager with the current resource situation. Node Managers start the ApplicationMaster and containers. Application Masters negotiate resources and allocate more containers if allowed.
  24. Node Managers constantly update the ResourceManager with the current resource situation. Node Managers start the ApplicationMaster and containers. Application Masters negotiate resources and allocate more containers if allowed.
  25. Application Masters negotiate resources and allocate more containers if allowed. CPU cores and memory are requested, along with the information that my file is located on D2. The application started by the Node Manager does not need to be Java.
  26. The big companies: Spotify, Google, Netflix = the disruptors