hadoop fs

$ hadoop fs

ls

$ hadoop fs -help ls

$ hadoop fs -ls <path>
$ hadoop fs -ls /

$ hadoop fs -ls
$ hadoop fs -ls /user/cloudera

$ hadoop fs -mkdir data
$ hadoop fs -ls

$ cd ~/bigdata/Exercises/hadoop/data
$ ls -l
$ hadoop fs -put mammograms.zip data

http://localhost:50070
fsck: an HDFS utility

$ hadoop fsck /user/cloudera/data/mammograms.zip -blocks -locations -files

$ head -n 100 ato_centenary.txt | hadoop fs -put - data/ato100.txt

$ head -n 1000 ato_centenary.txt | hadoop fs -put - data/ato100.txt

put: `data/ato100.txt': File exists

$ hadoop fs -rm data/ato100.txt
$ head -n 1000 ato_centenary.txt | hadoop fs -put - data/ato100.txt
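The remove-then-upload steps above could be scripted. A minimal Python sketch, assuming the `hadoop` command is on the PATH when `upload_head` is actually called; the helper names `fs_cmd`, `replace_steps` and `upload_head` are made up for this illustration and are not part of Hadoop:

```python
import subprocess
from itertools import islice


def fs_cmd(*args):
    """Build the argument list for one `hadoop fs` subcommand."""
    return ["hadoop", "fs", *args]


def replace_steps(dest):
    """Commands that delete `dest` in HDFS, then upload stdin in its place."""
    return [fs_cmd("-rm", dest), fs_cmd("-put", "-", dest)]


def upload_head(local_path, n, dest):
    """Send the first n lines of a local file to `dest` in HDFS (needs Hadoop)."""
    with open(local_path, "rb") as f:
        data = b"".join(islice(f, n))
    rm_cmd, put_cmd = replace_steps(dest)
    # -rm fails if the file is not there yet; that is fine for this purpose.
    subprocess.run(rm_cmd, check=False)
    # "-" tells -put to read the file content from standard input.
    subprocess.run(put_cmd, input=data, check=True)


# Only print the commands here, so the sketch runs without a cluster.
for cmd in replace_steps("data/ato100.txt"):
    print(" ".join(cmd))
```

On a machine with Hadoop, `upload_head("ato_centenary.txt", 1000, "data/ato100.txt")` would perform the same two steps as the commands above.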

$ hadoop fs -cat data/ato100.txt | less

$ hadoop fs -get data/ato100.txt ato100.txt

-mv, -cp, -rmdir, -stat ...

$ javac -classpath `hadoop classpath` *.java

$ jar cvf csiro.jar *.class

$ hadoop jar csiro.jar Csiro input_dir output_dir

map(in_key, in_value) -> (inter_key, inter_value) list

let map(key, value) =
    emit(key.toUpper(), value.toUpper())
('csiro', 'cci') -> ('CSIRO', 'CCI')
('csiro', 'cesre') -> ('CSIRO', 'CESRE')
('csiro', 'cmse') -> ('CSIRO', 'CMSE')
('toyota', 'yaris') -> ('TOYOTA', 'YARIS')

let map(key, value) =
    foreach char c in value:
        emit(key, c)
('cci', 'csiro') -> ('cci', 'c'), ('cci', 's'), ('cci', 'i'), ('cci', 'r'), ('cci', 'o')
('open', 'nasa') -> ('open', 'n'), ('open', 'a'), ('open', 's'), ('open', 'a')

let map(key, value) =
    emit(value.length(), value)
('csiro', 'cci') -> ('3', 'cci')
('csiro', 'cesre') -> ('5', 'cesre')
('csiro', 'cmse') -> ('4', 'cmse')
('toyota', 'yaris') -> ('5', 'yaris')

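The example map functions above can be tried outside Hadoop. A minimal Python sketch, where `run_map` is a made-up stand-in for the framework calling the mapper once per record, and the length is emitted as an integer rather than the quoted '3' shown on the slide:

```python
def map_upper(key, value):
    """Emit one pair with both key and value upper-cased."""
    yield key.upper(), value.upper()


def map_explode(key, value):
    """Emit one (key, character) pair per character of the value."""
    for c in value:
        yield key, c


def map_length(key, value):
    """Re-key the record by the length of its value."""
    yield len(value), value


def run_map(map_fn, records):
    """Call map_fn on every (key, value) record and collect the emitted pairs."""
    return [pair for key, value in records for pair in map_fn(key, value)]


records = [("csiro", "cci"), ("csiro", "cesre"), ("toyota", "yaris")]
print(run_map(map_upper, records))
print(run_map(map_explode, [("cci", "csiro")]))
print(run_map(map_length, records))
```

A mapper may emit zero, one or many pairs per input record, which is why each function is written as a generator.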
map(String input_key, String input_value)
    foreach word w in input_value:
        emit(w, 1)

reduce(String output_key, Iterator<int> intermediate_values)
    set count = 0
    foreach v in intermediate_values:
        count += v
    emit(output_key, count)
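The word count pseudocode above, run end to end in one process. This is a sketch in plain Python, not Hadoop code: the grouping step in `word_count` stands in for the shuffle that the framework performs between map and reduce, and the function names are made up for the illustration.

```python
from collections import defaultdict


def map_words(input_key, input_value):
    """Emit (word, 1) for every whitespace-separated word in the line."""
    for w in input_value.split():
        yield w, 1


def reduce_sum(output_key, intermediate_values):
    """Add up all the 1s collected for one word."""
    count = 0
    for v in intermediate_values:
        count += v
    yield output_key, count


def word_count(lines):
    # Shuffle: gather every emitted value under its key.
    groups = defaultdict(list)
    for offset, line in enumerate(lines):
        for word, one in map_words(offset, line):
            groups[word].append(one)
    # Reduce: one call per distinct key, visited in sorted key order.
    result = {}
    for word in sorted(groups):
        for key, total in reduce_sum(word, groups[word]):
            result[key] = total
    return result


print(word_count(["the cat sat", "the cat ran"]))
# prints {'cat': 2, 'ran': 1, 'sat': 1, 'the': 2}
```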

Wordcount
$ cd ~/bigdata/Exercises/hadoop/wordcount; ls
WordCount.java
WordMapper.java
SumReducer.java

$ javac -classpath `hadoop classpath` *.java

$ jar cvf wc.jar *.class

$ hadoop jar wc.jar WordCount data/ato100.txt ato_wc

$ hadoop fs -ls ato_wc
$ hadoop fs -cat ato_wc/part-r-00000 | less
$ hadoop fs -cat ato_wc/* | grep -E 'ATO|CSIRO'
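The same filtering can be done after the output has been copied locally. A minimal Python sketch, assuming the job writes one `word<TAB>count` pair per line (the default text output layout); the sample lines are invented, and unlike `grep` the pattern is matched against the word only, not the whole line:

```python
import re


def parse_counts(lines):
    """Yield (word, count) from lines of the form 'word<TAB>count'."""
    for line in lines:
        word, count = line.rstrip("\n").split("\t")
        yield word, int(count)


def grep_counts(lines, pattern):
    """Keep only the pairs whose word matches the regular expression."""
    rx = re.compile(pattern)
    return [(w, c) for w, c in parse_counts(lines) if rx.search(w)]


# Invented sample lines, not real output from the exercise.
sample = ["ATO\t12\n", "CSIRO\t3\n", "tax\t40\n"]
print(grep_counts(sample, r"ATO|CSIRO"))
```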

$ hadoop fs -rm -r ato_wc

Average max temperature

$ cd ~/bigdata/Exercises/hadoop/data
$ less nsw_temp.csv
$ less bom_data_Note.txt

map(String input_key, String input_value):
    set fields = input_value.split(',')
    emit(fields[3], fields[5])

('IDCJAC0010,061087,1965,01,02,32.2,1,Y') -> ('01', 32.2)
('IDCJAC0010,066062,1890,04,27,20.2,1,Y') -> ('04', 20.2)
('IDCJAC0010,066062,2012,02,03,21.0,1,Y') -> ('02', 21.0)

reduce(String month, Iterator<double> values)
    set count = 0
    set sum = 0
    foreach v in values:
        sum += v
        count++
    set mean = sum/count
    emit(month, mean)

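The mapper and reducer for the average maximum temperature can be checked on the three sample records before writing the Java version. A sketch in plain Python, assuming every line has a numeric temperature in the sixth field; header rows or empty readings in the real file would need extra handling, and the function names are made up here:

```python
from collections import defaultdict


def map_month_temp(input_key, input_value):
    """Emit (month, max temperature) from one comma-separated record."""
    fields = input_value.split(",")
    yield fields[3], float(fields[5])


def reduce_mean(month, values):
    """Emit the mean of all temperatures seen for one month."""
    total = 0.0
    count = 0
    for v in values:
        total += v
        count += 1
    yield month, total / count


def average_by_month(lines):
    # Group mapper output by month, standing in for Hadoop's shuffle.
    groups = defaultdict(list)
    for offset, line in enumerate(lines):
        for month, temp in map_month_temp(offset, line):
            groups[month].append(temp)
    result = {}
    for month in sorted(groups):
        for key, mean in reduce_mean(month, groups[month]):
            result[key] = mean
    return result


lines = [
    "IDCJAC0010,061087,1965,01,02,32.2,1,Y",
    "IDCJAC0010,066062,1890,04,27,20.2,1,Y",
    "IDCJAC0010,066062,2012,02,03,21.0,1,Y",
]
print(average_by_month(lines))
```

The reducer never sees an empty group, so the division by `count` is safe: a key only exists if the mapper emitted at least one value for it.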
$ cd ../averagetemp
$ gedit *.java&
AverageTemp.java
AverageTempMapper.java
AverageReducer.java

$ cd ../wordcount
$ gedit *.java&

$ hadoop fs -put ../data/nsw_temp.csv data
$ javac -classpath `hadoop classpath` *.java
$ jar cvf avt.jar *.class
$ hadoop jar avt.jar AverageTemp data/nsw_temp.csv avt

$ hadoop fs -cat avt/part-r-00000

~/bigdata/Exercises/hadoop/averagetemp/sample_solution

https://github.com/tomaszbednarz/pig-abc-toilets

We have a list of local ABC Radio stations in Australia.
We have a list of all public toilets across Australia.
We want to find the closest toilet to each radio station.

Demonstration of:
● Data schemas
● Use of external libraries
● Google Maps API
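The matching of radio stations to toilets comes down to a nearest-neighbour search on coordinates. The deck does this in Pig with external libraries; the sketch below swaps in plain Python with a great-circle (haversine) distance, just to show the idea. The station and toilet names and coordinates are invented, not taken from the datasets.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, a common approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    p1, p2 = radians(lat1), radians(lat2)
    dphi = p2 - p1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(p1) * cos(p2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def closest(station, toilets):
    """Return the (name, lat, lon) toilet record nearest to the station."""
    _, slat, slon = station
    return min(toilets, key=lambda t: haversine_km(slat, slon, t[1], t[2]))


# Invented records in the form (name, latitude, longitude).
station = ("Station A", -33.87, 151.21)
toilets = [
    ("Toilet near", -33.86, 151.20),
    ("Toilet far", -37.81, 144.96),
]
print(closest(station, toilets)[0])
```

Comparing every station with every toilet is a cross join, which is fine for small lists but grows with the product of the two sizes.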

Hadoop, HDFS, MapReduce and Pig