Workshop on Data Analytics Using Big Data Tools 2016 – Bharathiar University
K. Santhiya, Ph.D. Research Scholar, Dr. V. Bhuvaneswari, Asst. Professor, Dept. of Comp. Appl., Bharathiar University, WDABT 2016
INTRODUCTION TO HADOOP
Presented by K. Santhiya, Ph.D. Research Scholar, Department of Computer Applications, Bharathiar University
Under the guidance of Dr. V. Bhuvaneswari, Assistant Professor, Department of Computer Applications, Bharathiar University
AGENDA
• World of Data
  - Few Instances
• Conventional Approaches
  - Limitations
• Hadoop Framework
  - Terminology Review
• Hadoop Components
  - HDFS & MapReduce
• HDFS – In Detail
• Hadoop Ecosystem
DATA EXPLOSION
2.5 quintillion bytes of data are created each day.
WORLDWIDE DATA
[Figure: data created since the beginning of time compared with data created in the last two years]
THE WORLD OF DATA
[Figure: infographic listing 2.9 million, 375 MB, 20 hrs, 24 PB, 50 million, 700 billion, 1.3 exabytes, 72 items]
A big data file starts at a minimum size of 1 terabyte.
CONVENTIONAL APPROACHES
• RDBMS
• OS file system
• SQL queries
• Custom framework
  - C / C++
  - Perl
  - Python
ISSUES IN LEGACY SYSTEMS
• Limited storage capacity
• Limited processing capacity
• No scalability
• Single point of failure
• Sequential processing
• RDBMSs can handle only structured data
• Requires preprocessing of data
• Information is collected according to current business needs
How do we mine (and mind) all this data?
HOW TO RESOLVE ALL THESE ISSUES?
Mr. Hadoop says he has a solution to our big problem!
COMPANIES USING HADOOP
WHAT IS HADOOP
Apache Hadoop is a framework that allows for the distributed processing of large datasets across clusters of commodity computers using a simple programming model.
Concept
Moving computation is more efficient than moving large data.
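The concept of moving computation to the data can be illustrated with a small word count. This is a minimal sketch in plain Python, not Hadoop's API: the `blocks_by_node` dictionary, the node names, and the two helper functions are hypothetical stand-ins for data that already sits on different machines. Each "node" counts words in its own lines, and only the small per-node summaries travel to the coordinator.

```python
from collections import Counter

# Hypothetical stand-in for lines of text already stored on different nodes.
blocks_by_node = {
    "node1": ["big data needs big storage", "hadoop stores data in blocks"],
    "node2": ["hadoop moves computation to data"],
}

def local_word_count(lines):
    """Runs 'on the node': counts words in that node's own lines."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts

def merge_counts(partials):
    """Runs on the coordinator: combines the small per-node summaries."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total

# Only the compact Counter objects are moved, never the raw lines.
partials = [local_word_count(lines) for lines in blocks_by_node.values()]
totals = merge_counts(partials)
print(totals["data"], totals["hadoop"])
```

The raw text never leaves its node in this sketch; what crosses the network is a summary whose size depends on the vocabulary, not on the volume of data.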
STORAGE
COMPUTATION
COMPLEXITY
TWO DAEMONS OF HADOOP
ARCHITECTURE
TERMINOLOGY REVIEW
• Cluster: Rack 1, Rack 2, ...
• Rack 1: Node 1, Node 2, ..., Node N
• Rack 2: Node 1, Node 2, ..., Node N
HADOOP CLUSTER ARCHITECTURE
HADOOP CORE SERVICES
i. Name Node
ii. Data Node
iii. Resource Manager
iv. Application Master
v. Node Manager
vi. Secondary Name Node
HDFS – REAL LIFE CONNECT
• A college library was gifted a massive collection of books by a patron. The books were very popular titles. The librarian decided to arrange the books in a small rack and to distribute multiple copies of each book across other racks, so that students could find the books easily. Similarly, HDFS creates multiple copies of a data block and keeps them on separate systems for easy access.
WHAT IS HDFS
• Hadoop Distributed File System
• Highly fault-tolerant, distributed, reliable, scalable file system for data storage
• Stores multiple copies of data on different nodes
• A file is split into blocks and stored on multiple machines
• A Hadoop cluster typically has a single Name Node and a number of Data Nodes
HDFS BLOCKS
• Files are broken into large blocks.
  - Typical block size: 128 MB
  - Blocks are replicated for reliability
  - First replica on the local node, second replica on a node in a remote rack, third replica on a different node in that same remote rack; additional replicas are placed randomly
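Rack-aware placement of three replicas can be sketched as follows. This is an illustrative model only, assuming the default policy described in the Apache HDFS architecture documentation (first replica on the writer's node, second on a node in a different rack, third on another node in that same remote rack). The `CLUSTER` layout, the node names, and both functions are hypothetical and are not part of any Hadoop API.

```python
import random

# Hypothetical cluster layout: rack name -> node names (illustrative only).
CLUSTER = {
    "rack1": ["r1n1", "r1n2", "r1n3"],
    "rack2": ["r2n1", "r2n2", "r2n3"],
}

def rack_of(node, cluster):
    """Return the name of the rack that contains the given node."""
    for rack, nodes in cluster.items():
        if node in nodes:
            return rack
    raise ValueError(f"unknown node: {node}")

def place_replicas(writer_node, cluster, rng=random):
    """Pick three nodes: the writer's node plus two distinct nodes on one remote rack."""
    local_rack = rack_of(writer_node, cluster)
    # A remote rack needs at least two nodes to host the second and third replicas.
    remote_racks = [r for r in cluster if r != local_rack and len(cluster[r]) >= 2]
    if not remote_racks:
        raise ValueError("need a remote rack with at least two nodes")
    remote_rack = rng.choice(remote_racks)
    second, third = rng.sample(cluster[remote_rack], 2)
    return [writer_node, second, third]

replicas = place_replicas("r1n1", CLUSTER)
print(replicas)
```

With this layout, losing all of rack1 still leaves two copies on rack2, and losing all of rack2 still leaves the copy on the writer's node.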
HDFS BLOCKS CONTD.
ADVANTAGES OF HDFS BLOCKS
• Fixed size
• If a chunk of a file is smaller than the block size, only the needed space is used.
• E.g., a 420 MB file is split as
[Figure: block layout of the 420 MB file]
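The 420 MB example can be worked through with a few lines of arithmetic. This is a minimal sketch that assumes a 128 MB block size and a replication factor of 3 (the usual HDFS default); `split_into_blocks` is a hypothetical helper, not an HDFS function.

```python
def split_into_blocks(file_size_mb, block_size_mb=128):
    """Return the sizes (in MB) of the blocks a file of the given size occupies."""
    if file_size_mb < 0 or block_size_mb <= 0:
        raise ValueError("sizes must be positive")
    full_blocks, remainder = divmod(file_size_mb, block_size_mb)
    blocks = [block_size_mb] * full_blocks
    if remainder:
        # The last block holds only the leftover data, not a full block.
        blocks.append(remainder)
    return blocks

blocks = split_into_blocks(420)
raw_storage_mb = sum(blocks) * 3  # assumed replication factor of 3
print(blocks)          # [128, 128, 128, 36]
print(raw_storage_mb)  # 1260
```

Under these assumptions the file occupies three full blocks and one 36 MB block, and the final block consumes 36 MB rather than a full 128 MB.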
HDFS OPERATION PRINCIPLE
NAME NODE
DATA NODE
SECONDARY NAME NODE
HDFS ARCHITECTURE
HDFS – BLOCK REPLICATION ARCHITECTURE
NAMENODE IN HA MODE
NAME NODE HA ARCHITECTURE
BUSINESS SCENARIO
Olivia Tyler is the EVP of IT Operations with Nutri Worldwide, Inc., and she has decided to use HDFS for storing big data. She will use the HDFS shell to store the data in a Hadoop file system, and she will execute various commands on it.
HADOOP SHELL COMMANDS
# create a directory in HDFS
hadoop fs -mkdir /learning
# copy a local file into that directory
hadoop fs -copyFromLocal test.txt /learning
# list the directory contents
hadoop fs -ls /learning
# print the contents of the file
hadoop fs -cat /learning/test.txt
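The same sequence of shell commands can be scripted. The sketch below is one possible wrapper, assuming the `hadoop` binary is on the PATH when commands are actually run; `hdfs_commands` and `run_all` are hypothetical helper names. By default it performs a dry run that only prints the commands, so it can be tried without a cluster.

```python
import posixpath
import shlex
import subprocess

def hdfs_commands(local_file, hdfs_dir):
    """Build the four 'hadoop fs' invocations as argument lists."""
    target = posixpath.join(hdfs_dir, posixpath.basename(local_file))
    return [
        ["hadoop", "fs", "-mkdir", hdfs_dir],
        ["hadoop", "fs", "-copyFromLocal", local_file, hdfs_dir],
        ["hadoop", "fs", "-ls", hdfs_dir],
        ["hadoop", "fs", "-cat", target],
    ]

def run_all(commands, dry_run=True):
    """Print each command, or execute it when dry_run is False."""
    for cmd in commands:
        if dry_run:
            print(shlex.join(cmd))
        else:
            # check=True stops at the first command that fails.
            subprocess.run(cmd, check=True)

commands = hdfs_commands("test.txt", "/learning")
run_all(commands)  # dry run: prints the commands without contacting a cluster
```

Passing argument lists rather than a single shell string avoids quoting problems if a file name contains spaces.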
HADOOP ECOSYSTEM COMPONENTS
DATA TRANSFER COMPONENTS
DATA STORE COMPONENTS
• The following are the data store components of the Hadoop ecosystem:
  - HBase: distributed, scalable big data store
  - Cassandra: scalable, consistent, distributed, structured key-value store
  - Accumulo: sorted, distributed key-value data storage and retrieval system
SERIALIZATION COMPONENTS
• The serialization components are Avro, Trevni, and Thrift.
• Avro is a data serialization system.
• Trevni is a column file format whose specification is intended to permit compatible, independent implementations that read and/or write files in this format.
• Thrift is a framework for scalable, cross-language services development.
JOB EXECUTION COMPONENTS
• The following are the job execution components:
[Figure: job execution components]
WORK MANAGEMENT COMPONENTS
CONCLUSION

Introduction to Hadoop

  • 1. Workshop on Data Analytics Using Big Data Tools 2016, Bharathiar University. K. Santhiya, Ph.D. Research Scholar; Dr. V. Bhuvaneswari, Asst. Professor, Dept. of Comp. Appl., Bharathiar University - WDABT 2016
  • 2. Introduction to Hadoop. Presented by K. Santhiya, Ph.D. Research Scholar, Department of Computer Applications, Bharathiar University, under the guidance of Dr. V. Bhuvaneswari, Assistant Professor, Department of Computer Applications, Bharathiar University.
  • 3. Agenda • World of Data: Few Instances • Conventional Approaches: Limitations • Hadoop Framework: Terminology Review • Hadoop Components: HDFS & MapReduce • HDFS in Detail • Hadoop Ecosystem
  • 4. Data Explosion: 2.5 quintillion bytes of data are created each day.
  • 5. Worldwide Data: since the beginning of time vs. the last two years.
  • 6. The World of Data: 2.9 million; 375 MB; 20 hrs; 24 PB; 50 million; 700 billion; 1.3 exabytes; 72 items.
  • 7. A big data file starts at a size of at least 1 terabyte.
  • 10. Conventional Approaches: RDBMS; OS file system; SQL queries; custom frameworks (C/C++, Perl, Python).
  • 11. Issues in Legacy Systems: limited storage capacity; limited processing capacity; no scalability; single point of failure; sequential processing; RDBMSs can handle only structured data; data requires preprocessing; information is collected according to current business needs.
  • 12. How do we mine (and mind) all this data? How do we resolve all these issues?
  • 13. Mr. Hadoop says he has a solution to our big problem!
  • 16. Companies Using Hadoop
  • 17. What Is Apache Hadoop? Apache Hadoop is a framework that allows for the distributed processing of large datasets across clusters of commodity computers using a simple programming model. Concept: moving computation is more efficient than moving large data.
  • 18. Storage, Computation, Complexity
  • 19. Two Daemons of Hadoop
  • 20. Architecture
  • 21. Terminology Review: a cluster is made up of racks (Rack 1, Rack 2, ...), and each rack holds nodes (Node 1, Node 2, ..., Node N).
  • 22. Hadoop Cluster Architecture
  • 24. Hadoop Core Services: (i) Name Node; (ii) Data Node; (iii) Resource Manager; (iv) Application Master; (v) Node Manager; (vi) Secondary Name Node
  • 25. HDFS: Real-Life Connect • A college library was gifted a massive collection of books by a patron. The books were very popular titles. The librarian decided to arrange the books in a small rack and distribute multiple copies of each book in other racks, so that students could find the books easily. Similarly, HDFS creates multiple copies of a data block and keeps them on separate systems for easy access.
  • 26. What Is HDFS • Hadoop Distributed File System: a highly fault-tolerant, distributed, reliable, scalable file system for data storage. It stores multiple copies of data on different nodes. A file is split into blocks and stored on multiple machines. A Hadoop cluster typically has a single name node and a number of data nodes.
  • 27. HDFS Blocks • Files are broken into large blocks. The block size is typically 128 MB. Blocks are replicated for reliability: one replica on the local node, another replica on a remote rack, a third replica on the local rack; additional replicas are placed randomly.
  • 28. HDFS Blocks (contd.): Advantages of HDFS Blocks. Blocks have a fixed size. A chunk of a file smaller than the block size uses only the space it needs. Example: a 420 MB file split into blocks.
  • 29. HDFS Operation Principle
  • 30. Name Node
  • 31. Data Node
  • 32. Secondary Name Node
  • 33. HDFS Architecture
  • 34. HDFS: Block Replication Architecture
  • 35. NameNode in HA Mode
  • 36. Name Node HA Architecture
  • 37. Business Scenario: Olivia Tyler is the EVP of IT Operations with Nutri Worldwide, Inc., and she has decided to use HDFS for storing big data. She will use the HDFS shell to store the data in a Hadoop file system, and she will execute various commands on it.
  • 39. Hadoop Shell Commands
      hadoop fs -mkdir /learning
      hadoop fs -copyFromLocal test.txt /learning
      hadoop fs -ls /learning
      hadoop fs -cat /learning/test.txt
  • 40. Hadoop Ecosystem Components
  • 41. Data Transfer Components
  • 42. Data Store Components • The following are the data store components of the Hadoop ecosystem. HBase: distributed, scalable big data store. Cassandra: scalable, consistent, distributed, structured key-value store. Accumulo: sorted, distributed key-value data storage and retrieval system.
  • 43. Serialization Components • The serialization components are Avro, Trevni, and Thrift. • Avro is a data serialization system. • Trevni is a column file format designed to permit compatible, independent implementations that read and/or write files in this format. • Thrift is a framework for scalable, cross-language services development.
  • 44. Job Execution Components
  • 45. Work Management Components
  • 46. Conclusion
  • 47. References • J. Gantz and D. Reinsel, "The digital universe in 2020: Big data, bigger digital shadows, and biggest growth in the far east," in Proc. IDC iView, IDC Anal. Future, 2012. • (2015) Available [online]: http://expandedramblings.com/index.php/by-the-numbers-a-gigantic-list-of-google-stats-and-facts/ • D. Evans and R. Hutley, "The explosion of data," white paper, 2010. • Seema Acharya and Subhashini Chelleppan, "Big Data and Analytics," Wiley India Pvt Ltd, 2015. • Dhruba Borthakur, "HDFS Architecture Guide," 2013. • Available [online]: http://hortonworks.com/hadoop/flume/#section_2 • Marko Grobelnik, "Big-Data tutorial," white paper, 2012.
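
The block-splitting example on slides 27 and 28 (a 420 MB file stored with a typical 128 MB block size) can be worked through with a small sketch. This is only illustrative arithmetic, not Hadoop code: the function names `split_into_blocks` and `raw_storage_mb` are hypothetical, and the replication factor of 3 is an assumption read off the "third replica" wording on slide 27.

```python
def split_into_blocks(file_size_mb, block_size_mb=128):
    """Return the sizes (in MB) of the blocks a file would be split into.

    Every block is full-sized except possibly the last one, which only
    takes the space it actually needs (the advantage noted on slide 28).
    """
    if file_size_mb < 0 or block_size_mb <= 0:
        raise ValueError("file size must be >= 0 and block size must be > 0")
    full_blocks, remainder = divmod(file_size_mb, block_size_mb)
    blocks = [block_size_mb] * full_blocks
    if remainder:
        blocks.append(remainder)  # final partial block
    return blocks


def raw_storage_mb(file_size_mb, replication=3):
    """Total cluster storage used when every block is stored `replication` times.

    replication=3 is an assumption for illustration, not a value stated in the slides.
    """
    return file_size_mb * replication


blocks = split_into_blocks(420)
print(blocks)               # [128, 128, 128, 36]
print(raw_storage_mb(420))  # 1260
```

Under these assumptions the 420 MB file occupies three full 128 MB blocks plus one 36 MB block, and the last block does not consume a full 128 MB on disk.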