D.P.H.E
(DATA PROCESSING IN HADOOP ECOSYSTEM)
Prepared by:
Ritwik Jain
Prepared on:
Wednesday, 16 July 2015
TABLE OF CONTENTS:
1. INTRODUCTION
   1.1 Overview
   1.2 Technologies
2. PROJECT OVERVIEW
3. DEVELOPMENT ENVIRONMENT
4. DEVELOPMENT AND TESTING
   • Screenshots of IDEs and input files
   • MongoDB database
   • Cloudera
   • Screenshots of output files
1. INTRODUCTION
1.1 Overview:
The Data Processing project retrieves data sets from multiple files,
uploads them into the Hadoop ecosystem (HDFS, the Hadoop Distributed File
System), sorts them according to the client's needs, and provides the
desired output.
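As a rough illustration of "sorting according to the client's needs", the sketch below merges records from several inputs and sorts them by a field the caller chooses. It is a minimal plain-Java sketch, not the project's code: the class name `ClientSortSketch`, the comma-separated record layout, and the sample names and cities are all assumptions made for illustration, and in-memory lists stand in for files stored in HDFS.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class ClientSortSketch {
    // Merges the lines of several inputs and sorts them by one comma-separated field.
    // fieldIndex is the (hypothetical) field the client wants the output ordered by.
    static List<String> mergeAndSort(List<List<String>> files, int fieldIndex) {
        List<String> records = new ArrayList<>();
        for (List<String> lines : files) {
            records.addAll(lines);
        }
        records.sort(Comparator.comparing((String line) -> line.split(",")[fieldIndex]));
        return records;
    }

    public static void main(String[] args) {
        // Two in-memory "files" standing in for input files (assumed sample data).
        List<String> fileA = List.of("carol,delhi", "alice,pune");
        List<String> fileB = List.of("bob,agra");
        System.out.println(mergeAndSort(List.of(fileA, fileB), 0)); // ordered by name
        System.out.println(mergeAndSort(List.of(fileA, fileB), 1)); // ordered by city
    }
}
```

Passing the sort field as a parameter keeps the merge step unchanged when the client asks for a different ordering.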
1.2 Technologies:
The following technologies are used to build the data processing
system:
• HDFS
• Java
• MapReduce
• HBase
• MongoDB
2. PROJECT OVERVIEW
• The Data Processing project focuses on how to work with very
  large volumes of data.
• It loads big data from the files of any networking site into
  HDFS (the Hadoop Distributed File System).
• It then sorts and shuffles the data according to the client's
  requirements using MapReduce jobs.
• Finally, it writes the results to MongoDB, a NoSQL database,
  where the client can read them easily.
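The map, shuffle/sort, and reduce steps mentioned above can be sketched in plain Java. This is only an illustration of the MapReduce pattern under stated assumptions: it uses in-memory collections instead of the Hadoop API, the word-count job and the class name `MapReduceSketch` are hypothetical and not taken from the project, and the final write to MongoDB is left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class MapReduceSketch {
    // Map phase: emit a (word, 1) pair for every whitespace-separated token.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String token : line.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                pairs.add(Map.entry(token.toLowerCase(), 1));
            }
        }
        return pairs;
    }

    // Shuffle and sort: group values by key; the TreeMap keeps keys in sorted order.
    static TreeMap<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        TreeMap<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Reduce phase: collapse all values of one key into a single total.
    static int reduce(List<Integer> values) {
        int sum = 0;
        for (int value : values) {
            sum += value;
        }
        return sum;
    }

    // Runs the three phases in order over the input lines.
    static TreeMap<String, Integer> run(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        TreeMap<String, Integer> result = new TreeMap<>();
        shuffle(pairs).forEach((key, values) -> result.put(key, reduce(values)));
        return result;
    }

    public static void main(String[] args) {
        // Assumed sample input; prints {data=2, hadoop=2, sorts=1, stores=1}
        System.out.println(run(List.of("hadoop stores data", "hadoop sorts data")));
    }
}
```

In a real Hadoop job the framework performs the shuffle and sort between the mapper and the reducer across the cluster; here a `TreeMap` plays that role on a single machine.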
3. DEVELOPMENT ENVIRONMENT:
• Linux (CentOS 6.4)
• Eclipse Juno
• Cloudera VM (includes the Hadoop daemons)
• MongoVUE 1.6.9
4. DEVELOPMENT & TESTING:
• Screenshots of IDEs and input files
• MongoDB database
• Cloudera
• Screenshots of output files
