D.P.H.E. in Hadoop Ecosystem

This document provides an overview of a data processing project in the Hadoop ecosystem. It uses technologies like HDFS, Java, MapReduce, HBase, and MongoDB. The project loads big data from files into HDFS, sorts and shuffles the data using MapReduce algorithms, and stores the output in the MongoDB database. The development environment includes Linux, Eclipse IDE, Cloudera VM with Hadoop daemons, and MongoDB tools. Screenshots are provided of the input files, MongoDB database, Cloudera dashboard, and output files.

D.P.H.E.
(DATA PROCESSING IN HADOOP ECOSYSTEM)
Prepared by:
Ritwik Jain
Prepared on:
Wednesday, 16 July 2015

TABLE OF CONTENTS:
• INTRODUCTION
  • OVERVIEW
  • TECHNOLOGIES
• PROJECT OVERVIEW
• DEVELOPMENT ENVIRONMENT
• DEVELOPMENT AND TESTING
  • Screenshots of IDEs and INPUT files
  • MongoDB Database
  • Cloudera
  • Screenshots of Output files

1. INTRODUCTION
1.1 Overview:
The Data Processing project aims to retrieve clusters of data from multiple files, upload them into the Hadoop ecosystem (HDFS), sort them according to the client's needs, and provide the desired output.
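
As a hedged illustration of the overview above, the sketch below gathers records from several local input files and sorts them with a comparator that stands in for "the client's needs". The class `MultiFileLoader` and the comma-separated record format are hypothetical, not taken from the project. In the actual project the files would first be uploaded into HDFS (for example with `hdfs dfs -put <localsrc> <dst>`), a step this local sketch skips.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/**
 * Hypothetical local sketch: gather records from several input files and sort
 * them with a comparator chosen by the client. Plain local files stand in for
 * data that the real project would load into HDFS.
 */
class MultiFileLoader {

    /** Reads every non-blank line from each file, in the order the files are given. */
    static List<String> loadRecords(List<Path> inputFiles) throws IOException {
        List<String> records = new ArrayList<>();
        for (Path file : inputFiles) {
            for (String line : Files.readAllLines(file)) {
                if (!line.isBlank()) {
                    records.add(line.trim());
                }
            }
        }
        return records;
    }

    /** Returns a new list sorted by the client-supplied ordering; the input is left unchanged. */
    static List<String> sortForClient(List<String> records, Comparator<String> clientOrder) {
        List<String> sorted = new ArrayList<>(records);
        sorted.sort(clientOrder);
        return sorted;
    }

    public static void main(String[] args) throws IOException {
        // Two temporary files play the role of data exported from multiple sources.
        Path a = Files.createTempFile("input-a", ".txt");
        Path b = Files.createTempFile("input-b", ".txt");
        Files.write(a, List.of("carol,31", "alice,27"));
        Files.write(b, List.of("bob,45", ""));

        List<String> records = loadRecords(List.of(a, b));
        // Example client need (assumed): order records alphabetically by the first field.
        List<String> byName = sortForClient(records, Comparator.comparing((String r) -> r.split(",")[0]));
        System.out.println(byName);

        Files.delete(a);
        Files.delete(b);
    }
}
```

With the sample data in `main`, the program prints `[alice,27, bob,45, carol,31]`. Passing the ordering in as a `Comparator` keeps the loading code unchanged when a different client wants a different sort key.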
1.2 Technologies:
The following technologies are used to build the data processing system:
• HDFS
• JAVA
• MAP/REDUCE
• HBASE
• MONGODB

2. PROJECT OVERVIEW
• The Data Processing project focuses on how to work with millions of records and very large volumes of data.
• It aims to load big data from the files of any networking site into HDFS (Hadoop Distributed File System).
• The data is then sorted and shuffled according to the client's requirements using MapReduce algorithms.
• The results are stored in the NoSQL database MongoDB, where the client can easily read them.
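
The document does not show the project's actual MapReduce job, so the following is only a minimal plain-Java sketch of the map, shuffle/sort and reduce stages described above, run in memory rather than on a Hadoop cluster. The input format ("user,postText"), the class name `PostCountSketch` and the document field `postCount` are hypothetical. A real job would extend Hadoop's `Mapper` and `Reducer` classes, and the final documents would be written to MongoDB through a driver or connector instead of being printed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical in-memory sketch of map -> shuffle/sort -> reduce, followed by
 * turning each reduced pair into a MongoDB-style JSON document string.
 * Assumed input format: "user,postText" lines from a networking site.
 */
class PostCountSketch {

    /** Map phase: emit (user, 1) for every well-formed record; malformed lines are skipped. */
    static List<Map.Entry<String, Integer>> map(List<String> records) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String record : records) {
            String[] fields = record.split(",", 2);
            if (fields.length == 2 && !fields[0].isBlank()) {
                pairs.add(Map.entry(fields[0].trim(), 1));
            }
        }
        return pairs;
    }

    /** Shuffle/sort phase: group values by key; the TreeMap keeps keys sorted, as Hadoop does before reduce. */
    static TreeMap<String, List<Integer>> shuffleAndSort(List<Map.Entry<String, Integer>> pairs) {
        TreeMap<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    /** Reduce phase: sum the values for each key, preserving sorted key order. */
    static TreeMap<String, Integer> reduce(TreeMap<String, List<Integer>> grouped) {
        TreeMap<String, Integer> totals = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : grouped.entrySet()) {
            int sum = 0;
            for (int v : group.getValue()) {
                sum += v;
            }
            totals.put(group.getKey(), sum);
        }
        return totals;
    }

    /** Formats each result as a JSON document string; keys are assumed not to need escaping. */
    static List<String> toDocuments(Map<String, Integer> totals) {
        List<String> docs = new ArrayList<>();
        for (Map.Entry<String, Integer> e : totals.entrySet()) {
            docs.add("{\"_id\": \"" + e.getKey() + "\", \"postCount\": " + e.getValue() + "}");
        }
        return docs;
    }

    public static void main(String[] args) {
        List<String> input = List.of("bob,hello", "alice,first post", "bob,again", "malformed");
        List<String> docs = toDocuments(reduce(shuffleAndSort(map(input))));
        docs.forEach(System.out::println);
    }
}
```

Running `main` prints one JSON document per user in sorted key order: `{"_id": "alice", "postCount": 1}` followed by `{"_id": "bob", "postCount": 2}`. Using a `TreeMap` for the grouping step mirrors the fact that Hadoop sorts map output keys before they reach the reducer.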

3. DEVELOPMENT ENVIRONMENT:
• LINUX (CentOS 6.4)
• ECLIPSE JUNO
• CLOUDERA VM (includes the Hadoop daemons)
• MONGOVUE 1.6.9

4. DEVELOPMENT & TESTING:
• Screenshots of IDEs and INPUT files:

• MongoDB Database:
• Cloudera:
• Screenshots of Output files:
