SlideShare a Scribd company logo
Big Data Summer Training
BigData Analytics-BigData Platforms
Prepared and Presenting By: Amrit Chhetri (Certified BigData Analyst),
Principal Techno-Functional Consultant and Principal IT Security Consultant
About Me :

Me:

I’m Amrit Chhetri from Bara Mangwa, West Bengal, India, a beautiful Village/Place in Darjeeling.

I am CSCU, CEH, CHFI,CPT, CAD, CPD, IOT & BigData Analyst( University of California), Information
Security Specialist(Open University, UK) and Machine Learning Enthusiast ( University of
California[USA] and Open University[UK]), Certified Cyber Physical System Exert( Open
University[UK]) and Certified Smart City Expert.

Current Position:

Principal IT Security Consultant/Instructor, Principal Forensics Investigator and Principal Techno-
Functional Consultant/Instructor

BigData Consultant to KnowledgeLab

Experiences:

I was J2EE Developer and BI System Architect/Designer of DSS for APL and Disney World

I have played the role of BI Evangelist and Pre-Sales Head for BI System* from OST

I have worked as Business Intelligence Consultant for national and multi-national companies including
HSBC, APL, Disney, Fidedality , LG(India) , Fidelity, BOR( currently ICICI), Reliance Power. * Top 5
Indian BI System ( by NASSCOM)
Apache Hadoop on Ubuntu 15.10/04
Agendas/Modules:

BigData Analytics with Apache Hadoop

Apache Hadoop on Ubuntu 15.10 or 15.04
BigData Analytics with Apache
Hadoop:

Apache Hadoop 2.7.2 works well Ubuntu 15.10 or 15.04 and Ubuntu 16.x is
not fully compatible

A typical BigData Analytics with Apache Hadoop consists of :

Hadoop, Hive, Pig

Hbase, Stoop, Spark and BIRT

Other platforms for BigData Programming are:

Eclipse – Plugins : Pydev, hadoop-eclipse, scala

Database –MongoDB, MySQL, SDK /API -Python, Scala, Java

Other Tools:

Toad, Pentahoo ETL, Toad for MySQL

Java Jars : jabc(mysql, mongodb), Pig.jar
Apache Hadoop on Ubuntu 15.10 or
15.04:

Platforms tested and finalized:

Ubuntu :15.10 or 15.04

JDK: 1.8

Hadoop: 2.7.2

Eclipse : Mars R1/R2

MySQL : 5.x

Two ways to have Apache Hadoop:

Pre-configured

Self-configured ( steps are given)

SSH installation requires Ubuntu’s updates using $sudo [<http_proxy>] apt-get update

sudo [<http_proxy>] apt-get update install mysql-server install MySQL Server
HBase
Agendas/Modules:

HBase Introduction

Features of HBase

HBase Accessibility

HBase Basic Commands

HBase Table Commands
HBase Introduction:

HBase is an Open-Source Non-Relational and Columnar Database System

It is Distributed Database System built on top of Hadoop's HDFS

Good for random-access and real-time read/write access of massive
data(BigData)

Initially written in Scala and new supports/features added using Java

It is based on Google BigTable and supports linear and modular Scalability

Supports Hadoop's MapReduce Jobs

Leverages direct access to data on Tables

Leverages easy to use Java API

Supports exporting metrics via the Hadoop metrics subsystem
Features of HBase:

Supports random access to data

Supports fast access to large set of data

Supports cryptographic storage mechanism too
HBase Basic Commands:

A common set of Tools for accessing HBase:

HBase Shell : $/# hbase shell

Status : hbase> status

HBase Version : hbase> version

Current User : hbase> whoami

Create Table : hbase>CREATE 'products', 'id','name','price‘

Exiting : hbase>exit

Besides, it also is exposed or accessed using:

JDBC and ODBC interfaces
HBase Table Commands:

Getting hbase shell:

$hbase shell

Create table:

hbase> create 'table1' , 'col1‘ ;hbase> list 'table1‘

Putting Data into table:

hbase> put 'table1', 'row1' , 'col1:a' , '23‘

hbase> put 'table1', 'row2' , 'col1:b' , '24‘

hbase> put 'table1', 'row3' , 'col1:c' , '25‘

Scaning Data:

hbase > scan 'table1‘

Getting single row:

hbase> get 'table1', 'row1’
Advanced Pythone
Agendas/Modules:

Python In-Built Functions

Python on Linux- Terminal and PyDev

Python Collection Module
Python In-Built Functions:

Python has huge set of built-in and here are some examples:

random.random() : Generates values between 0.0 to 1.0

random.randint(min, max) : Generates integer number randomly between minimum and
maximum values

math.sqrt(number): Return the square root of the given function

s.getcwd() or os.getcwdu() : Gives current working directory

os.system() : Allows to run OS commands

Example:
import random ;import math ;import os
x=int(input("Enter First Number :"))
y=int(input("Enter Second Number: "))
print("Power:", pow(x,y))
print("Factorial:", math.factorial(x))
print(random.randint(y,y))
os.system("netstat -an ")
Python on Linux-Terminal & PyDev:

Python on Linux is most used combination of Data Science

In Linux, Python can be installed in two ways:

Using IDE : Eclipse and Pydev

On Terminal using standard editor : gedit and python command

In Linux, Python script can be executed as $ python <scriptname.py> arg1 arg2

Steps to configure Python to run with Eclipse on Linux are:

Install JDK 1.7 or 1.8

Install Eclipse Mars 2

Open Eclipse and click on Help -> New Software -> click on ‘Add’ button and give http://pydev.org/updates to
install Pydev plugin

In preferences, click Pydev and add python executable file
BigData, ETL and Analytics Designing
Agendas/Modules:

ETL with Sqoop

ETL with Talend
ETL With Sqoop:

Sqoop is Data Integration Service developed on Open Source Hadoop
Technology

Sqoop is meant to transform data between Hadoop Cluster and Database
using JDBC ,in bi-direction

Scoop- Data Integration Steps- Moving to HDFS:

import --connect jdbc:mysql://<DB IP>/database --table orders --username <DB User> –P

Data Integration Steps- Moving to Hive:

#sqoop import --hive-import --create-hive-table --hive-table orders –connect
jdbc:mysql://<DB IP>/database --table orders --username <DB User> -P <password>
ETL with Talend Studio:

Talend is the world-class ETL tool available for BigData and the Data Systems

It is used to move data to BigData Warehouse designed using HBase, Hive
and other technology

Steps to transform data using Talend Studio:

Install JDK on Windows or Linux ( or Ubuntu)

Install Talend Studio to and run it

Create a transformation and create sources for targeted source like HBase, Hive and
others

Draw the transformation mapping on main screen

Now, either schedule the job to run in future and to run immediately
THANK YOU ALL

More Related Content

What's hot

Introduction to Big Data
Introduction to Big DataIntroduction to Big Data
Introduction to Big Data
Haluan Irsad
 
Big Data Course - BigData HUB
Big Data Course - BigData HUBBig Data Course - BigData HUB
Big Data Course - BigData HUB
Ahmed Salman
 
Hadoop, SQL and NoSQL, No longer an either/or question
Hadoop, SQL and NoSQL, No longer an either/or questionHadoop, SQL and NoSQL, No longer an either/or question
Hadoop, SQL and NoSQL, No longer an either/or question
DataWorks Summit
 
The Big Data Stack
The Big Data StackThe Big Data Stack
The Big Data Stack
Zubair Nabi
 
Big Data Final Presentation
Big Data Final PresentationBig Data Final Presentation
Big Data Final Presentation
17aroumougamh
 
Big data Analytics Hadoop
Big data Analytics HadoopBig data Analytics Hadoop
Big data Analytics Hadoop
Mishika Bharadwaj
 
Overview of Bigdata Analytics
Overview of Bigdata Analytics Overview of Bigdata Analytics
Overview of Bigdata Analytics
Sankarapu Anjaneyulu
 
Big Data - A brief introduction
Big Data - A brief introductionBig Data - A brief introduction
Big Data - A brief introduction
Frans van Noort
 
Big Data Analytics
Big Data AnalyticsBig Data Analytics
Big Data Analytics
Tyrone Systems
 
Big data analytics - hadoop
Big data analytics - hadoopBig data analytics - hadoop
Big data analytics - hadoop
Vishwajeet Jadeja
 
Introduction and Overview of BigData, Hadoop, Distributed Computing - BigData...
Introduction and Overview of BigData, Hadoop, Distributed Computing - BigData...Introduction and Overview of BigData, Hadoop, Distributed Computing - BigData...
Introduction and Overview of BigData, Hadoop, Distributed Computing - BigData...
Mahantesh Angadi
 
Big Data and Hadoop
Big Data and HadoopBig Data and Hadoop
Big Data and Hadoop
MaulikLakhani
 
Introduction of Big data, NoSQL & Hadoop
Introduction of Big data, NoSQL & HadoopIntroduction of Big data, NoSQL & Hadoop
Introduction of Big data, NoSQL & Hadoop
Savvycom Savvycom
 
Introduction to Big Data and Hadoop
Introduction to Big Data and HadoopIntroduction to Big Data and Hadoop
Introduction to Big Data and Hadoop
Febiyan Rachman
 
Big data ppt
Big data pptBig data ppt
Big data ppt
Shweta Sahu
 
Introduction to Big Data Technologies & Applications
Introduction to Big Data Technologies & ApplicationsIntroduction to Big Data Technologies & Applications
Introduction to Big Data Technologies & Applications
Nguyen Cao
 
Hadoop and big data
Hadoop and big dataHadoop and big data
Hadoop and big data
Yukti Kaura
 
Big data abstract
Big data abstractBig data abstract
Big data abstract
nandhiniarumugam619
 
BIG DATA
BIG DATABIG DATA
BIG DATA
Shashank Shetty
 
Bio bigdata
Bio bigdata Bio bigdata
Bio bigdata
Mk Kim
 

What's hot (20)

Introduction to Big Data
Introduction to Big DataIntroduction to Big Data
Introduction to Big Data
 
Big Data Course - BigData HUB
Big Data Course - BigData HUBBig Data Course - BigData HUB
Big Data Course - BigData HUB
 
Hadoop, SQL and NoSQL, No longer an either/or question
Hadoop, SQL and NoSQL, No longer an either/or questionHadoop, SQL and NoSQL, No longer an either/or question
Hadoop, SQL and NoSQL, No longer an either/or question
 
The Big Data Stack
The Big Data StackThe Big Data Stack
The Big Data Stack
 
Big Data Final Presentation
Big Data Final PresentationBig Data Final Presentation
Big Data Final Presentation
 
Big data Analytics Hadoop
Big data Analytics HadoopBig data Analytics Hadoop
Big data Analytics Hadoop
 
Overview of Bigdata Analytics
Overview of Bigdata Analytics Overview of Bigdata Analytics
Overview of Bigdata Analytics
 
Big Data - A brief introduction
Big Data - A brief introductionBig Data - A brief introduction
Big Data - A brief introduction
 
Big Data Analytics
Big Data AnalyticsBig Data Analytics
Big Data Analytics
 
Big data analytics - hadoop
Big data analytics - hadoopBig data analytics - hadoop
Big data analytics - hadoop
 
Introduction and Overview of BigData, Hadoop, Distributed Computing - BigData...
Introduction and Overview of BigData, Hadoop, Distributed Computing - BigData...Introduction and Overview of BigData, Hadoop, Distributed Computing - BigData...
Introduction and Overview of BigData, Hadoop, Distributed Computing - BigData...
 
Big Data and Hadoop
Big Data and HadoopBig Data and Hadoop
Big Data and Hadoop
 
Introduction of Big data, NoSQL & Hadoop
Introduction of Big data, NoSQL & HadoopIntroduction of Big data, NoSQL & Hadoop
Introduction of Big data, NoSQL & Hadoop
 
Introduction to Big Data and Hadoop
Introduction to Big Data and HadoopIntroduction to Big Data and Hadoop
Introduction to Big Data and Hadoop
 
Big data ppt
Big data pptBig data ppt
Big data ppt
 
Introduction to Big Data Technologies & Applications
Introduction to Big Data Technologies & ApplicationsIntroduction to Big Data Technologies & Applications
Introduction to Big Data Technologies & Applications
 
Hadoop and big data
Hadoop and big dataHadoop and big data
Hadoop and big data
 
Big data abstract
Big data abstractBig data abstract
Big data abstract
 
BIG DATA
BIG DATABIG DATA
BIG DATA
 
Bio bigdata
Bio bigdata Bio bigdata
Bio bigdata
 

Viewers also liked

Treasure Data and OSS
Treasure Data and OSSTreasure Data and OSS
Treasure Data and OSS
N Masahiro
 
20150207 何故scalaを選んだのか
20150207 何故scalaを選んだのか20150207 何故scalaを選んだのか
20150207 何故scalaを選んだのか
Katsunori Kanda
 
Real-time Big Data Analytics Engine using Impala
Real-time Big Data Analytics Engine using ImpalaReal-time Big Data Analytics Engine using Impala
Real-time Big Data Analytics Engine using Impala
Jason Shih
 
Interactive SQL-on-Hadoop and JethroData
Interactive SQL-on-Hadoop and JethroDataInteractive SQL-on-Hadoop and JethroData
Interactive SQL-on-Hadoop and JethroData
Ofir Manor
 
Viet stack 2nd meetup - BigData in Cloud Computing
Viet stack 2nd meetup - BigData in Cloud ComputingViet stack 2nd meetup - BigData in Cloud Computing
Viet stack 2nd meetup - BigData in Cloud Computing
Vietnam Open Infrastructure User Group
 
04 bigdata and_cloud_computing
04 bigdata and_cloud_computing04 bigdata and_cloud_computing
04 bigdata and_cloud_computing
Marco Quartulli
 
Using Apache Drill
Using Apache DrillUsing Apache Drill
Using Apache Drill
Chicago Hadoop Users Group
 
Bigdata Analytics using Hadoop
Bigdata Analytics using HadoopBigdata Analytics using Hadoop
Bigdata Analytics using Hadoop
Nagamani Gurram
 
Big Data vs Data Warehousing
Big Data vs Data WarehousingBig Data vs Data Warehousing
Big Data vs Data Warehousing
Thomas Kejser
 
Coastal Urban DEM project - Mapping the vulnerability of Australia's Coast
Coastal Urban DEM project - Mapping the vulnerability of Australia's CoastCoastal Urban DEM project - Mapping the vulnerability of Australia's Coast
Coastal Urban DEM project - Mapping the vulnerability of Australia's Coast
Fungis Queensland
 
MATLAB Fundamentals (1)
MATLAB Fundamentals (1)MATLAB Fundamentals (1)
MATLAB Fundamentals (1)
Carina Thiemann
 
Digital image processing - What is digital image processign
Digital image processing - What is digital image processignDigital image processing - What is digital image processign
Digital image processing - What is digital image processign
E2MATRIX
 
Data science-toolchain
Data science-toolchainData science-toolchain
Data science-toolchain
Jie-Han Chen
 
Geomagic_Control (EN)
Geomagic_Control (EN)Geomagic_Control (EN)
Geomagic_Control (EN)
Quoc Tuan Duong, ing.
 
Up and Down the Python Data & Web Visualization Stack by Rob Story PyData SV ...
Up and Down the Python Data & Web Visualization Stack by Rob Story PyData SV ...Up and Down the Python Data & Web Visualization Stack by Rob Story PyData SV ...
Up and Down the Python Data & Web Visualization Stack by Rob Story PyData SV ...
PyData
 
A Benchmark Test on Presto, Spark Sql and Hive on Tez
A Benchmark Test on Presto, Spark Sql and Hive on TezA Benchmark Test on Presto, Spark Sql and Hive on Tez
A Benchmark Test on Presto, Spark Sql and Hive on Tez
Gw Liu
 
[Azure Deep Dive] Spark と Azure HDInsight によるビッグ データ分析入門 (2017/03/27)
[Azure Deep Dive] Spark と Azure HDInsight によるビッグ データ分析入門 (2017/03/27)[Azure Deep Dive] Spark と Azure HDInsight によるビッグ データ分析入門 (2017/03/27)
[Azure Deep Dive] Spark と Azure HDInsight によるビッグ データ分析入門 (2017/03/27)
Naoki (Neo) SATO
 
Big data analytics 1
Big data analytics 1Big data analytics 1
Big data analytics 1
gauravsc36
 
Data Visualization(s) Using Python
Data Visualization(s) Using PythonData Visualization(s) Using Python
Data Visualization(s) Using Python
Aniket Maithani
 
The Evolution of the Hadoop Ecosystem
The Evolution of the Hadoop EcosystemThe Evolution of the Hadoop Ecosystem
The Evolution of the Hadoop Ecosystem
Cloudera, Inc.
 

Viewers also liked (20)

Treasure Data and OSS
Treasure Data and OSSTreasure Data and OSS
Treasure Data and OSS
 
20150207 何故scalaを選んだのか
20150207 何故scalaを選んだのか20150207 何故scalaを選んだのか
20150207 何故scalaを選んだのか
 
Real-time Big Data Analytics Engine using Impala
Real-time Big Data Analytics Engine using ImpalaReal-time Big Data Analytics Engine using Impala
Real-time Big Data Analytics Engine using Impala
 
Interactive SQL-on-Hadoop and JethroData
Interactive SQL-on-Hadoop and JethroDataInteractive SQL-on-Hadoop and JethroData
Interactive SQL-on-Hadoop and JethroData
 
Viet stack 2nd meetup - BigData in Cloud Computing
Viet stack 2nd meetup - BigData in Cloud ComputingViet stack 2nd meetup - BigData in Cloud Computing
Viet stack 2nd meetup - BigData in Cloud Computing
 
04 bigdata and_cloud_computing
04 bigdata and_cloud_computing04 bigdata and_cloud_computing
04 bigdata and_cloud_computing
 
Using Apache Drill
Using Apache DrillUsing Apache Drill
Using Apache Drill
 
Bigdata Analytics using Hadoop
Bigdata Analytics using HadoopBigdata Analytics using Hadoop
Bigdata Analytics using Hadoop
 
Big Data vs Data Warehousing
Big Data vs Data WarehousingBig Data vs Data Warehousing
Big Data vs Data Warehousing
 
Coastal Urban DEM project - Mapping the vulnerability of Australia's Coast
Coastal Urban DEM project - Mapping the vulnerability of Australia's CoastCoastal Urban DEM project - Mapping the vulnerability of Australia's Coast
Coastal Urban DEM project - Mapping the vulnerability of Australia's Coast
 
MATLAB Fundamentals (1)
MATLAB Fundamentals (1)MATLAB Fundamentals (1)
MATLAB Fundamentals (1)
 
Digital image processing - What is digital image processign
Digital image processing - What is digital image processignDigital image processing - What is digital image processign
Digital image processing - What is digital image processign
 
Data science-toolchain
Data science-toolchainData science-toolchain
Data science-toolchain
 
Geomagic_Control (EN)
Geomagic_Control (EN)Geomagic_Control (EN)
Geomagic_Control (EN)
 
Up and Down the Python Data & Web Visualization Stack by Rob Story PyData SV ...
Up and Down the Python Data & Web Visualization Stack by Rob Story PyData SV ...Up and Down the Python Data & Web Visualization Stack by Rob Story PyData SV ...
Up and Down the Python Data & Web Visualization Stack by Rob Story PyData SV ...
 
A Benchmark Test on Presto, Spark Sql and Hive on Tez
A Benchmark Test on Presto, Spark Sql and Hive on TezA Benchmark Test on Presto, Spark Sql and Hive on Tez
A Benchmark Test on Presto, Spark Sql and Hive on Tez
 
[Azure Deep Dive] Spark と Azure HDInsight によるビッグ データ分析入門 (2017/03/27)
[Azure Deep Dive] Spark と Azure HDInsight によるビッグ データ分析入門 (2017/03/27)[Azure Deep Dive] Spark と Azure HDInsight によるビッグ データ分析入門 (2017/03/27)
[Azure Deep Dive] Spark と Azure HDInsight によるビッグ データ分析入門 (2017/03/27)
 
Big data analytics 1
Big data analytics 1Big data analytics 1
Big data analytics 1
 
Data Visualization(s) Using Python
Data Visualization(s) Using PythonData Visualization(s) Using Python
Data Visualization(s) Using Python
 
The Evolution of the Hadoop Ecosystem
The Evolution of the Hadoop EcosystemThe Evolution of the Hadoop Ecosystem
The Evolution of the Hadoop Ecosystem
 

Similar to BigData Analytics with Hadoop and BIRT

BigData_Krishna Kumar Sharma
BigData_Krishna Kumar SharmaBigData_Krishna Kumar Sharma
BigData_Krishna Kumar Sharma
Krishna Kumar Sharma
 
Information Security Analytics
Information Security AnalyticsInformation Security Analytics
Information Security Analytics
Amrit Chhetri
 
Shiv shakti resume
Shiv shakti resumeShiv shakti resume
Shiv shakti resume
Shiv Shakti
 
hari_duche_updated
hari_duche_updatedhari_duche_updated
hari_duche_updated
Hari Duche
 
Anaconda and PyData Solutions
Anaconda and PyData SolutionsAnaconda and PyData Solutions
Anaconda and PyData Solutions
Travis Oliphant
 
Basics of big data analytics hadoop
Basics of big data analytics hadoopBasics of big data analytics hadoop
Basics of big data analytics hadoop
Ambuj Kumar
 
The Future of Computing is Distributed
The Future of Computing is DistributedThe Future of Computing is Distributed
The Future of Computing is Distributed
Alluxio, Inc.
 
Mukul-Resume
Mukul-ResumeMukul-Resume
Mukul-Resume
mukul upadhyay
 
Basic of Big Data
Basic of Big Data Basic of Big Data
Basic of Big Data
Amar kumar
 
Agile Testing Days 2017 Intoducing AgileBI Sustainably - Excercises
Agile Testing Days 2017 Intoducing AgileBI Sustainably - ExcercisesAgile Testing Days 2017 Intoducing AgileBI Sustainably - Excercises
Agile Testing Days 2017 Intoducing AgileBI Sustainably - Excercises
Raphael Branger
 
Cloudera hadoop installation
Cloudera hadoop installationCloudera hadoop installation
Cloudera hadoop installation
Sumitra Pundlik
 
Big data-at-detik
Big data-at-detikBig data-at-detik
Big data-at-detik
k4ndar
 
Apache Eagle in Action
Apache Eagle in ActionApache Eagle in Action
Apache Eagle in Action
Hao Chen
 
Open event presentation.3 2
Open event presentation.3 2Open event presentation.3 2
Open event presentation.3 2
Jorge López-Lago
 
Automating Workflows for Analytics Pipelines
Automating Workflows for Analytics PipelinesAutomating Workflows for Analytics Pipelines
Automating Workflows for Analytics Pipelines
Sadayuki Furuhashi
 
Backstage at CNCF Madison.pptx
Backstage at CNCF Madison.pptxBackstage at CNCF Madison.pptx
Backstage at CNCF Madison.pptx
BrandenTimm1
 
Finding the needles in the haystack. An Overview of Analyzing Big Data with H...
Finding the needles in the haystack. An Overview of Analyzing Big Data with H...Finding the needles in the haystack. An Overview of Analyzing Big Data with H...
Finding the needles in the haystack. An Overview of Analyzing Big Data with H...
Chris Baglieri
 
The Big Data Puzzle, Where Does the Eclipse Piece Fit?
The Big Data Puzzle, Where Does the Eclipse Piece Fit?The Big Data Puzzle, Where Does the Eclipse Piece Fit?
The Big Data Puzzle, Where Does the Eclipse Piece Fit?
J Langley
 
Apache Eagle at Hadoop Summit 2016 San Jose
Apache Eagle at Hadoop Summit 2016 San JoseApache Eagle at Hadoop Summit 2016 San Jose
Apache Eagle at Hadoop Summit 2016 San Jose
Hao Chen
 
Apache Eagle: Secure Hadoop in Real Time
Apache Eagle: Secure Hadoop in Real TimeApache Eagle: Secure Hadoop in Real Time
Apache Eagle: Secure Hadoop in Real Time
DataWorks Summit/Hadoop Summit
 

Similar to BigData Analytics with Hadoop and BIRT (20)

BigData_Krishna Kumar Sharma
BigData_Krishna Kumar SharmaBigData_Krishna Kumar Sharma
BigData_Krishna Kumar Sharma
 
Information Security Analytics
Information Security AnalyticsInformation Security Analytics
Information Security Analytics
 
Shiv shakti resume
Shiv shakti resumeShiv shakti resume
Shiv shakti resume
 
hari_duche_updated
hari_duche_updatedhari_duche_updated
hari_duche_updated
 
Anaconda and PyData Solutions
Anaconda and PyData SolutionsAnaconda and PyData Solutions
Anaconda and PyData Solutions
 
Basics of big data analytics hadoop
Basics of big data analytics hadoopBasics of big data analytics hadoop
Basics of big data analytics hadoop
 
The Future of Computing is Distributed
The Future of Computing is DistributedThe Future of Computing is Distributed
The Future of Computing is Distributed
 
Mukul-Resume
Mukul-ResumeMukul-Resume
Mukul-Resume
 
Basic of Big Data
Basic of Big Data Basic of Big Data
Basic of Big Data
 
Agile Testing Days 2017 Intoducing AgileBI Sustainably - Excercises
Agile Testing Days 2017 Intoducing AgileBI Sustainably - ExcercisesAgile Testing Days 2017 Intoducing AgileBI Sustainably - Excercises
Agile Testing Days 2017 Intoducing AgileBI Sustainably - Excercises
 
Cloudera hadoop installation
Cloudera hadoop installationCloudera hadoop installation
Cloudera hadoop installation
 
Big data-at-detik
Big data-at-detikBig data-at-detik
Big data-at-detik
 
Apache Eagle in Action
Apache Eagle in ActionApache Eagle in Action
Apache Eagle in Action
 
Open event presentation.3 2
Open event presentation.3 2Open event presentation.3 2
Open event presentation.3 2
 
Automating Workflows for Analytics Pipelines
Automating Workflows for Analytics PipelinesAutomating Workflows for Analytics Pipelines
Automating Workflows for Analytics Pipelines
 
Backstage at CNCF Madison.pptx
Backstage at CNCF Madison.pptxBackstage at CNCF Madison.pptx
Backstage at CNCF Madison.pptx
 
Finding the needles in the haystack. An Overview of Analyzing Big Data with H...
Finding the needles in the haystack. An Overview of Analyzing Big Data with H...Finding the needles in the haystack. An Overview of Analyzing Big Data with H...
Finding the needles in the haystack. An Overview of Analyzing Big Data with H...
 
The Big Data Puzzle, Where Does the Eclipse Piece Fit?
The Big Data Puzzle, Where Does the Eclipse Piece Fit?The Big Data Puzzle, Where Does the Eclipse Piece Fit?
The Big Data Puzzle, Where Does the Eclipse Piece Fit?
 
Apache Eagle at Hadoop Summit 2016 San Jose
Apache Eagle at Hadoop Summit 2016 San JoseApache Eagle at Hadoop Summit 2016 San Jose
Apache Eagle at Hadoop Summit 2016 San Jose
 
Apache Eagle: Secure Hadoop in Real Time
Apache Eagle: Secure Hadoop in Real TimeApache Eagle: Secure Hadoop in Real Time
Apache Eagle: Secure Hadoop in Real Time
 

Recently uploaded

Experts live - Improving user adoption with AI
Experts live - Improving user adoption with AIExperts live - Improving user adoption with AI
Experts live - Improving user adoption with AI
jitskeb
 
社内勉強会資料_Hallucination of LLMs               .
社内勉強会資料_Hallucination of LLMs               .社内勉強会資料_Hallucination of LLMs               .
社内勉強会資料_Hallucination of LLMs               .
NABLAS株式会社
 
一比一原版美国帕森斯设计学院毕业证(parsons毕业证书)如何办理
一比一原版美国帕森斯设计学院毕业证(parsons毕业证书)如何办理一比一原版美国帕森斯设计学院毕业证(parsons毕业证书)如何办理
一比一原版美国帕森斯设计学院毕业证(parsons毕业证书)如何办理
asyed10
 
一比一原版(曼大毕业证书)曼尼托巴大学毕业证如何办理
一比一原版(曼大毕业证书)曼尼托巴大学毕业证如何办理一比一原版(曼大毕业证书)曼尼托巴大学毕业证如何办理
一比一原版(曼大毕业证书)曼尼托巴大学毕业证如何办理
ytypuem
 
Open Source Contributions to Postgres: The Basics POSETTE 2024
Open Source Contributions to Postgres: The Basics POSETTE 2024Open Source Contributions to Postgres: The Basics POSETTE 2024
Open Source Contributions to Postgres: The Basics POSETTE 2024
ElizabethGarrettChri
 
UofT毕业证如何办理
UofT毕业证如何办理UofT毕业证如何办理
UofT毕业证如何办理
exukyp
 
Jio cinema Retention & Engagement Strategy.pdf
Jio cinema Retention & Engagement Strategy.pdfJio cinema Retention & Engagement Strategy.pdf
Jio cinema Retention & Engagement Strategy.pdf
inaya7568
 
Build applications with generative AI on Google Cloud
Build applications with generative AI on Google CloudBuild applications with generative AI on Google Cloud
Build applications with generative AI on Google Cloud
Márton Kodok
 
一比一原版(Unimelb毕业证书)墨尔本大学毕业证如何办理
一比一原版(Unimelb毕业证书)墨尔本大学毕业证如何办理一比一原版(Unimelb毕业证书)墨尔本大学毕业证如何办理
一比一原版(Unimelb毕业证书)墨尔本大学毕业证如何办理
xclpvhuk
 
DATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docx
DATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docxDATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docx
DATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docx
SaffaIbrahim1
 
How To Control IO Usage using Resource Manager
How To Control IO Usage using Resource ManagerHow To Control IO Usage using Resource Manager
How To Control IO Usage using Resource Manager
Alireza Kamrani
 
End-to-end pipeline agility - Berlin Buzzwords 2024
End-to-end pipeline agility - Berlin Buzzwords 2024End-to-end pipeline agility - Berlin Buzzwords 2024
End-to-end pipeline agility - Berlin Buzzwords 2024
Lars Albertsson
 
一比一原版英国赫特福德大学毕业证(hertfordshire毕业证书)如何办理
一比一原版英国赫特福德大学毕业证(hertfordshire毕业证书)如何办理一比一原版英国赫特福德大学毕业证(hertfordshire毕业证书)如何办理
一比一原版英国赫特福德大学毕业证(hertfordshire毕业证书)如何办理
nyvan3
 
Building a Quantum Computer Neutral Atom.pdf
Building a Quantum Computer Neutral Atom.pdfBuilding a Quantum Computer Neutral Atom.pdf
Building a Quantum Computer Neutral Atom.pdf
cjimenez2581
 
一比一原版格里菲斯大学毕业证(Griffith毕业证书)学历如何办理
一比一原版格里菲斯大学毕业证(Griffith毕业证书)学历如何办理一比一原版格里菲斯大学毕业证(Griffith毕业证书)学历如何办理
一比一原版格里菲斯大学毕业证(Griffith毕业证书)学历如何办理
lzdvtmy8
 
一比一原版南十字星大学毕业证(SCU毕业证书)学历如何办理
一比一原版南十字星大学毕业证(SCU毕业证书)学历如何办理一比一原版南十字星大学毕业证(SCU毕业证书)学历如何办理
一比一原版南十字星大学毕业证(SCU毕业证书)学历如何办理
slg6lamcq
 
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
Social Samosa
 
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
nuttdpt
 
Orchestrating the Future: Navigating Today's Data Workflow Challenges with Ai...
Orchestrating the Future: Navigating Today's Data Workflow Challenges with Ai...Orchestrating the Future: Navigating Today's Data Workflow Challenges with Ai...
Orchestrating the Future: Navigating Today's Data Workflow Challenges with Ai...
Kaxil Naik
 
原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样
原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样
原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样
ihavuls
 

Recently uploaded (20)

Experts live - Improving user adoption with AI
Experts live - Improving user adoption with AIExperts live - Improving user adoption with AI
Experts live - Improving user adoption with AI
 
社内勉強会資料_Hallucination of LLMs               .
社内勉強会資料_Hallucination of LLMs               .社内勉強会資料_Hallucination of LLMs               .
社内勉強会資料_Hallucination of LLMs               .
 
一比一原版美国帕森斯设计学院毕业证(parsons毕业证书)如何办理
一比一原版美国帕森斯设计学院毕业证(parsons毕业证书)如何办理一比一原版美国帕森斯设计学院毕业证(parsons毕业证书)如何办理
一比一原版美国帕森斯设计学院毕业证(parsons毕业证书)如何办理
 
一比一原版(曼大毕业证书)曼尼托巴大学毕业证如何办理
一比一原版(曼大毕业证书)曼尼托巴大学毕业证如何办理一比一原版(曼大毕业证书)曼尼托巴大学毕业证如何办理
一比一原版(曼大毕业证书)曼尼托巴大学毕业证如何办理
 
Open Source Contributions to Postgres: The Basics POSETTE 2024
Open Source Contributions to Postgres: The Basics POSETTE 2024Open Source Contributions to Postgres: The Basics POSETTE 2024
Open Source Contributions to Postgres: The Basics POSETTE 2024
 
UofT毕业证如何办理
UofT毕业证如何办理UofT毕业证如何办理
UofT毕业证如何办理
 
Jio cinema Retention & Engagement Strategy.pdf
Jio cinema Retention & Engagement Strategy.pdfJio cinema Retention & Engagement Strategy.pdf
Jio cinema Retention & Engagement Strategy.pdf
 
Build applications with generative AI on Google Cloud
Build applications with generative AI on Google CloudBuild applications with generative AI on Google Cloud
Build applications with generative AI on Google Cloud
 
一比一原版(Unimelb毕业证书)墨尔本大学毕业证如何办理
一比一原版(Unimelb毕业证书)墨尔本大学毕业证如何办理一比一原版(Unimelb毕业证书)墨尔本大学毕业证如何办理
一比一原版(Unimelb毕业证书)墨尔本大学毕业证如何办理
 
DATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docx
DATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docxDATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docx
DATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docx
 
How To Control IO Usage using Resource Manager
How To Control IO Usage using Resource ManagerHow To Control IO Usage using Resource Manager
How To Control IO Usage using Resource Manager
 
End-to-end pipeline agility - Berlin Buzzwords 2024
End-to-end pipeline agility - Berlin Buzzwords 2024End-to-end pipeline agility - Berlin Buzzwords 2024
End-to-end pipeline agility - Berlin Buzzwords 2024
 
一比一原版英国赫特福德大学毕业证(hertfordshire毕业证书)如何办理
一比一原版英国赫特福德大学毕业证(hertfordshire毕业证书)如何办理一比一原版英国赫特福德大学毕业证(hertfordshire毕业证书)如何办理
一比一原版英国赫特福德大学毕业证(hertfordshire毕业证书)如何办理
 
Building a Quantum Computer Neutral Atom.pdf
Building a Quantum Computer Neutral Atom.pdfBuilding a Quantum Computer Neutral Atom.pdf
Building a Quantum Computer Neutral Atom.pdf
 
一比一原版格里菲斯大学毕业证(Griffith毕业证书)学历如何办理
一比一原版格里菲斯大学毕业证(Griffith毕业证书)学历如何办理一比一原版格里菲斯大学毕业证(Griffith毕业证书)学历如何办理
一比一原版格里菲斯大学毕业证(Griffith毕业证书)学历如何办理
 
一比一原版南十字星大学毕业证(SCU毕业证书)学历如何办理
一比一原版南十字星大学毕业证(SCU毕业证书)学历如何办理一比一原版南十字星大学毕业证(SCU毕业证书)学历如何办理
一比一原版南十字星大学毕业证(SCU毕业证书)学历如何办理
 
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
 
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
 
Orchestrating the Future: Navigating Today's Data Workflow Challenges with Ai...
Orchestrating the Future: Navigating Today's Data Workflow Challenges with Ai...Orchestrating the Future: Navigating Today's Data Workflow Challenges with Ai...
Orchestrating the Future: Navigating Today's Data Workflow Challenges with Ai...
 
原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样
原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样
原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样
 

BigData Analytics with Hadoop and BIRT

  • 1. Big Data Summer Training BigData Analytics-BigData Platforms Prepared and Presenting By: Amrit Chhetri (Certified BigData Analyst), Principal Techno-Functional Consultant and Principal IT Security Consultant
  • 2. About Me :  Me:  I’m Amrit Chhetri from Bara Mangwa, West Bengal, India, a beautiful Village/Place in Darjeeling.  I am CSCU, CEH, CHFI,CPT, CAD, CPD, IOT & BigData Analyst( University of California), Information Security Specialist(Open University, UK) and Machine Learning Enthusiast ( University of California[USA] and Open University[UK]), Certified Cyber Physical System Exert( Open University[UK]) and Certified Smart City Expert.  Current Position:  Principal IT Security Consultant/Instructor, Principal Forensics Investigator and Principal Techno- Functional Consultant/Instructor  BigData Consultant to KnowledgeLab  Experiences:  I was J2EE Developer and BI System Architect/Designer of DSS for APL and Disney World  I have played the role of BI Evangelist and Pre-Sales Head for BI System* from OST  I have worked as Business Intelligence Consultant for national and multi-national companies including HSBC, APL, Disney, Fidedality , LG(India) , Fidelity, BOR( currently ICICI), Reliance Power. * Top 5 Indian BI System ( by NASSCOM)
  • 3. Apache Hadoop on Ubuntu 15.10/04
  • 4. Agendas/Modules:  BigData Analytics with Apache Hadoop  Apache Hadoop on Ubuntu 15.10 or 15.04
  • 5. BigData Analytics with Apache Hadoop:  Apache Hadoop 2.7.2 works well Ubuntu 15.10 or 15.04 and Ubuntu 16.x is not fully compatible  A typical BigData Analytics with Apache Hadoop consists of :  Hadoop, Hive, Pig  Hbase, Stoop, Spark and BIRT  Other platforms for BigData Programming are:  Eclipse – Plugins : Pydev, hadoop-eclipse, scala  Database –MongoDB, MySQL, SDK /API -Python, Scala, Java  Other Tools:  Toad, Pentahoo ETL, Toad for MySQL  Java Jars : jabc(mysql, mongodb), Pig.jar
  • 6. Apache Hadoop on Ubuntu 15.10 or 15.04:  Platforms tested and finalized:  Ubuntu :15.10 or 15.04  JDK: 1.8  Hadoop: 2.7.2  Eclipse : Mars R1/R2  MySQL : 5.x  Two ways to have Apache Hadoop:  Pre-configured  Self-configured ( steps are given)  SSH installation requires Ubuntu’s updates using $sudo [<http_proxy>] apt-get update  sudo [<http_proxy>] apt-get update install mysql-server install MySQL Server
  • 8. Agendas/Modules:  HBase Introduction  Features of HBase  HBase Accessibility  HBase Basic Commands  HBase Table Commands
  • 9. HBase Introduction:  HBase is an Open-Source Non-Relational and Columnar Database System  It is Distributed Database System built on top of Hadoop's HDFS  Good for random-access and real-time read/write access of massive data(BigData)  Initially written in Scala and new supports/features added using Java  It is based on Google BigTable and supports linear and modular Scalability  Supports Hadoop's MapReduce Jobs  Leverages direct access to data on Tables  Leverages easy to use Java API  Supports exporting metrics via the Hadoop metrics subsystem
  • 10. Features of HBase:  Supports random access to data  Supports fast access to large set of data  Supports cryptographic storage mechanism too
  • 11. HBase Basic Commands:  A common set of Tools for accessing HBase:  HBase Shell : $/# hbase shell  Status : hbase> status  HBase Version : hbase> version  Current User : hbase> whoami  Create Table : hbase>CREATE 'products', 'id','name','price‘  Exiting : hbase>exit  Besides, it also is exposed or accessed using:  JDBC and ODBC interfaces
  • 12. HBase Table Commands:  Getting hbase shell:  $hbase shell  Create table:  hbase> create 'table1' , 'col1‘ ;hbase> list 'table1‘  Putting Data into table:  hbase> put 'table1', 'row1' , 'col1:a' , '23‘  hbase> put 'table1', 'row2' , 'col1:b' , '24‘  hbase> put 'table1', 'row3' , 'col1:c' , '25‘  Scaning Data:  hbase > scan 'table1‘  Getting single row:  hbase> get 'table1', 'row1’
  • 14. Agendas/Modules:  Python In-Built Functions  Python on Linux- Terminal and PyDev  Python Collection Module
  • 15. Python In-Built Functions:  Python has huge set of built-in and here are some examples:  random.random() : Generates values between 0.0 to 1.0  random.randint(min, max) : Generates integer number randomly between minimum and maximum values  math.sqrt(number): Return the square root of the given function  s.getcwd() or os.getcwdu() : Gives current working directory  os.system() : Allows to run OS commands  Example: import random ;import math ;import os x=int(input("Enter First Number :")) y=int(input("Enter Second Number: ")) print("Power:", pow(x,y)) print("Factorial:", math.factorial(x)) print(random.randint(y,y)) os.system("netstat -an ")
  • 16. Python on Linux-Terminal & PyDev:  Python on Linux is most used combination of Data Science  In Linux, Python can be installed in two ways:  Using IDE : Eclipse and Pydev  On Terminal using standard editor : gedit and python command  In Linux, Python script can be executed as $ python <scriptname.py> arg1 arg2  Steps to configure Python to run with Eclipse on Linux are:  Install JDK 1.7 or 1.8  Install Eclipse Mars 2  Open Eclipse and click on Help -> New Software -> click on ‘Add’ button and give http://pydev.org/updates to install Pydev plugin  In preferences, click Pydev and add python executable file
  • 17. BigData, ETL and Analytics Designing
  • 19. ETL With Sqoop:  Sqoop is Data Integration Service developed on Open Source Hadoop Technology  Sqoop is meant to transform data between Hadoop Cluster and Database using JDBC ,in bi-direction  Scoop- Data Integration Steps- Moving to HDFS:  import --connect jdbc:mysql://<DB IP>/database --table orders --username <DB User> –P  Data Integration Steps- Moving to Hive:  #sqoop import --hive-import --create-hive-table --hive-table orders –connect jdbc:mysql://<DB IP>/database --table orders --username <DB User> -P <password>
  • 20. ETL with Talend Studio:  Talend is the world-class ETL tool available for BigData and the Data Systems  It is used to move data to BigData Warehouse designed using HBase, Hive and other technology  Steps to transform data using Talend Studio:  Install JDK on Windows or Linux ( or Ubuntu)  Install Talend Studio to and run it  Create a transformation and create sources for targeted source like HBase, Hive and others  Draw the transformation mapping on main screen  Now, either schedule the job to run in future and to run immediately