Big Data With Hadoop Setup
Mandakini Kumari
Agenda 
1. Big Data?
2. Limitations of Existing Systems
3. Advantages of Hadoop
4. Disadvantages of Hadoop
5. Hadoop Ecosystem & Components
6. Prerequisites for Hadoop 1.x
7. Install Hadoop 1.x
1.1 Characteristics of Big Data
1.2 Every 60 Seconds on the Internet
2.1 Limitations of the Existing Data Analytics Architecture
3.1 Advantages of Hadoop
•Hadoop combines storage and computation on the same nodes, so data is processed where it is stored. In an RDBMS, computation is done in the CPU, and data must first be transferred over the bus from the hard disk to the CPU.
•Fault-tolerant hardware is expensive, whereas Hadoop is designed to run on cheap commodity hardware.
•Traditional systems need complicated data replication and failure handling, whereas Hadoop automatically handles data replication and node failure.
•HDFS (storage) is optimized for high throughput.
•HDFS's large block size (64 MB by default in Hadoop 1.x) suits very large files (GB, PB...).
•HDFS achieves high scalability and availability through data replication and fault tolerance.
•Extremely scalable.
•The MapReduce (MR) framework processes huge data sets in parallel.
•Jobs are scheduled for remote execution on the slave nodes (DataNodes), which allows parallel and fast job execution.
•MR handles the business logic and HDFS handles storage, independently of each other.
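To make the block-size and replication points concrete, here is a small shell sketch of the arithmetic. It assumes the Hadoop 1.x defaults of a 64 MB block size (dfs.block.size) and a replication factor of 3 (dfs.replication); the helper names hdfs_blocks and raw_storage_mb are made up for illustration.

```bash
#!/usr/bin/env bash
# Sketch: how a file maps onto HDFS blocks and how much raw disk it uses.
# Assumes the Hadoop 1.x defaults dfs.block.size = 64 MB and dfs.replication = 3.
set -euo pipefail

BLOCK_MB=64
REPLICATION=3

# Number of blocks = ceil(file size / block size)
hdfs_blocks() {
  local file_mb=$1
  echo $(( (file_mb + BLOCK_MB - 1) / BLOCK_MB ))
}

# Raw disk used = file size * replication (the last, partial block is not padded)
raw_storage_mb() {
  local file_mb=$1
  echo $(( file_mb * REPLICATION ))
}

for size in 1024 100; do
  echo "${size} MB file: $(hdfs_blocks "$size") blocks, $(raw_storage_mb "$size") MB raw"
done
```

With these defaults a 1024 MB file occupies 16 blocks and 3072 MB of raw disk, while a 100 MB file needs 2 blocks; HDFS does not pad the last block to the full 64 MB.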
3.2 Advantages of Hadoop
3.3 Advantages of Hadoop
4.1 Disadvantages of Hadoop
•HDFS is inefficient at handling small files.
•In Hadoop 1.x, the NameNode (NN) is a single point of failure.
•Clusters larger than about 4,000 nodes run into problems, because all metadata is stored in the RAM of a single NN.
•Hadoop 2.x does not have this single point of failure, because it supports NameNode High Availability.
•Security is a major concern: Hadoop 1.x does offer a security model, but it is disabled by default because of its complexity.
•Hadoop 1.x does not offer storage-level or network-level encryption, which is a big concern for government-sector application data.
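The small-files problem comes from the NameNode keeping every file and every block as an object in RAM. The sketch below uses the often-quoted rule of thumb of roughly 150 bytes of NameNode memory per object, which is an approximation rather than an exact figure, together with 64 MB blocks; nn_bytes is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Rough NameNode heap estimate: why many small files hurt HDFS.
# Assumes the rule of thumb of ~150 bytes of NameNode RAM per namespace
# object (one per file plus one per block) and 64 MB blocks.
set -euo pipefail

BYTES_PER_OBJECT=150
BLOCK_KB=$(( 64 * 1024 ))

# nn_bytes FILE_COUNT FILE_SIZE_KB -> estimated NameNode bytes for those files
nn_bytes() {
  local files=$1 size_kb=$2
  local blocks_per_file=$(( (size_kb + BLOCK_KB - 1) / BLOCK_KB ))
  echo $(( files * (1 + blocks_per_file) * BYTES_PER_OBJECT ))
}

echo "1 file of 1 GB:        $(nn_bytes 1 $(( 1024 * 1024 ))) bytes"
echo "10000 files of 100 KB: $(nn_bytes 10000 100) bytes"
```

For roughly the same amount of data, one 1 GB file costs about 2,550 bytes of NameNode RAM (1 file + 16 blocks), while 10,000 files of 100 KB cost about 3,000,000 bytes, because each small file brings its own file object and block object.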
5.1 HADOOP ECOSYSTEM
5.2 ADVANTAGES OF HDFS
5.3 NAMENODE: HADOOP COMPONENT
•It is the master and runs on high-end hardware.
•Stores all metadata in main memory (RAM).
•Types of metadata: the list of files, the blocks of each file, and the DataNodes (DN) holding each block.
•File attributes: access time, replication factor.
•The JobTracker reports to the NN after a job completes.
•Receives a heartbeat from each DN.
•Transaction log: records file creation, deletion, etc.
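The NameNode metadata described above boils down to two lookup tables. This toy bash model (bash 4+ is needed for associative arrays) is only an illustration: the path, block IDs and DataNode names are invented. In a real NameNode the file-to-block mapping is persisted in the fsimage and transaction log, while block locations are kept only in memory and rebuilt from the block reports that DataNodes send.

```bash
#!/usr/bin/env bash
# Toy model of the NameNode's in-memory lookup tables (requires bash 4+).
# The path, block IDs and DataNode names are hypothetical.
set -euo pipefail

# file path -> its blocks, in order
declare -A FILE_BLOCKS=(
  ["/user/hduser/input.txt"]="blk_1 blk_2"
)
# block ID -> DataNodes currently holding a replica
declare -A BLOCK_NODES=(
  ["blk_1"]="dn1 dn2 dn3"
  ["blk_2"]="dn2 dn3 dn4"
)

# What a client asks the NameNode before reading: where is each block?
locate_file() {
  local path=$1 blk
  for blk in ${FILE_BLOCKS[$path]}; do
    echo "$blk -> ${BLOCK_NODES[$blk]}"
  done
}

locate_file /user/hduser/input.txt
```

The client then reads each block directly from one of the listed DataNodes; the NameNode itself never serves file data.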
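5.4 DATANODE: HADOOP COMPONENT
•A slave running on commodity hardware.
•A file is written block by block, and each block is sent through a pipeline of DataNodes: the first DN stores the data and forwards it to the second, which forwards it to the third. Writing the replicas independently in parallel would complicate data replication.
•Provides the actual storage.
•Responsible for reading and writing data for clients.
•Heartbeat: each DN sends a heartbeat to the NN, every 3 seconds by default. If the NN stops receiving heartbeats from a DN (for about 10 minutes by default), it marks the DN as dead and replicates its blocks to other DataNodes.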
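The heartbeat timing can be worked out from two settings. This sketch assumes the Hadoop 1.x defaults, dfs.heartbeat.interval = 3 seconds and heartbeat.recheck.interval = 300000 ms, and the NameNode's expiry rule of twice the recheck interval plus ten heartbeat intervals; verify these values against your version's configuration before relying on them.

```bash
#!/usr/bin/env bash
# When does the NameNode declare a DataNode dead?
# Assumes Hadoop 1.x defaults: dfs.heartbeat.interval = 3 (seconds) and
# heartbeat.recheck.interval = 300000 (milliseconds, i.e. 5 minutes).
set -euo pipefail

HEARTBEAT_INTERVAL_S=3
RECHECK_INTERVAL_MS=300000

# expiry = 2 * recheck interval + 10 * heartbeat interval, in seconds
dead_node_timeout_s() {
  echo $(( 2 * RECHECK_INTERVAL_MS / 1000 + 10 * HEARTBEAT_INTERVAL_S ))
}

echo "DataNode declared dead after $(dead_node_timeout_s) s without a heartbeat"
```

With the defaults this gives 630 seconds (10.5 minutes), which is why a stopped DataNode is not reported as dead immediately.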
5.5 SECONDARY NAMENODE: HADOOP COMPONENT
•Not a hot standby for the NameNode (NN).
•If the NN fails, HDFS is unavailable until the NN is restored.
•When the NN starts, the system goes into safe mode: only read operations can be performed, and no blocks are replicated or deleted.
•The Secondary NameNode connects to the NN every hour (by default) and takes a backup (checkpoint) of the NN metadata.
•The saved metadata can be used to rebuild a failed NameNode.
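The Secondary NameNode's checkpoint schedule can be expressed as a simple rule. This sketch assumes the Hadoop 1.x defaults fs.checkpoint.period = 3600 seconds and fs.checkpoint.size = 67108864 bytes (64 MB of edit log), under which a checkpoint, i.e. merging the transaction log into the fsimage, is taken when either limit is reached. checkpoint_due is a hypothetical helper, not part of Hadoop.

```bash
#!/usr/bin/env bash
# Sketch of when the Secondary NameNode triggers a checkpoint.
# Assumes Hadoop 1.x defaults: fs.checkpoint.period = 3600 s and
# fs.checkpoint.size = 67108864 bytes (64 MB).
set -euo pipefail

CHECKPOINT_PERIOD_S=3600
CHECKPOINT_SIZE_BYTES=67108864

# checkpoint_due SECONDS_SINCE_LAST EDITS_LOG_BYTES -> prints yes/no
checkpoint_due() {
  local elapsed=$1 edits=$2
  if (( elapsed >= CHECKPOINT_PERIOD_S || edits >= CHECKPOINT_SIZE_BYTES )); then
    echo yes
  else
    echo no
  fi
}

checkpoint_due 1800 1048576     # half an hour, 1 MB of edits -> no
checkpoint_due 3600 1048576     # one hour elapsed           -> yes
checkpoint_due 600 70000000     # edit log already past 64 MB -> yes
```

The size limit matters on busy clusters: a large edit log makes NameNode restarts slow, so it is checkpointed before the hour is up.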
5.6 MAPREDUCE (BUSINESS LOGIC) ENGINE
•The TaskTracker (TT) is the slave.
•The TT acts as the worker that executes tasks.
•The JobTracker (master) acts as the manager that splits a job into tasks and assigns them to TTs.
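The JobTracker/TaskTracker split mirrors the map, shuffle and reduce phases of a job. As a single-machine analogy, in the style of Hadoop Streaming (where the mapper and reducer are ordinary programs that read stdin and write tab-separated key/value lines), word count can be written as a Unix pipeline. This only illustrates the data flow: on a cluster the framework runs many mapper and reducer tasks in parallel on TaskTrackers and performs the sort and shuffle itself.

```bash
#!/usr/bin/env bash
# Single-machine analogy of a MapReduce word count, Hadoop Streaming style:
# mapper and reducer read stdin and write "key<TAB>value" lines, and
# `sort` stands in for the framework's shuffle/sort phase.
set -euo pipefail

# map: one "word<TAB>1" line per word
mapper() {
  tr -s '[:space:]' '\n' | awk 'NF { print $1 "\t1" }'
}

# reduce: input arrives sorted by key, so sum each run of equal keys
reducer() {
  awk -F'\t' '
    $1 != prev { if (NR > 1) print prev "\t" sum; prev = $1; sum = 0 }
    { sum += $2 }
    END { if (NR > 0) print prev "\t" sum }'
}

printf 'big data\nbig hadoop\n' | mapper | LC_ALL=C sort | reducer
```

The reducer relies on sorted input, just as real reducers rely on the framework delivering all values for one key together.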
5.7 HDFS: HADOOP COMPONENT
5.8 FAULT TOLERANCE: REPLICATION AND RACK AWARENESS
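Rack awareness is usually configured with a topology script. In Hadoop 1.x the script named by the topology.script.file.name property (core-site.xml) is called with one or more IP addresses or host names and must print one rack path per argument; without it, every node is placed in /default-rack. With rack information available, the default policy for three replicas puts one replica on a node in the local rack, one on a node in a different rack, and the third on another node in that same remote rack. The subnet-to-rack mapping below is a hypothetical example.

```bash
#!/usr/bin/env bash
# Example rack-awareness topology script (hypothetical subnet-to-rack mapping).
# Hadoop passes one or more IP addresses or host names as arguments and
# reads one rack path per argument from stdout.

resolve_rack() {
  local host
  for host in "$@"; do
    case "$host" in
      10.1.1.*) echo "/dc1/rack1" ;;
      10.1.2.*) echo "/dc1/rack2" ;;
      *)        echo "/default-rack" ;;
    esac
  done
}

resolve_rack "$@"
```

For example, running the script with 10.1.2.9 192.168.0.5 prints /dc1/rack2 and then /default-rack.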
6. Hadoop Installation: Prerequisites 
1. Ubuntu Linux 12.04.3 LTS 
2. Java v1.6+ installed
3. A dedicated Hadoop system user
4. SSH access configured
5. IPv6 disabled
First run: sudo apt-get update
Then install the SSH server, which Hadoop's start scripts use and which PuTTY users need for remote login: sudo apt-get install openssh-server
6.1 Install Java v1.6+
6.1.1) Download the Oracle JDK for Linux (7u45 at the time of writing):
wget https://edelivery.oracle.com/otn-pub/java/jdk/7u45-b18/jdk-7u45-linux-x64.tar.gz
OR, to avoid passing a username and password, use:
wget --no-cookies --no-check-certificate --header "Cookie: gpw_e24=http%3A%2F%2Fwww.oracle.com" https://edelivery.oracle.com/otn-pub/java/jdk/7u45-b18/jdk-7u45-linux-x64.tar.gz
6.1.2) Create the /usr/local/java directory and copy the Java archive into it:
sudo mkdir -p /usr/local/java
sudo cp jdk-7u45-linux-x64.tar.gz /usr/local/java
6.1.3) Change to the /usr/local/java directory: cd /usr/local/java
6.1.4) Unpack the Java binaries in /usr/local/java:
sudo tar xvzf jdk-7u45-linux-x64.tar.gz
6.1.5) Edit the system-wide profile file /etc/profile:
sudo nano /etc/profile or sudo gedit /etc/profile
6.1.6) At the end of /etc/profile, add the following system variables to your system path:
JAVA_HOME=/usr/local/java/jdk1.7.0_45
PATH=$PATH:$HOME/bin:$JAVA_HOME/bin
export JAVA_HOME
export PATH
6.1.7) Tell your Ubuntu Linux system where your Oracle Java JDK/JRE is located:
sudo update-alternatives --install "/usr/bin/java" "java" "/usr/local/java/jdk1.7.0_45/bin/java" 1
sudo update-alternatives --install "/usr/bin/javac" "javac" "/usr/local/java/jdk1.7.0_45/bin/javac" 1
6.1.8) Reload the system-wide PATH from /etc/profile: . /etc/profile
6.1.9) Test Java: java -version
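Steps 6.1.5 and 6.1.6 can also be done without an editor. This sketch appends the JAVA_HOME and PATH exports to a profile file only if they are not already there, so re-running it does not add duplicates. The helper name add_java_env is hypothetical, the JDK directory is an example that must match the directory you actually unpacked, and writing to /etc/profile itself requires root; the demo writes to a temporary file instead.

```bash
#!/usr/bin/env bash
# Idempotently add the Java environment variables to a profile file.
# add_java_env is a hypothetical helper; targeting /etc/profile needs root.
set -euo pipefail

add_java_env() {
  local profile=$1 jdk_dir=$2
  # -x: whole-line match, -F: fixed string (dots in the path are literal)
  if ! grep -qxF "export JAVA_HOME=$jdk_dir" "$profile" 2>/dev/null; then
    {
      echo "export JAVA_HOME=$jdk_dir"
      # single quotes: the variables are expanded at login, not now
      echo 'export PATH=$PATH:$HOME/bin:$JAVA_HOME/bin'
    } >> "$profile"
  fi
}

# Demo on a temporary file; use the JDK directory you actually unpacked
demo=$(mktemp)
add_java_env "$demo" /usr/local/java/jdk1.7.0_45
add_java_env "$demo" /usr/local/java/jdk1.7.0_45   # second call changes nothing
cat "$demo"
```

For the real system, run the function as root with /etc/profile as the first argument, then reload with . /etc/profile.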
6.2 Add a dedicated Hadoop system user
6.2.1) Add a group: sudo addgroup hadoop
6.2.2) Create a user and add it to the group:
sudo adduser --ingroup hadoop hduser
6.3 Generate an SSH key for the hduser user
6.3.1) Log in as hduser: sudo su - hduser
6.3.2) Run the key generation command: ssh-keygen -t rsa -P ""
6.3.3) When asked for the file in which to save the key, just press Enter; the key is generated in /home/hduser/.ssh.
6.3.4) Enable SSH access to your local machine with this newly created key:
cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
6.3.5) Test the SSH setup by connecting to your local machine as hduser:
ssh hduser@localhost
This also adds localhost permanently to the list of known hosts.
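Steps 6.3.2 to 6.3.4 can be scripted so that they run without prompts and can be repeated safely. This is a sketch: setup_passwordless_ssh is a hypothetical helper. It uses ssh-keygen's -N "" (empty new passphrase) and -f (output file) options to skip the interactive questions, and it also tightens permissions, because sshd with its default StrictModes setting refuses keys when ~/.ssh or authorized_keys is writable by other users.

```bash
#!/usr/bin/env bash
# Non-interactive, re-runnable version of steps 6.3.2 to 6.3.4.
# setup_passwordless_ssh is a hypothetical helper; run it as hduser.
set -euo pipefail

setup_passwordless_ssh() {
  local ssh_dir=${1:-$HOME/.ssh}
  mkdir -p "$ssh_dir"
  chmod 700 "$ssh_dir"    # sshd StrictModes rejects group/world-writable dirs
  # -N "" sets an empty passphrase; -f skips the file-name prompt
  if [ ! -f "$ssh_dir/id_rsa" ]; then
    ssh-keygen -q -t rsa -N "" -f "$ssh_dir/id_rsa"
  fi
  # Authorize the public key only once
  if ! grep -qxF "$(cat "$ssh_dir/id_rsa.pub")" "$ssh_dir/authorized_keys" 2>/dev/null; then
    cat "$ssh_dir/id_rsa.pub" >> "$ssh_dir/authorized_keys"
  fi
  chmod 600 "$ssh_dir/authorized_keys"
}

# Usage (as hduser):
#   setup_passwordless_ssh && ssh localhost
```

Calling it a second time leaves the existing key and the single authorized_keys entry untouched.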
6.4 Disabling IPv6 
6.4.1) Hadoop 1.x does not support IPv6, and on Ubuntu, using 0.0.0.0 for Hadoop's networking-related settings makes Hadoop bind to the machine's IPv6 addresses, so IPv6 needs to be disabled.
Run: sudo gedit /etc/sysctl.conf
Add the following lines to the end of the file, then reboot the machine so that the new configuration takes effect:
#disable ipv6 
net.ipv6.conf.all.disable_ipv6 = 1 
net.ipv6.conf.default.disable_ipv6 = 1 
net.ipv6.conf.lo.disable_ipv6 = 1
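After the reboot (or after running sudo sysctl -p, which reloads /etc/sysctl.conf without rebooting), you can confirm the setting took effect by reading the corresponding kernel flag: 1 means IPv6 is disabled. The helper ipv6_disabled is hypothetical; its optional argument lets you point it at a different file, for example in a test.

```bash
#!/usr/bin/env bash
# Check whether IPv6 is disabled by reading the kernel flag controlled by
# net.ipv6.conf.all.disable_ipv6 (1 = disabled).
set -euo pipefail

ipv6_disabled() {
  local flag_file=${1:-/proc/sys/net/ipv6/conf/all/disable_ipv6}
  [ "$(cat "$flag_file" 2>/dev/null)" = "1" ]
}

if ipv6_disabled; then
  echo "IPv6 is disabled"
else
  echo "IPv6 is still enabled (or the flag file is missing)"
fi
```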
Install Hadoop 1.2 
Ubuntu Linux 12.04.3 LTS 
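Hadoop 1.2.1, released August 2013
Download and extract Hadoop:
Command: wget http://archive.apache.org/dist/hadoop/core/hadoop-1.2.0/
Command: tar -xvf hadoop-1.2.0.tar.gz
Rename the extracted directory, since the commands below use the hadoop/ path:
Command: mv hadoop-1.2.0 hadoop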
Edit core-site.xml
Command: sudo gedit hadoop/conf/core-site.xml
Add the following between <configuration> and </configuration>:
<property> 
<name>fs.default.name</name> 
<value>hdfs://localhost:8020</value> 
</property>
Edit hdfs-site.xml
Command: sudo gedit hadoop/conf/hdfs-site.xml
Add the following between <configuration> and </configuration>:
<property> 
<name>dfs.replication</name> 
<value>1</value> 
</property> 
<property> 
<name>dfs.permissions</name> 
<value>false</value> 
</property>
Edit mapred-site.xml
Command: sudo gedit hadoop/conf/mapred-site.xml
Add the following between <configuration> and </configuration>:
<property> 
<name>mapred.job.tracker</name> 
<value>localhost:8021</value> 
</property>
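Each of the three snippets above has to sit inside the configuration root element of its file. This sketch writes all three pseudo-distributed config files in one go with the values shown above. site_xml is a hypothetical helper, values are inserted verbatim without XML escaping, and CONF_DIR defaults to a temporary directory so that nothing is overwritten by accident; set CONF_DIR=hadoop/conf for a real install.

```bash
#!/usr/bin/env bash
# Write complete core-site.xml, hdfs-site.xml and mapred-site.xml files.
# site_xml is a hypothetical helper; values are not XML-escaped.
set -euo pipefail

# site_xml NAME VALUE [NAME VALUE ...] -> prints a complete *-site.xml
site_xml() {
  echo '<?xml version="1.0"?>'
  echo '<configuration>'
  while (( $# >= 2 )); do
    printf '  <property>\n    <name>%s</name>\n    <value>%s</value>\n  </property>\n' "$1" "$2"
    shift 2
  done
  echo '</configuration>'
}

CONF_DIR=${CONF_DIR:-$(mktemp -d)}
site_xml fs.default.name hdfs://localhost:8020 > "$CONF_DIR/core-site.xml"
site_xml dfs.replication 1 dfs.permissions false > "$CONF_DIR/hdfs-site.xml"
site_xml mapred.job.tracker localhost:8021 > "$CONF_DIR/mapred-site.xml"
ls "$CONF_DIR"
```

Generating the files from one place keeps the host and port values consistent across them.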
Get your IP address:
Command: ifconfig
Edit the hosts file:
Command: sudo gedit /etc/hosts
CREATE AN SSH KEY
•Command: ssh-keygen -t rsa -P ""
•Move the key to the authorized keys:
•Command: cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
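Configuration
•Reboot the system.
•Add JAVA_HOME to the hadoop-env.sh file:
Command: sudo gedit hadoop/conf/hadoop-env.sh
Type: export JAVA_HOME=/usr/lib/jvm/java-6-openjdk-i386
(This path is for OpenJDK 6 on 32-bit Ubuntu; set JAVA_HOME to the directory of the JDK you actually installed, the same value as in /etc/profile.)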
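Hadoop Commands
Format the NameNode (only once, on a new installation; formatting erases the HDFS metadata):
Command: bin/hadoop namenode -format
Start the NameNode and DataNode (start-dfs.sh also starts the SecondaryNameNode):
Command: bin/start-dfs.sh
Start the JobTracker and TaskTracker:
Command: bin/start-mapred.sh
Check that Hadoop started correctly (jps should list NameNode, DataNode, SecondaryNameNode, JobTracker and TaskTracker):
Command: jps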
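To check the jps output mechanically, this sketch compares it against the five daemons a single-node Hadoop 1.x setup should be running. jps prints one "<pid> <main class>" line per JVM; the sample output and process IDs below are invented, and missing_daemons is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Report which Hadoop 1.x daemons are missing from jps output.
set -euo pipefail

EXPECTED="NameNode DataNode SecondaryNameNode JobTracker TaskTracker"

# Reads jps output on stdin and prints each expected daemon that is absent
missing_daemons() {
  local running d
  running=$(awk '{ print $2 }')
  for d in $EXPECTED; do
    # -x: whole-line match, so "NameNode" does not match "SecondaryNameNode"
    grep -qxF "$d" <<< "$running" || echo "$d"
  done
}

# Sample (invented) output in which the TaskTracker failed to start
sample=$'4825 NameNode\n5040 DataNode\n5255 SecondaryNameNode\n5340 JobTracker\n5558 Jps'
printf '%s\n' "$sample" | missing_daemons
```

On a real node, run: jps | missing_daemons. Empty output means all five daemons are up; with the sample above it prints TaskTracker, which points you to that daemon's log in the logs directory.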
Thank you 
CONTACT ME @
http://in.linkedin.com/pub/mandakini-kumari/18/93/935
http://www.slideshare.net/mandakinikumari

More Related Content

What's hot

phptek13 - Caching and tuning fun tutorial
phptek13 - Caching and tuning fun tutorialphptek13 - Caching and tuning fun tutorial
phptek13 - Caching and tuning fun tutorialWim Godden
 
Docker Logging and analysing with Elastic Stack
Docker Logging and analysing with Elastic StackDocker Logging and analysing with Elastic Stack
Docker Logging and analysing with Elastic StackJakub Hajek
 
Journée DevOps : Des dashboards pour tous avec ElasticSearch, Logstash et Kibana
Journée DevOps : Des dashboards pour tous avec ElasticSearch, Logstash et KibanaJournée DevOps : Des dashboards pour tous avec ElasticSearch, Logstash et Kibana
Journée DevOps : Des dashboards pour tous avec ElasticSearch, Logstash et KibanaPublicis Sapient Engineering
 
Slow Database in your PHP stack? Don't blame the DBA!
Slow Database in your PHP stack? Don't blame the DBA!Slow Database in your PHP stack? Don't blame the DBA!
Slow Database in your PHP stack? Don't blame the DBA!Harald Zeitlhofer
 
Rihards Olups - Zabbix 3.0: Excited for new features?
Rihards Olups -  Zabbix 3.0: Excited for new features?Rihards Olups -  Zabbix 3.0: Excited for new features?
Rihards Olups - Zabbix 3.0: Excited for new features?Zabbix
 
Beyond php - it's not (just) about the code
Beyond php - it's not (just) about the codeBeyond php - it's not (just) about the code
Beyond php - it's not (just) about the codeWim Godden
 
Oozie or Easy: Managing Hadoop Workloads the EASY Way
Oozie or Easy: Managing Hadoop Workloads the EASY WayOozie or Easy: Managing Hadoop Workloads the EASY Way
Oozie or Easy: Managing Hadoop Workloads the EASY WayDataWorks Summit
 
Website Performance Basics
Website Performance BasicsWebsite Performance Basics
Website Performance Basicsgeku
 
Regex Considered Harmful: Use Rosie Pattern Language Instead
Regex Considered Harmful: Use Rosie Pattern Language InsteadRegex Considered Harmful: Use Rosie Pattern Language Instead
Regex Considered Harmful: Use Rosie Pattern Language InsteadAll Things Open
 
Elasticsearch for Logs & Metrics - a deep dive
Elasticsearch for Logs & Metrics - a deep diveElasticsearch for Logs & Metrics - a deep dive
Elasticsearch for Logs & Metrics - a deep diveSematext Group, Inc.
 
Raymond Kuiper - Working the API like a Unix Pro
Raymond Kuiper - Working the API like a Unix ProRaymond Kuiper - Working the API like a Unix Pro
Raymond Kuiper - Working the API like a Unix ProZabbix
 
Open Source Logging and Monitoring Tools
Open Source Logging and Monitoring ToolsOpen Source Logging and Monitoring Tools
Open Source Logging and Monitoring ToolsPhase2
 
Mongo performance tuning: tips and tricks
Mongo performance tuning: tips and tricksMongo performance tuning: tips and tricks
Mongo performance tuning: tips and tricksVladimir Malyk
 
Managing Your Security Logs with Elasticsearch
Managing Your Security Logs with ElasticsearchManaging Your Security Logs with Elasticsearch
Managing Your Security Logs with ElasticsearchVic Hargrave
 
Application Logging With The ELK Stack
Application Logging With The ELK StackApplication Logging With The ELK Stack
Application Logging With The ELK Stackbenwaine
 
Care and feeding notes
Care and feeding notesCare and feeding notes
Care and feeding notesPerrin Harkins
 
Sherlock Homepage - A detective story about running large web services - WebN...
Sherlock Homepage - A detective story about running large web services - WebN...Sherlock Homepage - A detective story about running large web services - WebN...
Sherlock Homepage - A detective story about running large web services - WebN...Maarten Balliauw
 

What's hot (19)

phptek13 - Caching and tuning fun tutorial
phptek13 - Caching and tuning fun tutorialphptek13 - Caching and tuning fun tutorial
phptek13 - Caching and tuning fun tutorial
 
Docker Logging and analysing with Elastic Stack
Docker Logging and analysing with Elastic StackDocker Logging and analysing with Elastic Stack
Docker Logging and analysing with Elastic Stack
 
Journée DevOps : Des dashboards pour tous avec ElasticSearch, Logstash et Kibana
Journée DevOps : Des dashboards pour tous avec ElasticSearch, Logstash et KibanaJournée DevOps : Des dashboards pour tous avec ElasticSearch, Logstash et Kibana
Journée DevOps : Des dashboards pour tous avec ElasticSearch, Logstash et Kibana
 
Slow Database in your PHP stack? Don't blame the DBA!
Slow Database in your PHP stack? Don't blame the DBA!Slow Database in your PHP stack? Don't blame the DBA!
Slow Database in your PHP stack? Don't blame the DBA!
 
Rihards Olups - Zabbix 3.0: Excited for new features?
Rihards Olups -  Zabbix 3.0: Excited for new features?Rihards Olups -  Zabbix 3.0: Excited for new features?
Rihards Olups - Zabbix 3.0: Excited for new features?
 
Beyond php - it's not (just) about the code
Beyond php - it's not (just) about the codeBeyond php - it's not (just) about the code
Beyond php - it's not (just) about the code
 
Docker Monitoring Webinar
Docker Monitoring  WebinarDocker Monitoring  Webinar
Docker Monitoring Webinar
 
Oozie or Easy: Managing Hadoop Workloads the EASY Way
Oozie or Easy: Managing Hadoop Workloads the EASY WayOozie or Easy: Managing Hadoop Workloads the EASY Way
Oozie or Easy: Managing Hadoop Workloads the EASY Way
 
Website Performance Basics
Website Performance BasicsWebsite Performance Basics
Website Performance Basics
 
Regex Considered Harmful: Use Rosie Pattern Language Instead
Regex Considered Harmful: Use Rosie Pattern Language InsteadRegex Considered Harmful: Use Rosie Pattern Language Instead
Regex Considered Harmful: Use Rosie Pattern Language Instead
 
Elasticsearch for Logs & Metrics - a deep dive
Elasticsearch for Logs & Metrics - a deep diveElasticsearch for Logs & Metrics - a deep dive
Elasticsearch for Logs & Metrics - a deep dive
 
Raymond Kuiper - Working the API like a Unix Pro
Raymond Kuiper - Working the API like a Unix ProRaymond Kuiper - Working the API like a Unix Pro
Raymond Kuiper - Working the API like a Unix Pro
 
Open Source Logging and Monitoring Tools
Open Source Logging and Monitoring ToolsOpen Source Logging and Monitoring Tools
Open Source Logging and Monitoring Tools
 
Mongo performance tuning: tips and tricks
Mongo performance tuning: tips and tricksMongo performance tuning: tips and tricks
Mongo performance tuning: tips and tricks
 
Managing Your Security Logs with Elasticsearch
Managing Your Security Logs with ElasticsearchManaging Your Security Logs with Elasticsearch
Managing Your Security Logs with Elasticsearch
 
The tale of 100 cve's
The tale of 100 cve'sThe tale of 100 cve's
The tale of 100 cve's
 
Application Logging With The ELK Stack
Application Logging With The ELK StackApplication Logging With The ELK Stack
Application Logging With The ELK Stack
 
Care and feeding notes
Care and feeding notesCare and feeding notes
Care and feeding notes
 
Sherlock Homepage - A detective story about running large web services - WebN...
Sherlock Homepage - A detective story about running large web services - WebN...Sherlock Homepage - A detective story about running large web services - WebN...
Sherlock Homepage - A detective story about running large web services - WebN...
 

Similar to Big data with hadoop Setup on Ubuntu 12.04

Big data processing using hadoop poster presentation
Big data processing using hadoop poster presentationBig data processing using hadoop poster presentation
Big data processing using hadoop poster presentationAmrut Patil
 
02 Hadoop deployment and configuration
02 Hadoop deployment and configuration02 Hadoop deployment and configuration
02 Hadoop deployment and configurationSubhas Kumar Ghosh
 
Hadoop installation by santosh nage
Hadoop installation by santosh nageHadoop installation by santosh nage
Hadoop installation by santosh nageSantosh Nage
 
Single node hadoop cluster installation
Single node hadoop cluster installation Single node hadoop cluster installation
Single node hadoop cluster installation Mahantesh Angadi
 
Implementing Hadoop on a single cluster
Implementing Hadoop on a single clusterImplementing Hadoop on a single cluster
Implementing Hadoop on a single clusterSalil Navgire
 
Micro Datacenter & Data Warehouse
Micro Datacenter & Data WarehouseMicro Datacenter & Data Warehouse
Micro Datacenter & Data Warehousemdcdwh
 
Cloudera hadoop installation
Cloudera hadoop installationCloudera hadoop installation
Cloudera hadoop installationSumitra Pundlik
 
Facing enterprise specific challenges – utility programming in hadoop
Facing enterprise specific challenges – utility programming in hadoopFacing enterprise specific challenges – utility programming in hadoop
Facing enterprise specific challenges – utility programming in hadoopfann wu
 
App cap2956v2-121001194956-phpapp01 (1)
App cap2956v2-121001194956-phpapp01 (1)App cap2956v2-121001194956-phpapp01 (1)
App cap2956v2-121001194956-phpapp01 (1)outstanding59
 
Inside the Hadoop Machine @ VMworld
Inside the Hadoop Machine @ VMworldInside the Hadoop Machine @ VMworld
Inside the Hadoop Machine @ VMworldRichard McDougall
 
App Cap2956v2 121001194956 Phpapp01 (1)
App Cap2956v2 121001194956 Phpapp01 (1)App Cap2956v2 121001194956 Phpapp01 (1)
App Cap2956v2 121001194956 Phpapp01 (1)outstanding59
 
Hadoop architecture-tutorial
Hadoop  architecture-tutorialHadoop  architecture-tutorial
Hadoop architecture-tutorialvinayiqbusiness
 
Configuring Your First Hadoop Cluster On EC2
Configuring Your First Hadoop Cluster On EC2Configuring Your First Hadoop Cluster On EC2
Configuring Your First Hadoop Cluster On EC2benjaminwootton
 
Hadoop cluster configuration
Hadoop cluster configurationHadoop cluster configuration
Hadoop cluster configurationprabakaranbrick
 
Big Data in Container; Hadoop Spark in Docker and Mesos
Big Data in Container; Hadoop Spark in Docker and MesosBig Data in Container; Hadoop Spark in Docker and Mesos
Big Data in Container; Hadoop Spark in Docker and MesosHeiko Loewe
 
High performance content hosting
High performance content hosting High performance content hosting
High performance content hosting Aleksey Korzun
 
Rh202 q&amp;a-demo-cert magic
Rh202 q&amp;a-demo-cert magicRh202 q&amp;a-demo-cert magic
Rh202 q&amp;a-demo-cert magicEllina Beckman
 

Similar to Big data with hadoop Setup on Ubuntu 12.04 (20)

Big data processing using hadoop poster presentation
Big data processing using hadoop poster presentationBig data processing using hadoop poster presentation
Big data processing using hadoop poster presentation
 
02 Hadoop deployment and configuration
02 Hadoop deployment and configuration02 Hadoop deployment and configuration
02 Hadoop deployment and configuration
 
Hadoop installation by santosh nage
Hadoop installation by santosh nageHadoop installation by santosh nage
Hadoop installation by santosh nage
 
Single node hadoop cluster installation
Single node hadoop cluster installation Single node hadoop cluster installation
Single node hadoop cluster installation
 
Exp-3.pptx
Exp-3.pptxExp-3.pptx
Exp-3.pptx
 
Implementing Hadoop on a single cluster
Implementing Hadoop on a single clusterImplementing Hadoop on a single cluster
Implementing Hadoop on a single cluster
 
Micro Datacenter & Data Warehouse
Micro Datacenter & Data WarehouseMicro Datacenter & Data Warehouse
Micro Datacenter & Data Warehouse
 
Cloudera hadoop installation
Cloudera hadoop installationCloudera hadoop installation
Cloudera hadoop installation
 
Facing enterprise specific challenges – utility programming in hadoop
Facing enterprise specific challenges – utility programming in hadoopFacing enterprise specific challenges – utility programming in hadoop
Facing enterprise specific challenges – utility programming in hadoop
 
HDFS Issues
HDFS IssuesHDFS Issues
HDFS Issues
 
Upgrading hadoop
Upgrading hadoopUpgrading hadoop
Upgrading hadoop
 
App cap2956v2-121001194956-phpapp01 (1)
App cap2956v2-121001194956-phpapp01 (1)App cap2956v2-121001194956-phpapp01 (1)
App cap2956v2-121001194956-phpapp01 (1)
 
Inside the Hadoop Machine @ VMworld
Inside the Hadoop Machine @ VMworldInside the Hadoop Machine @ VMworld
Inside the Hadoop Machine @ VMworld
 
App Cap2956v2 121001194956 Phpapp01 (1)
App Cap2956v2 121001194956 Phpapp01 (1)App Cap2956v2 121001194956 Phpapp01 (1)
App Cap2956v2 121001194956 Phpapp01 (1)
 
Hadoop architecture-tutorial
Hadoop  architecture-tutorialHadoop  architecture-tutorial
Hadoop architecture-tutorial
 
Configuring Your First Hadoop Cluster On EC2
Configuring Your First Hadoop Cluster On EC2Configuring Your First Hadoop Cluster On EC2
Configuring Your First Hadoop Cluster On EC2
 
Hadoop cluster configuration
Hadoop cluster configurationHadoop cluster configuration
Hadoop cluster configuration
 
Big Data in Container; Hadoop Spark in Docker and Mesos
Big Data in Container; Hadoop Spark in Docker and MesosBig Data in Container; Hadoop Spark in Docker and Mesos
Big Data in Container; Hadoop Spark in Docker and Mesos
 
High performance content hosting
High performance content hosting High performance content hosting
High performance content hosting
 
Rh202 q&amp;a-demo-cert magic
Rh202 q&amp;a-demo-cert magicRh202 q&amp;a-demo-cert magic
Rh202 q&amp;a-demo-cert magic
 

More from Mandakini Kumari

Emerging Trends In Cloud Computing.pptx
Emerging Trends In Cloud Computing.pptxEmerging Trends In Cloud Computing.pptx
Emerging Trends In Cloud Computing.pptxMandakini Kumari
 
Building an Edge Computing Strategy - Distributed infrastructure.pptx
Building an Edge Computing Strategy - Distributed infrastructure.pptxBuilding an Edge Computing Strategy - Distributed infrastructure.pptx
Building an Edge Computing Strategy - Distributed infrastructure.pptxMandakini Kumari
 
Emerging Trends in Cloud Computing.pptx
Emerging Trends in Cloud Computing.pptxEmerging Trends in Cloud Computing.pptx
Emerging Trends in Cloud Computing.pptxMandakini Kumari
 
Women in IT & Inspirational Individual of the Year.pptx
Women in IT & Inspirational Individual of the Year.pptxWomen in IT & Inspirational Individual of the Year.pptx
Women in IT & Inspirational Individual of the Year.pptxMandakini Kumari
 
Php basic for vit university
Php basic for vit universityPhp basic for vit university
Php basic for vit universityMandakini Kumari
 
Web services soap and rest by mandakini for TechGig
Web services soap and rest by mandakini for TechGigWeb services soap and rest by mandakini for TechGig
Web services soap and rest by mandakini for TechGigMandakini Kumari
 
Drupal7 an introduction by ayushiinfotech
Drupal7 an introduction by ayushiinfotechDrupal7 an introduction by ayushiinfotech
Drupal7 an introduction by ayushiinfotechMandakini Kumari
 
Introduction of drupal7 by ayushi infotech
Introduction of drupal7 by ayushi infotechIntroduction of drupal7 by ayushi infotech
Introduction of drupal7 by ayushi infotechMandakini Kumari
 
Drupal 7 theme by ayushi infotech
Drupal 7 theme by ayushi infotechDrupal 7 theme by ayushi infotech
Drupal 7 theme by ayushi infotechMandakini Kumari
 

More from Mandakini Kumari (9)

Emerging Trends In Cloud Computing.pptx
Emerging Trends In Cloud Computing.pptxEmerging Trends In Cloud Computing.pptx
Emerging Trends In Cloud Computing.pptx
 
Building an Edge Computing Strategy - Distributed infrastructure.pptx
Building an Edge Computing Strategy - Distributed infrastructure.pptxBuilding an Edge Computing Strategy - Distributed infrastructure.pptx
Building an Edge Computing Strategy - Distributed infrastructure.pptx
 
Emerging Trends in Cloud Computing.pptx
Emerging Trends in Cloud Computing.pptxEmerging Trends in Cloud Computing.pptx
Emerging Trends in Cloud Computing.pptx
 
Women in IT & Inspirational Individual of the Year.pptx
Women in IT & Inspirational Individual of the Year.pptxWomen in IT & Inspirational Individual of the Year.pptx
Women in IT & Inspirational Individual of the Year.pptx
 
Php basic for vit university
Php basic for vit universityPhp basic for vit university
Php basic for vit university
 
Web services soap and rest by mandakini for TechGig
Web services soap and rest by mandakini for TechGigWeb services soap and rest by mandakini for TechGig
Web services soap and rest by mandakini for TechGig
 
Drupal7 an introduction by ayushiinfotech
Drupal7 an introduction by ayushiinfotechDrupal7 an introduction by ayushiinfotech
Drupal7 an introduction by ayushiinfotech
 
Introduction of drupal7 by ayushi infotech
Introduction of drupal7 by ayushi infotechIntroduction of drupal7 by ayushi infotech
Introduction of drupal7 by ayushi infotech
 
Drupal 7 theme by ayushi infotech
Drupal 7 theme by ayushi infotechDrupal 7 theme by ayushi infotech
Drupal 7 theme by ayushi infotech
 

Recently uploaded

PPT for Presiding Officer.pptxvvdffdfgggg
PPT for Presiding Officer.pptxvvdffdfggggPPT for Presiding Officer.pptxvvdffdfgggg
PPT for Presiding Officer.pptxvvdffdfggggbhadratanusenapati1
 
Deloitte+RedCross_Talk to your data with Knowledge-enriched Generative AI.ppt...
Deloitte+RedCross_Talk to your data with Knowledge-enriched Generative AI.ppt...Deloitte+RedCross_Talk to your data with Knowledge-enriched Generative AI.ppt...
Deloitte+RedCross_Talk to your data with Knowledge-enriched Generative AI.ppt...Neo4j
 
Understanding the Impact of video length on student performance
Understanding the Impact of video length on student performanceUnderstanding the Impact of video length on student performance
Understanding the Impact of video length on student performancePrithaVashisht1
 
Stochastic Dynamic Programming and You.pptx
Stochastic Dynamic Programming and You.pptxStochastic Dynamic Programming and You.pptx
Stochastic Dynamic Programming and You.pptxjkmrshll88
 
Data Collection from Social Media Platforms
Data Collection from Social Media PlatformsData Collection from Social Media Platforms
Data Collection from Social Media PlatformsMahmoud Yasser
 
Bengaluru Tableau UG event- 2nd March 2024 Q1
Bengaluru Tableau UG event- 2nd March 2024 Q1Bengaluru Tableau UG event- 2nd March 2024 Q1
Bengaluru Tableau UG event- 2nd March 2024 Q1bengalurutug
 
Báo cáo Social Media Benchmark 2024 cho dân Marketing
Báo cáo Social Media Benchmark 2024 cho dân MarketingBáo cáo Social Media Benchmark 2024 cho dân Marketing
Báo cáo Social Media Benchmark 2024 cho dân MarketingMarketingTrips
 
Data Analytics Fundamentals: data analytics types.potx
Data Analytics Fundamentals: data analytics types.potxData Analytics Fundamentals: data analytics types.potx
Data Analytics Fundamentals: data analytics types.potxEmmanuel Dauda
 
STOCK PRICE ANALYSIS Furkan Ali TASCI --.pptx
STOCK PRICE ANALYSIS  Furkan Ali TASCI --.pptxSTOCK PRICE ANALYSIS  Furkan Ali TASCI --.pptx
STOCK PRICE ANALYSIS Furkan Ali TASCI --.pptxFurkanTasci3
 
How to Build an Experimentation Culture for Data-Driven Product Development
How to Build an Experimentation Culture for Data-Driven Product DevelopmentHow to Build an Experimentation Culture for Data-Driven Product Development
How to Build an Experimentation Culture for Data-Driven Product DevelopmentAggregage
 
Air Con Energy Rating Info411 Presentation.pdf
Air Con Energy Rating Info411 Presentation.pdfAir Con Energy Rating Info411 Presentation.pdf
Air Con Energy Rating Info411 Presentation.pdfJasonBoboKyaw
 
Elements of language learning - an analysis of how different elements of lang...
Elements of language learning - an analysis of how different elements of lang...Elements of language learning - an analysis of how different elements of lang...
Elements of language learning - an analysis of how different elements of lang...PrithaVashisht1
 
Neo4j_Jesus Barrasa_The Art of the Possible with Graph.pptx.pdf
Neo4j_Jesus Barrasa_The Art of the Possible with Graph.pptx.pdfNeo4j_Jesus Barrasa_The Art of the Possible with Graph.pptx.pdf
Neo4j_Jesus Barrasa_The Art of the Possible with Graph.pptx.pdfNeo4j
 
Microeconomic Group Presentation Apple.pdf
Microeconomic Group Presentation Apple.pdfMicroeconomic Group Presentation Apple.pdf
Microeconomic Group Presentation Apple.pdfmxlos0
 
Empowering Decisions A Guide to Embedded Analytics
Empowering Decisions A Guide to Embedded AnalyticsEmpowering Decisions A Guide to Embedded Analytics
Empowering Decisions A Guide to Embedded AnalyticsGain Insights
 
STOCK PRICE ANALYSIS Furkan Ali TASCI --.pptx
STOCK PRICE ANALYSIS  Furkan Ali TASCI --.pptxSTOCK PRICE ANALYSIS  Furkan Ali TASCI --.pptx
STOCK PRICE ANALYSIS Furkan Ali TASCI --.pptxFurkanTasci3
 
Unleashing Datas Potential - Mastering Precision with FCO-IM
Unleashing Datas Potential - Mastering Precision with FCO-IMUnleashing Datas Potential - Mastering Precision with FCO-IM
Unleashing Datas Potential - Mastering Precision with FCO-IMMarco Wobben
 
Brain Tumor Detection with Machine Learning.pptx
Brain Tumor Detection with Machine Learning.pptxBrain Tumor Detection with Machine Learning.pptx
Brain Tumor Detection with Machine Learning.pptxShammiRai3
 
TCFPro24 Building Real-Time Generative AI Pipelines
TCFPro24 Building Real-Time Generative AI PipelinesTCFPro24 Building Real-Time Generative AI Pipelines
TCFPro24 Building Real-Time Generative AI PipelinesTimothy Spann
 

Recently uploaded (20)

PPT for Presiding Officer.pptxvvdffdfgggg
PPT for Presiding Officer.pptxvvdffdfggggPPT for Presiding Officer.pptxvvdffdfgggg
PPT for Presiding Officer.pptxvvdffdfgggg
 
Deloitte+RedCross_Talk to your data with Knowledge-enriched Generative AI.ppt...
Deloitte+RedCross_Talk to your data with Knowledge-enriched Generative AI.ppt...Deloitte+RedCross_Talk to your data with Knowledge-enriched Generative AI.ppt...
Deloitte+RedCross_Talk to your data with Knowledge-enriched Generative AI.ppt...
 
Understanding the Impact of video length on student performance
Understanding the Impact of video length on student performanceUnderstanding the Impact of video length on student performance
Understanding the Impact of video length on student performance
 
Target_Company_Data_breach_2013_110million
Target_Company_Data_breach_2013_110millionTarget_Company_Data_breach_2013_110million
Target_Company_Data_breach_2013_110million
 
Stochastic Dynamic Programming and You.pptx
Stochastic Dynamic Programming and You.pptxStochastic Dynamic Programming and You.pptx
Stochastic Dynamic Programming and You.pptx
 
Data Collection from Social Media Platforms
Data Collection from Social Media PlatformsData Collection from Social Media Platforms
Data Collection from Social Media Platforms
 

Big Data with Hadoop Setup on Ubuntu 12.04

  • 1. Big Data With Hadoop Setup Mandakini Kumari
  • 2. Agenda 1. What Is Big Data? 2. Limitations of Existing Systems 3. Advantages of Hadoop 4. Disadvantages of Hadoop 5. Hadoop Ecosystem & Components 6. Prerequisites for Hadoop 1.x 7. Install Hadoop 1.x
  • 4. 1.2 In Every 60 seconds on the internet
  • 5. 2.1 Limitation of Existing Data Analytics Architecture
  • 6. 3.1 Advantage of Hadoop •Hadoop combines storage and computational capabilities on the same nodes, whereas in an RDBMS the computation is done in the CPU and the data must travel over the bus from the hard disk to the CPU. •Fault-tolerant hardware is expensive, whereas Hadoop is designed to run on cheap commodity hardware. •Instead of complicated data-replication and failure-handling systems, Hadoop automatically handles data replication and node failure. •HDFS (storage) is optimized for high throughput. •HDFS's large block size (64 MB by default in Hadoop 1.x) suits large files (GB, TB, PB...). •HDFS provides high scalability and availability, using data replication for fault tolerance. •Extremely scalable. •The MapReduce (MR) framework processes huge data sets in parallel. •Jobs are scheduled for remote execution on the slave nodes (DataNodes), allowing fast, parallel job execution. •MR handles the business logic and HDFS handles storage, independently of each other.
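The large-block point can be made concrete with a small Python sketch (illustrative, not Hadoop code) of how a file maps onto fixed-size blocks. It assumes the Hadoop 1.x default block size of 64 MB; the helper name split_into_blocks is hypothetical.

```python
# Illustrative only: how a file of a given size maps onto fixed-size HDFS blocks.
# ASSUMPTION: Hadoop 1.x default block size of 64 MB (dfs.block.size).
MB = 1024 * 1024
DEFAULT_BLOCK_SIZE = 64 * MB


def split_into_blocks(file_size: int, block_size: int = DEFAULT_BLOCK_SIZE) -> list[int]:
    """Return the size in bytes of each block a file of `file_size` bytes occupies.

    All blocks are full except possibly the last one, which only holds
    the remaining bytes (HDFS does not pad it to the full block size).
    """
    if file_size < 0 or block_size <= 0:
        raise ValueError("file_size must be >= 0 and block_size > 0")
    full_blocks, remainder = divmod(file_size, block_size)
    blocks = [block_size] * full_blocks
    if remainder:
        blocks.append(remainder)
    return blocks


if __name__ == "__main__":
    blocks = split_into_blocks(1034 * MB)   # a 1 GB + 10 MB file
    print(len(blocks), blocks[-1] // MB)    # 17 10
```

Fewer, larger blocks mean fewer block entries for the NameNode to track and long sequential reads per task, which is why HDFS favours big files.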
  • 9. 4.1 Disadvantage of Hadoop •HDFS is inefficient at handling small files. •In Hadoop 1.x the NameNode (NN) is a single point of failure. •Clusters larger than about 4,000 nodes run into problems, because all metadata is stored in the RAM of a single NN. •Hadoop 2.x removes this single point of failure. •Security is a major concern: Hadoop 1.x does offer a security model, but it is disabled by default because of its high complexity. •Hadoop 1.x does not offer storage- or network-level encryption, which is a serious concern for government-sector application data.
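To see why small files and a single NameNode's RAM limit Hadoop 1.x, here is a rough Python estimate comparing the same 1 TB of data stored as 1 MB files and as 1 GB files. It assumes the commonly quoted rule of thumb of about 150 bytes of NameNode memory per file object and per block object; that figure and the helper name are illustrative assumptions, not measured values.

```python
# Rough NameNode heap estimate for the same data stored as small vs. large files.
# ASSUMPTION: ~150 bytes of NameNode RAM per file object and per block object
# (a widely quoted rule of thumb, not an exact figure).
BYTES_PER_OBJECT = 150
MB = 1024 * 1024
BLOCK_SIZE = 64 * MB          # Hadoop 1.x default block size


def namenode_heap_bytes(num_files: int, file_size: int, block_size: int = BLOCK_SIZE) -> int:
    """Estimate NameNode memory for `num_files` files of `file_size` bytes each."""
    blocks_per_file = (file_size + block_size - 1) // block_size   # ceiling division
    objects = num_files * (1 + blocks_per_file)   # one file object plus its blocks
    return objects * BYTES_PER_OBJECT


if __name__ == "__main__":
    one_tb = 1024 * 1024 * MB
    small = namenode_heap_bytes(one_tb // MB, MB)                   # 1 TB as 1 MB files
    large = namenode_heap_bytes(one_tb // (1024 * MB), 1024 * MB)   # 1 TB as 1 GB files
    print(small // MB, "MB vs", large // 1024, "KB")                # 300 MB vs 2550 KB
```

Under this assumption the small-file layout needs over a hundred times more NameNode memory for identical data, and all of it must fit in one machine's RAM.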
  • 10. 5.1 HADOOP ECO SYSTEM
  • 12. 5.3 NAMENODE: HADOOP COMPONENT •It is the master and runs on high-end hardware. •Stores all metadata in main memory (RAM). •Types of metadata: the list of files, the blocks of each file, and the DataNodes (DN) holding each block. •File attributes: access time, replication factor. •The JobTracker reports to the NN after a job is completed. •Receives heartbeats from each DN. •Transaction log: records file creations, deletions, etc.
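The metadata listed for the NameNode can be pictured as a few in-memory maps. The Python model below is a hypothetical illustration (the class and method names are not Hadoop's real classes): files map to attributes and block IDs, blocks map to DataNodes, and create/delete operations are appended to a transaction log.

```python
# Illustrative model (not Hadoop's real classes) of the NameNode's in-memory
# metadata: file -> attributes and blocks, block -> DataNodes, plus a transaction log.
from dataclasses import dataclass, field


@dataclass
class FileMeta:
    replication: int                                   # replication factor attribute
    access_time: float                                 # last access time (epoch seconds)
    blocks: list[str] = field(default_factory=list)    # block IDs in file order


@dataclass
class NameNodeMetadata:
    files: dict[str, FileMeta] = field(default_factory=dict)
    # In real HDFS this map is rebuilt from DataNode block reports, not persisted.
    block_locations: dict[str, set[str]] = field(default_factory=dict)
    edit_log: list[tuple[str, str]] = field(default_factory=list)   # transaction log

    def create_file(self, path: str, replication: int, now: float) -> None:
        self.files[path] = FileMeta(replication, now)
        self.edit_log.append(("create", path))

    def add_block(self, path: str, block_id: str, datanodes: set[str]) -> None:
        self.files[path].blocks.append(block_id)
        self.block_locations[block_id] = set(datanodes)

    def delete_file(self, path: str) -> None:
        for block_id in self.files.pop(path).blocks:
            self.block_locations.pop(block_id, None)
        self.edit_log.append(("delete", path))

    def locate(self, path: str) -> list[tuple[str, set[str]]]:
        """Answer a client read: which DataNodes hold each block of `path`."""
        return [(b, self.block_locations[b]) for b in self.files[path].blocks]
```

For brevity only file creation and deletion are logged here; the point is that every lookup a client needs is a RAM-resident dictionary access, which is why the NameNode's memory bounds the cluster.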
  • 13. 5.4 DATANODE: HADOOP COMPONENT •A slave running on commodity hardware. •Writing a file to a DN is preferably a sequential process; writing in parallel causes data-replication issues. •File writes across DNs proceed in parallel. •Provides the actual storage. •Responsible for reading and writing data for clients. •Heartbeat: the NN receives a heartbeat from each DN every few seconds (3 s by default, set by dfs.heartbeat.interval). If heartbeats stop arriving, the NN marks the DN as dead and replicates its blocks to other DataNodes.
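The heartbeat rule can be sketched in Python. This assumes Hadoop's usual defaults (a 3-second dfs.heartbeat.interval and a 5-minute heartbeat.recheck.interval) and the expiry formula 2 * recheck + 10 * heartbeat; verify both against the hdfs-default.xml of your version. The function names are hypothetical.

```python
# Sketch of how long the NameNode waits before declaring a DataNode dead.
# ASSUMPTION: Hadoop's usual defaults and expiry formula, not read from a live cluster.
HEARTBEAT_INTERVAL_S = 3        # dfs.heartbeat.interval
RECHECK_INTERVAL_S = 5 * 60     # heartbeat.recheck.interval (5 minutes)


def heartbeat_expiry_s(heartbeat_s: int = HEARTBEAT_INTERVAL_S,
                       recheck_s: int = RECHECK_INTERVAL_S) -> int:
    """Seconds without a heartbeat after which a DataNode is marked dead."""
    return 2 * recheck_s + 10 * heartbeat_s


def dead_datanodes(last_heartbeat: dict[str, float], now: float) -> list[str]:
    """Return the DataNodes whose blocks the NameNode would re-replicate."""
    limit = heartbeat_expiry_s()
    return sorted(dn for dn, t in last_heartbeat.items() if now - t > limit)


if __name__ == "__main__":
    print(heartbeat_expiry_s())                                     # 630 (10.5 minutes)
    print(dead_datanodes({"dn1": 0.0, "dn2": 500.0}, now=700.0))    # ['dn1']
```

The long grace period means a briefly slow DataNode does not trigger a storm of re-replication; only a node silent for about 10 minutes is treated as lost.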
  • 14. 5.5 SECONDARY NAMENODE: HADOOP COMPONENT •It is not a hot standby for the NameNode (NN). •If the NN fails, it comes back up in safe mode: only read operations can be performed, and no blocks are replicated or deleted. •The Secondary NameNode connects to the NN every hour (fs.checkpoint.period, 3600 s by default) and takes a checkpoint of the NN metadata. •The saved checkpoint can be used to rebuild a failed NameNode.
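A checkpoint essentially replays the NameNode's transaction log (edit log) on top of the last saved namespace image (fsimage). The toy Python model below shows that merge; the record format ("create"/"delete", path, replication) and the function name are hypothetical simplifications of the real on-disk formats.

```python
# Toy model of a Secondary NameNode checkpoint: replay the edit log on top of
# the last fsimage to produce a new fsimage, then start a fresh, empty edit log.
# The record format ("create"/"delete", path, replication) is hypothetical.
def checkpoint(fsimage: dict[str, int],
               edits: list[tuple]) -> tuple[dict[str, int], list[tuple]]:
    new_image = dict(fsimage)            # never mutate the old image in place
    for op, path, *args in edits:
        if op == "create":
            new_image[path] = args[0]    # path -> replication factor
        elif op == "delete":
            new_image.pop(path, None)
        else:
            raise ValueError(f"unknown edit op: {op}")
    return new_image, []                 # merged image + empty edit log


if __name__ == "__main__":
    image, edits = checkpoint({"/a": 3}, [("create", "/b", 2), ("delete", "/a")])
    print(image, edits)                  # {'/b': 2} []
```

Keeping the edit log short this way is what lets a restarted NameNode recover quickly; it does not make the Secondary NameNode a standby.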
  • 15. 5.6 MAPREDUCE (BUSINESS LOGIC) ENGINE •The TaskTracker (TT) is the slave. •The TT acts as the resource that works on tasks. •The JobTracker (master) acts as the manager that splits a job into tasks.
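The JobTracker/TaskTracker division of labour can be illustrated with a single-process word count in Python. This is a sketch, not Hadoop's API: the function names are hypothetical, and each "task" is a local function call rather than work shipped to a remote TaskTracker.

```python
# Single-process sketch of the JobTracker/TaskTracker split for a word count.
# Real Hadoop runs map and reduce tasks on remote TaskTrackers; here each
# "task" is just a function call, and the function names are hypothetical.
from collections import defaultdict


def split_job(lines: list[str], lines_per_split: int) -> list[list[str]]:
    """JobTracker role: cut the job's input into one split per map task."""
    return [lines[i:i + lines_per_split] for i in range(0, len(lines), lines_per_split)]


def map_task(split: list[str]) -> list[tuple[str, int]]:
    """TaskTracker role: emit (word, 1) for every word in its split."""
    return [(word, 1) for line in split for word in line.split()]


def reduce_task(word: str, counts: list[int]) -> tuple[str, int]:
    """TaskTracker role: combine all counts for one key."""
    return word, sum(counts)


def run_job(lines: list[str], lines_per_split: int = 2) -> dict[str, int]:
    grouped: dict[str, list[int]] = defaultdict(list)
    for split in split_job(lines, lines_per_split):   # one map task per split
        for word, one in map_task(split):
            grouped[word].append(one)                  # shuffle: group by key
    return dict(reduce_task(w, c) for w, c in grouped.items())


if __name__ == "__main__":
    print(run_job(["big data", "hadoop big", "data data"]))
    # {'big': 2, 'data': 3, 'hadoop': 1}
```

Because each map task only sees its own split, the splits can run on different TaskTrackers at the same time; only the shuffle step needs to bring matching keys together.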
  • 16. 5.7 HDFS: HADOOP COMPONENT
  • 17. 5.8 FAULT TOLERANCE: REPLICATION AND RACK AWARENESS
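HDFS's default rack-aware placement for a replication factor of 3 puts the first replica on the writer's node, the second on a node in a different rack, and the third on another node in the same rack as the second. The Python sketch below follows that rule; the topology dictionary and the deterministic "first candidate" choice are simplifying assumptions (HDFS picks candidates randomly and also checks node load).

```python
# Sketch of HDFS's default rack-aware placement for replication factor 3:
#   1st replica on the writer's node, 2nd on a node in a different rack,
#   3rd on another node in the same rack as the 2nd.
# Simplifications: deterministic choice, no load checks, no fallback rules.
def place_replicas(writer: str, topology: dict[str, str]) -> list[str]:
    """topology maps node name -> rack id; returns 3 nodes for one block."""
    local_rack = topology[writer]
    remote = sorted(n for n, r in topology.items() if r != local_rack)
    if not remote:
        raise ValueError("need at least two racks for rack-aware placement")
    second = remote[0]
    same_remote_rack = [n for n in remote
                        if topology[n] == topology[second] and n != second]
    if not same_remote_rack:
        raise ValueError("the remote rack needs at least two nodes")
    return [writer, second, same_remote_rack[0]]


if __name__ == "__main__":
    racks = {"dn1": "/rack1", "dn2": "/rack1", "dn3": "/rack2", "dn4": "/rack2"}
    print(place_replicas("dn1", racks))   # ['dn1', 'dn3', 'dn4']
```

This layout survives the loss of a whole rack while sending only one copy of the block across the inter-rack link.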
  • 18. 6. Hadoop Installation: Prerequisites 1. Ubuntu Linux 12.04.3 LTS 2. Install Java v1.5+ 3. Add a dedicated Hadoop system user 4. Configure SSH access 5. Disable IPv6 First run: sudo apt-get update For PuTTY users, install the SSH server: sudo apt-get install openssh-server
  • 19. 6.1 Install Java v1.5+ 6.1.1) Download the latest Oracle Java Linux version: wget https://edelivery.oracle.com/otn-pub/java/jdk/7u45-b18/jdk-7u45-linux-x64.tar.gz OR, to avoid passing a username and password, use: wget --no-cookies --no-check-certificate --header "Cookie: gpw_e24=http%3A%2F%2Fwww.oracle.com" https://edelivery.oracle.com/otn-pub/java/jdk/7u45-b18/jdk-7u45-linux-x64.tar.gz 6.1.2) Create the /usr/local/java directory and copy the downloaded archive into it: sudo mkdir -p /usr/local/java then sudo cp jdk-7u45-linux-x64.tar.gz /usr/local/java 6.1.3) Change to the /usr/local/java directory: cd /usr/local/java 6.1.4) Unpack the Java archive in /usr/local/java (this creates jdk1.7.0_45): sudo tar xvzf jdk-7u45-linux-x64.tar.gz 6.1.5) Edit the system-wide profile /etc/profile: sudo nano /etc/profile or sudo gedit /etc/profile
  • 20. 6.1 Install Java v1.5+ 6.1.6) At the end of /etc/profile, add the following system variables to your system path (JAVA_HOME must match the directory unpacked in 6.1.4): JAVA_HOME=/usr/local/java/jdk1.7.0_45 PATH=$PATH:$HOME/bin:$JAVA_HOME/bin export JAVA_HOME export PATH 6.1.7) Tell Ubuntu where your Oracle Java JDK/JRE is located (the trailing 1 is the required priority argument): sudo update-alternatives --install "/usr/bin/java" "java" "/usr/local/java/jdk1.7.0_45/bin/java" 1 and sudo update-alternatives --install "/usr/bin/javac" "javac" "/usr/local/java/jdk1.7.0_45/bin/javac" 1 6.1.8) Reload the system-wide PATH from /etc/profile: . /etc/profile 6.1.9) Test Java: java -version
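Step 6.1.9 can also be checked programmatically. The Python helper below is hypothetical: it parses the banner that java -version prints (to stderr) and compares it against the v1.5+ requirement; it assumes the banner contains a fragment like version "1.7.0_45" or version "17".

```python
# Hypothetical helper for step 6.1.9: decide whether the banner printed by
# `java -version` (which goes to stderr) reports Java 1.5 or newer.
import re


def parse_java_version(banner: str) -> tuple[int, int]:
    """Return (major, minor), e.g. 'java version "1.7.0_45"' -> (1, 7)."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', banner)
    if match is None:
        raise ValueError(f"unrecognised java -version output: {banner!r}")
    return int(match.group(1)), int(match.group(2) or 0)


def java_is_recent_enough(banner: str, minimum: tuple[int, int] = (1, 5)) -> bool:
    # Tuple comparison: (1, 7) >= (1, 5); newer schemes such as (11, 0) also pass.
    return parse_java_version(banner) >= minimum
```

To use it on a real machine, pass it subprocess.run(["java", "-version"], capture_output=True, text=True).stderr.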
  • 21. 6.2 Add a dedicated Hadoop system user 6.2.1) Add a group: sudo addgroup hadoop 6.2.2) Create a user and add it to the group: sudo adduser --ingroup hadoop hduser
  • 22. 6.3 Generate an SSH key for the hduser user 6.3.1) Log in as hduser, for example with: sudo su - hduser 6.3.2) Run the key-generation command: ssh-keygen -t rsa -P "" 6.3.3) When asked for the file in which to save the key, just press Enter so that the key is generated in /home/hduser/.ssh 6.3.4) Enable SSH access to your local machine with this newly created key: cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys 6.3.5) Test the SSH setup by connecting to your local machine as hduser: ssh hduser@localhost This also adds localhost permanently to the list of known hosts.
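Running the cat command of step 6.3.4 twice appends a duplicate key. A Python sketch of the same step that is safe to repeat is shown below; the function name is hypothetical, and the permission tightening reflects sshd's default StrictModes check, which ignores key files that other users can write to.

```python
# Sketch of step 6.3.4 in Python: append id_rsa.pub to authorized_keys only if it
# is not already present, then tighten permissions, because sshd's default
# StrictModes setting ignores key files that other users can write to.
from pathlib import Path


def authorize_key(ssh_dir: Path) -> bool:
    """Authorise the local public key for SSH logins; return True if it was added."""
    public_key = (ssh_dir / "id_rsa.pub").read_text().strip()
    auth_file = ssh_dir / "authorized_keys"
    existing = auth_file.read_text() if auth_file.exists() else ""
    added = public_key not in existing.splitlines()
    if added:
        # Start on a fresh line if the file does not already end with a newline.
        prefix = "" if existing == "" or existing.endswith("\n") else "\n"
        with auth_file.open("a") as handle:
            handle.write(prefix + public_key + "\n")
    ssh_dir.chmod(0o700)
    auth_file.chmod(0o600)
    return added
```

As hduser, authorize_key(Path.home() / ".ssh") has the same effect as step 6.3.4 on the first run and does nothing on later runs.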
  • 23. 6.4 Disabling IPv6 6.4.1) Disable IPv6, because Hadoop uses the 0.0.0.0 address in several of its configuration settings, and on Ubuntu this can make Hadoop bind to IPv6 addresses. Run: sudo gedit /etc/sysctl.conf Add the following lines to the end of the file and reboot the machine so that the configuration takes effect: #disable ipv6 net.ipv6.conf.all.disable_ipv6 = 1 net.ipv6.conf.default.disable_ipv6 = 1 net.ipv6.conf.lo.disable_ipv6 = 1
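Before rebooting, it can help to confirm that all three lines made it into the file. The Python check below is a hypothetical sketch: it parses sysctl.conf-style "key = value" lines (skipping # and ; comments) and verifies that each disable_ipv6 key is set to 1.

```python
# Hypothetical check for step 6.4.1: confirm that sysctl.conf text sets all
# three disable_ipv6 keys to 1. After a reboot, the live value can also be
# read from /proc/sys/net/ipv6/conf/all/disable_ipv6 (1 means disabled).
REQUIRED_KEYS = (
    "net.ipv6.conf.all.disable_ipv6",
    "net.ipv6.conf.default.disable_ipv6",
    "net.ipv6.conf.lo.disable_ipv6",
)


def parse_sysctl(text: str) -> dict[str, str]:
    """Parse 'key = value' lines, skipping blanks and # or ; comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")) or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()   # later lines override earlier ones
    return settings


def ipv6_disabled_in(text: str) -> bool:
    settings = parse_sysctl(text)
    return all(settings.get(key) == "1" for key in REQUIRED_KEYS)
```

For example, ipv6_disabled_in(open("/etc/sysctl.conf").read()) should return True once the lines above are in place.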
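  • 24. Install Hadoop 1.2 on Ubuntu Linux 12.04.3 LTS (Hadoop 1.2.1 was released in August 2013; the commands below use 1.2.0). Download and extract Hadoop: Command: wget http://archive.apache.org/dist/hadoop/core/hadoop-1.2.0/hadoop-1.2.0.tar.gz Command: tar -xvf hadoop-1.2.0.tar.gz The following slides refer to the extracted directory as hadoop, so rename it: Command: mv hadoop-1.2.0 hadoop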
  • 25. Edit core-site.xml Command: sudo gedit hadoop/conf/core-site.xml Add inside the <configuration> element: <property> <name>fs.default.name</name> <value>hdfs://localhost:8020</value> </property>
  • 26. Edit hdfs-site.xml Command: sudo gedit hadoop/conf/hdfs-site.xml Add inside the <configuration> element: <property> <name>dfs.replication</name> <value>1</value> </property> <property> <name>dfs.permissions</name> <value>false</value> </property>
  • 27. Edit mapred-site.xml Command: sudo gedit hadoop/conf/mapred-site.xml Add inside the <configuration> element: <property> <name>mapred.job.tracker</name> <value>localhost:8021</value> </property>
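The three site files can also be generated instead of edited by hand. The Python sketch below uses the property names and values from the slides; the helper names and the conf_dir argument (normally hadoop/conf) are assumptions, and write_site_files overwrites the files, so any other properties you added would be lost.

```python
# Sketch: write the three Hadoop 1.x site files from Python instead of editing
# them by hand. Property names and values are the ones on the slides.
import xml.etree.ElementTree as ET
from pathlib import Path

SITE_PROPERTIES = {
    "core-site.xml": {"fs.default.name": "hdfs://localhost:8020"},
    "hdfs-site.xml": {"dfs.replication": "1", "dfs.permissions": "false"},
    "mapred-site.xml": {"mapred.job.tracker": "localhost:8021"},
}


def site_xml(properties: dict[str, str]) -> str:
    """Render properties as a Hadoop <configuration> document."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    ET.indent(root)                       # pretty-print (Python 3.9+)
    return '<?xml version="1.0"?>\n' + ET.tostring(root, encoding="unicode") + "\n"


def write_site_files(conf_dir: Path) -> None:
    """Overwrite core-, hdfs- and mapred-site.xml in `conf_dir`."""
    for filename, properties in SITE_PROPERTIES.items():
        (conf_dir / filename).write_text(site_xml(properties))
```

Calling write_site_files(Path("hadoop/conf")) from the directory that contains hadoop produces the same settings as the three edits above.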
  • 28. Get your IP address: Command: ifconfig Then add it, together with your hostname, to /etc/hosts: Command: sudo gedit /etc/hosts
  • 29. CREATE AN SSH KEY •Command: ssh-keygen -t rsa -P "" •Move the key to the authorized keys: •Command: cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
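  • 30. Configuration •Reboot the system •Add JAVA_HOME to the hadoop-env.sh file: Command: sudo gedit hadoop/conf/hadoop-env.sh Type: export JAVA_HOME=/usr/lib/jvm/java-6-openjdk-i386 (this path is for OpenJDK 6 on 32-bit Ubuntu; if you installed the Oracle JDK in 6.1, use the same path as JAVA_HOME in /etc/profile instead)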
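  • 32. Hadoop Commands Format the NameNode: Command: bin/hadoop namenode -format Start the NameNode and DataNode (this script also starts the Secondary NameNode): Command: bin/start-dfs.sh Start the JobTracker and TaskTracker: Command: bin/start-mapred.sh To check whether Hadoop started correctly: Command: jps (it should list NameNode, DataNode, SecondaryNameNode, JobTracker and TaskTracker)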
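The final jps check can be scripted. The Python helper below is hypothetical: it reads jps output ("<pid> <name>" per line) and reports which of the five Hadoop 1.x daemons of a single-node setup are missing.

```python
# Hypothetical helper for the final check: parse `jps` output ("<pid> <name>"
# per line) and report which Hadoop 1.x single-node daemons are not running.
EXPECTED_DAEMONS = {"NameNode", "DataNode", "SecondaryNameNode", "JobTracker", "TaskTracker"}


def missing_daemons(jps_output: str) -> set[str]:
    running = set()
    for line in jps_output.splitlines():
        parts = line.split()
        if len(parts) >= 2:          # skip blank or malformed lines
            running.add(parts[1])
    return EXPECTED_DAEMONS - running
```

Feed it subprocess.run(["jps"], capture_output=True, text=True).stdout; an empty set means every daemon came up, otherwise check the matching log file under hadoop/logs.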
  • 33. Thank you References: http://bigdatahandler.com/2013/10/24/what-is-apache-hadoop/ edureka.in CONTACT ME @ http://in.linkedin.com/pub/mandakini-kumari/18/93/935 http://www.slideshare.net/mandakinikumari