Hadoop Cluster Setup
A Simple Way with Cloudera Manager
Peng-Yi Lai
Co-graph confidential
Outline
▪ Cloudera Manager: Set Up Your Hadoop Cluster

▪ Flume – Data Collection Tool

Before Starting
▪ Ask yourself what you want!
  ▪ Become an expert who makes Hadoop itself better, or
  ▪ Provide services by using Hadoop

As a Hadoop Expert

▪ Learn Hadoop in as much detail as possible
▪ Examples: companies like Cloudera and MapR
Other Uses of Hadoop
1. Learn how to use Hadoop to solve problems more effectively and efficiently
2. Find the easiest way to make sure your Hadoop cluster works properly

Desired Skills
▪ Network knowledge is imperative
  ▪ Every node in a cluster communicates with the other nodes over the network
  ▪ Even with Cloudera Manager, you still need to handle networking on your own

▪ Linux administration
  ▪ Everyone knows that!

Requirements for Cloudera Manager (1)
▪ Prepare your machines
  ▪ Supported OS versions
    ▪ 64-bit Linux only

  ▪ Supported browsers
    ▪ For the admin console

  ▪ Supported databases
    ▪ Needed only if you use a custom database instead of the embedded PostgreSQL database

  ▪ Supported JDK versions
    ▪ Cloudera Manager installs a JDK for you if none is installed

  ▪ Repositories
    ▪ All hosts must have access to the standard package repositories and the Cloudera Hadoop repositories
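Two of the machine requirements above, a 64-bit Linux OS and an existing JDK, can be checked on each host before running the installer. The Python sketch below is a minimal pre-flight check of our own, not part of Cloudera Manager. Treating `x86_64` as the only accepted architecture is an assumption based on the slide's "64-bit Linux"; check Cloudera's support matrix for your release.

```python
import platform
import shutil


def is_supported_platform(system: str, machine: str) -> bool:
    """Return True for 64-bit Linux, the only OS family the slide lists."""
    # Assumption: only x86_64 counts as "64-bit Linux" here.
    return system == "Linux" and machine == "x86_64"


def preflight() -> dict:
    """Collect host facts relevant to the machine requirements."""
    system, machine = platform.system(), platform.machine()
    return {
        "platform_ok": is_supported_platform(system, machine),
        # A missing java binary is not fatal: Cloudera Manager can install a JDK.
        "java_on_path": shutil.which("java") is not None,
    }


if __name__ == "__main__":
    for check, ok in preflight().items():
        print(f"{check}: {'OK' if ok else 'NOT OK'}")
```

Run it on every host that will join the cluster; a `NOT OK` for `java_on_path` is informational only.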

Requirements for Cloudera Manager (2)
▪ Networking and security
  ▪ Configure DNS or /etc/hosts properly
    ▪ Every host must be able to resolve every other host's name

  ▪ SSH access to all cluster machines as root or as a user with password-less sudo permission
  ▪ No blocking by iptables or firewalls
    ▪ Port 7180 is used to access Cloudera Manager

  ▪ No blocking by Security-Enhanced Linux (SELinux)
    ▪ Disable it

▪ More details are available on cloudera.com
  ▪ If you run into a problem, don't be ashamed to google it!
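Several of these networking requirements can be spot-checked with a short Python sketch like the one below. It assumes name resolution goes through the system resolver (DNS or /etc/hosts), the admin console listens on the default port 7180, and SELinux is configured in /etc/selinux/config. It does not check SSH or sudo access. The host names in the `__main__` block are hypothetical.

```python
import os
import socket


def resolves(hostname: str) -> bool:
    """Return True if the system resolver (DNS or /etc/hosts) knows `hostname`."""
    try:
        socket.gethostbyname(hostname)
        return True
    except (OSError, UnicodeError):
        return False


def port_open(host: str, port: int = 7180, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds (not blocked)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def selinux_disabled(config_text: str) -> bool:
    """Check the configured SELINUX= value in /etc/selinux/config text.

    This is the configured value, which is applied at the next boot.
    """
    for line in config_text.splitlines():
        line = line.strip()
        if line.startswith("#") or "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        if key == "SELINUX":
            return value.lower() == "disabled"
    return False  # no SELINUX= line: do not assume it is disabled


if __name__ == "__main__":
    cluster_hosts = ["node1.example.com", "node2.example.com"]  # hypothetical
    cm_server = "node1.example.com"  # hypothetical Cloudera Manager server
    for host in cluster_hosts:
        print(f"{host}: {'resolves' if resolves(host) else 'does NOT resolve'}")
    state = "open" if port_open(cm_server) else "closed or blocked"
    print(f"port 7180 on {cm_server}: {state}")
    if os.path.exists("/etc/selinux/config"):
        with open("/etc/selinux/config") as f:
            print("SELinux disabled:", selinux_disabled(f.read()))
```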
Set Up a Hadoop Cluster
▪ Once all requirements are met, download cloudera-manager-installer.bin from the Cloudera Downloads page
▪ Make the installer executable and run it
▪ Log in to the admin console at http://<Server host>:7180
▪ Follow the steps in the Cloudera Manager wizard
▪ Done!
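The "make it executable and run it" step amounts to `chmod u+x` followed by running the installer as root. The Python sketch below wraps those two steps. It assumes the downloaded file is named `cloudera-manager-installer.bin`, that it is in the current directory, and that `sudo` is available. The server host name is a placeholder.

```python
import os
import stat
import subprocess

# Assumed name of the file downloaded from the Cloudera Downloads page.
INSTALLER = "cloudera-manager-installer.bin"


def make_executable(path: str) -> None:
    """Same effect as `chmod u+x path`: add owner-execute, keep other bits."""
    mode = os.stat(path).st_mode
    os.chmod(path, stat.S_IMODE(mode) | stat.S_IXUSR)


def admin_console_url(server_host: str, port: int = 7180) -> str:
    """Build the admin console address http://<Server host>:7180."""
    return f"http://{server_host}:{port}"


def run_installer(path: str = INSTALLER) -> int:
    """Make the installer executable, then run it as root through sudo.

    The installer is interactive, so this call blocks until it finishes.
    """
    make_executable(path)
    return subprocess.call(["sudo", os.path.abspath(path)])


if __name__ == "__main__":
    exit_code = run_installer()
    print("installer exit code:", exit_code)
    # Placeholder for your Cloudera Manager server host.
    print("next, log in at", admin_console_url("cm-server.example.com"))
```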

Screenshots of the installation wizard and admin console:
▪ Cloudera Manager Login
▪ Specify Hosts
▪ Hosts Found
▪ Waiting for Installation
▪ Home
▪ Actions of Services
▪ HDFS Service
▪ Configuration of HDFS
▪ Selected Services
▪ Services to Add
▪ All Hosts
▪ Information of a Host
More about Cloudera Manager
▪ Easy to upgrade your CDH version
▪ Easy to add or remove hosts and clusters
▪ Easy to configure High Availability (HA)
▪ Supports Hadoop security using Kerberos
▪ Supports backup and disaster recovery

For Developers
▪ Use Hue (a topic for another time)

Observation

Flume
A Data Collection Tool
Two Ways to Use Flume
Independent of a Hadoop cluster
• Flume can run entirely on its own
• Configure flume.conf in /etc/flume-ng/conf

On a Hadoop cluster or a node managed by Cloudera Manager
• Easy to keep the agent nodes under control
• Start, stop, and restart the service from the admin console
• Configure Flume in the admin console
• Easy to check log files
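When Flume runs independently of Cloudera Manager, an agent is typically started with the `flume-ng agent` command. The command points at the configuration directory, the properties file, and the agent name defined in that file. The sketch below only builds that command line; it does not start anything. The agent name `a1` is hypothetical. With Cloudera Manager, the admin console starts and stops the agent instead.

```python
import shlex
from typing import List

CONF_DIR = "/etc/flume-ng/conf"  # directory named on the slide


def flume_agent_command(agent_name: str,
                        conf_dir: str = CONF_DIR,
                        conf_file: str = "flume.conf") -> List[str]:
    """Build a `flume-ng agent` command line for a standalone agent."""
    return [
        "flume-ng", "agent",
        "--conf", conf_dir,                        # Flume environment/logging settings
        "--conf-file", f"{conf_dir}/{conf_file}",  # the agent's properties file
        "--name", agent_name,                      # must match the key prefix in flume.conf
    ]


if __name__ == "__main__":
    print(shlex.join(flume_agent_command("a1")))  # "a1" is hypothetical
```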

3 Important Settings
Source
• Defines which kinds of events sent by an external source to accept
Channel
• Defines how an event is kept until it is consumed by a Flume sink
Sink
• Defines where the events held in the channel are put or forwarded, such as HDFS or another Flume agent
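In flume.conf, the three settings are wired together per agent. Each component is declared and given a `type`, then connected: a source lists the channel(s) it writes to, and a sink names the one channel it drains. The Python sketch below renders such a file from a dictionary. The agent and component names (`a1`, `r1`, `c1`, `k1`) and port 44444 are illustrative assumptions, modeled on the netcat/memory/logger example in the Flume User Guide.

```python
from typing import Dict


def render_flume_conf(agent: str, props: Dict[str, str]) -> str:
    """Prefix each key with the agent name and emit Java-properties lines."""
    return "\n".join(f"{agent}.{k} = {v}" for k, v in props.items()) + "\n"


# Hypothetical agent "a1": netcat source -> memory channel -> logger sink.
minimal = {
    # Declare the three components.
    "sources": "r1",
    "channels": "c1",
    "sinks": "k1",
    # Source: accept newline-terminated text events on a TCP port.
    "sources.r1.type": "netcat",
    "sources.r1.bind": "localhost",
    "sources.r1.port": "44444",
    # Channel: buffer events in memory until the sink takes them.
    "channels.c1.type": "memory",
    "channels.c1.capacity": "1000",
    "channels.c1.transactionCapacity": "100",
    # Sink: write events to the agent's log (handy for testing).
    "sinks.k1.type": "logger",
    # Wiring: a source may feed several channels (plural key);
    # a sink drains exactly one channel (singular key).
    "sources.r1.channels": "c1",
    "sinks.k1.channel": "c1",
}

if __name__ == "__main__":
    print(render_flume_conf("a1", minimal), end="")
```

Writing the output to /etc/flume-ng/conf/flume.conf gives a standalone agent named `a1` to start.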

Type Example
▪ Source
  ▪ Avro Source
  ▪ Exec Source
  ▪ JMS Source
  ▪ NetCat Source
  ▪ Syslog TCP Source
  ▪ Syslog UDP Source
  ▪ HTTP Source
  ▪ Thrift Legacy Source
  ▪ …etc

▪ Channel
  ▪ Memory Channel
  ▪ JDBC Channel
  ▪ File Channel
  ▪ Pseudo Transaction Channel
  ▪ Custom Channel

▪ Sink
  ▪ HDFS Sink
  ▪ Logger Sink
  ▪ Avro Sink
  ▪ Thrift Sink
  ▪ IRC Sink
  ▪ File Roll Sink
  ▪ HBaseSink
  ▪ …etc
Example of Setting
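A more realistic setting combines types from the list above: an Exec Source tailing a log file, a File Channel that persists events on disk, and an HDFS Sink. The sketch below uses a dictionary-to-properties approach. The log file, local directories, and HDFS path are hypothetical. `hdfs.useLocalTimeStamp` is set because Exec Source events carry no timestamp header for the `%Y-%m-%d` escapes in the path.

```python
from typing import Dict


def render_flume_conf(agent: str, props: Dict[str, str]) -> str:
    """Prefix each key with the agent name and emit Java-properties lines."""
    return "\n".join(f"{agent}.{k} = {v}" for k, v in props.items()) + "\n"


hdfs_example = {
    "sources": "r1",
    "channels": "c1",
    "sinks": "k1",
    # Exec Source: run a command and turn each output line into an event.
    "sources.r1.type": "exec",
    "sources.r1.command": "tail -F /var/log/messages",  # hypothetical log file
    "sources.r1.channels": "c1",
    # File Channel: keep events on local disk so they survive an agent restart.
    "channels.c1.type": "file",
    "channels.c1.checkpointDir": "/var/lib/flume-ng/checkpoint",  # hypothetical
    "channels.c1.dataDirs": "/var/lib/flume-ng/data",             # hypothetical
    # HDFS Sink: write events into one directory per day.
    "sinks.k1.type": "hdfs",
    "sinks.k1.channel": "c1",
    "sinks.k1.hdfs.path": "/flume/events/%Y-%m-%d",  # hypothetical HDFS path
    "sinks.k1.hdfs.fileType": "DataStream",          # plain text, not SequenceFile
    "sinks.k1.hdfs.useLocalTimeStamp": "true",       # fill %Y-%m-%d from local time
}

if __name__ == "__main__":
    print(render_flume_conf("a1", hdfs_example), end="")  # "a1" is hypothetical
```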

Use Cloudera Manager

WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & Application
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 

Setup Hadoop Cluster with Cloudera Manager

  • 1. Hadoop Cluster Setup: A Simple Way by Cloudera Manager
    Peng-Yi Lai
    Co-graph confidential
  • 2. Outline
    ▪ Cloudera Manager – Set Up Your Hadoop
    ▪ Flume – Data Collection Tool
  • 3. Before Starting
    ▪ Ask yourself what you want:
      ▪ To become an expert who makes Hadoop itself better, or
      ▪ To provide services by using Hadoop
  • 4. As a Hadoop Expert
    ▪ You had better know Hadoop in as much detail as possible
    ▪ This is the path of companies like Cloudera and MapR
  • 5. Other Uses of Hadoop
    1. Learn how to use Hadoop to solve problems more effectively and efficiently
    2. Find the easiest way to make sure your Hadoop cluster works properly
  • 6. Desired Skills
    ▪ Network knowledge is imperative
      ▪ Every node in a cluster communicates with the others over the network
      ▪ Even with Cloudera Manager, you still need to handle networking on your own
    ▪ Linux administration
      ▪ Everyone knows that!
  • 7. Requirements for Cloudera Manager (1)
    ▪ Prepare your machines
      ▪ Supported OS version
        ▪ Only 64-bit Linux-based systems
      ▪ Supported browsers
        ▪ For the admin console
      ▪ Supported database
        ▪ Only relevant if you use a custom database instead of the embedded PostgreSQL database
      ▪ Supported JDK version
        ▪ Cloudera Manager installs a JDK for you if none is installed
      ▪ Repositories
        ▪ All hosts must have access to the standard package repositories and the Cloudera Hadoop repositories
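Before installing, it helps to confirm on each machine that it really is 64-bit Linux. The following is a minimal Python sketch using only the standard `platform` module; the set of accepted architecture names is an assumption (x86_64 only), and whether a particular distribution and version is supported must still be checked against Cloudera's own list.

```python
import platform

# Assumption: treat only x86_64 hardware as "64-bit" for this check.
SIXTY_FOUR_BIT_MACHINES = {"x86_64"}


def is_supported_platform(system=None, machine=None):
    """Return True if the host reports a 64-bit Linux system.

    The optional arguments allow testing; by default the values come from
    platform.system() and platform.machine() on the current host.
    """
    system = system if system is not None else platform.system()
    machine = machine if machine is not None else platform.machine()
    return system == "Linux" and machine.lower() in SIXTY_FOUR_BIT_MACHINES


if __name__ == "__main__":
    print("64-bit Linux:", is_supported_platform())
```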
  • 8. Requirements for Cloudera Manager (2)
    ▪ Networking and security
      ▪ Properly configure DNS or /etc/hosts
        ▪ Every host should know who's who
      ▪ Use the root account or an account with password-less sudo permission, with SSH access to all cluster machines
      ▪ No blocking by iptables or firewalls
        ▪ Port 7180 is used to access Cloudera Manager
      ▪ No blocking by Security-Enhanced Linux (SELinux)
        ▪ Disable it
    ▪ There are more details on cloudera.com
      ▪ If there is a problem, don't be ashamed to google it!
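Two of these requirements, name resolution ("who's who") and an unblocked port 7180, can be checked from any host with a short script. A minimal Python sketch using only the standard `socket` module; the helper names are hypothetical, and a successful check here does not prove that SSH, sudo, or SELinux are configured correctly.

```python
import socket


def resolves(hostname):
    """Return the IPv4 address of hostname via DNS or /etc/hosts, or None if it fails."""
    try:
        return socket.gethostbyname(hostname)
    except (OSError, UnicodeError):
        return None


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds (nothing blocks it)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_cluster(hosts, cm_host, cm_port=7180):
    """Report cluster hosts that do not resolve and whether the CM port is reachable."""
    unresolved = [h for h in hosts if resolves(h) is None]
    return {"unresolved": unresolved, "cm_reachable": port_open(cm_host, cm_port)}
```

For example, `check_cluster(["node1.example.com", "node2.example.com"], "cm.example.com")` (hypothetical host names) should return an empty `unresolved` list and `cm_reachable: True` once DNS and the firewall are set up; run it on every node, since each node must be able to resolve all the others.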
  • 9. Set Up a Hadoop Cluster
    ▪ When everything above is ready, download cloudera-manager-installer.bin from the Cloudera Downloads page
    ▪ Make the installer executable and run it
    ▪ Log in to the admin console at http://<Server host>:7180
    ▪ Follow the steps in Cloudera Manager
    ▪ Done!
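The "change the permission and install" step is normally just `chmod u+x` followed by running the installer as root. A minimal Python sketch of the same steps, assuming the installer was saved as `cloudera-manager-installer.bin` in the current directory and the script is run as root; the helper names are hypothetical.

```python
import os
import stat
import subprocess


def make_executable(path):
    """Add the owner execute bit, equivalent to `chmod u+x path`."""
    mode = os.stat(path).st_mode
    os.chmod(path, mode | stat.S_IXUSR)


def admin_console_url(server_host, port=7180):
    """URL of the Cloudera Manager admin console on the given server host."""
    return f"http://{server_host}:{port}"


def run_installer(path="./cloudera-manager-installer.bin"):
    """Make the installer executable and run it (must be run as root)."""
    make_executable(path)
    subprocess.run([path], check=True)
```

After the installer finishes, open `admin_console_url("<Server host>")` in a supported browser and continue with the wizard.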
  • 19. Services to Add
  • 21. Information of a Host
  • 22. More about Cloudera Manager
    ▪ Easy to upgrade your CDH version
    ▪ Easy to add or delete a host or a cluster
    ▪ Easy to configure High Availability (HA)
    ▪ Supports Hadoop security by using Kerberos
    ▪ Supports backup and disaster recovery
  • 23. For Developers
    ▪ Use Hue (another topic)
  • 25. Flume: A Data Collection Tool
  • 26. Two Ways to Use Flume
    ▪ Independent of a Hadoop cluster
      ▪ Flume can run entirely on its own
      ▪ Configure flume.conf in /etc/flume-ng/conf
    ▪ On a Hadoop cluster, or on a node managed by Cloudera Manager
      ▪ Easy to keep the agent nodes under control
      ▪ Start, stop, and restart the service from the admin console
      ▪ Configure Flume from the admin console
      ▪ Convenient to check log files
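When Flume runs on its own, flume.conf is a Java properties file in which every key is prefixed with the agent name. A minimal Python sketch that renders such a file from dictionaries; the agent name `a1`, the component names `r1`/`c1`/`k1`, and the netcat-to-logger setup are illustrative assumptions modelled on the Flume user guide's introductory example. The agent name must match the name the agent is started with (for example `flume-ng agent --name a1 ...`).

```python
def render_flume_conf(agent, sources, channels, sinks):
    """Render a flume.conf in Flume NG's properties format.

    Each argument maps a component name to its properties. A source lists its
    channel(s) under "channels"; a sink names its single channel under "channel".
    """
    lines = [
        f"{agent}.sources = {' '.join(sources)}",
        f"{agent}.channels = {' '.join(channels)}",
        f"{agent}.sinks = {' '.join(sinks)}",
    ]
    for kind, components in (("sources", sources), ("channels", channels), ("sinks", sinks)):
        for name, props in components.items():
            for key, value in props.items():
                lines.append(f"{agent}.{kind}.{name}.{key} = {value}")
    return "\n".join(lines) + "\n"


# Illustrative agent: a netcat source feeding a memory channel drained by a logger sink.
conf = render_flume_conf(
    "a1",
    sources={"r1": {"type": "netcat", "bind": "localhost", "port": 44444, "channels": "c1"}},
    channels={"c1": {"type": "memory", "capacity": 1000, "transactionCapacity": 100}},
    sinks={"k1": {"type": "logger", "channel": "c1"}},
)
print(conf)
```

When Flume is managed by Cloudera Manager, the same properties text is entered in the admin console instead of being written to /etc/flume-ng/conf by hand.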
  • 27. Three Important Settings
    ▪ Source
      ▪ Defines which kinds of events sent by an external source are accepted
    ▪ Channel
      ▪ Defines how an event is kept until it is consumed by a Flume sink
    ▪ Sink
      ▪ Defines the repository (such as HDFS) or the next Flume agent to which the event kept in the channel is put or forwarded
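The division of labour between source, channel, and sink can be illustrated with a toy model. This is a conceptual Python sketch only, not Flume's API: the class and function names are hypothetical, a bounded in-memory queue stands in for a memory channel, and a plain list stands in for HDFS.

```python
from collections import deque


class MemoryChannel:
    """Toy channel: holds events until a sink takes them."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.events = deque()

    def put(self, event):
        # This toy rejects new events when it is full.
        if len(self.events) >= self.capacity:
            raise OverflowError("channel full")
        self.events.append(event)

    def take(self):
        """Remove and return the oldest event, or None if the channel is empty."""
        return self.events.popleft() if self.events else None


def source(lines, channel):
    """Toy source: accepts external events (here, text lines) into the channel."""
    for line in lines:
        channel.put({"body": line})


def sink(channel, store):
    """Toy sink: drains events from the channel into a store (standing in for HDFS)."""
    while (event := channel.take()) is not None:
        store.append(event["body"])
```

The point of the channel is decoupling: the source can keep accepting events while the sink is slow or its destination is temporarily unavailable, up to the channel's capacity.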
  • 28. Type Examples
    ▪ Source
      ▪ Avro Source
      ▪ Exec Source
      ▪ JMS Source
      ▪ NetCat Source
      ▪ Syslog TCP Source
      ▪ Syslog UDP Source
      ▪ HTTP Source
      ▪ Thrift Legacy Source
      ▪ …etc.
    ▪ Channel
      ▪ Memory Channel
      ▪ JDBC Channel
      ▪ File Channel
      ▪ Pseudo Transaction Channel
      ▪ Custom Channel
    ▪ Sink
      ▪ HDFS Sink
      ▪ Logger Sink
      ▪ Avro Sink
      ▪ Thrift Sink
      ▪ IRC Sink
      ▪ File Roll Sink
      ▪ HBaseSink
      ▪ …etc.
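Whatever types are chosen, the components are wired together by name, and the Flume user guide states that a source may write to several channels while a sink reads from exactly one. A hypothetical Python helper (not part of Flume) that checks these wiring rules on an agent described as dictionaries, in the same shape as a flume.conf's per-component properties:

```python
def check_wiring(sources, channels, sinks):
    """Return a list of wiring problems in a Flume-style agent definition.

    Assumes each source names one or more channels in a space-separated
    "channels" property and each sink names one channel in "channel".
    """
    problems = []
    for name, props in sources.items():
        listed = props.get("channels", "").split()
        if not listed:
            problems.append(f"source {name} has no channels")
        for ch in listed:
            if ch not in channels:
                problems.append(f"source {name} uses undefined channel {ch}")
    for name, props in sinks.items():
        ch = props.get("channel")
        if ch is None:
            problems.append(f"sink {name} has no channel")
        elif len(ch.split()) != 1:
            problems.append(f"sink {name} must name exactly one channel")
        elif ch not in channels:
            problems.append(f"sink {name} uses undefined channel {ch}")
    return problems
```

An empty list means every source and sink points at a defined channel; it says nothing about whether the chosen types and their other properties are valid.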