How do you set up a cluster? Three ways.
1. The HMC installer uses Puppet to set up a set of machines, driven by files listing the hostnames of the machines you want in specific roles. It doesn't assume you have Java installed; it installs the tested Java versions (64-bit for masters, 32-bit for workers), brings up the entire cluster, smoke-tests it, and leaves you with a web management console backed by Ganglia and Nagios. This is the easy way to set up an entire cluster.
2. Install the RPMs directly from the HWX repository using yum, with "yum upgrade" for upgrades.
3. Go further with Kickstart and create your own OS images on demand.
One thing to consider is that the platforms tested on look "dated": why not RHEL 6.3 + Java 7? The answer is experience with stability problems on the Y! clusters: stick to a JVM version that is trusted to be stable, on a mature OS.
HCatalog is a table and storage abstraction system that makes it easy for multiple tools to interact with the same underlying data. It is similar to a schema in the RDBMS world, except that it is more than just the SQL layer. A common buzzword in the NoSQL world today is polyglot persistence; basically, that comes down to picking the right tool for the job. In the Hadoop ecosystem you have many tools that might be used for data processing: you might use Pig or Hive, or your own custom MapReduce program, or that shiny new GUI-based tool that's just come out. Which one to use might depend on the user, the type of query you're interested in, or the type of job you want to run. From another perspective, you might want to store your data in columnar form for efficient storage and retrieval under particular query types, or in text so that users can write data producers in scripting languages like Perl or Python, or you may want to hook up an HBase table as a data source. As an end-user, I want to use whatever data processing tool is available to me. As a data designer, I want to optimize how data is stored. As a cluster manager/data architect, I want the ability to share pieces of information across the board, and to move data back and forth fluidly. HCatalog's promise is to make all of the above possible.
Picking out one new feature in Hadoop 1.0.3: webhdfs is interesting. Set one config option and the DataNodes and NameNodes become web servers (using the chosen auth mechanism), offering read and write access to the data. This is integral to the cluster: you ask the NN for data, which triggers a 307 redirect to a DN holding that data, which serves it up. The redirect is handled transparently by any HTTP client set up to follow redirects.
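In Hadoop 1.x, the one config option in question is `dfs.webhdfs.enabled` in `hdfs-site.xml`:

```xml
<!-- hdfs-site.xml: turn the NN and DNs into WebHDFS endpoints -->
<property>
  <name>dfs.webhdfs.enabled</name>
  <value>true</value>
</property>
```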
Up until now, a change in the internal Hadoop version caused protocol version mismatch problems with all remote clients. Those clients also needed the entire Hadoop JAR set on their classpath, and were Java-only. Now: stable REST APIs, usable from any language, with no client-side Hadoop JARs.
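The read path can be sketched with plain HTTP and no Hadoop JARs. The hostname, port, and file path below are illustrative placeholders; the `/webhdfs/v1` URL scheme and the `op=OPEN` operation are part of the WebHDFS REST API, while the extra query parameter in the helper is just an example.

```python
# Minimal sketch of the WebHDFS read path. Host, port, and path are
# placeholder values, not real cluster details.

def webhdfs_url(host, port, path, op, **params):
    """Build a WebHDFS v1 REST URL for the given operation on a file path."""
    query = "&".join(["op=" + op] + [f"{k}={v}" for k, v in sorted(params.items())])
    return f"http://{host}:{port}/webhdfs/v1{path}?{query}"

url = webhdfs_url("namenode.example.com", 50070, "/user/alice/data.txt", "OPEN")

# Any redirect-following HTTP client works; e.g. with the standard library:
#   import urllib.request
#   data = urllib.request.urlopen(url).read()  # follows the NN's 307 to a DN
print(url)
```

The point of the design shows up in the commented-out fetch: the client never needs to know which DN holds the blocks, because the NN's 307 redirect carries that information and every mainstream HTTP library follows it automatically.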
This is something still coming together: HA clustering, using VMware vSphere as the HA clustering system underneath the classic failure points: the NameNode of HDFS and the JobTracker of MapReduce. Monitoring agents report failures to vSphere and trigger failover on process crash or hang, VM crash or hang, and physical hardware failure. This lets you host a set of independent VMs, one per master server, each with an isolated lifecycle and management. Very good for ops tasks: snapshotting, updating software in an offline VM, etc. It does not require that the workers be virtual; they can be physical, virtual, or a mix of both.