Hadoop in three use cases
 


    Presentation Transcript

    • Hadoop in Three Use Cases
      • 2 December 2011
      • Joey Echeverria | Solutions Architect | joey@cloudera.com | @fwiffo
    • About Joey
      • Solutions Architect
      • 6 months
      • 3+ years
      • Local
      ©2011 Cloudera, Inc. All Rights Reserved.
    • Cloudera’s Distribution including Apache Hadoop
      • File System Mount: FUSE-DFS
      • UI Framework: HUE
      • SDK: HUE SDK
      • Workflow: Apache Oozie*
      • Scheduling: Apache Oozie*
      • Metadata: Apache Hive
      • Languages / Compilers: Apache Pig, Apache Hive
      • Fast Read/Write Access: Apache HBase
      • Data Integration: Apache Flume*, Apache Sqoop*
      • Coordination: Apache ZooKeeper
      * Currently under incubation in the Apache Software Foundation
    • Extract, Transform, and Load
    • ETL before Hadoop: difficult to maintain, not scalable. [Diagram: Relational Databases, Logs, Files, Custom ETL Scripts, Enterprise Data Warehouse]
    • ETL before Hadoop: may be scalable, expensive. [Diagram: Relational Databases, Logs, Files, Enterprise Data Warehouse (SQL: raw table → warehouse tables)]
    • ETL with Hadoop: managed, flexible, scalable. [Diagram: Relational Databases, Logs, Files, Enterprise Data Warehouse]
    • Steps
      1. In
      2. Process
      3. Out
    • Flume
    • ETL with Hadoop: managed, flexible, scalable. [Diagram: Relational Databases, Logs, Files, Flume, Enterprise Data Warehouse]
    • HDFS
    • HDFS. [Diagram: the Client calls open(“file.txt”) on the NameNode, which answers with 02, 06, 10; the Client then receives data from DataNodes 02, 06, and 10. Twelve DataNodes, numbered 01 to 12, are shown.]
    • HDFS
      • Distributed
      • Replication
      • Bulk I/O
      • Fault tolerant
      • Scalable
      • Append only
      • Not POSIX
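The read path in the HDFS diagram can be sketched as a toy model in Python. This is one reading of the slide, not the real HDFS client API: the class names, the `block_map` attribute, and the file contents are all made up for illustration, and replication is left out.

```python
# Toy model of the read path on the slide; not the real HDFS client API.
class NameNode:
    """Holds metadata only: which DataNodes store the blocks of each file."""
    def __init__(self):
        self.block_map = {}  # file name -> ordered list of DataNode ids

    def open(self, name):
        # Returns block locations; no file data flows through the NameNode.
        return self.block_map[name]


class DataNode:
    """Stores the actual block bytes."""
    def __init__(self):
        self.blocks = {}  # file name -> bytes of the block held on this node


def read_file(name, namenode, datanodes):
    # Step 1: ask the NameNode where the blocks live.
    # Step 2: fetch each block directly from its DataNode, in order.
    locations = namenode.open(name)
    return b"".join(datanodes[node_id].blocks[name] for node_id in locations)


# Twelve DataNodes as on the slide; file.txt lives on nodes 2, 6, and 10.
datanodes = {node_id: DataNode() for node_id in range(1, 13)}
namenode = NameNode()
namenode.block_map["file.txt"] = [2, 6, 10]
for node_id, chunk in zip([2, 6, 10], [b"hello ", b"hadoop ", b"world"]):
    datanodes[node_id].blocks["file.txt"] = chunk

contents = read_file("file.txt", namenode, datanodes)
```

The point of the split is that the NameNode only answers the lookup, so bulk data moves between the client and the DataNodes.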
    • ETL with Hadoop: managed, flexible, scalable. [Diagram: Relational Databases, Logs, Files, Flume, HDFS, Enterprise Data Warehouse]
    • FUSE-DFS
    • FUSE-DFS
      • FUSE
        – User space
        – File systems
      • FUSE-DFS
        – /hdfs
        – Mostly transparent
    • ETL with Hadoop: managed, flexible, scalable. [Diagram: Relational Databases, Logs, Files, Flume, FUSE-DFS, HDFS, Enterprise Data Warehouse]
    • Sqoop
    • Sqoop
      • SQL to Hadoop
      • Parallel import
      • File formats
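The idea behind "parallel import" can be sketched in Python as splitting a numeric key range into contiguous slices, one per parallel task. This only illustrates the concept; Sqoop's own splitting logic may differ, and the `orders` table, the `id` column, and the `split_ranges` helper are hypothetical.

```python
def split_ranges(min_id, max_id, num_tasks):
    """Divide the inclusive key range [min_id, max_id] into contiguous slices."""
    total = max_id - min_id + 1
    base, extra = divmod(total, num_tasks)
    ranges = []
    start = min_id
    for i in range(num_tasks):
        # The first `extra` slices take one additional key each.
        size = base + (1 if i < extra else 0)
        if size == 0:
            continue  # more tasks than keys: skip empty slices
        ranges.append((start, start + size - 1))
        start += size
    return ranges


# Each slice becomes an independent query that one task could run.
slices = split_ranges(1, 10, 4)
queries = [
    f"SELECT * FROM orders WHERE id BETWEEN {lo} AND {hi}"
    for lo, hi in slices
]
```

Because the slices do not overlap, the tasks need no coordination while they copy rows.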
    • ETL with Hadoop: managed, flexible, scalable. [Diagram: Relational Databases, Logs, Files, Sqoop, Flume, FUSE-DFS, HDFS, Enterprise Data Warehouse]
    • Pig
    • Pig
      • Scripting language
      • Generates MapReduce jobs
      • Perl for Hadoop
      • Great for ETL

        -- Load three integer columns from the input
        A = LOAD 'data' USING PigStorage() AS (f1:int, f2:int, f3:int);
        -- Group rows by the first column
        B = GROUP A BY f1;
        -- Count the rows in each group
        C = FOREACH B GENERATE COUNT(A);
        DUMP C;
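For readers who do not know Pig Latin, the grouping and counting in the script on this slide corresponds roughly to the Python below. It is only a single-process analogy with made-up sample rows; Pig would compile the same logic into MapReduce jobs that run across the cluster.

```python
from collections import Counter

# Made-up sample rows with three integer fields (f1, f2, f3).
rows = [(1, 2, 3), (4, 2, 1), (8, 3, 4), (4, 3, 3), (7, 2, 5), (8, 4, 3)]

# GROUP ... BY f1 followed by a count: tally how many rows share each f1 value.
counts = Counter(f1 for f1, _f2, _f3 in rows)
```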
    • ETL with Hadoop: managed, flexible, scalable. [Diagram: Relational Databases, Logs, Files, Sqoop, Flume, FUSE-DFS, HDFS, Pig, Enterprise Data Warehouse]
    • Sqoop with connectors
    • Sqoop with connectors
      • MySQL*
      • PostgreSQL*
      • Teradata*
      • Netezza*
      • Oracle*
      • Couchbase*
      • Microsoft SQL Server
      • VoltDB
      * Cloudera certified connector
    • ETL with Hadoop: managed, flexible, scalable. [Diagram: Relational Databases, Logs, Files, Sqoop (shown twice), Flume, FUSE-DFS, HDFS, Pig, Enterprise Data Warehouse]
    • Recommendations
    • Recommendations with Hadoop. [Diagram: Customers, Web Application, Relational Databases, Logs]
    • Flume
    • Recommendations with Hadoop. [Diagram: Customers, Web Application, Relational Databases, Logs, Flume]
    • HDFS
    • Recommendations with Hadoop. [Diagram: Customers, Web Application, Relational Databases, Logs, Flume, HDFS]
    • Sqoop
    • Recommendations with Hadoop. [Diagram: Customers, Web Application, Relational Databases, Logs, Sqoop, Flume, HDFS]
    • Pig
    • Recommendations with Hadoop. [Diagram: Customers, Web Application, Relational Databases, Logs, Sqoop, Flume, HDFS, Pig]
    • Mahout
    • Mahout
      • Scalable machine learning algorithms
        – Collaborative Filtering
        – User and Item based recommenders
        – K-Means, Fuzzy K-Means clustering
        – Mean Shift clustering
        – Singular value decomposition
        – Complementary Naive Bayes classifier
        …
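To give a feel for what an item-based recommender does, here is a minimal co-occurrence sketch in Python. It is not Mahout's API or its algorithm; the `history` data and both helper functions are invented for illustration, and a real deployment would use proper similarity measures and run distributed.

```python
from collections import defaultdict
from itertools import permutations

# Hypothetical purchase history: user -> set of items.
history = {
    "alice": {"book", "lamp"},
    "bob": {"book", "lamp", "desk"},
    "carol": {"book", "desk"},
}


def cooccurrence(history):
    """Count how often each ordered pair of items appears for the same user."""
    counts = defaultdict(lambda: defaultdict(int))
    for items in history.values():
        for a, b in permutations(items, 2):
            counts[a][b] += 1
    return counts


def recommend(user, history, counts):
    """Rank items the user has not seen by co-occurrence with their items."""
    seen = history[user]
    scores = defaultdict(int)
    for item in seen:
        for other, n in counts[item].items():
            if other not in seen:
                scores[other] += n
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda item: (-scores[item], item))


counts = cooccurrence(history)
alice_recs = recommend("alice", history, counts)
```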
    • Recommendations with Hadoop. [Diagram: Customers, Web Application, Relational Databases, Logs, Sqoop, Flume, HDFS, Pig, Mahout]
    • MapReduce
    • MapReduce. [Diagram: three stages, map, shuffle, and reduce. Each map task applies toOne() and emits key:1 pairs; the shuffle collects the values per key into lists ([1,1,1,1], [1,1], [1,1], [1]); each reduce task applies count() and emits the totals (4, 2, 2, 1).]
    • MapReduce
      • Distributed
      • Code to data
      • Reliable
      • Scalable
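The map, shuffle, and reduce stages from the diagram can be imitated in a few lines of Python. This is a single-process sketch, not Hadoop code: `to_one` and `count` mirror the toOne() and count() labels on the slide, while the input keys and the `shuffle` helper are assumptions.

```python
from collections import defaultdict


def to_one(record):
    """Map step: emit a (key, 1) pair for every input record."""
    return (record, 1)


def shuffle(pairs):
    """Shuffle step: gather all values that share a key into one list."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def count(key, values):
    """Reduce step: collapse the list of ones into a total per key."""
    return (key, sum(values))


# Three hypothetical input splits, one per map task.
splits = [["a", "a", "b"], ["a", "c", "b"], ["a", "c", "d"]]

mapped = [to_one(record) for split in splits for record in split]
grouped = shuffle(mapped)
totals = dict(count(key, values) for key, values in grouped.items())
```

On a cluster the map calls run where the input blocks are stored ("code to data"), and the shuffle moves only the intermediate pairs across the network.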
    • Recommendations with Hadoop. [Diagram: Customers, Web Application, Relational Databases, Logs, Sqoop, Flume, HDFS, Pig (shown twice), Mahout, MapReduce]
    • Oozie
    • Oozie
      • Workflows
      • Coordinator
        – Triggers
    • Recommendations with Hadoop. [Diagram: Customers, Web Application, Relational Databases, Logs, Sqoop, Flume, HDFS, Oozie, Pig (shown twice), Mahout, MapReduce]
    • HBase
    • HBase
      • Key/value store
      • Data stored in HDFS
      • Access model is get/put/del
        – Plus range scans and versions
      • Random reads and writes for Hadoop
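The access model on this slide (get/put/del, range scans, versions) can be mimicked with a small in-memory table in Python. This is a toy, not the HBase client API: the `ToyTable` class, its method signatures, and the `max_versions` setting are assumptions, and real HBase persists to HDFS and handles deletes differently.

```python
import bisect


class ToyTable:
    """In-memory stand-in for a sorted, versioned key/value table."""

    def __init__(self, max_versions=3):
        self.max_versions = max_versions
        self.keys = []   # row keys kept in sorted order for range scans
        self.cells = {}  # row key -> list of (timestamp, value), newest first

    def put(self, key, value, ts):
        if key not in self.cells:
            bisect.insort(self.keys, key)
            self.cells[key] = []
        versions = self.cells[key]
        versions.append((ts, value))
        versions.sort(reverse=True)       # newest timestamp first
        del versions[self.max_versions:]  # drop versions beyond the limit

    def get(self, key, versions=1):
        """Return up to `versions` values for the key, newest first."""
        return [value for _ts, value in self.cells.get(key, [])[:versions]]

    def delete(self, key):
        if key in self.cells:
            del self.cells[key]
            self.keys.remove(key)

    def scan(self, start, stop):
        """Return (key, newest value) for keys in the half-open range [start, stop)."""
        lo = bisect.bisect_left(self.keys, start)
        hi = bisect.bisect_left(self.keys, stop)
        return [(key, self.cells[key][0][1]) for key in self.keys[lo:hi]]


table = ToyTable()
table.put("user1", "a", ts=1)
table.put("user1", "b", ts=2)
table.put("user2", "c", ts=1)
table.put("user9", "z", ts=1)
```

Keeping row keys sorted is what makes the range scan cheap: it becomes two binary searches and a slice.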
    • Recommendations with Hadoop. [Diagram: Customers, Web Application, Relational Databases, Logs, Sqoop, Flume, HDFS, HBase, Oozie, Pig (shown twice), Mahout, MapReduce]
    • Business Intelligence
    • Business Intelligence with Hadoop. [Diagram: Analysts, BI / Analytics, Relational Databases, Logs]
    • Flume
    • Business Intelligence with Hadoop. [Diagram: Analysts, BI / Analytics, Relational Databases, Logs, Flume]
    • HDFS
    • Business Intelligence with Hadoop. [Diagram: Analysts, BI / Analytics, Relational Databases, Logs, Flume, HDFS]
    • Sqoop
    • Business Intelligence with Hadoop. [Diagram: Analysts, BI / Analytics, Relational Databases, Logs, Sqoop, Flume, HDFS]
    • Hive
    • Hive
      • Data warehouse
      • Ad-hoc queries
        – Not real-time (minutes)
      • SQL
      • Tables
      • Joins
    • Business Intelligence with Hadoop. [Diagram: Analysts, BI / Analytics, Relational Databases, Logs, Sqoop, Flume, HDFS, Hive]
    • MapReduce
    • Business Intelligence with Hadoop. [Diagram: Analysts, BI / Analytics, Relational Databases, Logs, Sqoop, Flume, HDFS, Hive, MapReduce]
    • Oozie
    • Business Intelligence with Hadoop. [Diagram: Analysts, BI / Analytics, Relational Databases, Logs, Sqoop, Flume, HDFS, Oozie, Hive, MapReduce]
    • HBase
    • Business Intelligence with Hadoop. [Diagram: Analysts, BI / Analytics, Relational Databases, Logs, Sqoop, Flume, HDFS, HBase, Oozie, Hive, MapReduce]
    • Hive
    • Hive for Business Intelligence
      • JDBC
        – JasperReports*
        – Pentaho*
      • ODBC
        – MicroStrategy*^
      * Vendor certified connector
      ^ Cloudera certified connector
    • Business Intelligence with Hadoop. [Diagram: Analysts, BI / Analytics, Relational Databases, Logs, Sqoop, Flume, HDFS, HBase, Oozie, Hive (shown twice), MapReduce]
    • CDH
      • File System Mount: FUSE-DFS
      • UI Framework: HUE
      • SDK: HUE SDK
      • Workflow: Apache Oozie*
      • Scheduling: Apache Oozie*
      • Metadata: Apache Hive
      • Languages / Compilers: Apache Pig, Apache Hive
      • Fast Read/Write Access: Apache HBase
      • Data Integration: Apache Flume*, Apache Sqoop*
      • Coordination: Apache ZooKeeper
      * Currently under incubation in the Apache Software Foundation
    • What’s next?
      • Cloudera Training Videos
      • CDH Virtual Machines
      • Hadoop: The Definitive Guide, 2nd Edition
      • Cloudera University
        – Developer Training in Columbia, MD: Dec 13-16, Feb 13-16
        – Administrator Training in Herndon, VA: Jan 4-6
        – Private Training
    • We’re Hiring!
      • http://www.cloudera.com/company/careers/
      • Customer Operations
        – Customer Operations Engineer
        – Customer Operations Tools Developer
      • Customer Solutions
        – Solutions Architect
      • Engineering
        – Senior Data Integration Developer
        – Senior Distributed Systems Engineer
        – Senior UI Engineer
        – Software Quality Engineer
        – Technical Writer
      • IT/Operations
        – Systems Administrator