Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

Cloudera + MicrosoftでHadoopするのがイイらしい。 #CWT2016


Published on

日本マイクロソフト: 藤田 稜氏、日立ソリューションズ:岩永 匡希氏、Cloudera: 川崎
Cloudera World Tokyo 2016 ( の講演内容です

Published in: Technology
  • Be the first to comment

Cloudera + MicrosoftでHadoopするのがイイらしい。 #CWT2016

  1. 1. Cloudera Microsoft 
 Hadoop Cloudera World Tokyo 2016 Microsoft Corporation Global Blackbelt Sales Japan OSS TSP Rio Fujita Cloudera, Inc. Senior Technical Manager Tatsuo Kawasaki Hitachi Solutions Lead Engineer Masaki Iwanaga
  2. 2. AGENDA • Self Introduction • Stories • Solution 2
  3. 3. OSS ON CLOUD 3
  4. 4. Stories
  5. 5. Hadoop • • ... • ... • = Hadoop • 
  6. 6. 6
  7. 7. Hadoop 10 10 “Hadoop” (HDFS) (MapReduce) 7
  8. 8. “Hadoop” Core Hadoop 
 8 Hadoop 10
  9. 9. Azure • Windows Azure • PaaS • Microsoft Azure • IaaS • Linux / FreeBSD 9
  10. 10. 10 Platform Services Infrastructure Services Web Apps Mobile Apps API Management API Apps Logic Apps Notification Hubs Content Delivery Network (CDN) Media Services BizTalk Services Hybrid Connections Service Bus Storage Queues Hybrid Operations Backup StorSimple Azure Site Recovery Import/Export SQL Database DocumentDB Redis Cache Azure Search Storage Tables Data Warehouse Azure AD Health Monitoring AD Privileged Identity Management Operational Analytics Cloud Services Batch RemoteApp Service Fabric Visual Studio App Insights Azure SDK VS Online Domain Services HDInsight Machine Learning Stream Analytics Data Factory Event Hubs Mobile Engagement Data Lake IoT Hub Data Catalog Security & Management Azure Active Directory Multi-Factor Authentication Automation Portal Key Vault Store/ Marketplace VM Image Gallery & VM Depot Azure AD B2C Scheduler
  11. 11. Infrastructure +Hundreds of community supported images on 
 Databases SQL App Clients Management Applications Web App Gallery Dozens of .NET & PHP 1/3 : Linux 11
  12. 12. Hadoop • 
 • / 
 → Hadoop 12
  13. 13. 13
  14. 14. 14
  15. 15. Azure • 15
  16. 16. Generally Available Coming Soon 28 + 6, 2 16
  17. 17. Japan
 West Japan
 East 6 duplicates 17
  18. 18. Datacenter Internet Exchange Terrestrial Network Subsea Network Edge Node CDN Locations 56Earth laps 18
  19. 19. Compliant Source : 19
  20. 20. Hadoop • 20
  21. 21. Hadoop Finance Government Telecom Manufacturing Energy Healthcare - RFID Web CRM / / ERP IT 21
  22. 22. DATA-DRIVEN PRODUCTS • Azure Cloudera Hadoop 22
  24. 24. Azure • • • • 24
  25. 25. 9s / 45s (1GB) 18,550km 25
  26. 26. Source : 26
  27. 27. ExpressRoute Microsoft Edge Customer’s 
 network ExpressRoute Partner
 Edge Traffic to public IP addresses in Azure Traffic to Virtual Networks Traffic to Office 365 Services 27
  28. 28. Hadoop • Hadoop Hadoop 28
  29. 29. 29 2006 2008 2009 2010 2011 2012 2013 Core Hadoop (HDFS, MapReduce) HBase ZooKeeper Solr Pig Core Hadoop Hive Mahout HBase ZooKeeper Solr Pig Core Hadoop Sqoop Avro Hive Mahout HBase ZooKeeper Solr Pig Core Hadoop Flume Bigtop Oozie HCatalog Hue Sqoop Avro Hive Mahout HBase ZooKeeper Solr Pig YARN Core Hadoop Spark Tez Impala Kafka Drill Flume Bigtop Oozie HCatalog Hue Sqoop Avro Hive Mahout HBase ZooKeeper Solr Pig YARN Core Hadoop Parquet Sentry Spark Tez Impala Kafka Drill Flume Bigtop Oozie HCatalog Hue Sqoop Avro Hive Mahout HBase ZooKeeper Solr Pig YARN Core Hadoop 2007 Solr Pig Core Hadoop Knox Flink Parquet Sentry Spark Tez Impala Kafka Drill Flume Bigtop Oozie HCatalog Hue Sqoop Avro Hive Mahout HBase ZooKeeper Solr Pig YARN Core Hadoop 2014 2015 Kudu RecordService Ibis Falcon Knox Flink Parquet Sentry Spark Tez Impala Kafka Drill Flume Bigtop Oozie HCatalog Hue Sqoop Avro Hive Mahout HBase ZooKeeper Solr Pig YARN Core Hadoop
  30. 30. SQL 30
  31. 31. Azure • … 31
  32. 32. Microsoft Power BI SORACOM Red Hat JBoss BRMS Beam SORACOM Air SORACOM Beam HDInsight SQL Database Storage Machine Learning Stream Analytics Event Hubs Microsoft Azure IoT Microsoft Azure IoT RED HAT ENTERPRISE LINUX App JBoss BRMS/CEP JBoss EAP Java VM DataBase Hadoop SORACOM Red Hat Cloudera 32
  33. 33. NoSQL MongoDB • NoSQL • Hadoop • NoSQL RDBMS • Hadoop NoSQL HBase Kudu 33
  34. 34. 34 with Parquet Analytics(FastScans) Online (Fast Random Access)Slow Fast SlowFast
  35. 35. Hadoop SQL • BI Impara + Pentaho, Tableau, PowerBI • SQL Hive on Spark, Spark SQL • Java, Python, Scala…. 35
  36. 36. SQL or Impala SQL 36 BI 
 SQL /
  37. 37. Azure GUI • JSON Powershell • 37
  38. 38. Azure Quickstart Templates 38
  39. 39. Apache Spark Hadoop • Apache Spark • Hadoop MapReduce 
 • Hadoop 39
  40. 40. 40
  41. 41. Spark Hadoop Spark Impala Search MR Others YARN HDFS, HBase, Kudu Spark
 Streaming MLlib /
 Spark ML SparkSQL
 DataFrame 41
  42. 42. Azure Stack • Azure 
 Azure • Azure 
 Azure 42
  43. 43. 43
  44. 44. Hadoop • 
 • • • 44
  45. 45. 

  46. 46. 46 ! Azure G A
  47. 47. Azure • Template • ”HPC Linux Workload” 47
  48. 48. HPC Pack cluster for Linux workloads 48
  49. 49. Cloudera Hadoop • 49
  50. 50. Fast - How can you make real-time analytics a reality? How can you unblock ad hoc data access? How can you optimize the right workloads for Hadoop? 50
  51. 51. Easy - How many people know how to manage your system? How fast can you troubleshoot and fix issues? How do you scale? 51
  52. 52. Secure - How can you protect everything? How can you control access for the entire platform? How can you know what happened and is happening? 52
  53. 53. Hadoop Cloudera Enterprise 53
  54. 54. Hadoop 
  55. 55. Hadoop 55 MapReduce HDFS /
  56. 56. Hadoop 56
  57. 57. Hadoop 57
  58. 58.   recommendation accounts for all hardware components such as CPU, memory, and disk options, including ephemeral  SSDs.   The following table describes workload categories and services typically combined for workload types.  Table 1: Workload categories.  Workload Type  Typical Services  Comments  Low  ● MRv2 (YARN)  ● Hive  ● Pig  ● Crunch  Suitable for workloads that are predominantly batch  oriented and involve MapReduce.   Medium  ● HBase  ● Solr  ● Spark  Suitable for higher resource­consuming services and  production workloads but limited to one of these  running at any time.  High / Full EDH  ● Impala  ● All CDH services  Full­scale production workloads with multiple services  running in parallel on a multi­tenant cluster.  Table 2 identifies service roles for the different node types.  Table 2: Service Roles and Node Types    Master Node 1  Master Node 2  Master Node 3  Worker Nodes  ZooKeeper  ZooKeeper  ZooKeeper  ZooKeeper    HDFS  NN, QJN  NN, QJN  QJN  DataNode  YARN  RM  RM  History Server  NodeManager  Hive      MetaStore, WebHCat,  HiveServer2    Management  (misc)  Cloudera Agent  Cloudera Agent  Cloudera Agent,  Oozie, Management  Services, Cloudera  Manager (CM will be  on dedicated node for  Director  Deployments)  Cloudera Agent  Hue      Hue Server    Cloudera Enterprise Reference Architecture for Azure Deployments​  |9    Azure Hadoop CPU 
 • Azure 
 58   must use the ​Director Command Line Interface​ and configuration files based on the published examples. The log and  data locations must not change, except as needed to reflect the number of data drives.   Azure Marketplace  The ​Cloudera Enterprise Data Hub​ offering in the Azure Marketplace allows you to quickly deploy a properly  configured cluster in Azure.The automation logic assigns the correct number of master and worker nodes. The same  configuration can also be created using the high availability (HA) option from the Azure portal. This offering provides  simple, one­click provisioning of a cluster for proof­of­concept, prototype, development, and production  environments. The provisioned cluster runs on the latest distribution of CDH with services such as HDFS, YARN,  Impala, Oozie, Hive, Hue, and ZooKeeper with Cloudera Manager.   Deployment Scripts  You can use the ​Cloudera on CentOS​ ​Azure Quickstart Template​ as a starting point for a simple or customized  deployment to Azure.  The scripts demonstrate proper setup of Nodes, Role Assignment, and Configuration.  Figure 1 shows a deployment scenario using Cloudera Director, Azure Marketplace offer or Deployment Scripts to  launch a Cloudera cluster.  Figure 1: Deployment Scenario    Edge Security  The accessibility of the Cloudera Enterprise cluster is defined by the Azure Network Security Group (NSG)  configurations assigned to VMs and/or Virtual Network subnets and depends on security requirements and workload.  Typically, edge nodes (also known as client nodes or gateway nodes) have direct access to the cluster’s internal  network. Through client applications on these edge nodes, users interact with the cluster and its data. These edge  nodes can run web applications for real­time serving workloads, BI tools, or the Hadoop command­line client used to  interact with HDFS.  Azure Resource Quotas  The default quota for cores within an Azure subscription is 20 cores per region; for more than 20 cores, customers will  need engage Microsoft Support within the Azure portal to request an increased quota. The available quota for the  region you are deploying to has to be greater than the number of cores used by all VMs you are launching.  Workloads & Roles  In this reference architecture, we consider different kinds of workloads that are run on top of an Enterprise Data Hub.  The initial requirements focus on instance types that are suitable for a diverse set of workloads. As service offerings  change, these requirements may change to specify instance types that are unique to specific workloads. The  Cloudera Enterprise Reference Architecture for Azure Deployments​  |8        Cloudera Enterprise  Reference Architecture  for Azure Deployments   
  59. 59. Cloudera 
 Hadoop Azure • 59
  60. 60. 60
  61. 61. Azure Hadoop 
  62. 62. Solution
  63. 63. Case Studies : UNCOVER TRUTH 63
  64. 64. © Hitachi Solutions, Ltd. 2016. All rights reserved.
  65. 65. © Hitachi Solutions, Ltd. 2016. All rights reserved. 5
  66. 66. © Hitachi Solutions, Ltd. 2016. All rights reserved. 6 ü  ü 
  67. 67. © Hitachi Solutions, Ltd. 2016. All rights reserved. END