Apache Ambari BOF - Blueprints + Azure - Hadoop Summit 2013
 

  • Dependency validation checks that the cluster can run once it is deployed. Examples include:
    • Is there a Package Definition that matches the package specified in the Host-Component-Mapping?
    • Are host groups consistent across the Host-Component-Mapping and Host Manifest files?
    • If Hive is selected for installation, are its dependencies selected and available?
  • In Azure, the Deployment DB is replaced by in-memory storage of this information.
  • In Azure and VMM, the HostManifest only specifies the number of instances in each logical host group. The host groups are defined in the template (in VMM) or by Azure.
  • PackageDefinition specifies settings for the components selected in the Host-Component-Mapping file.
  • Note that SQL Authentication is shown in the sqlConnectionString. In a production environment, Integrated Authentication should be used.
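The dependency checks described in these notes can be sketched in a few lines of Python. This is a minimal sketch, not the Deployment Tool's implementation: the manifest shapes (`mapping`, `host_groups`, `package_definitions`), the `requires` field, and the Hive dependency shown are all assumptions made for illustration, since the slides do not show the real schemas.

```python
from typing import Dict, List


def validate_dependencies(mapping: List[dict],
                          host_groups: Dict[str, int],
                          package_definitions: Dict[str, dict]) -> List[str]:
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    selected = {entry["component"] for entry in mapping}
    for entry in mapping:
        package, component = entry["package"], entry["component"]
        # Check 1: the mapped package must have a Package Definition.
        definition = package_definitions.get(package)
        if definition is None:
            problems.append(f"{component}: no Package Definition for {package}")
            continue
        # Check 2: every referenced host group must exist in the Host Manifest.
        for group in entry["host_groups"]:
            if group not in host_groups:
                problems.append(f"{component}: unknown host group {group}")
        # Check 3: every dependency of a selected component must be selected too.
        for dependency in definition.get("requires", {}).get(component, []):
            if dependency not in selected:
                problems.append(f"{component}: missing dependency {dependency}")
    return problems


# Illustrative data only (hypothetical field names and dependencies).
package_definitions = {"HDP": {"requires": {"HIVE": ["NAMENODE"]}}}
host_groups = {"MASTER_HOSTS": 2, "SLAVE_HOSTS": 2}
mapping = [{"package": "HDP", "component": "HIVE", "host_groups": ["EDGE_HOSTS"]}]

for problem in validate_dependencies(mapping, host_groups, package_definitions):
    print(problem)
```

Collecting all problems in a list, rather than stopping at the first one, lets an admin fix every manifest issue in one pass before retrying the deployment.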
  • The Deployment DB is populated with ordered steps for installing Hadoop (and other packages). For example:
    • Install the HDFS service before MapReduce.
    • Install the NameNode component before the DataNode component.
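One way to derive such an ordering is a topological sort over "must be installed before" relationships. The sketch below uses Python's standard `graphlib` module; the component names and the dependency graph are assumptions chosen to match the two examples in the note, and the slides do not say how the Deployment Tool actually computes the order.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def ordered_install_steps(depends_on):
    """Return (step_number, component) rows in an order that respects dependencies."""
    order = TopologicalSorter(depends_on).static_order()
    return [(number, name) for number, name in enumerate(order, start=1)]


# Each key maps to the set of components that must be installed before it.
# Hypothetical graph: HDFS before MapReduce, NameNode before DataNode.
depends_on = {
    "HDFS/NameNode": set(),
    "HDFS/DataNode": {"HDFS/NameNode"},
    "MapReduce/JobTracker": {"HDFS/NameNode", "HDFS/DataNode"},
    "MapReduce/TaskTracker": {"HDFS/NameNode", "HDFS/DataNode"},
}

for number, name in ordered_install_steps(depends_on):
    print(number, name)
```

The relative order of the two MapReduce components is left unconstrained here because the note does not specify it.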
  • Deployment Agents store the state of each step so that steps can be retried after failures. For example:
    • If the NameNode install fails, the agent retries it.
    • If the NameNode install fails, the DataNode install does not proceed.
    • Once the issue is resolved, the Deployment Agent picks up from the last successful step.
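The retry-and-resume behaviour in this note can be sketched as a small step runner. This is an illustrative sketch only: `run_steps`, the `completed` list standing in for persisted step state, and the `max_attempts` limit are hypothetical names and choices, not details taken from the Deployment Agent.

```python
def run_steps(steps, completed, max_attempts=3):
    """Run steps in order, skipping finished ones.

    `steps` is a list of (name, action) pairs and `completed` is a list of
    step names that persists between runs (a stand-in for stored step state).
    Returns the name of the step that blocked progress, or None on success.
    """
    for name, action in steps:
        if name in completed:
            continue  # finished in an earlier run; resume after it
        for _ in range(max_attempts):
            try:
                action()
            except RuntimeError:
                continue  # step failed; retry it
            completed.append(name)
            break
        else:
            # Every attempt failed, so later steps must not proceed.
            return name
    return None
```

Calling `run_steps` again with the same `completed` list after the underlying issue is fixed resumes from the first unfinished step instead of repeating earlier ones.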
  • The Deployment Service is transparent to users. It is a Cloud Service running in Windows Azure.
    • Currently, the manifest files are mostly static. The HostManifest file isn't used at all; VM information is handled by the Azure Fabric.
    • We have flexibility going forward to incorporate user input (e.g. configuration overrides).
    • Manifest files are stored in the user's storage account.
    • HDP and other packages are in the HDInsight blob storage account.
  • Web/Worker Roles are logical host groups in Windows Azure (the types of VMs). VM sizes are fixed (for now).
  • The Deployment Agent is the same code that is used in the System Center scenario. Logic is forked based on environment.

Presentation Transcript

  • How Ambari manifest files are used by System Center and Windows Azure. Brian Swan, Program Manager, HDInsight Team, Microsoft
  • Manifest Files - Overview
    • PackageDefinition.json: A representation of the software packages to be installed on a cluster (typically Hadoop, but also any custom packages, such as Java or Python). This representation captures all the invariants, such as the services, components, and properties associated with a specific package. Authored by the package distributor.
    • HostComponentMapping.json: A mapping between a package component and one or more logical host groups defined in the host manifest. Authored by the Hadoop Admin.
    • HostManifest.json: Contains a list of logical host definitions, system-level resources, and (optionally) the actual hosts that fall into the host definition categories. When actual hosts are not described, references that are realized by on-demand services (such as a cloud provider) are included. A logical group may contain one or more hosts. Authored by the System Admin.
    • PackageConfiguration.json: Captures the specific configuration for a deployment at the cluster level, as well as overrides at the service and component levels. Authored by the Hadoop Admin.
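To make the relationship between the mapping and the host manifest concrete, the sketch below joins the two to find the hosts that would run a given component. The JSON shapes, field names, component names, and the assignment of VMs to groups are all hypothetical; the slides name the files and the MASTER_HOSTS and SLAVE_HOSTS groups but do not show the actual schemas.

```python
import json

# Hypothetical HostManifest.json content: logical host groups and their hosts.
HOST_MANIFEST = json.loads("""
{"host_groups": {"MASTER_HOSTS": ["VM1", "VM2"],
                 "SLAVE_HOSTS":  ["VM3", "VM4"]}}
""")

# Hypothetical HostComponentMapping.json content: component -> host groups.
HOST_COMPONENT_MAPPING = json.loads("""
{"NAMENODE": ["MASTER_HOSTS"],
 "DATANODE": ["SLAVE_HOSTS"],
 "CLIENT":   ["MASTER_HOSTS", "SLAVE_HOSTS"]}
""")


def hosts_for_component(component, mapping, host_manifest):
    """Resolve a component to concrete hosts via its logical host groups."""
    hosts = []
    for group in mapping[component]:
        for host in host_manifest["host_groups"][group]:
            if host not in hosts:  # a host may belong to several groups
                hosts.append(host)
    return hosts


print(hosts_for_component("DATANODE", HOST_COMPONENT_MAPPING, HOST_MANIFEST))
```

Keeping the mapping in terms of logical groups is what lets the same mapping file be reused when the actual hosts are supplied later by VMM or a cloud provider.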
  • Deployment using System Center Note: The tools described here for deploying Hadoop clusters using System Center are prototype tools used internally at Microsoft. The intent here is to demonstrate one consumer of cluster manifest files.
  • System Center – Prerequisites
    • System Center 2013
    • VM running Virtual Machine Manager (VMM) with…
    • Hadoop Service Template (HadoopServiceTemplate.xml)
    • Windows Server VHD (Win.vhd)
    • HDInsight Deployment Tool (HDInsightDeployment.exe)
    • Deployment Database (SQL Server)
  • Phase 1: Parse, Validate, Populate DB
    • Copy manifest files to the Deployment Tool directory.
    • Update HDInsightDeployment.exe.config (the Deployment Tool configuration file).
    • Start deployment with HDInsightDeployment.exe.
    • The Deployment Tool reads and validates the manifest files:
      • Schema validation.
      • Dependency validation.
    • The Deployment DB is populated with steps for creating system resources on hosts (e.g. users, groups, firewall rules).
    • The Deployment DB is populated with ordered steps for installing Hadoop (and other packages).
  • Phase 2: Download Packages
    • The Deployment Tool downloads/copies packages to VMM based on information in PackageDefinition.json.
  • Phase 3: Provision VMs, Install Packages
    • VMM provisions VMs (VM1 to VM4 in the diagram, in the MASTER_HOSTS and SLAVE_HOSTS groups) based on the HostManifest.json file.
    • The Hadoop Service Template (a VMM template) specifies which system components to install (e.g. the Deployment Agent).
    • The Deployment Agent is started on each VM.
    • Deployment Agents pull packages from SCVMM.
  • Phase 4: Create System Resources, Install Packages
    • Deployment Agents create system resources (users, groups, firewall rules, etc.) from steps in the Deployment DB. The diagram shows users such as hdfs_user, mapred_user, and hadoop_admin.
    • Deployment Agents work through the steps for installing Hadoop (and other packages). The diagram shows NameNode (HDFS) and JobTracker (MapReduce) on two of the VMs, and DataNode and TaskTracker (HDFS, MapReduce) on the other two.
    • Packages contain scripts that are invoked to install custom components (e.g. Java, Python).
    • Deployment Agents store the state of each step so that it can be retried upon failure.
  • Deployment in Windows Azure
  • Phase 1: Submit request, generate manifest files
    • A cluster creation request is submitted via the Windows Azure Portal.
    • The Deployment Service generates and validates manifest files.
    • DA stores manifest files in Blob Storage.
    • (Hadoop package files are already in Blob Storage.)
  • Phase 2: Generate/submit deployment files
    • The Deployment Service generates Cloud Service deployment files:
      • .cspkg: contains the Deployment Agent.
      • .cscfg: contains instance counts for VMs and the location of the generated manifest files.
    • The Cloud Service deployment files are submitted to the Windows Azure Fabric.
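As a rough illustration of what "instance counts and manifest location" might look like inside a .cscfg file, the sketch below parses a small sample with Python's standard XML library. The element names and namespace follow the Windows Azure service configuration schema as commonly documented; the service name, role name, `ManifestLocation` setting, and URL are hypothetical and are not taken from the HDInsight service.

```python
import xml.etree.ElementTree as ET

NS = {"sc": "http://schemas.microsoft.com/ServiceHosting/2008/10/ServiceConfiguration"}

# Hypothetical sample; only the overall structure reflects the .cscfg format.
SAMPLE_CSCFG = """
<ServiceConfiguration serviceName="HadoopCluster"
    xmlns="http://schemas.microsoft.com/ServiceHosting/2008/10/ServiceConfiguration">
  <Role name="WorkerRole">
    <Instances count="2" />
    <ConfigurationSettings>
      <Setting name="ManifestLocation" value="https://example.invalid/manifests/" />
    </ConfigurationSettings>
  </Role>
</ServiceConfiguration>
"""


def read_cscfg(xml_text):
    """Return {role name: {"instances": int, "settings": {name: value}}}."""
    root = ET.fromstring(xml_text.strip())
    roles = {}
    for role in root.findall("sc:Role", NS):
        count = int(role.find("sc:Instances", NS).get("count"))
        settings = {
            setting.get("name"): setting.get("value")
            for setting in role.findall("sc:ConfigurationSettings/sc:Setting", NS)
        }
        roles[role.get("name")] = {"instances": count, "settings": settings}
    return roles


print(read_cscfg(SAMPLE_CSCFG))
```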
  • Phase 3: Provision VMs, Deployment Agent
    • The Windows Azure Fabric provisions VMs (VM1 to VM4 in the diagram, in the WEB_ROLES and WORKER_ROLES groups) and deploys the Deployment Agent on each VM.
  • Phase 4: Get manifest files, install components
    • The Deployment Agent determines the environment and VM type.
    • The Deployment Agent gets manifest files based on the location in the .cscfg file.
    • The Deployment Agent generates an in-memory list of activities for installing components.
    • The Deployment Agent retrieves packages (based on the repo location in the PackageDefinition file).
    • The Deployment Agent installs components (in the diagram: NameNode, JobTracker, and DataNode with TaskTracker on two of the VMs).
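The in-memory activity list from Phase 4 could be built along the following lines. This is a sketch under assumptions: the `Activity` type, the `build_activities` helper, the mapping of components to WEB_ROLES and WORKER_ROLES, and the repository URL are all hypothetical, and the real Deployment Agent may structure its activities differently.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Activity:
    action: str  # "download" or "install"
    target: str  # a package location or a component name


def build_activities(vm_type: str,
                     mapping: Dict[str, List[str]],
                     package_repo: str) -> List[Activity]:
    """Build the ordered, in-memory activity list for one VM.

    The package is retrieved first, then every component mapped to this
    VM's logical host group is installed in the order the mapping lists it.
    """
    activities = [Activity("download", package_repo)]
    for component, host_groups in mapping.items():
        if vm_type in host_groups:
            activities.append(Activity("install", component))
    return activities


# Hypothetical mapping of components to Azure role types.
mapping = {
    "NameNode": ["WEB_ROLES"],
    "JobTracker": ["WEB_ROLES"],
    "DataNode": ["WORKER_ROLES"],
    "TaskTracker": ["WORKER_ROLES"],
}

for activity in build_activities("WORKER_ROLES", mapping, "https://example.invalid/hdp.zip"):
    print(activity.action, activity.target)
```

Because the list is rebuilt from the manifest files on each VM, no shared database is needed, which matches the note that Azure replaces the Deployment DB with in-memory storage.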