Apache Ambari BOF - Blueprints + Azure - Hadoop Summit 2013

Published in: Technology
Notes
  • Dependency validation checks that the cluster can run once it is deployed. Examples include: Is there a Package Definition that matches the package specified in the Host-Component-Mapping? Are host groups consistent across the Host-Component-Mapping and Host Manifest files? If Hive is selected for installation, are its dependencies selected and available?
  • In Azure, the Deployment DB is replaced by in-memory storage of the same information.
  • In Azure and VMM, the Host Manifest only specifies the number of instances in each logical host group. The host groups themselves are defined in the template (in VMM) or by Azure.
  • PackageDefinition specifies settings for the components selected in the Host-Component-Mapping file.
  • Note that SQL Authentication is shown in the sqlConnectionString. In a production environment, Integrated Authentication should be used.
  • Deployment DB is populated with ordered steps for installing Hadoop (and other packages). For example: install the HDFS service before MapReduce; install the NameNode component before the DataNode component.
  • Deployment Agents store the state of each step so that steps can be retried after failures. For example, if the NameNode install fails, it is retried, and the DataNode install does not proceed until it succeeds. Once the issue is resolved, the Deployment Agent picks up from the last successful step.
  • Deployment Service is transparent to users. Deployment Service is a Cloud Service running in Windows Azure. Currently, the manifest files are mostly static. The HostManifest file isn't used at all; VM information is handled by Azure Fabric. We have flexibility going forward to incorporate user input (e.g. configuration overrides). Manifest files are stored in the user's storage account. HDP and other packages are in the HDInsight blob storage account.
  • Web/Worker Roles are logical host groups in Windows Azure (the types of VMs). VM sizes are fixed (for now).
  • Deployment Agent is the same code that is used in the System Center scenario. Its logic is forked based on the environment.
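The dependency-validation examples in the notes above (matching package definition, consistent host groups, Hive's dependencies being selected) can be sketched roughly as follows. This is only an illustration: the dictionary shapes, the key names (`name`, `dependencies`, `package`, `components`, `host_groups`), and the Hive dependency list are assumptions made for the example, not the actual manifest schema or the deployment tool's code.

```python
# A minimal sketch of dependency validation across manifest files.
# All key names and manifest shapes below are hypothetical.

def validate_dependencies(package_definition, host_component_mapping, host_manifest):
    """Return a list of problems; an empty list means all checks passed."""
    problems = []

    # Check 1: the mapping must refer to a package that has a definition.
    if host_component_mapping["package"] != package_definition["name"]:
        problems.append("no package definition for " + host_component_mapping["package"])

    # Check 2: every host group used in the mapping must exist in the host manifest.
    known_groups = set(host_manifest["host_groups"])
    for component, groups in host_component_mapping["components"].items():
        for group in groups:
            if group not in known_groups:
                problems.append(f"{component}: unknown host group {group}")

    # Check 3: every selected component's dependencies must also be selected.
    selected = set(host_component_mapping["components"])
    for component in sorted(selected):
        for dependency in package_definition["dependencies"].get(component, []):
            if dependency not in selected:
                problems.append(f"{component}: missing dependency {dependency}")

    return problems


# Illustrative inputs (assumed values, not taken from the slides).
package_definition = {"name": "HDP", "dependencies": {"HIVE": ["HDFS", "MAPREDUCE"]}}
host_component_mapping = {
    "package": "HDP",
    "components": {"HDFS": ["MASTER_HOSTS", "SLAVE_HOSTS"], "HIVE": ["MASTER_HOSTS"]},
}
host_manifest = {"host_groups": {"MASTER_HOSTS": 2, "SLAVE_HOSTS": 2}}

problems = validate_dependencies(package_definition, host_component_mapping, host_manifest)
```

With these inputs, Hive is selected without MapReduce, so the third check reports one problem and the first two pass.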
  • Transcript

    • 1. How Ambari manifest files are used by System Center and Windows Azure. Brian Swan, Program Manager, HDInsight Team, Microsoft
    • 2–6. Manifest Files - Overview
      • PackageDefinition.json: A representation of the software packages to be installed on a cluster (typically Hadoop, but also any custom packages, such as Java or Python). This representation captures all the invariants, such as services, components, and properties associated with a specific package. Authored by the package distributor.
      • HostComponentMapping.json: A mapping between a package component and one or more logical host groups defined in the host manifest. Authored by the Hadoop Admin.
      • HostManifest.json: Contains a list of logical host definitions, system-level resources, and (optionally) the actual hosts that fall into the host definition categories. When actual hosts are not described, references that are realized by on-demand services (such as a cloud provider) are included. A logical group may contain one or more hosts. Authored by the System Admin.
      • PackageConfiguration.json: Captures the specific configuration for a deployment at the cluster level, as well as overrides at the service and component levels. Authored by the Hadoop Admin.
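To make the relationship between the host manifest and the host-component mapping concrete, here is a small sketch of how a component could be resolved to actual hosts through logical host groups. The class and field names are hypothetical stand-ins for the JSON files, and the assignment of VM1 to VM4 to groups is assumed for illustration only.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class HostManifest:
    # Logical host group -> actual hosts. The list may be empty when hosts
    # are realized on demand (for example by a cloud provider).
    host_groups: Dict[str, List[str]]


@dataclass
class HostComponentMapping:
    # Package component -> logical host groups it should be installed on.
    component_groups: Dict[str, List[str]]


def hosts_for_component(component: str,
                        mapping: HostComponentMapping,
                        manifest: HostManifest) -> List[str]:
    """Resolve a component to concrete hosts via its logical host groups."""
    hosts: List[str] = []
    for group in mapping.component_groups[component]:
        hosts.extend(manifest.host_groups[group])
    return hosts


# Assumed example values, not taken from the real manifest files.
manifest = HostManifest({"MASTER_HOSTS": ["VM1", "VM2"], "SLAVE_HOSTS": ["VM3", "VM4"]})
mapping = HostComponentMapping({"NAMENODE": ["MASTER_HOSTS"], "DATANODE": ["SLAVE_HOSTS"]})

datanode_hosts = hosts_for_component("DATANODE", mapping, manifest)
```

Keeping the mapping in terms of logical groups rather than host names is what lets the same mapping file be reused when the hosts come from VMM or from a cloud provider.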
    • 7. Deployment using System Center. Note: The tools described here for deploying Hadoop clusters using System Center are prototype tools used internally at Microsoft. The intent here is to demonstrate one consumer of cluster manifest files.
    • 8. System Center – Prerequisites
      • System Center 2013
      • VM running Virtual Machine Manager (VMM) with…
      • Hadoop Service Template (HadoopServiceTemplate.xml)
      • Windows Server VHD (Win.vhd)
      • HDInsight Deployment Tool (HDInsightDeployment.exe)
      • Deployment Database (SQL Server)
    • 9–12. Phase 1: Parse, Validate, Populate DB (diagram: Manifest Files, Deployment DB, System Center VMM, HadoopServiceTemplate.xml, Win.vhd, HDInsightDeployment.exe)
      • Copy manifest files to the Deployment Tool directory.
      • Update the Deployment Tool configuration file (HDInsightDeployment.exe.config).
      • Start deployment with HDInsightDeployment.exe.
      • Deployment tool reads and validates manifest files:
        • Schema validation.
        • Dependency validation.
      • Deployment DB is populated with steps for creating system resources on hosts (e.g. users, groups, firewall rules).
      • Deployment DB is populated with ordered steps for installing Hadoop (and other packages).
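The "ordered steps" in Phase 1 amount to ordering installs so that prerequisites come first. One way to sketch that is a topological sort, shown here with Python's standard `graphlib` module (Python 3.9+). The step names and the prerequisite table are assumptions for illustration; the slides do not say how the deployment tool actually computes the order.

```python
from graphlib import TopologicalSorter

# Hypothetical prerequisite table: step -> steps that must finish first.
# The two constraints mirror the examples from the talk: HDFS before
# MapReduce, and NameNode before DataNode.
prerequisites = {
    "service:HDFS": set(),
    "service:MAPREDUCE": {"service:HDFS"},
    "component:NAMENODE": set(),
    "component:DATANODE": {"component:NAMENODE"},
}


def ordered_steps(prereqs):
    """Return the steps in an order that respects every prerequisite."""
    return list(TopologicalSorter(prereqs).static_order())


steps = ordered_steps(prerequisites)
```

Any order returned this way places `service:HDFS` before `service:MAPREDUCE` and `component:NAMENODE` before `component:DATANODE`; the relative order of unrelated steps is not specified.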
    • 13. Phase 2: Download Packages
      • Deployment tool downloads/copies packages to VMM based on information in PackageDefinition.json.
    • 14–18. Phase 3: Provision VMs, Install Packages (diagram: VM1 to VM4, grouped into MASTER_HOSTS and SLAVE_HOSTS, each running a Deployment Agent)
      • VMM provisions VMs based on the HostManifest.json file.
      • The Hadoop Service Template (a VMM template) specifies which system components to install (e.g. Deployment Agent).
      • Starts the Deployment Agent on each VM.
      • Deployment Agents pull packages from SCVMM.
    • 19–21. Phase 4: Create System Resources, Install Packages
      • Deployment Agents create system resources (users, groups, firewall rules, etc.) from steps in the Deployment DB. The diagram shows accounts such as hdfs_user, mapred_user, and hadoop_admin on the VMs.
      • Deployment Agents work through the steps for installing Hadoop (and other packages). The diagram shows the HDFS NameNode, the MapReduce JobTracker, and two VMs each running an HDFS DataNode and a MapReduce TaskTracker.
      • Packages contain scripts that are invoked for installing custom components (e.g. Java, Python).
      • Deployment Agents store the state of each step for retries upon failure.
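The retry behaviour described for Phase 4 (retry a failed step, block later steps, resume from the last successful step) can be sketched as a small loop over recorded step states. Everything here is illustrative: the function name, the `max_attempts` value, and the plain dictionary standing in for the Deployment DB are assumptions, not the agent's real implementation.

```python
def run_steps(steps, state, max_attempts=3):
    """Run steps in order, recording each step's outcome in `state`.

    Returns True when every step is done. Stops at the first step that
    exhausts its attempts, so later steps do not proceed.
    """
    for name, action in steps:
        if state.get(name) == "done":
            continue  # completed in an earlier run; resume past it
        for _ in range(max_attempts):
            try:
                action()
            except Exception:
                state[name] = "failed"
            else:
                state[name] = "done"
                break
        if state.get(name) != "done":
            return False
    return True


# Illustrative usage with a NameNode install that fails until "fixed".
calls = []
broken = {"namenode": True}


def create_users():
    calls.append("users")


def install_namenode():
    calls.append("namenode")
    if broken["namenode"]:
        raise RuntimeError("NameNode install failed")


def install_datanode():
    calls.append("datanode")


steps = [
    ("create users", create_users),
    ("install NameNode", install_namenode),
    ("install DataNode", install_datanode),
]
state = {}  # stand-in for the step state kept in the Deployment DB

first_run = run_steps(steps, state)   # NameNode fails; DataNode never starts
broken["namenode"] = False            # the issue is resolved
second_run = run_steps(steps, state)  # resumes at NameNode, skips "create users"
```

Because completed steps are recorded, the second run does not repeat user creation and goes straight back to the NameNode install.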
    • 22. Deployment in Windows Azure
    • 23. Phase 1: Submit request, generate manifest files (diagram: Windows Azure, Deployment Service, WA Blob Storage)
      • Cluster creation request submitted via Windows Azure Portal.
      • Deployment Service generates and validates manifest files.
      • DA stores manifest files in Blob Storage.
      • (Hadoop package files are already in Blob Storage.)
    • 24. Phase 2: Generate/submit deployment files (diagram: Windows Azure, Deployment Service, WA Blob Storage, Windows Azure Fabric)
      • Deployment Service generates Cloud Service deployment files:
        • .cspkg: contains the Deployment Agent.
        • .cscfg: contains instance counts for VMs and the location of the generated manifest files.
      • Cloud Service deployment files are submitted to Windows Azure Fabric.
    • 25–26. Phase 3: Provision VMs, Deployment Agent (diagram: VM1 to VM4, grouped into WEB_ROLES and WORKER_ROLES, each running a Deployment Agent)
      • Windows Azure Fabric provisions VMs and deploys the Deployment Agent on the VMs.
    • 27–29. Phase 4: Get manifest files, install components
      • Deployment Agent determines environment and VM type.
      • Deployment Agent gets manifest files based on the location in the .cscfg file.
      • Deployment Agent generates an in-memory list of activities for installing components.
      • Deployment Agent retrieves packages (based on the repo location in the PackageDefinition file).
      • Deployment Agent installs components. The diagram shows NameNode on one VM, JobTracker on another, and DataNode and TaskTracker on each of the remaining two.
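Since the same Deployment Agent code runs in both scenarios and forks on the environment, the difference between "read ordered steps from the Deployment DB" and "generate an in-memory list of activities" might look something like the sketch below. The environment names, function names, and the placement of components on WEB_ROLES and WORKER_ROLES are all assumptions for illustration; the talk does not show how the agent detects its environment or which role hosts which component.

```python
def build_activities(role, component_roles):
    """Build the in-memory activity list for one VM role (Azure path)."""
    return [
        f"install {component}"
        for component, roles in component_roles.items()
        if role in roles
    ]


def load_steps(environment, role, component_roles, deployment_db=None):
    """Pick the source of install steps based on the environment."""
    if environment == "azure":
        # In Azure the Deployment DB is replaced by in-memory information.
        return build_activities(role, component_roles)
    if environment == "system_center":
        # Stand-in for reading ordered steps from the Deployment DB.
        return list(deployment_db[role])
    raise ValueError(f"unknown environment: {environment}")


# Assumed placement of components on roles, for illustration only.
component_roles = {
    "NameNode": ["WEB_ROLES"],
    "JobTracker": ["WEB_ROLES"],
    "DataNode": ["WORKER_ROLES"],
    "TaskTracker": ["WORKER_ROLES"],
}

azure_steps = load_steps("azure", "WORKER_ROLES", component_roles)
db_steps = load_steps(
    "system_center",
    "SLAVE_HOSTS",
    component_roles,
    deployment_db={"SLAVE_HOSTS": ["install DataNode", "install TaskTracker"]},
)
```

Keeping the fork inside one function means the install logic that consumes the step list can stay identical across System Center and Azure.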
