Extending your Hadoop Implementation to the Cloud


Published in: Technology, Business


  1. EXTENDING YOUR HADOOP IMPLEMENTATION TO THE CLOUD
     Matt Winkler, Principal Lead Program Manager, Big Data @ Microsoft
     We’re Hiring
     @mwinkle
  2. AGENDA
       • Decisions
       • IaaS
       • Cloud Storage
       • Hadoop as a Service
       • Hybrid Scenarios
       • Next Steps
  3. WHY CLOUD?
       • Elasticity
       • Cost Optimization
       • Economic flexibility
       • Support for bursting workloads
       • Global footprint
  4. WHY ON-PREMISES?
       • Compliance requirements
       • Specific control over hardware/networking
       • Integration requirements for additional apps to be close to cluster
  5. WHEN CLOUD?
       • Data born in the cloud
       • Global apps
       • Satisfy geopolitical or compliance constraints
       • Dev/Test
       • Backup
       • Geo-Redundancy
       • Bursting to cloud
  6. IAAS – RUN YOUR HADOOP IN THE CLOUD
     IaaS offerings across the cloud providers offer:
       • OS choice
       • Node configuration
       • Customized networking topology
       • Repeatable, scriptable deployments
     You still have to:
       • Set up the cluster
       • Manage data movement into the cluster
       • Integrate with your other applications
       • Manage patching and updates of OS and apps
       • Obtain support and/or licenses
  7. DEMO: Deploying a Hadoop Cluster to Azure
  8. LEVERAGE CLOUD STORAGE FOR FLEXIBILITY
     Cloud storage enables economic flexibility, scale and rich features
       • Size clusters independent of storage needs
       • Clusters become stateless to operate across the data
       • Price continues decreasing
       • Geo-Redundancy allows for business continuity/disaster recovery planning
  9. CLOUD STORAGE USAGE PATTERNS
     HDFS within the cluster
       • Move data in from cloud storage on boot
       • (optional) Back up/age data to cloud storage
       • (optional) Move data out to cloud storage to rebuild cluster
     Default file system using cloud storage connectors
       • Hadoop apps just see a path to data, and most things “just work”
       • Apps which rely specifically on HDFS may encounter compatibility issues
       • The physics change in exchange for flexibility
  10. LEVERAGING HADOOP AS A SERVICE
      Hadoop Services
        • Cluster creation on demand
        • Default integration with cloud storage
        • Integration across services and apps
        • Higher level abstractions
        • API set for integrating into apps
      Azure HDInsight
        • Clusters provisioned on top of Azure Blob storage
        • Deploy clusters of any size
        • Entire stack supported by Microsoft
      Some example services
        • Azure Active Directory
        • Service Bus
        • Scheduler
        • Multi-Factor Authentication
        • Express Route
        • Azure SQL Database
        • Azure Web Site
  11. DEMO: Getting Started with HDInsight
  12. HYBRID SCENARIOS
      Key scenarios
        • Offsite backup
        • Dev/Test
        • Burst to Cloud
      The decision is not an XOR; it’s on-premises AND cloud
  13. DEMO (architecture diagram)
      Microsoft Azure
        • Azure Storage
        • HDInsight (Hadoop)
        • Hadoop cluster deployed to IaaS
      On-Premises Hadoop Cluster (HDP 2.1), running on CentOS
        • HDFS, YARN, Tez, Hive, MR, Falcon
  14. GETTING STARTED
        • Get started in the cloud (getting started cards available at the Microsoft booth and up here at the stage)
        • Create an HDInsight cluster, or try out deploying a Hadoop cluster to Azure
        • http://aka.ms/howtohdinsight
  15. Falcon command line
  16. Falcon configuration files
  18. Register & schedule
  19. Data being landed into HDFS, on-prem
  20. Syncing to blob store
  21. New file @ 1:38 pm
  22. @ 2:01 pm
  23. From HDI Cluster
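
The closing demo slides show files landing in on-premises HDFS and then appearing in the blob store, where the HDInsight cluster reads them. The deck does not include the Falcon configuration itself, so the following is only a minimal Python sketch of the underlying idea (copy whatever changed since the last sync to a cloud storage path), not Falcon's implementation. The `FileEntry` type, the `plan_sync` and `wasb_uri` helpers, the container and account names, and the timestamps are all hypothetical; the only detail taken from the Hadoop Azure connector is the general URI shape `wasb://<container>@<account>.blob.core.windows.net/<path>`.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class FileEntry:
    """One file in a (hypothetical) listing of the on-premises HDFS directory."""
    path: str            # absolute path inside the on-premises HDFS namespace
    modified: datetime   # last modification time reported by the file system


def wasb_uri(container: str, account: str, path: str) -> str:
    """Build a wasb:// URI that points at the same path in Azure Blob storage."""
    return f"wasb://{container}@{account}.blob.core.windows.net/{path.lstrip('/')}"


def plan_sync(entries, last_sync, container, account):
    """Return (source_path, target_uri) pairs for files modified after last_sync.

    Files are returned oldest first. This only plans the copy; it does not
    move any data.
    """
    return [
        (e.path, wasb_uri(container, account, e.path))
        for e in sorted(entries, key=lambda e: e.modified)
        if e.modified > last_sync
    ]


if __name__ == "__main__":
    # Made-up listing: one file from before the last sync, one after it.
    listing = [
        FileEntry("/data/landing/part-0001.csv", datetime(2020, 1, 1, 13, 10)),
        FileEntry("/data/landing/part-0002.csv", datetime(2020, 1, 1, 13, 38)),
    ]
    last_sync = datetime(2020, 1, 1, 13, 30)
    for source, target in plan_sync(listing, last_sync, "demo", "examplestore"):
        print(source, "->", target)
```

Because the cluster in the cloud addresses the data purely by URI, a job running there can read the synced file from the `wasb://` location without knowing that it originated on-premises, which is the point of the "default file system using cloud storage connectors" pattern on slide 9.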