Audi’s Hadoop Journey into the Hybrid Cloud

After having run Hadoop on-premise in production for some time, we decided to build a Hadoop platform in AWS to extend the on-premise Hadoop cluster to a hybrid platform.

In this presentation, we first briefly state our motivation and requirements for building a cloud platform. Moving to the cloud not only offers new technical possibilities, it also helped us make our way of working more agile. We explain how we set up a team of internal and external experts, how we defined an agile working mode, and how this approach worked for us.

Published in: Technology

  1. DataWorks Summit 2019, Barcelona: Audi’s Hadoop Journey into the Hybrid Cloud. Carsten Herbe (Audi Business Innovation GmbH, Germany)
  2. AUDI AG, DataWorks Summit Barcelona 2019 - Audi’s Hadoop Journey into the Hybrid Cloud - Carsten Herbe. About us
  3. Audi AG: 1.8 million cars per year*, 90,000 employees worldwide* (* source: https://www.audi.com/de/company.html)
  4. Audi mobility innovations; Audi on demand; Audi balanced technologies; Audi e-gas; Audi customer IT solutions. Audi Business Innovation GmbH: Munich-based subsidiary of Audi AG. Carsten Herbe, Audi Business Innovation GmbH » Data Platform & Solution Architecture » Technical Product Owner & Architect for Cloud Hadoop » 5 years Hadoop, 3 years Kafka, 1 year AWS » 10+ years Data Warehousing & BI
  5. HAAP - Hybrid Audi Analytic Platform: Big Data Capabilities & Focus data domains. Data Domains: Finance; Purchase; Production; Quality; Sales; Car Data. Programs; Projects; Data Scientists. Embed Analytics; Analyze Data; Store, Distribute and Process Data; Deliver Information; Security; Infrastructure & Services; Provision Data; Deliver Service; Manage Information; Design & Maintain Solutions. Authentication; Data Encryption; Auditing; Complex Event Processing; Analytical APIs; Dashboarding; Planning & Simulation; Visual Analytics; BI Report & OLAP; Statistical Methods; Analytical Script; Data Warehouse; Analytical Databases; ETL Framework; Batch Processing; Data Access / APIs; On-Prem Platform; Cloud Platform; Application Deployment; Hardware, Network, OS; Monitoring; Lifecycle Mgmt; Development Process & Methods; Master Data Mgmt; Data Lineage; File Systems (HDFS); Stream Processing; Machine Learning.
  6. Why cloud?
  7. Audi’s motivation to extend its Hadoop platform to the cloud. Data “Locality”: Audi is moving many applications to the cloud; data of one important use case is already in the cloud. Scalability: scaling clusters (number of nodes, node types, …); scaling stages (testing new features, upgrades, …). Functionality: adding nodes with GPUs; using a more flexible staging process; cloud services (S3, RDS, Docker Registry, …); reducing work on infrastructure.
  8. Goals: one platform as a hybrid solution. Hybrid: some related systems are currently only on-premise (DWH, reporting tool, …); some data sources remain on-premise (e.g. manufacturing). One platform: write once, run everywhere (identical tech stack); single sign-on (on-prem principals used for cloud); data (easy data movement & shared metadata).
  9. Project Setup
  10. Team setup & project mode. Mixed team: companies: internal (Audi + ABI) + external (2 partners + HWX); bases: 4 cities in 2 countries; nationalities: 5. Collaboration: Scrum-based; weekly 2-day on-site workshop at the Audi project office; tools: Jira, Bitbucket, RocketChat. Goals: bring together experts on various topics (DevOps, Hadoop, AWS); knowledge transfer from external to internal.
  11. Sprint structure and on-site workshops. Week 1: Day 1, 10:00 check-in; Day 2, 15:00 check-out. Week 2: Day 1, 10:00 check-in; Day 2, 15:00 check-out. Week 3: Day 1, 10:00 review, 13:00 retrospective, 15:00 planning; Day 2, 15:00 check-out. In every week, on demand: merge meeting/call, design meeting/call. Further labels on the slide: alignment on-prem team; co-location; Review.
  12. Choice of Technologies
  13. Finding the best-fitting tech stack for Audi. AWS infrastructure setup: options CloudFormation, Terraform; chosen: Terraform (already used by other projects). Configuration management: options Terraform + Bash, Ansible, …; chosen: Ansible (switched from Bash as complexity increased; already used by other projects). Hadoop deployment: options Ambari Blueprints, Cloudbreak; chosen: Ambari Blueprints (Cloudbreak is difficult to integrate into the existing environment; no versioning with Cloudbreak yet). User management: options local users managed manually, integration with corporate AD/LDAP, our own FreeIPA; chosen: FreeIPA (AD integration was not possible (yet); highest flexibility (+AD later); DNS, Certificate Authority).
  14. Hybrid Architecture
  15. HAAP Architecture - Big Picture. Diagram labels: FW XTR; AAP messaging zone; AAP data zone; Kafka; Data Warehouse; AAP BI App Zone; Tableau; FW LSZ; FW LSZ; on premise; KDC; HDP; KDC; Splunk; FW XTR; AWS Frankfurt - CAAP VPC; AWS Ireland; Kafka; Deploy Automation; AWS Frankfurt - Hub VPC; public cloud; CAAP; KDC; FreeIPA; FW Cloud; DXC.
  16. High-level AWS network architecture. Diagram labels: hub VPC; Cisco Router; Direct Connect; VPG (four times); Spoke VPC A; Spoke VPC B; Spoke VPC C; Spoke VPC D; Cloud; On-Premise; FW Cloud; WAN Distri.
  17. Cloud Hadoop Platform: detailed view. VPCs: mgmt VPC; hub VPC; blue VPC. Subnets: mgmt public subnet; mgmt private subnet; blue public subnet; blue private hdp subnet; blue private rds subnet. Hosts: bastion; deploy; FreeIPA; Ambari; KDC; Edge 1; Master 1; Data 1; Data 2; Data 3; LLAP 1. Security groups: SG bastion; SG deploy; SG edge; SG IDM; SG master; SG worker; SG Ambari; SG KDC; SG hdp. Network: Cisco Router; DXC; IGW (twice); NAT GW (twice); VPG (twice); S3 endpoint (twice). AWS services: RDS Postgres; ECR registry; S3 (terraform state, backup, projects); CloudWatch; CloudTrail; IAM.
  18. User Management & Kerberos Trust. KDCs and realms: Cloud DEV MIT KDC (DEV.CAAP.AUDI.VWG); Cloud PRD MIT KDC (PRD.CAAP.AUDI.VWG); FreeIPA KDC (CAAP.AUDI.VWG); on-prem DEV MIT KDC (DEV.AUDI.VWG); on-prem PRD MIT KDC (PRD.AUDI.VWG). Four one-way trusts are drawn between the KDCs. LDAP: carsten: <dev>; carsten-adm: <dev, prd>. OS: local user mgmt (twice); OS: FreeIPA user integration (twice). Commands shown: > kinit carsten@DEV.AUDI.VWG > hdfs dfs -ls //ONPREMDEV:8020/user/carsten > hdfs dfs -ls //CLOUDDEV:8020/user/carsten > kinit carsten@CAAP.AUDI.VWG > hdfs dfs -ls //CLOUDDEV:8020/user/carsten > hdfs dfs -ls //CLOUDPRD:8020/user/carsten > hdfs dfs -ls //ONPREMDEV:8020/user/carsten. Result marks, in extraction order: ✓ ✗ ✓ ✓ ✓ ✓ ✗.
  19. Lessons learned
  20. With great freedom come great responsibilities … Cloud: you can do anything you want right away, but you have to do it yourself (e.g. DNS, LDAP, …); automation pays off but requires an initial investment; security must be considered from the start. Project setup: agile; strong involvement of the product owner required; distributed teams cost a lot of travelling time; different experts required (Cloud (AWS), Networking, DevOps, Hadoop, …); fluctuation: distribute knowledge.
  21. Looking into the Future
  22. Staging process for projects and platform. Diagram labels: PRD <projects>; feature A <platform>; DEV <platform>; feature B <platform>; DEV & INT <projects>; INT <projects>; PRD <projects>; DEV <projects>; INT <projects>.
  23. Technologies on the road map. Cloud: on-demand nodes with GPU for machine learning; S3/Glacier for “cold” data; looking into Kafka as a Service (Confluent, AWS). Data Plane: Data Steward Service for hybrid data governance; Data Lifecycle Manager for data transfers and backup. HDP 3.x: using Docker under YARN for more flexibility/functionality; Hive 3 Kafka integration. Machine Learning: on-demand nodes with GPU for machine learning; Data Science Workbench.
  24. WE ARE HIRING: https://www.audi.com/corporate/de/karriere/einstieg-bei-audi.html and https://karriere.audibusinessinnovation.com/
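
The one-way Kerberos trusts on slide 18 can be illustrated with a small model. This is only a sketch under a stated assumption: that each cluster realm (on-prem and cloud, DEV and PRD) trusts the FreeIPA realm CAAP.AUDI.VWG, which is one possible reading of the four "one-way trust" arrows; the transcript does not state the direction. The names TRUSTS, CLUSTER_REALM and can_authenticate are hypothetical. Real Kerberos also needs cross-realm krbtgt principals and a separate authorization step (for example via the LDAP entries shown on the slide), both of which the sketch ignores.

```python
# Illustrative model of one-way Kerberos cross-realm trust.
# ASSUMPTION: every cluster realm trusts the FreeIPA realm. The direction of
# the trust arrows is not stated in the slide transcript.

FREEIPA_REALM = "CAAP.AUDI.VWG"

# Realm of the KDC protecting each cluster (cluster and realm names as they
# appear on slide 18; the mapping between them is inferred from the names).
CLUSTER_REALM = {
    "ONPREMDEV": "DEV.AUDI.VWG",
    "CLOUDDEV": "DEV.CAAP.AUDI.VWG",
    "CLOUDPRD": "PRD.CAAP.AUDI.VWG",
}

# (trusting realm, trusted realm): the trusting realm accepts principals of
# the trusted realm, but not the other way round.
TRUSTS = {
    ("DEV.AUDI.VWG", FREEIPA_REALM),
    ("PRD.AUDI.VWG", FREEIPA_REALM),
    ("DEV.CAAP.AUDI.VWG", FREEIPA_REALM),
    ("PRD.CAAP.AUDI.VWG", FREEIPA_REALM),
}


def realm_of(principal: str) -> str:
    """Return the realm part of a principal written as 'user@REALM'."""
    return principal.rsplit("@", 1)[1]


def can_authenticate(principal: str, cluster: str) -> bool:
    """True if the principal's realm is the cluster's realm or is trusted by it."""
    user_realm = realm_of(principal)
    cluster_realm = CLUSTER_REALM[cluster]
    return user_realm == cluster_realm or (cluster_realm, user_realm) in TRUSTS


if __name__ == "__main__":
    for principal in ("carsten@DEV.AUDI.VWG", "carsten@CAAP.AUDI.VWG"):
        for cluster in CLUSTER_REALM:
            verdict = "ok" if can_authenticate(principal, cluster) else "denied"
            print(f"{principal} -> {cluster}: {verdict}")
```

Because the trust is one-way, the lookup checks only the pair (cluster realm, user realm) and never the reverse. Under the assumed direction, a principal from DEV.AUDI.VWG authenticates only against the on-prem DEV cluster, while a FreeIPA principal authenticates against all three clusters.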

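Slide 13 names Ambari Blueprints as the chosen Hadoop deployment mechanism and notes "No versioning with Cloudbreak yet". An Ambari blueprint is a JSON document, so it can be generated and kept in version control. The sketch below builds a minimal blueprint structure in Python. The host-group names loosely follow the node labels on slide 17 (Master, Data, Edge); the blueprint name, stack version, component selection and cardinalities are assumptions for illustration and not Audi's configuration, and the result is not a complete, deployable blueprint.

```python
import json


def host_group(name, cardinality, components):
    """Build one blueprint host group; Ambari expects cardinality as a string."""
    return {
        "name": name,
        "cardinality": str(cardinality),
        "components": [{"name": component} for component in components],
    }


def build_blueprint(stack_version="2.6"):
    """Return a minimal blueprint dict (hypothetical values, see lead-in)."""
    return {
        "Blueprints": {
            "blueprint_name": "caap-sketch",  # hypothetical name
            "stack_name": "HDP",
            "stack_version": stack_version,   # assumed version
        },
        "host_groups": [
            host_group("master", 1, ["NAMENODE", "RESOURCEMANAGER", "ZOOKEEPER_SERVER"]),
            host_group("data", 3, ["DATANODE", "NODEMANAGER"]),
            host_group("edge", 1, ["HDFS_CLIENT", "YARN_CLIENT", "ZOOKEEPER_CLIENT"]),
        ],
    }


if __name__ == "__main__":
    # sort_keys gives a stable serialization, which keeps diffs in git small.
    print(json.dumps(build_blueprint(), indent=2, sort_keys=True))
```

Generating the JSON from code and committing the output is one way to get reviewable, versioned cluster definitions; a hand-written JSON file in the same repository would serve the same purpose.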