
Successes, Challenges, and Pitfalls Migrating a SAAS business to Hadoop


Published in: Technology

  1. Successes, Challenges and Pitfalls Migrating a SAAS Business to Hadoop
     Shaun Klopfenstein, CTO; Eric Kienle, Chief Architect
  2. The Vision
  3. Requirements
  4. Page 4 Marketo Proprietary and Confidential | © Marketo, Inc. 7/11/2016
     Business Requirements
     • Near real-time activity processing
     • 1 billion activities per customer per day
     • Improve cost efficiency of operations while scaling up
     • Global enterprise-grade security and governance
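As a back-of-the-envelope check, the "1 billion activities per customer per day" requirement can be translated into a sustained event rate. The 1B/day figure comes from the slide; the burst factor is a hypothetical assumption added for illustration.

```python
# Convert a daily activity volume into the event rates a streaming
# pipeline must sustain. Only the 1B/day input comes from the slides;
# the 3x peak factor is an illustrative assumption.

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

def sustained_rate(activities_per_day: int) -> float:
    """Average events/second needed to keep up with a daily volume."""
    return activities_per_day / SECONDS_PER_DAY

def peak_rate(activities_per_day: int, peak_factor: float = 3.0) -> float:
    """Traffic is bursty; assume peaks run some multiple of the average."""
    return sustained_rate(activities_per_day) * peak_factor

avg = sustained_rate(1_000_000_000)   # ~11,574 events/sec for one customer
peak = peak_rate(1_000_000_000)       # ~34,722 events/sec at the assumed 3x burst
```

Numbers like these drive both the technology bake-off and the cluster sizing that follow.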
  5. Architecture Requirements
     • Maximize utilization of hardware
     • Multitenancy support with fairness
     • Encryption, authorization, and authentication
     • Applications must scale horizontally
  6. Technology Bake Off
  7. Bake Off
     • Technology selection: Storm vs. Spark Streaming, HBase vs. Cassandra
     • Built a POC with each permutation, plus Kafka
     • Load tested with one day of web traffic
  8. The Winner Is… Our First Challenge
     • We hoped to find a clear winner… we didn't, exactly
     • The truth is, all the POCs worked at the scale we tested
     • Had we scaled up the test, we might have found more differences
  9. How We Chose
     • Community
     • Features
     • Team skillset
     • History
     • The winners: HBase, Kafka, and Spark Streaming
  10. Architecture & Design
  11. Marketo Lambda Architecture (diagram)
      Inputs: CRM Sync, Partner APIs, Other Marketing Activities, Web Activity, RTP Activity, Mobile Activity
      Pipeline: Ingestion Processor (Scala/Tomcat) → HBase/HDFS and Kafka Event Stream → Spark Streaming consumers (Campaign Triggers, Solr Indexer, Email Report Loader, Web Activity Processor) → Solr
      Clients: Marketo UI (Campaign Detail, Lead Detail), CRM Sync, Revenue Cycle Analytics, APIs, Other Clients
  12. High Level Architecture
      • Enhanced Lambda Architecture
      • Inbound activities are written to the Ingestion Processor, then to HBase and then Kafka
      • High-volume (e.g. web) activities are first written to Kafka, then enriched
      • Spark Streaming applications consume events from Kafka: Solr indexing, email reports, campaign processing
      • HBase is used for simple historical queries and is the system of record
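The two write paths above can be sketched with in-memory stand-ins. This is a minimal illustration of the routing rule, not the production implementation: the real system uses HBase and Kafka clients, and the event types treated as high-volume here are assumptions based on the diagram.

```python
# Sketch of the dual ingestion paths: high-volume activities go to the
# event log (Kafka) first and are enriched downstream; everything else is
# written to the system of record (HBase) first, then published.
# The classes below are in-memory stand-ins, not real clients.

HIGH_VOLUME_TYPES = {"web", "rtp", "mobile"}  # assumed high-volume sources

class Store:                    # stands in for HBase (system of record)
    def __init__(self): self.rows = []
    def put(self, event): self.rows.append(event)

class EventLog:                 # stands in for a Kafka topic
    def __init__(self): self.messages = []
    def publish(self, event): self.messages.append(event)

def ingest(event, store, log):
    """Route an inbound activity along the appropriate path."""
    if event["type"] in HIGH_VOLUME_TYPES:
        # Kafka-first: the durable write happens later, in a streaming
        # consumer, after enrichment.
        log.publish(event)
    else:
        # Record-first: persist, then publish for streaming consumers.
        store.put(event)
        log.publish(event)
```

Either way, every activity ends up on the event stream for the Spark Streaming consumers; only the timing of the durable write differs.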
  13. Build It: Implementation
  14. Building Expertise
      • We had a few people with Hadoop and Spark experience
      • We decided to grow knowledge in house
      • Focus on training: Hortonworks boot camp for operations, in-house courses and tech talks for engineering/QE
  15. Building Expertise - Successes
      • Critical to kick-start the project
      • Built excitement
      • Created a foundation for the design process
  16. Building Expertise – Context Challenge
      Challenge
      • Training packed a lot of information into a short period
      • Teams that didn't leverage the training right away lost context
      Recommendation
      • Create environments for hands-on experience early
      • Give every team hands-on experience right after training
  17. Building Expertise – Experience Challenge
      Challenge
      • Hadoop technology is like playing a piano: knowing how to read music doesn't mean you can play
      • There are many ways to design, configure, and manage, only a few right ones, and the reasons can be subtle
      Recommendation
      • Find your experts! Partner and hire
  18. Building Our First Cluster
      • Initial sizing and capacity planning of the first Hadoop clusters
      • Performed load tests to get an initial capacity plan
      • Decided that disk I/O and storage would be the leading indicator
      • Went with industry best practice for hardware and network configuration
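A storage-led sizing exercise of the kind described here is mostly arithmetic. The sketch below shows the shape of such an estimate; every input value (event size, retention, replication, per-node capacity, headroom) is a hypothetical placeholder, since the slides give no actual figures.

```python
# Rough storage-led capacity estimate for an initial Hadoop cluster plan.
# All numeric inputs are illustrative assumptions, not Marketo's figures.
import math

def datanodes_needed(events_per_day: int, bytes_per_event: int,
                     retention_days: int, replication: int = 3,
                     usable_tb_per_node: float = 20.0,
                     headroom: float = 0.70) -> int:
    """Nodes required to hold the replicated data set with headroom."""
    raw_bytes = events_per_day * bytes_per_event * retention_days * replication
    # Keep ~30% free for growth, compactions, and failed-node re-replication.
    usable_bytes_per_node = usable_tb_per_node * 1e12 * headroom
    return math.ceil(raw_bytes / usable_bytes_per_node)

# e.g. 1B events/day at 500 bytes each, kept 90 days, 3x replicated
nodes = datanodes_needed(1_000_000_000, 500, 90)
```

As the next slide notes, the estimate only needs to be close enough to start, since the cluster can be expanded later.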
  19. Building Our First Cluster - Success
      • The leading indicator ended up being compute
      • But the cluster sizing ended up being close enough to start
      • Clusters can always be expanded, so don't get too hung up on it
  20. Building Our First Cluster – ZooKeeper & VM Challenge
      Challenge
      • We started with ZooKeeper virtualized
      • It didn't perform properly (we think because of disk I/O)
      • This caused random outages
      Recommendation
      • We ended up migrating ZooKeeper to physical boxes
      • Don't use VMs for ZooKeeper!
  21. Security
      • All data at rest must be encrypted
      • Applications sharing Hadoop must be isolated from each other
      • Applications must have hard quotas for both compute and disk resources
  22. Security - Success
      • Enabled Kerberos security for the Hadoop cluster
      • Kerberos allowed us to leverage HDFS native encryption
      • Used encrypted disks for Kafka servers
      • Created separate secure YARN queues to isolate applications
      • Each application uses a separate Kerberos principal
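Separate secure YARN queues are typically expressed through the CapacityScheduler. The fragment below is a sketch of that pattern under assumed names: the queue names, percentages, and the `ingestion_svc` principal are hypothetical, not Marketo's actual configuration.

```xml
<!-- capacity-scheduler.xml sketch: one queue per application, with a hard
     cap and a submit ACL tied to that application's Kerberos principal.
     All names and numbers here are illustrative. -->
<property>
  <name>yarn.scheduler.capacity.root.queues</name>
  <value>ingestion,indexing,campaigns</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.ingestion.capacity</name>
  <value>40</value>
</property>
<property>
  <!-- maximum-capacity equal to capacity makes the quota hard: the queue
       cannot borrow idle resources from its neighbors -->
  <name>yarn.scheduler.capacity.root.ingestion.maximum-capacity</name>
  <value>40</value>
</property>
<property>
  <!-- only the ingestion service principal may submit to this queue -->
  <name>yarn.scheduler.capacity.root.ingestion.acl_submit_applications</name>
  <value>ingestion_svc</value>
</property>
```

With Kerberos enabled, the ACLs are enforced against authenticated principals rather than easily spoofed usernames, which is what makes the isolation meaningful.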
  23. Security – Kerberos Challenge
      Challenge
      • Kerberos can't be added to a Hadoop cluster without prolonged downtime and patches
      • Needed weeks of developer time to accommodate security changes
      • Added several months to the overall rollout schedule
      Recommendation
      • Allow extra time for Kerberos
      • Educate your team beforehand; find an expert to guide you
      • Be prepared for different levels of Kerberos support across the Hadoop ecosystem
  24. Security – Kafka and Spark Challenge
      Challenge
      • Kafka doesn't support data encryption (and won't)
      • The HDP version we had didn't fully support Kerberos for Kafka and Spark clients
      Recommendation
      • Move Kafka and Spark out of Ambari
      • Only encrypt Kafka data if you absolutely must, as it adds complexity
  25. Test It
  26. Validation
      • Changing the engines on a plane while in flight is hard
      • Required all components to implement a "passive mode": the new code ran in the background and continuously compared results with the legacy system
      • Automated functional tests kicked off from Jenkins
      • Performance testing on AWS
  27. Validation - Success
      • Passive mode was one of the best moves we made!
      • Allowed for testing of components with real-world data and load
      • Found countless performance and logic issues with minimal operational impact
  28. Validation – Passive Mode "Minimal Impact" Challenge
      Challenge
      • By design, passive mode wrote to both the legacy and Hadoop systems
      • We impacted performance during an outage of our cluster
      Recommendation
      • Use asynchronous writes or tight timeouts in passive mode
      • Monitoring for the Hadoop cluster should be in place before passive testing
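The "tight timeout" recommendation can be sketched as a dual-write wrapper where the shadow write runs on a worker thread with a small time budget, so an outage in the new cluster cannot stall the legacy path. This is an illustrative pattern, not Marketo's actual code; the function names and the 50 ms budget are assumptions.

```python
# Passive-mode dual write with a bounded wait on the shadow (Hadoop) copy.
# The legacy write stays synchronous; the shadow write runs on a thread
# pool and is abandoned (but still logged/counted) if it exceeds its budget.
import concurrent.futures

_pool = concurrent.futures.ThreadPoolExecutor(max_workers=4)

def dual_write(event, write_legacy, write_shadow, shadow_timeout=0.05):
    """Write to legacy synchronously; give the shadow write a tight budget.

    Returns True if the shadow write confirmed in time, False otherwise.
    """
    write_legacy(event)                        # primary path, never skipped
    future = _pool.submit(write_shadow, event)
    try:
        future.result(timeout=shadow_timeout)  # bounded wait on the shadow
        return True
    except concurrent.futures.TimeoutError:
        return False                           # count the miss; don't block
```

A fully asynchronous variant would skip the `result()` call entirely and reconcile shadow-write failures out of band, trading stronger comparison coverage for zero added latency.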
  29. Deploying It
  30. Migration and Management
      • We are here!
      • Migrate over 6,000 subscriptions with no service interruption or data loss
      • Track and monitor the migration and provide management tools for the new platform
      • Achieve the end goal of removing the safety net
  31. Migration and Management - Successes
      • Created a new management console called Sirius
      • Close architectural coordination across all teams during migration
      • If problems arose, we had a quick, automated fallback path to the legacy system
      • Daily cross-functional standup meetings to track the rollout
  32. Migration and Management Challenges
      Challenge
      • Oozie workflows can be challenging to build and debug
      • Capacity planning and resource management in the shared Hadoop cluster is very complex
      Recommendation
      • Only use Oozie workflows for automating complex or long-running processes, or use a different orchestration platform
      • Constantly reevaluate your capacity plan based on the current deployment
  33. Running It
  34. Monitoring
      • Needed to monitor hundreds of new Hadoop and other infrastructure servers
      • Our custom Spark Streaming applications required all-new metrics and monitors
      • Capacity planning requires trend analysis of both the infrastructure and our applications
      • Don't overwhelm our already busy Cloud Platform team
  35. Monitoring - Successes
      • Built a custom monitoring infrastructure using OpenTSDB and Grafana
      • Added business SLA metrics to our Sirius console to provide real-time alerts
      • Added comprehensive Hadoop monitors to our pre-existing production monitoring system
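Application metrics typically reach OpenTSDB as `put` lines over its telnet-style protocol (or the equivalent HTTP API), which Grafana then queries for dashboards. The helper below formats one such line; the metric name, tags, and values are illustrative, not Marketo's actual metric schema.

```python
# Format one data point as an OpenTSDB telnet-protocol `put` line:
#   put <metric> <timestamp> <value> <tag1=v1> <tag2=v2> ...
# Metric and tag names below are illustrative assumptions.

def tsdb_put_line(metric: str, timestamp: int, value, tags: dict) -> str:
    """Render one data point in OpenTSDB's line format (sorted tags)."""
    tag_str = " ".join(f"{k}={v}" for k, v in sorted(tags.items()))
    return f"put {metric} {timestamp} {value} {tag_str}"

line = tsdb_put_line("spark.streaming.batch_delay_ms", 1468224000, 250,
                     {"app": "solr_indexer", "host": "worker01"})
# "put spark.streaming.batch_delay_ms 1468224000 250 app=solr_indexer host=worker01"
```

Tagging each point with the application and host is what lets a single Grafana dashboard slice the same metric per streaming job or per node.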
  36. Monitoring - Challenges
      Challenges
      • Adding hundreds of servers and a dozen new applications makes for a huge monitoring task
      • Nagios is a very general-purpose system and isn't designed to monitor Hadoop out of the box
      Recommendations
      • Make sure that you have monitors and trend analysis in place and tested before migration
      • Be prepared to constantly refine and improve your monitors and alerts
  37. Patching and Upgrading
      • We have a zero-downtime requirement for applications
      • Patching and upgrading either the infrastructure or our own applications is problematic
      • Keeping up with the community requires frequent patching
      • Eventually, hundreds of Spark Streaming jobs will need to process data constantly with no interruption
  38. Patching and Upgrading - Successes
      • Use the Sirius console to manage Spark Streaming jobs
      • Marketo's Kafka consumer allows streaming jobs to pick up where they left off after a restart
      • Integrated existing Jenkins infrastructure with the Sirius console to provide painless automated patching and upgrades
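"Pick up where they left off" is an offset-checkpointing pattern: the consumer commits the last processed offset to durable storage, and a restarted instance resumes from that checkpoint. The toy model below shows the mechanism; in a real deployment the checkpoint would live in Kafka, ZooKeeper, or HBase rather than a dict, and the class name here is purely illustrative.

```python
# Toy model of resume-after-restart: commit the last processed offset to a
# durable store (a dict stands in here) so a restarted consumer continues
# from the checkpoint instead of reprocessing or skipping events.

class ResumableConsumer:
    def __init__(self, topic: str, checkpoints: dict):
        self.topic = topic
        self.checkpoints = checkpoints           # durable offset store (stub)

    def poll(self, log: list) -> list:
        """Process messages past the last committed offset, then commit."""
        start = self.checkpoints.get(self.topic, 0)
        processed = log[start:]
        self.checkpoints[self.topic] = len(log)  # commit the new offset
        return processed

store = {}                                       # survives "restarts"
log = ["e1", "e2", "e3"]
first = ResumableConsumer("activities", store)
first.poll(log)                                  # processes e1..e3

log.append("e4")
restarted = ResumableConsumer("activities", store)
restarted.poll(log)                              # processes only e4
```

Committing the offset after processing, as above, gives at-least-once delivery on a crash between processing and commit; committing before processing would give at-most-once instead.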
  39. Infrastructure Patching and Upgrading - Challenges
      Challenges
      • Patches and upgrades are managed with Ambari, which is not perfect!
      • We almost never get through an upgrade without one or more Hadoop components having downtime (so far)
      Recommendations
      • Test all infrastructure patches and upgrades in a loaded non-production environment
      • Check out the start and stop scripts from the component-specific open source communities rather than relying on Ambari
  40. We’re Hiring! Http://Marketo.Jobs
      Q & A
