Building A Production-Level Machine Learning Pipeline


With so many options to choose from, how do you select the right technologies for your machine learning pipeline? Do you purchase bare metal and hire a devops team, install Spark on EC2 instances, use EMR and other AWS services, or combine Spark and Elasticsearch? View this talk for a first-hand account of building ML pipelines: what options were considered, how the final solution was selected, the tradeoffs made, and the final results.

Published in: Technology

  1. Building a Production-Level Machine Learning Pipeline. Robert Dempsey, CEO, Atlantic Dominion Solutions
  2. Production ML Pipelines. Robert Dempsey. Professional: Entrepreneur, Software Engineer. Author: Books and online courses. Instructor: Lotus Guides, District Data Labs. Owner: Atlantic Dominion Solutions, LLC.
  3. We’ve mastered three jobs so you can focus on one: growing your business.
  4. The Three Jobs. At Atlantic Dominion Solutions we perform three functions for our customers. Consulting: we assess and advise in the areas of technology, team and process to determine how machine learning can have the biggest impact on your business. Implementation: after a strategy session to determine the work you need, we get to work using our proven methodology and begin delivering smarter applications. Training: continuous improvement requires continuous learning. We provide both on-premises and online training.
  5. Writing the Book. Co-authoring the book Building Machine Learning Pipelines. Written for software developers and data scientists, Building Machine Learning Pipelines teaches the skills required to create and use the infrastructure needed to run modern intelligent systems.
  6. What’s your biggest issue?
  7. Technology is LEAST important
  8. The REPORT Framework™
  9. REPORT Framework™: Risk Tolerance, Expectations, Product, Operations, Results, Team
  10. Risk Tolerance. Question: How risk averse are you? Some companies happily deploy beta and release candidate versions of cutting-edge open source software. Others enjoy the freedom of open source but look only for mature applications. A third category swears off open source altogether and only buys software that comes with a license and a support contract. Where does your company sit on the risk aversion spectrum? Question: What are your non-technology risks? Technology aside, what happens if your project fails? Do you get fired? Does the entire team get fired? Do the naysayers get to say “I told you so” in a meeting?
  11. Expectations. Question: What are the expectations around the project? Here are a few questions to get you started. Non-technical: How long do you think the project will take? How much do you expect it to cost? What are others expecting the system will be able to do? Technical: How much volume does the system need to be able to process, and in what amount of time? What level of downtime can you absorb?
  12. Product. Question: What does the product roadmap say? At a minimum, a bullet-point list will help set the expectations of others and allow you to make trade-offs as the project moves forward. It also helps you measure results (discussed later) on an incremental basis, which lets your team know whether they are making progress. Question: What’s the budget and estimated ROI? As with expectations and the product roadmap, whether formalized or not, there is always, or should always be, a budget as well as an estimated ROI. Write it down and use it as one of your metrics.
  13. Operations. Question: Got DevOps? DevOps, sometimes called TechOps, is a group that manages and maintains the technology infrastructure of the organization. Just because you have a DevOps team doesn’t mean you want to put additional strain on them by firing up more servers. Even with cloud providers like AWS you still have to do some infrastructure support and maintenance. The larger your business, the more support work there will be.
  14. Results. Question: What does the end result look like? Here’s a very partial list of results we’ve seen measured: • The project was completed on X date by X time. • The project cost $X amount of money to complete. • The team worked no more than 40 hours each week to get the project done. • X, Y and Z features are in the product and have 90% automated test coverage.
  15. Team. Question: Are the right people on the bus to get the project completed? Having the right people with the right skills, both hard and soft, can make or break a project. Question: Does each team member have the tools and support they need to be successful? • Does the team have the support of senior leadership? • Are they going to encounter a deluge of bureaucratic red tape that will slow their progress? • Are development and testing environments available?
  16. ML Pipeline Toolbox
  17. The “Standard” ML Pipeline (diagram). Stages: Collect, Store, Enrich, Train / Apply, Visualize, plus Infrastructure.
  18. Infrastructure. Servers: Amazon EC2, data center. Container technologies: Docker, Amazon Elastic Container Service (ECS).
  19. Collect. Programming languages: Python, Scala, Go, R. Pre-built tools: Pentaho Data Integration, various web scraping tools.
  20. Store. Elasticsearch, Apache Kafka, Redis, Cassandra, MongoDB, SQL, Amazon S3, HDFS, many others.
  21. Enrich. Apache Storm, Apache Spark, Amazon Elastic MapReduce (EMR), Apache NiFi, Airflow (Airbnb).
  22. Train / Apply. Python libraries: Scikit-learn, Pandas. Spark libraries: MLlib. Deep learning: TensorFlow, PyTorch.
  23. Visualize. Kibana, Grafana, Amazon Athena (for S3), Flask, D3.js.
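
The stages of the "standard" pipeline can be read as a chain of steps, each consuming the output of the previous one. The following is a minimal sketch in plain Python, not code from the talk: the function names, the sample records, and the keyword rule standing in for a trained model are all hypothetical. In a real system each function would wrap one of the tools listed in the toolbox slides.

```python
from __future__ import annotations

from collections import Counter

# Every name and record below is an illustrative assumption, not from the talk.


def collect() -> list[dict]:
    """Collect: stand-in for an agent or scraper pulling raw records."""
    return [
        {"id": 1, "text": "spark job finished"},
        {"id": 2, "text": "disk failure on node 7"},
        {"id": 3, "text": "kafka consumer lag rising, failure likely"},
    ]


def store(records: list[dict], db: dict) -> dict:
    """Store: keep records by id; a dict stands in for a real data store."""
    for record in records:
        db[record["id"]] = record
    return db


def enrich(db: dict) -> list[dict]:
    """Enrich: add a derived feature (token count) to every record."""
    return [{**r, "n_tokens": len(r["text"].split())} for r in db.values()]


def apply_model(records: list[dict]) -> list[dict]:
    """Train / Apply: a keyword rule stands in for a trained classifier."""
    return [
        {**r, "label": "alert" if "failure" in r["text"] else "ok"}
        for r in records
    ]


def visualize(records: list[dict]) -> str:
    """Visualize: a one-line text summary stands in for a dashboard."""
    counts = Counter(r["label"] for r in records)
    return ", ".join(f"{label}={n}" for label, n in sorted(counts.items()))


def run_pipeline() -> str:
    """Run all five stages in order on the sample records."""
    return visualize(apply_model(enrich(store(collect(), {}))))


if __name__ == "__main__":
    print(run_pipeline())
```

Keeping each stage behind its own function is one way to let a single stage be swapped (for example, a dict for a real store) without touching the others.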
  24. Machine Learning Pipeline Architectures
  25. Architecture 1 (diagram). Components as labeled: Agent, File System, Apache Spark, File System, Agent, ES; steps numbered 1 to 3.
  26. Architecture 1 Choices. This pipeline was built at a company building a new platform using all leading-edge technologies, and was a temporary solution until another pipeline was built. • Risk Aversion: not an issue. • Expectations: the pipeline needed to run in production and handle the amount of data the company had in a timely fashion. • Product: this was a short-term solution to process data until the desired pipeline was ready to be deployed into production.
  27. Architecture 1 Choices (continued). • Operations: due to its simplicity and limited functionality, the solution became a one-server solution deployed by an engineer working in unison with an internal devops team member. • Results: the pipeline was deployed on time and was able to process all the data within the parameters. • Team: after a consultant built the first version of the application, an internal team member took over and deployed it into production.
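
One plausible reading of the Architecture 1 diagram is a file-system handoff: an agent drops raw files, a batch job transforms them, and a second agent loads the results into a search index. The sketch below imitates that flow with the Python standard library only. It is an assumption about how the pieces connect, not the system described in the talk: a plain function stands in for the Spark job, a dict stands in for Elasticsearch, and the directory layout, field names, and function names are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def collector_agent(inbox: Path, records: list) -> Path:
    """Step 1: write raw records as JSON lines into the inbox directory."""
    path = inbox / "batch-0001.jsonl"
    path.write_text("\n".join(json.dumps(r) for r in records) + "\n")
    return path


def batch_job(inbox: Path, outbox: Path) -> None:
    """Step 2: stand-in for the Spark job; enrich each inbox file into outbox."""
    for src in sorted(inbox.glob("*.jsonl")):
        lines = []
        for line in src.read_text().splitlines():
            record = json.loads(line)
            record["length"] = len(record["message"])  # derived field
            lines.append(json.dumps(record))
        (outbox / src.name).write_text("\n".join(lines) + "\n")


def loader_agent(outbox: Path, index: dict) -> int:
    """Step 3: load processed records into the index, keyed by id."""
    loaded = 0
    for src in sorted(outbox.glob("*.jsonl")):
        for line in src.read_text().splitlines():
            record = json.loads(line)
            index[record["id"]] = record
            loaded += 1
    return loaded


def run_once(records: list) -> dict:
    """Run the three steps once in a temporary directory and return the index."""
    index = {}
    with tempfile.TemporaryDirectory() as tmp:
        inbox, outbox = Path(tmp, "inbox"), Path(tmp, "outbox")
        inbox.mkdir()
        outbox.mkdir()
        collector_agent(inbox, records)
        batch_job(inbox, outbox)
        loader_agent(outbox, index)
    return index


if __name__ == "__main__":
    print(run_once([{"id": "a", "message": "hello"},
                    {"id": "b", "message": "pipeline"}]))
```

Because every step only reads from one directory and writes to another, the whole flow fits on a single server, which matches the one-server deployment described in the Operations bullet.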
  28. Architecture 2 (diagram). Components as labeled: three Agents, ES, S3, HDFS, Apache Kafka, Apache Storm; steps numbered 1 to 3.
  29. Architecture 2 Choices. This pipeline was built at a startup focused on data collection and was core to the product. • Risk Aversion: this was the second version of a previously developed and well-proven pipeline, so risk aversion was low. • Expectations: as a core product, the pipeline was expected to be continuously evolving, able to be horizontally scaled, able to handle a growing amount of data, and have 100% uptime. • Product: the functionality built was in line with a product roadmap that was reviewed on a monthly basis.
  30. Architecture 2 Choices (continued). • Operations: an internal devops team managed the infrastructure while engineers were expected to support the associated applications and data processors. • Results: the pipeline could be horizontally scaled, handled between 1 and 2 TB of data per day, and had 99.9% uptime. • Team: the devops and engineering teams worked together to produce and support it.
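
To put the Architecture 2 figures in perspective, the arithmetic below converts a daily volume into an average sustained rate and an uptime percentage into allowed downtime. It is a back-of-the-envelope sketch that assumes decimal units (1 TB = 10^12 bytes), an evenly spread load, and a 365-day year; the function names are illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60
HOURS_PER_YEAR = 365 * 24


def avg_mb_per_second(tb_per_day: float) -> float:
    """Average rate in MB/s for a daily volume given in TB (decimal units)."""
    return tb_per_day * 1_000_000 / SECONDS_PER_DAY


def downtime_hours_per_year(uptime_percent: float) -> float:
    """Hours per 365-day year a system may be down at a given uptime."""
    return (1 - uptime_percent / 100) * HOURS_PER_YEAR


if __name__ == "__main__":
    for tb in (1, 2):
        print(f"{tb} TB/day is about {avg_mb_per_second(tb):.1f} MB/s on average")
    print(f"99.9% uptime allows about {downtime_hours_per_year(99.9):.2f} hours down per year")
```

Under these assumptions, 1 to 2 TB per day works out to roughly 11.6 to 23.1 MB/s on average, and 99.9% uptime leaves about 8.76 hours of downtime per year. Peak rates would be higher than the average if the load is bursty.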
  31. Architecture 3 (diagram). Components as labeled: three Agents, Athena, S3, S3, Apache Spark; steps numbered 1 to 3.
  32. Architecture 3 Choices. This pipeline was built at a company building a new platform using all leading-edge technologies, and was a temporary solution until another pipeline was built. • Risk Aversion: this system was mission critical for delivering data in real time to customers. Failure was not an option, so best-in-class practices needed to be implemented, including using hosted solutions such as Databricks and S3. • Expectations: this system would scale as data collection efforts grew and would be extremely fault tolerant.
  33. Architecture 3 Choices (continued). • Product: this system would be extended to accommodate additional product offerings, so flexibility was important. • Operations: this system was maintained by the engineers who built it, as there was no separate devops team. • Results: the system processed several TBs of data per hour (need to double check this) with minimal downtime. • Team: the team supporting the pipeline set up monitoring and alerting to ensure uptime and worked with other engineering groups to deconflict deployments that might impact the pipeline.
  34. Architecture 4 (diagram). Components as labeled: three Agents, ES, S3, HDFS, Apache Kafka, Apache Spark, HBase; steps numbered 1 to 3.
  35. Architecture 4 Choices. This pipeline was built at a company building a new platform using all leading-edge technologies, and was a temporary solution until another pipeline was built. • Risk Aversion: this system supported a key customer and was being implemented as a means to resolve data loss and data discrepancies that had plagued a legacy system. • Expectations: this system would be resilient in the event of an outage so that no data would be lost. • Product: this system would ultimately be replaced by a more general system designed to support multiple customers, so it was considered extremely critical yet a one-off.
  36. Architecture 4 Choices (continued). • Operations: this system was maintained by the engineers who built it, as at the time there was no technical operations team in place. • Results: the system processed hundreds of GBs of data per day with infrequent outages. • Team: once deployed, the team of developers who built this pipeline began work on incorporating its features into a more generalized stream processing platform.
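
An expectation like "no data lost during an outage" is commonly met by keeping incoming records in a durable log and advancing a consumer's position only after a record has been fully processed, so that a restart replays anything unfinished. The sketch below illustrates that idea only; it is not how the system in the talk was built and it does not use the Kafka API. An in-memory list stands in for the log, and the class, function, and record names are hypothetical.

```python
class Consumer:
    """Reads an append-only log and commits its offset after each success."""

    def __init__(self, log: list):
        self.log = log
        self.committed = 0  # index of the next record to process

    def run(self, handler) -> None:
        while self.committed < len(self.log):
            handler(self.log[self.committed])  # may raise during an outage
            self.committed += 1                # commit only after success


def demo() -> dict:
    """Simulate one outage mid-stream, then a restart that resumes cleanly."""
    log = ["r1", "r2", "r3", "r4"]
    sink = {}
    failures = {"r3": 1}  # the sink is unavailable once, while handling r3

    def handler(record: str) -> None:
        if failures.get(record, 0) > 0:
            failures[record] -= 1
            raise ConnectionError(f"sink unavailable while writing {record}")
        sink[record] = True  # idempotent write: repeating it is harmless

    consumer = Consumer(log)
    try:
        consumer.run(handler)
    except ConnectionError:
        pass  # the run stops here; the committed offset is kept
    offset_after_outage = consumer.committed
    consumer.run(handler)  # restart: resumes from the committed offset
    return {"offset_after_outage": offset_after_outage, "stored": sorted(sink)}


if __name__ == "__main__":
    print(demo())
```

The trade-off of committing after processing is at-least-once delivery: a record can be handled twice if the failure happens between the write and the commit, which is why the write in this sketch is idempotent.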
  37. Q&A
  38. Free Guide
  39. Where to Find Me. Website, Lotus Guides, LinkedIn (robertwdempsey), Twitter (rdempsey), GitHub (rdempsey).
  40. Thank You!