Building Data Science into Organizations: Field Experience

We will share our experiences in building Data Science and Machine Learning (DS/ML) into organizations. As new DS/ML teams are created, many wrestle with questions such as: How can we most efficiently achieve short-term goals while planning for scale and production long-term? How should DS/ML be incorporated into a company?
We will bring unique perspectives: one as a previous Databricks customer leading a DS team, one as the second ML engineer at Databricks, and both as current Solutions Architects guiding customers through their DS/ML journeys. We will cover best practices through the crawl-walk-run journey of DS/ML: how to immediately become more productive with an initial team, how to scale and move towards production when needed, and how to integrate effectively with the broader organization.
This talk is meant for technical leaders who are building new DS/ML teams or helping to spread DS/ML practices across their organizations. Technology discussion will focus on Databricks, but the lessons apply to any tech platforms in this space.

Building Data Science into Organizations: Field Experience

  1. Building Data Science into Organizations: Field Experience. Chris Robison, Joseph Bradley. Data + AI Summit 2021
  2. Our perspectives. Joseph Bradley ● Sr. Solutions Architect ● 2nd ML Engineer at Databricks ● Apache Spark committer and PMC member. Chris Robison ● Sr. Solutions Architect ● Former Director of Data Science and Omni-channel Marketing at Overstock.com ● Career data scientist and avid Apache Spark user
  3. The Data and AI Company. Lakehouse: One simple platform to unify all of your data, analytics, and AI workloads. 5000+ customers across the globe. Original creators
  4. So you want to do Data Science... 98.8% of Fortune 1,000 companies are investing in strategic Big Data & AI initiatives. 14.4% of Fortune 1,000 companies say they have deployed AI capabilities into widespread production. Source: New Vantage Partners
  5. Goals of a DS/ML/AI program. Short-term ● Validate that DS is worthwhile ● Get resources: ○ Data ○ Data Scientists ○ Executive sponsorship ● Show vision. Long-term ● Show business impact ● Increase productivity ● Scale DS across the organization
  6. Challenges of a DS/ML/AI program. Organization ● Team building: skill sets, hiring, and training ● Team organization: embedded vs. standalone ● Business and executive alignment ● R&D. Technology and platform ● Poor integration between Data Science and other data teams ● Planning for scale and production, under investment constraints
  7. Philosophy: Crawl, Walk, Run. Strategy: Get quick wins. Plan for the future. Think about products. Reproduce end-to-end and across multiple verticals. Organization: Build communication channels. Improve executive visibility and cross-team integration. Embed Data Science in the organization’s DNA. Platform: Use popular tools. Emphasize productivity. Scale and automate DS/ML workloads. Set up reliable and efficient production processes.
  8. Execution. Use agile processes for data science: ● Iterate with sprints and standups ● Fail fast in R&D. Transparency is key: ● Communicate frequently to your business partners and executives ● Make business partners and consumers an integral part of the process. Collaborate with the data and platform teams: ● Make your needs known and understood ● Beware shortcuts which build technical debt
  10. Organization building -- “Crawl” stage. Company ● Desire to become data driven ● Smaller in size (startup) or an existing organization with new data initiatives. Team ● 1-2 Data Scientists (likely) reporting to a CTO ● Acting as full stack data scientists ● Typically a math or computer science background. ML/AI Success ● Successful MVPs with a few models manually in production ● Starting to build an AI/ML Strategy ● In discovery phase for new projects and low-hanging fruit
  11. Keep using familiar tools. Common tools and descriptions ● Notebooks and IDEs: Python notebooks, RStudio, local IDEs ● Languages: Python, R -- and potentially SQL, Scala, Java, etc. ● ML libraries: Standard libraries, plus bring-your-own libraries and versions ● Git: Notebook versioning, and syncing across platforms with Git ● Data: Pandas, Spark, Koalas; any data sources or formats ● Visualization: Matplotlib, Plotly, Seaborn, etc. ● Integrations: Platforms must integrate with any libraries, systems, or services. Platforms which are cloud-native and have both UIs and APIs are ideal.
  12. Build around OSS standards for portability. [Figure: downloads per month for four open-source projects: 990K, 350K, 1.7M, 516K]
  13. Be more productive with self-service analytics. Compute resources ● Start up machines or clusters on demand ● Cost controls: Autoscaling, auto-termination, spot instances, cost tracking ● Governance: Cluster policies for enforcement. Libraries and environment ● Plug & play environments with popular ML libraries ● And customization: requirements.txt, conda.yaml. Option 1: Use your own cluster. Option 2: Share clusters, with separate Python env per user or project.
  14. Running example: ML prioritization of Sales opps ● Get an ML workspace: Simple machine or cluster creation. Ready-to-go DS environments. ● Sync code: Import .py or .ipynb notebook, and sync with Git. ● Load data: Efficient data loading from S3, ADLS, etc. ● Develop DL model: Use notebooks + TensorBoard for interactive development. ● Analyze results: Review auto-logged MLflow metrics to analyze model performance. ● Share results: Share insights with other stakeholders. ● Discussion with Sales stakeholders to understand the problem and data, and to set expectations ● Explanation of results and future potential to Sales ● Build executive alignment and buy-in for long-term initiatives ● DS team training and hiring ● Platform enablement and improvement ● Customer history and Sales data access ● Long-term platform and data pipeline planning
  16. Organization building -- “Walk” stage. Company ● Data initiatives being discussed at the executive level ● Business units pushing for data projects ● Emerging business champions for AI/ML. Team ● Data Science team(s) supporting multiple business units ● Integrations with software engineering for production ● Diversifying skill-sets for domain expertise. ML/AI Success ● Successful MVPs and production models in multiple business units ● Uniform testing standards are being established
  17. Scaling in a typical machine learning workflow. Stages: Data Preparation, Feature Engineering, Model Training, Model Evaluation, Model Deployment, Model Tuning, Model Consumption. Tools: ● Koalas ● Spark DataFrames ● Spark UDFs ● Larger instances ● GPUs ● Distributed training (Spark ML, HorovodRunner, etc.) ● Hyperopt ● MLflow ● Spark DataFrames & UDFs ● Jobs & Model Servers ● MLflow
  18. Automate and reproduce wherever possible. Auto-logging for reproducibility. Reproduce Run feature, reproducibility checklist: ✓ Code versioning ✓ Data versioning ✓ Cluster configuration ✓ Environment specification. Job scheduling in platform. Automation: schedule, alert, retry, API. Secure: IAM Passthrough | Cluster Policies | Table ACLs
  19. Your Existing Data Lake Ingestion Tables Data Catalog Feature Store Azure Data Lake Storage Amazon S3 Streaming Batch 3rd Party Data Marketplace Files for Data Science and ML ● Schema enforced high quality data ● Optimized performance ● Full data lineage / governance ● Reproducibility through time travel ML Runtime IAM Passthrough | Cluster Policies | Table ACLs | Automated Jobs Infrastructure Data Engineering Data Science ML Engineer
  20. Running example: ML-driven products ● Scale up or out: Larger machines. Multiple GPUs. Distributed training. ● Improve modeling process: Scale tuning with Hyperopt + SparkTrials. Manage tuning with MLflow autologging. ● Schedule training and inference jobs: Create jobs from notebooks or libraries. Add schedules, retries, and alerts. Model validation checks. ● Integrate with data pipelines: Automate ingestion of new data for ML and output of ML insights for business/product. ● Automate for downstream consumption: Integrate with 3rd-party tools and systems to export ML insights to business stakeholders. ● Executive <> Data Science team alignment on data-driven initiatives ● Knowledge sharing across business units for ML-driven projects ● Education for business stakeholders to understand ML models and insights ● Platform adoption by multiple business units ● Increased governance needs for platform, covering needs of more business units and personas ● Platform plays a key role in establishing best practices
  22. Organization building -- “Run” stage. Company ● Data initiatives are reported at the board level ● Data driven decision making across an organization. Team ● Multiple Data Science teams across verticals led by an AI executive ● Standard development and deployment processes for models ● COE across verticals. ML/AI Success ● Successful production models in multiple verticals ● Uniform testing standards established ● Program to grow citizen data scientists
  23. model lifecycle Staging Production Archived Data Scientists Deployment Engineers v1 v2 Models Tracking Flavor 2 Flavor 1 Model Registry Custom Models In-Line Code Containers Batch & Stream Scoring Cloud Inference Services OSS Serving Solutions Serving Parameters Metrics Artifacts Models Metadata Model Deployment Options
  24. Example of ML Ops (Model Registry). Training: Create model version → Webhook for new model versions in staging → Model Validation Job: Comment with test results + transition request to production → Email: ML Ops person receives email that transition request to production was made → Approve new production model → Webhook for new model version in production → Production Batch Inference Job
  25. Modes of deployment ● Batch: latency Minutes, cost Low ● Streaming: latency Sec - Min, cost Low - Med ● REST API: latency < 1 Sec, cost High ● Embedded: latency varies, cost varies. Diagram labels: Model training, Model Tracking and Registry, Delta Lake / Feature Store, BI tools
  26. Repeatable Data Science lifecycle ● Business understanding ● Executive sponsorship ● Center of Excellence for DS & ML ● End user feedback ● Metric discussions and KPIs ● Business value realization ● Exploratory data analysis ● Data ingestion and preparation ● Model deployment and automation ● ML modeling ● Model monitoring and feedback ● ML and Data platform and pipeline integration ● Simple onboarding process for new teams and use cases ● Data and resource sharing and governance ● Standard handoff process for production jobs ● Sharable documentation and usage education
  27. Resources to learn more. Related talks and blogs ▪ Building Machine Learning Platforms Webinar ▪ MLflow Model Registry on Databricks Simplifies MLOps With CI/CD Features. Customer success stories ▪ Comcast, Starbucks, H&M ▪ Searchable customer stories. Databricks ▪ Data science and machine learning product page ▪ Managed MLflow product page
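
The reproducibility checklist on slide 18 (code versioning, data versioning, cluster configuration, environment specification) can be illustrated with a small record per training run. This is only a minimal sketch using the Python standard library: `RunRecord`, its field names, and the fingerprinting scheme are hypothetical and are not part of Databricks or MLflow.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class RunRecord:
    """Hypothetical record of the four checklist items for one training run."""
    code_version: str      # e.g. a Git commit hash
    data_version: str      # e.g. a table version or snapshot identifier
    cluster_config: dict   # e.g. instance type and worker count
    environment: tuple     # pinned packages, e.g. lines of requirements.txt

    def missing_items(self) -> list:
        """Return the names of checklist items that were left empty."""
        return [name for name, value in asdict(self).items() if not value]

    def fingerprint(self) -> str:
        """Stable hash: runs with equal fingerprints recorded the same inputs."""
        payload = json.dumps(asdict(self), sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

A run whose `missing_items()` list is non-empty cannot be fully reproduced later, and comparing fingerprints is one simple way to check whether two runs recorded the same code, data, cluster, and environment.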

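The ML Ops flow on slide 24 (a new model version lands in staging, a webhook triggers validation, an approval promotes it to production, and a second webhook triggers batch inference) can be sketched as a toy in-memory registry. `ModelRegistry` and its methods here are hypothetical stand-ins written for illustration; this is not the MLflow Model Registry API, and placing every new version directly in Staging is a simplifying assumption.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ModelRegistry:
    """Toy in-memory registry; a stand-in, not the MLflow Model Registry API."""
    stages: Dict[int, str] = field(default_factory=dict)  # version -> stage
    hooks: Dict[str, List[Callable[[int], None]]] = field(default_factory=dict)

    def on_stage(self, stage: str, callback: Callable[[int], None]) -> None:
        """Register a webhook-style callback fired when a version enters `stage`."""
        self.hooks.setdefault(stage, []).append(callback)

    def create_version(self) -> int:
        """Register a new model version and place it in Staging."""
        version = len(self.stages) + 1
        self._transition(version, "Staging")
        return version

    def approve(self, version: int) -> None:
        """Promote a Staging version to Production, archiving the previous one."""
        if self.stages.get(version) != "Staging":
            raise ValueError(f"version {version} is not in Staging")
        for other, stage in self.stages.items():
            if stage == "Production":
                self.stages[other] = "Archived"
        self._transition(version, "Production")

    def _transition(self, version: int, stage: str) -> None:
        """Record the new stage, then fire the callbacks registered for it."""
        self.stages[version] = stage
        for callback in self.hooks.get(stage, []):
            callback(version)


events = []
registry = ModelRegistry()
registry.on_stage("Staging", lambda v: events.append(f"validate v{v}"))
registry.on_stage("Production", lambda v: events.append(f"batch inference with v{v}"))
```

In this sketch the human approval step is the explicit `approve` call, so validation and inference jobs only run in response to registry events rather than being triggered by hand.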