Building a Data Cloud to enable Analytics & AI-Driven Innovation - Lak Lakshmanan

Learn how Google Cloud addresses the key challenges of building an agile Data & AI platform. This lecture is relevant regardless of which cloud you use (or will use), because most businesses face the same six challenges:
1. High-quality AI requires a lot of data
2. AI Expertise is in high demand
3. Getting the value of ML requires a modern data platform
4. Activating ML requires surfacing AI into decision UIs
5. Operationalizing ML is hard
6. State-of-the-art changes rapidly

The lecture recording with Q&A is at https://youtu.be/ntBEQdD1IeQ




  1. 1. Evening with Lak Lakshmanan, Head, Data Analytics & AI Solutions, GCP
  2. 2. Welcome to ServerlessToronto.org. Introduce yourself: Where are you from? Why are you here? Fill in the survey to win prizes! Aug 9, 2021: “Building a Data Cloud to enable Analytics and AI-Driven Innovation” starts at 6:10pm…
  3. 3. Serverless Evolution (since FaaS started): Serverless is New Agile & Mindset. #1 We started as Back-end FaaS (Serverless) Developers who enjoyed “gluing” other people’s APIs and Managed Services. #2 We build bridges between the Serverless Community (“Dev leg”) and Front-end, Voice-First & UX folks (“UX leg”). #3 We're obsessed with creating business value (meaningful MVPs, Products), focusing on Outcomes, NOT Outputs, and we mesh well with Product Managers. #4 We achieve agility NOT by “sprinting” faster (like in Scrum), but by working smarter (using bigger building blocks and less Ops).
  4. 4. Disconnect between IT & Business needs: How to help companies accelerate? Technology is not the point => We are here to create Value. Adopting the Serverless Mindset allowed us to shift the focus from “pimping up our cars” (infrastructure/code) towards “driving” (the business) forward.
  5. 5. Let’s bridge the Business & IT gap by: 1. bringing more Business-focused topics (like today's) to educate, 2. offering free Second Opinions on Application/Data Architecture modernization (to Businesses), 3. offering for-fee Consulting services (regardless of how short they are), 4. connecting Cloud enthusiasts from the Community with Employers. Fill in the survey to help us serve you better and to enter the Manning raffle: https://forms.gle/oH2ZTnSgMTH41xsg7
  6. 6. Knowledge Sponsor: 1. Go to www.manning.com 2. Select *any* e-Book, Video course, or liveProject you want! 3. Add it to your shopping cart (no more than 1 item in the cart) 4. Raffle winners send me the email address they use in the Manning portal, 5. so the publisher can move the item to your Dashboard, as if purchased. Fill in the survey to win!
  7. 7. Upcoming ServerlessToronto.org Meetups: 1) Get started with Dialogflow & Contact Center AI on Google Cloud – Lee Boonstra, Conversational AI @ Google 2) Dr. Maloy – Empowering Developers to be Healthcare Heroes 3) Snowflake talk… getting closer 4) YOUR “This is my Architecture” style presentations are welcome, regardless of how big or small your learning & sharing will be ☺ Please rate us on Meetup and tell your peers. We’re here to help YOU help others.
  8. 8. Feature Presentation: Lak Lakshmanan
  9. 9. Building a Data Cloud to enable Analytics and AI-Driven Innovation How Google Cloud addresses key challenges when building an agile Data & AI Platform Lak Lakshmanan, Director, Analytics & AI Solutions @lak_gcp https://www.meetup.com/Serverless-Toronto/events/277818918/
  10. 10. Proprietary + Confidential. Bring great products to market faster. Unique customer segmentation: fundamentally change the way you go to market by identifying the highest-value customers and providing the right products at the right time. Build better products: improve end-user experience through data-driven innovation and the launch of new revenue-generating opportunities. Adapt in real time: run advanced analytics and predict the future through real-time tests that allow you to react and respond immediately to evolving user & market needs.
  11. 11. Achieve better economics to scale your business. Operations productivity: apply machine learning techniques to identify and rectify systemic inefficiencies, allowing you to learn and adapt your operations efficiently. Developer productivity: discover patterns in code and build tools that improve developer productivity, e.g., code recommendation and automatic bug fixing. AI/ML infrastructure efficiencies: reduce your cash burn rate by using machine learning to run and execute your models more quickly.
  12. 12. Why is AI/ML so exciting today? Why all the hype? Artificial Intelligence: the class of problems we can solve when computers think/act like humans. Machine Learning: scalably solve those problems using data examples (not custom code). Deep Learning: even when that data consists of unstructured data like images, speech, video, natural language text, etc.
  13. 13. Challenge #1 High-quality AI requires a lot of data
  14. 14. Many recent AI advances can be attributed to increases in data size and compute power Deep Learning scaling is predictable, empirically https://arxiv.org/abs/1712.00409 https://blog.openai.com/ai-and-compute/
  15. 15. Fortunately, Transfer Learning works in many scenarios but requires high-quality pre-trained models
  16. 16. Customizing Google’s video models with AutoML Video. Video classification: predict labels on entire videos (not segments within the video). Shot classification: predict shot boundaries inside a video, and predict labels on each of those shots. Action recognition: use a 1-second sliding window to predict actions, e.g., goal celebration. Object tracking: predict bounding boxes and start/end tracks of objects inside videos, e.g., track a drifting car. Enable powerful content discovery and engaging video experiences.
  17. 17. Data Science: Key to ML on structured data is having more features; it’s essential to break down data silos and do ML while minimizing data movement. Diagram: BigQuery Analytics spans three Data Mesh projects: Sales (Transactions dataset: offline tickets, online orders), Customers (CRM dataset: Cust. Details), and Products (Products dataset: P. Referential). A Global Logical Semantic Layer feeds ML tooling and BI tooling; raw data access is still possible.
  18. 18. Challenge #2 AI Expertise is in high demand
  19. 19. Very few people can create net-new ML models today: 10K DL researchers, 2M ML experts, +23M Developers, +100M Business users
  20. 20. Democratize predictive analytics for business users using BigQuery ML: (1) Execute ML initiatives without moving data from BigQuery. (2) Iterate on models in SQL in BigQuery to increase development speed. (3) Automate common ML tasks and hyperparameter tuning. CREATE MODEL my_models.car_accidents OPTIONS(model_type='logistic_reg', input_label_cols=['bad_accident']) AS SELECT speed, age, ... FROM input_table; SELECT predicted_bad_accident FROM ML.PREDICT(MODEL my_models.car_accidents, (SELECT speed, age, ... FROM input_table));
  21. 21. Build a portfolio of AI use cases. Use AI out of the box (maximize the value AI delivers into business workflows): low effort, high volume; benefit: high-quality AI building blocks and industry solutions. Deploy custom AI (extract value from your data): medium effort, customization; benefit: high-quality baselines and ease of use. Build end-to-end AI (use AI on your data to differentiate your product): high effort, low volume; benefit: a powerful, easy-to-use platform that allows you to reuse pre-built and/or customizable components. Similar value creation (“reward”) from all 3 buckets. Need a unified platform that supports all 3 buckets.
  22. 22. AI Platform for every level of expertise. Pre-trained APIs: no training data needed, get started right away. Custom AI with AutoML: easily create custom models (a no-code approach). End-to-end AI with core tools: help data scientists and ML engineers build and deploy AI.
  23. 23. Proprietary + Confidential Google Cloud AI Prebuilt ML APIs AI Platform AutoML AI Solutions Language Conversation Horizontal solutions Structured Data Language Contact Center AI Notebooks Industry solutions Data Labeling Training Prediction Continuous evaluation Explainability Pipelines Data Science and Machine Learning Sight Sight Vision Video Translate Natural Language Tables Video Intelligence Vision Natural Language Translate Speech-to-Text Text-to-Speech Document AI Dialogflow Talent Solution Recommendation AI Buy Build Customize
  24. 24. Challenge #3 Getting the value of ML requires a modern data platform
  25. 25. Google’s data cloud embraces the full data life cycle
  26. 26. Google’s data cloud embraces the full data life cycle
  27. 27. Google’s data cloud embraces the full data life cycle
  28. 28. Google’s data cloud embraces the full data life cycle
  29. 29. Google’s data cloud allows customers to unify data across the entire organization: break down silos, increase agility, innovate faster, get value from data, and support business transformation, with consistent governance & security.
  30. 30. The data and ML infrastructure have to be integrated because real-time, personalized machine learning is where the value is. Speed of Analytics: systems must be able to ingest, process, and serve data in real time, or opportunities are lost. Speed of Action: machine learning drives personalized services, based on the customer’s context.
  31. 31. Google Cloud significantly simplifies big data analytics: Deliver serverless analytics, not infrastructure. Build for growth to any scale. Embed ML and drive an end-to-end lifecycle. Empower analytics across the entire data lifecycle. Enable the best OSS technologies.
  32. 32. Proprietary + Confidential Data Analytics & Management Google Cloud Smart Analytics & AI Prebuilt ML APIs Foundation AI Platform AutoML AI Solutions Language Conversation Horizontal solutions Structured Data Language Frameworks Compute Contact Center AI Ingestion and Processing Storage and Analytics Orchestration Notebooks Industry solutions Data Labeling Training Prediction Continuous evaluation Explainability Pipelines Compute Engine Cloud TPU Cloud GPU Cloud scheduler Cloud Composer Instrumentation Cloud Build Container Registry Cloud Pub/Sub Cloud Dataflow Cloud Dataproc Data Fusion Cloud Storage BigQuery Cloud Bigtable Cloud SQL Data Catalog Data Studio Data Science and Machine Learning Sight Sight Vision Video Translate Natural Language Tables Video Intelligence Vision Natural Language Translate Speech-to-Text Text-to-Speech Document AI Dialogflow Talent Solution Recommendation AI
  33. 33. Supports the entire Data Science team. Data Engineer: uses Pub/Sub, Dataflow, and Dataprep to ingest, prepare, and transform data. Data Scientist: uses AI Platform Notebooks and Training services to build and evaluate models. ML Engineer: uses AI Platform Predictions to serve models, and Kubeflow Pipelines to encapsulate ML workflows for reuse. Developer: collaborates with data scientists to embed AI through REST APIs into applications. Business Analyst: discovers solutions from AI Hub and deploys them into production.
  34. 34. Proprietary + Confidential A platform for all users and intents throughout the data lifecycle Fine-grained access control Cloud IAM Metadata management Data Catalog Always encrypted Data at rest and in transit Redact sensitive data Cloud DLP Security Admin Protecting data Messaging PubSub Data Processing Dataflow Data Apps Looker (LookML) OSS Engines Dataproc (Spark, Flink) Developer Intelligent apps DW & DB BigQuery , BigTable Data processing (OSS) pipelines Dataproc (Spark, Presto, Flink) Data Processing (Native) pipelines Dataflow Orchestration Composer Data engineer Get clean, useful data Messaging PubSub or Confluent Kafka CDW BigQuery CDW & Orchestration BigQuery Visual data Integration Data Fusion ML in SQL BigQuery ML Data models, catalog Looker, Data Catalog Data analyst Query and analyze Ingestion BigQuery Streaming & DTS Governed BI Looker CDW in a Spreadsheet Connected Sheets Natural Language Query Data QnA Business User Insights Everywhere Data models, catalog Looker, Data Catalog CDW BigQuery Portable notebooks AI Platform Notebooks Simplified ML BigQuery ML & Auto ML Collaboration Feature Store, AI Platform Pipelines Spark Dataproc / Dataproc Hub Data scientist Models that work CDW BigQuery Secure data sharing BigQuery
  35. 35. Dataflow: fully integrated data processor for ML BigQuery ML Onboard training data for BQML In batch or streaming mode Use BQML models for online inference Export BQML-trained models as SavedModel and do streaming inference Cloud AI Platform Tensorflow Extended (TFX) Dataflow Dataflow Dataflow Integrated with Kubeflow for production ML pipelines Data processor for CAIP training On CPUs and GPUs Enables large-scale processing for TFX Transform, Data Validation, Model Analysis Brings streaming events and generates Examples for training and inference Robust ingestion services Advanced Analytics at speed Actionable Intelligence
  36. 36. Challenge #4 Activating ML requires surfacing AI into decision UIs
  37. 37. Use Looker to push data into apps. OEM: white-labeled Looker; customers log directly into Looker. Embed/iFrames: display data visualizations and all advanced BI capabilities using iframes.
  38. 38. A lot of ML predictions happen at the edge. On-premises deployments: Anthos on-prem (create, manage, and upgrade Kubernetes clusters in on-premises environments). IoT Devices: Cloud IoT Core (managed service to securely connect, manage, and ingest data from global device fleets). Machine Learning at the Edge: device inference framework: TensorFlow Lite (deep learning framework for on-device inference); model deployment to the Edge: Edge Manager for ML (deploy, manage, and run ML models on edge devices with Cloud AI Platform); ML HW accelerator for the Edge: Coral (a platform of hardware components, software tools, and precompiled models).
  39. 39. Challenge #5 Operationalizing ML is hard
  40. 40. ML requires operationalizing both data and code Configuration Data Collection Data Verification Feature Extraction Process Management Tools Analysis Tools Machine Resource Management Serving Infrastructure Monitoring ML Code Source: Sculley et al.: Hidden Technical Debt in Machine Learning Systems
  41. 41. Get models into production faster Ingestion Analysis Transform Training Tracking Evaluating Deploying Managed dataset AutoML Models Evaluation Endpoints and Batch predictions START END Custom Code Labeling tasks Unmanaged dataset Flexibility in training methods, and running parallel experiments Access model evaluation, optimization, and XAI capabilities built into the platform Robust backend for deployment with all relevant MLOps services
  42. 42. Proprietary + Confidential Scaling ML workflows with AI Platform Pipelines and the MLOps suite of services Artifact Store Cloud Storage Scalable Inference AI Platform Prediction Processing Cloud Dataflow Serverless Training AI Platform Training Extract Data Prepare Data Train Model Validate Data Vertex AI Pipelines Evaluate Model Validate Model Deploy Model Container Registry Data warehouse BigQuery
  43. 43. Proprietary + Confidential AutoML Experiment Train Deploy Data Labeling TensorBoard Model Builder SDK Training Vizier NAS Prediction Model Monitoring Explainable AI Feature Store ML Metadata Pipelines Notebooks Vision Video Language Tables Forecast Custom training workflow No code / low code workflow Vertex AI GA NEW NEW NEW NEW NEW NEW NEW NEW Matching Engine NEW BigQuery ML Translation NEW
  44. 44. Unified machine learning and data science. Increased productivity & reduced learning curve: learn a single workflow and vocabulary for all of our AI products, regardless of the layer of abstraction. Easy experimentation: train models quickly using AutoML by building on Google’s proprietary IP, and compare the results easily against custom-built models trained on the same dataset and managed in one place on the unified platform. Seamless integration & flexibility: easily interchange custom and AutoML-trained models, as they now use the same format and technical foundation. Take them with you and deploy them anywhere. MLOps: tooling and automated workflows for rapid, continuous delivery and management of models to production.
  45. 45. Challenge #6 State-of-the-art changes rapidly
  46. 46. Use Google’s best-in-class algorithms like NAS. Use of Neural Architecture Search (NAS) at Waymo: 20–30% lower latency at the same quality; 8–10% lower error rate at the same latency; NAS model in 2 weeks vs months (1 year of GPU time) searching over 10k architectures. “Going from months of engineering time to generate and fine-tune an architecture manually to ‘automatically generating’ neural nets with NAS” NAS Waymo ML Expert
  47. 47. Unstructured Documents { Type: Check Amount: $100 To:Allstate …} Enterprise Knowledge Graph (EKG) Normalize, validate & link entities across your data Capture Unstructured Content Content Warehouse Integrated unstructured + structured storage { Type: Check Amount: $100 To: Allstate Insurance,Inc …} “10 checks indicate $100 payments to Allstate Insurance, Inc” BigQuery analysis engine Unified Analytics Easily join structured & unstructured data into analysis, models, and processes “We’re seeing new payment patterns to All-state Insurance, Inc correlating with Jumbo Loan volumes in the North East” The AI-Powered Enterprise Data Warehouse DocAI + EKG + CMS = Unstructured Data ETL Process Best-in-class AI Store Unified Data Lake Analyze Data Warehousing Use Advanced analytics Document AI Get structured data from unstructured content Human in the Loop (HITL) Comprehensive tooling for human review of AI model creation & outputs
  48. 48. Google Cloud can help you leverage AI more effectively. Train on less data: get a jump start by customizing Google’s high-quality APIs through AutoML. Develop AI fast: train ML models in SQL without moving data around; less code to maintain. Large Scale ML: take advantage of AI accelerators and notebooks to dramatically speed up ML development. Integrated with Data Cloud: build unified AI and data pipelines to support recommendations, streaming, and other use cases. Activate Analytics, ML easily: incorporate Looker embedded analytics widgets into websites and mobile applications. Operationalize ML: take advantage of Vertex AI Pipelines, Feature Store, Notebooks, Continuous Evaluation, etc. Deploy ML on the Edge: leverage TensorFlow Lite and Coral to deploy AI to iOS, Android, and custom hardware. Work with a leader in Data & AI: Google Cloud sets the innovation bar in data and AI. Work with us.
  49. 49. Thank you
  50. 50. Join www.ServerlessToronto.org Home of “Less IT Mess”
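
Slide 20 trains a logistic regression in BigQuery ML and then scores rows with a prediction query. For readers without a BigQuery project at hand, the following is a minimal sketch of the same train-then-predict idea in plain Python. It is not BigQuery ML and says nothing about how BigQuery ML is implemented: the gradient-descent trainer, the function names, the learning rate, and the six toy rows (speed and age, both pre-scaled by 1/100) are all assumptions made up for illustration.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(weights, bias, x):
    """Linear score w . x + b for one feature row."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def train_logistic_regression(rows, labels, lr=0.5, epochs=5000):
    """Fit weights and bias by batch gradient descent on the mean log loss."""
    n = len(rows)
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(rows, labels):
            err = sigmoid(score(weights, bias, x)) - y  # dLoss/dScore
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def predict(weights, bias, x):
    """Return the predicted class (1 = bad accident) for one row."""
    return 1 if sigmoid(score(weights, bias, x)) >= 0.5 else 0


# Hypothetical training data: (speed / 100, age / 100) -> bad_accident label.
rows = [(0.3, 0.45), (0.4, 0.30), (0.5, 0.60),
        (0.9, 0.25), (1.1, 0.40), (1.2, 0.55)]
labels = [0, 0, 0, 1, 1, 1]

weights, bias = train_logistic_regression(rows, labels)  # analogous to CREATE MODEL
for row in [(0.2, 0.45), (1.3, 0.40)]:                   # analogous to ML.PREDICT
    print(row, "->", predict(weights, bias, row))
```

The point of the slide is that the two SQL statements replace all of the above, and the data never leaves the warehouse; the sketch only makes visible what kind of model is being fit.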

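Slide 14 points to the paper "Deep Learning Scaling is Predictable, Empirically" (arXiv:1712.00409), whose headline observation is that generalization error tends to fall roughly as a power law in training-set size. A minimal sketch of what that means in practice: fit a straight line in log-log space to a few (dataset size, error) points, then extrapolate. The data below is synthetic, generated from an assumed law with made-up constants (alpha = 2.0, beta = -0.35); none of these numbers come from the paper or the talk.

```python
import math


def fit_power_law(sizes, errors):
    """Least-squares fit of log(error) = log(alpha) + beta * log(size)."""
    xs = [math.log(m) for m in sizes]
    ys = [math.log(e) for e in errors]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    beta = sxy / sxx                           # slope in log-log space
    alpha = math.exp(mean_y - beta * mean_x)   # intercept, back on the linear scale
    return alpha, beta


# Synthetic learning curve from an assumed power law (illustrative constants).
TRUE_ALPHA, TRUE_BETA = 2.0, -0.35
sizes = [1_000, 10_000, 100_000, 1_000_000]
errors = [TRUE_ALPHA * m ** TRUE_BETA for m in sizes]

alpha, beta = fit_power_law(sizes, errors)


def predicted_error(m):
    """Extrapolate the fitted curve to a training-set size m."""
    return alpha * m ** beta


print(f"fitted alpha={alpha:.3f}, beta={beta:.3f}")
print(f"extrapolated error at 10M examples: {predicted_error(10_000_000):.4f}")
```

Real learning curves are noisy and flatten out at small and very large dataset sizes, so a fit like this is only a planning aid for the question behind Challenge #1: how much more data is worth collecting.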