SlideShare a Scribd company logo
1 of 38
Building an A.I. Cloud
What We Learned from PredictionIO
InfoQ.com: News & Community Site
• 750,000 unique visitors/month
• Published in 4 languages (English, Chinese, Japanese and Brazilian
Portuguese)
• Post content from our QCon conferences
• News 15-20 / week
• Articles 3-4 / week
• Presentations (videos) 12-15 / week
• Interviews 2-3 / week
• Books 1 / month
Watch the video with slide
synchronization on InfoQ.com!
https://www.infoq.com/presentations/
machine-learning-saas
Presented at QCon New York
www.qconnewyork.com
Purpose of QCon
- to empower software development by facilitating the spread of
knowledge and innovation
Strategy
- practitioner-driven conference designed for YOU: influencers of
change and innovation in your teams
- speakers and topics driving the evolution and innovation
- connecting and catalyzing the influencers and innovators
Highlights
- attended by more than 12,000 delegates since 2007
- held in 9 cities worldwide
Simon Chan
Sr. Director, Product Management, Salesforce
Co-founder, PredictionIO
PhD, University College London
simon@salesforce.com
The A.I. Developer Platform Dilemma
FlexibleSimple
“ Every prediction problem is unique.
3 Approaches to Customize Prediction
GUIAutomation
Custom
Code & Script
FlexibleSimple
10 KEY STEPS
to build-your-own A.I.
P.S. Choices = Complexity
One platform, build multiple apps. Here are 3 examples.
1. E-Commerce
Recommend
products
2. Subscription
Predict churn
3. Social Network
Detect spam
Let’s Go Beyond Textbook Tutorial
// Example from Spark ML website
import org.apache.spark.ml.classification.LogisticRegression
// Load training data
val training = sqlCtx.read.format("libsvm").load("data/mllib/sample_libsvm_data.txt")
val lr = new LogisticRegression().setMaxIter(10).setRegParam(0.3).setElasticNetParam(0.8)
// Fit the model
val lrModel = lr.fit(training)
Define the Prediction Problem
Be clear about the goal
1
Define the Prediction Problem
Basic Ideas:
◦ What is the Business Goal?
▫ Better user experience
▫ Maximize revenue
▫ Automate a manual task
▫ Forecast future events
◦ What’s the input query?
◦ What’s the prediction output?
◦ What is a good prediction?
◦ What is a bad prediction?
Decide on the Presentation
It’s (still) all about human perception
2
Decide on the Presentation
NOT SPAM SPAM
NOT SPAM True Negative False Negative
SPAM False Positive True Positive
Actual
Predicted
Mailing List and Social Network, for example, may tolerate false predictions differently
Decide on the Presentation
Some UX/UI Choices:
◦ Toleration to Bad Prediction?
◦ Suggestive or Decisive?
◦ Prediction Explainability?
◦ Intelligence Visualization?
◦ Human Interactive?
◦ Score; Ranking; Comparison; Charts; Groups
◦ Feedback Interface
▫ Explicit or Implicit
Import Free-form Data Source
Life is more complicated than MNIST and MovieLens datasets
3
Import Free-form Data Sources
Some Types of Data:
◦ User Attributes
◦ Item (product/content) Attributes
◦ Activities / Events
Estimate (guess) what you need.
Import Free-form Data Sources
Some Ways to Transfer Data:
◦ Transactional versus Batch
◦ Batch Frequency
◦ Full data or Changed Delta Only
Don’t forget continuous data sanity checking
Construct Features & Labels
from Data
Make it algorithm-friendly!
4
Construct Features & Labels from Data
Some Ways of Transformation:
◦ Qualitative to Numerical
◦ Normalize and Weight
◦ Aggregate - Sum, Average?
◦ Time Range
◦ Missing Records
Different algorithms may need different things
Label-specific:
◦ Delayed Feedback
◦ Implicit Assumptions
◦ Reliability of Explicit Opinion
Construct Features & Labels from Data
Qualitative to Numerical
// Example from Spark ML website - TF-IDF
import org.apache.spark.ml.feature.{HashingTF, IDF, Tokenizer}
val sentenceData = sqlContext.createDataFrame(Seq(
(0, "Hi I heard about Spark"),
(0, "I wish Java could use case classes"),
(1, "Logistic regression models are neat")
)).toDF("label", "sentence")
val tokenizer = new Tokenizer().setInputCol("sentence").setOutputCol("words")
val wordsData = tokenizer.transform(sentenceData)
val hashingTF = new HashingTF()
.setInputCol("words").setOutputCol("rawFeatures").setNumFeatures(20)
val featurizedData = hashingTF.transform(wordsData)
val idf = new IDF().setInputCol("rawFeatures").setOutputCol("features")
val idfModel = idf.fit(featurizedData)
val rescaledData = idfModel.transform(featurizedData)
Set Evaluation Metrics
Measure things that matter
5
Set Evaluation Metrics
Some Challenges:
◦ How to Define an Offline Evaluation that
Reflects Real Business Goal?
◦ Delayed Feedback (again)
◦ How to Present The Results to Everyone?
◦ How to Do Live A/B Test?
Clarify “Real-time”
The same word can mean different things
6
Clarify “Real-time”
Different Needs:
◦ Batch Model Update, Batch Queries
◦ Batch Model Update, Real-time Queries
◦ Real-time Model Update, Real-time Queries
When to Train/Re-train for Batch?
Find the Right Model
The “cool” modeling part - algorithms and hyperparameters
7
Find the Right Model
Example of model hyperparameter selection
// Example from Spark ML website
// We use a ParamGridBuilder to construct a grid of parameters to search over.
val paramGrid = new ParamGridBuilder()
.addGrid(hashingTF.numFeatures, Array(10, 100, 1000))
.addGrid(lr.regParam, Array(0.1, 0.01)).build()
// Note that the evaluator here is a BinaryClassificationEvaluator and its default metric
// is areaUnderROC.
val cv = new CrossValidator()
.setEstimator(pipeline).setEvaluator(new BinaryClassificationEvaluator)
.setEstimatorParamMaps(paramGrid)
.setNumFolds(2) // Use 3+ in practice
// Run cross-validation, and choose the best set of parameters.
val cvModel = cv.fit(training)
Find the Right Model
Some Typical Challenges:
◦ Classification, Regression, Recommendation
or Something Else?
◦ Overfitting / Underfitting
◦ Cold-Start (New Users/Items)
◦ Data Size
◦ Noise
Serve Predictions
Time to Use the Result
8
Serve Predictions
Some Approaches:
◦ Real-time Scoring
◦ Batch Scoring
Real-time Business Logics/Filters is often added
on top.
Collect Feedback for Improvement
Machine Learning is all about “Learning”
9
Collect Feedback for Improvement
Some Mechanism:
◦ Explicit User Feedback
▫ Allow users to correct, or express opinion
on, prediction manually
◦ Implicit Feedback
▫ Learn from subsequence effects of the
previous predictions
▫ Compare predicted results with the
actual reality
Keep Monitoring & Be Crisis-Ready
Make sure things are still working
10
Keep Monitoring & Be Crisis-Ready
Some Ideas:
◦ Real-time Alert (email, slack, pagerduty)
◦ Daily Reports
◦ Possibly integrate with existing monitoring tools
◦ Ready for production rollback
◦ Version control
For both prediction accuracy and production
issues.
Summary: Processes We Need to Simplify
Define the Prediction Problem
Decide on the Presentation
Import Free-form Data Sources
Construct Features & Labels from Data
Set Evaluation Metrics
Clarify “Real-Time”
Find the Right Model
Serve Predictions
Collect Feedback for Improvement
Keep Monitoring & Be Crisis-Ready
The Future of A.I.
is the automation of A.I.
Thanks! Any Questions?
WE ARE HIRING.
simon@salesforce.com
@simonchannet
Watch the video with slide
synchronization on InfoQ.com!
https://www.infoq.com/presentations/
machine-learning-saas

More Related Content

Viewers also liked

Wellbeing week 1 jan 16
Wellbeing week 1 jan 16Wellbeing week 1 jan 16
Wellbeing week 1 jan 16mwalsh2015
 
Elena massetti foster
Elena massetti   fosterElena massetti   foster
Elena massetti fosterinfoprogetto
 
Insurance Technology As A Catalyst For Change - Emerging Market Segments
Insurance Technology As A Catalyst For Change - Emerging Market SegmentsInsurance Technology As A Catalyst For Change - Emerging Market Segments
Insurance Technology As A Catalyst For Change - Emerging Market SegmentsAgile Financial Technologies
 
5 Yoga Poses To Help Soothe Neck And Back Pain
5 Yoga Poses To Help Soothe Neck And Back Pain5 Yoga Poses To Help Soothe Neck And Back Pain
5 Yoga Poses To Help Soothe Neck And Back PainEason Chan
 
5 Point Checklist to Create Powerful Cover Slides
5 Point Checklist to Create Powerful Cover Slides5 Point Checklist to Create Powerful Cover Slides
5 Point Checklist to Create Powerful Cover Slides24Slides
 
Verizon communications
Verizon communicationsVerizon communications
Verizon communicationsPeter Jon
 
Ways to keep allergies at bay
Ways to keep allergies at bayWays to keep allergies at bay
Ways to keep allergies at bayEason Chan
 

Viewers also liked (7)

Wellbeing week 1 jan 16
Wellbeing week 1 jan 16Wellbeing week 1 jan 16
Wellbeing week 1 jan 16
 
Elena massetti foster
Elena massetti   fosterElena massetti   foster
Elena massetti foster
 
Insurance Technology As A Catalyst For Change - Emerging Market Segments
Insurance Technology As A Catalyst For Change - Emerging Market SegmentsInsurance Technology As A Catalyst For Change - Emerging Market Segments
Insurance Technology As A Catalyst For Change - Emerging Market Segments
 
5 Yoga Poses To Help Soothe Neck And Back Pain
5 Yoga Poses To Help Soothe Neck And Back Pain5 Yoga Poses To Help Soothe Neck And Back Pain
5 Yoga Poses To Help Soothe Neck And Back Pain
 
5 Point Checklist to Create Powerful Cover Slides
5 Point Checklist to Create Powerful Cover Slides5 Point Checklist to Create Powerful Cover Slides
5 Point Checklist to Create Powerful Cover Slides
 
Verizon communications
Verizon communicationsVerizon communications
Verizon communications
 
Ways to keep allergies at bay
Ways to keep allergies at bayWays to keep allergies at bay
Ways to keep allergies at bay
 

More from C4Media

Streaming a Million Likes/Second: Real-Time Interactions on Live Video
Streaming a Million Likes/Second: Real-Time Interactions on Live VideoStreaming a Million Likes/Second: Real-Time Interactions on Live Video
Streaming a Million Likes/Second: Real-Time Interactions on Live VideoC4Media
 
Next Generation Client APIs in Envoy Mobile
Next Generation Client APIs in Envoy MobileNext Generation Client APIs in Envoy Mobile
Next Generation Client APIs in Envoy MobileC4Media
 
Software Teams and Teamwork Trends Report Q1 2020
Software Teams and Teamwork Trends Report Q1 2020Software Teams and Teamwork Trends Report Q1 2020
Software Teams and Teamwork Trends Report Q1 2020C4Media
 
Understand the Trade-offs Using Compilers for Java Applications
Understand the Trade-offs Using Compilers for Java ApplicationsUnderstand the Trade-offs Using Compilers for Java Applications
Understand the Trade-offs Using Compilers for Java ApplicationsC4Media
 
Kafka Needs No Keeper
Kafka Needs No KeeperKafka Needs No Keeper
Kafka Needs No KeeperC4Media
 
High Performing Teams Act Like Owners
High Performing Teams Act Like OwnersHigh Performing Teams Act Like Owners
High Performing Teams Act Like OwnersC4Media
 
Does Java Need Inline Types? What Project Valhalla Can Bring to Java
Does Java Need Inline Types? What Project Valhalla Can Bring to JavaDoes Java Need Inline Types? What Project Valhalla Can Bring to Java
Does Java Need Inline Types? What Project Valhalla Can Bring to JavaC4Media
 
Service Meshes- The Ultimate Guide
Service Meshes- The Ultimate GuideService Meshes- The Ultimate Guide
Service Meshes- The Ultimate GuideC4Media
 
Shifting Left with Cloud Native CI/CD
Shifting Left with Cloud Native CI/CDShifting Left with Cloud Native CI/CD
Shifting Left with Cloud Native CI/CDC4Media
 
CI/CD for Machine Learning
CI/CD for Machine LearningCI/CD for Machine Learning
CI/CD for Machine LearningC4Media
 
Fault Tolerance at Speed
Fault Tolerance at SpeedFault Tolerance at Speed
Fault Tolerance at SpeedC4Media
 
Architectures That Scale Deep - Regaining Control in Deep Systems
Architectures That Scale Deep - Regaining Control in Deep SystemsArchitectures That Scale Deep - Regaining Control in Deep Systems
Architectures That Scale Deep - Regaining Control in Deep SystemsC4Media
 
ML in the Browser: Interactive Experiences with Tensorflow.js
ML in the Browser: Interactive Experiences with Tensorflow.jsML in the Browser: Interactive Experiences with Tensorflow.js
ML in the Browser: Interactive Experiences with Tensorflow.jsC4Media
 
Build Your Own WebAssembly Compiler
Build Your Own WebAssembly CompilerBuild Your Own WebAssembly Compiler
Build Your Own WebAssembly CompilerC4Media
 
User & Device Identity for Microservices @ Netflix Scale
User & Device Identity for Microservices @ Netflix ScaleUser & Device Identity for Microservices @ Netflix Scale
User & Device Identity for Microservices @ Netflix ScaleC4Media
 
Scaling Patterns for Netflix's Edge
Scaling Patterns for Netflix's EdgeScaling Patterns for Netflix's Edge
Scaling Patterns for Netflix's EdgeC4Media
 
Make Your Electron App Feel at Home Everywhere
Make Your Electron App Feel at Home EverywhereMake Your Electron App Feel at Home Everywhere
Make Your Electron App Feel at Home EverywhereC4Media
 
The Talk You've Been Await-ing For
The Talk You've Been Await-ing ForThe Talk You've Been Await-ing For
The Talk You've Been Await-ing ForC4Media
 
Future of Data Engineering
Future of Data EngineeringFuture of Data Engineering
Future of Data EngineeringC4Media
 
Automated Testing for Terraform, Docker, Packer, Kubernetes, and More
Automated Testing for Terraform, Docker, Packer, Kubernetes, and MoreAutomated Testing for Terraform, Docker, Packer, Kubernetes, and More
Automated Testing for Terraform, Docker, Packer, Kubernetes, and MoreC4Media
 

More from C4Media (20)

Streaming a Million Likes/Second: Real-Time Interactions on Live Video
Streaming a Million Likes/Second: Real-Time Interactions on Live VideoStreaming a Million Likes/Second: Real-Time Interactions on Live Video
Streaming a Million Likes/Second: Real-Time Interactions on Live Video
 
Next Generation Client APIs in Envoy Mobile
Next Generation Client APIs in Envoy MobileNext Generation Client APIs in Envoy Mobile
Next Generation Client APIs in Envoy Mobile
 
Software Teams and Teamwork Trends Report Q1 2020
Software Teams and Teamwork Trends Report Q1 2020Software Teams and Teamwork Trends Report Q1 2020
Software Teams and Teamwork Trends Report Q1 2020
 
Understand the Trade-offs Using Compilers for Java Applications
Understand the Trade-offs Using Compilers for Java ApplicationsUnderstand the Trade-offs Using Compilers for Java Applications
Understand the Trade-offs Using Compilers for Java Applications
 
Kafka Needs No Keeper
Kafka Needs No KeeperKafka Needs No Keeper
Kafka Needs No Keeper
 
High Performing Teams Act Like Owners
High Performing Teams Act Like OwnersHigh Performing Teams Act Like Owners
High Performing Teams Act Like Owners
 
Does Java Need Inline Types? What Project Valhalla Can Bring to Java
Does Java Need Inline Types? What Project Valhalla Can Bring to JavaDoes Java Need Inline Types? What Project Valhalla Can Bring to Java
Does Java Need Inline Types? What Project Valhalla Can Bring to Java
 
Service Meshes- The Ultimate Guide
Service Meshes- The Ultimate GuideService Meshes- The Ultimate Guide
Service Meshes- The Ultimate Guide
 
Shifting Left with Cloud Native CI/CD
Shifting Left with Cloud Native CI/CDShifting Left with Cloud Native CI/CD
Shifting Left with Cloud Native CI/CD
 
CI/CD for Machine Learning
CI/CD for Machine LearningCI/CD for Machine Learning
CI/CD for Machine Learning
 
Fault Tolerance at Speed
Fault Tolerance at SpeedFault Tolerance at Speed
Fault Tolerance at Speed
 
Architectures That Scale Deep - Regaining Control in Deep Systems
Architectures That Scale Deep - Regaining Control in Deep SystemsArchitectures That Scale Deep - Regaining Control in Deep Systems
Architectures That Scale Deep - Regaining Control in Deep Systems
 
ML in the Browser: Interactive Experiences with Tensorflow.js
ML in the Browser: Interactive Experiences with Tensorflow.jsML in the Browser: Interactive Experiences with Tensorflow.js
ML in the Browser: Interactive Experiences with Tensorflow.js
 
Build Your Own WebAssembly Compiler
Build Your Own WebAssembly CompilerBuild Your Own WebAssembly Compiler
Build Your Own WebAssembly Compiler
 
User & Device Identity for Microservices @ Netflix Scale
User & Device Identity for Microservices @ Netflix ScaleUser & Device Identity for Microservices @ Netflix Scale
User & Device Identity for Microservices @ Netflix Scale
 
Scaling Patterns for Netflix's Edge
Scaling Patterns for Netflix's EdgeScaling Patterns for Netflix's Edge
Scaling Patterns for Netflix's Edge
 
Make Your Electron App Feel at Home Everywhere
Make Your Electron App Feel at Home EverywhereMake Your Electron App Feel at Home Everywhere
Make Your Electron App Feel at Home Everywhere
 
The Talk You've Been Await-ing For
The Talk You've Been Await-ing ForThe Talk You've Been Await-ing For
The Talk You've Been Await-ing For
 
Future of Data Engineering
Future of Data EngineeringFuture of Data Engineering
Future of Data Engineering
 
Automated Testing for Terraform, Docker, Packer, Kubernetes, and More
Automated Testing for Terraform, Docker, Packer, Kubernetes, and MoreAutomated Testing for Terraform, Docker, Packer, Kubernetes, and More
Automated Testing for Terraform, Docker, Packer, Kubernetes, and More
 

Recently uploaded

CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...gurkirankumar98700
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
Google AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGGoogle AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGSujit Pal
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 

Recently uploaded (20)

CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Google AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGGoogle AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAG
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 

Building an AI in the Cloud

  • 1. Building an A.I. Cloud What We Learned from PredictionIO
  • 2. InfoQ.com: News & Community Site • 750,000 unique visitors/month • Published in 4 languages (English, Chinese, Japanese and Brazilian Portuguese) • Post content from our QCon conferences • News 15-20 / week • Articles 3-4 / week • Presentations (videos) 12-15 / week • Interviews 2-3 / week • Books 1 / month Watch the video with slide synchronization on InfoQ.com! https://www.infoq.com/presentations/ machine-learning-saas
  • 3. Presented at QCon New York www.qconnewyork.com Purpose of QCon - to empower software development by facilitating the spread of knowledge and innovation Strategy - practitioner-driven conference designed for YOU: influencers of change and innovation in your teams - speakers and topics driving the evolution and innovation - connecting and catalyzing the influencers and innovators Highlights - attended by more than 12,000 delegates since 2007 - held in 9 cities worldwide
  • 4. Simon Chan Sr. Director, Product Management, Salesforce Co-founder, PredictionIO PhD, University College London simon@salesforce.com
  • 5. The A.I. Developer Platform Dilemma FlexibleSimple
  • 6. “ Every prediction problem is unique.
  • 7. 3 Approaches to Customize Prediction GUIAutomation Custom Code & Script FlexibleSimple
  • 8. 10 KEY STEPS to build-your-own A.I. P.S. Choices = Complexity
  • 9. One platform, build multiple apps. Here are 3 examples. 1. E-Commerce Recommend products 2. Subscription Predict churn 3. Social Network Detect spam
  • 10. Let’s Go Beyond Textbook Tutorial // Example from Spark ML website import org.apache.spark.ml.classification.LogisticRegression // Load training data val training = sqlCtx.read.format("libsvm").load("data/mllib/sample_libsvm_data.txt") val lr = new LogisticRegression().setMaxIter(10).setRegParam(0.3).setElasticNetParam(0.8) // Fit the model val lrModel = lr.fit(training)
  • 11. Define the Prediction Problem Be clear about the goal 1
  • 12. Define the Prediction Problem Basic Ideas: ◦ What is the Business Goal? ▫ Better user experience ▫ Maximize revenue ▫ Automate a manual task ▫ Forecast future events ◦ What’s the input query? ◦ What’s the prediction output? ◦ What is a good prediction? ◦ What is a bad prediction?
  • 13. Decide on the Presentation It’s (still) all about human perception 2
  • 14. Decide on the Presentation NOT SPAM SPAM NOT SPAM True Negative False Negative SPAM False Positive True Positive Actual Predicted Mailing List and Social Network, for example, may tolerate false predictions differently
  • 15. Decide on the Presentation Some UX/UI Choices: ◦ Toleration to Bad Prediction? ◦ Suggestive or Decisive? ◦ Prediction Explainability? ◦ Intelligence Visualization? ◦ Human Interactive? ◦ Score; Ranking; Comparison; Charts; Groups ◦ Feedback Interface ▫ Explicit or Implicit
  • 16. Import Free-form Data Source Life is more complicated than MNIST and MovieLens datasets 3
  • 17. Import Free-form Data Sources Some Types of Data: ◦ User Attributes ◦ Item (product/content) Attributes ◦ Activities / Events Estimate (guess) what you need.
  • 18. Import Free-form Data Sources Some Ways to Transfer Data: ◦ Transactional versus Batch ◦ Batch Frequency ◦ Full data or Changed Delta Only Don’t forget continuous data sanity checking
  • 19. Construct Features & Labels from Data Make it algorithm-friendly! 4
  • 20. Construct Features & Labels from Data Some Ways of Transformation: ◦ Qualitative to Numerical ◦ Normalize and Weight ◦ Aggregate - Sum, Average? ◦ Time Range ◦ Missing Records Different algorithms may need different things Label-specific: ◦ Delayed Feedback ◦ Implicit Assumptions ◦ Reliability of Explicit Opinion
  • 21. Construct Features & Labels from Data Qualitative to Numerical // Example from Spark ML website - TF-IDF import org.apache.spark.ml.feature.{HashingTF, IDF, Tokenizer} val sentenceData = sqlContext.createDataFrame(Seq( (0, "Hi I heard about Spark"), (0, "I wish Java could use case classes"), (1, "Logistic regression models are neat") )).toDF("label", "sentence") val tokenizer = new Tokenizer().setInputCol("sentence").setOutputCol("words") val wordsData = tokenizer.transform(sentenceData) val hashingTF = new HashingTF() .setInputCol("words").setOutputCol("rawFeatures").setNumFeatures(20) val featurizedData = hashingTF.transform(wordsData) val idf = new IDF().setInputCol("rawFeatures").setOutputCol("features") val idfModel = idf.fit(featurizedData) val rescaledData = idfModel.transform(featurizedData)
  • 22. Set Evaluation Metrics Measure things that matter 5
  • 23. Set Evaluation Metrics Some Challenges: ◦ How to Define an Offline Evaluation that Reflects Real Business Goal? ◦ Delayed Feedback (again) ◦ How to Present The Results to Everyone? ◦ How to Do Live A/B Test?
  • 24. Clarify “Real-time” The same word can mean different things 6
  • 25. Clarify “Real-time” Different Needs: ◦ Batch Model Update, Batch Queries ◦ Batch Model Update, Real-time Queries ◦ Real-time Model Update, Real-time Queries When to Train/Re-train for Batch?
  • 26. Find the Right Model The “cool” modeling part - algorithms and hyperparameters 7
  • 27. Find the Right Model Example of model hyperparameter selection // Example from Spark ML website // We use a ParamGridBuilder to construct a grid of parameters to search over. val paramGrid = new ParamGridBuilder() .addGrid(hashingTF.numFeatures, Array(10, 100, 1000)) .addGrid(lr.regParam, Array(0.1, 0.01)).build() // Note that the evaluator here is a BinaryClassificationEvaluator and its default metric // is areaUnderROC. val cv = new CrossValidator() .setEstimator(pipeline).setEvaluator(new BinaryClassificationEvaluator) .setEstimatorParamMaps(paramGrid) .setNumFolds(2) // Use 3+ in practice // Run cross-validation, and choose the best set of parameters. val cvModel = cv.fit(training)
  • 28. Find the Right Model Some Typical Challenges: ◦ Classification, Regression, Recommendation or Something Else? ◦ Overfitting / Underfitting ◦ Cold-Start (New Users/Items) ◦ Data Size ◦ Noise
  • 29. Serve Predictions Time to Use the Result 8
  • 30. Serve Predictions Some Approaches: ◦ Real-time Scoring ◦ Batch Scoring Real-time Business Logics/Filters is often added on top.
  • 31. Collect Feedback for Improvement Machine Learning is all about “Learning” 9
  • 32. Collect Feedback for Improvement Some Mechanism: ◦ Explicit User Feedback ▫ Allow users to correct, or express opinion on, prediction manually ◦ Implicit Feedback ▫ Learn from subsequence effects of the previous predictions ▫ Compare predicted results with the actual reality
  • 33. Keep Monitoring & Be Crisis-Ready Make sure things are still working 10
  • 34. Keep Monitoring & Be Crisis-Ready Some Ideas: ◦ Real-time Alert (email, slack, pagerduty) ◦ Daily Reports ◦ Possibly integrate with existing monitoring tools ◦ Ready for production rollback ◦ Version control For both prediction accuracy and production issues.
  • 35. Summary: Processes We Need to Simplify Define the Prediction Problem Decide on the Presentation Import Free-form Data Sources Construct Features & Labels from Data Set Evaluation Metrics Clarify “Real-Time” Find the Right Model Serve Predictions Collect Feedback for Improvement Keep Monitoring & Be Crisis-Ready
  • 36. The Future of A.I. is the automation of A.I.
  • 37. Thanks! Any Questions? WE ARE HIRING. simon@salesforce.com @simonchannet
  • 38. Watch the video with slide synchronization on InfoQ.com! https://www.infoq.com/presentations/ machine-learning-saas