©2015 IBM Corporation1 10 April 2017
presentation starting soon… sit down
Flink Forward- San Francisco
Trevor Grant
@rawkintrevo
April 11th, 2017
Introduction to Online Machine Learning Algorithms
• Who is this guy?
• Why should I care?
• What’s going on here?
• This seems boring and mathy, maybe I should leave…
Intro
Buzzwords
Basic Online Learners
Challenges
Lambda Recommender
Conclusions
Table of Contents
▪ Trevor Grant
▪ Things I do:
– Open Source Technical Evangelist, IBM
– PMC Apache Mahout
– Blog: http://rawkintrevo.org
▪ Schooling
– MS Applied Math, Illinois State
– MBA, Illinois State
▪ How to get ahold of me:
– @rawkintrevo
– trevor.grant@ibm.com / rawkintrevo@apache.org
– Mahout Dev and User Mailing Lists
Branding
▪ To disambiguate terms related to machine learning / streaming machine learning.
▪ Hopefully after this you
– Won’t keep using words wrong
– Will know when someone else is using them wrong
▪ be pretentious about it
▪ or don’t
▪ Bonus material:
– We build a fairly cool, yet super simple online recommender
– Apache Flink + Apache Spark + Apache Mahout
Why does any of this matter?
This talk invokes the following types of maths
▪ Weighted Averaging
▪ Matrix Times Vector
Also there’s pictures.
Math. Eewww.
▪ Useful animations
http://eli.thegreenplace.net/images/2016/regressionfit.gif
▪ Unrelated animal pictures
Types of Pictures
Macs are good for keeping cat butts warm…
and not much else.
• On the virtues of not throwing around buzzwords…
• Online vs. Offline
• Lambda vs. Kappa (w.r.t. machine learning)
• Statistical vs Adversarial
• Real-Time (one buzzword to rule them all)
Online
▪ Input processed piece by piece in a serial fashion
▪ Each new piece of information generates an event
– Not mini-batching
– Possibly on a sliding window of 1 record
▪ Not necessarily low latency
Offline
▪ Input processed in batches
▪ Not necessarily high latency
Online vs. Offline
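To make the distinction concrete, here is a minimal sketch (not from the talk) that computes the same average both ways. The names `batchMean` and `RunningMean` are made up for illustration. The offline version needs the whole batch up front; the online version updates its state once per record as it arrives.

```scala
object OnlineVsOffline {
  // Offline: the whole batch is available before anything is computed.
  def batchMean(xs: Seq[Double]): Double = xs.sum / xs.size

  // Online: state is updated once per record, in arrival order.
  // (Hypothetical helper, named for this sketch only.)
  final case class RunningMean(count: Long = 0L, mean: Double = 0.0) {
    def update(x: Double): RunningMean = {
      val n = count + 1
      RunningMean(n, mean + (x - mean) / n)
    }
  }

  def main(args: Array[String]): Unit = {
    val records = Seq(2.0, 4.0, 6.0, 8.0)
    val online = records.foldLeft(RunningMean())((m, x) => m.update(x))
    println(s"offline=${batchMean(records)} online=${online.mean}")
  }
}
```

Both arrive at the same number; neither one says anything about latency.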
Slow Online
A stock broker in Des Moines, Iowa writes a
Python program that gets EOD
prices/statistics as they are published and
then executes orders.
Fast Offline
An HFT algorithm executes trades based on
tumbling windows of 15 milliseconds’ worth
of activity.
Fast offline, slow online and stack order
Online doesn’t mean fast, and online doesn’t mean streaming; online only means that information is processed
as soon as it is received.
Consider an online algorithm (the slow online example) that sits behind an offline EOD batch job.
- This is an extreme case, but no algorithm receives data at the instant it is created.
- Best case: limited by the speed of light (?)
Lambda
Learning happens (i.e. models are fitted)
offline
Model used by streaming engine to make
decisions online
Kappa
Learning happens (i.e. models are fitted)
online
Online decision model updates for each new
record seen
Model can change structure e.g. new words
in TF-IDF or new categories in ‘factor model
linear regression’
Lambda vs. Kappa (Machine Learning)
▪ A trained model expects input that is structurally the same as its training data.
▪ In linear regression, categorical features are “one-hot-encoded”. A feature with 3
categories is expressed as a vector in 2 columns.
▪ What if a new category pops up?
– Depends on how you program it:
▪ ignore the input
▪ serve a bad response
▪ Consider clustering / classification on text… new words?
– Ignore them (probably what you’ll do)
– But the word might be very important...
Lambda with Novel Information
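A minimal sketch of the frozen-encoder situation, assuming a made-up feature with the levels "red", "green" and "blue" (none of these names come from the talk). Three categories map to two columns, with the first level as the all-zeros reference. A category that was not in the training data has no column, so the caller has to pick one of the policies above.

```scala
object FrozenEncoder {
  // Categories seen at (offline) training time; the first is the reference level.
  // These levels are hypothetical, chosen only for illustration.
  val trainedLevels: Vector[String] = Vector("red", "green", "blue")

  // 3 categories -> 2 columns; the reference level encodes as all zeros.
  // Returns None for a category the trained model has never seen.
  def encode(category: String): Option[Vector[Double]] = {
    val idx = trainedLevels.indexOf(category)
    if (idx < 0) None
    else Some(Vector.tabulate(trainedLevels.size - 1)(j => if (j == idx - 1) 1.0 else 0.0))
  }

  def main(args: Array[String]): Unit = {
    Seq("red", "green", "blue", "purple").foreach { c =>
      encode(c) match {
        case Some(v) => println(s"$c -> $v")
        case None    => println(s"$c -> unseen category: ignore it or serve a default")
      }
    }
  }
}
```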
▪ In Kappa, training happens with each new piece of data
– Model data can account for structural change in data instantly
▪ New words can be introduced into TF-IDF
▪ New categories into a factor variable
▪ Both examples (and others) cause the input vector to change.
Kappa with Novel Information
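A small sketch of what "the input vector changes" can look like. The `Vocabulary` class is hypothetical and uses plain term counts rather than full TF-IDF, to keep it short. The point is only that a new word gets a new column the first time it is seen, so later vectors are longer than earlier ones.

```scala
import scala.collection.mutable

object GrowingVocabulary {
  // Hypothetical helper for this sketch: a term dictionary that grows online.
  final class Vocabulary {
    private val index = mutable.LinkedHashMap.empty[String, Int]

    def size: Int = index.size

    // Returns the column for a term, assigning a new column on first sight.
    def columnFor(term: String): Int = index.getOrElseUpdate(term, index.size)

    // Term counts over the current vocabulary (not TF-IDF weights).
    def vectorize(terms: Seq[String]): Vector[Double] = {
      val cols = terms.map(t => columnFor(t)) // may grow the vocabulary
      val counts = Array.fill(size)(0.0)
      cols.foreach(c => counts(c) += 1.0)
      counts.toVector
    }
  }

  def main(args: Array[String]): Unit = {
    val vocab = new Vocabulary
    println(vocab.vectorize(Seq("pizza", "pizza", "queso"))) // 2 columns so far
    println(vocab.vectorize(Seq("vegan", "pizza")))          // "vegan" adds a 3rd column
  }
}
```

Any model consuming these vectors has to cope with the extra column, which is the structural change a Kappa-style learner absorbs as it goes.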
Traditional
Common statistical methods
▪ Supervised
▪ Unsupervised
Graded by
▪ Statistical Fitness Tests
▪ Out of core testing
▪ E.g.
– Confusion Matrix, AuROC
– MSE, MAPE, R²
Adversarial
Algorithm Versus Environment
▪ vs. Spammers
▪ vs. Hackers
▪ vs. Nature
Graded by
▪ Directionally can use some tests
▪ Really A/B testing
– Adversaries may get smarter over time
– Type of test where you automate
adversary.
Statistical vs. Adversarial
▪ Subjective
▪ A good buzzword for something that:
– Doesn’t fall into any of the above
categories cleanly
– Doesn’t fall into the category you want it
to fall into
– You’re not really sure which buzzword to
use, so you need a ‘safe’ word that no
one can call you on.
– Days
– Weeks?
– JJs
Real-time
• Streaming K-Means
• Streaming Linear Regression
• Why would I ever do this?
K-Means
Photo Credit: Simply Statistics http://simplystatistics.org/wp-content/uploads/2014/02/kmeans.gif
Online K-Means
▪ New Point
▪ Probably Red
▪ Update Red “Center”
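The three frames above (new point, "probably red", update the red center) can be sketched as follows. This is one common way to do the update, a running mean per center, and not necessarily what the animation used; `Center` and `update` are names made up for this sketch.

```scala
object OnlineKMeans {
  // A center plus the number of points it has absorbed so far.
  final case class Center(coords: Vector[Double], count: Long)

  private def sqDist(a: Vector[Double], b: Vector[Double]): Double =
    a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum

  // Assign the point to its nearest center, then move only that center
  // toward the point (running mean of everything assigned to it).
  def update(centers: Vector[Center], point: Vector[Double]): (Int, Vector[Center]) = {
    val nearest = centers.indices.minBy(i => sqDist(centers(i).coords, point))
    val c = centers(nearest)
    val n = c.count + 1
    val moved = c.coords.zip(point).map { case (m, x) => m + (x - m) / n }
    (nearest, centers.updated(nearest, Center(moved, n)))
  }

  def main(args: Array[String]): Unit = {
    val centers = Vector(Center(Vector(0.0, 0.0), 1), Center(Vector(10.0, 10.0), 1))
    val (cluster, next) = update(centers, Vector(2.0, 0.0))
    println(s"assigned to cluster $cluster, centers now $next")
  }
}
```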
http://eli.thegreenplace.net/images/2016/regressionfit.gif
Linear Regression (Stochastic)
Online Linear Regression (animation, first pass)
▪ Fit Line; Last Point received
▪ new point
▪ Temp fit line
▪ New fit line (weighted avg); Original Fit Line
Online Linear Regression (animation, second pass)
▪ Original Fit Line; old point
▪ New point
▪ Temp fit line
▪ New fit line (weighted avg)
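A rough sketch of the weighted-average idea in the frames above. The slides do not say how the temp fit line is built, so this assumes it is the line through the last point received and the new point, and that the new fit is a simple blend of the two lines. This is an illustration of the picture, not the stochastic gradient update itself; `Line`, `through`, `blend` and `weight` are hypothetical names.

```scala
object OnlineLineFit {
  final case class Line(slope: Double, intercept: Double) {
    def predict(x: Double): Double = slope * x + intercept
  }

  // Temp fit line: assumed here to pass through the last point and the new point.
  // Requires the two points to have different x values.
  def through(p: (Double, Double), q: (Double, Double)): Line = {
    val slope = (q._2 - p._2) / (q._1 - p._1)
    Line(slope, p._2 - slope * p._1)
  }

  // New fit line = weighted average of the original fit line and the temp line.
  // weight near 0 trusts history; weight near 1 chases the newest data.
  def blend(original: Line, temp: Line, weight: Double): Line =
    Line(
      (1 - weight) * original.slope + weight * temp.slope,
      (1 - weight) * original.intercept + weight * temp.intercept
    )

  def main(args: Array[String]): Unit = {
    val original = Line(1.0, 0.0)
    val temp = through((1.0, 1.0), (2.0, 4.0))
    println(blend(original, temp, weight = 0.5))
  }
}
```

The weight plays the role of a learning rate: it is the knob that decides how far one new record can drag the model.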
This would work on neural networks too.
Also “Deep Learning” is another buzzword.
Deep learning
▪ Mostly Anomaly Detection (moving average, then something deviates)
– A very popular use case of online/streaming algorithms (more talks today about this)
– Algorithm learns what is normal (either online or offline)
– When normality is sufficiently violated, the algorithm sounds an alarm
– All anomaly detection is some flavor of this. It is usually referred to as
_______ Anomaly Detection, where the blank only specifies which algorithm was used to define
normality (or lack thereof).
– Architecture: online-offline training choices depend primarily on how fast ‘normality’
changes in your specific use case
Why?
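The "moving average, then something deviates" pattern can be sketched like this. Everything here is an assumption made for illustration: the window size, the threshold of 3 standard deviations, and the choice to keep flagged records out of the window so they do not redefine "normal".

```scala
object MovingAverageDetector {
  // The last few "normal" observations; this is the learned notion of normality.
  final case class State(window: Vector[Double] = Vector.empty)

  // Returns (isAnomaly, nextState). size and k are illustrative knobs.
  def step(state: State, x: Double, size: Int = 5, k: Double = 3.0): (Boolean, State) = {
    val w = state.window
    // Only judge once the window is full.
    val anomaly = w.size == size && {
      val mean = w.sum / size
      val std = math.sqrt(w.map(v => (v - mean) * (v - mean)).sum / size)
      math.abs(x - mean) > k * math.max(std, 1e-9)
    }
    // Assumption: anomalous records are not allowed to update normality.
    val next = if (anomaly) state else State((w :+ x).takeRight(size))
    (anomaly, next)
  }

  def main(args: Array[String]): Unit = {
    var state = State()
    for (x <- Seq(10.0, 11.0, 9.0, 10.0, 11.0, 9.0, 50.0, 10.0)) {
      val (alarm, next) = step(state, x)
      state = next
      if (alarm) println(s"ALARM: $x")
    }
  }
}
```

How fast "normal" drifts in your use case decides the window size, and whether the window should be learned online like this or computed offline.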
• Adversarial Analysis
• Scoring in Real Time (how do you know you’re right?)
• A/B Tests
CHALLENGE:
How do you know how far your prediction ‘missed’? In real life ‘correct’ answers may arrive
later.
Corollary: If you have the ‘correct’ answer, why are you trying to predict it?
Not insurmountable, but prevents ‘one size fits all’ approaches (context dependence).
Learning in real-time with supervised methods
(challenge)
You’ve only got so much hardware.
Latency and Normal Streaming Problems
Simple Adversary-
How well does the algorithm do against “offline” version?
Consider Linear Regression with SGD
▪ Offline algorithm gets to go over the full data set, then predicts
▪ Online model gets single pass to train and predict
How much worse is online than offline?
Adversarial Analysis
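One way to put a number on "how much worse is online than offline?" is sketched below, under simplifying assumptions that are not from the talk: a one-feature model y ≈ w * x with no intercept, squared loss, and a fixed learning rate. The offline model sees the full data set and uses the closed-form least-squares weight; the online model gets a single pass and must predict each record before learning from it.

```scala
object OnlineVsOfflineGap {
  // Offline: closed-form least squares for y ≈ w * x over the full data set.
  def offlineWeight(data: Seq[(Double, Double)]): Double =
    data.map { case (x, y) => x * y }.sum / data.map { case (x, _) => x * x }.sum

  def squaredLoss(w: Double, data: Seq[(Double, Double)]): Double =
    data.map { case (x, y) =>
      val e = w * x - y
      e * e
    }.sum

  // Online: single pass; predict each record first, then take one SGD step on it.
  def onlineLoss(data: Seq[(Double, Double)], learningRate: Double): Double = {
    val (_, loss) = data.foldLeft((0.0, 0.0)) { case ((w, acc), (x, y)) =>
      val err = w * x - y
      (w - learningRate * err * x, acc + err * err)
    }
    loss
  }

  // How much extra loss the online learner paid compared to the offline fit.
  def gap(data: Seq[(Double, Double)], learningRate: Double): Double =
    onlineLoss(data, learningRate) - squaredLoss(offlineWeight(data), data)

  def main(args: Array[String]): Unit = {
    val data = Seq((1.0, 2.0), (2.0, 4.0), (3.0, 6.0))
    println(s"offline w=${offlineWeight(data)} gap=${gap(data, learningRate = 0.1)}")
  }
}
```

The gap depends heavily on the learning rate and on the order in which records arrive, which is part of why the tuning knobs end up being tested against the live environment.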
Online algos are often interacting with the environment.
Learning rates, other knobs.
A/B Tests- The gold standard
• Correlated Co Occurrence – Brief Primer
• Architecture Overview
• Code walk through
• Looking at (pointless) results.
▪ Overview of CCO
– Collaborative Filtering (Like ALS, etc.)
– Behavior Based (also like ALS)
– Uses co-occurrence (no matrix factorization, unlike ALS)
– Multi-modal: more than one behavior considered (unlike ALS / CO)
▪ Benefits of CCO
– Many types of behaviors can be considered at once
– Can make recommendations for users never seen before.
Correlated Co Occurrence Recommender:
Overview / Benefits
CCO Math
A Simple Co-Occurrence Recommender
CCO Math
Correlated Co-Occurrence Recommender
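The equations on these two slides did not survive the transcript. As a stand-in, here is a toy sketch of the "matrix times vector" idea, assuming A is a users-by-hashtags matrix and B is a users-by-words matrix (both made up here). It uses raw co-occurrence counts only; Mahout additionally filters the co-occurrence matrices with a log-likelihood ratio test, which this sketch leaves out.

```scala
object CooccurrenceSketch {
  type Matrix = Vector[Vector[Double]]

  // (A^T B)(i)(j) = number of users who did primary action i and secondary action j.
  // Passing the same matrix twice gives plain co-occurrence, A^T A.
  def atb(a: Matrix, b: Matrix): Matrix =
    Vector.tabulate(a.head.size, b.head.size) { (i, j) =>
      a.indices.map(u => a(u)(i) * b(u)(j)).sum
    }

  // Matrix times vector.
  def times(m: Matrix, v: Vector[Double]): Vector[Double] =
    m.map(row => row.zip(v).map { case (x, y) => x * y }.sum)

  def plus(x: Vector[Double], y: Vector[Double]): Vector[Double] =
    x.zip(y).map { case (p, q) => p + q }

  def main(args: Array[String]): Unit = {
    // Toy data: 3 users, 2 hashtags (A) and 2 words (B).
    val a: Matrix = Vector(Vector(1.0, 1.0), Vector(1.0, 0.0), Vector(0.0, 1.0))
    val b: Matrix = Vector(Vector(1.0, 0.0), Vector(1.0, 0.0), Vector(0.0, 1.0))
    // A new user who used hashtag 0 and word 1.
    val userHashtags = Vector(1.0, 0.0)
    val userWords = Vector(0.0, 1.0)
    // Simple co-occurrence uses only A^T A; the correlated version adds A^T B.
    val scores = plus(times(atb(a, a), userHashtags), times(atb(a, b), userWords))
    println(scores) // one score per hashtag
  }
}
```

Because the score only needs the user's own history vectors, it can be computed for a user who was not in the historical data at all.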
Architecture: Lambda CCO
(Logo soup)
▪ Offline: Historical Tweets about Pizza -> Calculate: AtA, AtB, AtC, … -> CCO Precomputed Matrices
▪ Online: Streaming pizza tweets + CCO Precomputed Matrices -> Online Recommendations
561918328478785536,None
561918357851897858,None
561909179716481024,pizzagate
561909179716481024,gamergate
561949040011931649,None
561948991777038336,None
561947869805285377,superbowl
561947869805285377,pizzapizza
561918920282476545,None
561926796778565632,gunfriendly
561927577351503873,None
Python pulled in historical tweets and did this
UserID - HashTag
561684486380068865,savethem000
561684486380068865,i
561684486380068865,dunno
561684486380068865,smiles
561684486380068865,want
561684486380068865,to
561684486380068865,get
561684486380068865,some
561684486380068865,pizza
561684486380068865,or
561684486380068865,something
561684441526194176,pizza
561684441526194176,de
561684441526194176,queso
561684441526194176,lista
561684441526194176,para
Python pulled in historical tweets and did this
UserID - Words
import org.apache.mahout.sparkbindings.indexeddataset.IndexedDatasetSpark
import org.apache.mahout.math.cf.SimilarityAnalysis

val baseDir = "/home/rawkintrevo/gits/ffsf17-twitter-recos/data"

// We need to turn our raw text files into RDD[(String, String)]
// Each line is "userId,item"; lines that do not split into exactly two fields are dropped
val userFriendsRDD = sc.textFile(baseDir + "/user-friends.csv")
  .map(line => line.split(",")).filter(_.length == 2).map(a => (a(0), a(1)))
val userFriendsIDS = IndexedDatasetSpark.apply(userFriendsRDD)(sc)

val userHashtagsRDD = sc.textFile(baseDir + "/user-ht.csv")
  .map(line => line.split(",")).filter(_.length == 2).map(a => (a(0), a(1)))
val userHashtagsIDS = IndexedDatasetSpark.apply(userHashtagsRDD)(sc)

val userWordsRDD = sc.textFile(baseDir + "/user-words.csv")
  .map(line => line.split(",")).filter(_.length == 2).map(a => (a(0), a(1)))
val userWordsIDS = IndexedDatasetSpark.apply(userWordsRDD)(sc)

// Hashtags go first in the array: the first dataset is treated as the primary action
val hashtagReccosLlrDrmListByUser = SimilarityAnalysis.cooccurrencesIDSs(
  Array(userHashtagsIDS, userWordsIDS, userFriendsIDS),
  maxInterestingItemsPerThing = 100,
  maxNumInteractions = 500,
  randomSeed = 1234)
Some Spark Code
CCO Math
Spark+Mahout just Calculated these:
streamSource.map(jsonString => {
  val result = JSON.parseFull(jsonString)
  val output = result match {
    case Some(e) => {
      /*****************************************************************************************
       * Some pretty lazy tweet handling
       */
      val tweet: Map[String, Any] = e.asInstanceOf[Map[String, Any]]
      val text: String = tweet("text").asInstanceOf[String]
      // Split on whitespace, then strip non-alphanumeric characters and lowercase each word
      val words: Array[String] = text.split("\\s+").map(word => word.replaceAll("[^A-Za-z0-9]", "").toLowerCase())
      val entities = tweet("entities").asInstanceOf[Map[String, List[Map[String, String]]]]
      val hashtags: List[String] = entities("hashtags").toArray.map(m => m.getOrElse("text","").toLowerCase()).toList
      val mentions: List[String] = entities("user_mentions").toArray.map(m => m.getOrElse("id_str", "")).toList
      /****************************************************************************************
       * Mahout CCO
       */
      val hashtagsMat = sparse(hashtagsProtoMat.map(m => svec(m, cardinality = hashtagsBiDict.size)):_*)
      val wordsMat = sparse(wordsProtoMat.map(m => svec(m, cardinality = wordsBiDict.size)):_*)
      val friendsMat = sparse(friendsProtoMat.map(m => svec(m, cardinality = friendsBiDict.size)):_*)
      val userWordsVec = listOfStringsToSVec(words.toList, wordsBiDict)
      val userHashtagsVec = listOfStringsToSVec(hashtags, hashtagsBiDict)
      val userMentionsVec = listOfStringsToSVec(mentions, friendsBiDict)
      // One matrix-times-vector product per behavior, summed into a score per hashtag
      val reccos = hashtagsMat %*% userHashtagsVec + wordsMat %*% userWordsVec + friendsMat %*% userMentionsVec
      /*****************************************************************************************
       * Sort and Pretty Print
       */
Some Flink Code
CCO Math
Flink+Mahout just Calculated these:
Tweets
text: joemalicki josephchmura well i can make a pizza i bet he cant so there
userWordsVec: so a well i there can he make pizza cant
hashtags used: List()
hashtags reccomended:
(ruinafriendshipin5words : 13.941270843461098)
(worstdayin4words : 8.93444123705558)
(recipes : 8.423061768672596)
text: people people dipping pizza in milk im done
userWordsVec: people in im done pizza
hashtags used: List()
hashtags reccomended:
(None : 18.560367273335828)
(vegan : 10.84782189800353)
(fromscratch : 10.84782189800353)
*Results were cherry-picked. There was no preprocessing; this was a garbage-in, garbage-out algo for illustration
purposes only.
Buzzword Soup
Don’t do this in real life, probably. (you would use a service)
Also...
• Trevor attempts to tie everything together into a cohesive thought
• Audience members ask easy questions
• Audience members buy speaker beer at after party
A lot of buzzwords have been flying around especially with respect to machine learning and
streaming.
▪ Online
▪ Lambda / Kappa architecture
▪ Streaming machine learning
▪ Real time predictive model
▪ machine learning
▪ artificial/machine/cognitive intelligence
▪ cognitive
▪ blah- ^^ pick 2.
Final Thoughts
Now that you’ve sat through this talk hopefully you can:
1. Call people out for trying to make their product/service/open source project/startup
sound like a bigger deal than it is
2. Church up your product/service/open source project/startup to get clients/VC dummies
excited about it without technically lying
Final Thoughts
Buy Trevor beers.
Questions?
https://github.com/rawkintrevo/fsf17-twitter-recos

Synergy 2015 Session Slides: SYN408 XenDesktop 7.6 Architecture - Dealing Wit...Synergy 2015 Session Slides: SYN408 XenDesktop 7.6 Architecture - Dealing Wit...
Synergy 2015 Session Slides: SYN408 XenDesktop 7.6 Architecture - Dealing Wit...
 
GluonCV
GluonCVGluonCV
GluonCV
 
10 Lessons Learned from Building Machine Learning Systems
10 Lessons Learned from Building Machine Learning Systems10 Lessons Learned from Building Machine Learning Systems
10 Lessons Learned from Building Machine Learning Systems
 
Tools and practices to use in a Continuous Delivery pipeline
Tools and practices to use in a Continuous Delivery pipelineTools and practices to use in a Continuous Delivery pipeline
Tools and practices to use in a Continuous Delivery pipeline
 
From DevOps to MLOps: practical steps for a smooth transition
From DevOps to MLOps: practical steps for a smooth transitionFrom DevOps to MLOps: practical steps for a smooth transition
From DevOps to MLOps: practical steps for a smooth transition
 
DataEngConf SF16 - Methods for Content Relevance at LinkedIn
DataEngConf SF16 - Methods for Content Relevance at LinkedInDataEngConf SF16 - Methods for Content Relevance at LinkedIn
DataEngConf SF16 - Methods for Content Relevance at LinkedIn
 
SkillsMatter June 2018: Java in the 21st Century: Are You Thinking Far Enough...
SkillsMatter June 2018: Java in the 21st Century: Are You Thinking Far Enough...SkillsMatter June 2018: Java in the 21st Century: Are You Thinking Far Enough...
SkillsMatter June 2018: Java in the 21st Century: Are You Thinking Far Enough...
 

More from Flink Forward

Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...
Flink Forward
 
Evening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in FlinkEvening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in Flink
Flink Forward
 
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
Flink Forward
 
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
Flink Forward
 
Introducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes OperatorIntroducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes Operator
Flink Forward
 
Autoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive ModeAutoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive Mode
Flink Forward
 
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Flink Forward
 
One sink to rule them all: Introducing the new Async Sink
One sink to rule them all: Introducing the new Async SinkOne sink to rule them all: Introducing the new Async Sink
One sink to rule them all: Introducing the new Async Sink
Flink Forward
 
Tuning Apache Kafka Connectors for Flink.pptx
Tuning Apache Kafka Connectors for Flink.pptxTuning Apache Kafka Connectors for Flink.pptx
Tuning Apache Kafka Connectors for Flink.pptx
Flink Forward
 
Flink powered stream processing platform at Pinterest
Flink powered stream processing platform at PinterestFlink powered stream processing platform at Pinterest
Flink powered stream processing platform at Pinterest
Flink Forward
 
Apache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native EraApache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native Era
Flink Forward
 
Where is my bottleneck? Performance troubleshooting in Flink
Where is my bottleneck? Performance troubleshooting in FlinkWhere is my bottleneck? Performance troubleshooting in Flink
Where is my bottleneck? Performance troubleshooting in Flink
Flink Forward
 
Using the New Apache Flink Kubernetes Operator in a Production Deployment
Using the New Apache Flink Kubernetes Operator in a Production DeploymentUsing the New Apache Flink Kubernetes Operator in a Production Deployment
Using the New Apache Flink Kubernetes Operator in a Production Deployment
Flink Forward
 
The Current State of Table API in 2022
The Current State of Table API in 2022The Current State of Table API in 2022
The Current State of Table API in 2022
Flink Forward
 
Flink SQL on Pulsar made easy
Flink SQL on Pulsar made easyFlink SQL on Pulsar made easy
Flink SQL on Pulsar made easy
Flink Forward
 
Dynamic Rule-based Real-time Market Data Alerts
Dynamic Rule-based Real-time Market Data AlertsDynamic Rule-based Real-time Market Data Alerts
Dynamic Rule-based Real-time Market Data Alerts
Flink Forward
 
Exactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and PinotExactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and Pinot
Flink Forward
 
Processing Semantically-Ordered Streams in Financial Services
Processing Semantically-Ordered Streams in Financial ServicesProcessing Semantically-Ordered Streams in Financial Services
Processing Semantically-Ordered Streams in Financial Services
Flink Forward
 
Tame the small files problem and optimize data layout for streaming ingestion...
Tame the small files problem and optimize data layout for streaming ingestion...Tame the small files problem and optimize data layout for streaming ingestion...
Tame the small files problem and optimize data layout for streaming ingestion...
Flink Forward
 
Batch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & IcebergBatch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & Iceberg
Flink Forward
 

More from Flink Forward (20)

Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...
 
Evening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in FlinkEvening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in Flink
 
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
 
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
 
Introducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes OperatorIntroducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes Operator
 
Autoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive ModeAutoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive Mode
 
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
 
One sink to rule them all: Introducing the new Async Sink
One sink to rule them all: Introducing the new Async SinkOne sink to rule them all: Introducing the new Async Sink
One sink to rule them all: Introducing the new Async Sink
 
Tuning Apache Kafka Connectors for Flink.pptx
Tuning Apache Kafka Connectors for Flink.pptxTuning Apache Kafka Connectors for Flink.pptx
Tuning Apache Kafka Connectors for Flink.pptx
 
Flink powered stream processing platform at Pinterest
Flink powered stream processing platform at PinterestFlink powered stream processing platform at Pinterest
Flink powered stream processing platform at Pinterest
 
Apache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native EraApache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native Era
 
Where is my bottleneck? Performance troubleshooting in Flink
Where is my bottleneck? Performance troubleshooting in FlinkWhere is my bottleneck? Performance troubleshooting in Flink
Where is my bottleneck? Performance troubleshooting in Flink
 
Using the New Apache Flink Kubernetes Operator in a Production Deployment
Using the New Apache Flink Kubernetes Operator in a Production DeploymentUsing the New Apache Flink Kubernetes Operator in a Production Deployment
Using the New Apache Flink Kubernetes Operator in a Production Deployment
 
The Current State of Table API in 2022
The Current State of Table API in 2022The Current State of Table API in 2022
The Current State of Table API in 2022
 
Flink SQL on Pulsar made easy
Flink SQL on Pulsar made easyFlink SQL on Pulsar made easy
Flink SQL on Pulsar made easy
 
Dynamic Rule-based Real-time Market Data Alerts
Dynamic Rule-based Real-time Market Data AlertsDynamic Rule-based Real-time Market Data Alerts
Dynamic Rule-based Real-time Market Data Alerts
 
Exactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and PinotExactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and Pinot
 
Processing Semantically-Ordered Streams in Financial Services
Processing Semantically-Ordered Streams in Financial ServicesProcessing Semantically-Ordered Streams in Financial Services
Processing Semantically-Ordered Streams in Financial Services
 
Tame the small files problem and optimize data layout for streaming ingestion...
Tame the small files problem and optimize data layout for streaming ingestion...Tame the small files problem and optimize data layout for streaming ingestion...
Tame the small files problem and optimize data layout for streaming ingestion...
 
Batch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & IcebergBatch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & Iceberg
 

Recently uploaded

Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptxData_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
AnirbanRoy608946
 
Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)
TravisMalana
 
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
g4dpvqap0
 
State of Artificial intelligence Report 2023
State of Artificial intelligence Report 2023State of Artificial intelligence Report 2023
State of Artificial intelligence Report 2023
kuntobimo2016
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
Timothy Spann
 
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
slg6lamcq
 
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
v3tuleee
 
Machine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptxMachine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptx
balafet
 
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
74nqk8xf
 
Everything you wanted to know about LIHTC
Everything you wanted to know about LIHTCEverything you wanted to know about LIHTC
Everything you wanted to know about LIHTC
Roger Valdez
 
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
mzpolocfi
 
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
mbawufebxi
 
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
axoqas
 
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
ahzuo
 
Global Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headedGlobal Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headed
vikram sood
 
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdfEnhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
GetInData
 
My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.
rwarrenll
 
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
apvysm8
 
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
ahzuo
 
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
dwreak4tg
 

Recently uploaded (20)

Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptxData_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
 
Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)
 
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
 
State of Artificial intelligence Report 2023
State of Artificial intelligence Report 2023State of Artificial intelligence Report 2023
State of Artificial intelligence Report 2023
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
 
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
 
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
 
Machine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptxMachine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptx
 
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
 
Everything you wanted to know about LIHTC
Everything you wanted to know about LIHTCEverything you wanted to know about LIHTC
Everything you wanted to know about LIHTC
 
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
 
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
 
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
 
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
 
Global Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headedGlobal Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headed
 
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdfEnhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
 
My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.
 
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
 
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
 
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
 

Flink Forward SF 2017: Trevor Grant - Introduction to Online Machine Learning Algorithms

  • 1. ©2015 IBM Corporation1 10 April 2017 presentation starting soon… sit down
  • 2. Flink Forward, San Francisco. Trevor Grant (@rawkintrevo), April 11th, 2017. Introduction to Online Machine Learning Algorithms
  • 3. Table of Contents: Intro, Buzzwords, Basic Online Learners, Challenges, Lambda Recommender, Conclusions. Intro: Who is this guy? Why should I care? What’s going on here? This seems boring and mathy, maybe I should leave…
  • 4. Branding. Trevor Grant. Things I do: Open Source Technical Evangelist, IBM; PMC, Apache Mahout; blog: http://rawkintrevo.org. Schooling: MS Applied Math, Illinois State; MBA, Illinois State. How to get ahold of me: @rawkintrevo; trevor.grant@ibm.com / rawkintrevo@apache.org; Mahout dev and user mailing lists.
  • 5. Why does any of this matter? The goal is to disambiguate terms related to machine learning and streaming machine learning. Hopefully after this you won’t keep using words wrong, and you will know when someone else is (be pretentious about it, or don’t). Bonus material: we build a fairly cool, yet super simple, online recommender with Apache Flink + Apache Spark + Apache Mahout.
  • 6. Math. Eewww. This talk invokes the following types of maths: weighted averaging and matrix times vector. Also there are pictures.
  • 7. Types of Pictures. Useful animations (http://eli.thegreenplace.net/images/2016/regressionfit.gif) and unrelated animal pictures. Macs are good for keeping cat butts warm… and not much else.
  • 8. Table of Contents: Intro, Buzzwords, Basic Online Learners, Challenges, Lambda Recommender, Conclusions. Buzzwords: on the virtues of not throwing around buzzwords… Online vs. Offline; Lambda vs. Kappa (w.r.t. machine learning); Statistical vs. Adversarial; Real-Time (one buzzword to rule them all).
  • 9. Online vs. Offline. Online: input is processed piece by piece, in a serial fashion; each new piece of information generates an event (not mini-batching; possibly a sliding window of 1 record); not necessarily low latency. Offline: input is processed in batches; not necessarily high latency.
  • 10. Fast offline, slow online and stack order. Slow online: a stock broker in Des Moines, Iowa writes a Python program that gets EOD prices/statistics as they are published and then executes orders. Fast offline: an HFT algorithm executes trades based on tumbling windows of 15 milliseconds’ worth of activity. Online doesn’t mean fast and online doesn’t mean streaming; online only means that information is processed as soon as it is received. Consider that an online algorithm (the slow online example) can sit behind an offline EOD batch job. This is an extreme case, but no algorithm receives data at the instant it is created; the best case is limited by the speed of light (?).
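To make the online/offline distinction concrete, here is a minimal sketch in plain Scala. It is not from the talk: the names `RunningMean` and `batchMean` are made up for illustration. The online version updates its state once per record as the record arrives, while the offline version needs the whole batch first. Both reach the same mean, and neither says anything about latency.

```scala
object OnlineVsOfflineMean {
  // Online: the state is updated once per record, as soon as it is received.
  final case class RunningMean(count: Long = 0L, mean: Double = 0.0) {
    def update(x: Double): RunningMean = {
      val n = count + 1
      // Incremental mean: shift the old mean toward x by a 1/n step.
      RunningMean(n, mean + (x - mean) / n)
    }
  }

  // Offline: the whole batch must be collected before anything is computed.
  def batchMean(xs: Seq[Double]): Double = xs.sum / xs.size

  def main(args: Array[String]): Unit = {
    val prices = Seq(10.0, 12.0, 11.0, 13.0) // illustrative values only
    val online = prices.foldLeft(RunningMean())((state, x) => state.update(x))
    println(s"online=${online.mean} offline=${batchMean(prices)}")
  }
}
```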
  • 11. Lambda vs. Kappa (Machine Learning). Lambda: learning happens (i.e. models are fitted) offline; the model is used by a streaming engine to make decisions online. Kappa: learning happens (i.e. models are fitted) online; the online decision model updates for each new record seen; the model can change structure, e.g. new words in TF-IDF or new categories in a ‘factor model linear regression’.
  • 12. Lambda with Novel Information. A trained model expects input that is structurally the same as its training data. In linear regression, categorical features are “one-hot encoded”: a feature with 3 categories is expressed as a vector in 2 columns. What if a new category pops up? It depends on how you program it: ignore the input, or serve a bad response. Consider clustering / classification on text… new words? Ignore them (probably what you’ll do), but the word might be very important...
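A small sketch of the frozen-encoder problem, in plain Scala. Everything here is hypothetical: the category names and the `encode` / `encodeOrReference` helpers are invented, not taken from the talk. The encoder is fixed at training time, so a category it never saw can either be rejected ("ignore the input") or silently mapped to the reference level ("serve a bad response").

```scala
object FrozenOneHot {
  // Categories seen at training time (made-up values). The first one is the
  // reference level, so 3 categories become 2 columns.
  val trained: Vector[String] = Vector("red", "green", "blue")

  // "Ignore the input": returns None for a category the model never saw.
  def encode(category: String): Option[Array[Double]] = {
    val idx = trained.indexOf(category)
    if (idx < 0) None
    else Some(Array.tabulate(trained.size - 1)(j => if (j == idx - 1) 1.0 else 0.0))
  }

  // "Serve a bad response": an unseen category becomes all zeros, which is
  // indistinguishable from the reference level.
  def encodeOrReference(category: String): Array[Double] =
    encode(category).getOrElse(Array.fill(trained.size - 1)(0.0))

  def main(args: Array[String]): Unit = {
    println(encode("green").map(_.mkString(",")))       // a known category
    println(encode("purple").map(_.mkString(",")))      // an unseen category
    println(encodeOrReference("purple").mkString(","))  // silently wrong
  }
}
```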
  • 13. Kappa with Novel Information. In Kappa, training happens with each new piece of data, so the model can account for structural change in the data instantly: new words can be introduced into TF-IDF, and new categories into a factor variable. Both examples (and others) cause the input vector to change.
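As a rough illustration of "the input vector changes", here is a sketch of a TF-IDF featurizer in plain Scala whose vocabulary grows with every document it sees. The class name `OnlineTfIdf` and the smoothed IDF formula `ln((1 + N) / (1 + df)) + 1` are assumptions chosen for the example, not something the talk specifies.

```scala
import scala.collection.mutable

// Hypothetical online TF-IDF: statistics are updated per document, and a
// previously unseen word simply adds a new column.
final class OnlineTfIdf {
  private val columnOf = mutable.LinkedHashMap.empty[String, Int] // term -> column
  private val docFreq  = mutable.ArrayBuffer.empty[Int]           // column -> #docs with term
  private var numDocs  = 0

  def vocabularySize: Int = columnOf.size

  /** Update the statistics with one document, then return its TF-IDF vector. */
  def observe(tokens: Seq[String]): Array[Double] = {
    numDocs += 1
    for (term <- tokens.distinct) {
      // A new term gets the next free column index.
      val col = columnOf.getOrElseUpdate(term, { docFreq += 0; docFreq.size - 1 })
      docFreq(col) += 1
    }
    val vec = Array.fill(vocabularySize)(0.0)
    for ((term, occurrences) <- tokens.groupBy(identity)) {
      val col = columnOf(term)
      val idf = math.log((1.0 + numDocs) / (1.0 + docFreq(col))) + 1.0
      vec(col) = occurrences.size * idf
    }
    vec
  }
}
```

Note that the returned vector gets longer as new words arrive, so whatever model consumes it has to tolerate a growing input dimension.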
  • 14. Statistical vs. Adversarial. Traditional: common statistical methods (supervised, unsupervised), graded by statistical fitness tests and out of core testing, e.g. confusion matrix, AuROC, MSE, MAPE, R². Adversarial: algorithm versus environment (vs. spammers, vs. hackers, vs. nature); some tests can be used directionally, but it is really graded by A/B testing, since adversaries may get smarter over time; also a type of test where you automate the adversary.
  • 15. Real-time. Subjective. A good buzzword for something that: doesn’t fall into any of the above categories cleanly; doesn’t fall into the category you want it to fall into; or you’re not really sure which buzzword to use, so you need a ‘safe’ word that no one can call you on. Days; Weeks?; JJs
  • 16. Table of Contents: Intro, Buzzwords, Basic Online Learners, Challenges, Lambda Recommender, Conclusions. Basic Online Learners: Streaming K-Means; Streaming Linear Regression; Why would I ever do this?
  • 17. K-Means. Photo credit: Simply Statistics, http://simplystatistics.org/wp-content/uploads/2014/02/kmeans.gif
  • 18. Online K-Means
  • 19. Online K-Means: new point
  • 20. Online K-Means: probably red
  • 21. Online K-Means: update red “center”
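The three pictures above (new point, probably red, update red center) can be sketched as a single update step. This is a minimal plain-Scala sketch, not the talk's code: `Center` and `update` are hypothetical names, and the 1/n running-mean step is one common choice among several (a fixed learning rate or a decay factor would also work).

```scala
object OnlineKMeans {
  // A cluster center plus the number of points assigned to it so far.
  final case class Center(coords: Array[Double], count: Long)

  private def sqDist(a: Array[Double], b: Array[Double]): Double =
    a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum

  /** Assign the point to its nearest center, then move only that center.
    * Returns the index of the chosen center and the updated centers. */
  def update(centers: Vector[Center], point: Array[Double]): (Int, Vector[Center]) = {
    val nearest = centers.indices.minBy(i => sqDist(centers(i).coords, point))
    val c = centers(nearest)
    val n = c.count + 1
    // Weighted average: the center shifts toward the point by a 1/n step.
    val moved = c.coords.zip(point).map { case (m, x) => m + (x - m) / n }
    (nearest, centers.updated(nearest, Center(moved, n)))
  }
}
```

Each call touches exactly one center, so the cost per record does not depend on how many points have been seen before.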
  • 22. Linear Regression (Stochastic). http://eli.thegreenplace.net/images/2016/regressionfit.gif
  • 23. Online Linear Regression (figure labels: fit line; last point received)
  • 24. Online Linear Regression (figure labels: fit line; last point received; new point)
  • 25. Online Linear Regression (figure labels: fit line; last point received; new point; temp fit line)
  • 26. Online Linear Regression (figure labels: original fit line; new fit line (weighted avg); last point received; new point; temp fit line)
  • 27. Online Linear Regression (figure labels: original fit line; old point)
  • 28. Online Linear Regression (figure labels: original fit line; new point; old point)
  • 29. Online Linear Regression (figure labels: original fit line; new point; old point; temp fit line)
  • 30. Online Linear Regression (figure labels: original fit line; new fit line (weighted avg); new point; old point; temp fit line)
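Read literally, the figure labels describe this update: draw a temporary line through the last point and the new point, then take a weighted average of the original fit line and the temporary line. The plain-Scala sketch below follows that reading. The names (`Line`, `State`, `update`) and the `weight` parameter are assumptions, and a production streaming regression would more likely take a stochastic gradient step instead.

```scala
object OnlineLineFit {
  final case class Line(slope: Double, intercept: Double) {
    def predict(x: Double): Double = slope * x + intercept
  }

  // The current fit plus the last point received.
  final case class State(fit: Line, last: (Double, Double))

  /** Blend the current fit with the line through the last and the new point.
    * `weight` in [0, 1] is how much the temporary line counts. */
  def update(state: State, point: (Double, Double), weight: Double): State = {
    val (x0, y0) = state.last
    val (x1, y1) = point
    if (x1 == x0) state.copy(last = point) // temp line would be vertical: skip
    else {
      val tempSlope = (y1 - y0) / (x1 - x0)
      val temp = Line(tempSlope, y0 - tempSlope * x0)
      val blended = Line(
        (1 - weight) * state.fit.slope + weight * temp.slope,
        (1 - weight) * state.fit.intercept + weight * temp.intercept)
      State(blended, point)
    }
  }
}
```

A small `weight` makes the fit stable but slow to adapt; a large one chases every new point.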
  • 31. Deep learning. This would work on neural networks too. Also, “Deep Learning” is another buzzword.
  • 32. Why? Mostly anomaly detection (a moving average, then something deviates). A very popular use case of online/streaming algorithms (more talks today about this). The algorithm learns what is normal (either online or offline); when normality is sufficiently violated, the algorithm sounds an alarm. All anomaly detection is some flavor of this, usually referred to as “_______ Anomaly Detection” only to specify which algorithm was used for defining normality (or lack thereof). Architecture: online vs. offline training choices depend primarily on how fast ‘normality’ changes in your specific use case.
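A minimal sketch of "moving average, then something deviates", in plain Scala. The specifics are assumptions made for the example: normality is modelled by an exponentially weighted mean and variance, and the alarm fires when a value is more than `threshold` standard deviations from the mean. `alpha` controls how fast ‘normal’ is allowed to drift.

```scala
object MovingAverageAlarm {
  // What the detector currently believes "normal" looks like.
  final case class State(mean: Double, variance: Double)

  /** Score one value against the current state, then update the state.
    * Returns (isAnomaly, nextState). */
  def step(state: State, x: Double, alpha: Double, threshold: Double): (Boolean, State) = {
    val deviation = x - state.mean
    val isAnomaly = math.abs(deviation) > threshold * math.sqrt(state.variance)
    // Exponentially weighted updates of mean and variance.
    val mean = state.mean + alpha * deviation
    val variance = (1 - alpha) * (state.variance + alpha * deviation * deviation)
    (isAnomaly, State(mean, variance))
  }
}
```

In this sketch anomalous values still update the state; whether they should is a design choice that depends on whether an anomaly might be the start of a new normal.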
  • 33. Table of Contents: Intro, Buzzwords, Basic Online Learners, Challenges / Solutions, Lambda Recommender, Conclusions. Challenges / Solutions: Adversarial Analysis; Scoring in Real Time (how do you know you’re right?); A/B Tests.
  • 34. Learning in real-time with supervised methods (challenge). How do you know how far your prediction ‘missed’? In real life ‘correct’ answers may arrive later. Corollary: if you have the ‘correct’ answer, why are you trying to predict it? Not insurmountable, but it prevents ‘one size fits all’ approaches (context dependence).
  • 35. Latency and Normal Streaming Problems. You’ve only got so much hardware.
  • 36. Adversarial Analysis. Simple adversary: how well does the algorithm do against its “offline” version? Consider linear regression with SGD: the offline algorithm gets to pass over the full data set, then predicts; the online model gets a single pass to train and predict. How much worse is online than offline?
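One way to put a number on "how much worse is online than offline", sketched in plain Scala for a one-feature regression. The function names, the zero initialization and the fixed learning rate are assumptions. The offline model is the closed-form least-squares fit over the full data set; the online model takes one SGD step per record in a single pass; both are then scored with MSE on the same data.

```scala
object OnlineVsOfflineRegression {
  type Point = (Double, Double)   // (x, y)
  type Model = (Double, Double)   // (slope, intercept)

  /** Offline: closed-form least squares over the full data set. */
  def offlineFit(data: Seq[Point]): Model = {
    val n = data.size
    val mx = data.map(_._1).sum / n
    val my = data.map(_._2).sum / n
    val sxy = data.map { case (x, y) => (x - mx) * (y - my) }.sum
    val sxx = data.map { case (x, _) => (x - mx) * (x - mx) }.sum
    val slope = sxy / sxx
    (slope, my - slope * mx)
  }

  /** Online: one SGD step on squared error per record, single pass. */
  def onlineFit(data: Seq[Point], learningRate: Double): Model =
    data.foldLeft((0.0, 0.0)) { case ((w, b), (x, y)) =>
      val err = (w * x + b) - y
      (w - learningRate * err * x, b - learningRate * err)
    }

  def mse(model: Model, data: Seq[Point]): Double =
    data.map { case (x, y) =>
      val e = model._1 * x + model._2 - y
      e * e
    }.sum / data.size
}
```

On the training data the offline fit can never have a higher MSE than the single-pass model, since it minimizes exactly that quantity; the gap between the two is the price paid for going online.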
  • 37. A/B Tests: the gold standard. Online algorithms are often interacting with the environment. Learning rates, other knobs.
  • 38. Table of Contents: Intro, Buzzwords, Basic Online Learners, Challenges, Lambda Recommender, Conclusions. Lambda Recommender: Correlated Co-Occurrence, a brief primer; architecture overview; code walkthrough; looking at (pointless) results.
  • 39. Correlated Co-Occurrence Recommender: Overview / Benefits. Overview of CCO: collaborative filtering (like ALS, etc.); behavior based (also like ALS); uses co-occurrence (no matrix factorization, unlike ALS); multi-modal, i.e. more than one behavior is considered (unlike ALS / CO). Benefits of CCO: many types of behaviors can be considered at once; can make recommendations for users never seen before.
  • 40. A Simple Co-Occurrence Recommender. CCO math (formula image not captured in the transcript)
  • 41. Correlated Co-Occurrence Recommender. CCO math (formula image not captured in the transcript)
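The formulas on slides 40 and 41 were images and are missing from the transcript. Judging from “Calculate: AtA, AtB, AtC, …” on slide 42 and the `reccos` line in the Flink code on slide 47, they presumably have the shape below. This is a reconstruction under that assumption, not a copy of the slides. Here A is the user-by-item matrix for the primary behavior (hashtags in this talk), B and C are user-by-item matrices for secondary behaviors (words, mentions), h_a, h_b, h_c are the current user's history vectors for each behavior, and r is the vector of recommendation scores.

```latex
% Simple co-occurrence recommender (one behavior), assumed form:
r = (A^{\top} A)\, h_a

% Correlated co-occurrence recommender (several behaviors), assumed form:
r = (A^{\top} A)\, h_a + (A^{\top} B)\, h_b + (A^{\top} C)\, h_c + \dots
```

The variable name `hashtagReccosLlrDrmListByUser` on slide 45 suggests that the co-occurrence matrices are additionally filtered with a log-likelihood ratio test, so that only statistically interesting co-occurrences are kept.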
  • 42. Architecture: Lambda CCO (logo soup). Diagram labels: historical tweets about pizza; calculate AtA, AtB, AtC, …; CCO precomputed matrices; streaming pizza tweets; online recommendations.
  • 43. Python pulled in historical tweets and did this (UserID, HashTag):
      561918328478785536,None
      561918357851897858,None
      561909179716481024,pizzagate
      561909179716481024,gamergate
      561949040011931649,None
      561948991777038336,None
      561947869805285377,superbowl
      561947869805285377,pizzapizza
      561918920282476545,None
      561926796778565632,gunfriendly
      561927577351503873,None
  • 44. Python pulled in historical tweets and did this (UserID, Words):
      561684486380068865,savethem000
      561684486380068865,i
      561684486380068865,dunno
      561684486380068865,smiles
      561684486380068865,want
      561684486380068865,to
      561684486380068865,get
      561684486380068865,some
      561684486380068865,pizza
      561684486380068865,or
      561684486380068865,something
      561684441526194176,pizza
      561684441526194176,de
      561684441526194176,queso
      561684441526194176,lista
      561684441526194176,para
  • 45. Some Spark Code:
      import org.apache.mahout.sparkbindings.indexeddataset.IndexedDatasetSpark
      import org.apache.mahout.math.cf.SimilarityAnalysis

      // `sc` is an existing SparkContext; it is not defined on the slide.
      val baseDir = "/home/rawkintrevo/gits/ffsf17-twitter-recos/data"

      // We need to turn our raw text files into RDD[(String, String)]
      val userFriendsRDD = sc.textFile(baseDir + "/user-friends.csv")
        .map(line => line.split(",")).filter(_.length == 2).map(a => (a(0), a(1)))
      val userFriendsIDS = IndexedDatasetSpark.apply(userFriendsRDD)(sc)

      val userHashtagsRDD = sc.textFile(baseDir + "/user-ht.csv")
        .map(line => line.split(",")).filter(_.length == 2).map(a => (a(0), a(1)))
      val userHashtagsIDS = IndexedDatasetSpark.apply(userHashtagsRDD)(sc)

      val userWordsRDD = sc.textFile(baseDir + "/user-words.csv")
        .map(line => line.split(",")).filter(_.length == 2).map(a => (a(0), a(1)))
      val userWordsIDS = IndexedDatasetSpark.apply(userWordsRDD)(sc)

      // Hashtags are the primary behavior; words and friends are secondary.
      val hashtagReccosLlrDrmListByUser = SimilarityAnalysis.cooccurrencesIDSs(
        Array(userHashtagsIDS, userWordsIDS, userFriendsIDS),
        maxInterestingItemsPerThing = 100,
        maxNumInteractions = 500,
        randomSeed = 1234)
  • 46. CCO Math Spark+Mahout just Calculated these:
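The formulas on this slide did not survive in the transcript. As a rough stand-in, the sketch below shows the general idea behind correlated cross-occurrence: count how many users share a primary item (a hashtag) and a secondary item (a word or friend), then score each pair with a log-likelihood ratio (LLR). It is a plain-Python illustration and does not reproduce Mahout's implementation. The function names and toy data are hypothetical, and the downsampling and per-item caps (`maxNumInteractions`, `maxInterestingItemsPerThing`) are left out.

```python
import math
from collections import defaultdict
from itertools import product


def x_log_x(x):
    return x * math.log(x) if x > 0 else 0.0


def entropy(*counts):
    """Unnormalized Shannon entropy of a list of counts."""
    return x_log_x(sum(counts)) - sum(x_log_x(c) for c in counts)


def llr(k11, k12, k21, k22):
    """Log-likelihood ratio for a 2x2 contingency table of user counts."""
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    return max(0.0, 2.0 * (row + col - mat))  # clamp tiny negative round-off


def cross_cooccurrence(primary, secondary):
    """Score (primary_item, secondary_item) pairs from lists of (user, item) rows."""
    users_by_primary = defaultdict(set)
    users_by_secondary = defaultdict(set)
    for user, item in primary:
        users_by_primary[item].add(user)
    for user, item in secondary:
        users_by_secondary[item].add(user)
    n_users = len({u for u, _ in primary} | {u for u, _ in secondary})

    scores = {}
    for (p, p_users), (s, s_users) in product(users_by_primary.items(),
                                              users_by_secondary.items()):
        k11 = len(p_users & s_users)   # users with both items
        if k11 == 0:
            continue                   # never seen together: no indicator
        k12 = len(p_users) - k11       # primary item only
        k21 = len(s_users) - k11       # secondary item only
        k22 = n_users - k11 - k12 - k21
        scores[(p, s)] = llr(k11, k12, k21, k22)
    return scores


# Toy data: two users pair "superbowl" with "pizza", two pair "vegan" with "tofu".
user_hashtags = [("u1", "superbowl"), ("u2", "superbowl"),
                 ("u3", "vegan"), ("u4", "vegan")]
user_words = [("u1", "pizza"), ("u2", "pizza"), ("u3", "tofu"), ("u4", "tofu")]
scores = cross_cooccurrence(user_hashtags, user_words)
```

Pairs that co-occur more often than chance would predict get a high score. A pair whose co-occurrence matches chance scores zero, which is why the raw count alone is not used.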
  • 47. Some Flink Code
      streamSource.map(jsonString => {
        val result = JSON.parseFull(jsonString)
        val output = result match {
          case Some(e) => {
            /*****************************************************************************************
             * Some pretty lazy tweet handling
             */
            val tweet: Map[String, Any] = e.asInstanceOf[Map[String, Any]]
            val text: String = tweet("text").asInstanceOf[String]
            // Split on whitespace, then strip non-alphanumerics and lowercase each token
            val words: Array[String] = text.split("\\s+").map(word => word.replaceAll("[^A-Za-z0-9]", "").toLowerCase())
            val entities = tweet("entities").asInstanceOf[Map[String, List[Map[String, String]]]]
            val hashtags: List[String] = entities("hashtags").toArray.map(m => m.getOrElse("text","").toLowerCase()).toList
            val mentions: List[String] = entities("user_mentions").toArray.map(m => m.getOrElse("id_str", "")).toList

            /****************************************************************************************
             * Mahout CCO
             */
            val hashtagsMat = sparse(hashtagsProtoMat.map(m => svec(m, cardinality = hashtagsBiDict.size)):_*)
            val wordsMat = sparse(wordsProtoMat.map(m => svec(m, cardinality = wordsBiDict.size)):_*)
            val friendsMat = sparse(friendsProtoMat.map(m => svec(m, cardinality = friendsBiDict.size)):_*)

            val userWordsVec = listOfStringsToSVec(words.toList, wordsBiDict)
            val userHashtagsVec = listOfStringsToSVec(hashtags, hashtagsBiDict)
            val userMentionsVec = listOfStringsToSVec(mentions, friendsBiDict)

            val reccos = hashtagsMat %*% userHashtagsVec + wordsMat %*% userWordsVec + friendsMat %*% userMentionsVec

            /*****************************************************************************************
             * Sort and Pretty Print
             */
  • 48. CCO Math Flink+Mahout just Calculated these:
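The streaming step comes down to the "matrix times vector" math promised at the start of the talk. Each precomputed indicator matrix is multiplied by the matching vector built from the incoming tweet, the three results are summed, and the top-scoring hashtags are kept. The sketch below illustrates that with nested dicts standing in for Mahout's sparse matrices. The function names, weights, and toy tweet are hypothetical.

```python
def mat_times_vec(matrix, vec):
    """Sparse matrix-vector product.

    matrix: {hashtag: {feature: weight}}, one row per candidate hashtag.
    vec:    {feature: value}, built from a single incoming tweet.
    """
    out = {}
    for row, cols in matrix.items():
        score = sum(weight * vec.get(col, 0.0) for col, weight in cols.items())
        if score:
            out[row] = score
    return out


def recommend(indicators, user_vectors, top_k=3):
    """Sum the per-indicator scores and return the top_k (hashtag, score) pairs."""
    totals = {}
    for name, matrix in indicators.items():
        partial = mat_times_vec(matrix, user_vectors.get(name, {}))
        for hashtag, score in partial.items():
            totals[hashtag] = totals.get(hashtag, 0.0) + score
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:top_k]


# Toy indicator matrices (weights are made up, not taken from the talk's data).
indicators = {
    "hashtags": {"recipes": {"vegan": 2.0}},
    "words": {"recipes": {"pizza": 8.0}, "vegan": {"pizza": 3.0, "milk": 1.5}},
    "friends": {},
}
# A tweet that mentions "pizza" and "milk", with no hashtags or mentions.
user_vectors = {"words": {"pizza": 1.0, "milk": 1.0}}
recs = recommend(indicators, user_vectors)
```

All the expensive work (building the indicator matrices) happens offline in Spark. The online side only does a few sparse multiplications per tweet, which keeps it cheap enough to run inside a Flink `map`.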
  • 49. Tweets
      text: joemalicki josephchmura well i can make a pizza i bet he cant so there
      userWordsVec: so a well i there can he make pizza cant
      hashtags used: List()
      hashtags reccomended:
        (ruinafriendshipin5words : 13.941270843461098)
        (worstdayin4words : 8.93444123705558)
        (recipes : 8.423061768672596)

      text: people people dipping pizza in milk im done
      userWordsVec: people in im done pizza
      hashtags used: List()
      hashtags reccomended:
        (None : 18.560367273335828)
        (vegan : 10.84782189800353)
        (fromscratch : 10.84782189800353)

      *Results were cherry-picked. There was no preprocessing; this was a garbage-in, garbage-out algo for illustration purposes only.
  • 50. Buzzword Soup
  • 51. Don’t do this in real life, probably. (You would use a service.) Also...
  • 52. Table of Contents: Intro, Buzzwords, Basic Online Learners, Challenges, Lambda Recommender, Conclusions
      • Trevor attempts to tie everything together into a cohesive thought
      • Audience members ask easy questions
      • Audience members buy speaker beer at after party
  • 53. Final Thoughts
      A lot of buzzwords have been flying around, especially with respect to machine learning and streaming.
      ▪ Online
      ▪ Lambda / Kappa architecture
      ▪ Streaming machine learning
      ▪ Real time predictive model
      ▪ machine learning
      ▪ artificial/machine/cognitive intelligence
      ▪ cognitive
      ▪ blah-
      ^^ pick 2.
  • 54. Final Thoughts
      Now that you’ve sat through this talk, hopefully you can:
      1. Call people out for trying to make their product/service/open source project/startup sound like a bigger deal than it is
      2. Church up your product/service/open source project/startup to get clients/VC dummies excited about it without technically lying
  • 55. Buy Trevor beers. Questions? https://github.com/rawkintrevo/fsf17-twitter-recos