Revenue Growth
Through Machine Learning

 Ted Dunning – March 21, 2013
Agenda
• Intelligence – Artificial or Reflected
• Quick survey of machine learning
  – without a PhD
  – not all of it
• Available components
• What do customers really want
Artificial Intelligence?
• Turing and the intelligent machine

• Rules?

• Neural networks?

• Logic?
Reflected Intelligence!
• Society is not just a million individuals

• A web service with a million users is not the
  same as a million users each with a computer

• Social computing emerges
What is Machine Learning?
• Statistics, but …
• New focus on prediction rather than
  hypothesis testing
• Prediction means held-out data, not just the
  future (now-casting)
The Classics
• Unsupervised
  – AKA clustering (but not what you think that is)
  – Mixture models, Markov models and more
  – Learn from unlabeled data, describe it predictively
• Supervised
  – AKA classification
  – Learn from labeled data, guess labels for new data
• Also semi-supervised and hundreds of variants
Recent Insurgents
• Collaborative learning
  – models that learn about you based on others

• Meta-modeling
  – models that learn to reason about what other
    models say

• Interactive systems
  – systems that pick what to learn from
Techniques
•   Surprise and coincidence
•   Anomalous indicators
•   Non-textual search using textual tools
•   Dithering
•   Meta-learning
Surprise and coincidence
• What is accidental or uninteresting?

• What is surprising and informative?
A vice president of South Carolina Bank and Trust in Bamberg,
Maxwell has served as a tireless champion for economic
development in Bamberg County since 1999, welcoming
industrial prospects to the county and working with existing
industries in their expansion efforts. Maxwell served for many
years as the president of the Bamberg County Chamber of
Commerce and remains an active member today.
The goal of learning is prediction. Learning falls into many
categories, including supervised learning, unsupervised learning,
online learning, and reinforcement learning. From the
perspective of statistical learning theory, supervised learning is
best understood.
Surprise and Coincidence
• Which words stand out in these examples?

• Which are just there because these are in
  English?

• The words “the” and “Bamberg” both occur 3
  times in the first excerpt
  – which is the more interesting statistic? Why?
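One way to make the "the" versus "Bamberg" question concrete is a log-likelihood ratio (LLR) test on a 2x2 table of counts. The sketch below is a minimal illustration, not the deck's own code: the background corpus size and the background counts for each word are hypothetical numbers chosen only for the example, and the excerpt length of 50 words is an approximation.

```python
import math

def x_log_x(x):
    """x * ln(x), with the usual convention that 0 * ln(0) = 0."""
    return x * math.log(x) if x > 0 else 0.0

def entropy(*counts):
    """Unnormalized Shannon entropy: N ln N - sum(k ln k)."""
    return x_log_x(sum(counts)) - sum(x_log_x(c) for c in counts)

def llr(k11, k12, k21, k22):
    """G^2 log-likelihood ratio for a 2x2 contingency table.

    k11: word count in the excerpt      k12: other words in the excerpt
    k21: word count in the background   k22: other words in the background
    """
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    # Clamp tiny negative values caused by floating-point round-off.
    return max(0.0, 2.0 * (row + col - mat))

# Hypothetical numbers: a 50-word excerpt against a 1,000,000-word background.
# "the" is assumed to make up 6% of the background; "Bamberg" appears 10 times.
llr_the = llr(3, 47, 60_000, 940_000)
llr_bamberg = llr(3, 47, 10, 999_990)

print(f"the:     {llr_the:.3f}")
print(f"Bamberg: {llr_bamberg:.3f}")
```

Under these assumed counts, three occurrences of "the" are exactly what the background rate predicts, so its score is near zero, while three occurrences of "Bamberg" score far higher. Equal raw counts can carry very different amounts of surprise.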
More Surprise
• Anomalous indicators
  – Events that occur before other events
  – But occur anomalously often


• Indicators are not causes

• Nor certain
Example #1- Auto Insurance
• Predict probability of attrition and loss for
  auto insurance customers
• Transactional variables include
  – Claim history
  – Traffic violation history
  – Geographical code of residence(s)
  – Vehicles owned
• Observed attrition and loss define past
  behavior
Derived Variables
• Split training data according to observable
  classes
• Define LLR (log-likelihood ratio) variables for each class/variable
  combination
• These 2 m v derived variables can be used
  for clustering (spectral, k-means, neural gas
  ...)
• Proximities to clusters in LLR space are the
  new modeling variables
Example #2 – Fraud Detection
• Predict the probability that an account will
  result in charge-off due to fraud
• Transactional variables include
  – Zip code
  – Recent payments and charges
  – Recent non-monetary transactions
• Bad payments, charge-off, delinquency are
  observable behavioral outcomes
Derived Variables
• Split training data according to observable
  classes
• Define LLR variables for each class/variable
  combination
• These 2 m v derived variables can be used
  directly as model variables
Search Abuse
• Non-textual search using textual tools
  – A document can contain non-word tokens
  – These might be anomalous indicators of an event


• Solr and similar engines can search for
  indicators
  – If we have a history of recent indicators, search
    finds possible follow-on events
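The idea of searching over non-word tokens can be sketched with a toy inverted index. This is only an illustration under stated assumptions: the event names and indicator tokens are hypothetical, and scoring here is a plain count of matching tokens, whereas a real engine such as Solr would apply its own relevance weighting.

```python
from collections import defaultdict

# Hypothetical "documents": each possible follow-on event lists the
# non-word tokens that were found to be anomalous indicators for it.
indicator_docs = {
    "event_A": ["m17", "m42", "sic5812"],
    "event_B": ["m42", "offer9"],
    "event_C": ["sic7011"],
}

def build_index(docs):
    """Map each indicator token to the set of documents containing it."""
    index = defaultdict(set)
    for doc_id, tokens in docs.items():
        for tok in tokens:
            index[tok].add(doc_id)
    return index

def search(index, history):
    """Rank documents by how many distinct recent-history tokens they match."""
    hits = defaultdict(int)
    for tok in set(history):
        for doc_id in index.get(tok, ()):
            hits[doc_id] += 1
    # Highest match count first; ties broken by document id for determinism.
    return sorted(hits, key=lambda d: (-hits[d], d))

index = build_index(indicator_docs)
ranked = search(index, ["m42", "sic5812", "m99"])
print(ranked)
```

The query is simply the user's recent indicator history, and the ranked hits are candidate follow-on events. Tokens with no postings (such as `m99` here) are ignored.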
Introducing Noise
• Dithering
  – add noise
  – less for high ranks, more for low ranks
• Softens page boundary effects
• Introduces more exploration
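A minimal sketch of dithering follows. The slide says only that noise is added, with less for high ranks and more for low ranks; the specific scoring rule used here (log of the rank plus Gaussian noise, with the noise scale set by a parameter named `epsilon`) is one common formulation and is an assumption, not something the slide specifies.

```python
import math
import random

def dither(ranked_items, epsilon=2.0, rng=None):
    """Re-order a ranked list by adding noise in log-rank space.

    Because the gap between log(1) and log(2) is much larger than the gap
    between log(50) and log(51), the same amount of noise rarely disturbs
    the top results but frequently shuffles the lower ones.
    epsilon = 1.0 gives zero noise (the original order).
    """
    rng = rng or random.Random()
    sigma = math.log(epsilon)
    noisy = [
        (math.log(rank) + rng.gauss(0.0, sigma), item)
        for rank, item in enumerate(ranked_items, start=1)
    ]
    noisy.sort(key=lambda pair: pair[0])
    return [item for _, item in noisy]

results = ["item%02d" % i for i in range(1, 21)]
print(dither(results, epsilon=1.5, rng=random.Random(42))[:10])
```

Calling `dither` on each page load gives a slightly different ordering every time, which lets items from just past the page boundary occasionally surface and collect feedback.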
Meta-learning

• Which settings work best?
• Which indicators?

• A/B testing for the back-end
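One way to run "A/B testing for the back-end" is a Bayesian bandit, which the deck lists among the available Mahout components. The sketch below uses Thompson sampling with Beta posteriors in plain Python; it does not use Mahout, and the class name, the simulation, and the conversion rates are all hypothetical.

```python
import random

class BetaBandit:
    """Thompson sampling over arms with binary (success/failure) rewards."""

    def __init__(self, n_arms, rng=None):
        self.wins = [0] * n_arms
        self.losses = [0] * n_arms
        self.rng = rng or random.Random()

    def choose(self):
        # Draw one sample from each arm's Beta(wins + 1, losses + 1)
        # posterior and play the arm with the largest draw.
        samples = [
            self.rng.betavariate(w + 1, l + 1)
            for w, l in zip(self.wins, self.losses)
        ]
        return max(range(len(samples)), key=samples.__getitem__)

    def update(self, arm, success):
        if success:
            self.wins[arm] += 1
        else:
            self.losses[arm] += 1

def simulate(true_rates, rounds=5000, seed=0):
    """Simulate settings with the given (hypothetical) success rates."""
    rng = random.Random(seed)
    bandit = BetaBandit(len(true_rates), rng)
    pulls = [0] * len(true_rates)
    for _ in range(rounds):
        arm = bandit.choose()
        pulls[arm] += 1
        bandit.update(arm, rng.random() < true_rates[arm])
    return pulls

pulls = simulate([0.05, 0.30])
print(pulls)
```

Each arm stands for one candidate setting or indicator set. Unlike a fixed-split A/B test, traffic shifts toward the better arm as evidence accumulates, while weaker arms still receive occasional exploration.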
Available components
• Mahout
  – LLR test for anomaly
  – Cooccurrence computations
  – Baseline components of Bayesian Bandits
• Solr
  – Ready to roll for search
History matrix

One row per user

One column per thing
Recommendation based on
cooccurrence

Cooccurrence gives item-item
mapping

One row and column per thing
Cooccurrence matrix can also be
implemented as a search index
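The history and cooccurrence matrices above can be sketched with plain dictionaries. The users and items are hypothetical, and the sketch keeps raw cooccurrence counts only; it does not apply the LLR filtering for anomalous cooccurrence mentioned elsewhere in the deck.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical history: one entry (row) per user, listing the things touched.
history = {
    "u1": {"a", "b", "c"},
    "u2": {"a", "b"},
    "u3": {"b", "c"},
    "u4": {"a"},
}

def cooccurrence(history):
    """Item-item counts: how many users touched both items.

    For a binary history matrix A this is the off-diagonal part of A'A.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for items in history.values():
        for i, j in combinations(sorted(items), 2):
            counts[i][j] += 1
            counts[j][i] += 1
    return counts

def recommend(counts, seen, top_n=3):
    """Score unseen items by summed cooccurrence with the seen items."""
    scores = defaultdict(int)
    for item in seen:
        for other, n in counts.get(item, {}).items():
            if other not in seen:
                scores[other] += n
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_n]

counts = cooccurrence(history)
print(recommend(counts, {"a"}))
```

Each row of `counts` can be read as a document whose tokens are the cooccurring items, which is why the same mapping can be stored in a search index and queried with a user's history.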
Input Data
• User transactions
   – user id, merchant id
   – SIC code, amount

• Offer transactions
   – user id, offer id
   – vendor id, merchant id’s
   – offers, views, accepts

• Derived user data
   – merchant id’s
   – SIC codes
   – offer & vendor id’s

• Derived merchant data
   – local top40
   – SIC code
   – vendor code
   – amount distribution
Cross-recommendation
• Per merchant indicators
  – merchant id’s
  – chain id’s
  – SIC codes
  – offer vendor id’s


• Computed by finding anomalous (indicator =>
  merchant) rates
Search-based Recommendations
• Sample document
  – Merchant Id
  – Field for text description
  – Phone
  – Address
  – Location

  – Indicator merchant id’s
  – Indicator industry (SIC) id’s
  – Indicator offers
  – Indicator text
  – Local top40

• Sample query
  – Current location
  – Recent merchant descriptions
  – Recent merchant id’s
  – Recent SIC codes
  – Recent accepted offers
  – Local top40
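The sample query can be sketched as a Solr request URL built from a user's recent history. Everything specific here is an assumption for illustration: the field names (`indicator_merchants`, `indicator_sic`, `indicator_offers`, `location`), the core name `merchants`, the host, and the coordinates are hypothetical. The request parameters `q`, `fq`, `rows`, and `wt`, and the `geofilt` spatial filter, are standard Solr features. Text descriptions and the local top40 are left out to keep the sketch short.

```python
from urllib.parse import urlencode

def field_clause(field, values):
    """Return 'field:(v1 v2 ...)' or None when there are no values."""
    values = list(values)
    if not values:
        return None
    return "%s:(%s)" % (field, " ".join(values))

def build_query(recent_merchants, recent_sic, accepted_offers):
    """Match the user's recent history against the indicator fields."""
    clauses = [
        field_clause("indicator_merchants", recent_merchants),
        field_clause("indicator_sic", recent_sic),
        field_clause("indicator_offers", accepted_offers),
    ]
    return " ".join(c for c in clauses if c)

def build_url(base, q, lat, lon, km, rows=10):
    """Add a distance filter on the current location and encode the URL."""
    params = [
        ("q", q),
        ("fq", "{!geofilt sfield=location pt=%s,%s d=%s}" % (lat, lon, km)),
        ("rows", rows),
        ("wt", "json"),
    ]
    return base + "/select?" + urlencode(params)

q = build_query(["m17", "m42"], ["5812"], [])
url = build_url("http://localhost:8983/solr/merchants", q, 37.77, -122.42, 5)
print(q)
print(url)
```

The ids are inserted as-is, so values containing Solr query syntax characters would need escaping first. The point of the sketch is that a recommendation request is an ordinary search query whose terms are the user's recent behavior.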
Solr
[Diagram: indexing. Components shown: Complete history, Cooccurrence (Mahout), Item meta-data, Solr indexers (indexing), Index shards]
Solr
[Diagram: search. Components shown: User history, Web tier, Item meta-data, Solr indexers (search), Index shards]
Objective Results
• At a very large credit card company

• History is all transactions, all web interaction

• Processing time cut from 20 hours per day to 3

• Recommendation engine load time decreased
  from 8 hours to 3 minutes
Platform Needs
• Need to root web services and search system on the
  cluster
   – Copying negates unification

• Legacy indexers are extremely fast … but they assume
  conventional file access

• High performance search engines need high
  performance file I/O

• Need coordinated process management
Additional Opportunities
• Cross recommend from search queries to
  documents

• Result is semantic search engine

• Uses reflected intelligence instead of artificial
  intelligence
• What do customers really want?
Another Example
• Users enter queries (A)
  – (actor = user, item=query)
• Users view videos (B)
  – (actor = user, item=video)
• A’A gives query recommendation
  – “did you mean to ask for”
• B’B gives video recommendation
  – “you might like these videos”
The punch-line
• B’A recommends videos in response to a
  query
  – (isn’t that a search engine?)
  – (not quite, it doesn’t look at content or meta-data)
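The B’A product can be sketched as a cross-occurrence count: for each query, how many users who entered it also viewed each video. The logs below are hypothetical, and the sketch uses raw counts with no filtering for anomalous cross-occurrence.

```python
from collections import defaultdict

# Hypothetical logs: (user, query) pairs form A, (user, video) pairs form B.
queries = [("u1", "flamenco"), ("u2", "flamenco"), ("u3", "metal")]
views = [
    ("u1", "v_guitar"),
    ("u2", "v_guitar"),
    ("u2", "v_dance"),
    ("u3", "v_riff"),
]

def group_by_user(pairs):
    """Collect the set of items seen for each user."""
    by_user = defaultdict(set)
    for user, item in pairs:
        by_user[user].add(item)
    return by_user

def cross_occurrence(a_pairs, b_pairs):
    """counts[a][b] = number of users with both A-item a and B-item b.

    For binary matrices this is the (b, a) entry of B'A.
    """
    a_by_user = group_by_user(a_pairs)
    b_by_user = group_by_user(b_pairs)
    counts = defaultdict(lambda: defaultdict(int))
    for user, a_items in a_by_user.items():
        for a in a_items:
            for b in b_by_user.get(user, ()):
                counts[a][b] += 1
    return counts

def videos_for(counts, query, top_n=5):
    """Rank videos for a query by cross-occurrence count."""
    row = counts.get(query, {})
    return sorted(row, key=lambda v: (-row[v], v))[:top_n]

counts = cross_occurrence(queries, views)
print(videos_for(counts, "flamenco"))
```

No video title or description is consulted anywhere: the ranking comes entirely from what users who typed the query went on to watch.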
Real-life example
• Query: “Paco de Lucia”
• Conventional meta-data search results:
  – “hombres del paco” times 400
  – not much else
• Recommendation based search:
  – Flamenco guitar and dancers
  – Spanish and classical guitar
  – Van Halen doing a classical/flamenco riff
Hypothetical Example
• Want a navigational ontology?
• Just put labels on a web page with traffic
  – This gives A = users x label clicks
• Remember viewing history
  – This gives B = users x items
• Cross recommend
  – B’A = label to item mapping
• After several users click, results are whatever
  users think they should be
Next Steps
• That is up to you
• But I can help
  – platforms (Solr, MapR)
  – techniques (Mahout, math)


tdunning@maprtech.com
@ted_dunning
@ApacheMahout

Summit EU Machine Learning

  • 1. Revenue Growth Through Machine Learning Ted Dunning – March 21, 2013
  • 2. Agenda • Intelligence – Artificial or Reflected • Quick survey of machine learning – without a PhD – not all of it • Available components • What do customers really want
  • 4. Artificial Intelligence? • Turing and the intelligent machine • Rules? • Neural networks? • Logic?
  • 5. Reflected Intelligence! • Society is not just a million individuals • A web service with a million users is not the same as a million users each with a computer • Social computing emerges
  • 6. What is Machine Learning? • Statistics, but … • New focus on prediction rather than hypothesis testing • Prediction means held-out data, not just the future (now-casting)
  • 7. The Classics • Unsupervised – AKA clustering (but not what you think that is) – Mixture models, Markov models and more – Learn from unlabeled data, describe it predictively • Supervised – AKA classification – Learn from labeled data, guess labels for new data • Also semi-supervised and hundreds of variants
  • 8. Recent Insurgents • Collaborative learning – models that learn about you based on others • Meta-modeling – models that learn to reason about what other models say • Interactive systems – systems that pick what to learn from
  • 9. Techniques • Surprise and coincidence • Anomalous indicators • Non-textual search using textual tools • Dithering • Meta-learning
  • 10. Surprise and coincidence • What is accidental or uninteresting? • What is surprising and informative?
  • 11. A vice president of South Carolina Bank and Trust in Bamberg, Maxwell has served as a tireless champion for economic development in Bamberg County since 1999, welcoming industrial prospects to the county and working with existing industries in their expansion efforts. Maxwell served for many years as the president of the Bamberg County Chamber of Commerce and remains an active member today.
  • 12. The goal of learning is prediction. Learning falls into many categories, including supervised learning, unsupervised learning, online learning, and reinforcement learning. From the perspective of statistical learning theory, supervised learning is best understood.
  • 13. Surprise and Coincidence • Which words stand out in these examples? • Which are just there because these are in English? • The words “the” and “Bamberg” both occur 3 times in the second article – which is the more interesting statistic? Why?
  • 14. More Surprise • Anomalous indicators – Events that occur before other events – But occur anomalously often • Indicators are not causes • Nor certain
  • 15. Example #1 – Auto Insurance • Predict probability of attrition and loss for auto insurance customers • Transactional variables include – Claim history – Traffic violation history – Geographical code of residence(s) – Vehicles owned • Observed attrition and loss define past behavior
  • 16. Derived Variables • Split training data according to observable classes • Define LLR (log-likelihood ratio) variables for each class/variable combination • These 2 m v derived variables can be used for clustering (spectral, k-means, neural gas ...) • Proximities to clusters in LLR space are the new modeling variables
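
The LLR score behind these derived variables can be computed from a 2×2 table of counts. The following is a minimal sketch, assuming the standard G² form of the log-likelihood ratio test; the function names and the word counts for "the" and "Bamberg" are hypothetical and chosen only for illustration.

```python
import math


def _xlogx(x):
    # x * ln(x), with the usual convention that 0 * ln(0) = 0
    return 0.0 if x == 0 else x * math.log(x)


def _entropy(*counts):
    # Unnormalized Shannon entropy: N ln N - sum(k ln k)
    return _xlogx(sum(counts)) - sum(_xlogx(k) for k in counts)


def llr(k11, k12, k21, k22):
    """G^2 log-likelihood ratio for a 2x2 contingency table.

    k11: event A and event B together
    k12: A without B
    k21: B without A
    k22: neither A nor B
    """
    row = _entropy(k11 + k12, k21 + k22)
    col = _entropy(k11 + k21, k12 + k22)
    mat = _entropy(k11, k12, k21, k22)
    # Clamp tiny negative values caused by floating-point rounding
    return max(0.0, 2.0 * (row + col - mat))


# Hypothetical counts: a 60-word article inside a 100,000-word corpus.
# Both words occur 3 times in the article; only their background rates differ.
score_the = llr(3, 57, 6000, 93940)   # "the": common everywhere
score_bamberg = llr(3, 57, 2, 99938)  # "Bamberg": rare outside this article
```

With these made-up counts, both words appear three times in the article, yet "Bamberg" scores far higher than "the" because its count is far above what the background rate predicts. A score near zero means the count is about what chance alone would produce.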
  • 17. Example #2 – Fraud Detection • Predict probability that an account is likely to result in charge-off due to fraud • Transactional variables include – Zip code – Recent payments and charges – Recent non-monetary transactions • Bad payments, charge-off, delinquency are observable behavioral outcomes
  • 18. Derived Variables • Split training data according to observable classes • Define LLR variables for each class/variable combination • These 2 m v derived variables can be used directly as model variables
  • 19. Search Abuse • Non-textual search using textual tools – A document can contain non-word tokens – These might be anomalous indicators of an event • SolR and similar engines can search for indicators – If we have a history of recent indicators, search finds possible follow-on events
  • 20. Introducing Noise • Dithering – add noise – less for high ranks, more for low ranks • Softens page boundary effects • Introduces more exploration
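
Dithering can be sketched in a few lines. The version below is one possible formulation, not necessarily the one used in the talk: it assumes the noise is Gaussian and is added to the logarithm of the rank, so an item near the top needs a much larger relative shift to be displaced than an item far down the list. The function name `dither` and the parameter `epsilon` are hypothetical.

```python
import math
import random


def dither(ranked_items, epsilon=1.5, rng=None):
    """Reorder a ranked list by adding noise to log(rank).

    ranked_items: items in their original order, best first
    epsilon: noise level (assumed >= 1); 1.0 leaves the order unchanged
    rng: optional random.Random instance, for reproducible results
    """
    rng = rng or random.Random()
    sigma = math.log(epsilon)
    scored = [
        (math.log(rank) + rng.gauss(0.0, sigma), item)
        for rank, item in enumerate(ranked_items, start=1)
    ]
    # Lower noisy score means a better position in the output
    scored.sort(key=lambda pair: pair[0])
    return [item for _, item in scored]


results = ["r%d" % i for i in range(1, 21)]
unchanged = dither(results, epsilon=1.0)
shuffled = dither(results, epsilon=2.0, rng=random.Random(42))
```

Because the gap between log(1) and log(2) is much larger than the gap between log(19) and log(20), the first few results usually stay near the top while deeper results mix freely. That mixing is what moves items across page boundaries and exposes them to users.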
  • 21. Meta-learning • Which settings work best? • Which indicators? • A/B testing for the back-end
  • 22. Available components • Mahout – LLR test for anomaly – Cooccurrence computations – Baseline components of Bayesian Bandits • SolR – Ready to roll for search
  • 23. History matrix One row per user One column per thing
  • 24. Recommendation based on cooccurrence Cooccurrence gives item-item mapping One row and column per thing
  • 25. Cooccurrence matrix can also be implemented as a search index
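
The step from history matrix to item-item cooccurrence can be illustrated with a small in-memory sketch. It assumes the history is a mapping from user to the set of things that user interacted with, and it uses raw cooccurrence counts. The user names, item names, and function names are hypothetical.

```python
from collections import defaultdict
from itertools import permutations


def cooccurrence(history):
    """counts[a][b] = number of users whose history contains both a and b."""
    counts = defaultdict(lambda: defaultdict(int))
    for items in history.values():
        for a, b in permutations(set(items), 2):
            counts[a][b] += 1
    return counts


def recommend(user_items, counts, top_n=3):
    """Score unseen items by summed cooccurrence with the user's history."""
    scores = defaultdict(int)
    for item in user_items:
        for other, n in counts.get(item, {}).items():
            if other not in user_items:
                scores[other] += n
    # Highest score first; ties broken alphabetically for stable output
    return sorted(scores, key=lambda i: (-scores[i], i))[:top_n]


# Hypothetical history: one entry per user, one set member per thing
history = {
    "u1": {"a", "b", "c"},
    "u2": {"a", "b"},
    "u3": {"b", "c", "d"},
    "u4": {"a", "d"},
}
counts = cooccurrence(history)
suggestions = recommend({"a"}, counts)
```

Raw counts favor popular items. A fuller version would keep only the pairs that an anomaly test such as LLR marks as surprising, and those surviving pairs are what would be stored as indicator fields in a search index.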
  • 26. Input Data • User transactions – user id, merchant id – SIC code, amount • Offer transactions – user id, offer id – vendor id, merchant id’s, – offers, views, accepts
  • 27. Input Data • User transactions – user id, merchant id – SIC code, amount • Offer transactions – user id, offer id – vendor id, merchant id’s, – offers, views, accepts • Derived merchant data – local top40 – SIC code – vendor code – amount distribution • Derived user data – merchant id’s – SIC codes – offer & vendor id’s
  • 28. Cross-recommendation • Per merchant indicators – merchant id’s – chain id’s – SIC codes – offer vendor id’s • Computed by finding anomalous (indicator => merchant) rates
  • 29. Search-based Recommendations • Sample document – Merchant Id – Field for text description – Phone – Address – Location
  • 30. Search-based Recommendations • Sample document – Merchant Id – Field for text description – Phone – Address – Location – Indicator merchant id’s – Indicator industry (SIC) id’s – Indicator offers – Indicator text – Local top40
  • 31. Search-based Recommendations • Sample document – Merchant Id – Field for text description – Phone – Address – Location – Indicator merchant id’s – Indicator industry (SIC) id’s – Indicator offers – Indicator text – Local top40 • Sample query – Current location – Recent merchant descriptions – Recent merchant id’s – Recent SIC codes – Recent accepted offers – Local top40
  • 32. [Architecture diagram. Labels: Complete history, Cooccurrence (Mahout), Item meta-data, Solr indexing, SolR Indexer (repeated), Index shards]
  • 33. [Architecture diagram. Labels: User history, Web tier, Item meta-data, Solr search, SolR Indexer (repeated), Index shards]
  • 34. Objective Results • At a very large credit card company • History is all transactions, all web interaction • Processing time cut from 20 hours per day to 3 • Recommendation engine load time decreased from 8 hours to 3 minutes
  • 35. Platform Needs • Need to root web services and search system on the cluster – Copying negates unification • Legacy indexers are extremely fast … but they assume conventional file access • High performance search engines need high performance file I/O • Need coordinated process management
  • 36. Additional Opportunities • Cross recommend from search queries to documents • Result is semantic search engine • Uses reflected intelligence instead of artificial intelligence
  • 37. What do customers really want?
  • 38. Another Example • Users enter queries (A) – (actor = user, item=query) • Users view videos (B) – (actor = user, item=video) • A’A gives query recommendation – “did you mean to ask for” • B’B gives video recommendation – “you might like these videos”
  • 39. The punch-line • B’A recommends videos in response to a query – (isn’t that a search engine?) – (not quite, it doesn’t look at content or meta-data)
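
The cross-recommendation B’A can be sketched the same way. The example below assumes two per-user histories, one of queries (A) and one of viewed videos (B), and counts how many users contributed each query-video pair. All data and function names are hypothetical.

```python
from collections import defaultdict


def cross_cooccurrence(a_history, b_history):
    """counts[a][b] = number of users with item a in A and item b in B."""
    counts = defaultdict(lambda: defaultdict(int))
    for user, a_items in a_history.items():
        for a in set(a_items):
            for b in set(b_history.get(user, ())):
                counts[a][b] += 1
    return counts


def videos_for_query(query, counts, top_n=3):
    """Return the videos most often viewed by users who entered the query."""
    row = counts.get(query, {})
    return sorted(row, key=lambda v: (-row[v], v))[:top_n]


# Hypothetical behavior logs
queries = {
    "u1": ["flamenco"],
    "u2": ["flamenco", "rock guitar"],
    "u3": ["rock guitar"],
}
views = {
    "u1": ["video_17"],
    "u2": ["video_17", "video_42"],
    "u3": ["video_42"],
}
counts = cross_cooccurrence(queries, views)
top_videos = videos_for_query("flamenco", counts)
```

Nothing in this sketch looks at video titles or descriptions. The mapping from query to video comes entirely from user behavior, which is why the result is not quite a conventional search engine.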
  • 40. Real-life example • Query: “Paco de Lucia” • Conventional meta-data search results: – “hombres del paco” times 400 – not much else • Recommendation based search: – Flamenco guitar and dancers – Spanish and classical guitar – Van Halen doing a classical/flamenco riff
  • 42. Hypothetical Example • Want a navigational ontology? • Just put labels on a web page with traffic – This gives A = users x label clicks • Remember viewing history – This gives B = users x items • Cross recommend – B’A = label to item mapping • After several users click, results are whatever users think they should be
  • 43. Next Steps • That is up to you • But I can help – platforms (Solr, MapR) – techniques (Mahout, math) tdunning@maprtech.com @ted_dunning @ApacheMahout