H2O.ai

Machine Intelligence
BUILDING MACHINE LEARNING
APPLICATIONS WITH SPARKLING WATER
AVNI WADHWA & VINOD IYENGAR

Sparkling Water
Provides the following:
• Seamless integration of H2O with Spark ecosystem
• Transparent use of H2O data structures and algorithms with Spark API
• Excels in existing Spark workflows requiring advanced Machine Learning algorithms

Sparkling Water Requirements
• Spark Version 1.4
• Sparkling Water 1.4.3 (download at h2o.ai/download)

Tech News Use Case
• The goal is to predict the tag based on the short summary of the
article

Tech News Use Case— Crawler
Used import.io to create a crawler that went through numerous pages of techcrunch.com
news and acquired, for each article, the title, the author, a 2-3 sentence opening from
the beginning of the article, and the tags associated with the article

Tech News Use Case
The first manipulation of the text involves eliminating words that occur
frequently but do not add value to the classification process.
Sample Scala code:

Tech News Use Case
We now eliminate words that do not add value to the classification process
• i.e. punctuation, stopwords, and words that do not occur frequently
Sample Scala code:
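A minimal sketch of this filtering step in plain Scala, not the presenters' code. The stopword list, the minimum token length of 2, and the names `BlurbTokenizer`, `tokenize`, and `dropRareWords` are all assumptions made for illustration; a Spark job would apply the same functions inside `map` calls over an RDD.

```scala
object BlurbTokenizer {
  // Assumed, deliberately tiny stopword list; a real list would be much longer.
  val stopWords: Set[String] =
    Set("a", "an", "and", "the", "to", "of", "is", "in", "has", "with", "at")

  // Lowercase the blurb, turn every non-letter into a space (drops punctuation
  // and digits), split on whitespace, then drop stopwords and 1-letter tokens.
  def tokenize(blurb: String): Seq[String] =
    blurb.toLowerCase
      .replaceAll("[^a-z\\s]", " ")
      .split("\\s+")
      .toSeq
      .filter(w => w.length >= 2 && !stopWords.contains(w))

  // Remove words that occur fewer than minCount times across the whole corpus.
  def dropRareWords(docs: Seq[Seq[String]], minCount: Int): Seq[Seq[String]] = {
    val counts: Map[String, Int] =
      docs.flatten.groupBy(identity).map { case (w, ws) => (w, ws.size) }
    docs.map(_.filter(w => counts(w) >= minCount))
  }
}
```

For example, `BlurbTokenizer.tokenize("Apple has been tinkering with the iPhone battery.")` keeps only `apple`, `been`, `tinkering`, `iphone`, and `battery`.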

Tech News Use Case — Word2Vec
Word2Vec is a mathematical way to represent a word as a vector of numbers.
These vector ‘representations’ encode information about the
given word. In other words, the vector captures the meaning of
the word.
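The slides do not say how the per-word vectors are combined into one feature vector per blurb. One common choice, assumed here, is to average the vectors of the words in the blurb. The sketch below uses made-up 3-dimensional vectors; a trained Word2Vec model would supply real ones, and the names `BlurbVectors`, `toyVectors`, and `average` are hypothetical.

```scala
object BlurbVectors {
  // Toy "embeddings" invented for illustration only.
  val toyVectors: Map[String, Array[Double]] = Map(
    "iphone"  -> Array(1.0, 0.0, 2.0),
    "battery" -> Array(3.0, 2.0, 0.0)
  )

  // Average the vectors of the words that have a vector; unknown words are
  // skipped. Returns an all-zero vector when no word in the blurb is known.
  def average(words: Seq[String],
              vectors: Map[String, Array[Double]],
              dim: Int): Array[Double] = {
    val known = words.flatMap(w => vectors.get(w))
    if (known.isEmpty) Array.fill(dim)(0.0)
    else {
      val sum = Array.fill(dim)(0.0)
      for (v <- known; i <- 0 until dim) sum(i) += v(i)
      sum.map(_ / known.size)
    }
  }
}
```

The resulting fixed-length vector can then serve as the numeric input columns for a classifier such as GBM.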
Tech News Work Flow
[Figure: workflow diagram. Train a model: text blurb → Word2Vec → GBM model. Categorize the text: article blurb (“Apple has been tinkering with ways to make the iPhone better at managing battery life…”) → Word2Vec → model → “This article is related to gadgets”.]

Category Information
The original data set yielded about 55 categories. To streamline the
classification process, we kept the 14 most frequently appearing tags in our
dataset and grouped the rest into a catch-all category titled “Other.” The
figure to the right shows the distribution of data in each category.
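The tag-collapsing step described above can be sketched in plain Scala as follows. The names `TagCollapser` and `collapse` are hypothetical, and breaking ties alphabetically is an assumption; the slides only state that the 14 most frequent tags were kept.

```scala
object TagCollapser {
  // Keep the `keep` most frequent tags and map every other tag to "Other".
  def collapse(tags: Seq[String], keep: Int): Seq[String] = {
    val topTags: Set[String] = tags
      .groupBy(identity)
      .toSeq
      // Most frequent first; ties broken alphabetically (an assumption).
      .sortBy { case (tag, occurrences) => (-occurrences.size, tag) }
      .take(keep)
      .map(_._1)
      .toSet
    tags.map(t => if (topTags.contains(t)) t else "Other")
  }
}
```

With the deck's setting this would be called as `TagCollapser.collapse(allTags, 14)`, where `allTags` holds one tag per article.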
Category Information
The variable importance chart to the right shows that the author is by far the
most important variable. In other words, the classification used very little
information from the text samples provided and relied mostly on authors who
frequently write under the same article tag. Let’s see how this changes
when we try to classify the articles using only the text samples.

Analysis
The validation confusion matrix below is for the model that used both the authors and text blurbs to
categorize articles. We know that this model placed heavy variable importance on authors. In the
confusion matrix below, we see how this affects the error rate of various tags. For tags
with smaller sets of data, it is common that a few authors write the majority of articles associated with
those tags. For the “Enterprise” tag, for example, the data set is relatively small, and the error rate is
relatively low (40%).

Analysis
The validation confusion matrix below is for the model that uses text blurbs exclusively to categorize
articles. For the “Enterprise” tag, the error rate is now 75%, significantly higher than the 40% error
rate we saw when authors were included as a predictor. This shows how heavily the earlier model
relied on the author variable.
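As a reminder of how a per-tag error rate is read off a confusion matrix, here is a minimal Scala sketch. It assumes rows are actual tags and columns are predicted tags; the names `ConfusionErrors` and `errorRates` are hypothetical, and the counts in the usage note are toy values, not the figures from the slides.

```scala
object ConfusionErrors {
  // matrix(i)(j) = number of articles whose actual tag is labels(i)
  // and whose predicted tag is labels(j).
  // Error rate of a tag = misclassified articles in its row / row total.
  def errorRates(labels: Seq[String], matrix: Seq[Seq[Int]]): Map[String, Double] =
    labels.zipWithIndex.map { case (label, i) =>
      val total = matrix(i).sum
      val wrong = total - matrix(i)(i)
      label -> (if (total == 0) 0.0 else wrong.toDouble / total)
    }.toMap
}
```

For instance, with toy counts `Seq(Seq(6, 2), Seq(10, 30))` for the labels `Seq("A", "B")`, tag `A` has 2 of 8 articles misclassified, an error rate of 0.25.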

Example Classification
With the Scala code below, we supply the author of an article and a
snippet of the article, and try to classify what the article is
about.

Hit Ratios
[Charts: hit ratios, With Authors and Without Authors]
Hit ratios illustrate the chances of your model correctly categorizing a text blurb on the 1st,
2nd, 3rd, etc. try. The above charts show that both the model that includes authors and the model
that does not have an approx. 70% chance of correctly predicting the tag of a text blurb by the second try.
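A hit ratio at position k can be computed as the fraction of articles whose true tag appears among the model's k most probable tags. The Scala sketch below assumes the ranked predictions are already available as ordered lists; the names `HitRatios` and `hitRatios` and the toy data in the usage note are hypothetical.

```scala
object HitRatios {
  // ranked(i) = candidate tags for article i, most probable first.
  // actual(i) = the true tag of article i.
  // Element k-1 of the result is the fraction of articles whose true tag
  // is among the top k predictions.
  def hitRatios(ranked: Seq[Seq[String]], actual: Seq[String], maxK: Int): Seq[Double] =
    (1 to maxK).map { k =>
      val hits = ranked.zip(actual).count { case (preds, truth) =>
        preds.take(k).contains(truth)
      }
      hits.toDouble / actual.size
    }
}
```

A value of about 0.7 at k = 2 would correspond to the roughly 70% second-try figure quoted above.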

Possible Use
A possible use for such classification capabilities would be for blog
posting sites. The user would enter their text into the field, and the
classification model would automatically choose tags for the post.

Customers • Community •
November 9, 10, 11
Computer History Museum
H2OWORLD.H2O.AI
20% off registration using code: h2ocommunity

Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
 
Generating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using SmithyGenerating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using Smithy
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
 

Classifying Tech News with Sparkling Water

  • 1. H2O.ai
 Machine Intelligence
 BUILDING MACHINE LEARNING APPLICATIONS WITH SPARKLING WATER
 AVNI WADHWA & VINOD IYENGAR
  • 2. Sparkling Water
 Provides the following:
 • Seamless integration of H2O with the Spark ecosystem
 • Transparent use of H2O data structures and algorithms with the Spark API
 • Excels in existing Spark workflows requiring advanced Machine Learning algorithms
  • 3. Sparkling Water Requirements
 • Spark version 1.4
 • Sparkling Water 1.4.3 (download at h2o.ai/download)
  • 4. Tech News Use Case
 • The goal is to predict an article's tag from the short summary of the article.
  • 5. Tech News Use Case: Crawler
 We used import.io to create a crawler that went through numerous pages of techcrunch.com news and collected, for each article, the title, the author, a 2-3 sentence opening, and the tags associated with the article.
  • 6. Tech News Use Case
 The first manipulation of the text involves eliminating words that occur frequently and do not add value to the classification process.
 Sample Scala code: [code shown on slide; not captured in this transcript]
  • 7. Tech News Use Case
 We now eliminate words that do not add value to the classification process
 • i.e., punctuation, stopwords, and words that do not occur frequently
 Sample Scala code: [code shown on slide; not captured in this transcript]
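The authors' own code for this step is not part of the transcript. As a rough stand-in, the filtering described above (punctuation, stopwords, infrequent words) might look like the following minimal sketch in plain Scala collections. The object name, the tiny stopword list, and the `minCount` threshold are all hypothetical, and the slides most likely ran the equivalent logic on Spark RDDs rather than local sequences.

```scala
object TokenizeSketch {
  // Hypothetical stopword list for illustration; a real list would be much longer.
  val stopwords: Set[String] =
    Set("a", "an", "the", "to", "of", "and", "is", "has", "been", "with", "at")

  // Lowercase the text, turn every character that is not a-z or whitespace
  // into a space, split on whitespace, and drop empty tokens and stopwords.
  def tokenize(text: String): Seq[String] =
    text.toLowerCase
      .replaceAll("[^a-z\\s]", " ")
      .split("\\s+")
      .toSeq
      .filter(w => w.nonEmpty && !stopwords.contains(w))

  // Drop words that occur fewer than minCount times across all documents.
  def dropRare(docs: Seq[Seq[String]], minCount: Int): Seq[Seq[String]] = {
    val counts: Map[String, Int] =
      docs.flatten.groupBy(identity).map { case (w, ws) => (w, ws.size) }
    docs.map(_.filter(w => counts(w) >= minCount))
  }
}
```

For example, `TokenizeSketch.tokenize("Apple has been tinkering with the iPhone!")` keeps only `apple`, `tinkering`, and `iphone`.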
  • 8. Tech News Use Case: Word2Vec
 Word2Vec is a mathematical way to represent a word as a vector of numbers. These vector ‘representations’ encode information about the given word; in other words, the vector captures the meaning of the word.
  • 9. Tech News Work Flow
 [Diagram] Train a model: text blurb -> Word2Vec -> GBM model
 [Diagram] Categorize the text: article blurb (“Apple has been tinkering with ways to make the iPhone better at managing battery life…”) -> Word2Vec -> model -> “This article is related to gadgets”
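The slides do not show how the word vectors of a blurb are combined into a single input row for the GBM. One common approach, assumed here and not confirmed by the source, is to average the vectors of the words in the blurb. The sketch below uses a plain `Map` as a stand-in for a trained Word2Vec model; `blurbVector` and its parameters are hypothetical names.

```scala
object BlurbVectorSketch {
  // Average the vectors of all words known to the model; unknown words are skipped.
  // `model` is a stand-in for a trained Word2Vec model (word -> vector of length dim).
  def blurbVector(words: Seq[String],
                  model: Map[String, Array[Double]],
                  dim: Int): Array[Double] = {
    val vecs = words.flatMap(w => model.get(w))
    if (vecs.isEmpty) Array.fill(dim)(0.0) // no known words: return the zero vector
    else {
      val sum = vecs.foldLeft(Array.fill(dim)(0.0)) { (acc, v) =>
        acc.zip(v).map { case (a, b) => a + b }
      }
      sum.map(_ / vecs.size)
    }
  }
}
```

The resulting fixed-length vector can then serve as the feature columns for a classifier, regardless of how many words the blurb contains.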
  • 10. Category Information
 The original data set yielded about 55 categories. To streamline the classification process, we kept the 14 most frequently appearing tags in our dataset and grouped the rest into a catch-all category titled “Other.” The figure to the right shows the distribution of data in each category.
 Category Information
 The variable importance chart to the right shows that the author is by far the most important variable. In other words, the classification used very little information from the text samples and relied mostly on authors who frequently write under the same article tag. Let’s see how this changes when we try to classify the articles using only the text samples.
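The relabeling step described above (keep the most frequent tags, map everything else to “Other”) can be sketched in a few lines of plain Scala. This is an illustrative sketch, not the authors' code; `relabel` and its parameters are hypothetical, and ties in frequency are broken alphabetically as an arbitrary choice.

```scala
object TagRelabelSketch {
  // Keep the `keep` most frequent tags and replace every other tag with `other`.
  def relabel(tags: Seq[String], keep: Int, other: String = "Other"): Seq[String] = {
    val top: Set[String] = tags
      .groupBy(identity)
      .toSeq
      .sortBy { case (tag, occurrences) => (-occurrences.size, tag) } // most frequent first
      .take(keep)
      .map(_._1)
      .toSet
    tags.map(t => if (top.contains(t)) t else other)
  }
}
```

With the slide's numbers, the call would be `TagRelabelSketch.relabel(allTags, keep = 14)`, leaving 15 classes in total once “Other” is counted.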
  • 11. Analysis
 The validation confusion matrix below is for the model that used both the authors and text blurbs to categorize articles. We know that this model placed heavy variable importance on authors. In the confusion matrix below, we see how this affects the error rate of various tags. For tags with smaller sets of data, it is common that a few authors write the majority of articles associated with those tags. For the “Enterprise” tag, for example, the data set is relatively small and the error rate is relatively low (40%).
  • 12. Analysis
 The validation confusion matrix below is for the model that uses text blurbs exclusively to categorize articles. If we look at the “Enterprise” tag, we see that the error rate is 75%, significantly higher than the 40% we saw when authors were included in the data. This shows how strongly the model relies on the author variable.
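The per-tag error rates quoted above come from the rows of the confusion matrix: for each actual tag, the error rate is the share of its articles that were assigned any other tag. The sketch below assumes rows are actual tags and columns are predicted tags; the function name is hypothetical and the confusion matrices themselves are not reproduced in this transcript.

```scala
object ConfusionSketch {
  // matrix(i)(j) = number of articles whose actual tag is labels(i)
  // and whose predicted tag is labels(j).
  def errorRates(labels: Seq[String], matrix: Seq[Seq[Int]]): Map[String, Double] =
    labels.zipWithIndex.map { case (label, i) =>
      val total = matrix(i).sum  // all articles that actually carry this tag
      val correct = matrix(i)(i) // diagonal entry: correctly predicted
      label -> (if (total == 0) 0.0 else (total - correct).toDouble / total)
    }.toMap
}
```

As an illustration with made-up counts (not taken from the slides), a tag with 4 validation articles of which 1 is predicted correctly has an error rate of 3/4 = 75%.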
  • 13. Example Classification
 With the Scala code below, we specify the author of an article and a snippet of the article, and try to classify what the article is about.
 [Scala code shown on slide; not captured in this transcript]
  • 14. Hit Ratios
 [Charts: With Authors | Without Authors]
 Hit ratios illustrate the chances of your model correctly categorizing a text blurb on the 1st, 2nd, 3rd, etc. try. The charts above show that the models with and without authors both have an approximately 70% chance of correctly predicting a text blurb's tag by the second try.
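Read as a cumulative top-k accuracy, the hit ratio at k is the fraction of articles whose true tag appears among the model's k most probable tags. A minimal sketch under that reading follows; `hitRatios` and its inputs are hypothetical names, and the input is assumed to be non-empty.

```scala
object HitRatioSketch {
  // ranked(i): tags for article i, ordered from most to least probable.
  // actual(i): the true tag of article i.
  // Returns the hit ratio for k = 1, 2, ..., maxK.
  def hitRatios(ranked: Seq[Seq[String]], actual: Seq[String], maxK: Int): Seq[Double] =
    (1 to maxK).map { k =>
      val hits = ranked.zip(actual).count { case (preds, truth) =>
        preds.take(k).contains(truth)
      }
      hits.toDouble / actual.size
    }
}
```

The second element of the returned sequence corresponds to the "second try" figure discussed above: it counts an article as a hit if the true tag is either the first or the second guess.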
  • 15. Possible Use
 One possible use for such classification capabilities is on blog posting sites: the user enters their text into a field, and the classification model automatically chooses tags for the post.
  • 16. Customers • Community
 November 9, 10, 11 • Computer History Museum • H2OWORLD.H2O.AI
 20% off registration using code: h2ocommunity