This document describes a project to predict high-level attributes of businesses such as restaurants from customer photos on Yelp. The attributes include whether a place is good for dinner, good for groups, or good for kids. The main challenges are predicting general attributes from images and captions rather than recognizing objects, and the fact that labels are given only at the business level (a business may have several) while individual image labels are unknown. To address this, models are trained on images from single-label businesses to predict image labels; these image labels are then used to predict the labels of multi-label businesses. The models include a CNN for images and a TF-IDF caption classifier. The results improve over the random baseline.
Figure: Pipeline overview. Train data: business images + captions with business labels, e.g. multi-label business Rdoi23s {1,3,5} and single-label business Ox4geZ {2}. Semi-supervised problem (labelling): supervised learning on data from single-label businesses (caption classification model), used to predict one label per picture of multi-label businesses. Supervised problem: supervised learning on data from all businesses (caption classification model and image classification CNN model), whose predictions are merged by averaging probabilities to predict one label per picture. Test data: business images + captions (e.g. xDTzv6a), from which the business labels are determined (e.g. xDTzv6a {1,3,5}).
CHOCOLATE FOUNTAIN!
Christina Bogdan, Vincent Chabot, Urjit Patel
Overview
Our goal throughout this project was to predict high-level attributes of venues from pictures uploaded by customers on the Yelp platform. Specifically, we were interested in predicting whether a restaurant would be ‘good for dinner’, ‘good for groups’ and/or ‘good for kids’. Each business could have several true labels.
We had to deal with two main difficulties:
1) Inferring very general attributes from image and caption analysis (rather than the more usual task of classifying images by the objects they depict).
2) The labels to predict were business-level labels only. Because a business could have several true labels, we had no direct access to individual image-level labels, which made the task semi-supervised.
To solve this, we first isolated the images from businesses that had only one true label. We trained both an image-based model and a caption-based model on them in order to predict individual labels for the remaining images, those associated with multi-label businesses. This turned our initial problem into a supervised one, which we solved by training another pair of image-based and caption-based models. From these image-level predictions, we could finally predict the multiple business-level attributes.
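The first step of this plan, separating photos of single-label businesses (which can inherit the business label) from photos of multi-label businesses, can be sketched in plain Python. This is a minimal sketch under assumptions: the poster does not give the file schema, so the input structures and the function name `split_by_label_count` are hypothetical, the photo ids and captions are toy values, and businesses with no label at all are simply skipped.

```python
# Hypothetical, already-parsed inputs (the poster does not give the file schema):
#   business_labels: business_id -> set of label ids, e.g. {"Rdoi23s": {1, 3, 5}}
#   photos: list of (photo_id, caption, business_id) tuples

def split_by_label_count(business_labels, photos):
    """Split photos into a labelled set (photos of single-label businesses,
    which inherit the business label) and an unlabelled set (photos of
    multi-label businesses, whose individual labels are unknown)."""
    labelled = []    # (photo_id, caption, label)
    unlabelled = []  # (photo_id, caption, business_id)
    for photo_id, caption, business_id in photos:
        labels = business_labels.get(business_id, set())
        if len(labels) == 1:
            (label,) = labels  # unpack the only label
            labelled.append((photo_id, caption, label))
        elif len(labels) > 1:
            unlabelled.append((photo_id, caption, business_id))
        # Businesses with no label at all are skipped (an assumption).
    return labelled, unlabelled

# Toy example using the business ids shown in the pipeline figure.
business_labels = {"Ox4geZ": {2}, "Rdoi23s": {1, 3, 5}}
photos = [("p1", "beer at the bar", "Ox4geZ"), ("p2", "kids menu", "Rdoi23s")]
labelled, unlabelled = split_by_label_count(business_labels, photos)
```

The labelled set is what the fully supervised caption model is trained on; the unlabelled set is what it later assigns picture-level labels to.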
Image individual label retrieval via captions of images with a unique label
By using only images associated with single-label businesses, we can perform fully supervised classification. We use the captions of these images to train a model based on tf-idf features. We then use this model to determine a unique picture-level label for the pictures associated with multi-label businesses.
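The poster does not detail how the caption model's output becomes a single label for a photo of a multi-label business. One plausible rule, shown here only as an assumption and not necessarily the authors' choice, is to keep the most probable class among the business's own labels, on the reasoning that a photo's unique label should be one of its business's labels. The helper name `restricted_argmax` and the dict-of-probabilities input are hypothetical.

```python
def restricted_argmax(class_probs, allowed_labels):
    """Return the label in `allowed_labels` with the highest probability.

    class_probs: label -> probability, as output by a caption classifier
    trained on single-label businesses (hypothetical structure).
    allowed_labels: the business-level labels of the photo's business.
    """
    candidates = [label for label in allowed_labels if label in class_probs]
    if not candidates:
        raise ValueError("none of the business labels was predicted by the model")
    return max(candidates, key=class_probs.__getitem__)
```

For example, `restricted_argmax({1: 0.2, 2: 0.5, 3: 0.3}, {1, 3})` returns 3 even though 2 is the globally most likely class, because 2 is not one of that business's labels.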
Figure: Words contributing the most to each class (Good for Kids, Good for Groups, Good for Dinner).
Final Results
Classifier       Train Accuracy   Test Accuracy
MultinomialNB    0.8943           0.6662
Across the various classifiers we tried, Linear SVC, SGD and multinomial Naive Bayes models gave the best accuracy.
Based on an analysis of the words contributing the most to each class (bar chart above), we added extra stop words such as:
'page', 'restaurant', 'area', 'place', 'interior', 'outside', 'like', 'sauce', 'view'
Data Understanding
Our initial data consisted of two files. The first mapped each business to the attributes we had to predict (possibly several true labels per business). In the second, each row consisted of a picture ID, its caption (if any), and the business it belonged to.
Our classes were not evenly distributed; in particular, we had very few ‘good for dinner’ businesses. Furthermore, the correlation between the target labels shows that businesses that are good for kids and businesses that are good for groups are somewhat negatively correlated, as we could expect: as the caption analysis shows, good-for-groups businesses are more associated with keywords such as ‘beer’, ‘bar’, ‘night’, …
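The class imbalance and label correlation described above can be checked with a few lines of plain Python. This sketch assumes the business-to-labels file has already been parsed into a dict of label sets, and uses readable label names instead of the numeric ids found in the data; the function names and toy data are hypothetical. The Pearson correlation of two 0/1 indicator columns is the phi coefficient.

```python
from math import sqrt

def label_counts(business_labels, labels):
    """Number of businesses that carry each label."""
    return {label: sum(label in ls for ls in business_labels.values())
            for label in labels}

def label_correlation(business_labels, a, b):
    """Pearson correlation of the 0/1 indicators of labels a and b across
    businesses (the phi coefficient); NaN if either indicator is constant."""
    xs = [1 if a in ls else 0 for ls in business_labels.values()]
    ys = [1 if b in ls else 0 for ls in business_labels.values()]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / sqrt(var_x * var_y)

# Toy data in which 'kids' and 'groups' never co-occur.
toy = {"a": {"kids"}, "b": {"groups"}, "c": {"kids"}, "d": {"groups", "dinner"}}
```

On this toy data `label_correlation(toy, "kids", "groups")` is -1.0; on the real data a moderately negative value would match the observation above.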
Business multi-label prediction & final results
Finally, we predicted the business labels from the predicted labels of the images associated with each business: if a business had at least one image with a given label, we predicted that label for the business as well.
We present our final results below:
Individual image label predictions
By merging the predictions of the caption-based and image-based models (simply averaging their predicted probabilities), we predicted an individual label for each image of the test set, taking the label with the maximum merged probability.
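The two prediction steps described in the preceding sections (averaging the caption and image models' probabilities and taking the argmax per photo, then giving a business every label predicted for at least one of its photos) can be sketched in plain Python. The function names and input structures are hypothetical; `min_photos=1` reproduces the union rule described on the poster, while larger values only illustrate the kind of alternative decision function mentioned under future work, not something the authors implemented.

```python
from collections import defaultdict

def merge_and_predict(caption_probs, image_probs, labels):
    """Average two probability vectors (same label order) and return the
    most probable label together with the merged vector."""
    if not (len(caption_probs) == len(image_probs) == len(labels)):
        raise ValueError("probability vectors must align with labels")
    merged = [(c + i) / 2 for c, i in zip(caption_probs, image_probs)]
    best = max(range(len(labels)), key=merged.__getitem__)
    return labels[best], merged

def business_labels_from_photos(photo_labels, photo_to_business, min_photos=1):
    """Give each business every label predicted for at least `min_photos`
    of its photos (min_photos=1 is the union rule)."""
    counts = defaultdict(lambda: defaultdict(int))
    for photo_id, label in photo_labels.items():
        counts[photo_to_business[photo_id]][label] += 1
    return {business: {label for label, n in per_label.items() if n >= min_photos}
            for business, per_label in counts.items()}
```

For instance, `merge_and_predict([0.6, 0.3, 0.1], [0.1, 0.6, 0.3], ["dinner", "groups", "kids"])` returns "groups": the caption model alone would have said "dinner", but the image model's confidence tips the average.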
Conclusion & Future Work
Ultimately, we were able to improve on our random baseline with our joint CNN and caption-based model. With more time, we would have used a different decision function to infer business labels from our predicted photo labels, instead of simply taking the union of predicted labels across all photos. We would also have explored other methods of combining the predictions of our image-based and caption-based models. Finally, we would have included our implementation of ∝SVM, which we believe would have helped us generate image-level labels.
Image classification via both caption and image analysis
Figure: CNN architecture (output shape after each block).
Input: 3@96x96
1. Convolution + Normalization + ReLU + MaxPooling (4x4) → 64@24x24
2. Convolution + Normalization + ReLU + MaxPooling (3x3) → 128@8x8
3. Convolution + Normalization + ReLU → 256@8x8
4. Convolution + Normalization + ReLU + MaxPooling (2x2) → 256@4x4
5. Convolution + Normalization + ReLU → 256@4x4
6. Convolution + Normalization + ReLU → 256@4x4
7. Convolution + Normalization + ReLU + MaxPooling (2x2) → 256@2x2
Full connection: 256x2x2 → Linear Transformation + Dropout → 256
Normalization + ReLU + Dropout + Linear Transformation → 3
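As a sanity check on the architecture diagram, the sketch below traces the output shape of each convolutional block. It assumes 'same'-padded convolutions (spatial size unchanged) and non-overlapping max pooling (stride equal to the pooling size); the poster gives only output shapes, so kernel sizes, padding and strides are assumptions, and `trace_shapes` is a hypothetical helper. Under these assumptions the listed shapes are reproduced exactly, ending in the 256x2x2 = 1024 inputs to the first fully connected layer.

```python
# (out_channels, max-pool size or None) per convolutional block, from the diagram.
CONV_BLOCKS = [(64, 4), (128, 3), (256, None), (256, 2),
               (256, None), (256, None), (256, 2)]

def trace_shapes(in_channels=3, size=96, blocks=CONV_BLOCKS):
    """Return (channels, height, width) after the input and each block,
    assuming 'same' convolutions and pooling with stride == pool size."""
    shapes = [(in_channels, size, size)]
    for out_channels, pool in blocks:
        if pool is not None:
            if size % pool:
                raise ValueError(f"size {size} is not divisible by pool {pool}")
            size //= pool
        shapes.append((out_channels, size, size))
    return shapes

shapes = trace_shapes()
channels, height, width = shapes[-1]
flattened = channels * height * width  # inputs to the first fully connected layer
```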
Data augmentation on the input, and on the ‘good for dinner’ class to generate more data.
Final Results (after 92 epochs)
Classifier       Train Accuracy   Test Accuracy
CNN              0.6711           0.6021
Figure: Data augmentation, original normalized input images vs. augmented normalized input images.
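The poster does not list which transformations were used for data augmentation. The sketch below shows one common choice, assumed for illustration only: random horizontal flips plus random crops from a zero-padded image, applied to extra copies of the under-represented ‘good for dinner’ images. The function names, the 8-pixel padding and the channels-first (C, H, W) layout are assumptions.

```python
import numpy as np

def augment(image, rng, pad=8):
    """Randomly flip a (C, H, W) image horizontally, then take a random
    H x W crop from a zero-padded copy, so the output shape is unchanged."""
    if rng.random() < 0.5:
        image = image[:, :, ::-1]
    _, height, width = image.shape
    padded = np.pad(image, ((0, 0), (pad, pad), (pad, pad)))  # zero padding
    top = rng.integers(0, 2 * pad + 1)
    left = rng.integers(0, 2 * pad + 1)
    return padded[:, top:top + height, left:left + width]

def oversample_class(images, labels, target_label, copies, seed=0):
    """Return the dataset with `copies` augmented versions of every image
    of `target_label` appended (e.g. the 'good for dinner' class)."""
    rng = np.random.default_rng(seed)
    new_images, new_labels = list(images), list(labels)
    for image, label in zip(images, labels):
        if label == target_label:
            for _ in range(copies):
                new_images.append(augment(image, rng))
                new_labels.append(label)
    return new_images, new_labels
```

Oversampling only the minority class this way rebalances the training set without discarding any images from the larger classes.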
Once we had recovered what we are confident are the images' unique labels, we trained two fully supervised models separately:
1) A text classifier on the image captions, using exactly the same methodology as before
2) A Convolutional Neural Network on the images, whose architecture and results are presented in this section
Figure: Visualization of the 128 feature maps of the second layer.
Figure: Training and validation mean class accuracy as a function of epochs.
Precision: 0.9318
Recall: 0.5405
F1 score: 0.6841
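Assuming these are aggregate scores (for example micro-averaged; the poster does not say how they were averaged), the three figures are mutually consistent, since F1 is the harmonic mean of precision P and recall R:

```latex
F_1 = \frac{2PR}{P + R}
    = \frac{2 \times 0.9318 \times 0.5405}{0.9318 + 0.5405}
    = \frac{1.00728}{1.4723}
    \approx 0.6842
```

This agrees with the reported 0.6841 up to the rounding of P and R. The gap between high precision and much lower recall suggests that the predicted labels are mostly correct but that many true business labels are missed.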
Business Case
Could you tell us, by looking at this picture, whether this restaurant looks good for kids? Would you rather recommend it for takeaway or for dinner? This is typically the kind of high-level, contextual information customers would like to know before choosing the next restaurant to go to: the kind of question you could ask a friend who has already been to the place. But what if your favorite recommendation platform, search engine or social network could answer all of these questions? Of course, it is not easy to find data that carry such high-level information. But the large number of pictures uploaded by customers, together with the possibilities offered by machine learning and deep learning techniques, makes it interesting to try to answer such questions.