Profiling US Restaurants from Billions
of Payment Card Transactions
Himel Dev Hossein Hamooni
Payment Card Transactions
Across the globe, billions of
people regularly use payment
cards (such as debit and credit
cards) to pay for goods and services.
Payment processing companies
handle these transactions and
record their attributes in the
form of transaction data.
Transaction data contain rich
insights into the behavior of
cardholders and the business
of merchants.
Transactional insights can benefit various applications, such
as payment fraud detection and merchant recommendation.
However, utilizing transactional insights often requires
auxiliary information about merchants that is missing
from the payment company’s perspective.
Can’t We Just Use Google or Yelp?
Cost: It is expensive to acquire
merchant information, especially for
commercial purposes.
Unavailability: Payment cards are
used in many countries where
crowdsourced merchant data is unavailable.
Unreliability: Acquired information
becomes outdated, as new merchants
appear or old merchants disappear.
Our big goal is to infer latent merchant attributes from
transaction data, without using external sources.
Proof-of-Concept Use Case
Infer the cuisine types of restaurants by analyzing only transaction data.
Target applications: restaurant recommendation and fraud detection.
Transaction Data
Our dataset contains four billion debit and credit card transactions at
more than half a million US restaurants over three months.
We aim to develop a framework for inferring the cuisine types of
restaurants from the transaction data.
Cardholder  Merchant       Zip    Date & Time          $$     $$+
5r8g3d0u5q  Peking Cafe    55075  20-06-2017 18:03:05  12.65  13.65
5r8g3d0u5q  Health Junkie  55075  12-07-2017 19:21:17  17.81  19.81
5r8g3d0u5q  Pineda Tacos   55075  13-07-2017 13:06:04  10.99  12.00
Cuisine Inference Framework
1. Weakly Supervised Label Generation: restaurant names yield cuisine labels (Peking Cafe: Chinese; Pineda Tacos: Mexican; Health Junkie: ?).
2. Statistical Feature and Neural Embedding Extraction: location, price, time, group, capacity, tips, and loyalty features, plus embeddings.
3. Deep Neural Network Based Classification.
I. Weakly Supervised Label Generation
Cuisine Taxonomy Creation
Id  Cuisine Type      Subcategories
1   Latin American    Mexican, Cuban, Brazilian, Colombian
2   European          French, Italian, German, Polish, Irish
3   Mediterranean     Greek, Turkish
    Middle Eastern    Saudi Arabian, Lebanese, Persian, Afghan
    African           Moroccan, Ethiopian, Eritrean
4   South Asian       Indian, Pakistani, Nepalese, Bangladeshi
5   South East Asian  Thai, Vietnamese, Indonesian, Malaysian
6   East Asian        Chinese, Japanese, Korean, Mongolian
7   Grill and Steak   Grill, Steakhouse
8   Fastfood          Sandwich, Burger, Pizza
9   Bar               Bar, Pub, Tavern, Inn
10  Dessert           Ice Cream, Cafe, Bakery, Juice
We create a cuisine
taxonomy for US
restaurants.
Our taxonomy contains
the ten most popular
cuisine types in the US.
Each of these major
cuisine types covers many
minor cuisine types.
Seed Word Compilation
Restaurant Name                Cuisine Type
Peking Garden                  Chinese
Golden Wok                     Chinese
Ambar India                    Indian
Himalayan Chimney              Indian
Biaggi's Ristorante Italiano   Italian
Pizzeria Antica                Italian
Maize Mexican Grill            Mexican
Burrito King                   Mexican
Garbanzo Mediterranean Fresh   Mediterranean
Jerusalem                      Mediterranean
Good Fella                     ???
We compile a set of seed words
for each cuisine type in our
taxonomy.
We use these words as common
patterns to generate cuisine labels
for restaurant names.
Currently, we have a list of 225
seed words that represent the ten
major cuisine types.
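The keyword step can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `SEED_WORDS` dictionary holds a few hypothetical entries drawn from the example names above (the real list has 225 words), and the majority vote with ties left unlabeled is an assumption.

```python
from collections import Counter

# Hypothetical seed lexicon for illustration only (not the authors' 225-word list).
SEED_WORDS = {
    "peking": "Chinese",
    "wok": "Chinese",
    "india": "Indian",
    "himalayan": "Indian",
    "ristorante": "Italian",
    "pizzeria": "Italian",
    "burrito": "Mexican",
    "mexican": "Mexican",
    "mediterranean": "Mediterranean",
    "jerusalem": "Mediterranean",
}


def label_by_seed_words(name, seed_words=SEED_WORDS):
    """Return the cuisine voted for by the seed words in `name`, or None."""
    # Lowercase and split on whitespace; apostrophes are treated as separators.
    tokens = name.lower().replace("'", " ").split()
    votes = Counter(seed_words[t] for t in tokens if t in seed_words)
    if not votes:
        return None  # no seed word matched, e.g. "Good Fella"
    ranked = votes.most_common(2)
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return None  # assumed policy: leave ties unlabeled
    return ranked[0][0]
```

Under these assumptions, "Peking Garden" is labeled Chinese, while "Good Fella" stays unlabeled because none of its words is a seed word.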
Bootstrapped Label Expansion
We extract new words (beyond the seed words) from restaurant names and use them as highly
accurate patterns to increase the coverage of labeled restaurants. A candidate word must meet three criteria:
Frequency: The word must appear in at least a θ_f fraction of all restaurant
names.
Precision: If we use the word and its majority label as a labeling rule, the
rule must hold for at least a θ_p fraction of the labeled restaurants that contain the word.
Significance: The ratio of labeled to unlabeled restaurants that contain the
word should be at least θ_s.
Using seed and bootstrapped words, we
could label 35% of the restaurants in our dataset.
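A rough sketch of how the three criteria could be applied to candidate words. The function name, the reading of each threshold as a lower bound, and the way precision and significance are computed are assumptions made for illustration; the slides do not give the exact formulas or threshold values.

```python
from collections import Counter, defaultdict


def bootstrap_words(names, labels, seed_words, theta_f, theta_p, theta_s):
    """Return {word: majority_label} for non-seed words passing all three tests.

    names  : list of restaurant names
    labels : parallel list holding a cuisine label, or None if unlabeled
    """
    total = len(names)
    labeled_counts = defaultdict(Counter)  # word -> label -> count
    unlabeled_counts = Counter()           # word -> number of unlabeled names
    for name, label in zip(names, labels):
        for word in set(name.lower().split()):
            if word in seed_words:
                continue  # only words beyond the seed list are candidates
            if label is None:
                unlabeled_counts[word] += 1
            else:
                labeled_counts[word][label] += 1

    rules = {}
    for word, by_label in labeled_counts.items():
        n_labeled = sum(by_label.values())
        n_unlabeled = unlabeled_counts[word]
        majority, n_majority = by_label.most_common(1)[0]
        frequency = (n_labeled + n_unlabeled) / total
        precision = n_majority / n_labeled
        # Assumed definition: count ratio, infinite when no unlabeled name has the word.
        significance = n_labeled / n_unlabeled if n_unlabeled else float("inf")
        if frequency >= theta_f and precision >= theta_p and significance >= theta_s:
            rules[word] = majority
    return rules
```

Words that appear only in unlabeled names never become rules in this sketch, because they have no majority label to propagate.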
Topic Modeling
To augment the keyword-based approach, we develop a custom topic model.
Issue      Description                                              Solution
Monolith   Many restaurant names consist of a single word           Sprinkling
Sparse     Sparse word co-occurrence patterns in restaurant names   BTM
Long tail  Long-tail distribution of words in restaurant names      Stratification
The resultant topics (cuisine types) are coherent and consistent
with the cuisine types generated by our keyword-based approach.
II. Statistical Feature and Neural Embedding Extraction
Statistical Features
Feature Type           Description
Pricing                The deciles of the authorized amount in transactions
Tipping culture        The deciles of (settlement amount - authorized amount) in transactions
Serving capacity       The deciles of the hourly transaction count
Party size             The proportion of transactions for different party sizes
Party pricing          The average authorized amount for different party sizes
Temporal pattern I     The distribution of the number of transactions over days of the week
Temporal pattern II    The distribution of the number of transactions over the hours of weekdays
Temporal pattern III   The distribution of the number of transactions over the hours of weekends
Customer revisitation  The deciles of the number of revisits by the customers
Customer loyalty       The deciles of the number of restaurants visited by the customers
Location               The digits of the restaurant zip code and the corresponding location granularity
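As an illustration of the decile-based features, the sketch below computes pricing and tipping deciles for one restaurant. It assumes each transaction is an (authorized amount, settlement amount) pair and represents deciles as the nine cut points returned by Python's `statistics.quantiles`; the function names and that representation are assumptions, not details from the slides.

```python
from statistics import quantiles


def deciles(values):
    """Return the nine cut points that split `values` into ten groups."""
    # "inclusive" treats the data as the full population (min and max included).
    return quantiles(values, n=10, method="inclusive")


def pricing_and_tipping_features(transactions):
    """Build pricing and tipping features for one restaurant.

    transactions: list of (authorized_amount, settlement_amount) pairs.
    Returns 18 numbers: 9 pricing deciles followed by 9 tipping deciles.
    """
    authorized = [a for a, _ in transactions]
    tips = [s - a for a, s in transactions]  # settlement minus authorized
    return deciles(authorized) + deciles(tips)
```

The same `deciles` helper would apply to the other decile features (hourly transaction counts, revisits, restaurants visited) once the corresponding counts are aggregated per restaurant.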
[Figure: Statistical Feature Insights]
Customer Restaurant Interaction
[Figure: interactions between customers U1 to U4 and restaurants R1 to R5]
Micro and Macro Hypotheses
The distinction between the hypotheses lies in application: individual vs group.
Micro Hypothesis: The compatibility between a customer’s preferences and a
restaurant’s attributes is a good predictor of whether the customer will visit
the restaurant. For example, a vegetarian is likely to visit an Indian restaurant.
Macro Hypothesis: The type of customers who visit a given restaurant (as a
whole) is a good predictor of its attributes. For example, a restaurant is
unlikely to be a steakhouse if many of its customers are vegetarian.
Micro Embedding
[Figure: customer-restaurant interactions (U1 to U4, R1 to R5) arranged as sequences of restaurants. Micro: word2vec]
Macro Embedding
[Figure: the same interactions arranged as sequences of customers. Macro: doc2vec]
Micro and Macro Embedding
[Figure: both views side by side. Micro: word2vec; Macro: doc2vec]
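One plausible way to prepare the inputs for the two embeddings is sketched below. It assumes the micro view treats each customer's chronological visit list as a "sentence" of restaurant tokens (for a word2vec trainer) and the macro view treats each restaurant as a "document" of customer tokens (for a doc2vec trainer). This mapping, the function name, and the tuple layout are assumptions; the slides only name the two models.

```python
from collections import defaultdict


def build_corpora(visits):
    """Turn visits into token sequences for the micro and macro views.

    visits: iterable of (customer_id, restaurant_id, timestamp) tuples.
    Returns (micro_sentences, macro_documents):
      micro_sentences : {customer_id: [restaurant_id, ...]} in visit order
      macro_documents : {restaurant_id: [customer_id, ...]} in visit order
    """
    micro_sentences = defaultdict(list)
    macro_documents = defaultdict(list)
    # Sort by timestamp so each sequence reflects chronological order.
    for customer, restaurant, _ in sorted(visits, key=lambda v: v[2]):
        micro_sentences[customer].append(restaurant)
        macro_documents[restaurant].append(customer)
    return dict(micro_sentences), dict(macro_documents)
```

The restaurant vectors learned from the micro sentences and the document vectors learned from the macro documents would then serve as two of the restaurant embeddings.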
Name Embedding
We generate name embeddings to utilize the non-labeling words in names.
We first remove the labeling words from each restaurant name.
We then retrieve a pre-trained GloVe embedding for each remaining word in the name.
We finally combine the word embeddings via max pooling.
In total, we generate three sets of restaurant embeddings (micro, macro, and name)
to represent the latent characteristics of restaurants.
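The name embedding steps above can be sketched as follows. The `word_vectors` dictionary stands in for a pre-trained GloVe lookup table, and the toy three-dimensional vectors in the usage note are made up for illustration; skipping out-of-vocabulary words and returning `None` for empty names are assumptions.

```python
def name_embedding(name, labeling_words, word_vectors):
    """Element-wise max over the vectors of the non-labeling words in `name`.

    labeling_words : set of seed and bootstrapped words to remove
    word_vectors   : {word: list of floats}, e.g. a loaded GloVe table
    Returns None when no remaining word has a vector.
    """
    tokens = [t for t in name.lower().split() if t not in labeling_words]
    vectors = [word_vectors[t] for t in tokens if t in word_vectors]
    if not vectors:
        return None
    # Max pooling: keep the largest value in each dimension.
    return [max(dims) for dims in zip(*vectors)]
```

For example, with `word_vectors = {"golden": [0.1, -0.5, 0.3], "garden": [0.4, 0.2, -0.1]}` and `labeling_words = {"wok"}`, the name "Golden Garden Wok" pools to `[0.4, 0.2, 0.3]`.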
III. Deep Neural Network Based Classification
DNN Models
To demonstrate the effectiveness of our framework, we develop several DNN models.
Shallow Feedforward: a feedforward neural network with two hidden layers
Deep Feedforward: a feedforward neural network with four hidden layers
Deep Feedforward with Residual: a deep feedforward network with residual connections
Wide and Deep: the wide and deep network, which captures feature interactions
Deep and Cross: the deep and cross network, which applies feature crossing
[Figure: Deep Feedforward with Residual. Statistical features (price, tips, location, ...) and embeddings (micro, macro, name) feed a concatenated layer, followed by hidden layers with residual (+) connections and an output layer.]
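To make the residual idea concrete, here is a tiny forward pass in plain Python. It assumes each hidden layer has the same width as its input so the skip connection can be a simple addition, and it omits the output layer and training; the exact placement of the skip connections in the authors' network is not given in the slides, so this is only one common variant.

```python
def relu(v):
    """Apply max(0, x) to every element."""
    return [max(0.0, x) for x in v]


def dense(v, weights, bias):
    """Fully connected layer; `weights` holds one row per output unit."""
    return [sum(w * x for w, x in zip(row, v)) + b for row, b in zip(weights, bias)]


def residual_block(v, weights, bias):
    """Compute relu(W v + b) + v; W must be square so the shapes match."""
    hidden = relu(dense(v, weights, bias))
    return [h + x for h, x in zip(hidden, v)]


def forward(features, embeddings, blocks):
    """Concatenate the inputs, then apply each (weights, bias) residual block."""
    x = list(features) + list(embeddings)  # the concatenated layer
    for weights, bias in blocks:
        x = residual_block(x, weights, bias)
    return x
```

Because each block adds its input back to its output, a block whose weights are all zero passes the input through unchanged, which is the property that tends to make deeper stacks easier to train.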
Experimental Evaluation
Performance Comparison
The Deep Feedforward Network outperforms Wide and Deep, and Deep and Cross.
Adding residual connections boosts the performance of the Deep Feedforward Network.
DNN Model                       Accuracy
Shallow Feedforward             0.743
Deep Feedforward                0.756
Deep Feedforward with Residual  0.762
Wide and Deep                   0.740
Deep and Cross                  0.746
Confusion Matrix
[Figures: confusion matrices, three slides]
Ablation Study
[Figures: ablation results, three slides]
Summary
We developed a framework for inferring the cuisine types of restaurants from debit
and credit card transactions.
Our proposed framework consists of three steps: 1) weakly supervised label
generation, 2) statistical feature and neural embedding extraction, and 3) deep neural
network based classification.
The proposed framework achieved 76.2% accuracy in classifying US restaurants.
