AKA
identifies unique coupons given different names in
the SnipSnap coupon database using a combination
of k-means clustering and "smoking gun" feature
based rule inference.

Github: https://github.com/snipsnap/aka-service/
Email: luke.otterblad@gmail.com
Step 1: Matches – same value,
description text and activity dates
Matches – pairs are shown, but many more
than 2 items are matched into groups
More Examples…Different Barcodes –
Same Coupon

The above two were matched into a group. The
coupon below was also in the same set of American
Eagle but NOT put into the same group even though it
has some similarity….
How does it work?
• https://github.com/snipsnap/aka-service
• run via the command line
• $ python aka.py -db_pswd your_password store McDonald’s
id      face_value         offer_details                         start_date  expiriation_date
988767  Free               With the purchase of an Egg McMuffin  2013-09-03  2013-10-31
989829  FREE Egg McMuffin  with the purchase of an Egg McMuffin  2013-09-03  2013-10-31
997447  Free Egg McMuffin  with the purchase of an Egg McMuffin  2013-09-03  2013-10-31
Active Coupons for a Store as a Graph
• When the aka-service is started for a particular store, each active coupon is converted to dictionary
format. Features based on the face value and offer details are normalized with some language
processing and stored as the Python version of a graph.

• Item -> Features
{"CouponA": ['free', 'with', 'the', 'purchase', 'of', 'an', 'egg', 'mcmuffin'],
 "CouponB": ['free', 'with', 'the', 'purchase', 'of', 'an', 'egg', 'mcmuffin']}

• Features -> Item
{"egg": ["CouponA", "CouponB"],
 "mcmuffin": ["CouponA", "CouponB"],
 "free": ["CouponA", "CouponB"],
 "with": ["CouponA", "CouponB"],
 "the": ["CouponA", "CouponB"],
 "purchase": ["CouponA", "CouponB"],
 "of": ["CouponA", "CouponB"],
 "an": ["CouponA", "CouponB"]}
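The two mappings above can be built in a few lines. This is a minimal sketch, assuming plain lowercase word tokenization; the actual normalization in aka-service is not shown on the slides, and `tokenize`, `build_graph` and the coupon dict layout are hypothetical names used only for illustration.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into word tokens (assumed normalization)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_graph(coupons):
    """Return (item -> features, feature -> items) for a dict of coupons."""
    item_to_features = {}
    feature_to_items = defaultdict(list)
    for coupon_id, coupon in coupons.items():
        tokens = tokenize(coupon["face_value"] + " " + coupon["offer_details"])
        # Drop repeated words while keeping first-seen order.
        features = list(dict.fromkeys(tokens))
        item_to_features[coupon_id] = features
        for feature in features:
            feature_to_items[feature].append(coupon_id)
    return item_to_features, dict(feature_to_items)


coupons = {
    "CouponA": {"face_value": "Free",
                "offer_details": "With the purchase of an Egg McMuffin"},
    "CouponB": {"face_value": "FREE Egg McMuffin",
                "offer_details": "with the purchase of an Egg McMuffin"},
}
item_to_features, feature_to_items = build_graph(coupons)
```

Both coupons end up with the same set of features even though their face_value text differs, which is what lets them be compared through the feature index.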
Despite different text AKA identifies all
of these as the same item
id      face_value         offer_details                         start_date  expiriation_date  aka_guid
988767  Free               With the purchase of an Egg McMuffin  2013-09-03  2013-10-31        de5086f0-35bc-11e3-8da3-005056c00008
989829  FREE Egg McMuffin  with the purchase of an Egg McMuffin  2013-09-03  2013-10-31        de5086f0-35bc-11e3-8da3-005056c00008
997447  Free Egg McMuffin  with the purchase of an Egg McMuffin  2013-09-03  2013-10-31        de5086f0-35bc-11e3-8da3-005056c00008
Free is treated as a value keyword
(along with % and $ descriptions)
But, words and value
alone don’t create the match. Expiry
date also matters
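A rough sketch of the two checks named on these slides: pulling value keywords (free, % and $ amounts) out of the text, and requiring the validity dates to agree before a pair is considered. The regular expression, the function names and the exact-date-equality rule are assumptions made for illustration, not the aka-service implementation.

```python
import re

# Matches the word "free", dollar amounts such as $5.00, and percentages such as 20%.
VALUE_PATTERN = re.compile(r"\bfree\b|\$\d+(?:\.\d+)?|\d+(?:\.\d+)?%", re.IGNORECASE)


def value_keywords(text):
    """Return the set of value keywords found in a face_value string."""
    return {match.lower() for match in VALUE_PATTERN.findall(text)}


def could_match(a, b):
    """Two coupons are candidates only if they share a value keyword and their dates agree."""
    same_value = bool(value_keywords(a["face_value"]) & value_keywords(b["face_value"]))
    same_dates = (a["start_date"], a["expiriation_date"]) == (
        b["start_date"], b["expiriation_date"])
    return same_value and same_dates


a = {"face_value": "Free", "start_date": "2013-09-03", "expiriation_date": "2013-10-31"}
b = {"face_value": "FREE Egg McMuffin", "start_date": "2013-09-03",
     "expiriation_date": "2013-10-31"}
c = {"face_value": "Free Egg McMuffin", "start_date": "2013-11-01",
     "expiriation_date": "2013-12-31"}
```

Here `could_match(a, b)` holds, while `could_match(a, c)` does not, because `c` is valid at a different time.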
Coupon with No Barcode connected to
the same offer with a barcode

Same offer value (free mini candle) and same date range (September 9 to October 6, 2013)
Matching picture and computer
images
A change in degree…but the same
coupon
Smoking Gun Features
• A smoking gun feature for a coupon is a piece of
information that identifies it as being the same real
world item as another coupon (with near certainty).
• There are two sources of such identification in the
database. The first is a barcode_id. Multiple coupons
that have the same barcode_id are indeed the same
physical coupon. The second is a promo_code.
• Two coupons that have the same promo_code are the
same coupon 95%+ of the time. (Some stores, like
Dunkin’ Donuts, don’t use unique codes, but more on
that later.)
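One way to turn smoking gun features into groups is a union-find pass: any two coupons that share a barcode_id or a promo_code land in the same group. This is a sketch under assumptions. The slides do not say how aka-service does the grouping, `group_by_smoking_gun` is a hypothetical name, and empty or zero values are assumed to mean "no code recorded".

```python
from collections import defaultdict

SMOKING_GUN_FIELDS = ("barcode_id", "promo_code")


def group_by_smoking_gun(coupons, fields=SMOKING_GUN_FIELDS):
    """Group coupon ids that share a value in any smoking gun field."""
    parent = {c["id"]: c["id"] for c in coupons}

    def find(x):
        # Follow parent links to the group root, shortening the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    first_seen = {}
    for c in coupons:
        for field in fields:
            value = c.get(field)
            if not value:  # assumed: None, "" or 0 means no code recorded
                continue
            key = (field, value)
            if key in first_seen:
                parent[find(c["id"])] = find(first_seen[key])
            else:
                first_seen[key] = c["id"]

    groups = defaultdict(list)
    for c in coupons:
        groups[find(c["id"])].append(c["id"])
    return sorted(sorted(g) for g in groups.values())


coupons = [
    {"id": 1, "barcode_id": "A", "promo_code": None},
    {"id": 2, "barcode_id": "A", "promo_code": "X"},
    {"id": 3, "barcode_id": None, "promo_code": "X"},
    {"id": 4, "barcode_id": "B", "promo_code": None},
]
groups = group_by_smoking_gun(coupons)
```

Coupons 1 and 3 share no code directly, but both are linked to coupon 2, so all three fall into one group.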
More Matches

The two coupons above are matched, and are NOT
matched with the coupon below despite it
having an extremely similar description and
validity:

The code in the upper right hand corner (9152 versus 9992 – the smoking gun)
helps significantly in separating them into different identifications.
Two coupons Not matched, even
though they have the same description
and similar text

(they are valid at different
times)
Finding smoother images

I experimented with using the number of recorded features as an indicator of
picture quality, but that didn’t have much correlation. What did work was
using the picture with the highest number of redemptions within an aka group.
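The selection rule is simple to express. A small sketch, assuming each coupon record carries its `aka_guid`, a redemption count and an image URL; the field names `redemptions` and `image_url` are hypothetical and not taken from the SnipSnap schema.

```python
def best_image_per_group(coupons):
    """For each aka group, pick the image of the coupon with the most redemptions."""
    best = {}
    for coupon in coupons:
        current = best.get(coupon["aka_guid"])
        if current is None or coupon["redemptions"] > current["redemptions"]:
            best[coupon["aka_guid"]] = coupon
    return {guid: coupon["image_url"] for guid, coupon in best.items()}


coupons = [
    {"aka_guid": "g1", "redemptions": 3, "image_url": "rough.jpg"},
    {"aka_guid": "g1", "redemptions": 41, "image_url": "clean.jpg"},
    {"aka_guid": "g2", "redemptions": 0, "image_url": "only.jpg"},
]
best = best_image_per_group(coupons)
```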
Better images
The Dollar Store $1 Off coupon
problem – likely to be many of those

These four were originally matched, so I had to introduce the notion of a confidence
percentage.
This is largely because AKA weights the value of an item more heavily than the
detail words describing the offer (most stores have few items that are the
same price).
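To see why a confidence threshold helps with the $1 Off problem, here is an illustrative scoring sketch. It is a plain weighted score (value match plus Jaccard overlap of the detail words), not the k-means clustering the deck mentions, and the 0.6 value weight and all names are assumptions.

```python
def jaccard(a, b):
    """Share of distinct words the two word lists have in common."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def similarity(c1, c2, value_weight=0.6):
    """Weighted score in which the offer value counts more than the detail words."""
    value_score = 1.0 if c1["value"] == c2["value"] else 0.0
    detail_score = jaccard(c1["details"].lower().split(), c2["details"].lower().split())
    return value_weight * value_score + (1 - value_weight) * detail_score


def is_match(c1, c2, confidence):
    """Accept the pair only if the score reaches the store's confidence level."""
    return similarity(c1, c2) >= confidence


candy = {"value": "$1", "details": "off any candy bar"}
card = {"value": "$1", "details": "off any greeting card"}
```

With a low confidence such as 0.7 the two $1 coupons match on value alone plus a little shared wording; with the confidence set to 0.99 they are kept apart, while two identical coupons still match.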
More equal prices, but with high
confidence set
Trouble Spots: AKA identifies the same offer due to an
assumed smoking gun, but although the barcode is
the same, the expiry is different.

Ignoring PLU for Dunkin Donuts (and other publishers that duplicate promocodes)
and going with 99% confidence does the trick.
There are exceptions to every rule

• Coupons are no different
• In the settings.yaml (pictured above) you can define
exceptions to global rules.
• What pop_smoking_gun tells aka is that for Dunkin’
Donuts the global rules of promo_code and barcode_id
do not apply: Dunkin’ Donuts doesn’t create PLU
codes that are unique to an offer.
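How such an exception might be applied can be sketched as follows. Only the key name pop_smoking_gun comes from the slides; the dictionary layout, the `confidence` key, the default confidence and the function name are hypothetical, since the real settings.yaml is only shown as a picture.

```python
GLOBAL_SMOKING_GUNS = ["barcode_id", "promo_code"]

# Hypothetical in-memory form of the per-store exceptions; the real
# settings.yaml layout is not reproduced in the slide text.
STORE_EXCEPTIONS = {
    "Dunkin' Donuts": {
        "pop_smoking_gun": ["promo_code", "barcode_id"],
        "confidence": 0.99,
    },
}


def rules_for_store(store, default_confidence=0.9):
    """Start from the global rules and remove whatever the store's exception pops."""
    overrides = STORE_EXCEPTIONS.get(store, {})
    popped = set(overrides.get("pop_smoking_gun", []))
    return {
        "smoking_guns": [f for f in GLOBAL_SMOKING_GUNS if f not in popped],
        "confidence": overrides.get("confidence", default_confidence),
    }


dunkin_rules = rules_for_store("Dunkin' Donuts")
default_rules = rules_for_store("Macy's")
```

A store without an entry keeps both global smoking gun fields; the Dunkin' Donuts entry removes them and relies on a high confidence setting instead.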
Another example

Ignoring PLU for Dunkin Donuts (and other publishers that duplicate promocodes)
and going with 99% confidence does the trick.
But knowing the store “rules” also helps
correct errors (if they stick to unique codes)
Mechanical Turk expiry: 10/17/2012

Mechanical Turk expiry: 10/7/2012

http://c346897.r97.cf1.rackcdn.com/cd0faf92-f85e-11e2-9f66-40406c9e1e47.jpg
http://c346897.r97.cf1.rackcdn.com/d32b578e-fd2a-11e2-9be6-40406c9e1e47.jpg
Since Bed Bath & Beyond IDs and promo codes
indicate the same item, AKA can reconcile the mistake
AKA- never misinterpret a store's
coupon rules again
ids     sharable  descrption_text                                           Aka_guid
987120  1         save 20% on your entire purchase bath body works          75926f4f-328f-11e3-a3cd-005056c00008
987271  1         save 20% on your entire purchase bath body works          75926f4f-328f-11e3-a3cd-005056c00008
988484  1         save 20% on your entire purchase bath body works f139439  75926f4f-328f-11e3-a3cd-005056c00008
989519  1         save 20% on your entire purchase bath body works 9522     75926f4f-328f-11e3-a3cd-005056c00008
989774  1         save 20% on your entire purchase bath body works          75926f4f-328f-11e3-a3cd-005056c00008
990040  0         save 20% on your entire purchase bath body works f139492  75926f4f-328f-11e3-a3cd-005056c00008
990943  1         save 20% on your entire purchase bath body works          75926f4f-328f-11e3-a3cd-005056c00008
992970  1         save 20% on your entire purchase bath body works          75926f4f-328f-11e3-a3cd-005056c00008
992998  0         save 20% on your entire purchase bath body works          75926f4f-328f-11e3-a3cd-005056c00008
994314  1         save 20% on your entire purchase bath body works          75926f4f-328f-11e3-a3cd-005056c00008

10 coupons are all identified as the same item, with some marked sharable and some not.
Suppose a publisher had submitted coupon 990040 as not sharable…
AKA- never misinterpret a store's coupon
rules again
ids     sharable  descrption_text                                           Aka_guid                              aka_sharable
987120  1         save 20% on your entire purchase bath body works          75926f4f-328f-11e3-a3cd-005056c00008  0
987271  1         save 20% on your entire purchase bath body works          75926f4f-328f-11e3-a3cd-005056c00008  0
988484  1         save 20% on your entire purchase bath body works f139439  75926f4f-328f-11e3-a3cd-005056c00008  0
989519  1         save 20% on your entire purchase bath body works 9522     75926f4f-328f-11e3-a3cd-005056c00008  0
989774  1         save 20% on your entire purchase bath body works          75926f4f-328f-11e3-a3cd-005056c00008  0
990040  0         save 20% on your entire purchase bath body works f139492  75926f4f-328f-11e3-a3cd-005056c00008  0
990943  1         save 20% on your entire purchase bath body works          75926f4f-328f-11e3-a3cd-005056c00008  0
992970  1         save 20% on your entire purchase bath body works          75926f4f-328f-11e3-a3cd-005056c00008  0
992998  0         save 20% on your entire purchase bath body works          75926f4f-328f-11e3-a3cd-005056c00008  0
994314  1         save 20% on your entire purchase bath body works          75926f4f-328f-11e3-a3cd-005056c00008  0

An easy feature could be to treat a single not-sharable flag within an aka group as a
“presidential” vote and switch all to not sharable. This can also work for items
tagged as manufacturer coupons. You’d basically only need 1 tag from Mechanical
Turk (or from a classifier).
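The “presidential” vote amounts to a veto inside each aka group. A minimal sketch of that rule, using the column names from the table above (`ids` is shortened to a hypothetical `id` key, and the function name is made up for illustration):

```python
def apply_sharable_veto(coupons):
    """Return aka_sharable per coupon id: one not-sharable coupon vetoes its whole group."""
    vetoed_groups = {c["aka_guid"] for c in coupons if c["sharable"] == 0}
    return {
        c["id"]: 0 if c["aka_guid"] in vetoed_groups else c["sharable"]
        for c in coupons
    }


guid = "75926f4f-328f-11e3-a3cd-005056c00008"
coupons = [
    {"id": 987120, "sharable": 1, "aka_guid": guid},
    {"id": 990040, "sharable": 0, "aka_guid": guid},
    {"id": 994314, "sharable": 1, "aka_guid": guid},
    {"id": 1, "sharable": 1, "aka_guid": "another-group"},
]
aka_sharable = apply_sharable_veto(coupons)
```

All coupons in the group containing 990040 come out as not sharable, while the coupon in the unrelated group keeps its own flag.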
Exact Matches

http://c346897.r97.cf1.rackcdn.com/5a621136-1511-11e3-a7d0-40406c9e1e47-thumb.jpg
http://c346897.r97.cf1.rackcdn.com/34605f52-1515-11e3-8576-40406c9e1e47-thumb.jpg
Kroger’s matches

Kroger’s requires the highest confidence of any store, as many of their coupons
are different only by a single word. These will match (incorrectly) without a
high confidence set. Listed below is a sample false match made by AKA:
Same item in the database twice for
Macy’s

http://c346897.r97.cf1.rackcdn.com/59667340-1588-11e3-a8e3-40406c9e1e47-thumb.jpg
http://c346897.r97.cf1.rackcdn.com/ac4dc266-1588-11e3-a7d0-40406c9e1e47-thumb.jpg
Same item again

http://c346897.r97.cf1.rackcdn.com/25ddfb40-13d2-11e3-998b-40406c9e1e47-thumb.jpg
http://c346897.r97.cf1.rackcdn.com/25ddfb40-13d2-11e3-998b-40406c9e1e47-thumb.jpg
Rougher Image Connected with a
better version at McDonalds
Does a Big Mac by any other name still
taste like a Big Mac?
Digital and print match
More Matches
Better coupon picture identification
Occasional data entry errors can lead
to bad reconciliation
aka_guid                              id      barcode_id  alt_barcode_id  face_value        offer_details
2719bf74-40b6-11e3-86dd-22000a91806d  421909  138859      0               $5.00 Off $25.00  Save $5.00 On Your Purchase Of $25.00 Or More
2719bf74-40b6-11e3-86dd-22000a91806d  539197  46299       0               Save $1.00        On Any Aveeno Product
2719bf74-40b6-11e3-86dd-22000a91806d  560927  138859      0               Save $1.00        On any
2719bf74-40b6-11e3-86dd-22000a91806d  595323  138859      0               20% Off           1 Regular Priced Item

Here the 99% reliable barcode_id is identified with 3 different items (for Toys “R” Us).
These three items were matched via barcode, which I can only assume is some
type of data entry error. For every other Toys “R” Us coupon
the smoking gun rules are valid. These items’ barcodes are recorded incorrectly,
but it is an isolated error.
Background for entity resolution (aka
collective reconciliation, de-duping)
• Chapter 20 of Beautiful Data “Connecting Data” by Toby
Segaran (who I think likely wrote the chapter while
working on the YouTube reconciliation).
• Indrajit Bhattacharya’s PhD dissertation, which you can find
at: http://www.lib.umd.edu/drum/handle/1903/4241
• About me: Father of 2 lovely daughters with my wife
Emma. Programmer, Statistician, Pot Limit Omaha and
Mixed Game poker semi-professional (though I don’t get
much time for poker nowadays). I'm located in historic
Northfield, MN where I share an office with my Jack Russell
Terrier, Kirby.
• Email: luke.otterblad@gmail.com.
Questions

More Related Content

Viewers also liked

Unit 2 past events and history
Unit 2   past events and historyUnit 2   past events and history
Unit 2 past events and historykimngan_ulis
 
[Séminaire en ligne] Bonnes pratiques marketing cross-canal
[Séminaire en ligne] Bonnes pratiques marketing cross-canal[Séminaire en ligne] Bonnes pratiques marketing cross-canal
[Séminaire en ligne] Bonnes pratiques marketing cross-canal
Experian
 
Unit 1 – people and relationships
Unit 1 – people and relationshipsUnit 1 – people and relationships
Unit 1 – people and relationshipskimngan_ulis
 
Primo @ ULg : formation à destination du personnel des Bibliothèques de l'Uni...
Primo @ ULg : formation à destination du personnel des Bibliothèques de l'Uni...Primo @ ULg : formation à destination du personnel des Bibliothèques de l'Uni...
Primo @ ULg : formation à destination du personnel des Bibliothèques de l'Uni...
François Renaville
 
Office Accomodation
Office AccomodationOffice Accomodation
Office Accomodationaizellbernal
 

Viewers also liked (8)

Unit 2 past events and history
Unit 2   past events and historyUnit 2   past events and history
Unit 2 past events and history
 
[Séminaire en ligne] Bonnes pratiques marketing cross-canal
[Séminaire en ligne] Bonnes pratiques marketing cross-canal[Séminaire en ligne] Bonnes pratiques marketing cross-canal
[Séminaire en ligne] Bonnes pratiques marketing cross-canal
 
Unit 1 – people and relationships
Unit 1 – people and relationshipsUnit 1 – people and relationships
Unit 1 – people and relationships
 
Marriott Hotel
Marriott HotelMarriott Hotel
Marriott Hotel
 
Banapple
BanappleBanapple
Banapple
 
Primo @ ULg : formation à destination du personnel des Bibliothèques de l'Uni...
Primo @ ULg : formation à destination du personnel des Bibliothèques de l'Uni...Primo @ ULg : formation à destination du personnel des Bibliothèques de l'Uni...
Primo @ ULg : formation à destination du personnel des Bibliothèques de l'Uni...
 
Strategic HR
Strategic HRStrategic HR
Strategic HR
 
Office Accomodation
Office AccomodationOffice Accomodation
Office Accomodation
 

Similar to Aka examples

Wildcard and Url Coupons: Magento Extension by Amasty. User Guide
Wildcard and Url Coupons: Magento Extension by Amasty. User GuideWildcard and Url Coupons: Magento Extension by Amasty. User Guide
Wildcard and Url Coupons: Magento Extension by Amasty. User Guide
Amasty
 
PPC Masters 2016 - Query Power - Breaking Down Search Queries to Improve our ...
PPC Masters 2016 - Query Power - Breaking Down Search Queries to Improve our ...PPC Masters 2016 - Query Power - Breaking Down Search Queries to Improve our ...
PPC Masters 2016 - Query Power - Breaking Down Search Queries to Improve our ...
Clean Digital
 
Qrb 501Education Specialist / snaptutorial.com
Qrb 501Education Specialist / snaptutorial.comQrb 501Education Specialist / snaptutorial.com
Qrb 501Education Specialist / snaptutorial.com
McdonaldRyan119
 
Walmart Sales Prediction Using Rapidminer Prepared by Naga.docx
Walmart Sales Prediction Using Rapidminer Prepared by  Naga.docxWalmart Sales Prediction Using Rapidminer Prepared by  Naga.docx
Walmart Sales Prediction Using Rapidminer Prepared by Naga.docx
celenarouzie
 
Amazon case study
Amazon case studyAmazon case study
Amazon case study
Broz Asset Private Limited
 
Master's Project Report - Minchao Lin
Master's Project Report - Minchao LinMaster's Project Report - Minchao Lin
Master's Project Report - Minchao LinMinchao Lin
 
SMX East 2016 - Optimize Google Shopping 
with the help of Euclid
SMX East 2016 - Optimize Google Shopping 
with the help of EuclidSMX East 2016 - Optimize Google Shopping 
with the help of Euclid
SMX East 2016 - Optimize Google Shopping 
with the help of Euclid
Whoop!
 
Class8 2.ppt
Class8 2.pptClass8 2.ppt
Class8 2.ppt
Ola Hashim
 
Five hot Niche Products To Sell On eBay
Five hot Niche Products To Sell On eBayFive hot Niche Products To Sell On eBay
Five hot Niche Products To Sell On eBay
A F
 
Shopping Ads 101
Shopping Ads 101Shopping Ads 101
Shopping Ads 101
Stukent Inc.
 
5 meijer baby-rewards_annotations_06_10_2013
5  meijer baby-rewards_annotations_06_10_20135  meijer baby-rewards_annotations_06_10_2013
5 meijer baby-rewards_annotations_06_10_2013
zakaku50
 
Applied Machine Learning for Ranking Products in an Ecommerce Setting
Applied Machine Learning for Ranking Products in an Ecommerce SettingApplied Machine Learning for Ranking Products in an Ecommerce Setting
Applied Machine Learning for Ranking Products in an Ecommerce Setting
Databricks
 
Five Mill Tree Method 1208
Five Mill Tree Method 1208Five Mill Tree Method 1208
Five Mill Tree Method 1208David Szetela
 
Five Mill Tree Method
Five Mill Tree MethodFive Mill Tree Method
Five Mill Tree Methoddansoha
 
It's not too late! Optimize your Google Shopping campaigns in time for the Ho...
It's not too late! Optimize your Google Shopping campaigns in time for the Ho...It's not too late! Optimize your Google Shopping campaigns in time for the Ho...
It's not too late! Optimize your Google Shopping campaigns in time for the Ho...
Crealytics
 
Shopify Theme Building Workshop
Shopify Theme Building WorkshopShopify Theme Building Workshop
Shopify Theme Building Workshop
Keir Whitaker
 
Expanding Your Sales Funnel with SAS
Expanding Your Sales Funnel with SASExpanding Your Sales Funnel with SAS
Expanding Your Sales Funnel with SAS
Michael Mina
 
Paid Search Marketing & Analytics Report
Paid Search Marketing & Analytics ReportPaid Search Marketing & Analytics Report
Paid Search Marketing & Analytics Report
Carolyn A. Chinchilla - Digital Marketing Expert
 

Similar to Aka examples (20)

Introtosqltuning
IntrotosqltuningIntrotosqltuning
Introtosqltuning
 
Wildcard and Url Coupons: Magento Extension by Amasty. User Guide
Wildcard and Url Coupons: Magento Extension by Amasty. User GuideWildcard and Url Coupons: Magento Extension by Amasty. User Guide
Wildcard and Url Coupons: Magento Extension by Amasty. User Guide
 
PPC Masters 2016 - Query Power - Breaking Down Search Queries to Improve our ...
PPC Masters 2016 - Query Power - Breaking Down Search Queries to Improve our ...PPC Masters 2016 - Query Power - Breaking Down Search Queries to Improve our ...
PPC Masters 2016 - Query Power - Breaking Down Search Queries to Improve our ...
 
Qrb 501Education Specialist / snaptutorial.com
Qrb 501Education Specialist / snaptutorial.comQrb 501Education Specialist / snaptutorial.com
Qrb 501Education Specialist / snaptutorial.com
 
Walmart Sales Prediction Using Rapidminer Prepared by Naga.docx
Walmart Sales Prediction Using Rapidminer Prepared by  Naga.docxWalmart Sales Prediction Using Rapidminer Prepared by  Naga.docx
Walmart Sales Prediction Using Rapidminer Prepared by Naga.docx
 
Amazon case study
Amazon case studyAmazon case study
Amazon case study
 
Master's Project Report - Minchao Lin
Master's Project Report - Minchao LinMaster's Project Report - Minchao Lin
Master's Project Report - Minchao Lin
 
SMX East 2016 - Optimize Google Shopping 
with the help of Euclid
SMX East 2016 - Optimize Google Shopping 
with the help of EuclidSMX East 2016 - Optimize Google Shopping 
with the help of Euclid
SMX East 2016 - Optimize Google Shopping 
with the help of Euclid
 
Class8 2.ppt
Class8 2.pptClass8 2.ppt
Class8 2.ppt
 
Design_writeup (1)
Design_writeup (1)Design_writeup (1)
Design_writeup (1)
 
Five hot Niche Products To Sell On eBay
Five hot Niche Products To Sell On eBayFive hot Niche Products To Sell On eBay
Five hot Niche Products To Sell On eBay
 
Shopping Ads 101
Shopping Ads 101Shopping Ads 101
Shopping Ads 101
 
5 meijer baby-rewards_annotations_06_10_2013
5  meijer baby-rewards_annotations_06_10_20135  meijer baby-rewards_annotations_06_10_2013
5 meijer baby-rewards_annotations_06_10_2013
 
Applied Machine Learning for Ranking Products in an Ecommerce Setting
Applied Machine Learning for Ranking Products in an Ecommerce SettingApplied Machine Learning for Ranking Products in an Ecommerce Setting
Applied Machine Learning for Ranking Products in an Ecommerce Setting
 
Five Mill Tree Method 1208
Five Mill Tree Method 1208Five Mill Tree Method 1208
Five Mill Tree Method 1208
 
Five Mill Tree Method
Five Mill Tree MethodFive Mill Tree Method
Five Mill Tree Method
 
It's not too late! Optimize your Google Shopping campaigns in time for the Ho...
It's not too late! Optimize your Google Shopping campaigns in time for the Ho...It's not too late! Optimize your Google Shopping campaigns in time for the Ho...
It's not too late! Optimize your Google Shopping campaigns in time for the Ho...
 
Shopify Theme Building Workshop
Shopify Theme Building WorkshopShopify Theme Building Workshop
Shopify Theme Building Workshop
 
Expanding Your Sales Funnel with SAS
Expanding Your Sales Funnel with SASExpanding Your Sales Funnel with SAS
Expanding Your Sales Funnel with SAS
 
Paid Search Marketing & Analytics Report
Paid Search Marketing & Analytics ReportPaid Search Marketing & Analytics Report
Paid Search Marketing & Analytics Report
 

Recently uploaded

Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptxSecstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
nkrafacyberclub
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
Alison B. Lowndes
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
Ana-Maria Mihalceanu
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Albert Hoitingh
 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
Ralf Eggert
 
Assure Contact Center Experiences for Your Customers With ThousandEyes
Assure Contact Center Experiences for Your Customers With ThousandEyesAssure Contact Center Experiences for Your Customers With ThousandEyes
Assure Contact Center Experiences for Your Customers With ThousandEyes
ThousandEyes
 
By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024
Pierluigi Pugliese
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Ramesh Iyer
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
ControlCase
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
BookNet Canada
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
Thijs Feryn
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
DianaGray10
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
Kari Kakkonen
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
Cheryl Hung
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
91mobiles
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
OnBoard
 
Elevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object CalisthenicsElevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object Calisthenics
Dorra BARTAGUIZ
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
KatiaHIMEUR1
 

Recently uploaded (20)

Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptxSecstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
 
Assure Contact Center Experiences for Your Customers With ThousandEyes
Assure Contact Center Experiences for Your Customers With ThousandEyesAssure Contact Center Experiences for Your Customers With ThousandEyes
Assure Contact Center Experiences for Your Customers With ThousandEyes
 
By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
 
Elevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object CalisthenicsElevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object Calisthenics
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
 

Aka examples

  • 1. AKA identifies unique coupons given different names in the SnipSnap coupon database using a combination of k-means clustering and "smoking gun" feature based rule inference. Github: https://github.com/snipsnap/aka-service/ Email: luke.otterblad@gmail.com
  • 2. Step 1: Matches – same value, description text and activity dates
  • 3. Matches – pairs are shown ,but many more than 2 items are matched into groups
  • 4. More Examples…Different Barcodes – Same Coupon The above two were matched into a group. The coupon below was also in the same set of American Eagle but NOT put into the same group even though it has some similarity….
  • 5. How does it work? • https://github.com/snipsnap/aka-service • run via the command line • $ python aka.py -db_pswd your_password store McDonald’s id face_value offer_details start_date expiriation_date 988767 Free With the purchase of an Egg McMuffin 2013-09-03 2013-10-31 989829 FREE Egg McMuffin with the purchase of an Egg McMuffin 2013-09-03 2013-10-31 997447 Free Egg McMuffin with the purchase of an Egg McMuffin 2013-09-03 2013-10-31
  • 6. Active Coupons for a Store as a Graph • When the aka-service is started, for a particular store each active coupon is converted to dictionary format and face value and details based features are converted to the python version of a graph and normalized with some language processing. • Item - > Features {"CouponA": {[‘free’, ‘with’, ‘the’, ‘purchase’, ‘of’, ‘an’, ‘egg’, ‘mcmuffin’] {"CouponB": {[‘free’, ‘with’, ‘the’, ‘purchase’, ‘of’, ‘an’, ‘egg’, ‘mcmuffin’] • Features -> Item {“egg":["CouponA","CouponB"], “mcmuffin": ["CouponA","CouponB"], “free": ["CouponA","CouponB"], “with":["CouponA","CouponB"], “the": ["CouponA","CouponB"]} "purchase": ["CouponA","CouponB"] “of": ["CouponA","CouponB"]} “an": ["CouponA","CouponB"]}
  • 7. Despite different text AKA identifies all of these as the same item id face_value 988767 Free 989829 FREE Egg McMuffin 997447 Free Egg McMuffin offer_details With the purchase of an Egg McMuffin with the purchase of an Egg McMuffin with the purchase of an Egg McMuffin start_date 2013-09-03 2013-09-03 2013-09-03 expiriation_date aka_guid 2013-10-31 de5086f035bc-11e38da3005056c0000 8 2013-10-31 de5086f035bc-11e38da3005056c0000 8 2013-10-31 de5086f035bc-11e38da3005056c0000 8
  • 8. Free is treated as a value keyword (along with % and $ descriptions)
  • 9. But words and value alone don’t create the match. Expiry date also matters.
  • 10. Coupon with no barcode connected to the same offer with a barcode. Same offer value (free mini candle) and same date range (September 9 to October 6, 2013).
  • 11. Matching picture and computer images
  • 12. A change in degree…but the same coupon
  • 13. Smoking Gun Features
      • A smoking gun feature for a coupon is a piece of information that identifies it, with near certainty, as the same real-world item as another coupon.
      • There are two sources of such identification in the database. The first is a barcode_id: multiple coupons that have the same barcode_id are indeed the same physical coupon. The second is a promo_code.
      • Two coupons that have the same promo_code are the same coupon 95%+ of the time. (Some stores, like Dunkin’ Donuts, don’t use unique codes, but more on that later.)
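One way to picture the smoking gun rule from slide 13 is to merge coupons that share a barcode_id or promo_code into groups. The sketch below uses a small union-find for the merging; that choice, the function name `group_by_smoking_gun`, and the sample records are assumptions for illustration, while the two field names come from the slide.

```python
from collections import defaultdict


def group_by_smoking_gun(coupons, fields=("barcode_id", "promo_code")):
    """Group coupon ids that share a non-empty value in any smoking-gun field."""
    parent = {c["id"]: c["id"] for c in coupons}

    def find(x):
        # Follow parent links to the group root, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    first_seen = {}  # (field, value) -> id of the first coupon carrying it
    for c in coupons:
        for field in fields:
            value = c.get(field)
            if not value:
                continue  # a missing code is not evidence of anything
            key = (field, value)
            if key in first_seen:
                parent[find(c["id"])] = find(first_seen[key])
            else:
                first_seen[key] = c["id"]

    groups = defaultdict(list)
    for c in coupons:
        groups[find(c["id"])].append(c["id"])
    return sorted(groups.values())


# Made-up records: 1 and 2 share a barcode, 2 and 3 share a promo code.
coupons = [
    {"id": 1, "barcode_id": "A1", "promo_code": None},
    {"id": 2, "barcode_id": "A1", "promo_code": "9152"},
    {"id": 3, "barcode_id": None, "promo_code": "9152"},
    {"id": 4, "barcode_id": None, "promo_code": "9992"},
]
groups = group_by_smoking_gun(coupons)
```

Coupon 3 never shares a barcode with coupon 1, yet both land in the same group because coupon 2 links them.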
  • 14. More Matches. The above two coupons are matched with each other but NOT with the coupon below, despite an extremely similar description and validity period. The code in the upper right-hand corner (9152 versus 9992, the smoking gun) helps significantly in separating them into different identifications.
  • 15. Two coupons NOT matched, even though they have the same description and similar text, because they are valid at different times.
  • 16. Finding smoother images. I experimented with using the number of recorded features as an indicator of picture quality, but that didn’t show much correlation. What did work was using the picture with the highest number of redemptions within an aka group.
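The redemption-count heuristic from slide 16 amounts to a per-group maximum. A minimal sketch, assuming each record carries a hypothetical `redemptions` count alongside its `aka_guid` (the sample values are made up):

```python
def best_image_per_group(coupons):
    """Return {aka_guid: id of the coupon with the most redemptions in that group}."""
    best = {}
    for c in coupons:
        current = best.get(c["aka_guid"])
        if current is None or c["redemptions"] > current["redemptions"]:
            best[c["aka_guid"]] = c
    return {guid: c["id"] for guid, c in best.items()}


coupons = [
    {"id": 101, "aka_guid": "group-a", "redemptions": 3},
    {"id": 102, "aka_guid": "group-a", "redemptions": 17},
    {"id": 103, "aka_guid": "group-b", "redemptions": 5},
]
best = best_image_per_group(coupons)
```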
  • 18. The Dollar Store “$1 Off” coupon problem: there are likely to be many of those. These four were originally matched, so I had to introduce the notion of a confidence percentage. This is largely because AKA weights the value of an item more heavily than the detail words describing the offer (most stores have few items at the same price).
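To see why a confidence percentage helps with the "$1 Off" problem, consider a toy score in which the value counts for more than the detail words. This is not AKA's scoring (the deck describes k-means clustering); the weighted Jaccard formula, the 0.7 weight, and the function names are assumptions chosen only to show the effect of raising the threshold.

```python
def jaccard(a, b):
    """Share of distinct words the two lists have in common."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def match_score(c1, c2, value_weight=0.7):
    """Blend an exact value match with word overlap, favouring the value."""
    value_sim = 1.0 if c1["value"] == c2["value"] else 0.0
    detail_sim = jaccard(c1["details"], c2["details"])
    return value_weight * value_sim + (1.0 - value_weight) * detail_sim


def is_match(c1, c2, confidence=0.8):
    """Two coupons match only if their score reaches the confidence level."""
    return match_score(c1, c2) >= confidence


# Two different products that happen to carry the same value.
a = {"value": "$1", "details": ["off", "any", "candle"]}
b = {"value": "$1", "details": ["off", "any", "toy"]}

loose = is_match(a, b, confidence=0.8)    # equal value dominates the score
strict = is_match(a, b, confidence=0.99)  # one differing word now blocks the match
```

With every item priced the same, the value term alone nearly clears a loose threshold, so a per-store confidence setting is what keeps distinct offers apart.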
  • 19. More equal prices, but with high confidence set
  • 20. Trouble Spots: AKA identifies these as the same offer because of an assumed smoking gun, but although the barcode is the same, the expiry is different. Ignoring PLU for Dunkin’ Donuts (and other publishers that duplicate promo codes) and going with 99% confidence does the trick.
  • 21. There Are Exceptions to Every Rule
      • Coupons are no different.
      • In settings.yaml (pictured above) you can define exceptions to global rules.
      • pop_smoking_gun tells aka that for Dunkin’ Donuts the global promo_code and barcode_id rules do not apply, because Dunkin’ Donuts does not create PLU codes that are unique to an offer.
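The settings file itself is only shown as a picture, so its layout is not recoverable here. As a hedged sketch of the idea, the dictionary below stands in for parsed settings; its structure and the `confidence` key are assumptions, and only the `pop_smoking_gun`, `promo_code`, and `barcode_id` names come from the slides.

```python
GLOBAL_SMOKING_GUNS = ["barcode_id", "promo_code"]

# Hypothetical per-store overrides, standing in for parsed settings.yaml content.
STORE_EXCEPTIONS = {
    "Dunkin' Donuts": {
        "pop_smoking_gun": ["promo_code", "barcode_id"],
        "confidence": 0.99,
    },
}


def smoking_guns_for(store, exceptions=STORE_EXCEPTIONS):
    """Return the global smoking-gun fields minus any the store opts out of."""
    popped = set(exceptions.get(store, {}).get("pop_smoking_gun", []))
    return [field for field in GLOBAL_SMOKING_GUNS if field not in popped]


dunkin_fields = smoking_guns_for("Dunkin' Donuts")
default_fields = smoking_guns_for("Macy's")
```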
  • 22. Another example. Ignoring PLU for Dunkin’ Donuts (and other publishers that duplicate promo codes) and going with 99% confidence does the trick.
  • 23. But knowing the store “rules” also helps correct errors (if they stick to unique codes)
      • Mechanical Turk expiry: 10/17/2012, http://c346897.r97.cf1.rackcdn.com/cd0faf92-f85e-11e2-9f66-40406c9e1e47.jpg
      • Mechanical Turk expiry: 10/7/2012, http://c346897.r97.cf1.rackcdn.com/d32b578e-fd2a-11e2-9be6-40406c9e1e47.jpg
      Since the Bed Bath & Beyond ids and promo codes indicate the same item, aka can reconcile the mistake.
  • 24. AKA: never misinterpret a store's coupon rules again

      ids    | sharable | descrption_text                                   | (unlabeled) | Aka_guid
      987120 | 1        | save 20% on your entire purchase bath body works |             | 75926f4f-328f-11e3-a3cd-005056c00008
      987271 | 1        | save 20% on your entire purchase bath body works |             | 75926f4f-328f-11e3-a3cd-005056c00008
      988484 | 1        | save 20% on your entire purchase bath body works | f139439     | 75926f4f-328f-11e3-a3cd-005056c00008
      989519 | 1        | save 20% on your entire purchase bath body works | 9522        | 75926f4f-328f-11e3-a3cd-005056c00008
      989774 | 1        | save 20% on your entire purchase bath body works |             | 75926f4f-328f-11e3-a3cd-005056c00008
      990040 | 0        | save 20% on your entire purchase bath body works | f139492     | 75926f4f-328f-11e3-a3cd-005056c00008
      990943 | 1        | save 20% on your entire purchase bath body works |             | 75926f4f-328f-11e3-a3cd-005056c00008
      992970 | 1        | save 20% on your entire purchase bath body works |             | 75926f4f-328f-11e3-a3cd-005056c00008
      992998 | 0        | save 20% on your entire purchase bath body works |             | 75926f4f-328f-11e3-a3cd-005056c00008
      994314 | 1        | save 20% on your entire purchase bath body works |             | 75926f4f-328f-11e3-a3cd-005056c00008

      10 coupons all identified as the same item, with some marked sharable and some not. Suppose a publisher had submitted coupon 990040 as not sharable…
  • 25. AKA: never misinterpret a store's coupon rules again

      ids    | sharable | descrption_text                                   | (unlabeled) | Aka_guid                             | aka_sharable
      987120 | 1        | save 20% on your entire purchase bath body works |             | 75926f4f-328f-11e3-a3cd-005056c00008 | 0
      987271 | 1        | save 20% on your entire purchase bath body works |             | 75926f4f-328f-11e3-a3cd-005056c00008 | 0
      988484 | 1        | save 20% on your entire purchase bath body works | f139439     | 75926f4f-328f-11e3-a3cd-005056c00008 | 0
      989519 | 1        | save 20% on your entire purchase bath body works | 9522        | 75926f4f-328f-11e3-a3cd-005056c00008 | 0
      989774 | 1        | save 20% on your entire purchase bath body works |             | 75926f4f-328f-11e3-a3cd-005056c00008 | 0
      990040 | 0        | save 20% on your entire purchase bath body works | f139492     | 75926f4f-328f-11e3-a3cd-005056c00008 | 0
      990943 | 1        | save 20% on your entire purchase bath body works |             | 75926f4f-328f-11e3-a3cd-005056c00008 | 0
      992970 | 1        | save 20% on your entire purchase bath body works |             | 75926f4f-328f-11e3-a3cd-005056c00008 | 0
      992998 | 0        | save 20% on your entire purchase bath body works |             | 75926f4f-328f-11e3-a3cd-005056c00008 | 0
      994314 | 1        | save 20% on your entire purchase bath body works |             | 75926f4f-328f-11e3-a3cd-005056c00008 | 0

      An easy feature could be to treat a single “not sharable” within an aka group as a “presidential” vote (in effect, a veto) and switch the whole group to not sharable. This can also work for items tagged as manufacturer coupons. You’d basically need only one tag from Mechanical Turk (or from a classifier).
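The "presidential" vote on slide 25 can be sketched in a few lines: if any coupon in an aka group is marked not sharable, every coupon in that group gets aka_sharable = 0. The function name `aka_sharable_flags` and the second group in the sample data are hypothetical; the field names and the first three rows follow the table above.

```python
def aka_sharable_flags(coupons):
    """Map each coupon id to 0 if any coupon in its aka group is not sharable, else 1."""
    vetoed = {c["aka_guid"] for c in coupons if c["sharable"] == 0}
    return {c["id"]: 0 if c["aka_guid"] in vetoed else 1 for c in coupons}


GUID = "75926f4f-328f-11e3-a3cd-005056c00008"
coupons = [
    {"id": 987120, "sharable": 1, "aka_guid": GUID},
    {"id": 990040, "sharable": 0, "aka_guid": GUID},  # the single veto
    {"id": 994314, "sharable": 1, "aka_guid": GUID},
    {"id": 1, "sharable": 1, "aka_guid": "some-other-group"},  # made-up control row
]
flags = aka_sharable_flags(coupons)
```

The same pattern would apply to a manufacturer-coupon tag: one positive tag in the group is enough to mark all of its members.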
  • 27. Kroger’s matches. Kroger’s requires the highest confidence of any store, as many of their coupons differ by only a single word. Without a high confidence setting, these will match incorrectly. Below is a sample false match made by AKA:
  • 28. Same item in the database twice for Macy’s
      • http://c346897.r97.cf1.rackcdn.com/59667340-1588-11e3-a8e3-40406c9e1e47-thumb.jpg
      • http://c346897.r97.cf1.rackcdn.com/ac4dc266-1588-11e3-a7d0-40406c9e1e47-thumb.jpg
  • 30. Rougher image connected with a better version at McDonald’s
  • 31. Does a Big Mac by any other name still taste like a Big Mac?
  • 35. Better coupon picture identification
  • 36. Occasional data entry errors can lead to bad reconciliation

      aka_guid                             | id     | barcode_id | alt_barcode_id | face_value / offer_details
      2719bf74-40b6-11e3-86dd-22000a91806d | 421909 | 138859     | 0              | $5.00 Off / Save $5.00 On Your Purchase Of $25.00 Or More
      2719bf74-40b6-11e3-86dd-22000a91806d | 539197 | 46299      | 0              | Save On Any Aveeno Product $1.00
      2719bf74-40b6-11e3-86dd-22000a91806d | 560927 | 138859     | 0              | Save On any $1.00
      2719bf74-40b6-11e3-86dd-22000a91806d | 595323 | 138859     | 0              | 20% Off 1 Regular Priced Item

      Here the 99% reliable barcode_id is identified with 3 different items (for Toys “R” Us).
  • 37. These three items were matched via barcode, which I can only assume is some type of data entry error. For every other Toys “R” Us coupon the smoking gun rules are valid; these items’ barcodes were recorded incorrectly.
  • 38. But it is an isolated error
  • 39. Background for entity resolution (aka collective reconciliation, de-duping)
      • Chapter 20 of Beautiful Data, “Connecting Data” by Toby Segaran (who I think likely wrote the chapter while working on the YouTube reconciliation).
      • Indrajit Bhattacharya’s PhD dissertation, which you can find at: http://www.lib.umd.edu/drum/handle/1903/4241
      • About me: Father of 2 lovely daughters with my wife Emma. Programmer, statistician, and semi-professional Pot Limit Omaha and mixed-game poker player (though I don’t get much time for poker nowadays). I'm located in historic Northfield, MN, where I share an office with my Jack Russell Terrier, Kirby.
      • Email: luke.otterblad@gmail.com

Editor's Notes

  1. http://c346897.r97.cf1.rackcdn.com/980ac2e4-1494-11e3-998b-40406c9e1e47-thumb.jpg