The document describes AKA, a service that identifies unique coupons in a coupon database using clustering and rule inference techniques. It provides examples of how AKA matches coupons that are essentially the same offer but described differently. It discusses the techniques used, including creating graphs of coupon features to identify matches. It also addresses challenges like exceptions to typical matching rules for certain stores.
AKA Examples
1. AKA
AKA identifies unique coupons that appear under different names in the SnipSnap coupon database, using a combination of k-means clustering and "smoking gun" feature-based rule inference.
GitHub: https://github.com/snipsnap/aka-service/
Email: luke.otterblad@gmail.com
2. Step 1: Matches with the same value, description text and activity dates
3. Matches: pairs are shown, but many more than two items are matched into groups
4. More Examples: Different Barcodes, Same Coupon
The two coupons above were matched into one group. The coupon below, from the same set of American Eagle coupons, was NOT put into that group even though it has some similarity.
5. How does it work?
• https://github.com/snipsnap/aka-service
• Run via the command line:
  $ python aka.py -db_pswd your_password store McDonald’s
Example coupons for McDonald’s:
id | face_value | offer_details | start_date | expiriation_date
988767 | Free | With the purchase of an Egg McMuffin | 2013-09-03 | 2013-10-31
989829 | FREE Egg McMuffin | with the purchase of an Egg McMuffin | 2013-09-03 | 2013-10-31
997447 | Free Egg McMuffin | with the purchase of an Egg McMuffin | 2013-09-03 | 2013-10-31
6. Active Coupons for a Store as a Graph
• When the aka-service is started, each active coupon for a particular store is converted to dictionary format, and its face-value and details-based features are converted into a Python representation of a graph and normalized with some language processing.
• Item -> Features
  {"CouponA": ["free", "with", "the", "purchase", "of", "an", "egg", "mcmuffin"],
   "CouponB": ["free", "with", "the", "purchase", "of", "an", "egg", "mcmuffin"]}
• Features -> Item
  {"egg": ["CouponA", "CouponB"],
   "mcmuffin": ["CouponA", "CouponB"],
   "free": ["CouponA", "CouponB"],
   "with": ["CouponA", "CouponB"],
   "the": ["CouponA", "CouponB"],
   "purchase": ["CouponA", "CouponB"],
   "of": ["CouponA", "CouponB"],
   "an": ["CouponA", "CouponB"]}
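The two views on this slide are the two directions of one bipartite graph between coupons and words. The following is a minimal Python sketch of that idea, not the aka-service code: it assumes a plain regex tokenizer in place of AKA's (unspecified) language processing, and the names tokenize, build_graph and shared_feature_counts are hypothetical. The feature-to-item index means only coupons that share at least one word ever get compared.

```python
import re
from collections import defaultdict
from itertools import combinations


def tokenize(text: str) -> set[str]:
    """Lower-case the text and keep alphanumeric runs (a stand-in for AKA's normalization)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def build_graph(coupons: dict[str, str]):
    """Return (item -> features, feature -> items) for a store's active coupons."""
    item_to_features = {cid: tokenize(text) for cid, text in coupons.items()}
    feature_to_items = defaultdict(set)
    for cid, features in item_to_features.items():
        for feature in features:
            feature_to_items[feature].add(cid)
    return item_to_features, feature_to_items


def shared_feature_counts(feature_to_items) -> dict[tuple[str, str], int]:
    """Count shared features per coupon pair, visiting only pairs that co-occur on a feature."""
    counts = defaultdict(int)
    for items in feature_to_items.values():
        for a, b in combinations(sorted(items), 2):
            counts[(a, b)] += 1
    return dict(counts)


# Face value and offer details concatenated, taken from the McMuffin example above.
coupons = {
    "988767": "Free With the purchase of an Egg McMuffin",
    "989829": "FREE Egg McMuffin with the purchase of an Egg McMuffin",
    "997447": "Free Egg McMuffin with the purchase of an Egg McMuffin",
}
items, features = build_graph(coupons)
print(sorted(features["mcmuffin"]))  # ['988767', '989829', '997447']
print(shared_feature_counts(features)[("988767", "989829")])  # 8
```

Because the features are stored as sets, the repeated "egg mcmuffin" in 989829 does not change its feature set, so all three coupons end up sharing the same eight words.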
7. Despite different text, AKA identifies all of these as the same item
id | face_value | offer_details | start_date | expiriation_date | aka_guid
988767 | Free | With the purchase of an Egg McMuffin | 2013-09-03 | 2013-10-31 | de5086f035bc-11e38da3005056c00008
989829 | FREE Egg McMuffin | with the purchase of an Egg McMuffin | 2013-09-03 | 2013-10-31 | de5086f035bc-11e38da3005056c00008
997447 | Free Egg McMuffin | with the purchase of an Egg McMuffin | 2013-09-03 | 2013-10-31 | de5086f035bc-11e38da3005056c00008
8. "Free" is treated as a value keyword (along with % and $ descriptions)
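A small sketch of what treating "free", percentages and dollar amounts as value keywords could look like. The regular expression, the normalization to two decimal places, and the function name value_keywords are all assumptions for illustration; the slides do not show how AKA extracts values.

```python
import re

# Hypothetical pattern: the word "free", a percentage such as "20%", or a dollar amount such as "$5.00".
VALUE_PATTERN = re.compile(r"\bfree\b|\d+(?:\.\d+)?\s*%|\$\s*\d+(?:\.\d{1,2})?", re.IGNORECASE)


def value_keywords(face_value: str) -> list[str]:
    """Extract value keywords from a face value and normalize them for comparison."""
    keywords = []
    for match in VALUE_PATTERN.findall(face_value):
        token = match.lower().replace(" ", "")
        if token.startswith("$"):
            token = f"${float(token[1:]):.2f}"  # "$5" and "$5.00" compare equal
        keywords.append(token)
    return keywords


print(value_keywords("FREE Egg McMuffin"))  # ['free']
print(value_keywords("$5 Off"))             # ['$5.00']
print(value_keywords("Save 20% Off"))       # ['20%']
```

Normalizing the tokens this way lets "Free" and "FREE", or "$5" and "$5.00", count as the same value before any word-level comparison happens.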
9. But words and value alone don't create the match: the expiry date also matters
10. Coupon with no barcode connected to the same offer with a barcode
Same offer value (free mini candle) and same date range (September 9 to October 6, 2013).
13. Smoking Gun Features
• A smoking-gun feature of a coupon is a piece of information that identifies it, with near certainty, as the same real-world item as another coupon.
• There are two sources of such identification in the database. The first is barcode_id: multiple coupons that share a barcode_id are indeed the same physical coupon. The second is promo_code.
• Two coupons with the same promo_code are the same coupon 95%+ of the time. (Some stores, like Dunkin' Donuts, don't use unique codes, but more on that later.)
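Because a shared barcode_id or promo_code links coupons transitively (A shares a barcode with B, B shares a promo code with C), one natural way to turn smoking guns into groups is a union-find (disjoint-set) structure. This is a sketch under stated assumptions, not AKA's implementation: coupons are dicts with the hypothetical keys id, barcode_id and promo_code, and promo codes are treated as hard evidence even though the slide puts their reliability at about 95%.

```python
from collections import defaultdict


class UnionFind:
    """Disjoint-set forest used to merge coupons that share a smoking-gun value."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a


def smoking_gun_groups(coupons, keys=("barcode_id", "promo_code")):
    """Group coupon ids that share any non-empty barcode_id or promo_code (transitively)."""
    uf = UnionFind()
    first_owner = {}  # (key, value) -> first coupon id seen with that value
    for coupon in coupons:
        uf.find(coupon["id"])  # register coupons that have no codes at all
        for key in keys:
            value = coupon.get(key)
            if not value:  # None, "" and 0 are treated as "no code", not as evidence
                continue
            if (key, value) in first_owner:
                uf.union(first_owner[(key, value)], coupon["id"])
            else:
                first_owner[(key, value)] = coupon["id"]
    groups = defaultdict(set)
    for coupon in coupons:
        groups[uf.find(coupon["id"])].add(coupon["id"])
    return sorted(groups.values(), key=min)


# Made-up coupons: 1 and 2 share a barcode, 2 and 3 share a promo code.
coupons = [
    {"id": 1, "barcode_id": "A"},
    {"id": 2, "barcode_id": "A", "promo_code": "X"},
    {"id": 3, "promo_code": "X"},
    {"id": 4, "barcode_id": "B"},
]
print(smoking_gun_groups(coupons))  # [{1, 2, 3}, {4}]
```

The keys parameter is where a per-store exception could drop a field that a publisher reuses across offers.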
14. More Matches
The two coupons above are matched with each other, but NOT with the coupon below, despite its extremely similar description and validity period.
The code in the upper right-hand corner (9152 versus 9992: the smoking gun) helps significantly in separating them into different identities.
15. Two coupons not matched, even though they have the same description and similar text (they are valid at different times)
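One way to express "same text, different validity, no match" is to gate any text comparison on the activity dates. A minimal sketch, assuming exact equality of start and expiry dates (slide 2 describes matches as sharing activity dates) and a hypothetical Jaccard word-overlap threshold of 0.8; the dict keys mirror the column names shown on the slides, including the spelling expiriation_date, and the sample coupons are made up.

```python
from datetime import date


def same_activity_window(a: dict, b: dict) -> bool:
    """Hypothetical gate: start and expiry dates must agree exactly."""
    return (a["start_date"], a["expiriation_date"]) == (b["start_date"], b["expiriation_date"])


def jaccard(x: set, y: set) -> float:
    """Share of words the two descriptions have in common."""
    union = x | y
    return len(x & y) / len(union) if union else 0.0


def text_match(a: dict, b: dict, threshold: float = 0.8) -> bool:
    """Compare descriptions only for coupons that are active over the same dates."""
    if not same_activity_window(a, b):
        return False
    return jaccard(a["words"], b["words"]) >= threshold


words = {"save", "20", "on", "your", "entire", "purchase"}
fall = {"start_date": date(2013, 9, 3), "expiriation_date": date(2013, 10, 31), "words": words}
later = {"start_date": date(2013, 11, 1), "expiriation_date": date(2013, 11, 30), "words": words}
print(text_match(fall, fall), text_match(fall, later))  # True False
```

Checking dates first is cheap and rules out recurring promotions that reuse identical wording in different periods.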
16. Finding smoother images
I experimented with using the number of recorded features as an indicator of picture quality, but that didn't correlate well. What did work was using the picture with the highest number of redemptions within an aka group.
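The redemption rule above is a per-group argmax. A short sketch, assuming each coupon record carries the hypothetical keys aka_guid, redemptions and image_url; ties keep whichever coupon was seen first.

```python
def best_image_per_group(coupons):
    """Pick, for each aka group, the image of the coupon with the most redemptions."""
    best = {}
    for coupon in coupons:
        guid = coupon["aka_guid"]
        if guid not in best or coupon["redemptions"] > best[guid]["redemptions"]:
            best[guid] = coupon  # strict ">" means ties keep the coupon seen first
    return {guid: coupon["image_url"] for guid, coupon in best.items()}


# Made-up records for illustration.
coupons = [
    {"aka_guid": "g1", "redemptions": 3, "image_url": "a.jpg"},
    {"aka_guid": "g1", "redemptions": 12, "image_url": "b.jpg"},
    {"aka_guid": "g2", "redemptions": 0, "image_url": "c.jpg"},
]
print(best_image_per_group(coupons))  # {'g1': 'b.jpg', 'g2': 'c.jpg'}
```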
18. The Dollar Store "$1 Off" coupon problem: there are likely to be many of those
These four were originally matched, so I had to introduce the notion of a confidence percentage.
This is largely because AKA weights the value of an item more heavily than the detail words describing the offer (most stores have few items at the same price).
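To see why value-heavy weighting causes this, consider a toy confidence score in which an exact value match contributes more than word overlap. The 0.6/0.4 weights, the Jaccard overlap and the 0.99 default threshold are assumptions chosen for illustration (the slides mention 99% confidence but do not give AKA's formula), and the two sample coupons are made up.

```python
def jaccard(x: set, y: set) -> float:
    """Fraction of detail words the two offers share."""
    union = x | y
    return len(x & y) / len(union) if union else 0.0


def confidence(a: dict, b: dict, w_value: float = 0.6, w_details: float = 0.4) -> float:
    """Hypothetical score: value agreement weighs more than the overlap of detail words."""
    value_score = 1.0 if a["value"] == b["value"] else 0.0
    return w_value * value_score + w_details * jaccard(a["details"], b["details"])


def is_match(a: dict, b: dict, min_confidence: float = 0.99) -> bool:
    return confidence(a, b) >= min_confidence


candle = {"value": "$1.00", "details": {"off", "any", "candle"}}
mug = {"value": "$1.00", "details": {"off", "any", "mug"}}
print(round(confidence(candle, mug), 2))  # 0.8
print(is_match(candle, mug), is_match(candle, candle))  # False True
```

With a loose cutoff such as 0.5, the two "$1.00" coupons would match on value alone; raising the required confidence is what keeps them apart.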
20. Trouble spots: AKA identifies the same offer because of an assumed smoking gun, but although the barcode is the same, the expiry differs.
Ignoring the PLU for Dunkin' Donuts (and other publishers that duplicate promo codes) and using a 99% confidence threshold does the trick.
21. There are exceptions to every rule
• Coupons are no different.
• In settings.yaml (pictured above) you can define exceptions to the global rules.
• The pop_smoking_gun setting tells AKA that, for Dunkin' Donuts, the global promo_code and barcode_id rules do not apply: Dunkin' Donuts does not create PLU codes that are unique to an offer.
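A sketch of how a per-store exception could "pop" fields off the global smoking-gun list. The layout of settings.yaml is not shown in these slides, so the dict below is a hypothetical in-memory form of it (in practice it could come from parsing the file with PyYAML's yaml.safe_load), and smoking_guns_for is an illustrative name.

```python
DEFAULT_SMOKING_GUNS = ("barcode_id", "promo_code")

# Hypothetical in-memory form of settings.yaml; the real file layout is not shown in these slides.
SETTINGS = {
    "Dunkin' Donuts": {"pop_smoking_gun": ["promo_code", "barcode_id"]},
}


def smoking_guns_for(store: str, settings: dict = SETTINGS) -> list[str]:
    """Global smoking-gun fields minus any that the store's exception pops off."""
    popped = set(settings.get(store, {}).get("pop_smoking_gun", []))
    return [field for field in DEFAULT_SMOKING_GUNS if field not in popped]


print(smoking_guns_for("Dunkin' Donuts"))     # []
print(smoking_guns_for("Bed Bath & Beyond"))  # ['barcode_id', 'promo_code']
```

Stores without an entry fall back to the global rules, so only publishers known to reuse codes need to be listed.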
22. Another example
Ignoring the PLU for Dunkin' Donuts (and other publishers that duplicate promo codes) and using a 99% confidence threshold does the trick.
23. But knowing the store "rules" also helps correct errors (if the store sticks to unique codes)
Mechanical Turk expiry: 10/17/2012
Mechanical Turk expiry: 10/7/2012
Images:
http://c346897.r97.cf1.rackcdn.com/cd0faf92-f85e-11e2-9f66-40406c9e1e47.jpg
http://c346897.r97.cf1.rackcdn.com/d32b578e-fd2a-11e2-9be6-40406c9e1e47.jpg
Since Bed Bath & Beyond IDs and promo codes indicate the same item, AKA can reconcile the mistake.
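Once trusted IDs or promo codes prove that two rows are the same coupon, any other field that disagrees inside the group points at a data-entry error such as the two Mechanical Turk expiries. This is a minimal sketch that only surfaces those disagreements for review; it does not decide which value is correct, because the slides do not spell out how AKA picks one. The keys id and aka_guid and the sample group ids are hypothetical.

```python
from collections import defaultdict
from datetime import date


def field_conflicts(coupons, field, group_key="aka_guid"):
    """Report groups whose members disagree on `field`, listing coupon ids per value."""
    by_group = defaultdict(lambda: defaultdict(list))
    for coupon in coupons:
        by_group[coupon[group_key]][coupon[field]].append(coupon["id"])
    return {
        guid: dict(values)
        for guid, values in by_group.items()
        if len(values) > 1  # more than one distinct value inside one group
    }


coupons = [
    {"id": 1, "aka_guid": "bbb-1", "expiriation_date": date(2012, 10, 17)},
    {"id": 2, "aka_guid": "bbb-1", "expiriation_date": date(2012, 10, 7)},
    {"id": 3, "aka_guid": "bbb-2", "expiriation_date": date(2012, 11, 1)},
]
print(field_conflicts(coupons, "expiriation_date"))
```

Pointed at group_key="barcode_id" and field="face_value", the same check would flag a barcode shared by coupons with different values.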
24. AKA: never misinterpret a store's coupon rules again
ids | sharable | descrption_text | Aka_guid
987120 | 1 | save 20% on your entire purchase bath body works | 75926f4f-328f-11e3-a3cd005056c00008
987271 | 1 | save 20% on your entire purchase bath body works | 75926f4f-328f-11e3-a3cd005056c00008
988484 | 1 | save 20% on your entire purchase bath body works f139439 | 75926f4f-328f-11e3-a3cd005056c00008
989519 | 1 | save 20% on your entire purchase bath body works 9522 | 75926f4f-328f-11e3-a3cd005056c00008
989774 | 1 | save 20% on your entire purchase bath body works | 75926f4f-328f-11e3-a3cd005056c00008
990040 | 0 | save 20% on your entire purchase bath body works f139492 | 75926f4f-328f-11e3-a3cd005056c00008
990943 | 1 | save 20% on your entire purchase bath body works | 75926f4f-328f-11e3-a3cd005056c00008
992970 | 1 | save 20% on your entire purchase bath body works | 75926f4f-328f-11e3-a3cd005056c00008
992998 | 0 | save 20% on your entire purchase bath body works | 75926f4f-328f-11e3-a3cd005056c00008
994314 | 1 | save 20% on your entire purchase bath body works | 75926f4f-328f-11e3-a3cd005056c00008
Ten coupons, all identified as the same item, with some marked sharable and some not.
Suppose a publisher had submitted coupon 990040 as not sharable...
25. AKA: never misinterpret a store's coupon rules again
ids | sharable | descrption_text | Aka_guid | aka_sharable
987120 | 1 | save 20% on your entire purchase bath body works | 75926f4f-328f-11e3-a3cd005056c00008 | 0
987271 | 1 | save 20% on your entire purchase bath body works | 75926f4f-328f-11e3-a3cd005056c00008 | 0
988484 | 1 | save 20% on your entire purchase bath body works f139439 | 75926f4f-328f-11e3-a3cd005056c00008 | 0
989519 | 1 | save 20% on your entire purchase bath body works 9522 | 75926f4f-328f-11e3-a3cd005056c00008 | 0
989774 | 1 | save 20% on your entire purchase bath body works | 75926f4f-328f-11e3-a3cd005056c00008 | 0
990040 | 0 | save 20% on your entire purchase bath body works f139492 | 75926f4f-328f-11e3-a3cd005056c00008 | 0
990943 | 1 | save 20% on your entire purchase bath body works | 75926f4f-328f-11e3-a3cd005056c00008 | 0
992970 | 1 | save 20% on your entire purchase bath body works | 75926f4f-328f-11e3-a3cd005056c00008 | 0
992998 | 0 | save 20% on your entire purchase bath body works | 75926f4f-328f-11e3-a3cd005056c00008 | 0
994314 | 1 | save 20% on your entire purchase bath body works | 75926f4f-328f-11e3-a3cd005056c00008 | 0
An easy feature would be to treat a single "not sharable" flag within an aka group as a "presidential" vote and switch the whole group to not sharable. The same idea works for items tagged as manufacturer coupons: you'd need only one tag from Mechanical Turk (or from a classifier).
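The "presidential" vote above amounts to: if any member of an aka group has sharable = 0, every member gets aka_sharable = 0. A minimal sketch, assuming row dicts with the hypothetical keys id, aka_guid and sharable; the third row and its group id are made up so that the example shows an unaffected group.

```python
def apply_presidential_veto(coupons, flag="sharable", group_key="aka_guid"):
    """Return copies of the coupons with aka_<flag> = 0 for every group containing a 0."""
    vetoed = {c[group_key] for c in coupons if c[flag] == 0}
    return [
        {**c, f"aka_{flag}": 0 if c[group_key] in vetoed else c[flag]}
        for c in coupons
    ]


rows = [
    {"id": 987120, "aka_guid": "75926f4f-328f-11e3-a3cd005056c00008", "sharable": 1},
    {"id": 990040, "aka_guid": "75926f4f-328f-11e3-a3cd005056c00008", "sharable": 0},
    {"id": 111111, "aka_guid": "other-group", "sharable": 1},
]
print([r["aka_sharable"] for r in apply_presidential_veto(rows)])  # [0, 0, 1]
```

A manufacturer-coupon tag would propagate the same way, except that a single positive tag (rather than a 0) would set the flag for the whole group.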
27. Kroger matches
Kroger requires the highest confidence of any store, as many of its coupons differ by only a single word. Without a high confidence threshold these will (incorrectly) match. Listed below is a sample false match made by AKA:
28. Same item in the database twice for Macy’s
http://c346897.r97.cf1.rackcdn.com/59667340-1588-11e3-a8e340406c9e1e47-thumb.jpg
http://c346897.r97.cf1.rackcdn.com/ac4dc266-1588-11e3-a7d040406c9e1e47-thumb.jpg
36. Occasional data entry errors can lead to bad reconciliation
aka_guid | id | barcode_id | alt_barcode_id | face_value | offer_details
2719bf74-40b611e3-86dd22000a91806d | 421909 | 138859 | 0 | $5.00 Off | Save $5.00 On Your Purchase Of $25.00 Or More
2719bf74-40b611e3-86dd22000a91806d | 539197 | 46299 | 0 | $1.00 | Save $1.00 On Any Aveeno Product
2719bf74-40b611e3-86dd22000a91806d | 560927 | 138859 | 0 | $1.00 | Save $1.00 On any
2719bf74-40b611e3-86dd22000a91806d | 595323 | 138859 | 0 | 20% Off | 1 Regular Priced Item
Here the 99%-reliable barcode_id is identified with three different items (for Toys "R" Us).
37. These three items were matched via barcode, which I can only assume is some type of data entry error. For every other Toys "R" Us coupon the smoking-gun rules are valid; these items' barcodes were recorded incorrectly.
39. Background for entity resolution (aka collective reconciliation, de-duping)
• Chapter 20 of Beautiful Data, "Connecting Data", by Toby Segaran (who I suspect wrote the chapter while working on the YouTube reconciliation).
• Indrajit Bhattacharya's PhD dissertation, which you can find at: http://www.lib.umd.edu/drum/handle/1903/4241
• About me: Father of two lovely daughters with my wife Emma. Programmer, statistician, and Pot Limit Omaha and mixed-game poker semi-professional (though I don't get much time for poker nowadays). I'm located in historic Northfield, MN, where I share an office with my Jack Russell Terrier, Kirby.
• Email: luke.otterblad@gmail.com.