The Merchant Data team at Groupon uses MongoDB heavily in its mission to create the most comprehensive database of places and merchants in the world. We think of ourselves as a key part of Groupon's platform because our data is necessary for salesforce CRM and public merchant pages, among other things, and directly impacts the business. Our team runs a real-time data processing pipeline that combines automated matching and categorization in a Storm cluster with crowdsourcing for manual cleaning. MongoDB is a key piece of our infrastructure because we use a MongoDB cluster to generate the set of candidates against which new data coming into our system is matched. In this talk we will discuss: a high-level view of the Merchant Data processing pipeline, including the data collection, real-time processing, and serving layers; an overview of our data model; how we mapped our data model onto infrastructure; and the dynamics of parallel querying in our Storm cluster.
5. Arnold: Declarative Crowd-Machine Data Integration
Shawn Jeffery, Liwen Sun, Matt DeLand, Nick Pendar, Rick Barber, Andrew Galdi
CIDR 2013
cidrdb.org/cidr2013/Papers/CIDR13_Paper22.pdf