FARROT - Filter Amazon Review Ratings Over Time

Published in: Software

  1. 1. FARROT: Filter Amazon Review Ratings Over Time (Andy Lai)
  2. 2. Problem: Amazon doesn't allow filtering review ratings and totals by state or time. http://youtu.be/w78X0IpjI5c
  3. 3. UI demo: http://youtu.be/w78X0IpjI5c
  4. 4. Data set: Stanford SNAP Amazon reviews (35 GB, 35M reviews); University of Illinois Amazon member info (142 MB, member location information). Sample rows: joeme 92 5/26 Cleveland, OH United States; Joseph M. Kotow B00006HAXW OH
  5. 5. Pipeline: ImportTsv; SNAP REVIEWS in 10 rows per review; UIC MEMBER LOCATION TSV; HappyBase
  7. 7. Pipeline: ImportTsv; SNAP REVIEWS in 10 rows per review; UIC MEMBER LOCATION TSV; HappyBase. Sample review: B00006HAXW | Rock Rhythm & Doo Wop Greatest Early Rock | unknown | A1RSDE90N6RSZF | Joseph M Kotow | 9/9 | 5.0 | 1042502400 | Pittsburgh – Home of the OLDIES | I have all of the doo wop DVD’s and this one is as good or better than the 1st ones. Rem…
  8. 8. Pipeline: PIG to CLEAN, JOIN and AGGREGATE rating reviews and totals; ImportTsv; SNAP REVIEWS in 10 rows per review; UIC MEMBER LOCATION TSV; HappyBase (same sample review as slide 7)
  10. 10. HBase Schema. Table Schemas: (PRODUCTID_STATE, TOTAL REVIEWS, AVG RATING); (PRODUCTID_STATE_BYYEAR_EPOCH, TOTAL REVIEWS, AVG RATING); (PRODUCTID_STATE_BYMONTH_EPOCH, TOTAL REVIEWS, AVG RATING); (PRODUCTID_STATE_BYDAY_EPOCH, TOTAL REVIEWS, AVG RATING) • Example: B00003CWT6_CA_BYMONTH_1008115200000
  11. 11. Retrospective. Design Considerations: • HBase was used because it is optimized for reads and range scans and scales well • Data was bucketed by state and by time interval (year, month, day) so that queries read precomputed aggregates instead of recalculating them, at the expense of storage • Java MapReduce was used to convert multi-row reviews to tabular format. Future: • Scrape Amazon for new reviews • Filter and display reviews
  12. 12. About me – Andy Lai • UC Berkeley (B.S. Electrical Engineering & Computer Science) • SJSU (M.S. Engineering) • Software Engineer (DB2, relational database) • Interests:
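
The bucketing idea from the HBase Schema and Retrospective slides can be illustrated with a small, in-memory sketch. It is not the author's pipeline (the slides name Pig, Java MapReduce, ImportTsv and HappyBase for that); it only shows how keys shaped like `PRODUCTID_STATE_BYMONTH_EPOCH` could be built and how totals and average ratings could be precomputed per bucket. Several things here are assumptions rather than facts from the slides: that the first field of each schema is the row key, that the epoch suffix is in milliseconds, that buckets start at UTC midnight of the year, month, or day, and the helper names `row_key`, `bucket_start_ms`, `aggregate`, and `prefix_scan`, which are hypothetical. The second review in the demo is made-up data for illustration.

```python
from collections import defaultdict
from datetime import datetime, timezone

# Granularity tags copied from the key names on the schema slide.
GRANULARITIES = ("BYYEAR", "BYMONTH", "BYDAY")


def row_key(product_id, state, granularity=None, epoch_ms=None):
    """Build PRODUCTID_STATE or PRODUCTID_STATE_<GRANULARITY>_<EPOCH>."""
    if granularity is None:
        return f"{product_id}_{state}"
    if granularity not in GRANULARITIES:
        raise ValueError(f"unknown granularity: {granularity}")
    return f"{product_id}_{state}_{granularity}_{epoch_ms}"


def bucket_start_ms(epoch_seconds, granularity):
    """ASSUMPTION: a bucket starts at UTC midnight of its year/month/day."""
    t = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    if granularity == "BYYEAR":
        start = datetime(t.year, 1, 1, tzinfo=timezone.utc)
    elif granularity == "BYMONTH":
        start = datetime(t.year, t.month, 1, tzinfo=timezone.utc)
    elif granularity == "BYDAY":
        start = datetime(t.year, t.month, t.day, tzinfo=timezone.utc)
    else:
        raise ValueError(f"unknown granularity: {granularity}")
    return int(start.timestamp()) * 1000


def aggregate(reviews):
    """Precompute totals and averages for every bucket a review falls in.

    reviews: iterable of (product_id, state, epoch_seconds, rating).
    Returns {row_key: {"total_reviews": int, "avg_rating": float}}.
    """
    sums = defaultdict(lambda: [0, 0.0])  # key -> [count, rating sum]
    for product_id, state, ts, rating in reviews:
        keys = [row_key(product_id, state)]
        keys += [
            row_key(product_id, state, g, bucket_start_ms(ts, g))
            for g in GRANULARITIES
        ]
        for key in keys:
            sums[key][0] += 1
            sums[key][1] += rating
    return {
        key: {"total_reviews": n, "avg_rating": total / n}
        for key, (n, total) in sums.items()
    }


def prefix_scan(table, prefix):
    """Stand-in for a sorted-key range scan: rows whose key has the prefix."""
    return [(key, table[key]) for key in sorted(table) if key.startswith(prefix)]


if __name__ == "__main__":
    reviews = [
        ("B00006HAXW", "OH", 1042502400, 5.0),          # sample from the slides
        ("B00006HAXW", "OH", 1042502400 + 86400, 3.0),  # hypothetical review
    ]
    table = aggregate(reviews)
    for key, value in prefix_scan(table, "B00006HAXW_OH_BYDAY_"):
        print(key, value)
```

Because every aggregate is stored under its own key, a query for one product, state, and interval becomes a lookup or a short scan over adjacent keys instead of a pass over raw reviews, which is the storage-for-speed trade described on the Retrospective slide. Against a real table, a comparable scan could plausibly be done with HappyBase's `Table.scan(row_prefix=...)`, though the slides do not show how the application queries HBase.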
