FARROT: 
Filter Amazon Review Ratings 
Over Time 
Andy Lai
Problem 
Amazon doesn't allow filtering review ratings 
and totals by state or time 
http://youtu.be/w78X0IpjI5c
UI DEMO 
http://youtu.be/w78X0IpjI5c
Data set 
Stanford SNAP Amazon reviews 
35GB 
35M reviews 
University of Illinois Amazon member info 
142MB 
Member location information 
joeme 92 5/26 Cleveland, OH United States Joseph M. Kotow B00006HAXW 
OH
Pipeline 
SNAP REVIEWS in 10 rows per review 
UIC MEMBER LOCATION 
TSV 
ImportTsv 
HappyBase
Pipeline 
SNAP REVIEWS in 10 rows per review 
UIC MEMBER LOCATION 
TSV 
ImportTsv 
HappyBase 
Example review record: 
B00006HAXW Rock Rhythm & Doo Wop Greatest Early Rock unknown A1RSDHE9N6RSZF Joseph M Kotow 9/9 5.0 1042502400 Pittsburgh – Home of the OLDIES I have all of the doo wop DVD’s and this one is as good or better than the 1st ones. Rem…
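The "10 rows per review" to TSV step can be sketched in Python as below. This is a minimal sketch, not the original conversion job. It assumes the input stores each review as ten `key: value` lines separated by a blank line, with the field names listed in `FIELDS`; the function names and the user ID in `SAMPLE` are hypothetical.

```python
import io

# Assumed field order for one review (ten "key: value" lines per review).
FIELDS = [
    "product/productId", "product/title", "product/price",
    "review/userId", "review/profileName", "review/helpfulness",
    "review/score", "review/time", "review/summary", "review/text",
]


def _to_row(record):
    # Missing fields become empty columns so every row has ten columns.
    return "\t".join(record.get(field, "") for field in FIELDS)


def reviews_to_tsv(lines):
    """Yield one tab-separated row per multi-line review."""
    record = {}
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            # A blank line closes the current review.
            if record:
                yield _to_row(record)
                record = {}
            continue
        key, sep, value = line.partition(": ")
        if sep and key in FIELDS:
            # Tabs inside a value would break the TSV layout.
            record[key] = value.replace("\t", " ")
    if record:
        yield _to_row(record)


# Sample input; the user ID is a hypothetical placeholder.
SAMPLE = """product/productId: B00006HAXW
product/title: Rock Rhythm & Doo Wop
product/price: unknown
review/userId: EXAMPLEUSER
review/profileName: Joseph M Kotow
review/helpfulness: 9/9
review/score: 5.0
review/time: 1042502400
review/summary: Pittsburgh - Home of the OLDIES
review/text: I have all of the doo wop DVD's

"""

if __name__ == "__main__":
    for row in reviews_to_tsv(io.StringIO(SAMPLE)):
        print(row)
```

Each output row has ten tab-separated columns, which is the shape a TSV bulk loader such as ImportTsv expects.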
Pipeline 
SNAP REVIEWS in 10 rows per review 
UIC MEMBER LOCATION 
PIG to CLEAN, JOIN and AGGREGATE review ratings and totals 
TSV 
ImportTsv 
HappyBase
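The clean, join and aggregate step ran in Pig in the original pipeline. The Python sketch below only illustrates the logic, under stated assumptions: reviews are joined to member locations on a user ID (the actual join key is not given in the slides), reviews without a known state are dropped, and the function name and IDs are hypothetical.

```python
from collections import defaultdict


def aggregate_by_product_state(reviews, member_state):
    """Return {(product_id, state): (total_reviews, avg_rating)}.

    reviews: iterable of (product_id, user_id, score) tuples.
    member_state: dict mapping user_id to a state code (assumed join key).
    """
    totals = defaultdict(lambda: [0, 0.0])  # key -> [count, score_sum]
    for product_id, user_id, score in reviews:
        state = member_state.get(user_id)
        if state is None:
            # Clean: skip reviews whose author has no known location.
            continue
        bucket = totals[(product_id, state)]
        bucket[0] += 1
        bucket[1] += float(score)
    return {
        key: (count, score_sum / count)
        for key, (count, score_sum) in totals.items()
    }


if __name__ == "__main__":
    reviews = [
        ("B00006HAXW", "u1", "5.0"),
        ("B00006HAXW", "u2", "3.0"),
        ("B00006HAXW", "u3", "1.0"),  # u3 has no location and is dropped
    ]
    print(aggregate_by_product_state(reviews, {"u1": "OH", "u2": "OH"}))
```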
HBase Schema 
Table Schemas (row key: columns): 
• PRODUCTID_STATE: TOTAL REVIEWS, AVG RATING 
• PRODUCTID_STATE_BYYEAR_EPOCH: TOTAL REVIEWS, AVG RATING 
• PRODUCTID_STATE_BYMONTH_EPOCH: TOTAL REVIEWS, AVG RATING 
• PRODUCTID_STATE_BYDAY_EPOCH: TOTAL REVIEWS, AVG RATING 
• Example: B00003CWT6_CA_BYMONTH_1008115200000
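The key layout above can be built and parsed with a few string helpers. This is a minimal sketch; the function names are hypothetical, and it assumes product IDs and state codes never contain an underscore.

```python
GRANULARITIES = ("BYYEAR", "BYMONTH", "BYDAY")


def build_row_key(product_id, state, granularity=None, epoch_ms=None):
    """Build PRODUCTID_STATE or PRODUCTID_STATE_<GRANULARITY>_<EPOCH>."""
    if granularity is None:
        return f"{product_id}_{state}"
    if granularity not in GRANULARITIES:
        raise ValueError(f"unknown granularity: {granularity}")
    return f"{product_id}_{state}_{granularity}_{epoch_ms}"


def parse_row_key(key):
    """Split a row key into (product_id, state, granularity, epoch_ms)."""
    parts = key.split("_")
    if len(parts) == 2:
        return parts[0], parts[1], None, None
    if len(parts) == 4:
        product_id, state, granularity, epoch = parts
        return product_id, state, granularity, int(epoch)
    raise ValueError(f"unexpected row key: {key}")


def scan_bounds(product_id, state, granularity, start_ms, stop_ms):
    """Return (row_start, row_stop) keys for a half-open time range.

    Row keys compare as strings, so this only selects the right rows when
    every epoch has the same number of digits; epochs with fewer digits
    would sort out of order unless zero-padded.
    """
    return (
        build_row_key(product_id, state, granularity, start_ms),
        build_row_key(product_id, state, granularity, stop_ms),
    )


if __name__ == "__main__":
    print(build_row_key("B00003CWT6", "CA", "BYMONTH", 1008115200000))
```

Because all rows for one product, state and granularity share a key prefix and sort by epoch, a time filter becomes a single contiguous range scan between the two bounds.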
Retrospective 
Design Considerations 
• HBase was chosen because it is optimized for reads and range scans and scales well 
• Data was bucketed by state and by time interval (year, month, day), so queries read precomputed aggregates instead of recalculating them, at the expense of storage 
• Java MapReduce was used to convert multi-row reviews to tabular format 
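The storage cost of bucketing can be seen by listing every aggregate row a single review contributes to. The sketch below assumes each bucket is identified by the UTC start of its year, month or day in epoch milliseconds; the slides do not state how the original bucket epochs were computed, and the function names are hypothetical.

```python
from datetime import datetime, timezone


def bucket_start_ms(epoch_seconds, granularity):
    """Truncate a review timestamp to the start of its bucket (UTC), in ms."""
    dt = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    midnight = dt.replace(hour=0, minute=0, second=0, microsecond=0)
    if granularity == "BYDAY":
        start = midnight
    elif granularity == "BYMONTH":
        start = midnight.replace(day=1)
    elif granularity == "BYYEAR":
        start = midnight.replace(month=1, day=1)
    else:
        raise ValueError(f"unknown granularity: {granularity}")
    return int(start.timestamp()) * 1000


def keys_for_review(product_id, state, epoch_seconds):
    """List the four aggregate rows one review is counted in."""
    keys = [f"{product_id}_{state}"]
    for granularity in ("BYYEAR", "BYMONTH", "BYDAY"):
        bucket = bucket_start_ms(epoch_seconds, granularity)
        keys.append(f"{product_id}_{state}_{granularity}_{bucket}")
    return keys


if __name__ == "__main__":
    for key in keys_for_review("B00006HAXW", "OH", 1042502400):
        print(key)
```

Each review updates four precomputed rows, so a query for one product, state and interval reads a single row rather than re-aggregating raw reviews.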
Future 
• Scrape Amazon for new reviews 
• Filter and display reviews
About me – Andy Lai 
• UC Berkeley (B.S. Electrical Engineering & Computer Science) 
• SJSU (M.S. Engineering) 
• Software Engineer (DB2, Relational database) 
• Interests:
