Austin bdug 2011_01_27_small_and_big_data
 


A short overview of the data infrastructure at Bazaarvoice, which combines several data stores, including MySQL, SOLR, Infobright, MongoDB, and Hadoop.

Upload Details

Uploaded as Adobe PDF

Usage Rights

© All Rights Reserved

    Austin bdug 2011_01_27_small_and_big_data Presentation Transcript

    • I CAN HAS BIG DATA? Small and Big Data at Bazaarvoice, Alex Pinkin (@apinkin)
    • whois apinkin
      ● Alex Pinkin, Software Engineering Lead, Data Infrastructure team, Bazaarvoice
      ● Loves both SQL and NoSQL. Can't commit to one! :-)
      ● @apinkin
    • Big Data?
    • A few facts about Bazaarvoice
      ● Bazaarvoice is a SaaS company powering user-generated content, such as ratings and reviews, on thousands of websites
      ● Over 75 million reviews
      ● 280 billion impressions
      ● 5 billion page views per month
    • How Do We Do It?
      ● Client-side integration
      ● Code and servers :)
    • What Do We Run in Prod?
      ● SQL
        ○ MySQL
        ○ Infobright
      ● NoSQL
        ○ SOLR
        ○ ElasticSearch
        ○ MongoDB
        ○ CouchDB
        ○ Hadoop
    • Four Pillars
    • MySQL and Big Data?!!
      ● Yes, MySQL is our master (primary) data store, mostly used as a key/value (K/V) store.
      ● Scaling reads: replication
      ● Scaling writes: sharding
      ● HA: hot backups, multiple data centers
      ● Pros
        ○ Rock solid
        ○ SQL
      ● Cons
        ○ Inflexible schema
        ○ Replication lag
        ○ Sharding not built in
        ○ HA
    • Search: SOLR/Lucene
      ● Document store
      ● Inverted index
          Term              Document IDs
          rating:5          1, 2
          rating:4          3
          productId:12345   1, 2, 3
    • Analytics
    • Analytics - Infobright
      ● Columnar storage
        ○ Compression (10x+)
        ○ Reduced disk I/O
      ● Partitioning
        ○ Horizontal: Data Packs
        ○ Vertical: Columns
      ● Knowledge grid
        ○ MIN(C), MAX(C), SUM(C), AVG(C), COUNT(DISTINCT(C))
    • Infobright - Pros and Cons
      ● Pros
        ○ 30x faster than MySQL on analytics queries
        ○ Open source
      ● Cons
        ○ No DML in OSS version
        ○ No MPP (good for up to 5 TB)
    • Hadoop Use Case
    • Bazaarvoice EMR - Phase 1
    • Bazaarvoice EMR - Phase 2
    • Summary
      ● We use the best tool for the job
      ● NoSQL is maturing quickly, though query languages are still in flux
      ● Hadoop is here to stay
      ● We are (slowly) moving away from MySQL
    • @apinkin
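The "MySQL and Big Data?!!" slide says reads scale through replication and writes through sharding, and that sharding is not built into MySQL, so key routing has to live in the application. Below is a minimal Python sketch of such a router, assuming a hash-modulo placement of keys; the class name `ShardedKV`, the host names, and the shard layout are hypothetical illustrations, not Bazaarvoice's actual scheme.

```python
import hashlib
import itertools


class ShardedKV:
    """Hypothetical application-side router for a sharded, replicated MySQL K/V store.

    Writes for a key go to its shard's master; reads rotate over that shard's replicas.
    """

    def __init__(self, shards):
        # shards: list of {"master": host, "replicas": [host, ...]} (illustrative layout)
        self.shards = shards
        # Fall back to the master when a shard has no replicas.
        self._read_cycles = [itertools.cycle(s["replicas"] or [s["master"]]) for s in shards]

    def shard_index(self, key: str) -> int:
        # Stable hash; Python's built-in hash() on str is salted per process, so avoid it.
        digest = hashlib.sha1(key.encode("utf-8")).hexdigest()
        return int(digest, 16) % len(self.shards)

    def host_for_write(self, key: str) -> str:
        return self.shards[self.shard_index(key)]["master"]

    def host_for_read(self, key: str) -> str:
        # Replicas can lag the master, so a read right after a write may be stale.
        return next(self._read_cycles[self.shard_index(key)])


if __name__ == "__main__":
    router = ShardedKV([
        {"master": "db0-master", "replicas": ["db0-replica1", "db0-replica2"]},
        {"master": "db1-master", "replicas": ["db1-replica1"]},
    ])
    print(router.host_for_write("review:12345"))
    print(router.host_for_read("review:12345"))
```

Hash-modulo placement is the simplest option, but changing the number of shards remaps most keys; consistent hashing or an explicit key-to-shard lookup table are the usual alternatives when shards are added over time.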
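The inverted index on the "Search: SOLR/Lucene" slide maps each term to the IDs of the documents that contain it. The sketch below rebuilds that exact table from three hypothetical review documents and answers an AND query by intersecting posting lists. It is a simplification: real Lucene also analyzes text into terms, compresses posting lists, and scores results, none of which is modeled here.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each "field:value" term to the sorted list of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, fields in docs.items():
        for field, value in fields.items():
            index[f"{field}:{value}"].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def query_and(index, *terms):
    """Return IDs of documents matching all terms (intersection of posting lists)."""
    postings = [set(index.get(t, [])) for t in terms]
    if not postings:
        return []
    return sorted(set.intersection(*postings))


# Hypothetical documents chosen to reproduce the table on the slide.
docs = {
    1: {"rating": 5, "productId": 12345},
    2: {"rating": 5, "productId": 12345},
    3: {"rating": 4, "productId": 12345},
}
index = build_inverted_index(docs)
print(index["rating:5"])                                # [1, 2]
print(index["productId:12345"])                         # [1, 2, 3]
print(query_and(index, "rating:5", "productId:12345"))  # [1, 2]
```

Because each lookup touches only the posting lists for the query terms, the cost of a query like "5-star reviews of product 12345" depends on the number of matching documents rather than on the total number of reviews stored.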
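The knowledge grid on the "Analytics - Infobright" slide keeps summary statistics for each data pack, so many queries can be answered or pruned without decompressing the data. The sketch below assumes a single integer column, a tiny pack size of 4, and only MIN/MAX/SUM metadata. It borrows the relevant / irrelevant / suspect classification used to describe Infobright's optimizer, but it is an illustration of the idea, not Infobright's implementation.

```python
from dataclasses import dataclass

PACK_SIZE = 4  # tiny for illustration; real data packs hold far more rows


@dataclass
class PackStats:
    """Per-pack metadata kept in the (simplified) knowledge grid."""
    lo: int     # MIN(C) within the pack
    hi: int     # MAX(C) within the pack
    total: int  # SUM(C) within the pack


def build_packs(column):
    """Split a column into fixed-size packs and precompute MIN/MAX/SUM for each."""
    packs = [column[i:i + PACK_SIZE] for i in range(0, len(column), PACK_SIZE)]
    stats = [PackStats(min(p), max(p), sum(p)) for p in packs]
    return packs, stats


def sum_between(packs, stats, low, high):
    """Compute SUM(C) WHERE low <= C <= high; also return how many packs had to be scanned."""
    total = 0
    scanned = 0
    for pack, s in zip(packs, stats):
        if s.hi < low or s.lo > high:
            continue  # irrelevant pack: no value can match, skip it entirely
        if low <= s.lo and s.hi <= high:
            total += s.total  # relevant pack: every value matches, use the stored SUM
        else:
            scanned += 1  # suspect pack: must "decompress" and check each value
            total += sum(v for v in pack if low <= v <= high)
    return total, scanned


column = [1, 2, 3, 4, 10, 11, 12, 13, 5, 20, 6, 30]
packs, stats = build_packs(column)
# First pack skipped, second summed from metadata, third scanned.
print(sum_between(packs, stats, 10, 13))  # (46, 1)
```

In a real columnar store the packs are also compressed, which is why skipping a pack, or answering it from metadata alone, translates directly into the reduced disk I/O the slide mentions.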