MongoDB @ fliptop

Tech talk on how Fliptop leverages MongoDB in its infrastructure for better scalability, given at TWJUG.


Presentation Transcript

  •   MongoDB @ Fliptop 2011/12/10
  • Agenda
      • Fliptop
        • infrastructure
      • MongoDB
        • architecture
        • sharding strategy
        • data schema
        • index and query
        • miscellaneous
  • What is Fliptop?
      • Social profiles lookup
        • facebook, twitter, linkedin
        • campaign analysis
        • api lookup
      •   Our problems
        • scalability
          • Data
            • ~ 7 billion data
          • Infrastructure
            • ~1MM lookups/day
  • Fliptop Infrastructure
      • Infrastructure
        • Amazon EC2
      • NoSQL Database
        • MongoDB
      • Indexing and full-text search
        • Apache SOLR
      • Distributed computing
        • AWS Elastic MapReduce (Hadoop)
  • Fliptop Databases
      • Fliptop Data
        • ~50MM records
      • without MongoDB
        • MySQL
          • AWS RDS x1
        • Solr
          • AWS EC2 m1.large x 10
      • with MongoDB
        • MySQL
          • AWS RDS x1
        • Solr
          • AWS EC2 m1.large x 2 (master/slave)
        • MongoDB
          • AWS EC2 m2.large x 10 (replica set)
  • From Solr to MongoDB
      • Our Storage Requirement
        • auto sharding
        • richness of queries
        • short insert latency
      • Other Reasons
        • documentation
        • active community
        • word of mouth
      •   Migration Efforts
        • queries
        • db driver
        • performance tuning
  • MongoDB Features
      • Auto-Sharding
        • scale out to 1000 nodes 
      • Replication & High Availability
        • master/slave and replica set
      • Querying
        • most SQL syntax
      • Document-oriented storage
        • json, schema-free
      • Full Index Support
        • index any field
      • Map/Reduce
        • javascript at server side
  • MongoDB Servers  
  • MongoDB Sharding
      • Automatic balancing for changes in load and data distribution
      • Easy addition of new machines
      • Scaling out to one thousand nodes
      • No single points of failure
      • Automatic failover
  • MongoDB Replication
      • master/slave
        • easy setup
        • manual failover
      • replica set
        • slightly more complex setup
        • automatic failover
        • minimum nodes: 3 (including 1 arbiter)
        • maximum nodes: 12
  • MongoDB Failover
      • Voting algorithm (replica set)
        •   a primary needs a majority of votes: floor(all nodes / 2) + 1
      • Priority
        • if 0, never becomes primary
          • backup with small machine
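
A minimal sketch of the replica-set ideas on the last two slides: a three-member configuration document of the shape accepted by `rs.initiate()` in the mongo shell, and the usual strict-majority rule for elections. The set name and host names are hypothetical, and the helper functions are illustrative rather than part of MongoDB.

```javascript
// Hypothetical three-member replica set: two data nodes and one arbiter.
// A document of this shape is what rs.initiate() accepts in the mongo shell.
var replicaSetConfig = {
  _id: "rs0", // assumed set name
  members: [
    { _id: 0, host: "mongo-a.example.com:27017" },
    { _id: 1, host: "mongo-b.example.com:27017" },
    // The arbiter only votes; it stores no data.
    { _id: 2, host: "mongo-c.example.com:27017", arbiterOnly: true }
    // Adding "priority: 0" to a data member would keep it from ever
    // becoming primary (useful for a small backup machine).
  ]
};

// Votes needed to elect a primary: a strict majority of all voting members.
function votesNeeded(totalVoters) {
  return Math.floor(totalVoters / 2) + 1;
}

// A primary can be elected only while enough voters are still reachable.
function canElectPrimary(totalVoters, reachableVoters) {
  return reachableVoters >= votesNeeded(totalVoters);
}

var voters = replicaSetConfig.members.length;
console.log(votesNeeded(voters));        // 2
console.log(canElectPrimary(voters, 2)); // true
console.log(canElectPrimary(voters, 1)); // false
```

With three voters, losing any single node still leaves a majority, which is why three is the practical minimum for automatic failover.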
  • Fliptop MongoDB Infrastructures
      • Data
        • 10MM/replica set
      • MongoDB servers
        • router x 1
        • config server x1
        • shard servers x 10
          • 5 primary
          • 5 secondary
        • arbiter servers x 5
      • AWS EC2 Instances
        • m2.large x 10
  • MongoDB and AWS EC2
      • Instance type
        • m2.xlarge
          •   17.1 GB of memory
          •   6.5 EC2 Compute Units
      •   Storage
        •   Local Drive
          • faster i/o
          • not portable
        • EBS
          • i/o = network + disk i/o
          • portable
          • easy backup
          • raid 1/0 
  • MongoDB Sharding Strategy
      • Sharding Key Strategy
        • Ascending shard key
          • data locality
          • hotspot for read/write
          • ex. timestamp, auto-increment PK
        • Random sharding key 
          • evenly distribute read/write
          • no data locality
          • ex. UUID, md5
        • Hybrid sharding key
          •   ascending 
          •   evenly distribute
          • ex. timestamp + uuid
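
The difference between ascending and random shard keys can be illustrated with a small simulation. This is a simplified model under stated assumptions: the key space is cut into four fixed, equal ranges (real chunks are split and moved dynamically), and all names here are hypothetical.

```javascript
// Simplified model: the key space is split into equal ranges, one per shard.
var SHARDS = 4;
var KEY_SPACE = 1000000;

// Range-based routing: which shard owns a given key.
function shardFor(key) {
  return Math.min(SHARDS - 1, Math.floor(key / (KEY_SPACE / SHARDS)));
}

// Count how many writes land on each shard.
function countWrites(keys) {
  var counts = [];
  for (var s = 0; s < SHARDS; s++) counts.push(0);
  keys.forEach(function (k) { counts[shardFor(k)]++; });
  return counts;
}

// Ascending keys (e.g. timestamps): the newest 1000 writes share one range.
var ascendingKeys = [];
for (var i = 0; i < 1000; i++) ascendingKeys.push(KEY_SPACE - 1000 + i);

// Random keys (e.g. uuid or md5 values): writes scatter over the key space.
// A small deterministic generator keeps the run repeatable.
var seed = 42;
function nextRandom() {
  seed = (seed * 48271) % 2147483647;
  return seed / 2147483647;
}
var randomKeys = [];
for (var j = 0; j < 1000; j++) {
  randomKeys.push(Math.floor(nextRandom() * KEY_SPACE));
}

var ascendingCounts = countWrites(ascendingKeys);
var randomCounts = countWrites(randomKeys);
console.log("ascending:", ascendingCounts); // [ 0, 0, 0, 1000 ]: one hot shard
console.log("random:   ", randomCounts);    // roughly even across shards
```

The ascending case keeps recent data together (good locality) at the cost of a single hot shard; the random case spreads the load but loses locality.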
  • From timestamp to uuid
      • Why timestamp?
        • same sharding key as our Solr setup
        • issues
          • slow count (traversal) queries
          • maintenance headache
            • nodes have to be added more frequently
          • duplicate uuids
      • From timestamp to uuid
        •   performance gain with count
          • 2x faster
            • ex. count of 1MM, from 10s to ~5s
        • less maintenance
          • multiple nodes can be enabled at the same time
        • dedup
          • uniqueness of a uuid is guaranteed only locally (within a shard)
  • MongoDB Balancer
      • if chunks are not evenly distributed, the balancer can fix it
        • stop criterion
          • runs until the chunk-count difference between any two nodes is <= 2
        • balancer window
          • active time window
        • can block when moving massive amounts of data
          • e.g. when adding a brand-new node
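
The stop criterion can be pictured with a toy model. The sketch below is not MongoDB's actual balancer; it is a hypothetical `balance` function that moves one chunk at a time from the fullest shard to the emptiest one until the chunk counts differ by at most the threshold.

```javascript
// Toy balancer: move one chunk per round from the fullest shard to the
// emptiest shard until their chunk counts differ by at most `threshold`.
function balance(chunkCounts, threshold) {
  if (threshold < 1) throw new Error("threshold must be at least 1");
  var counts = chunkCounts.slice(); // do not modify the caller's array
  var migrations = 0;
  while (true) {
    var max = counts.indexOf(Math.max.apply(null, counts));
    var min = counts.indexOf(Math.min.apply(null, counts));
    if (counts[max] - counts[min] <= threshold) break;
    counts[max]--;
    counts[min]++;
    migrations++;
  }
  return { counts: counts, migrations: migrations };
}

// Two loaded shards plus a brand-new, empty one.
var result = balance([10, 10, 0], 2);
console.log(result.counts);     // [ 7, 7, 6 ]
console.log(result.migrations); // 6
```

Even in this toy model, adding an empty node to a cluster triggers a burst of migrations, which matches the warning about blocking when a brand-new node joins.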
  • MongoDB Schema
      • Document oriented
        • json
      • Schema Free
        • pros
          • no predefined schema is required
          • save 'as is'
        • cons
          • overhead of headers (field names are stored in every document)
          • broken data is hard to notice
  • MongoDB Schema and Size
      • Size matters
        • simple schema is better
          • payment:[{"publisher_id": 176, "paid":true}]
          • payment:["176_1"]
        • abbreviation of headers
          • payment:["176_1"]
          • pm:["176_1"]
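
To get a feel for the savings, the three variants of the payment field can be compared directly. This sketch uses serialized JSON length only as a rough stand-in for stored size; actual BSON sizes differ, and `approxSize` is a hypothetical helper.

```javascript
// The same payment flag stored three ways, as on the slide.
var verbose = { payment: [{ publisher_id: 176, paid: true }] };
var compact = { payment: ["176_1"] };
var abbreviated = { pm: ["176_1"] };

// JSON length is only a rough proxy for the stored (BSON) size.
function approxSize(doc) {
  return JSON.stringify(doc).length;
}

console.log("verbose:    ", approxSize(verbose));
console.log("compact:    ", approxSize(compact));
console.log("abbreviated:", approxSize(abbreviated));
```

Because field names are repeated in every document, the per-document saving is multiplied by the number of records in the collection.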
  • MongoDB Queries
      1) COLUMN = VALUE
      2) COLUMN in RANGE
      3) boolean operators AND, OR, NOT
      4) pagination (start, rows)
      5) sort
      6) count (of query result)
      7) COLUMN is non-existent
      8) multiValued fields
      9) dynamic fields
      10) dynamic multiValued fields
      11) stats queries (min, max)
      12) faceted queries (aggregation of specific fields)
      13) free text search (regular expression)
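
Several of the query types listed above map directly onto MongoDB query documents. The sketch below builds a few of them as plain JavaScript objects; the `persons` collection and the field names (`name`, `age`, `city`, `job`, `tags`) are hypothetical and not taken from Fliptop's schema.

```javascript
// Query documents for a hypothetical `persons` collection. In the mongo
// shell each filter would be passed as db.persons.find(filter).
var queries = {
  // 1) COLUMN = VALUE
  equals: { name: "robbie" },
  // 2) COLUMN in RANGE
  range: { age: { $gte: 20, $lt: 30 } },
  // 3) boolean OR (fields listed side by side are ANDed implicitly)
  or: { $or: [{ city: "Taipei" }, { city: "San Francisco" }] },
  // 7) COLUMN is non-existent
  missing: { job: { $exists: false } },
  // 8) multiValued field: matches if the array contains any listed value
  anyTag: { tags: { $in: ["java", "mongodb"] } },
  // 13) free text search via regular expression
  regex: { name: /^rob/ }
};

// Pagination (4), sort (5) and count (6) are cursor operations in the shell:
//   db.persons.find(queries.range).sort({ age: 1 }).skip(20).limit(10)
//   db.persons.find(queries.range).count()
console.log(Object.keys(queries).join(", "));
```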
  • MongoDB Index
      • Tree structure Index
      • At most 64 indexes per collection(table)
      • A query uses only 1 index, unless it is an $or query
      • Indexes entail additional work on insert, delete, and update
  • MongoDB Index Types
      • Basic Index
        •   db.persons.ensureIndex({name:1});
      • Embedded Index
        •   db.persons.ensureIndex({"location.city": 1})
      • Compound Index
        •   db.persons.ensureIndex({name: 1, "location.city": 1})
      • Sparse Index
        •   db.persons.ensureIndex({job:1}, {sparse: true})
  • MongoDB Index Limits
      • negation operators
        •   $ne, $not
        •   ex. db.things.find( { x : { $ne : 3 } } );
      • arithmetic operations
        • $mod
        • ex. db.things.find( "this.a % 10 == 1")
      • most regular expressions
        • yes (index can be used)
          • db.persons.find({name: /^robbie/})
          • db.persons.find({name: /^robbie.*/})
          • db.persons.find({name: /^robbie.*/i})
        • no (index cannot be used)
          • db.persons.find({name: /robbie/})
      • $where
  • MongoDB Index Optimization
      • simple data type
        • ex. int is faster than string
      • simple data schema
        • ex. {payment: "176_1"}
      • sparse index
        • for optional fields
  • MongoDB Miscellaneous
      • Monitoring
        • CPU
          • high CPU implies an index is missing or not being used
        • Drive Size
          • signals when it is time to add a new instance
      • Backup
        • EBS: snapshot
        • mongo import/export tools
          • mongodump/mongorestore, mongoexport/mongoimport
      • Auto Deployment
        • Hudson + Fabric (Python)
  • What's Next?
      • Further data and index weight loss
        • target: 20MM/instance
      • introduce Java POJO/DAO
        • Morphia
        • Spring Data MongoDB
      • Watchdog mechanism
        • restart server automatically
  • Q & A
      • Robbie Cheng, Lead Software Engineer
      • [email_address]
  • We're Hiring
      • please mail to jobs@fliptop.com
  • Thank you!