MongoDB @ fliptop
Tech talk at TWJUG about how Fliptop leverages MongoDB in its infrastructure for better scalability

  • Transcript

    • 1.   MongoDB @ Fliptop 2011/12/10
    • 2. Agenda
        • Fliptop
          • infrastructure
        • MongoDB
          • architecture
          • sharding strategy
          • data schema
          • index and query
          • miscellaneous
    • 3. What is Fliptop?
        • Social profiles lookup
          • facebook, twitter, linkedin
          • campaign analysis
          • api lookup
        •   Our problems
          • scalability
            • Data
              • ~ 7 billion data
            • Infrastructure
              • ~1MM lookups/day
    • 4. Fliptop Infrastructure
        • Infrastructure
          • Amazon EC2
        • NoSQL Database
          • MongoDB
        • Indexing and full-text search
          • Apache SOLR
        • Distributed computing
          • AWS Elastic MapReduce (Hadoop)
    • 5. Fliptop Databases
        • Fliptop Data
          • ~50MM records
        • w/o MongoDB
          • MySQL
            • AWS RDS x 1
          • Solr
            • AWS EC2 m1.large x 10
        • w/ MongoDB
          • MySQL
            • AWS RDS x 1
          • Solr
            • AWS EC2 m1.large x 2 (master/slave)
          • MongoDB
            • AWS EC2 m2.xlarge x 10 (replica sets)
    • 6. From Solr to MongoDB
        • Our Storage Requirement
          • auto sharding
          • richness of queries
          • short insert latency
        • Other Reasons
          • documentation
          • active community
          • word of mouth
        •   Migration Efforts
          • queries
          • db driver
          • performance tuning
    • 7. MongoDB Features
        • Auto-Sharding
          • scale out to 1000 nodes 
        • Replication & High Availability
          • master/slave and replica sets
        • Querying
          • equivalents for most SQL queries
        • Document-oriented storage
          • json, schema-free
        • Full Index Support
          • index any field
        • Map/Reduce
          • JavaScript on the server side
    • 8. MongoDB Servers  
    • 9. MongoDB Sharding
        • Automatic balancing for changes in load and data distribution
        • Easy addition of new machines
        • Scaling out to one thousand nodes
        • No single points of failure
        • Automatic failover
    • 10. MongoDB Replication
        • master/slave
          • easy setup
          • manual failover
        • replica set
          • slightly more complex setup
          • automatic failover
          • minimum nodes: 3 (one may be an arbiter)
          • maximum nodes: 12
    • 11. MongoDB Failover
        • Voting algorithm (replica set)
          •   votes needed to elect a primary: floor(all nodes / 2) + 1
        • Priority
          • if 0, the node never becomes primary
            • e.g. a backup node on a small machine
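
The majority rule on this slide can be sketched in a few lines of plain JavaScript. This is only an illustration of the arithmetic, assuming every member has one vote; the function names `votesNeeded` and `canElectPrimary` are hypothetical and not part of any MongoDB API.

```javascript
// Votes required to elect a primary: a strict majority of all voting members.
function votesNeeded(totalMembers) {
  if (!Number.isInteger(totalMembers) || totalMembers < 1) {
    throw new RangeError("totalMembers must be a positive integer");
  }
  return Math.floor(totalMembers / 2) + 1;
}

// True if the members that can still see each other form a majority.
function canElectPrimary(totalMembers, reachableMembers) {
  return reachableMembers >= votesNeeded(totalMembers);
}

console.log(votesNeeded(3));        // 2
console.log(canElectPrimary(3, 2)); // true: primary + arbiter survive
console.log(canElectPrimary(3, 1)); // false: a lone node cannot win
```

This is why the minimum useful set has three members: with two, losing either one leaves the survivor without a majority.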
    • 12. Fliptop MongoDB Infrastructure
        • Data
          • 10MM records per replica set
        • MongoDB servers
          • router x 1
          • config server x 1
          • shard servers x 10
            • 5 primary
            • 5 secondary
          • arbiter servers x 5
        • AWS EC2 Instances
          • m2.xlarge x 10
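
The node counts on this slide follow from simple arithmetic. The sketch below assumes the layout shown (one primary, one secondary and one arbiter per replica set) and uses the ~50MM total from the earlier slide; `planCluster` is a hypothetical helper, not something from the talk.

```javascript
// Derive the number of replica sets and nodes from the data volume,
// assuming 1 primary + 1 secondary + 1 arbiter per replica set.
function planCluster(totalRecords, recordsPerSet) {
  const replicaSets = Math.ceil(totalRecords / recordsPerSet);
  return {
    replicaSets,
    dataNodes: replicaSets * 2, // primary + secondary
    arbiters: replicaSets,      // one arbiter per set
  };
}

const plan = planCluster(50e6, 10e6);
console.log(plan); // { replicaSets: 5, dataNodes: 10, arbiters: 5 }
```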
    • 13. MongoDB and AWS EC2
        • Instance type
          • m2.xlarge
            •   17.1 GB of memory
            •   6.5 EC2 Compute Units
        •   Storage
          •   Local Drive
            • faster i/o
            • not portable
          • EBS
            • i/o = network + disk i/o
            • portable
            • easy backup
            • raid 1/0 
    • 14. MongoDB Sharding Strategy
        • Shard Key Strategy
          • Ascending shard key
            • data locality
            • hotspot for read/write
            • ex. timestamp, auto-increment PK
          • Random shard key
            • evenly distributes reads/writes
            • no data locality
            • ex. UUID, md5
          • Hybrid shard key
            •   ascending 
            •   evenly distributed
            • ex. timestamp + uuid
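
The hotspot effect of an ascending key can be seen with a toy model of range-based chunks. This is a simplified sketch, not MongoDB's routing code: the split points are made up, and a multiplicative scramble stands in for a UUID or md5 value so that the example stays self-contained and deterministic.

```javascript
// Chunk i holds keys in [splits[i-1], splits[i]); returns the chunk index.
function chunkFor(key, splits) {
  let i = 0;
  while (i < splits.length && key >= splits[i]) i++;
  return i;
}

// Count how many keys land in each chunk.
function countPerChunk(keys, splits) {
  const counts = new Array(splits.length + 1).fill(0);
  for (const k of keys) counts[chunkFor(k, splits)]++;
  return counts;
}

// Stand-in for a random key (UUID/md5): scramble into the 32-bit range.
function scramble(n) {
  return Math.imul(n, 2654435761) >>> 0;
}

// 1000 new writes whose timestamps are all newer than the existing data.
const newTimestamps = Array.from({ length: 1000 }, (_, i) => 1000 + i);

// Ascending key: every new write falls past the last split point.
const ascending = countPerChunk(newTimestamps, [250, 500, 750]);

// Random key: the same writes spread over all four chunks.
const random = countPerChunk(
  newTimestamps.map(scramble),
  [0x40000000, 0x80000000, 0xc0000000]
);

console.log(ascending); // [ 0, 0, 0, 1000 ]
console.log(random);    // roughly a quarter of the writes per chunk
```

A hybrid key such as timestamp + uuid sits between the two: coarse ordering is kept by the timestamp part while the uuid part breaks up the writes inside each time range.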
    • 15. From timestamp to uuid
        • Why timestamp?
          • same shard key as our Solr setup
          • issues
            • slowness of count (traverse) queries
            • maintenance headache
              • nodes must be added more frequently
            • duplication of uuids
        • From timestamp to uuid
          •   performance gain with count
            • 2x faster
              • ex. counting 1MM records: from ~10s to ~5s
          • less maintenance
            • can enable multiple nodes at the same time
          • dedup
            • uniqueness of a uuid is guaranteed locally only (within one shard)
    • 16. MongoDB Balancer
        • if chunks are not evenly distributed across shards, the balancer migrates them
          • stop criterion
            • runs until the chunk-count difference between any two shards is <= 2
          • balancer window
            • active time window
          • can block when moving massive data
            • e.g. while adding a brand-new node
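
The stop criterion can be illustrated with a small simulation. It is only a sketch under the assumption stated on the slide (stop once the difference in chunk counts is at most 2); the real balancer's thresholds and scheduling differ by version, and `balance` is a hypothetical function.

```javascript
// Move one chunk at a time from the fullest shard to the emptiest one
// until the spread in chunk counts is within maxDiff.
function balance(chunkCounts, maxDiff = 2) {
  if (maxDiff < 1) throw new RangeError("maxDiff must be at least 1");
  const counts = chunkCounts.slice();
  let migrations = 0;
  while (counts.length > 0) {
    const max = Math.max(...counts);
    const min = Math.min(...counts);
    if (max - min <= maxDiff) break;
    const from = counts.indexOf(max);
    const to = counts.indexOf(min);
    counts[from]--;
    counts[to]++;
    migrations++;
  }
  return { counts, migrations };
}

// Two loaded shards plus one brand-new, empty shard.
const result = balance([20, 20, 0]);
console.log(result.counts);     // [ 14, 14, 12 ]
console.log(result.migrations); // 12
```

The migration count grows with the imbalance, which is why adding an empty node to a large cluster triggers a long run of chunk moves.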
    • 17. MongoDB Schema
        • Document oriented
          • json
        • Schema Free
          • pros
            • no predefined schema is required
            • save 'as is'
          • cons
            • overhead of field names (stored in every document)
            • broken data is hard to detect
    • 18. MongoDB Schema and Size
        • Size matters
          • simple schema is better
            • payment:[{"publisher_id": 176, "paid": true}]
            • payment:["176_1"]
          • abbreviated field names
            • payment:["176_1"]
            • pm:["176_1"]
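
The saving from the two steps above can be estimated by comparing serialized sizes. The sketch uses JSON string length as a rough proxy for stored size (BSON sizes differ in detail), and the `encodePayment`/`decodePayment` helpers are hypothetical illustrations of the "176_1" encoding, with 1 and 0 assumed to mean paid and unpaid.

```javascript
// Pack publisher id and paid flag into one short string, e.g. "176_1".
function encodePayment(publisherId, paid) {
  return `${publisherId}_${paid ? 1 : 0}`;
}

// Reverse of encodePayment.
function decodePayment(encoded) {
  const [id, flag] = encoded.split("_");
  return { publisher_id: Number(id), paid: flag === "1" };
}

const verbose = { payment: [{ publisher_id: 176, paid: true }] };
const compact = { payment: [encodePayment(176, true)] };
const abbreviated = { pm: [encodePayment(176, true)] };

// JSON length as a rough stand-in for on-disk size.
const size = (doc) => JSON.stringify(doc).length;

console.log(size(verbose));     // 46
console.log(size(compact));     // 21
console.log(size(abbreviated)); // 16
```

The cost is readability: the application has to decode "176_1" itself, as `decodePayment` does.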
    • 19. MongoDB Queries
        • 1) COLUMN = VALUE
        • 2) COLUMN in RANGE
        • 3) boolean operators AND, OR, NOT
        • 4) pagination (start, rows)
        • 5) sort
        • 6) count (of query result)
        • 7) COLUMN is non-existent
        • 8) multiValued fields
        • 9) dynamic fields
        • 10) dynamic multiValued fields
        • 11) stats queries (min, max)
        • 12) faceted queries (aggregation of specific fields)
        • 13) free text search (regular expression)
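
Several of the query types listed here map directly onto MongoDB filter documents. The sketch below writes a few of them as plain JavaScript objects; the collection and field names (`persons`, `city`, `age`, `job`, `tags`, `name`) are assumptions for illustration, not Fliptop's schema.

```javascript
// Filter documents for some of the query types on this slide.
const filters = {
  equals: { city: "Taipei" },                                 // COLUMN = VALUE
  range: { age: { $gte: 18, $lt: 65 } },                      // COLUMN in RANGE
  and: { city: "Taipei", age: { $gte: 18 } },                 // implicit AND
  or: { $or: [{ city: "Taipei" }, { city: "Tokyo" }] },       // OR
  not: { age: { $not: { $gte: 65 } } },                       // NOT
  missing: { job: { $exists: false } },                       // non-existent
  multiValued: { tags: "java" },                              // array contains
  freeText: { name: /^robbie/ },                              // regular expression
};

// In the mongo shell these would be passed to find(), for example:
//   db.persons.find(filters.range).sort({ age: -1 }).skip(20).limit(10)
//   db.persons.find(filters.equals).count()
console.log(Object.keys(filters).length); // 8
```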
    • 20. MongoDB Index
        • Tree structure Index
        • At most 64 indexes per collection (table)
        • A query uses only one index, unless it is an $or query
        • Each index entails additional work on insert, delete and update
    • 21. MongoDB Index Types
        • Basic Index
          •   db.persons.ensureIndex({name: 1});
        • Embedded Index
          •   db.persons.ensureIndex({"location.city": 1})
        • Compound Index
          •   db.persons.ensureIndex({name: 1, "location.city": 1})
        • Sparse Index
          •   db.persons.ensureIndex({job: 1}, {sparse: true})
    • 22. MongoDB Index Limits
        • negation operators
          •   $ne, $not
          •   ex. db.things.find( { x : { $ne : 3 } } );
        • arithmetic operations 
          • $mod
          • ex. db.things.find( "this.a % 10 == 1" )
        • most regular expressions
          • can use an index (case-sensitive, anchored prefix)
            • db.persons.find({name: /^robbie/})
            • db.persons.find({name: /^robbie.*/})
          • cannot use an index efficiently
            • db.persons.find({name: /^robbie.*/i})
            • db.persons.find({name: /robbie/})
        • $where
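
The regular-expression rule can be approximated in code. This is a rough heuristic, assuming (as MongoDB's documentation of that period described) that only case-sensitive expressions anchored with `^` can use an index as a prefix scan; it is not how the query planner decides, and `looksIndexFriendly` is a hypothetical name.

```javascript
// Heuristic: a regex can use an index as a prefix scan only if it is
// anchored at the start and is case-sensitive.
function looksIndexFriendly(re) {
  return re.source.startsWith("^") && !re.flags.includes("i");
}

console.log(looksIndexFriendly(/^robbie/));    // true
console.log(looksIndexFriendly(/^robbie.*/));  // true
console.log(looksIndexFriendly(/^robbie.*/i)); // false
console.log(looksIndexFriendly(/robbie/));     // false
```

The check is deliberately simple; an expression such as `/^a|b/` starts with `^` but is not a pure prefix match, so a real check would need to parse the pattern.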
    • 23. MongoDB Index Optimization
        • simple data type
          • ex. int is faster than string
        • simple data schema
          • ex. {payment: "176_1"}
        • sparse index
          • for optional fields
    • 24. MongoDB Miscellaneous
        • Monitoring
          • CPU
            • sustained high usage often implies a missing or ineffective index
          • Drive Size
            • signals when it is time to add a new instance
        • Backup
          • EBS: snapshot
          • mongo dump/restore and export/import tools
            • mongodump/mongorestore, mongoexport/mongoimport
        • Auto Deployment
          • Hudson + Fabric (Python)
    • 25. What's Next?
        • Further data and index size reduction
          • target: 20MM/instance
        • introduce Java POJO/DAO
          • Morphia
          • Spring Data MongoDB
        • Watchdog mechanism
          • restart server automatically
    • 26. Q & A Robbie Cheng Lead Software Engineer [email_address]
    • 27. We're Hiring
        • please mail to jobs@fliptop.com
    • 28. Thank you!
