Moving from C#/.NET to Hadoop/MongoDB
Robert Vandehey
December 4, 2012
  • This is the new Data Load Process. It makes it look easy…
  • …In reality it is quite complex. This is just one of our workflows. The orange/tan-ish boxes are Java map/reduce processes. The pink boxes are Pig processes. The white boxes are BCP processes. The green boxes are MongoDB collections.
  • Here is our sharding scheme. We actually have 6 more servers than are shown because we decided to have multiple replicas at each remote site.

    1. Moving from C#/.NET to Hadoop/MongoDB. Robert Vandehey, December 4, 2012
    2. We power the Discovery, Delivery and Display of Digital Entertainment. © 2012 Rovi Corporation. Company confidential.
    3. Global Reach
        • 137M+ Viewers use our guide technologies through service provider offerings
        • 47M+ Storefronts with entertainment services powered by Rovi Entertainment Store
        • 266M+ Consumer electronic (CE) devices have our CE guide technologies
        • 40M+ Households reached globally by Rovi Advertising Network
        • 600M+ Devices certified for high quality DivX video playback
        • Data coverage:
          – 4.5M+ TV shows, movies, sports and celebrities
          – 3.3M+ Album releases and 32M music tracks
          – 500K+ Movie titles
    4. [No text on slide]
    5. The Problem
    6. ETL/Cache Loading Data Takes Too Long
        [Diagram. Labels: ETL Server, Extract, Transform, Table Loading Process, DSG Database, CI Database, Backup & Restore, DB Server, Cache Loading Process, WSP Server(s), and MemcacheD/MemcacheDB clusters on Node 1 and Node 2]
    7. The Solution
    8. Hadoop/MongoDB
    9. Network Diagram
    10. Mongo Sharding
    11. Challenges
    12. Challenges
        • Transition existing Windows/.NET team to Linux/Java
          – Environment setup. Technology framework choices
          – Coding differences
          – Cultural differences
          – Platform differences
          – Easier than expected to transition team from .NET to Java
          – No religious battles
        • Backwards compatibility of CXF web services to Microsoft .NET web services
        • Managing new releases of Hadoop
        • BCP took too long
          – Converted to base tables. Used Pig to join the data
        • Writes to Mongo are very fast. Updates are slower and saturated disks
          – Implemented Diff process (MD5 calc) to allow Hadoop to do the work and minimize writes to Mongo
    13. Lessons Learned
    14. Lessons Learned
        • General
          – Current versions of Hadoop CDH4 and MongoDB 2.0 are actually very stable products
            • We purchased enterprise support agreements from both Cloudera and 10gen
          – Create a developers VM image
          – Deploy early and often even if not ready for real customers
          – Use the same setup in test and production environments
            • Sharding caused differences
        • SQL
          – Get raw tables without any transformation or joins
            • Let Hadoop do the processing for you
        • Hadoop
          – Do as much work as you can in Hadoop
          – Take the time to create small datasets to iterate fast
          – Take the time to learn and use Pig
            • It is very fast and provides tons of functionality that you don’t need to code in Java
          – Don’t create Runners. Use Oozie workflows
          – Measure, benchmark and track performance
          – Use Hadoop counters
    15. Lessons Learned - 2
        • MongoDB
          – RAM, RAM, RAM!!!
          – Many writes from Hadoop can easily overwhelm MongoDB
            • Single database lock
            • Drive bandwidth saturation (can be expanded through sharding)
            • Do as much as possible to minimize writes
            • Measure where your application is blocking and optimize
          – Don’t shard unless you have to. If you do shard, preconfigure your shard key
            • You need a good shard key
          – Use Replica sets. They are easy to set up and work well.
            • Make sure the oplog (replication log) is large enough.
          – Use MongoDB Monitoring Service (MMS). It’s free
          – Mongo queries are fast!
    16. Mongo Query: returns 90 rows from a database of 9 million in 44 ms
    17. Q&A
    18. Follow-up Information
        • Email: robert.vandehey@rovicorp.com
        • LinkedIn: http://www.linkedin.com/in/bvandehey
        • Twitter: @bvandehey
        • Rovi Cloud Services: http://developer.rovicorp.com/
    19. Thank You
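
The Challenges slide mentions a diff process based on an MD5 calculation, used to keep unchanged records from being written to Mongo. The slides do not show how it was built, so the following is only a minimal single-process sketch of the general idea in Python, not the Hadoop job described in the talk. The function names, the `id` key field and the sample records are all hypothetical.

```python
import hashlib
import json


def record_digest(record):
    """Return an MD5 hex digest of a record's canonical JSON form."""
    # sort_keys makes the digest independent of field order.
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.md5(canonical.encode("utf-8")).hexdigest()


def diff_records(previous_digests, current_records, key_field="id"):
    """Split the current load into inserts, updates and deletes.

    previous_digests maps record key -> digest from the last load.
    Unchanged records are skipped, so they cost no database write.
    """
    inserts, updates = [], []
    new_digests = {}
    for record in current_records:
        key = record[key_field]
        digest = record_digest(record)
        new_digests[key] = digest
        old = previous_digests.get(key)
        if old is None:
            inserts.append(record)      # key not seen in the last load
        elif old != digest:
            updates.append(record)      # key seen, content changed
    deletes = [k for k in previous_digests if k not in new_digests]
    return inserts, updates, deletes, new_digests


# Hypothetical sample data: digests from the last load and a new extract.
previous = {
    1: record_digest({"id": 1, "title": "Alien"}),
    2: record_digest({"id": 2, "title": "Brazil"}),
    3: record_digest({"id": 3, "title": "Casablanca"}),
}
current = [
    {"id": 1, "title": "Alien"},           # unchanged: no write
    {"id": 2, "title": "Brazil (1985)"},   # changed: update
    {"id": 4, "title": "Dune"},            # new: insert
]
inserts, updates, deletes, digests = diff_records(previous, current)
```

Only the records in `inserts` and `updates` (and the keys in `deletes`) would need to reach the database; `digests` would be stored for comparison on the next load.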
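
The Lessons Learned slide recommends Hadoop counters for measuring and tracking a job. The talk's jobs were Java map/reduce and Pig; as a compact illustration, the sketch below instead uses a Python mapper written for Hadoop Streaming, where a task updates a counter by writing a `reporter:counter:<group>,<counter>,<amount>` line to stderr. The group name `DataLoad`, the counter names and the tab-separated input layout are assumptions made for this example.

```python
import io
import sys


def emit_counter(group, name, amount=1, stream=sys.stderr):
    """Write a counter update in the Hadoop Streaming stderr format."""
    stream.write("reporter:counter:{},{},{}\n".format(group, name, amount))


def map_lines(lines, out=sys.stdout, err=sys.stderr):
    """Emit key<TAB>value pairs and count good versus malformed rows."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 2 or not fields[0]:
            # Count and skip rows that do not match the expected layout.
            emit_counter("DataLoad", "MALFORMED_ROWS", stream=err)
            continue
        out.write("{}\t{}\n".format(fields[0], fields[1]))
        emit_counter("DataLoad", "GOOD_ROWS", stream=err)


# Local dry run with in-memory streams instead of a real Hadoop job.
out, err = io.StringIO(), io.StringIO()
map_lines(["1\tAlien\n", "broken row\n", "2\tBrazil\n"], out, err)
```

In a real streaming job the mapper would call `map_lines(sys.stdin)`, and the counter totals would appear in the job summary alongside the built-in counters.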
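
The second Lessons Learned slide says "You need a good shard key" without giving the reasoning. One common reason is that range-partitioned chunks send every new document with an ever-increasing key to the same chunk, so a bulk load hits a single shard. The sketch below simulates that effect in plain Python. It is an illustration only, not MongoDB's chunk or balancer logic, and the split points, id range and hashed-prefix scheme are hypothetical.

```python
import bisect
import hashlib
from collections import Counter


def chunk_for(key, split_points):
    """Return the index of the range chunk that holds key."""
    return bisect.bisect_right(split_points, key)


def distribution(keys, split_points):
    """Count how many keys land in each chunk."""
    counts = Counter(chunk_for(k, split_points) for k in keys)
    return [counts.get(i, 0) for i in range(len(split_points) + 1)]


def hashed(key):
    """Map a key to a 2-hex-digit prefix (00..ff) of its MD5 digest."""
    return hashlib.md5(str(key).encode("utf-8")).hexdigest()[:2]


# 1,000 new documents with ever-increasing ids, as a bulk load might produce.
new_ids = list(range(1_000_000, 1_001_000))

# Chunks pre-split on the raw id: every new id falls past the last split point.
id_splits = [250_000, 500_000, 750_000]
by_id = distribution(new_ids, id_splits)

# Chunks pre-split on a hashed prefix: the same ids spread across all chunks.
hash_splits = ["40", "80", "c0"]
by_hash = distribution([hashed(i) for i in new_ids], hash_splits)
```

Here `by_id` puts all 1,000 writes in the last chunk, while `by_hash` divides them roughly evenly among the four chunks. The trade-off is that a hashed key gives up efficient range queries on the original id.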
