MongoDB & Hadoop, Sittin' in a Tree

Published in: Technology, Business

  1. MongoDB + Hadoop, sittin’ in a tree
     K Young - CEO, Mortar
  2. Overview of this session
     • Super-fast intro to Hadoop, Pig
     • Why MongoDB + Pig?
     • Demo: move data MongoDB <=> Pig
     • Demo: processing data with Pig
  3. Hadoop: super-fast intro
     • From Google research
     • Built for massive parallelization
     • Batch (for now)
     • Widely applicable
  4. Hadoop: super-fast intro
  5. Social Graph
  6. Predict
  7. Detect
  8. Genetics
  9. Hadoop: super-fast intro
  10. Pig on Hadoop
      • Less code
      • Expressive code
      • Compiles to MR
      • Insulates from API
      • Popular (LinkedIn, Twitter, Salesforce, Yahoo, Stanford University...)
  11. Pig: brief, expressive, like procedural SQL
      (thanks: Twitter Hadoop World presentation)
  12. The same script, in MapReduce (for serious)
  13. Alternatives to Hadoop: MongoDB native MapReduce
      Write MapReduce in JavaScript
      • JavaScript is not fast
      • Has limited data types
      • Hard to use complex analytic libs
      Adds load to data store
  14. Alternatives to Hadoop: MongoDB aggregation framework
      Great when
      • Doing SQL-style aggregation
      • You do not require external data libs
      • Extra load is OK
  15. MongoDB + Pig: motivations
      • Data storage and data processing are often separate concerns
      • Hadoop is built for scalable processing of large datasets
  16. MongoDB, Pig: similar philosophy
      Poly-structured data
      • MongoDB: stores data, regardless of structure
      • Pig: reads data, regardless of structure (got its name because pigs are omnivorous)
  17. Mortar: fast intro
      • Open-source, code-based dev framework for data, built on Hadoop and Pig
      • Inspired by Rails
      • Self-contained, organized, executable projects
  18. > gem install mortar
      > git clone https://github.com/mortardata/mongo-pig-examples.git
  19. Load: Mongo => Pig
      Mongo-Hadoop connector
        LOAD mongodb://<username>:<password>@<host>:<port>/<database>.<collection>
        USING com.mongodb.hadoop.pig.MongoLoader();
  20. Store: Pig => Mongo
        STORE result
        INTO mongodb://<username>:<password>@<host>:<port>/<database>.<collection>
        USING com.mongodb.hadoop.pig.MongoStorage(
          update [key1, key2, key3],
          {key1: 1, key2: 1, key3: 1},
          {unique:false, dropDups: false});
  21. What’s my schema? Generate it
      Pig is schema-optional.
      • No schema: document#user#name
      • With schema: user.name
  22. What’s in the collection? Characterize it
      Hadoop-based utility describes your collection
      • Field name
      • Unique value count
      • Example value
      • Data type
      • Example value count
  23. Appendix: links
      • Reference: http://help.mortardata.com/reference/loading_and_storing_data/MongoDB
      • Mongo-Hadoop connector: https://github.com/mortardata/mongo-hadoop
  24. @kky
      @mortardata
      help.mortardata.com
  25. Lunch 1:20 – 2:05. Next sessions at 2:05
      5th Floor:
      • West Side Ballroom 3&4: How to Keep Your Data Safe in MongoDB
      • West Side Ballroom 1&2: Geospatial Enhancements in MongoDB 2.4
      • Juilliard Complex: Business Track: How MongoDB Helps Telefonica Digital Accelerate Time to Market
      • Lyceum Complex: Ask the Experts: MongoDB Monitoring and Backup Service Session
      7th Floor:
      • Empire Complex: Real-Time Integration Between MongoDB and SQL Databases
      • SoHo Complex: High Performance, Scalable MongoDB in a Bare Metal Cloud
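
Slides 19 to 21 (load, store, and schema-optional access) can be tied together in one short Pig script. What follows is a minimal sketch rather than the script from the talk's demo. It assumes the mongo-hadoop connector jars are already available to Pig, that documents contain a nested `user.name` field (taken from the slide 21 example), and that the schema string and relation names are purely illustrative. The slides show the URI and map keys unquoted; Pig itself expects string literals in single quotes, so the sketch quotes them.

```pig
-- Assumption: the mongo-hadoop connector jars are already on Pig's classpath
-- (for example through REGISTER statements pointing at your own install).

-- No schema: each record is expected to arrive as a single map field
-- named `document`, so nested values are reached with the # operator.
raw = LOAD 'mongodb://<username>:<password>@<host>:<port>/<database>.<collection>'
      USING com.mongodb.hadoop.pig.MongoLoader();
names_from_map = FOREACH raw GENERATE document#'user'#'name' AS name;

-- With schema: fields are typed, so the same value is reached with dot
-- notation. The schema string below is a hypothetical example.
typed = LOAD 'mongodb://<username>:<password>@<host>:<port>/<database>.<collection>'
        USING com.mongodb.hadoop.pig.MongoLoader('user:tuple(name:chararray)');
names_from_schema = FOREACH typed GENERATE user.name AS name;

-- A small aggregation: count documents per user name.
by_name = GROUP names_from_schema BY name;
result  = FOREACH by_name GENERATE group AS name, COUNT(names_from_schema) AS n;

-- Write the counts back to a (placeholder) output collection.
STORE result
INTO 'mongodb://<username>:<password>@<host>:<port>/<database>.<collection>'
USING com.mongodb.hadoop.pig.MongoStorage();
```

The placeholders in angle brackets are the ones used on the slides and have to be filled in before the script can run. The schema-less form is handy for exploring a collection whose shape is unknown; once the fields of interest are known, supplying a schema keeps the rest of the script shorter and typed.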
