MongoDB & Hadoop, Sittin' in a Tree

Published in: Technology, Business

Transcript

  • 1. K Young - CEO, Mortar. MongoDB + Hadoop, sittin’ in a tree
  • 2. Overview of this session: super-fast intro to Hadoop and Pig; why MongoDB + Pig?; demo: moving data MongoDB <=> Pig; demo: processing data with Pig
  • 3. Super-fast intro: Hadoop. From Google research; built for massive parallelization; batch (for now); widely applicable
  • 4. Super-fast intro: Hadoop
  • 5. Social Graph
  • 6. Predict
  • 7. Detect
  • 8. Genetics
  • 9. Super-fast intro: Hadoop
  • 10. Pig on Hadoop: less code; expressive code; compiles to MapReduce (MR); insulates from the API; popular (LinkedIn, Twitter, Salesforce, Yahoo, Stanford University...)
  • 11. Pig: brief, expressive, like procedural SQL (thanks: Twitter Hadoop World presentation)
  • 12. For serious: the same script, in MapReduce
  • 13. Alternatives to Hadoop: MongoDB native MapReduce. Write MapReduce in JavaScript: JavaScript is not fast; has limited data types; hard to use complex analytic libs. Adds load to the data store
  • 14. Alternatives to Hadoop: MongoDB aggregation framework. Great when: doing SQL-style aggregation; you do not require external data libs; extra load is OK
  • 15. Motivations: MongoDB + Pig. Data storage and data processing are often separate concerns. Hadoop is built for scalable processing of large datasets
  • 16. Similar philosophy: MongoDB, Pig. Poly-structured data. MongoDB: stores data, regardless of structure. Pig: reads data, regardless of structure (it got its name because pigs are omnivorous)
  • 17. Fast intro: Mortar. Open-source, code-based dev framework for data, built on Hadoop and Pig. Inspired by Rails. Self-contained, organized, executable projects
  • 18. > gem install mortar
        > git clone https://github.com/mortardata/mongo-pig-examples.git
  • 19. Load: Mongo => Pig, via the Mongo-Hadoop connector
        LOAD 'mongodb://<username>:<password>@<host>:<port>/<database>.<collection>'
        USING com.mongodb.hadoop.pig.MongoLoader();
  • 20. Store: Pig => Mongo
        STORE result
        INTO 'mongodb://<username>:<password>@<host>:<port>/<database>.<collection>'
        USING com.mongodb.hadoop.pig.MongoStorage(update [key1, key2, key3],
            {key1: 1, key2: 1, key3: 1},
            {unique:false, dropDups: false});
  • 21. What’s my schema? Generate it. Pig is schema-optional. No schema: document#user#name. With schema: user.name
  • 22. What’s in the collection? Characterize it. A Hadoop-based utility describes your collection: field name; unique value count; example value; data type; example value count
  • 23. Appendix: links. Reference: http://help.mortardata.com/reference/loading_and_storing_data/MongoDB ; Mongo-Hadoop connector: https://github.com/mortardata/mongo-hadoop
  • 24. @kky, @mortardata, help.mortardata.com
  • 25. Lunch 1:20 – 2:05. Next sessions at 2:05. 5th Floor: West Side Ballroom 3&4: How to Keep Your Data Safe in MongoDB; West Side Ballroom 1&2: Geospatial Enhancements in MongoDB 2.4; Juilliard Complex: Business Track: How MongoDB Helps Telefonica Digital Accelerate Time to Market; Lyceum Complex: Ask the Experts: MongoDB Monitoring and Backup Service Session. 7th Floor: Empire Complex: Real-Time Integration Between MongoDB and SQL Databases; SoHo Complex: High Performance, Scalable MongoDB in a Bare Metal Cloud
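
Slide 22 lists what the collection-characterizing utility reports for each field: field name, unique value count, example value, data type, and example value count. The slides do not show how that Hadoop-based utility is implemented, so the following is only a minimal in-memory Python sketch of the same kind of summary, not the actual tool. The function name `characterize`, the `sample` documents, the restriction to top-level fields, and the choice of the most frequent value as the "example" are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def characterize(documents):
    """Summarize each top-level field across a list of dict documents.

    Hypothetical helper for illustration; not the utility from the slides.
    """
    values_by_field = defaultdict(Counter)
    types_by_field = defaultdict(Counter)
    for doc in documents:
        for field, value in doc.items():
            # repr() lets unhashable values (lists, dicts) be counted too.
            values_by_field[field][repr(value)] += 1
            types_by_field[field][type(value).__name__] += 1

    summary = {}
    for field, counts in values_by_field.items():
        # Assumption: use the most frequent value as the example value.
        example, example_count = counts.most_common(1)[0]
        summary[field] = {
            "unique_values": len(counts),
            "example_value": example,
            "example_count": example_count,
            "data_type": types_by_field[field].most_common(1)[0][0],
        }
    return summary


# Made-up documents with differing structure, as a MongoDB collection may hold.
sample = [
    {"user": "ann", "age": 31},
    {"user": "bob", "age": 31},
    {"user": "ann"},
]
report = characterize(sample)
for field, stats in sorted(report.items()):
    print(field, stats)
```

On a real collection the same counting would be distributed, for example as a Pig or MapReduce job over documents read through the Mongo-Hadoop connector, rather than run over a list in memory.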
