Introduction to MongoDB and Hadoop

An introduction to MongoDB, followed by an introduction to using MongoDB with Hadoop.

This presentation was given at the CT Java Users Group in March 2013.

  • AGPL – GNU Affero General Public License
  • * Big endian and ARM not supported.
  • Kristine to update this graphic at some point
  • Powerful message here. Finally a database that enables rapid & agile development.
  • Creating a book here. A few things to make note of.

    1. 1. #MongoDB Introduction to MongoDB & MongoDB + Hadoop. Steve Francia, Chief Evangelist, 10gen
    2. 2. What is MongoDB
    3. 3. MongoDB is a ___________ database
        • Document
        • Open source
        • High performance
        • Horizontally scalable
        • Full featured
    4. 4. Document Database
        • Not for .PDF & .DOC files
        • A document is essentially an associative array
        • Document == JSON object
        • Document == PHP Array
        • Document == Python Dict
        • Document == Ruby Hash
        • etc
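A minimal sketch of the "document == associative array" idea, using a plain JavaScript object. The field names and values are illustrative only, and nothing here talks to a database.

```javascript
// A "document" modeled as a plain JavaScript object (an associative array).
// Field names and values are illustrative assumptions.
const doc = {
  title: "fellowship of the ring, the",
  genre: ["fantasy", "adventure"],      // one key can hold a list of values
  publication: { location: "London" }   // or a nested document
};

// Serializing to JSON and back preserves the structure for these plain types.
const roundTripped = JSON.parse(JSON.stringify(doc));

console.log(roundTripped.genre.length);
console.log(roundTripped.publication.location);
```

The same shape maps naturally onto a PHP array, a Python dict, or a Ruby hash.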
    5. 5. Open Source
        • MongoDB is an open source project
        • On GitHub
        • Licensed under the AGPL
        • Started & sponsored by 10gen
        • Commercial licenses available
        • Contributions welcome
    6. 6. High Performance
        • Written in C++
        • Extensive use of memory-mapped files, i.e. read-through, write-through memory caching
        • Runs nearly everywhere
        • Data serialized as BSON (fast parsing)
        • Full support for primary & secondary indexes
        • Document model = less work
    7. 7. Horizontally Scalable
    8. 8. Full Featured
        • Ad hoc queries
        • Real-time aggregation
        • Rich query capabilities
        • Traditionally consistent
        • Geospatial features
        • Support for most programming languages
        • Flexible schema
    9. 9. Database Landscape
    10. 10. http://www.mongodb.org/downloads
    11. 11. Mongo Shell
    12. 12. Document Database
    13. 13. Terminology
        RDBMS ➜ MongoDB
        Table, View ➜ Collection
        Row ➜ Document
        Index ➜ Index
        Join ➜ Embedded Document
        Foreign Key ➜ Reference
        Partition ➜ Shard
    14. 14. Typical (relational) ERD
    15. 15. MongoDB ERD
    16. 16. Working with MongoDB
    17. 17. Creating an author
        > db.author.insert({
            first_name: "j.r.r.",
            last_name: "tolkien",
            bio: "J.R.R. Tolkien (1892-1973), beloved throughout the world as the creator of The Hobbit and The Lord of the Rings, was a professor of Anglo-Saxon at Oxford, a fellow of Pembroke College, and a fellow of Merton College until his retirement in 1959. His chief interest was the linguistic aspects of the early English written tradition, but even as he studied these classics he was creating a set of his own."
          })
    18. 18. Querying for our author
        > db.author.findOne( { last_name : "tolkien" } )
        {
            "_id" : ObjectId("507ffbb1d94ccab2da652597"),
            "first_name" : "j.r.r.",
            "last_name" : "tolkien",
            "bio" : "J.R.R. Tolkien (1892-1973), beloved throughout the world as the creator of The Hobbit and The Lord of the Rings, was a professor of Anglo-Saxon at Oxford, a fellow of Pembroke College, and a fellow of Merton College until his retirement in 1959. His chief interest was the linguistic aspects of the early English written tradition, but even as he studied these classics he was creating a set of his own."
        }
    19. 19. Creating a Book
        > db.books.insert({
            title: "fellowship of the ring, the",
            author: ObjectId("507ffbb1d94ccab2da652597"),
            language: "english",
            genre: ["fantasy", "adventure"],
            publication: {
                name: "george allen & unwin",
                location: "London",
                date: new Date("21 July 1954")
            }
          })
        http://society6.com/PastaSoup/The-Fellowship-of-the-Ring-ZZc_Print/
    20. 20. Multiple values per key
        > db.books.findOne({language: "english"}, {genre: 1})
        {
            "_id" : ObjectId("50804391d94ccab2da652598"),
            "genre" : [ "fantasy", "adventure" ]
        }
    21. 21. Querying for key with multiple values
        > db.books.findOne({genre: "fantasy"}, {title: 1})
        {
            "_id" : ObjectId("50804391d94ccab2da652598"),
            "title" : "fellowship of the ring, the"
        }
        Query a key with a single value or multiple values the same way.
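The "same way" rule for single and multiple values can be illustrated with a small in-memory sketch. This is not MongoDB's implementation. The helper name `fieldMatches` is hypothetical, and the sketch covers only simple equality on a top-level field.

```javascript
// Hypothetical helper: equality matching on a top-level field.
// A scalar field matches when it equals the value; an array field
// matches when any one of its elements equals the value.
function fieldMatches(doc, field, value) {
  const actual = doc[field];
  return Array.isArray(actual) ? actual.includes(value) : actual === value;
}

const sampleBook = {
  title: "fellowship of the ring, the",
  language: "english",
  genre: ["fantasy", "adventure"]
};

console.log(fieldMatches(sampleBook, "language", "english")); // scalar field
console.log(fieldMatches(sampleBook, "genre", "fantasy"));    // array field
console.log(fieldMatches(sampleBook, "genre", "romance"));    // value not present
```

The caller writes the same filter whether the key holds one value or several, which is the point the slide makes.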
    22. 22. Nested Values
        > db.books.findOne({}, {publication: 1})
        {
            "_id" : ObjectId("50804ec7d94ccab2da65259a"),
            "publication" : {
                "name" : "george allen & unwin",
                "location" : "London",
                "date" : ISODate("1954-07-21T04:00:00Z")
            }
        }
    23. 23. Reach into nested values using dot notation
        > db.books.findOne( {"publication.date" : { $lt : new Date("21 June 1960")} })
        {
            "_id" : ObjectId("50804391d94ccab2da652598"),
            "title" : "fellowship of the ring, the",
            "author" : ObjectId("507ffbb1d94ccab2da652597"),
            "language" : "english",
            "genre" : [ "fantasy", "adventure" ],
            "publication" : {
                "name" : "george allen & unwin",
                "location" : "London",
                "date" : ISODate("1954-07-21T04:00:00Z")
            }
        }
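Dot notation can be pictured as walking a path through nested objects. The sketch below is an in-memory illustration under that assumption. `getPath` and `publishedBefore` are hypothetical helpers, not MongoDB functions, and the sketch ignores arrays along the path.

```javascript
// Hypothetical helper: resolve a dotted path such as "publication.date"
// against nested objects, returning undefined when any step is missing.
function getPath(doc, path) {
  return path.split(".").reduce(
    (value, key) => (value == null ? undefined : value[key]),
    doc
  );
}

// Hypothetical helper mirroring a { "publication.date": { $lt: cutoff } } filter.
function publishedBefore(doc, cutoff) {
  const date = getPath(doc, "publication.date");
  return date instanceof Date && date < cutoff;
}

const nestedBook = {
  title: "fellowship of the ring, the",
  publication: { location: "London", date: new Date("1954-07-21T04:00:00Z") }
};

console.log(getPath(nestedBook, "publication.location"));
console.log(publishedBefore(nestedBook, new Date("1960-06-21T00:00:00Z")));
```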
    24. 24. Update books
        > db.books.update(
            {"_id" : ObjectId("50804391d94ccab2da652598")},
            { $set : { isbn: "0547928211", pages: 432 } }
          )
        True agile development. Simply change how you work with the data and the database follows.
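The effect of `$set` on a document with no predeclared schema can be sketched in memory as follows. `applySet` is a hypothetical stand-in that handles top-level fields only. The real operator also accepts dotted paths and modifies the stored document rather than returning a copy.

```javascript
// Hypothetical in-memory stand-in for update(..., { $set: fields }):
// returns a copy of the document with the given top-level fields
// added or overwritten, leaving every other field untouched.
function applySet(doc, fields) {
  return { ...doc, ...fields };
}

const beforeUpdate = { title: "fellowship of the ring, the", language: "english" };
const afterUpdate = applySet(beforeUpdate, { isbn: "0547928211", pages: 432 });

console.log(afterUpdate);
```

No migration step is needed for the new `isbn` and `pages` keys: documents that lack them simply do not carry them.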
    25. 25. The Updated Book record
        > db.books.findOne()
        {
            "_id" : ObjectId("50804ec7d94ccab2da65259a"),
            "author" : ObjectId("507ffbb1d94ccab2da652597"),
            "genre" : [ "fantasy", "adventure" ],
            "isbn" : "0395082544",
            "language" : "english",
            "pages" : 432,
            "publication" : {
                "name" : "george allen & unwin",
                "location" : "London",
                "date" : ISODate("1954-07-21T04:00:00Z")
            },
            "title" : "fellowship of the ring, the"
        }
    26. 26. Creating indexes
        > db.books.ensureIndex({title: 1})
        > db.books.ensureIndex({genre : 1})
        > db.books.ensureIndex({"publication.date": -1})
    27. 27. Finding author by book
        > book = db.books.findOne( {"title" : "return of the king, the"})
        > db.author.findOne({_id: book.author})
        {
            "_id" : ObjectId("507ffbb1d94ccab2da652597"),
            "first_name" : "j.r.r.",
            "last_name" : "tolkien",
            "bio" : "J.R.R. Tolkien (1892-1973), beloved throughout the world as the creator of The Hobbit and The Lord of the Rings, was a professor of Anglo-Saxon at Oxford, a fellow of Pembroke College, and a fellow of Merton College until his retirement in 1959. His chief interest was the linguistic aspects of the early English written tradition, but even as he studied these classics he was creating a set of his own."
        }
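The two-step lookup above (fetch the book, then fetch the author it references) can be sketched with in-memory collections. The `authorsById` map, the `bookList` array, and `findAuthorByTitle` are hypothetical names, and plain strings stand in for ObjectId values.

```javascript
// In-memory stand-ins for the two collections; strings stand in for ObjectIds.
const authorsById = new Map([
  ["507ffbb1d94ccab2da652597", { first_name: "j.r.r.", last_name: "tolkien" }]
]);
const bookList = [
  { title: "return of the king, the", author: "507ffbb1d94ccab2da652597" }
];

// Hypothetical helper: resolve a book's author reference in two steps,
// the way the shell session does with two findOne calls.
function findAuthorByTitle(title) {
  const found = bookList.find((b) => b.title === title); // first lookup
  return found ? authorsById.get(found.author) : undefined; // second lookup
}

console.log(findAuthorByTitle("return of the king, the"));
```

The reference is followed by the application, not by the database, which is why the terminology slide maps "Foreign Key" to "Reference" rather than to a join.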
    28. 28. The Big Data Story
    29. 29. Is actually two stories
    30. 30. Doers & Tellers talking about different things
        http://www.slideshare.net/siliconangle/trendconnect-big-data-report-september
    31. 31. Tellers
    32. 32. Doers
    33. 33. Doers talk a lot more about actual solutions
    34. 34. They know it's a two-sided story: Storage, Processing
    35. 35. Takeaways
        • MongoDB and Hadoop
        • MongoDB for storage & operations
        • Hadoop for processing & analytics
    36. 36. MongoDB & Data Processing
    37. 37. Applications have complex needs
        • MongoDB ideal operational database
        • MongoDB ideal for BIG data
        • Not a data processing engine, but provides processing functionality
    38. 38. Many options for Processing Data
        • Process in MongoDB using Map Reduce
        • Process in MongoDB using Aggregation Framework
        • Process outside MongoDB (using Hadoop)
    39. 39. MongoDB MapReduce
    40. 40. MongoDB Map Reduce
        • MongoDB map reduce quite capable... but with limits
            - JavaScript not best language for processing map reduce
            - JavaScript limited in external data processing libraries
            - Adds load to data store
    41. 41. MongoDB Aggregation
        • Most uses of MongoDB Map Reduce were for aggregation
        • Aggregation Framework optimized for aggregate queries
        • Real-time aggregation similar to SQL GROUP BY
    42. 42. MongoDB & Hadoop
    43. 43. DEMO
        • Install Hadoop MongoDB Plugin
        • Import tweets from Twitter
        • Write mapper
        • Write reducer
        • Call myself a data scientist
    44. 44. Installing mongo-hadoop
        https://gist.github.com/1887726
        hadoop_version=0.23
        hadoop_path="/usr/local/Cellar/hadoop/$hadoop_version.0/libexec/lib"
        git clone git://github.com/mongodb/mongo-hadoop.git
        cd mongo-hadoop
        sed -i "s/default/$hadoop_version/g" build.sbt
        cd streaming
        ./build.sh
    45. 45. Grokking Twitter
        curl https://stream.twitter.com/1/statuses/sample.json -u <login>:<password> | mongoimport -d test -c live
        ... let it run for about 2 hours
    46. 46. DEMO 1
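    47. 47. Map Hashtags in Java
        import java.io.IOException;
        import org.apache.hadoop.io.IntWritable;
        import org.apache.hadoop.io.Text;
        import org.apache.hadoop.mapreduce.Mapper;
        import org.bson.BSONObject;
        import org.bson.types.BasicBSONList;

        // Emits (hashtag, 1) for every hashtag found in a tweet document.
        public class TwitterMapper extends Mapper<Object, BSONObject, Text, IntWritable> {
            @Override
            public void map( final Object pKey, final BSONObject pValue, final Context pContext )
                    throws IOException, InterruptedException {
                // Tweets without an "entities" sub-document carry no hashtags.
                BSONObject entities = (BSONObject) pValue.get("entities");
                if (entities == null) return;
                BasicBSONList hashtags = (BasicBSONList) entities.get("hashtags");
                if (hashtags == null) return;
                for (Object o : hashtags) {
                    String tag = (String) ((BSONObject) o).get("text");
                    pContext.write( new Text( tag ), new IntWritable( 1 ) );
                }
            }
        }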
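    48. 48. Reduce hashtags in Java
        import java.io.IOException;
        import org.apache.hadoop.io.IntWritable;
        import org.apache.hadoop.io.Text;
        import org.apache.hadoop.mapreduce.Reducer;

        // Sums the per-hashtag counts emitted by the mapper.
        public class TwitterReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
            @Override
            public void reduce( final Text pKey, final Iterable<IntWritable> pValues, final Context pContext )
                    throws IOException, InterruptedException {
                int count = 0;
                for ( final IntWritable value : pValues ) {
                    count += value.get();
                }
                pContext.write( pKey, new IntWritable( count ) );
            }
        }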
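The shape of the mapper and reducer above can be tried without a Hadoop cluster using a small in-memory sketch. This is not the mongo-hadoop API. `mapTweet`, `reduceCounts`, and `runMapReduce` are hypothetical names, the driver is a toy stand-in for the shuffle phase, and the sample tweets are made up.

```javascript
// Map step: emit (hashtag, 1) for every hashtag on a tweet,
// skipping tweets with no entities or no hashtags.
function mapTweet(tweet, emit) {
  const hashtags = (tweet.entities && tweet.entities.hashtags) || [];
  for (const tag of hashtags) emit(tag.text, 1);
}

// Reduce step: sum all values emitted for one key.
function reduceCounts(key, values) {
  return values.reduce((sum, value) => sum + value, 0);
}

// Toy driver: group emitted values by key, then reduce each group.
function runMapReduce(tweets) {
  const grouped = new Map();
  for (const tweet of tweets) {
    mapTweet(tweet, (key, value) => {
      if (!grouped.has(key)) grouped.set(key, []);
      grouped.get(key).push(value);
    });
  }
  const result = {};
  for (const [key, values] of grouped) result[key] = reduceCounts(key, values);
  return result;
}

// Made-up sample data in the shape the mapper expects.
const sampleTweets = [
  { entities: { hashtags: [{ text: "np" }, { text: "RT" }] } },
  { entities: { hashtags: [{ text: "np" }] } },
  { text: "no entities here" }
];

const hashtagCounts = runMapReduce(sampleTweets);
console.log(hashtagCounts);
```

Because addition is associative, the same reduce function can also serve as a combiner, which matches the job configuration reusing TwitterReducer for that role.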
    49. 49. All together
        #!/bin/bash
        # Arrays and declare -a require bash rather than plain sh.
        export HADOOP_HOME="/Users/mike/hadoop/hadoop-1.0.4"
        declare -a job_args
        cd ..
        job_args=("jar" "examples/twitter/target/twitter-example_*.jar")
        job_args=(${job_args[@]} "com.mongodb.hadoop.examples.twitter.TwitterConfig")
        job_args=(${job_args[@]} "-D" "mongo.job.verbose=true")
        job_args=(${job_args[@]} "-D" "mongo.job.background=false")
        job_args=(${job_args[@]} "-D" "mongo.input.key=")
        job_args=(${job_args[@]} "-D" "mongo.input.uri=mongodb://localhost:27017/test.live")
        job_args=(${job_args[@]} "-D" "mongo.output.uri=mongodb://localhost:27017/test.twit_hashtags")
        job_args=(${job_args[@]} "-D" "mongo.input.query=")
        job_args=(${job_args[@]} "-D" "mongo.job.mapper=com.mongodb.hadoop.examples.twitter.TwitterMapper")
        job_args=(${job_args[@]} "-D" "mongo.job.reducer=com.mongodb.hadoop.examples.twitter.TwitterReducer")
        job_args=(${job_args[@]} "-D" "mongo.job.input.format=com.mongodb.hadoop.MongoInputFormat")
        job_args=(${job_args[@]} "-D" "mongo.job.output.format=com.mongodb.hadoop.MongoOutputFormat")
        job_args=(${job_args[@]} "-D" "mongo.job.output.key=org.apache.hadoop.io.Text")
        job_args=(${job_args[@]} "-D" "mongo.job.output.value=org.apache.hadoop.io.IntWritable")
        job_args=(${job_args[@]} "-D" "mongo.job.mapper.output.key=org.apache.hadoop.io.Text")
        job_args=(${job_args[@]} "-D" "mongo.job.mapper.output.value=org.apache.hadoop.io.IntWritable")
        job_args=(${job_args[@]} "-D" "mongo.job.combiner=com.mongodb.hadoop.examples.twitter.TwitterReducer")
        job_args=(${job_args[@]} "-D" "mongo.job.partitioner=")
        job_args=(${job_args[@]} "-D" "mongo.job.sort_comparator=")
        #echo "${job_args[@]}"
        $HADOOP_HOME/bin/hadoop "${job_args[@]}" "$1"
    50. 50. Popular HashTags
        > db.twit_hashtags.find().sort( {count : -1 })
        { "_id" : "YouKnowYoureInLoveIf", "count" : 287 }
        { "_id" : "teamfollowback", "count" : 200 }
        { "_id" : "RT", "count" : 150 }
        { "_id" : "Arsenal", "count" : 148 }
        { "_id" : "milars", "count" : 145 }
        { "_id" : "sanremo", "count" : 145 }
        { "_id" : "LoseMyNumberIf", "count" : 139 }
        { "_id" : "RelationshipsShould", "count" : 137 }
        { "_id" : "oomf", "count" : 117 }
        { "_id" : "TeamFollowBack", "count" : 105 }
        { "_id" : "WhyDoPeopleThink", "count" : 102 }
        { "_id" : "np", "count" : 100 }
    51. 51. DEMO 2
    52. 52. Aggregation in Mongo 2.2
        db.live.aggregate(
            { $unwind : "$entities.hashtags" },
            { $match : { "entities.hashtags.text" : { $exists : true } } },
            { $group : { _id : "$entities.hashtags.text", count : { $sum : 1 } } },
            { $sort : { count : -1 } },
            { $limit : 10 }
        )
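To make the stages of that pipeline concrete, here is a rough in-memory equivalent in plain JavaScript. It is a sketch of what each stage contributes, not how MongoDB executes the pipeline. `topHashtags` is a hypothetical name, the sample tweets are made up, and the `$exists` check is approximated by skipping tags whose `text` is undefined.

```javascript
// Hypothetical in-memory equivalent of the pipeline stages:
// $unwind -> $match -> $group -> $sort -> $limit.
function topHashtags(tweets, limit) {
  const counts = new Map();
  for (const tweet of tweets) {
    const hashtags = (tweet.entities && tweet.entities.hashtags) || [];
    for (const tag of hashtags) {                              // $unwind: one step per array element
      if (tag.text === undefined) continue;                    // $match: keep tags that have a text field
      counts.set(tag.text, (counts.get(tag.text) || 0) + 1);   // $group with { $sum: 1 }
    }
  }
  return [...counts.entries()]
    .map(([_id, count]) => ({ _id, count }))
    .sort((a, b) => b.count - a.count)                         // $sort: count descending
    .slice(0, limit);                                          // $limit
}

// Made-up sample data: "a" appears three times, "b" twice, "c" once.
const pipelineTweets = [
  { entities: { hashtags: [{ text: "a" }, { text: "b" }] } },
  { entities: { hashtags: [{ text: "a" }, { text: "c" }] } },
  { entities: { hashtags: [{ text: "a" }, { text: "b" }] } },
  { entities: { hashtags: [] } }
];

const topTwo = topHashtags(pipelineTweets, 2);
console.log(topTwo);
```

Each stage consumes the output of the previous one, which is why the order of `$sort` and `$limit` matters in the real pipeline as well.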
    53. 53. Popular HashTags
        > db.twit_hashtags.aggregate(a)
        {
            "result" : [
                { "_id" : "YouKnowYoureInLoveIf", "count" : 287 },
                { "_id" : "teamfollowback", "count" : 200 },
                { "_id" : "RT", "count" : 150 },
                { "_id" : "Arsenal", "count" : 148 },
                { "_id" : "milars", "count" : 145 },
                { "_id" : "sanremo", "count" : 145 },
                { "_id" : "LoseMyNumberIf", "count" : 139 },
                { "_id" : "RelationshipsShould", "count" : 137 },
            ],
            "ok" : 1
        }
    54. 54. #MongoDB Questions? Steve Francia, Chief Evangelist, 10gen. @spf13, Spf13.com
