Shafaq AbdullahPrincipal, Zenprise
   MongoDB features and Architecture   Business Intelligence: Overview   Model of Data using SQL vs NoSQL   Concept of...
   Key-value   Column   Document-based   Graph
   Schema less data model   Adhoc Query   Scalability   High Availability   Speed
consistency    Availability         Partition        Tolerance
http://www.mongodb.org
Business + smart information =                Business Intelligence  Consists of   querying,reporting, and analytics for  ...
   Modelization of Data in SQL       A 1-many relation of node (id, value) with    other nodes related by two different r...
NoSQL Modelization mapped on RelationalDatabase Modelization  Node                   Relation                         c   ...
   Using Complex Type Attributes to Model data                      Nodes                 _id                 valued     ...
   No join operation required   Nesting of documents, resulting in    instantaneous access to retrieve nodes/   Support...
   MapReduce    ◦ Programming model for managing large amount of      data in a parallel fashion    ◦ Map : Processing of...
map(k1, v1) = list(k2,v2) reduce(k1, list(v2)) = list(v3)List : (a; 2) (a; 4)(b; 4)(b; 2)(a;1)(c;5)Map: (a;[2, 4, 1]), (b;...
   ProblemThe task is to find the 2 closestcities in each countryGiven that earth as 2D planeThe distance b/w P1 (x1,y1) ...
Computations   Groups
create view city_dist asselect c1.CountryID,       c1.CityId, c1.City,        c2.CityId as CityId2, c2.City as City2,    d...
select city_dist.* from( select CountryID, min(Dist) as MinDist fromcity_dist where Dist > 0 /* Avoid cities whichshare La...
Map   Reduce   Finalize
function MapCode() {emit(this.CountryID, {     "data":     [ { "city": this.City, "lat": this.Latitude, "lon":   this.Long...
function ReduceCode(key, values) { var reduced = {"data":[]}; for (var i in values) {      var inter = values[i];         ...
function Finalize(key, reduced) {   if (reduced.data.length == 1) { return { "message" : "This Country contains only 1 Cit...
   Shard!   For write intensive, increase number of    shards   For read intensive, increase number of    replica-sets ...
   MongoDB has application well-suited for BI   Schemaless Model allows high-performance   Replication with Sharding al...
   http://www.mongodb.org/   http://www.jaspersoft.com/   R. Cattell. Scalable SQL and NoSQL Data    Stores.http://www....
MongoDB MapReduce Business Intelligence
MongoDB MapReduce Business Intelligence
MongoDB MapReduce Business Intelligence
MongoDB MapReduce Business Intelligence
MongoDB MapReduce Business Intelligence
Upcoming SlideShare
Loading in...5
×

MongoDB MapReduce Business Intelligence

2,292

Published on

Published in: Technology, Education

MongoDB MapReduce Business Intelligence

  1. 1. Shafaq AbdullahPrincipal, Zenprise
  2. 2.  MongoDB features and Architecture Business Intelligence: Overview Model of Data using SQL vs NoSQL Concept of MapReduce Real-world use-case of Business Analytics Scalability Conclusions
  3. 3.  Key-value Column Document-based Graph
  4. 4.  Schema less data model Adhoc Query Scalability High Availability Speed
  5. 5. consistency Availability Partition Tolerance
  6. 6. http://www.mongodb.org
  7. 7. Business + smart information = Business Intelligence Consists of querying,reporting, and analytics for businesses Enable business to make smart decision to execute
  8. 8.  Modelization of Data in SQL A 1-many relation of node (id, value) with other nodes related by two different relations Node Relation id id_node1 value id_node2
  9. 9. NoSQL Modelization mapped on RelationalDatabase Modelization Node Relation c c id _id value value 1 value2
  10. 10.  Using Complex Type Attributes to Model data Nodes _id valued relations [] value 1 value 2 … value n
  11. 11.  No join operation required Nesting of documents, resulting in instantaneous access to retrieve nodes/ Supporting agile method of programming Schema flexible adaptive to changing business needs
  12. 12.  MapReduce ◦ Programming model for managing large amount of data in a parallel fashion ◦ Map : Processing of a data list to create key/value pairs ◦ Reduce: Porcess above pair to create new aggregated key/value pairs
  13. 13. map(k1, v1) = list(k2,v2) reduce(k1, list(v2)) = list(v3)List : (a; 2) (a; 4)(b; 4)(b; 2)(a;1)(c;5)Map: (a;[2, 4, 1]), (b;[4,2]), (c,[5])Reduce: (a;7), (b;6),(c;5)
  14. 14.  ProblemThe task is to find the 2 closestcities in each countryGiven that earth as 2D planeThe distance b/w P1 (x1,y1) andP2 (x2,y2) is computed asSquare-Root of { (x1-x2)2 + (y1-y2)2 }
  15. 15. Computations Groups
  16. 16. create view city_dist asselect c1.CountryID, c1.CityId, c1.City, c2.CityId as CityId2, c2.City as City2, dist(c1,c2) asDist from cities c1 inner join cities c2where c1.CountryID = c2.CountryIDand c1.CityId < c2.CityId /* Calculate distance between 2 cities only once */
  17. 17. select city_dist.* from( select CountryID, min(Dist) as MinDist fromcity_dist where Dist > 0 /* Avoid cities whichshare Latitude & Longitude */ group byCountryID ) a inner join city_dist ona.CountryID = city_dist.CountryID anda.MinDist = city_dist.Dist;
  18. 18. Map Reduce Finalize
  19. 19. function MapCode() {emit(this.CountryID, { "data": [ { "city": this.City, "lat": this.Latitude, "lon": this.Longitude } ]}); }
  20. 20. function ReduceCode(key, values) { var reduced = {"data":[]}; for (var i in values) { var inter = values[i]; for (var j in inter.data) { reduced.data.push(inter.data[j]); } } return reduced;
  21. 21. function Finalize(key, reduced) { if (reduced.data.length == 1) { return { "message" : "This Country contains only 1 City" }; } var min_dist = 999999999999; var city1 = { "name": "" }; var city2 = { "name": "" }; var c1; var c2; var d; for (var i in reduced.data) { for (var j in reduced.data) { if (i>=j) continue; c1 = reduced.data[i]; c2 = reduced.data[j]; d = Dist(c1,c2) if (d < min_dist && d > 0) { min_dist = d; city1 = c1; city2 = c2; } } } return {"city1": city1.name, "city2": city2.name, "dist": min_dist}; }
  22. 22.  Shard! For write intensive, increase number of shards For read intensive, increase number of replica-sets within shards Best Read performance : Data in Shard breadth in memory
  23. 23.  MongoDB has application well-suited for BI Schemaless Model allows high-performance Replication with Sharding allows Scaling out
  24. 24.  http://www.mongodb.org/ http://www.jaspersoft.com/ R. Cattell. Scalable SQL and NoSQL Data Stores.http://www.cattell.net/datastores/Dat astores.pdf C.-T. Chu, S. K. Kim, Y.-A. Lin, Y. Yu, G. R. Bradski, A. Y.Ng, and K. Olukotun. Map- reduce for machine learning on multicore. In NIPS, pages 281–288, 2006. http://www.mongovue.com/2010/11/03/yet -another-mongodb-map-reduce-tutorial/

×