Building Social Analytics Tool with MongoDB
IntelliGrape presentation by Abhishek Tejpaul at An Afternoon with MongoDB New Delhi & Bangalore

Published in: Technology

Transcript

  • 1. Building Social Analytics Tool with MongoDB - A Developer's Perspective
  • 2. Agenda
      1. Product Overview
      2. Why MongoDB for us?
      3. Aggregation Queries to the rescue
      4. How JavaScript helped us?
      5. Experiences with Indexes
      6. In-progress use-cases
      7. Tips & Tricks
      8. Demo
  • 3. About me
      Abhishek Tejpaul
      Software Developer @ IntelliGrape Software
      Loves Grails, Git and Linux
      abhishek@intelligrape.com
  • 4. Product Overview – Information Flow (diagram labels: DataSift, Instagram, Web Crawler 1, Web Crawler …, mongoDB)
  • 5. Product Overview – Results
  • 6. Product Overview – Results
  • 7. Product Overview – Results
  • 8. Why MongoDB for us?
      • Schema-less data. Typical data sources
      • Adding new social platforms in future
      • Needed fast read-write operations
  • 9. Aggregation Queries – Getting Insights
      • Combination of queries chained together
      • At every stage, we can filter/chain/massage data
      Image credit: https://www.openshift.com/blogs/an-overview-of-whats-new-in-mongodb-22
  • 10. Our use-case (esp. for graphs)
      • Sentiment Analysis
      • Demographic Analysis
      • Article Analysis
      • Plan
          • Creation of Intelligence tables in advance
      • Reality
          • On-the-fly analysis using Aggregation queries
  • 11. How to go about it?
      • Operates on a single collection
      • Think about data you have and insights you want
      • Focus on reducing data size early on
          • $match
          • $project
          • $sort
          • $limit, $skip
      • Example
          db.collectionName.aggregate([
            { "$match"   : { fieldName : matchingValue } },
            { "$project" : { oldOrNewField : fieldValue } },
            { "$group"   : { "_id" : "$oldOrNewField", "sum" : { "$sum" : 1 } } },
            { "$sort"    : { "sum" : -1 } },
            { "$limit"   : 20 }
          ])
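The advice on slide 11 (filter early, then project, group, sort and limit) can be sketched as a small pipeline builder. This is a minimal sketch, not the speaker's code: the `platform` and `sentiment` fields and the `buildSentimentPipeline` helper are hypothetical, while `ProcessedArticle` and `isActive` are borrowed from slide 13. The function only builds the pipeline array, so it runs without a database.

```javascript
// Hypothetical helper and field names (platform, sentiment); not taken from the talk.
// Builds a pipeline that counts documents per sentiment for one platform.
function buildSentimentPipeline(platform, limit) {
  return [
    // Filter first so every later stage works on fewer documents.
    { $match: { platform: platform, isActive: true } },
    // Keep only the field the grouping needs.
    { $project: { _id: 0, sentiment: 1 } },
    // One output document per distinct sentiment value, with a count.
    { $group: { _id: "$sentiment", count: { $sum: 1 } } },
    // Largest groups first.
    { $sort: { count: -1 } },
    { $limit: limit }
  ];
}

const pipeline = buildSentimentPipeline("twitter", 20);
// In the mongo shell this would be passed as:
//   db.ProcessedArticle.aggregate(pipeline)
```

Keeping `$match` as the first stage follows the slide's point about reducing data size early; a leading `$match` is also the stage that can make use of an index.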
  • 12. JavaScript Capabilities
      • All the programming capabilities of the JavaScript language at your disposal
      • Taking business logic / processing to your data-store
  • 13. JavaScript – Our use-cases
      • Remove garbage data at DB level
          • Twitter wrong results
          • Filtering out STOP keywords
      db.IgnoreList.findOne().stopWords.forEach( function(data) {
        db.ProcessedArticle.update(
          { "isActive" : true, "isIgnored" : { "$ne" : true } },
          {
            "$pull" : { "topicOfDiscussion" : { "name" : data } },
            "$set"  : { "isIgnored" : true }
          },
          { "multi" : true }
        )
      });
      return true
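As transcribed, the loop on slide 13 sets `isIgnored` on every matching document during its first iteration, so the filter `"isIgnored" : { "$ne" : true }` would no longer match those documents for the remaining stop words. One possible alternative, offered only as a sketch, is a single multi-document update that pulls all stop words at once with `$in`. The `buildStopWordUpdate` helper and the sample words are hypothetical; the collection and field names come from the slide.

```javascript
// Hypothetical helper; collection and field names are taken from slide 13.
// Builds the three arguments for one update that removes every stop word
// from topicOfDiscussion in a single pass.
function buildStopWordUpdate(stopWords) {
  return {
    filter: { isActive: true, isIgnored: { $ne: true } },
    update: {
      // Remove array elements whose name is any of the stop words.
      $pull: { topicOfDiscussion: { name: { $in: stopWords } } },
      // Mark the document as processed only once, after all words are pulled.
      $set: { isIgnored: true }
    },
    options: { multi: true }
  };
}

const op = buildStopWordUpdate(["the", "and", "of"]); // sample words, not from the talk
// In the mongo shell:
//   var words = db.IgnoreList.findOne().stopWords;
//   var op = buildStopWordUpdate(words);
//   db.ProcessedArticle.update(op.filter, op.update, op.options);
```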
  • 14. JavaScript – Caveats
      • Takes up read-write locks on the entire database
      • Can be run with the { nolock : true } option
          db.runCommand({
            eval   : <function>,
            args   : <args>,
            nolock : <true/false>
          })
      • Can be replaced by mapreduce in most cases
      • Treat it as a one-off case
  • 15. Indexes – Our use-cases
      • dropDups
          { dropDups : true }
      • background
          { background : true }
      • Time to Live
          { expireAfterSeconds : 3600 }
      • Compound Indexing
          { key1 : 1, key2 : 1 } != { key1 : 1 }
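The index options on slide 15 could be written out as key/option pairs like the ones below. This is a sketch under assumptions: the field names (`tweetId`, `author`, `createdAt`, `platform`) and the `isIndexPrefix` helper are hypothetical. The helper illustrates the compound-index point: a compound index can serve queries on a leading prefix of its keys, which is why `{ key1 : 1, key2 : 1 }` is not interchangeable with `{ key1 : 1 }`.

```javascript
// Hypothetical field names (tweetId, author, createdAt, platform); not from the talk.
const indexSpecs = [
  // Unique index; dropDups discarded duplicates during the build (option removed in MongoDB 3.0).
  { keys: { tweetId: 1 }, options: { unique: true, dropDups: true } },
  // Build in the background so the database stays available during the build.
  { keys: { author: 1 }, options: { background: true } },
  // TTL index on a date field: documents expire roughly an hour after createdAt.
  { keys: { createdAt: 1 }, options: { expireAfterSeconds: 3600 } },
  // Compound index: key order matters.
  { keys: { platform: 1, author: 1 }, options: {} }
];

// True when queryFields is a leading prefix of the index keys, in the same order.
function isIndexPrefix(queryFields, indexKeys) {
  const names = Object.keys(indexKeys);
  return queryFields.length <= names.length &&
    queryFields.every((field, i) => field === names[i]);
}

// In the mongo shell of that era:
//   indexSpecs.forEach(function (s) { db.collectionName.ensureIndex(s.keys, s.options); });
```

With these definitions, `isIndexPrefix(["platform"], { platform: 1, author: 1 })` is true, while a query on `author` alone is not a prefix of that compound index.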
  • 16. Our current state
      • Faster write operations
          • Under high data load from different sources
      • Faster read operations
          • Graph rendering up to 10x quicker
      • Ease of scalability
          • Though yet to reach there
  • 17. Work In Progress
      • Full-text search implementation
          • can be created only on strings or array of strings
          • db.collectionName.ensureIndex( { fieldName : "text" } )
      • Capped Collections
          • Widgets for last-run jobs / event log tables
          • Very fast writes possible
          • db.createCollection("cName", { capped : true, size : 5242880, max : 5000 } )
          • size argument is always required
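The capped-collection behaviour behind the "last-run jobs / event log" widgets on slide 17 can be illustrated with a small in-memory model. This is a plain JavaScript sketch, not MongoDB's implementation: the `CappedLog` class is hypothetical and models only the `max` document count, whereas a real capped collection also enforces the byte `size` limit.

```javascript
// In-memory illustration only; a real capped collection also enforces `size` in bytes.
class CappedLog {
  constructor(max) {
    this.max = max;
    this.docs = [];
  }

  // Append a document; once the cap is exceeded, the oldest one is discarded.
  insert(doc) {
    this.docs.push(doc);
    if (this.docs.length > this.max) {
      this.docs.shift();
    }
  }

  // Documents come back in insertion order.
  find() {
    return this.docs.slice();
  }
}

// The options from the slide: 5242880 bytes is 5 MiB, with at most 5000 documents.
const cappedOptions = { capped: true, size: 5 * 1024 * 1024, max: 5000 };
// In the mongo shell: db.createCollection("cName", cappedOptions)

const log = new CappedLog(3);
["job1", "job2", "job3", "job4"].forEach(function (name) {
  log.insert({ job: name });
});
// log.find() now holds job2, job3 and job4; job1 was discarded.
```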
  • 18. Tips / Tricks – Things we learnt
      • cloneCollection
          • No more ssh/scp to remote systems
          • db.runCommand({ cloneCollection : <nsCollection>, from : <remote>, query : {} })
          • db.cloneCollection(from, collectionName, query)
      • db.CollectionName.copyTo
          • does not copy indexes
  • 19. Tips / Tricks – Things we learnt
      • remove() vs drop()
          • Can't use remove() for capped collections
          • remove() keeps indexes while drop() clears them
          • To remove all the documents in a collection, use drop()
          • To remove the better part of a large collection, use JavaScript
      • pretty() find by default
          • DBQuery.prototype._prettyShell = true (inside your ~/.mongorc.js)
  • 20. DEMO
  • 21. I am not a MongoDB expert though :)
  • 22. Thank You!!