The Art of Social Media Analysis with Twitter & Python

Slides for my tutorial at OSCON 2012: http://goo.gl/fpxVE

Transcript

  • 1. The Art of Social Media Analysis with Twitter & Python krishna sankar @ksankar http://www.oscon.com/oscon2012/public/schedule/detail/23130
  • 2. House Rules (1 of 2)
    o  Doesn’t assume any knowledge of the Twitter API
    o  Goal: get everybody on the same page & a working knowledge of the Twitter API
    o  To bootstrap your exploration into Social Network Analysis & Twitter analytics
    o  Simple programs, to illustrate usage & data manipulation
    [Pipeline diagram: Intro > API & Objects > Twitter Network Analysis Pipeline > NLP, NLTK, Sentiment Analysis > @mention cliques, social network graph > Retweet analytics, growth, #tag network, information contagion, weak ties. Caption: We will analyze @clouderati, 2072 followers, exploding to ~980,000 distinct users one level down.]
  • 3. House Rules (2 of 2)
    o  Am using the requests library
    o  There are good Twitter frameworks for Python, but I wanted to build from the basics. Once one understands the fundamentals, frameworks can help
    o  Many areas to explore – not enough time. So I decided to focus on the social graph, cliques & networkx
    [Same pipeline diagram and @clouderati caption as the previous slide]
  • 4. About Me
    •  Lead Engineer/Data Scientist/AWS Ops Guy at Genophen.com
       o  Co-chair – 2012 IEEE Precision Time Synchronization
          •  http://www.ispcs.org/2012/index.html
       o  Blog: http://doubleclix.wordpress.com/
       o  Quora: http://www.quora.com/Krishna-Sankar
    •  Prior Gigs
       o  Lead Architect (Egnyte)
       o  Distinguished Engineer (CSCO)
       o  Employee #64439 (CSCO) to #39 (Egnyte) & now #9!
    •  Current Focus:
       o  Design, build & ops of BioInformatics/Consumer infrastructure on AWS, MongoDB, Solr, Drupal, GitHub, …
       o  Big Data (more of variety, variability, context & graphs, than volume or velocity – so far!)
       o  Overlay-based semantic search & ranking
    •  Other related Presentations
       o  http://goo.gl/P1rhc  Big Data Engineering Top 10 Pragmatics (Summary)
       o  http://goo.gl/0SQDV  The Art of Big Data (Detailed)
       o  http://goo.gl/EaUKH  The Hitchhiker’s Guide to Kaggle, OSCON 2011 Tutorial
  • 5. Twitter Tips – A Baker’s Dozen
    1.  Twitter APIs are (more or less) congruent & symmetric
    2.  Twitter is usually right & simple - recheck when you get unexpected results before blaming Twitter
       o  I was getting numbers when I was expecting screen_names in user objects
       o  Was ready to send blasting e-mails to the Twitter team. Decided to check one more time and found that my parameter key was wrong – screen_name instead of user_id
       o  Always test with one or two records before a long run! - learned the hard way
    3.  Twitter APIs are very powerful – consistent use can bear huge data
       o  In a week, you can pull in 4-5 million users & some tweets!
       o  Night runs are far faster & more error-free
    4.  Use a NOSQL data store as a command buffer & data buffer
       o  Would make it easy to work with Twitter at scale
       o  I use MongoDB
       o  Keep the schema simple & no fancy transformation
          •  And, as far as possible, the same as the (json) response
       o  Use the NOSQL CLI for trimming records et al
    [Rotated stamp: “The End As The Beginning”]
  • 6. Twitter Tips – A Baker’s Dozen
    5.  Always use a big data pipeline
       o  Collect - Store - Transform & Analyze - Model & Reason - Predict, Recommend & Visualize
       o  That way you can orthogonally extend, with functional components like command buffers, validation et al
    6.  Use a functional approach for a scalable pipeline
       o  Compose your big data pipeline with well-defined granular functions, each doing only one thing
       o  Don’t overload the functional components (i.e. no collect, unroll & store as a single component)
       o  Have well-defined functional components with appropriate caching, buffering, checkpoints & restart techniques
          •  This did create some trouble for me, as we will see later
    7.  Crawl-Store-Validate-Recrawl-Refresh cycle
       o  The equivalent of the traditional ETL
       o  The validation stage & validation routines are important
          •  Cannot expect perfect runs
          •  Cannot manually look at data either, when data is at scale
    8.  Have control numbers to validate runs & monitor them
       o  I still remember control numbers which start with the number of punch cards in the input deck & then follow that number through the various runs!
       o  There will be a separate printout of the control numbers that will be kept in the operations files
  • 7. Twitter Tips – A Baker’s Dozen
    9.  Program defensively
       o  More so for REST-based big data analytics systems
       o  Expect failures at the transport layer & accommodate for them
    10.  Have Erlang-style supervisors in your pipeline
       o  Fail fast & move on
       o  Don’t linger and try to fix errors that cannot be controlled at that layer
       o  A higher-layer process will circle back and do incremental runs to correct missing spiders and crawls
       o  Be aware of visibility & lack of context. Validate at the lowest layer that has enough context to take corrective actions
       o  I have an example in part 2
    11.  Data will never be perfect
       o  Know your data & accommodate for its idiosyncrasies
          •  for example: 0 followers, protected users, 0 friends, …
  • 8. Twitter Tips – A Baker’s Dozen
    12.  Checkpoint frequently (preferably after every API call) & have a restartable command buffer cache
       o  See a MongoDB example in Part 2
    13.  Don’t bombard the URL
       o  Wait a few seconds between successive calls. You will end up with a scalable system, eventually
       o  I found 10 seconds to be the sweet spot. 5 seconds gave a retry error; I was able to work with 5 seconds plus wait & retry. Then the rate limit started kicking in!
    14.  Always measure the elapsed time of your API runs & processing
       o  A kind of early warning when something is wrong
    15.  Develop incrementally; don’t fail to check “cut & paste” errors
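
A minimal sketch of tips 12-14 combined (checkpoint, wait & retry, elapsed-time logging), assuming the plain requests library against the v1 followers/ids endpoint used elsewhere in this deck; fetch_followers and the retry counts are illustrative, not the tutorial's exact code:

    import time
    import requests

    URL = "https://api.twitter.com/1/followers/ids.json"

    def fetch_followers(user_id, retries=3, wait=10):
        # Tip 14: measure elapsed time as an early-warning signal
        for attempt in range(retries):
            start = time.time()
            r = requests.get(URL, params={"user_id": user_id})
            print("attempt %d: %.2fs, HTTP %s" % (attempt, time.time() - start, r.status_code))
            if r.status_code == 200:
                return r.json()   # Tip 12: checkpoint this result before the next call
            time.sleep(wait)      # Tip 13: back off instead of bombarding the URL
        return None               # leave the gap for a later validate/re-crawl pass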
  • 9. Twitter Tips – A Baker’s Dozen
    16.  The Twitter big data pipeline has lots of opportunities for parallelism
       o  Leverage data parallelism frameworks like MapReduce
       o  But first:
          §  Prototype as a linear system,
          §  Optimize and tweak the functional modules & cache strategies,
          §  Note down stages and tasks that can be parallelized and
          §  Then parallelize them
       o  For the example project we will see later, I did not leverage any parallel frameworks, but the opportunities were clearly evident. I will point them out as we progress through the tutorial
    17.  Pay attention to handoffs between stages
       o  They might require transformation – for example, collect & store might store a user list as multiple arrays, while the model requires each user to be a document for aggregation
       o  But resist the urge to overload collect with transform
       o  i.e. let the collect stage store in arrays, but then have an unroll/flatten stage to transform the array to separate documents
       o  Add transformation as a granular function – of course, with appropriate buffering, caching, checkpoints & restart techniques
    18.  Have a good log management system to capture and wade through logs
  • 10. Twitter Tips – A Baker’s Dozen
    19.  Understand the underlying network characteristics for the inference you want to make
       o  Twitter Network != Facebook Network, Twitter Graph != LinkedIn Graph
       o  The Twitter Network is more of an Interest Network
       o  So, many of the traditional network mechanisms & mechanics, like network diameter & degrees of separation, might not make sense
       o  But others, like Cliques and Bipartite Graphs, do
  • 11. Twitter Gripes
    1.  Need richer APIs for #tags
       o  Somewhat similar to users viz. followers, friends et al
       o  Might make sense to make #tags a top-level object with its own semantics
    2.  HTTP error returns are not uniform
       o  Returns 400 Bad Request instead of 420
       o  Granted, there is enough information to figure this out
    3.  Need an easier way to get screen_name from user_id
    4.  “following” vs. “friends_count” – i.e. “following” is a dummy variable
       o  There are a few like this, most probably for backward compatibility
    5.  Parameter validation is not uniform
       o  Gives “404 Not Found” instead of “406 Not Acceptable”, “413 Too Long” or “416 Range Unacceptable”
    6.  Overall, more validation would help
       o  Granted, it is more of growing pains. Once one comes across a few inconsistencies, the rest is easy to figure out
  • 12. A Fork
    •  NLP, NLTK & a deep dive into Tweets
       o  Sentiment Analysis
    •  Not enough time for both
    •  I chose the Social Graph route
  • 13. A minute about Twitter as a platform & its evolution
    https://dev.twitter.com/blog/delivering-consistent-twitter-experience
    “The micro-blogging service must find the right balance of running a profitable business and maintaining a robust developers community.” - Chenda, CBS News
    “.. we want to make sure that the Twitter experience is straightforward and easy to understand -- whether you’re on Twitter.com or elsewhere on the web” - Michael
    My Wish & Hope
    •  I spend a lot of time with Twitter & derive value; the platform is rich & the APIs intuitive
    •  I did like the fact that tweets are part of LinkedIn. I still used Twitter more than LinkedIn
       o  I don’t think showing Tweets in LinkedIn took anything away from the Twitter experience
       o  The LinkedIn experience & Twitter experience are different & distinct. Showing tweets in LinkedIn didn’t change that
    •  I sincerely hope that the platform grows with a rich developer ecosystem
    •  An orthogonally extensible platform is essential
    •  Of course, along with a congruent user experience – “… core Twitter consumption experience through consistent tools”
  • 14. Setup
    •  For Hands-on Today
       o  Python 2.7.3
       o  easy_install -v requests
          •  http://docs.python-requests.org/en/latest/user/quickstart/#make-a-request
       o  easy_install -v requests-oauth
       o  Hands-on programs at https://github.com/xsankar/oscon2012-handson
    •  For advanced data science with social graphs
       o  easy_install -v networkx
       o  easy_install -v numpy
       o  easy_install -v nltk
          •  Not for this tutorial, but good for sentiment analysis et al
       o  MongoDB
          •  I used MongoDB on AWS m2.xlarge, RAID 10, 8 X 15 GB EBS
       o  graphviz - http://www.graphviz.org/; easy_install pygraphviz
       o  easy_install pydot
  • 15. Thanks to these Giants …
  • 16. Problem Domain – For this tutorial
    •  Data Science (trends, analytics et al) on Social Networks as observed by Twitter primitives
       o  Not for Twitter-based apps for real-time tweets
       o  Not web sites with real-time tweets
    •  By looking at the domain in aggregate to derive inferences & actionable recommendations
    •  Which also means you need to be deliberate & systematic (i.e. don’t treat a fluctuation as a trend, but dig deeper before pronouncing a trend)
  • 17. Agenda
    I.  Mechanics: Twitter API (1:30 PM - 3:00 PM)
       o  Essential Fundamentals (Rate Limit, HTTP Codes et al)
       o  Objects
       o  API
       o  Hands-on (2:45 PM - 3:00 PM)
    II.  Break (3:00 PM - 3:30 PM)
    III.  Twitter Social Graph Analysis (3:30 PM - 5:00 PM)
       o  Underlying Concepts
       o  Social Graph Analysis of @clouderati
          §  Stages, Strategies & Tasks
          §  Code Walk-thru
  • 18. Open This First
  • 19. Twitter API: Read These First
    •  Using the Twitter Brand
       o  New logo & associated guidelines: https://twitter.com/about/logos
       o  Twitter Rules: https://support.twitter.com/groups/33-report-a-violation/topics/121-guidelines-best-practices/articles/18311-the-twitter-rules
       o  Developer Rules of the Road: https://dev.twitter.com/terms/api-terms
    •  Read These Links First
       1.  https://dev.twitter.com/docs/things-every-developer-should-know
       2.  https://dev.twitter.com/docs/faq
       3.  Field Guide to Objects: https://dev.twitter.com/docs/platform-objects
       4.  Security: https://dev.twitter.com/docs/security-best-practices
       5.  Media Best Practices: https://dev.twitter.com/media
       6.  Consolidated Page: https://dev.twitter.com/docs
       7.  Streaming APIs: https://dev.twitter.com/docs/streaming-apis
       8.  How to Appeal (not that you all would need it!): https://support.twitter.com/articles/72585
    •  Only one version of the Twitter APIs
  • 20. API Status Page
    •  https://dev.twitter.com/status
    •  https://dev.twitter.com/issues
    •  https://dev.twitter.com/discussions
  • 21. https://dev.twitter.com/status
    http://www.buzzfeed.com/tommywilhelm/google-users-being-total-dicks-about-the-twitter
  • 22. Open This First
    •  Install the pre-reqs as per the setup slide
    •  Run
       o  oscon2012_open_this_first.py
       o  To test connectivity – a “canary query”
    •  Run
       o  oscon2012_rate_limit_status.py
       o  Use http://www.epochconverter.com to check reset_time
    •  Formats: xml, json, atom & rss
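
A hedged sketch of the “canary query” idea, using the v1-era rate_limit_status endpoint (oscon2012_open_this_first.py and oscon2012_rate_limit_status.py are the tutorial's own versions; the field names in the comment are those the v1 endpoint returned):

    import requests

    # Cheap connectivity check before a long run
    r = requests.get("https://api.twitter.com/1/account/rate_limit_status.json")
    print(r.status_code)
    print(r.json())   # remaining_hits, hourly_limit, reset_time, reset_time_in_seconds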
  • 23. Twitter API
    [Diagram: the four API families]
    o  Twitter REST – Core Data, Core Twitter Objects: Build Profile; Create/Post Tweets; Reply; Favorite, Re-tweet. Rate Limit: 150/350
    o  Twitter Search – Search & Trends: Keywords, Specific User, Trends. Rate Limit: Complexity & Frequency
    o  Streaming – Near-realtime, High Volume: follow users, topics, data mining. Public Streams, User Streams, Site Streams
    o  Firehose
  • 24. Rate Limit
  • 25. Rate Limits
    •  By API type & Authentication Mode

       API       | No authC                | authC  | Error
       ----------|-------------------------|--------|------
       REST      | 150/hr                  | 350/hr | 400
       Search    | Complexity & Frequency  | -N/A-  | 420
       Streaming | Up to 1% of the Firehose| none   | none
  • 26. Rate Limit Header
    {
      "status": "200 OK",
      "vary": "Accept-Encoding",
      "x-frame-options": "SAMEORIGIN",
      "x-mid": "8e775a9323c45f2a541eeb4d2d1eb9b468w81c6",
      "x-ratelimit-class": "api",
      "x-ratelimit-limit": "150",
      "x-ratelimit-remaining": "149",
      "x-ratelimit-reset": "1340467358",
      "x-runtime": "0.04144",
      "x-transaction": "2b49ac31cf8709af",
      "x-transaction-mask": "a6183ffa5f8ca943ff1b53b5644ef114df9d6bba"
    }
  • 27. Rate Limit-ed Header
    {
      "cache-control": "no-cache, max-age=300",
      "content-encoding": "gzip",
      "content-length": "150",
      "content-type": "application/json; charset=utf-8",
      "date": "Wed, 04 Jul 2012 00:48:25 GMT",
      "expires": "Wed, 04 Jul 2012 00:53:25 GMT",
      "server": "tfe",
      …
      "status": "400 Bad Request",
      "vary": "Accept-Encoding",
      "x-ratelimit-class": "api",
      "x-ratelimit-limit": "150",
      "x-ratelimit-remaining": "0",
      "x-ratelimit-reset": "1341363230",
      "x-runtime": "0.01126"
    }
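
The x-ratelimit-* headers in these dumps are all a client needs to throttle itself; a minimal sketch, assuming the modern requests API:

    import time
    import requests

    def throttled_get(url, params):
        r = requests.get(url, params=params)
        remaining = int(r.headers.get("x-ratelimit-remaining", "1"))
        reset = int(r.headers.get("x-ratelimit-reset", "0"))   # epoch seconds
        if remaining == 0:
            pause = max(reset - time.time(), 0) + 5            # pad a little past the reset
            print("rate limited; sleeping %.0f s" % pause)
            time.sleep(pause)
        return r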
  • 28. Rate Limit Example
    •  Run
       o  oscon2012_rate_limit_02.py
    •  It iterates through a list to get followers
    •  The list is 2072 long
  • 29.
    {
      …
      "date": "Wed, 04 Jul 2012 00:54:16 GMT",
      "status": "200 OK",
      "vary": "Accept-Encoding",
      "x-frame-options": "SAMEORIGIN",
      "x-mid": "f31c7278ef8b6e28571166d359132f152289c3b8",
      "x-ratelimit-class": "api",
      "x-ratelimit-limit": "150",
      "x-ratelimit-remaining": "147",
      "x-ratelimit-reset": "1341366831",
      "x-runtime": "0.02768",
      "x-transaction": "f1bafd60112dddeb",
      "x-transaction-mask": "a6183ffa5f8ca943ff1b53b5644ef11417281dbc"
    }
    [Callout: Last time, it gave me 5 min. Now the reset timer is 1 hour. 150 calls, not authenticated.]
  • 30.
    {
      "cache-control": "no-cache, max-age=300",
      "content-encoding": "gzip",
      "content-type": "application/json; charset=utf-8",
      "date": "Wed, 04 Jul 2012 00:55:04 GMT",
      …
      "status": "400 Bad Request",
      "transfer-encoding": "chunked",
      "vary": "Accept-Encoding",
      "x-ratelimit-class": "api",
      "x-ratelimit-limit": "150",
      "x-ratelimit-remaining": "0",
      "x-ratelimit-reset": "1341366831",
      "x-runtime": "0.01342"
    }
    [Callout: And the Rate Limit kicked in]
  • 31. API with OAuth
    {
      …
      "date": "Wed, 04 Jul 2012 01:32:01 GMT",
      "etag": "\"dd419c02ed00fc6b2a825cc27wbe040\"",
      "expires": "Tue, 31 Mar 1981 05:00:00 GMT",
      "last-modified": "Wed, 04 Jul 2012 01:32:01 GMT",
      "pragma": "no-cache",
      "server": "tfe",
      …
      "status": "200 OK",
      "vary": "Accept-Encoding",
      "x-access-level": "read",
      "x-frame-options": "SAMEORIGIN",
      "x-mid": "5bbb87c04fa43c43bc9d7482bc62633a1ece381c",
      "x-ratelimit-class": "api_identified",
      "x-ratelimit-limit": "350",
      "x-ratelimit-remaining": "349",
      "x-ratelimit-reset": "1341369121",
      "x-runtime": "0.05539",
      "x-transaction": "9f8508fe4c73a407",
      "x-transaction-mask": "a6183ffa5f8ca943ff1b53b5644ef11417281dbc"
    }
    [Callout: OAuth -> “api_identified”, 1 hr reset, 350 calls]
  • 32.
    {
      …
      "date": "Thu, 05 Jul 2012 14:56:05 GMT",
      …
      "x-ratelimit-class": "api_identified",
      "x-ratelimit-limit": "350",
      "x-ratelimit-remaining": "133",
      "x-ratelimit-reset": "1341500165",
      …
    }
    ******** 2416
    {
      …
      "date": "Thu, 05 Jul 2012 14:56:18 GMT",
      …
      "status": "200 OK",
      …
      "x-ratelimit-class": "api_identified",
      "x-ratelimit-limit": "350",
      "x-ratelimit-remaining": "349",
      "x-ratelimit-reset": "1341503776",
      …
    }
    ******** 2417
    [Callout: the Rate Limit resets (+1 hour) between consecutive calls]
  • 33. Unexplained Errors
    Traceback (most recent call last):
      File "oscon2012_get_user_info_01.py", line 39, in <module>
        r = client.get(url, params=payload)
      File "build/bdist.macosx-10.6-intel/egg/requests/sessions.py", line 244, in get
      File "build/bdist.macosx-10.6-intel/egg/requests/sessions.py", line 230, in request
      File "build/bdist.macosx-10.6-intel/egg/requests/models.py", line 609, in send
    requests.exceptions.ConnectionError: HTTPSConnectionPool(host=api.twitter.com, port=443): Max retries exceeded with url: /1/users/lookup.json?user_id=237552390%2C101237516%2C208192270%2C340183853%2C221203257%2C15254297%2C…
    [Callouts: While trying to get details of 1,000,000 users, I get this error – usually 10-6 AM PST. Got around it by “trap & wait 5 seconds”. Night runs are relatively error-free.]
  • 34. A Day in the Life of the Twitter Rate Limit
    {
      …
      "date": "Fri, 06 Jul 2012 03:41:09 GMT",
      "expires": "Fri, 06 Jul 2012 03:46:09 GMT",
      "server": "tfe",
      "set-cookie": "dnt=; domain=.twitter.com; path=/; expires=Thu, 01-Jan-1970 00:00:00 GMT",
      "status": "400 Bad Request",
      "vary": "Accept-Encoding",
      "x-ratelimit-class": "api_identified",
      "x-ratelimit-limit": "350",
      "x-ratelimit-remaining": "0",
      "x-ratelimit-reset": "1341546334",
      "x-runtime": "0.01918"
    }
    [Callout: Missed by 4 min!]
    Error, sleeping
    {
      …
      "date": "Fri, 06 Jul 2012 03:46:12 GMT",
      …
      "status": "200 OK",
      …
      "x-ratelimit-class": "api_identified",
      "x-ratelimit-limit": "350",
      "x-ratelimit-remaining": "349",
      …
    }
    [Callout: OK after a 5 min sleep]
  • 35. Strategies
    I have no exotic strategies, so far!
    1.  Obvious: track elapsed time & sleep when the rate limit kicks in
    2.  Combine authenticated & non-authenticated calls
    3.  Use multiple API types
    4.  Cache
    5.  Store & get only what is needed
    6.  Checkpoint & buffer request commands
    7.  Distributed data parallelism – for example, AWS instances
    http://www.epochconverter.com/ <- useful to debug the timer
    Please share your tips and tricks for conserving the Rate Limit
  • 36. Authentication
  • 37. Authentication
    •  Three modes
       o  Anonymous
       o  HTTP Basic Auth
       o  OAuth
    •  As of Aug 31, 2010, only Anonymous or OAuth are supported
    •  OAuth enables the user to authorize an application without sharing credentials
    •  It also has the ability to revoke access
    •  Twitter supports OAuth 1.0a
    •  OAuth 2.0 is the new standard, much simpler
       o  No timeframe for Twitter support, yet
  • 38. OAuth Pragmatics
    •  Helpful Links
       o  https://dev.twitter.com/docs/auth/oauth
       o  https://dev.twitter.com/docs/auth/moving-from-basic-auth-to-oauth
       o  https://dev.twitter.com/docs/auth/oauth/single-user-with-examples
       o  http://blog.andydenmark.com/2009/03/how-to-build-oauth-consumer.html
    •  A discussion of OAuth’s internal mechanisms is better left for another day
    •  For headless applications to get an OAuth token, go to https://dev.twitter.com/apps
    •  Create an application & get the four credential pieces
       o  Consumer Key, Consumer Secret, Access Token & Access Token Secret
    •  All the frameworks have support for OAuth. So plug in these values & use the framework’s calls
    •  I used the requests-oauth library like so:
  • 39. request-oauth
    import requests
    from oauth_hook import OAuthHook

    def get_oauth_client():
        # Get a client using the token, key & secret from dev.twitter.com/apps
        consumer_key = "5dbf348aa966c5f7f07e8ce2ba5e7a3badc234bc"
        consumer_secret = "fceb3aedb960374e74f559caeabab3562efe97b4"
        access_token = "df919acd38722bc0bd553651c80674fab2b465086782Ls"
        access_token_secret = "1370adbe858f9d726a43211afea2b2d9928ed878"
        header_auth = True
        oauth_hook = OAuthHook(access_token, access_token_secret, consumer_key, consumer_secret, header_auth)
        client = requests.session(hooks={'pre_request': oauth_hook})
        return client

    def get_followers(user_id):
        url = 'https://api.twitter.com/1/followers/ids.json'
        payload = {"user_id": user_id}  # if a cursor is needed: {"cursor": -1, "user_id": scr_name}
        r = requests.get(url, params=payload)

    def get_followers_with_oauth(user_id, client):
        # Use the client instead of requests
        url = 'https://api.twitter.com/1/followers/ids.json'
        payload = {"user_id": user_id}  # if a cursor is needed: {"cursor": -1, "user_id": scr_name}
        r = client.get(url, params=payload)

    Ref: http://pypi.python.org/pypi/requests-oauth
  • 40. OAuth Authorize Screen
    •  The user authenticates with Twitter & grants access to Forbes Social
    •  Forbes Social doesn’t have the user’s credentials, but uses OAuth to access the user’s account
  • 41. HTTP Status Codes
  • 42. HTTP Status Codes
    •  0 – Never made it to the Twitter servers (library error)
    •  200 OK
    •  304 Not Modified
    •  400 Bad Request
       o  Check the error message for an explanation
       o  REST Rate Limit!
    •  401 Unauthorized
       o  Beware – you could get this for other reasons as well
    •  403 Forbidden
       o  Hit an update limit (> max Tweets/day, following too many people)
    •  404 Not Found
    •  406 Not Acceptable
    •  413 Too Long
    •  416 Range Unacceptable
    •  420 Enhance Your Calm
       o  Rate Limited
    •  500 Internal Server Error
    •  502 Bad Gateway
       o  Down for maintenance
    •  503 Service Unavailable
       o  Overloaded – “Fail whale”
    •  504 Gateway Timeout
       o  Overloaded
    https://dev.twitter.com/docs/error-codes-responses
  • 43. HTTP Status Code - Example
    {
      "cache-control": "no-cache, max-age=300",
      "content-encoding": "gzip",
      "content-length": "91",
      "content-type": "application/json; charset=utf-8",
      "date": "Sat, 23 Jun 2012 00:06:56 GMT",
      "expires": "Sat, 23 Jun 2012 00:11:56 GMT",
      "server": "tfe",
      …
      "status": "401 Unauthorized",
      "vary": "Accept-Encoding",
      "www-authenticate": "OAuth realm=\"https://api.twitter.com\"",
      "x-ratelimit-class": "api",
      "x-ratelimit-limit": "0",
      "x-ratelimit-remaining": "0",
      "x-ratelimit-reset": "1340413616",
      "x-runtime": "0.01997"
    }
    {
      "errors": [
        {
          "code": 53,
          "message": "Basic authentication is not supported"
        }
      ]
    }
    [Callout: Detailed error message in JSON! I like this]
  • 44. HTTP Status Code – Confusing Example
    GET https://api.twitter.com/1/users/lookup.json?screen_nme=twitterapi,twitter&include_entities=true
    •  Spelling mistake
       o  Should be screen_name
    •  But a confusing error!
    •  Should be 406 Not Acceptable or 413 Too Long, showing a parameter error
    {
      …
      "pragma": "no-cache",
      "server": "tfe",
      …
      "status": "404 Not Found",
      …
    }
    {
      "errors": [
        {
          "code": 34,
          "message": "Sorry, that page does not exist"
        }
      ]
    }
  • 45. HTTP Status Code - Example
    {
      "cache-control": "no-cache, no-store, must-revalidate, pre-check=0, post-check=0",
      "content-encoding": "gzip",
      "content-length": "112",
      "content-type": "application/json;charset=utf-8",
      "date": "Sat, 23 Jun 2012 01:23:47 GMT",
      "expires": "Tue, 31 Mar 1981 05:00:00 GMT",
      …
      "status": "401 Unauthorized",
      "www-authenticate": "OAuth realm=\"https://api.twitter.com\"",
      "x-frame-options": "SAMEORIGIN",
      "x-ratelimit-class": "api",
      "x-ratelimit-limit": "150",
      "x-ratelimit-remaining": "147",
      "x-ratelimit-reset": "1340417742",
      "x-transaction": "d545a806f9c72b98"
    }
    {
      "error": "Not authorized",
      "request": "/1/statuses/user_timeline.json?user_id=12%2C15%2C20"
    }
    [Callout: Sometimes the errors are not correct. I got this error for user_timeline.json w/ user_id=20,15,12 – clearly a parameter error (i.e. extra parameters)]
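
A sketch of the defensive check these examples motivate: branch on the status code and surface Twitter's JSON error body (both the "errors" array and the bare "error" shapes appear above) instead of assuming success:

    def check(r):
        # r is a requests Response from any of the calls in this deck
        if r.status_code == 200:
            return r.json()
        try:
            body = r.json()
            for err in body.get("errors", [body.get("error")]):
                print(r.status_code, err)
        except ValueError:          # body wasn’t JSON: a transport-level failure
            print(r.status_code, r.text[:200])
        return None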
  • 46. Objects
  • 47. Twitter Platform Objects
    [Diagram: Users follow / are followed by Users (Followers, Friends). Users post Status Updates (Tweets), which embed Entities (@ user_mentions, urls, media, # hashtags) and may carry Places. Tweets, temporally ordered, form a TimeLine.]
    https://dev.twitter.com/docs/platform-objects
  • 48. Tweets
    •  A.k.a. Status Updates
    •  Interesting fields
       o  coordinates <- geo location
       o  created_at
       o  entities (will see later)
       o  id, id_str
       o  possibly_sensitive
       o  user (will see later)
          •  “perspectival” attributes embedded within a child object of an unlike parent – hard to maintain at scale
          •  https://dev.twitter.com/docs/faq#6981
       o  withheld_in_countries
          •  https://dev.twitter.com/blog/new-withheld-content-fields-api-responses
    https://dev.twitter.com/docs/platform-objects/tweets
  • 49. A word about id, id_str
    •  June 1, 2010
       o  Snowflake, the id generator service
       o  “The full ID is composed of a timestamp, a worker number, and a sequence number”
       o  Had problems with JavaScript’s handling of numbers > 53 bits
       o  “id”: 819797
       o  “id_str”: ”819797”
    http://engineering.twitter.com/2010/06/announcing-snowflake.html
    https://groups.google.com/forum/?fromgroups#!topic/twitter-development-talk/ahbvo3VTIYI
    https://dev.twitter.com/docs/twitter-ids-json-and-snowflake
  • 50. Tweets - example
    •  Let us run oscon2012-tweets.py
    •  Example of a tweet
       o  coordinates
       o  id
       o  id_str
  • 51. Users
    •  followers_count
    •  geo_enabled
    •  id, id_str
    •  name, screen_name
    •  protected
    •  status, statuses_count
    •  withheld_in_countries
    https://dev.twitter.com/docs/platform-objects/users
  • 52. Users – Let us run some examples
    •  Run
       o  oscon_2012_users.py
          •  Look up users by screen_name
       o  oscon12_first_20_ids.py
          •  Look up users by user_id
    •  Inspect the results
       o  id, name, status, statuses_count, protected, followers (for the top 10 followers), withheld users
    •  You can use this information for customizing the user’s screen in your web app
  • 53. Entities
    •  Metadata & contextual information
    •  You could parse tweets yourself, but Entities give you the same data already parsed out as structured data
    •  REST API/Search API – pass include_entities=1
    •  Streaming API – included by default
    •  hashtags, media, urls, user_mentions
    https://dev.twitter.com/docs/platform-objects/entities
    https://dev.twitter.com/docs/tweet-entities
    https://dev.twitter.com/docs/tco-url-wrapper
  • 54. Entities
    •  Run
       o  oscon2012_entities.py
    •  Inspect hashtags, urls et al
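
A minimal sketch of pulling pre-parsed entities for a single tweet, assuming the v1 statuses/show endpoint; the tweet id is a placeholder, and oscon2012_entities.py is the tutorial's fuller version:

    import requests

    url = "https://api.twitter.com/1/statuses/show.json"
    r = requests.get(url, params={"id": "210462857140252672",   # placeholder tweet id
                                  "include_entities": "true"})
    ents = r.json().get("entities", {})
    print([h["text"] for h in ents.get("hashtags", [])])        # structured, not regex-scraped
    print([u["expanded_url"] for u in ents.get("urls", [])])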
  • 55. Places
    •  attributes
    •  bounding_box
    •  id (as a string!)
    •  country
    •  name
    https://dev.twitter.com/docs/platform-objects/places
    https://dev.twitter.com/docs/about-geo-place-attributes
  • 56. Places
    •  Can search for tweets near a place like so:
       o  Get the lat/long of the convention center [45.52929, -122.66289]
       o  Tweets near that place
    •  Tweets near San Jose [37.395715, -122.102308]
    •  We will not go further here. But very useful
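
A sketch of “tweets near a place”, using the Search API's geocode parameter (lat,long,radius) with the convention-center coordinates from the slide:

    import requests

    url = "http://search.twitter.com/search.json"
    payload = {"q": "oscon", "geocode": "45.52929,-122.66289,1mi"}   # lat,long,radius
    r = requests.get(url, params=payload)
    for t in r.json().get("results", []):
        print(t["from_user"], ":", t["text"])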
  • 57. Timelines
    •  Collections of tweets, ordered by time
    •  Use max_id & since_id for navigation
    https://dev.twitter.com/docs/working-with-timelines
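
A minimal pagination sketch following the max_id recipe from the linked doc: take the lowest id on each page and ask for everything older than it (the screen_name is just an example; error handling elided):

    import requests

    url = "https://api.twitter.com/1/statuses/user_timeline.json"
    params = {"screen_name": "clouderati", "count": 200}
    tweets, max_id = [], None
    while True:
        if max_id:
            params["max_id"] = max_id
        page = requests.get(url, params=params).json()
        if not page:
            break
        tweets.extend(page)
        max_id = min(t["id"] for t in page) - 1   # don’t re-fetch the boundary tweet
    print(len(tweets))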
  • 58. Other Objects & APIs
    •  Lists
    •  Notifications
    •  friendships/exists, to see if one user follows another
  • 59. Twitter Platform Objects (recap)
    [Diagram: the same platform-objects picture as slide 47]
    https://dev.twitter.com/docs/platform-objects
  • 60. Hands-on Exercise (15 min)
    •  Set up the environment – slide #14
    •  Sanity-check the environment & libraries
       o  oscon2012_open_this_first.py
       o  oscon2012_rate_limit_status.py
    •  Get objects (show calls)
       o  Look up users by screen_name - oscon12_users.py
       o  Look up users by id - oscon12_first_20_ids.py
       o  Look up tweets - oscon12_tweets.py
       o  Get entities - oscon12_entities.py
    •  Inspect the results
    •  Explore a little bit
    •  Discussion
  • 61. Twitter APIs
  • 62. Twitter API (recap)
    [Diagram: the same four API families as slide 23 – REST, Search, Streaming, Firehose]
  • 63. Twitter REST API
    •  https://dev.twitter.com/docs/api
    •  What we have been using so far is the REST API
    •  Request-Response
    •  Anonymous or OAuth
    •  Rate Limited:
       o  150/350 (per hour, unauthenticated/authenticated)
  • 64. Twitter Trends
    •  oscon2012-trends.py
    •  Trends/weekly, Trends/monthly
    •  Let us run some examples
       o  oscon2012_trends_daily.py
       o  oscon2012_trends_weekly.py
    •  Trends & hashtags
       o  #hashtag euro2012
       o  http://hashtags.org/euro2012
       o  http://sproutsocial.com/insights/2011/08/twitter-hashtags/
       o  http://blog.twitter.com/2012/06/euro-2012-follow-all-action-on-pitch.html
       o  Top 10: http://twittercounter.com/pages/100, http://twitaholic.com/
  • 65. Brand Rank w/ Twitter
    •  Walk-through & results of the following
       o  oscon2012_brand_01.py
    •  Followed 10 user-brands for a few days to find growth
    •  Brand Rank
       o  Growth of a brand w.r.t. the industry
       o  A surge in popularity could be due to -ve or +ve buzz. Need to understand & correlate using Twitter APIs & metrics
    •  API: url = https://api.twitter.com/1/users/lookup.json
    •  payload = {"screen_name": "miamiheat,okcthunder,nba,uefacom,lovelaliga,FOXSoccer,oscon,clouderati,googleio,OReillyMedia"}
  • 66. Brand Rank w/ Twitter
    [Chart. Callout: Clouderati is very stable]
  • 67. Brand Rank w/ Twitter – Tech Brands
    •  Google I/O showed a spike on 6/27-6/28
    •  OReillyMedia shares some spike
    •  Looking at a few days’ worth of data, our best inference is that “oscon doesn’t track with googleio”
    •  “Clouderati doesn’t track at all”
  • 68. Brand Rank w/ Twitter – World of Soccer
    •  FOXSoccer & UEFAcom track each other
    [Callouts: The numbers seldom decrease, so calculating -ve velocity will not work. OTOH, if you see a -ve velocity, investigate.]
  • 69. Brand Rank w/ Twitter – World of Basketball
    •  NBA, MiamiHeat & okcthunder track each other
    •  Used % rather than absolute numbers to compare
    •  The hike from 7/6 to 7/10 is interesting
  • 70. Brand Rank w/ Twitter – Rising Tide …
    •  For some reason, all the numbers are going up 7/6 thru 7/10 – except for clouderati!
    •  Is a rising (Twitter) tide lifting all (well, almost all)?
  • 71. Trivia: Search API
    •  Search (search.twitter.com)
       o  Built by Summize, which was acquired by Twitter in 2008
       o  Summize described itself as “sentiment mining”
  • 72. Search API
    •  Very simple
       o  GET http://search.twitter.com/search.json?q=<blah>
    •  Based on a search criteria
    •  “The Twitter Search API is a dedicated API for running searches against the real-time index of recent Tweets”
    •  Recent = the last 6-9 days’ worth of tweets
    •  Anonymous call
    •  Rate Limit
       o  Not no. of calls/hour, but complexity & frequency
    https://dev.twitter.com/docs/using-search
    https://dev.twitter.com/docs/api/1/get/search
  • 73. Search API
    •  Filters
       o  Search terms are URL-encoded
       o  @ = %40, # = %23
       o  emoticons :) and :(
       o  http://search.twitter.com/search.atom?q=sometimes+%3A)
       o  http://search.twitter.com/search.atom?q=sometimes+%3A(
    •  Location filters, date filters
    •  Content searches
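
Rather than hand-encoding %40/%23 and the emoticons as above, requests will percent-encode query parameters for you; a small sketch:

    import requests

    # '#', '@' and ':)' can be passed verbatim; requests does the escaping
    r = requests.get("http://search.twitter.com/search.json",
                     params={"q": "#euro2012 :)", "rpp": 20})
    print(r.url)   # shows the percent-encoded query actually sent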
  • 74. Streaming API
    •  Not request-response, but a stream
    •  The Twitter frameworks have the support
    •  Rate Limit: up to 1% (of the firehose)
    •  Stall warning if the client is falling behind
    •  Good documentation links
       o  https://dev.twitter.com/docs/streaming-apis/connecting
       o  https://dev.twitter.com/docs/streaming-apis/parameters
       o  https://dev.twitter.com/docs/streaming-apis/processing
  • 75. Firehose
    •  ~400 million public tweets/day
    •  If you are working with the Twitter firehose, I envy you!
    •  If you hit real limits, then explore the firehose route
    •  AFAIK it is not cheap, but worth it
  • 76. API Best Practices
    1.  Use JSON
    2.  Use user_id rather than screen_name
       o  user_id is constant, while screen_name can change
    3.  max_id and since_id
       o  For example, for direct messages: if you have the last message, use since_id for the search
       o  max_id controls how far to go back
    4.  Cache as much as you can
    5.  Set the User-Agent header for debugging
    I have listed a few good blogs that have API best practices in the reference section, at the end of this presentation. These are gathered from various books, blogs & other media I used for this tutorial. See References (at the end) for the sources.
  • 77. Twitter API (recap)
    [Diagram: the same four API families as slide 23]
    Questions?
  • 78. Part II – Twitter Network Analysis (SNA)
  • 79.
    [Pipeline diagram: 1. Collect -> 2. Store -> 3. Transform & Analyze -> 4. Model & Reason -> 5. Predict, Recommend & Visualize]
    Most important & the ugliest slide in this deck!
    Tip: keep the schema simple; validate the dataset & don’t be afraid to re-crawl/refresh
    Tip: implement as a staged pipeline, never a monolith
  • 80. Trivia
    •  Social Network Analysis originated as Sociometry & the social network was called a sociogram
    •  Back then, Facebook was called SocioBinder!
    •  Jacob Levy Moreno is considered the originator
       o  NYTimes, April 3, 1933, p. 17
  • 81. Twitter Networks - Definitions
    •  Nodes
       o  Users
       o  #tags
    •  Edges
       o  Follows
       o  Friends
       o  @mentions
       o  #tags
    •  Directed
  • 82. Twitter Networks - Definitions
    •  In-degree
       o  Followers
    •  Out-degree
       o  Friends/Follow
    •  Centrality measures
    •  Hubs & Authorities
       o  Hubs/Directories tell us where Authorities are
       o  “Of Mortals & Celebrities” is more “Twitter-style”
  • 83. Twitter Networks - Properties
    •  Concepts from Citation Networks
       o  Cocitation
          •  Common papers that cite a paper
          •  Common Followers
          •  C & G (followed by F & H)
       o  Bibliographic Coupling
          •  Cite the same papers
          •  Common Friends (i.e. follow the same person)
          •  D, E, F & H
    [Example graph with nodes A-N]
  • 84. Twitter Networks - Properties
    •  Concepts from Citation Networks
       o  Cocitation
          •  Common papers that cite a paper
          •  Common Followers
          •  C & G (followed by F & H)
       o  Bibliographic Coupling
          •  Cite the same papers
          •  Common Friends (i.e. follow the same person)
          •  D, E, F & H follow C
          •  H & F follow C & G
             •  So H & F have high coupling
             •  Hence, if H follows A, we can recommend that F follow A (see the sketch below)
    [Same example graph with nodes A-N]
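
A sketch of bibliographic coupling as a follow recommender, assuming the crawl has already produced a friends_of dict (a hypothetical name, mapping user -> set of friends); the toy data mirrors the H/F/A example above:

    def coupling(friends_of, u, v):
        # Bibliographic coupling: how many of the same accounts u & v follow
        return len(friends_of[u] & friends_of[v])

    def recommend(friends_of, u, v):
        # If u & v are highly coupled (like H & F above), suggest to u
        # whomever v follows that u doesn't yet follow (like A)
        return friends_of[v] - friends_of[u]

    friends_of = {"H": {"A", "C", "G"}, "F": {"C", "G"}}   # toy data
    print(coupling(friends_of, "H", "F"))                  # 2
    print(recommend(friends_of, "F", "H"))                 # {'A'}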
  • 85. Twitter Networks - Properties
    •  Bipartite/Affiliation Networks
       o  Two disjoint subsets
       o  The bipartite concept is very relevant to the Twitter social graph
       o  Membership in Lists
          •  lists vs. users bipartite graph
       o  Common #tags in Tweets
          •  #tags vs. members bipartite graph
       o  @mentioned together
          •  ? Can this be a bipartite graph
          •  ? How would we fold this ?
  • 86. Other Metrics & Mechanisms
    •  Kronecker Graph Models
       o  The Kronecker product is a way of generating self-similar matrices
       o  Prof. Leskovec et al define the Kronecker product of two graphs as the Kronecker product of their adjacency matrices
       o  Application: generating models for analysis, prediction, anomaly detection et al
    •  Erdős-Rényi Random Graphs
       o  Easy to build a G(n,p) graph
       o  Assumes equal likelihood of edges between two nodes
       o  In a Twitter social network, we can create a more realistic expected distribution (adding the “social reality” dimension) by inspecting the #tags & @mentions
    •  Network Diameter
    •  Weak Ties
    •  Follower velocity (+ve & -ve), association strength
       o  Unfollow is not a reliable measure
       o  But an interesting property to investigate when it happens
    Not covered here, but potential for an encore!
    Ref: Jure Leskovec – Kronecker Graphs, Random Graphs
  • 87. Twitter Networks - Properties
    •  Twitter != LinkedIn, Twitter != Facebook
    •  Twitter Network == Interest Network
    •  Be cognizant of the above when you apply traditional network properties to Twitter
    •  For example:
       o  Six degrees of separation doesn’t make sense (most of the time) in Twitter – except maybe for Cliques
       o  Is diameter a reliable measure for a Twitter Network?
          •  Probably not
       o  Do cut sets make sense?
          •  Probably not
       o  But citation network principles do apply; we can learn from cliques
       o  Bipartite graphs do make sense
  • 88. Cliques (1 of 2)
    •  “Maximal subset of the vertices in an undirected network such that every member of the set is connected by an edge to every other”
    •  A cohesive subgroup, closely connected
    •  Near-cliques rather than a perfect clique (k-plex, i.e. connected to at least n-k others)
    •  k-plex cliques to discover subgroups in a sparse network; a 1-plex being the perfect clique
    Ref: Networks, An Introduction - Newman
  • 89. Cliques (2 of 2)
    •  k-core – at least k others in the subset; an (n-k)-plex
    •  k-clique – no more than distance k away
       o  Path inside or outside the subset
       o  k-clan or k-club (path inside the subset)
    •  We will apply k-plex cliques for one of our hands-on exercises (a minimal clique-finding sketch follows)
    Ref: Networks, An Introduction - Newman
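
A minimal networkx sketch of maximal-clique discovery on an undirected strong-ties graph (the tutorial's oscon2012_find_cliques_01.py does this at scale; the edge list here is toy data):

    import networkx as nx

    G = nx.Graph()   # cliques are defined on undirected networks, per the definition above
    G.add_edges_from([("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")])   # toy strong ties

    for clique in nx.find_cliques(G):   # maximal cliques (Bron-Kerbosch)
        print(clique)                   # e.g. ['a', 'b', 'c'] and ['c', 'd']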
  • 90. Sentiment Analysis
    •  Sentiment Analysis is important & interesting work on the Twitter platform
       o  Collect Tweets
       o  Opinion estimation - pass through a classifier & sentiment lexicons
          •  Naïve Bayes/Max Entropy Classifier/SVM
       o  Aggregated text sentiment/moving average
    •  I chose not to dive deeper because of time constraints
       o  Couldn’t do justice to the API, Social Network and Sentiment Analysis, all in 3 hrs
    •  The next 3 slides have a couple of interesting examples
  • 91. Sentiment Analysis
    •  Twitter Mining for Airline Sentiment
    •  Opinion Lexicon: +ve 2000, -ve 4800
    http://www.inside-r.org/howto/mining-twitter-airline-consumer-sentiment
    http://sentiment.christopherpotts.net/lexicons.html#opinionlexicon
  • 92. Need I say more?
    “A bit of clever math can uncover interesting patterns that are not visible to the human eye”
    http://www.economist.com/blogs/schumpeter/2012/06/tracking-social-media?fsrc=scn/gp/wl/bl/moodofthemarket
    http://www.relevantdata.com/pdfs/IUStudy.pdf
  • 93. Project Ideas
  • 94. Interesting Vectors of Exploration
    1.  Find trending #tags & then related #tags – using cliques over co-#tag-citation, which infers topics related to trending topics
    2.  Related #tag topics over a set of tweets by a user or group of users
    3.  Analysis of in/out flow, tweet flow
       –  Frequent @mentions
    4.  Find affiliation networks by list memberships, #tags or frequent @mentions
  • 95. Interesting Vectors of Exploration
    5.  Use centrality measures to determine mortals vs. celebrities
    6.  Classify tweet networks/cliques based on message-passing characteristics
       –  Tweets vs. retweets, number of retweets, …
    7.  Retweet network
       –  Measure influence by retweet count & frequency
       –  Information contagion, by looking at different retweet network subcomponents – who, when, how much, …
  • 96. Twitter Network Graph Analysis – An Example
  • 97. Analysis Story Board
    •  @clouderati is a popular cloud-related Twitter account
    •  Goals:
       o  In this tutorial: analyze the social graph characteristics of the users who are following the account
          •  Dig one level deep, to the followers & friends of the followers of @clouderati
       o  For you to explore:
          •  How many cliques? How strong are they?
          •  Does the @mention data support the clique inferences?
          •  What are the retweet characteristics?
          •  How does the #tag network graph look?
  • 98. Twitter Analysis Pipeline Story Board – Stages, Strategies, APIs & Tasks
    o  Stage 1: Get @clouderati followers
    o  Stage 2: Crawl one level deep – get friends & followers
    o  Stage 3: Get the distinct user list, applying the set(union(list)) operation
    o  Stage 4: Get & store user details (distinct user list); unroll
       [Note: needed a command buffer to manage scale & missteps (~980,000 users); the unroll stage took time]
    o  Stage 5: For each @clouderati follower, find the friend=follower intersection – a set operation
    o  Stage 6: Create the social graph; apply network theory; infer cliques & other properties
  • 99. @clouderati Twitter Social Graph
    •  Stats (in retrospect, after the runs):
       o  Stage 1
          •  @clouderati has 2072 followers
       o  Stage 2
          •  Limiting followers to 5,000 per user
       o  Stage 3
          •  Digging to the 1st level (set union of followers & friends of the followers of @clouderati) explodes into ~980,000 distinct users
       o  MongoDB of the cache and intermediate datasets: ~10 GB
       o  The database was hosted at AWS (Hi-Mem XLarge – m2.xlarge), 8 X 15 GB, RAID 10, opened to the Internet with DB authentication
  • 100. Code & Run Walk-Through – Stage 1
    o  Get @clouderati followers; store in MongoDB
    o  Code:
       §  oscon_2012_user_list_spider_01.py
    o  Challenges:
       §  Nothing fancy
       §  Get the record and store it
    o  Interesting points:
       §  Would have had to recurse through a REST cursor if there were more than 5000 followers
       §  @clouderati has 2072 followers
  • 101. Code & Run Walk-Through – Stage 2
    o  Crawl 1 level deep: get friends & followers; validate, re-crawl & refresh
    o  Code:
       §  oscon_2012_user_list_spider_02.py
       §  oscon_2012_twitter_utils.py
       §  oscon_2012_mongo.py
       §  oscon_2012_validate_dataset.py
    o  Challenges:
       §  Multiple runs, errors et al!
    o  Interesting points:
       §  Set operation between two mongo collections for the restart buffer
       §  Protected users; some had 0 followers or 0 friends
       §  Interesting operations for validate, re-crawl and refresh
       §  Added “status_code” to differentiate protected users
       §  {$set: {"status_code": "401 Unauthorized"}}
       §  Getting the friends & followers of 2000 users is the hardest (or so I thought, until I got through the next stage!)
  • 102. Validate-Recrawl-Refresh Logs
    pymongo version = 2.2
    Connected to DB!
    …
    2075
    Error Friends: <type 'exceptions.KeyError'>
    4ff3cd40e5557c00c7000000 - none has 2072 followers & 0 friends
    Error Friends: <type 'exceptions.KeyError'>
    4ff3a958e5557cfc58000000 - none has 2072 followers & 0 friends
    Error Friends: <type 'exceptions.KeyError'>
    4ff3ccdee5557c00b6000000 - none has 2072 followers & 0 friends
    4ff3d3b9e5557c01b900001e - 371187804 has 0 followers & 0 friends
    4ff3d3d8e5557c01b9000048 - 63488295 has 155 followers & 0 friends
    4ff3d3d9e5557c01b9000049 - 342712617 has 0 followers & 0 friends
    4ff3d3d9e5557c01b900004a - 21266738 has 0 followers & 0 friends
    4ff3d3dae5557c01b900004b - 204652853 has 0 followers & 0 friends
    …
    4ff475cfe5557c1657000074 - 258944989 has 0 followers & 0 friends
    4ff475d3e5557c165700007d - 327286780 has 0 followers & 0 friends
    Looks like we have 132 not-so-good records
    Elapsed Time = 0.546846
    [Callouts: 1st run – 132 bad records. This is the classic Erlang-style supervisor: the crawl continues on transport errors without worrying about retry; validate will re-crawl & refresh as needed.]
  • 103. Code & Run Walk-Through – Stage 3
    o  Get the distinct user list, applying the set(union(list)) operation
    o  Code:
       §  oscon2012_analytics_01.py
    o  Challenges:
       §  Figuring out the right set operations
    o  Interesting points:
       §  973,323 unique users!
       §  Recursively apply set union over ~400,000 lists
       §  The set operations took slightly more than a minute
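
The set(union(list)) step itself is only a few lines of Python; a sketch, assuming the follower/friend id arrays have been pulled out of MongoDB into id_lists (a hypothetical name, with toy data):

    # id_lists: iterable of follower/friend id arrays, one per crawled user
    id_lists = [[1, 2, 3], [2, 3, 4], [5]]   # toy stand-in for the ~400,000 real lists

    distinct = set()
    for ids in id_lists:
        distinct.update(ids)   # incremental set union; cheap even at this scale
    print(len(distinct))       # ~973,323 for the @clouderati crawl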
  • 104. Code & Run Walk-Through – Stage 4
    o  Get & store user details (distinct user list); unroll
    o  Code:
       §  oscon2012_analytics_01.py (focus on the cmd string creation)
       §  oscon2012_get_user_info_01.py
       §  oscon2012_unroll_user_list_01.py
       §  oscon2012_unroll_user_list_02.py
    o  Challenges:
       §  Where do I start?
          •  In the next few slides
       §  Took me a few days to get it right (along with my day job!)
       §  Unfortunately I did not employ parallelism & didn’t use my MacPro with 32 GB of memory. So the runs were long
       §  But I learned hard lessons on checkpoint & restart
    o  Interesting points:
       §  Tracking control numbers
       §  Time … a marathon unroll run of 19:33:33!
  • 105. Twitter @ Scale Pattern
    •  Challenge:
       o  You want to get screen names, follower counts and other details for a million users
    •  Problem:
       o  No easy REST API
       o  https://api.twitter.com/1/users/lookup.json will take 100 user_ids and give details
    •  Solution:
       o  This is a scalability challenge. Approach it like so:
       o  Create a command buffer collection in MongoDB, splitting the million user_ids into batches of 100
       o  Have a “done” flag, initialized to 0, for checkpoint & restart
       o  After each cmd str is executed, set “done”: 1
       o  For subsequent runs, ignore “done”: 1
       o  Also helps in control number tracking
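
A hedged pymongo sketch of this command-buffer pattern (modern pymongo API; the collection and field names follow the slides, the ids are a stand-in):

    from pymongo import MongoClient

    db = MongoClient().oscon2012
    user_ids = list(range(1000))   # stand-in for the ~1,000,000 distinct user_ids

    # 1. Split into batches of 100 & store as a restartable command buffer
    for seq, i in enumerate(range(0, len(user_ids), 100)):
        batch = ",".join(str(u) for u in user_ids[i:i + 100])
        db.api_str.insert_one({"seq_no": seq, "api_str": batch, "done": 0})

    # 2. On every run (first or restart), process only commands not yet done
    for cmd in db.api_str.find({"done": 0}):
        # ... call users/lookup.json?user_id=<cmd["api_str"]>, store the response ...
        db.api_str.update_one({"_id": cmd["_id"]}, {"$set": {"done": 1}})

    # 3. Control numbers: done vs. total, as on the next slides
    print(db.api_str.count_documents({"done": 1}), db.api_str.count_documents({}))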
  • 106. Control Numbers
  • 107. Control Numbers
    > db.t_users_info.count()
    8122
    > db.api_str.count({"done":0,"seq_no":{"$lt":8185}})
    63
    > db.api_str.find({"done":0,"seq_no":{"$lt":8185}},{"seq_no":1})
    { "_id" : ObjectId("4ff4daeae5557c28bf001d53"), "seq_no" : 5433 }
    { "_id" : ObjectId("4ff4daeae5557c28bf001d59"), "seq_no" : 5439 }
    { "_id" : ObjectId("4ff4daeae5557c28bf001d5f"), "seq_no" : 5445 }
    { "_id" : ObjectId("4ff4daebe5557c28bf001d74"), "seq_no" : 5466 }
    { "_id" : ObjectId("4ff4daece5557c28bf001d7a"), "seq_no" : 5472 }
    { "_id" : ObjectId("4ff4daece5557c28bf001d80"), "seq_no" : 5478 }
    { "_id" : ObjectId("4ff4daede5557c28bf001d90"), "seq_no" : 5494 }
    { "_id" : ObjectId("4ff4daefe5557c28bf001daf"), "seq_no" : 5525 }
    [Callout: The collection should have 8185 documents, but it has only 8122. Where did the rest go? 63 of them still have done=0. 8122 + 63 = 8185! Aha, mystery solved – they fell through the cracks. Need a catch-all final run.]
  • 108. A Day in the Life of a Control Number Detective – Run #1
    •  Remember: 973,323 users. So, 9734 cmd strings (100 users per string)
    > db.api_str.count()
    9831
    > db.api_str.count({"done":0})
    239
    > db.t_users_info.count()
    9592
    > db.api_str.count({"api_str":""})
    97
    •  So we should have 9831 - 97 = 9734 records
    •  The second run should generate 9734 - 9592 = 142 calls (i.e. 350 - 142 = 208 rate-limit calls should remain). Let us see.
    {
      …
      "x-ratelimit-class": "api_identified",
      "x-ratelimit-limit": "350",
      "x-ratelimit-remaining": "209",
      …
    }
    •  Yep, 209 left
  • 109. A Day in the Life of a Control Number Detective – Run #2
    •  Remember: 973,323 users. So, 9734 cmd strings (100 users per string)
    > db.t_users_info.count()
    9728
    > db.api_str.count({"api_str":""})
    97
    > db.api_str.count({"done":0})
    103
    •  9734 - 9728 = 6, same as 103 - 97!
    •  Run once more!
    > db.api_str.find({"done":0},{"seq_no":1})
    …
    { "_id" : ObjectId("4ff4dbd4e5557c28bf002e22"), "seq_no" : 9736 }
    { "_id" : ObjectId("4ff4db05e5557c28bf001f47"), "seq_no" : 5933 }
    { "_id" : ObjectId("4ff4db8be5557c28bf0028f6"), "seq_no" : 8412 }
    { "_id" : ObjectId("4ff4dba2e5557c28bf002a8c"), "seq_no" : 8818 }
    { "_id" : ObjectId("4ff4dbaee5557c28bf002b69"), "seq_no" : 9039 }
    { "_id" : ObjectId("4ff4dbb8e5557c28bf002c1c"), "seq_no" : 9218 }
    …
    {
      …
      "x-ratelimit-limit": "350",
      "x-ratelimit-remaining": "344",
      …
    }
    •  Yep, 6 more records
    > db.t_users_info.count()
    9734
    •  Good, got 9734!
    Professor Layton would be proud! In fact, I have all four & plan to spend some time with them & Laphroaig!
  • 110. Monitor runs & track control numbers
    Unroll run: 8:48 PM to ~4:08 PM the next day!
  • 111. Track errors & the document numbers
  • 112. Code & Run Walk-Through – Stage 5
    o  For each @clouderati follower, find friend=follower – a set intersection
    o  Code:
       §  oscon2012_find_strong_ties_01.py
       §  oscon2012_social_graph_stats_01.py
    o  Challenges:
       §  None. Python set operations made this easy
    o  Interesting points:
       §  Even at this scale, a single machine is not enough
       §  Should have tried data parallelism
          •  This task is well suited to leverage data parallelism, as it is commutative & associative
          •  Was getting an invalid cursor error from MongoDB
          •  So had to do the updates in two steps
  • 113. Code & Run Walk-Through – Stage 6
    o  Create the social graph; apply network theory; infer cliques & other properties
    o  Code:
       §  oscon2012_find_cliques_01.py
    o  Challenges:
       §  Lots of good information hidden in the data!
       §  Memory!
    o  Interesting points:
       §  Graph, list & set operations
       §  networkx has lots of interesting graph algorithms
       §  collections.Counter to the rescue (see the sketch below)
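
A sketch of how collections.Counter comes to the rescue for the numbers on the next slide: count each user's clique memberships and the clique-size distribution (the clique list here is toy data):

    from collections import Counter

    # cliques: list of member lists, e.g. from nx.find_cliques(G)
    cliques = [["GeorgeReese", "krishnan", "samj"], ["GeorgeReese", "randy"]]

    membership = Counter(u for c in cliques for u in c)
    print(membership.most_common(3))   # who shows up in the most cliques

    size_dist = Counter(len(c) for c in cliques)
    print(size_dist)                   # the “Clique Distribution” on the results slide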
• 114. Twitter Social Graph Analysis of @clouderati
  o 2072 followers; 973,323 unique users one level down, w/ followers/friends trimmed at 5,000
  o Strong ties
    § follower=friend
    § 235,697 users, 462,419 edges
  o 501,367 cliques
    § 253 unique users in the 8,906 cliques w/ > 10 users
    § GeorgeReese in 7,973 of them! See the list for the 1st 125
    § krishnan 3,446, randy 2,197, joe 1,977, sam 1,937, jp 485, stu 403, urquhart 263, beaker 226, acroll 149, adrian 63, gevaperry 24
  o Of course, clique analysis does not tell us the whole story …
  Clique Distribution = {2: 296521, 3: 58368, 4: 36421, 5: 28788, 6: 24197, 7: 20240, 8: 15997, 9: 11929, 10: 6576, 11: 1909, 12: 364, 13: 55, 14: 2}
• 115. Twitter Social Graph Analysis of @clouderati
  o Sort by followers vs. sort by strong ties is interesting
  o (Chart annotations) Celebrity: very low strong ties; Higher celebrity: low strong ties; Medium celebrity: medium strong ties
• 116. Twitter Social Graph Analysis of @clouderati
  o A higher "Strong Ties" number is interesting (see the sketch below)
    § It means a very high follower-friend intersection
    § Reeves 62%, bgolden 85%
  o But a high clique count with a smaller "Strong Ties" number shows a more cohesive & stronger social graph
    § e.g. Krishnan – 15% friends-followers
    § Samj – 33%
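The percentages are simply the follower-friend intersection as a share of followers. An illustrative fragment; the id sets below are toy stand-ins.

```python
# Strong-ties ratio: what share of a user's followers are also friends.
# Illustrative only; the id sets below are toy stand-ins.
followers = {101, 102, 103, 104, 105, 106, 107, 108}
friends = {101, 102, 103, 104, 105, 201, 202}

ratio = len(followers & friends) / len(followers)
print(f"strong-ties ratio: {ratio:.0%}")   # 62%, like Reeves above
```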
• 117. Twitter Social Graph Analysis of @clouderati
  o Ideas for more exploration
    § Include all followers (instead of stopping at the 5,000 cap)
    § Get tweets & track @mentions
    § Frequent @mentions show stronger ties
    § #tag analysis could show some interesting networks
• 118. Twitter Tips – A Baker’s Dozen (The Beginning As The End)
  1. Twitter APIs are (more or less) congruent & symmetric
  2. Twitter is usually right & simple – recheck when you get unexpected results before blaming Twitter
    o I was getting numbers when I was expecting screen_names in user objects
    o Was ready to send blasting e-mails to the Twitter team. Decided to check one more time and found that my parameter key was wrong: screen_name instead of user_id
    o Always test with one or two records before a long run! – learned the hard way
  3. Twitter APIs are very powerful – consistent use can bear huge data
    o In a week, you can pull in 4-5 million users & some tweets!
    o Night runs are far faster & more error-free
  4. Use a NOSQL data store as a command buffer & data buffer (see the sketch below)
    o Would make it easy to work with Twitter at scale
    o I use MongoDB
    o Keep the schema simple & no fancy transformation
      • And, as far as possible, the same as the (json) response
    o Use the NOSQL CLI for trimming records et al
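A hedged sketch of tip 4: seed a MongoDB command buffer with one lookup URL per 100 user ids, keeping the schema close to the raw shape. The collection and field names follow the earlier mongo sessions; the v1 users/lookup endpoint and the database name are assumptions for illustration.

```python
# Tip 4 as code: pre-generate one users/lookup command string per 100 ids
# into a MongoDB command buffer, schema kept close to the raw shape.
# Sketch only; the endpoint URL and database name are assumptions.
from pymongo import MongoClient

db = MongoClient()["twitter_analysis"]  # hypothetical database name
BASE = "https://api.twitter.com/1/users/lookup.json?user_id="

user_ids = [str(i) for i in range(1, 973324)]   # stand-in for the real id list
for seq_no, start in enumerate(range(0, len(user_ids), 100)):
    batch = user_ids[start:start + 100]         # 973,323 ids -> 9,734 strings
    db.api_str.insert_one({"seq_no": seq_no,
                           "api_str": BASE + ",".join(batch),
                           "done": 0})
```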
• 119. Twitter Tips – A Baker’s Dozen
  5. Always use a big data pipeline
    o Collect – Store – Transform & Analyze – Model & Reason – Predict, Recommend & Visualize
    o That way you can orthogonally extend, with functional components like command buffers, validation et al
  6. Use a functional approach for a scalable pipeline (see the sketch below)
    o Compose your big data pipeline with well-defined granular functions, each doing only one thing
    o Don’t overload the functional components (i.e. no collect, unroll & store as a single component)
    o Have well-defined functional components with appropriate caching, buffering, checkpoints & restart techniques
      • This did create some trouble for me, as we will see later
  7. Crawl-Store-Validate-Recrawl-Refresh cycle
    o The equivalent of the traditional ETL
    o Validation stage & validation routines are important
      • Cannot expect perfect runs
      • Cannot manually look at data either, when data is at scale
  8. Have control numbers to validate runs & monitor them
    o I still remember control numbers which started with the number of punch cards in the input deck & then followed that number through the various runs!
    o There would be a separate printout of the control numbers that was kept in the operations files
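Tip 6 in miniature, as promised above: each stage is a small function with one job, and the pipeline is their explicit composition. A hedged sketch; the stage names mirror tip 5 and the bodies are placeholders.

```python
# Tip 6 in miniature: granular, single-purpose stages composed explicitly,
# so buffering / checkpointing can be added per stage. Placeholder bodies.
def collect(seed):                  # pull raw API responses
    return [{"ids": [seed, seed + 1]}]

def store(batches):                 # persist, close to the raw (json) shape
    return batches

def unroll(batches):                # flatten arrays: one record per user
    return [uid for batch in batches for uid in batch["ids"]]

def analyze(users):                 # compute only what this stage owns
    return {"count": len(users)}

result = 1000
for stage in (collect, store, unroll, analyze):
    result = stage(result)          # each handoff is a checkpoint boundary
print(result)                       # {'count': 2}
```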
• 120. Twitter Tips – A Baker’s Dozen
  9. Program defensively (see the sketch below)
    o More so for REST-based big data analytics systems
    o Expect failures at the transport layer & accommodate for them
  10. Have Erlang-style supervisors in your pipeline
    o Fail fast & move on
    o Don’t linger and try to fix errors that cannot be controlled at that layer
    o A higher-layer process will circle back and do incremental runs to correct missing spiders and crawls
    o Be aware of visibility & lack of context. Validate at the lowest layer that has enough context to take corrective actions
    o I have an example in part 2
  11. Data will never be perfect
    o Know your data & accommodate for its idiosyncrasies
      • For example: 0 followers, protected users, 0 friends, …
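Tips 9 and 10 together, sketched: catch transport-layer failures, defer rather than fix at the wrong layer, and let the catch-all run circle back. fetch_batch is a hypothetical worker name, not from the tutorial's scripts.

```python
# Tips 9 & 10 as code: expect transport failures, fail fast, and leave the
# command at done=0 so a supervisor-style catch-all run retries it later.
# Minimal sketch; fetch_batch is a hypothetical worker over the buffer.
import requests

def fetch_batch(cmd):
    """Defensive fetch: transport failures are expected, not fatal."""
    try:
        resp = requests.get(cmd["api_str"], timeout=30)
        resp.raise_for_status()            # 4xx/5xx become exceptions
        return resp.json()                 # success; the caller marks done=1
    except requests.RequestException as exc:
        # Fail fast & move on (tip 10): log it, don't try to fix it here;
        # the higher-layer incremental run will circle back
        print("seq_no", cmd["seq_no"], "deferred:", exc)
        return None
```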
• 121. Twitter Tips – A Baker’s Dozen
  12. Checkpoint frequently (preferably after every API call) & have a restartable command buffer cache
    o See a MongoDB example in Part 2
  13. Don’t bombard the URL (see the sketch below)
    o Wait a few seconds between successive calls. You will end up with a scalable system, eventually
    o I found 10 seconds to be the sweet spot. 5 seconds gave retry errors; I was able to work with 5 seconds with wait & retry, but then the rate limit started kicking in!
  14. Always measure the elapsed time of your API runs & processing
    o Kind of an early warning when something is wrong
  15. Develop incrementally; don’t fail to check “cut & paste” errors
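Tips 12-14 in one loop: checkpoint after every call, throttle at the 10-second sweet spot, and time each call as an early-warning signal. A hedged sketch; it inlines the defensive worker from tip 9's sketch, and the slow-call threshold is an assumption.

```python
# Tips 12-14 in one loop: checkpoint after every call, throttle between
# calls, and time each call as an early warning. Hedged sketch; collection
# and field names follow the earlier mongo sessions.
import time
import requests
from pymongo import MongoClient

db = MongoClient()["twitter_analysis"]  # hypothetical database name

def fetch_batch(cmd):                   # defensive worker, as in tip 9's sketch
    try:
        resp = requests.get(cmd["api_str"], timeout=30)
        resp.raise_for_status()
        return resp.json()
    except requests.RequestException:
        return None                     # stays done=0 for the catch-all run

# list() materializes the cursor first: updating documents mid-iteration is
# what produced the invalid-cursor errors mentioned in the Stage 5 slide
for cmd in list(db.api_str.find({"done": 0, "api_str": {"$ne": ""}})):
    t0 = time.time()
    users = fetch_batch(cmd)
    elapsed = time.time() - t0
    if elapsed > 15:                    # slow-call threshold is an assumption
        print("slow call, seq_no", cmd["seq_no"], f"{elapsed:.1f}s")
    if users is not None:               # checkpoint immediately (tip 12)
        db.t_users_info.insert_one({"seq_no": cmd["seq_no"], "users": users})
        db.api_str.update_one({"_id": cmd["_id"]}, {"$set": {"done": 1}})
    time.sleep(10)                      # don't bombard the URL (tip 13)
```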
• 122. Twitter Tips – A Baker’s Dozen
  16. The Twitter big data pipeline has lots of opportunities for parallelism
    o Leverage data parallelism frameworks like MapReduce
    o But first:
      § Prototype as a linear system,
      § Optimize and tweak the functional modules & cache strategies,
      § Note down the stages and tasks that can be parallelized, and
      § Then parallelize them
    o For the example project, as we will see later, I did not leverage any parallel frameworks, but the opportunities were clearly evident. I will point them out as we progress through the tutorial
  17. Pay attention to handoffs between stages (see the unroll sketch below)
    o They might require transformation – for example, collect & store might store a user list as multiple arrays, while the model requires each user to be a document for aggregation
    o But resist the urge to overload collect with transform
    o i.e. let the collect stage store in arrays, but then have an unroll/flatten stage to transform the arrays into separate documents
    o Add transformation as a granular function – of course, with appropriate buffering, caching, checkpoints & restart techniques
  18. Have a good log management system to capture and wade through logs
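A hedged sketch of tip 17's unroll stage: a separate granular function that flattens the stored arrays into one document per user. t_users_info follows the earlier sessions; the "users" field and the t_users target collection are assumptions.

```python
# Tip 17's unroll stage as its own granular function: flatten stored
# user-batch arrays into one document per user, ready for aggregation.
# Sketch; the "users" field and t_users target collection are assumptions.
from pymongo import MongoClient

db = MongoClient()["twitter_analysis"]  # hypothetical database name

for batch in db.t_users_info.find({}, {"seq_no": 1, "users": 1}):
    docs = [{"cmd_seq_no": batch["seq_no"], **user}
            for user in batch.get("users", [])]
    if docs:
        db.t_users.insert_many(docs)    # one document per user
```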
• 123. Twitter Tips – A Baker’s Dozen
  19. Understand the underlying network characteristics for the inference you want to make
    o Twitter Network != Facebook Network, Twitter Graph != LinkedIn Graph
    o The Twitter Network is more of an Interest Network
    o So, many of the traditional network mechanisms & mechanics, like network diameter & degrees of separation, might not make sense
    o But others, like cliques and bipartite graphs, do
• 124. Twitter Gripes
  1. Need richer APIs for #tags
    o Somewhat similar to users, viz. followers, friends et al
    o Might make sense to make #tags a top-level object with its own semantics
  2. HTTP error returns are not uniform
    o Returns “400 Bad Request” instead of 420
    o Granted, there is enough information to figure this out
  3. Need an easier way to get screen_name from user_id
  4. “following” vs. “friends_count” – i.e. “following” is a dummy variable
    o There are a few like this, most probably for backward compatibility
  5. Parameter validation is not uniform
    o Gives “404 Not Found” instead of “406 Not Acceptable” or “416 Range Unacceptable”
  6. Overall, more validation would help
    o Granted, it is more of growing pains. Once one comes across a few inconsistencies, the rest is easy to figure out
• 125.–129. Thanks To these Giants … (image credit slides)
  • 130. I had a good time researching & preparing for this Tutorial. I hope you learned a few new things & have a few vectors to follow