Internet Archives and Social Science Research - Yeungnam University

Talk given at Yeungnam University on April 6, 2014



  1. BIG DATA AND SOCIAL SCIENCE THEORY
     Leveraging Large Scale Data to Discover New Patterns in Society
     Monday, April 7, 2014
     Cybermotions @ Korea, Yeungnam University
     Matthew Weber, Rutgers University, School of Communication & Information
  2. Opportunity: The Internet Archive contains the largest single record of the history of the World Wide Web from 1995 to the present, a wealth of untapped research data.
     Challenge: There is a significant lack of research-ready databases and tools available to the scholarly community.
  3. © Internet Archive 2013
  4. © Internet Archive 2013
  10. Opportunity: The ArchiveHub project aims to support the creation and dissemination of general guidelines & tools for conducting theoretically and methodologically rigorous longitudinal research using archival Web data.
  14. Datasets

      | Dataset | Research Potential | Dates | Captures | Unique URLs |
      | --- | --- | --- | --- | --- |
      | Hurricane Katrina | Online networks and organizational resilience (Chewning, Lai and Doerfel, 2012; Perry, Taylor and Doerfel, 2003) in the wake of disasters; information dissemination | 2003 – 2012 | 1,694,236 | 663,740 |
      | Superstorm Sandy | (as above) | 2003 – 2012 | 41,703,112 | 20,013,455 |
      | US Senate | Study the growth of political activity in online environments (Adamic & Glance, 2005; Bruns, 2007; Chang & Park, 2012); polarization & media discourse | 109th – 112th Congresses | 26,965,770 | 8,674,397 |
      | US House | (as above) | (as above) | 51,840,777 | 12,410,014 |
      | Occupy Wall Street | Previous research on NGOs in the online environment (Bach & Stark, 2004; Shumate, 2003, 2012; Shumate, Fulk, & Monge, 2005); use of hyperlink data to study the formation and role of alliances between SMOs | 2010 – 2012 | 247,928,272 | 113,259,655 |
      | US Media | Previous studies of news media organizations (Greer & Mensing, 2006; Weber, 2012; Weber & Monge, In Press); focus on evolutionary patterns | 2008 – 2012 | 1,315,132,555 | 539,184,823 |
  15. http://archivehub.rutgers.edu
  17. Tracing the Emergence of Organizational Forms
      Environment:
        • Organizations compete for scarce resources; during rapid periods of disruption, new entrants seek “protected” niches (Weber & Monge, 2014)
      Population:
        • In digital spaces, online connections provide communicative representations of information flows (Weber & Monge, 2012)
        • Formation of ties (e.g. hyperlinks) can positively impact long-term likelihood of organization survival (Weber, 2012)
      Organization:
        • Organizations adapt internally, reconfiguring team structures and developing new routines for knowledge sharing (Ellison, Gibbs & Weber, In Press; Weber & Kim, Under Review)
  19. Big Data… Big Theory?
        • Networks are central to social movements in that links between nodes can be influential in collective action
        • Examples of nodes include participants, organizations, media and communications technologies
        • Social networks and social movements (Diani, 2003)
        • The interactions between actors, and between actors and hashtags, collectively represent a networked form of organization
        • Network form of organization (Powell, 1990)
  20. H1: Over time, dyadic communication will become prevalent in an emerging networked organization.
      H2: As a social movement develops as an emerging network form of organization, the organizational structure will be increasingly clustered.
  21. Data
        • Triangulation of data insulates against false readings from large-scale data (see Lazer, Kennedy, King and Vespignani, 2014)
        • Internet Archive:
            – 14 websites; 4,504 hyperlink dyads over a 2-month period.
        • Lexis Nexis:
            – Search conducted to assess U.S. newspaper coverage of OWS from the early stages of the movement in September 2011 through Sept. 2012
            – Search OWS keywords, e.g. “Occupy Wall Street,” “Occupy Oakland”
        • Twitter:
            – Gnip PowerTrack
                • Search by keywords; captures a larger volume of Twitter data than other options
            – Sample includes October 17, 2011, through January 5, 2012. Initial study focused on the critical two-month period from November 1 through December 31, 2011.
            – 750,816 tweets across the two-month period.
  22. OWS News Coverage
  23. OWS on the Web
        • 335 seed organizations based on records from #OccupyResearch
        • Data extracted for 2011 & 2012, based on “both matching”
      [Figure: Captures per Month (millions)]
  24. Maximal Cores (k Coreness)
      [Figure: panels for Aug. 2011 and Jan. 2012]
  25. [Figure: Edges and Vertices]
  26. [Figure: Density]
  27. [Figure: Clusters]
  29. Implications
        • Big Data:
            – Guiding data collection with theoretically grounded questions avoids the “needle-in-the-haystack” problem
            – Leverage advances in computing with existing theories to develop robust studies of social science phenomena
        • Big Theory:
            – Expanding prior theories on networked organizational forms and form emergence (evolutionary)
            – Building toward a macro theory of organizational form emergence based on resource availability and networks
  30. Want data?
        • Email me! matthew.weber@rutgers.edu
        • ArchiveHub: http://archivehub.rutgers.edu
      Collaborators
        • Kris Carpenter & Vinay Goel, Internet Archive
        • David Lazer, Northeastern University
      Research supported by NSF Award #1244727 and the NetSCI Lab @ Rutgers
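
The network measures charted on slides 24 to 27 (k-coreness, edges and vertices, density, clusters) can be illustrated on a toy hyperlink network. The sketch below is not the talk's analysis pipeline. The site names are hypothetical, hyperlinks are treated as undirected ties, and "clusters" is approximated by connected components, which is an assumption because the slides do not say which clustering method was used.

```python
def build_undirected(dyads):
    """Build adjacency sets from (source, target) hyperlink dyads.

    Direction and self-links are ignored (an assumption for this sketch).
    """
    adj = {}
    for src, dst in dyads:
        adj.setdefault(src, set())
        adj.setdefault(dst, set())
        if src != dst:
            adj[src].add(dst)
            adj[dst].add(src)
    return adj


def density(adj):
    """Undirected density: observed ties divided by possible ties, 2m / (n(n-1))."""
    n = len(adj)
    m = sum(len(neighbors) for neighbors in adj.values()) // 2
    return 0.0 if n < 2 else 2 * m / (n * (n - 1))


def core_numbers(adj):
    """k-coreness per node, found by repeatedly peeling the lowest-degree node."""
    degree = {node: len(neighbors) for node, neighbors in adj.items()}
    remaining = set(adj)
    core = {}
    k = 0
    while remaining:
        node = min(remaining, key=degree.get)
        k = max(k, degree[node])  # coreness never decreases while peeling
        core[node] = k
        remaining.remove(node)
        for neighbor in adj[node]:
            if neighbor in remaining:
                degree[neighbor] -= 1
    return core


def count_components(adj):
    """Number of connected components (a stand-in for 'clusters' here)."""
    seen = set()
    components = 0
    for start in adj:
        if start in seen:
            continue
        components += 1
        stack = [start]
        while stack:
            node = stack.pop()
            if node not in seen:
                seen.add(node)
                stack.extend(adj[node] - seen)
    return components


# Hypothetical hyperlink dyads between six made-up sites.
dyads = [
    ("site-a", "site-b"),
    ("site-b", "site-c"),
    ("site-a", "site-c"),
    ("site-c", "site-d"),
    ("site-e", "site-f"),
]
network = build_undirected(dyads)
print("vertices:", len(network))
print("density:", round(density(network), 4))
print("coreness:", core_numbers(network))
print("components:", count_components(network))
```

Computing the same measures on one network snapshot per month would yield time series comparable in shape to the charts on the slides, and a rising maximum coreness or a falling component count would be one way to read H2's claim about increasing clustering.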
