Welcome
• Opening Session
• Internet Archives & Research Potential
• Building Community: Research Highlights
– Oxford Internet Institute
– Centre for Internet Studies & NetLab
– LS3 & the ALEXANDRIA Project
– WebScience @ University of Southampton
• Discussion and Challenges
ArchiveHub and Internet Archive Research
1. Large Scale Data
2. Developing New Tools
3. Testing and Building Theory
{AGENDA}
Large Scale Data | Developing New Tools | Testing and Building Theory
Opportunity: The Internet Archive contains the largest
single record of the history of the World Wide Web from
1995 to the present—a wealth of untapped research data.
Challenge: There is a significant lack of research-ready
databases and tools available to the scholarly community
A sense of scale
• The Library of Congress contains approximately 3 PB of data (a)
• The Internet Archive contains in excess of 10 PB of archived cultural material
• The Wayback Machine contains more than 410 billion available web pages (as of 2014)
(a) http://blogs.loc.gov/digitalpreservation/2012/03/how-many-libraries-of-congress-does-it-take/
[Figure: Library of Congress vs. Internet Archive]
Opportunity: The ArchiveHub project aims to support the
creation and dissemination of general guidelines & tools for
conducting theoretically and methodologically rigorous
longitudinal research using archival Web data
HistoryTracker Tool, Version 2.0
• 20th Century Collection @ RU
• PIG Scripts in Hadoop Environment
• RU High-Speed Computing Cluster
• Link Lists & Text Data
• Curated Data Sets
Dataset | Research Potential | Dates | Captures | Unique URLs
Hurricane Katrina | Online networks and organizational resilience (Chewning, Lai and Doerfel, 2012; Perry, Taylor and Doerfel, 2003) in the wake of disasters; information dissemination | 2003 – 2012 | 1,694,236 | 663,740
Superstorm Sandy | (same as above) | 2003 – 2012 | 41,703,112 | 20,013,455
US Senate | Study the growth of political activity in online environments (Adamic & Glance, 2005; Bruns, 2007; Chang & Park, 2012); polarization & media discourse | 109th – 112th Congresses | 26,965,770 | 8,674,397
US House | (same as above) | (same as above) | 51,840,777 | 12,410,014
Occupy Wall Street | Previous research on NGOs in the online environment (Bach & Stark, 2004; Shumate, 2003, 2012; Shumate, Fulk, & Monge, 2005); use of hyperlink data to study the formation and role of alliances between SMOs | 2010 – 2012 | 247,928,272 | 113,259,655
US Media | Previous studies of news media organizations (Greer & Mensing, 2006; Weber, 2012; Weber & Monge, In Press); focus on evolutionary patterns | 2008 – 2012 | 1,315,132,555 | 539,184,823
What’s in the data?
Source | Destination | Date | Frequency | Content Type | Bytes | Descriptive Text
Link Data:
http://gawker.com/5953665/mitt-romneys-staff-played-the-media-covering-them-in-a-friendly-game-of-flag-football
Mitt Romney's Staff Played the Media Covering Them in a Friendly Game of Flag
http://gawker.com
2012-10-22
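A link record with the seven fields listed on this slide could be read into a typed structure roughly as follows. This is a minimal sketch, not the ArchiveHub loader: the tab delimiter, the names LinkRecord and parse_link_record, the choice of which URL is the source and which the destination, and the frequency, content type and byte values in the sample line are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class LinkRecord:
    """One hyperlink observation; field order follows the slide header."""
    source: str
    destination: str
    date: date
    frequency: int
    content_type: str
    size_bytes: int
    descriptive_text: str


def parse_link_record(line: str, sep: str = "\t") -> LinkRecord:
    """Parse one delimited line into a LinkRecord (delimiter is an assumption)."""
    # Split at most 6 times so the free-text field may itself contain the delimiter.
    parts = line.rstrip("\n").split(sep, 6)
    if len(parts) != 7:
        raise ValueError(f"expected 7 fields, got {len(parts)}")
    src, dst, day, freq, ctype, size, text = parts
    return LinkRecord(src, dst, date.fromisoformat(day), int(freq), ctype, int(size), text)


# Sample line built from the slide's example. Frequency, content type and
# byte count are placeholder values, not figures from the dataset.
example = "\t".join([
    "http://gawker.com",
    "http://gawker.com/5953665/mitt-romneys-staff-played-the-media-covering-them-in-a-friendly-game-of-flag-football",
    "2012-10-22",
    "1",
    "text/html",
    "1024",
    "Mitt Romney's Staff Played the Media Covering Them in a Friendly Game of Flag",
])
record = parse_link_record(example)
```

Keeping the date as a real date object and the counts as integers makes it straightforward to group records by month or sum frequencies later.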
http://archivehub.rutgers.edu
PUTTING BIG THEORY INTO BIG DATA
[or]
moving from observing the Web to observing
new phenomena on the Web
Tracing the Emergence of Organizational Forms
Environment:
Organizations compete for scarce resources; during periods of rapid
disruption, new entrants seek “protected” niches (Weber & Monge, 2014)
Population:
In digital spaces, online connections provide communicative representations of
information flows (Weber & Monge, 2012)
Formation of ties (e.g. hyperlinks) can positively impact long-term likelihood of
organization survival (Weber, 2012)
Organization:
Organizations adapt internally, reconfiguring team structures and
developing new routines for knowledge sharing
(Ellison, Gibbs & Weber, In Press; Weber & Kim, Under Review)
Big Data… Big Theory?
• Networks are central to social movements in that links between
nodes can be influential in collective action
• Examples of nodes include participants, organizations, media and
communications technologies
• Social networks and social movements (Diani, 2003)
• The interaction between actors, and between actors and hashtags,
collectively represent a networked form of organization
• Network form of organization (Powell, 1990)
Data
• Triangulation of data insulates against false readings from large-scale data
(see Lazer, Kennedy, King and Vespignani, 2014)
• Internet Archive:
– 335 OWS related websites; ~330 million edges over a 2-year period
• Lexis Nexis:
– Search conducted to assess U.S. newspaper coverage of OWS from the early stages of the
movement in September 2011 through Sept. 2012
– Search OWS keywords, e.g. “Occupy Wall Street,” “Occupy Oakland”
• Twitter
– Gnip PowerTrack
• Search by keywords; captures a larger volume of Twitter data than other options
– Sample covers October 17, 2011, through January 5, 2012. Initial study focused on the
critical two-month period from November 1 through December 31, 2011.
– 750,816 tweets across the two-month period.
OWS News Coverage
OWS on the Web
• 335 seed organizations based on records from #OccupyResearch
• Data extracted for 2011 & 2012, based on “both matching”
[Figure: Captures per Month (millions)]
Maximal Cores (k Coreness)
[Figure panels: Aug. 2011 | Jan. 2012]
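The k-coreness measure named on this slide can be sketched as below. This is an illustration only, not the project's analysis code: it assumes hyperlinks are treated as undirected ties, and the function names core_numbers and maximal_core are hypothetical. It uses the standard peeling approach, repeatedly removing the node with the fewest remaining ties.

```python
from collections import defaultdict


def core_numbers(edges):
    """Return {node: coreness} for an undirected graph given as (u, v) pairs."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)
    degree = {node: len(nbrs) for node, nbrs in adj.items()}
    remaining = set(adj)
    core = {}
    k = 0
    while remaining:
        # Peel off the node with the smallest remaining degree.
        node = min(remaining, key=degree.__getitem__)
        # Coreness never decreases as peeling proceeds.
        k = max(k, degree[node])
        core[node] = k
        remaining.remove(node)
        for nbr in adj[node]:
            if nbr in remaining:
                degree[nbr] -= 1
    return core


def maximal_core(edges):
    """Return (k_max, nodes whose coreness equals k_max)."""
    core = core_numbers(edges)
    k_max = max(core.values(), default=0)
    return k_max, {node for node, c in core.items() if c == k_max}


# Toy network: a triangle of sites plus one site linked to a single member.
toy_edges = [("a", "b"), ("b", "c"), ("a", "c"), ("a", "d")]
k_max, members = maximal_core(toy_edges)
```

Running the same function on the link list for each month would give a comparable maximal core per snapshot, which is one way to read the two panels.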
[Figure: Edges and Vertices]
[Figure: Density]
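Density can be read as the share of possible directed ties that are actually present. The slides do not state the exact formula used, so the following is a minimal sketch under stated assumptions: repeated links between the same pair of sites are counted once, self-links are ignored, only sites that appear in at least one link are counted as vertices, and the function name directed_density is hypothetical.

```python
def directed_density(edges):
    """Density of a simple directed graph: distinct arcs / (n * (n - 1))."""
    # Collapse repeated links and drop self-links.
    arcs = {(u, v) for u, v in edges if u != v}
    nodes = {node for arc in arcs for node in arc}
    n = len(nodes)
    if n < 2:
        return 0.0
    return len(arcs) / (n * (n - 1))


# Three sites, two distinct directed links (one of them observed twice).
toy_links = [("a", "b"), ("b", "c"), ("a", "b")]
toy_density = directed_density(toy_links)
```

Collapsing repeated links matters here because a raw edge count that includes repeats can exceed n * (n - 1), which would push a naive ratio above 1.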
[Figure: Clusters]
Challenges:
• Access Challenges:
– Scaling access to the data
• Data Challenges:
– Moving from access to researchable data
• Research Challenges:
– Bridging “big data” to “big theory”
– Potential for use as a historical research tool
• Want data?
– Email me! matthew.weber@rutgers.edu
– ArchiveHub: http://archivehub.rutgers.edu
• The Team
– Kris Carpenter, Vinay Goel, Internet Archive
– David Lazer, Katherine Ognyanova, Northeastern University
– Allie Kosterich, Hai Nguyen, Luan Nguyen, Marya Doerfel, Rutgers University
– Peter Monge, Ayushman Datta, Kristen Guth, USC
Research supported by NSF Award #1244727 and the NetSCI Lab @ Rutgers

Digi Khata Problem along complete plan.pptx
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docx
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
 
Call Girls In Mahipalpur O9654467111 Escorts Service
Call Girls In Mahipalpur O9654467111  Escorts ServiceCall Girls In Mahipalpur O9654467111  Escorts Service
Call Girls In Mahipalpur O9654467111 Escorts Service
 

Wire Workshop: Overview slides for ArchiveHub Project

  • 2. Welcome • Opening Session • Internet Archives & Research Potential • Building Community: Research Highlights – Oxford Internet Institute – Centre for Internet Studies & NetLab – LS3 & the ALEXANDRIA Project – WebScience @ University of Southampton • Discussion and Challenges
  • 3. ArchiveHub and Internet Archive Research
  • 4. 1. Large Scale Data 2. Developing New Tools 3. Testing and Building Theory {AGENDA} Large Scale Data | Developing New Tools | Testing and Building Theory
  • 5. Opportunity: The Internet Archive contains the largest single record of the history of the World Wide Web from 1995 to the present, a wealth of untapped research data. Challenge: There is a significant lack of research-ready databases and tools available to the scholarly community.
  • 6. A sense of scale: The Library of Congress contains approximately 3 PB of data (source: http://blogs.loc.gov/digitalpreservation/2012/03/how-many-libraries-of-congress-does-it-take/). The Wayback Machine contains more than 410 billion available web pages (as of 2014). The Internet Archive contains in excess of 10 PB of archived cultural material.
  • 8. Opportunity: The ArchiveHub project aims to support the creation and dissemination of general guidelines & tools for conducting theoretically and methodologically rigorous longitudinal research using archival Web data.
  • 9. HistoryTracker Tool, Version 2.0: 20th Century Collection @ RU; Pig scripts in Hadoop environment; RU High-Speed Computing Cluster; Link Lists & Text Data; Curated Data Sets
  • 10. Dataset | Research Potential | Dates | Captures | Unique URLs
      Hurricane Katrina | Online networks and organizational resilience (Chewning, Lai and Doerfel, 2012; Perry, Taylor and Doerfel, 2003) in the wake of disasters; information dissemination | 2003 – 2012 | 1,694,236 | 663,740
      Superstorm Sandy | (same as Hurricane Katrina) | 2003 – 2012 | 41,703,112 | 20,013,455
      US Senate | Study the growth of political activity in online environments (Adamic & Glance, 2005; Bruns, 2007; Chang & Park, 2012); polarization & media discourse | 109th – 112th Congresses | 26,965,770 | 8,674,397
      US House | (same as US Senate) | 109th – 112th Congresses | 51,840,777 | 12,410,014
      Occupy Wall Street | Previous research on NGOs in the online environment (Bach & Stark, 2004; Shumate, 2003, 2012; Shumate, Fulk, & Monge, 2005); use of hyperlink data to study the formation and role of alliances between SMOs | 2010 – 2012 | 247,928,272 | 113,259,655
      US Media | Previous studies of news media organizations (Greer & Mensing, 2006; Weber, 2012; Weber & Monge, In Press); focus on evolutionary patterns | 2008 – 2012 | 1,315,132,555 | 539,184,823
  • 11. What’s in the data? Link data fields: Source | Destination | Date | Frequency | Content Type | Bytes | Descriptive Text. Example: http://gawker.com/5953665/mitt-romneys-staff-played-the-media-covering-them-in-a-friendly-game-of-flag-football; "Mitt Romney's Staff Played the Media Covering Them in a Friendly Game of Flag"; http://gawker.com; 2012-10-22
  • 15. PUTTING BIG THEORY INTO BIG DATA [or] moving from observing the Web to observing new phenomena on the Web
  • 16. Tracing the Emergence of Organizational Forms. Environment: Organizations compete for scarce resources; during rapid periods of disruption, new entrants seek “protected” niches (Weber & Monge, 2014). Population: In digital spaces, online connections provide communicative representations of information flows (Weber & Monge, 2012). Formation of ties (e.g. hyperlinks) can positively impact long-term likelihood of organization survival (Weber, 2012). Organization: Organizations adapt internally, reconfiguring team structures and developing new routines for knowledge sharing (Ellison, Gibbs & Weber, In Press; Weber & Kim, Under Review).
  • 23. Big Data… Big Theory? • Networks are central to social movements in that links between nodes can be influential in collective action • Examples of nodes include participants, organizations, media and communications technologies • Social networks and social movements (Diani, 2003) • The interactions between actors, and between actors and hashtags, collectively represent a networked form of organization • Network form of organization (Powell, 1990)
  • 25. Data • Triangulation of data insulates against false readings from large-scale data (see Lazer, Kennedy, King and Vespignani, 2014) • Internet Archive: – 335 OWS-related websites; ~330 million edges over a 2-year period • LexisNexis: – Search conducted to assess U.S. newspaper coverage of OWS from the early stages of the movement in September 2011 through Sept. 2012 – Search OWS keywords, e.g. “Occupy Wall Street,” “Occupy Oakland” • Twitter – Gnip PowerTrack • Search by keywords; captures a larger volume of Twitter data than other options – Sample includes October 17, 2011, through January 5, 2012. Initial study focused on the critical two-month period from November 1 through December 31, 2011 – 750,816 tweets across the two-month period.
  • 27. OWS News Coverage
  • 28. OWS on the Web • 335 seed organizations based on records from #OccupyResearch • Data extracted for 2011 & 2012, based on “both matching” [Chart: captures per month, in millions]
  • 29. Maximal Cores (k-Coreness) [Figure panels: Aug. 2011 and Jan. 2012]
  • 31. [Chart: Density]
  • 32. [Chart: Clusters]
  • 34. Challenges: • Access Challenges: – Scaling access to the data • Data Challenges: – Moving from access to researchable data • Research Challenges: – Bridging “big data” to “big theory” – Potential for use as a historical research tool
  • 35. • Want data? – Email me! matthew.weber@rutgers.edu – ArchiveHub: http://archivehub.rutgers.edu • The Team – Kris Carpenter, Vinay Goel, Internet Archive – David Lazer, Katherine Ognyanova, Northeastern University – Allie Kosterich, Hai Nguyen, Luan Nguyen, Marya Doerfel, Rutgers University – Peter Monge, Ayushman Datta, Kristen Guth, USC • Research supported by NSF Award #1244727 and the NetSCI Lab @ Rutgers
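
The link data fields listed on slide 11 and the density chart on slide 31 suggest a simple record-to-network workflow. The sketch below is illustrative only and is not the project's actual pipeline: the pipe-delimited line layout, the `LinkRecord` class, the `parse_link_record` and `density` helpers, and the sample hosts and values are all hypothetical. It assumes the descriptive text contains no "|" character and uses the standard directed-density formula (distinct links divided by possible ordered pairs of nodes).

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class LinkRecord:
    """One hyperlink observation, using the seven fields named on slide 11."""
    source: str
    destination: str
    captured: date
    frequency: int
    content_type: str
    size_bytes: int
    text: str


def parse_link_record(line: str) -> LinkRecord:
    """Parse one pipe-delimited line (the delimiter layout is an assumption)."""
    fields = [f.strip() for f in line.split("|")]
    if len(fields) != 7:
        raise ValueError(f"expected 7 fields, got {len(fields)}")
    src, dst, day, freq, ctype, size, text = fields
    return LinkRecord(src, dst, date.fromisoformat(day), int(freq),
                      ctype, int(size), text)


def density(records) -> float:
    """Directed density: distinct links / possible ordered node pairs."""
    edges = {(r.source, r.destination)
             for r in records if r.source != r.destination}
    nodes = {n for edge in edges for n in edge}
    n = len(nodes)
    return len(edges) / (n * (n - 1)) if n > 1 else 0.0


# Hypothetical sample lines; hosts, counts and sizes are made up.
SAMPLE_LINES = [
    "a.example.org | b.example.org | 2012-10-22 | 3 | text/html | 1024 | First link",
    "b.example.org | c.example.org | 2012-10-22 | 1 | text/html | 2048 | Second link",
    "a.example.org | b.example.org | 2012-10-23 | 2 | text/html | 1024 | Repeat link",
]

records = [parse_link_record(line) for line in SAMPLE_LINES]
print(f"density = {density(records):.3f}")
```

Repeated captures of the same link collapse into one edge here; a weighted analysis could instead sum the frequency field per source-destination pair.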

Editor's Notes

  1. 8.5 PB of data.
  2. 20th Century Collection = 9 TB of metadata; Media Seed List = 4,891
  3. 20th Century Collection = 9 TB of metadata; Media Seed List = 4,891
  4. 9/25/11
  5. Diani: ANT; actants exist through relationships with other nodes; technology nodes as actants; hashtags. Network form: repeated, enduring exchange…that lack a legitimate organizational authority to arbitrate
  6. Over time, dyadic communication will become prevalent in an emerging networked organization. As a social movement develops as an emerging network form of organization, the organizational structure will be increasingly clustered.
  7. Trend chart illustrating the relationship between OWS and the media
  8. News sources: 105 major U.S. newspapers via LexisNexis. Search terms: Occupy Wall Street, Occupy Los Angeles, Occupy Chicago, Zuccotti Park. Initial analysis: sample set drawn from Oct. 17, 2011 – Jan. 1, 2012. Nov. 10 & Nov. 17 Occupy Los Angeles
  9. Aug. 2011 -> 20,000 ties; Jan. 2012 -> 65,000 ties (denser core)
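
For readers unfamiliar with the k-coreness measure named on slide 29, the following minimal sketch shows one common way to compute it: repeatedly peel off the node with the smallest remaining degree. It is not the authors' implementation. The function names `core_numbers` and `maximal_core` and the toy edge list are hypothetical, and treating hyperlinks as undirected ties is an assumption made for simplicity.

```python
from collections import defaultdict


def core_numbers(edges):
    """Return {node: core number} for an undirected simple graph."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-links
            adj[u].add(v)
            adj[v].add(u)
    degree = {node: len(nbrs) for node, nbrs in adj.items()}
    remaining = set(adj)
    core = {}
    k = 0
    while remaining:
        # Peel the node with the smallest remaining degree.
        node = min(remaining, key=lambda n: degree[n])
        k = max(k, degree[node])
        core[node] = k
        remaining.remove(node)
        for nbr in adj[node]:
            if nbr in remaining:
                degree[nbr] -= 1
    return core


def maximal_core(edges):
    """Return the set of nodes with the highest core number."""
    core = core_numbers(edges)
    if not core:
        return set()
    top = max(core.values())
    return {node for node, c in core.items() if c == top}


# Toy example: a triangle (a, b, c) with one pendant node d.
toy_edges = [("a", "b"), ("b", "c"), ("a", "c"), ("a", "d")]
print(sorted(maximal_core(toy_edges)))
```

This quadratic-time version favors readability; for networks with tens of thousands of ties, a bucket-based linear-time variant or an established graph library would be a more practical choice.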