● to document the archiving of the Web and of Twitter during the events
● to question the conditions and possibilities of elaborating corpora
● to bring out the first elements which can emerge from these massive
Charlie Hebdo collection
Web Legal Deposit @Ina
● The 2006 law divides Web legal deposit responsibilities between the mandated institutions:
Ina collects sites related to audiovisual communications
(radio, TV channels, blogs, etc.)
BnF collects all other sites
Twitter Archive @Ina
● Crawling via the Twitter API (data)
● Since February 2014
● 11,000 users (timelines)
● 400 hashtags
● 400 million tweets
REST API: timelines
● Total: 50 million
● Average per day: 48,000
Streaming API: hashtags
● Total: 300 million
● Average per day: 270,000
● Streaming API: 400 hashtags, 5,000 users
● Limited to 1% of the tweets published at time t
● REST API: only the 3,200 most recent tweets per user
● Search API: 180 requests per 15-minute window for a user and
450 for an app
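The Search API quota listed above (180 requests per 15-minute window for a user, 450 for an app) can be pictured as a fixed-window budget. What follows is a minimal sketch under stated assumptions: `WindowBudget` is a hypothetical helper written for illustration, not part of any Twitter client library and not the collector used at Ina.

```python
import time


class WindowBudget:
    """Fixed-window request budget (hypothetical helper for illustration)."""

    def __init__(self, limit, window_seconds=15 * 60, clock=time.monotonic):
        self.limit = limit                    # requests allowed per window
        self.window_seconds = window_seconds  # window length, 15 minutes by default
        self.clock = clock                    # injectable clock, eases testing
        self.window_start = clock()
        self.used = 0

    def try_acquire(self):
        """Return True if a request may be sent now, False if the window is spent."""
        now = self.clock()
        if now - self.window_start >= self.window_seconds:
            # A new window has begun: reset the counter.
            self.window_start = now
            self.used = 0
        if self.used < self.limit:
            self.used += 1
            return True
        return False

    def seconds_until_reset(self):
        """Seconds left before the current window ends."""
        elapsed = self.clock() - self.window_start
        return max(0.0, self.window_seconds - elapsed)


# Quotas taken from the slide above.
USER_AUTH_LIMIT = 180
APP_AUTH_LIMIT = 450
```

A crawler built on this sketch would call `try_acquire()` before each search request and sleep for `seconds_until_reset()` whenever it returns False.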
● Canonical version?
● Twitter page?
● Authenticity, integrity
● Second Screen
● TV Sync
● Search / data mining
● Data coverage
● Generic or specific tools
● Open data?
Opening the black boxes of Web archiving
3 interviews with
- Jefferson Bailey & Sylvie Rollason-Cass (Archive-It team) in March 2016
- Annick Le Follic (BnF) on March 21, 2016
- Thomas Drugeon (Ina) on March 21, 2016
An emergency collection: why, when, what for, …?
Which methods and tools?
Which issues and limits?
Differences between both events?
Openness and closure
Governance, human and material resources
How to collect and document “the now” and
After observation, manipulation
• Some difficulties: discovering the tools before focusing on specific
• Data deluge and dispersion
• Methodological missteps
- Uses and significance
- Origin and dissemination
- Links with the Muslim community
Methodological cautions: the same images appear under different URLs / results
differ widely between French and English, etc.
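The caution about identical images served from different URLs can be illustrated by grouping archived files by a hash of their content instead of by address. This is a minimal sketch, assuming the image bytes are already in memory; `group_by_content` and `duplicates` are hypothetical names. It only catches byte-identical copies: resized or re-encoded images would need another technique, such as perceptual hashing.

```python
import hashlib
from collections import defaultdict


def group_by_content(payloads):
    """Map each SHA-256 digest to the sorted list of URLs with identical bytes.

    `payloads` is a dict of URL -> raw bytes (assumed already fetched).
    """
    groups = defaultdict(list)
    for url, data in payloads.items():
        groups[hashlib.sha256(data).hexdigest()].append(url)
    return {digest: sorted(urls) for digest, urls in groups.items()}


def duplicates(payloads):
    """Return only the URL groups that share the same content."""
    return [urls for urls in group_by_content(payloads).values() if len(urls) > 1]


# Illustrative data only: placeholder URLs and bytes, not archive content.
sample = {
    "http://example.org/a.jpg": b"same-image-bytes",
    "http://example.net/copy.jpg": b"same-image-bytes",
    "http://example.org/other.jpg": b"different-bytes",
}
```

Calling `duplicates(sample)` returns one group holding the two URLs that carry the same bytes.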
Some scientific issues
Long-term perspective / non-French-centric overview (Arquivo.pt, etc.)
A third-generation phenomenon? (cf. Dominique Boullier,
What does a hashtag, a retweet, etc. mean? → boyd, danah, Scott Golder,
and Gilad Lotan. 2010. “Tweet, Tweet, Retweet: Conversational Aspects of
Retweeting on Twitter.” HICSS-43. IEEE: Kauai, HI, January 6.
Close and distant reading (→ F. Moretti)
“Although Moretti in the main uses the distant reading approach to the
study of large amounts of digital data, I will argue that none of these two
approaches are per se inscribed or inherent in the digital material. By this I
mean that simply because collections of digital material are in many cases
big data, which opens the possibility of asking and answering new types of
research questions, this does not necessarily mean that they have to be
approached as Big Data”. (p. 11)
“The question is not if scholarly studies within the Humanities want to “go
digital”, but rather how”. (p. 9)
Brügger N., Humanities, Digital Humanities, Media Studies, Internet Studies: An
inaugural Lecture, Aarhus, CFI, 2015.
Creating a trading zone: Conclusion
« The metaphor of a trading zone is being applied to collaborations in S &
T. The basis of the metaphor is anthropological studies of how different
cultures are able to exchange goods, despite differences in language and