AARHUS UNIVERSITET
WHAT DOES A NATIONAL WEB TALK ABOUT? — DIGGING INTO BILLIONS OF WORDS, THE DANISH CASE
NIELS BRÜGGER
4 NOVEMBER 2021
AGENDA
›the project
›the data
›the analytical design — four approaches
›the methods
›next steps
THE PROJECT
›RQ: What has the Danish web talked about, where have specific topics 'lived', and how has this developed over time?
›Empirical result: a map of the textual landscape of the Danish web, 2006-2016
›Methodological result: methods for large-scale textual analysis, developed and tested
[Figure: diagram of ten topics, each labelled 'Topic x']
THE DATA
›Corpus extracted from Netarkivet, the Danish web archive
›One 'time slice' from each year
›Extra versions removed, leaving one version of each web domain
›Part of a larger study of the Danish web, begun some years ago
›The first digs are described in Brügger, N., Nielsen, J., & Laursen, D. (2020). Big data experiments with the archived Web: Methodological reflections on studying the development of a nation's Web. First Monday, 25(3).
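Reducing each time slice to one version per web domain could be sketched as follows. This is a minimal sketch, not the project's actual selection procedure: the `Capture` record, the 14-digit `timestamp` format, and the rule of keeping the earliest capture in each year are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Capture:
    """One archived capture of a web domain (hypothetical record layout)."""
    domain: str
    timestamp: str  # assumed 14-digit "YYYYMMDDhhmmss" string, as used in CDX indexes
    text: str


def one_version_per_domain(captures):
    """Keep one capture per (year, domain) pair.

    Assumption: the earliest capture in the year is kept; the real
    selection criterion is not stated in the slides.
    """
    chosen = {}
    for cap in captures:
        key = (cap.timestamp[:4], cap.domain)
        current = chosen.get(key)
        # Fixed-width timestamps compare correctly as plain strings.
        if current is None or cap.timestamp < current.timestamp:
            chosen[key] = cap
    return chosen
```

The result is keyed by `(year, domain)`, so each yearly slice ends up with exactly one text per domain.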
Acknowledgements:
›corpus extraction and selection of versions: Janne Nielsen, Ditte Laursen, Ulrich Have, Per Møldrup-Dalum
›initial calculations of words: Janne & Ulrich
›selection of events to be studied: Ditte & me
›textual analyses: Kristoffer Nielbo, Peter Vahlstrup, the Centre for Humanities Computing Aarhus (CHCAA, chcaa.io)
The corpus:
›all words on the Danish web as archived by Netarkivet
›in one time slice from each year, 2006 to 2016
›language identification performed, so that only Danish words are analysed
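The slides do not say which language-identification tool was used. Purely as a stand-in to show the filtering step, the sketch below keeps a text if a large enough share of its tokens are very frequent Danish function words. The marker list and the 0.15 threshold are assumptions; a real pipeline would more likely use a trained language identifier.

```python
import re

# Hypothetical marker list: a handful of very frequent Danish function words.
# Chosen for illustration only; not the list or tool used for the corpus.
DANISH_MARKERS = {
    "og", "at", "det", "er", "en", "den", "til", "som", "på",
    "med", "har", "ikke", "der", "af", "jeg", "vi",
}


def danish_share(text: str) -> float:
    """Fraction of word tokens in `text` that are Danish marker words."""
    tokens = re.findall(r"\w+", text.lower())
    if not tokens:
        return 0.0
    return sum(t in DANISH_MARKERS for t in tokens) / len(tokens)


def keep_danish(texts, threshold=0.15):
    """Keep only texts whose marker-word share reaches the (assumed) threshold."""
    return [t for t in texts if danish_share(t) >= threshold]
```

A function-word ratio is cheap to compute over billions of words, but it is crude: short or mixed-language pages will be misclassified more often than with a statistical classifier.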
THE ANALYTICAL DESIGN: FOUR APPROACHES
›First, we will follow what the web itself talks about by identifying a number of significant words among the most used words per year (excluding stop words)
›e.g. 10-20 words chosen from the 1,000 most used words
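Ranking the most used words per yearly slice, with stop words removed, could look like the sketch below. The stop-word list is left as a parameter because the slides do not say which list is used; the input shape (`{year: [text, ...]}`) is also an assumption. The 10-20 significant words would still be chosen by hand from the resulting top 1,000.

```python
import re
from collections import Counter


def tokenize(text):
    # Lower-case word tokens; the regex word class also matches æ, ø and å.
    return re.findall(r"\w+", text.lower())


def top_words(docs, stop_words, n=1000):
    """Most common words in one time slice, stop words excluded."""
    counts = Counter()
    for doc in docs:
        counts.update(t for t in tokenize(doc) if t not in stop_words)
    return counts.most_common(n)


def top_words_per_year(slices, stop_words, n=1000):
    """Apply `top_words` to each yearly slice, e.g. {2016: [text, ...], ...}."""
    return {year: top_words(docs, stop_words, n) for year, docs in slices.items()}
```

At the scale of the corpus the counting would have to be distributed, but the per-slice logic stays the same: count, filter, rank.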
rank  word               n
   1  til        735099980
   2  med        444560089
   3  for        396849909
   4  det        394031045
   5  der        273008926
   6  den        267696098
   7  har        251044875
   8  kan        225782096
   9  ikke       214874628
  10  som        205071867
  11  jeg        204010581
  12  fra        199932948
  13  mere       159701573
  14  alle       134942476
  15  kontakt    125492896
  16  eller      124463101
  17  dkk        119472355
  18  din        112618070
  19  her        104361397
  20  2016       104005585
  21  skal       102123933
  22  ved        100361724
  23  efter       95686049
  24  pris        94400983
  25  2015        93486153
  26  men         90079959
  27  læs         87427063
  28  man         82128864
  29  vil         80107116
  30  vores       79158246
  31  dig         77971690
  32  også        76196557
  33  var         75266165
  34  dette       71606610
  35  fragt       69648534
  36  min         67659330
  37  ind         67579071
  38  søg         67341460
  39  produkter   66794635
  40  forside     65254918
  41  2014        65073113
  42  siden       64703860
  43  the         64441149
  44  tilbehør    62739776
  45  hvis        60727133
› most used words in 2016, stop words included (n = number of occurrences)
› lots of prepositions, pronouns, etc.
› also a few words indicating commercial websites: dkk (Danish kroner), pris (price), fragt (freight), produkter (products); these could be used to identify where trade takes place
› other interesting words: nyheder (news, no. 53), indlæg (post, no. 54), børn (children, no. 105), cookies (no. 162, whose development over time would be interesting to follow), blog (no. 212), spil (games, no. 233), sport (no. 237), and many more about trade: tilbud (offer), kurv (basket), køb (buy), læg (put)
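The note that trade words "could be used to identify where trade takes place" might be operationalised as in the sketch below: score each domain by the share of its tokens that are trade words, and flag domains above a threshold. The word list comes from the slide; using it as a shop signal, the `{domain: text}` input, and the 5% threshold are assumptions for illustration.

```python
import re

# Trade-related words listed on the slide. Treating them as a "shop" signal
# and the default 5% threshold are illustrative assumptions.
TRADE_WORDS = {"dkk", "pris", "fragt", "produkter", "tilbud", "kurv", "køb", "læg"}


def trade_share(text):
    """Fraction of word tokens in `text` that are trade words."""
    tokens = re.findall(r"\w+", text.lower())
    if not tokens:
        return 0.0
    return sum(t in TRADE_WORDS for t in tokens) / len(tokens)


def likely_trade_domains(pages, threshold=0.05):
    """pages: {domain: text}. Return domains whose trade-word share >= threshold."""
    return sorted(d for d, text in pages.items() if trade_share(text) >= threshold)
```

A share rather than a raw count keeps large sites from being flagged just because they contain many words.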
›Second, we will use 'the word of the year' for each year
›a Danish 'word of the year' has been chosen every year since 2006; the other candidates that were not selected will be included too
›e.g. for 2020, the word of the year was 'samfundssind' (public spirit), and the other candidates included 'afstand' (distance), 'albuehilsen' (elbow bump), 'flokimmunitet' (herd immunity) and 'mundbind' (face mask)
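Following a word of the year (or one of the other candidates) across the yearly slices requires normalising for slice size, since the slices are unlikely to contain the same number of words. A minimal sketch, assuming exact token matching and slices given as `{year: [text, ...]}`:

```python
import re


def tokenize(text):
    return re.findall(r"\w+", text.lower())


def per_million(term, slices):
    """Occurrences of `term` per million tokens in each yearly slice.

    Exact token match only: inflected forms such as a definite "-et"
    ending are not counted unless looked up separately.
    """
    result = {}
    for year, docs in sorted(slices.items()):
        tokens = [t for doc in docs for t in tokenize(doc)]
        hits = sum(1 for t in tokens if t == term)
        result[year] = 1_000_000 * hits / len(tokens) if tokens else 0.0
    return result
```

Relative frequencies make it possible to compare a word's prominence across years even when the archived web grows between slices.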
›Third, we identify a number of discussions, topics or events that set the agenda in each year
›e.g. 2006 'Muhammedkrise' (the Muhammad cartoons crisis), 2008 'finanskrise' (financial crisis), 2012 'lukkelov' (the shop opening hours act), 2015 'flygtningekrise' (refugee crisis)
›a 'dictionary' of synonyms will be established for each event
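Once a synonym 'dictionary' exists for each event, counting the documents that mention an event could be sketched as below. The dictionary entries are hypothetical placeholders (the real lists are still to be built), and matching is on whole tokens, so inflected forms must be listed explicitly.

```python
import re


def tokenize(text):
    return re.findall(r"\w+", text.lower())


# Hypothetical event dictionary: placeholder entries, not the project's lists.
# Inflected forms (e.g. the definite "-en") must be listed explicitly.
EVENT_DICTIONARY = {
    "financial crisis": {"finanskrise", "finanskrisen"},
    "refugee crisis": {"flygtningekrise", "flygtningekrisen"},
}


def documents_per_event(docs, dictionary):
    """Count documents that mention at least one term for each event."""
    hits = {event: 0 for event in dictionary}
    for doc in docs:
        tokens = set(tokenize(doc))
        for event, terms in dictionary.items():
            if tokens & terms:  # any overlap between document and synonym set
                hits[event] += 1
    return hits
```

Counting documents rather than raw term occurrences keeps a single page that repeats a term many times from dominating the result.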
›Fourth, we identify the most used Google search terms from Denmark in each year (either from Google itself or from legacy media, where they are often reported at the end of the year)
THE METHODS
›A variety of calculations of word occurrences
›Training embeddings of words and documents to represent the lexical co-occurrence structure within and between websites
›Possibly, modelling and predicting information propagation across the Danish web
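The slides plan to train neural embeddings. As a simpler, count-based stand-in that shows what a "lexical co-occurrence structure" is, the sketch below counts word pairs within a sliding window and weights them by positive pointwise mutual information (PPMI); each word's row can then serve as a crude sparse word vector. This is not the project's training method, and the window size is an assumption.

```python
import math
from collections import Counter


def cooccurrences(token_lists, window=2):
    """Count (word, context) pairs within `window` tokens of each other."""
    pairs = Counter()
    for tokens in token_lists:
        for i, w in enumerate(tokens):
            for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
                if j != i:
                    pairs[(w, tokens[j])] += 1
    return pairs


def ppmi(pairs):
    """PPMI(w, c) = max(0, log(P(w, c) / (P(w) * P(c)))); zero scores are dropped."""
    total = sum(pairs.values())
    row, col = Counter(), Counter()
    for (w, c), n in pairs.items():
        row[w] += n
        col[c] += n
    scores = {}
    for (w, c), n in pairs.items():
        pmi = math.log(n * total / (row[w] * col[c]))
        if pmi > 0:
            scores[(w, c)] = pmi
    return scores


def vector(word, scores):
    """Sparse PPMI vector for `word`: {context: score}."""
    return {c: s for (w, c), s in scores.items() if w == word}
```

Word pairs that occur together more often than their individual frequencies would predict get high scores; neural embeddings learn a dense version of broadly the same signal.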
[Figure: raw web text ('word word word ...') processed in three labelled steps: (1) training in recognising topics, with a topic model plus neural word embeddings; (2) a 'dictionary' of words and identification of the lexical co-occurrence structure; (3) a graph of where topics 'live']
Biggest challenge: training efficiency and speed, given the size of the data; this puts the focus on the training algorithms.
Estimated training time: 4-6 months
NEXT STEPS
›The textual analyses will be supplemented by a hyperlink network analysis
›Identify the link relations between websites where a given topic is talked about
›Hyperlinks have already been extracted as a separate dataset
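Combining the extracted hyperlinks with the text analysis could start from something like the sketch below: restrict the domain-level link graph to the domains where a topic is talked about, then see which of them attract links. The `(source, target)` edge format and the set of topic domains are assumptions about how the two datasets would be joined.

```python
from collections import Counter


def topic_link_network(links, topic_domains):
    """Restrict a domain-level link graph to the domains where a topic occurs.

    links: iterable of (source_domain, target_domain) pairs (assumed format).
    topic_domains: set of domains whose text mentions the topic.
    Returns the distinct edges between topic domains (self-links dropped)
    and each domain's in-degree within that subgraph.
    """
    edges = {
        (src, dst)
        for src, dst in links
        if src in topic_domains and dst in topic_domains and src != dst
    }
    in_degree = Counter(dst for _, dst in edges)
    return edges, in_degree
```

In-degree within the topic subgraph is only one possible measure; it indicates which sites other sites discussing the same topic point to.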
[Figure: diagram of ten topics, each labelled 'Topic x']
Possible relevance for WARCnet:
›tracking topics from other countries on the Danish web, possibly on web pages in that country's language
›replicating the study based on holdings from other national web archives
 
SaaStr Workshop Wednesday w: Jason Lemkin, SaaStr
SaaStr Workshop Wednesday w: Jason Lemkin, SaaStrSaaStr Workshop Wednesday w: Jason Lemkin, SaaStr
SaaStr Workshop Wednesday w: Jason Lemkin, SaaStrsaastr
 
CTAC 2024 Valencia - Sven Zoelle - Most Crucial Invest to Digitalisation_slid...
CTAC 2024 Valencia - Sven Zoelle - Most Crucial Invest to Digitalisation_slid...CTAC 2024 Valencia - Sven Zoelle - Most Crucial Invest to Digitalisation_slid...
CTAC 2024 Valencia - Sven Zoelle - Most Crucial Invest to Digitalisation_slid...henrik385807
 
call girls in delhi malviya nagar @9811711561@
call girls in delhi malviya nagar @9811711561@call girls in delhi malviya nagar @9811711561@
call girls in delhi malviya nagar @9811711561@vikas rana
 
Russian Call Girls in Kolkata Vaishnavi 🤌 8250192130 🚀 Vip Call Girls Kolkata
Russian Call Girls in Kolkata Vaishnavi 🤌  8250192130 🚀 Vip Call Girls KolkataRussian Call Girls in Kolkata Vaishnavi 🤌  8250192130 🚀 Vip Call Girls Kolkata
Russian Call Girls in Kolkata Vaishnavi 🤌 8250192130 🚀 Vip Call Girls Kolkataanamikaraghav4
 
Call Girls in Sarojini Nagar Market Delhi 💯 Call Us 🔝8264348440🔝
Call Girls in Sarojini Nagar Market Delhi 💯 Call Us 🔝8264348440🔝Call Girls in Sarojini Nagar Market Delhi 💯 Call Us 🔝8264348440🔝
Call Girls in Sarojini Nagar Market Delhi 💯 Call Us 🔝8264348440🔝soniya singh
 
Call Girls in Rohini Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Rohini Delhi 💯Call Us 🔝8264348440🔝Call Girls in Rohini Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Rohini Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
CTAC 2024 Valencia - Henrik Hanke - Reduce to the max - slideshare.pdf
CTAC 2024 Valencia - Henrik Hanke - Reduce to the max - slideshare.pdfCTAC 2024 Valencia - Henrik Hanke - Reduce to the max - slideshare.pdf
CTAC 2024 Valencia - Henrik Hanke - Reduce to the max - slideshare.pdfhenrik385807
 
Presentation for the Strategic Dialogue on the Future of Agriculture, Brussel...
Presentation for the Strategic Dialogue on the Future of Agriculture, Brussel...Presentation for the Strategic Dialogue on the Future of Agriculture, Brussel...
Presentation for the Strategic Dialogue on the Future of Agriculture, Brussel...Krijn Poppe
 
Governance and Nation-Building in Nigeria: Some Reflections on Options for Po...
Governance and Nation-Building in Nigeria: Some Reflections on Options for Po...Governance and Nation-Building in Nigeria: Some Reflections on Options for Po...
Governance and Nation-Building in Nigeria: Some Reflections on Options for Po...Kayode Fayemi
 
Navi Mumbai Call Girls Service Pooja 9892124323 Real Russian Girls Looking Mo...
Navi Mumbai Call Girls Service Pooja 9892124323 Real Russian Girls Looking Mo...Navi Mumbai Call Girls Service Pooja 9892124323 Real Russian Girls Looking Mo...
Navi Mumbai Call Girls Service Pooja 9892124323 Real Russian Girls Looking Mo...Pooja Nehwal
 
Night 7k Call Girls Noida Sector 128 Call Me: 8448380779
Night 7k Call Girls Noida Sector 128 Call Me: 8448380779Night 7k Call Girls Noida Sector 128 Call Me: 8448380779
Night 7k Call Girls Noida Sector 128 Call Me: 8448380779Delhi Call girls
 
OSCamp Kubernetes 2024 | A Tester's Guide to CI_CD as an Automated Quality Co...
OSCamp Kubernetes 2024 | A Tester's Guide to CI_CD as an Automated Quality Co...OSCamp Kubernetes 2024 | A Tester's Guide to CI_CD as an Automated Quality Co...
OSCamp Kubernetes 2024 | A Tester's Guide to CI_CD as an Automated Quality Co...NETWAYS
 
Genesis part 2 Isaiah Scudder 04-24-2024.pptx
Genesis part 2 Isaiah Scudder 04-24-2024.pptxGenesis part 2 Isaiah Scudder 04-24-2024.pptx
Genesis part 2 Isaiah Scudder 04-24-2024.pptxFamilyWorshipCenterD
 

Recently uploaded (20)

OSCamp Kubernetes 2024 | Zero-Touch OS-Infrastruktur für Container und Kubern...
OSCamp Kubernetes 2024 | Zero-Touch OS-Infrastruktur für Container und Kubern...OSCamp Kubernetes 2024 | Zero-Touch OS-Infrastruktur für Container und Kubern...
OSCamp Kubernetes 2024 | Zero-Touch OS-Infrastruktur für Container und Kubern...
 
Andrés Ramírez Gossler, Facundo Schinnea - eCommerce Day Chile 2024
Andrés Ramírez Gossler, Facundo Schinnea - eCommerce Day Chile 2024Andrés Ramírez Gossler, Facundo Schinnea - eCommerce Day Chile 2024
Andrés Ramírez Gossler, Facundo Schinnea - eCommerce Day Chile 2024
 
Re-membering the Bard: Revisiting The Compleat Wrks of Wllm Shkspr (Abridged)...
Re-membering the Bard: Revisiting The Compleat Wrks of Wllm Shkspr (Abridged)...Re-membering the Bard: Revisiting The Compleat Wrks of Wllm Shkspr (Abridged)...
Re-membering the Bard: Revisiting The Compleat Wrks of Wllm Shkspr (Abridged)...
 
Philippine History cavite Mutiny Report.ppt
Philippine History cavite Mutiny Report.pptPhilippine History cavite Mutiny Report.ppt
Philippine History cavite Mutiny Report.ppt
 
Open Source Camp Kubernetes 2024 | Monitoring Kubernetes With Icinga by Eric ...
Open Source Camp Kubernetes 2024 | Monitoring Kubernetes With Icinga by Eric ...Open Source Camp Kubernetes 2024 | Monitoring Kubernetes With Icinga by Eric ...
Open Source Camp Kubernetes 2024 | Monitoring Kubernetes With Icinga by Eric ...
 
Exploring protein-protein interactions by Weak Affinity Chromatography (WAC) ...
Exploring protein-protein interactions by Weak Affinity Chromatography (WAC) ...Exploring protein-protein interactions by Weak Affinity Chromatography (WAC) ...
Exploring protein-protein interactions by Weak Affinity Chromatography (WAC) ...
 
George Lever - eCommerce Day Chile 2024
George Lever -  eCommerce Day Chile 2024George Lever -  eCommerce Day Chile 2024
George Lever - eCommerce Day Chile 2024
 
SaaStr Workshop Wednesday w: Jason Lemkin, SaaStr
SaaStr Workshop Wednesday w: Jason Lemkin, SaaStrSaaStr Workshop Wednesday w: Jason Lemkin, SaaStr
SaaStr Workshop Wednesday w: Jason Lemkin, SaaStr
 
CTAC 2024 Valencia - Sven Zoelle - Most Crucial Invest to Digitalisation_slid...
CTAC 2024 Valencia - Sven Zoelle - Most Crucial Invest to Digitalisation_slid...CTAC 2024 Valencia - Sven Zoelle - Most Crucial Invest to Digitalisation_slid...
CTAC 2024 Valencia - Sven Zoelle - Most Crucial Invest to Digitalisation_slid...
 
call girls in delhi malviya nagar @9811711561@
call girls in delhi malviya nagar @9811711561@call girls in delhi malviya nagar @9811711561@
call girls in delhi malviya nagar @9811711561@
 
Russian Call Girls in Kolkata Vaishnavi 🤌 8250192130 🚀 Vip Call Girls Kolkata
Russian Call Girls in Kolkata Vaishnavi 🤌  8250192130 🚀 Vip Call Girls KolkataRussian Call Girls in Kolkata Vaishnavi 🤌  8250192130 🚀 Vip Call Girls Kolkata
Russian Call Girls in Kolkata Vaishnavi 🤌 8250192130 🚀 Vip Call Girls Kolkata
 
Call Girls in Sarojini Nagar Market Delhi 💯 Call Us 🔝8264348440🔝
Call Girls in Sarojini Nagar Market Delhi 💯 Call Us 🔝8264348440🔝Call Girls in Sarojini Nagar Market Delhi 💯 Call Us 🔝8264348440🔝
Call Girls in Sarojini Nagar Market Delhi 💯 Call Us 🔝8264348440🔝
 
Call Girls in Rohini Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Rohini Delhi 💯Call Us 🔝8264348440🔝Call Girls in Rohini Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Rohini Delhi 💯Call Us 🔝8264348440🔝
 
CTAC 2024 Valencia - Henrik Hanke - Reduce to the max - slideshare.pdf
CTAC 2024 Valencia - Henrik Hanke - Reduce to the max - slideshare.pdfCTAC 2024 Valencia - Henrik Hanke - Reduce to the max - slideshare.pdf
CTAC 2024 Valencia - Henrik Hanke - Reduce to the max - slideshare.pdf
 
Presentation for the Strategic Dialogue on the Future of Agriculture, Brussel...
Presentation for the Strategic Dialogue on the Future of Agriculture, Brussel...Presentation for the Strategic Dialogue on the Future of Agriculture, Brussel...
Presentation for the Strategic Dialogue on the Future of Agriculture, Brussel...
 
Governance and Nation-Building in Nigeria: Some Reflections on Options for Po...
Governance and Nation-Building in Nigeria: Some Reflections on Options for Po...Governance and Nation-Building in Nigeria: Some Reflections on Options for Po...
Governance and Nation-Building in Nigeria: Some Reflections on Options for Po...
 
Navi Mumbai Call Girls Service Pooja 9892124323 Real Russian Girls Looking Mo...
Navi Mumbai Call Girls Service Pooja 9892124323 Real Russian Girls Looking Mo...Navi Mumbai Call Girls Service Pooja 9892124323 Real Russian Girls Looking Mo...
Navi Mumbai Call Girls Service Pooja 9892124323 Real Russian Girls Looking Mo...
 
Night 7k Call Girls Noida Sector 128 Call Me: 8448380779
Night 7k Call Girls Noida Sector 128 Call Me: 8448380779Night 7k Call Girls Noida Sector 128 Call Me: 8448380779
Night 7k Call Girls Noida Sector 128 Call Me: 8448380779
 
OSCamp Kubernetes 2024 | A Tester's Guide to CI_CD as an Automated Quality Co...
OSCamp Kubernetes 2024 | A Tester's Guide to CI_CD as an Automated Quality Co...OSCamp Kubernetes 2024 | A Tester's Guide to CI_CD as an Automated Quality Co...
OSCamp Kubernetes 2024 | A Tester's Guide to CI_CD as an Automated Quality Co...
 
Genesis part 2 Isaiah Scudder 04-24-2024.pptx
Genesis part 2 Isaiah Scudder 04-24-2024.pptxGenesis part 2 Isaiah Scudder 04-24-2024.pptx
Genesis part 2 Isaiah Scudder 04-24-2024.pptx
 

The Danish case: What does the Danish web talk about?

  • 1. WHAT DOES A NATIONAL WEB TALK ABOUT? DIGGING INTO BILLIONS OF WORDS, THE DANISH CASE. NIELS BRÜGGER, AARHUS UNIVERSITY, 4 NOVEMBER 2021
  • 2. AGENDA ›the project ›the data ›the analytical design: four approaches ›the methods ›next steps
  • 3. THE PROJECT ›RQ: What has the Danish web talked about, where have specific topics 'lived', and how has this developed? ›Empirical result: A mapping of the textual web landscape 2006-2016 ›Methodological result: Develop and test methods for large-scale textual analysis
  • 4. [Figure: ten topic labels, each reading 'Topic x']
  • 5. THE DATA ›Corpus extracted from the Danish web archive Netarkivet ›One 'time slice' from each year ›Surplus versions removed, so that each web domain is represented by one version ›Part of a larger study of the Danish web, started some years ago ›Read about the first digs in Brügger, N., Nielsen, J., Laursen, D. (2020). Big data experiments with the archived Web: Methodological reflections on studying the development of a nation's Web, First Monday, 25(3)
  • 6. THE DATA Acknowledgements: ›corpus extraction and selection of versions: Janne Nielsen, Ditte Laursen, Ulrich Have, Per Møldrup-Dalum ›initial calculations of words: Janne & Ulrich ›selected events to be studied: Ditte & me ›textual analyses: Kristoffer Nielbo, Peter Vahlstrup, the Centre for Humanities Computing Aarhus (CHCAA, chcaa.io)
  • 7. THE DATA The corpus: ›all words on the Danish web as archived by Netarkivet ›in one time slice from each year, 2006 to 2016 ›language recognition performed, so that only Danish words are analysed
  • 8. [Image slide without text]
  • 9. THE ANALYTICAL DESIGN: FOUR APPROACHES ›First, we will follow the talk of the web itself by identifying a number of significant words among the most used words per year (stop words excluded) ›e.g. 10-20 words chosen from the 1,000 most used words
  • 10. Most used words, 2016, stop words included (rank, word, number of occurrences n):
      rank  word               n
      1     til        735099980
      2     med        444560089
      3     for        396849909
      4     det        394031045
      5     der        273008926
      6     den        267696098
      7     har        251044875
      8     kan        225782096
      9     ikke       214874628
      10    som        205071867
      11    jeg        204010581
      12    fra        199932948
      13    mere       159701573
      14    alle       134942476
      15    kontakt    125492896
      16    eller      124463101
      17    dkk        119472355
      18    din        112618070
      19    her        104361397
      20    2016       104005585
      21    skal       102123933
      22    ved        100361724
      23    efter       95686049
      24    pris        94400983
      25    2015        93486153
      26    men         90079959
      27    læs         87427063
      28    man         82128864
      29    vil         80107116
      30    vores       79158246
      31    dig         77971690
      32    også        76196557
      33    var         75266165
      34    dette       71606610
      35    fragt       69648534
      36    min         67659330
      37    ind         67579071
      38    søg         67341460
      39    produkter   66794635
      40    forside     65254918
      41    2014        65073113
      42    siden       64703860
      43    the         64441149
      44    tilbehør    62739776
      45    hvis        60727133
    › lots of prepositions, pronouns, etc. › also a few words indicating commercial websites: dkk (Danish kroner), pris (price), fragt (freight), produkter (products); these could be used to identify where trade takes place › other interesting words: nyheder (news, no. 53), indlæg (post, no. 54), børn (children, no. 105), cookies (no. 162, interesting to see its development), blog (no. 212), spil (games, no. 233), sport (no. 237), and many more about trade: tilbud (offer), kurv (basket), køb (buy), læg (put)
  • 11. THE ANALYTICAL DESIGN: FOUR APPROACHES ›Second, we will use 'the word of the year' for each year ›'the word of the year' has been chosen since 2006 ›including the other candidates that were not selected ›e.g. in 2020 that would be 'samfundssind' (public spirit), plus the candidates 'afstand' (distance), 'albuehilsen' (elbow bump), 'flokimmunitet' (herd immunity), and 'mundbind' (face mask)
  • 12. THE ANALYTICAL DESIGN: FOUR APPROACHES ›Third, a number of discussions, topics or events that set the agenda in each year are identified ›e.g. 2006 'Muhammedkrise' (cartoon crisis), 2008 'finanskrise' (financial crisis), 2012 'lukkelov' (shop opening hours act), 2015 'flygtningekrise' (refugee crisis) ›a 'dictionary' of synonyms will be established for each event
  • 13. THE ANALYTICAL DESIGN: FOUR APPROACHES ›Fourth, the most used Google search terms from Denmark are identified (either at Google or in legacy media, where they are often reported at the end of each year)
  • 14. THE METHODS ›A variety of calculations of word occurrences ›Train embeddings of words and documents to represent the lexical co-occurrence structure within and between websites ›Possibly, model and predict information propagation across the Danish web
  • 15. Diagram: training in recognising topics (topic model + neural word embeddings) › 'dictionary' of words, identification of lexical co-occurrence structure › graph of where topics 'live'. Biggest challenge: training efficiency and speed, given the size of the data, which makes the training algorithms the main concern. Estimated training time: 4-6 months
  • 16. NEXT STEPS ›The textual analyses will be supplemented by a hyperlink network analysis ›Identify the link relations between websites where a given topic is talked about ›Hyperlinks have already been extracted as a separate dataset
  • 17. [Figure: ten topic labels, each reading 'Topic x', as on slide 4]
  • 18. NEXT STEPS Possible relevance in WARCnet: ›track topics from other countries on the Danish web, possibly on web pages in the country's language ›replicate the study based on holdings from other national web archives
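
Slide 5 describes reducing each yearly time slice to one version per web domain. A minimal Python sketch of that step, assuming a hypothetical capture record with a domain and a 14-digit timestamp, and assuming (the slide does not say) that the earliest capture of the year is the one kept:

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Capture:
    # Hypothetical record layout; the project's actual metadata is not described.
    domain: str     # e.g. "example.dk"
    timestamp: str  # "YYYYMMDDhhmmss"


def one_version_per_domain(captures):
    """Return {year: {domain: Capture}}, keeping one capture per domain per year.

    Assumption: the earliest capture within the year is the one kept.
    """
    slices = defaultdict(dict)
    for cap in captures:
        year = cap.timestamp[:4]
        best = slices[year].get(cap.domain)
        # Fixed-width 14-digit timestamps sort chronologically as plain strings.
        if best is None or cap.timestamp < best.timestamp:
            slices[year][cap.domain] = cap
    return dict(slices)
```

Any other selection rule (latest capture, most complete capture) would only change the comparison inside the loop.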
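
Slide 7 mentions language recognition so that only Danish words are analysed. The slide does not say which tool was used. The sketch below swaps in a deliberately crude stop-word vote (hypothetical word lists, hypothetical `min_hits` threshold) only to show where such a filter sits in the pipeline; a trained language identifier would be the realistic choice.

```python
import re

# Tiny, hand-picked stop-word lists; purely illustrative assumptions.
DANISH = {"og", "det", "til", "med", "ikke", "jeg", "der", "har", "kan", "som"}
ENGLISH = {"the", "and", "to", "of", "is", "in", "that", "it", "for", "with"}


def tokenize(text):
    # \w matches Unicode letters in Python 3, so æ, ø and å are kept.
    return re.findall(r"\w+", text.lower())


def looks_danish(text, min_hits=2):
    """Crude vote: enough Danish stop words, and more of them than English ones."""
    tokens = tokenize(text)
    da = sum(t in DANISH for t in tokens)
    en = sum(t in ENGLISH for t in tokens)
    return da >= min_hits and da > en


def danish_words(pages):
    """Collect tokens only from pages that pass the language filter."""
    words = []
    for page in pages:
        if looks_danish(page):
            words.extend(tokenize(page))
    return words
```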
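
The first approach (slide 9) starts from the most used words per year with stop words removed, from which 10-20 significant words are then picked by hand. The counting part could look like this sketch; the input shape (a dict from year to a token list) and the stop-word set are assumptions, and at the scale of billions of words the counting would in practice be distributed rather than done in memory.

```python
from collections import Counter


def most_used(tokens, stop_words, n=1000):
    """The n most frequent words, with stop words removed."""
    return Counter(t for t in tokens if t not in stop_words).most_common(n)


def most_used_per_year(slices, stop_words, n=1000):
    """Apply most_used to each yearly slice: {year: [(word, count), ...]}."""
    return {year: most_used(tokens, stop_words, n) for year, tokens in slices.items()}
```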
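
Slide 10 suggests that words such as dkk, pris, fragt and produkter could identify where trade takes place. One hedged way to operationalise that is to flag a domain when several different indicator words occur on it; the indicator set and the `min_distinct` threshold below are illustrative assumptions, not the project's criteria.

```python
# Indicator words taken from slide 10; the selection and threshold are assumptions.
TRADE_WORDS = {"dkk", "pris", "fragt", "produkter", "tilbud", "kurv", "køb"}


def trade_indicators(tokens):
    """Indicator words that occur at least once in a site's tokens."""
    return TRADE_WORDS & set(tokens)


def likely_trade_sites(site_tokens, min_distinct=3):
    """Domains whose text contains at least min_distinct different indicator words."""
    return sorted(
        domain
        for domain, tokens in site_tokens.items()
        if len(trade_indicators(tokens)) >= min_distinct
    )
```

Requiring several distinct words, rather than counting raw hits, keeps a blog that merely mentions a price from being classed as a shop.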
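
For the second approach (slide 11), a natural first measurement is how often the word of the year and the non-selected candidates occur in each yearly slice. This sketch normalises to occurrences per million tokens so that slices of different size can be compared; the normalisation and the input shape are assumptions, not something the slide specifies.

```python
def per_million(slices, word):
    """Occurrences of word per million tokens in each yearly slice."""
    trajectory = {}
    for year in sorted(slices):
        tokens = slices[year]
        trajectory[year] = 1_000_000 * tokens.count(word) / len(tokens) if tokens else 0.0
    return trajectory


def trajectories(slices, candidates):
    """One trajectory per word: the word of the year plus the other candidates."""
    return {word: per_million(slices, word) for word in candidates}
```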
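
The third approach (slide 12) relies on a 'dictionary' of synonyms per event. A minimal sketch of how such dictionaries might be applied, counting documents that mention at least one term of each event; the two dictionaries here are hypothetical placeholders, since the project's actual dictionaries are still to be established.

```python
import re

# Hypothetical synonym dictionaries, for illustration only.
EVENT_TERMS = {
    "financial crisis": {"finanskrise", "finanskrisen"},
    "refugee crisis": {"flygtningekrise", "flygtningekrisen"},
}


def documents_per_event(documents, event_terms=EVENT_TERMS):
    """Count documents mentioning at least one term from each event's dictionary."""
    counts = {event: 0 for event in event_terms}
    for text in documents:
        tokens = set(re.findall(r"\w+", text.lower()))
        for event, terms in event_terms.items():
            if tokens & terms:
                counts[event] += 1
    return counts
```

Exact token matching misses inflected forms such as 'finanskrisens'; matching on stems or prefixes would widen coverage at the cost of more false positives.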
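
Search terms from the fourth approach (slide 13) are often several words long, so they cannot be counted like the single words of slide 9. One way, sketched here under the assumption that a search term counts only when its words appear consecutively, is to match the term as a token sequence:

```python
import re


def count_phrase(text, phrase):
    """Count occurrences of a (possibly multi-word) search term as a token sequence."""
    tokens = re.findall(r"\w+", text.lower())
    target = re.findall(r"\w+", phrase.lower())
    n = len(target)
    if n == 0:
        return 0
    return sum(1 for i in range(len(tokens) - n + 1) if tokens[i:i + n] == target)
```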
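
Slide 14 proposes training word and document embeddings that capture lexical co-occurrence structure, without naming the model. As a small, dependency-free stand-in (explicitly a different, count-based technique, not neural embeddings), this sketch builds sparse positive PMI word vectors from windowed co-occurrence counts and compares words by cosine similarity. The window size and the toy documents in any usage are assumptions.

```python
import math
from collections import Counter


def cooccurrences(documents, window=2):
    """Count (word, context) pairs within +/- window tokens of each other."""
    pairs = Counter()
    for tokens in documents:
        for i, word in enumerate(tokens):
            for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
                if j != i:
                    pairs[(word, tokens[j])] += 1
    return pairs


def ppmi_vectors(pairs):
    """Positive pointwise mutual information, as sparse {word: {context: score}}."""
    total = sum(pairs.values())
    word_counts, context_counts = Counter(), Counter()
    for (word, context), n in pairs.items():
        word_counts[word] += n
        context_counts[context] += n
    vectors = {}
    for (word, context), n in pairs.items():
        pmi = math.log(n * total / (word_counts[word] * context_counts[context]))
        if pmi > 0:  # keep only positive associations
            vectors.setdefault(word, {})[context] = pmi
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(score * v.get(context, 0.0) for context, score in u.items())
    norm = math.sqrt(sum(s * s for s in u.values())) * math.sqrt(sum(s * s for s in v.values()))
    return dot / norm if norm else 0.0
```

Words that share contexts (here, two words that both appear next to 'køb' and 'pris') end up close together, which is the property the planned embeddings are meant to capture at far larger scale.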
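
Slide 16 plans to combine the textual results with the already extracted hyperlinks, identifying link relations between websites where a given topic is talked about. Assuming the hyperlink dataset can be read as (source domain, target domain) pairs and that the topic analysis yields a set of domains per topic (both assumptions about data shapes the slide does not describe), the relevant subgraph could be extracted like this:

```python
from collections import Counter


def topic_link_graph(links, topic_domains):
    """Hyperlinks whose source and target both talk about the topic (self-links dropped)."""
    return {
        (src, dst)
        for src, dst in links
        if src != dst and src in topic_domains and dst in topic_domains
    }


def most_linked(edges, n=10):
    """Domains receiving links from the most other topic domains."""
    return Counter(dst for _, dst in edges).most_common(n)
```

Because the edges are kept as a set, repeated links between the same two domains count once, so `most_linked` ranks domains by how many distinct topic domains point to them.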