2010 Digital Humanities London - Dutch Republic of Letters
1. Letters, Ideas and Information Technology
scholarly communication @ 1650
scholarly communication @ 2050
Using digital corpora of letters to disclose the circulation of knowledge in the 17th century
Erik-Jan Bos, Univ. Utrecht, erik-jan.bos@phil.uu.nl
Charles van den Heuvel, VKS, charles.vandenheuvel@vks.knaw.nl
Dirk Roorda (that’s me), DANS, dirk.roorda@dans.knaw.nl
3. Nota
Beeckman
Cats STEVIN
relation disciplines
direct - water
indirect - literature
Huygens STEVIN
Langeren
4. Corpora of
17th century scholars
Constantijn Huygens
Christiaan Huygens
Grotius
Descartes
Swammerdam
Leeuwenhoek
Barlaeus
Spinoza
and more?
5. Corpus | Number of letters | In possession? | Format | Metadata | Normalized?
Grotius | 7946 | Yes | TEI | In Interp element | Yes, DBNL codes
Van Leeuwenhoek | 337 | Yes | TEI | In Interp element | Yes, DBNL codes
Descartes | 750 | Yes | XML (no TEI) | other markup | No, plain text
Barlaeus | 1200 | 300 ready | Word | unknown | unknown
Swammerdam | 80 | Yes | Word | unknown | unknown
Constantijn Huygens | 7295 | Yes | XML | Probably Interp element | DBNL codes
Christiaan Huygens | 2900? | Mid-2010 | probably TEI | Probably Interp element | DBNL codes
6. CEN - Metadata
Catalogus Epistularum Neerlandicarum
265,000 descriptions of approximately
1,000,000 letters
from 1600 – now of which
100,000 letters in 17th century
7. Research Questions
• History of science:
• How did knowledge circulate in the 17th-
century Dutch Republic?
• Patterns in knowledge growth:
• How can we visualise sets of letters that
exhibit features of knowledge circulation?
• Re-use:
• How can we expose the sources, annotations,
and resulting patterns to further research?
8. Challenge
Traditional scholarship (East)
• interpretation
• close reading
• solving puzzles
Computational methods (West)
• dealing with patterns
• gleaned from large quantities of texts
• by automatic tools
East is east and West is west and ...
9. Issues to deal with
• making the sources uniformly available
• well coded in TEI, access rights
• overcoming the language barrier
• (17th cent varieties of French, Latin, Dutch)
• named entity recognition & concepts
• people, places, dates, concepts, instruments
• mixture of interpretation and algorithms
• creating useful visualisations
• aiding exploration by historians of science
10. ICT in Humanities Research
• collaboratory
• e-Laborate as starting point
• algorithmic pipelines
• from source material to visualisation
• infrastructure
• archiving results
• re-using data
• developing new algorithms
• disseminating the methodology
13. pipelines (current)
• language detection, using
Language Identification from Text Using N-gram Based
Cumulative Frequency Addition
Bashir Ahmed, Sung-Hyuk Cha, and Charles Tappert 2004
• results
latin
dutch
french
german
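The language-detection step can be sketched roughly as follows. This is a simplified illustration of the cumulative-frequency-addition idea, not the project's pipeline: each language gets a profile of character n-gram frequencies, and a text is assigned to the language whose profile yields the largest summed frequency over the text's n-grams. The trigram size, the per-language normalisation, the function names and the tiny training samples are all assumptions made for this example; the cited paper may normalise differently.

```python
from collections import Counter


def char_ngrams(text, n=3):
    """Return overlapping character n-grams of a lower-cased, whitespace-normalised text."""
    text = " ".join(text.lower().split())
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def train_profiles(samples, n=3):
    """Map each language to {n-gram: relative frequency within that language's sample}."""
    profiles = {}
    for lang, text in samples.items():
        counts = Counter(char_ngrams(text, n))
        total = sum(counts.values())
        profiles[lang] = {gram: count / total for gram, count in counts.items()}
    return profiles


def score_languages(text, profiles, n=3):
    """Sum, per language, the profile frequencies of every n-gram in the text."""
    grams = char_ngrams(text, n)
    return {
        lang: sum(profile.get(gram, 0.0) for gram in grams)
        for lang, profile in profiles.items()
    }


def identify_language(text, profiles, n=3):
    """Return the language with the highest cumulative frequency score."""
    scores = score_languages(text, profiles, n)
    return max(scores, key=scores.get)


# Toy training samples (assumption: far too small for real use, illustration only).
SAMPLES = {
    "latin": "in principio erat verbum et verbum erat apud deum et deus erat verbum",
    "dutch": "in het begin was het woord en het woord was bij god en het woord was god",
    "french": "au commencement était la parole et la parole était avec dieu et la parole était dieu",
    "german": "im anfang war das wort und das wort war bei gott und gott war das wort",
}
PROFILES = train_profiles(SAMPLES)

if __name__ == "__main__":
    for phrase in ("het woord was", "la parole était", "verbum erat", "das wort war"):
        print(phrase, "->", identify_language(phrase, PROFILES))
```

With realistic training corpora per language, the same scoring could be applied letter by letter, or paragraph by paragraph for letters that mix languages.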
14. pipelines (current)
• spelling normalisation
• VARD (http://www.comp.lancs.ac.uk/~barona/vard2/)
• with help from (http://www.dicollecte.org/home.php?prj=fr)
• results
• French: VARD works (after improvements),
although designed for historical English
• Dutch: still on the lookout for a combination of
resources, tools, and dexterity
• Latin: later
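To make the normalisation step concrete, here is a minimal sketch of lexicon-based spelling normalisation. It is not VARD's algorithm and does not call VARD: it only illustrates the general idea of mapping historical variants to modern forms, first through a table of known variants and then through fuzzy matching against a modern word list (using Python's standard `difflib`). The lexicon, the variant table, the cutoff value and the made-up variant "philosofie" are assumptions for illustration.

```python
import difflib

# Hypothetical modern word list and known-variant table (illustration only).
MODERN_LEXICON = sorted({"lettre", "philosophie", "savoir"})
KNOWN_VARIANTS = {"sçavoir": "savoir"}


def normalise_token(token, lexicon=MODERN_LEXICON, variants=KNOWN_VARIANTS, cutoff=0.8):
    """Return a modern spelling for one token, or the lower-cased token if none is found."""
    word = token.lower()
    if word in lexicon:
        return word  # already a modern form
    if word in variants:
        return variants[word]  # documented historical variant
    # Fall back to the closest modern word, if it is similar enough.
    close = difflib.get_close_matches(word, lexicon, n=1, cutoff=cutoff)
    return close[0] if close else word


def normalise_text(text):
    """Normalise a whitespace-separated text token by token."""
    return " ".join(normalise_token(token) for token in text.split())


if __name__ == "__main__":
    print(normalise_text("lettre sçavoir philosofie"))
```

A per-language lexicon and variant table would be needed for French, Dutch and Latin; the fuzzy fallback is the part most likely to need tuning, since a low cutoff rewrites words that should be left alone.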
20. the project’s legacy
• more than publications
• curated sources, annotations, visualisations
• more than algorithms
• a framework for analysis of historical texts
• more than a piece of historical research
• data and (intermediate) results worthwhile to
• linguists, computer scientists, sociologists
• more than a passive dataset
• extensible, dynamic, interactive
21. preserving the results
• part of the CLARIN infrastructure
• http://www.clarin.eu/
• http://www.clarin.nl/
• materials in a Trusted Digital Repository
(DANS)
• http://easy.dans.knaw.nl/dms
22. working with CLARIN
• CLARIN-EU
• Outreach to humanities: use cases
• CKCC one of 10 selected projects
• received expert input for choice of language
tools
• CLARIN-NL
• CKCC one of 10 initial projects in the Dutch
national construction effort
• support for applying language technology
23. Adapting to CLARIN
• Conforming to standards
• CLARIN standards are in evolution
• (and will remain evolvable)
• Common MetaData Infrastructure
• a registry of metadata components
• defined by the community
• with explicit semantics (http://www.isocat.org/ )
• Data in TEI (as export/import format)
24. Trusted Digital Repository
• materials
• reliable (provenance metadata)
• findable (CMDI metadata)
• referable (persistent identifiers)
• accessible (viewable in web browser)
• usable (downloadable)
• sooner or later:
• high-performance computing
• memento: a time-sensitive web interface to the
dynamic contents of the collaboratory
(http://arxiv.org/abs/0911.1112 )
Slide 7. Comparison of Waterschyring with a model for scouring an inland harbour by means of pivot sluices (spilsluizen), and the drainage in Note's map. Here we clearly see similarities. However, despite the strong similarities in the figure, uncertainties in the dating of the texts that Stevin did not publish himself make it hard to determine whether this work could have inspired Note. In any case, as we shall see, another work by Stevin played a larger role in Note's argument for his invention. Moreover, that work is explicitly cited by Beeckman in support of Note's defence. The work in question is De Beghinselen des Waterwichts of 1584.
Catalogus Epistularum Neerlandicarum (CEN), or the Catalogue of letters in Dutch repositories. It is a relatively old database, already available via Telnet in the early 1990s, before the World Wide Web came into being. CEN is an exhaustive database of letters in the collections of five Dutch university libraries, the Royal Library, and four other important libraries. It contains more than 265,000 descriptions of approximately 1,000,000 letters, dating from 1600 until the present day (of which ca. 100,000 are from the 17th century). It supplies the following metadata: sender, recipient, place of sending, year, language, repository and shelf mark. The format in which this database will be made available to the project is to be negotiated with the owner, OCLC. Using this database will enable us to state what fraction of the total body of letters the selected letters represent. Moreover, it allows us to increase the density of the networks we are interested in, leading to unprecedented research opportunities.
How did knowledge circulate in the 17th-century Dutch Republic? How were elements of knowledge picked up by the learned community? How was this new knowledge processed, disseminated, theorized and ultimately accepted, or rejected? How can we combine and structure various sets of letters of 17th-century scholars in such a way that we can analyze the circulation of knowledge in an international context and follow the development of themes of interest in space and time? How can we make this information on knowledge production accessible to interdisciplinary research in the Humanities?
How can we combine and structure various sets of letters of 17th-century scholars and their correspondents in such a way that we can analyze and visualize the circulation and appropriation of knowledge production in a wider international context and recognize the development of themes of interest and scholarly debates in space and time? How can we make this information on knowledge production accessible to interdisciplinary research in the Humanities? How can this information be enriched by annotation?
Issues: letters are not uniformly available; the sources are multilingual and show spelling variation; linking and tagging is partly automated and partly manual. Much interpretation is needed to resolve references to names, dates, places, ideas and concepts, and the annotations are heterogeneous. How can we make visualizations informative for research on the basis of the data?
Qualitative: Who is corresponding with or introducing whom? Can we distinguish circles and types of scholars? Where are they located, and where do they meet? Can we distinguish types of letters and rhetorical structures? Can we distinguish emerging themes and debates in these networks?
Quantitative: Number of correspondents. Frequency and duration of correspondence. Percentage of various languages and themes.
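The quantitative measures named above (number of correspondents, frequency and duration of correspondence, percentage of languages) could be derived from letter metadata along the following lines. This is a sketch under assumptions: the `Letter` record uses a subset of the metadata fields that CEN is said to supply (sender, recipient, year, language), the function names are hypothetical, and the demo records are invented placeholders, not real letters.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Letter:
    """Minimal letter metadata (hypothetical record layout)."""
    sender: str
    recipient: str
    year: int
    language: str


def correspondents_of(letters, person):
    """Return everyone who exchanged at least one letter with the given person."""
    partners = set()
    for letter in letters:
        if letter.sender == person:
            partners.add(letter.recipient)
        elif letter.recipient == person:
            partners.add(letter.sender)
    return partners


def correspondence_stats(letters):
    """Per unordered pair of correspondents: letter count, first and last year, duration."""
    years_by_pair = defaultdict(list)
    for letter in letters:
        years_by_pair[frozenset((letter.sender, letter.recipient))].append(letter.year)
    return {
        pair: {
            "letters": len(years),
            "first_year": min(years),
            "last_year": max(years),
            "duration_years": max(years) - min(years),
        }
        for pair, years in years_by_pair.items()
    }


def language_shares(letters):
    """Return the percentage of letters written in each language."""
    counts = Counter(letter.language for letter in letters)
    total = sum(counts.values())
    return {language: 100 * count / total for language, count in counts.items()}


# Invented placeholder records (not real correspondence data).
DEMO_LETTERS = [
    Letter("A", "B", 1635, "French"),
    Letter("B", "A", 1637, "French"),
    Letter("A", "C", 1640, "Latin"),
    Letter("C", "A", 1640, "Latin"),
]

if __name__ == "__main__":
    print(correspondents_of(DEMO_LETTERS, "A"))
    print(correspondence_stats(DEMO_LETTERS))
    print(language_shares(DEMO_LETTERS))
```

Treating a pair of correspondents as unordered keeps both directions of an exchange in one record; a directed variant would be needed to study who initiates contact.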
!NB mention distinction between keyword and concept extraction
WMatrix: good on a per letter basis; not so handy for the whole corpus
LDA is purely statistical. You can improve the input for LDA by stemming. You can improve NER by part-of-speech analysis. Concept extraction: LDA is for topic modelling. Keywords => compose topics => label them. Topic modelling => concepts.
Topic modelling with Mallet and LDA (latent Dirichlet allocation), and relational topic modelling: topics linked to senders and receivers of letters. Comment on dips and peaks: worth exploring the little guys! Why are they peaking? Next step: visualise the dynamics of topics in geography (in the manner of Buienradar, the Dutch rain radar).
The emphasis on infrastructure: for CLARIN; also Alfalab; future computational humanities; scholars' letters (Geleerdenbrieven, now also a CLARIN-NL project).