Digital History seminar and Archives and Society seminar
Institute of Historical Research
23 June 2015
http://ihrdighist.blogs.sas.ac.uk/2015/06/15/23-june-2015-exploring-big-and-small-historical-datasets-reflections-on-two-recent-projects/
Making Materials Findable at the State Library of Victoria - Alan Manifold
A discussion of some of the issues arising from the many data sources and repositories in use at the State Library of Victoria, how they interact, and some of the solutions we have developed to resolve them.
KM SHOWCASE 2019 - Findability Strategies - KM Institute
This document discusses strategies for improving findability of knowledge across the Inter-American Development Bank (IDB). It proposes using word vectors and embedding techniques to automatically tag and extract knowledge from IDB documents, datasets, and publications. This extracted knowledge would be stored in a knowledge base and used to power a recommendation system that provides personalized information to internal and external users based on their profiles and contexts. The goal is to make more of IDB's information easily discoverable and anticipate users' needs through advanced semantic analysis techniques.
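The embedding-based auto-tagging the abstract describes can be illustrated with a minimal sketch: hand-written toy vectors stand in for trained embeddings, a document is embedded as the average of its word vectors, and any tag whose vector is close enough (by cosine similarity) is attached. All names, vectors, and thresholds below are invented for illustration and are not taken from the IDB system.

```python
import math

# Toy word and tag vectors standing in for trained embeddings.
WORD_VECS = {
    "loan":    [0.9, 0.1, 0.0],
    "credit":  [0.8, 0.2, 0.1],
    "school":  [0.1, 0.9, 0.0],
    "teacher": [0.0, 0.8, 0.2],
}
TAG_VECS = {
    "finance":   [0.85, 0.15, 0.05],
    "education": [0.05, 0.85, 0.10],
}

def doc_vector(words):
    """Embed a document as the average of its known word vectors."""
    known = [WORD_VECS[w] for w in words if w in WORD_VECS]
    return [sum(dim) / len(known) for dim in zip(*known)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def auto_tag(words, threshold=0.9):
    """Attach every tag whose vector is close enough to the document's."""
    v = doc_vector(words)
    return [tag for tag, tv in TAG_VECS.items() if cosine(v, tv) >= threshold]

print(auto_tag(["loan", "credit"]))   # ['finance']
```

A production system would of course learn the vectors from the corpus; the point here is only the shape of the pipeline (embed, compare, tag).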
A system called a natural language interface, which transforms a user's natural language question into a SPARQL query.
Find related papers here: https://sites.google.com/site/fadhlinams81/publication
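At its simplest, a natural language interface of this kind can be built from question templates: a regular expression captures the entities in a question shape and slots them into a SPARQL skeleton. The pattern and the DBpedia-style predicate IRIs below are purely illustrative, not the vocabulary used in the linked papers.

```python
import re

# One question pattern mapped to one SPARQL template (illustrative IRIs).
PATTERNS = [
    (re.compile(r"who wrote (.+)\?", re.IGNORECASE),
     'SELECT ?author WHERE {{ ?book rdfs:label "{0}"@en . ?book dbo:author ?author }}'),
]

def to_sparql(question):
    """Return a SPARQL query for the first matching pattern, else None."""
    for pattern, template in PATTERNS:
        match = pattern.match(question)
        if match:
            return template.format(*match.groups())
    return None

print(to_sparql("Who wrote Dracula?"))
```

Real systems replace the template list with parsing and entity linking, but the input/output contract (question in, query out) is the same.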
Presentation delivered at the Linked Ancient World Data Institute, Drew University, 30 May 2013.
Copyright 2013 New York University.
This work is licensed under a Creative Commons Attribution 4.0 International License.
http://creativecommons.org/licenses/by/4.0/deed.en_US
Funding for the preparation and presentation of this presentation and the workshop at which it was presented was provided by the National Endowment for the Humanities. Any views, findings, conclusions, or recommendations expressed in this presentation do not necessarily reflect those of the National Endowment for the Humanities.
Peter Webster: Interrogating the Archived UK Web - Digital History
This document summarizes a project that analyzes the JISC UK Web Domain Dataset (1996 to 2013) to understand the development of UK web space over time. The project aims to establish frameworks for analyzing web archives and explore ethical implications. It will produce tools to support analysis, case studies across disciplines, and training materials. The dataset contains around 300 million resources from the UK web captured by the Internet Archive, but lacks metadata about subjects and dates. The project highlights the value of web archives as historical sources.
Digital History Seminar and Archives and Society Seminar
Institute of Historical Research
23 June 2015
http://ihrdighist.blogs.sas.ac.uk/2015/06/15/23-june-2015-exploring-big-and-small-historical-datasets-reflections-on-two-recent-projects/
Richard Deswarte: Interrogating the Archived UK Web - Digital History
Digital History seminar
4 November 2014
Live Stream: http://ihrdighist.blogs.sas.ac.uk/2014/10/28/tuesday-4-november-interrogating-the-archived-uk-web-historians-and-social-scientists-research-experiences/
The Pictorial Publisher - Agents, Technologies and the Illustrated Book in Br... - Digital History
The document discusses the rise of illustrated books and publishing technologies in Britain between 1830 and 1850. It provides examples of steel engravings and wood engravings used in popular history books and travel guides during this period. Graphs show the increasing number of illustrations and their area in volumes of Charles Knight's Pictorial History of England, reflecting the growing popularity of illustrations in books of the time.
Remixing Digital Archives: The Victorian Meme Machine (IHR Digital History Se... - DigiVictorian
IHR Digital History Seminar - 10 November 2015
History has not been kind to Victorian jokes. While the great works of nineteenth-century art and literature have been preserved and celebrated by successive generations, even the period’s most popular gags have largely been forgotten. In the popular imagination, the Victorians have long been regarded as terminally humourless; a straitlaced society who, in the words of their queen, were famously “not amused”. And yet, millions of jokes were written during the nineteenth century. They were printed in books and newspapers, performed in theatres and music halls, and re-told in pubs, offices, taxicabs, schoolrooms and kitchens throughout the land. Like many other forms of ephemeral popular culture, the majority of these jokes were never recorded and have now been forgotten.
But all is not lost. Millions of puns, gags, and comic sketches have been preserved – often by accident – in archives of nineteenth-century print culture. Some appear in dedicated joke books and comic periodicals, but most have survived as stowaways in the margins of other texts. They are scattered throughout thousands of Victorian books, newspapers, magazines, and periodicals. While some were organised into clearly demarcated collections, others were used more haphazardly as column fillers or sprinkled randomly among other tit-bits of news and entertainment. Until recently, the only way to locate these scattered fragments amidst the ‘vast terra incognita’ of Victorian print culture was to identify a promising host-text and then browse through it manually. The digitisation of Victorian print culture has opened up new possibilities for this kind of research. However, as this talk will argue, the structure of digital archives means that jokes are still buried among millions of pages of other content. In order to make these, and other marginalised texts, more visible, we need to rethink the organisation of our digital collections and open up their contents to creative forms of archival ‘remixing’.
In 2014, Bob Nicholson (Edge Hill University) teamed up with the British Library Labs on a project that aims to find and revive thousands of forgotten Victorian jokes. Their ‘Victorian Meme Machine’ automatically converts old jokes into images and posts them on social media using a ‘Mechanical Comedian’ (@VictorianHumour). In this presentation, Bob will report on the progress of the project and outline his plans for a new transcription platform designed around the principles of ‘meaningful gamification’.
European or Imperial Metropolis? Depictions of London in British Newspapers, ... - Digital History
Tessa Hauswedell
Digital History Seminar
Institute of Historical Research
19 January 2016
http://ihrdighist.blogs.sas.ac.uk/2015/12/14/tuesday-19-january-2016-tessa-hauswedell-european-or-imperial-metropolis-depictions-of-london-in-british-newspapers-1870-1900/
This document lists the titles of numerous 19th century newspapers from the United States and abroad. It includes major papers like the New York Times as well as smaller regional papers. The list appears to be compiled to show the wide circulation and reprinting of content across different publications in the 19th century newspaper landscape.
Emma Bayne: ‘Traces Through Time overview and next steps’ - Digital History
Digital History Seminar and Archives and Society Seminar
Institute of Historical Research
23 June 2015
http://ihrdighist.blogs.sas.ac.uk/2015/06/15/23-june-2015-exploring-big-and-small-historical-datasets-reflections-on-two-recent-projects/
Gareth Millwood: Interrogating the Archived UK Web - Digital History
Digital History seminar
4 November 2014
Live Stream: http://ihrdighist.blogs.sas.ac.uk/2014/10/28/tuesday-4-november-interrogating-the-archived-uk-web-historians-and-social-scientists-research-experiences/
The Challenge of Digital Sources in the Web Age: Common Tensions Across Three... - Digital History
Digital History seminar
29 September 2015
Ian Milligan (University of Waterloo)
http://ihrdighist.blogs.sas.ac.uk/2015/09/01/tuesday-29-september-2015-ian-milligan-the-challenge-of-digital-sources-in-the-web-age-common-tensions-across-three-web-histories-1994-2015/
This document discusses mapping the neighborhoods and addresses of 18th century artists in Paris. It outlines sources like membership records from the Académie and almanacs that contain artist addresses. The addresses will be plotted on georeferenced historical maps of Paris from the 18th century. The mapped data and interactive platform will be made available online to show where artists lived and worked in different areas of Paris over time.
Political Meetings Mapper with British Library Labs: mapping the origins of B... - Digital History
On April 10, 1848, between 150,000 and 300,000 Chartists gathered on Kennington Common in London to demand political reforms, including universal suffrage. This was one of the largest mass gatherings in British history up to that point. The meeting was peaceful and orderly, with the crowd listening to speeches calling for democratic reforms through non-violent means. However, the movement's petition to Parliament was rejected, disappointing the Chartists and diminishing the campaign's momentum.
Sonia Ranade: 'Traces Through Time overview and next steps' - Digital History
Digital History Seminar and Archives and Society Seminar
Institute of Historical Research
23 June 2015
http://ihrdighist.blogs.sas.ac.uk/2015/06/15/23-june-2015-exploring-big-and-small-historical-datasets-reflections-on-two-recent-projects/
This document discusses using big data techniques to analyze bibliographical music datasets in order to uncover new narratives in music history. It outlines several large datasets that will be cleaned and analyzed, including catalogs of printed music and manuscripts. The goals are to pilot big data research methods on these datasets, make some datasets publicly available, and potentially challenge traditional music history narratives. Specific examples analyze printing trends over time and locations, and publications related to the composer Palestrina.
This document discusses using network analysis on metadata from Tudor-era correspondence to map intelligence networks and identify important figures. It finds that Edward Courtenay, Earl of Devon, who was suspected of conspiracy, had unusually high "betweenness" in the network, suggesting he bridged opposed factions and was potentially dangerous. It also identifies the Catholic double agent John Snowden based on his betweenness and correspondence patterns. The analysis shows network properties can predict spies and conspirators by finding those who connect disparate groups.
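The "betweenness" measure the summary relies on can be computed directly with Brandes' algorithm. The pure-Python sketch below runs it on a toy network in which a single broker node links two otherwise separate factions; the node name "courtenay" is used purely for illustration, and this is not the project's actual dataset or code.

```python
from collections import deque, defaultdict

def betweenness(graph):
    """Brandes' algorithm for betweenness centrality on an unweighted,
    undirected graph given as {node: [neighbours]}."""
    bc = dict.fromkeys(graph, 0.0)
    for s in graph:
        stack = []
        pred = defaultdict(list)          # predecessors on shortest paths
        sigma = dict.fromkeys(graph, 0)   # number of shortest paths from s
        dist = dict.fromkeys(graph, -1)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:                      # BFS phase
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    pred[w].append(v)
        delta = dict.fromkeys(graph, 0.0)
        while stack:                      # dependency accumulation phase
            w = stack.pop()
            for v in pred[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return {v: c / 2 for v, c in bc.items()}  # undirected: halve double count

# Two factions {a, b} and {c, d}, joined only through one broker:
graph = {
    "a": ["b", "courtenay"], "b": ["a", "courtenay"],
    "c": ["d", "courtenay"], "d": ["c", "courtenay"],
    "courtenay": ["a", "b", "c", "d"],
}
scores = betweenness(graph)
print(max(scores, key=scores.get))  # the broker has the highest betweenness
```

This is exactly the network signature the abstract describes: every shortest path between the two factions passes through the broker, so his score dwarfs everyone else's.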
This document discusses crowdsourcing and citizen history projects in cultural heritage. It defines crowdsourcing as outsourcing functions to a large network of people through an open call. Cultural heritage crowdsourcing involves meaningful public tasks related to collections that provide inherent rewards. Citizen science involves public participation in data processing, analysis, and research design. Participatory project models range from contributory to collaborative to co-creative. Examples of citizen history projects discussed include FreeBMD, Old Weather, Operation War Diary, and Children of the Lodz Ghetto. The document examines competing models of citizen history and structural issues that can undermine such projects.
The document discusses using text mining tools to analyze historical medical texts in order to extract semantic information like entities, events, and relationships. It describes a project mining the British Medical Journal and London Medical Officer of Health reports to build a semantic search system that can help researchers investigate the treatment and prevention of diseases over time and in different areas. The project aims to address challenges with optical character recognition (OCR) errors and terminology changes by applying techniques like OCR correction, named entity recognition, and identifying historical variants.
Digital History Seminar and Archives and Society Seminar
Institute of Historical Research
23 June 2015
http://ihrdighist.blogs.sas.ac.uk/2015/06/15/23-june-2015-exploring-big-and-small-historical-datasets-reflections-on-two-recent-projects/
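A minimal stand-in for the pipeline described in the medical text-mining abstract above, normalising historical spelling variants and then matching against a disease gazetteer, might look like the sketch below. The variant table and gazetteer entries are illustrative examples, not the project's actual resources.

```python
# Hypothetical variant table and gazetteer (illustrative entries only).
VARIANTS = {"cholera morbus": "cholera", "phthisis": "tuberculosis"}
DISEASE_GAZETTEER = {"cholera", "tuberculosis", "smallpox"}

def extract_diseases(text):
    """Normalise historical variants, then look up gazetteer terms."""
    text = text.lower()
    for old, new in VARIANTS.items():
        text = text.replace(old, new)
    return sorted(d for d in DISEASE_GAZETTEER if d in text)

print(extract_diseases("An outbreak of Cholera Morbus and phthisis was reported."))
```

Real named entity recognition uses statistical models rather than lookup tables, but normalising variants before recognition is the key move either way.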
Tracking the Emergence of New Words across Time and Space - Digital History
This document discusses tracking the emergence and spread of new words across time and space using a large Twitter corpus. It identifies rising and emerging words from 2014 using correlation analysis and cross-referencing with rare words. Many emerging words follow an S-curve pattern of increasing usage over time. Mapping analyses show words tend to spread from urban to surrounding areas, though factors like population density and demographics also influence patterns of geographical diffusion.
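The S-curve the summary mentions is the logistic function: adoption is slow at first, fastest near the midpoint, and flattens as usage saturates. A quick sketch (with invented parameters, not the study's fitted values) shows the shape:

```python
import math

def logistic(t, k, t0):
    """Classic S-curve: slow start, fastest growth at t0, then saturation."""
    return 1.0 / (1.0 + math.exp(-k * (t - t0)))

# Weekly relative frequency of a hypothetical emerging word.
usage = [logistic(week, k=0.5, t0=26) for week in range(52)]
growth = [b - a for a, b in zip(usage, usage[1:])]

print(round(usage[0], 3), round(usage[-1], 3))  # near 0 early, near 1 late
print(growth.index(max(growth)))                # growth peaks around the midpoint
```

Fitting such a curve to a word's observed frequencies gives the midpoint and steepness that characterise how quickly it caught on.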
This document discusses techniques for automatically generating topic facets from documents to enable semantic information discovery and retrieval. Topic facets are collections of shared phrases between documents that indicate a semantic agreement or relationship between them on a particular topic. The document provides an example of how topic facets could be generated from six sample documents that share terms, and how the documents could be grouped into sets according to their shared terms and topic facets. It also discusses how topic facets relax the requirement to assign single topics or concepts to documents by allowing fractional or multi-topic agreements.
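The facet-generation step can be illustrated in a few lines: each facet is a phrase shared by at least two documents, together with the set of documents that agree on it, so a document can belong to several facets at once. The documents and phrases below are invented for illustration.

```python
from itertools import combinations

# Toy documents represented as sets of extracted phrases.
docs = {
    "d1": {"steam power", "railways"},
    "d2": {"steam power", "cotton mills"},
    "d3": {"railways", "timetables"},
}

def topic_facets(docs):
    """Collect every phrase shared by at least two documents, with the
    group of documents that agree on it."""
    facets = {}
    for (a, ta), (b, tb) in combinations(docs.items(), 2):
        for phrase in ta & tb:
            facets.setdefault(phrase, set()).update({a, b})
    return facets

print(topic_facets(docs))
```

Note that d1 ends up in two facets ("steam power" and "railways"), which is precisely the relaxation of single-topic assignment the abstract describes.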
This document provides an overview of text mining and natural language processing techniques. It discusses bag-of-words approaches, part-of-speech tagging, word sense disambiguation, parsing, and other shallow NLP methods. While full natural language processing is difficult, the document argues that shallow NLP techniques can still provide useful information for text mining applications.
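The simplest of the shallow techniques listed, the bag-of-words model, reduces a text to token counts and discards word order entirely:

```python
import re
from collections import Counter

def bag_of_words(text):
    """Shallow NLP at its simplest: lowercase, tokenise, count."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

bow = bag_of_words("The plague spread; the plague returned.")
print(bow["plague"], bow["the"])  # 2 2
```

Crude as it is, these counts are the raw material for most of the text mining tasks surveyed in the document.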
The Web of Data: do we actually understand what we built? - Frank van Harmelen
Despite its obvious success (largest knowledge base ever built, used in practice by companies and governments alike), we actually understand very little of the structure of the Web of Data. Its formal meaning is specified in logic, but with its scale, context dependency and dynamics, the Web of Data has outgrown its traditional model-theoretic semantics.
Is the meaning of a logical statement (an edge in the graph) dependent on the cluster ("context") in which it appears? Does a more densely connected concept (node) contain more information? Is the path length between two nodes related to their semantic distance?
Properties such as clustering, connectivity and path length are not described, much less explained by model-theoretic semantics. Do such properties contribute to the meaning of a knowledge graph?
To properly understand the structure and meaning of knowledge graphs, we should no longer treat knowledge graphs as (only) a set of logical statements, but treat them properly as a graph. But how to do this is far from clear.
In this talk, I report on some of our early results on some of these questions, but I ask many more questions for which we don't have answers yet.
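One of the questions above, whether path length tracks semantic distance, is easy to probe once a knowledge graph is treated as a graph: breadth-first search gives shortest path lengths directly. The toy taxonomy below is invented for illustration; intuitively, "dog" is closer to "cat" than to "oak", and the edge counts agree.

```python
from collections import deque

# A small knowledge graph as an undirected adjacency list (toy edges).
graph = {
    "dog": ["mammal"], "cat": ["mammal"],
    "mammal": ["dog", "cat", "animal"],
    "animal": ["mammal", "organism"],
    "organism": ["animal", "plant"],
    "plant": ["organism", "oak"], "oak": ["plant"],
}

def path_length(graph, start, goal):
    """Breadth-first search: number of edges on a shortest path."""
    seen, queue = {start}, deque([(start, 0)])
    while queue:
        node, d = queue.popleft()
        if node == goal:
            return d
        for nb in graph[node]:
            if nb not in seen:
                seen.add(nb)
                queue.append((nb, d + 1))
    return None

print(path_length(graph, "dog", "cat"))  # 2
print(path_length(graph, "dog", "oak"))  # 5
```

Whether such structural distances genuinely carry meaning is, of course, exactly the open question the talk raises.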
Text mining and natural language processing techniques can be used to extract useful information from text data. Common text mining tasks include text categorization to classify documents into predefined categories, document clustering to group similar documents without predefined categories, and keyword-based association analysis to find frequently co-occurring terms. Text classification algorithms such as support vector machines, naive Bayes classifiers, and neural networks are often applied to categorize documents. The vector space model is commonly used to represent documents as vectors of term weights.
Text mining and natural language processing techniques can be used to extract useful information from text data. Common text mining tasks include text categorization to classify documents into predefined categories, document clustering to group similar documents without predefined categories, and keyword-based association analysis to find frequent patterns and relationships between keywords in a collection of documents. Text classification algorithms such as support vector machines, k-nearest neighbors, naive Bayes, and neural networks can be applied to categorize documents based on their contents.
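One of the classifiers both summaries name, naive Bayes, is small enough to sketch in full. The training data below is invented for illustration; a real system would train on labelled documents from the collection.

```python
import math
from collections import Counter, defaultdict

# Tiny labelled corpus (illustrative).
train = [
    ("finance", "loan credit bank interest"),
    ("finance", "bank loan market"),
    ("health",  "disease treatment hospital"),
    ("health",  "hospital patient treatment"),
]

def fit(train):
    """Count word and class frequencies from (label, text) pairs."""
    word_counts = defaultdict(Counter)
    class_counts = Counter()
    vocab = set()
    for label, text in train:
        class_counts[label] += 1
        for w in text.split():
            word_counts[label][w] += 1
            vocab.add(w)
    return word_counts, class_counts, vocab

def predict(text, word_counts, class_counts, vocab):
    """Pick the class maximising log P(class) + sum log P(word | class)."""
    best, best_lp = None, float("-inf")
    total_docs = sum(class_counts.values())
    for label in class_counts:
        lp = math.log(class_counts[label] / total_docs)
        n = sum(word_counts[label].values())
        for w in text.split():
            # Laplace smoothing keeps unseen words from zeroing the probability.
            lp += math.log((word_counts[label][w] + 1) / (n + len(vocab)))
        if lp > best_lp:
            best, best_lp = label, lp
    return best

model = fit(train)
print(predict("bank loan", *model))  # finance
```

The same fit/predict shape carries over to the other algorithms mentioned (SVMs, k-nearest neighbours, neural networks); only the model inside changes.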
Text mining and natural language processing techniques can be used to extract useful information from text data. Common text mining tasks include text categorization to classify documents into predefined categories, document clustering to group similar documents without predefined categories, and keyword-based association analysis to find frequently co-occurring terms. Text classification algorithms such as support vector machines, naive Bayes classifiers, and neural networks are often applied to categorized documents into topics. The vector space model is commonly used to represent documents as vectors of term weights to enable similarity comparisons between documents.
Large-Scale Computational Research in Arts & Humanities (Jisc)
John Coleman discusses large-scale computational research in the arts and humanities using unwritten media like audio and video. He describes how this is affecting the research landscape and implications for costs and strategies. Examples of ongoing projects mining large speech corpora are provided, along with challenges of data storage, collaboration across sites, and training needed for future researchers.
The document discusses machine learning approaches for natural language processing. It provides an overview of various NLP tasks like tagging, parsing, information extraction that can be addressed using machine learning techniques. Both deductive and inductive approaches are described for developing models from linguistic data and applying them to solve problems in language processing and understanding.
Capturing and expressing REED's (Records of Early English Drama) essence in a digital future. Given at Envisioning REED in the Digital Age 4-5 April, 2011. University of Toronto
The document discusses the Million Book Challenge, an international grant competition seeking to analyze large digitized corpora using natural language processing techniques. It provides an example study where newsbooks from 1650s England were analyzed to extract place names, assign coordinates, and map mentions by theme. Significant technical and legal challenges around intellectual property and access to datasets for analysis and peer review are noted. The competition aims to forge partnerships to explore these opportunities and issues.
"Digital Scholarship: The Intersection of Disciplines"
Invited talk at Semantics Digital Humanities Workshop, 25th-27th of September 2015, New Seminar Room, St John’s College, University of Oxford, St Giles, OX1 3JP. Organized by Dept of Computer Science, e-Research Centre, and St John's College, University of Oxford.
Researchers in ancient text corpora can take control over their data. We show a way to do so by means of Text-Fabric.
Co-production of Cody Kingham and Dirk Roorda
1) The document discusses using blogs and other structured web data to develop linguistic corpora for research. It argues that structured web data provides large amounts of naturally occurring language data in various genres and languages.
2) Examples are given of how blog data in particular is well-structured with metadata like authorship, dates, and semantics. This structured data can be extracted and analyzed to study linguistic patterns and variation across different authors, registers, and languages.
3) One research example analyzed the distribution of future tense expressions ("will" vs. "be going to") in three English language blogs and found patterns relating to subject type that confirm theoretical assumptions.
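A count of that kind is straightforward to reproduce. The sketch below (sample sentences invented, not from the blogs studied) tallies "will" futures against "be going to" futures with regular expressions:

```python
import re

# Toy illustration of the comparison; sentences are invented examples.
sample = [
    "I will post the photos tomorrow.",
    "She is going to write about the trip.",
    "We are going to see how it turns out.",
    "The weather will improve, I think.",
    "They will probably cancel it.",
]

def future_counts(sentences):
    """Count simple 'will' futures vs 'be going to' futures."""
    will = going = 0
    for s in sentences:
        will += len(re.findall(r"\bwill\b", s, re.IGNORECASE))
        going += len(re.findall(r"\b(?:am|is|are)\s+going\s+to\b", s,
                                re.IGNORECASE))
    return {"will": will, "be going to": going}

print(future_counts(sample))  # {'will': 3, 'be going to': 2}
```

Real corpus studies would also condition the counts on metadata such as subject type, author, and register, which is where the structured blog data pays off.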
Converging cultures of open in language resources development (Alannah Fitzgerald)
Presented at the Open Educational Resources (OER16) Conference on 19 April, 2016 in Edinburgh, UK
https://oer16.oerconf.org/sessions/converging-cultures-of-open-in-language-resources-development-1156/
Describing Everything - Open Web standards and classification (Dan Brickley)
The document discusses the need for a hybrid approach to classification that combines traditional library classification systems with modern web technologies and standards. It proposes putting classification data on the open web so it can be more widely used and built upon. This will help drive innovation by making the data accessible to developers, designers and content creators.
1. The document discusses layers of annotation for analyzing biblical Hebrew text, including the text itself, linguistic features, manually or automatically generated analyses, and queries for exegetical search.
2. It provides an overview of the Linguistic Annotation Framework (LAF) for representing annotated text and statistics on the annotation of one Hebrew text, with over 800,000 regions and 1.4 million nodes.
3. The document describes tools for querying the annotated text, including the SHEBANQ system and LAF-Fabric API, and the ability to work with the data in various formats like XML, binary files, and R.
Slides from my lecture for the Information Retrieval and Data Mining course at University College London
The slides cover introductory concepts on topic models, vector semantics and basic end applications
Keywords: information extraction, named entity recognition (NER), text analytics, text mining, e-discovery, unstructured data, structured data, calendaring, standard evaluation per entity, standard evaluation per token, sequence classifiers, sequence labeling, word shapes, semantic analysis in language technology
Similar to Sarah Rees Jones (York) and Helen Petrie: 'ChartEx overview and next steps'
This document summarizes a panel discussion on using digital tools and methods in history education. Clare Rowan discussed using digital storytelling in the classroom, where students create short videos to demonstrate their understanding of course material. Robert Houghton discussed using digital games in the classroom to help students engage with historical topics and arguments in an interactive way. James Baker discussed his experiences introducing digital skills and methods to undergraduate history students at the University of Sussex through dedicated modules on topics like data modeling, visualization, and archiving. The panelists discussed both the benefits of these approaches for student learning and engagement, as well as challenges around resources, skills, and student expectations.
Slides for IHR Digital History Seminar, 7 January 2020. Details at https://ihrdighist.blogs.sas.ac.uk/2019/09/tuesday-7-january-2019-frederic-clavert-luxembourg-the-social-medias-framework-of-collective-memory-commemorating-the-great-war-on-twitter/
Tuesday 12 February 2019
Ethics and Digital History Panel (Kelly Foster, Sharon Webb, Julianne Nyhan, Kathryn Eccles)
IHR Digital History Seminar
https://ihrdighist.blogs.sas.ac.uk/2018/08/ethics-and-digital-history-panel-kelly-foster-sharon-webb-julianne-nyhan-kathryn-eccles/
This document discusses themes in studying religious history in the web age. It covers religious responses to technological change, interactions with others online and offline, and how religious organizations can be studied through their online presence and link graphs. Specifically, it examines the cross-border online activities of churches in Northern Ireland and Ireland that span both countries. It also analyzes the 2008 controversy in the UK over comments by the Archbishop of Canterbury regarding aspects of sharia law.
The ‘Digital Thematic Deconstruction’ of early modern urban maps and bird’s-e... (Digital History)
Bram Vannieuwenhuyze
Tuesday 24 April 2018
Digital History Seminar
https://ihrdighist.blogs.sas.ac.uk/2017/09/05/tuesday-24-april-2018-bram-vannieuwenhuyze/
The Language of Migration in the Victorian Press: A Corpus Linguistic Approach (Digital History)
Ruth Byrne (Lancaster University)
20 February 2018
Digital History seminar
http://ihrdighist.blogs.sas.ac.uk/2017/09/06/tuesday-20-february-2018-ruth-byrne-the-language-of-migration-in-the-victorian-press-a-corpus-linguistic-approach/
The document discusses how Benedictine monks from England responded to the revolution in France. It focuses on two monastic communities - St Gregory's in Douai, which was founded in 1606, and Lamspringe Abbey founded in 1630. The document aims to use prosopography, the study of the common characteristics of a historical group, to analyze how these English Benedictine communities responded to the revolutionary events in France.
Dr. Lisa Smith from the University of Essex discusses her crowdsourcing project to transcribe early modern recipes. The project involved having volunteers help transcribe recipes from receipt books, including one from Margaret Baker from ca. 1675. Transcribing the recipes helped uncover Baker's social network and provided insights into cooking and medical practices during the early modern period. Dr. Smith reflects on lessons learned from the project, including how coding transcription notes helped with searching and how the process gave volunteers a sense of feeling like professional historians.
The lives and criminal careers of juvenile offenders (Digital History)
Tuesday 14 November 2017 – Emma Watkins
Digital History Seminar
https://ihrdighist.blogs.sas.ac.uk/2017/09/06/tuesday-14-november-2017-emma-watkins-the-lives-and-criminal-careers-of-juvenile-offenders/
Adam Crymble
Digital History seminar
Tuesday 17 October 2017
https://ihrdighist.blogs.sas.ac.uk/2017/09/06/tuesday-17-october-2017-adam-crymble-the-history-of-learning-digital-history-c-1980-2017/
Holford: Mapping the medieval countryside, 2014-06-17 (Digital History)
This document summarizes a research project to digitize and analyze medieval English inquisitions post mortem (IPMs) from 1418-1446. The project aims to create an online, indexed database of IPMs containing information on landholding, tenure, valuations, demography, and other topics. Preliminary findings show distributions of landholding sizes and values that provide new insights into medieval economy and society. The digitized data allows analysis not possible with printed records, such as mapping land use and studying mortality rates over time. The project illuminates medieval life in new ways and serves as a model for continued study of primary sources.
Sarah Rees Jones (York) and Helen Petrie: 'Chartex overview and next steps'
1. ChartEx: Discovering spatial descriptions and
relationships in medieval charters
Introduction
www.chartex.org
@ChartExProject
2. Digging into Data Challenge
“As the world becomes increasingly digital, new
techniques will be needed to search, analyze, and
understand these everyday materials. Digging into Data
challenges the research community to help create the
new research infrastructure for 21st century scholarship”
UK: AHRC, ESRC, JISC
US: NEH, IMLS, NSF
CAN: SSHRC
Netherlands: Organisation for Scientific Research
www.diggingintodata.org
3. ChartEx Project Partners
University of York: History and Human Computer
Interaction
University of Brighton: Natural Language Processing
University of Leiden: Data Mining
University of Washington: History, Web Services
University of Toronto: History and Digital Archives
Columbia University: History and Digital Libraries
Data Repositories: The National Archives (UK), Borthwick,
DEEDS Project U of Toronto, Columbia Digital Humanities
Advisory Panel: various cognate projects
4. ChartEx Team
History: Sarah Rees Jones, Adam Kosto, Michael
Gervers, Robin Sutherland-Harris, Stefania Perring,
Jon Crump
NLP: Roger Evans,
Lynne Cahill
Data Mining: Arno Knobbe,
Marvin Meeng
HCI: Helen Petrie, Christopher Power, David Swallow
5. Digital Charter Collections
Charters of the Vicars Choral of York Minster. City of York
and its suburbs to 1546. Latin charters and English
translations.
The National Archives (UK) Ward 2. Abridged English
translations
The Borthwick Institute, Yarburgh Muniments. Abridged
English translations
DEEDS, University of Toronto. Latin charters of English
provenance.
Cluny. Latin charters of French provenance (CBMA)
6. York
corner of Stonegate and Petergate
opposite Minster Gates
detail of OS 1852
Urban Historic Topography
7. A charter from English Edition of
Vicars Choral collection
408. Grant by Thomas son of Josce goldsmith and citizen of York to
his younger son Jeremy of half his land lying in length from
Petergate at the churchyard of St. Peter to houses of the prebend
of Ampleford and in breadth from Steyngate to land which mag.
Simon de Evesham inhabited; paying Thomas and his heirs 1d. or
[a pair of] white gloves worth 1 d. at Christmas. Warranty. Seal.
Witnesses: Geoffrey Gunwar, William de
Gerford[b]y,' chaplains,Robert de Farnham,
Robert le Spicer, John le plastrer, Walter de
Alna goldsmith, Nicholas Page, Thomas
talliator, Hugh le bedel, John de Glouc',
clerks, and others.
January 1252
8. Chronological Sequencing and
Spatial Sequencing
408. Grant by Thomas son of Josce goldsmith and
citizen of York to his younger son Jeremy of half his
land lying in length from Petergate at the churchyard
of St. Peter to houses of the prebend of Ampleford
and in breadth from Steyngate to land which mag.
Simon de Evesham inhabited
409. Grant by Mariot widow of Thomas son of Josce
goldsmith of York and by Jeremy (Jeremias) son of
Thomas and Mariot to mag. Simon de Evesham
canon of York of land with buildings in Steyngate,
granted to them by Thomas, lying in length between
Petergate and land of the prebend of Ampleford, and
in breadth between Steyngate and land once of
Geoffrey de Norwyz precentor of York
11. People, places and events in
charters: exploring the language of
charters within ChartEx
Robin Sutherland-Harris
University of Toronto
Roger Evans
NLTG, University of Brighton
15 November 2013
12. Grant from Walter de Soke to Richard de Forde
Somerset Heritage Centre, DDWHb/244
13. Final concord made in the Court of Common Pleas at
Westminster before James Dyer, Thomas Meade, Francis
Wyndam, William Peryam, justices, in dispute between
Richard Broke and George Armond, plaintiffs, and Robert
Jenner, defendant, over lands (specified) in Fordham, Essex
Sample document summary
Summary provided by the National Archives, Ward2_55A_188_30
14. Fig. 1 The phases of the Kanga Methodology [36]. The white boxes indicate the phases
performed by domain experts. The formal structuring is done by domain experts using Rabbit, while the translation to OWL is ...
Ronald Denaux , Catherine Dolbear , Glen Hart , Vania Dimitrova , Anthony G. Cohn
Supporting domain experts to construct conceptual ontologies: A holistic approach
Web Semantics: Science, Services and Agents on the World Wide Web Volume 9, Issue 2 2011 113 - 127
http://dx.doi.org/10.1016/j.websem.2011.02.001
Kanga Methodology
17. BRAT rapid annotation tool
Pontus Stenetorp, Sampo Pyysalo, Goran Topić, Tomoko Ohta, Sophia Ananiadou and Jun'ichi Tsujii
(2012). brat: a Web-based Tool for NLP-Assisted Text Annotation. In Proceedings of the Demonstrations
Session at EACL 2012. (http://brat.nlplab.org/)
20. The NLP Task
408. Grant by Thomas son of Josce goldsmith and
citizen of York to his younger son Jeremy of half
his land lying in length from Petergate at the
churchyard of St. Peter to houses of the prebend
of Ampleford and in breadth from Steyngate to
land which mag. Simon de Evesham inhabited;
paying Thomas and his heirs 1d. or [a pair of]
white gloves worth 1 d. at Christmas. Warranty.
Seal.
Get from a text like this … to some ‘semantic’ data like this
21. The NLP Task
408. Grant by Thomas son of Josce goldsmith and
citizen of York to his younger son Jeremy of half
his land lying in length from Petergate at the
churchyard of St. Peter to houses of the prebend
of Ampleford and in breadth from Steyngate to
land which mag. Simon de Evesham inhabited;
paying Thomas and his heirs 1d. or [a pair of]
white gloves worth 1 d. at Christmas. Warranty.
Seal.
Get from a text like this … which is not really very different from this (BRAT)
(http://brat.nlplab.org/)
22. The NLP Task
408. Grant by Thomas son of Josce goldsmith and
citizen of York to his younger son Jeremy of half
his land lying in length from Petergate at the
churchyard of St. Peter to houses of the prebend
of Ampleford and in breadth from Steyngate to
land which mag. Simon de Evesham inhabited;
paying Thomas and his heirs 1d. or [a pair of]
white gloves worth 1 d. at Christmas. Warranty.
Seal.
Get from a text like this … although actually, to us it looks more like this:
T1 Document 0 17 vicars-choral-387
T2 Transaction 18 23 Grant
T3 Person 27 34 William
T4 Institution 73 90 prebend of Masham
T5 Person 42 49 Richard
T6 Occupation 51 64 canon of York
…
R3 is_son_of Arg1:T3 Arg2:T5
R4 is_grantor_in Arg1:T3 Arg2:T2
R5 occupation_is Arg1:T3 Arg2:T6
(http://brat.nlplab.org/)
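The T and R lines on this slide are BRAT's standoff annotation style: T lines are typed text spans with character offsets, R lines are binary relations between them. As a sketch of how such output can be consumed downstream, here is a minimal parser. It is our own illustration, not part of the BRAT or ChartEx toolchains, and it assumes tab-separated fields as in BRAT's file format.

```python
# Annotation lines as on the slide, tab-separated per the BRAT standoff format.
standoff = """\
T1\tDocument 0 17\tvicars-choral-387
T2\tTransaction 18 23\tGrant
T3\tPerson 27 34\tWilliam
T4\tInstitution 73 90\tprebend of Masham
T5\tPerson 42 49\tRichard
T6\tOccupation 51 64\tcanon of York
R3\tis_son_of Arg1:T3 Arg2:T5
R4\tis_grantor_in Arg1:T3 Arg2:T2
R5\toccupation_is Arg1:T3 Arg2:T6"""

def parse_standoff(text):
    """Split BRAT lines into entity spans (T...) and binary relations (R...)."""
    entities, relations = {}, []
    for line in text.splitlines():
        ann_id, body = line.split("\t", 1)
        if ann_id.startswith("T"):
            meta, surface = body.split("\t")
            etype, start, end = meta.split()
            entities[ann_id] = (etype, int(start), int(end), surface)
        elif ann_id.startswith("R"):
            rtype, arg1, arg2 = body.split()
            relations.append((rtype,
                              arg1.split(":")[1],
                              arg2.split(":")[1]))
    return entities, relations

entities, relations = parse_standoff(standoff)
for rtype, a1, a2 in relations:
    print(f"{entities[a1][3]} --{rtype}--> {entities[a2][3]}")
# William --is_son_of--> Richard
# William --is_grantor_in--> Grant
# William --occupation_is--> canon of York
```

The offsets index into the original charter text, so a consumer can always recover the exact surface string each entity or relation refers to.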
23. NLP development methodology
Step 1: find out what the ‘mark up’ should be (from historians)
Step 2: ask historians to mark up some documents manually
Step 3: use some of these examples to develop an NLP system that does ‘the same’
Step 4: use the rest to evaluate its performance (in progress)
24. The manual annotation
Grant by Thomas son of Josce goldsmith and citizen of York to his younger son Jeremy of
half his land lying in length from Petergate at the churchyard of St. Peter to houses of the
prebend of Ampleford and in breadth from Steyngate to land which mag. Simon de
Evesham inhabited;
25. NLP – layered pattern matching
Word layer: basic information about individual words, collected ‘outside’ the main ChartEx
NLP system (using external tools – part of speech tagger, stemmer, Soundex coding etc.)
29. NLP – layered pattern matching
Token layer: identify semantic types and intrinsic properties (eg gender) of known
individual words (not all of these types feature in the final output). Some types are
inferred – eg unknown capitalised words are proper nouns.
31. NLP – layered pattern matching
Lexical layer: identify simple lexical phrases – groups of tokens that act as individual units.
Promote other individual tokens to lexical items (with lexical types).
33. NLP – layered pattern matching
Syntax layer: build (local) syntactic structure to identify basic constituents of the sentence.
34. NLP – layered pattern matching
Phrasal layer: use part-of-speech tags to build lexical items into local syntactic/semantic
structures. These have lots of unbound arguments – like jigsaw pieces waiting to be
slotted together.
35. NLP – layered pattern matching
Phrasal layer: use part-of-speech tags to build lexical items into local syntactic/semantic
structures – and then ’collapse’ all the equals relations.
36. NLP – layered pattern matching
Semantic layer: build semantic relationships by gluing the pieces together. Again, first we
use equals relations to capture the structure
37. NLP – layered pattern matching
Semantic layer: build semantic relationships – and then we collapse them down
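In the same layered spirit, a heavily simplified cascade can be sketched in Python. Everything here (the gazetteer, the tag names, the two relation patterns) is invented for illustration and far cruder than the real ChartEx grammar; notably, the naive occupation pattern attaches "goldsmith" to the wrong person, exactly the kind of attachment ambiguity the deck discusses.

```python
import re

# Toy cascade in the spirit of layered pattern matching; rules are invented.
text = ("Grant by Thomas son of Josce goldsmith and citizen of York "
        "to his younger son Jeremy")

# Token layer: classify known words; unknown capitalised words become names.
GAZETTEER = {"goldsmith": "OCCUPATION", "citizen": "STATUS",
             "Grant": "TRANSACTION", "York": "PLACE"}

def token_layer(text):
    tokens = []
    for word in text.split():
        tag = GAZETTEER.get(word)
        if tag is None and word[0].isupper():
            tag = "NAME"          # inferred: unknown capitalised word
        tokens.append((word, tag or "WORD"))
    return tokens

# Semantic layer: patterns over the tag sequence glue tokens into relations,
# e.g. "<NAME> son of <NAME>" -> is_son_of.
def semantic_layer(tokens):
    tagged = " ".join(f"{w}/{t}" for w, t in tokens)
    relations = []
    for m in re.finditer(r"(\w+)/NAME son/WORD of/WORD (\w+)/NAME", tagged):
        relations.append(("is_son_of", m.group(1), m.group(2)))
    # Naive attachment: links the nearest preceding NAME to the occupation,
    # so "goldsmith" is wrongly ascribed to Thomas rather than Josce.
    for m in re.finditer(r"(\w+)/NAME .*? (\w+)/OCCUPATION", tagged):
        relations.append(("occupation_is", m.group(1), m.group(2)))
    return relations

print(semantic_layer(token_layer(text)))
# [('is_son_of', 'Thomas', 'Josce'), ('occupation_is', 'Thomas', 'goldsmith')]
```

Each layer only sees the output of the layer below, which is what makes the pipeline easy to extend but also why it "needs help" from historians, as the next slide puts it.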
38. How did we do?
Grant by Thomas son of Josce goldsmith and citizen of York to his younger son Jeremy of
half his land lying in length from Petergate at the churchyard of St. Peter to houses of the
prebend of Ampleford and in breadth from Steyngate to land which mag. Simon de
Evesham inhabited;
40. Historians vs NLP
• NLP has to be told very clearly what it is looking for
• NLP has to make a lot of information explicit that people don’t notice
• NLP sees ambiguity where people do not
– Really struggles with coordination and attachment
• NLP is programmed by computer scientists (or sometimes statisticians), not historians
• NLP can do it, but it definitely needs help!
41. Reconstructing spatial relationships from
charters: a collaboration between
Data Mining and Historical Topography
Stefania Merlo Perring, Sarah Rees Jones, York
Arno Knobbe, Leiden
42. This presentation will:
• Introduce the process used by Data Mining to reproduce the methodology of historic topography
• Discuss the results and further potentials
• Reflect on how Data Mining of charters and cartularies can add new meanings to Diplomatics
56. Conclusions/suggestions for debate
• People, institutions, things and technical devices represented as nodes (actors) in a network are freed from previous categories and hierarchies (Actor-Network-Theory, Callon 1991, Latour 1992, 2005)
• Nodes and patterns in a network are starting points for analysis and can generate new research questions and insights
• Networks connecting information from different fonds have potential for creating alternative virtual archives
• These can be created for a person or for a place (e.g. grouping charters concerning the same location)
57. The ChartEx Virtual Workbench
Helen Petrie, Chris Power, Dave Swallow
University of York
59. The ChartEx Virtual Workbench
A video presentation of the workbench is available on the ChartEx website at:
http://www.chartex.org/docs/Chartex-Montreal-12102013-VIDEO.mov