BiographyNet is a multidisciplinary project that combines expertise from history, computer science and computational linguistics. The project is a collaboration between the Netherlands eScience Center, Huygens ING and VU University Amsterdam. BiographyNet uses data from the Biography Portal of the Netherlands (BP), which contains approximately 125,000 biographies from a variety of Dutch biographical dictionaries. An interlinked semantic knowledge base will be created by extracting relations between people, places, historic events and time periods from these biographical descriptions. Through a combination of data enrichment, visualization and browsing techniques, BiographyNet wants to inspire historians to set up new research projects. The aim of BiographyNet is to develop a demonstrator which supports the discovery of interrelations between people, events, places and time periods in biographical descriptions.
Günter Mühlberger (University of Innsbruck, AT): The READ project. Objectives, tasks and partner organisations
co:op-READ-Convention Marburg
Technology meets Scholarship, or how Handwritten Text Recognition will Revolutionize Access to Archival Collections.
With a special focus on biographical data in archives
Hessian State Archives Marburg Friedrichsplatz 15, D - 35037 Marburg
19-21 January 2016
The document discusses how libraries provide access to publisher content through various systems like the library catalog, OpenURL link resolver, and discovery service. Publisher-provided metadata is key to enabling access in these systems. Metadata is distributed from publishers to libraries and other sources. Any system that supports OpenURL can potentially link users to publisher content, not just the library catalog. Ensuring accurate metadata is important for proper functioning of these access systems.
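As a rough illustration of why publisher-provided metadata matters here, the sketch below builds an OpenURL 1.0 (Z39.88-2004) key/encoded-value query for a journal article. The resolver address, the helper name `build_openurl` and the sample ISSN are placeholders invented for this example; they do not come from the document being summarized.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_openurl(resolver_base, metadata):
    """Build an OpenURL 1.0 KEV link from journal-article metadata.

    `resolver_base` is a hypothetical link-resolver address; a real library
    would supply its own. Empty metadata values are skipped.
    """
    params = {
        "url_ver": "Z39.88-2004",
        "rft_val_fmt": "info:ofi/fmt:kev:mtx:journal",
    }
    for key, value in metadata.items():
        if value:
            # Referent (rft.*) keys carry the citation metadata.
            params[f"rft.{key}"] = value
    return f"{resolver_base}?{urlencode(params)}"


if __name__ == "__main__":
    link = build_openurl(
        "https://resolver.example.org/openurl",  # placeholder resolver
        {"issn": "1234-5678", "volume": "12", "spage": "", "atitle": "A title"},
    )
    print(link)
    print(parse_qs(urlsplit(link).query))
```

If the metadata a publisher distributes is missing or wrong (for example, an incorrect ISSN), the resolver receives a link like this one but may not be able to match it to the right target.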
Towards Multidimensional Web Archive Access (IIPC 2016) TimelessFuture
Presentation at IIPC 2016 conference, Reykjavik, Iceland, 14 April 2016. Abstract:
Web archiving institutions have jointly harvested petabytes of archived web content, potentially an exceptionally rich data source for researchers across the globe. These web archives are multidimensional by nature. First, a temporal dimension arises from different versions of web content accumulated over time. Second, a hierarchical dimension is implied, as web archives may be examined at different analytical levels (Brügger, 2010); examples include the level of the web sphere, website and web page.
Scholars often focus their analysis on a specific analytical level and temporal range, for example looking at electoral web spheres at election times (Xenos and Bennet, 2007) or hyperlinking in news websites across time (Karlsson et al, 2015). However, we claim that this scholarly practice is not well supported by current web archive access tools, which usually allow access only at the page level and do not offer insights into the temporal development of broader selections of archived Web content, such as web spheres or websites. Hence, there is a need for more flexible access services in a research context.
In this presentation, we conceptually and practically explore how to address this mismatch. We illustrate how the temporal dimension can be harnessed by aggregating web content using different time ranges and the hierarchical dimension accommodated by novel aggregation support. Utilizing a concrete use case, we illustrate the potential usefulness of these representations of aggregated Web content. We analyze and compare the temporal evolution of various categories of websites in the Dutch Web Archive (such as news, history-related and government websites) across a five-year period. In this analysis, we look at the evolution of textual content, internal structure and image content across categories and websites. Finally, our presentation indicates how these types of aggregated representations may be integrated into future search systems for Web archives.
Tell me and I forget, teach me and I remember, involve me and I learn: unders... zzalszjc
This document discusses work-placed learning approaches to teaching statistics at the University of Manchester. It outlines how undergraduate students are taught methods and data analysis through their degrees. In year 2, students can apply for paid 8-week internships where they conduct applied data analysis projects with host organizations. Outputs from internships include reports, papers, and presentations. Outcomes include employment prospects, improved curriculum understanding, and ongoing relationships with host organizations. The approach aims to improve students' quantitative skills and employability through involvement in real-world research projects.
Semantic Need: Guiding Metadata Annotations by Questions People #ask Hans-Joerg Happel
At its core, the Semantic Web is about the creation, collection and interlinking of metadata on which agents can perform tasks for human users. While many tools and approaches support either the creation or usage of semantic metadata, there is neither a proper notion of metadata need, nor a related theory of guidance on which metadata should be created. In this paper, we propose to analyze structured queries to help identify missing metadata. We conduct a study on Semantic MediaWiki (SMW), one of the most popular Semantic Web applications to date, analyzing structured "ask"-queries in public SMW instances. Based on that, we describe Semantic Need, an extension for SMW which guides contributors to provide semantic annotations, and summarize feedback from an online survey among 30 experienced SMW users.
Sebastian Colutto (University of Innsbruck, AT): Transkribus. A virtual research environment for the transcription and recognition of historical documents
co:op-READ-Convention Marburg
Technology meets Scholarship, or how Handwritten Text Recognition will Revolutionize Access to Archival Collections.
With a special focus on biographical data in archives
Hessian State Archives Marburg Friedrichsplatz 15, D - 35037 Marburg
19-21 January 2016
2018 02-13 pathways-data enquiry_martina_emke Dr Martina Emke
This document discusses a research study analyzing how freelance language teachers use Twitter for professional development. It employs Deleuzo-Guattarian concepts of assemblages, rhizomes, and becomings to analyze teachers' participation in Twitter networks like #ELTchat. Situational analysis and social network analysis were used to map relations between teachers, hashtags, and the "Twitter machine." Emerging findings suggest teachers' professional development occurs through unpredictable interactions within human and technological assemblages, reconfiguring understandings of teaching and professional learning.
The HathiTrust Research Center (HTRC): An Overview and Demo Robert H. McDonald
The session will provide an overview of the HathiTrust Research Center including its mission and current status. It will also include a demonstration of current HTRC phase one technology and services. Additionally, the speakers will address the HTRC's role in supporting humanities research at scale.
WebART: Facilitating Scholarly Use of Web Archives (IIPC, Apr. 2013) TimelessFuture
The WebART project developed tools to facilitate scholarly use of web archives. It created an initial search interface called WebARTist to explore a pilot dataset of 432 crawls from the Dutch National Library web archive. The interface allowed full-text search and basic analysis like word frequency, co-word analysis, and geomapping. A workshop with researchers evaluated the interface and provided feedback on improving data quality, search capabilities, and user experience to better meet researcher needs. Next steps include a new prototype with more advanced features and a formal evaluation of the pilot project.
Keystone summer school 2015 paolo-missier-provenance Paolo Missier
Lecture on Provenance modelling, given at the first Keystone Summer School, Malta July 2015.
With thanks to Prof. Luc Moreau for contributing some of the slide material from his own tutorial
Basilis Gatos (Computational Intelligence Laboratory, Institute of Informatics and Telecommunications, National Center for Scientific Research “Demokritos”, GR): Hard Tasks in the Background. Layout analysis
co:op-READ-Convention Marburg
Technology meets Scholarship, or how Handwritten Text Recognition will Revolutionize Access to Archival Collections.
With a special focus on biographical data in archives
Hessian State Archives Marburg Friedrichsplatz 15, D - 35037 Marburg
19-21 January 2016
Search, Exploration and Analytics of Evolving Data Nattiya Kanhabua
The document discusses techniques for extracting temporal information from documents, including determining a document's publication time and any times discussed in its content. It describes challenges in determining a document's publication time due to factors like time gaps between crawling and indexing. It also outlines approaches like using temporal language models to compare a document's words to time-labeled reference corpora or leveraging search statistics to estimate a publication time. The document provides examples of how content-based classification models and techniques like semantic preprocessing can help with temporal information extraction from documents.
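The temporal-language-model idea mentioned above can be sketched roughly as follows: build a unigram model for each time-labeled slice of a reference corpus, then pick the period whose model gives the undated document the highest likelihood. This is a minimal sketch under simplifying assumptions (whitespace tokenization, add-one smoothing, a toy corpus, and hypothetical function names), not the method used in the slides.

```python
import math
from collections import Counter


def train(corpus_by_period):
    """Count word frequencies per period; returns (models, vocabulary)."""
    models = {
        period: Counter(w for doc in docs for w in doc.lower().split())
        for period, docs in corpus_by_period.items()
    }
    vocab = set().union(*models.values())
    return models, vocab


def log_likelihood(doc, counts, vocab_size):
    """Log-probability of the document under one period's unigram model,
    with add-one smoothing so unseen words do not zero out the score."""
    total = sum(counts.values())
    return sum(
        math.log((counts[w] + 1) / (total + vocab_size))
        for w in doc.lower().split()
    )


def estimate_period(doc, models, vocab):
    """Return the period whose model best explains the document."""
    return max(models, key=lambda p: log_likelihood(doc, models[p], len(vocab)))


if __name__ == "__main__":
    # Toy reference corpus, invented for illustration only.
    reference = {
        "1990s": ["fax modem dialup"],
        "2010s": ["smartphone app streaming"],
    }
    models, vocab = train(reference)
    print(estimate_period("new streaming app", models, vocab))
```

A realistic setup would need a much larger reference corpus and better text normalization; the sketch only shows the comparison step.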
This document discusses semantic enrichment of metadata in Europeana. It defines semantic enrichment as linking metadata to controlled vocabularies or other datasets to add context. The document outlines the key stages of semantic enrichment as analysis, linking, and augmentation. It also discusses where enrichment can occur in Europeana's systems and considerations for developing APIs and services to enable enrichment of Europeana records by third parties.
This document provides an overview of search engine technology and the goals of the SET FALL 2009 course. It discusses different types of search engines, what is required to build a search engine, and course logistics such as topics, readings, assignments, and projects. The key goals of the course are to understand how search engines work, their limitations, and learn how to analyze textual and structured data through coding, modeling, and evaluation.
This document provides a summary of the 12th European Semantic Web Conference (ESWC) that took place from May 31st to June 4th, 2015 in Portoroz, Slovenia. It outlines key details about the conference including the number of registered participants, program details such as the number of paper submissions and accepted papers by track, and highlights of the keynote speakers and events during the conference.
- Part I discusses the history of OpenURL linking and introduces IOTA's reports comparing OpenURL strings and preliminary OpenURL Quality Index.
- Part II examines a study analyzing e-book OpenURLs that found including ISBNs and genre metadata improved full-text linking.
- Part III addresses improving IOTA's Quality Index through more systematic element weighting and considering additional linking factors.
The document provides an overview of world history from approximately 8000 BCE to 1750 CE. It covers major developments in early civilizations, belief systems, technologies, and interactions between cultures during this time period. Specific topics discussed include the Neolithic Revolution, river valley civilizations, classical empires in India, China, and the Mediterranean, the major world religions of polytheism, Hinduism, Buddhism, Confucianism, Daoism, Judaism, Christianity, and Islam. It also summarizes trade networks, the spread of religions and empires, and technological innovations from the rise of agriculture to the Scientific Revolution.
The document is a wall chart that outlines major events and developments in world history. It includes sections on human creation, maps and distribution of humans, the major ethnic groups, Noah and the flood, early civilizations, hobbies and inventions from antiquity, culture, sports and games, literature, infrastructure, and wars and expeditions. The chart provides brief descriptions and timelines for key topics and historical figures.
Wikimedia Nederland organized a photography event in the Weerribben for the (international) photo competition Wiki Loves Earth. Participants first got a photography workshop and then went on a boat tour. Afterwards they had the opportunity to upload their photos to Wikimedia Commons to participate in the competition. These slides were used for the workshop.
For more information on Wiki Loves Earth:
www.wikilovesearth.org
www.wikilovesearth.nl (Dutch)
1) The document outlines a brief history of the world from creation to revelation. It begins with Adam and Eve in the Garden of Eden and covers major biblical events like the fall, Noah's ark, God's promises to Abraham, Moses freeing the Israelites from Egypt, King David's rule, prophecies of the coming Messiah, and Jesus's life, death, and resurrection.
2) It includes genealogies and family trees tracing lineage from Adam through Abraham, Isaac, Jacob, and King David to Jesus.
3) The timeline presented includes creation, the great flood, patriarchs like Abraham, the exodus from Egypt to the promised land, times of judges and kings, prophecies of the
The document contains a collection of interesting facts from world history. Some key points include:
- The Civil War resulted in more American deaths than all other wars combined.
- Caligula appointed his horse as a Roman senator.
- Only two people signed the Declaration of Independence on July 4th, 1776.
- Influenza killed over 20 million people in a global pandemic in 1918.
- Leonardo Da Vinci designed early concepts for vehicles and weapons over 500 years ago.
The document provides a history of classical music from 1600 to 2000. It describes several periods including the Baroque period from 1600-1750 which saw the development of instrumental music. The Classical period from 1750-1820 saw changes including the decline of patronage systems. The Romantic period from 1820-1920 featured expanded musical forms and nationalism. Modern music from 1920-2000 included Impressionism, Neo-Classicism, atonal music, and many new styles and types of music as composers explored their imaginations.
This document provides brief biographies of many famous and infamous people from world history, including details on their accomplishments and roles. Figures mentioned include American presidents like George Washington, Abraham Lincoln, Franklin D. Roosevelt and John F. Kennedy; military leaders like Robert E. Lee, Adolf Hitler, and Winston Churchill; civil rights activists like Martin Luther King Jr. and Harriet Tubman; explorers like Amelia Earhart; scientists and inventors like Albert Einstein, the Wright Brothers, and Marie Curie; entertainers like Elvis Presley, Charlie Chaplin, and Michael Jackson; and other impactful historical figures. The document touches on their significance and what made them renowned or notorious figures.
History has always been a point of fascination for many, and having access to a comprehensive historical timeline can be a major help. If you want to quickly peruse any period in time or the history of any country of the world, visit timelines.ws.
Dongpo Deng attended the Linked Data on the Web 2014 workshop and WWW 2014 conference in Seoul from April 7-12. Some key highlights included:
- He stayed in a reasonably priced hotel close to the metro that had small rooms and noise until 2pm.
- At LDOW 2014 he attended sessions on integration, exploration, and applications of linked data that featured talks on topics like RML mappings, DBpedia exploration, and crowdsourced sensor data.
- WWW 2014 had over 600 submissions across 11 areas with a 13% acceptance rate. The keynotes covered graph mining, organizing the digital world, and taming the web.
- Dongpo attended talks on crowds
Digital Humanities Venice Group Presentation - Opening the Libro d'Oro Michael Mitchell
This document outlines a project to create a social networking environment and standardized database for information about historical Venetians. The goal is to provide open access to data and tools for research, visualization, and education. Researchers and citizens would contribute profiles with standardized fields like name, birth/death dates, occupation, family, etc. Sources would be included for validation. Tools would allow network and epidemiological analysis. The timeline is 2 years for data collection and interface development, then maintenance. A team of humanities experts in areas like databases, design, history, and development would oversee the project with potential funding from charitable organizations. The impact would be engaging the public, adapting to research needs, aggregating sources, and visualizing history.
Supporting the Interpretation of Enriched Audiovisual Sources through Tempora... TimelessFuture
This document discusses a project that developed a timeline prototype to help scholars explore enriched audiovisual content metadata, like automatic speech transcripts, in a temporal manner. An evaluation with 5 media studies scholars found the prototype facilitated exploratory searching but transparency about data limitations was important. Next steps involve integrating prototype elements into a digital research environment to support audiovisual analysis.
This document discusses transforming open government data from Romania into linked open data. It begins with background on linked data and open data initiatives. Then it describes efforts to model, transform, link, and publish Romanian open data as linked open data. This includes identifying common vocabularies and properties, creating URIs, linking to external datasets like DBPedia, and publishing the linked data for use in applications via a SPARQL endpoint. Overall the goal is to make this data more accessible and interoperable through semantic web standards.
It is our presentation during CEIT-2016 (Fourth Edition of the International Conference on Control Engineering and Information Technology) held at Hammamet, Tunisia, December 16-18 2016.
Could the international community collaborate to create a map of the OER world? The William and Flora Hewlett foundation selected three teams to develop a prototype in response to this challenge. These prototypes were shared at the Hewlett Foundation’s OER Grantees Meeting 2014.
Chaos&Order: Using visualization as a means to explore large heritage collec...TimelessFuture
*note: download original powerpoint to view animations*. Presentation at 4th Int. Alexandria Workshop (19./20. October 2017) - Foundations for Temporal Retrieval, Exploration and Analytics in Web Archives.
Data Communities - reusable data in and outside your organization.Paul Groth
Description
Data is a critical both to facilitate an organization and as a product. How can you make that data more usable for both internal and external stakeholders? There are a myriad of recommendations, advice, and strictures about what data providers should do to facilitate data (re)use. It can be overwhelming. Based on recent empirical work (analyzing data reuse proxies at scale, understanding data sensemaking and looking at how researchers search for data), I talk about what practices are a good place to start for helping others to reuse your data. I put this in the context of the notion data communities that organizations can use to help foster the use of data both within your organization and externally.
This document provides an overview of interdisciplinary research using text analysis tools to study political discourse in news media. The research uses both manual annotation by experts and automatic annotation using natural language processing approaches. Basic text analysis methods like counting words can address high-level research questions but provide limited insight, while more advanced methods provide more detailed information but are difficult to apply to large datasets and come with risks of unreliable or biased results. The researchers aim to make researchers aware of these methodological issues and evaluate tools to gain more understanding of potential biases.
NAACL Tutorial Social Media Predictive Analyticsshengjing 孙胜晶
This document summarizes a NAACL tutorial on social media predictive analytics. The tutorial covers theoretical and practical sessions on batch prediction, online inference, and dynamic learning and prediction of user attributes from social media data. It discusses how to collect and annotate social media data, features and models for user attribute classification, and approaches for predicting from streaming data and incorporating neighbors' content. The tutorial materials include slides, code, datasets and references related to predictive analytics on social networks.
Global Media Monitoring presented through several systems for collecting, extracting and enriching data, forming and exploring events across languages in real-time - ...resulting in the system Event Registry (http://eventregistry.org/)
Research into Practice case study 2: Library linked data implementations an...Hazel Hall
The document summarizes a presentation given by Dr. Diane Pennington and Laura Cagnazzo on library linked data implementations and perceptions. The presentation discussed the evolution of the semantic web and linked open data principles. It provided an overview of a study on the status and perceptions of linked data among European national libraries and Scottish libraries. The study found lack of awareness and expertise to be challenges for implementation. Benefits included improved data visibility and opportunities for collaboration. Recommendations focused on training, collaboration, and developing implementation guidelines and case studies.
This paper surveys the landscape of linked open data projects in cultural heritage, exam- ining the work of groups from around the world. Traditionally, linked open data has been ranked using the five star method proposed by Tim Berners-Lee. We found this ranking to be lacking when evaluating how cultural heritage groups not merely develop linked open datasets, but find ways to used linked data to augment user experience. Building on the five-star method, we developed a six-stage life cycle describing both dataset development and dataset usage. We use this framework to describe and evaluate fifteen linked open data projects in the realm of cultural heritage.
This document summarizes a presentation on using digital audio archives to promote performance studies. It discusses two projects - the Baudelaire Song Project and Visualising Voice. The Baudelaire Song Project analyzes French art songs set to the poetry of Baudelaire over four years with AHRC funding. Visualising Voice uses a Europeana Research Award to create a public-facing web interface for digital audio analysis. Both projects use open-access digital archives but face challenges regarding language barriers, audio quality, copyright and data storage.
This presentation was given by guest lecturer Martin Szomszor of Electric Data Solutions LTD, during the seventh session of the NISO Spring training series "Working with Scholarly APIs." Session Seven, Methods and Tools for Scholarly Data Analytics, was moderated by Phill Jones of MoreBrains Cooperative and held on June 9, 2022.
The Agora project is a collaboration between the History and Computer Science departments at the VU University Amsterdam, the Rijksmuseum Amsterdam and the Dutch national audiovisual archive Beeld en Geluid. The aim of Agora is to develop a social platform in which museum objects can be placed into an explicit (art)historic context. Through the (art)historic context, objects from highly diverse museum collections can be related, resulting in a more complete and illustrated description of historical events. End-users will also be allowed to create their own personal narratives which will lead to theoretical reflection on the meaning of digitally mediated public history in contemporary society.
Check out our website http://agora.cs.vu.nl/ and our twitter feed @agora_project
Presented for managers & researchers at The Global One Health Initiative of the Ohio State University, Africa Regional Branch in Addis Ababa, Ethiopia (April 24th 2019)
The document provides guidelines for publishing data as Linked Data. It discusses identifying appropriate data sources, reusing existing vocabularies and non-ontological resources, generating RDF data from relational databases or geometrical data using tools like R2O, ODEMapster and geometry2rdf, and publishing the data on the web by resolving URIs. The Ontology Engineering Group at Universidad Politécnica de Madrid has published Spanish geospatial and statistical data as part of projects like GeoLinkedData following these guidelines.
Similar to BiographyNet: Linking the world of History (20)
SDSS1335+0728: The awakening of a ∼ 106M⊙ black hole⋆Sérgio Sacani
Context. The early-type galaxy SDSS J133519.91+072807.4 (hereafter SDSS1335+0728), which had exhibited no prior optical variations during the preceding two decades, began showing significant nuclear variability in the Zwicky Transient Facility (ZTF) alert stream from December 2019 (as ZTF19acnskyy). This variability behaviour, coupled with the host-galaxy properties, suggests that SDSS1335+0728 hosts a ∼ 106M⊙ black hole (BH) that is currently in the process of ‘turning on’. Aims. We present a multi-wavelength photometric analysis and spectroscopic follow-up performed with the aim of better understanding the origin of the nuclear variations detected in SDSS1335+0728. Methods. We used archival photometry (from WISE, 2MASS, SDSS, GALEX, eROSITA) and spectroscopic data (from SDSS and LAMOST) to study the state of SDSS1335+0728 prior to December 2019, and new observations from Swift, SOAR/Goodman, VLT/X-shooter, and Keck/LRIS taken after its turn-on to characterise its current state. We analysed the variability of SDSS1335+0728 in the X-ray/UV/optical/mid-infrared range, modelled its spectral energy distribution prior to and after December 2019, and studied the evolution of its UV/optical spectra. Results. From our multi-wavelength photometric analysis, we find that: (a) since 2021, the UV flux (from Swift/UVOT observations) is four times brighter than the flux reported by GALEX in 2004; (b) since June 2022, the mid-infrared flux has risen more than two times, and the W1−W2 WISE colour has become redder; and (c) since February 2024, the source has begun showing X-ray emission. From our spectroscopic follow-up, we see that (i) the narrow emission line ratios are now consistent with a more energetic ionising continuum; (ii) broad emission lines are not detected; and (iii) the [OIII] line increased its flux ∼ 3.6 years after the first ZTF alert, which implies a relatively compact narrow-line-emitting region. Conclusions. 
We conclude that the variations observed in SDSS1335+0728 could be either explained by a ∼ 106M⊙ AGN that is just turning on or by an exotic tidal disruption event (TDE). If the former is true, SDSS1335+0728 is one of the strongest cases of an AGNobserved in the process of activating. If the latter were found to be the case, it would correspond to the longest and faintest TDE ever observed (or another class of still unknown nuclear transient). Future observations of SDSS1335+0728 are crucial to further understand its behaviour. Key words. galaxies: active– accretion, accretion discs– galaxies: individual: SDSS J133519.91+072807.4
Candidate young stellar objects in the S-cluster: Kinematic analysis of a sub...Sérgio Sacani
Context. The observation of several L-band emission sources in the S cluster has led to a rich discussion of their nature. However, a definitive answer to the classification of the dusty objects requires an explanation for the detection of compact Doppler-shifted Brγ emission. The ionized hydrogen in combination with the observation of mid-infrared L-band continuum emission suggests that most of these sources are embedded in a dusty envelope. These embedded sources are part of the S-cluster, and their relationship to the S-stars is still under debate. To date, the question of the origin of these two populations has been vague, although all explanations favor migration processes for the individual cluster members. Aims. This work revisits the S-cluster and its dusty members orbiting the supermassive black hole SgrA* on bound Keplerian orbits from a kinematic perspective. The aim is to explore the Keplerian parameters for patterns that might imply a nonrandom distribution of the sample. Additionally, various analytical aspects are considered to address the nature of the dusty sources. Methods. Based on the photometric analysis, we estimated the individual H−K and K−L colors for the source sample and compared the results to known cluster members. The classification revealed a noticeable contrast between the S-stars and the dusty sources. To fit the flux-density distribution, we utilized the radiative transfer code HYPERION and implemented a young stellar object Class I model. We obtained the position angle from the Keplerian fit results; additionally, we analyzed the distribution of the inclinations and the longitudes of the ascending node. Results. The colors of the dusty sources suggest a stellar nature consistent with the spectral energy distribution in the near and midinfrared domains. Furthermore, the evaporation timescales of dusty and gaseous clumps in the vicinity of SgrA* are much shorter ( 2yr) than the epochs covered by the observations (≈15yr). 
In addition to the strong evidence for the stellar classification of the D-sources, we also find a clear disk-like pattern following the arrangements of S-stars proposed in the literature. Furthermore, we find a global intrinsic inclination for all dusty sources of 60 ± 20◦, implying a common formation process. Conclusions. The pattern of the dusty sources manifested in the distribution of the position angles, inclinations, and longitudes of the ascending node strongly suggests two different scenarios: the main-sequence stars and the dusty stellar S-cluster sources share a common formation history or migrated with a similar formation channel in the vicinity of SgrA*. Alternatively, the gravitational influence of SgrA* in combination with a massive perturber, such as a putative intermediate mass black hole in the IRS 13 cluster, forces the dusty objects and S-stars to follow a particular orbital arrangement. Key words. stars: black holes– stars: formation– Galaxy: center– galaxies: star formation
BIRDS DIVERSITY OF SOOTEA BISWANATH ASSAM.ppt.pptxgoluk9330
Ahota Beel, nestled in Sootea Biswanath Assam , is celebrated for its extraordinary diversity of bird species. This wetland sanctuary supports a myriad of avian residents and migrants alike. Visitors can admire the elegant flights of migratory species such as the Northern Pintail and Eurasian Wigeon, alongside resident birds including the Asian Openbill and Pheasant-tailed Jacana. With its tranquil scenery and varied habitats, Ahota Beel offers a perfect haven for birdwatchers to appreciate and study the vibrant birdlife that thrives in this natural refuge.
The cost of acquiring information by natural selectionCarl Bergstrom
This is a short talk that I gave at the Banff International Research Station workshop on Modeling and Theory in Population Biology. The idea is to try to understand how the burden of natural selection relates to the amount of information that selection puts into the genome.
It's based on the first part of this research paper:
The cost of information acquisition by natural selection
Ryan Seamus McGee, Olivia Kosterlitz, Artem Kaznatcheev, Benjamin Kerr, Carl T. Bergstrom
bioRxiv 2022.07.02.498577; doi: https://doi.org/10.1101/2022.07.02.498577
Mending Clothing to Support Sustainable Fashion_CIMaR 2024.pdfSelcen Ozturkcan
Ozturkcan, S., Berndt, A., & Angelakis, A. (2024). Mending clothing to support sustainable fashion. Presented at the 31st Annual Conference by the Consortium for International Marketing Research (CIMaR), 10-13 Jun 2024, University of Gävle, Sweden.
Discovery of An Apparent Red, High-Velocity Type Ia Supernova at 𝐳 = 2.9 wi...Sérgio Sacani
We present the JWST discovery of SN 2023adsy, a transient object located in a host galaxy JADES-GS
+
53.13485
−
27.82088
with a host spectroscopic redshift of
2.903
±
0.007
. The transient was identified in deep James Webb Space Telescope (JWST)/NIRCam imaging from the JWST Advanced Deep Extragalactic Survey (JADES) program. Photometric and spectroscopic followup with NIRCam and NIRSpec, respectively, confirm the redshift and yield UV-NIR light-curve, NIR color, and spectroscopic information all consistent with a Type Ia classification. Despite its classification as a likely SN Ia, SN 2023adsy is both fairly red (
�
(
�
−
�
)
∼
0.9
) despite a host galaxy with low-extinction and has a high Ca II velocity (
19
,
000
±
2
,
000
km/s) compared to the general population of SNe Ia. While these characteristics are consistent with some Ca-rich SNe Ia, particularly SN 2016hnk, SN 2023adsy is intrinsically brighter than the low-
�
Ca-rich population. Although such an object is too red for any low-
�
cosmological sample, we apply a fiducial standardization approach to SN 2023adsy and find that the SN 2023adsy luminosity distance measurement is in excellent agreement (
≲
1
�
) with
Λ
CDM. Therefore unlike low-
�
Ca-rich SNe Ia, SN 2023adsy is standardizable and gives no indication that SN Ia standardized luminosities change significantly with redshift. A larger sample of distant SNe Ia is required to determine if SN Ia population characteristics at high-
�
truly diverge from their low-
�
counterparts, and to confirm that standardized luminosities nevertheless remain constant with redshift.
Authoring a personal GPT for your research and practice: How we created the Q...Leonel Morgado
Thematic analysis in qualitative research is a time-consuming and systematic task, typically done using teams. Team members must ground their activities on common understandings of the major concepts underlying the thematic analysis, and define criteria for its development. However, conceptual misunderstandings, equivocations, and lack of adherence to criteria are challenges to the quality and speed of this process. Given the distributed and uncertain nature of this process, we wondered if the tasks in thematic analysis could be supported by readily available artificial intelligence chatbots. Our early efforts point to potential benefits: not just saving time in the coding process but better adherence to criteria and grounding, by increasing triangulation between humans and artificial intelligence. This tutorial will provide a description and demonstration of the process we followed, as two academic researchers, to develop a custom ChatGPT to assist with qualitative coding in the thematic data analysis process of immersive learning accounts in a survey of the academic literature: QUAL-E Immersive Learning Thematic Analysis Helper. In the hands-on time, participants will try out QUAL-E and develop their ideas for their own qualitative coding ChatGPT. Participants that have the paid ChatGPT Plus subscription can create a draft of their assistants. The organizers will provide course materials and slide deck that participants will be able to utilize to continue development of their custom GPT. The paid subscription to ChatGPT Plus is not required to participate in this workshop, just for trying out personal GPTs during it.
PPT on Sustainable Land Management presented at the three-day 'Training and Validation Workshop on Modules of Climate Smart Agriculture (CSA) Technologies in South Asia' workshop on April 22, 2024.
JAMES WEBB STUDY THE MASSIVE BLACK HOLE SEEDSSérgio Sacani
The pathway(s) to seeding the massive black holes (MBHs) that exist at the heart of galaxies in the present and distant Universe remains an unsolved problem. Here we categorise, describe and quantitatively discuss the formation pathways of both light and heavy seeds. We emphasise that the most recent computational models suggest that rather than a bimodal-like mass spectrum between light and heavy seeds with light at one end and heavy at the other that instead a continuum exists. Light seeds being more ubiquitous and the heavier seeds becoming less and less abundant due the rarer environmental conditions required for their formation. We therefore examine the different mechanisms that give rise to different seed mass spectrums. We show how and why the mechanisms that produce the heaviest seeds are also among the rarest events in the Universe and are hence extremely unlikely to be the seeds for the vast majority of the MBH population. We quantify, within the limits of the current large uncertainties in the seeding processes, the expected number densities of the seed mass spectrum. We argue that light seeds must be at least 103 to 105 times more numerous than heavy seeds to explain the MBH population as a whole. Based on our current understanding of the seed population this makes heavy seeds (Mseed > 103 M⊙) a significantly more likely pathway given that heavy seeds have an abundance pattern than is close to and likely in excess of 10−4 compared to light seeds. Finally, we examine the current state-of-the-art in numerical calculations and recent observations and plot a path forward for near-future advances in both domains.
1. BiographyNet
Linking the world of History
Serge ter Braake, Antske Fokkens, Niels Ockeloen,
Susan Legêne, Guus Schreiber, Piek Vossen, et al.
The Network Institute, VU University Amsterdam
http://wm.cs.vu.nl http://www.biographynet.nl
October 2013
2. BiographyNet: Linking the world of history
General project info, February 2014
Overview of this presentation
• Introduction of the project
• What is E-history?
• Project goals
• Short overview of use cases
• Illustrative use case example
• Text mining using NLP
• Challenges
• Preliminary results
• Why provenance is important
• Requirements from the perspective of the Historian
• Requirements from the perspective of the Computer scientist
• The BiographyNet schema
• Extending the schema with Provenance
• Aggregated provenance information
• Detailed provenance information
• Demonstrator Interface
• First ideas and sketches
Overview
3. BiographyNet: Extracting relations between people,
places and historic events
• Multidisciplinary E-History Project
What is BiographyNet?
4. E-humanities
Investigates what can be done in the humanities with modern techniques which we could not do before, or only with a great deal of effort
E-history
Subdomain of E-humanities which aims at improving existing methods of historical research rather than introducing a whole new way of doing historical research *
* Zaagsma, G.: Doing history in the digital age: history as a hybrid practice (2013)
http://gerbenzaagsma.org/blog/16-03-2013/doing-history-digital-age-history-hybrid-practice
What is E-history?
6. BiographyNet: Extracting relations between people,
places and historic events
• Multidisciplinary E-History Project
What is BiographyNet?
• Funded by the Netherlands eScience Center
• Partners are the Netherlands eScience Center, the
Huygens/ING Institute of the Royal Dutch Academy of
Sciences and VU University Amsterdam
• Starting Point: The Biographical Portal of the
Netherlands - http://www.biografischportaal.nl
• 125,000 short biographical descriptions with limited meta
data from a variety of Dutch biographical dictionaries
• 76,000 individuals
7. Short biographical descriptions with limited meta data
[Bar chart: individuals with available information (%), axis from 0 to 120, one bar per field: Name, Category, Gender, Date of Death, Date of Birth, Place of Birth, Place of Death, Occupation, Religion, Father, Mother, Claim to Fame, Partner, Text]
8. Main project goals
• Provide a richer historic knowledge base by creating a semantic layer on
top of the data from the Biographical Portal
• Convert the available data to RDF (first conversion available)
• Enrichments (NLP) and Aggregations
• Link to other sources
• Inspire Historians in setting up new research projects by providing them
with interesting leads
• Development of a demonstrator
• Quantitative analysis, visualisation and browsing techniques
• Re-usable deliverables
• Open-source release of the platform for analyzing texts about people
• Methodology for extraction of a relation network between
people, places and events
Project Goals
9. Currently 12 use cases developed involving quantitative
analysis, relation discovery, thematic research, etc.
• Simple:
• Group analysis of Governors-general
of the Dutch Indies
• More complex:
• When did Dutch elites get involved
with the ‘New World’?
• Highly complex:
• What can we say about nationalism in biographical
dictionaries from the nineteenth and twentieth century?
Use Case Overview
10. Governors-General of the Dutch Indies
• Highest Official in the Dutch Indies (1610-1949)
• 129 Biographies describing 71 individuals
• What can we say about these men as a group?
• What properties did they need to have to be appointed?
• Personal qualities
• Relations (already
more difficult)
Illustrative use case
11. Focus on the following information
• Family connections
• Parents
• Partner
• Children
• Dates
• Birth
• Appointment
• Death
• Motivation
• Education
• Religion
• Reasons for appointment
• Reasons for leaving the office
Governors General: Data Mining
12. Manual analysis
“More than one full week to manually mine this information
from the Biography Portal.” (Serge ter Braake)
The question
“Can a historian do this with (almost) the same results in
less than an hour when using the demonstrator?”
Governors General: Time and effort
13. Basic System for data enrichment using text:
• Identifying meta data in text
• Linguistically naïve supervised machine learning
• Linguistic processing
• Detection of (co-referenced) named-entities
(persons, places and dates) and events
• Concept identification
Text mining using Natural Language
Processing (NLP)
14. Challenges for NLP within BiographyNet:
• Deal with alternative spelling
• Texts vary from 19th century Dutch to contemporary Dutch
• Variations in the naming of people and places
• OCR-ed texts contain errors
• The methods used may introduce bias:
• Example: Location identification with GeoNames
Heuristic: When there are multiple candidates, take the one in, or closest to, the Netherlands
• Problem: ‘America’ is a place in the Netherlands, but what about trade with the New World?
NLP: Challenges
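The bias in the GeoNames heuristic above can be made concrete with a minimal sketch. It is not the project's implementation: the `Candidate` type, the reference point `NL_CENTRE` and the two candidate entries with their approximate coordinates are assumptions made for illustration, and no real GeoNames call is involved.

```python
from dataclasses import dataclass
from math import asin, cos, radians, sin, sqrt

# Rough centre of the Netherlands (approximate, assumed for illustration).
NL_CENTRE = (52.1, 5.3)


@dataclass(frozen=True)
class Candidate:
    label: str
    country: str  # country code as a gazetteer might report it
    lat: float
    lon: float


def haversine_km(a, b):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(radians, (*a, *b))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(h))


def pick_location(candidates):
    """Prefer a candidate inside the Netherlands, else take the nearest one."""
    dutch = [c for c in candidates if c.country == "NL"]
    pool = dutch or candidates
    return min(pool, key=lambda c: haversine_km(NL_CENTRE, (c.lat, c.lon)))


# Hypothetical candidates for the place name 'America'; coordinates are approximate.
candidates = [
    Candidate("America (village in Limburg)", "NL", 51.44, 5.98),
    Candidate("America (the New World)", "US", 39.8, -98.6),
]
chosen = pick_location(candidates)
print(chosen.label)
```

Under these assumptions the heuristic always resolves 'America' to the Dutch village, so a biography about transatlantic trade would be linked to the wrong place unless further context is taken into account.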
15. NLP: Preliminary results – Governors
[Bar chart: presence of information in text vs. meta data (% of 71 individuals), axis from 0 to 100, series: metadata, text]
16. Before development of the actual demonstrator can
commence, we first need to:
• Convert the data of the Biography Portal to RDF
• Prevent loss of information
• Devise a schema
• Structure the data
• Provide compatibility with other interesting sources
• Facilitate the recording of provenance information on the
manipulation of the data
Towards the demonstrator
17. Two main requirements for the demonstrator:
• A trace back to all original sources (texts and meta data) involved
in producing a certain result
• Which sources were used for the overall outcome and how often?
• What potentially relevant data was excluded from the end result?
• Which piece of data led to a specific result (e.g. the age of a specific
governor at his appointment)?
• Insight into the processes manipulating and selecting the data
• Indication of overall performance: Focus on recall or precision?
• Global description of the heuristics used should be provided
• Indication of responsibility: Who to contact when results are called into question?
Requirements from the perspective
of the Historian
18. Reproducing results:
• Reproducing results in NLP is non-trivial
• Details in implementations or experimental setup can
influence results up to a point where they tell a different story
• Clear registration of all steps involved and storage of
intermediate system output can improve reproducibility
• Systematic testing can help to gain insight into the variation
of the outcome of our systems and hence lead to more
insight in their performance
Antske Fokkens, Marieke van Erp, Marten Postma, Ted Pedersen, Piek Vossen and Nuno
Freire (2013) Offspring from Reproduction Problems: What Replication Failure Teaches
Us. In: Proceedings of ACL 2013, Sofia, Bulgaria, August 2013.
Requirements from the perspective of the
Computer Scientist / Computational Linguist
19. Translation into requirements for the demonstrator:
• Facilitate Replication and Reproduction
• Recording of information on the tools used, such as creator, version number, etc.
• Recording of any kind of pre- / post-processing done on input/output data.
• Recording of the intention behind the various steps in the NLP pipeline, including assumptions made and possible biases.
• Intermediate results need to be preserved for debugging purposes
• The schema needs to be both generic and flexible
• NLP pipeline design can change
• Future tools and their formats are not yet known
Requirements from the perspective of the
Computer Scientist / Computational Linguist
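The recording requirements listed on this slide can be illustrated with a small sketch of a per-step record. The `StepRecord` fields, the `record_step` helper and every value in the example are hypothetical; they only mirror the items named above (tool, creator, version, pre- and post-processing, intention, assumptions, preserved intermediate output) and are not the BiographyNet schema.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field


@dataclass
class StepRecord:
    tool: str
    version: str
    creator: str
    intention: str
    assumptions: list = field(default_factory=list)
    preprocessing: list = field(default_factory=list)
    postprocessing: list = field(default_factory=list)
    output_sha256: str = ""  # checksum of the preserved intermediate output


def record_step(record, output_text):
    """Attach a checksum of the intermediate output and serialise the record."""
    record.output_sha256 = hashlib.sha256(output_text.encode("utf-8")).hexdigest()
    return json.dumps(asdict(record), indent=2, sort_keys=True)


# Hypothetical example values, for illustration only.
step = StepRecord(
    tool="example-ner-tool",
    version="0.1",
    creator="example creator",
    intention="detect persons, places and dates",
    assumptions=["ambiguous place names resolve to the Netherlands"],
    preprocessing=["OCR text normalised to lower case"],
)
serialised = record_step(step, "intermediate output of the step")
print(serialised)
```

Storing the checksum next to the preserved output makes it possible to check later whether a rerun produced the same intermediate result.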
20. Foundations of the schema:
• Based on the structure of the original XML files
• Needs to facilitate the coupling of different biographies of the same
person, without compromising the original data
• Needs to facilitate the incorporation of several enrichments, following
from NLP, as well as aggregations
• Compatible with existing schemas such as the Europeana Data Model, PROV, P-PLAN, DC terms, etc.
The BiographyNet Schema
21. Purely syntactic conversion
• Preserve the original structure of the data
• Prevent loss of information
• Allow for reinterpretation of the original data in the future
The conversion process
<XML> Very simplified BP XML Example
<BioDes>
<FileDes> Source Meta Data
<Author></Author>
</FileDes>
<PersonDes> Person Meta Data
<Name></Name>
</PersonDes>
<BioPart> Biographical Text
<Snippet></Snippet>
</BioPart>
</BioDes>
</XML>
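A purely syntactic conversion of this kind can be sketched as a walk over the XML tree that emits one node per element and keeps the nesting intact. This is only an illustration using the Python standard library, not the ClioPatria tooling the project uses; the namespace `http://example.org/bp/`, the `has...` and `text` predicates and the sample values are assumptions.

```python
import xml.etree.ElementTree as ET
from itertools import count

# Sample shaped like the simplified example above; the values are invented.
SAMPLE = """<BioDes>
  <FileDes><Author>Example Author</Author></FileDes>
  <PersonDes><Name>Example Name</Name></PersonDes>
  <BioPart><Snippet>Example text.</Snippet></BioPart>
</BioDes>"""


def xml_to_triples(xml_text, ns="http://example.org/bp/"):
    """Turn every XML element into a node, keeping the original nesting."""
    ids = count(1)
    triples = []

    def visit(elem):
        node = f"_:n{next(ids)}"  # blank-node style identifier
        triples.append((node, "rdf:type", ns + elem.tag))
        text = (elem.text or "").strip()
        if text:
            triples.append((node, ns + "text", text))
        for child in elem:
            # The parent-child link preserves the structure of the source file.
            triples.append((node, ns + "has" + child.tag, visit(child)))
        return node

    visit(ET.fromstring(xml_text))
    return triples


for triple in xml_to_triples(SAMPLE):
    print(triple)
```

Because nothing is interpreted at this stage, the original data can still be re-read differently later, which is the point of keeping the first conversion syntactic.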
22. Conversion steps:
• Retrieval of XML dump of the Biography Portal
• Initial conversion to ‘crude’ RDF
• Using ClioPatria and the XMLRDF
tool for ClioPatria
• RDF restructuring
• Correction of purely syntactic
inefficiencies in the data
• TODO: Linking to other sources
• Essential step in the
‘Linked Data’ philosophy
The conversion process
23. Provenance information is information on how Entities
come into existence
• What are entities?
• Documents, Articles, Pictures, etc.
• Basically anything that can be
‘produced’ by something or someone
• What kind of information?
• Who did what?
• Using which entities?
• In which processes?
• Why use the PROV-DM, i.e. PROV-O?
• PROV-DM now an official W3C recommendation
Adding Provenance Information
24. Based on the requirements for the demonstrator,
provenance needs to be modeled:
• From several perspectives:
• Information involved: Sources, but also NER input data, etc.
• Processes involved: All steps in enrichment, aggregation, etc.
• People involved: Who was responsible for pipeline, tool, etc.
• At multiple levels:
• An aggregated level, i.e. per enrichment: Targeted at the historian
• A detailed level, i.e. all individual processes: Targeted at the computer scientist and computational linguist
Provenance in BiographyNet
25. Needed to ensure credibility of the demonstrator, to
evaluate its performance and to improve the academic
status of the tool
• One needs to be able to validate results
• Replication: Retrieving the same results later using the
demonstrator
• Reproducibility: Manually by the historian
• The aggregated level – Targeted at the historian
• Which original sources were involved?
• Who to contact in case results are called into question?
• The detailed level – Targeted at the computer scientist
• Detailed information on each individual step
• Allows for debugging the internal processing pipeline
Recap: Why is provenance info
important for BiographyNet?
27. BiographyNet enrichment example: Thorbecke
[Diagram: an NLP pipeline enriches the biographical description of Thorbecke]
• Source text (Dutch): “Johan Rudolph Thorbecke werd in 1798 geboren op 14 januari in Zwolle en komt uit een half-Duits ...” (“Johan Rudolph Thorbecke was born in 1798, on 14 January, in Zwolle, and comes from a half-German ...”)
• Original Biographical Description: File Meta Data (NNBW), Person Meta Data (“Thorbecke”), Biography Parts, Event (Birth, 1798)
• Enrichment: NLP Pipeline, linked through prov:plan
• Enriched Biographical Description: Person Meta Data, Event (Birth, Zwolle, 1798-01-14)
28. Provenance and Plans (P-PLAN):* Represent the plans that
guided the execution of scientific processes
• ‘Plans’ describe the original idea behind an activity
• Each ‘Plan’ can consist of one or more ‘Steps’
• Each ‘Step’ corresponds to an ‘Activity’
• ‘Variables’ describe the input/output of an activity
• Structure, format, quantity, etc.
• Each ‘Variable’ corresponds with an input/output ‘Entity’ of an
‘Activity’
• ‘Plans’ have their own provenance info
• E.g. who was responsible for the creation of a plan?
*Daniel Garijo, Yolanda Gil; http://www.opmw.org/model/p-plan
More than just Provenance:
29. P-PLAN is used to not only model what actually
happened, but also what was supposed to happen
• Forces the recording of what an activity and its
input/output should look like
• Provides abstract description of original idea behind activity
• As such, can provide info on heuristics and assumptions
• Allows for comparing the actual activity and its
input/output with the original plan and its variables
• Do they differ from each other and to what extent?
• Makes finding errors much easier, as more information is
available about what the input/output should look like
Why model plans besides provenance?
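Comparing an actual activity with its plan can be sketched as follows. The classes mirror the P-PLAN notions named on the slides (a step with input/output variables, an activity with input/output entities), but the class names, fields and the `extract_birth` example are hypothetical simplifications, not the P-PLAN vocabulary itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variable:
    name: str
    expected_format: str  # what the input/output should look like


@dataclass(frozen=True)
class Step:
    name: str
    inputs: tuple
    outputs: tuple


@dataclass(frozen=True)
class Entity:
    variable: str       # name of the Variable this entity corresponds to
    actual_format: str  # what was actually observed


@dataclass(frozen=True)
class Activity:
    step: str
    used: tuple
    generated: tuple


def deviations(step, activity):
    """List where the executed activity differs from the planned step."""
    problems = []
    for role, variables, entities in (
        ("input", step.inputs, activity.used),
        ("output", step.outputs, activity.generated),
    ):
        by_variable = {e.variable: e for e in entities}
        for var in variables:
            entity = by_variable.get(var.name)
            if entity is None:
                problems.append(f"{role} '{var.name}' missing")
            elif entity.actual_format != var.expected_format:
                problems.append(
                    f"{role} '{var.name}': expected {var.expected_format}, "
                    f"got {entity.actual_format}"
                )
    return problems


# Hypothetical plan: extract a full birth date from biography text.
planned = Step(
    name="extract_birth",
    inputs=(Variable("biography_text", "plain text"),),
    outputs=(Variable("birth_date", "YYYY-MM-DD"),),
)
# Hypothetical run that only produced a year.
executed = Activity(
    step="extract_birth",
    used=(Entity("biography_text", "plain text"),),
    generated=(Entity("birth_date", "YYYY"),),
)

problems = deviations(planned, executed)
print(problems)
```

Recording the expected format up front is what makes the mismatch detectable: without the plan, a year-only output would simply look like a valid result.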
32. • The interface should be easy to use
• The demonstrator should inspire historians to
undertake new research and give
direction, rather than being the ‘closing factor’
in their research
• The interface should allow users to fine-tune
the results returned by an initial action
Interface: Focus
33. • Query composition
• Faceted browsing
• A combination
Interface: Options
34. • Drop down boxes
to select ‘Verbs’,
data elements
and relations
Interface: Query composition
35. • No explicit querying, but
convergence of the data through
browsing and selecting
• Provides better feedback to the user
• Allows for more direct and easier
adjustment of the selected data
Interface: Faceted browsing
37. • Query composition combined with faceted
browsing
• Create new facets by defining a query
– The result of the query is available as a subset of
the data by selecting the defined facet
– As such, combinable with other facets
• Method to integrate ‘open’ querying of the
data into a general interface and visualization
Interface: A combination
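The combination described above, where a query defines a new facet that can then be combined with existing facets, can be sketched in a few lines. This is an illustrative assumption about how such an interface could work internally, not the demonstrator's implementation. The function names are hypothetical, and apart from Thorbecke (born 1798 in Zwolle) the records are made up.

```python
# Small in-memory data set; "Person B" and "Person C" are invented placeholders.
RECORDS = [
    {"name": "Thorbecke", "birth_year": 1798, "birth_place": "Zwolle"},
    {"name": "Person B", "birth_year": 1820, "birth_place": "Zwolle"},
    {"name": "Person C", "birth_year": 1790, "birth_place": "Leiden"},
]


def facet_counts(records, field):
    """Count how many records fall under each value of a facet."""
    counts = {}
    for record in records:
        counts[record[field]] = counts.get(record[field], 0) + 1
    return counts


def define_facet(records, name, query):
    """Turn the result of a query into a new boolean facet on every record."""
    return [{**record, name: bool(query(record))} for record in records]


def select(records, **facets):
    """Keep only the records that match every selected facet value."""
    return [
        record for record in records
        if all(record.get(key) == value for key, value in facets.items())
    ]


# A query becomes a facet, which is then combined with an existing facet.
with_facet = define_facet(RECORDS, "born_before_1800", lambda r: r["birth_year"] < 1800)
selection = select(with_facet, born_before_1800=True, birth_place="Zwolle")
print([record["name"] for record in selection])
```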
39. Time and place
are primary elements
Interface: Demonstrator
Results
?
41. Main components of the demonstrator
• Initial schema available
• Schema models enrichments and aggregations alongside original
sources
• Allows for storing various levels of provenance information
• Model will be adapted while progressing with building the
demonstrator
• Initial conversion to RDF available
• Structure according to devised schema
• Next step is linking to external sources
• Initial NLP system setup available
• Preliminary results comparable with manual use case
• Interface
• First ideas and sketches
Current Status
42. Thank you for your attention
www.biographynet.nl
Feel free to ask questions