A presentation for a group of PhD students from the Leibniz Institutes (section B, social sciences) to discuss how they could use the Web, and, even better, the Web of Data, as an instrument in their research.
These slides accompanied the first part of the workshop that Vinayak Das Gupta and I gave at the Data Visualization for the Arts and Humanities event, held at Queen's University, Belfast on 5-6 March 2015. The workshop, entitled 'Data-mining the Semantic Web and spatially visualising the results', introduced the participants to the concepts and technologies of Open Data, the Semantic Web, RDF, SPARQL, GeoJSON and Leaflet.js. These slides cover the data-mining of online cultural heritage resources.
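By way of illustration, here is a minimal Python sketch of the pipeline the workshop describes: query a public SPARQL endpoint for cultural heritage resources with coordinates and emit GeoJSON that a Leaflet.js map could render. The endpoint, class, and properties (DBpedia, dbo:Museum, geo:lat/geo:long) are illustrative choices, not necessarily the workshop's own examples.

```python
import json
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("https://dbpedia.org/sparql")
sparql.setQuery("""
    PREFIX dbo:  <http://dbpedia.org/ontology/>
    PREFIX geo:  <http://www.w3.org/2003/01/geo/wgs84_pos#>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    SELECT ?museum ?name ?lat ?long WHERE {
        ?museum a dbo:Museum ;
                rdfs:label ?name ;
                geo:lat ?lat ;
                geo:long ?long .
        FILTER (lang(?name) = "en")
    } LIMIT 25
""")
sparql.setReturnFormat(JSON)
bindings = sparql.query().convert()["results"]["bindings"]

# GeoJSON wants [longitude, latitude]; Leaflet.js can render this directly.
features = [{
    "type": "Feature",
    "geometry": {"type": "Point",
                 "coordinates": [float(b["long"]["value"]),
                                 float(b["lat"]["value"])]},
    "properties": {"name": b["name"]["value"], "uri": b["museum"]["value"]},
} for b in bindings]

print(json.dumps({"type": "FeatureCollection", "features": features}, indent=2))
```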
What Are Links in Linked Open Data? A Characterization and Evaluation of Link... by Armin Haller
Linked Open Data promises guiding principles for publishing interlinked knowledge graphs on the Web as findable, accessible, interoperable, and reusable datasets. In this talk I argue that while Linked Data may thus be viewed as a basis for instantiating the FAIR principles, a number of open issues still cause significant data quality problems even when knowledge graphs are published as Linked Data. I will first define the boundaries of what constitutes a single coherent knowledge graph within Linked Data, i.e., present a principled notion of what a dataset is and what links within and between datasets are. I will then define different link types for data in Linked datasets and present the results of our empirical analysis of linkage among the datasets of the Linked Open Data cloud. Recent results from our analysis of Wikidata, which has not been part of the Linked Open Data Cloud, will also be presented.
Exposing Humanities Data for Reuse and Linking - RED, linked data and the sem... by Mathieu d'Aquin
Presented at the workshop of the "Reading Experience Database" (RED) project - London - 25/02/2011.
Discussion on how linked data can benefit research in humanities, using RED and data.open.ac.uk as early examples.
The document describes the SFX framework for context-sensitive reference linking, which redirects a user who follows a citation to an appropriate full text or service for their context. The framework uses the OpenURL standard to pass citation metadata from a link source to a parsing server, which then sends the metadata to a linking server; the linking server determines the most relevant services and creates dynamic links to them based on the user's access rights and the available library collections and resources. The goal is to provide context-sensitive services driven by the user's context and the cited item's metadata rather than relying on pre-computed static links.
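A minimal sketch of the linking step described above, assuming a hypothetical institutional resolver URL: the link source serializes the citation as OpenURL key/value pairs (here in the Z39.88-2004 KEV journal format) and leaves the choice of target service to the resolver. The metadata values are illustrative.

```python
from urllib.parse import urlencode

# Hypothetical institutional resolver; real deployments each have their own.
RESOLVER = "https://resolver.example.edu/sfx"

# Citation metadata in OpenURL 1.0 (Z39.88-2004) KEV journal format.
citation = {
    "url_ver": "Z39.88-2004",
    "rft_val_fmt": "info:ofi/fmt:kev:mtx:journal",
    "rft.atitle": "Context-sensitive reference linking",
    "rft.jtitle": "D-Lib Magazine",
    "rft.date": "2001",
    "rft.volume": "7",
    "rft.spage": "1",
    "rft.aulast": "Van de Sompel",
}

# The source only describes the citation; the resolver decides, per user
# context and collection holdings, which full-text copy or service to offer.
print(f"{RESOLVER}?{urlencode(citation)}")
```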
Research Data Sharing: A Basic Framework by Paul Groth
Some thoughts on thinking about data sharing. Prepared for the 2016 LERU Doctoral Summer School - Data Stewardship for Scientific Discovery and Innovation.
http://www.dtls.nl/fair-data/fair-data-training/leru-summer-school/
The Web of Data: do we actually understand what we built? by Frank van Harmelen
Despite its obvious success (largest knowledge base ever built, used in practice by companies and governments alike), we actually understand very little of the structure of the Web of Data. Its formal meaning is specified in logic, but with its scale, context dependency and dynamics, the Web of Data has outgrown its traditional model-theoretic semantics.
Is the meaning of a logical statement (an edge in the graph) dependent on the cluster ("context") in which it appears? Does a more densely connected concept (node) contain more information? Is the path length between two nodes related to their semantic distance?
Properties such as clustering, connectivity and path length are not described, much less explained by model-theoretic semantics. Do such properties contribute to the meaning of a knowledge graph?
To properly understand the structure and meaning of knowledge graphs, we should no longer treat knowledge graphs as (only) a set of logical statements, but treat them properly as a graph. But how to do this is far from clear.
In this talk, I report on some of our early results on some of these questions, but I ask many more questions for which we don't have answers yet.
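To make the graph-theoretic reading concrete, here is a small sketch that loads RDF triples into an ordinary undirected graph and computes the structural properties mentioned above. The Turtle data is a toy example, not one of the talk's datasets.

```python
import networkx as nx
from rdflib import Graph

# Toy RDF data; real experiments would load a sizeable knowledge graph.
turtle = """
@prefix ex: <http://example.org/> .
ex:a ex:knows ex:b . ex:b ex:knows ex:c .
ex:c ex:knows ex:a . ex:c ex:knows ex:d .
"""
rdf = Graph().parse(data=turtle, format="turtle")

# Forget the logic for a moment and treat the triples as plain edges.
g = nx.Graph()
for s, p, o in rdf:
    g.add_edge(str(s), str(o), predicate=str(p))

print("average clustering:", nx.average_clustering(g))
print("connected:", nx.is_connected(g))
print("path length a->d:", nx.shortest_path_length(
    g, "http://example.org/a", "http://example.org/d"))
```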
Crowdsourcing the Quality of Knowledge Graphs: A DBpedia Study by Maribel Acosta Deibe
Summary of crowdsourcing studies to assess the quality of knowledge graphs and complete missing values. Results focus on findings over the DBpedia knowledge graph (https://wiki.dbpedia.org/); a minimal sketch of the microtask idea follows the publication list below.
Related publications:
Acosta, M., Zaveri, A., Simperl, E., Kontokostas, D., Auer, S., & Lehmann, J. Crowdsourcing Linked Data Quality Assessment. In International Semantic Web Conference (pp. 260-276), 2013.
Acosta, M., Zaveri, A., Simperl, E., Kontokostas, D., Flöck, F., & Lehmann, J. Detecting Linked Data Quality issues via Crowdsourcing: A DBpedia Study. Semantic Web Journal, 9(3), 303-335, 2018.
Acosta, M., Simperl, E., Flöck, F., & Vidal, M. E. HARE: A hybrid SPARQL engine to enhance query answers via crowdsourcing. In Proceedings of the 8th International Conference on Knowledge Capture (p. 11). 2015. Best Student Paper Award.
Acosta, M., Simperl, E., Flöck, F., & Vidal, M. E. Enhancing answer completeness of SPARQL queries via crowdsourcing. Journal of Web Semantics, 45, 41-62, 2017.
Acosta, M., Simperl, E., Flöck, F., & Vidal, M. E. HARE: An engine for enhancing answer completeness of SPARQL queries via crowdsourcing. Companion Volume of the Web Conference (pp. 501-505). 2018.
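The sketch promised above, assuming the simplest possible workflow in which each DBpedia triple becomes a yes/no verification question; the studies themselves use dedicated crowdsourcing platforms and richer task designs.

```python
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("https://dbpedia.org/sparql")
sparql.setQuery("""
    SELECT ?p ?o WHERE { <http://dbpedia.org/resource/Berlin> ?p ?o } LIMIT 5
""")
sparql.setReturnFormat(JSON)

# One yes/no verification microtask per triple.
for b in sparql.query().convert()["results"]["bindings"]:
    print(f"Is it correct that Berlin --{b['p']['value']}--> "
          f"{b['o']['value']}? [yes / no / cannot tell]")
```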
The document discusses using the Semantic Web as a knowledge base for artificial intelligence applications. It describes how the Semantic Web publishes data on the web in a standardized, linked format. This vast amount of distributed knowledge could be mined by AI in various ways, such as linking data mining to find patterns, using reasoning to analyze and understand raw data, and assessing agreement between ontologies. The Semantic Web represents a large, collaborative base of formally represented knowledge that provides many opportunities for future AI research and applications.
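One concrete flavour of the "reasoning over published data" idea, as a hedged sketch: even without a full reasoner, SPARQL 1.1 property paths let an application derive facts that are only implicit in the raw triples. The tiny ontology below is invented for illustration.

```python
from rdflib import Graph

# Invented toy ontology with an implicit fact: the Louvre is an Organisation.
g = Graph().parse(data="""
@prefix ex:   <http://example.org/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
ex:Museum rdfs:subClassOf ex:CulturalInstitution .
ex:CulturalInstitution rdfs:subClassOf ex:Organisation .
ex:Louvre a ex:Museum .
""", format="turtle")

# The property path rdfs:subClassOf* walks the class hierarchy at query time.
q = """
PREFIX ex:   <http://example.org/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?x WHERE { ?x a/rdfs:subClassOf* ex:Organisation . }
"""
for row in g.query(q):
    print(row.x)  # ex:Louvre, although no triple states it directly
```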
Analysing & Improving Learning Resources Markup on the Web by Stefan Dietze
Talk at WWW2017 on LRMI adoption, quality and usage. Full paper here: http://papers.www2017.com.au.s3-website-ap-southeast-2.amazonaws.com/companion/p283.pdf.
This document summarizes recent approaches to web data management including Fusion Tables, XML, and Linked Open Data (LOD). It discusses properties of web data like lack of schema, volatility, and scale. LOD uses RDF, global identifiers (URIs), and data links to query and integrate data from multiple sources while maintaining source autonomy. The LOD cloud has grown rapidly, currently consisting of over 3000 datasets with more than 84 billion triples.
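The "global identifiers plus data links" mechanism in miniature: dereferencing an RDF URI yields more triples about the resource, including owl:sameAs links into other datasets. This sketch assumes network access and relies on the endpoint's content negotiation.

```python
from rdflib import Graph, URIRef

OWL_SAMEAS = URIRef("http://www.w3.org/2002/07/owl#sameAs")
berlin = URIRef("http://dbpedia.org/resource/Berlin")

# Dereference the identifier: rdflib negotiates an RDF representation.
g = Graph()
g.parse(berlin)

# owl:sameAs links lead to descriptions of the same city in other datasets.
for _, _, other in g.triples((berlin, OWL_SAMEAS, None)):
    print(other)
```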
Experience from 10 months of University Linked Data by Mathieu d'Aquin
Experience from 10 months of University Linked Data at the Open University:
1. The Open University exposed its public data as linked open data to make the data more discoverable, reusable, and integrated with other datasets.
2. Exposing data as linked data provides benefits like increased transparency, data reuse internally and externally, and reduced costs of managing the university's public data.
3. Other UK universities have since followed the Open University's example in exposing their data as linked data.
The document discusses using linked open data and linked data principles for libraries. It covers key concepts like URIs, RDF triples, ontologies and vocabularies. It then outlines options for libraries to both consume and publish linked data, such as enriching existing catalog data by linking to external sources, creating new information aggregates, and publishing library holdings and metadata as linked open data. Challenges include a lack of common identifiers, FRBRization of existing data, and the need for content curation and new technical systems to fully realize the benefits of linked open data for libraries.
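A minimal sketch of the "publish catalogue data as linked open data" option: one holding modelled with Dublin Core terms and linked to an external authority record instead of a free-text author string. All URIs and the identifier are illustrative.

```python
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import DCTERMS, RDF

EX = Namespace("http://library.example.org/item/")  # illustrative base URI

g = Graph()
g.bind("dcterms", DCTERMS)

book = EX["b1024"]
g.add((book, RDF.type, DCTERMS.BibliographicResource))
g.add((book, DCTERMS.title, Literal("Ulysses")))
# Link to an external authority record rather than a local free-text string
# (the VIAF-style identifier below is illustrative, not looked up).
g.add((book, DCTERMS.creator, URIRef("http://viaf.org/viaf/0000000000")))

print(g.serialize(format="turtle"))
```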
Presentation about - Semantic Web - Overview - by Semantic Web
Web of Data, Giant Global Graph, Data Web, Web 3.0, Linked Data Web, Semantic Data Web, Enterprise Information Web, HTML, CSS
LibraryThing is a social networking site and cataloging tool for readers that has recently implemented work-to-work relationships based on FRBR (Functional Requirements for Bibliographic Records). This allows users to define relationships between works such as "contains" or "parodies". LibraryThing displays these relationships through work pages and relationship manipulation tools. While most integrated library systems have not fully implemented FRBR, LibraryThing's work-to-work implementation is currently the most comprehensive and may inspire further library adoption of FRBR standards.
The slides discuss the research agenda for search of the semantic web and currently available search tools. The slides were prepared for an audience of information…
This document discusses the growth of RNA sequencing from 2008-2013, with sample sizes increasing from around 2 to 900. It also notes the lack of statisticians represented in big data initiatives, with few statisticians among speakers at several conferences and workshops. Finally, it promotes the author's teaching blog and monthly online statistics courses aimed at teaching data analysis skills.
Exploration, visualization and querying of linked open data sources by Laura Po
Afternoon hands-on session talk at the second Keystone Training School, "Keyword search in Big Linked Data", held in Santiago de Compostela.
https://eventos.citius.usc.es/keystone.school/
Keystone Summer School 2015: Mauro Dragoni, Ontologies For Information Retrieval
The presentation provides an overview of what an ontology is and how it can be used for representing information and retrieving data, with a particular focus on the linguistic resources available to support this kind of task. It also surveys semantic-based retrieval approaches, highlighting the pros and cons of semantic approaches with respect to classic ones. Use cases are presented and discussed.
This document summarizes the development of Coursera courses at Johns Hopkins University from 2012-2014. It notes key events including the initial announcement of a partnership with Coursera in July 2012, the first courses being run by Brian Caffo and Roger Peng starting in September 2012, and Jeff Leek running his "Data Analysis" course starting in January 2013. It discusses scaling of enrollment, approaches to building out a data science specialization, and financial details. The document reflects on reasons for their early success in MOOCs including speed, leveraging existing infrastructure, and attracting an orthogonal student population.
The document introduces the principles of Linked Data, which aims to share data rather than documents on the web. It describes the four rules of Linked Data and provides examples of existing Linked Data datasets as well as tools for publishing and using Linked Data. The document also discusses extending Linked Data to include geospatial and sensor data by linking web resources, structured geospatial databases, and unstructured geographic information.
The document discusses semantic web mediation, which involves two main steps: 1) providing semantic access to data through the use of ontologies, and 2) the mediation process. It describes applying these concepts to the Personae project, which uses a mediator to provide unified access and querying of distributed semantic web data sources described by different local ontologies. The mediator aligns the local ontologies to a global reference ontology to facilitate query answering across sources.
Web Science Synergies: Exploring Web Knowledge through the Semantic Web by Stefan Dietze
The document discusses exploring web data and knowledge through the semantic web. It describes how the semantic web adds meaning to data through shared vocabularies and schemas. It also discusses challenges with the large number and diversity of linked open datasets, including issues with accessibility, heterogeneity of schemas, and data quality. It proposes approaches to address these challenges, such as dataset profiling, metadata catalogs, and infrastructure for federated querying.
Beyond Linked Data - Exploiting Entity-Centric Knowledge on the Web by Stefan Dietze
This document discusses enabling discovery and search of linked data and knowledge graphs. It presents approaches for dataset recommendation including using vocabulary overlap and existing links between datasets. It also discusses profiling datasets to create topic profiles using entity extraction and ranking techniques. These recommendation and profiling approaches aim to help with discovering relevant datasets and entities for a given topic or task.
This document discusses ethics considerations for conducting social research in virtual worlds. It outlines some potential uses of virtual worlds for research, including as a tool for coordination, observing behavior, and studying community formation. However, it also notes challenges like participants not being accustomed to formal research and difficulties with identity verification and informed consent. The document presents a case study of a job interview study in Second Life and discusses ethics issues that could arise, like protecting social groups. It proposes a "Virtual World Subject's Bill of Rights" to help ensure subjects understand the research, risks/benefits, their rights to participate as their avatar and withdraw from studies.
The document summarizes key events surrounding the drafting of the US Constitution, including the economic difficulties following the Revolutionary War, Shays' Rebellion, and weaknesses of the Articles of Confederation. It describes the Constitutional Convention in Philadelphia in 1787, with delegates including George Washington, Benjamin Franklin, James Madison, and Alexander Hamilton. The Virginia and New Jersey Plans were debated, with compromises including the Great Compromise and the Three-Fifths Compromise.
Reconciling Humanities and Social Science Research With Data Protection by David Erdos
Humanities and social science research contribute enormously to collective public knowledge and discussion. Such activity will almost invariably involve the processing of personal information and will, therefore, trigger the application of EU data protection law, including the forthcoming General Data Protection Regulation (GDPR). This presentation argues that the GDPR's default provisions – especially as regards the presumption of consent for sensitive data, data subject notification rules and strict discipline provisions – pose an acute threat to such activity. Moreover, whilst the research derogations (Art. 89) ameliorate a few of the issues, they are principally designed for work based on a highly structured, predetermined and largely fiduciary model such as is common in bio-medicine. As recognised by a wide variety of research organisations during debate on the GDPR (including the Wellcome Trust and the UK Economic and Social Research Council), given that social/humanities scholarship is intrinsically linked to public knowledge and discussion, it should in fact benefit not just from these research derogations but also from the more permissive (but not absolute) derogations for free speech. The GDPR now recognises this by granting free speech protection for "academic expression" alongside that of journalism, literature and art (Art. 85(2)). (N.B. These slides are based on a talk given at the University of Hong Kong, "Positioning Privacy and Transparency in Data-intensive Research and Data-driven Regulation", on 8 November 2016.)
Research is defined as a systematic investigation designed to develop or contribute to generalizable knowledge. It involves carefully defining problems, formulating hypotheses, collecting and organizing data, making deductions, reaching conclusions, and testing conclusions. The main objectives of research are to gain familiarity with phenomena, accurately portray characteristics, determine frequencies of occurrences, and test hypotheses of causal relationships between variables. In conclusion, research is a systematic and logical process that follows specified steps in a specified sequence according to a set of rules.
The document provides an overview of the Indian legal system, including its history and sources of law. Some key points:
- Indian law is largely based on English common law and retains many Acts introduced during British rule.
- The primary sources of law are enactments passed by Parliament and state legislatures. Secondary sources include Supreme Court and High Court judgments.
- The Indian Constitution establishes a democratic republic and guarantees fundamental rights and duties. It contains 395 articles and is the world's longest written constitution.
- The legal system includes criminal and civil codes. It has a three-tiered structure of Supreme Court, High Courts, and subordinate courts. Recent trends focus on alternative dispute resolution and improving judicial efficiency.
The document summarizes the hierarchy of courts in India. At the national level is the Supreme Court, which is the highest court of appeal. At the state level are the High Courts, which have appellate and original jurisdiction over subordinate courts. Subordinate courts exist at the district and lower levels, and include civil courts like district courts, and criminal courts like sessions courts and magistrate courts. The document outlines the jurisdiction and sentencing powers of the different courts in India's judicial system. It also discusses the separation of judicial and executive powers between different types of magistrates.
The document discusses various types of research including applied research, basic research, correlational research, descriptive research, ethnographic research, experimental research, and exploratory research. Applied research seeks practical solutions to problems, while basic research expands knowledge without a direct application. Correlational research examines relationships between variables without determining cause and effect. Descriptive research provides accurate portrayals of characteristics, and ethnographic research involves in-depth study of cultures. Experimental research establishes cause-and-effect through controlled manipulation of variables.
Connections that work: Linked Open Data demystified by Jakob .
Keynote given 2014-10-22 at the National Library of Finland at Kirjastoverkkopäivät 2014 (https://www.kiwi.fi/pages/viewpage.action?pageId=16767828) #kivepa2014
The document provides an overview of the work done at DERI Galway, including developing technologies like SIOC, ActiveRDF, and BrowseRDF to interconnect online communities and enable semantic applications. It also describes JeromeDL, a digital library system that uses semantic metadata and services to allow users to collaboratively browse and share knowledge.
EgoSystem: Presentation to LITA, American Library Association, Nov 8 2014 by James Powell
The Internet represents the connections among computers and devices, the World Wide Web is a network of interconnected documents, and the Semantic Web is the closest thing we have today to a network of interconnected facts. Noticeably absent from these global networks is any sort of open, formal representation of an online global social network. Each user's online presence, and its immediate social network, is isolated and typically only available within the confines of the social networking site that hosts it. Discovery across explicit online social networks and implicit social networks, such as those that can be inferred from co-authorship relationships and affiliations, is, for all practical purposes, impossible. And yet there are practical and non-nefarious reasons why an organization might be interested in exploring portions of such a network. Outreach is one such interest. Los Alamos National Laboratory (LANL) prototyped EgoSystem to harvest and explore the professional social networks of postdoctoral students. The project's goal is to enlist past students and other Lab alumni as ambassadors and advocates for LANL's ongoing mission. During this talk we will discuss the various technologies that support EgoSystem and demonstrate some of its capabilities.
Talk at the Semantic Technology Conference, 23 June 2010, San Francisco.
The LOD cloud has potential applicability in many AI-related tasks, such as open-domain question answering, knowledge discovery, and the Semantic Web. An important prerequisite before the LOD cloud can enable these goals is allowing its users (and applications) to effectively pose queries to and retrieve answers from it. However, this prerequisite is still an open problem for the LOD cloud and has restricted it to "merely more data." To transform the LOD cloud from "merely more data" to "semantically linked data" there are plenty of open issues which should be addressed. We believe this transformation of the LOD cloud can be performed by addressing the shortcomings we identified: lack of conceptual description of datasets, lack of expressivity, and difficulties with respect to querying.
This document discusses how adding formal semantics to linked open data can make it more useful and powerful. It describes how existing linked data lacks formal semantics, limiting its capabilities. The document proposes two approaches: 1) Enriching linked data schemas using ontology matching techniques to capture relationships between datasets. 2) Developing a system called LOQUS that can perform federated queries across multiple linked datasets by decomposing queries and merging results. This would allow queries without needing intimate knowledge of each dataset's structure.
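LOQUS itself is not publicly available, but SPARQL 1.1 federation conveys what "decomposing a query across datasets and merging results" means: the SERVICE keyword ships a sub-pattern to a remote endpoint. The query below is illustrative, and whether a given public endpoint permits outgoing SERVICE calls varies.

```python
from SPARQLWrapper import SPARQLWrapper, JSON

# The SERVICE block is evaluated at Wikidata; the owl:sameAs links published
# by DBpedia supply the join keys between the two datasets.
query = """
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
SELECT ?city ?population WHERE {
    ?city a dbo:City ;
          dbo:country dbr:Ireland ;
          owl:sameAs ?wd .
    FILTER (STRSTARTS(STR(?wd), "http://www.wikidata.org/entity/"))
    SERVICE <https://query.wikidata.org/sparql> {
        ?wd wdt:P1082 ?population .   # population, fetched remotely
    }
} LIMIT 10
"""

sparql = SPARQLWrapper("https://dbpedia.org/sparql")
sparql.setQuery(query)
sparql.setReturnFormat(JSON)
for b in sparql.query().convert()["results"]["bindings"]:
    print(b["city"]["value"], b["population"]["value"])
```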
All Things Open 2014 - Day 1
Wednesday, October 22nd, 2014
Arfon Smith
Chief Scientist for GitHub
Open Government/Open Data
What Academia Can Learn from Open Source
Find more by Arfon here: https://speakerdeck.com/arfon
Bridging formal semantics and social semantics on the web by Fabien Gandon
The document summarizes research on bridging formal semantics and social semantics on the web. It discusses:
1) The Wimmics research team which studies web-instrumented machine interactions, communities, and semantics using a multidisciplinary approach and typed graphs.
2) The challenge of analyzing, modeling, and formalizing social semantic web applications for communities by combining formal semantics and social semantics.
3) Examples of past work that have structured folksonomies, combined metric spaces for tags, and analyzed sociograms and social networks.
Talk at the 3rd Keystone Training School - Keyword Search in Big Linked Data - Institute for Software Technology and Interactive Systems, TU Wien, Austria, 2017.
In search of lost knowledge: joining the dots with Linked Data by jonblower
These slides are from my seminar to the University of Reading Department of Meteorology, November 2013. They contain a (hopefully not very technical) introduction to the concepts of Linked Data and how we are applying them in the CHARMe project (http://www.charme.org.uk). In CHARMe we are using Open Annotation to connect users of climate data with community-generated "commentary information" that helps them to understand a dataset's strengths and weaknesses.
The slide notes contain some helpful context, so you might like to download the PPT file!
The slides are licensed as "Creative Commons Attribution 3.0", meaning that you can do what you like with these slides provided that you credit the University of Reading for their creation. See http://creativecommons.org/licenses/by/3.0/.
Information Extraction and Linked Data Cloud by Dhaval Thakker
The document discusses Press Association's semantic technology project which aims to generate a knowledge base using information extraction and the Linked Data Cloud. It outlines Press Association's operations and workflow, and how semantic technologies can be used to develop taxonomies, annotate images, and extract entities from captions into an ontology-based knowledge base. The knowledge base can then be populated and interlinked with external datasets from the Linked Data Cloud like DBpedia to provide a comprehensive, semantically-structured source of information.
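Entity extraction in the spirit described above, sketched against the public DBpedia Spotlight API (the Press Association pipeline itself is proprietary and not shown): each recognised phrase in a caption comes back linked to a DBpedia URI.

```python
import requests

caption = "David Beckham at Wembley Stadium in London"

resp = requests.get(
    "https://api.dbpedia-spotlight.org/en/annotate",
    params={"text": caption, "confidence": 0.5},
    headers={"Accept": "application/json"},
    timeout=30,
)
resp.raise_for_status()

# Each recognised surface form is linked to a DBpedia entity URI.
for res in resp.json().get("Resources", []):
    print(res["@surfaceForm"], "->", res["@URI"])
```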
Neno/Fhat: Semantic Network Programming Language and Virtual Machine Specific... by Marko Rodriguez
• The Semantic Web is a distributed, flexible modeling framework.
• The Semantic Web is primarily descriptive in nature. The Semantic Web is used to describe web-pages, services, systems, etc.
• Neno is an object-oriented language that was designed specifically for the Semantic Web.
• Fhat is a virtual machine represented in the Semantic Web.
• With Neno/Fhat the Semantic Web now has a procedural component. The Semantic Web now includes object methods, algorithms, and computing machines.
• The Semantic Web can be made to behave like a distributed, general-purpose computer. Not just an information repository.
eScience: A Transformed Scientific Method by Duncan Hull
The document discusses the concept of eScience, which involves synthesizing information technology and science. It explains how science is becoming more data-driven and computational, requiring new tools to manage large amounts of data. It recommends that organizations foster the development of tools to help with data capture, analysis, publication, and access across various scientific disciplines.
Eureka Research Workbench: A Semantic Approach to an Open Source Electroni... by Stuart Chalk
Scientists are looking for ways to leverage web 2.0 technologies in the research laboratory and as a consequence a number of approaches to web-based electronic notebooks are being evaluated. In this presentation I discuss the Eureka Research Workbench, an electronic laboratory notebook built on semantic technology and XML. Using this approach the context of the information recorded in the laboratory can be captured and searched along with the data itself. A discussion of the current system is presented along with the next planned development of the framework and long-term plans relative to linked open data. Presented at the 246th American Chemical Society Meeting in Indianapolis, IN, USA on September 12th, 2013.
The document discusses electronic laboratory notebooks and blogs as a way to record scientific experiments and share data. It proposes using blogs to document experiments in a more collaborative way, while also capturing metadata and linking data to provide context. Challenges addressed include capturing the full context around experiments, facilitating collaboration and discussion, and improving access to data over time.
The document discusses the emergence of the semantic web, which aims to make data on the web more interconnected and machine-readable. It describes Tim Berners-Lee's vision of a "Giant Global Graph" that connects all web documents based on what they are about rather than just linking documents. This would allow user data and profiles to be seamlessly shared across different sites without having to re-enter the same information. The semantic web uses standards like RDF, RDFS and OWL to represent relationships between data in a graph structure and enable automated reasoning. Several companies are working to build applications that take advantage of this interconnected semantic data.
The document discusses the concepts of semantic technology and the semantic web. It defines key concepts like tabula rasa, the network effect, and intelligence embedded in data through relationships. It also outlines technologies used in the semantic web like RDF, OWL, SPARQL, FOAF, and DBpedia and how search engines and companies are using these technologies for applications like sentiment analysis, natural language processing, and information extraction.
A Practical Ontology for the Large-Scale Modeling of Scholarly Artifacts and ... by Marko Rodriguez
The large-scale analysis of scholarly artifact usage is constrained primarily by current practices in usage data archiving, privacy issues concerned with the dissemination of usage data, and the lack of a practical ontology for modeling the usage domain. As a remedy to the third constraint, this article presents a scholarly ontology that was engineered to represent those classes for which large-scale bibliographic and usage data exists, supports usage research, and whose instantiation is scalable to the order of 50 million articles along with their associated artifacts (e.g. authors and journals) and an accompanying 1 billion usage events. The real world instantiation of the presented abstract ontology is a semantic network model of the scholarly community which lends the scholarly process to statistical analysis and computational support. We present the ontology, discuss its instantiation, and provide some example inference rules for calculating various scholarly artifact metrics.
This document discusses visualizing activity data through various visualization types and tools. It provides examples of different types of activity data that can be visualized, including library usage data and virtual learning environment usage data. The document discusses visualization types like treemaps, cycle plots, and network graphs that may be suitable for different types of time-series or dimensional activity data. It also discusses tools for visualization like R, Gephi, and Graphviz and how data format and structure influence visualization choices. The overall goal is to help choose effective visualizations to discover stories or insights from activity data.
Similar to "How the Web can change social science research (including yours)":
Neuro-symbolic is not enough, we need neuro-*semantic* by Frank van Harmelen
Neuro-symbolic (NeSy) AI is on the rise. However, simply doing machine learning on just any symbolic structure is not sufficient to really harvest the gains of NeSy. Those gains come only when the symbolic structures have an actual semantics. I give an operational definition of semantics as "predictable inference".
All of this illustrated with link prediction over knowledge graphs, but the argument is general.
I claim that none of the commonly used embedding methods capture any semantics.
It's fine if you want to move from a symbolic to a numeric or geometric representation, but when you do, don't throw the semantic baby out with the symbolic bathwater.
I argue that a useful definition of semantics is "predictable inference". This makes it possible to have semantics outside a logical framework.
A methodological warning from 1976: don't fool yourself that wishful mnemonics in your knowledge graph are "semantics". A knowledge graph without a schema/ontology is therefore just a data graph, without much semantics.
Finally, a discussion of some embedding methods that do manage to take semantics into account (TransOWL, ball embeddings like ELEm and EmEL++, and box embeddings like BoxEL and Box^2EL).
So: even if you do move to a non-symbolic representation (numerical, geometric), make sure you keep the semantics: don't throw the semantic baby out with the symbolic bathwater.
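Since link prediction is the talk's running example, here is a minimal TransE-style scorer (Bordes et al., 2013), representative of the embedding methods the talk argues capture little semantics; random vectors stand in for trained embeddings.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 50

# In a trained model these vectors come from optimisation; random here.
entities = {e: rng.normal(size=dim) for e in ("paris", "france", "berlin")}
relations = {"capital_of": rng.normal(size=dim)}

def score(h, r, t):
    """TransE: a triple (h, r, t) is plausible when h + r lies close to t."""
    return -np.linalg.norm(entities[h] + relations[r] - entities[t])

# Rank candidate tails for the query (paris, capital_of, ?).
for tail in ("france", "berlin"):
    print(tail, score("paris", "capital_of", tail))
```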
Logos of companies for which I have verified evidence that they use knowledge graphs and ontologies in production. This list is (of course) incomplete.
Modular design patterns for systems that learn and reason: a boxology by Frank van Harmelen
A set of modular design patterns that can describe a large number of neuro-symbolic architectures from the literature. Corresponding paper is at https://arxiv.org/abs/2102.11965
This document discusses empirical semantics and insights that can be gained from observing large knowledge bases. It notes that formal semantics often does not accurately model real-world knowledge and proposes some challenges for developing alternative semantic models. Specifically, it suggests empirical study has shown that identity, meaningful names, and different predicate patterns are not adequately captured. The goal is to develop descriptive theories of knowledge based on observation rather than prescriptive theories.
We now have larger Knowledge Bases than ever before. (10 billion facts is now a small number).
We now have the instruments to observe and analyse these very large Knowledge Bases.
We can use these insights for better tools for querying, inferencing, publishing, maintaining, visualising and explaining.
The end of the scientific paper as we know it (or not...) by Frank van Harmelen
Two talks in one: the first talk expanding on the great promises of nanopublications, the second talk pointing out why much of that is too difficult (and some of it wrong).
On the nature of AI, and the relation between symbolic and statistical approa... by Frank van Harmelen
The document discusses the differences between symbolic and statistical approaches to artificial intelligence (AI). It notes that while modern AI is dominated by machine learning, the two approaches have different strengths and weaknesses. Symbolic AI is better for reasoning, planning, and explanation, while statistical AI excels at pattern recognition, motor skills, and tasks using large datasets. The semantic web allowed symbolic knowledge representation to scale up significantly using web technologies, but introduced challenges that require machine learning techniques to address.
The end of the scientific paper as we know it (in 4 easy steps) by Frank van Harmelen
Scientific publishing hasn't really changed in over 300 years. By changing papers from a single narrative text (readable by people only) into a rich network of snippets of knowledge ("nano-publications") we would allow computers to become our colleagues instead of just our tools
An increasing number of patients suffer from multiple diseases at the same time. This makes their treatment much more complex, and the standard medical treatment guidelines no longer apply (they are typically written for patients with just a single disease). We present computer-based techniques for analysing medical guidelines to detect how multiple guidelines may interact in unexpected ways, and how Linked Open Data can be used to recognise and avoid such adverse effects.
Talk given at the SSSW 2013 Semantic Web Summerschool.
Part 1: What is "Semantic Web" (in 4 principles and 1 movie)
Part 2: What question can we ask now that we couldn't ask 10 years ago
Part 3: Treat Computer Science as a *science*, not just as engineering!
(this part is a short version of http://slidesha.re/SaUhS4)
Knowledge Engineering rediscovered, Towards Reasoning Patterns for the Semant... by Frank van Harmelen
We show how the problem-solving patterns from knowledge engineering can be applied to systems developed on the semantic web. This gives us re-usable problem-solving patterns for the semantic web and would greatly help us to build and understand such systems.
I argue why I think that Computer Science (or better: Informatics) is a "natural science", in the same sense that physics, astronomy, biology, psychology and sociology are a natural science: they study a part of the world around us. In that same sense, I think Informatics studies a part of the world around us.
For a similar talk (including script), but more aimed at a Semantic Web audience in particular, see http://www.cs.vu.nl/~frankh/spool/ISWC2011Keynote/
(or http://videolectures.net/iswc2011_van_harmelen_universal/ for a video registration)
The document discusses and refutes four popular fallacies about the Semantic Web. It clarifies that the Semantic Web enforces languages but not meanings, does not require a single predefined meaning for terms but allows for different vocabularies to be bridged, and does not require users to understand formalized knowledge representation as this is done automatically behind the scenes. It also notes the Semantic Web does not require manually marking up all existing web pages as techniques are being developed to automatically add semantic markup.
The document discusses open data for open government and the benefits of publishing government data in a semantic, linked, and open format on the web. It provides examples of open data initiatives in the US, UK, and other countries that have led to the development of many applications by third parties using publicly available government data. The speaker advocates that governments publish not just documents but the underlying data to allow others to build new sites and applications to make use of the information.
A non-technical explanation of the main ideas and notions in OWL.This talk was also recorded on video, and is available on-line at http://videolectures.net/koml04_harmelen_o/
The document discusses the W3C stack for representing metadata, with XML providing syntax but no semantics, RDF and RDF Schema defining a data model for relations between resources and a vocabulary definition language, and OWL adding more expressivity with concepts such as classes, properties, and cardinality restrictions. It also covers RDF syntaxes like Turtle and XML, and how RDF can represent implied claims from XML and facilitate interoperability between systems through its abstract model.
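The syntax-versus-data-model point, made concrete with a small sketch: the same RDF graph round-trips between Turtle and RDF/XML, and the triples survive because the model, not the serialization, carries the meaning.

```python
from rdflib import Graph

turtle = """
@prefix ex: <http://example.org/> .
ex:alice ex:knows ex:bob .
"""

g = Graph().parse(data=turtle, format="turtle")
xml = g.serialize(format="xml")        # same graph, RDF/XML syntax

g2 = Graph().parse(data=xml, format="xml")
print(len(g2), "triple(s) survive the round trip")  # -> 1
```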
"Choosing proper type of scaling", Olena SyrotaFwdays
Imagine an IoT processing system that is already quite mature and production-ready, whose client coverage keeps growing, and for which scaling and performance are life-and-death questions. The system has Redis, MongoDB, and stream processing based on ksqlDB. In this talk, we will first analyze scaling approaches and then select the proper ones for our system.
Driving Business Innovation: Latest Generative AI Advancements & Success Story by Safe Software
Are you ready to revolutionize how you handle data? Join us for a webinar where we’ll bring you up to speed with the latest advancements in Generative AI technology and discover how leveraging FME with tools from giants like Google Gemini, Amazon, and Microsoft OpenAI can supercharge your workflow efficiency.
During the hour, we’ll take you through:
Guest Speaker Segment with Hannah Barrington: Dive into the world of dynamic real estate marketing with Hannah, the Marketing Manager at Workspace Group. Hear firsthand how their team generates engaging descriptions for thousands of office units by integrating diverse data sources—from PDF floorplans to web pages—using FME transformers, like OpenAIVisionConnector and AnthropicVisionConnector. This use case will show you how GenAI can streamline content creation for marketing across the board.
Ollama Use Case: Learn how Scenario Specialist Dmitri Bagh has utilized Ollama within FME to input data, create custom models, and enhance security protocols. This segment will include demos to illustrate the full capabilities of FME in AI-driven processes.
Custom AI Models: Discover how to leverage FME to build personalized AI models using your data. Whether it’s populating a model with local data for added security or integrating public AI tools, find out how FME facilitates a versatile and secure approach to AI.
We’ll wrap up with a live Q&A session where you can engage with our experts on your specific use cases, and learn more about optimizing your data workflows with AI.
This webinar is ideal for professionals seeking to harness the power of AI within their data management systems while ensuring high levels of customization and security. Whether you're a novice or an expert, gain actionable insights and strategies to elevate your data processes. Join us to see how FME and AI can revolutionize how you work with data!
"Frontline Battles with DDoS: Best practices and Lessons Learned", Igor IvaniukFwdays
In this talk we will discuss DDoS protection tools and best practices, discuss network architectures, and see what AWS has to offer. We will also look into one of the largest DDoS attacks on Ukrainian infrastructure, which happened in February 2022. We'll see what techniques helped keep web resources available for Ukrainians and how AWS improved DDoS protection for all customers based on the Ukraine experience.
For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2024/06/how-axelera-ai-uses-digital-compute-in-memory-to-deliver-fast-and-energy-efficient-computer-vision-a-presentation-from-axelera-ai/
Bram Verhoef, Head of Machine Learning at Axelera AI, presents the “How Axelera AI Uses Digital Compute-in-memory to Deliver Fast and Energy-efficient Computer Vision” tutorial at the May 2024 Embedded Vision Summit.
As artificial intelligence inference transitions from cloud environments to edge locations, computer vision applications achieve heightened responsiveness, reliability and privacy. This migration, however, introduces the challenge of operating within the stringent confines of resource constraints typical at the edge, including small form factors, low energy budgets and diminished memory and computational capacities. Axelera AI addresses these challenges through an innovative approach of performing digital computations within memory itself. This technique facilitates the realization of high-performance, energy-efficient and cost-effective computer vision capabilities at the thin and thick edge, extending the frontier of what is achievable with current technologies.
In this presentation, Verhoef unveils his company’s pioneering chip technology and demonstrates its capacity to deliver exceptional frames-per-second performance across a range of standard computer vision networks typical of applications in security, surveillance and the industrial sector. This shows that advanced computer vision can be accessible and efficient, even at the very edge of our technological ecosystem.
Conversational agents, or chatbots, are increasingly used to access all sorts of services using natural language. While open-domain chatbots - like ChatGPT - can converse on any topic, task-oriented chatbots - the focus of this paper - are designed for specific tasks, like booking a flight, obtaining customer support, or setting an appointment. Like any other software, task-oriented chatbots need to be properly tested, usually by defining and executing test scenarios (i.e., sequences of user-chatbot interactions). However, there is currently a lack of methods to quantify the completeness and strength of such test scenarios, which can lead to low-quality tests, and hence to buggy chatbots.
To fill this gap, we propose adapting mutation testing (MuT) for task-oriented chatbots. To this end, we introduce a set of mutation operators that emulate faults in chatbot designs, an architecture that enables MuT on chatbots built using heterogeneous technologies, and a practical realisation as an Eclipse plugin. Moreover, we evaluate the applicability, effectiveness and efficiency of our approach on open-source chatbots, with promising results.
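A toy illustration of what a mutation operator for a task-oriented chatbot might look like, assuming intents stored as plain dictionaries; the paper's actual operators target real chatbot platforms through an Eclipse plugin, so this only shows the shape of the idea.

```python
import copy
import random

# Hypothetical intent format; real chatbot platforms use richer models.
intent = {
    "name": "book_flight",
    "training_phrases": ["book a flight", "I need a plane ticket"],
    "required_params": ["destination", "date"],
}

def delete_training_phrase(intent, rng=random.Random(0)):
    """Mutant: drop a training phrase; tests should notice weaker matching."""
    mutant = copy.deepcopy(intent)
    mutant["training_phrases"].pop(rng.randrange(len(mutant["training_phrases"])))
    return mutant

def drop_required_param(intent, rng=random.Random(0)):
    """Mutant: stop requiring a slot; tests should catch incomplete bookings."""
    mutant = copy.deepcopy(intent)
    mutant["required_params"].pop(rng.randrange(len(mutant["required_params"])))
    return mutant

for mutate in (delete_training_phrase, drop_required_param):
    print(mutate.__name__, "->", mutate(intent))
```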
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-EfficiencyScyllaDB
Freshworks creates AI-boosted business software that helps employees work more efficiently and effectively. Managing data across multiple RDBMS and NoSQL databases was already a challenge at their current scale. To prepare for 10X growth, they knew it was time to rethink their database strategy. Learn how they architected a solution that would simplify scaling while keeping costs under control.
How information systems are built or acquired puts information, which is what they should be about, in a secondary place. Our language adapted accordingly: we no longer talk about information systems but about applications. Applications evolved in a way that breaks data into diverse fragments, tightly coupled to the applications and expensive to integrate. The result is technical debt, which is repaid by taking out even bigger "loans", so the debt keeps growing. Software engineering and procurement practices work in sync with market forces to maintain this trend. This talk demonstrates how natural this situation is. The question is: can something be done to reverse the trend?
Dandelion Hashtable: beyond billion requests per second on a commodity serverAntonios Katsarakis
This slide deck presents DLHT, a concurrent in-memory hashtable. Despite optimization efforts that go as far as sacrificing core functionality, state-of-the-art designs still incur multiple memory accesses per request and block request processing in three cases. First, most hashtables block while waiting for data to be retrieved from memory. Second, open-addressing designs, which represent the current state of the art, either cannot free index slots on deletes or must block all requests to do so. Third, index resizes block every request until all objects are copied to the new index. Defying folklore wisdom, DLHT forgoes open-addressing and adopts a fully-featured and memory-aware closed-addressing design based on bounded cache-line-chaining. This design (1) offers lock-free index operations and deletes that free slots instantly, (2) completes most requests with a single memory access, (3) utilizes software prefetching to hide memory latencies, and (4) employs a novel non-blocking and parallel resizing. On a commodity server with a memory-resident workload, DLHT surpasses 1.6B requests per second and provides 3.5x (12x) the throughput of the state-of-the-art closed-addressing (open-addressing) resizable hashtable on Gets (Deletes).
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdfChart Kalyan
A Mix Chart displays historical number sequences in graphical or tabular form. The Kalyan Rajdhani Mix Chart specifically shows the results of a sequence of numbers over different periods.
Monitoring and Managing Anomaly Detection on OpenShift.pdfTosin Akinosho
Overview
Dive into the world of anomaly detection on edge devices with our comprehensive hands-on tutorial. This SlideShare presentation will guide you through the entire process, from data collection and model training to edge deployment and real-time monitoring. Perfect for those looking to implement robust anomaly detection systems on resource-constrained IoT/edge devices.
Key Topics Covered
1. Introduction to Anomaly Detection
- Understand the fundamentals of anomaly detection and its importance in identifying unusual behavior or failures in systems.
2. Understanding Edge (IoT)
- Learn about edge computing and IoT, and how they enable real-time data processing and decision-making at the source.
3. What is ArgoCD?
- Discover ArgoCD, a declarative, GitOps continuous delivery tool for Kubernetes, and its role in deploying applications on edge devices.
4. Deployment Using ArgoCD for Edge Devices
- Step-by-step guide on deploying anomaly detection models on edge devices using ArgoCD.
5. Introduction to Apache Kafka and S3
- Explore Apache Kafka for real-time data streaming and Amazon S3 for scalable storage solutions.
6. Viewing Kafka Messages in the Data Lake
- Learn how to view and analyze Kafka messages stored in a data lake for better insights.
7. What is Prometheus?
- Get to know Prometheus, an open-source monitoring and alerting toolkit, and its application in monitoring edge devices.
8. Monitoring Application Metrics with Prometheus
- Detailed instructions on setting up Prometheus to monitor the performance and health of your anomaly detection system.
9. What is Camel K?
- Introduction to Camel K, a lightweight integration framework built on Apache Camel, designed for Kubernetes.
10. Configuring Camel K Integrations for Data Pipelines
- Learn how to configure Camel K for seamless data pipeline integrations in your anomaly detection workflow.
11. What is a Jupyter Notebook?
- Overview of Jupyter Notebooks, an open-source web application for creating and sharing documents with live code, equations, visualizations, and narrative text.
12. Jupyter Notebooks with Code Examples
- Hands-on examples and code snippets in Jupyter Notebooks to help you implement and test anomaly detection models.
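As a flavour of the kind of code the tutorial's notebooks contain (a minimal sketch only, combining topics 1 and 8; the metric name, window size and threshold are illustrative assumptions, not taken from the tutorial):

from collections import deque
from statistics import mean, stdev

from prometheus_client import Gauge, start_http_server

# Illustrative metric name; Prometheus (topic 8) scrapes it from /metrics.
anomaly_score = Gauge("sensor_anomaly_zscore", "Rolling z-score of the latest reading")

def detect(readings, window=50, threshold=3.0):
    # Flag a reading as anomalous when it lies more than `threshold`
    # standard deviations away from the rolling window's mean.
    history = deque(maxlen=window)
    for value in readings:
        z = 0.0
        if len(history) >= 2 and stdev(history) > 0:
            z = abs(value - mean(history)) / stdev(history)
        anomaly_score.set(z)
        yield value, z > threshold
        history.append(value)

start_http_server(8000)  # exposes http://localhost:8000/metrics for Prometheus
for value, is_anomaly in detect([10, 11, 10, 12, 11, 95, 10]):  # toy stream; 95 should flag
    print(value, "ANOMALY" if is_anomaly else "ok")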
Northern Engraving | Nameplate Manufacturing Process - 2024Northern Engraving
Manufacturing custom quality metal nameplates and badges involves several standard operations. Processes include sheet prep, lithography, screening, coating, punch press and inspection. All decoration is completed in the flat sheet with adhesive and tooling operations following. The possibilities for creating unique durable nameplates are endless. How will you create your brand identity? We can help!
Generating privacy-protected synthetic data using Secludy and MilvusZilliz
During this demo, the founders of Secludy will demonstrate how their system utilizes Milvus to store and manipulate embeddings for generating privacy-protected synthetic data. Their approach not only maintains the confidentiality of the original data but also enhances the utility and scalability of LLMs under privacy constraints. Attendees, including machine learning engineers, data scientists, and data managers, will witness first-hand how Secludy's integration with Milvus empowers organizations to harness the power of LLMs securely and efficiently.
Fueling AI with Great Data with Airbyte WebinarZilliz
This talk will focus on how to collect data from a variety of sources, leverage that data for RAG and other GenAI use cases, and finally chart your course to production.
How the Web can change social science research (including yours)
1. How the Web
can change
social science research
(including yours)
Frank van Harmelen
Computer Science Department
VU University Amsterdam
Creative Commons License:
allowed to share & remix,
but must attribute & non-commercial
2. Using the web (of data)
for e-science
in Social Sciences
Frank van Harmelen
Computer Science Department
VU University Amsterdam
Creative Commons License:
allowed to share & remix,
but must attribute & non-commercial
Health Warning:
Computer
Scientist!
3. This talk is about
using the web
as an observational instrument
using the web of data
as an even better observational instrument
using the web of data
as a data-sharing platform
4. This talk is not about
it's NOT social-science studies of e-science
(e.g. the Oxford research centre)
it's NOT about high-performance computing
(that's just boring infrastructure;
let the computer scientists deal with that)
it's NOT about online social experiments
(crowdsourcing, social games, Mechanical Turk, etc.)
5. Who are you?
who is using large computerised data-sets ?
who is using data extracted from the web ?
who is using semantic web data ?
6. This talk is about
using the web & the web of data
as an observational instrument &
as a sharing platform
Through:
A whole bunch of realistic examples
A sketch of the technology
Message = yes, you can do this too!
15. Question: Is the content of party-political
programmes and election speeches predictive
of government coalition attempts?
Data
• All party manifestos,
• half a year of all Dutch newspapers
17. Question: Can we predict the social network
at T(n) from the content at T(n−1)?
Data
• Discussions from online forum nl.politiek
• 21.000 participants talking about 19 Dutch
political parties over 259 weeks
23. General idea of Web of Data
(a.k.a. “Semantic Web”)
1. Make data available on the Web
in machine-understandable form
(formalised)
2. Structure the data
and meta-data
in ontologies
25. Bluffer’s Guide to RDF
• Express relations between things:
• Results in labelled network (“graph”)
• All labels are actually web-addresses (URIs)
• You can “ping” any label and find out more
• Bits of the graph can live at physically different
locations & have different owners
[Diagram: a small RDF graph with nodes Frank, x, y and MIT, connected by edges AuthorOf and publishedBy; each edge reads Subject –Predicate→ Object]
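A minimal sketch of what such a graph looks like in code, using Python's rdflib (the http://example.org/ namespace and the resource names are made up for illustration):

from rdflib import Graph, Namespace

EX = Namespace("http://example.org/")

g = Graph()
g.add((EX.Frank, EX.AuthorOf, EX.x))   # subject, predicate, object
g.add((EX.x, EX.publishedBy, EX.MIT))

# Every label is a URI, so it can be dereferenced to find out more,
# and the two triples could just as well live on different servers.
print(g.serialize(format="turtle"))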
26. Bluffer’s Guide to RDF Schema
• types for subjects & objects & predicates
• Types organised in a hierarchy
• Inheritance of properties
[Diagram: the same graph, now typed — Frank is an author (a man, hence a person), x and y are books (artifacts), MIT is a publisher; the hierarchy lets instances inherit properties]
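Continuing the sketch above, the type hierarchy itself is just more triples (again with an illustrative example namespace); an RDFS reasoner, e.g. the owlrl package, could then infer that Frank is also a person:

from rdflib import Graph, Namespace
from rdflib.namespace import RDF, RDFS

EX = Namespace("http://example.org/")

g = Graph()
g.add((EX.Frank, RDF.type, EX.man))
g.add((EX.man, RDFS.subClassOf, EX.person))
g.add((EX.x, RDF.type, EX.book))
g.add((EX.book, RDFS.subClassOf, EX.artifact))

# rdflib stores the hierarchy but does not apply inference by itself;
# a reasoner would add the entailed triple (EX.Frank, RDF.type, EX.person).
for s, o in g.subject_objects(RDFS.subClassOf):
    print(s, "is a subclass of", o)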
27. Ontologies (= hierarchical
conceptual vocabularies)
Identify the key concepts in a domain
Identify a vocabulary for these concepts
Identify relations between these concepts
Make these precise enough
so that they can be shared between
• humans and humans
• humans and machines
• machines and machines
28. Biomedical ontologies (a few..)
Mesh
• Medical Subject Headings, National Library of Medicine
• 22.000 descriptions
EMTREE
• Commercial Elsevier, Drugs and diseases
• 45.000 terms, 190.000 synonyms
UMLS
• Integrates 100 different vocabularies
SNOMED
• 200.000 concepts, College of American Pathologists
Gene Ontology
• 15.000 terms in molecular biology
NCBI Cancer Ontology:
• 17,000 classes (about 1M definitions)
29. On the Web of Data, anyone
can link anything to anything
[Diagram: a triple <x> IsOfType <T> whose parts have different owners and live at different locations (e.g. at <institute>)]
40. The World Bank is also doing it!
http://data.worldbank.org/
7,000 indicators from World Bank data sets.
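A minimal sketch of pulling one such indicator programmatically, assuming the World Bank's public v2 REST API (SP.POP.TOTL is its code for total population; the country code and page size are arbitrary choices here):

import requests

url = "https://api.worldbank.org/v2/country/NL/indicator/SP.POP.TOTL"
resp = requests.get(url, params={"format": "json", "per_page": 5})
resp.raise_for_status()

metadata, observations = resp.json()  # the API returns [metadata, data]
for obs in observations:
    print(obs["date"], obs["value"])  # most recent years first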
41. The US gov is also doing it!
http://data.gov/ : 390.000 data sets
Compare foreign aid budgets
Does tax influence smokers?
Compare campaign money
42. already many billions of facts & rules
Everybody’s doing it!
May ‘09 estimate > 4.2 billion triples +
140 million interlinks
It gets bigger every month
44. And many more
• Reuters
• New York Times
• EU (EUROSTAT, others)
• BBC
• Facebook
• ….
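To make "mining the web of data" concrete: a minimal sketch of querying DBpedia's public SPARQL endpoint for Dutch political parties, tying back to the examples earlier in this deck (the dbo:/dbr: prefixes are predeclared on that endpoint; the query shape is illustrative):

from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("https://dbpedia.org/sparql")
sparql.setReturnFormat(JSON)
sparql.setQuery("""
    SELECT ?party ?name WHERE {
      ?party a dbo:PoliticalParty ;
             dbo:country dbr:Netherlands ;
             rdfs:label ?name .
      FILTER (lang(?name) = "en")
    } LIMIT 10
""")

for row in sparql.query().convert()["results"]["bindings"]:
    print(row["name"]["value"])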
45. So how good is this
observational instrument ?
Studies on validity (e.g. in science dynamics)
methods for provenance & trust
methods for attribution & citation
46. For real ?
“ use the power of information to
explore social and economic life on
Earth ”
1bn€ over 10 years
48. Take home message
use the web & the web-of-data
to obtain your data
use the web-of-data to share your data
yes, you can do this too!
Collaborate with computer scientists
reflect on the deeper consequences
for the social sciences
(methodological, theoretical, etc)
49. Acknowledgements
I’ve freely used material from the work of
Shenghui Wang
Paul Groth
Julie Birkholz
Wouter van Atteveldt
Laurens van Rietveld
Rinke Hoekstra
and many in the Semantic Web community
Editor's Notes
Talk about citation data; difficult to get (2 weeks to gather a couple of hundred citation scores)