Linkitup is a Web-based dashboard for enrichment of research output published via industry-grade data repository services. It takes metadata entered through Figshare.com and tries to find equivalent terms, categories, persons or entities on the Linked Data cloud and several Web 2.0 services. It extracts references from publications, and tries to find the corresponding Digital Object Identifier (DOI). Linkitup feeds the enriched metadata back as links to the original article in the repository, but also builds an RDF representation of the metadata that can be downloaded separately, or published as research output in its own right. In this paper, we compare Linkitup to the standard workflow of publishing linked data, and show that it significantly lowers the threshold for publishing linked research data.
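The abstract does not say how Linkitup resolves an extracted reference string to a DOI; one public service that supports this kind of free-text lookup is the CrossRef REST API. The helper below is an illustrative sketch of that lookup style, not Linkitup's actual implementation; the function name is invented for the example.

```python
from urllib.parse import urlencode

# Sketch only: resolve a free-text reference to candidate DOIs via the
# CrossRef REST API's "works" endpoint. The endpoint and the
# "query.bibliographic" parameter are part of CrossRef's public API;
# how Linkitup itself performs DOI lookup is not specified in the paper.
CROSSREF_WORKS = "https://api.crossref.org/works"

def build_doi_query_url(reference: str, rows: int = 1) -> str:
    """Build a CrossRef search URL for an extracted reference string."""
    params = urlencode({"query.bibliographic": reference, "rows": rows})
    return f"{CROSSREF_WORKS}?{params}"

# Turn an extracted reference string into a lookup URL; fetching it would
# return JSON whose items carry a "DOI" field.
url = build_doi_query_url("Hoekstra and Groth, Linkitup: Link Discovery for Research Data")
```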
Advertising with Linked Data in Web Content – Martin Hepp
Advertising with Linked Data in Web Content: From Semantic SEO to E-Commerce on the Web 3.0
Slides and audio from my talk given at the Knowledge Engineering Group of the University of Economics Prague.
http://keg.vse.cz/seminar.php?datetime=2011-04-06
Locating scientific government information on the web – Shannon Lynch
This is a 2017 Powerpoint presentation given at the Department of Interior Library. The sources and information were correct at the time of presentation but have not been updated and should be double checked for current accuracy. Please feel free to contact the Department of Interior Library with any questions.
Information Extraction and Linked Data Cloud – Dhaval Thakker
In the media industry there is a great emphasis on providing descriptive metadata as part of the media assets delivered to consumers. Information extraction (IE) is considered an important tool for the metadata generation process, and its performance largely depends on the knowledge base it utilizes. Advances in “Linked Data Cloud” research provide a great opportunity for generating such a knowledge base, one that benefits from the participation of the wider community. In this talk, I will discuss our experiences of utilizing the Linked Data Cloud in conjunction with a GATE-based IE system.
New Approaches for Structured Data: Evolution of Question Answering – Bill Slawski
Google has moved from Search to Knowledge. Focusing on answering questions with knowledge graph entity information has led to answering queries with knowledge graphs, using confidence scores between entities and other entities or attributes of entities, based upon freshness, reliability, popularity, and the proximity between an entity and another entity or attribute.
This presentation was provided by Rob Sanderson of the J. Paul Getty Trust during the NISO Virtual Conference, Open Data Projects, held on Wednesday, June 13, 2018.
Open science can contribute to AI trustworthiness. This talk is a categorization of scientific data platforms, and a framing of AI trustworthiness with pointers to open science contributions.
Semantic Search Engine: Semantic Search and Query Parsing with Phrases and En... – Koray Tugberk GUBUR
Semantic search engines can understand human language to analyze the need behind a query. Instead of focusing on string or word matching, a semantic search engine focuses on concepts, intents, and relations between named entities. Taxonomy, ontology, onomastics, semantic role labeling, relation detection, lexical semantics, and entity extraction, recognition, and resolution can all be used by semantic search engines. In this PDF file, the evolution of semantic search engines is traced through Google Search's research papers, patents, and official announcements. From 1998 to 2021, the evolution of search and of search engines, from strings to things and from phrases to entities, is told along with changes in query processing and parsing methodology.
As opposed to lexical search, semantic search looks for meaning, not mere matches of the query words. Semantic search attempts to increase the relevancy of results by understanding searchers' intents and the context of terms in the searchable dataspace, whether online or within a closed system. Good semantic search content is a blend of natural language, focuses on the intent of the user, and considers other topics the user may be interested in.
Ontologies, XML, and other structured data sources can be used to retrieve knowledge using semantic search according to some authors. The use of such technologies provides a mechanism for creating formal expressions of domain knowledge that are highly expressive and may allow the user to express more detailed intent during query processing.
Dealing with poor data quality of OSINT data in fraud risk analysis – University of Twente
Presented at the SIKS Smart Auditing Workshop, 25 Feb 2015.
Governmental organizations responsible for keeping certain types of fraud under control often use data-driven methods, both for immediate detection of fraud and for fraud risk analysis aimed at more effectively targeting inspections. A blind spot in such methods is that the source data often represents a 'paper reality': fraudsters will attempt to disguise themselves in the data they supply, painting a world in which they do nothing wrong. This blind spot can be counteracted by enriching the data with traces and indicators from more 'real-world' sources such as social media and the internet. One of the crucial data management problems in accomplishing this enrichment is how to capture and handle data quality problems. The presentation will start with a real-world example, which is also used as the starting point for a problem generalization in terms of information combination and enrichment (ICE). We then present the ICE technology, as well as how data quality problems can be managed with probabilistic databases. In terms of the 4 V's of big data (volume, velocity, variety and veracity), this presentation focuses on the third and fourth V's: variety and veracity.
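The abstract describes managing uncertain, enriched data with probabilistic databases but gives no concrete mechanism. The sketch below is a generic illustration of that idea, not the ICE system: an attribute with alternative values carrying probabilities, and a simple noisy-OR combination of independent fraud indicators. All names and numbers are made up for the example.

```python
# Generic probabilistic-database-style sketch (NOT the ICE system itself).
# An uncertain attribute is stored as alternative values with probabilities,
# as probabilistic databases commonly do; the indicator values are invented.
declared_address = {"Main St 1": 0.6, "Harbour Rd 9": 0.4}

# Probabilities that each of two independent 'real-world' sources
# (e.g. social media, web traces) signals fraud for this record.
indicator_probs = [0.30, 0.20]

def noisy_or(probs):
    """P(at least one independent indicator fires) = 1 - prod(1 - p_i)."""
    p_none = 1.0
    for p in probs:
        p_none *= (1.0 - p)
    return 1.0 - p_none

# Combined fraud-risk score for the record: 1 - 0.7 * 0.8 = 0.44.
risk = noisy_or(indicator_probs)
```

Noisy-OR is only one simple independence assumption; a real probabilistic database would track correlations between sources as well.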
Interlinking Standardized OpenStreetMap Data and Citizen Science Data in the ... – Werner Leyh
Abstract. The aim of this work is to explore the opportunities offered by semantic standardization to interlink primary “spatial data” (GI) from “OpenStreetMap” (OSM) with repositories of the “Linked Open Data Cloud” (LOD). Research in the natural sciences can generate vast amounts of spatial data, where Wikidata could be considered the central hub between more detailed natural science hubs on the spatial semantic web. Wikidata is a world-readable and -writable community-driven knowledge base. It offers the opportunity to collaboratively construct an open-access knowledge graph that spans biology, medicine, and all other domains of knowledge. In this study, we discuss the opportunities and challenges provided by exploring Wikidata as a central integration facility by interlinking it with OSM, a popular, community-driven collection of free geographic data. This is empowered by the reuse of terms and properties from commonly understood controlled vocabularies that represent their respective well-identified knowledge domains.
URL: https://www.springerprofessional.de/en/interlinking-standardized-openstreetmap-data-and-citizen-science/13302088
DOI: https://doi.org/10.1007/978-3-319-60366-7_9
Werner Leyh, Homero Fonseca Filho
University of São Paulo (USP), São Paulo, Brazil
WernerLeyh@yahoo.com
For everybody who gets tired of questions like “when is the Semantic Web actually going to happen”, or any other suggestion that the Semantic Web programme is “only vision, no progress”.
Empowering red and blue teams with OSINT, c0c0n 2017 – reconvillage
This talk will discuss Open Source Intelligence (OSINT) gathering tools and techniques that are highly useful and effective for both Blue teams and Red teams.
After the Data Breach: Stolen Credentials – SBWebinars
Credentials don’t start out on the dark web - they end there.
When usernames and passwords are compromised in a data breach, the consequences extend far beyond the victim organization due to rampant password reuse. For this reason, NIST recently recommended that organizations check users’ credentials against a set of known compromised passwords. However, by patronizing dark web forums and paying for spilled credentials, enterprises indirectly support the criminal ecosystem. Furthermore, attackers often don’t publicly post stolen data until months or years after the breach, if at all. Is there a better way to follow NIST guidelines and protect users from account takeover?
Join Justin Richer, co-author of NIST Digital Identity Guidelines 800-63B, and Gautam Agarwal, Blackfish Product Manager, for a lively discussion on NIST’s password recommendations and how best to prevent account takeover fraud at your organization.
Agenda:
The Threat of Stolen Credentials
Reasoning Behind NIST’s Password Recommendations
Ways to Manage a Password “Breach Corpus”
How Blackfish Helps Organizations Follow NIST Guidelines
Talk delivered at YOW! Developer Conferences in Melbourne, Brisbane and Sydney Australia on 1-9 December 2016.
Abstract: Governments collect a lot of data. Data on air quality, toxic chemicals, laws and regulations, public health, and the census are intended to be widely distributed. Some data is not for public consumption. This talk focuses on open government data — the information that is meant to be made available for benefit of policy makers, researchers, scientists, industry, community organisers, journalists and members of civil society.
We’ll cover the evolution of Linked Data, which is now being used by Google, Apple, IBM Watson, federal governments worldwide, non-profits including CSIRO and OpenPHACTS, and thousands of others worldwide.
Next we’ll delve into the evolution of the U.S. Environmental Protection Agency’s Open Data service that we implemented using Linked Data and an Open Source Data Platform. Highlights include how we connected to hundreds of billions of open data facts in the world’s largest, open chemical molecules database PubChem and DBpedia.
WHO SHOULD ATTEND
Data scientists, software engineers, data analysts, DBAs, technical leaders and anyone interested in utilising linked data and open government data.
PEARC17: ARCC Identity and Access Management, Security and related topics. Cy... – Florence Hudson
This presentation explains the NSF EAGER #1650445 Cybersecurity Research Transition To Practice (TTP) Acceleration funded program led by Internet2, inviting researchers and practitioners of IT and cybersecurity to participate.
How the Web can change social science research (including yours) – Frank van Harmelen
A presentation for a group of PhD students from the Leibniz Institutes (section B, social sciences) to discuss how they could use the Web, and even better the Web of Data, as an instrument in their research.
Can’t Pay, Won’t Pay, Don’t Pay: Delivering open science, a Digital Research... – Carole Goble
Invited talk, PHIL_OS, March 30-31 2023, Exeter
https://opensciencestudies.eu/whither-open-science. Includes hidden slides.
FAIR and Open Science need Digital Research Infrastructure, which is a federated system of systems and needs funding models that are fit for purpose.
A culture change is needed for paying for Open Science’s infrastructure, and funding support for data-driven research needs more reality and less rhetoric.
Open Source Collaboration in Drug Discovery in Pharma – Kees van Bochove
How pre-competitive collaboration in the pharmaceutical sector through open source platforms enables joint innovation of academics, pharma, SMEs and non-profits.
LIBER Webinar: 23 Things About Research Data Management – LIBER Europe
These are the slides for the LIBER Webinar "23 Things About Research Data Management", held on 23 February 2017. A recording of the webinar is available here: https://www.youtube.com/watch?v=HGH6fVHrnKQ
It19 20140721 linked data personal perspective – Janifer Gatenby
A presentation made for Standards Australia's seminar. Outlines the basic aspects of linked data from a personal perspective and where it fits with direct and subject searching.
How open data contribute to improving the world. The life science use case. The technical, social, ethical issues.
This was a talk given within the iGEM 2020 programme by the Imperial College London student group (https://2020.igem.org/Team:Imperial_College), in a webinar organised by the SOAPLab group on the topic of Ethics of Automation. The excellent Dr Brandon Sepulvado was the other speaker of the day.
Similar to Linkitup: Link Discovery for Research Data
Managing Metadata for Science and Technology Studies: the RISIS case – Rinke Hoekstra
Presentation of our paper at the WHISE workshop at ESWC 2016 on requirements for metadata over non-public datasets for the science & technology studies field.
Prov-O-Viz is a visualisation service for provenance graphs expressed using the W3C PROV vocabulary. It uses the Sankey-style visualisation from D3js.
See http://provoviz.org
A Network Analysis of Dutch Regulations - Using the Metalex Document Server – Rinke Hoekstra
In this paper we explore the possibilities of using the Linked Data representation of all Dutch regulations stored in the MetaLex Document Server for the purposes of network analysis over the citation graph between regulations, both at the document level and at the article level. We show that this is possible using relatively straightforward SPARQL queries, and present preliminary results of the analysis.
A Network Analysis of Dutch Regulations. Rinke Hoekstra. figshare.
http://dx.doi.org/10.6084/m9.figshare.689880
Retrieved 11:12, Oct 07, 2013 (GMT)
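The abstract above says the citation-graph analysis reduces to straightforward SPARQL queries. As a minimal sketch of the same aggregation, the snippet below computes the in-degree (how often each regulation is cited) over toy citation edges; the regulation names are invented, and the SPARQL shown in the comment is an illustrative shape, not the paper's actual query against the MetaLex vocabulary.

```python
from collections import Counter

# Toy citation graph between regulations as (source, target) edges.
# The equivalent SPARQL aggregation over a cites-style property would be:
#   SELECT ?target (COUNT(?source) AS ?indegree)
#   WHERE { ?source ex:cites ?target }
#   GROUP BY ?target
citations = [
    ("RegulationA", "RegulationC"),
    ("RegulationB", "RegulationC"),
    ("RegulationA", "RegulationB"),
]

# In-degree per regulation: how many other regulations cite it.
in_degree = Counter(target for _source, target in citations)
```

The same counting works identically at the article level; only the node identifiers change.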
This presentation describes the use by Data2Semantics (http://www.data2semantics.org) of the VIVO portal (http://vivoweb.org) for interlinking researchers contributing to projects within the COMMIT programme (http://www.commit-nl.nl).
The Data2Semantics project (COMMIT P23) is all about enriching research data, and making it more reusable for future research. Using Linked Data for this task is a fairly obvious step to make (surprise!). However, there are several shortcomings in the current practices of publishing Linked Data that call for a slightly different approach which (hopefully) bridges a gap between Web 2.0 and Web 3.0. I will present a proof-of-concept service (Linkitup) that works on top of existing scientific data repositories, and allows individual researchers to enrich their data with additional (linked) metadata.
Talk about the use of Linked Data in historical research on census data. Has some slides about TabLinker as well (http://github.com/Data2Semantics/TabLinker). Part of the Data2Semantics project (http://data2semantics.org)
Presentation for the Dutch Tax Administration (Belastingdienst) as part of a study into the possibilities and limitations of recognizing and extracting concepts and their definitions, and of representing them with Semantic Web standards.
History of Knowledge Representation (SIKS Course 2010) – Rinke Hoekstra
The goal of AI research is the simulation and approximation of human intelligence by computers. To a large extent this comes down to the development of computational reasoning services that allow machines to solve problems. Robots are the stereotypical example: imagine what a robot needs to know before it is able to interact with the world the way we do? It needs to have a highly accurate internal representation of reality. It needs to turn perception into action, know how to reach its goals, what objects it can use to its advantage, what kinds of objects exist, etc.
The field of knowledge representation (KR) tries to deal with the problems surrounding the incorporation of some body of knowledge (in whatever form) in a computer system, for the purpose of automated, intelligent reasoning. In this sense, knowledge representation is the basic research topic in AI. Any artificial intelligence is dependent on knowledge, and thus on a representation of that knowledge. The history of knowledge representation has been nothing less than turbulent. The roller coaster of promise in the 1950s and 60s, the heated debates of the 70s, the decline and realism of the 80s, and the ontology and knowledge management hype of the 90s each left a clear mark on contemporary knowledge representation technology and its application.
State of ICS and IoT Cyber Threat Landscape Report 2024 preview – Prayukth K V
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio, using data from Sectrio's cyber threat intelligence farming facilities spread across over 85 cities around the world. In addition, Sectrio also runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors and newer malware, including new variants and latent threats that are at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on counties – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
Neuro-symbolic is not enough, we need neuro-*semantic* – Frank van Harmelen
Neuro-symbolic (NeSy) AI is on the rise. However, simply machine learning on just any symbolic structure is not sufficient to really harvest the gains of NeSy. These will only be gained when the symbolic structures have an actual semantics. I give an operational definition of semantics as “predictable inference”.
All of this illustrated with link prediction over knowledge graphs, but the argument is general.
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo... – James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. A constant focus on speed to release software to market, combined with traditionally slow and manual security checks, has caused gaps in continuous security, an important piece of the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their application supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with a passion for making things work, along with a knack for helping others understand how things work. He has around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations on CI/CD and application security integrated into the software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
Transcript: Selling digital books in 2024: Insights from industry leaders - T... – BookNet Canada
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
The Art of the Pitch: WordPress Relationships and Sales – Laura Byrne
Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if something changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers without pulling teeth or pulling your hair out. Practical tips, and strategies for successful relationship building that leads to closing the deal.
Elevating Tactical DDD Patterns Through Object Calisthenics – Dorra BARTAGUIZ
After immersing yourself in the blue book and its red counterpart, attending DDD-focused conferences, and applying tactical patterns, you're left with a crucial question: How do I ensure my design is effective? Tactical patterns within Domain-Driven Design (DDD) serve as guiding principles for creating clear and manageable domain models. However, achieving success with these patterns requires additional guidance. Interestingly, we've observed that a set of constraints initially designed for training purposes remarkably aligns with effective pattern implementation, offering a more ‘mechanical’ approach. Let's explore together how Object Calisthenics can elevate the design of your tactical DDD patterns, offering concrete help for those venturing into DDD for the first time!
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf – 91mobiles
91mobiles recently conducted a Smart TV Buyer Insights Survey in which we asked over 3,000 respondents about the TV they own, aspects they look at on a new TV, and their TV buying preferences.
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
Linkitup: Link Discovery for Research Data
1. Data2Semantics – From Data to Semantics: Semantics for Scientific Data Publishers
linkitup
Link Discovery for Research Data
Rinke Hoekstra and Paul Groth
Network Institute, VU University Amsterdam
Law Faculty, University of Amsterdam
Linkitup - Link Discovery for Research Data by Rinke Hoekstra
Licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.
2. Data2Semantics – From Data to Semantics: Semantics for Scientific Data Publishers
linkitup
Link Discovery for Research Data
Rinke Hoekstra and Paul Groth
Network Institute, VU University Amsterdam
Law Faculty, University of Amsterdam
How to share, publish, access, analyse, interpret and reuse data?
8. www.nature.com/nature
Data’s shameful neglect
Vol 461 | Issue no. 7261 | 10 September 2009
Research cannot flourish if data are not preserved and made accessible. All concerned must act accordingly.
M
ore and more often these days, a research project’s success is
measured not just by the publications it produces, but also by
the data it makes available to the wider community. Pioneering archives such as GenBank have demonstrated just how powerful
such legacy data sets can be for generating new discoveries — especially when data are combined from many laboratories and analysed
in ways that the original researchers could not have anticipated.
All but a handful of disciplines still lack the technical, institutional
and cultural frameworks required to support such open data access
(see pages 168 and 171) — leading to a scandalous shortfall in the
sharing of data by researchers (see page 160). This deficiency urgently
needs to be addressed by funders, universities and the researchers
themselves.
Research funding agencies need to recognize that preservation of
and access to digital data are central to their mission, and need to
be supported accordingly. Organizations in the United Kingdom,
for instance, have made a good start. The Joint Information Systems
Committee, established by the seven UK research councils in 1993,
has made data-sharing a priority, and has helped to establish a Digital
Curation Centre, headquartered at the University of Edinburgh, to be
a national focus for research and development into data issues. Other
European agencies have also pursued initiatives.
The United States, by contrast, is playing catch-up. Since 2005, a
29-member Interagency Working Group on Digital Data has been
trying to get US funding agencies to develop plans for how they will
support data archiving — and just as importantly, to develop policies
on what data should and should not be preserved, and what exceptions should be made for reasons such as patient privacy. Some agencies have taken the lead in doing so; many more are hanging back.
They should all be moving forwards vigorously.
What is more, funding agencies and researchers alike must ensure
that they support not only the hardware needed to store the data, but
also the software that will help investigators to do this. One important facet is metadata management software: tools that streamline
the tedious process of annotating data with a description of what the
bits mean, which instrument collected them, which algorithms have
been used to process them and so on — information that is essential
if other scientists are to reuse the data effectively.
Also necessary, especially in an era when data can be mixed and
combined in unanticipated ways, is software that can keep track of
which pieces of data came from whom. Such systems are essential if
tenure and promotion committees are ever to give credit — as they
should — to candidates’ track-record of data contribution.
“Data management should be woven into every course in science.”
Who should host these data? Agencies and the research community need to create the digital equivalent of libraries: institutions that can take
responsibility for preserving digital data and making them accessible
over the long term. The university research libraries themselves are
obvious candidates to assume this role. But whoever takes it on, data
preservation will require robust, long-term funding. One potentially
helpful initiative is the US National Science Foundation’s DataNet
programme, in which researchers are exploring financial mechanisms such as subscription services and membership fees.
Finally, universities and individual disciplines need to undertake a
vigorous programme of education and outreach about data. Consider,
for example, that most university science students get a reasonably
good grounding in statistics. But their studies rarely include anything
about information management — a discipline that encompasses the
entire life cycle of data, from how they are acquired and stored to how
they are organized, retrieved and maintained over time. That needs
to change: data management should be woven into every course in
science, as one of the foundations of knowledge.
■
Silver Bullet?
http://on.wsj.com/XCajtB
11. Repository Services
• Data is easy to upload
• Landing page for data
• Citable reference for data
• Default licensing options
• Guarantees for long term archival
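The citable reference and default licence listed above are the pieces of repository metadata that a tool like Linkitup re-exports as RDF. As an illustration only (the function, vocabulary choice and DOI suffix below are hypothetical, not Linkitup's actual code), here is a minimal sketch of serialising such landing-page metadata as Dublin Core Turtle:

```python
# Hypothetical sketch: render a repository landing page's metadata
# (DOI-based citable reference, title, creator, default licence)
# as a minimal Dublin Core Turtle description.
# Names and identifiers are illustrative, not Linkitup's real code.

def to_turtle(doi: str, title: str, creator: str, license_url: str) -> str:
    """Render a minimal Dublin Core description of a deposited dataset."""
    subject = f"<https://doi.org/{doi}>"  # the citable reference
    lines = [
        "@prefix dcterms: <http://purl.org/dc/terms/> .",
        "",
        f"{subject} dcterms:title \"{title}\" ;",
        f"    dcterms:creator \"{creator}\" ;",
        f"    dcterms:license <{license_url}> .",
    ]
    return "\n".join(lines)

record = to_turtle(
    doi="10.6084/m9.figshare.example",  # illustrative DOI suffix
    title="Example research dataset",
    creator="A. Researcher",
    license_url="https://creativecommons.org/licenses/by-sa/3.0/",
)
print(record)
```

Real repository records carry far more fields; the point is only that a landing page's core metadata maps directly onto a few Dublin Core terms.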
13. Data is the Bottleneck
Common Motifs in Scientific Workflows:
An Empirical Analysis
Daniel Garijo*, Pinar Alper†, Khalid Belhajjame†, Oscar Corcho*, Yolanda Gil‡, Carole Goble†
* Ontology Engineering Group, Universidad Politécnica de Madrid. {dgarijo, ocorcho}@fi.upm.es
† School of Computer Science, University of Manchester. {alperp, khalidb, carole.goble}@cs.manchester.ac.uk
‡ Information Sciences Institute, Department of Computer Science, University of Southern California. gil@isi.edu
Abstract—While workflow technology has gained momentum
in the last decade as a means for specifying and enacting computational experiments in modern science, reusing and repurposing
existing workflows to build new scientific experiments is still a
daunting task. This is partly due to the difficulty that scientists
experience when attempting to understand existing workflows,
which contain several data preparation and adaptation steps in
addition to the scientifically significant analysis steps. One way
to tackle the understandability problem is through providing
abstractions that give a high-level view of activities undertaken
within workflows. As a first step towards abstractions, we report
in this paper on the results of a manual analysis performed over
a set of real-world scientific workflows from Taverna and Wings
systems. Our analysis has resulted in a set of scientific workflow
motifs that outline i) the kinds of data intensive activities that are
observed in workflows (data oriented motifs), and ii) the different
manners in which activities are implemented within workflows
(workflow oriented motifs). These motifs can be useful to inform
workflow designers on the good and bad practices for workflow
development, to inform the design of automated tools for the
generation of workflow abstractions, etc.
I. INTRODUCTION
Scientific workflows have been increasingly used in the last
decade as an instrument for data intensive scientific analysis.
In these settings, workflows serve a dual function: first as
detailed documentation of the method (i.e. the input sources
and processing steps taken for the derivation of a certain
data item) and second as re-usable, executable artifacts for
data-intensive analysis. Workflows stitch together a variety
of data manipulation activities such as data movement, data
transformation or data visualization to serve the goals of the
scientific study. The stitching is realized by the constructs
made available by the workflow system used and is largely
shaped by the environment in which the system operates and
the function undertaken by the workflow.
A variety of workflow systems are in use [10] [3] [7] [2]
serving several scientific disciplines. A workflow is a software
artifact, and as such once developed and tested, it can be
shared and exchanged between scientists. Other scientists can
then reuse existing workflows in their experiments, e.g., as
sub-workflows [17]. Workflow reuse presents several advantages [4]. For example, it enables proper data citation and
improves quality through shared workflow development by
leveraging the expertise of previous users. Users can also
re-purpose existing workflows to adapt them to their needs
[4]. Emerging workflow repositories such as myExperiment
[14] and CrowdLabs [8] have made publishing and finding
workflows easier, but scientists still face the challenges of reuse, which amounts to fully understanding and exploiting the
available workflows/fragments. One difficulty in understanding
workflows is their complex nature. A workflow may contain
several scientifically-significant analysis steps, combined with
various other data preparation activities, and in different
implementation styles depending on the environment and
context in which the workflow is executed. The difficulty in
understanding causes workflow developers to revert to starting
from scratch rather than re-using existing fragments.
Through an analysis of the current practices in scientific
workflow development, we could gain insights on the creation
of understandable and more effectively re-usable workflows.
Specifically, we propose an analysis with the following objectives:
1) To reverse-engineer the set of current practices in workflow development through an analysis of empirical evidence.
2) To identify workflow abstractions that would facilitate
understandability and therefore effective re-use.
3) To detect potential information sources and heuristics
that can be used to inform the development of tools for
creating workflow abstractions.
In this paper we present the result of an empirical analysis
performed over 177 workflow descriptions from Taverna [10]
and Wings [3]. Based on this analysis, we propose a catalogue
of scientific workflow motifs. Motifs are provided through i)
a characterization of the kinds of data-oriented activities that
are carried out within workflows, which we refer to as data-oriented motifs, and ii) a characterization of the different manners in which those activity motifs are realized/implemented
within workflows, which we refer to as workflow-oriented
motifs. It is worth mentioning that, although important, motifs
that have to do with scheduling and mapping of workflows
onto distributed resources [12] are out of the scope of this paper.
The paper is structured as follows. We begin by providing
related work in Section II, which is followed in Section III by
brief background information on Scientific Workflows, and the
two systems that were subject to our analysis. Afterwards we
describe the dataset and the general approach of our analysis.
We present the detected scientific workflow motifs in Section
IV and we highlight the main features of their distribution
14. Data is the Bottleneck
Fig. 3. Distribution of Data-Oriented Motifs per domain
15. Data is the Bottleneck
Common Motifs in Scientific Workflows:
An Empirical Analysis
Daniel Garijo⇤ , Pinar Alper † , Khalid Belhajjame† , Oscar Corcho⇤ , Yolanda Gil‡ , Carole Goble†
⇤ Ontology
Engineering Group, Universidad Polit´ cnica de Madrid. {dgarijo, ocorcho}@fi.upm.es
e
of Computer Science, University of Manchester. {alperp, khalidb, carole.goble}@cs.manchester.ac.uk
‡ Information Sciences Institute, Department of Computer Science, University of Southern California. gil@isi.edu
† School
Abstract—While workflow technology has gained momentum
in the last decade as a means for specifying and enacting computational experiments in modern science, reusing and repurposing
existing workflows to build new scientific experiments is still a
daunting task. This is partly due to the difficulty that scientists
experience when attempting to understand existing workflows,
which contain several data preparation and adaptation steps in
addition to the scientifically significant analysis steps. One way
to tackle the understandability problem is through providing
abstractions that give a high-level view of activities undertaken
within workflows. As a first step towards abstractions, we report
in this paper on the results of a manual analysis performed over
a set of real-world scientific workflows from Taverna and Wings
systems. Our analysis has resulted in a set of scientific workflow
motifs that outline i) the kinds of data intensive activities that are
observed in workflows (data oriented motifs), and ii) the different
manners in which activities are implemented within workflows
(workflow oriented motifs). These motifs can be useful to inform
workflow designers on the good and bad practices for workflow
development, to inform the design of automated tools for the
generation of workflow abstractions, etc.
I. INTRODUCTION

Scientific workflows have been increasingly used in the last decade as an instrument for data-intensive scientific analysis. In these settings, workflows serve a dual function: first as detailed documentation of the method (i.e. the input sources and processing steps taken for the derivation of a certain data item) and second as re-usable, executable artifacts for data-intensive analysis. Workflows stitch together a variety of data manipulation activities such as data movement, data transformation or data visualization to serve the goals of the scientific study. The stitching is realized by the constructs made available by the workflow system used, and is largely shaped by the environment in which the system operates and the function undertaken by the workflow.

A variety of workflow systems are in use [10] [3] [7] [2], serving several scientific disciplines. A workflow is a software artifact, and as such, once developed and tested, it can be shared and exchanged between scientists. Other scientists can then reuse existing workflows in their experiments, e.g., as sub-workflows [17]. Workflow reuse presents several advantages [4]. For example, it enables proper data citation and improves quality through shared workflow development by leveraging the expertise of previous users. Users can also re-purpose existing workflows to adapt them to their needs [4]. Emerging workflow repositories such as myExperiment [14] and CrowdLabs [8] have made publishing and finding workflows easier, but scientists still face the challenge of reuse, which amounts to fully understanding and exploiting the available workflows/fragments. One difficulty in understanding workflows is their complex nature. A workflow may contain several scientifically significant analysis steps, combined with various other data preparation activities, and in different implementation styles depending on the environment and context in which the workflow is executed. The difficulty in understanding causes workflow developers to revert to starting from scratch rather than re-using existing fragments.

Through an analysis of the current practices in scientific workflow development, we could gain insights on the creation of understandable and more effectively re-usable workflows. Specifically, we propose an analysis with the following objectives:
1) To reverse-engineer the set of current practices in workflow development through an analysis of empirical evidence.
2) To identify workflow abstractions that would facilitate understandability and therefore effective re-use.
3) To detect potential information sources and heuristics that can be used to inform the development of tools for creating workflow abstractions.

In this paper we present the result of an empirical analysis performed over 177 workflow descriptions from Taverna [10] and Wings [3]. Based on this analysis, we propose a catalogue of scientific workflow motifs. Motifs are provided through i) a characterization of the kinds of data-oriented activities that are carried out within workflows, which we refer to as data-oriented motifs, and ii) a characterization of the different manners in which those activity motifs are realized/implemented within workflows, which we refer to as workflow-oriented motifs. It is worth mentioning that, although important, motifs that have to do with scheduling and mapping of workflows onto distributed resources [12] are out of the scope of this paper.

The paper is structured as follows. We begin by providing related work in Section II, which is followed in Section III by brief background information on scientific workflows and the two systems that were subject to our analysis. Afterwards we describe the dataset and the general approach of our analysis. We present the detected scientific workflow motifs in Section IV and we highlight the main features of their distribution.

[Figure residue: "Fig. 3. Distribution of Data-Oriented Motifs per domain"; "Fig. 5. Data-Preparation Motifs per Domain"; "Data Preparation Motifs in the Genomics Workflows"]
17. Make Data Flourish
From data to information to knowledge
• Global identification of data sets and data items
• Data uses a common syntax
• Papers explicitly link to data
• Metadata expressed using shared vocabularies
• Capture the processes by which data is manipulated
• Track and publish explicit provenance information
18. Make Data Flourish
From data to information to knowledge
• Global identification of data sets and data items
• Data uses a common syntax
• Papers explicitly link to data
• Metadata expressed using shared vocabularies
• Capture the processes by which data is manipulated
• Track and publish explicit provenance information
"Someone who is not the person who collected the data can understand the experiment and data" - Shreejoy Tripathy
19. Linked Data
• Use existing Web infrastructure
• Everything gets a URI and usually a category
• Express typed relations between things (triples)
• Express sameness or difference
• Reuse identifiers as much as possible
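The bullets above can be sketched as plain N-Triples lines: every thing gets a URI, a category is an rdf:type triple, and sameness is owl:sameAs. A minimal illustration; all example.org URIs and the DBpedia resource below are hypothetical.

```python
# Minimal sketch of the slide's bullets as N-Triples.
# All example.org URIs (and the DBpedia resource) are hypothetical.

def triple(s, p, o):
    """Serialize one (subject, predicate, object) triple as an N-Triples line."""
    return "<%s> <%s> <%s> ." % (s, p, o)

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

lines = [
    # Everything gets a URI and usually a category (rdf:type):
    triple("http://example.org/article/42", RDF_TYPE,
           "http://purl.org/ontology/bibo/Article"),
    # A typed relation between two things:
    triple("http://example.org/article/42",
           "http://purl.org/dc/terms/creator",
           "http://example.org/person/alice"),
    # Reuse identifiers: assert sameness with an existing resource:
    triple("http://example.org/person/alice", OWL_SAME_AS,
           "http://dbpedia.org/resource/Alice_Example"),
]
print("\n".join(lines))
```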
20. Salah, Alkim Almila Akdag, Cheng Gao, Krzysztof Suchecki, and Andrea Scharnhorst. 2012. "Need to Categorize: A Comparative Look at the Categories of Universal Decimal Classification System and Wikipedia." Leonardo 45 (1) (February): 84-85. doi:10.1162/LEON_a_00344. (Preprint http://arxiv.org/abs/1105.5912v1)
21. Linked Data for Science
• Neuroscience Information Framework (Ontologies, Semantic Wiki, Catalog)
• Nanopublications (small scientific assertions)
• Workflow Systems (WINGS, Taverna, …)
• Linked Science (tools)
• BioPortal (ontologies)
• Organic Data Publishing (Semantic Wiki)
• Rightfield (systems biology)
• Bio2RDF (big linked data)
23. [Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/ — as of September 2011. Domains: Media, Geographic, Publications, User-generated content, Government, Cross-domain, Life sciences.]
24. [Linking Open Data cloud diagram annotated with a growth timeline: dataset counts (axis 0–400) at snapshots from 1 May 2007 to 23 Feb 2012. By Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/ — as of September 2011. Domains: Media, Geographic, Publications, User-generated content, Government, Cross-domain, Life sciences.]
28. An Ambient Agent Model for Monitoring and Analysing Dynamics of Complex Human Behaviour
Tibor Bosse*, Mark Hoogendoorn, Michel C.A. Klein, and Jan Treur
Vrije Universiteit Amsterdam, Department of Artificial Intelligence, de Boelelaan 1081, 1081 HV Amsterdam, The Netherlands
Abstract. In ambient intelligent systems, monitoring of a human could consist of more complex tasks than merely identifying whether a certain value of a sensor is above a certain threshold. Instead, such tasks may involve monitoring of complex dynamic interactions between human and environment. In order to enable such more complex types of monitoring, this paper presents a generic agent-based framework. The framework consists of support on various levels of system design, namely: (1) the top level, including the interaction between agents, (2) the agent level, providing support on the design of individual agents, and (3) the level of monitoring complex dynamic behaviour, allowing the specification of the aforementioned complex monitoring properties within the agents. The approach is exemplified by a large case study concerning the assessment of driving behaviour, and is applied to two smaller cases as well (concerning fall detection of elderly, and assistance of naval operations, respectively), which are briefly described. These case studies have illustrated that the presented framework enables developers within ambient intelligence to build systems with more expressiveness regarding their monitoring focus. Moreover, they have shown that the framework is easy to use and applicable in a wide variety of domains.
Keywords: ambient agent model, human behaviour, dynamics
Journal of Ambient Intelligence and Smart Environments

[Slides 29–36 repeat the paper above, adding one speech bubble per slide:]
29. "Whoah! Cool, you should publish that stuff as Linked Data"
30. + "Um, but doesn't TTL have incompatible semantics?"
31. + "Nah, silly, who cares? We'll just start a new W3C WG!"
32. + "Uh, ok, if we must. But, even then, we can't just publish the model as is!"
33. + "No worries, just add the provenance using PROV-O, annotate the PDF with OA, and link to other research using CITO."
34. + "And that's it?"
35. + "Noo! You'll need persistent Cool URIs and publish your endpoint for eternity of course. Duh."
36. + "Eh?"
    + "Oh... and don't forget all data collected by the agents, in all runs, including the first experiments. Now THAT would be ultra cool."
    + "Ngh!?"
41. We need to make publishing Linked Research Data...
...a lot easier...
... more persistent ...
... and more rewarding.
Linked Data is sóóóóó 2005
42. We need to make publishing Linked Research Data...
...a lot easier...
... more persistent ...
... and more rewarding.
“People as frontier in computing” - Haym Hirsch, Pietro Michelucci
43. We need to make publishing Linked Research Data...
...a lot easier...
... more persistent ...
... and more rewarding.
http://linkitup.data2semantics.org
44. We need to make publishing Linked Research Data...
...a lot easier...
... more persistent ...
... and more rewarding.
• Lightweight web application
• Interface to API of existing data repositories
• Enrich metadata by linking to (linked) data resources
• Human in the Loop
• Track provenance
• Publish rich metadata as new data publication (Nanopublication + OA + PROV-O + DCTerms + FOAF)
http://linkitup.data2semantics.org
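The “publish rich metadata as new data publication” bullet above combines DCTerms, FOAF and PROV-O. A minimal sketch of what such enriched metadata could look like, written as plain N-Triples with no RDF library — the article and person URIs here are made up for illustration, not Linkitup’s actual output:

```python
# Sketch: enriched article metadata as N-Triples, using DCTerms for the
# article, FOAF for the author, and PROV-O to record where the record
# was derived from. All example.org URIs are placeholders.

ART = "<http://example.org/article/1>"
AUTH = "<http://example.org/person/1>"

triples = [
    (ART, "<http://purl.org/dc/terms/title>", '"Linkitup demo"'),
    (ART, "<http://purl.org/dc/terms/creator>", AUTH),
    (AUTH, "<http://xmlns.com/foaf/0.1/name>", '"A. Author"'),
    (ART, "<http://www.w3.org/ns/prov#wasDerivedFrom>",
     "<http://figshare.com/articles/1>"),
]

# One "subject predicate object ." statement per line.
ntriples = "\n".join("{} {} {} .".format(s, p, o) for s, p, o in triples)
print(ntriples)
```

Because the serialization is line-based, a record like this can be appended to a nanopublication assertion graph or posted to a triple store as-is.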
46.
47.
48.
49. Use tags & categories to query the DBpedia endpoint
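The tag-and-category lookup on this slide amounts to a label match against DBpedia. A hedged sketch of how such a query could be assembled — the exact query Linkitup sends may differ; this only shows the shape:

```python
# Sketch: build a SPARQL query that matches DBpedia resources whose
# English rdfs:label equals one of the article's tags or categories.

def dbpedia_query(tags, limit=5):
    """Return a SPARQL query string for a simple label match."""
    values = " ".join('"{}"@en'.format(t.replace('"', '\\"')) for t in tags)
    return """
SELECT DISTINCT ?resource ?label WHERE {{
  VALUES ?label {{ {values} }}
  ?resource rdfs:label ?label .
}} LIMIT {limit}
""".format(values=values, limit=limit)

# The resulting string can be POSTed to the public endpoint at
# http://dbpedia.org/sparql (not done here, to keep the sketch offline).
print(dbpedia_query(["Linked Data", "Semantic Web"]))
```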
58. Plugins

Name               | Service | Source                          | Links to
-------------------|---------|---------------------------------|------------------------
DBLP               | SPARQL  | Authors                         | Author Identifiers
ORCID              | REST    | Authors                         | Author Identifiers
LinkedLifeData     | REST    | Tags & Categories               | Biomedical Entities
Crossref           | Custom  | Citations                       | DOIs
Elsevier LDR       | REST    | Tags & Categories               | Funding agencies
DANS EASY          | Custom  | Tags & Categories               | General Datasets
SameAs             | REST    | Links                           | General Entities
DBPedia Spotlight  | REST    | Description, Tags & Categories  | General Entities
DBPedia/Wikipedia  | SPARQL  | Tags & Categories               | General Entities
NeuroLex           | SPARQL  | Tags & Categories               | Neuroscience Concepts
NIF Registry       | REST    | Tags & Categories               | Neuroscience Datasets
your               | data    | set                             | here
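Every row in the table follows the same pattern: a plugin reads one metadata field (“Source”) and proposes candidate links (“Links to”). A minimal sketch of that plugin pattern — the registry, decorator and field names here are illustrative, not Linkitup’s actual plugin API, and the lookup is stubbed instead of calling a live service:

```python
# Sketch of the plugin pattern: each plugin declares which metadata
# field it consumes, and enrich() fans the metadata out to all of them.

PLUGINS = {}

def plugin(name, source):
    """Register a function as a plugin reading the given metadata field."""
    def register(fn):
        PLUGINS[name] = {"source": source, "run": fn}
        return fn
    return register

@plugin("dbpedia", source="tags")
def dbpedia_lookup(values):
    # A real plugin would query the DBpedia endpoint; stubbed here so
    # the sketch runs offline.
    return [{"label": v,
             "uri": "http://dbpedia.org/resource/" + v.replace(" ", "_")}
            for v in values]

def enrich(metadata):
    """Run every plugin whose source field is present in the metadata."""
    links = []
    for p in PLUGINS.values():
        if p["source"] in metadata:
            links.extend(p["run"](metadata[p["source"]]))
    return links

print(enrich({"tags": ["Linked Data"]}))
```

Adding a new service (the “your data set here” row) then means registering one more function, without touching the enrichment loop.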
59. What does this solve?
http://linkeddatabook.com
• Decide on resources to describe
• Mint cool URIs
• Decide on triples to include
• Describe the dataset
• Choose vocabularies
• Define terms
• Make links
• Publish to triple store/annotations/dump
60. What does this solve?
http://linkeddatabook.com
• You decide on resources to describe
• We mint cool URIs
• We decide on triples to include
• We describe the dataset
• We choose vocabularies
• We define terms
• Together we make links
• We publish the dataset to a reliable repository
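The “we mint cool URIs” step above can be sketched in a couple of lines. The key design choice is to build the URI from the repository’s stable identifier rather than the (mutable) title, so it never has to change — the base domain below is a placeholder:

```python
# Sketch: mint a stable, "cool" URI for a publication from its
# repository name and numeric identifier. example.org stands in for
# whatever persistent domain the service actually uses.

def mint_uri(repo, article_id):
    # Opaque, identifier-based path: survives title edits and exposes
    # no implementation details (no file extensions, no query strings).
    return "http://example.org/resource/{}/{}".format(repo, article_id)

print(mint_uri("figshare", 97850))
```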
61. Coming up…
• Publish directly from Dropbox, Github, …
• Reconstruct provenance information (http://git2prov.org)
• Analyze, convert and enrich on the fly
• Generate a data report for advertisement purposes
• Measure for information content of datasets (“D-Index”)
• Integrate a data dashboard
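The git2prov item in the list above maps version-control history onto PROV-O. A toy sketch of that idea, under heavy assumptions — the log line format, activity URIs and chosen PROV properties are all illustrative, not git2prov’s actual mapping:

```python
# Sketch: turn one simplified git log line ("sha date author message")
# into PROV-O triples describing the commit as a prov:Activity.

LOG = "a1b2c3d 2013-10-01 alice Add dataset v1"

def commit_to_prov(line):
    sha, date, author, _msg = line.split(" ", 3)
    act = "<http://example.org/activity/{}>".format(sha)
    return [
        (act, "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>",
         "<http://www.w3.org/ns/prov#Activity>"),
        (act, "<http://www.w3.org/ns/prov#wasAssociatedWith>",
         '"{}"'.format(author)),
        (act, "<http://www.w3.org/ns/prov#endedAtTime>",
         '"{}"'.format(date)),
    ]

for s, p, o in commit_to_prov(LOG):
    print(s, p, o, ".")
```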
62. [Bar chart on slide: response statistics for dataset URLs. Labels include: HTTP, XML, Other, No URL provided, Unknown response, Connection reset, Not RDF. Counts shown include: 0, 6, 11, 12, 22, 30, 35, 70, 84, 105, 134, 140.]

linkitup — http://linkitup.data2semantics.org
… enhancing the data publication …
… increasing findability …
… boosting reusability …
… result is stored persistently
http://git2prov.org
http://semweb.cs.vu.nl/provoviz
http://yasgui.data2semantics.org
http://www.data2semantics.org