Multidisciplinary engineer and entrepreneur David Wood discusses the reasons, approaches and success stories for structured data on the World Wide Web. Linked Data is placed in context with the rest of the Web and that context is used to suggest some areas ripe for entrepreneurial innovation.
Open Insights Harvard DBMI - Personal Health Train - Kees van Bochove - The Hyve
In this talk, the Personal Health Train concept will be introduced. It enables running personalized medicine workflows as trains visiting data stations (e.g. hospital records, primary care records, clinical studies and registries, and patient-held data from sources such as wearable sensors). The Personal Health Train is a very powerful concept, but it depends on source medical data being coded with appropriate metadata on consent, license, scope, etc., and on the data itself being encoded using biomedical data standards, an ever-growing field in biomedical informatics. To realize the Personal Health Train, biomedical data will need to be FAIR, i.e. to adopt the FAIR Guiding Principles. This talk will cover the emerging GO FAIR international movement and provide examples of how several European health data networks are adopting open-standards-based stacks to enable routine health care data to become accessible for research.
Harnessing Edge Informatics to Accelerate Collaboration in BioPharma (Bio-IT ...) - Tom Plasterer
As scientists in the life sciences, we are trained to pursue singular goals around a publication, a validated target, or a drug submission. Our failure rates are exceedingly high, especially as we move closer to patients in the attempt to collect sufficient clinical evidence to demonstrate the value of novel therapeutics. This wastes resources as well as time for patients depending on us for the next breakthrough.
Edge Informatics is an approach to ameliorate these failures. By combining technical and social solutions, knowledge can be shared and leveraged across the drug development process. This is accomplished by making data assets discoverable, accessible, self-described, reusable and annotatable. The Open PHACTS project pioneered this approach and has provided a number of the technical and social solutions that enable Edge Informatics. A number of pre-competitive consortia and some content providers have also embraced this approach, facilitating networks of collaborators within and outside a given organization. Taken together, these measures foster more accurate, timely and inclusive decision-making.
A National Network of Biomedical Research Expertise - Maninder Kahlon
Overview of the status of a Clinical & Translational Science Award Consortium project co-led by UCSF and Harvard to link biomedical experts across the country.
FAIR Data Knowledge Graphs - from Theory to Practice - Tom Plasterer
FAIR data has flown up the hype curve without a clear sense of the return on the required data stewardship investment. The killer use case for FAIR data is a science knowledge graph: it enables you to address rich, novel questions over your own and the world's data. We started with data catalogues (findability) that exploited linked and referenced data using a few focused vocabularies (interoperability), for credentialed users (accessibility), with provenance and attribution (reusability). Our processes enable simple creation of dataset records and linking to source data, providing a seamless federated knowledge graph for novice and advanced users alike.
Presented May 7th, 2019 at the Knowledge Graph Conference, Columbia University.
Alice: "What version of ChEMBL are we using?"
Bob: "Er…let me check. It's going to take a while, I'll get back to you."
This simple question took us the best part of a month to resolve and involved several individuals. Knowing the provenance of your data is essential, especially when using large complex systems that process multiple datasets.
The underlying issues of this simple question motivated us to improve the provenance data in the Open PHACTS project. We developed a guideline for dataset descriptions where the metadata is carried with the data. In this talk I will highlight the challenges we faced and give an overview of our metadata guidelines.
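The idea of metadata carried with the data can be sketched as follows. This is an illustrative example, not the Open PHACTS guideline itself: a minimal VoID-style dataset description in Turtle, generated from a simple record so that version and licensing information travel alongside the data. The dataset URI and version value below are invented for the example.

```python
# Illustrative sketch: render a minimal VoID/PAV-style dataset
# description as Turtle so that provenance metadata accompanies the data.
def dataset_description(uri, title, version, license_uri):
    """Build a small Turtle document describing one dataset."""
    return "\n".join([
        "@prefix void: <http://rdfs.org/ns/void#> .",
        "@prefix dcterms: <http://purl.org/dc/terms/> .",
        "@prefix pav: <http://purl.org/pav/> .",
        "",
        f"<{uri}> a void:Dataset ;",
        f'    dcterms:title "{title}" ;',
        f'    pav:version "{version}" ;',
        f"    dcterms:license <{license_uri}> .",
    ])

ttl = dataset_description(
    "http://example.org/chembl",  # hypothetical dataset URI
    "ChEMBL (example record)",
    "17",
    "https://creativecommons.org/licenses/by-sa/3.0/",
)
print(ttl)
```

With a description like this published next to each dataset release, Alice's question ("what version are we using?") becomes a single metadata lookup rather than a month-long investigation.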
Presentation given to the W3C Semantic Web for Health Care and Life Sciences Interest Group on 14 January 2013.
As BioPharma adapts to incorporate nimble networks of suppliers, collaborators, and regulators, the ability to link data is critical for dynamic interoperability. Adopting the linked data paradigm allows BioPharma to focus on its core business: delivering valuable therapeutics in a timely manner.
The first workshop of the series "Services to support FAIR data" took place in Prague during the EOSC-hub week (on April 12, 2019).
Speaker: Kostas Repanas (EC DG RTD)
The goal of the Very Open Data Project is to provide a software-technical foundation for this exchange of data: more specifically, an open database platform covering data from the raw output of experimental measurements or models, through intermediate manipulations, to finally published results. The sheer amount of data involved creates some unique software-technical challenges. The part of the study presented here addresses one of these challenges: characterizing scientific data (with an initial focus on detailed chemistry data from the combustion kinetics community) so that efficient searches can be made. This characterization is formalized as schemas describing the tags and keywords attached to data, and ontologies describing the relationships between data types and between the characterizations themselves. These will be translated into metadata tags connected to the data points within a non-relational database for the community.
The focus of the initial work will be on data and its accessibility. As the project progresses, the emphasis will shift from merely making available data accessible to the community toward enabling the community itself to contribute its own data with minimal effort. This will involve, for example, the concept of the 'electronic lab notebook' and the availability of extensive concept extraction tools, primarily from the chemical informatics field.
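The tagging idea above can be made concrete with a toy sketch. The tag names and the record below are invented, not part of the project: data points carry metadata tags drawn from a controlled schema, so a document store can be filtered efficiently.

```python
# Toy sketch: a controlled tag schema plus a simple tag-based search.
SCHEMA = {"phase", "fuel", "temperature_K", "source"}  # allowed tag keys (invented)

def validate_tags(tags):
    """Reject any tag key not declared in the schema."""
    unknown = set(tags) - SCHEMA
    if unknown:
        raise ValueError(f"unknown tags: {sorted(unknown)}")
    return tags

def search(records, **query):
    """Return records whose tags match every key/value in the query."""
    return [r for r in records if all(r.get(k) == v for k, v in query.items())]

record = validate_tags({"fuel": "methane", "temperature_K": 1200})
hits = search([record], fuel="methane")
```

An ontology would extend this by relating tags to one another (e.g. that a fuel is a kind of chemical species), which is what lets searches generalize beyond exact matches.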
February 18 2015 NISO Virtual Conference
Scientific Data Management: Caring for Your Institution and its Intellectual Wealth
Network Effects: RMap Project
Sheila M. Morrissey, Senior Researcher, ITHAKA
Persistent Identifier Services and their Metadata by John Kunze
Persistent identifiers (PIDs) provide machine-actionable links to data and metadata that are vital to APIs (application programming interfaces) for publishing and citation. APIs are essentially request/response patterns that use PIDs to reference things and metadata to describe not only the things themselves, but also any actions requested or taken. As a result, metadata design and standardization is wedded to API design and enhancement. With PIDs as nouns and metadata as adjectives and qualifiers, PID services play a key role in API implementation.
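The nouns-and-adjectives pattern described above can be sketched in a few lines. This is a hedged illustration, not any real PID service's API: the ARK-style identifier and the metadata fields below are invented for the example, though the who/what/when fields echo the kernel metadata commonly used with ARKs.

```python
# Illustrative in-memory PID resolver: the identifier is the noun the
# API resolves; the response carries metadata describing the thing.
REGISTRY = {
    "ark:/12345/x9abc": {  # hypothetical ARK identifier
        "what": "dataset",
        "who": "Example Lab",
        "when": "2015-02-18",
    },
}

def resolve(pid):
    """Request/response: given a PID, return its metadata or an error."""
    try:
        return {"pid": pid, "metadata": REGISTRY[pid]}
    except KeyError:
        return {"pid": pid, "error": "not found"}

resp = resolve("ark:/12345/x9abc")
```

Real services add the pieces this sketch omits: persistence guarantees, redirection to the object itself, and standardized metadata profiles.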
Information technology and resources are an integral and indispensable part of the contemporary academic enterprise. In particular, technological advances have nurtured a new paradigm of data-intensive research. However, far too much of this activity still takes place in silos, to the detriment of open scholarly inquiry, integrity, and advancement. To counteract this tendency, the University of California Curation Center (UC3) has been developing and deploying a comprehensive suite of curation services that facilitate widespread data management, preservation, publication, sharing, and reuse. Through these services UC3 is engaging with new communities of use: in addition to its traditional stakeholders in cultural heritage memory organizations, e.g., libraries, museums, and archives, the UC3 service suite is now attracting significant adoption by research projects, laboratories, and individual faculty researchers. This webinar will present an introduction to five specific services – DMPTool, DataUp, EZID, Merritt, Web Archiving Service (WAS) – applicable to data curation throughout the scholarly lifecycle, two recent initiatives in collaboration with UC campuses, UC Berkeley Research Hub and UC San Francisco DataShare, and the ways in which they encourage and promote new communities of practice and greater transparency in scholarly research.
Integration of research literature and data (InFoLiS) - Philipp Zumstein
Talk at CNI 2015 Spring Membership Meeting in Seattle on April 14th, 2015, see http://www.cni.org/events/membership-meetings/upcoming-meeting/spring-2015/
Abstract: The goal of the InFoLiS project is to connect research data and publications. Links between data and literature are created automatically by means of text mining and made available as Linked Open Data (LOD) for seamless integration into different retrieval systems. This enables scientists to directly access information about corresponding research data in a literature information system, and, vice versa, it is possible to directly find different interpretations and analyses in the literature of the same research data. In our talk, we will describe our methods for generating the links and give insight into the Linked Data infrastructure including the services we are currently building. Most importantly, we will detail how our solutions can be used by other institutions and invite all interested participants to discuss with us their ideas and thoughts on the requirements for these services to ensure broad interoperability with existing systems and infrastructures. InFoLiS is a joint project by the GESIS – Leibniz Institute for the Social Sciences, Cologne, Mannheim University Library, and Mannheim University supported by a grant from the DFG – German Research Foundation.
Poster RDAP13: Research Data in eCommons @ Cornell: Present and Future - ASIS&T
Wendy A. Kozlowski, Dianne Dietrich, Gail Steinhart and Sarah Wright
Cornell University Library, Ithaca, NY
Research Data in eCommons @ Cornell: Present and Future
Research Data Access & Preservation Summit 2013
Baltimore, MD April 4, 2013 #rdap13
This is a talk that I gave at BioIT World West on March 12, 2019, called "A Gen3 Perspective of Disparate Data: From Pipelines in Data Commons to AI in Data Ecosystems."
A Big Picture in Research Data Management - Carole Goble
A personal view of the big picture in Research Data Management, given at the GFBio - de.NBI Summer School 2018, "Riding the Data Life Cycle!", Braunschweig Integrated Centre of Systems Biology (BRICS), 03-07 September 2018.
What is Data Commons and How Can Your Organization Build One? - Robert Grossman
This is a talk that I gave at the Molecular Medicine Tri Conference on data commons and data sharing to accelerate research discoveries and improve patient outcomes. It also covers how your organization can build a data commons using the Open Commons Consortium's Data Commons Framework and the University of Chicago's Gen3 data commons platform.
The following brief details the use of linked data to connect various high-quality datasets produced by the U.S. Environmental Protection Agency. Linked data is an open-standards way to publish and consume data. Using a linked data approach and the REST API, developers, scientists, and the public can more easily find, access and re-use authoritative data published by the EPA.
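Consuming such a service typically relies on HTTP content negotiation: the same URI identifies a resource, and the Accept header chooses between a human-readable page and machine-readable RDF. The sketch below only constructs the request; the URL is illustrative, not a documented EPA endpoint.

```python
# Sketch of a linked-data client request using content negotiation.
import urllib.request

def linked_data_request(resource_url, rdf_format="text/turtle"):
    """Build a GET request asking the server for an RDF serialization."""
    return urllib.request.Request(
        resource_url,
        headers={"Accept": rdf_format},  # ask for RDF instead of HTML
        method="GET",
    )

# Hypothetical facility URI for illustration only; not sent anywhere here.
req = linked_data_request("https://example.epa.gov/facilities/12345")
```

Passing `req` to `urllib.request.urlopen` would perform the actual fetch; browsers hitting the same URI without the header would get HTML instead.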
3 Round Stones Briefing to U.S. EPA's Chief Data Scientist on Open Data - Bernadette Hyland-Wood
The following is a technical brief to U.S. EPA's Chief Data Scientist on open data information architecture, the use of Linked Data and the EPA Linked Data Management Service. The briefing was held in February 2016 and was educational in nature.
Adoption of the Linked Data Best Practices in Different Topical Domains - Chris Bizer
Slides from the presentation of the following paper:
Max Schmachtenberg, Christian Bizer, Heiko Paulheim: Adoption of the Linked Data Best Practices in Different Topical Domains. 13th International Semantic Web Conference (ISWC2014) - RDB Track, pp. 245-260, Riva del Garda, Italy, October 2014.
Paper URL:
http://dws.informatik.uni-mannheim.de/fileadmin/lehrstuehle/ki/pub/SchmachtenbergBizerPaulheim-AdoptionOfLinkedDataBestPractices.pdf
Abstract:
The central idea of Linked Data is that data publishers support applications in discovering and integrating data by complying with a set of best practices in the areas of linking, vocabulary usage, and metadata provision. In 2011, the State of the LOD Cloud report analyzed the adoption of these best practices by linked datasets within different topical domains. The report was based on information provided by the dataset publishers themselves via the datahub.io Linked Data catalog. In this paper, we revisit and update the findings of the 2011 State of the LOD Cloud report based on a crawl of the Web of Linked Data conducted in April 2014. We analyze how the adoption of the different best practices has changed and present an overview of the linkage relationships between datasets in the form of an updated LOD cloud diagram, this time based not on information from dataset providers but on data that can actually be retrieved by a Linked Data crawler. Among other findings, the number of linked datasets approximately doubled between 2011 and 2014, there is increased agreement on common vocabularies for describing certain types of entities, and provenance and license metadata are still rarely provided by the data sources.
Phil Bourne, Protein Data Bank; Data Publication Repositories; RDAP11 Summit
The 2nd Research Data Access and Preservation (RDAP) Summit
An ASIS&T Summit
March 31-April 1, 2011 Denver, CO
In cooperation with the Coalition for Networked Information
http://asist.org/Conferences/RDAP11/index.html
Putting the L in front: from Open Data to Linked Open Data - Martin Kaltenböck
Keynote presentation by Martin Kaltenböck (LOD2 project, Semantic Web Company) at the Government Linked Data Workshop during OGD Camp 2011 in Warsaw, Poland: Putting the L in front: from Open Data to Linked Open Data.
Resource Description Framework Approach to Data Publication and Federation - Pistoia Alliance
Bob Stanley, CEO of IO Informatics, explains the utility of RDF as a standard way of defining and redefining data for managing life science information.
The webinar will be based on the LODE-BD Recommendations, which aim to provide bibliographic data providers of open repositories with a set of recommendations supporting the selection of appropriate encoding strategies for producing meaningful Linked Open Data (LOD)-enabled bibliographical data (LODE-BD).
Building an Intelligent Biobank to Power Research Decision-Making - Denodo
This presentation is from the workshop "Building an Intelligent Biobank to Power Research Decision-Making" at the ISBER 2015 Annual Meeting, by Lori A. Ball (Chief Operating Officer, President of Integrated Client Solutions at BioStorage Technologies, Inc.), Brian Brunner (Senior Manager, Clinical Practice at LabAnswer) and Suresh Chandrasekaran (Senior Vice President at Denodo).
The workshop covers three topic areas:
- Research sample intelligence: the growing need for Global Data Integration (Biobank Sample and Data Stakeholders).
- Building a research data integration plan and cloud sourcing strategy (data integration).
- How data virtualization works and the value it delivers (a data virtualization introduction, solution portfolio and current customers in the Life Sciences industry).
The biomedical R&D environment is increasingly dependent on data meta-analysis and bioinformatics to support research advancements. The integration of biorepository sample inventory data with biomarker and clinical research information has become a priority to R&D organizations. Therefore, a flexible IT system for managing sample collections, integrating sample data with clinical data and providing a data virtualization platform will enable the advancement of research studies. This workshop provides an overview of how sample data integration, virtualization and analytics can lead to more streamlined and unified sample intelligence to support global biobanking for future research.
Talk delivered at YOW! Developer Conferences in Melbourne, Brisbane and Sydney Australia on 1-9 December 2016.
Abstract: Governments collect a lot of data. Data on air quality, toxic chemicals, laws and regulations, public health, and the census are intended to be widely distributed. Some data is not for public consumption. This talk focuses on open government data — the information that is meant to be made available for benefit of policy makers, researchers, scientists, industry, community organisers, journalists and members of civil society.
We’ll cover the evolution of Linked Data, which is now being used by Google, Apple, IBM Watson, federal governments worldwide, non-profits including CSIRO and OpenPHACTS, and thousands of others worldwide.
Next we’ll delve into the evolution of the U.S. Environmental Protection Agency’s Open Data service that we implemented using Linked Data and an Open Source Data Platform. Highlights include how we connected to hundreds of billions of open data facts in the world’s largest, open chemical molecules database PubChem and DBpedia.
WHO SHOULD ATTEND
Data scientists, software engineers, data analysts, DBAs, technical leaders and anyone interested in utilising linked data and open government data.
Existing data management approaches assume control over schema, data and data generation, which is not the case in open, decentralised environments such as the Web. This lack of control means that social processes are necessary to generate 'ordo ab chao', and hence a new life cycle model is needed.
Based on our experience in Linked Data publishing and consumption over the past years, we have identified the parties involved and the fundamental phases, which provide for a multitude of so-called Linked Data life cycles.
If you want to hear me speak to the slides, you might want to check out the following videos on YouTube:
Part 1: http://www.youtube.com/watch?v=AFJSMKv5s3s
Part 2: http://www.youtube.com/watch?v=G6YJSZdXOsc
Part 3: http://www.youtube.com/watch?v=OagzNpDEPJg
This talk will provide a means to discuss the capture, integration and dissemination of data across large enterprises. We will show how data variety continues to grow, meaning new data sources are steadily becoming available for use in analysis. Data veracity is also important, since a large amount of data is fuzzy (uncertain) in nature. The ability to integrate these various data sources and provide improved capabilities to understand and use them is of increasing importance in today's pharma climate. We call this Reference Master Data Management (RMDM).
This talk will span an arc of data lifecycle management, beginning with instrument data and moving across to clinical studies, production, regulatory affairs and finally e-archiving (see Fig. 1). I will show how these systems can use common semantics to model important metadata, applying the FAIR principles of Findability, Accessibility, Interoperability and Reusability to a common "semantic hub" that can connect data sources of different varieties across the enterprise. ADF files, for example, use their Data Description layer to provide semantic metadata about file contents. Similarly, semantics can be used to describe clinical trials data, regulatory data, etc., through to archiving, for improved storage and search over long periods of time.
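The semantic hub idea can be illustrated with a minimal sketch. The source names and field mappings below are invented, not taken from any real system: each source keeps its own schema, and a shared mapping to common terms makes records from different sources comparable.

```python
# Toy "semantic hub": re-key source-specific records into a common vocabulary.
COMMON_TERMS = {
    "lims":   {"sample_id": "identifier", "assay": "measurement"},
    "clinic": {"subject":   "identifier", "endpoint": "measurement"},
}

def to_common(source, record):
    """Translate a source record's keys to the hub's common terms."""
    mapping = COMMON_TERMS[source]
    return {mapping.get(k, k): v for k, v in record.items()}

a = to_common("lims", {"sample_id": "S-001", "assay": "IC50"})
b = to_common("clinic", {"subject": "S-001", "endpoint": "response"})
# Both records now share the key "identifier", so they can be joined.
```

In practice the mapping would live in an ontology rather than a dictionary, but the design choice is the same: sources stay autonomous while the hub carries the shared semantics.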
US EPA Resource Conservation and Recovery Act published as Linked Open Data - 3 Round Stones
A presentation by 3 Round Stones to the US EPA on the new Linked Open Data Management System, including Linked Open Data on 4M facilities (from FRS), 25 years of Toxic Release Inventory (TRI), chemical substances (SRS), and Resource Conservation and Recovery Act (RCRA) content. This represents one of the largest Open Data projects published by a federal government agency using Open Source Software (OSS), Open Web Standards and government Open Data.
Briefing on US EPA Open Data Strategy using a Linked Data Approach - 3 Round Stones
An overview presented by Ms. Bernadette Hyland on 18 November 2014 of the US EPA Open Data strategy, focusing on the Resource Conservation & Recovery Act (RCRA) dataset to be published as linked data. This work is in support of Presidential Memorandum M-13-13, Open Data Policy - Managing Information as an Asset.
The W3C Data Shapes Working Group was chartered in September 2014 to "Produce a language for defining structural constraints on RDF graphs and define graph topologies for interface specification, code development, and data verification." It will do for RDF what XML Schema did for XML.
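What "structural constraints on RDF graphs" means can be shown with an illustrative toy, not the shapes language the working group produced: here a "shape" is simply a set of properties required of every node of a given type, checked over a list of triples.

```python
# Toy shape checker: every node typed node_type must carry required_props.
RDF_TYPE = "rdf:type"

def check_shape(triples, node_type, required_props):
    """Return a map of non-conforming nodes to their missing properties."""
    nodes = {s for s, p, o in triples if p == RDF_TYPE and o == node_type}
    violations = {}
    for node in nodes:
        props = {p for s, p, o in triples if s == node}
        missing = required_props - props
        if missing:
            violations[node] = missing
    return violations

triples = [
    ("ex:alice", RDF_TYPE, "ex:Person"),
    ("ex:alice", "ex:name", "Alice"),
    ("ex:bob",   RDF_TYPE, "ex:Person"),  # bob has no ex:name
]
bad = check_shape(triples, "ex:Person", {"ex:name"})
```

A real shapes language adds cardinalities, value types, and composable shape definitions; the contrast with XML Schema is that constraints apply to an open graph rather than a document tree.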
This brief was presented as part of the RDF-AP Special Session at DCMI 2014, the Dublin Core Metadata Initiative Conference.
Open Data is the idea that "certain data should be freely available to everyone to use and republish as they wish, without restrictions from copyright, patents or other mechanisms of control". Open Data follows similar "open" concepts that have proven valuable in the information economy, such as Open Standards, Open Source Software and Open Content, and has been followed more recently by variations on the theme such as Open Science and Open Government.
Open Data allows information of common value to be reused without needing to be recreated. The economic benefits of Open Data include cost reduction, organizational efficiencies and the facilitation of commonly held understanding. The costs of implementing Open Data deployment strategies tend to be incremental additions to existing information infrastructure.
This presentation will describe Open Data and its place in the ecosystem of economic and governmental discourse.
Lightning Talk SLIDES for Callimachus Enterprise by 3 Round Stones3 Round Stones
Lightning talk delivered by Bernadette Hyland, co-founder & CEO of 3 Round Stones, at the Semantic Technology & Business Conference in San Jose, California on 19-Aug-2014. Highlights how North and South American public sector, non-profit and private firms are using Callimachus to publish trusted, authoritative data on the Web. Callimachus provides in-browser development features to rapidly create mobile-first applications. Learn more at http://3RoundStones.com
Why Your Next Product Should be Semantic by Dr. David Wood3 Round Stones
David Wood, co-founder & CTO of 3 Round Stones, author & pioneer on Data Exchange Standards for the Web speaks at the 10th Anniversary Semantic Technology & Business Conference in San Jose California on 20-August 2014. Dr. Wood will describe how data is core to your organization's effectiveness and efficiency. He'll describe why and how to make your next product semantically-enhanced for increased speed to market & responsiveness to your customers.
Celebrating 10 years of the Semantic Technology Conference 20143 Round Stones
3 Round Stones is pleased to speak & exhibit at the premier event on NoSQL and semantic technologies, the 10th Anniversary Semantic Technology & Business Conference in San Jose, CA. We will show how customers are using Callimachus Enterprise, the leading Web application server for data-driven, mobile-first apps. See us at booth #406 on 8/20 - 8/21/2014.
Enterprise & Scientific Data Interoperability Using Linked Data at the Health...3 Round Stones
Organizations are under pressure to collect, curate, integrate, analyze and act on increasing amounts of data from many sources in order to drive innovation. This 2 hour tutorial offered at the National Health Datapalooza on June 1, 2014 in Washington DC is for people who are both new and experienced in enterprise and scientific data sharing. Includes an overview of Linked Data, the open Web standard for publishing and consuming data, by Dr. David Wood, author of "Linked Data: Structured Data on the Web" (Manning, 2014).
Publishing Data on the Web presented to the DC/Virginia/Maryland Search Engine Marketing Meetup Group. This is a gentle intro into why and how public & private sector organizations are adding structured content to their Websites to improve data sharing, search engine optimization and drive data re-use.
Slides for a half-day tutorial on Callimachus Enterprise. Originally delivered at the Conference on Semantics in Healthcare and Life Sciences (CSHALS) 2014 in Boston, MA, USA.
Improving Scientific Information Sharing by Fostering Reuse - Presentation at...3 Round Stones
Most scientific developments are recorded in published papers and communicated via presentations. Scientific findings are presented within organizations, at conferences, via Webinars and other fora. Yet after delivery to an audience, important information is often left to wither on hard drives, document management systems and even the Web. Accessing the underlying data for scientific findings has been the Achilles' heel for researchers due to closed and proprietary systems. This presentation shows an alternative approach to sharing scientific information using Linked Data.
Linked Data Overview - structured data on the web for US EPA 201402033 Round Stones
This presentation provides a jargon-free overview of Linked Open Data. Linked Data is being used by the US EPA for US Government data publication. The Linked Data approach allows for an increased ability to combine data from multiple sources at decreased cost.
ORGpedia: The Open Organizational Data Project3 Round Stones
Funded by the Alfred P. Sloan Foundation, the OrgPedia project is developing a free, not-for-profit online directory based on open data about domestic and international, public and private companies.
The ORGpedia beta site makes available for browsing and download a rich tapestry of information, including the corporate owners of regulated facilities such as nuclear power plants located in the US. ORGpedia uses open government data published by the U.S. EPA, U.S. Nuclear Regulatory Commission and U.S. Securities and Exchange Commission, as well as crowd-sourced content from sites including OpenStreetMap and ORGpedia itself.
Linked Data: The Jargon-free Primer on Integrating Data on the Web3 Round Stones
Dr. David Wood and Ms. Bernadette Hyland delivered this jargon-free presentation at the National Health Datapalooza in Washington DC on how and why integrating data from the Web matters and why a Linked Data approach is relevant.
Delivering on Standards for Publishing Government Linked Data3 Round Stones
Progress report on publishing open government data using Open Web Standards. Delivered by Bernadette Hyland, co-chair W3C Government Linked Data Working Group at the European Data Forum 2013, Dublin, Ireland.
The Power of Linked Data for Government & Healthcare Information Integration3 Round Stones
Government open data strategies aimed at wider access and re-use by entrepreneurs, publishers and the wider US healthcare delivery industry. Presentation to the OMG Standards Community technical workshop on semantics, held in Reston VA on 20-March 2013. Presentation by Bernadette Hyland, CEO 3 Round Stones, Inc and co-chair W3C Government Linked Data Working Group.
4. David Wood
[Slide: timeline of David Wood's companies — founding dates (2002, 2005), products (RDF database, RDF database management, Linked Data management) and dispositions (ongoing); companies include Plugged In Software.]
8. Digital information produced is growing 40% annually, against only 5% annual growth in IT spending: 1.8 ZB produced in 2012, projected to reach 35 ZB by 2020.
[Chart: daily volumes (2013) on a scale up to 5 trillion — online ad impressions (4.8T), emails (294B), tweets (230M).]
14. "The Web is the minimal concession to hypertext that a sequence-and-hierarchy chauvinist could possibly make."
"HTML is precisely what we were trying to PREVENT -- ever-breaking links, links going outward only, quotes you can't follow to their origins, no version management, no rights management."
"The 'Browser' is an extremely silly concept -- a window for looking sequentially at a large parallel structure. It does not show this structure in a useful way."
(Ted Nelson)
29. New Data Requirements
• Global access
• Open format
• Record context (to allow sharing and reuse)
• Record provenance
30. Challenges
• Global access: need to publish to the Web
• Open format: most data currently bound to proprietary tools/formats
• Context: data often structured for individual use without thought to sharing
• Provenance: paradoxically easy given solutions to the others
31. Linked Data on the Web
[Diagram: an RDF graph linking "my data" to a measurement with value 0, units of measure degrees Centigrade, date 2011-01-01, collected at Galway Airport, and collected by a Person with first name Michael and last name Hausenblas.]
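The measurement graph depicted on that slide can be written out as triples. The sketch below is a minimal stand-in using Python tuples, with shortened names in place of full HTTP URIs; a real deployment would use an RDF library and resolvable identifiers.

```python
# The slide's measurement graph as (subject, predicate, object) triples.
# Short names stand in for URIs; "m1" is an assumed identifier for the
# measurement node and "michael" for the collector.
measurement_graph = {
    ("my_data", "measurement", "m1"),
    ("m1", "rdf:type", "Measurement"),
    ("m1", "value", "0"),
    ("m1", "unitsOfMeasure", "degrees Centigrade"),
    ("m1", "date", "2011-01-01"),
    ("m1", "collectedAt", "Galway Airport"),
    ("m1", "collectedBy", "michael"),
    ("michael", "rdf:type", "Person"),
    ("michael", "firstName", "Michael"),
    ("michael", "lastName", "Hausenblas"),
}

# Simple pattern match: everything the graph says about the measurement.
props = {(p, o) for (s, p, o) in measurement_graph if s == "m1"}
print(sorted(props))
```

The point of the slide is that each node and edge carries its own meaning, so a consumer who has never seen this dataset can still discover that the value 0 is a temperature in degrees Centigrade collected at Galway Airport.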
39. [Diagram: a user resolves a single URI to an Active PURL; multiple targets (HTTP-accessible endpoints capable of returning XML or textual content) are queried independently; XML or textual results are converted to RDF; RDF is rendered to HTML via a template.]
Active PURLs for Clinical Study Aggregation
David Wood (david@3roundstones.com) and Tom Plasterer (Tom.Plasterer@astrazeneca.com)
The problem: No coordinated view of clinical study information. Information is distributed across departments, subsidiaries and government data sources.
The solution: Gather, convert, aggregate and format for display
3 Round Stones and AstraZeneca created a system to allow coordinated views of distributed clinical trial information. The system extended the Callimachus Project, an Open Source management system for Linked Data.
Persistent URLs, or PURLs, were used to provide globally unique and resolvable identifiers for each clinical study. The PURL concept was extended to enable PURLs to have multiple targets and for the results of each target to undergo arbitrary transformation. PURLs which have such capabilities are called Active PURLs.
Information sources relevant to clinical studies were identified, regardless of whether their location was internal or external to the pharmaceutical company's network. Active PURLs were used to resolve data sources having HTTP endpoints capable of returning XML or textual results. Each information source is dynamically transformed into Resource Description Framework (RDF) formats and all sources' results are then merged into a single, temporary graph of RDF data. Information is rendered to end users as coordinated HTML descriptions of each clinical trial using the Callimachus template engine. Machine-readable versions of the data are also available.
Linked Data techniques can help to address both the availability of clinical trial information and provide a means to build effective information systems using it. Linked Data techniques allow for "cooperation without coordination": publishers of data provide context for use by third parties in other portions of a distributed enterprise. Users of Linked Data can combine information from multiple sources. Subsequent publication can create a virtuous circle of positive feedback, allowing researchers, informaticists and support staff to collaboratively and distributively build a reusable knowledge base.
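The gather-convert-merge-render pipeline described above can be sketched in a few lines. This is an illustrative stand-in, not the Callimachus implementation: the endpoint names, canned payloads and converter logic are all invented, and plain tuples stand in for RDF triples.

```python
# Hypothetical sketch of Active PURL resolution: several endpoints are
# queried independently, each result is converted to triples, and all
# results are merged into one temporary graph for rendering.

def fetch(endpoint: str) -> str:
    # Stand-in for an HTTP GET returning XML or textual content; real code
    # would use an HTTP client and handle timeouts and auth per endpoint.
    canned = {
        "internal-trials": "NCT001|Phase II|recruiting",
        "registry": "NCT001|sponsor=AZ",
    }
    return canned[endpoint]

def to_triples(source: str, payload: str) -> set:
    # Convert a delimited payload into (subject, predicate, object) triples,
    # recording which source each statement came from.
    study, *fields = payload.split("|")
    triples = {(study, "source", source)}
    for field in fields:
        triples.add((study, "property", field))
    return triples

def resolve_active_purl(endpoints) -> set:
    graph = set()                    # single, temporary merged graph
    for ep in endpoints:             # each target queried independently
        graph |= to_triples(ep, fetch(ep))
    return graph

graph = resolve_active_purl(["internal-trials", "registry"])
```

A template engine would then render `graph` as a coordinated HTML description of the study; because the merge is a simple set union of triples, new sources can be added without coordinating with existing ones.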
Distributed queries have many known limitations, such as the introduction of multiple single points of failure into any given PURL resolution. HTTP timeouts, authentication/authorization errors or other network failures can slow or stop a pipeline from returning correctly. Similarly, distributed queries can exhibit variable query-time performance due to complex network and endpoint performance variances.
Proactive caching and cache management strategies can improve runtime performance and protect end users from the limitations inherent in a distributed query architecture. Caching of intermediate results from endpoints has not yet been implemented.
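A minimal sketch of the proposed endpoint caching follows: cached payloads expire after a time-to-live, shielding users from slow or failing endpoints on repeat resolutions. The class, fetcher and TTL value are illustrative assumptions, since the poster notes this was not yet implemented.

```python
# Hypothetical TTL cache for Active PURL endpoint results. On a fresh hit
# the cached payload is returned; on a miss or stale entry the endpoint is
# re-fetched. All names here are assumptions for illustration.
import time

class EndpointCache:
    def __init__(self, fetcher, ttl_seconds=300):
        self.fetcher = fetcher
        self.ttl = ttl_seconds
        self._store = {}  # endpoint -> (fetched_at, payload)

    def get(self, endpoint: str) -> str:
        entry = self._store.get(endpoint)
        if entry and time.time() - entry[0] < self.ttl:
            return entry[1]                   # fresh cache hit
        payload = self.fetcher(endpoint)      # miss or stale: re-fetch
        self._store[endpoint] = (time.time(), payload)
        return payload

# Demonstration with a fake fetcher that records each network call.
calls = []
def fake_fetch(endpoint):
    calls.append(endpoint)
    return f"data-from-{endpoint}"

cache = EndpointCache(fake_fetch, ttl_seconds=60)
first = cache.get("registry")    # fetches from the endpoint
second = cache.get("registry")   # served from cache, no second fetch
```

Proactive refresh would be a small extension: re-fetch entries shortly before their TTL expires so users rarely pay the cost of a cold endpoint.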
User experience
1. Users resolve a URL that provides a unique identifier for a clinical study, drug, chemical or other concept managed by this system. The user may be presented with the URL on HTML pages, search for it via full-text techniques or discover it via semantic search.
2. Users are presented with a dynamically generated Web page representing aggregated clinical study information. Users are isolated from the complex and distributed information environment.
41. Your Opportunity?
• Linked Data warehouses: 10B USD annually
• Linked Data supply chains: 205M USD annually (Web); 6B USD annually (enterprise)
• Linked Data analytics: 16B USD annually