Climbing the Learning Curve with Linked Data
Open Government Data Camp, 20-Oct-2011
Bernadette Hyland, CEO
firstname.lastname@example.org
Twitter @BernHyland
Wednesday, October 19, 2011
Information overload. Impatient society. Change is the only constant.
Software is not valued by its usefulness ... but by its expected future value.
• Linked Data is about publishing and consuming data using international data standards
• Based on a 20-year-old idea
• Goal is to solve organizational issues related to data silos, requirements for faster data integration and an environment of reduced IT budgets
Why am I speaking on Linked Data and sharing? I’m here in my role as the co-chair of the W3C Government Linked Data Working Group (GLD WG). I’m also a long-time entrepreneur in this space, having founded companies that led to several of the most widely used Open Source projects for Linked Data, including Mulgara, OpenRDF/Sesame, PURLs 2.0 and Callimachus. I’ve authored chapters in two of these peer-reviewed books published by Springer, which are available in hardcopy or for free via the Web.
There is a Process
Identify, Model, Name, Describe, Convert, Publish, Maintain
Identify the data and model exemplar records: decide what you are going to carry forward and what you are going to leave behind. Name all of the NOUNs. Turn the records into URIs. Next, describe RESOURCES with vocabularies. Write a script or process to convert from canonical form to RDF. Then publish. Maintain over time.
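The "Name" step above (turn the records into URIs) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the base URI `http://example.org/id/`, the `mint_uri` helper and the sample records are all hypothetical and do not come from the talk.

```python
from urllib.parse import quote

# Hypothetical base URI; a real project would use a domain it controls.
BASE = "http://example.org/id/"


def mint_uri(kind: str, key: str) -> str:
    """Build a URI for a real-world thing from its type and a stable key."""
    # Lower-case the key, turn spaces into hyphens and percent-encode the rest.
    slug = quote(key.strip().lower().replace(" ", "-"), safe="-")
    return f"{BASE}{kind}/{slug}"


# Hypothetical exemplar records: the nouns identified during modeling.
records = [
    {"kind": "place", "name": "Warsaw"},
    {"kind": "person", "name": "Ada Lovelace"},
]

uris = [mint_uri(r["kind"], r["name"]) for r in records]
```

Deriving the URI from a stable key, rather than from a row number or an application-specific identifier, is one way to keep the names usable when the underlying database changes.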
Preparation
1. Leverage what exists
• Request a copy of the logical and physical model of the database(s)
• Obtain data extracts (i.e., databases and/or spreadsheets) or create data in a way that can be replicated.
Linked Data modelers typically model two or three exemplar objects to begin the process. We figure out the relationships and identify how each object relates to the real world, initially drawing on a large white board or collaborative wiki site.
Model the data
2. Model data without context to allow for reuse and easier merging of data sets
• Traditional DBAs organize data for specified Web services or applications.
• With LD, application logic does not drive the data schema, concepts, etc.
LD domain experts model data without context, whereas traditional modelers typically organize data for specified Web services or applications. Because application logic does not drive the data schema, the data is easier to reuse and to merge with other data sets.
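To see why context-free modeling makes merging easier, consider a minimal Python sketch in which each data set is just a set of (subject, predicate, object) triples. All URIs and values below are hypothetical placeholders, and the `describe` helper is introduced only for this illustration.

```python
# Two independently produced data sets about the same (hypothetical) site.
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
SITE = "http://example.org/id/site/42"

registry = {
    (SITE, RDFS_LABEL, "Example Site"),
}
inspections = {
    (SITE, "http://example.org/vocab/inspectedOn", "2011-10-19"),
    (SITE, "http://example.org/vocab/inspectedBy", "http://example.org/id/person/ada-lovelace"),
}

# Merging is a set union: no schema negotiation, no join tables.
merged = registry | inspections


def describe(graph, subject):
    """Return every (predicate, object) pair known about a subject, sorted."""
    return sorted((p, o) for s, p, o in graph if s == subject)


facts = describe(merged, SITE)
```

Because both sets name the site with the same URI, the union automatically brings the label and the inspection facts together under one subject.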
Model the data
3. Look for real world objects of interest (e.g., people, places, things, locations, etc.) and model them.
• Investigate how others are already modeling similar or related data.
• Look for duplication and normalize the data
• Use common sense to decide whether or not to make a link
Model the data ...
4. Connect data from different sources and authoritative vocabularies (see list of popular vocabularies below).
• Use URIs as names for your objects
During the modeling process, don’t think about how an application will use your data. Instead, focus on modeling real world things that are known about the data and how they relate to other objects. Take the time to understand the data and how the objects represented in the data are related to each other.
Model the data ...
• Put aside immediate needs of any application
• Don’t think about how an application will use your data
• Do think about time and how the data will change over time.
Convert, Publish & Maintain
5. Write a script or process to convert the data set repeatedly
6. Publish to the Web and announce it! (more details shortly)
7. Maintenance strategy (more details in the social contract at the end)
For the identifiers you publish:
1. Expect them to be maintained in perpetuity
2. Do not encode the name of the department or agency currently defining and naming a concept, as that may be re-assigned
3. Support a direct response, or redirect to department/agency servers
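Step 5 can be as small as a script that reads a data extract and emits RDF. The sketch below is one possible shape, not the process used in the talk: it assumes a CSV extract with hypothetical `id` and `name` columns, a hypothetical base URI, and N-Triples as the output syntax. It uses only the Python standard library.

```python
import csv
import io

# Hypothetical base URI; note it does not encode a department or agency name.
BASE = "http://example.org/id/substance/"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def escape(text: str) -> str:
    """Escape backslashes, quotes and line breaks for an N-Triples literal."""
    return (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def rows_to_ntriples(csv_text: str) -> list:
    """Convert each CSV row into one rdfs:label triple in N-Triples syntax."""
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Assumes the id column already contains only URI-safe characters.
        subject = f"<{BASE}{row['id']}>"
        label = escape(row["name"])
        lines.append(f'{subject} <{RDFS_LABEL}> "{label}" .')
    # Sorting makes repeated runs over the same extract byte-for-byte identical.
    return sorted(lines)


SAMPLE = "id,name\n101,Water\n102,Sodium chloride\n"
triples = rows_to_ntriples(SAMPLE)
```

Keeping the conversion deterministic means the script can be rerun whenever the source data changes, and two runs can be compared with an ordinary diff.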
Take the plunge ... Be forgiving
• Simplistic data models can still be useful
• Better to make progress with something rather than do nothing because we cannot be comprehensive and complete
Science still doesn’t have a good understanding of a gene. We have gene therapy, yet we haven’t agreed on a definition of a gene. We capture vast quantities of topographical data (USGS), yet scientists still debate the meaning of topographical elements. From the time we are young children, we use monosyllabic words to navigate trees and roads. If our parents had said we could not do anything until we had a perfect model of the world, we would never have learned to navigate our home as toddlers.
Take an iterative approach
1. Review of modeling decisions
2. Review vocabularies chosen and developed
3. Modify/update data conversion scripts
4. Do a maintenance walk-through with real use cases
5. Show how to explore data with SPARQL and visualizations
6. Discuss a persistent identifier strategy (think PURLs)
Iterate on this process in short sprints, two weeks at a time. Don’t be afraid to review modeling decisions with SMEs. Review vocabulary choices. Do a maintenance walk-through with actual use cases and ensure the team can carry forward. Show people their OWN DATA in visualization tools like Callimachus.
We used two common RDF vocabulary description languages in our modeling for SRS: RDF Schema (RDFS) and Simple Knowledge Organization System (SKOS). RDFS is used to give labels to objects, synonyms and substance lists. Human-readable comments were added using the rdfs:comment property.
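A description of this kind could look like the Turtle produced by the Python sketch below. It is an assumption-laden illustration, not the SRS model itself: the substance URI, the label, synonym and comment text are hypothetical, and the use of skos:altLabel for synonyms is a guess at one common SKOS pattern rather than something the talk specifies.

```python
PREFIXES = (
    "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .\n"
    "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .\n"
)


def describe_substance(uri, label, synonyms=(), comment=None):
    """Return a Turtle description with a label, synonyms and a comment.

    Values are assumed to contain no quotes, backslashes or line breaks,
    so no literal escaping is done in this sketch.
    """
    props = [f'rdfs:label "{label}"']
    # Assumption: synonyms are modeled as skos:altLabel.
    props += [f'skos:altLabel "{s}"' for s in synonyms]
    if comment:
        props.append(f'rdfs:comment "{comment}"')
    body = " ;\n    ".join(props)
    return f"{PREFIXES}\n<{uri}>\n    {body} .\n"


turtle = describe_substance(
    "http://example.org/id/substance/101",  # hypothetical URI
    "Example substance",
    synonyms=["Sample compound"],
    comment="Human-readable note about this record.",
)
```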
Possible Solutions for Data Management
• Roll your own three-tier
• Content Management System
• Wiki-based
• Linked Data Management System
A few different possible solutions to the three challenges stated earlier.
Content Management Systems
The big downside to a three-tier architecture is the upfront cost, as well as getting people to agree upfront on the schema. So we then looked at CMS. These systems can be up and running the same day; however, they are architected to work well primarily with unstructured content.
We have a strong heritage in FLOSS projects, starting with the first community-supported RDF database in 2003. We offered a commercial version used primarily by the US defense community, and in 2004 open sourced 80% of it into what became the Mulgara triple store, which is used by institutions all over the world. OpenRDF and Sesame were led by Aduna.
Linked Data Management System
Callimachus (kəlĭməkəs) is a framework for data-driven applications based on Linked Data principles. Callimachus allows Web authors to quickly and easily create semantically-enabled Web applications.
Wiki systems don’t handle structured content well, nor do they promulgate change well. A tool for Web 2.0 developers creating DATA RICH web sites was needed. We created Callimachus, a triples up & down solution (no MySQL under the covers). HIGHLY SCALABLE for real world use.
Named for the father of Bibliography (The Pinakes) at the Great Library of Alexandria, who lived 305-c. 240 BCE. He could not categorize his own work using Aristotle’s hierarchical system. He was the first person who defined the use case for Linked Data.
Callimachus uses RDFa as a query language; templates are parsed to build SPARQL from RDFa markup, and the query result set is returned to the Web page for a human to read or a machine to parse. This is very valuable and, to our knowledge, there is no other solution available as FLOSS or commercially that compares to Callimachus at this time.
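The general idea of building a SPARQL query from RDFa markup can be illustrated with a toy Python sketch. This is not Callimachus's parser and does not reproduce its template syntax: the template string, the `?this` subject variable, the generated `?v0`, `?v1` variable names and the `template_to_sparql` helper are all assumptions made for this example.

```python
from html.parser import HTMLParser


class PropertyCollector(HTMLParser):
    """Collect the values of RDFa `property` attributes in document order."""

    def __init__(self):
        super().__init__()
        self.properties = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "property" and value:
                self.properties.append(value)


def template_to_sparql(template, prefixes):
    """Build a SELECT query with one triple pattern per RDFa property."""
    collector = PropertyCollector()
    collector.feed(template)
    collector.close()
    header = "".join(f"PREFIX {p}: <{iri}>\n" for p, iri in prefixes.items())
    variables = [f"?v{i}" for i in range(len(collector.properties))]
    patterns = "".join(
        f"  ?this {prop} {var} .\n"
        for prop, var in zip(collector.properties, variables)
    )
    return f"{header}SELECT ?this {' '.join(variables)}\nWHERE {{\n{patterns}}}"


# Hypothetical template: an HTML fragment annotated with RDFa properties.
TEMPLATE = (
    '<div about="?this">'
    '<h1 property="rdfs:label"></h1>'
    '<p property="rdfs:comment"></p>'
    "</div>"
)
query = template_to_sparql(
    TEMPLATE, {"rdfs": "http://www.w3.org/2000/01/rdf-schema#"}
)
```

The point of the sketch is the direction of travel: the page author writes markup, and the query is derived from it, so the same template describes both what to fetch and how to display it.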
Once we had the data modeled and validated with SMEs, we converted it and loaded it into Callimachus. We spent about one hour creating templates to view the data in Callimachus. So here is the power of LOD in action: within one hour, we could view the data, navigate through it and verify the contents without being a DBA or Java developer!
Callimachus’ forms-driven interface allows authorized users to modify the underlying triples in the database: we are round-tripping create/modify/delete to a triple store via a Web page!
Note the fixed name and added comment.
A history of changes is kept. Note the change to the name and the added comment, along with the time/date and name of the user who made the edit.
Callimachus view page of the SRS, created in less than an hour. Someone with HTML, CSS and RDFa / SPARQL skills can create this type of page. No understanding of semantics or deep RDF knowledge is required.
Notice the wiki-like editing capabilities of a Callimachus page!
• Web 2.0 developers can create data-driven applications with templates in hours
• Triples up & down (no MySQL under the covers)
• Wiki editing of content
• Access control
• Collaboration via Web
• Change tracking (history)
• Page/form templates
Callimachus is a great way to collaboratively manage your Linked Data. MediaWiki is to free text what Callimachus is to linked data. Callimachus uses a straightforward ACL for linked data.
Join the Community
• Callimachus has benefited from 2+ years of corporate support
• We’re using it for real world Web applications in environmental protection, finance and healthcare
• We’d love to work with the publishing industry
• Open Source project
• Visit callimachusproject.org
• Join the discussion
@BernHyland
Email: email@example.com
Next talk today @ 14:00, Sala I: “Linked Open Government Data Workshop”
WHY SHARE AND WHO BENEFITS?
Bernadette Hyland, co-chair W3C Government Linked Data Working Group
http://purl.org/net/bhyland/why-share-2011-10