Domain Semantics or: How I Learned to Stop Worrying and Love the Ontology
Michael Lang Jr.
Director of Ontology Services
Revelytix, Inc.
Software Today
Siloed Information Management - Good
● Drives nearly all day-to-day operations of any business
● Optimized for transactions - ACID, CRUD
Distributed Information Management - Bad
● Massive amounts of data generated
● Operations mindset assumes single application context
  ○ Stovepipes and silos
● Managing distributed information is extremely difficult
  ○ Analysis!
Use Cases
● Pharma
  ○ Drug Pipeline Management
● Department of Defense
  ○ Enterprise Information Web
● Financial Services
  ○ Back-office Trade Data Analysis
Pharma - Drug Pipeline Management
● Long, expensive development time
● Low success rate
Pharma - Drug Pipeline Management
● Better ability to analyze data improves success rate
● Any increase in the success rate of drugs in the pipeline can represent a huge ROI
  ○ Kill one drug in Phase 1 instead of Phase 3, save $1 billion
DOD - Business Mission Area
● Services own and operate their own systems
● Numerous OSD-level reporting requirements (LAWS!)
  ○ DOD is not auditable (ILLEGAL!)
Financial Services - Back-office Trade Data Analysis
● Many compliance regulations, written by national and global bodies, change often
● Relatively small IT ecosystem, yet it still can't meet compliance reporting requirements
Financial Services - Back-office Trade Data Analysis
● Inability to meet compliance reporting requirements means regulations are impossible to enforce
● Unenforceable regulations are effectively no regulations, and that means trouble...
  ○ 2008 financial crisis
Problem Summary
● Most major business operations involve many different groups of people
  ○ Different Roles
  ○ Different Organizations
  ○ Different Companies
● Different groups use different systems
  ○ Systems are built in silos
● Many different sets of semantics
● Many different schemas
Analyzing and sharing data is difficult
Distributed Information Management
Enterprises require a new paradigm of information technology where distributed information is assumed...
Semantics
Capabilities include (some good buzzwords):
● Data Integration
  ○ Virtualization
  ○ Federation
● Data Quality
  ○ Provenance
  ○ Validation
● Data Discovery
Semantic Technology: The Ground Floor
Anyone can say Anything about Anything
Standards
● URI - the universal identification scheme
  ○ URL - the universal location scheme
● RDF - the data model
● SPARQL - the query language
Benefits
● URIs give universal (non-local) identifiers to things
● RDF is schema-less, so extensibility is not an issue
● RDF merge defines a standard way to combine disparate datasets
● The SPARQL specification defines federation capabilities
● SPARQL operates over HTTP using URLs
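The RDF merge benefit can be illustrated with a small, standard-library-only Python sketch. Triples are plain tuples of strings written as prefixed names (such as `ex:drug42`) for brevity, and blank nodes use the N-Triples-style `_:label` convention; these representation choices and the `ex:` data are assumptions of the sketch, not any library's API. The essential step of a merge is renaming blank nodes so two graphs cannot accidentally share one, while URI-identified resources join up on their own.

```python
from itertools import count

# A triple is (subject, predicate, object); terms are plain strings.
# Terms starting with "_:" are blank nodes (local to one graph); every
# other term is a global URI or literal and is never renamed.
Triple = tuple[str, str, str]


def is_bnode(term: str) -> bool:
    return term.startswith("_:")


def _rename(term: str, renames: dict[str, str], fresh: count) -> str:
    """Map a blank node to a fresh label, reusing it within one graph."""
    if not is_bnode(term):
        return term
    if term not in renames:
        renames[term] = f"_:m{next(fresh)}"
    return renames[term]


def merge(*graphs: set[Triple]) -> set[Triple]:
    """Union the graphs after standardizing their blank nodes apart."""
    fresh = count()  # every input blank node gets a new label, so none collide
    merged: set[Triple] = set()
    for graph in graphs:
        renames: dict[str, str] = {}  # blank-node map for this graph only
        for triple in graph:
            merged.add(tuple(_rename(t, renames, fresh) for t in triple))
    return merged


trials = {
    ("ex:drug42", "ex:phase", '"Phase 1"'),
    ("ex:drug42", "ex:site", "_:s1"),
}
finance = {
    ("ex:drug42", "ex:spend", '"120"'),
    ("_:s1", "ex:country", '"US"'),  # a different _:s1 than the one in `trials`
}
merged = merge(trials, finance)
print(len(merged))  # 4
```

Because `ex:drug42` is a URI, facts from both sources attach to the same resource; the two `_:s1` labels are local to their graphs and stay distinct after the merge.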
Why Ontology?
Remaining Challenges
● URIs/RDF/SPARQL take you a long way, but you are not home yet
● Distributed data is easy to combine and access, but difficult to interpret
  ○ How do you know how to combine data?
  ○ How do you find the data you need?
  ○ How do you know what anything means?
Domain Ontology
Machine- and human-readable description of a domain
● Expressed as RDF
  ○ Part of your data
  ○ Meta layers depend on your point of view, not your toolset
● Formal semantics
  ○ Define your vocabulary with precision
  ○ Infer new information
  ○ Detect data quality issues
● Layered Descriptions
  ○ Easily combine one type of description with another
    ■ Data model, Provenance, Architecture, Standards, Policies, Processes... anything
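To make "infer new information" and "detect data quality issues" concrete, here is a small Python sketch that forward-chains three of the RDFS entailment rules (rdfs2 for `rdfs:domain`, rdfs9 and rdfs11 for `rdfs:subClassOf`) and then checks `owl:disjointWith` declarations against the inferred types. It is a naive fixpoint loop, not a complete RDFS or OWL reasoner; the `ex:` classes and data are hypothetical, and terms are written as prefixed-name strings for readability.

```python
# Terms are prefixed names (e.g. "rdfs:subClassOf") for readability.
TYPE, SUBCLASS, DOMAIN = "rdf:type", "rdfs:subClassOf", "rdfs:domain"
DISJOINT = "owl:disjointWith"


def infer(triples):
    """Apply rdfs2, rdfs9 and rdfs11 repeatedly until nothing new appears."""
    closure = set(triples)
    while True:
        new = set()
        for s, p, o in closure:
            for s2, p2, o2 in closure:
                if p2 == DOMAIN and p == s2:  # rdfs2: predicate p has domain o2
                    new.add((s, TYPE, o2))
                if p == TYPE and p2 == SUBCLASS and o == s2:  # rdfs9
                    new.add((s, TYPE, o2))
                if p == SUBCLASS and p2 == SUBCLASS and o == s2:  # rdfs11
                    new.add((s, SUBCLASS, o2))
        if new <= closure:  # fixpoint reached
            return closure
        closure |= new


def disjointness_violations(triples):
    """Return (resource, class_a, class_b) for resources typed with two
    classes that are declared owl:disjointWith each other."""
    types = {}
    for s, p, o in triples:
        if p == TYPE:
            types.setdefault(s, set()).add(o)
    return sorted(
        (s, a, b)
        for a, p, b in triples
        if p == DISJOINT
        for s, classes in types.items()
        if a in classes and b in classes
    )


ontology = {
    ("ex:Compound", SUBCLASS, "ex:Substance"),
    ("ex:inPhase", DOMAIN, "ex:Compound"),
    ("ex:Compound", DISJOINT, "ex:Organization"),
}
data = {
    ("ex:drug42", "ex:inPhase", "ex:Phase1"),
    ("ex:drug42", TYPE, "ex:Organization"),  # a data-entry error
}
closure = infer(ontology | data)
print(("ex:drug42", TYPE, "ex:Substance") in closure)  # True
print(disjointness_violations(closure))
# [('ex:drug42', 'ex:Compound', 'ex:Organization')]
```

The data never states that `ex:drug42` is a substance; that follows from the ontology. The same inferred type exposes the bad `ex:Organization` assertion as a quality issue.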
Data Integration, Federation, and Virtualization
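Federation and virtualization can be sketched in a few lines of Python: a query is answered by asking each source directly and joining the partial answers, rather than first copying every source into one store. The sources, `ex:` terms, and helper names below are hypothetical. Real federated SPARQL engines (for example through the SPARQL 1.1 `SERVICE` keyword) send patterns to remote endpoints over HTTP and optimize join order; this sketch does neither.

```python
# Each source stands in for a separate system (e.g. a remote SPARQL
# endpoint); here it is just an in-memory set of triples.


def match(pattern, triple, binding):
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    result = dict(binding)
    for pat, term in zip(pattern, triple):
        if pat.startswith("?"):  # a variable
            if result.setdefault(pat, term) != term:
                return None  # variable already bound to a different term
        elif pat != term:
            return None  # constant does not match
    return result


def federated_bgp(patterns, sources):
    """Evaluate a conjunctive (basic graph pattern) query across sources.

    Each triple pattern is asked of every source and the partial results
    are joined, so one answer may combine facts held by different systems.
    The data stays where it is; nothing is copied into a central store.
    """
    bindings = [{}]
    for pattern in patterns:
        extended = []
        for binding in bindings:
            for source in sources:
                for triple in source:
                    m = match(pattern, triple, binding)
                    if m is not None:
                        extended.append(m)
        bindings = extended
    return bindings


clinical = {
    ("ex:drug42", "ex:inPhase", "ex:Phase1"),
    ("ex:drug7", "ex:inPhase", "ex:Phase3"),
}
finance = {("ex:drug42", "ex:spend", '"120"')}

query = [("?drug", "ex:inPhase", "?phase"), ("?drug", "ex:spend", "?spend")]
print(federated_bgp(query, [clinical, finance]))
# [{'?drug': 'ex:drug42', '?phase': 'ex:Phase1', '?spend': '"120"'}]
```

The answer for `ex:drug42` joins a clinical fact with a financial one; `ex:drug7` drops out because no source holds a spend figure for it.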
Ontology Architecture
● A collection of descriptions used to enable a specific set of analytic use cases
  ○ Enumerates the set of ontologies to be used
  ○ Defines the high-level structure and logical profile of individual ontologies
  ○ Defines relationships between ontologies
● Not defined in a vacuum
  ○ Domain Ontology
  ○ Metadata Ontology
  ○ Executable Semantic Languages
    ■ R2RML, SPARQL, RIF
  ○ Tools!
    ■ Triple stores, query engines, RDB2RDF translators, rule engines, existing applications, etc.
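R2RML and RDB2RDF translators turn relational rows into RDF triples according to a declarative mapping. The Python sketch below imitates that idea with the standard-library `sqlite3` module and a hypothetical mapping dictionary whose keys loosely mirror R2RML concepts (subject template, class, predicate/column pairs). The `trades` table, the mapping keys, and the `example.org` URIs are invented for illustration; a real deployment would write the mapping in R2RML's own RDF vocabulary and run it with an R2RML processor.

```python
import sqlite3

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical mapping, loosely modeled on R2RML's logical table, subject map
# (template + class) and predicate-object maps (predicate + column).
# This is NOT an R2RML processor.
TRADE_MAPPING = {
    "table": "trades",
    "key": "id",
    "subject_template": "http://example.org/trade/{id}",
    "class": "http://example.org/Trade",
    "columns": {
        "counterparty": "http://example.org/counterparty",
        "notional": "http://example.org/notional",
    },
}


def rows_to_triples(conn, mapping):
    """Yield (subject, predicate, object) triples for each row of the mapped table."""
    key, columns = mapping["key"], list(mapping["columns"])
    # Table and column names come from the trusted mapping, never from user input.
    sql = f"SELECT {', '.join([key, *columns])} FROM {mapping['table']}"
    for values in conn.execute(sql):
        row = dict(zip([key, *columns], values))
        subject = mapping["subject_template"].format(**{key: row[key]})
        yield (subject, RDF_TYPE, mapping["class"])
        for column, predicate in mapping["columns"].items():
            if row[column] is not None:  # as in R2RML, a NULL produces no triple
                yield (subject, predicate, f'"{row[column]}"')  # N-Triples-style literal


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trades (id INTEGER PRIMARY KEY, counterparty TEXT, notional REAL)")
conn.executemany(
    "INSERT INTO trades VALUES (?, ?, ?)",
    [(1, "ACME", 5000000.0), (2, None, 2500000.0)],
)
triples = list(rows_to_triples(conn, TRADE_MAPPING))
for triple in triples:
    print(triple)
```

Keeping the mapping as data rather than code is the point of the R2RML approach: the same translator can serve any table once a mapping for it has been written.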
Data Provenance
W3C Provenance Ontology in development by the Provenance Working Group
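Provenance descriptions are just more RDF, so a lineage question such as "which sources fed this report?" becomes a graph traversal. The Python sketch below uses two properties from the W3C PROV-O namespace, `prov:wasDerivedFrom` and `prov:wasAttributedTo`; the `ex:` entities are hypothetical, and only the derivation chain is followed.

```python
PROV = "http://www.w3.org/ns/prov#"
WAS_DERIVED_FROM = PROV + "wasDerivedFrom"
WAS_ATTRIBUTED_TO = PROV + "wasAttributedTo"


def lineage(entity, triples):
    """Return every entity that `entity` was derived from, directly or transitively."""
    parents = {}
    for s, p, o in triples:
        if p == WAS_DERIVED_FROM:
            parents.setdefault(s, set()).add(o)
    seen, stack = set(), [entity]
    while stack:  # depth-first walk; `seen` also guards against cycles
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


provenance = {
    ("ex:complianceReport", WAS_DERIVED_FROM, "ex:dailyAggregate"),
    ("ex:dailyAggregate", WAS_DERIVED_FROM, "ex:tradesDB"),
    ("ex:dailyAggregate", WAS_DERIVED_FROM, "ex:positionsDB"),
    ("ex:complianceReport", WAS_ATTRIBUTED_TO, "ex:riskTeam"),  # not part of lineage
}
print(sorted(lineage("ex:complianceReport", provenance)))
# ['ex:dailyAggregate', 'ex:positionsDB', 'ex:tradesDB']
```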
Distributed Information Management
● Integration, Virtualization, Federation, Quality, Provenance, Validation, Discovery
● Semantic Technologies lay the foundation for a new paradigm
  ○ RDF, RDFS, OWL, SPARQL, R2RML, RIF, Provenance Ontology...
● Tools are catching up
● Domain Ontology makes sense of it all
Questions?
See Revelytix.com for more information
Thank You!
Michael Lang Jr.
email@example.com