These slides were presented at the Departmental Seminar of the Computer Science Department, University of Oxford, on 2015-11-10. They explain the principles of Datalog reasoning with RDFox, describe how the HEDIS CDC measures were computed for 265K patients using this technology, present the HL7 RIM-inspired data model, and show an example of an RDFox Datalog rule used to compute the results.
What is the fuss about triple stores? Will triple stores eventually replace relational databases? This talk looks at the big picture, explains the technology, and tries to look at the road ahead.
CIDOC Congress, Dresden, Germany
2014-09-05: International Terminology Working Group: full version (http://vladimiralexiev.github.io/pres/20140905-CIDOC-GVP/index.html)
2014-09-09: Getty special session: short version (http://VladimirAlexiev.github.io/pres/20140905-CIDOC-GVP/GVP-LOD-CIDOC-short.pdf)
YesWorkflow: More Provenance Mileage from Scientific Workflows and Scripts. Keynote at WORKS 2015: Workshop on Workflows in Support of Large-Scale Science. Sunday, Nov. 15, 2015, Austin, Texas.
Study after study shows that data scientists spend 50-90 percent of their time gathering and preparing data. In many large organizations this problem is exacerbated by data being stored on a variety of systems, with different structures and architectures. Apache Drill is a relatively new tool which can help solve this difficult problem by allowing analysts and data scientists to query disparate datasets in place using standard ANSI SQL, without having to define complex schemata or rebuild their entire data infrastructure. In this talk I will introduce the audience to Apache Drill, including some hands-on exercises, and present a case study of how Drill can be used to query a variety of data sources. The presentation will cover:
* How to explore and merge data sets in different formats
* Using Drill to interact with other platforms such as Python
* Exploring data stored on different machines
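As a flavour of the hands-on part, here is a minimal sketch of submitting such a query from Python over Drill's REST API (assumptions: a Drill instance on localhost:8047 with the dfs storage plugin enabled and CSV configured to read header rows; the file paths and column names are hypothetical):

    import requests

    query = """
        SELECT u.name, COUNT(*) AS n_events
        FROM dfs.`/data/events.json` AS e
        JOIN dfs.`/data/users.csv` AS u ON e.user_id = u.user_id
        GROUP BY u.name
    """
    # Drill exposes a REST endpoint that accepts ANSI SQL and returns JSON rows
    resp = requests.post("http://localhost:8047/query.json",
                         json={"queryType": "SQL", "query": query})
    resp.raise_for_status()
    for row in resp.json()["rows"]:
        print(row)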
A talk presented at an NSF Workshop on Data-Intensive Computing, July 30, 2009.
Extreme scripting and other adventures in data-intensive computing
Data analysis in many scientific laboratories is performed via a mix of standalone analysis programs, often written in languages such as Matlab or R, and shell scripts, used to coordinate multiple invocations of these programs. These programs and scripts all run against a shared file system that is used to store both experimental data and computational results.
While superficially messy, the flexibility and simplicity of this approach makes it highly popular and surprisingly effective. However, continued exponential growth in data volumes is leading to a crisis of sorts in many laboratories. Workstations and file servers, even local clusters and storage arrays, are no longer adequate. Users also struggle with the logistical challenges of managing growing numbers of files and computational tasks. In other words, they face the need to engage in data-intensive computing.
We describe the Swift project, an approach to this problem that seeks not to replace the scripting approach but to scale it, from the desktop to larger clusters and ultimately to supercomputers. Motivated by applications in the physical, biological, and social sciences, we have developed methods that allow for the specification of parallel scripts that operate on large amounts of data, and the efficient and reliable execution of those scripts on different computing systems. A particular focus of this work is on methods for implementing, in an efficient and scalable manner, the POSIX file system semantics that underpin scripting applications. These methods have allowed us to run applications unchanged on workstations, clusters, infrastructure as a service ("cloud") systems, and supercomputers, and to scale applications from a single workstation to a 160,000-core supercomputer.
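To make the "scripting" pattern concrete, here is a minimal sketch of the kind of many-task script Swift scales up, written with the Python standard library only (the analysis program name and file layout are hypothetical; Swift itself expresses this more declaratively and runs it on clusters and supercomputers):

    import glob
    import subprocess
    from concurrent.futures import ThreadPoolExecutor

    def analyze(path):
        out = path + ".result"
        # stand-in for a Matlab/R analysis program reading and writing the shared file system
        subprocess.run(["analyze_sample", path, "-o", out], check=True)
        return out

    # run many independent program invocations in parallel, one per input file
    with ThreadPoolExecutor(max_workers=8) as pool:
        results = list(pool.map(analyze, glob.glob("experiments/*.dat")))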
Swift is one of a variety of projects in the Computation Institute that seek individually and collectively to develop and apply software architectures and methods for data-intensive computing. Our investigations seek to treat data management and analysis as an end-to-end problem. Because interesting data often has its origins in multiple organizations, a full treatment must encompass not only data analysis but also issues of data discovery, access, and integration. Depending on context, data-intensive applications may have to compute on data at its source, move data to computing, operate on streaming data, or adopt some hybrid of these and other approaches.
Thus, our projects span a wide range, from software technologies (e.g., Swift, the Nimbus infrastructure as a service system, the GridFTP and DataKoa data movement and management systems, the Globus tools for service oriented science, the PVFS parallel file system) to application-oriented projects (e.g., text analysis in the biological sciences, metagenomic analysis, image analysis in neuroscience, information integration for health care applications, management of experimental data from X-ray sources, diffusion tensor imaging for computer aided diagnosis), and the creation and operation of national-scale infrastructures, including the Earth System Grid (ESG), cancer Biomedical Informatics Grid (caBIG), Biomedical Informatics Research Network (BIRN), TeraGrid, and Open Science Grid (OSG).
For more information, please see www.ci.uchicago.edu/swift.
BioBankCloud: Machine Learning on Genomics + GA4GH @ Med at Scale - Andy Petrella
A talk given at the BioBankCloud conference in Feb 2015 about distributed computing in the contexts of genomics and health.
In this talk, we presented the results we obtained exploring the 1000 Genomes data using ADAM, followed by an introduction to our scalable GA4GH server implementation built with ADAM, Apache Spark and Play Framework 2.
Introduction to Pig & Pig Latin | Big Data Hadoop Spark Tutorial | CloudxLab
Big Data with Hadoop & Spark Training: http://bit.ly/2LF3pBA
This CloudxLab Introduction to Pig & Pig Latin tutorial helps you to understand Pig and Pig Latin in detail. Below are the topics covered in this tutorial:
1) Introduction to Pig
2) Why Do We Need Pig?
3) Pig - Use Cases
4) Pig - Philosophy
5) Pig Latin - Data Flow Language
6) Pig - Local and MapReduce Mode
7) Pig Data Types
8) Load, Store, and Dump in Pig
9) Lazy Evaluation in Pig
10) Pig - Relational Operators - FOREACH, GROUP and FILTER
11) Hands-on with Pig - Calculate the Average Dividend on NYSE (a PySpark analogue is sketched below)
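The tutorial's hands-on is in Pig Latin; purely as an illustration of the final exercise, here is the same average-dividend aggregation sketched in PySpark (file name and column names are hypothetical):

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("nyse-avg-dividend").getOrCreate()
    # assumed schema: symbol, date, dividend
    nyse = spark.read.csv("NYSE_dividends.csv", header=True, inferSchema=True)
    nyse.groupBy("symbol") \
        .agg(F.avg("dividend").alias("avg_dividend")) \
        .show()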
As Daniel Tiger wisely sings, "It's OK to make mistakes. Try to fix them, and learn from them too."
Come learn common mistakes developers make as they model their data in document databases. You'll leave this session ready to spot and correct common document database schema design anti-patterns.
Distributed Query Processing for Federated RDF Data Management - Olaf Goerlitz
PhD defense talk about SPLENDID, a state-of-the-art implementation for efficient distributed SPARQL query processing on Linked Data using SPARQL endpoints and voiD descriptions.
I Mapreduced a Neo store: Creating large Neo4j Databases with Hadoop - GoDataDriven
When exploring very large raw datasets containing massive interconnected networks, it is sometimes helpful to extract your data, or a subset thereof, into a graph database like Neo4j. This allows you to easily explore and visualize networked data to discover meaningful patterns.
When your graph has 100M+ nodes and 1000M+ edges, using the regular Neo4j import tools will make the import very time-intensive (as in many hours to days).
In this talk, I’ll show you how we used Hadoop to scale the creation of very large Neo4j databases by distributing the load across a cluster and how we solved problems like creating sequential row ids and position-dependent records using a distributed framework like Hadoop.
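The talk's implementation uses Hadoop MapReduce; the core trick of handing out sequential row ids without a single-process bottleneck can be sketched with PySpark's zipWithIndex, which counts each partition once and then adds per-partition offsets (input path is hypothetical):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("neo4j-node-ids").getOrCreate()
    nodes = spark.sparkContext.textFile("nodes.csv")  # one node per line
    with_ids = nodes.zipWithIndex()  # (line, globally sequential id)
    with_ids.map(lambda kv: "%d,%s" % (kv[1], kv[0])).saveAsTextFile("nodes_with_ids")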
Although animals do not use language, they are capable of many of the same kinds of cognition as we are; much of our experience is at a non-verbal level.
Semantics is the bridge between surface forms used in language and what we do and experience.
Language understanding depends on world knowledge (e.g. “the pig is in the pen” vs. “the ink is in the pen”)
We might not be ready for executives to specify policies themselves, but we can make the process from specification to behavior more automated, linked to precise vocabulary, and more traceable.
Advances such as SBVR and an English serialization for ISO Common Logic mean that executives and line workers can understand why the system does certain things, or verify that policies and regulations are implemented
A practical guide on how to query and visualize Linked Open Data with eea.daviz Plone add-on.
In this presentation you will get an introduction to Linked Open Data and where it is applied. We will see how to query this large open data cloud over the web using the SPARQL query language. We will then go through real examples and create interactive, live data visualizations with full data traceability using eea.sparql and eea.daviz.
Presented at the PLOG2013 conference http://www.coactivate.org/projects/plog2013
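As a taste of the querying step, here is a minimal Python sketch against a public SPARQL endpoint (assuming the SPARQLWrapper package; the talk itself does this through eea.sparql inside Plone):

    from SPARQLWrapper import SPARQLWrapper, JSON

    sparql = SPARQLWrapper("https://dbpedia.org/sparql")
    sparql.setQuery("""
        PREFIX dbo: <http://dbpedia.org/ontology/>
        SELECT ?country ?population WHERE {
            ?country a dbo:Country ; dbo:populationTotal ?population .
        } LIMIT 10
    """)
    sparql.setReturnFormat(JSON)
    for b in sparql.query().convert()["results"]["bindings"]:
        print(b["country"]["value"], b["population"]["value"])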
Behind the Scenes of KnetMiner: Towards Standardised and Interoperable Knowle... - Rothamsted Research, UK
Workshop within the Integrative Bioinformatics Conference (IB2018, Harpenden, 2018).
We describe how to use Semantic Web Technologies and graph databases like Neo4j to serve life science data and address the FAIR data principles.
In this talk I will show Visualbox, a "visualization server" based on LODSPeaKr that makes it easy for non-JavaScript experts to create simple but meaningful visualizations.
Bringing the Semantic Web closer to reality: PostgreSQL as RDF Graph Database - Jimmy Angelakos
Presentation of an investigation into how Python's RDFLib and SQLAlchemy can be used to leverage PostgreSQL's capabilities to provide a persistent storage back-end for graphs, and become the elusive practical RDF triple store for the Semantic Web (or simply help you export your data to someone who's expecting RDF)!
Talk presented at FOSDEM 2017 in Brussels on 04-05/02/2017. Practical & hands-on presentation with example code which is certainly not optimal ;)
Video:
MP4: http://video.fosdem.org/2017/H.1309/postgresql_semantic_web.mp4
WebM/VP8: http://ftp.osuosl.org/pub/fosdem/2017/H.1309/postgresql_semantic_web.vp8.webm
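A minimal sketch of the approach, assuming the rdflib-sqlalchemy package and a local PostgreSQL database (connection string and file name are hypothetical):

    from rdflib import plugin, Graph, Literal, URIRef
    from rdflib.store import Store
    from rdflib_sqlalchemy import registerplugins

    registerplugins()  # make the SQLAlchemy-backed store known to RDFLib
    ident = URIRef("urn:example:store")
    store = plugin.get("SQLAlchemy", Store)(identifier=ident)
    g = Graph(store, identifier=ident)
    g.open(Literal("postgresql://user:secret@localhost/rdf"), create=True)
    g.parse("data.ttl", format="turtle")  # persist triples into PostgreSQL
    for row in g.query("SELECT (COUNT(*) AS ?n) WHERE { ?s ?p ?o }"):
        print(row.n)
    g.close()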
Data modeling is hard, especially in the world of distributed NoSQL stores. With relational databases, developers have tended to store normalized data and shape their query model around that structure. This can come back to bite you when it comes time to scale, as complex queries across dozens of tables begin to affect application performance. It’s common to find developers rethinking their data model as query latency increases under load.
With NoSQL stores, developers must consider their query patterns from the outset of application development, designing their data model to fit those patterns. A number of techniques, new and old, can be used to allow for maximum performance and scalability.
Topics covered will include de-normalization, time boxing, conflict resolution, and convergent & commutative replicated data types. Common query patterns will also be reviewed in light of the capabilities of various NoSQL data stores.
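Of the techniques listed, convergent replicated data types are the least familiar to most developers; here is a minimal sketch of a grow-only counter (G-Counter): each replica increments only its own slot, and merging takes the element-wise maximum, so all replicas converge regardless of message order:

    class GCounter:
        def __init__(self, replica_id):
            self.replica_id = replica_id
            self.counts = {}  # replica id -> count

        def increment(self, n=1):
            self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + n

        def value(self):
            return sum(self.counts.values())

        def merge(self, other):
            # element-wise max is commutative, associative and idempotent
            for rid, n in other.counts.items():
                self.counts[rid] = max(self.counts.get(rid, 0), n)

    a, b = GCounter("a"), GCounter("b")
    a.increment(3); b.increment(2)        # replicas diverge
    a.merge(b); b.merge(a)                # exchange state in any order
    assert a.value() == b.value() == 5    # and converge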
This is lecture note #10 for my class at the Graduate School of Yonsei University, Korea. It describes SPARQL, the language used to retrieve and manipulate data stored in the Resource Description Framework format.
Enabling Exploratory Analysis of Large Data with Apache Spark and R - Databricks
R has evolved to become an ideal environment for exploratory data analysis. The language is highly flexible - there is an R package for almost any algorithm and the environment comes with integrated help and visualization. SparkR brings distributed computing and the ability to handle very large data to this list. SparkR is an R package distributed within Apache Spark. It exposes Spark DataFrames, which were inspired by R data.frames, to R. With Spark DataFrames, and Spark’s in-memory computing engine, R users can interactively analyze and explore terabyte-size data sets.
In this webinar, Hossein will introduce SparkR and how it integrates the two worlds of Spark and R. He will demonstrate one of the most important use cases of SparkR: the exploratory analysis of very large data. Specifically, he will show how Spark’s features and capabilities, such as caching distributed data and integrated SQL execution, complement R’s great tools such as visualization and diverse packages in a real world data analysis project with big data.
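The webinar's demos are in SparkR; for readers coming from Python, the same two building blocks, caching distributed data and integrated SQL execution, look like this in PySpark (dataset path is hypothetical):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("exploratory").getOrCreate()
    df = spark.read.parquet("events.parquet")
    df.cache()  # keep the distributed data in memory across repeated queries
    df.createOrReplaceTempView("events")
    spark.sql("""
        SELECT country, COUNT(*) AS n
        FROM events GROUP BY country ORDER BY n DESC LIMIT 10
    """).show()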
The Semantic Web is about to grow up. By efforts such as the Linked Open Data initiative, we finally find ourselves at the edge of a Web of Data becoming reality. Standards such as OWL 2, RIF and SPARQL 1.1 shall allow us to reason with and ask complex structured queries on this data, but still they do not play together smoothly and robustly enough to cope with huge amounts of noisy Web data. In this talk, we discuss open challenges relating to querying and reasoning with Web data and raise the question: can the emerging Web of Data ever catch up with the now ubiquitous HTML Web?
One of the greatest benefits of Clojure is its ability to create simple, powerful abstractions that operate at the level of the problem while also operating at the level of the language.
This talk discusses a query processing engine built in Clojure that leverages this abstraction power to combine streams of data for efficient concurrent execution.
* Representing processing trees as s-expressions
* Streams as sequences of data
* Optimizing processing trees by manipulating s-expressions
* Direct execution of s-expression trees
* Compilation of s-expressions into nodes and pipes
* Concurrent processing nodes and pipes using a fork/join pool
An investigation of how PostgreSQL and its latest capabilities (JSONB data type, GIN indices, Full Text Search) can be used to store, index and perform queries on structured Bibliographic Data such as MARC21/MARCXML, breaking the dependence on proprietary and arcane or obsolete software products.
Talk presented at FOSDEM 2016 in Brussels on 31/01/2016. This is a very practical & hands-on presentation with example code which is certainly not optimal ;)
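A minimal sketch of the idea (the JSONB field layout here is invented, not real MARC21; assumes psycopg2 and a local database):

    import psycopg2

    conn = psycopg2.connect("dbname=biblio")
    cur = conn.cursor()
    cur.execute("CREATE TABLE IF NOT EXISTS records (id serial PRIMARY KEY, marc jsonb)")
    # a GIN index makes containment queries on the JSONB column fast
    cur.execute("CREATE INDEX IF NOT EXISTS marc_gin ON records USING GIN (marc)")
    cur.execute("INSERT INTO records (marc) VALUES (%s)",
                ('{"leader": "00000nam", "245": {"a": "Example Title"}}',))
    # @> is JSONB containment, answered via the GIN index
    cur.execute("""SELECT id FROM records WHERE marc @> '{"245": {"a": "Example Title"}}'""")
    print(cur.fetchall())
    conn.commit()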
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2... - pchutichetpong
M Capital Group (“MCG”) expects demand, and the changing evolution of supply, to be shaped by institutional investment rotating out of offices and into work from home (“WFH”), and by the ever-expanding need for data storage as global internet usage grows, with experts predicting 5.3 billion users by 2023. These market factors will be underpinned by technological changes, such as progressing cloud services and edge sites, allowing the industry to see strong expected annual growth of 13% over the next 4 years.
Whilst competitive headwinds remain, exemplified by the recent second bankruptcy filing of Sungard, which blames “COVID-19 and other macroeconomic trends including delayed customer spending decisions, insourcing and reductions in IT spending, energy inflation and reduction in demand for certain services”, the industry has seen key adjustments; MCG believes that engineering cost management and technological innovation will be paramount to success.
MCG reports that the more favorable market conditions expected over the next few years, helped by the winding down of pandemic restrictions and a hybrid working environment, will drive market momentum forward. The continuous injection of capital by alternative investment firms, as well as growing infrastructure investment from cloud service providers and social media companies, whose revenues are expected to grow over 3.6x by value by 2026, will likely help propel data-center provision and innovation. These factors paint a promising picture for the industry players that offset rising input costs and adapt to new technologies.
According to M Capital Group: “Specifically, the long-term cost-saving opportunities available from the rise of remote managing will likely aid value growth for the industry. Through margin optimization and further availability of capital for reinvestment, strong players will maintain their competitive foothold, while weaker players exit the market to balance supply and demand.”
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23... - John Andrews
Title: Chatty Kathy: Enhancing Physical Activity Among Older Adults
Description:
Discover how Chatty Kathy, an innovative project developed at the UNC Bootcamp, aims to tackle the challenge of low physical activity among older adults. Our AI-driven solution uses peer interaction to boost and sustain exercise levels, significantly improving health outcomes. This presentation covers our problem statement, the rationale behind Chatty Kathy, synthetic data and persona creation, model performance metrics, a visual demonstration of the project, and potential future developments. Join us for an insightful Q&A session to explore the potential of this groundbreaking project.
Project Team: Jay Requarth, Jana Avery, John Andrews, Dr. Dick Davis II, Nee Buntoum, Nam Yeongjin & Mat Nicholas
Empowering the Data Analytics Ecosystem: A Laser Focus on Value
The data analytics ecosystem thrives when every component functions at its peak, unlocking the true potential of data. Here's a laser focus on key areas for an empowered ecosystem:
1. Democratize Access, Not Data:
Granular Access Controls: Provide users with self-service tools tailored to their specific needs, preventing data overload and misuse.
Data Catalogs: Implement robust data catalogs for easy discovery and understanding of available data sources.
2. Foster Collaboration with Clear Roles:
Data Mesh Architecture: Break down data silos by creating a distributed data ownership model with clear ownership and responsibilities.
Collaborative Workspaces: Utilize interactive platforms where data scientists, analysts, and domain experts can work seamlessly together.
3. Leverage Advanced Analytics Strategically:
AI-powered Automation: Automate repetitive tasks like data cleaning and feature engineering, freeing up data talent for higher-level analysis.
Right-Tool Selection: Strategically choose the most effective advanced analytics techniques (e.g., AI, ML) based on specific business problems.
4. Prioritize Data Quality with Automation:
Automated Data Validation: Implement automated data quality checks to identify and rectify errors at the source, minimizing downstream issues.
Data Lineage Tracking: Track the flow of data throughout the ecosystem, ensuring transparency and facilitating root cause analysis for errors.
5. Cultivate a Data-Driven Mindset:
Metrics-Driven Performance Management: Align KPIs and performance metrics with data-driven insights to ensure actionable decision making.
Data Storytelling Workshops: Equip stakeholders with the skills to translate complex data findings into compelling narratives that drive action.
Benefits of a Precise Ecosystem:
Sharpened Focus: Precise access and clear roles ensure everyone works with the most relevant data, maximizing efficiency.
Actionable Insights: Strategic analytics and automated quality checks lead to more reliable and actionable data insights.
Continuous Improvement: Data-driven performance management fosters a culture of learning and continuous improvement.
Sustainable Growth: Empowered by data, organizations can make informed decisions to drive sustainable growth and innovation.
By focusing on these precise actions, organizations can create an empowered data analytics ecosystem that delivers real value by driving data-driven decisions and maximizing the return on their data investment.
Levelwise PageRank with Loop-Based Dead End Handling Strategy: SHORT REPORT ... - Subhajit Sahu
Abstract — Levelwise PageRank is an alternative method of PageRank computation which decomposes the input graph into a directed acyclic block-graph of strongly connected components, and processes them in topological order, one level at a time. This enables calculation of ranks in a distributed fashion without per-iteration communication, unlike the standard method where all vertices are processed in each iteration. It however comes with a precondition of the absence of dead ends in the input graph. Here, the native non-distributed performance of Levelwise PageRank was compared against Monolithic PageRank on a CPU as well as a GPU. To ensure a fair comparison, Monolithic PageRank was also performed on a graph where vertices were split by components. Results indicate that Levelwise PageRank is about as fast as Monolithic PageRank on the CPU, but quite a bit slower on the GPU. The slowdown on the GPU is likely caused by a large submission of small workloads, and is expected to be a non-issue when the computation is performed on massive graphs.
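The decomposition step at the heart of the method can be sketched in a few lines with networkx (a random graph stands in for the report's datasets):

    import networkx as nx

    G = nx.gnp_random_graph(100, 0.05, directed=True, seed=42)
    C = nx.condensation(G)  # DAG of strongly connected components
    for level, comps in enumerate(nx.topological_generations(C)):
        vertices = [v for c in comps for v in C.nodes[c]["members"]]
        # ranks for these vertices depend only on earlier levels, so each
        # level can be computed without per-iteration global communication
        print("level %d: %d vertices in %d components" % (level, len(vertices), len(comps)))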
Opendatabay - Open Data Marketplace.pptx - Opendatabay
Opendatabay.com unlocks the power of data for everyone. Open Data Marketplace fosters a collaborative hub for data enthusiasts to explore, share, and contribute to a vast collection of datasets.
The first open hub for data enthusiasts to collaborate and innovate. A platform to explore, share, and contribute to a vast collection of datasets. Through robust quality control and innovative technologies like blockchain verification, Opendatabay ensures the authenticity and reliability of datasets, empowering users to make data-driven decisions with confidence. Leverage cutting-edge AI technologies to enhance the data exploration, analysis, and discovery experience.
From intelligent search and recommendations to automated data productisation and quotation, Opendatabay's AI-driven features streamline the data workflow. Finding the data you need shouldn't be complex. Opendatabay simplifies the data acquisition process with an intuitive interface and robust search tools. Effortlessly explore, discover, and access the data you need, allowing you to focus on extracting valuable insights. Opendatabay breaks new ground with dedicated, AI-generated synthetic datasets.
Leverage these privacy-preserving datasets for training and testing AI models without compromising sensitive information. Opendatabay prioritizes transparency by providing detailed metadata, provenance information, and usage guidelines for each dataset, ensuring users have a comprehensive understanding of the data they're working with. By leveraging a powerful combination of distributed ledger technology and rigorous third-party audits, Opendatabay ensures the authenticity and reliability of every dataset. Security is at the core of Opendatabay. The marketplace implements stringent security measures, including encryption, access controls, and regular vulnerability assessments, to safeguard your data and protect your privacy.
Semantic Web Technologies in Health Care Analytics
1. SEMANTIC WEB TECHNOLOGIES IN HEALTH CARE ANALYTICS
AN IMPACT SCENARIO FOR DATALOG REASONING WITH RDFOX
Robert Piro
Departmental Seminar
2. OVERVIEW
1 RDFOX
RDF
Datalog
2 PROJECT WITH KAISER PERMANENTE
HEDIS Measures for Diabetic Care
Data Model
Data Model as RDF Triples
The Datalog Rules
3 CONCLUSION & FUTURE WORK
3. RDFox
RDFOX — RESULT OF 4 YEARS OF DEVELOPMENT
RDFOX (BORIS MOTIK, YAVOR NENOV, ROBERT PIRO, IAN HORROCKS)
in memory RDF Triple Store — optimised indexing
parallel Datalog Reasoner — very good scalability
4. RDFox
RDFOX — RESULT OF 4 YEARS OF DEVELOPMENT
RDFOX (BORIS MOTIK, YAVOR NENOV, ROBERT PIRO, IAN HORROCKS)
in memory RDF Triple Store — optimised indexing
parallel Datalog Reasoner — very good scalability
FEATURES
load RDF data (Triples/Turtle)
materialise data — (extended) Datalog language
incremental reasoning / equality reasoning
query data — SPARQL query language
5. RDFox
RDFOX — RESULT OF 4 YEARS OF DEVELOPMENT
RDFOX (BORIS MOTIK, YAVOR NENOV, ROBERT PIRO, IAN HORROCKS)
in memory RDF Triple Store — optimised indexing
parallel Datalog Reasoner — very good scalability
FEATURES
load RDF data (Triples/Turtle)
materialise data — (extended) Datalog language
incremental reasoning / equality reasoning
query data — SPARQL query language
INTEGRATION
stand-alone C++ implementation / C++ library
Java/Python Bridge
SPARQL end-point
6. RDFox RDF
RDF — RESOURCE DESCRIPTION FRAMEWORK
RDF
data format with types
W3C standard
encode semantic data
Triple: subject predicate object (s, p, o)
building blocks: resources & literals
URI — <http://www.w3.org/2001/XMLSchema#double>
String, Boolean, Integer, Decimal — "0.789"^^xsd:double
7. RDFox RDF
RDF — RESOURCE DESCRIPTION FRAMEWORK
RDF
data format with types
W3C standard
encode semantic data
Triple: subject predicate object (s, p, o)
building blocks: resources & literals
URI — <http://www.w3.org/2001/XMLSchema#double>
String, Boolean, Integer, Decimal — "0.789"^^xsd:double
EXAMPLE (ENCODING A DATABASE TABLE IN RDF)
Table: PATIENT_VISIT
REC | MBR | SERV_DT | CPT | ... | DIAG1 | ... | DIAG22
001 | 007 | 20151101 | ...
@prefix ex: <http://my.example.com/FieldName/> .
@prefix visit: <http://my.example.com/Rec/PATIENT_VISIT/> .
visit:001 ex:MBR "007" .
visit:001 ex:SERV_DT "2015-11-01"^^xsd:date .
8. RDFox Datalog
DATALOG
RDF DATALOG RULE
[s0, p0, o0] ← [s1, p1, o1], ..., [sn, pn, on]. ‘IF...AND...THEN...’
Variables start with ‘?’. Var(head) ⊆ Var(body)
EXAMPLE (MATERIALISATION WITH RDFOX)
[?p, ex:has, ex:Diabetes] ← [?p, ex:MBRNo, ?mbr], [?rec, ex:MBR, ?mbr],
[?rec, ex:DIAG, "Diabetes"].
9. RDFox Datalog
DATALOG
RDF DATALOG RULE
[s0, p0, o0] ← [s1, p1, o1], ..., [sn, pn, on]. ‘IF...AND...THEN...’
Variables start with ‘?’. Var(head) ⊆ Var(body)
EXAMPLE (MATERIALISATION WITH RDFOX)
[?p, ex:has, ex:Diabetes] ← [?p, ex:MBRNo, ?mbr], [?rec, ex:MBR, ?mbr],
[?rec, ex:DIAG, "Diabetes"].
Data
p:007 ex:MBRNo "007" . v:001 ex:DIAG "Diabetes" .
v:001 ex:MBR "007" . p:001 ex:MBR "001" .
10. RDFox Datalog
DATALOG
RDF DATALOG RULE
[s0, p0, o0] ← [s1, p1, o1], ..., [sn, pn, on]. ‘IF...AND...THEN...’
Variables start with ‘?’. Var(head) ⊆ Var(body)
EXAMPLE (MATERIALISATION WITH RDFOX)
[?p, ex:has, ex:Diabetes] ← [?p, ex:MBRNo, ?mbr], [?rec, ex:MBR, ?mbr],
[?rec, ex:DIAG, "Diabetes"].
Data
p:007 ex:MBRNo "007" . v:001 ex:DIAG "Diabetes" .
v:001 ex:MBR "007" . p:001 ex:MBR "001" .
11. RDFox Datalog
DATALOG
RDF DATALOG RULE
[s0, p0, o0] ← [s1, p1, o1], ..., [sn, pn, on]. ‘IF...AND...THEN...’
Variables start with ‘?’. Var(head) ⊆ Var(body)
EXAMPLE (MATERIALISATION WITH RDFOX)
[p:007, ex:has, ex:Diabetes] ← [p:007, ex:MBRNo, "007"], [?rec, ex:MBR, "007"],
[?rec, ex:DIAG, "Diabetes"].
Data
p:007 ex:MBRNo "007" . v:001 ex:DIAG "Diabetes" .
v:001 ex:MBR "007" . p:001 ex:MBR "001" .
12. RDFox Datalog
DATALOG
RDF DATALOG RULE
[s0, p0, o0] ← [s1, p1, o1], ..., [sn, pn, on]. ‘IF...AND...THEN...’
Variables start with ‘?’. Var(head) ⊆ Var(body)
EXAMPLE (MATERIALISATION WITH RDFOX)
[p:007, ex:has, ex:Diabetes] ← [p:007, ex:MBRNo, "007"], [v:001, ex:MBR, "001"]
[v:001, ex:DIAG, "Diabetes"].
Data
p:007 ex:MBRNo "007" . v:001 ex:DIAG "Diabetes" .
v:001 ex:MBR "007" . p:001 ex:MBR "001" .
13. RDFox Datalog
DATALOG
RDF DATALOG RULE
[s0, p0, o0] ← [s1, p1, o1], ..., [sn, pn, on]. ‘IF...AND...THEN...’
Variables start with ‘?’. Var(head) ⊆ Var(body)
EXAMPLE (MATERIALISATION WITH RDFOX)
[p:007, ex:has, ex:Diabetes] ← [p:007, ex:MBRNo, "007"], [v:001, ex:MBR, "007"],
[v:001, ex:DIAG, "Diabetes"].
Data
p:007 ex:MBRNo "007" . v:001 ex:DIAG "Diabetes" .
v:001 ex:MBR "007" . p:001 ex:MBR "001" .
14. RDFox Datalog
DATALOG
RDF DATALOG RULE
[s0, p0, o0] ← [s1, p1, o1], ..., [sn, pn, on]. ‘IF...AND...THEN...’
Variables start with ‘?’. Var(head) ⊆ Var(body)
EXAMPLE (MATERIALISATION WITH RDFOX)
[p:007, ex:has, ex:Diabetes] ← [p:007, ex:MBRNo, "007"], [v:001, ex:MBR, "007"],
[v:001, ex:DIAG, "Diabetes"].
Data
p:007 ex:MBRNo "007" . v:001 ex:DIAG "Diabetes" .
v:001 ex:MBR "007" . p:001 ex:MBR "001" .
p:007 ex:has ex:Diabetes .
15. RDFox Datalog
DATALOG
RDF DATALOG RULE
[s0, p0, o0] ← [s1, p1, o1], ..., [sn, pn, on]. ‘IF...AND...THEN...’
Variables start with ‘?’. Var(head) ⊆ Var(body)
EXAMPLE (MATERIALISATION WITH RDFOX)
[?p, ex:has, ex:Diabetes] ← [?p, ex:MBRNo, ?mbr], [?rec, ex:MBR, ?mbr],
[?rec, ex:DIAG, "Diabetes"].
Data
p:007 ex:MBRNo "007" . v:001 ex:DIAG "Diabetes" .
v:001 ex:MBR "007" . p:001 ex:MBR "001" .
p:007 ex:has ex:Diabetes .
16. RDFox Datalog
DATALOG
RDF DATALOG RULE
[s0, p0, o0] ← [s1, p1, o1], ..., [sn, pn, on]. ‘IF...AND...THEN...’
Variables start with ‘?’. Var(head) ⊆ Var(body)
EXAMPLE (MATERIALISATION WITH RDFOX)
[?p, ex:has, ex:Diabetes] ← [?p, ex:MBRNo, ?mbr], [?rec, ex:MBR, ?mbr],
[?rec, ex:DIAG, "Diabetes"].
Data
p:007 ex:MBRNo "007" . v:001 ex:DIAG "Diabetes" .
v:001 ex:MBR "007" . p:001 ex:MBR "001" .
p:007 ex:has ex:Diabetes .
RDFOX COMPUTES all CONSEQUENCES ...
also from newly derived data
in a systematic way
17. RDFox Datalog
DATALOG
RDF DATALOG RULE
[s0, p0, o0] ← [s1, p1, o1], ..., [sn, pn, on]. ‘IF...AND...THEN...’
Variables start with ‘?’. Var(head) ⊆ Var(body)
EXAMPLE (MATERIALISATION WITH RDFOX)
[?p, ex:has, ex:Diabetes] ← [?p, ex:MBRNo, ?mbr], [?rec, ex:MBR, ?mbr],
[?rec, ex:DIAG, "Diabetes"].
Data
p:007 ex:MBRNo "007" . v:001 ex:DIAG "Diabetes" .
v:001 ex:MBR "007" . p:001 ex:MBR "001" .
p:007 ex:has ex:Diabetes .
RDFOX COMPUTES all CONSEQUENCES ... AND TERMINATES
also from newly derived data
in a systematic way
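The slides above step through RDFox matching this rule against the data. To replay the example outside RDFox, one can phrase the rule as a SPARQL INSERT and apply it until no new triples appear; here is a minimal sketch with Python's rdflib (the p: and v: prefixes are made up to match the slides' abbreviations):

    from rdflib import Graph

    g = Graph()
    g.parse(format="turtle", data="""
        @prefix ex: <http://my.example.com/FieldName/> .
        @prefix p:  <http://my.example.com/Patient/> .
        @prefix v:  <http://my.example.com/Rec/PATIENT_VISIT/> .
        p:007 ex:MBRNo "007" .   v:001 ex:DIAG "Diabetes" .
        v:001 ex:MBR   "007" .   p:001 ex:MBR  "001" .
    """)
    rule = """
        PREFIX ex: <http://my.example.com/FieldName/>
        INSERT { ?p ex:has ex:Diabetes }
        WHERE  { ?p ex:MBRNo ?mbr . ?rec ex:MBR ?mbr . ?rec ex:DIAG "Diabetes" . }
    """
    while True:                  # apply the rule until fixpoint, as RDFox does
        before = len(g)
        g.update(rule)
        if len(g) == before:
            break
    # the store now also contains: p:007 ex:has ex:Diabetes .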
18. RDFox Datalog
RDFOX AND DATALOG
STATS
Name | Start (Trp) | End (Trp) | Mem | Cores | Time
DBpedia | 112M | 118M | 6.1GB | 8 | 28s
Claros | 19M | 96M | 4.2GB | 16(32) | 127s
LUBM-1K | 134M | 182M | 9.3GB | 16 | 8s
LUBM-9K | 6G | 9G | ≈100GB | 128(1024) | 8s
19. RDFox Datalog
RDFOX AND DATALOG
STATS
Name | Start (Trp) | End (Trp) | Mem | Cores | Time
DBpedia | 112M | 118M | 6.1GB | 8 | 28s
Claros | 19M | 96M | 4.2GB | 16(32) | 127s
LUBM-1K | 134M | 182M | 9.3GB | 16 | 8s
LUBM-9K | 6G | 9G | ≈100GB | 128(1024) | 8s
FEATURES OF RDFOX DATALOG
Allows many more constructs (arithmetic*, string ops*, comparisons)
Will allow negation, aggregation (can be simulated already)
Generalises OWL 2 RL; reasoning with OWL 2 EL reducible to Datalog
20. RDFox Datalog
RDFOX AND DATALOG
STATS
Name | Start (Trp) | End (Trp) | Mem | Cores | Time
DBpedia | 112M | 118M | 6.1GB | 8 | 28s
Claros | 19M | 96M | 4.2GB | 16(32) | 127s
LUBM-1K | 134M | 182M | 9.3GB | 16 | 8s
LUBM-9K | 6G | 9G | ≈100GB | 128(1024) | 8s
FEATURES OF RDFOX DATALOG
Allows many more constructs (arithmetic*, string ops*, comparisons)
Will allow negation, aggregation (can be simulated already)
Generalises OWL 2 RL; reasoning with OWL 2 EL reducible to Datalog
GENERAL FEATURES OF DATALOG
Intuitive if-then-statements
Declarative (say what, not how to compute)
Powerful due to recursion
21. Project with Kaiser Permanente
KAISER PERMANENTE
THE ORGANISATION
Kaiser HealthPlan, Kaiser Hospitals, Permanente Medical Group
KP largest ‘managed care’ organisation in the U.S.
KP HealthConnect; largest private electronic health record system
STATS
9.6M members
38 medical centres
620 medical offices
177k employees
17k physicians
50k nurses
Turnover 56.4G USD
Net income 3.1G USD
22. Project with Kaiser Permanente HEDIS Measures for Diabetic Care
HEALTHCARE EFFECTIVENESS DATA AND INFORMATION SET
HEDIS
Performance measure specification issued by NCQA(1) (USA)
Percentages of a precisely defined eligible population:
#Eligible with eye exam ÷ #Eligible (is diabetic, ≤65yo, etc.)
Entry requirements for government-funded healthcare (Medicare)
(1) National Committee for Quality Assurance
23. Project with Kaiser Permanente HEDIS Measures for Diabetic Care
HEALTHCARE EFFECTIVENESS DATA AND INFORMATION SET
HEDIS
Performance measure specification issued by NCQA(1) (USA)
Percentages of a precisely defined eligible population:
#Eligible with eye exam ÷ #Eligible (is diabetic, ≤65yo, etc.)
Entry requirements for government-funded healthcare (Medicare)
HEDIS MEASURE COMPUTATION: TODAY
Disparate data sources (historically grown)
Ad-hoc schemas used to store data (meaning implicit)
Involved programs for analytics software:
mix data (re)formatting and measuring
difficult to maintain
require deep IT expertise
(1) National Committee for Quality Assurance
24. Project with Kaiser Permanente HEDIS Measures for Diabetic Care
HEDIS MEASURE COMPUTATION IN OUR PROJECT
NEW APPROACH (PETER HENDLER, ROBERT PIRO)
Separate data aggregation and reformatting from computing measures!
Data model inspired by HL7 RIM: ‘Entities in Roles Participating in Acts’
Data translated as RDF-triples into the data model first (Java/Scala)
RDFox Datalog rules compute measures according to this model
Results are read out through simple queries
25. Project with Kaiser Permanente HEDIS Measures for Diabetic Care
HEDIS MEASURE COMPUTATION IN OUR PROJECT
NEW APPROACH (PETER HENDLER, ROBERT PIRO)
Separate data aggregation and reformatting from computing measures!
Data model inspired by HL7 RIM: ‘Entities in Roles Participating in Acts’
Data translated as RDF-triples into the data model first (Java/Scala)
RDFox Datalog rules compute measures according to this model
Results are read out through simple queries
BENEFITS
Reusability: uniform data model reusable for other tasks
Efficiency: rules are close to natural language & concise
Maintainability: rules are declarative and easy to understand
26. Project with Kaiser Permanente Data Model
DATA MODEL
INSPIRED BY HL7 REFERENCE INFORMATION MODEL (RIM)
Entity -hasRole-> Role -hasPart-> Participation -hasAct-> Act
ISO standard: ISO/HL7 21731:2014
Process centric (Administrative KR)
Developed for/in the medical community; BUT ‘NHS experience’
27. Project with Kaiser Permanente Data Model
DATA MODEL
INSPIRED BY HL7 REFERENCE INFORMATION MODEL (RIM)
Entity -hasRole-> Role -hasPart-> Participation -hasAct-> Act
ISO standard: ISO/HL7 21731:2014
Process centric (Administrative KR)
Developed for/in the medical community; BUT ‘NHS experience’
EXAMPLE
Getting a coffee
Person -hasRole-> Customer -hasPart-> Purchaser -hasAct-> ‘Buying a product’
Person -hasRole-> Barista -hasPart-> Preparer -hasAct-> ‘Buying a product’
Subst -hasRole-> Coffee -hasPart-> Product -hasAct-> ‘Buying a product’
Person -hasRole-> Customer -hasPart-> Consumer -hasAct-> ‘Buying a product’
28. Project with Kaiser Permanente Data Model
DATA MODEL
INSPIRED BY HL7 REFERENCE INFORMATION MODEL (RIM)
Entity -hasRole-> Role -hasPart-> Participation -hasAct-> Act
ISO standard: ISO/HL7 21731:2014
Process centric (Administrative KR)
Developed for/in the medical community; BUT ‘NHS experience’
EXAMPLE
Contract for Work
Person -hasRole-> Customer -hasPart-> Offering Party -hasAct-> ‘Buying a product’
Person -hasRole-> Representative -hasPart-> Accepting Party -hasAct-> ‘Buying a product’
Subst -hasRole-> Coffee -hasPart-> Work Result -hasAct-> ‘Buying a product’
Person -hasRole-> Customer -hasPart-> Beneficiary -hasAct-> ‘Buying a product’
29. Project with Kaiser Permanente Data Model
DATA MODEL
INSPIRED BY HL7 REFERENCE INFORMATION MODEL (RIM)
Entity -hasRole-> Role -hasPart-> Participation -hasAct-> Act
ISO standard: ISO/HL7 21731:2014
Process centric (Administrative KR)
Developed for/in the medical community; BUT ‘NHS experience’
EXAMPLE
Prescription
Person -hasRole-> Physician -hasPart-> Prescriber -hasAct-> Prescription
Person -hasRole-> Pharmacist -hasPart-> Dispenser -hasAct-> Prescription
Subst -hasRole-> Drug -hasPart-> Medication -hasAct-> Prescription
Person -hasRole-> Patient -hasPart-> Recipient -hasAct-> Prescription
30. Project with Kaiser Permanente Data Model as RDF Triples
DATA MODEL AS RDF TRIPLES
DATA MODEL USED FOR HEDIS
Entity (EN00): Name "John Smith"; Gender kp:male; DoB "1973-10-22"^^xsd:date; type cat:person
Role (RL00): type cat:Patient
Participation (PT00): type cat:Subject
Act (ACT00): Date "2013-03-22"^^xsd:date; type cat:Diagnosis
EN00 -kp:hasRole-> RL00 -kp:hasPart-> PT00 -kp:hasContext-> ACT00
31. Project with Kaiser Permanente Data Model as RDF Triples
DATA MODEL AS RDF TRIPLES
DATA MODEL USED FOR HEDIS
Entity (EN00): Name "John Smith"; Gender kp:male; DoB "1973-10-22"^^xsd:date; type cat:person
Role (RL00): type cat:Patient
Participation (PT00): type cat:Subject
Act (ACT00): Date "2013-03-22"^^xsd:date; type cat:Diagnosis
EN00 -kp:hasRole-> RL00 -kp:hasPart-> PT00 -kp:hasContext-> ACT00
ENCODING IN RDF-TRIPLES
EN00 kp:DoB "1973-10-22"^^xsd:date .
EN00 kp:hasRole RL00 .
RL00 rdf:type kp:Patient .
RL00 kp:hasPart PT00 .
PT00 kp:hasContext ACT00 .
ACT00 rdf:type cat:Diagnosis .
32. Project with Kaiser Permanente Data Model as RDF Triples
DATA TRANSLATION
DATA PROVIDED
Real data from a KP regional branch(2)
Data: ASCII files, one record per line, pipe-separated fields
MBR | SERV_DT | CPT | ... | DIAG1 | ... | DIAG22 | PROVNBR
(2) The data never left Kaiser
33. Project with Kaiser Permanente Data Model as RDF Triples
DATA TRANSLATION
DATA PROVIDED
Real data from a KP regional branch(2)
Data: ASCII files, one record per line, pipe-separated fields
MBR | SERV_DT | CPT | ... | DIAG1 | ... | DIAG22 | PROVNBR
DATA STATS
About | Records | Size
Providers | 113k | 6.8M
Members | 466k | 84MB
Enrollments | 3.3M | 332MB
Labs | 28.3M | 1.4GB
Prescriptions | 8.9M | 892MB
Visits | 54M | 8.6GB
(2) The data never left Kaiser
34. Project with Kaiser Permanente Data Model as RDF Triples
DATA TRANSLATION
DATA PROVIDED
Real data from a KP regional branch(2)
Data: ASCII files, one record per line, pipe-separated fields
MBR | SERV_DT | CPT | ... | DIAG1 | ... | DIAG22 | PROVNBR
DATA STATS
About | Records | Size
Providers | 113k | 6.8M
Members | 466k | 84MB
Enrollments | 3.3M | 332MB
Labs | 28.3M | 1.4GB
Prescriptions | 8.9M | 892MB
Visits | 54M | 8.6GB
TRANSLATION & IMPORT
Translation time: 45min @ 8 threads
902M triples (4.6GB gzipped), 547M unique
RDFox import time 390s @ 8 threads
(2) The data never left Kaiser
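The actual translation was written in Java/Scala; as a toy sketch of this step, the following Python turns one pipe-separated record into triples in the slide 7 scheme (field list shortened, values invented):

    FIELDS = ["MBR", "SERV_DT", "CPT"]  # leading fields of a PATIENT_VISIT record

    def record_to_turtle(rec_id, line):
        values = line.split("|")
        return "\n".join(
            'visit:%s ex:%s "%s" .' % (rec_id, name, value.strip())
            for name, value in zip(FIELDS, values) if value.strip()
        )

    print(record_to_turtle("001", "007|2015-11-01|99213"))
    # visit:001 ex:MBR "007" .
    # visit:001 ex:SERV_DT "2015-11-01" .
    # visit:001 ex:CPT "99213" .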
35. Project with Kaiser Permanente The Datalog Rules
DATALOG RULES
RULES HEDIS DIABETES CARE DENOMINATORS AND NUMERATORS
174 rules in 607 lines of code distributed in 21 files
authored on a 200-patient test set using an interactive authoring tool
36. Project with Kaiser Permanente The Datalog Rules
DATALOG RULES
RULES HEDIS DIABETES CARE DENOMINATORS AND NUMERATORS
174 rules in 607 lines of code distributed in 21 files
authored on a 200-patient test set using an interactive authoring tool
MATERIALISATION
8 Intel Xeon E5-2680@2.7GHz with 64GB RAM
Data import + materialisation: 1h40m
Maximal number of triples before subgraph extraction: 731M (43GB)
Subgraph 71.7M triples (4GB), maximal number of triples: 92.2M (4.8GB)
37. Project with Kaiser Permanente The Datalog Rules
DATALOG RULES
RULES HEDIS DIABETES CARE DENOMINATORS AND NUMERATORS
174 rules in 607 lines of code distributed in 21 files
authored on a 200-patient test set using an interactive authoring tool
MATERIALISATION
8 Intel Xeon E5-2680@2.7GHz with 64GB RAM
Data import + materialisation: 1h40m
Maximal number of triples before subgraph extraction: 731M (43GB)
Subgraph 71.7M triples (4GB), maximal number of triples: 92.2M (4.8GB)
SUMMARY
Data is translated into RDF triples
RDFox computes the materialisation from the Datalog program and the RDF triples
Results are obtained by querying the triple store (SPARQL)
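Continuing the rdflib sketch from the Datalog section, such a readout query could look as follows (the predicate names follow the earlier toy example, not the real rule set):

    for row in g.query("""
        PREFIX ex: <http://my.example.com/FieldName/>
        SELECT (COUNT(DISTINCT ?p) AS ?n) WHERE { ?p ex:has ex:Diabetes }
    """):
        print("patients derived to have diabetes:", row.n)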
38. Project with Kaiser Permanente The Datalog Rules
RULE EXAMPLE
EXAMPLE
Patients must be enrolled and can have multiple enrollments in a year.
Enrollments are given as [begin-date, end-date] pairs per patient.
"Compute all patients with continuous enrollment within the measurement year", i.e. the enrollments must form a connected chain
[x0, x1] ... [xi, xi+1] [xi+1, xi+2] ... [xn-1, xn]
such that "2013-01-01" and "2013-12-31" are enclosed by some interval.
The chain is built recursively: an enrollment whose end date equals the begin date of an already-chained enrollment is added to the chain, as in the rule below.
[?Patient, aux:continuousEnrollment, ?PredEnr] ←
[?Patient, aux:continuousEnrollment, ?Enr],
[?Enr, kp:hasBeginConnectDateTime, ?begin],
[?Patient, aux:roleHasEnrollment, ?PredEnr],
[?PredEnr, kp:hasEndDateTime, ?begin] .
39. Conclusion & Future Work
CONCLUSION & FUTURE WORK
CONCLUSION
Created a use-case / Impact Scenario: real requirements, real data
Rooting of research; usefulness of RDFox, new avenues, benchmarks
FUTURE WORK
Rule authoring tool / anonymisation of the data
Research
stratification of the reasoning
negation + aggregates
Big data reasoning + browsing
www.rdfox.org