Big Data and Knowledge Engineering for Health
Prof. Anthony J Brookes, University of Leicester, UK
Eduserv Symposium 2012: Big Data, Big Deal? (London, May 2012)
Different or no Big Data problems?
- Changing or stable rate of data generation/availability
- Changing or stable complexity of data
- Changing or stable requirement to use the data
- Changing or stable tooling to use the data
- Changing or stable mass of 'useless' data (vs knowledge)
'KNOWLEDGE ENGINEERING' for HEALTH
Knowledge engineering was first defined in 1983 as "an engineering discipline that involves integrating knowledge into computer systems in order to solve complex problems normally requiring a high level of human expertise" (Feigenbaum and McCorduck, 1983).
Building and engaging with the community:
- Presentation and discussion at many international meetings and forums
- Half-day workshop as a satellite to ESHG (6 invited speakers)
- Workshop session at MIE2011 (3 invited speakers, audience discussion)
- I-Health 2011 workshop in Brussels, 3-4 October 2011
- Growing community, currently >150 academics, companies, and healthcare providers
Integration and Interpretation of Information for Individualised Healthcare http://www.i4health.eu/
~150,000 published; mostly unknown to healthcare; fewer than 100 routinely used in healthcare.
[Diagram: the RESEARCH world (bio-informatics; academics and companies; data, biobanks, registries) alongside the HEALTHCARE world (med-informatics; data)]
[Diagram: I4HEALTH bridging RESEARCH and HEALTHCARE via 'Knowledge Engineering' for Health]
Research world ('Knowledge Generation'): make sense of these entities.
Clinical world ('Knowledge Engineering'): identify and use the bits you understand.
STANDARDS
• Semantic Standards (to allow unambiguous understanding of the data)
  – Terminologies, ontologies, vocabularies, coding systems
  – Need cross-mapping between semantic standards, and across languages
• Syntactic Standards (to make data structures interoperable)
  – Data and metadata object models, and exchange formats
  – Minimal content specifications, harmonised across domains
  – Robust core requirements, with general principles that bring flexibility
• Technical Standards (to build a system that works efficiently)
  – Database models, search systems, and user interfaces (e.g., browsers)
  – Web-service specifications, Web 2.0 technologies
  – ID solutions for data, databases, publications, biobanks, researchers
  – Technologies for controlling data access and user permissions
  – Ethical and legal policies, implementation, and recognition-rewards structures
• Quality Standards (to match data to needs)
  – Measuring and representing quality in a meaningful way
  – Important role here for metadata
  – Recording and standardising SOPs
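The cross-mapping between semantic standards mentioned above can be sketched as a simple translation table. This is a minimal illustration only: the coding systems and codes below are invented placeholders, not real terminology content, and a production mapping would also carry match quality (exact, broader, narrower) and provenance.

```python
# Minimal sketch of cross-mapping between two semantic coding systems.
# "CodingSystemA"/"CodingSystemB" and all codes are hypothetical examples.

CROSS_MAP = {
    # (source_system, source_code) -> (target_system, target_code)
    ("CodingSystemA", "A-101"): ("CodingSystemB", "B-55010"),
    ("CodingSystemA", "A-102"): ("CodingSystemB", "B-55020"),
}

def translate(system, code):
    """Return the equivalent (system, code) pair, or None if unmapped."""
    return CROSS_MAP.get((system, code))

print(translate("CodingSystemA", "A-101"))  # ('CodingSystemB', 'B-55010')
print(translate("CodingSystemA", "A-999"))  # None -> gap needing curation
```

An unmapped code returning `None` is itself useful signal: it flags where curation effort (or a cross-language mapping) is still needed.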
...personal data (slide credit: DCC roadshow, East Midlands, 2012-02-07, CC-BY-SA)
[Diagram: Electronic Healthcare Records: activities (recording, collection, search and retrieval, storage, categorisation, registration); models (information models, communication models, search models, EHR classifications, terminology); properties (expressiveness, precision/rigour, searchability, comparability, utility, structure and detail, location, interoperability); uses (best practice, decision making, secondary use, notify/find)]
Data sharing
- Incentive/reward systems
- 3 categories of risk, with 'speed pass' access control
- Compulsion/sanctions
- Researcher IDs (ORCID)
- Open data discovery (e.g., Cafe Variome)
- Remote pooled analysis (e.g., DataSHIELD, EU-ADR/EMIF)
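The remote pooled analysis idea in the last bullet can be sketched very simply: each site computes non-disclosive summary statistics locally, and only those summaries, never individual-level records, are combined centrally. This is an illustrative sketch of the general principle (not DataSHIELD's actual API); the site data are invented.

```python
# Sketch of remote pooled analysis: raw values stay at each site,
# only (sum, count) summaries travel. Site data are invented examples.

def local_summary(values):
    """Run at each site: return only (sum, count), not the raw values."""
    return sum(values), len(values)

def pooled_mean(summaries):
    """Run centrally: combine per-site summaries into a pooled mean."""
    total = sum(s for s, _ in summaries)
    count = sum(n for _, n in summaries)
    return total / count

site_a = [5.1, 6.3, 5.8]       # stays at site A
site_b = [6.0, 5.5, 6.2, 5.9]  # stays at site B

summaries = [local_summary(site_a), local_summary(site_b)]
print(pooled_mean(summaries))
```

The same pattern extends to other statistics (variances, regression coefficients) so long as they can be assembled from per-site aggregates.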
[Diagram: emerging architectural concept: a self-optimising system, with feedback/optimisation loops, built on disorganised digital information relevant to personalised healthcare: decision support systems, bioscience and omics data, text and web pages, computer models, modalities, databases, biosensors, EHRs]
[Diagram: data (imaging, instrumentation, omics, clinical, personal, population) plus information and models feed knowledge and knowledge portals, supporting healthcare of optimised utility]
Big Data can mainly stay at 'source', feeding the knowledge extraction process. Knowledge extraction/distillation filters therefore need to be created.
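A minimal sketch of such a filter, under invented assumptions: the raw records and the support threshold below are hypothetical, but the shape of the idea is that bulky individual-level data stays at source, and only distilled, aggregate 'knowledge' crosses the boundary.

```python
# Sketch of a knowledge extraction/distillation filter: raw records stay
# at source; only well-supported aggregate associations are released.
# Records, field names, and the threshold are invented for illustration.

from collections import Counter

def distill(records, min_support=2):
    """At source: count (variant, phenotype) co-occurrences and release
    only associations observed at least `min_support` times."""
    counts = Counter((r["variant"], r["phenotype"]) for r in records)
    return {pair: n for pair, n in counts.items() if n >= min_support}

raw_records = [  # stays at source
    {"variant": "var1", "phenotype": "phenoX"},
    {"variant": "var1", "phenotype": "phenoX"},
    {"variant": "var2", "phenotype": "phenoY"},
]

print(distill(raw_records))  # only the ("var1", "phenoX") association leaves
```

The threshold stands in for whatever quality or evidence criterion a real filter would apply; the point is that the filter, not the consumer, decides what is 'knowledge' enough to travel.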
Policy and Strategy
- To kick-start the field: put money into research, development, and application projects based upon the Knowledge Engineering concept
- To create the needed expertise: cross-train people who have a talent for engineering in computer science + bioscience + healthcare
- To ensure interoperability across the total system: organise activities on a middle-out basis, rather than the usual top-down or bottom-up approaches
- To ensure innovation and sustainability: explore ways to get academic and commercial players working together
- To start bringing the system to life: emphasise knowledge 'filtration', 'distillation', and 'provision' from sources of (Big) Data
Acknowledgments
• GEN2PHEN partners
• My team: Robert Free, Rob Hastings, Adam Webb, Tim Beck, Sirisha Gollapudi, Gudmundur Thorisson, Owen Lancaster
• Some key discussants: Søren Brunak, Debasis Dash, Carlos Diaz, Norbert Graf, Johan van der Lei, Heinz Lemke, Ferran Sanz
"Data-to-Knowledge-for-Practice" (D2K4P) Center
This work received funding from the European Community's Seventh Framework Programme (FP7/2007-2013) under grant agreement number 200754 (the GEN2PHEN project).