Developing Knowledge Brokering Standards for Biological Text and Data Integration: The Pistoia SESL Project
The Pistoia Alliance: An Emerging Vehicle for Collaboration
Ian Harrow, Wendy Filsell, Dietrich Rebholz-Schuhmann
ALPSP Workshop, 11th May 2010
http://pistoiaalliance.org
Pistoia Background – How it all started
• Timeline (2007, 2008, 2009, Now):
– Informal meeting
– Met in Pistoia
– Create Pistoia as not-for-profit company
– Official launch
– 7 of top 10 pharma as members
– 33 members
– Pistoia Domains established
– Other labels on the timeline graphic: Stanhope Gate, Lhasa, Curzon, informal collaborations, collaboration/project meeting
• Pistoia Description:
– The primary purpose of the Pistoia Alliance is to streamline non-competitive elements of the life science workflow by the specification of common standards, business terms, relationships and processes
• History:
– Initial meeting with GSK, AZ, Pfizer and Novartis outlined similar challenges and frustrations in the informatics sector of discovery
– The advent of Web Services and Web 2.0 allows for decoupling of proprietary data from technology
– Publicly available structural and biological DBs allow for non-IP related analysis and serve as a scientific test suite
– Sponsorship from R&D IS heads within the life science industry
• Pistoia Goals:
– to allow this framework to encompass/support most pre-competitive work between the organisations
– to support life science workflow prior to submission
– to work with other standards organisations
Pistoia Domains
Pistoia Domains group areas of interest, scope out and deliver projects
• Pistoia Groups (as defined in byelaws):
– Board of Directors
– Officers (Operational Team)
– Technical Committee
– Pistoia Members
• Pistoia Domain: a high-level collection of Working Groups with common themes
– Domain Steering Group: allows governance across a domain using Working Group chairs and Technical Committee reps
– Working Groups: the main project delivery mechanism in Pistoia. All standards will be delivered by WGs
– Provide expertise for WGs and running Pistoia
– Define: requirements, technical standards, service standards
• External Groups (outside of Pistoia) could:
– join Pistoia Working Groups
– influence Pistoia Working Groups members
– influence through other standards groups and activities
– collaborate on standards’ feasibility studies
– collaborate through non-Pistoia standards initiatives
Pistoia Domains
Pistoia Domains focus on business workflows / supply chains
• Enabling Knowledge and Information Services:
– Vocabulary
– Visualisation
– Application Integration
– Workflow
– Others
• Data Services:
– Biology Data Services
– Chemistry Data Services
– Translational Data Services
SESL: Biomedical Knowledge Brokering
• Challenge:
– No single system for retrieving gene to disease relationships contained in both published & biological database content
– Need a ‘push model’ for biomedical knowledge access: the current model requires the consumer to search thousands of content sources
• Opportunity: Pilot Project with key stakeholders
– Pilot a ‘push model’ for biomedical knowledge brokering
– Engage multiple consumers, content providers and a single, public group to develop the necessary infrastructure to explore the standards required for the model to work in production
• History:
– May 2008: Common Disease Knowledge Environment (CDKE) IMI call drafted
– Sep 2008: postponed call publication
– Jan 2009: x-pharma meeting in London on how to progress CDKE
– Apr 2009: CDKE presented at SESL workshop
– Oct 2009: SESL Pilot meeting (funders)
– Jan 2010: Pilot launch
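The Knowledge Service Framework
[Diagram: layered architecture from consumers (top) to content suppliers (bottom)]
• Consumer side (behind the ‘Consumer’ firewall):
– Multiple consumers
– Disease Dossier
– Knowledge Applications
• Broker:
– Service Layer
– Std Public Vocabularies
– Assertion & Meta Data Mgmt
– Transform / Translate
– Business Rules
– Integrator
• Other labels on the diagram: Common Service, Open Stds
• Supplier side (behind the supplier firewall):
– Content suppliers: Corpus 1, Db 2, Db 3, Db 4, Corpus 5
– Effort required to fit DBs to service layer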
A Production Service ...
[Diagram: several brokers sitting between one consumer application and many content suppliers]
• Consumer side:
– Exemplar Disease Dossier Application
– License
• Brokers: Broker Org #1, Broker Org #2, Broker Org #3, Internal Broker. Each broker repeats the same stack:
– Service Layer
– Std Public Vocabularies
– Assertion & Meta Data Mgmt
– Transform / Translate
– Business Rules
– Integrator
• Supplier side:
– License
– Corpora: Corpus 1, Corpus 4, Corpus 5, Corpus 8, Corpus 9, Corpus 12, Corpus 13, Corpus 16
– Databases: Db 2, Db 3, Db 6, Db 7, Db 10, Db 11, Db 14, Db 15
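The push model and broker stack on the framework slides can be illustrated with a minimal sketch. It assumes a toy in-memory broker: the `Assertion` record, the `VOCABULARY` mapping, the `Broker` class and the placeholder gene names are all hypothetical and are not taken from the SESL specification. The sketch only shows the idea that a broker translates supplier terms into a shared vocabulary and pushes matching assertions to subscribed consumers, so consumers no longer search each source themselves.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Assertion:
    """A gene-to-disease statement plus the supplier it came from."""
    gene: str
    disease: str
    source: str


# Hypothetical standard vocabulary: supplier spellings -> one preferred label.
VOCABULARY = {
    "type ii diabetes": "Type II Diabetes",
    "t2d": "Type II Diabetes",
    "type 2 diabetes mellitus": "Type II Diabetes",
}


def normalise(term: str) -> str:
    """Transform/translate step: map a supplier term onto the shared vocabulary."""
    cleaned = term.strip()
    return VOCABULARY.get(cleaned.lower(), cleaned)


class Broker:
    """Toy broker: suppliers publish, consumers subscribe and get pushed updates."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Callable[[Assertion], None]]] = {}

    def subscribe(self, disease: str, callback: Callable[[Assertion], None]) -> None:
        self._subscribers.setdefault(normalise(disease), []).append(callback)

    def publish(self, gene: str, disease: str, source: str) -> Assertion:
        assertion = Assertion(gene, normalise(disease), source)
        # Push the assertion to every consumer registered for this disease.
        for callback in self._subscribers.get(assertion.disease, []):
            callback(assertion)
        return assertion


# Usage: a "disease dossier" consumer subscribes once, suppliers publish later.
broker = Broker()
dossier: List[Assertion] = []
broker.subscribe("T2D", dossier.append)
broker.publish("GENE_A", "type 2 diabetes mellitus", "Corpus 1")  # placeholder gene
broker.publish("GENE_B", "another disease", "Db 2")               # not subscribed
print(dossier)
```

Only the first assertion reaches the dossier, because the second disease has no subscriber. A production broker would also need the licensing, business rules and metadata management shown on the slides, which this sketch leaves out.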
The Pilot
• Deliverables:
– Publication of standards & recommendations for service implementation
– Pilot implementation of service for a single disease (assertions from pre-defined document sets & databases)
– Establish ways of working precompetitively across industry/vendor/academia
– Dialogue and assessment of cost / value with key content suppliers in moving to such a push model for content (viability of moving to production)
• Status:
– AZ, Pfizer, GSK, Roche, Unilever, EBI, NPG, OUP, Elsevier & RSC
– 12 month project, £200K direct funding (+ PM & Architecture support)
– Contract between Pistoia & EBI signed 20th January 2010 for 1 year
• Scope:
– Development of an assertion database in combination with a user interface and associated web services for one disease/indication/phenotype of broad interest: Type II Diabetes
– Assertional content derived from 3 structured data sources and limited journal content (co-occurrence & statistical derivation from full text)
– Assertional evidence for filtering and drill down to primary data
– Limited vocabulary development for area of focus: Type II Diabetes
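The scope mentions assertions derived from journal content by co-occurrence. A minimal sketch of sentence-level co-occurrence counting follows. The gene and disease vocabularies, the placeholder gene names, the naive sentence splitter and the sample text are all assumptions made for illustration. The slides do not describe the pilot's actual text-mining method or its statistical derivation, so none is modelled here.

```python
import re
from collections import Counter
from itertools import product
from typing import List, Tuple

GENES = {"GENE_A", "GENE_B"}      # hypothetical gene vocabulary (placeholders)
DISEASES = {"type ii diabetes"}   # hypothetical disease vocabulary


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def cooccurrences(text: str) -> Counter:
    """Count sentences in which a gene term and a disease term both appear."""
    counts: Counter = Counter()
    for sentence in split_sentences(text):
        lowered = sentence.lower()
        genes = {g for g in GENES
                 if re.search(rf"\b{re.escape(g.lower())}\b", lowered)}
        diseases = {d for d in DISEASES if d in lowered}
        pair: Tuple[str, str]
        for pair in product(sorted(genes), sorted(diseases)):
            counts[pair] += 1
    return counts


# Made-up sample text, for illustration only.
text = (
    "GENE_A variants were studied in Type II Diabetes. "
    "GENE_B was not discussed. "
    "Expression of GENE_A and GENE_B changed in Type II Diabetes patients."
)
counts = cooccurrences(text)
for (gene, disease), n in sorted(counts.items()):
    print(gene, disease, n)
```

Each counted pair keeps a link back to sentences in the source text, which is the kind of evidence the scope describes for filtering and drill down to primary data.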
Timelines: Development Phase
[Gantt chart; calendar columns Jan-10 to Feb-11, project months Month 0 to Month 12]
• Deliverable 1: Finalised Technical Specification document (Month 4)
• Development tasks:
– Task 2: Build vocabularies within scope
– Task 3: RDF data export from UniProt and Ensembl
– Task 4: RDF data export of ArrayExpress
– Task 6: Extract literature assertions for T2DB from publishers’ content
– Task 7: Develop RDF triple store schema and demonstrator
– Task 8: Develop query definitions
– Task 9: Establish API services for remote access
– Task 10: Develop simple user interface for demonstrator (based on mock-up)
– Task 11: Write documentation that defines the standard framework
• Deliverables 2 & 3: Access to early prototype demonstrator and report (Month 7 & 8)
Timelines: Testing and Communication Phase
[Gantt chart; calendar columns Jan-10 to Feb-11, project months Month 0 to Month 12]
• Development tasks:
– Task 7: Develop RDF triple store schema and demonstrator
– Task 8: Develop query definitions
– Task 9: Establish API services for remote access
– Task 10: Develop simple user interface for demonstrator (based on mock-up)
– Task 11: Write documentation that defines the standard framework
• Deliverables 2 & 3: Access to early prototype demonstrator and report (Month 7 & 8)
• Testing and communication tasks:
– Task 12: Tests of the demonstrator (full private and limited public instance)
– Task 13: Deploy public demonstrator
– Task 14: Write publication for standard definition
– Task 15: Develop recommendations for post-pilot project
• Deliverables 4 & 5: Final prototype demonstrator, recommendations post-pilot and report (Month 11 & 12)
• Deliverable 6: Public release of limited demonstrator (Month 13)
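Several tasks in the timelines concern RDF exports, a triple store schema and query definitions. The following is a minimal sketch of what an assertion could look like as triples, assuming a toy in-memory store. The `http://example.org/sesl/` namespace, the predicate names and the `TripleStore` class are hypothetical and do not represent the schema the pilot developed. A real implementation would more likely use an RDF library and SPARQL.

```python
from typing import Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]
EX = "http://example.org/sesl/"  # hypothetical namespace, not the SESL schema


class TripleStore:
    """Toy triple store with wildcard pattern matching."""

    def __init__(self) -> None:
        self._triples: List[Triple] = []

    def add(self, s: str, p: str, o: str) -> None:
        self._triples.append((s, p, o))

    def match(self, s: Optional[str] = None, p: Optional[str] = None,
              o: Optional[str] = None) -> Iterator[Triple]:
        """Yield triples matching the pattern; None acts as a wildcard."""
        for triple in self._triples:
            if all(want is None or want == got
                   for want, got in zip((s, p, o), triple)):
                yield triple


def to_ntriples(triple: Triple) -> str:
    """Serialise one triple as an N-Triples line.

    Simplification: objects starting with 'http' are treated as IRIs and
    everything else as a plain literal; only backslash and quote are escaped.
    """
    s, p, o = triple
    if o.startswith("http"):
        obj = f"<{o}>"
    else:
        escaped = o.replace("\\", "\\\\").replace('"', '\\"')
        obj = f'"{escaped}"'
    return f"<{s}> <{p}> {obj} ."


store = TripleStore()
store.add(EX + "assertion/1", EX + "gene", EX + "gene/GENE_A")  # placeholder gene
store.add(EX + "assertion/1", EX + "disease", EX + "disease/type-ii-diabetes")
store.add(EX + "assertion/1", EX + "evidence", "co-occurrence in full text")

# Query definition: which assertions mention Type II Diabetes?
hits = list(store.match(p=EX + "disease", o=EX + "disease/type-ii-diabetes"))
lines = [to_ntriples(t) for t in store.match(s=EX + "assertion/1")]
print("\n".join(lines))
```

Modelling each assertion as its own node lets evidence and source metadata hang off the assertion rather than off the gene or the disease.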
Acknowledgements
• Industry:
– Ian Dix
– Nick Lynch
– Ashley George
– Mike Westaway
– Ian Stott
– Nigel Wilkinson
– Michael Braxenthaler
– Catherine Marshall
• Content Providers:
– Claire Bird (OUP)
– Richard O’Bierne (OUP)
– Jabe Wilson (Elsevier)
– Bradley Allen (Elsevier)
– Colin Batchelor (RSC)
– Richard Kidd (RSC)
– David Hoole (NPG)
– Alf Eaton (NPG)
• EBI:
– Cath Brooksbank
– Dominic Clark
– Christoph Grabmueller
– Silvestras Kavaliauskas
– Roderigo Lopez
– Jo McEntyre
– Janet Thornton
Thank you for listening to us. Now we’d like to listen to you! So let’s have a Q&A session.
Questions
• Assuming a successful technical outcome from the SESL experiment by year end...
– What opportunities does SESL bring to you?
– Do you benefit from full integration of the literature into a biomedical infrastructure?
– Who would gain most from a push model?
– Does the publication process benefit from this new service model?
– How might it change how you do business?
– What challenges do you foresee?
– How can we reach out further to publishers?