Notes From Session 3: Information Exchange with CROs – Biology/NCD
Gordon Baxter (Facilitator), Kees Van Bochove, Chris Watkins, Rob Gill, Peter Boogaard, Allan Price
Three members of the group had a big Pharma background, but all were currently employed in academia or by vendors to the pharmaceutical industry. Everyone in the group could be categorized as having more of a ‘Discovery’ than a ‘Development’ mindset. We agreed that the ecosystem was rich, with lots of players, and that efficient data interchange and exchange was both important and required. The current situation was not as good as it could be.
What was meant by non-clinical data (NCD) was explained to the group on request. There was a low level of familiarity with NCD exchange as an activity. Only two of the group regularly exchanged data with traditional CROs (Charles River, TNO, etc.), so we agreed to consider the issue from the perspective of any organization that needs to exchange data with another as part of its ongoing business. Examples from the group included delivery of data to sponsors in BEL format, delivery of data in ‘ArrayExpress format’, and receipt of data in GEO format.
The drive toward data interchange came mostly from the need to do something with the data. There are few standards, and very few groups (authorities / regulators) are imposing them in the ‘discovery’ community. All felt that better access to and use of data was of value. Examples mentioned were early ADME endpoints helping to kill bad projects early, molecular footprints, chemical validation, etc.
In many cases there were significant barriers to data exchange, including a paranoia that results might be reinterpreted and different conclusions drawn. There was a feeling among the group that although the idea of data interchange sounded compelling, and most people would want to see data from other groups, they were typically reluctant to share their own.
There was some feeling that the data being shared was generally not particularly useful and that data sharing initiatives were perhaps rather less important than imagined. In some cases the data is exchanged, goes into a system, and is then rendered immediately invisible. Someone made reference to COPD Eclipse and GWAS in this regard? Most felt that the quality of annotation was a serious problem in data exchange. This was generally addressed by additional manual annotation or by writing bespoke code.
The BEL data format was introduced as having potential in data exchange. There was some discussion about whether BEL was better suited to knowledge exchange than to raw data exchange. It was suggested that in the future all content would be free and the real value would be in mapping and analysis. This means the format has to be right, so that less time is spent parsing data and more time is spent gaining insights.
There was some discussion about the benefits of triples versus relational databases, particularly with regard to how efficiently the resulting structures could be searched. We came to the conclusion that they generally facilitated different things, with triples providing better ‘browsability’ and the ‘distillation’ of knowledge. DBpedia, a semantic rendering of Wikipedia, was mentioned as being more difficult to search than the original. Open PHACTS was mentioned as a format that might compete with BEL. At least one member of the group had seen an early prototype of a system and was impressed.
This led to a discussion on standards. Some felt that there were potentially too many standards and that most were not adequate to represent the complexity or ‘richness’ of the data. It was suggested that standards need to be clearly focused on an outcome and that they should be smaller? Something was needed to ‘drive’ uptake of standards, based on both carrot (pleasure of gain) and stick (fear of loss). Most people in the group thought that the only way a standard would be adopted was if it was imposed by an authority. Standards needed to be open, not closed. No standard, whether structural (format, protocol, metadata convention) or semantic (vocabulary, ontology, terminology), could fit all requirements. It was obvious that the interchange of data for research was very different from that for submission purposes. It was also generally acknowledged that we could learn something from the NCD community.
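As an illustration of the triples-versus-relational point noted above, the following is a minimal sketch and not something presented in the session. The facts, entity names and function names are all hypothetical, invented for illustration. It contrasts browsing outward from a node in a list of triples with answering one fixed question from a relational table, using only the Python standard library.

```python
import sqlite3

# Hypothetical subject-predicate-object statements (invented for illustration).
TRIPLES = [
    ("compound_A", "inhibits", "protein_X"),
    ("protein_X", "participates_in", "pathway_P"),
    ("compound_B", "inhibits", "protein_Y"),
]


def neighbours(triples, node):
    """Browse: return every statement that mentions the node, whatever the predicate."""
    return [t for t in triples if t[0] == node or t[2] == node]


def inhibitors_of(triples, target):
    """Relational view: a table with a fixed schema answers one predefined question."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE inhibition (compound TEXT, target TEXT)")
    rows = [(s, o) for s, p, o in triples if p == "inhibits"]
    conn.executemany("INSERT INTO inhibition VALUES (?, ?)", rows)
    found = conn.execute(
        "SELECT compound FROM inhibition WHERE target = ? ORDER BY compound",
        (target,),
    ).fetchall()
    conn.close()
    return [row[0] for row in found]


if __name__ == "__main__":
    print(neighbours(TRIPLES, "protein_X"))
    print(inhibitors_of(TRIPLES, "protein_X"))
```

In this sketch the triple list can be explored from any starting node without knowing in advance which relationships exist, while the table gives a direct answer only to the question its schema was designed for.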
We talked about the potential for an interface between Discovery and Development data and agreed that, on the face of it, this was an important part of any translational science strategy. The concept of collaborative spaces that were ‘scientifically aware’ was discussed as an interesting driver for nurturing an information ecosystem. What was meant by ‘scientifically aware’? That the IT container was aware of what the content represented; for example, intelligence represented in the space as text could be searched by chemical structure or substructure.