BioDA Workshop, 8th December, 2004 at NeSC
                Notes From Discussion Sessions

Session I - User Experience of OGSA-DAI
Schema Mappings – transforms or wrappers.
(Chair – Noel Kelly; Recorder – Brian Matthews)

How do you find changing the application to use OGSA-DAI (OD)? We needed to produce a product quickly to get feedback, so used JDBC. Now migrating by putting OD in front of JDBC, using federated DB2 and stored procedures. This was painless to do, but does it exercise OD? There is not a lot of need for the Grid for the types of queries needed at the moment. The only advantage would be if the WSDL could be exposed to the client; that would be a good idea.

Component Metadata Extractor: provide a metadata name which could extract the stored procedure. DQP extends the metadata extractor.

Ease of integration (NK): quite easy to use, after a year or so. Not using OGSA - Globus GT2.

Integration point of view: installation of OD? It had a reputation of being hard to install; do the new wizards help? Still need to do some JAR copying, but it should be simplified, with no need to modify XML files by hand. GT3 core is installed too.

Show of hands of who is using OD. Jaspreet used OD a year ago: DBs with a WS front end, and GDSFactories to create GDSs. The major problem was dynamically searching 600: a WS was needed for each DB, and this could not be afforded (a scaling problem). On access, did OD require the DB to be on the same machine? Tom says no; this is not the fault of OD, which allows this, but of the MySQL config.

Problems with supporting SQL Server? It now can, with some problems with the JDBC driver.

Performance? Successfully coordinated a search over 6-7 machines.

Interop with BDWorld (BDW): wrapper. It is a good idea to wrap a DB accessed by OD to give interop with BDW; also integrating OD via the BDI into the BDW architecture. In future, a more flexible way of using XML Schemas. OD could fit in different places in the BDW architecture.

Discussed the desirability of a consistent mechanism for exposing metadata.

Session II - GRID Data Access Architecture and OGSA-DAI
(Chair – Brian Matthews; Recorder – Andrew Jones)

Issues to discuss
 - Interaction with Web Services
 - Role of DQP
 - Security
 - Various mechanisms for access including XML, flat file, etc: ways of designing a useful higher-level API
 - Screen Scraping
 - Role of registry

Key issues (presentation by Arijit)
 - DQP uses OQL internally
 - When DQP gets integrated with OGSA-DAI a number of divergences from the present OGSA-DAI implementation will be addressed
 - DQP doesn’t currently provide any security except a login password facility, but they plan to look at this. Note that WSRF will mean a new way in which security should be added

Norman Paton
 - Target platforms for the immediate future are OMII and GT4
 - Not yet known how to interoperate between OMII and WS-Security
 - Plan is for OGSA-DAI to have a way of fitting in with either security model. Plan to introduce security here by release 6.
 - More complex for OGSA-DQP because they want to interoperate across both kinds of platforms. Plan to bring in security (authentication) here by release 7.
 - Note it’s fairly easy to switch on message-level security in Globus; this is a separate issue

Noted that the key project that is worried about security at the moment is eHTPX (Michael Gleaves?). GeneGrid will be worrying about security shortly (Noel Kelly).

General principle: OGSA-DAI to provide a level of security that will support public-sector bioinformaticians adequately (at least as good as what they would normally rely on). (Norman Paton)

Jaspreet: what methods are being adopted with regard to semantic data integration?
Norman: OGSA-DAI doesn’t address semantic or schematic integration explicitly. OGSA-DQP is a distributed query evaluation mechanism but not an integration mechanism. Any schematic or semantic integration is in the user’s hands. Plan: to allow global-as-views to address primarily schematic heterogeneity (time-scale for this not determined).

Straw poll: schema integration was seen as higher priority than XML access.

HTML screen scraping …
Noted Neil had mentioned a plan to integrate some kind of screen-scraping Activity, configurable with a URL, into OGSA-DAI, so that e.g. a database could be populated for use in OGSA-DAI. So the purpose of this is to provide the means for linking the screen scraper into an OGSA-DAI environment. Scenario: a <perform> … document could include transformation activities (one that is already provided is XSLT).

Then Came Coffee …

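
As a rough illustration of the screen-scraping idea raised in Session II (scrape a page, populate a database that could then be exposed for querying), here is a minimal Python sketch. It is not OGSA-DAI code and does not use any OGSA-DAI API. The class TableScraper, the function scrape_to_db, the table name and the sample rows are all hypothetical; a real activity would fetch the HTML from its configured URL rather than use an inline string.

```python
import sqlite3
from html.parser import HTMLParser

# Made-up sample page; a real activity would download this from a configured URL.
SAMPLE_HTML = """
<table>
  <tr><th>accession</th><th>organism</th></tr>
  <tr><td>P12345</td><td>Homo sapiens</td></tr>
  <tr><td>Q67890</td><td>Mus musculus</td></tr>
</table>
"""


class TableScraper(HTMLParser):
    """Collects the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []         # finished rows, each a tuple of cell strings
        self._row = None       # cells of the row currently being parsed
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False
        elif tag == "tr":
            if self._row:      # header rows contain only <th>, so they are skipped
                self.rows.append(tuple(self._row))
            self._row = None

    def handle_data(self, data):
        if self._in_cell and self._row:
            self._row[-1] += data.strip()


def scrape_to_db(html, conn):
    """Parse the HTML table and load its rows into a (hypothetical) 'protein' table."""
    scraper = TableScraper()
    scraper.feed(html)
    conn.execute("CREATE TABLE IF NOT EXISTS protein (accession TEXT, organism TEXT)")
    conn.executemany("INSERT INTO protein VALUES (?, ?)", scraper.rows)
    conn.commit()
    return len(scraper.rows)


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    print(scrape_to_db(SAMPLE_HTML, db), "rows loaded")
```

Once the rows are in a relational table, the scraped source can be queried like any other database, which is the point of linking a scraper into a data access environment.
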
Session III – Bioinformatics Requirements for Data Access and Integration on the Grid
(Chair – Alex Gray; Recorder: Richard White)

Michael Gleaves’ lead presentation: Biotechnology techniques (used in Bioinformatics): genomics, “transgenomics”, proteomics, structural genomics, metabolomics, systems biology (based on the whole cell). Andrew: Plus the species diversity level as in BDWorld. Bioinformatics’ role is to collect and interpret data and add to knowledge. What are the next problems?

Discussion of this question:
 - Continuing use of flat files in bioinformatics
 - At the biodiversity level, much legacy data and a low rate of data increase
 - At lower levels (genomics, proteomics), no old data but a huge rate of data increase
 - Protein structural modelling produces large data sets
 - Different data sources at the different levels; data of different types need to be brought together
 - Can you find the right data at the right time? Data discovery and coordination, self-adapting

Arijit: MyGrid people think an issue is to link data to its provenance data, which may be stored elsewhere, e.g. using the LSID as a link. Use this to provide an audit trail. MyGrid has a data model; provenance data which fits this has the highest priority. Users rarely fill in provenance data manually, so it needs to be captured as automatically as possible: in the lab while the experiments are being performed [Alex], or when the workflow is run [Arijit] in the MyGrid environment.

Michael: Instrument makers are being persuaded to generate XML provenance files as they generate the data files. Provenance data capture afterwards is second best; at data creation is best.

Michael: Next point is data transfer. GridFTP is primitive. How about SRB? Differences in approach between SRB and OGSA-DAI: it depends on your view of the data. SRB puts more emphasis on managing and maintaining data files; OGSA-DAI puts emphasis on data retrieval. Are your applications expecting to receive files (and metadata)? If so, SRB may be best. OGSA-DAI is better when you’re trying to retrieve different types of data from different sources and integrate them.

Neil: they’re complementary, orthogonal and parallel at the same time [chuckles]

Michael: Are the projects’ needs met by OGSA-DAI? eHTPX involves data-mining at only two points, therefore it puts more emphasis on SRB.

Tom: What limitations do the portal systems encounter, e.g. Spice?
Tim: Data sources are searched in sequence, not in parallel; need to throttle back the rate of firing queries at the data sources.
Noel: GeneGrid finds OGSA-DAI does everything they want, with minor exceptions of a mathematical nature, at the moment. (May need more complex joins later.)

Session IV – OGSA-DAI Road Map and Priorities
(Chair - Richard White; Recorder - Alex Gray)

Current road map, and look at the question the other way: what does the community want from OGSA-DAI?

MPA: OGSA-DAI is a flexible framework which is able to support new facilities. Can we supply test cases with the required facilities? These give guidance as to what the requirements are. If we can give a good start then it is likely to lead to a better implementation of the requirement.

Neil: the road map is changing in the distance due to input from meetings like this.

Currently reviewing the architecture of OGSA-DAI: improving the concurrency model, framework architecture, better definition of extensibility points.

Support for WS security profiles, stored procedures other than DB2, data transport improvement by going beyond GridFTP, XQuery as a bridge, database-specific types and SQL (virtualised resource).

Additionally: JDBC and ODBC driver for OGSA-DAI, contribution process.

WS-RF is yet another set of Web Service specifications
 - Adds WS-Resource properties
 - Splits state and stateless services
 - Web services with some extensions

MPA: a whole bunch of services will come in with this change, e.g. synchronous delivery of data results.

Globus Toolkit WS-RF core is going into Apache.

What is the right toolkit to use if starting now? MPA: history shows we don’t understand distributed systems completely and there is no simple answer to this question. Most projects do not need distributed systems.

Release 6
 - Data integration example scenarios are helpful (looking for common patterns; distributed union and join already identified)
 - OGSA-DQP integrated
 - JDBC driver
 - WS-Security
 - Stored procedures
 - Additionally other features

Release 7
 - Compliance with DAIS specs

Contribution to OGSA-DAI by its user communities, e.g. what features do you want added? What model is best for this community? Can we determine this and help OGSA-DAI identify its best architecture?

DISCUSSION

MPA: the release 6 wish list is the full list; all might not be entered. The team are identifying priorities; not all will be implemented.

Performance is not a top priority, but improvement is difficult to identify. The measure is: do you tap your fingers while waiting for a response?

Metadata and the extent it is used: what is the metadata extractor and will it affect the applications? The metadata extractor can be used to extract information about the structure of the data, e.g. size, number of accesses. RJW: it is not metadata about the database structure etc.; it is metadata about the data. MPA: is this not something users supply as additional data? We could supply this if there is agreement as to what it is.

Metadata about stored procedures is needed to expose the stored procedure.

Provenance database: this can be supplied automatically by generation of the information. MPA: this could be supplied in future to a standard by OGSA-DAI, but in future releases, not 6. Do we need a standard for this?

What are the priorities? Neil showed a list of features in the road map project that are on the website but have not been discussed to assign priorities. It is on the project website.

Can we supply examples of use of the identified features? Metadata capture from lab instruments. May be possible to support WSI and WSR in future (Neil).
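
The two common patterns noted under Release 6, distributed union and distributed join, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: in-memory SQLite databases stand in for separate data resources, the helper names (make_source, distributed_union, distributed_join) and the sample gene rows are hypothetical, and nothing here reflects how OGSA-DAI or OGSA-DQP implement these operations.

```python
import sqlite3


def make_source(table, columns, rows):
    """Create an in-memory database standing in for one remote data resource."""
    conn = sqlite3.connect(":memory:")
    conn.execute(f"CREATE TABLE {table} ({', '.join(columns)})")
    placeholders = ", ".join("?" for _ in columns)
    conn.executemany(f"INSERT INTO {table} VALUES ({placeholders})", rows)
    return conn


def distributed_union(sources, query):
    """Run the same query on every source and merge results, dropping duplicates."""
    seen, merged = set(), []
    for conn in sources:
        for row in conn.execute(query):
            if row not in seen:
                seen.add(row)
                merged.append(row)
    return merged


def distributed_join(left_rows, right_rows, left_key, right_key):
    """Hash join performed at the coordinator on rows fetched from two sources."""
    index = {}
    for row in right_rows:
        index.setdefault(row[right_key], []).append(row)
    return [l + r for l in left_rows for r in index.get(l[left_key], [])]


if __name__ == "__main__":
    site_a = make_source("gene", ["id", "name"], [("g1", "BRCA1"), ("g2", "TP53")])
    site_b = make_source("gene", ["id", "name"], [("g2", "TP53"), ("g3", "EGFR")])
    site_c = make_source("annotation", ["gene_id", "note"], [("g1", "DNA repair")])

    genes = distributed_union([site_a, site_b], "SELECT id, name FROM gene ORDER BY id")
    notes = site_c.execute("SELECT gene_id, note FROM annotation").fetchall()
    print(distributed_join(genes, notes, 0, 0))
```

Here the coordinator pulls all rows back before merging or joining. A distributed query processor would normally try to push work to the sources, which this sketch does not attempt.
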