  • Project team members are drawn from commercial and non-commercial organisations.
  • The breakdown areas are somewhat arbitrary for now. It's only to give an indication of how the space partitions.
  • CHU: Churchill Hospital and Oxford University. UCL: St George's Hospital at University College London. KCL: Guy's & St Thomas' Hospital and King's College London. UED: The Ardmillan and Edinburgh University.
  • Time zone date problem: a calendar date was converted to an absolute time (midnight local time on that date) and sent as UTC. Converted back to local time in another time zone, that instant can fall on the previous day.
  • DiscoveryLink -> Information Integrator -> Masala. Use PERMIS (PrivilEge and Role Management Infrastructure Standards Validation), http://www.permis.org. More up-to-date material is available from http://sec.isi.salford.ac.uk/permis/
  • MGI: Mouse Genome Informatics (for more information see http://www.informatics.jax.org)
  • If you can locate the same object at different wavelengths, you can form a more complete picture of that object. Many telescope facilities around the world record vast quantities of astronomy data directly to digital archives. These telescopes operate with various instruments and record data across different wavelength ranges. Many astronomers want to access this data simultaneously in order to study and analyse a more complete picture of outer space.

    1. OGSA-DAI: Data Access and Integration for the Grid. Neil Chue Hong [email_address]
    2. Overview
        • Motivation
        • Goals
        • Partners
        • Features
        • Projects
        • Further information
        • Overview and demo of FirstDIG/INWA
    3. OGSA-DAI Motivation
        • Entering an age of data
            • Data explosion
                • CERN: LHC will generate 1 GB/s = 10 PB/y
                • VLBA (NRAO) generates 1 GB/s today
                • Pixar generates 100 TB per movie
            • Storage getting cheaper
        • Data stored in many different ways
            • Data resources
                • Relational databases
                • XML databases
                • Flat files
        • Need ways to facilitate
            • Data discovery
            • Data access
            • Data integration
        • Empower e-Business and e-Science
            • The Grid is a vehicle for achieving this
    4. Goals for OGSA-DAI
        • Aim to deliver application mechanisms that:
            • Meet the data requirements of Grid applications
                • Functionality, performance and reliability
                • Reduce development cost of data-centric Grid applications
                • Provide consistent interfaces to data resources
            • Are acceptable to and supportable by database providers
                • Trustable, imposed demand is acceptable, etc.
                • Provide a standard framework that satisfies standard requirements
        • A base for developing higher-level services
            • Data federation
            • Distributed query processing
            • Data mining
            • Data visualisation
    5. Integration Scenario
        • A patient moves hospital
        • Data sources shown: DB2, Oracle, CSV file
        • Data A: (PID, name, address, DOB)
        • Data B: (PID, first_contact)
        • Data C: (PID, first_name, last_name, address, first_contact, DOB)
        • Result: amalgamated patient record
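The amalgamation step in this scenario can be sketched in Java as a join on PID. This is only an illustration under stated assumptions: the row shapes for A and B follow the slide, but the type names, the method name and the idea that PID identifies the same patient in both sources are assumptions, and none of this is OGSA-DAI code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PatientMerge {
    // Row shapes follow the slide; the type names are hypothetical.
    record RowA(String pid, String name, String address, String dob) {}
    record RowB(String pid, String firstContact) {}
    record Patient(String pid, String name, String address, String dob, String firstContact) {}

    // Join A and B on PID. A patient with no row in B keeps a null firstContact.
    static List<Patient> amalgamate(List<RowA> a, List<RowB> b) {
        Map<String, String> firstContactByPid = new HashMap<>();
        for (RowB row : b) {
            firstContactByPid.put(row.pid(), row.firstContact());
        }
        List<Patient> merged = new ArrayList<>();
        for (RowA row : a) {
            merged.add(new Patient(row.pid(), row.name(), row.address(), row.dob(),
                    firstContactByPid.get(row.pid())));
        }
        return merged;
    }

    public static void main(String[] args) {
        // Sample values are invented for illustration.
        List<RowA> a = List.of(new RowA("P1", "Ann Lee", "1 High St", "1970-01-02"));
        List<RowB> b = List.of(new RowB("P1", "2001-05-06"));
        System.out.println(amalgamate(a, b));
    }
}
```

In a real deployment the two lists would come from separate data resources, which is where a uniform access layer earns its keep.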
    6. Why OGSA-DAI?
        • Why use OGSA-DAI over JDBC?
            • Language independence at the client end
                • Do not need to use Java
            • Platform independence
                • Do not have to worry about connection technology and drivers
            • Can handle XML and file resources
            • Can embed additional functionality at the service end
                • Transformations, compression, third-party delivery
                • Avoiding unnecessary data movement
            • Provision of metadata is powerful
            • Usefulness of the Registry for service discovery
                • Dynamic service binding process
            • The quickest way to make data accessible on the Grid
                • Installation and configuration of OGSA-DAI is fast and straightforward
    7. Project Partners
        • Funded by the Grid Core Programme
            • OGSA-DAI
                • £3 million, 18 months, from Feb 2002
                • Three major releases, three interim releases
            • DAIT (DAI-Two)
                • Keep the OGSA-DAI brand name
                • £1.5 million, 24 months, from Oct 2003
                • Four major releases
        • GGF DAIS WG
            • Strong involvement
            • Standardise the interfaces
                • OGSA-DAI to be a reference implementation
        Powered by ….
    8. Core features
        • An extensible framework for building applications
            • Supports relational, XML and some file resources
                • MySQL, Oracle, DB2, SQL Server, PostgreSQL, Xindice, CSV, EMBL
            • Supports various delivery options
                • SOAP, FTP, GridFTP, HTTP, files, email, inter-service
            • Supports various transforms
                • XSLT, ZIP, GZip
            • Supports message-level security using X.509 certificates
            • Client Toolkit library for application developers
            • Comprehensive documentation and tutorials
        • Third production release is coming in November
            • OGSI/GT3 based
            • Also previews of WS-I and WS-RF/GT4 releases
    9. Activities are the drivers
        • Express a task to be performed by a GDS
        • Three broad classes of activities:
            • Statement
            • Transformation
            • Delivery
        • Extensible:
            • Easy to add new functionality
            • Does not require modification to the service interface
            • Extensions operate within the OGSA-DAI framework
        • Functionality:
            • Implemented at the service
            • Works where the data is (no need to move data back to the client)
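To make the three activity classes concrete, here is a minimal Java sketch of a statement, a transformation and a delivery chained together at the service side. The `Activity` interface and every method name below are hypothetical, invented for illustration, and are not the OGSA-DAI API. The statement returns a canned result rather than querying a real data resource.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPOutputStream;

public class ActivityPipeline {
    // Hypothetical abstraction: an activity consumes the previous output and produces its own.
    interface Activity {
        byte[] run(byte[] input);
    }

    // Statement: stands in for a query against a data resource (canned result here).
    static Activity statement(String cannedResult) {
        return input -> cannedResult.getBytes(StandardCharsets.UTF_8);
    }

    // Transformation: GZip-compress the data where it was produced.
    static Activity gzip() {
        return input -> {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (GZIPOutputStream out = new GZIPOutputStream(buffer)) {
                out.write(input);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            return buffer.toByteArray();
        };
    }

    // Delivery: hand the result to a destination (an in-memory list here).
    static Activity deliverTo(List<byte[]> destination) {
        return input -> {
            destination.add(input);
            return input;
        };
    }

    // Run the activities in order, feeding each output to the next activity.
    static byte[] perform(List<Activity> activities) {
        byte[] data = new byte[0];
        for (Activity activity : activities) {
            data = activity.run(data);
        }
        return data;
    }

    public static void main(String[] args) {
        List<byte[]> outbox = new ArrayList<>();
        perform(List.of(statement("<row>42</row>"), gzip(), deliverTo(outbox)));
        System.out.println("delivered " + outbox.get(0).length + " compressed bytes");
    }
}
```

Adding a new kind of activity in this sketch means writing one more factory method; `perform` and its signature stay unchanged, which mirrors the extensibility point made on the slide.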
    10. OGSA-DAI Deck
    11. Client Toolkit
        • Why? Nobody wants to write XML!
        • A programming API which makes writing applications easier
            • Now: Java
            • Next: Perl, C, C#?, ML!?

        // Create a query
        SQLQuery query = new SQLQuery(SQLQueryString);
        ActivityRequest request = new ActivityRequest();
        request.addActivity(query);

        // Perform the query
        Response response = gds.perform(request);

        // Display the result
        ResultSet rs = query.getResultSet();
        displayResultSet(rs, 1);
    12. Project classification
        [Diagram: projects using OGSA-DAI, grouped under Biological Sciences, Physical Sciences, Commercial Applications and Computer Sciences]
        • FirstDIG
        • INWA
        • Bridges
        • AstroGrid
        • BioSimGrid
        • BioGrid
        • eDiaMoND
        • myGrid
        • ODD-Genes
        • N2Grid
        • GEON
        • MCS
        • IU RGBench
        • OGSA Web-DB
        • GeneGrid
        • GridMiner
    13. eDiaMoND
        • e-Digital MammOgraphy National Database
        • Built a prototype of a national database of mammographic images in support of the UK Breast Screening Programme
        • Employ Grid technologies to facilitate this process
    14. [Diagram: eDiaMoND deployment across four sites (UCL, KCL, UED, CHU). Labels repeated four times: DB2 Content Manager, OGSA-DAI, Core Services, Data Load, Training App, Core & Training API. Other labels: DB2 Federation, Database, Files, Training Services, Core API, Training API, Training Application.]
    15. eDiaMoND Findings
        • OGSA-DAI provides a flexible framework
        • Dynamically configure the system through discovery
        • Activities can operate with different levels of granularity
        • Federation can be introduced at various levels
        • Extended Activities to access IBM DB2 Content Manager
    16. GeneGrid
        • Grid-Based Framework for Bioinformatics: Virtual Bioinformatics Laboratory
            • Integration of Existing Technologies & Data Sets
            • Gene Study in Silico
            • Develop Specialist Data Sets
            • Grid Services for Commercial or 3rd Party Use
        • Data resources as XML collections (Xindice), flat files and relational databases (MySQL)
            • OGSA-DAI plus custom extensions
            • Beta testers for file-based activities
        • http://www.qub.ac.uk/escience/projects/genegrid/
    17. GeneGrid Architecture
        [Diagram labels: GeneGrid Portal; GeneGrid Environment; GeneGrid Application Management Registry; GeneGrid Data Manager Registry; GeneGrid Workflow Manager Service; GeneGrid Process Manager Service; GeneGrid Workflow Definition; GeneGrid Workflow Status; GeneGrid Input & Results Parameters; GAM Services for TMHMM, Blast, SignalP and mpiBlast; GDM Services for the SwissProt and EMBL databases; iGAP; sites SDSC, BeSC and EBI]
    18. Distributed Query Processing
        • Queries mapped to algebraic expressions for evaluation
        • Parallelism represented by partitioning queries
            • Use exchange operators
        • Prototype available from:
            • http://www.ogsadai.org.uk
        [Diagram: query plan with operators table_scan (protein), table_scan termID=S92 (proteinTerm), reduce, hash_join (proteinId), op_call (Blast) and exchange, split into partitions labelled 1, 2 and 3,4]
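The plan on this slide combines a hash join on proteinId with exchange-based partitioning. The Java sketch below shows one way those two ideas fit together, under several assumptions: the row types and method names are hypothetical, a parallel stream stands in for separate evaluation nodes, the op_call (Blast) step is left out, and this is not the prototype's code.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class PartitionedHashJoin {
    // Hypothetical row types named after the tables in the plan.
    record Protein(String proteinId, String sequence) {}
    record ProteinTerm(String proteinId, String termId) {}

    // Exchange-style routing: rows with the same join key always reach the same partition.
    static int partitionOf(String key, int partitions) {
        return Math.floorMod(key.hashCode(), partitions);
    }

    // Keep only the rows whose key routes to the target partition.
    static <T> List<T> route(List<T> rows, Function<T, String> key, int target, int partitions) {
        return rows.stream()
                .filter(r -> partitionOf(key.apply(r), partitions) == target)
                .collect(Collectors.toList());
    }

    // Hash join inside one partition: build on the selected terms, probe with the proteins.
    // Each matching protein is emitted once, even if several term rows match it.
    static List<Protein> joinPartition(List<Protein> proteins, List<ProteinTerm> terms) {
        Set<String> build = new HashSet<>();
        for (ProteinTerm t : terms) {
            build.add(t.proteinId());
        }
        List<Protein> out = new ArrayList<>();
        for (Protein p : proteins) {
            if (build.contains(p.proteinId())) {
                out.add(p);
            }
        }
        return out;
    }

    static List<Protein> proteinsWithTerm(List<Protein> proteins, List<ProteinTerm> terms,
                                          String termId, int partitions) {
        // table_scan on proteinTerm with the termID predicate applied.
        List<ProteinTerm> selected = terms.stream()
                .filter(t -> t.termId().equals(termId))
                .collect(Collectors.toList());
        // Each partition is joined independently, so the partitions can run in parallel.
        return IntStream.range(0, partitions).parallel()
                .mapToObj(i -> joinPartition(
                        route(proteins, Protein::proteinId, i, partitions),
                        route(selected, ProteinTerm::proteinId, i, partitions)))
                .flatMap(List::stream)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Sample rows are invented for illustration.
        List<Protein> proteins = List.of(new Protein("p1", "MKT"), new Protein("p2", "GAV"));
        List<ProteinTerm> terms = List.of(new ProteinTerm("p1", "S92"), new ProteinTerm("p2", "S10"));
        System.out.println(proteinsWithTerm(proteins, terms, "S92", 2));
    }
}
```

Because both inputs are routed by the same function of the join key, no partition ever needs rows held by another partition, which is what makes the parallel evaluation safe.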
    19. GridMiner
        • Test application area: medical
            • Traumatic brain injury treatment
            • Predicting the outcome of seriously ill patients
            • Analytical part focuses on data mining and On-Line Analytical Processing (OLAP)
        • Target:
            • Provide tools to discover and access relevant knowledge and information from different distributed and heterogeneous data sources
            • Building on and extending OGSA-DAI
        • http://www.gridminer.org/
    20. GridMiner Scenario
        • Heterogeneities:
            • Name in A is "First Last" (the target format)
            • Name in C has to be combined
        • Distribution:
            • 3 data sources
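The name heterogeneity on this slide can be resolved with a small normalisation step. The Java sketch below combines the two name columns of source C into the "First Last" target format used by source A. The class and method names are hypothetical, and the handling of missing or padded values is an assumption rather than anything the slides specify.

```java
public class NameNormalizer {
    // Combine first_name and last_name (source C) into the "First Last" target format.
    // Null or blank parts are dropped so the result never has stray spaces.
    static String combine(String firstName, String lastName) {
        String first = firstName == null ? "" : firstName.trim();
        String last = lastName == null ? "" : lastName.trim();
        return (first + " " + last).trim();
    }

    public static void main(String[] args) {
        // Sample values are invented for illustration.
        System.out.println(combine("Ada", "Lovelace"));
    }
}
```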
    21. Future work
        • Architecture review
            • Better concurrency model
            • Better AAA framework
            • Better definition of extensibility points
                • Security, activities, dynamic configuration, mobile code, …
        • Improved support for
            • WS-Security profiles
            • Stored procedures
            • Data transport
            • XQuery
            • Database-specific datatypes and SQL
        • Additionally
            • JDBC and ODBC driver for OGSA-DAI
            • Contribution process
    22. Further information
        • The OGSA-DAI Project Site:
            • http://www.ogsadai.org.uk
        • The DAIS-WG site:
            • http://forge.gridforum.org/projects/dais-wg/
        • OGSA-DAI Users Mailing list
            • [email_address]
            • General discussion on grid DAI matters
        • Formal support for OGSA-DAI releases
            • http://www.ogsadai.org.uk/support
            • [email_address]
        • OGSA-DAI training courses
    23. Project Membership
        [Diagram: project organisation chart. Group labels: IBM Dissemination Team, IBM Development Team, Principal Investigators, Project Manager, Programme Management Board Chair, Technical Review Board Chair, Research Team, EPCC Team. Names shown: Charaka, Tom, Mike, Ally, Amy, Mario, Malcolm, Kostas, Norman, Paul, Neil, Andy, Simon, Dave, Patrick, Neil.]
    24. The End
        • Questions?
    25. INWA Objectives
        • Innovation Node Western Australia
            • Informing Business & Regional Policy: Grid-enabled fusion of global data and local knowledge
        • Project
            • Ran from Nov 2003 to Aug 2004
            • Involved 10 partners (6 UK + 4 Australia)
        • Aim
            • Data mine commercially sensitive data
            • Security an absolute MUST
            • Employ Grid technologies
            • Need access to data and computational resources
        • Demonstrator using:
            • OGSA-DAI
                • Incorporate data resources
            • Sun DCG's TOG (Transfer-queue Over Globus)
                • Handle job submission to analyse microarray data
    26. INWA
        [Diagram labels: Curtin, Australia; EPCC, UK; Grid Engine (x2); Bank and Telco (x2); OGSA-DAI (x4); TOG (x2); Data Browser (x2); Telco data; Bank data; Australian property; UK property]
    27. INWA: Lessons Learned
        • Performing data integration:
            • Time zone date problems
        • Security issues:
            • Bugs in
                • JavaCoG in GT3
                • OGSA-DAI could not switch security for Grid data transfers
                • TOG had no security option
            • All of these have been fixed
        • Middleware not mature enough for commercial deployment
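The time zone date problem is easy to reproduce with `java.time`. The sketch below turns a calendar date into an absolute instant at midnight in one zone and reads the date back in another. The zones `Australia/Perth` and `Europe/London` and the sample date are assumptions chosen to match the project's two sites, not details taken from the slides.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneId;

public class DateShift {
    // Convert a calendar date to an absolute instant (midnight in the source zone),
    // then read the calendar date back in the reader's zone.
    static LocalDate viaInstant(LocalDate date, ZoneId sourceZone, ZoneId readerZone) {
        Instant midnight = date.atStartOfDay(sourceZone).toInstant();
        return midnight.atZone(readerZone).toLocalDate();
    }

    public static void main(String[] args) {
        LocalDate recorded = LocalDate.of(2004, 3, 15);
        LocalDate seen = viaInstant(recorded, ZoneId.of("Australia/Perth"), ZoneId.of("Europe/London"));
        // Midnight in Perth is 16:00 UTC on the previous day.
        System.out.println(seen); // prints 2004-03-14
    }
}
```

One common way to avoid the shift is to exchange the value as a plain calendar date, for example the ISO string from `LocalDate.toString()`, so that no time zone conversion ever applies to it.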
    28. BRIDGES
        • Biomedical Research Informatics Delivered by Grid Enabled Services
        • Want a Grid-enabled front end to their software
        • Want to do a comparative evaluation of
            • IBM's Information Integrator
            • OGSA-DAI
    29. Bridges: Data Sources
        • Edinburgh
        • Glasgow
        • Leicester
        • Oxford
        • MRC/Imperial
        • Eindhoven
        • Maastricht
    30. [Diagram labels: Client; IBM Information Integrator; OGSA-DAI; MGI (x2); CSV (x2)]
    31. FirstDIG
        • Data mining with the First Transport Group, UK
            • Example: "When buses are more than 10 minutes late there is an 82% chance that revenue drops by at least 10%"
            • http://www.epcc.ed.ac.uk/firstdig
        [Diagram labels: OGSA-DAI (x5); Client Application; Data Mining Application]
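A finding such as the 82% example reads like the confidence of an association rule: of the journeys where the condition holds, the fraction where the outcome also holds. The Java sketch below computes that ratio. The `Journey` type, its fields and the sample values are hypothetical, and nothing here is claimed to be the method FirstDIG actually used.

```java
import java.util.List;

public class RuleConfidence {
    // One observation per bus journey; fields and values are hypothetical.
    record Journey(double minutesLate, double revenueDropPercent) {}

    // confidence(A => B) = count(A and B) / count(A), where
    // A: the bus is more than lateThreshold minutes late,
    // B: revenue drops by at least dropThreshold percent.
    static double confidence(List<Journey> journeys, double lateThreshold, double dropThreshold) {
        long antecedent = journeys.stream()
                .filter(j -> j.minutesLate() > lateThreshold)
                .count();
        long both = journeys.stream()
                .filter(j -> j.minutesLate() > lateThreshold
                        && j.revenueDropPercent() >= dropThreshold)
                .count();
        return antecedent == 0 ? 0.0 : (double) both / antecedent;
    }

    public static void main(String[] args) {
        // Invented sample data, for illustration only.
        List<Journey> journeys = List.of(
                new Journey(12, 15), new Journey(20, 5), new Journey(3, 12), new Journey(11, 10));
        System.out.println(confidence(journeys, 10, 10));
    }
}
```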
    32. EdSkyQuery-G
        [Diagram: four sources, each labelled Sky Data]
    33. [Diagram labels: PostgreSQL; MySQL; Xindice; DB2; Scratch DB (x3); Data Service (x10); Oracle (x4) with Federation; DB2 (x4) with Federation]
    34. OGSA-DAI Downloads
        • R4: 690 downloads since May 04
            • Actual user downloads, not search engine crawlers
            • Does not include downloads as part of GT3.2 releases
        • Total of 838 registered users (as of 7/10/04)

        Version (release date)          Downloads
        R1.0 (Jan 03)                   104
        R1.5 (Feb 03)                   108
        R2.0 (Apr 03)                   250
        R2.5 (Jun 03)                   291
        R3.0 (Jul 03)                   792
        R3.1 (Feb 04)                   630
        Total (including 690 for R4)    2865

        Downloads by Country, OGSA-DAI R4.0: United Kingdom 21%, China 26%, United States 13%, Japan 5%, Unknown 7%, Germany 5%, Italy 5%, Austria 2%, Australia 2%, France 3%, Taiwan 2%
    35. Users Group
        • A separate, independent body to engage with users and feed back to developers
            • Chair: Prof. Beth Plale of Indiana University
        • Twice-yearly meetings