Cerngridtech Bapril


Published in: Technology

    1. 1. Grid Technology B Different Flavors of Grids CERN Geneva April 1-3 2003 Geoffrey Fox Community Grids Lab Indiana University [email_address]
    2. 2. Different Types of Grids <ul><li>Compute and File-oriented Grids </li></ul><ul><li>“Internet Computing” Grids (Desktop Grids) </li></ul><ul><li>Peer-to-peer Grids </li></ul><ul><li>Information Grids: to distinguish between File, Database and “Perl Filter” based Grids </li></ul><ul><li>Semantic Grids </li></ul><ul><li>Integrated (Hybrid, Complexity) Grids </li></ul><ul><ul><li>Bio and Geocomplexity </li></ul></ul><ul><li>Campus Grids </li></ul><ul><li>Enterprise Grids </li></ul>
    3. 3. Compute and File-oriented Grids <ul><li>Different Grids have different structures </li></ul><ul><li>Compute/File oriented Grids are well represented by the “production part of particle physics”, either in </li></ul><ul><ul><li>Monte Carlo </li></ul></ul><ul><ul><li>Production of Data Summary Tapes </li></ul></ul><ul><li>This is nearer the “Globus GT2” than the “Web Service” vision of the Grid </li></ul><ul><li>Strongly supported, of course, by EDG (European Data Grid) and the Trillium project in the US (Virtual Data Toolkit) </li></ul><ul><li>The Physics Analysis phase of particle physics requires more collaboration and is more dynamic </li></ul>
    4. 4. What do HEP experiments want to do on the Grid in the long term? <ul><li>Production: </li></ul><ul><ul><li>Simulation (Monte Carlo generators). </li></ul></ul><ul><ul><li>Reconstruction (including detector geometry …). </li></ul></ul><ul><ul><li>Event Mixing (bit-wise superposition of Signal and Backgrounds). </li></ul></ul><ul><ul><li>Reprocessing (Refinement, improved reconstruction data production). </li></ul></ul><ul><ul><li>Production (production of AODs and ESDs starting from Raw data). </li></ul></ul><ul><ul><ul><li>Very organized activity, generally centrally managed by production teams </li></ul></ul></ul><ul><li>Physics analysis: </li></ul><ul><ul><li>Searches for specific event signatures or particle types (data access can be very sparse, perhaps on the order of one event out of each million). </li></ul></ul><ul><ul><li>Measurement of inclusive and exclusive cross sections for a given physics channel – Measurement of relevant kinematical quantities </li></ul></ul><ul><ul><ul><li>I/O: it is not feasible to organize the input data in a convenient fashion unless one constructs new files containing the selected events. </li></ul></ul></ul><ul><ul><ul><li>The activities are also uncoordinated (not planned in advance) and (often) iterative. </li></ul></ul></ul>
    5. 5. EDG “Compute/File” Grid Work Packages <ul><li>WP1: Work Load (Resource) Management System </li></ul><ul><li>WP2: Data (Replication/Caching) Management </li></ul><ul><li>WP3: Grid Monitoring / Grid Information Systems (general meta-data lookup) </li></ul><ul><li>WP4: Fabric Management (software etc. on cluster) </li></ul><ul><li>WP5: Storage Element (Grid Interface to mass storage) </li></ul><ul><li>WP6: Security </li></ul><ul><li>WP7: Network Monitoring </li></ul>
    6. 6. Compute/File Grid Requirements I <ul><li>Called Data Grid by Globus team </li></ul><ul><li>Terabytes or petabytes of data </li></ul><ul><ul><li>Often read-only data, “published” by experiments </li></ul></ul><ul><li>Large data storage and computational resources shared by researchers around the world </li></ul><ul><ul><li>Distinct administrative domains </li></ul></ul><ul><ul><li>Respect local and global policies governing how resources may be used </li></ul></ul><ul><li>Access raw experimental data </li></ul><ul><li>Run simulations and analysis to create “derived” data products </li></ul>
    7. 7. Compute/File Grid Requirements II <ul><li>Locate data </li></ul><ul><ul><li>Record and query for existence of data </li></ul></ul><ul><li>Data access based on metadata </li></ul><ul><ul><li>High-level attributes of data </li></ul></ul><ul><li>Support high-speed, reliable data movement </li></ul><ul><ul><li>E.g., for efficient movement of large experimental data sets </li></ul></ul><ul><li>Support flexible data access </li></ul><ul><ul><li>e.g., databases , hierarchical data formats (HDF), aggregation of small objects </li></ul></ul><ul><li>Data Filtering </li></ul><ul><ul><li>Process data at storage system before transferring </li></ul></ul>
    8. 8. Compute/File Grid Requirements III <ul><li>Planning, scheduling and monitoring execution of data requests and computations </li></ul><ul><li>Management of data replication </li></ul><ul><ul><li>Register and query for replicas </li></ul></ul><ul><ul><li>Select the best replica for a data transfer </li></ul></ul><ul><li>Security </li></ul><ul><ul><li>Protect data on storage systems </li></ul></ul><ul><ul><li>Support secure data transfers </li></ul></ul><ul><ul><li>Protect knowledge about existence of data </li></ul></ul><ul><li>Virtual data </li></ul><ul><ul><li>Desired data may be stored on a storage system (“materialized”) or created on demand </li></ul></ul>
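The replica-management requirement above (register replicas, query for them, select the best one for a transfer) can be illustrated with a minimal Python sketch. It is not the Globus Replica Location Service API: the `ReplicaCatalog` class, the site names, the `gsiftp://...example` URLs and the latency-plus-bandwidth cost model are all assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Replica:
    """One physical copy of a logical file (all fields are illustrative)."""
    site: str
    url: str
    bandwidth_mbps: float  # assumed measured throughput to the client
    latency_ms: float      # assumed round-trip time to the client


class ReplicaCatalog:
    """Toy in-memory catalog mapping logical file names to physical replicas."""

    def __init__(self) -> None:
        self._replicas: dict[str, list[Replica]] = {}

    def register(self, logical_name: str, replica: Replica) -> None:
        self._replicas.setdefault(logical_name, []).append(replica)

    def query(self, logical_name: str) -> list[Replica]:
        return list(self._replicas.get(logical_name, []))


def estimated_transfer_seconds(replica: Replica, size_mb: float) -> float:
    """Crude cost model (an assumption): latency plus size over bandwidth."""
    return replica.latency_ms / 1000.0 + (size_mb * 8.0) / replica.bandwidth_mbps


def select_best_replica(catalog: ReplicaCatalog, logical_name: str, size_mb: float) -> Replica:
    """Pick the replica with the smallest estimated transfer time."""
    candidates = catalog.query(logical_name)
    if not candidates:
        raise LookupError(f"no replica registered for {logical_name!r}")
    return min(candidates, key=lambda r: estimated_transfer_seconds(r, size_mb))


# Hypothetical catalog contents.
catalog = ReplicaCatalog()
catalog.register("run42.dst", Replica("site-a", "gsiftp://se.site-a.example/run42.dst", 100.0, 200.0))
catalog.register("run42.dst", Replica("site-b", "gsiftp://se.site-b.example/run42.dst", 1000.0, 20.0))

best = select_best_replica(catalog, "run42.dst", size_mb=1000.0)
```

A real planner would take its bandwidth and latency figures from the information services rather than from fixed fields, and would also apply the security and policy constraints listed on the slide.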
    9. 9. Functional View of Compute/File Grid <ul><li>Application </li></ul><ul><li>Metadata Service: Location based on data attributes </li></ul><ul><li>Replica Location Service: Location of one or more physical replicas </li></ul><ul><li>Information Services: State of grid resources, performance measurements and predictions </li></ul><ul><li>Planner: Data location, Replica selection, Selection of compute and storage nodes </li></ul><ul><li>Security and Policy </li></ul><ul><li>Executor: Initiates data transfers and computations </li></ul><ul><li>Data Movement, Data Access, Compute Resources, Storage Resources </li></ul>
    10. 10. Layered C/F Grid Architecture
    11. 11. C/F Grid Architecture I (from the bottom up) <ul><li>Fabric Layer </li></ul><ul><ul><li>Storage systems </li></ul></ul><ul><ul><li>Compute systems </li></ul></ul><ul><ul><li>Networks </li></ul></ul><ul><li>Connectivity Layer </li></ul><ul><ul><li>Communication protocols (e.g., TCP/IP protocol stack) </li></ul></ul><ul><ul><li>Authentication and Authorization protocols (e.g., GSI) </li></ul></ul>
    12. 12. C/F Grid Architecture II <ul><li>Resource Layer: sharing single resources </li></ul><ul><ul><li>Data Access Protocol or Service (e.g., Globus gridftp) </li></ul></ul><ul><ul><li>Storage Resource Management (e.g., SRM/DRM/HRM from Lawrence Berkeley Lab) </li></ul></ul><ul><ul><li>Data Filtering or Transformation Services (e.g., DataCutter from Ohio State University) </li></ul></ul><ul><ul><li>Database Management Services (e.g., local RDBMS) </li></ul></ul><ul><ul><li>Compute Resource Management Services (e.g., local supercomputer scheduler) </li></ul></ul><ul><ul><li>Resource Monitoring/Auditing Service </li></ul></ul>
    13. 13. C/F Grid Architecture III <ul><li>Collective 1 Layer: General Services for Coordinating Multiple Resources </li></ul><ul><ul><li>Data Transport Services (e.g., Globus Reliable File Transfer and Multiple File Transfer Service from LBNL) </li></ul></ul><ul><ul><li>Data Federation Services </li></ul></ul><ul><ul><li>Data filtering or Transformation Service (e.g., Active ProxyG from Ohio State University) </li></ul></ul><ul><ul><li>General Data Discovery Services (e.g., Globus Replica Location Service and Globus Metadata Catalog Service) </li></ul></ul><ul><ul><li>Storage management/brokering </li></ul></ul><ul><ul><li>Compute management/brokering (e.g., Condor from University of Wisconsin, Madison) </li></ul></ul><ul><ul><li>Monitoring/auditing service </li></ul></ul>
    14. 14. C/F Grid Architecture IV <ul><li>Collective 2 Layer: Services for Coordinating Multiple Resources that are Specific to an Application Domain or a Virtual Organization </li></ul><ul><ul><li>Request Interpretation and Planning Services (e.g., Globus Chimera and Pegasus for Physics Applications and Condor DAGMan) </li></ul></ul><ul><ul><li>Workflow management service (e.g., Globus Pegasus) </li></ul></ul><ul><ul><li>Application-Specific Data Discovery Services (e.g., Earth Systems Grid Metadata Catalog) </li></ul></ul><ul><ul><li>Community Authorization service (e.g., Globus CAS) </li></ul></ul><ul><ul><li>Consistency Services with varying levels of consistency, including data versioning, subscription, distributed file systems or distributed databases </li></ul></ul>
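The request planning and workflow management services in this layer order tasks by their dependencies. The Python sketch below shows only that core idea, using the standard library's `graphlib`. It is not the interface of Chimera, Pegasus or Condor DAGMan, and the task names are hypothetical.

```python
from graphlib import TopologicalSorter

# Each key is a task; its value is the set of tasks that must finish first.
# Task names are hypothetical, loosely modelled on a physics production chain.
workflow = {
    "generate_signal": set(),
    "generate_background": set(),
    "mix_events": {"generate_signal", "generate_background"},
    "reconstruct": {"mix_events"},
    "make_summary": {"reconstruct"},
}


def parallel_batches(dag):
    """Group tasks into batches; every task in a batch can run concurrently."""
    sorter = TopologicalSorter(dag)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only to make output stable
        batches.append(ready)
        sorter.done(*ready)  # mark the whole batch finished
    return batches


batches = parallel_batches(workflow)
```

Each batch could then be handed to a compute broker, with the next batch released only once the previous one has completed.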
    15. 15. Composing These Services To Provide Higher-Level Functionality <ul><li>For example, a Grid File System might compose: </li></ul><ul><ul><li>Fabric layer: storage components, compute elements </li></ul></ul><ul><ul><li>Connectivity layer: security and communication protocols </li></ul></ul><ul><ul><li>Resource layer: data access protocols or services and storage resource management </li></ul></ul><ul><ul><li>Collective layers: transport and discovery services, collective storage management, monitoring and auditing, authorization and consistency services </li></ul></ul>
    16. 16. Peer to Peer Network <ul><li>Peers are Jacks of all Trades linked to “all” peers in community </li></ul><ul><li>Typically Integrated Clients, Servers and Resources </li></ul><ul><li>Figure: six peers, each combining User, Resource, Service and Routing </li></ul>
    17. 17. Peer to Peer (Hybrid) Grid <ul><li>Dynamic Message or Event Routing from Peers or Servers </li></ul><ul><li>Figure: Services and NB Routing, with six peers each combining User, Resource, Service and Routing </li></ul>
    18. 18. Peer to Peer Grid <ul><li>A democratic organization </li></ul><ul><li>Chapters 18 and 19 of the Grid Book </li></ul><ul><li>Figure: Peers, Databases, User Facing Web Service Interfaces, Service Facing Web Service Interfaces, Event/Message Brokers </li></ul>
    19. 19. Entropia: Desktop Grid <ul><li>Entropia (chapter 12 of book), United Devices, Parabon, [email_address] etc. have demonstrated “Internet Computing” or Desktop Grid very successfully </li></ul><ul><li>Used to be called peer-to-peer computing but that fell out of favor due to Napster’s bad name </li></ul><ul><li>Condor has similar types of utility but Entropia is optimized for </li></ul><ul><ul><li>Huge number of clients </li></ul></ul><ul><ul><li>Providing a secure “sandbox” for the application to run in, which guarantees that the application will not harm the client </li></ul></ul>
    20. 20. Scaling of Entropia Application
    21. 21. Entropia Architecture Application Execution on the Entropia System. End-user submits computation to Job Management (1). The Job Manager breaks up the computation into many independent “subjobs” (2) and submits the subjobs to the resource scheduler. In the meantime, the available resources of a client are periodically reported to the Node Manager (a), which informs the Subjob Scheduler (b) using the resource descriptions. The Subjob Scheduler matches the computation needs with the available resources (3) and schedules the computation to be executed by the clients (4,5,6). Results of the computation are sent to the Job Manager (7), put together, and handed back to the end-user (8).
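The flow described above (split a job into independent subjobs, match them against reported client resources, run them, and collect results) can be sketched in a few lines of Python. This is a toy model and not Entropia's software: the class names, the memory-only resource description and the greedy first-fit matching rule are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Client:
    """A desktop machine as reported to the node manager (illustrative)."""
    name: str
    free_memory_mb: int
    busy: bool = False


@dataclass
class Subjob:
    index: int
    work_items: list
    min_memory_mb: int


def split_job(work_items, chunk_size, min_memory_mb):
    """Step 2: break one computation into independent subjobs."""
    return [
        Subjob(start // chunk_size, work_items[start:start + chunk_size], min_memory_mb)
        for start in range(0, len(work_items), chunk_size)
    ]


def match_subjobs(subjobs, clients):
    """Step 3: greedy first-fit match of subjob needs to idle clients."""
    assignments, pending = {}, []
    for subjob in subjobs:
        for client in clients:
            if not client.busy and client.free_memory_mb >= subjob.min_memory_mb:
                client.busy = True
                assignments[subjob.index] = client.name
                break
        else:
            pending.append(subjob)  # no suitable client right now
    return assignments, pending


def run_subjob(subjob):
    """Stand-in for the sandboxed application: sum of squares."""
    return sum(x * x for x in subjob.work_items)


subjobs = split_job(list(range(10)), chunk_size=4, min_memory_mb=128)
clients = [Client("pc-a", 512), Client("pc-b", 64), Client("pc-c", 256)]
assignments, pending = match_subjobs(subjobs, clients)

# Steps 4-7: only scheduled subjobs produce results; the rest wait for a client.
partial_results = {sj.index: run_subjob(sj) for sj in subjobs if sj.index in assignments}
```

Here `pc-b` is skipped because it reports too little memory, so the third subjob stays pending until a client frees up.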
    22. 22. Information Grids I <ul><li>Actually nearly all Grids consist of composing access to data with processing of that data in some computer program </li></ul><ul><li>In Compute/File Grids (Data Grids for Globus), one naturally allows database access from programs, although in some cases the dominant access is to files </li></ul><ul><li>In Information Grids, we consider access to databases but, of course, view files as a special case of databases </li></ul><ul><li>The real difference is what tier we are looking at: </li></ul><ul><ul><li>Compute/File Grids are looking at “backend resources” </li></ul></ul><ul><ul><li>Information Grids are looking at the “middle tier” because typically data volumes are not large enough to stress typical middle-tier mechanisms </li></ul></ul>
    23. 23. Information Grids II <ul><li>Should use Middle tier where possible and adopt hybrid model with control always in middle tier and using backend only where needed </li></ul><ul><ul><li>This would require reworking a lot of tools e.g. Condor should schedule services not jobs </li></ul></ul><ul><li>Most programming models either specify “program view” or “service view” and do not separate </li></ul><ul><ul><li>Developments like GT3 will allow changes but it will take a long time before key tools are implemented in hybrid mode </li></ul></ul><ul><li>Note Bioinformatics and many other Information Grids only require service view </li></ul><ul><ul><li>These applications have in UK e-Science started with “Web Service” and not “Globus” view </li></ul></ul>
    24. 24. Raw (HPC) Resources Middleware Portal Services System Services System Services System Services Application Service System Services System Services Grid Computing Environments User Services “ Core” Grid Database Service View Program View
    25. 25. OGSA-DAI (Malcolm Atkinson Edinburgh) UK e-Science Grid Core Programme Development of Data Access and Integration Services for OGSA - Access to XML Databases - - Access to Relational Databases - - Distributed Query Processing - - XML Schema Support for e-Science -
    26. 26. DAI Key Services <ul><li>GridDataService (GDS): Access to data & DB operations </li></ul><ul><li>GridDataServiceFactory (GDSF): Makes GDS & GDSF </li></ul><ul><li>GridDataServiceRegistry (GDSR): Discovery of GDS(F) & Data </li></ul><ul><li>GridDataTranslationService (GDTS): Translates or Transforms Data </li></ul><ul><li>GridDataTransportDepot (GDTD): Data transport with persistence </li></ul><ul><li>Integrated Structured Data Transport </li></ul><ul><li>Relational & XML models supported </li></ul><ul><li>Role-based Authorisation </li></ul><ul><li>Binary structured files (later) </li></ul>
    27. 27. <ul><li>1a. Request to Registry for sources of data about “x” </li></ul><ul><li>1b. Registry responds with Factory handle </li></ul><ul><li>2a. Request to Factory for access to database </li></ul><ul><li>2b. Factory creates GridDataService to manage access </li></ul><ul><li>2c. Factory returns handle of GDS to client </li></ul><ul><li>3a. Client queries GDS with XPath, SQL, etc </li></ul><ul><li>3b. GDS interacts with database </li></ul><ul><li>3c. Results of query returned to client as XML </li></ul><ul><li>Figure: Client, Registry, Factory, Grid Data Service and XML / Relational database, linked by SOAP/HTTP, service creation and API interactions </li></ul>
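The registry, factory and data-service interaction in steps 1a to 3c can be mimicked in-process with a short Python sketch. The `Registry`, `Factory` and `GridDataService` classes below are hypothetical stand-ins, not the OGSA-DAI API. SQLite replaces the backend database, plain object references replace service handles, and there is no SOAP/HTTP transport.

```python
import sqlite3
import xml.etree.ElementTree as ET


class GridDataService:
    """Toy stand-in for a GDS: runs a query and returns the rows as XML."""

    def __init__(self, connection):
        self._connection = connection

    def query(self, sql, params=()):
        cursor = self._connection.execute(sql, params)      # 3b: talk to the database
        columns = [description[0] for description in cursor.description]
        root = ET.Element("results")
        for row in cursor:
            row_element = ET.SubElement(root, "row")
            for column, value in zip(columns, row):
                # Assumes column names are valid XML element names.
                ET.SubElement(row_element, column).text = str(value)
        return ET.tostring(root, encoding="unicode")         # 3c: results as XML


class Factory:
    """Toy stand-in for a GDSF: creates a data service for one database."""

    def __init__(self, connection):
        self._connection = connection

    def create_service(self):
        return GridDataService(self._connection)             # 2b, 2c


class Registry:
    """Toy stand-in for a GDSR: maps a topic to a factory 'handle'."""

    def __init__(self):
        self._factories = {}

    def register(self, topic, factory):
        self._factories[topic] = factory

    def lookup(self, topic):
        return self._factories[topic]                        # 1b


# Hypothetical database contents.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE events (run INTEGER, energy_gev REAL)")
connection.executemany("INSERT INTO events VALUES (?, ?)", [(1, 91.2), (2, 125.0)])

registry = Registry()
registry.register("events", Factory(connection))

factory = registry.lookup("events")                          # 1a
service = factory.create_service()                           # 2a
xml_result = service.query(
    "SELECT run, energy_gev FROM events WHERE energy_gev > ?", (100,)
)                                                            # 3a
```

The point of the indirection is that the client only ever needs to know the registry; which database sits behind the data service is hidden by the factory.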
    28. 28. Composing Components <ul><li>Figure: three OGSA-DAI Components linked by Data Transport </li></ul><ul><li>Not clear if this is part of OGSA-DAI or should be composed using “general workflow” </li></ul>
    29. 29. Relational database Interface transparency: one GDS supports multiple database types Client Client Client Grid Data Service Directory / File system XML database
    30. 30. Software Availability <ul><li>Available now </li></ul><ul><li>Phase 1 prototype of GDS, GDSF & GDSR for XML </li></ul><ul><li>Java implementations for the axis/tomcat platform and the Xindice database </li></ul><ul><li>Globus-2 Relational database support </li></ul><ul><li>BinX Schema v0.2 </li></ul><ul><li> </li></ul><ul><li>An XML Schema for describing the structure of binary datafiles – the power of XML for terabyte files </li></ul><ul><li>Software Q1 2003 </li></ul><ul><li>Reference implementation 1 </li></ul><ul><li>Access & Update </li></ul><ul><ul><li>XML databases </li></ul></ul><ul><ul><li>Relational databases </li></ul></ul><ul><li>To be released as Basic Services in Globus Toolkit 3 </li></ul>
    31. 31. Advanced Components DB Consumer GDS Client GDT Translation Translation GDS:PerformScript
    32. 32. Composed Components
    33. 33. Futures of OGSA-DAI <ul><li>Allow querying of distributed databases – this is using Grid to federate multiple databases </li></ul><ul><ul><li>Grid is “intrinsically” federation technology – need to mimic classic database federation ideas in a Grid language </li></ul></ul><ul><ul><li>Form composite Schema from integration of those of individual databases (OGSA-DAI allows you to query each database web service to find schema) </li></ul></ul><ul><li>Decide how to deal with very important case where user view is a complex filter run on database query </li></ul><ul><ul><li>Hardest when need to dynamically assign resource to perform filter </li></ul></ul><ul><ul><li>Could view as a “simulation Web Service” outside OGSA-DAI </li></ul></ul>DB Filter WSDL Of Filter
    34. 34. Semantic Grid starts with the Semantic Web which is a “dream” and a project of W3C <ul><li>“The Semantic Web is an extension of the current Web in which information is given a well-defined meaning, better enabling computers and people to work in cooperation. It is the idea of having data on the Web defined and linked in a way that it can be used for more effective discovery, automation, integration and reuse across various applications. The Web can reach its full potential if it becomes a place where data can be processed by automated tools as well as people” </li></ul><ul><li>From the W3C Semantic Web Activity statement </li></ul><ul><li>Digital Brilliance is the phase transition coming from the “collective effect” in the Grid Spin Glass. </li></ul><ul><li>The Hosting environment is the “Ether” </li></ul><ul><li>The Resources are the Spins </li></ul><ul><li>The forces are the meta-data linking resources </li></ul><ul><li>Knowledge (The Higgs) will emerge when we get enough meta-data to force phase transition </li></ul>
    35. 35. Resource Description Framework
    36. 36. Classical Web Semantic Web Richer semantics
    37. 37. OWL Web Ontology Language “The World Wide Web as it is currently constituted resembles a poorly mapped geography. Our insight into the documents and capabilities available are based on keyword searches, abetted by clever use of document connectivity and usage patterns. The sheer mass of this data is unmanageable without powerful tool support. In order to map this terrain more precisely, computational agents require machine-readable descriptions of the content and capabilities of web accessible resources. These descriptions must be in addition to the human-readable versions of that information.” From the OWL Guide
    38. 38. SW Tools Good Tools for recording meta-data (OWL) but not so advanced in looking at their implications
    39. 39. <ul><li>Semantic Web requires a metadata-enabled Web </li></ul><ul><li>Where will the metadata come from? </li></ul><ul><li>How about from the linked rich resources of a virtual organization? </li></ul><ul><li>A Grid ……. </li></ul>Classical Web Classical Grid More computation
    40. 40. Compute Resources Catalogs Data Archives Information Discovery Metadata delivery Data Discovery Data Delivery Catalog Mediator Data mediator 1. Portals and Workbenches Bulk Data Analysis Catalog Analysis Metadata View Data View 4.Grid Security Caching Replication Backup Scheduling 2.Knowledge & Resource Management Standard Metadata format, Data model, Wire format Catalog/Image Specific Access Standard APIs and Protocols Concept space 3. 5. 6. 7. Derived Collections Astronomy Sky Survey Data Grid Grid is metadata based middleware
    41. 41. An Example of RDF and Dublin Core <ul><li><rdf:RDF xmlns:rdf="" xmlns:dc=""> </li></ul><ul><li><rdf:Description about=" "> </li></ul><ul><li><dc:Title> D-Lib Program - Research in Digital Libraries </dc:Title> <dc:Description> The D-Lib program supports the community of people with research interests in digital libraries and electronic publishing. </dc:Description> <dc:Publisher> Corporation For National Research Initiatives </dc:Publisher> <dc:Date> 1995-01-07 </dc:Date> </li></ul><ul><li><dc:Subject> </li></ul><ul><ul><li><rdf:Bag> <rdf:li> Research; statistical methods </rdf:li> <rdf:li> Education, research, related topics </rdf:li> <rdf:li> Library use Studies </rdf:li> </rdf:Bag> </li></ul></ul><ul><li></dc:Subject> <dc:Type> World Wide Web Home Page </dc:Type> </li></ul><ul><li><dc:Format> text/html </dc:Format> </li></ul><ul><li><dc:Language> en </dc:Language> </li></ul><ul><li></rdf:Description> </rdf:RDF> </li></ul>
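A record like the one above can be read with ordinary XML tooling. The Python sketch below parses a shortened copy of the slide's record with the standard library. The namespace URIs and the `about` URL did not survive in the transcript, so the values used here (the current RDF syntax namespace, the Dublin Core elements 1.1 namespace and an `example.org` address) are assumptions and may differ from the original slide. This is plain XML parsing, not an RDF parser.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URIs; the slide's own values were lost in the transcript.
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Shortened copy of the slide's record; the 'about' URL is hypothetical.
RECORD = f"""<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:dc="{DC_NS}">
  <rdf:Description about="http://example.org/resource">
    <dc:Title>D-Lib Program - Research in Digital Libraries</dc:Title>
    <dc:Publisher>Corporation For National Research Initiatives</dc:Publisher>
    <dc:Date>1995-01-07</dc:Date>
    <dc:Subject>
      <rdf:Bag>
        <rdf:li>Research; statistical methods</rdf:li>
        <rdf:li>Education, research, related topics</rdf:li>
        <rdf:li>Library use Studies</rdf:li>
      </rdf:Bag>
    </dc:Subject>
    <dc:Format>text/html</dc:Format>
    <dc:Language>en</dc:Language>
  </rdf:Description>
</rdf:RDF>"""


def extract_metadata(rdf_xml):
    """Flatten one rdf:Description into a dict; rdf:Bag values become lists."""
    namespaces = {"rdf": RDF_NS, "dc": DC_NS}
    root = ET.fromstring(rdf_xml)
    description = root.find("rdf:Description", namespaces)
    record = {"about": description.get("about")}
    for child in description:
        name = child.tag.split("}", 1)[1]  # strip the "{namespace}" prefix
        bag = child.find("rdf:Bag", namespaces)
        if bag is not None:
            record[name] = [item.text.strip() for item in bag.findall("rdf:li", namespaces)]
        else:
            record[name] = child.text.strip()
    return record


metadata = extract_metadata(RECORD)
```

The resulting dictionary is the kind of attribute set a metadata catalog could index for attribute-based data discovery.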
    42. 42. <ul><li>Annotations of results, workflows and database entries could be represented by RDF graphs using controlled vocabularies described in RDF Schema and OWL </li></ul><ul><li>Personal notes can be XML documents annotated with metadata or RDF graphs linked to results or experimental plans </li></ul><ul><li>Exporting results as RDF makes them available to be reasoned over </li></ul><ul><li>RDF graphs can be the “glue” that associates all the components (literature, notes, code, databases, intermediate results, sketches, images, workflows, the person doing the experiment, the lab they are in, the final paper) </li></ul><ul><li>The provenance trails that keep a record of how a collection of services were orchestrated so they can be replicated or replayed, or act as evidence </li></ul>For example…
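The idea of RDF graphs as “glue” and as provenance trails can be shown with a tiny in-memory triple store in Python. This is a sketch, not an RDF library: the resource names and the `derivedFrom`, `producedBy` and `runBy` predicates are hypothetical and come from no standard vocabulary.

```python
class TripleStore:
    """Minimal set of (subject, predicate, object) triples with pattern matching."""

    def __init__(self):
        self._triples = set()

    def add(self, subject, predicate, obj):
        self._triples.add((subject, predicate, obj))

    def match(self, subject=None, predicate=None, obj=None):
        """Return matching triples, sorted; None acts as a wildcard."""
        return sorted(
            triple for triple in self._triples
            if (subject is None or triple[0] == subject)
            and (predicate is None or triple[1] == predicate)
            and (obj is None or triple[2] == obj)
        )


def provenance_trail(store, item):
    """Follow 'derivedFrom' links back from an item towards its original inputs.

    Simplification: only the first parent is followed at each step.
    """
    trail, current, seen = [], item, set()
    while current not in seen:  # guard against accidental cycles
        seen.add(current)
        parents = store.match(subject=current, predicate="derivedFrom")
        if not parents:
            break
        current = parents[0][2]
        trail.append(current)
    return trail


store = TripleStore()
store.add("paper", "derivedFrom", "plot")
store.add("plot", "derivedFrom", "result")
store.add("result", "derivedFrom", "raw-data")
store.add("result", "producedBy", "workflow")
store.add("workflow", "runBy", "alice")

trail = provenance_trail(store, "paper")
produced_by_workflow = store.match(predicate="producedBy", obj="workflow")
```

Because every component (paper, plot, result, workflow, person) is just a node, the same store answers both “where did this come from?” and “what did this workflow produce?”.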
    43. 43. <ul><li>Represent the syntactic data types of e-Science objects using XML Schema data types </li></ul><ul><li>Represent domain ontologies for the semantic mediation between database schema, an application’s inputs and outputs, and workflow work items </li></ul><ul><li>Represent domain ontologies and rules for parameters of machines or algorithms to reason over allowed configurations </li></ul><ul><li>Use reasoning over execution plans, workflows and other combinations of services to ensure the semantic validity of the composition </li></ul><ul><li>Use RDF as a common data model for merging results drawn from different resources or instruments </li></ul><ul><li>Capture the structure of messages that are exchanged between components </li></ul>More meta-data …
    44. 44. <ul><li>At the data/computation layer: classification of computational and data resources, performance metrics, job control, management of physical and logical resources </li></ul><ul><li>At the information layer: schema integration, workflow descriptions, provenance trail </li></ul><ul><li>At the knowledge layer: problem solving selection, intelligent portals </li></ul><ul><li>Governance of the Grid, for example access rights to databases, personal profiles and security groupings </li></ul><ul><li>Charging infrastructure, computational economy, support for negotiation; e.g. through auction model </li></ul>And more meta-data …
    45. 45. Classical Web Classical Grid Semantic Web Richer semantics More computation Semantic Grid Source: Norman Paton
    46. 46. Summary of Grid Types <ul><li>Compute/File Grid: The “Linux workstation view of a distributed system” – needs planning, scheduling of tens of thousands of jobs, efficient movement of data to processors </li></ul><ul><li>Desktop Grid: as above but uses huge numbers of “foreign” compute resources </li></ul><ul><li>Information Grids: Web service access to meta-data rich data repositories </li></ul><ul><li>Hybrid (complexity) Grids: Combination of Information and Compute/File Grids </li></ul><ul><li>Peer-to-peer Grid: Unstructured general purpose access to other style grids </li></ul><ul><li>Semantic Grid: Enables knowledge discovery in all Grids </li></ul>