Leveraging Open Source Technologies to Enable Scientific Archiving and Discovery
Research Data Access & Preservation
Denver, Colorado
March 31 - April 1, 2011
Steve Hughes, Dan Crichton, Chris Mattmann, Sean Kelly

Topics
  E-Science Trends
  Software Architectures
  Open Source
  Object-Oriented Data Technology
  Use Case
  Data Driven
Leveraging Open Source Technologies to Enable Scientific Discovery

“eScience” Trends
  Highly distributed, multi-organizational systems
    Systems are moving towards loosely coupled systems or federations in order to solve science problems that span centers and institutional environments
  Sharing of data and services that allow for the discovery, access, and transformation of data
    Systems are moving towards publishing services and data in order to address data-intensive and computationally intensive problems
  Infrastructures being built to handle future demand
    Use of commodity services to address elasticity
  Addressing complex modeling, interdisciplinary science, and decision support needs
    Need a dynamic environment where data and services can be used quickly as the building blocks for constructing predictive models and answering critical science questions
    Need to ensure the information architecture supports the varying science needs
  Changing the way in which data analysis is performed
    Moving towards analysis of distributed data to increase study power
    Enabling greater collaboration across centers
    Systematizing, where possible

Highly Distributed Science Environments
  Highly distributed/federated
  Collaborative
  Information-centric
  Discipline-specific
  Growing/evolving
  Heterogeneous (Implementations)

Why Software Architecture?
  Software Architecture: The fundamental organization of a system embodied in its components, their relationships to each other, and to the environment, and the principles guiding its design and evolution. (ANSI/IEEE Std. 1471-2000)
  Architecture is about strategy to address key architectural concerns:
    How can we exploit common patterns to improve reuse?
    Can we develop software product lines?
    Can we improve interoperability?
    Can we reduce dependencies?
  What are the architectural principles? Loosely coupled, data-driven, highly distributed, commodity services, service oriented, collaborative/multi-institutional

Notional Service Architectures Concept
  [Figure labels: Client A, Client B, C, Service Interface, Service; C2 Architectural Style]
  The service architecture concept exploits many of the architectural concepts discussed:
    Loosely coupled
    Elasticity (e.g., commodity-based)
    Multi-organizational
    etc.
  At enterprise scale, architectures don’t need to prescribe what’s inside services, just their interfaces, function, behavior, etc.
  Services might include:
    Data discovery
    Data access
    Security
    Transformation

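The point that an architecture prescribes a service's interface rather than its internals can be illustrated with a minimal sketch. Everything named here (`DataDiscoveryService`, `InMemoryCatalog`, `FederatedCatalog`, `client_report`, and the sample product identifiers) is hypothetical and invented for illustration; none of it comes from the slides or from any real framework's API.

```python
from __future__ import annotations

from abc import ABC, abstractmethod


class DataDiscoveryService(ABC):
    """Hypothetical service interface: clients see only this contract."""

    @abstractmethod
    def find(self, keyword: str) -> list[str]:
        """Return identifiers of data products matching the keyword."""


class InMemoryCatalog(DataDiscoveryService):
    """One possible implementation, backed by a local dictionary."""

    def __init__(self, products: dict[str, str]) -> None:
        self._products = products  # identifier -> description

    def find(self, keyword: str) -> list[str]:
        needle = keyword.lower()
        return sorted(
            pid for pid, text in self._products.items() if needle in text.lower()
        )


class FederatedCatalog(DataDiscoveryService):
    """Another implementation: fans a query out to member services."""

    def __init__(self, members: list[DataDiscoveryService]) -> None:
        self._members = members

    def find(self, keyword: str) -> list[str]:
        results: set[str] = set()
        for member in self._members:
            results.update(member.find(keyword))
        return sorted(results)


def client_report(service: DataDiscoveryService, keyword: str) -> str:
    """A client that depends only on the interface, not the implementation."""
    hits = service.find(keyword)
    return f"{len(hits)} product(s) match '{keyword}': {', '.join(hits)}"


# Made-up sample data for two hypothetical centers.
center_a = InMemoryCatalog({"a-001": "Mars surface images", "a-002": "Lunar radar"})
center_b = InMemoryCatalog({"b-001": "Mars atmosphere profiles"})
federation = FederatedCatalog([center_a, center_b])

print(client_report(center_a, "mars"))
print(client_report(federation, "mars"))
```

The client function is unchanged whether it talks to a single center or to a federation of centers, which is the loose coupling the concept relies on. A real deployment would put a network protocol behind the interface; the sketch keeps everything in-process for brevity.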
What does this have to do with open source?
  Core software product lines and tools that can be reused are excellent opportunities to create open source projects
    Across a federation of organizations, systems, and users, what can be developed and shared?
    How can software components be developed in generic ways, but allow for extensions?
  Open source itself is a strategy
    Can improve collaborations
    Can drive a robust set of reusable software components and tools
    Can push standards development
    Can encourage use of common architectural patterns

Open Source Models
  Software sharing with an open source license (e.g., BSD-style license)
  Software distribution through open source organizations (e.g., SourceForge)
  Software projects under the governance of an open source community/foundation (e.g., Apache Software Foundation)
  Ad hoc open source project communities with their own governance

Open Source Models: Our Opinion
  Software sharing with an open source license (e.g., BSD-style license)
    It’s a great start
    Limited community involvement
  Software distribution through open source organizations (e.g., SourceForge)
    Provides good software distribution support
  Software projects under the governance of an open source community/foundation (e.g., Apache Software Foundation)
    This moves from just distribution support to collaboration and governance over the development
  Ad hoc open source project communities with their own governance
    This can make a lot of sense for larger federations…

The Apache Software Foundation
  Largest open source software development entity in the world
    2300+ committers
    3500+ contributors
  84 Top Level Projects
    36 Incubating
    30 Lab Projects
    8 retired projects in the “Attic”
  Over 1.2 million revisions
  Over 10M successful requests served a day across the world
  HTTPD web server used on 100+ million web sites (52+% of the market)

OODT: An Open Source Framework for Building Distributed Science Data Mgmt Environments
  Focus on distributed environments
