The Terra Data Fusion Project aims to fuse data from NASA's Terra satellite's five instruments - ASTER, CERES, MISR, MODIS, and MOPITT - into a single product. This presents challenges due to the huge data volumes, different locations of input data, instrument granularities, data storage methods, and file formats. The project overcame these by using NCSA supercomputing facilities, converting files to HDF5, organizing data by instrument and granule, and adding metadata. The final fused data product contains over 1 million input files totaling 2.3 petabytes, reduced to 1 petabyte using HDF5 compression.
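As a rough illustration of that per-instrument, per-granule organization with HDF5 compression, here is a minimal h5py sketch; the group layout, dataset names, chunk sizes, and attribute values are hypothetical, not the project's actual schema.

```python
import h5py
import numpy as np

# Hypothetical layout: /<instrument>/<granule_id>/<dataset>, gzip-compressed.
radiance = np.random.rand(2030, 1354).astype("f4")  # stand-in for a MODIS-like swath

with h5py.File("terra_fused_example.h5", "w") as f:
    grp = f.create_group("MODIS/granule_2000055.1200")
    grp.attrs["instrument"] = "MODIS"
    grp.attrs["start_time"] = "2000-02-24T12:00:00Z"
    grp.create_dataset(
        "radiance",
        data=radiance,
        chunks=(256, 256),
        compression="gzip",      # the kind of lossless compression behind the 2.3 PB -> 1 PB reduction
        compression_opts=4,
    )
```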
This document summarizes a presentation on best practices for creating Earth observation data products that follow the Hierarchical Data Format for Earth Observing System (HDF-EOS) standard. It provides guidance on including geo-location variables, using named dimensions, following Climate and Forecast Metadata conventions, testing products with various tools, and using the HDF Product Designer tool to help design and validate compliant products. The work aims to improve the usability of data products and reduce questions received through help desks.
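A minimal sketch of some of those recommendations (named dimensions, geolocation variables, CF-style attributes), written with the netCDF4 Python library; the variable names and attribute values are illustrative only, not a compliance template.

```python
import numpy as np
from netCDF4 import Dataset

with Dataset("example_product.nc", "w", format="NETCDF4") as nc:
    nc.Conventions = "CF-1.8"

    # Named dimensions instead of anonymous sizes.
    nc.createDimension("ydim", 180)
    nc.createDimension("xdim", 360)

    # Geolocation variables with CF attributes.
    lat = nc.createVariable("latitude", "f4", ("ydim",))
    lat.units = "degrees_north"
    lat.standard_name = "latitude"
    lat[:] = np.linspace(-89.5, 89.5, 180)

    lon = nc.createVariable("longitude", "f4", ("xdim",))
    lon.units = "degrees_east"
    lon.standard_name = "longitude"
    lon[:] = np.linspace(-179.5, 179.5, 360)

    # A science variable that names its coordinates, as CF recommends.
    sst = nc.createVariable("sea_surface_temperature", "f4", ("ydim", "xdim"), zlib=True)
    sst.units = "K"
    sst.coordinates = "latitude longitude"
    sst[:] = 280.0 + 10.0 * np.random.rand(180, 360)
```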
The document discusses the Worldwide LHC Computing Grid (WLCG) which distributes and analyzes data from the Large Hadron Collider (LHC) experiments. It notes that the LHC generates 50 petabytes of data per year which is distributed across multiple computing tiers and sites in 42 countries. The WLCG integrates these computing centers into a single infrastructure for LHC physicists. Upcoming upgrades to the LHC will greatly increase data and computing needs, posing challenges that may require new computing models leveraging cloud resources.
Case Studies in advanced analytics with R – Wit Jakuczun
A talk I gave at SQLDay 2017:
About 1.5 years ago Microsoft finalised its acquisition of Revolution Analytics – a provider of software and services for R. In my opinion this was one of the most important events for the R community. Now it is crucial to present its capabilities to the SQL Server community; it will be beneficial for both parties. I will present three case studies: cash optimisation at Deutsche Bank, a mid-term model for energy price forecasting, and workforce demand optimisation. The case studies were implemented with our analytical workflow R Suite, which will also be briefly presented.
HNSciCloud has shared with the IT experts in scientific computing belonging to the HEPiX forum the status of the ongoing Pre-Commercial Procurement of innovative cloud services and what the expected results might be.
The document discusses taming the size and cardinality of OLAP data cubes over big data. It presents an overview of data warehouse systems and architectures, OLAP cubes, and decision support system benchmarks. It then introduces the TPC-H*d benchmark for evaluating multi-dimensional databases and the AutoMDB tool for automating multi-dimensional database design. Lastly, it discusses application scenarios for benchmarking data servers, multi-dimensional database schemas, and parallel OLAP servers.
Let’s talk about reproducible data analysis – Greg Landrum
The document discusses common problems in data analysis such as ensuring repeatability and reproducibility, using multiple tools and data sources, enabling collaboration between users of different skill levels, deployment of models and results, and organization of work. It introduces KNIME as an open source platform that can help address these problems through its use of workflows to capture parameters, data, and analysis steps in a visual interface, allowing for interactive, reproducible, collaborative, deployable, and findable data analysis.
The HDF Group provides updates on new features in HDF including faster compression, single writer/multiple reader file access, virtual datasets, and dynamically loaded filters. They also discuss tools like HDFView, nagg for data aggregation, and a new HDF5 ODBC driver. The work is supported by NASA.
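One of those features, virtual datasets, can be sketched with h5py as follows; the granule file names, dataset paths, and shapes are invented for illustration.

```python
import h5py

# Stitch four per-granule datasets into one virtual dataset without copying the data.
layout = h5py.VirtualLayout(shape=(4, 1000), dtype="f4")
for i in range(4):
    source = h5py.VirtualSource(f"granule_{i:02d}.h5", "radiance", shape=(1000,))
    layout[i] = source

with h5py.File("virtual_view.h5", "w", libver="latest") as f:
    f.create_virtual_dataset("radiance_all", layout, fillvalue=-9999.0)
```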
This document discusses two HDF5-based file formats for storing Earth observation data:
1. The Sorted Pulse Data (SPD) format stores laser scanning data including pulses and point data with attributes. It was created in 2008 and updated to version 4 to improve flexibility.
2. The KEA image file format implements the GDAL raster data model in HDF5, allowing large raster datasets and attribute tables to be stored together with compression. It was created in 2012 to address limitations of other formats.
Both formats take advantage of HDF5 features like compression but also discuss some limitations and lessons learned for effectively designing scientific data formats.
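For orientation, here is a minimal sketch of writing a KEA file through GDAL's Python bindings; it assumes a GDAL build with the KEA driver enabled, and the file name and raster contents are placeholders.

```python
import numpy as np
from osgeo import gdal

driver = gdal.GetDriverByName("KEA")  # returns None if GDAL was built without libkea
ds = driver.Create("example.kea", 512, 512, 1, gdal.GDT_Float32)

band = ds.GetRasterBand(1)
band.WriteArray(np.random.rand(512, 512).astype("float32"))
band.SetNoDataValue(-9999.0)

ds.FlushCache()
ds = None  # closing the dataset finalizes the underlying HDF5 structures
```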
The document discusses a project to improve long-term preservation of Earth Observing System (EOS) data by creating independent maps of HDF4 data objects. The project aims to map HDF4 files to allow simple readers to access data without relying on HDF software. It involves categorizing NASA HDF4 data, prototyping an XML mapping file format, and building tools to create maps and read data based on maps. The project will investigate integrating the mapping schema with standards, address HDF-EOS2 requirements, redesign the schema, implement production mapping and reading tools, and deploy the tools at NASA data centers.
This document discusses using the BDE (Big Data Europe) platform to demonstrate downscaling of climatic and meteorological data, data ingestion and export, analytics capabilities, and data lineage tracking. The SC5 Pilot project ingests NetCDF files, performs WRF-based downscaling on institutional resources as requested, and maintains lineage records. Basic analytics like climate change indices are performed using Hive queries. Further development may investigate NetCDF transformations for WRF and test data lineage and downscaling customization. The BDE is expected to provide scalability, efficient resource use, and data reproducibility. Hands-on access to a Jupyter notebook is provided.
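As a hedged sketch of the kind of Hive-based index computation mentioned above, the snippet below counts "summer days" (daily maximum temperature above 25 degC) per station and year with PyHive; the host, database, table, and column names are hypothetical.

```python
from pyhive import hive

# Hypothetical connection and schema: one row per station per day with the daily maximum temperature.
conn = hive.Connection(host="bde-hive.example.org", port=10000, database="climate")
cur = conn.cursor()

cur.execute("""
    SELECT station_id, year(obs_date) AS yr, count(*) AS summer_days
    FROM daily_tmax
    WHERE tmax_c > 25.0
    GROUP BY station_id, year(obs_date)
""")
for station_id, yr, summer_days in cur.fetchall():
    print(station_id, yr, summer_days)
```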
Moving from Artisanal to Industrial Machine Learning – Greg Landrum
This document summarizes Greg Landrum's presentation on moving machine learning from an artisanal to industrial process. The presentation discusses using the CRISP-DM process to build predictive models for bioactivity in a reproducible way. Two datasets with different numbers of active compounds are used to illustrate modeling workflows in KNIME. The models achieve good accuracy but poor kappa scores due to class imbalance. Adjusting the decision threshold for predictions is shown to improve kappa scores substantially. The artisanal approach of tuning thresholds is presented as a way to improve models for imbalanced data in an industrial setting.
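The threshold effect on kappa can be reproduced with a generic scikit-learn sketch (not the KNIME workflow from the talk); the dataset is synthetic, and the alternative 0.2 threshold is an arbitrary illustration of moving the decision boundary for an imbalanced problem.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, cohen_kappa_score
from sklearn.model_selection import train_test_split

# Synthetic, heavily imbalanced "bioactivity" data: roughly 5% actives.
X, y = make_classification(n_samples=5000, n_features=20, weights=[0.95, 0.05], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
proba = clf.predict_proba(X_te)[:, 1]

for threshold in (0.5, 0.2):  # default cut-off vs. a lowered one
    pred = (proba >= threshold).astype(int)
    print(f"threshold={threshold}: accuracy={accuracy_score(y_te, pred):.3f}, "
          f"kappa={cohen_kappa_score(y_te, pred):.3f}")
```

With the default 0.5 cut-off, accuracy looks good while kappa stays low; lowering the threshold typically trades a little accuracy for a substantially better kappa, which is the effect the presentation describes.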
Managing large (and small) R based solutions with R Suite – Wit Jakuczun
The presentation I gave at DataMass Gdańsk Summit in 2017:
R is a great tool for data scientists. Being very dynamic and popular, it is now one of the most important technologies on the market. Unfortunately, out-of-the-box R is not suited for large-scale applications. I will present R Suite, an open-source solution developed by us, for us, to manage the R development process.
The document discusses the National Snow and Ice Data Center's (NSIDC) use of HDF and HDF-EOS file formats to manage and distribute scientific cryosphere data. NSIDC collects data from satellites like MODIS, AMSR-E and aircraft missions, processes the data, and makes it available in HDF(5) formats. The HDF formats allow for efficient storage of multi-dimensional scientific datasets along with metadata. NSIDC develops tools to allow users to access, analyze and visualize data stored in HDF files.
The document summarizes changes at The HDF Group, including new staff members and their roles. It also outlines recent and upcoming HDF software releases, new operating system and compiler support, tools for HDF and netCDF interoperability, ongoing research projects involving parallel I/O and analysis, and potential projects of interest involving scientific domains like plasma physics, particle accelerators, and digital twin technology.
Optalysis: Disruptive Optical Processing Technology for HPC – inside-BigData.com
In this video from the Disruptive Technologies Session at the 2015 HPC User Forum, Nick New from Optalysys describes the company's optical processing technology.
"Optalysys technology uses light, rather than electricity, to perform processor intensive mathematical functions (such as Fourier Transforms) in parallel at incredibly high-speeds and resolutions. It has the potential to provide multi-exascale levels of processing, powered from a standard mains supply. The mission is to deliver a solution that requires several orders of magnitude less power than traditional High Performance Computing (HPC) architectures."
Watch the video presentation: http://wp.me/p3RLHQ-ewz
This document discusses efforts to standardize data products from three NASA laser altimeter missions - ICESat, ICESat-2, and MABEL. It describes designing similar data products for all three missions to promote interoperability. Products are being developed for ICESat data using lessons from MABEL. Code is also being generated to help create products from specifications to reduce development time. The goal is to make the multi-rate point data from all three missions easily accessible and usable.
This document discusses a pilot project to incorporate ISO 19115-2 metadata attributes at the granule level for the NASA SWOT mission. The metadata will be stored in HDF5 groups and generated in two ways - via XML serialization or XML style sheet conversions. The project aims to capture essential metadata attributes from the SWOT information architecture and ISO metadata model, and generate an HDF5 structure specification and example metadata snippets.
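A minimal h5py sketch of embedding serialized ISO XML in an HDF5 group is shown below; the group path, dataset name, and XML snippet are hypothetical stand-ins, not the SWOT structure specification itself.

```python
import h5py

# Hypothetical ISO 19115-2 snippet; a real granule would carry the full serialized record.
iso_xml = (
    '<gmi:MI_Metadata xmlns:gmi="http://www.isotc211.org/2005/gmi">'
    "<!-- granule-level metadata goes here -->"
    "</gmi:MI_Metadata>"
)

with h5py.File("swot_granule_example.h5", "a") as f:
    grp = f.require_group("METADATA/ISO_19115_2")
    grp.create_dataset(
        "granule_metadata",
        data=iso_xml,
        dtype=h5py.string_dtype(encoding="utf-8"),
    )
```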
In this RichReport slidecast, Dr. Nick New from Optalysys describes how the company's optical processing technology delivers accelerated performance for FFTs and Bioinformatics.
"Our prototype is on track to achieve game-changing improvements to process times over current methods whilst providing high levels of accuracy that are associated with the best software processes.”
Watch the video: https://wp.me/p3RLHQ-hn0
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
In this presentation, delivered at the DI4R 2016 Conference, Bob Jones of CERN traced the cloud procurement trails performed within the context of the Helix Nebula Initiative and the on-going Pre-Commercial Procurement action undertaken by a group of 10 of Europe’s leading publicly funded research organisations.
The Helix Nebula Science Cloud (HNSciCloud) project aims to procure innovative cloud computing services from commercial providers to integrate them with existing research infrastructure. Funded through the EU Horizon 2020 program, HNSciCloud is conducting a multi-phase procurement process with several contractors to design, prototype, and pilot cloud services that can meet the large-scale computing and storage needs of the scientific community in a cost-effective and seamless manner. Recent achievements include issuing a tender, selecting initial design phase contractors, and making progress on system specifications.
Bob Jones, PICSE & HNSciCloud Coordinator, was invited to present the PICSE results and the new HNSciCloud project in the "data infrastructure" session of the Open Science Conference on the 5th of April 2016.
ICESat-2 is a NASA satellite mission scheduled to launch in December 2017 that will use photon counting laser altimetry to measure ice sheet and sea ice elevations. It will carry an advanced laser altimeter that splits each laser pulse into 6 beams in a cross-track pattern to provide dense sampling. This will allow for improved elevation estimates over rough terrain. The document discusses ICESat-2's science objectives, measurement concept, data products, processing workflow, and approach to managing metadata across different levels of data products.
Know your R usage workflow to handle reproducibility challenges – Wit Jakuczun
R is used in a vast range of ways, from pure ad-hoc analysis by hobbyists to organized and structured use in an enterprise. Each way of using R brings different reproducibility challenges. Going through a range of typical workflows, we will show that understanding reproducibility must start with understanding your workflow. Presenting these workflows, we will show how we deal with reproducibility challenges using the open-source R Suite (http://rsuite.io) solution, developed by us to support our large-scale R development.
The HDF Group presented on their support for the National Polar-orbiting Partnership/Joint Polar Satellite System (NPP/JPSS) program. Their goals included providing HDF5 support for distributing data from the Visible Infrared Imaging Radiometer Suite (VIIRS) and other sensors. They outlined priorities for testing software on critical platforms, developing tools to access and manage NPP/JPSS products, and providing rapid support. The presentation described software released or under development like h5edit, h5augjpss, and nagg, an aggregation tool.
Hopsworks - ExtremeEarth Open Workshop – ExtremeEarth
This document summarizes a presentation about the three-year ExtremeEarth project. It discusses the ExtremeEarth platform architecture, which brings together Earth observation data access from DIASes, end-user products from TEPs, and scalable AI capabilities from Hopsworks. The architecture provides infrastructure on Creodias and uses Hopsworks to develop end-to-end machine learning pipelines for processing petabytes of Earth observation data. Results have been exploited through additional research projects and a product offering on Hopsworks.ai. The project has also led to several publications and blog posts about applying AI to Earth observation data.
Alexander Fölling, Christian Grimme, Joachim Lepping, and Alexander Papaspyrou: The Gain of Resource Delegation in Distributed Computing Environments. 15th Workshop on Job Scheduling for Parallel Processing; April 23, 2010, Atlanta, GA, USA.
Overview of Hydrogen TCP, Task 41. Introduce discussion points from the hydro... – IEA-ETSAP
This document provides an overview of the IEA Hydrogen TCP Task 41, which aims to improve hydrogen modeling and collaboration with the ETSAP community. It has four subtasks: a) consolidating hydrogen technology data, b) developing knowledge on modeling hydrogen in energy systems, c) collaboration with IEA analysts and ETSAP, and d) providing updated parameters for hydrogen technologies. The task will provide a database, examine modeling approaches, and establish closer collaboration to represent hydrogen technologies and value chains more accurately in energy system models. It seeks to understand ideal modeling tools and represent interconnectivity while focusing on tools like TIMES.
ESCAPE Kick-off meeting - HL-LHC ESFRI Landmark (Feb 2019) – ESCAPE EU
The document discusses the European Organization for Nuclear Research (CERN) and its role in particle physics research. It provides background on CERN's founding, membership, budget, and scientific goals. It also summarizes CERN's Large Hadron Collider project and the Worldwide LHC Computing Grid consortium for data distribution and analysis. Finally, it discusses plans for the High-Luminosity LHC upgrade and associated computing challenges.
Presentation of Eco-efficient Cloud Computing Framework for Higher Learning I... – rodrickmero
Tanzanian Higher Learning Institutions (HLIs) are facing challenges in providing the necessary Information Technology (IT) support for education, research and development activities. Currently, HLIs use traditional computing (TC), which has proven to be uneconomical in terms of maintenance, software purchase costs, huge power consumption and staffing.
Cloud computing (CC) is the way forward for HLIs in solving these computing challenges. However, HLI policies regarding the security of critical data in a CC environment prevent adoption of CC services from existing vendors. The reliable and secure way is to establish and operate CC data centers dedicated to HLI critical data and services. Owning and operating traditional data centers is a challenge for HLIs because they consume huge amounts of power. Tanzania, like other developing countries, has a low level of electrification, while the need for electric power is increasing year after year. Considering energy-efficient approaches in data center operation is very important for reducing both operating costs and the carbon footprint on the environment.
Therefore, this thesis presents an eco-efficient cloud computing framework that integrates renewable and non-renewable power sources and free cooling to reduce carbon emissions and power consumption in HLI cloud data centers.
To develop the framework, we conducted a study in Tanzanian HLIs to explore the current situation and cloud computing requirements. Interviews, observation, and document review were the data collection methods used in the study. After analysing the results, we defined guidelines for developing CC building blocks. We used the CloudSim toolkit and the NetBeans IDE to develop and simulate the eco-efficient framework.
In the end, the eco-efficient framework has shown improvements in power consumption, efficiency and carbon emissions. Therefore, eco-efficient approaches give Tanzanian HLIs a sustainable solution to their computing needs by significantly reducing operating costs. Moreover, it ensures environmental protection for the benefit of current and future generations.
CloudLightning - Project and Architecture Overview – CloudLightning
This is a PowerPoint presentation delivered by Prof John Morrison (UCC) on 9 December 2016 at the IC4 and Host in Ireland Workshop: Data Centres in Ireland.
On 29 January 2020 ARCHIVER launched its Request for Tender with the purpose of awarding several Framework Agreements and work orders for the provision of R&D for hybrid end-to-end archival and preservation services that meet the innovation challenges of European Research communities, in the context of the European Open Science Cloud.
The tender was closed on 28 April 2020 and 15 R&D bids were submitted, with consortia that included 43 companies and organisations. The best bids have been selected and will start the first phase of the ARCHIVER R&D (Solution Design) in June 2020.
On Monday 8 June the selected consortia for the ARCHIVER design phase were announced during a Public Award Ceremony starting at 14:00 CEST.
In light of the COVID-19 outbreak and the consequent movement restrictions imposed in several countries, the event was organised as a webinar, virtually hosted by Port d’Informació Científica (PIC), a member of the Buyers Group of the ARCHIVER consortium.
The Kick-off marks the beginning of the Solution Design Phase.
Overview of what has happened in HNSciCloud over the last five months, Delivered by Helge Meinhard of CERN at the HEPiX Workshop on October 21st, 2016, in Berkeley, California, USA.
This document summarizes a proof-of-concept (POC) using Apache Storm for real-time energy data analytics. The POC evaluated Storm's capabilities for processing time series data and developing key performance indicators and forecasts. Simulated smart meter data was streamed through Storm topologies that performed simple and complex aggregations, scoring, and next-day forecasting using generalized additive models. The POC found Storm easy to use but noted challenges with debugging R code within topologies and potential bottlenecks from R computations. Overall, the POC demonstrated Storm's viability for real-time energy analytics applications.
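For orientation, here is a small batch-mode pandas sketch of the same kinds of computations (an hourly-consumption KPI and a naive day-ahead profile); the real POC ran these continuously inside Storm topologies with R-based GAM forecasts, so the schema and logic below are simplified stand-ins.

```python
import numpy as np
import pandas as pd

# Simulated smart-meter readings, one row per 15-minute interval (hypothetical schema).
idx = pd.date_range("2017-01-01", periods=4 * 24 * 7, freq="15min")
readings = pd.DataFrame(
    {
        "meter_id": np.random.choice(["m001", "m002", "m003"], size=len(idx)),
        "kwh": np.random.gamma(shape=2.0, scale=0.25, size=len(idx)),
    },
    index=idx,
)

# "Simple aggregation" KPI: hourly consumption per meter.
hourly = readings.groupby("meter_id").resample("1h")["kwh"].sum()

# Naive day-ahead profile per meter (a crude stand-in for the GAM forecast):
# mean consumption by hour of day.
profile = readings.assign(hour=readings.index.hour).groupby(["meter_id", "hour"])["kwh"].mean()

print(hourly.head())
print(profile.head())
```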
CERN operates the Large Hadron Collider (LHC) and other particle physics experiments that generate enormous amounts of data. To handle this data, CERN uses a hybrid cloud model combining on-premise resources with public cloud services. CERN is participating in a joint procurement project called HNSciCloud to procure cloud services that can seamlessly integrate with CERN's internal resources and European research networks. The goal is to establish a hybrid cloud platform to meet the growing computing needs of particle physics and other research domains dealing with large datasets.
Big Data HPC Convergence and a bunch of other things – Geoffrey Fox
This talk supports the Ph.D. in Computational & Data Enabled Science & Engineering at Jackson State University. It describes related educational activities at Indiana University, the Big Data phenomena, jobs and HPC and Big Data computations. It then describes how HPC and Big Data can be converged into a single theme.
The document provides a summary of Guy Tel-Zur's experience at the SC10 supercomputing conference. It outlines the various talks, panels, and presentations Tel-Zur attended over the course of the conference related to topics like computational physics, GPU computing, climate modeling, earthquake simulations, and the future of high performance computing. It also mentions visiting the exhibition and learning about technologies like Eclipse PTP, Elastic-R, Python for scientific computing, Amazon Cluster GPU instances, and the Top500 list of supercomputers.
This is a presentation by Prof. Anne Elster at the International Workshop on Open Source Supercomputing held in conjunction with the 2017 ISC High Performance Computing Conference.
Hybrid Cloud for CERN
Experience with Open Telecom Cloud and OpenStack
1) CERN uses a hybrid cloud approach combining on-premise resources with public clouds like the Open Telecom Cloud to address its increasing computing needs for projects like the LHC.
2) A pilot project using the Open Telecom Cloud was successful but identified some issues around networking and storage.
3) CERN is now participating in the HNSciCloud project to jointly procure innovative hybrid cloud services that fully integrate commercial clouds with in-house and European e-infrastructure resources.
The document provides an overview of research data management for the School of Engineering at the University of Lincoln. It discusses the benefits of research data management, including increased transparency, collaboration, and opportunities for new research. It also outlines some of the support and requirements for research data management from funders and institutions.
A Spatio-temporal Optimization Model for the Analysis of Future Energy System... – IEA-ETSAP
The document discusses power-to-hydrogen pathways and system analysis for future energy systems. It describes the following:
1) The IEA HIA Task 38, which aims to provide comprehensive analysis of technical, economic, legal and regulatory conditions for power-to-hydrogen and hydrogen-to-X applications.
2) The Department of Process and System Analysis (VSA) at Forschungszentrum Jülich, which conducts research on renewable energies, storage infrastructures, transport, industry and residential sectors.
3) Simulation, time series aggregation and optimization methods used by VSA to assess energy systems with power-to-hydrogen pathways.
Computational science and engineering (CSE) utilizes high performance computing, large-scale simulations, and scientific applications to enable data-driven discovery. The speaker discusses CSE initiatives at UC Berkeley and Lawrence Berkeley National Lab focused on areas like health, freshwater, food security, ecosystems, and urban metabolism using exascale computing and big data analytics.
CERN uses cloud computing and virtualization to manage its large computing infrastructure needed for particle physics experiments like the Large Hadron Collider. Five years ago CERN transitioned to using open source tools like OpenStack, Puppet, and Ceph to automate management of its infrastructure across two data centers and improve agility, efficiency, and sustainability. This has enabled CERN to scale its cloud from managing a few thousand servers to over 70,000 virtual machines and 9,000 hypervisors while maintaining high performance and responding rapidly to security issues like Meltdown.
CERN operates the largest particle physics laboratory in the world. It manages over 8,000 servers to support its research. In 2012, CERN recognized limits with its existing infrastructure management tools and formed a team to define a new "Agile Infrastructure Project." The project goals were to improve resource provisioning time, enable cloud interfaces, improve monitoring and accounting, and boost efficiency. The team adopted open source tools like OpenStack, Puppet, and Ceph to create a new cloud service spanning two data centers. This allowed on-demand provisioning in minutes versus months and helped CERN better support its expanding computing needs for research.
In this deck from the HPC User Forum at Argonne, Thomas Schulthess presents: An Update on CSCS.
"CSCS develops and operates cutting-edge high-performance computing systems as an essential service facility for Swiss researchers. These computing systems are used by scientists for a diverse range of purposes – from high-resolution simulations to the analysis of complex data. Simulation is considered the third pillar of science, alongside theory and experimentation, and scientists from an increasing number of disciplines are using high-performance computing simulation for their research. For example, supercomputers are used to model new materials with hitherto unknown properties. Climate modelling and weather forecasting would be impossible without them. In social science, simulations can help prevent mass panic by predicting people’s behavior. In medicine, computer simulations aid diagnostics help improve treatment methods. Moreover, they can facilitate risk assessments for natural hazards such as earthquakes and the tsunamis they might trigger.
CSCS has a strong track record in supporting the processing, analysis and storage of scientific data, and is investing heavily in new tools and computing systems to support data science applications. For more than a decade, CSCS has been involved in the analysis of the many petabytes of data produced by scientific instruments such as the Large Hadron Collider (LHC) at CERN. Supporting scientists in extracting knowledge from structured and unstructured data is a key priority for CSCS."
Learn more: https://www.cscs.ch/
and
http://hpcuserforum.com
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
The document discusses accelerating science discovery with AI inference-as-a-service. It describes showcases using this approach for high energy physics and gravitational wave experiments. It outlines the vision of the A3D3 institute to unite domain scientists, computer scientists, and engineers to achieve real-time AI and transform science. Examples are provided of using AI inference-as-a-service to accelerate workflows for CMS, ProtoDUNE, LIGO, and other experiments.
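As a rough illustration of the as-a-service pattern, the sketch below posts one inference request to a remote model server over HTTP; the URL, model name, and payload layout (loosely modeled on the KServe/Triton v2 REST protocol) are placeholders, not the interfaces actually used by CMS, ProtoDUNE, or LIGO.

```python
import numpy as np
import requests

# Hypothetical inference-as-a-service endpoint and model name.
ENDPOINT = "https://inference.example.org/v2/models/tagger/infer"

payload = {
    "inputs": [
        {
            "name": "features",
            "shape": [1, 16],
            "datatype": "FP32",
            "data": np.random.rand(16).round(4).tolist(),
        }
    ]
}

resp = requests.post(ENDPOINT, json=payload, timeout=10)
resp.raise_for_status()
print(resp.json())  # model outputs come back in the response body
```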
Similar to This Helix Nebula Science Cloud Pilot Phase Open Session (20)
The document discusses the Helix Nebula Science Cloud procurement project. It provides updates on:
- Ramping up computing and storage resources for the project over 2018.
- Testing and consolidating the approach across procurers to provide shared resources for large-scale tests.
- Upcoming events where the project will demonstrate resources and tools.
- Two proposed use cases, PanCancer and ALICE, detailing their computing, storage and network requirements.
- Introducing vouchers as a means for procurers to provide short-term access to resources for additional users.
The document provides a status report on testing the Helix Nebula Science Cloud for interactive data analysis by end users of the TOTEM experiment. It summarizes the deployment of a "Science Box" platform on the Helix Nebula Cloud using technologies like EOS, CERNBox, SWAN and SPARK. Initial tests of the platform were successful in 2017 using a single VM. Current tests involve a scalable deployment with Kubernetes and using SPARK as the computing engine. Synthetic benchmarks and a TOTEM data analysis example show the platform is functioning well with room to scale out storage and computing resources for larger datasets and analyses.
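A minimal PySpark sketch of the kind of synthetic benchmark one might run on such a SWAN/Spark deployment; the application name and workload are invented, and the real tests also exercised EOS storage and the TOTEM analysis code.

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("hnscicloud-synthetic-benchmark")
    .getOrCreate()
)

# Purely synthetic workload: aggregate a large generated range to exercise the executors.
df = spark.range(0, 100_000_000)
row = df.selectExpr("sum(id) AS total", "count(*) AS n").collect()[0]
print(row["total"], row["n"])

spark.stop()
```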
This document discusses container federation use cases. It describes motivations for federation like handling periodic load spikes and simplifying management. It also outlines some key components of federation like uniform APIs, replication, and load balancing. Specific examples are given like using federation to connect a Condor scheduler across clusters for matchmaking and resource sharing. Federating Kubernetes clusters can provide a unified way to deploy containers and workloads across multiple datacenters and clouds while maintaining a common API and operational model.
CERN provides 230,000 computing cores via its batch service using HTCondor. The batch service runs jobs on both local resources and public clouds. CERN uses Docker and Terraform to integrate cloud resources like those from T-Systems and Exoscale into its HTCondor pool. HTCondor is configured to route jobs to specific clouds. Challenges include network address translation and inconsistent resource sizes. Consolidating WLCG workload into a shared tenant across cloud providers could simplify job submission for experiments.
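A hedged sketch, using the HTCondor Python bindings, of tagging a job so that the pool can route it to a particular cloud; the custom attribute name and provider value are invented for illustration and are not CERN's actual configuration.

```python
import htcondor

# Custom ClassAd attributes (the leading "+") are how jobs can be tagged for routing;
# "WantCloudProvider" and "t-systems" are made-up names used only for this sketch.
sub = htcondor.Submit({
    "executable": "/usr/bin/sleep",
    "arguments": "300",
    "output": "job.out",
    "error": "job.err",
    "log": "job.log",
    "+WantCloudProvider": '"t-systems"',
})

schedd = htcondor.Schedd()
result = schedd.submit(sub)   # newer-style bindings; older releases used a transaction API
print("submitted cluster", result.cluster())
```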
The document discusses LHCb's use of RHEA and T-Systems cloud resources for Monte Carlo simulation jobs during the HNSciCloud pilot phase. Key points: LHCb only used the CPU resources for these jobs and required managing RHEA and T-Systems as independent sites. Jobs were submitted via HTCondor CEs to provide flexibility and reduce overhead. On RHEA, LHCb had varying levels of resources and better success rates than on T-Systems which also had high failure rates due to lost heartbeats.
Helix Nebula Science Cloud is providing computing resources for the ALICE experiment through two virtual sites - Capella and Regulus. Jobs submitted to these sites are directed to T-Systems or RHEA cloud resources. Since starting in April 2018, ALICE has been able to run Monte Carlo simulations on the cloud resources, though reconstruction jobs saw high failure rates until being restricted to simulations in May. While efficiencies still need measurement for other job types, the cloud resources have shown efficiencies comparable to CERN sites for simulations. Further optimization is needed to address expired jobs and improve performance of I/O-intensive workloads.
The document provides an overview of the Helix Nebula Science Cloud pilot platform. It describes the consortium partners that provide different components of the platform, including Exoscale for IaaS cloud services, Onedata for data management, and Nuvla for authentication and orchestration. It outlines challenges around user access, data throughput, and application support that the platform aims to address. Finally, it previews upcoming features and demonstrations of the platform capabilities.
The document discusses the Helix Nebula Science Cloud (HNSciCloud) project. Key points:
- HNSciCloud is a joint procurement by 10 research institutions to provide a common cloud platform for the European research community.
- It aims to increase computing and storage capabilities for scientific data analysis through a hybrid cloud model combining procurers' data centers, commercial cloud providers, and high-speed networks.
- The project has progressed through different phases including requirements analysis, prototype development, and pilots with two commercial providers to test cloud services for high-energy physics and other research workflows.
This presentation, given at CERN during the IT Technical Forum on 24 Nov 2017, highlighted the achievements of the Up2University Project (https://up2university.eu/, funded under the EC Call ICT-22-2016: Technologies for Learning and Skills), which aims at bridging the gap between secondary schools, higher education, and the research domain by adopting learning technology and methodology to let high school students use the very same tools & services used by real researchers doing Big Science at CERN.
In order to provide a concrete example of CERN core technologies running in containers, the Up2U cloud-based education services have been ported to the HNSciCloud prototype systems provided by T-Systems and IBM.
TNC17, its 33rd edition, was held in the UNESCO City of Media Arts of Linz, Austria between 29 May - 2 June 2017.
On the 30th of May, Edoardo Martinelli, CERN, gave a presentation (available below) entitled "Network experiences with Public Cloud Services" during the session "Connecting to cloud providers". This session was meant to present the current state of networking for cloud services, to lay out important challenges in the area, and to point to future work. The session was also intended to stimulate and facilitate discussion in the community.
Edoardo Martinelli presented with the aim of highlighting the experience of a large institution (CERN) in procuring large-scale cloud resources on behalf of a scientific community, and the results from initial deployments of such services. The presentation gave details of networking challenges, how these have been overcome, and the performance obtained. Based on these experiences, recommendations for networks in support of cloud resources were given. The presentation included 2017 experience from the HNSciCloud project.
The document discusses the objectives and timeline of the second Expert Group on the European Open Science Cloud (EOSC). The group aims to advise on practical implementation of EOSC by end of 2018, including governance and financing. Key dates include publishing an interim report in March 2018 and a final report in December 2018. The document also provides examples of EOSC in practice through the Helix Nebula Science Cloud and outlines some priorities for EOSC, including interoperability, capacity building, and analyzing national funding streams.
This document discusses public procurement of innovation (PCP, PPI) in Europe. It notes that public procurement can help drive company growth by providing first buyers for innovators. Studies show companies involved in PPI are more likely to win other public contracts. The document provides statistics on participation in PCP/PPI, including high levels of SME participation and cross-border growth. It outlines European Commission funding and focus areas for PCP and PPI from 2018-2020. The rest of the document discusses policy context around the digital single market, European Cloud Initiative, and other DSM areas like digital health and e-government.
This document summarizes a meeting for the Early Adopter Group for the Helix Nebula Science Cloud Pilot Phase Award Ceremony. It discusses expanding access and deployments during the pilot phase. Key points include open access to resulting cloud services, evaluating projected results and a roadmap for the European Open Science Cloud. It also describes growing the buyers group to include more research organizations as early adopters and conditions for organizations to access the services. A list of upcoming events for demonstrations and training is provided.
The document outlines the objectives and work plan for WP5 of the Helix Nebula Science Cloud project, which marks the start of the Pilot Phase. Key points include:
- Conducting extended tests of scalability, robustness, functionality and security on the prototype platforms.
- Gradually expanding prototype platforms to larger pilot platforms and executing application workloads and tests.
- Giving selected end-users access to the pilot platforms to deploy applications and provide feedback.
- Conducting reviews at the end of April and May to evaluate progress before completing WP5.
The document summarizes a presentation about piloting a hybrid cloud solution for science. It discusses ramping up the pilot phase, commercializing the solution by extending the service portfolio and security certifications, and addressing challenges. A key focus is providing flexible, secure access to distributed data across public and private clouds for open innovation, science and research organizations. Example use cases are processing large datasets for CERN and providing cloud services for the Copernicus program.
The document summarizes a pilot project to create a hybrid multi-cloud platform for scientific data analysis called the Helix Nebula Science Cloud. It will bring together various cloud providers and data management solutions to offer a flexible, scalable, and secure platform for scientific workloads. The goal is to lay the foundations for the future European Open Science Cloud by delivering an open and commercially viable solution through this pilot phase.
Microservice Teams - How the cloud changes the way we work – Sven Peters
A lot of technical challenges and complexity come with building a cloud-native and distributed architecture. The way we develop backend software has fundamentally changed in the last ten years. Managing a microservices architecture demands a lot of us to ensure observability and operational resiliency. But did you also change the way you run your development teams?
Sven will talk about Atlassian’s journey from a monolith to a multi-tenanted architecture and how it affected the way the engineering teams work. You will learn how we shifted to service ownership, moved to more autonomous teams (and its challenges), and established platform and enablement teams.
Do you want Software for your Business? Visit Deuglo
Deuglo has top Software Developers in India. They are experts in software development and help design and create custom Software solutions.
Deuglo follows seven steps methods for delivering their services to their customers. They called it the Software development life cycle process (SDLC).
Requirement — Collecting the requirements is the first phase in the SDLC process.
Feasibility Study — after completing the requirement process they move to the design phase.
Design — in this phase, they start designing the software.
Coding — when designing is completed, the developers start coding for the software.
Testing — in this phase when the coding of the software is done the testing team will start testing.
Installation — after completion of testing, the application opens to the live server and launches!
Maintenance — after completing the software development, customers start using the software.
The most important new features of Oracle 23c for DBAs and Developers. You can get more details from my YouTube channel video: https://youtu.be/XvL5WtaC20A
Using Query Store in Azure PostgreSQL to Understand Query Performance – Grant Fritchey
Microsoft has added an excellent new extension in PostgreSQL on their Azure Platform. This session, presented at Posette 2024, covers what Query Store is and the types of information you can get out of it.
Top Benefits of Using Salesforce Healthcare CRM for Patient Management.pdf – VALiNTRY360
Salesforce Healthcare CRM, implemented by VALiNTRY360, revolutionizes patient management by enhancing patient engagement, streamlining administrative processes, and improving care coordination. Its advanced analytics, robust security, and seamless integration with telehealth services ensure that healthcare providers can deliver personalized, efficient, and secure patient care. By automating routine tasks and providing actionable insights, Salesforce Healthcare CRM enables healthcare providers to focus on delivering high-quality care, leading to better patient outcomes and higher satisfaction. VALiNTRY360's expertise ensures a tailored solution that meets the unique needs of any healthcare practice, from small clinics to large hospital systems.
For more info visit us https://valintry360.com/solutions/health-life-sciences
E-Invoicing Implementation: A Step-by-Step Guide for Saudi Arabian Companies – Quickdice ERP
Explore the seamless transition to e-invoicing with this comprehensive guide tailored for Saudi Arabian businesses. Navigate the process effortlessly with step-by-step instructions designed to streamline implementation and enhance efficiency.
OpenMetadata Community Meeting - 5th June 2024 – OpenMetadata
The OpenMetadata Community Meeting was held on June 5th, 2024. In this meeting, we discussed the data quality capabilities that are integrated with the Incident Manager, providing a complete solution to handle your data observability needs. Watch the end-to-end demo of the data quality features.
* How to run your own data quality framework
* What is the performance impact of running data quality frameworks
* How to run the test cases in your own ETL pipelines
* How the Incident Manager is integrated
* Get notified with alerts when test cases fail
Watch the meeting recording here - https://www.youtube.com/watch?v=UbNOje0kf6E
WWDC 2024 Keynote Review: For CocoaCoders Austin – Patrick Weigel
Overview of WWDC 2024 Keynote Address.
Covers: Apple Intelligence, iOS18, macOS Sequoia, iPadOS, watchOS, visionOS, and Apple TV+.
Understandable dialogue on Apple TV+
On-device app controlling AI.
Access to ChatGPT with a guest appearance by Chief Data Thief Sam Altman!
App Locking! iPhone Mirroring! And a Calculator!!
Everything You Need to Know About X-Sign: The eSign Functionality of XfilesPr... – XfilesPro
Wondering how X-Sign gained popularity in a quick time span? This eSign functionality of XfilesPro DocuPrime has many advancements to offer for Salesforce users. Explore them now!
Neo4j - Product Vision and Knowledge Graphs - GraphSummit Paris – Neo4j
Dr. Jesús Barrasa, Head of Solutions Architecture for EMEA, Neo4j
Discover the latest innovations from Neo4j, including the latest cloud integrations and product improvements that make Neo4j an essential choice for developers building applications with interconnected data and generative AI.
SMS API Integration in Saudi Arabia | Best SMS API Service – Yara Milbes
Discover the benefits and implementation of SMS API integration in the UAE and Middle East. This comprehensive guide covers the importance of SMS messaging APIs, the advantages of bulk SMS APIs, and real-world case studies. Learn how CEQUENS, a leader in communication solutions, can help your business enhance customer engagement and streamline operations with innovative CPaaS, reliable SMS APIs, and omnichannel solutions, including WhatsApp Business. Perfect for businesses seeking to optimize their communication strategies in the digital age.
UI5con 2024 - Keynote: Latest News about UI5 and its Ecosystem – Peter Muessig
Learn about the latest innovations in and around OpenUI5/SAPUI5: UI5 Tooling, UI5 linter, UI5 Web Components, Web Components Integration, UI5 2.x, UI5 GenAI.
Recording:
https://www.youtube.com/live/MSdGLG2zLy8?si=INxBHTqkwHhxV5Ta&t=0
Transform Your Communication with Cloud-Based IVR Solutions – TheSMSPoint
Discover the power of Cloud-Based IVR Solutions to streamline communication processes. Embrace scalability and cost-efficiency while enhancing customer experiences with features like automated call routing and voice recognition. Accessible from anywhere, these solutions integrate seamlessly with existing systems, providing real-time analytics for continuous improvement. Revolutionize your communication strategy today with Cloud-Based IVR Solutions. Learn more at: https://thesmspoint.com/channel/cloud-telephony
Hand Rolled Applicative User Validation Code Kata – Philip Schwarz
Could you use a simple piece of Scala validation code (granted, a very simplistic one too!) that you can rewrite, now and again, to refresh your basic understanding of Applicative operators <*>, <*, *>?
The goal is not to write perfect code showcasing validation, but rather, to provide a small, rough-and-ready exercise to reinforce your muscle-memory.
Despite its grandiose-sounding title, this deck consists of just three slides showing the Scala 3 code to be rewritten whenever the details of the operators begin to fade away.
The code is my rough and ready translation of a Haskell user-validation program found in a book called Finding Success (and Failure) in Haskell - Fall in love with applicative functors.
3. The Mission of CERN
• Push back the frontiers of knowledge
  E.g. the secrets of the Big Bang … what was the matter like within the first moments of the Universe’s existence?
• Develop new technologies for accelerators and detectors
  Information technology - the Web and the GRID
  Medicine - diagnosis and therapy
• Train scientists and engineers of tomorrow
• Unite people from different countries and cultures
7. Run 3 Planning (2021-2023):
Similar to 2018
• If the experiments run luminosity levelling at a higher pile-up and for longer
• Potentially higher average pileup
• Non-linear increase in CPU time
• Possibly less time between fills – more live time
• Overall the best estimate is 30% (50% conservatively) more resources needed than in 2018
• But we have not seen 2018 yet
• For 2021: 1st year after LS2, could be only half-year live time but ramp up to optimal conditions rapidly
• Unknown:
• Still need plans for experiment trigger rates
• And plans for luminosity levelling
9. However …
• ALICE and LHCb are upgrading during LS2, so the expectations of their needs do not follow the assumptions in the previous slides:
• LHCb:
• luminosity and pileup increase by factor 5.
• Major changes in computing model result in higher trigger rate and HLT output bandwidth.
• LHCC milestone for computing model in Q3/2018, together with engineering TDR – currently under review
• ALICE:
• Factor 100 increase in readout rate (50 kHz)
• Data volume increase mitigated by online reconstruction and raw data compression in new O2 facility
• O2 TDR is approved; summary needs are:
• Increases in 2021 wrt 2018: CPU: 48%, disk: 74%, tape 90%
10. Scale of data tomorrow …
[Charts: projected annual data volumes and compute needs for ALICE, ATLAS, CMS, LHCb and the GRID across Run 1 to Run 4]
Data: ~25 PB/year → 400 PB/year; Compute: growth > x50
10 Year Horizon: what we think is affordable unless we do something differently
Timeline (2009–2030?): Run 1 – LS1 – Run 2 – LS2 – Run 3 – LS3 – HL-LHC … FCC?
Technology revolutions are needed
11. The WLCG Strategy Document
• The HL-LHC computing challenge: provide the computing capacity needed for the LHC physics program, managing the cost
• The WLCG strategy document is a specific view of the CWP, prioritizing R&Ds relevant to the HL-LHC computing challenge
• The prototyped solutions will be the foundation of the WLCG TDR for HL-LHC, planned for 2020. Timing to be re-considered?
• This is a presentation of the content of the strategy document
• http://cern.ch/go/Tg79
12. WLCG Strategy - Outline
The strategy develops around five main themes …
1. Software performance
2. Algorithmic improvements / changes (e.g. generators, fast MC, reconstruction)
3. Reduction of data volumes
4. Managing operations cost
5. Optimizing hardware costs
It defines an R&D program with rough timelines, organized in sections:
• The HL-LHC challenge, hardware trends and a cost model
• Computing Models
• Experiments Software
• System Performance and Efficiency
• Data and Processing Infrastructures
• Sustainability
• Data Preservation and Reuse
The goal is to demonstrate to the funding agencies that we are in control of the HL-LHC cost, while exploiting the full potential of the physics program
14. Sharing Open Science Services
EOSC Summit, Brussels 11th June 2018
Many research disciplines
15. EOSC Summit - Rules of Participation Workshop, Brussels 11th June 2018
Business models
• Data Controller vs. Data Processor
• For long tail of science, new & exploratory usage, SLA breach compensation voucher
• Need to repatriate data
16. Thank you for your attention
"The task of the mind is to produce future"
Paul Valéry