From Jisc's campus network engineering for data-intensive science workshop on 19 October 2016.
https://www.jisc.ac.uk/events/campus-network-engineering-for-data-intensive-science-workshop-19-oct-2016
Enabling efficient movement of data into & out of a high-performance analysis... - by Jisc
From Jisc's campus network engineering for data-intensive science workshop on 19 October 2016.
https://www.jisc.ac.uk/events/campus-network-engineering-for-data-intensive-science-workshop-19-oct-2016
This document discusses the concept of a Science DMZ, which consists of three key components: 1) a dedicated "friction-free" network path with high-performance networking devices located near the site perimeter to facilitate science data transfer, 2) dedicated high-performance data transfer nodes optimized for data transfer tools, and 3) a performance measurement/test node. It contrasts this approach with the typical ad-hoc deployment of a data transfer node wherever space allows, which often fails to provide necessary performance. Details of an example Science DMZ deployment at Lawrence Berkeley National Laboratory are provided.
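As a rough illustration of the kind of check a site might run once such a deployment is in place (a sketch, not part of the original document: the hostname is hypothetical, and it assumes iperf3 is installed with a server already running on the remote DTN), a throughput spot-check between data transfer nodes could look like this:

```python
# Spot-check the "friction-free" path between two data transfer nodes (DTNs).
# Assumes `iperf3 -s` is running on the remote DTN; the hostname is made up.
import json
import subprocess

REMOTE_DTN = "dtn.example.ac.uk"  # hypothetical data transfer node

def measure_throughput_gbps(host: str, seconds: int = 10) -> float:
    """Run one iperf3 client test and return the received throughput in Gbit/s."""
    result = subprocess.run(
        ["iperf3", "-c", host, "-t", str(seconds), "-J"],  # -J emits JSON
        capture_output=True, text=True, check=True,
    )
    report = json.loads(result.stdout)
    return report["end"]["sum_received"]["bits_per_second"] / 1e9

if __name__ == "__main__":
    print(f"{REMOTE_DTN}: {measure_throughput_gbps(REMOTE_DTN):.2f} Gbit/s")
```

A regular test of this sort, alongside a dedicated perfSONAR measurement node, is what makes it possible to tell whether poor transfer rates come from the network path or from the end systems.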
Shared services - the future of HPC and big data facilities for UK research - by Martin Hamilton
Slides from Jisc panel session at HPC & Big Data 2016 with contributions from the Francis Crick Institute, QMUL and King's College London covering their use of the Jisc shared data centre and the eMedLab project
Goonhilly Earth Station played a key role in the development of the Internet. It was involved in the first demonstration of packet radio networking across the Atlantic in 1977. Goonhilly is now exploring ways to extend Internet connectivity to space, such as by developing disruption tolerant networking to enable an interplanetary Internet and supporting private lunar missions. Goonhilly also aims to diversify its business by offering commercial satellite services and partnering with universities on radio astronomy research.
Chair: Shirley Wood, training and support director, Jisc.
Welcome to Networkshop45
Speaker: Professor Edward Peck, Nottingham Trent University.
Janet update
Speakers:
Rolly Trice, deputy network operations director, Jisc
Steve Kennett, security director, Jisc
Machine learning for network security
Speaker: Miranda Mowbray.
In this presentation from the Dell booth at SC13, Joseph Antony from NCI describes how they are using HPC Virtualization to meet user needs.
Watch the video presentation: http://insidehpc.com/2013/12/05/panel-discussion-thought-hpc-virtualization-never-going-happen/
This document summarizes a presentation on supporting data intensive applications. It discusses the Janet end-to-end performance initiative which aims to engage with data intensive research communities to help optimize performance. Some key points include:
- A growing number of data-intensive science applications and remote computation scenarios require high bandwidth.
- The importance of understanding researcher requirements and setting expectations on practical throughput limits (a rough throughput estimate of this kind is sketched after this list).
- Using perfSONAR to measure network characteristics and identify performance issues between sites on the Janet network.
- Adopting the "Science DMZ" model of separating research and campus traffic to avoid bottlenecks and optimize data transfer performance.
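The expectation-setting point above can be made concrete with the Mathis et al. approximation for a single TCP stream. This is a back-of-envelope sketch with assumed example numbers, not a calculation from the presentation:

```python
# Approximate single-stream TCP throughput: rate ~ MSS / (RTT * sqrt(loss)).
# Even a tiny loss rate on a long-RTT path caps throughput well below link speed,
# which is the motivation for keeping the science path "friction-free".
import math

def tcp_throughput_gbps(mss_bytes: float, rtt_s: float, loss_rate: float) -> float:
    return (mss_bytes * 8) / (rtt_s * math.sqrt(loss_rate)) / 1e9

# Assumed example: 1460-byte segments, 20 ms round trip, 0.001% packet loss.
print(f"{tcp_throughput_gbps(1460, 0.020, 1e-5):.2f} Gbit/s")  # ~0.18 Gbit/s
```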
Repositories are systems mainly used to store and publish academic content. This presentation discusses why repository contents should be published as Linked (Open) Data and how repositories can be extended to do so.
The document provides updates on Edina National Data Centre services and projects. Key points include:
- Digimap services added new map styles, formats and MasterMap data. Go-Geo! saw increased usage and new content categories.
- Projects like AddressingHistory and CHALICE aim to link historical maps and directories to create open, linked data gazetteers. A mobile scoping study evaluated delivering Digimap via mobile.
- Other activities included work on the Scottish Spatial Data Infrastructure and the ESDIN best practices network for INSPIRE compliance. The OpenStream service provides access to OS OpenData.
This document summarizes the AddressingHistory project, which created an online crowdsourcing tool combining digitized historical Scottish Post Office Directories (PODs) with historical maps. The project had two phases: the first created the initial tool using three POD volumes from 1784-1805, 1865, and 1905-1906. The second phase expanded coverage to additional years and locations, improved parsing of names and occupations, and added new search and visualization features. Lessons learned included the need for ongoing refinement, sustainability planning, and engagement of relevant communities.
ARIADNE is an EU-funded project; this document provides an overview of the data lifecycle from initial project design and data creation through archiving and re-use. The stages include planning methods, recording data during fieldwork or laboratory work, documenting data to support future analysis and reuse, and depositing well-documented data in an archive. Proper documentation and metadata capture at each stage, from project start to archiving, ensures data can be understood, selected for long-term preservation, and discovered for new research uses over time. Reusing existing archived data supports new discoveries and data preservation.
CLARIAH Toogdag 2018: A distributed network of digital heritage information - by Enno Meijers
Slides of my keynote at the CLARIAH Toogdag 2018 on 9 March at the National Library of the Netherlands. The main topics were the development of the distributed digital heritage network and the alignment to and cooperation with the CLARIAH infrastructure and data. It also points at some of the current limitations of the semantic web technology.
The document provides information about the 2nd Global Summit and Expo on Multimedia & Applications conference taking place August 15-16, 2016 in London, UK. It includes details about the conference themes, sessions, speakers, venue, and registration deadlines. Over 200 participants are expected to attend presentations, workshops, and interactive sessions on topics related to multimedia technologies, signal processing, computer vision, and more.
1) Postgres and PostGIS have been used at EDINA for over 8 years to power major geospatial services like Digimap.
2) It is used for data storage, mapping, spatial indexing, querying, and data downloads (an illustrative spatial query is sketched after this list). Postgres allows EDINA to handle large amounts of geospatial data and large user bases.
3) EDINA finds Postgres reliable, performant, scalable, and standards-compliant with good support tools. It will continue being the core database for EDINA's geoservices.
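As a purely illustrative example of the kind of spatial query such a service runs (the connection string, table and column names below are invented rather than EDINA's actual schema; it assumes psycopg2 and a PostGIS-enabled database), a point-in-polygon lookup might look like this:

```python
# Find features whose polygon contains a given longitude/latitude point (WGS84).
# Table, columns and credentials are hypothetical; requires psycopg2 + PostGIS.
import psycopg2

conn = psycopg2.connect("dbname=geodata user=geouser")  # hypothetical connection
with conn, conn.cursor() as cur:
    cur.execute(
        """
        SELECT name
        FROM boundaries
        WHERE ST_Contains(geom, ST_SetSRID(ST_MakePoint(%s, %s), 4326))
        """,
        (-3.19, 55.95),  # a point in Edinburgh
    )
    for (name,) in cur.fetchall():
        print(name)
```

A GiST index on the geometry column is what keeps queries like this fast over the large datasets and user bases mentioned above.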
TrunkDB is the new cloud-based version of ORDs (Oxford Research Database Service), which was originally designed to provide database hosting and manipulation services for researchers. TrunkDB allows researchers to create multiple versions of databases, share data with colleagues, and access data securely from anywhere through an online interface. It aims to support researchers by treating their data, rather than just the database, as the primary object and allowing various ways of organizing, updating, and viewing data over time through a versioning system. TrunkDB is currently in private beta testing with plans to launch publicly in June.
PHIDIAS - Boosting the use of cloud services for marine data management, serv... - by Phidias
Description and scope of the Project
Phidias HPC is aimed at developing a consolidated and shared HPC and Data service by building on pre-existing and emerging infrastructure in order to create a federation of "user to infrastructure" services.
To achieve its purpose and to gain a comprehensive picture of the European infrastructure landscape, three data area tests will develop and provide new services to discover, manage and process spatial and environmental data produced by research communities tackling scientific challenges such as atmospheric, marine and earth observation issues.
Webinar: How to improve the cloud services for marine data
Observing the ocean is challenging: missions at sea are costly, different scales of processes interact, and the conditions are constantly changing, which is why scientists say that "a measurement not made today is lost forever". For these reasons, it is fundamental to properly store both the data and metadata, so that their access can be guaranteed for the widest community, in line with the FAIR principles: Findable, Accessible, Inter-operable and Reusable.
PHIDIAS HPC has organised a webinar entitled "PHIDIAS: Boosting the use of cloud services for marine management, services and processing" to be held on 4th June 2020 at 11 AM CEST. The webinar aims to introduce the Phidias HPC initiative, in collaboration with the Blue-Cloud project, to the European HPC and Research community, specifically in the Blue economy, to improve the use of (1) cloud services for marine data management, (2) data services to the user in a FAIR perspective, and (3) data processing on demand.
These objectives will be pursued in coherence with the development of the European Open Science Cloud (EOSC) and the Copernicus Data and Information Access Services (DIAS).
Nanopublications and Decentralized Publishing - by Tobias Kuhn
1) Current methods of publishing and sharing research results and data pose problems regarding verifiability, immutability, and permanence over time.
2) Nanopublications use cryptographic hashes to create "Trusty URIs" that make digital objects verifiable, immutable, and permanent by linking identifiers to content (the idea is illustrated in the sketch after this list).
3) A decentralized network of nanopublication servers allows for open, real-time publishing and retrieval of nanopublications without a central authority through propagation across nodes.
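The hash-based identifier idea can be illustrated in a few lines of Python. This is a deliberate simplification of the principle, not the actual Trusty URI algorithm or the nanopublication toolchain:

```python
# Embed a digest of the content in the identifier, so anyone holding the
# identifier can verify that the object they retrieved has not been altered.
import hashlib

BASE = "https://example.org/np/"  # hypothetical namespace

def make_identifier(content: bytes) -> str:
    return BASE + hashlib.sha256(content).hexdigest()

def verify(identifier: str, content: bytes) -> bool:
    return identifier.rsplit("/", 1)[-1] == hashlib.sha256(content).hexdigest()

nanopub = b"<nanopublication serialized as RDF>"
uri = make_identifier(nanopub)
assert verify(uri, nanopub)              # unchanged content verifies
assert not verify(uri, nanopub + b"!")   # any modification is detectable
```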
ESCAPE Kick-off meeting - HL-LHC ESFRI Landmark (Feb 2019) - by ESCAPE EU
The document discusses the European Organization for Nuclear Research (CERN) and its role in particle physics research. It provides background on CERN's founding, membership, budget, and scientific goals. It also summarizes CERN's Large Hadron Collider project and the Worldwide LHC Computing Grid consortium for data distribution and analysis. Finally, it discusses plans for the High-Luminosity LHC upgrade and associated computing challenges.
Dr. Frank Wuerthwein from the University of California at San Diego, presentation at the International Super Computing Conference on Big Data, 2013, US. Until recently, the large CERN experiments, ATLAS and CMS, owned and controlled the computing infrastructure they operated on in the US, and accessed data only when it was locally available on the hardware they operated. However, Würthwein explains, with data-taking rates set to increase dramatically by the end of LS1 in 2015, the current operational model is no longer viable to satisfy peak processing needs. Instead, he argues, large-scale processing centers need to be created dynamically to cope with spikes in demand. To this end, Würthwein and colleagues carried out a successful proof-of-concept study, in which the Gordon Supercomputer at the San Diego Supercomputer Center was dynamically and seamlessly integrated into the CMS production system to process a 125-terabyte data set.
Grid optical network service architecture for data intensive applications - by Tal Lavian Ph.D.
- An integrated software system provides the "glue".
- The dynamic optical network becomes a fundamental Grid service for data-intensive Grid applications: something to be scheduled, managed and coordinated to support collaborative operations.
- From super-computer to super-network: in the past, computer processors were the fastest part and peripherals were the bottleneck; in the future, optical networks will be the fastest part, with computers, processors, storage, visualization and instrumentation as the slower "peripherals".
- eScience cyberinfrastructure focuses on computation, storage, data, analysis and workflow.
- The network is vital for better eScience.
The European Open Science Cloud: just what is it? - by Carole Goble
Presented at Jisc and CNI leaders conference 2018, 2 July 2018, Oxford, UK (https://www.jisc.ac.uk/events/jisc-and-cni-leaders-conference-02-jul-2018). The European Open Science Cloud. What exactly is it? In principle it is conceived as a virtual environment with open and seamless services for storage, management, analysis and re-use of research data, across borders and scientific disciplines. How? By federating existing scientific data infrastructures, currently dispersed across disciplines and Member States. In practice, what it is depends on the stakeholder. To European Research Infrastructures it’s a coordinated mission to organise and exchange their data, metadata, software and services to be FAIR – Findable, Accessible, Interoperable, Reusable – and to use e-Infrastructures, either EU or commercial. To EU e-Infrastructures offering data storage and cloud services, it’s a funding mission to integrate their services, policies and organisational structures, and to be used by the Research Infrastructures. To agencies it’s a means to promote Open Science, standardisation, cross-disciplinary research and coordinated investment with a dream of a “one stop shop” for researchers. And for Libraries?
A First Attempt at Describing, Disseminating and Reusing Methodological Knowl... - by ariadnenetwork
Presentation by Cesar Gonzalez-Perez (Incipit) and Patricia Martín-Rodilla.
Spanish National Research Council (CSIC)
EAA 2013 in the 'New Digital Developments in Heritage Management and Research' session
Pilsen, Czech Republic
5 September 2013
Challenges and Issues of Next Cloud Computing Platforms - by Frederic Desprez
Cloud computing has now crossed the frontiers of research to reach industry. It is used every day, whether to exchange emails or make reservations on web sites. However, much research remains to be done to improve the performance and functionality of these platforms of tomorrow. In this talk, I will give an overview of some of the theoretical and applied research done at INRIA, particularly around cloud distribution, energy monitoring and management, massive data processing and exchange, and resource management.
Solving Network Throughput Problems at the Diamond Light Source - by Jisc
From Jisc's campus network engineering for data-intensive science workshop on 19 October 2016.
https://www.jisc.ac.uk/events/campus-network-engineering-for-data-intensive-science-workshop-19-oct-2016
Electron Microscopy Between OPIC, Oxford and eBIC - by Jisc
From Jisc's campus network engineering for data-intensive science workshop on 19 October 2016.
https://www.jisc.ac.uk/events/campus-network-engineering-for-data-intensive-science-workshop-19-oct-2016
BT Security provides protection for customers by monitoring for potential security incidents and threats. They review BTID operations proactively to prevent incidents from occurring. The talk discussed reactive monitoring, blocking IP addresses temporarily due to reallocation issues, and intelligence scanning to identify ways to improve security processes. BT Security recommends choosing strong, unique passwords and changing them regularly to help protect customer accounts and information.
Data and information governance: getting this right to support an information... - by Jisc
This document discusses establishing data and information governance to support an information security program. It outlines establishing frameworks for information security and data management with defined roles, policies, procedures and tools. This includes classifying data, establishing data management principles, oversight groups and governance bodies to define strategies, manage risks and ensure compliance. The goal is to understand and promote the value of data assets while protecting confidentiality, integrity and availability. It also describes applying these frameworks and changing roles and responsibilities to better manage information assets.
Cyber crime is increasing in sophistication, impact, and frequency, according to a presentation by Charlie McMurdie of PwC. A wide range of threat actors carry out attacks, including organized criminals, nation states, hackers, and insiders. Common motivations include financial gain, hacktivism, and espionage. High-profile breaches have stolen personal and payment details impacting millions. Companies face direct costs like investigation, indirect costs like loss of customers, and intangible costs like damage to brand. Cyber attacks are now conducted on an industrial scale by organized criminal networks. Recent news reports highlight teenage hackers operating underground forums and groups like Anonymous targeting financial institutions. McMurdie argues a network approach is needed to counter these threats.
The document discusses the role of the Chief Information Security Officer (CISO) at the University of Edinburgh. It outlines that the CISO was appointed to provide central leadership on information security risks across the university. The CISO's main responsibilities include leading the information security strategy, managing information security risks from internal and external threats, advising on security threats, and developing security policies and governance. Initial priorities for the CISO included recruiting a security team, focusing on users, overhauling risk governance, and supporting strategic projects. Keys to success are aligning with the university's digital transformation strategy, gaining buy-in from colleges, ensuring business areas own their risks, and providing supporting services through collaboration.
The document discusses cyber incident handling and reporting. It notes that 65% of large firms and 1 in 4 businesses experienced a cyber breach or attack in the past year. It outlines steps for businesses to take to prepare for and handle cyber incidents, including having an incident response plan, understanding network topology, and ensuring key points of contact. It provides details on where to report historic or ongoing cyber incidents and crimes. It also describes the Cyber Information Sharing Partnership (CiSP), a platform for sharing cyber threat information between government and industry.
Certifying and Securing a Trusted Environment for Health Informatics Research... - by Jisc
The document discusses the certification and securing of a trusted environment for health informatics research data at the University of Dundee. It provides an overview of the Health Informatics Centre, its research data management platform, safe haven architecture, and ISO27001 certification. The platform standardizes data extraction and release and adds metadata and quality checks. The safe haven uses pseudonymized data, and virtual environments prevent data from leaving. ISO27001 certification provides governance and reduces documentation through standardized information security practices.
Nick Moore discusses working with students at the University of Gloucestershire on ISO27001, an international information security standard. He proposes involving computing students who are now in the industry to provide a real-life scenario that builds links between students and staff while developing IT Services' defensive capabilities with a managed risk profile. The key is maintaining balance between business goals, student expectations, and quantified risks.
Closing plenary and keynote from Lauren Sager Weinstein - by Jisc
Host: Andy McGregor, deputy chief innovation officer, Jisc.
Keynote speaker: Lauren Sager Weinstein, chief data officer at Transport for London.
In our final plenary, we'll hear from Lauren Sager Weinstein.
We'll also be announcing the winners of our edtech start-up competition, as we bring Digifest to a close.
40 Powers of 10 - Simulating the Universe with the DiRAC HPC Facility - by inside-BigData.com
In this deck from the Swiss HPC Conference, Mark Wilkinson presents: 40 Powers of 10 - Simulating the Universe with the DiRAC HPC Facility.
"DiRAC is the integrated supercomputing facility for theoretical modeling and HPC-based research in particle physics, and astrophysics, cosmology, and nuclear physics, all areas in which the UK is world-leading. DiRAC provides a variety of compute resources, matching machine architecture to the algorithm design and requirements of the research problems to be solved. As a single federated Facility, DiRAC allows more effective and efficient use of computing resources, supporting the delivery of the science programs across the STFC research communities. It provides a common training and consultation framework and, crucially, provides critical mass and a coordinating structure for both small- and large-scale cross-discipline science projects, the technical support needed to run and develop a distributed HPC service, and a pool of expertise to support knowledge transfer and industrial partnership projects. The on-going development and sharing of best-practice for the delivery of productive, national HPC services with DiRAC enables STFC researchers to produce world-leading science across the entire STFC science theory program."
Watch the video: https://wp.me/p3RLHQ-k94
Learn more: https://dirac.ac.uk/
and
http://hpcadvisorycouncil.com/events/2019/swiss-workshop/agenda.php
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
Network Engineering for High Speed Data Sharing - by Globus
The document discusses modernizing network architecture to improve data sharing performance for science. It proposes separating portal logic from data handling by placing data on dedicated high-performance infrastructure in science DMZs. This allows data to be efficiently transferred between facilities while portals focus on search and access. The Petascale DTN project achieved over 50Gbps transfers between HPC sites using this model. Long-term, interconnected science DMZs could create a global high-performance network enabling efficient data movement for discovery.
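For a sense of scale, some quick arithmetic (an illustration; the dataset sizes are assumed examples, and only the 50 Gbps figure comes from the summary above) shows why sustained rates of that order matter:

```python
# How long a bulk transfer takes at a given sustained rate.
def transfer_time_days(dataset_bytes: float, rate_gbps: float) -> float:
    seconds = dataset_bytes * 8 / (rate_gbps * 1e9)
    return seconds / 86_400

print(f"{transfer_time_days(1e15, 50):.1f} days")  # 1 PB at 50 Gbit/s: ~1.9 days
print(f"{transfer_time_days(1e15, 10):.1f} days")  # 1 PB at 10 Gbit/s: ~9.3 days
```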
The University of Edinburgh is undergoing a large project to reprocure its campus networking infrastructure. The existing network, which has grown organically over many years, contains equipment that is up to 20 years old and no longer meets the university's needs. After an internal review in 2014 recommended a new network be procured, the university embarked on a multi-stage competitive dialogue procurement process that is still ongoing. The process involves pre-market engagement, shortlisting bidders, and multiple rounds of dialogue and evaluation to refine solutions before selecting a final vendor. The procurement has proven to be a large undertaking but may result in a network solution tailored to the university's unique requirements.
Opening Keynote Lecture
15th Annual ON*VECTOR International Photonics Workshop
Calit2’s Qualcomm Institute
University of California, San Diego
February 29, 2016
The document summarizes Dr. Larry Smarr's presentation on the Pacific Research Platform (PRP) and its role in working toward a national research platform. It describes how PRP has connected research teams and devices across multiple UC campuses for over 15 years. It also details PRP's innovations like Flash I/O Network Appliances (FIONAs) and use of Kubernetes to manage distributed resources. Finally, it outlines opportunities to further integrate PRP with the Open Science Grid and expand the platform internationally through partnerships.
The document discusses several Storage Area Network (SAN) configurations for different projects within EDC. It provides an overview of the goals, architecture, and experiences of SAN implementations for the CR1, Landsat, and LPDAAC projects. It also discusses some general realities and challenges of implementing and managing SANs.
The Pacific Research Platform: Building a Distributed Big Data Machine Learni... - by Larry Smarr
This document summarizes Dr. Larry Smarr's invited talk about the Pacific Research Platform (PRP) given at the San Diego Supercomputer Center in April 2019. The PRP is building a distributed big data machine learning supercomputer by connecting high-performance computing and data resources across multiple universities in California and beyond using high-speed networks. It provides researchers with petascale computing power, distributed storage, and tools like Kubernetes to enable collaborative data-intensive science across institutions.
Data Plane Evolution: Towards Openness and Flexibility - by APNIC
This document discusses data plane evolutions and future implementations. It summarizes a presentation on network virtualization overlays (NVO3) and encapsulation considerations. Programmable silicon that is field upgradable could simplify deployment of future encapsulations. The P4 programming language also aims to accelerate programmability and wider feature deployment in a target independent way. Overall, future data plane implementations require openness, flexibility, and careful consideration to avoid overly complex architectures.
Accelerating TensorFlow with RDMA for high-performance deep learning - by DataWorks Summit
Google’s TensorFlow is one of the most popular deep learning (DL) frameworks. In distributed TensorFlow, gradient updates are a critical step governing the total model training time. These updates incur a massive volume of data transfer over the network.
In this talk, we first present a thorough analysis of the communication patterns in distributed TensorFlow. Then we propose a unified way of achieving high performance through enhancing the gRPC runtime with Remote Direct Memory Access (RDMA) technology on InfiniBand and RoCE. Through our proposed RDMA-gRPC design, TensorFlow only needs to run over the gRPC channel and gets the optimal performance. Our design includes advanced features such as message pipelining, message coalescing, zero-copy transmission, etc. The performance evaluations show that our proposed design can significantly speed up gRPC throughput by up to 1.5x compared to the default gRPC design. By integrating our RDMA-gRPC with TensorFlow, we are able to achieve up to 35% performance improvement for TensorFlow training with CNN models.
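A rough back-of-envelope sketch of why gradient updates dominate network traffic in this setting (the model size, worker count and step rate below are assumptions for illustration, not figures from the talk):

```python
# Per training step, each worker pushes its gradients and pulls updated weights,
# so traffic grows with model size, worker count and step rate.
params = 25_000_000      # assumed model size (~25M parameters, ResNet-50-ish)
bytes_per_param = 4      # float32
workers = 8
steps_per_second = 5

bytes_per_step = params * bytes_per_param * workers * 2   # push + pull
gbits_per_second = bytes_per_step * steps_per_second * 8 / 1e9
print(f"~{gbits_per_second:.0f} Gbit/s of parameter traffic")  # ~64 Gbit/s
```

At such rates the cost of copying buffers through the kernel TCP stack becomes significant, which is what an RDMA transport underneath gRPC is intended to remove.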
Speakers
Dhabaleswar K (DK) Panda, Professor and University Distinguished Scholar, The Ohio State University
Xiaoyi Lu, Research Scientist, The Ohio State University
- James Blessing is the Deputy Director of Network Architecture at Future Services. He discussed Ciena's MCP network management software, the need for automation of network provisioning through APIs, and the JiscMail NETWORK-AUTOMATION mailing list as a resource.
- The document then covered topics like Netpath services, layer 2 and 3 VPNs, network function virtualization, IPv6 adoption, the Janet end-to-end performance initiative, science DMZ principles, network performance monitoring with perfSONAR, and working with the GÉANT project.
The SKA Project - The World's Largest Streaming Data Processor - by inside-BigData.com
In this presentation from the 2014 HPC Advisory Council Europe Conference, Paul Calleja from University of Cambridge presents: The SKA Project - The World's Largest Streaming Data Processor.
"The Square Kilometre Array Design Studies is an international effort to investigate and develop technologies which will enable us to build an enormous radio astronomy telescope with a million square meters of collecting area."
Watch the video presentation: http://wp.me/p3RLHQ-cot
Pacific Wave and PRP Update: Big News for Big Data - by Larry Smarr
The Pacific Research Platform (PRP) aims to create a "Big Data freeway system" across research institutions in the western United States and Pacific region by leveraging high-bandwidth optical fiber networks. The PRP connects multiple universities and national laboratories, providing bandwidth up to 100Gbps for data-intensive science applications. Initial testing of the PRP demonstrated disk-to-disk transfer speeds exceeding 5Gbps between many sites. The PRP will be expanded with SDN/SDX capabilities to enable even higher performance for large-scale datasets from fields like astronomy, genomics, and particle physics.
This document outlines a project to develop a low-cost robotic tape library system using open source technology. The system was created to provide a cost-effective data storage solution for the Square Kilometre Array radio telescope project. An open source based prototype was created that supports one tape drive, has over twice the storage capacity of a comparable commercial system, and costs around 70% less. Open source tape library systems are suitable for applications that involve infrequently accessed cold data stored for long periods, and can provide affordable long-term data storage for research institutes and archives.
On 29 January 2020 ARCHIVER launched its Request for Tender with the purpose to award several Framework Agreements and work orders for the provision of R&D for hybrid end-to-end archival and preservation services that meet the innovation challenges of European Research communities, in the context of the European Open Science Cloud.
The tender was closed on 28 April 2020 and 15 R&D bids were submitted, with consortia that included 43 companies and organisations. The best bids have been selected and will start the first phase of the ARCHIVER R&D (Solution Design) in June 2020.
On Monday 8 June the selected consortia for the ARCHIVER design phase were announced during a Public Award Ceremony starting at 14.00 CEST.
In light of the COVID-19 outbreak and the consequent movement restrictions imposed in several countries, the event was organised as a webinar, virtually hosted by Port d’Informació Científica (PIC), a member of the Buyers Group of the ARCHIVER consortium.
The Kick-off marks the beginning of the Solution Design Phase.
Presentation at Networkshop46.
FRµIT: Raspberry Pi clusters and other adventures in networking research - by Phil Basford, University of Southampton.
Programmable network infrastructure: what does it mean for the campus? - by Matthew Broadbent, University of Lancaster.
Enhancing Performance with Globus and the Science DMZ - by Globus
ESnet has led the way in helping national facilities—and many other institutions in the research community—configure Science DMZs and troubleshoot network issues to maximize data transfer performance. In this talk we will present a summary of approaches and tips for getting the most out of your network infrastructure using Globus Connect Server.
High Performance Cyberinfrastructure Enabling Data-Driven Science in the Biom... - by Larry Smarr
11.04.06
Joint Presentation
UCSD School of Medicine Research Council
Larry Smarr, Calit2 & Phil Papadopoulos, SDSC/Calit2
Title: High Performance Cyberinfrastructure Enabling Data-Driven Science in the Biomedical Sciences
Looking Back, Looking Forward: NSF CI Funding 1985-2025 - by Larry Smarr
This document provides an overview of the development of national research platforms (NRPs) from 1985 to the present, with a focus on the Pacific Research Platform (PRP). It describes the evolution of the PRP from early NSF-funded supercomputing centers to today's distributed cyberinfrastructure utilizing optical networking, containers, Kubernetes, and distributed storage. The PRP now connects over 15 universities across the US and internationally to enable data-intensive science and machine learning applications across multiple domains. Going forward, the document discusses plans to further integrate regional networks and partner with new NSF-funded initiatives to develop the next generation of NRPs through 2025.
Similar to Archiving data from Durham to RAL using the File Transfer Service (FTS)
The document announces a community launch event for digital storytelling in January 2024. It discusses using digital storytelling in higher education to support learning and teaching. Examples include using digital stories for formative assessment, reflective exercises, and research dissemination across various disciplines. Feedback from students and staff who participated in digital storytelling workshops was very positive; they found it transformative and said it helped give voice to their experiences. The document also profiles speakers who will discuss using digital stories to explore difficult concepts, hear the student voice, and facilitate staff reflections. It emphasizes that digital storytelling can introduce humanity and creativity into pedagogy and help develop core skills. Attendees will participate in a Miro activity to discuss benefits and applications.
This document summarizes a Jisc strategy forum that took place in Northern Ireland on December 14, 2023. It outlines Jisc's planned services and initiatives for 2023-2024, including expanding network access and launching new cybersecurity, analytics, and equipment services. It discusses feedback received from further and higher education members on how Jisc can better deliver solutions, empower communities, and provide vision/strategy. Activities at the forum focused on understanding members' needs/challenges and discussing how Jisc can better support key priorities in Northern Ireland, such as affordable infrastructure, digital skills, and cybersecurity for FE and efficiency, student experience, and collaboration for HE.
This document summarizes a Jisc Scotland strategy forum that took place on December 12, 2023. It outlines Jisc's planned solutions and services for 2023-2024 including deploying resilient Janet access, IT health checks, online surveys, SD-WAN services, and more. The document discusses how Jisc engages stakeholders through relationship management, research, communities, training and events. It summarizes feedback from further education and higher education members on how Jisc can improve advocacy by delivering the right solutions, empowering communities, and having a clear vision and strategy. Finally, it outlines activities for the forum, including understanding members' needs and priorities and discussing how Jisc supports national priorities in Scotland.
Jisc provided a strategic update to stakeholders. Key highlights included:
- Achievements from the last year like data collection and analysis following the HESA merger, digital transformation support, and cost savings from licensing deals.
- Customer testimonials from Bridgend College on extending eduroam and from the University of Northampton on curriculum design support from Jisc.
- Priorities for the coming year like connectivity upgrades, new cybersecurity services, and improved customer experience.
- A financial summary showing income sources like membership fees and expenditures on areas like connectivity and cybersecurity.
This document summarizes VirtualSpeech, a company that provides virtual reality (VR) and artificial intelligence (AI) powered professional development training. It offers over 150 online courses covering topics like public speaking, leadership, and sales. Users can practice skills in immersive VR scenarios and receive feedback from conversational AI. The training is used by over 450,000 individuals across 130 countries and 150 universities. VirtualSpeech aims to enhance traditional learning with interactive VR practice sessions and real-time feedback to boost skills retention.
Introduction of Cybersecurity with OSS at Code Europe 2024 - by Hiroshi SHIBATA
I develop the Ruby programming language, RubyGems, and Bundler, which are package managers for Ruby. Today, I will introduce how to enhance the security of your application using open-source software (OSS) examples from Ruby and RubyGems.
The first topic is CVE (Common Vulnerabilities and Exposures). I have published CVEs many times. But what exactly is a CVE? I'll provide a basic understanding of CVEs and explain how to detect and handle vulnerabilities in OSS.
Next, let's discuss package managers. Package managers play a critical role in the OSS ecosystem. I'll explain how to manage library dependencies in your application.
I'll share insights into how the Ruby and RubyGems core team works to keep our ecosystem safe. By the end of this talk, you'll have a better understanding of how to safeguard your code.
"Choosing proper type of scaling" - by Olena Syrota, Fwdays
Imagine an IoT processing system that is already quite mature and production-ready, whose client coverage is growing, and for which scaling and performance are life-and-death questions. The system has Redis, MongoDB, and stream processing based on ksqlDB. In this talk, we will first analyze scaling approaches and then select the proper ones for our system.
Dandelion Hashtable: beyond billion requests per second on a commodity server - by Antonios Katsarakis
This slide deck presents DLHT, a concurrent in-memory hashtable. Despite efforts to optimize hashtables that go as far as sacrificing core functionality, state-of-the-art designs still incur multiple memory accesses per request and block request processing in three cases. First, most hashtables block while waiting for data to be retrieved from memory. Second, open-addressing designs, which represent the current state-of-the-art, either cannot free index slots on deletes or must block all requests to do so. Third, index resizes block every request until all objects are copied to the new index. Defying folklore wisdom, DLHT forgoes open-addressing and adopts a fully-featured and memory-aware closed-addressing design based on bounded cache-line-chaining. This design (1) offers lock-free index operations and deletes that free slots instantly, (2) completes most requests with a single memory access, (3) utilizes software prefetching to hide memory latencies, and (4) employs a novel non-blocking and parallel resizing. In a commodity server and a memory-resident workload, DLHT surpasses 1.6B requests per second and provides 3.5x (12x) the throughput of the state-of-the-art closed-addressing (open-addressing) resizable hashtable on Gets (Deletes).
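As a toy illustration of the closed-addressing, bounded-chaining layout described above (a Python sketch of the idea only; it captures none of DLHT's lock-free operation, software prefetching or non-blocking resizing):

```python
# Each bucket is a short chain of fixed-size nodes (standing in for cache lines).
# Lookups walk a bounded chain, and a delete frees its slot immediately.
NODE_SLOTS = 7  # entries per node, as if sized to fit one cache line

class Node:
    def __init__(self):
        self.items = {}    # up to NODE_SLOTS key/value pairs
        self.next = None   # next node in the bounded chain

class BoundedChainTable:
    def __init__(self, buckets: int = 1024):
        self.buckets = [Node() for _ in range(buckets)]

    def _bucket(self, key) -> Node:
        return self.buckets[hash(key) % len(self.buckets)]

    def put(self, key, value) -> None:
        head = self._bucket(key)
        node = head
        while node is not None:          # update in place if the key exists
            if key in node.items:
                node.items[key] = value
                return
            node = node.next
        node = head                      # otherwise take the first free slot
        while len(node.items) >= NODE_SLOTS:
            if node.next is None:
                node.next = Node()       # extend the chain by one node
            node = node.next
        node.items[key] = value

    def get(self, key, default=None):
        node = self._bucket(key)
        while node is not None:
            if key in node.items:
                return node.items[key]
            node = node.next
        return default

    def delete(self, key) -> bool:
        node = self._bucket(key)
        while node is not None:
            if key in node.items:
                del node.items[key]      # the slot is reusable immediately
                return True
            node = node.next
        return False
```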
HCL Notes and Domino licence cost reduction in the world of DLAU - by panagenda
Webinar Recording: https://www.panagenda.com/webinars/hcl-notes-und-domino-lizenzkostenreduzierung-in-der-welt-von-dlau/
DLAU and licences under the CCB and CCX model have been a hot topic for many in the HCL community since last year. As a Notes or Domino customer, you may be struggling with unexpectedly high user counts and licence fees. You may be wondering how this new kind of licensing works and what benefit it brings you. Above all, you no doubt want to stay within budget and save costs wherever possible. We understand that, and we want to help!
We explain how to resolve common configuration problems that can cause more users to be counted than necessary, and how to identify and remove superfluous or unused accounts to save money. There are also some practices that can lead to unnecessary spending, for example using a person document instead of a mail-in database for shared mailboxes. We show you such cases and their solutions. And of course we explain the new licensing model.
Join this webinar, in which HCL Ambassador Marc Thomas and guest speaker Franz Walder introduce you to this new world. It gives you the tools and know-how to keep an overview. You will be able to reduce your costs through an optimised Domino configuration and keep them low in the future.
Topics covered:
- Reducing licence costs by finding and fixing misconfigurations and superfluous accounts
- How do CCB and CCX licences really work?
- Understanding the DLAU tool and how best to use it
- Tips for common problem areas, such as team mailboxes, functional/test users, etc.
- Real-world examples and best practices you can apply immediately
In the realm of cybersecurity, offensive security practices act as a critical shield. By simulating real-world attacks in a controlled environment, these techniques expose vulnerabilities before malicious actors can exploit them. This proactive approach allows manufacturers to identify and fix weaknesses, significantly enhancing system security.
This presentation delves into the development of a system designed to mimic Galileo's Open Service signal using software-defined radio (SDR) technology. We'll begin with a foundational overview of both Global Navigation Satellite Systems (GNSS) and the intricacies of digital signal processing.
The presentation culminates in a live demonstration. We'll showcase the manipulation of Galileo's Open Service pilot signal, simulating an attack on various software and hardware systems. This practical demonstration serves to highlight the potential consequences of unaddressed vulnerabilities, emphasizing the importance of offensive security practices in safeguarding critical infrastructure.
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an... - by Jason Yip
The typical problem in product engineering is not bad strategy, so much as “no strategy”. This leads to confusion, lack of motivation, and incoherent action. The next time you look for a strategy and find an empty space, instead of waiting for it to be filled, I will show you how to fill it in yourself. If you’re wrong, it forces a correction. If you’re right, it helps create focus. I’ll share how I’ve approached this in the past, both what works and lessons for what didn’t work so well.
What is an RPA CoE? Session 1 – CoE VisionDianaGray10
In the first session, we will review the organization's vision and how it shapes the CoE structure.
Topics covered:
• The role of a steering committee
• How do the organization’s priorities determine CoE Structure?
Speaker:
Chris Bolin, Senior Intelligent Automation Architect, Anika Systems
For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2024/06/how-axelera-ai-uses-digital-compute-in-memory-to-deliver-fast-and-energy-efficient-computer-vision-a-presentation-from-axelera-ai/
Bram Verhoef, Head of Machine Learning at Axelera AI, presents the “How Axelera AI Uses Digital Compute-in-memory to Deliver Fast and Energy-efficient Computer Vision” tutorial at the May 2024 Embedded Vision Summit.
As artificial intelligence inference transitions from cloud environments to edge locations, computer vision applications achieve heightened responsiveness, reliability and privacy. This migration, however, introduces the challenge of operating within the stringent confines of resource constraints typical at the edge, including small form factors, low energy budgets and diminished memory and computational capacities. Axelera AI addresses these challenges through an innovative approach of performing digital computations within memory itself. This technique facilitates the realization of high-performance, energy-efficient and cost-effective computer vision capabilities at the thin and thick edge, extending the frontier of what is achievable with current technologies.
In this presentation, Verhoef unveils his company’s pioneering chip technology and demonstrates its capacity to deliver exceptional frames-per-second performance across a range of standard computer vision networks typical of applications in security, surveillance and the industrial sector. This shows that advanced computer vision can be accessible and efficient, even at the very edge of our technological ecosystem.
"Frontline Battles with DDoS: Best practices and Lessons Learned", Igor IvaniukFwdays
In this talk we will discuss DDoS protection tools and best practices, network architectures, and what AWS has to offer. We will also look into one of the largest DDoS attacks on Ukrainian infrastructure, which happened in February 2022, and see what techniques helped to keep web resources available for Ukrainians and how AWS improved DDoS protection for all its customers based on the experience in Ukraine.
Have you ever been confused by the myriad of choices offered by AWS for hosting a website or an API?
Lambda, Elastic Beanstalk, Lightsail, Amplify, S3 (and more!) can each host websites + APIs. But which one should we choose?
Which one is cheapest? Which one is fastest? Which one will scale to meet our needs?
Join me in this session as we dive into each AWS hosting service to determine which one is best for your scenario and explain why!
Your One-Stop Shop for Python Success: Top 10 US Python Development Providersakankshawande
Simplify your search for a reliable Python development partner! This list presents the top 10 trusted US providers offering comprehensive Python development services, ensuring your project's success from conception to completion.
Archiving data from Durham to RAL using the File Transfer Service (FTS)
1. Lydia Heck, Campus network engineering workshop
19/10/2016
Archiving data from Durham to RAL using the File Transfer Service (FTS)
2. Archiving data from Durham to RAL using the File Transfer Service (FTS)
Lydia Heck
Institute for Computational Cosmology
Manager of the DiRAC-2/2.5 Data Centric Facility, COSMA
3. Introduction to DiRAC
• DiRAC – Distributed Research utilising Advanced Computing – established in 2009 with DiRAC-1
  • supports research in theoretical astronomy, particle physics and nuclear physics
  • funded by STFC, with infrastructure money allocated from the Department for Business, Innovation and Skills (BIS)
  • the running costs, such as staff costs and electricity, are funded by STFC
  • DiRAC is classed by STFC as a major research facility, on a par with the big telescopes
4. What is DiRAC?
• A national service run, managed and allocated by the scientists who do the science, funded by BIS and STFC
• The systems are built around and for the applications with which the science is done.
• We do not rival a facility like ARCHER, as we do not aspire to run a general national service.
5. What is DiRAC – cont’d
• For highlights of the science carried out on the DiRAC facility, please see http://www.dirac.ac.uk/science.html
• Specific example: large-scale structure calculations with the Eagle run
  • 4096 cores
  • ~8 GB RAM/core
  • 47 days = 4,620,288 CPU hours
  • 200 TB of data
6. The DiRAC computing systems
• Blue Gene – Edinburgh
• Cosmos – Cambridge
• Complexity – Leicester
• Data Centric – Durham
• Data Analytic – Cambridge
7. COSMA @ DiRAC (Data Centric)
• Durham Data Centric system – IBM iDataPlex
• 6720 Intel Sandy Bridge cores
• 53.8 TB of RAM
• FDR10 InfiniBand, 2:1 blocking
• 2.5 PByte of GPFS storage (2.2 PByte used!)
8. Resources of DiRAC
• Long projects, with a significant number of CPU hours allocated for 3 years, typically on a specific system, on one or more of the 5 available systems. Resources available:

System        | Cores                          | CPU hours | Storage                                       | Location
Bluegene      | 98,304 cores                   | 861 M     | 1 PB (GPFS)                                   | Edinburgh
Data Centric  | 6720 Xeon cores                | 59 M      | 2.5 PB (GPFS)                                 | Durham (DiRAC2)
Data Centric  | 8000 Xeon cores                | > 71 M    | 2.5 PB data (Lustre), 1.8 PB scratch (Lustre) | Durham (DiRAC2.5)
Complexity    | 4352 Xeon cores                | 38 M      | 0.8 PB (Panasas)                              | Leicester
Data Analytic | 4800 Xeon cores                | 42 M      | 0.75 PB (Lustre)                              | Cambridge
SMP           | 1784 Xeon cores, shared memory | 15.6 M    | 146 TB (EXT)                                  | Cambridge
9. Why do we need to copy data?
• During a project, and when it is completed, copy data to the researchers’ home institutions
  • requires additional storage resource at the home institutions
  • not enough provision – will require additional funds
• Make backup copies
  • if disaster struck, many CPU hours of calculations would be lost
• Copy data to other sites to leverage compute resources for post-processing
• Storage on the HPC facility runs out of capacity
  • data creation considerably above expectation?
10. Why do we copy data to RAL?
• Research data must now be available to interested parties for a specified period of time
• We could install DiRAC's own archive
  • requires funds, and there is (currently) no budget
• We needed to get started:
  • to gain experience
  • to get a valid backup
  • to remove data as the resources run out
  • to identify bottlenecks and technical challenges
• Jeremy Yates (Director of DiRAC) negotiated access to the RAL archiving systems
• Set up collaborations, make use of previous experience and pool resources
• AND: copy data!
11. Network connectivity of Durham University
• 2012 – upgrade to 4x1 Gbit to Janet
• Janet advised investigating optimal utilisation of the available bandwidth before applying for a further upgrade
• 2014 – upgrade to 6 Gbit to Janet
• Currently 8 Gbit to Janet; should be a full 10 Gbit by the end of the year – technical issues
12. Network bandwidth – situation for Durham
• 2014: measured throughput? [chart]
13. 2014: Measured limits? [chart]
14. September 2014 – measured limits [chart]
15. Making optimal use of available bandwidth
• Planning and investment to bypass the external campus firewall:
  • preparatory work started in October/November 2014: two new routers (~£80k), configured for throughput with minimal ACLs – enough to safeguard the site
  • deploying internal firewalls – part of the new security infrastructure anyhow, but essential for such a venture
  • security now relies on the front-end systems of Durham DiRAC and Durham GridPP
• IPPP was moved outside the firewall in April 2015, with a clear mandate to manage security for their installation.
• The DiRAC data transfer system was moved outside about one month later.
16. GridPP site firewall config for the endpoint node
[diagram: GridFTP traffic under four firewall configurations – port blocking, pass-through, monitored through the firewall, and bypassing the site firewall]
17. Result for DiRAC and GridPP in Durham
• Guaranteed 3 Gbit/sec in/out
• Consequences:
  • pushed the network performance of Durham GridPP from the bottom 3 in the country to the top 5 of the UK GridPP sites
  • they now experience different bottlenecks, but ones that are under their control
  • DiRAC data transfers achieve up to 300–400 MByte/sec throughput to RAL when archiving, depending on file sizes
  • faster data sharing with other collaboration sites
  • recently (October 2016) offered the service to Earth Sciences, achieving 70–80 MByte/sec from a site in Switzerland
18. Collaboration between DiRAC and GridPP/RAL
• The Durham Institute for Computational Cosmology (ICC) volunteered to be the prototype installation
• Huge thanks to Jens Jensen and Brian Davies – there were many emails exchanged, many questions asked and many answers given.
• Resulting document: “Setting up a system for data archiving using FTS3” by Lydia Heck, Jens Jensen and Brian Davies
  https://www.cosma.dur.ac.uk/documentation
19. Setting up the archiving tools
• Identify appropriate hardware – could mean extra expense:
  • need the freedom to modify and experiment with it
  • cannot have HPC users logged in and working when you need to reboot the system!
• Free to apply the very latest security updates
  • this might not always be possible on an HPC system
• Requires an optimal connection to storage
  • for the transfer system this meant an InfiniBand card
20. Setting up the archiving tools
• Create an interface to access the file/archiving service at RAL using the GridPP tools:
  • GridFTP – Globus Toolkit – also provides Globus Connect
  • trust anchors (egi-trustanchors)
  • VOMS tools (emi3-xxx)
  • FTS3 (CERN)
21. Chose to use FTS3 with GridFTP
[diagram: the user submits transfer lists (and credentials) to FTS3; FTS3 drives GridFTP transfers between GPFS behind data.cosma.dur.ac.uk (GridFTP) and CASTOR-GEN behind srm-dirac.gridpp.rl.ac.uk (SRM)]
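To make the workflow in the diagram concrete, below is a minimal sketch of submitting one source/destination pair to FTS3 from the command line. The host names are those shown above; the FTS3 server URL, the file paths and the status-polling step are illustrative assumptions, not details taken from the slides.

    # Submit one GridFTP-to-SRM copy to an FTS3 server.
    # The endpoint URL and file paths below are assumed for illustration.
    FTS=https://fts3.example.ac.uk:8446
    JOBID=$(fts-transfer-submit -s "$FTS" \
        gsiftp://data.cosma.dur.ac.uk/cosma/archive/chunk_001.tar \
        srm://srm-dirac.gridpp.rl.ac.uk/castor/dirac/chunk_001.tar)

    # Poll the job until FTS3 reports that it has finished.
    fts-transfer-status -s "$FTS" "$JOBID"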
22. Learning to use certificates and proxies
• A long-lived VOMS proxy?
  • myproxy-init; myproxy-logon; voms-proxy-init; fts-transfer-delegation
• How do you create a proxy and delegation that last weeks, even months?
  • This is still an issue for a VOMS proxy, but we circumvented it using a normal proxy (see the sketch below):
  • grid-proxy-init; fts-transfer-delegation
  • grid-proxy-init -valid HH:MM
  • fts-transfer-delegation -e time-in-seconds
  • creates a proxy that lasts up to the certificate lifetime
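A minimal sketch of the sequence just described, with concrete example lifetimes filled in (the commands and options are the ones named on the slide; the particular values are assumptions):

    # Create a plain (non-VOMS) grid proxy; -valid takes hours:minutes.
    grid-proxy-init -valid 96:00

    # Delegate credentials to the FTS3 service; -e sets the delegation
    # lifetime in seconds (604800 s = 7 days). The resulting proxy can
    # last up to the lifetime of the underlying certificate.
    fts-transfer-delegation -e 604800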
23. Experiences
1. Large files – optimal throughput limited by network bandwidth
2. Many small files – limited by latency
3. Many parallel sessions impede the proper functioning of the archive server
4. Ownership and creation dates are not preserved – one grid owner
5. A simple approach of “just” pushing files will not work!
24. Actions to overcome the issues
• tar files up in chunks of ~256 GByte (see the sketch below)
• exclude checked-out versioning subdirectories
• ownership and time stamps are preserved inside the tar archive
• keep a record of the archived files
• the files to transfer are now large – limited by bandwidth, not by latency
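As an illustration of this approach, here is a rough sketch that produces one tarball per top-level directory, excludes version-control subdirectories and keeps a manifest of what was archived. All paths are assumptions, and in practice directories or file sets would be grouped so that each chunk comes out at roughly 256 GByte.

    SRC=/cosma/data/myproject       # data to be archived (assumed path)
    OUT=/cosma/archive/outgoing     # staging area for tarballs (assumed path)
    mkdir -p "$OUT"

    for dir in "$SRC"/*/ ; do
        name=$(basename "$dir")
        # tar preserves ownership and time stamps inside the archive;
        # checked-out versioning subdirectories are excluded.
        tar --exclude='.svn' --exclude='.git' \
            -cf "$OUT/${name}.tar" -C "$SRC" "$name"
        # Keep a record of the files that went into each chunk.
        tar -tvf "$OUT/${name}.tar" > "$OUT/${name}.manifest"
    done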
25. Open issues
• Depends on a single admin to carry out – not automatic
• What happens when the content of directories changes? Complete new archive sessions?
• Creating a tool more like rsync requires extensive scripting (see the sketch below)
• When trying to get data back, you get back the whole of a subset in order to find a single file or a string of files
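A rough sketch of the rsync-like idea: keep a lightweight listing per directory and re-archive only the directories whose listing has changed since the last run. GNU find is assumed, and the paths match the (assumed) staging layout of the earlier sketch.

    SRC=/cosma/data/myproject
    OUT=/cosma/archive/outgoing

    for dir in "$SRC"/*/ ; do
        name=$(basename "$dir")
        # Lightweight fingerprint: relative path, size and mtime per file.
        find "$dir" -type f -printf '%P %s %T@\n' | sort > "$OUT/${name}.new"
        if ! cmp -s "$OUT/${name}.new" "$OUT/${name}.filelist" ; then
            echo "$name changed since the last run - re-archive this directory"
            mv "$OUT/${name}.new" "$OUT/${name}.filelist"
        else
            rm "$OUT/${name}.new"
        fi
    done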
26. Conclusions
• With the right network speed, the right tools and the right connectivity, we can archive the DiRAC data to RAL or anywhere else.
• Documenting the procedure is very important to transfer the knowledge and avoid duplicating effort. The documentation is online at https://www.cosma.dur.ac.uk/documentation
• Each DiRAC site should have its own dirac0X account
• Start archiving and keep on archiving – this is more difficult, as it is not completely automatic yet and more development is required.
• Collaboration between DiRAC and GridPP/RAL DOES work!
• The work has been of benefit to other transfer activities, which significantly helps research and reflects well on the service we can deliver.
• Can we aspire to more?