In the mid-1990s, the high-energy physics community (think Fermilab and CERN) started planning for the Large Hadron Collider. Managing the petabytes of data that would be generated by the facility and sharing it with the globally distributed community of over 10,000 researchers would be a major infrastructure and technology problem. This same community that brought us the web has now developed standards, software, and infrastructure for grid computing. In this seminar I'll present some of the exciting science that is being done on the Open Science Grid, the US national cyberinfrastructure linking 60 institutions (Harvard included) into a massive distributed computing and data processing system.
Advancing Science through Coordinated Cyberinfrastructure - Daniel S. Katz
How local, regional, and national cyberinfrastructure can be coordinated and linked to advance science and engineering, based on experiences and lessons from the Center for Computation & Technology at LSU (ideas, funding, implementation), plus some thoughts on what might be done differently if we were starting today. Presented at the First Workshop of the Center for Computational Engineering & Sciences, Unicamp, Campinas, Brazil, 10 April 2014.
Adapting federated cyberinfrastructure for shared data collection facilities ... - Boston Consulting Group
Early-stage experimental data in structural biology is generally unmaintained and inaccessible to the public. It is increasingly believed that this data, which forms the basis for each macromolecular structure discovered by this field, must be archived and, in due course, published. Furthermore, the widespread use of shared scientific facilities such as synchrotron beamlines complicates the issue of data storage, access and movement, as does the increase in remote users. This work describes a prototype system that adapts existing federated cyberinfrastructure technology and techniques to significantly improve the operational environment for users and administrators of synchrotron data collection facilities used in structural biology. This is achieved through software from the Virtual Data Toolkit and Globus, bringing together federated users and facilities from the Stanford Synchrotron Radiation Lightsource, the Advanced Photon Source, the Open Science Grid, the SBGrid Consortium and Harvard Medical School. The performance and experience with the prototype provide a model for data management at shared scientific facilities.
Stokes-Rees, Ian, Ian Levesque, Frank V. Murphy, Wei Yang, Ashley Deacon, and Piotr Sliz. “Adapting Federated Cyberinfrastructure for Shared Data Collection Facilities in Structural Biology.” Journal of Synchrotron Radiation 19, no. 3 (April 6, 2012). http://scripts.iucr.org/cgi-bin/paper?S0909049512009776.
The SBGrid Science Portal provides multi-modal access to computational infrastructure, data storage, and data analysis tools for the structural biology community. It incorporates features not previously seen in cyberinfrastructure science gateways. It enables researchers to securely share a computational study area, including large volumes of data and active computational workflows. A rich identity management system has been developed that simplifies federated access to US national cyberinfrastructure, distributed data storage, and high performance file transfer tools. It integrates components from the Virtual Data Toolkit, Condor, glideinWMS, the Globus Toolkit and Globus Online, the FreeIPA identity management system, Apache web server, and the Django web framework.
Other content at http://j.mp/bostonpython-continuum
Abstract:
We're pretty obsessed with Python-centric data analytics, and figuring out how to do this well led to the creation of Continuum Analytics. This talk will tell you about three pieces of that puzzle we have developed over the past year: Wakari, a web-based analytics platform leveraging IPython and IPython Notebook; Blaze, a next-generation NumPy; and Bokeh, providing interactive data visualization. All are available now for free as services or open source projects.
New learning technologies seem likely to transform much of science, as they are already doing for many areas of industry and society. We can expect these technologies to be used, for example, to obtain new insights from massive scientific data and to automate research processes. However, success in such endeavors will require new learning systems: scientific computing platforms, methods, and software that enable the large-scale application of learning technologies. These systems will need to enable learning from extremely large quantities of data; the management of large and complex data, models, and workflows; and the delivery of learning capabilities to many thousands of scientists. In this talk, I review these challenges and opportunities and describe systems that my colleagues and I are developing to enable the application of learning throughout the research process, from data acquisition to analysis.
Science Engagement: A Non-Technical Approach to the Technical Divide - Cybera Inc.
A presentation for the Future of Networking session at the 2014 Cyber Summit by Jason Zurawski, Science Engagement Engineer, ESnet (Lawrence Berkeley National Laboratory).
This presentation outlines the need to invest intellectual and expert human effort in data publication in order to see compelling research outcomes.
I gave this presentation on April 10th, 2014 at the University of Pennsylvania in an event sponsored by the Penn Humanities forum (http://humanities.sas.upenn.edu/13-14/dhf_opendata.shtml)
Understanding the Big Picture of e-Science - Andrew Sallans
A. Sallans. "Understanding the Big Picture of e-Science." Presented at the 2011 eScience Bootcamp at the University of Virginia's Claude Moore Health Sciences Library. 4 March 2011
Enabling Real Time Analysis & Decision Making - A Paradigm Shift for Experime... - PyData
By Kerstin Kleese van Dam
PyData New York City 2017
New instrument technologies are enabling a new generation of in-situ and in-operando experiments, with extremely fine spatial and temporal resolution, that allow researchers to observe physics, chemistry and biology as they happen. These new methodologies go hand in hand with an exponential growth in data volumes and rates: petabyte-scale data collections and terabyte-per-second streams. At the same time, scientists are pushing for a paradigm shift: now that they can observe processes in intricate detail, they want to analyze, interpret and control those processes. Given the multitude of voluminous, heterogeneous data streams involved in every single experiment, novel real-time, data-driven analysis and decision support approaches are needed to realize this vision. This talk will discuss state-of-the-art streaming analysis for experimental facilities, its challenges and early successes. It will present where commercial technologies can be leveraged and how many of the novel approaches differ from commonly available solutions.
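The single-pass constraint behind this kind of streaming analysis can be illustrated with a small sketch (my own toy example, not code from the talk): each observation is processed once and discarded, keeping only O(1) state per stream, here using Welford's online algorithm for running mean and variance.

```python
class RunningStats:
    """Single-pass mean/variance over a data stream (Welford's algorithm).

    Memory use is constant regardless of stream length, which is what
    makes this style of analysis viable at terabyte-per-second rates.
    """

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def push(self, x):
        """Fold one new observation into the running statistics."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self):
        """Sample variance of everything seen so far."""
        return self._m2 / (self.n - 1) if self.n > 1 else 0.0


# Simulated detector readings arriving one at a time.
stats = RunningStats()
for reading in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
    stats.push(reading)
print(stats.n, stats.mean, stats.variance)
```

A real facility pipeline would of course apply far richer models, but the same pattern (bounded state updated per event, decisions made mid-stream) underlies them.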
Presentation for the San Francisco #IDCC14 conference (http://www.dcc.ac.uk/events/idcc14/day-two-papers). The presentation covers publishing zooarchaeology data with Open Context (http://opencontext.org) to study the spread of farming from the Near East to Europe through Anatolia. It looks at editorial processes, linked data annotation, and other workflow concerns relating to making raw data more usable for comparative analysis.
XII Experimental Chaos and Complexity Conference talk, Michigan US.
Related papers:
http://journals.aps.org/pre/abstract/10.1103/PhysRevE.88.012712
http://www.biomedcentral.com/1471-2202/14/S1/O18
Accelerating Data-driven Discovery in Energy Science - Ian Foster
A talk given at the US Department of Energy, covering our work on research data management and analysis. Three themes:
(1) Eliminate data friction (use of SaaS for research data management)
(2) Liberate scientific data (research on data extraction, organization, publication)
(3) Create discovery engines at DOE facilities (services that organize data + computation)
Science is witnessing a data revolution. Data are now created by faster and cheaper physical technologies, software tools and digital collaborations. Examples of these include satellite networks, simulation models and social network data. To transform these data successfully into information then into knowledge and finally into wisdom, we need new forms of computational thinking. These may be enabled by building "instruments" that make data comprehensible for the "naked mind" in a similar fashion to the way in which telescopes reveal the universe to the naked eye. These new instruments must be grounded in well-founded principles to ensure they have the fidelity and capacity to transform the complex and large-scale data into comprehensive forms; this demands new data-intensive methods.
Data-intensive refers to huge volumes of data, complex patterns of data integration and analysis and intricate interactions between data and users. Current methods and tools are failing to address data-intensive challenges effectively: they fail for several reasons, all of which are aspects of scalability. I will introduce three main aspects of data-intensive research and show how we are addressing the challenges that arise from the interaction of these aspects. I will make use of results from our interdisciplinary collaborations as examples of solutions to specific challenges that can arise when scaling up intensity.
Beyond Preservation: Situating Archaeological Data in Professional Practice - Eric Kansa
I presented this lecture at the German Archaeological Institute (DAI) in Berlin on Nov. 6, 2014 (see: http://www.dainst.org/termin/-/event-display/ogNX4Gtxkd87/342513)
The lecture focuses on how archaeological data fits in professional practice. It looks at scholarly communications, government policies toward the sciences and humanities, and professional reward structures.
The lecture then shows examples of how Open Context publishes archaeological data, including editorial processes to promote data quality and relate contributed data to the 'Web of Data' using Linked Open Data methods. Research applications of Open Context and linked archaeological data include the Digital Index of North American Archaeology (DINAA) project (see: http://ux.opencontext.org/blog/archaeology-site-data/) and a data integration study exploring the development and dispersal of animal husbandry economies in Epipaleolithic - Chalcolithic Anatolia (see: http://dx.doi.org/10.1371/journal.pone.0099845)
The lecture concludes with how archaeologists need to invest more intellectually in the method and theory of modeling and creating data. It also looks at how concepts and expectations of publishing static artifacts need to be revised (using techniques like version control) to enable continued and more transparent revision of data to fix problems, implement new standards, and meet new research goals.
From a keynote talk given at the ESIP Winter 2011 meetings in Washington, DC. The talk was titled Some Reflections on "The Burden of Proof: Data as Evidence".
Leading organizations today all have data scientists and analytics teams. A key challenge is establishing cross-functional teams that can collaboratively derive insights from data and move exploratory interactive analytics into automated production systems. Boston Consulting Group, founded on quantitative decision making, guides global F500 companies in the technical and organizational structures that will provide a foundation for agility, innovation, and competitive advantage. This talk will outline key strategies for building effective cloud-native analytics teams.
Keynote at Gateways 2017 Conference, Ann Arbor MI
Speaker: Ian Stokes-Rees
"Connecting Cyberinfrastructure Back To The Laptop"
Science Gateways today are generally built to provide a web-accessible interface for a particular scientific community to access a combination of software, hardware, and data deployed in an expertly managed computing center. But what happens when the scientist wants to repatriate their data? Or perform some analysis that is not supported by the gateway? Both for the purposes of encouraging innovative workflows and serving an audience with a wide range of computational experience it is important to consider how a gateway can fit into the broader computational ecosystem of a particular researcher or research group. One simple starting point for this is to ask the question "how can the gateway connect back to the laptop?". This talk will consider how this is being done today in science gateways and present some ideas for how this could be expanded in the future.
A talk from AnacondaCON presenting my personal journey from physics to finance to biology and how collaborative team-based data science has been the big enabler. The talk looks at Python, Big Data, Jupyter Notebooks, Anaconda. Discusses CERN LHCb particle physics computing, protein structure determination, and patterns in data science.
Working with thousands, millions, or billions of data records in high dimensions is increasingly becoming the reality for scientific research. What are some techniques to make this kind of data volume tractable? How can parallel computing help? In this talk I'll review data management tools and infrastructures, languages, and paradigms that help in this regard. In particular, I'll discuss Hadoop, MapReduce, Python, NumPy, and Globus Online to provide a survey of ways in which researchers can manage their data and process it in parallel.
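The MapReduce pattern mentioned above can be sketched in a few lines of standard-library Python (an illustrative example of mine, not code from the talk): partition the input, run the map step over chunks in parallel, then reduce the partial results by key. For simplicity this uses threads rather than a Hadoop cluster; for CPU-bound work at scale one would swap in processes or a distributed framework.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def map_chunk(lines):
    """Map step: count word occurrences within one chunk of the input."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts


def mapreduce_wordcount(lines, workers=4, chunk_size=1000):
    """Word count via the map/reduce pattern over fixed-size chunks."""
    chunks = [lines[i:i + chunk_size] for i in range(0, len(lines), chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_chunk, chunks))
    # Reduce step: merge the per-chunk counters by key.
    total = Counter()
    for part in partials:
        total.update(part)
    return total


# 4,000 synthetic records stand in for a large dataset.
data = ["big data big compute", "big science"] * 2000
counts = mapreduce_wordcount(data)
print(counts["big"], counts["science"])
```

The same map/reduce decomposition is what lets frameworks like Hadoop scale the computation across many machines: only the partial counters, not the raw records, need to travel to the reducer.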
Harvard HPC Seminar Series
Theresa Kaltz, PhD, High Performance Technical Computing, FAS, Harvard
Due to the wide availability and low cost of high speed networking, commodity clusters have become the de facto standard for building high performance parallel computing systems. This talk will introduce the leading technology for high speed interconnects, called InfiniBand, and compare its deployment and performance to Ethernet. In addition, some emerging interconnect technologies and trends in cluster networking will be discussed.
Transcript: Selling digital books in 2024: Insights from industry leaders - T... - BookNet Canada
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
UiPath Test Automation using UiPath Test Suite series, part 4 - DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 4. In this session, we will cover Test Manager overview along with SAP heatmap.
The UiPath Test Manager overview with SAP heatmap webinar offers a concise yet comprehensive exploration of the role of a Test Manager within SAP environments, coupled with the utilization of heatmaps for effective testing strategies.
Participants will gain insights into the responsibilities, challenges, and best practices associated with test management in SAP projects. Additionally, the webinar delves into the significance of heatmaps as a visual aid for identifying testing priorities, areas of risk, and resource allocation within SAP landscapes. Through this session, attendees can expect to enhance their understanding of test management principles while learning practical approaches to optimizing testing processes in SAP environments using heatmap visualization techniques.
What will you get from this session?
1. Insights into SAP testing best practices
2. Heatmap utilization for testing
3. Optimization of testing processes
4. Demo
Topics covered:
Execution from the test manager
Orchestrator execution result
Defect reporting
SAP heatmap example with demo
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Essentials of Automations: The Art of Triggers and Actions in FME - Safe Software
In this second installment of our Essentials of Automations webinar series, we’ll explore the landscape of triggers and actions, guiding you through the nuances of authoring and adapting workspaces for seamless automations. Gain an understanding of the full spectrum of triggers and actions available in FME, empowering you to enhance your workspaces for efficient automation.
We’ll kick things off by showcasing the most commonly used event-based triggers, introducing you to various automation workflows like manual triggers, schedules, directory watchers, and more. Plus, see how these elements play out in real scenarios.
Whether you’re tweaking your current setup or building from the ground up, this session will arm you with the tools and insights needed to transform your FME usage into a powerhouse of productivity. Join us to discover effective strategies that simplify complex processes, enhancing your productivity and transforming your data management practices with FME. Let’s turn complexity into clarity and make your workspaces work wonders!
Accelerate your Kubernetes clusters with Varnish Caching - Thijs Feryn
A presentation about the usage and availability of Varnish on Kubernetes. This talk explores the capabilities of Varnish caching and shows how to use the Varnish Helm chart to deploy it to Kubernetes.
This presentation was delivered at K8SUG Singapore. See https://feryn.eu/presentations/accelerate-your-kubernetes-clusters-with-varnish-caching-k8sug-singapore-28-2024 for more details.
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do... - UiPathCommunity
💥 Speed, accuracy, and scaling – discover the superpowers of GenAI in action with UiPath Document Understanding and Communications Mining™:
See how to accelerate model training and optimize model performance with active learning
Learn about the latest enhancements to out-of-the-box document processing – with little to no training required
Get an exclusive demo of the new family of UiPath LLMs – GenAI models specialized for processing different types of documents and messages
This is a hands-on session specifically designed for automation developers and AI enthusiasts seeking to enhance their knowledge in leveraging the latest intelligent document processing capabilities offered by UiPath.
Speakers:
👨🏫 Andras Palfi, Senior Product Manager, UiPath
👩🏫 Lenka Dulovicova, Product Program Manager, UiPath
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo... - James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf91mobiles
91mobiles recently conducted a Smart TV Buyer Insights Survey in which we asked over 3,000 respondents about the TV they own, aspects they look at on a new TV, and their TV buying preferences.
Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™UiPathCommunity
In questo evento online gratuito, organizzato dalla Community Italiana di UiPath, potrai esplorare le nuove funzionalità di Autopilot, il tool che integra l'Intelligenza Artificiale nei processi di sviluppo e utilizzo delle Automazioni.
📕 Vedremo insieme alcuni esempi dell'utilizzo di Autopilot in diversi tool della Suite UiPath:
Autopilot per Studio Web
Autopilot per Studio
Autopilot per Apps
Clipboard AI
GenAI applicata alla Document Understanding
👨🏫👨💻 Speakers:
Stefano Negro, UiPath MVPx3, RPA Tech Lead @ BSP Consultant
Flavio Martinelli, UiPath MVP 2023, Technical Account Manager @UiPath
Andrei Tasca, RPA Solutions Team Lead @NTT Data
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Albert Hoitingh
In this session I delve into the encryption technology used in Microsoft 365 and Microsoft Purview. Including the concepts of Customer Key and Double Key Encryption.
In his public lecture, Christian Timmerer provides insights into the fascinating history of video streaming, starting from its humble beginnings before YouTube to the groundbreaking technologies that now dominate platforms like Netflix and ORF ON. Timmerer also presents provocative contributions of his own that have significantly influenced the industry. He concludes by looking at future challenges and invites the audience to join in a discussion.
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...SOFTTECHHUB
The choice of an operating system plays a pivotal role in shaping our computing experience. For decades, Microsoft's Windows has dominated the market, offering a familiar and widely adopted platform for personal and professional use. However, as technological advancements continue to push the boundaries of innovation, alternative operating systems have emerged, challenging the status quo and offering users a fresh perspective on computing.
One such alternative that has garnered significant attention and acclaim is Nitrux Linux 3.5.0, a sleek, powerful, and user-friendly Linux distribution that promises to redefine the way we interact with our devices. With its focus on performance, security, and customization, Nitrux Linux presents a compelling case for those seeking to break free from the constraints of proprietary software and embrace the freedom and flexibility of open-source computing.
Elevating Tactical DDD Patterns Through Object CalisthenicsDorra BARTAGUIZ
After immersing yourself in the blue book and its red counterpart, attending DDD-focused conferences, and applying tactical patterns, you're left with a crucial question: How do I ensure my design is effective? Tactical patterns within Domain-Driven Design (DDD) serve as guiding principles for creating clear and manageable domain models. However, achieving success with these patterns requires additional guidance. Interestingly, we've observed that a set of constraints initially designed for training purposes remarkably aligns with effective pattern implementation, offering a more ‘mechanical’ approach. Let's explore together how Object Calisthenics can elevate the design of your tactical DDD patterns, offering concrete help for those venturing into DDD for the first time!
1. Accelerating Science with the Open Science Grid
Ian Stokes-Rees, PhD
Harvard Medical School, Boston, USA
http://portal.sbgrid.org
ijstokes@hkl.hms.harvard.edu
Saturday, November 5, 2011
2. Slides and Contact
ijstokes@hkl.hms.harvard.edu
http://linkedin.com/in/ijstokes
http://slidesha.re/ijstokes-cs50
http://www.sbgrid.org
http://portal.sbgrid.org
http://www.opensciencegrid.org
http://www.xsede.org
Grid Overview - Ian Stokes-Rees ijstokes@hkl.hms.harvard.edu
3. Abstract
In the mid-1990s, the high-energy physics community (think FermiLab and CERN) started planning for the Large Hadron Collider. Managing the petabytes of data that would be generated by the facility and sharing it with the globally distributed community of over 10,000 researchers would be a major infrastructure and technology problem. This same community that brought us the web has now developed standards, software, and infrastructure for grid computing. In this seminar I'll present some of the exciting science that is being done on the Open Science Grid, the US national cyberinfrastructure linking 60 institutions (Harvard included) into a massive distributed computing and data processing system.
5. About Me
6. Particle Physics Standard Model
proton: uud
neutron: udd
7. Big Data - Ian Stokes-Rees ijstokes@hkl.hms.harvard.edu
12. animation 1 - ATLAS collision
momentum and charge (sign)
heavy particle decays (fast)
neutral particles and total energy
muon tracking and energy
14. animation 2 - tracking and energy
15. 40 MHz bunch crossing rate
10 million data channels
1 kHz level 1 event recording rate
1-10 MB per event
14 hours per day, 7+ months / year
4 detectors
6 PB of data / year
globally distribute data for analysis (x2)
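A back-of-envelope check of these figures can be sketched in a few lines of Python. The 1 MB event size and 30-day months are assumptions taken from the low end of the ranges on this slide, not stated in the deck:

```python
# Back-of-envelope check of the detector data rates quoted on this slide.
# Assumes the low end of the 1-10 MB/event range and 30-day months.
level1_rate_hz = 1_000        # 1 kHz level-1 event recording rate
event_size_mb = 1.0           # assumed: low end of 1-10 MB per event
seconds_per_day = 14 * 3600   # 14 hours of running per day
days_per_year = 7 * 30        # assumed: "7+ months / year" at 30 days/month

tb_per_day = level1_rate_hz * event_size_mb * seconds_per_day / 1e6
pb_per_year = tb_per_day * days_per_year / 1e3

print(f"{tb_per_day:.1f} TB/day, {pb_per_year:.1f} PB/year per detector")
# Even at 1 MB/event this exceeds the 6 PB/year archived figure,
# so higher-level triggers must discard most level-1 events.
```

The gap between this estimate and the 6 PB/year archived across four detectors is the point: the level-1 rate is only the first stage of a trigger chain that discards most events before storage.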
16. What are the computing challenges?
• Track reconstruction
  • Data distributed globally, processed, and re-processed around the clock
• Search for new physics
  • Modify physics model, see if better predictions are produced
• Data infrastructure
  • A lot of data to move around
• Computing infrastructure
  • A lot of data to process
• Global collaboration
  • thousands of users around the world: how to securely share systems, data, computations, results
  • federated scientific computing infrastructure
21. 3 billion pixels (55k x 55k equivalent)
22. Study of Protein Structure and Function
(scales: 400m, 1mm, 10nm)
• Shared scientific data collection facility
• Data intensive (10-100 GB/day)
23. Cryo Electron Microscopy
• Previously, 1-10,000 images, managed by hand
• Now, robotic systems collect millions of hi-res images
• estimate 250,000 CPU-hours to reconstruct model
26. Next Generation Sequencing
27. This is the edge of science ....
.... and computing is everywhere
• computational model of system
• simulation
• analysis
• visualization
• data management infrastructure
• computational infrastructure
• collaboration, identity mgmt, access control
28. Boston Life Sciences Hub
• Biomedical researchers
• Government agencies
• Life sciences
• Universities
• Hospitals
[diagram shows Tufts University School of Medicine]
29. Scientific Research Today
• International collaborations
  • IT becomes embedded into research process: data, results, analysis, visualization
  • Crossing institutional and national boundaries
• Computational techniques increasingly important
  • ... and computationally intensive techniques as well
  • requires use of high performance computing systems
• Data volumes are growing fast
  • hard to share
  • hard to manage
• Scientific software often difficult to use
  • or to use properly
• Web based tools increasingly important
  • but often disconnected from persisted and shared results
30. Required: Collaborative environment for compute and data intensive science
32. Open Science Grid http://opensciencegrid.org
• US National Cyberinfrastructure
• Primarily used for high energy physics computing
• 80 sites
• 100,000 job slots
• 1,500,000 hours per day (chart: 5,073,293 hours, ~570 years)
• PB scale aggregate storage
• 1 PB transferred each day
• Virtual Organization-based
34. Example Job Set
10,000 grid jobs, approx 30k CPU hours, 24 wall clock hours, 99.7% success rate
[map over 24 hours: per-site local queue, remote queue, and running job counts across OSG sites including MIT, UWisc, Cornell, Buffalo, ND, Caltech, FNAL, UNL, HMS, Purdue, UCR, RENCI, and SPRACE; legend: completed - green, evicted - red, held - orange]
37. Grid Architectural Details
• Resources
  • Uniform compute clusters
  • Managed via batch queues
  • Local scratch disk
  • Sometimes high perf. network (e.g. InfiniBand)
  • Behind NAT and firewall
  • No shell access
• Data
  • Tape-backed mass storage
  • Disk arrays (100s TB to PB)
  • High bandwidth (multi-stream) transfer protocols
  • File catalogs
  • Meta-data
  • Replica management
• Information
  • LDAP based most common (not optimized for writes)
  • Domain specific layer
  • Open problem!
• Fabric
  • In most cases, assume functioning Internet
  • Some sites part of experimental private networks
• Security
  • Typically underpinned by X.509 Public Key Infrastructure
  • Same standards as SSL/TLS and "server certs" for "https"
38. OSG Components (I)
• Centralized
  • X.509 Certificate Authority: Energy Science Network CA @ LBL
  • Accounting: Gratia logging system to track usage (CPU, Network, Disk)
  • Status: LDAP directory with details of each participating system
  • Support: Central clearing house for support tickets
  • Software: distribution system, update testing, bug reporting and fixing
  • Communication: Wikis, docs, mailing lists, workshops, conferences, etc.
• Per Site
  • Compute Element/Gatekeeper (CE/GK): access point for external users, acts as frontend for any cluster. Globus GRAM + local batch system
  • Storage Element (SE): grid-accessible storage system, GridFTP-based + SRM
  • Worker Nodes (WN): cluster nodes with grid software stack
  • User Interface (UI): access point for local users to interact with remote grid
  • Access Control: GUMS + PRIMA for ACLs to local system by grid identities
  • Admin contact: need a local expert (or two!)
39. OSG Components (II)
• Per Virtual Organization (user community)
  • VO Management System (VOMS): to organize and register users
  • Registration Authority (RA): to validate community users with X.509 issuer
  • User Interface system (UI): provide gateway to OSG for users
  • Support Contact: users are supported by their VO representatives
• Per User
  • X.509 user certificate (although I'd like to hide that part)
  • Induction: unless it is through a portal, grid computing is not shared file system batch computing! Many more failure modes and gotchas.
40. Grid Opportunities
• New compute intensive workflows
  • think big: tens or hundreds of thousands of hours finished in 1-2 days
  • sharing resources for efficient and large scale utilization
• Data intensive problems
  • we mirror 20 GB of data to 30 computing centers
  • Data movement, management, and archive
• Federated identity and user management
  • labs, collaborations or ad-hoc groups
  • role-based access control (RBAC) and IdM
• Collaborative environment
  • Web-based access to applications
43. Typical Layered Environment
• Command line application, e.g. Fortran (Fortran bin)
• Friendly application API wrapper (Python API)
• Batch execution wrapper for N iterations (Multi-exec wrapper; Map-Reduce)
• Results extraction and aggregation (Result aggregator)
• Grid job management wrapper (Grid management)
• Web interface: forms, views, static HTML results
• GOAL: eliminate shell scripts
  • often found as "glue" language between layers
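The layering above can be sketched in a few lines of Python. This is an illustrative stand-in, not the portal's actual code: the "application" here is a trivial python -c command in place of a Fortran binary, and the function names are invented:

```python
# Sketch of the layered-wrapper idea on this slide: a command-line
# application wrapped in a friendly Python API, a batch wrapper that
# runs N iterations, and a small result aggregator on top.
import subprocess
import sys

def run_app(x):
    """Friendly API wrapper around a command-line application
    (a stand-in here; a real deployment would call the Fortran binary)."""
    proc = subprocess.run(
        [sys.executable, "-c", f"print({x} * {x})"],
        capture_output=True, text=True, check=True,
    )
    return int(proc.stdout.strip())

def batch(xs):
    """Multi-exec wrapper: run the application once per input."""
    return [run_app(x) for x in xs]

def aggregate(results):
    """Result aggregator: reduce per-job outputs to a summary."""
    return {"n": len(results), "total": sum(results)}

if __name__ == "__main__":
    print(aggregate(batch([1, 2, 3, 4])))  # {'n': 4, 'total': 30}
```

Each layer only talks to the one below it through plain Python calls, which is exactly how the slide's goal of eliminating shell-script "glue" is achieved.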
45. Service Architecture
[diagram] SBGrid Science Portal @ Harvard Medical School, tied to external services: GlobusOnline @ Argonne (GridFTP + Hadoop user data), GUMS @ UC San Diego, glideinWMS factory feeding the Open Science Grid, MyProxy @ NCSA, UIUC, DOEGrids CA @ Lawrence Berkeley Labs, Gratia Accounting @ FermiLab, and monitoring @ Indiana. Portal stack: monitoring (Ganglia, Nagios, RSV, pacct), interfaces (Apache, GridSite, Django, Sage Math, R-Studio, shell/CLI, file server), data (scp, GridFTP, SRM, WebDAV), computation (Condor, Cycle Server, VDT, Globus, glideinWMS, SQL DB, cluster), ID mgmt (FreeIPA, LDAP, VOMS, GUMS, GACL).
47.
• NEBioGrid Django Portal: interactive dynamic web portal for workflow definition, submission, monitoring, and access control
• NEBioGrid Web Portal: GridSite based web portal for file-system level access (raw job output), meta-data tagging, X.509 access control/sharing
• PyCCP4: Python wrappers around CCP4 structural biology applications
• PyCondor: Python wrappers around common Condor operations, enhanced Condor log analysis
• PyOSG: Python wrappers around common OSG operations
• PyGACL: Python representation of GACL model and API to work with GACL files
• osg_wrap: Swiss army knife OSG wrapper script to handle file staging, parameter sweep, DAG, results aggregation, monitoring
• sbanalysis: CGI data analysis and graphing tools for structural biology data sets
• osg.monitoring: tools to enhance monitoring of job set and remote OSG site status
• shex: write bash scripts in Python: replicate commands, syntax, behavior
• xconfig: universal configuration
48. Web Portals for Collaborative, Multi-disciplinary Research...
... which leverage capabilities of federated grid computing environments
49. The Browser as the Universal Interface
• If it isn't already obvious to you
  • Any interactive application developed today should be web-based with a RESTful interface (if at all possible)
• A rich set of tools and techniques
  • AJAX, HTML4/5, CSS, and JavaScript
  • Dynamic content negotiation
  • HTTP headers, caching, security, sessions/cookies
  • Scalable, replicable, centralized, multi-threaded, multi-user
• Alternatives
  • Command Line (CLI): great for scriptable jobs
  • GUI toolkits: necessary for applications with high graphics or I/O demands
50. What is a Science Portal?
• A web-based gateway to resources and data
  • simplified access
  • centralized access
  • unified access (CGI, Perl, Python, PHP, static HTML, static files, etc.)
• Attempt to provide uniform access to a range of services and resources
• Data access via HTTP
• Leverage brilliance of Apache HTTPD and associated modules
51. SBGrid Science Portal Objectives
A. Extensible infrastructure to facilitate development and deployment of novel computational workflows
B. Web-accessible environment for collaborative, compute and data intensive science
52. [diagram] Facilitate interface between the SBGrid User Community and national federated cyberinfrastructure: TeraGrid, NERSC, Open Science Grid, Odyssey, Orchestra, EC2.
54. Results Visualization and Analysis
55. REST
• Don't try to read too much into the name
  • REpresentational State Transfer: coined by Roy Fielding, co-author of HTTP protocol and contributor to original Apache httpd server
• Idea
  • The web is the world's largest asynchronous, distributed, parallel computational system
  • Resources are "hidden" but representations are accessible via URLs
  • Representations can be manipulated via HTTP operations GET PUT POST HEAD DELETE and associated state
  • State transitions are initiated by software or by humans
• Implication
  • Clean URLs (e.g. Flickr)
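The resource/representation split on this slide can be made concrete with a toy in-memory "server". This is purely illustrative (the /jobs/42 path and the dispatch function are invented), but it shows the uniform-verb idea: hidden state behind clean URLs, manipulated only through a fixed set of operations:

```python
# Toy illustration of REST: resources stay hidden server-side; clients
# see only representations, manipulated via uniform HTTP-style verbs.
import json

resources = {}  # hidden server-side state, keyed by URL path

def handle(method, path, body=None):
    """Dispatch a (method, path) pair the way a tiny REST server might."""
    if method == "PUT":
        resources[path] = json.loads(body)       # create/replace a resource
        return 201, None
    if method == "GET":
        if path not in resources:
            return 404, None
        return 200, json.dumps(resources[path])  # return a representation
    if method == "DELETE":
        resources.pop(path, None)
        return 204, None
    return 405, None  # method not allowed

status, _ = handle("PUT", "/jobs/42", '{"state": "running"}')
status2, rep = handle("GET", "/jobs/42")
print(status, status2, rep)  # 201 200 {"state": "running"}
```

Because every resource responds to the same small verb set, clients need no per-resource API: the URL plus the verb is the whole interface.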
56. Cloud Computing: Industry solution to the Grid
• Virtualization has taken off in the past 5 years
  • VMWare, Xen, VirtualPC, VirtualBox, QEMU, etc.
  • Builds on ideas from VMS (i.e. old)
• (Good) System administrators are hard to come by
  • And operating a large data center is costly
• Internet boom means there are companies that have figured out how to do this really well
  • Google, Amazon, Yahoo, Microsoft, etc.
• Outsource IT infrastructure! Outsource software hosting!
  • Amazon EC2, Microsoft Azure, RightScale, Force.com, Google Apps
• Over simplified:
  • You can't install a cloud
  • You can't buy a grid
57. Is "Cloud" the new "Grid"?
• Grid is about mechanisms for federated, distributed, heterogeneous shared compute and storage resources
  • standards and software
• Cloud is about on-demand provisioning of compute and storage resources
  • services
No one buys a grid. No one installs a cloud.
58. "The interesting thing about Cloud Computing is that we've redefined Cloud Computing to include everything that we already do. . . . I don't understand what we would do differently in the light of Cloud Computing other than change the wording of some of our ads."
Larry Ellison, Oracle CEO, quoted in the Wall Street Journal, September 26, 2008*
*http://blogs.wsj.com/biztech/2008/09/25/larry-ellisons-brilliant-anti-cloud-computing-rant/
59. When is cloud computing interesting?
• My definition of "cloud computing"
  • Dynamic compute and storage infrastructure provisioning in a scalable manner providing uniform interfaces to virtualized resources
• The underlying resources could be
  • "in-house" using licensed/purchased software/hardware
  • "external" hosted by a service/infrastructure provider
• Consider using cloud computing if
  • You have operational problems/constraints in your current data center
  • You need to dynamically scale (up or down) access to services and data
  • You want fast provisioning, lots of bandwidth, and low latency
  • Organizationally you can live with outsourcing responsibility for (some of) your data and applications
• Consider providing cloud computing services if
  • You have an ace team efficiently running your existing data center
  • You have lots of experience with virtualization
  • You have a specific application/domain that could benefit from being tied to a large compute farm or disk array with great Internet connectivity
61. User access to results data
63. Experimental Data Access
• Collaboration
• Access Control
• Identity Management
• Data Management
• High Performance Data Movement
• Multi-modal Access
64. Data Model
• Data Tiers
  • VO-wide: all sites, admin managed, very stable
  • User project: all sites, user managed, 1-10 weeks, 1-3 GB
  • User static: all sites, user managed, indefinite, 10 MB
  • Job set: all sites, infrastructure managed, 1-10 days, 0.1-1 GB
  • Job: direct to worker node, infrastructure managed, 1 day, <10 MB
  • Job indirect: to worker node via UCSD, infrastructure managed, 1 day, <10 GB
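The tiers above are essentially a small policy table. A sketch of how submission tooling might encode and query it (the field names and helper are hypothetical, not taken from the portal's code):

```python
# Illustrative encoding of the data tiers on this slide as a policy table.
# Field names and the selection helper are hypothetical, not SBGrid code.
TIERS = {
    "vo-wide":      {"scope": "all sites", "managed_by": "admin",
                     "lifetime": "very stable", "size": None},
    "user-project": {"scope": "all sites", "managed_by": "user",
                     "lifetime": "1-10 weeks", "size": "1-3 GB"},
    "user-static":  {"scope": "all sites", "managed_by": "user",
                     "lifetime": "indefinite", "size": "10 MB"},
    "job-set":      {"scope": "all sites", "managed_by": "infrastructure",
                     "lifetime": "1-10 days", "size": "0.1-1 GB"},
    "job":          {"scope": "worker node (direct)", "managed_by": "infrastructure",
                     "lifetime": "1 day", "size": "<10 MB"},
    "job-indirect": {"scope": "worker node via UCSD", "managed_by": "infrastructure",
                     "lifetime": "1 day", "size": "<10 GB"},
}

def user_managed_tiers():
    """Which tiers users curate themselves (vs. admins or infrastructure)."""
    return sorted(name for name, t in TIERS.items()
                  if t["managed_by"] == "user")

print(user_managed_tiers())  # ['user-project', 'user-static']
```

Making the table explicit like this lets tooling route a file to the right tier (and apply the right cleanup policy) rather than leaving the decision to each user.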
65. Data Management
About 2 PB with 100 front end servers for high bandwidth parallel file transfer
• Data Management: quota, du, scan, tmpwatch, conventions, workflow integration
• Data Movement: scp (users), rsync (VO-wide), grid-ftp (UCSD), curl (WNs), cp (NFS), htcp (secure web)
66. Globus Online: High Performance Reliable 3rd Party File Transfer
[diagram] Globus Online file transfer service mediates transfers among a data collection facility, lab file server, portal cluster, desktop, and laptop; identity services: GUMS (DN to user mapping), Certificate Authority (root of trust), VOMS (VO membership).
67. Architecture
✦ SBGrid
  • manages all user account creation and credential mgmt
  • hosts MyProxy, VOMS, GridFTP, and user interfaces
✦ Facility
  • knows about lab groups, e.g. "Harrison", "Sliz"
  • delegates knowledge of group membership to SBGrid VOMS
  • facility can poll VOMS for list of current members
  • uses X.509 for user identification
  • deploys GridFTP server
✦ Lab group
  • designates group manager that adds/removes individuals
  • deploys GridFTP server or Globus Connect client
✦ Individual
  • username/password to access facility and lab storage
  • Globus Connect for personal GridFTP server to laptop
  • Globus Online web interface to "drive" transfers
69. Objective
✦ Easy to use high performance data mgmt environment
✦ Fast file transfer
  • facility-to-lab, facility-to-individual, lab-to-individual
✦ Reduced administrative overhead
✦ Better data curation
70. Challenges
✦ Access control
  • visibility
  • policies
✦ Provenance
  • data origin
  • history
✦ Meta-data
  • attributes
  • searching
72. Unified Account Management
• Hierarchical LDAP database: user basics, passwords (standard schemas)
• Relational DB: user custom profiles, institutions, lab groups (custom schemas)
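One way to picture the split is LDAP holding the standard-schema basics while a relational profile table layers custom fields on top. The sketch below fakes both stores with dicts; the DN layout, the example user, and her email address are all illustrative (only the "Sliz" lab group name appears elsewhere in this deck):

```python
# Sketch of the account-management split on this slide: standard user
# basics in a hierarchical LDAP database, custom profile data in a
# relational DB. Both stores are faked with dicts; names are illustrative.
ldap_entries = {  # LDAP: user basics, passwords, standard schemas
    "uid=alice,ou=people,dc=sbgrid,dc=org": {
        "uid": "alice",
        "cn": "Alice Researcher",
        "mail": "alice@example.org",  # hypothetical address
    },
}
profiles = {  # relational DB: custom profiles, institutions, lab groups
    "alice": {"institution": "Harvard Medical School", "lab_group": "Sliz"},
}

def account_view(uid):
    """Merge the LDAP basics with the relational profile for one user."""
    dn = f"uid={uid},ou=people,dc=sbgrid,dc=org"
    merged = dict(ldap_entries[dn])
    merged.update(profiles.get(uid, {}))
    return merged

view = account_view("alice")
print(view["cn"], view["lab_group"])  # Alice Researcher Sliz
```

Keeping the two stores separate means standard LDAP clients (login, mail) keep working unchanged while the portal is free to evolve its custom profile schema independently.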
73. X.509 Digital Certificates
✦ Analogy to a passport:
  • Application form
  • Sponsor's attestation
  • Consular services
    • verification of application, sponsor, and accompanying identification and eligibility documents
  • Passport issuing office
✦ Portable, digital passport
  • fixed and secure user identifiers
    • name, email, home institution
  • signed by widely trusted issuer
  • time limited
  • ISO standard
75. Access Control
• Need a strong Identity Management environment
  • individuals: identity tokens and identifiers
  • groups: membership lists
  • Active Directory/CIFS (Windows), Open Directory (Apple), FreeIPA (Unix): all LDAP-based
• Need to manage and communicate Access Control policies
  • institutionally driven
  • user driven
• Need Authorization System
  • Policy Enforcement Point (shell login, data access, web access, start application)
  • Policy Decision Point (store policies and understand relationship of identity token and policy)
76. Access Control
• What is a user?
  • .htaccess and .htpasswd
  • local system user (NIS or /etc/passwd)
  • portal framework user (proprietary DB schema)
  • grid user (X.509 DN)
• What are we securing access to?
  • Web pages? URLs? Data? Specific operations? Meta Data?
• What kind of policies do we enable?
  • Simplify to READ WRITE EXECUTE LIST ADMIN
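The simplified READ/WRITE/EXECUTE/LIST/ADMIN model can be sketched as a toy Policy Decision Point. The structure below is illustrative only; the deck's actual system uses Grid ACLs over X.509 identities, and the principals and resource paths here are invented:

```python
# Toy Policy Decision Point for the simplified permission set on this
# slide: READ, WRITE, EXECUTE, LIST, ADMIN. Illustrative only; the real
# system uses Grid ACLs with X.509 identities.
PERMISSIONS = {"READ", "WRITE", "EXECUTE", "LIST", "ADMIN"}

# policy: resource -> {principal: set of granted permissions}
policy = {
    "/data/results": {
        "alice": {"READ", "WRITE", "LIST"},
        "group:sbgrid": {"READ", "LIST"},
    },
}

def is_allowed(principal, resource, permission, groups=()):
    """Does this identity, or any group it belongs to, hold the
    permission? ADMIN implies every other permission."""
    assert permission in PERMISSIONS
    acl = policy.get(resource, {})
    for who in (principal, *groups):
        granted = acl.get(who, set())
        if permission in granted or "ADMIN" in granted:
            return True
    return False

print(is_allowed("alice", "/data/results", "WRITE"))                     # True
print(is_allowed("bob", "/data/results", "READ", groups=["group:sbgrid"]))   # True
print(is_allowed("bob", "/data/results", "WRITE", groups=["group:sbgrid"]))  # False
```

The Policy Enforcement Points from the previous slide (shell login, data access, web access) would each call a decision function like this rather than embedding their own rules.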
77. Existing Security Infrastructure
• X.509 certificates
  • Department of Energy CA
  • Regional/Institutional RAs (SBGrid is an RA)
• X.509 proxy certificate system
  • Users self-sign a short-lived passwordless proxy certificate used as a "portable" and "automated" grid processing identity token
  • Similarities to Kerberos tokens
• Virtual Organizations (VO) for definitions of roles, groups, attrs
• Attribute Certificates
  • Users can (attempt to) fetch ACs from the VO to be attached to proxy certs
• POSIX-like file access control (Grid ACL)
84. Acknowledgements & Questions
• Piotr Sliz
  • Principal Investigator, head of SBGrid
• SBGrid Science Portal
  • Daniel O'Donovan, Meghan Porter-Mahoney
• SBGrid System Administrators
  • Ian Levesque, Peter Doherty, Steve Jahl
• Globus Online Team
  • Steve Tueke, Ian Foster, Rachana Ananthakrishnan, Raj Kettimuthu
• Ruth Pordes
  • Director of OSG, for championing SBGrid
Please contact me with any questions:
• Ian Stokes-Rees
• ijstokes@hkl.hms.harvard.edu
• ijstokes@spmetric.com
Look at our work:
• portal.sbgrid.org
• www.sbgrid.org
• www.opensciencegrid.org