Manage Data with Assurance
Ian Foster
Rachana Ananthakrishnan
Steve Tuecke
Vas Vasiliadis
Mission
Increase the efficiency and
effectiveness of researchers
engaged in data-driven
science and scholarship
through sustainable software
Data keeps moving!
Globus by the numbers...
• 7,400 active shared endpoints
• 100+ subscribers
• 600 PB moved
• 22,000 active personal endpoints
• 90 billion files processed
• 1,800 active server endpoints
• 3 months: longest running transfer
• 1 PB: largest single transfer to date
• 99.9% availability
• 600+ identity providers
• 2000+: most shared endpoints at a single institution
• 138,000 registered users
Active Endpoints by Month (chart: Free and Subscribed, Jan-14 to Jan-19, vertical axis 0 to 4500)
Globus User Story Highlights
File Sharing, Value, Improved Performance, Ease of Use, Connector Benefits, Platform Development
“We needed an easy way to share terabytes of data on a regular basis with dozens of researchers. Thanks to Globus sharing, it’s easy for us to get our researchers the data they need.”
“Now Canadian researchers have a single repository where data can easily and securely be accessed, searched and shared.”
“With Globus, our researchers have one less thing to worry about!”
“I routinely have to move hundreds of gigabytes of data – Globus makes it easy, so I can execute these transfers with very little effort.”
“Users can quickly, effectively, and securely share data with their research community or the broader public.”
“WVU uses Globus to archive research data out to Google Drive.”
“[BlackPearl with Globus] enables us to archive and share petabytes of information in a convenient solution.”
Usage Briefs: www.globus.org/usage-brief-library
User Stories: www.globus.org/user-stories
What makes it all worthwhile
“Whatever you are studying right now, if
you are not getting up to speed on deep
learning, neural networks, etc., you lose.
We are going through the process where
software will automate software,
automation will automate automation.”
-- Mark Cuban
Opportunities for AI in science: Research today
Diagram labels: What scientists want to do; Most scientist time
• Solve societal problems
• Create knowledge
• Analyze and plan
• Run experiments
• Configure apparatus/write code
Opportunities for AI in science: Research tomorrow
Diagram labels: Most scientist time; AI assistants
• Solve societal problems
• Create knowledge
• Analyze and plan
• Run experiments
• Configure apparatus/write code
AI at Argonne: data-driven discovery
• Strong and weak lensing in sky survey data
• Prediction of antimicrobial resistance phenotypes
• Prediction of radiation stopping power
• Identification and tracking of storms
• Parameter extraction in atom probe tomography
• Learning for dynamic sampling in spectroscopy
• Structure-property-process triangle in additive manufact.
• Vehicle energy consumption prediction
• Photometric red shift estimation
• New materials for efficient solar cells
• Cosmic Microwave Background emulation
• Enhancement of noisy tomographic images
• Nowcasting with convolutional LSTMs
• Efficient climate model emulators
• Defect-level prediction in semiconductors
• Flying object detector for edge deployment
• Discovery of new energy storage materials
• Reduced order modeling of laser sintering
Rethinking Data infrastructure for Science AI
• Scientific instruments: Major user facilities, Laboratory equipment, Automated labs, …
• Sensors: Environmental, Laboratories, Mobile, …
• Simulation codes: Computational results, Function memorization, …
• Databases: Reference data, Experimental data, Computed properties, Scientific literature, …
• Scientists: Expert input, Goal setting, …
• AI industry, academia: New methods, Open source codes, AI accelerators, …
• AI Workflows: Data ingest, Data enhancement, Data QA/QC, Feature selection, Model creation, Model training, HPO, UQ, Model reduction, Active/reinforcement learning, Inference
• Agile Infrastructure: Data, Models, Surrogates, Accelerators, Compute
• Agile services: Data transfer, Data sharing, Registries, Containers, Integrity, Automation, FaaS, Identifiers
• DLHub, xDF, funcX, Parsl, Transfer, Automate, Petrel, Auth, Sharing, Identifiers, CANDLE
DLHub: Organizing and Serving Models
• Collect, publish, categorize models
• Serve models via API with access
controls to simplify sharing,
consumption, and access
• Leverage ALCF resources and
prepare for Exascale ML
• Deploy and scale automatically
• Provide citable DOI for
reproducible science
www.dlhub.org: Models and Processing Logic as a Service
Argonne Advanced Computing LDRD
Example areas: Energy Storage, Tomography, X-Ray Science
Credits: Cherukara et al.; Ward et al.; TomoGAN: Liu et al.
Figure labels: Input, Output
funcX: Think “compute endpoints”
Automation: Ripple Pipelines
Automation: Neuroanatomy
Diagram labels: Web form, User input, Search, Ingest, Share, Set policy, Identifier, Mint DOI, funcX, Auth, Get credentials, Automate, Run job, Describe, Get metadata, Transfer, Transfer data, funcX, Run job, Transfer, Transfer data
Manage Protected Data
Higher assurance levels for HIPAA and other regulated data
• Support for managed transfer of
protected data, such as
health-related information
• Share data with collaborators
while meeting compliance
requirements
• Administration and
management of access
• Includes BAA option
Globus for high assurance data management
• Restricted data handling
– PHI (Protected Health Information)
– PII (Personally identifiable information)
– Controlled Unclassified Information
• University of Chicago security controls
– NIST 800-53 Low
– Superset of 800-171 Low
• Business Associate Agreements (BAA) between
University of Chicago and our subscribers
Services in scope
• Globus Services: Auth, Transfer & Sharing, Groups
• Globus Connect Server v5.2 and above
• Globus Connect Personal v3.x
• Web app (app.globus.org)
• Globus Command Line Interface (CLI)
• Connectors: POSIX, Google Drive, AWS S3, CEPH
Restricted data disclosure to Globus
• Globus never sees file contents
– File contents can have restricted data
• File paths/names can contain restricted data (e.g. PHI)
• No other elements (endpoint definitions, labels,
collection definitions) can contain restricted data
Product enhancements for high assurance
• Additional authentication assurance
– Require authentication with a specific identity within a specific
time window during a session
• Isolation of applications
– Authentication context is per application, per session (~browser
session)
• Enforces encryption of all user data in transit
• Audit logging
– Both at the institution and Globus services
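The "specific identity within specific time" rule above can be pictured as a recency check on authentication events. The following is a minimal sketch under stated assumptions, not the Globus Auth implementation: the names `SessionAuth`, `is_access_allowed`, and `max_age` are hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class SessionAuth:
    """One authentication event recorded in a session (hypothetical record)."""
    identity: str
    authenticated_at: datetime


def is_access_allowed(session, required_identity, max_age, now=None):
    """Return True if the required identity authenticated within max_age.

    `session` holds the events of ONE application session, mirroring the
    per-application, per-session isolation described on the slide.
    """
    now = now or datetime.now(timezone.utc)
    return any(
        event.identity == required_identity
        and timedelta(0) <= now - event.authenticated_at <= max_age
        for event in session
    )
```

Because the check only looks at events from a single session, a fresh login in one application would not satisfy the requirement in another.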
Product enhancements for high assurance
• Additional security requirements enforced on
management of all high assurance resources
– Data access, and any interaction that can lead to data access
– Examples: Groups, Management Console
• Enhanced user interfaces for seamless management
of protected data
– Webapp and CLI
Operational enhancements for high assurance
• Intrusion detection and prevention
• Encryption
• Enhanced logging
• Secure remote access, access control, and secure
practices for laptops
• Uniform configuration management and change control
• AWS best practices for secure environment: VPCs,
security groups, IAM best practices
New subscription levels
• High Assurance
– 33% uplift on Standard subscription
and on premium connectors used for
high assurance data
• BAA
– All High Assurance features + BAA
with University of Chicago
– 50% uplift on Standard subscription
and on premium connectors used
under a BAA
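The uplift arithmetic can be made concrete with a small worked example. This is a sketch only: the 33% and 50% rates come from the slide, while the fee amounts, the function name `annual_fee`, and the reading that the uplift applies to the Standard fee plus only those premium connectors used for high assurance or BAA data are assumptions.

```python
from decimal import Decimal

# Uplift rates as stated on the slide; "standard" is the baseline.
UPLIFT_RATES = {
    "standard": Decimal("0"),
    "high_assurance": Decimal("0.33"),
    "baa": Decimal("0.50"),
}


def annual_fee(standard_fee, covered_connector_fees, level):
    """Apply the level's uplift to the Standard fee plus the premium
    connectors used for high assurance (or BAA) data."""
    base = Decimal(standard_fee) + sum(
        (Decimal(fee) for fee in covered_connector_fees), Decimal("0")
    )
    return base * (Decimal("1") + UPLIFT_RATES[level])


# Hypothetical fees: 10,000 for Standard, 2,000 for one covered connector.
print(annual_fee(10000, [2000], "high_assurance"))
print(annual_fee(10000, [2000], "baa"))
```

Connectors that are not used for protected data would be left out of `covered_connector_fees` and priced without uplift under this reading.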
High Assurance
Demonstration
Web app enhancements
• Accessibility
– Target WCAG 2.0 AA compliance
• Responsiveness and touch
• Works with new connectors
collections.globus.org
Web app enhancements
• Customizable interface
• Full screen view
• Compact file listing
display
• Remember user
configuration
– Single vs. dual panel
– Columns displayed
• Continue incorporating
user feedback
CLI enhancements
• Support for use with high assurance collections
• '--format UNIX' flag - output suitable for line-oriented
processing with typical Unix tools
• 'globus rm' command
• 'globus whoami --linked-identities' flag to show all linked
identities
• '--timeout-exit-code' flag overrides the default exit code
for commands which wait on tasks
• Enhancements to SDK as needed.
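The flags listed above could be combined from a script roughly as follows. This is a sketch that only assembles command lines and does not run them; the helper names are hypothetical, the endpoint ID is a placeholder, and it assumes `globus rm` is one of the task-waiting commands that accepts `--timeout-exit-code` and takes its target as `ENDPOINT_ID:PATH`.

```python
import shlex


def whoami_linked_identities(unix_format=True):
    """Build `globus whoami --linked-identities`, optionally with --format UNIX."""
    argv = ["globus", "whoami", "--linked-identities"]
    if unix_format:
        argv += ["--format", "UNIX"]
    return argv


def rm_command(endpoint_id, path, timeout_exit_code=None):
    """Build a `globus rm` call; the target is written as ENDPOINT_ID:PATH."""
    argv = ["globus", "rm"]
    if timeout_exit_code is not None:
        argv += ["--timeout-exit-code", str(timeout_exit_code)]
    argv.append(f"{endpoint_id}:{path}")
    return argv


def as_shell(argv):
    """Render an argv list as a copy-pasteable, safely quoted shell line."""
    return " ".join(shlex.quote(part) for part in argv)


# "ep-id" is a placeholder, not a real endpoint ID.
print(as_shell(whoami_linked_identities()))
print(as_shell(rm_command("ep-id", "/data/old file.txt", timeout_exit_code=0)))
```

Keeping the commands as argv lists means they can be handed to `subprocess.run` without shell quoting concerns once the CLI is installed and logged in.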
Connector updates
• Enhanced user experience for credential handling for
several connectors (GCSv5)
• AWS S3
– Automated multi-region support
• Google Drive
– Enhancement to retry handling for large transfers
• HPSS
– Support added for HPSS 7.5 (7.3 to 7.5 supported)
– Improved asynchronous staging from tape
– New home for documentation: docs.globus.org/premium-storage-connectors/hpss
S3 compatible systems
• Initial customer
deployments
• Validation, testing and
vendor engagement planned
• Additional systems driven
by customer demand
Announcing our latest
connector…
beta
globus.org/connectors/box
Globus for Box
• Extends the value of your Box deployment
• Unifies access to cloud and on-prem storage
• Transitions protected data (HIPAA-regulated,
CUI) seamlessly between Box and other storage
systems
Globus for Box
Demonstration
Make Box part of your
research storage ecosystem
globus.org/connectors/box
docs.globus.org/premium-storage-connectors/box
Globus Connect Server v5.3
• Subsumes GCS versions 5.0, 5.1, 5.2
• Standard and high assurance guest collections (sharing)
• High assurance mapped collections
• Connectors: POSIX, AWS S3, CEPH, Google Drive, Box
• Data access protocols: GridFTP and HTTPS
• A single deployment supports both high assurance and
standard gateways
• Upgrade all v5.x deployments to v5.3
Recent Transfer enhancements
• Verify transfers using client-provided checksums
– The user-provided checksum, rather than the source checksum, is used for
verification
• Improvements for scaling transfer service
– Multiple nodes for transfer service for higher availability and
reliability
– Allows for code updates with no downtime
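Verifying against a client-provided checksum can be sketched as follows. This is an illustration of the idea only, not the Transfer service's implementation; the function names and the choice of SHA-256 are assumptions.

```python
import hashlib
import io


def stream_checksum(stream, algorithm="sha256", chunk_size=1024 * 1024):
    """Hash a binary stream in chunks so large files never sit fully in memory."""
    digest = hashlib.new(algorithm)
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def matches_client_checksum(dest_stream, client_checksum, algorithm="sha256"):
    """Compare the destination's checksum with the value the client supplied,
    instead of with a checksum computed at the source."""
    return stream_checksum(dest_stream, algorithm) == client_checksum.strip().lower()


# Usage: the client computed this value before submitting the transfer.
payload = b"example file contents"
expected = hashlib.sha256(payload).hexdigest()
print(matches_client_checksum(io.BytesIO(payload), expected))
```

Checking against a value computed by the client before the transfer also catches corruption that happened at the source, which a source-side checksum would silently reproduce.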
SSH with OAuth
• Securely access resources using SSH with federated identity
– Facilitates automation, eliminates SSH key management
– Replacement for deprecated GSI OpenSSH
• First version released
– Server side PAM module with Globus Auth support
– Command line client
• Open source, community support
– Not part of the standard subscription
– OAuth SSH Client: https://pypi.org/project/oauth-ssh/
– OAuth SSH Server PAM module: https://github.com/xsede/oauth-ssh
Where are we headed?
Enhancing the core:
Transfer
Building the future:
Platform
Globus Transfer: A complete solution
☑ Bulk transfer and sync
☑ Good end-to-end performance in a myriad of real-world settings
☑ End-to-end reliability
☑ Robust security, with federated identities
☑ Layers onto diverse storage systems
☑ Web-compatible client/server remote access
☑ Easy to use interfaces
☑ Easy installation and administration
☑ Sharing data with guest users
☑ Dedicated professional support
HTTPS and what it enables
• Browser based up/download
• Allow your
(research) storage
to be “on the web”
• Enforce same security
policies
Globus Connect Server v5 Milestones
v5.0: Google Drive
v5.1: POSIX guest collections, HTTPS
v5.2: High assurance
v5.3
v5.4: …
v5.x: v4 feature parity+
Other features
• Multi DTN support
• Additional storage systems
• Endpoint specific identity providers
• …
GCSv5: Key enabling technology for the future
• Challenge: Managing an increasing amount of shared, dynamic state among
multiple DTNs
– Endpoint configuration
– Multiple storage gateway configurations
– Collection configurations
– Credentials (user and system)
• Approach: Stateless DTNs
– No persistent state on DTN
– Multi-DTN endpoints without a shared file system
• GCS state stored in the cloud
– Dynamic sync of state to each DTN
– Enabled by our use of AWS AppSync
• Customer managed encryption keys with optional escrow
– Only you can see and modify your endpoint’s state
• Facilitates creation of new Globus Connect features
GCSv5 has significant admin benefits
• Greatly simplified multi-DTN deployment
– Bootstrap DTN from only client id & secret, and encryption key
– No more copy-pasting GCS config files with every change
– Command line, REST API, and (eventually) web admin of GCS
– Automatic synchronization amongst DTNs
• Rapid recovery from failures
– Restore all nodes from stored state with minimal effort
– No local backups of GCS state required
• Lost client ID/secret? Recover them from Auth.
• Enables us to roll out new features more quickly
What does it mean for you?
• No sudden moves!
• Ready for GCS v4 to v5 migration late this year
• Tools will be available for migration from GCS v4
• Comprehensive documentation
• Long migration period with parallel support of v5 & v4
• Only use GCS v5 today if you need its specific
features, otherwise continue to use GCS v4
Planned Features for Globus Transfer
• S3 compatible HTTPS interface to GCSv5 storage
• Browser based up/downloaders
• Multiple checksum algorithm support
• Manifest support
• Automated recurring replication as a service
• …
Rethinking data publication
• Limited adoption
– Not easily customizable
• Maintenance Challenges
– Costly to maintain
– JRE licensing concerns
• Going forward
– Code will be open source
– Leverage platform
• Invest in higher priorities
Platform challenge
• Transform how research applications, services, and
workflows are created, delivered, used, and sustained
– Scientific instrument data processing
– Repositories: Make data more FAIR
– Science gateways
• Interoperable ecosystem
Globus platform services
• Identity and Access Management (IAM)
– Federated identity login, Groups, Attributes, Access Control
– Auth: OAuth authorization provider
• Connect
• Transfer
– Will become a family of services
• Execution
• Search, Identifiers
• Automation
– Queues, Events, Actions, Triggers
– Flows
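One way to picture how queues, events, triggers, actions, and flows relate is the in-memory sketch below. It is a toy model under stated assumptions, not the Globus Automate API: the classes `Flow` and `Automation`, the actions `transfer_data` and `run_job`, and the event fields are all hypothetical.

```python
from collections import deque


class Flow:
    """An ordered list of actions; each action takes and returns a state dict."""

    def __init__(self, actions):
        self.actions = list(actions)

    def run(self, state):
        for action in self.actions:
            state = action(state)
        return state


class Automation:
    """Events wait in a queue; triggers decide which flow an event starts."""

    def __init__(self):
        self.queue = deque()
        self.triggers = []  # (predicate, flow) pairs

    def add_trigger(self, predicate, flow):
        self.triggers.append((predicate, flow))

    def publish(self, event):
        self.queue.append(event)

    def process(self):
        results = []
        while self.queue:
            event = self.queue.popleft()
            for predicate, flow in self.triggers:
                if predicate(event):
                    results.append(flow.run(dict(event)))
        return results


# Hypothetical actions standing in for real services.
def transfer_data(state):
    return {**state, "transferred": True}


def run_job(state):
    return {**state, "result": f"processed {state['path']}"}


automation = Automation()
automation.add_trigger(lambda e: e["type"] == "file_created",
                       Flow([transfer_data, run_job]))
automation.publish({"type": "file_created", "path": "/scans/sample1.h5"})
automation.publish({"type": "heartbeat"})  # matches no trigger, so it is dropped
results = automation.process()
```

In this model a trigger is just a predicate paired with a flow, so adding a new pipeline does not require changing the queue or the actions.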
Globus Platform: Automation
Platform status
• Generally Available in a few years
• Separate product with separate sustainability model
• Early engagements help shape product direction
– Argonne Leadership Computing Facility, Materials Data Facility,
– NCAR Research Data Archive, NSO, …
– Use in Globus products
• Multiple integrations facilitate more complete solution
– e.g. Django, JupyterHub
– Follow progress: globus-integration-examples.readthedocs.io
• Currently accessible via professional services team
We are committed to doing
all this sustainably
Our focus: You, the
research community
is
Why not do a for-profit?
Focus: Investor ROI
→ can’t serve you properly!
Sustainability >> $$
No single points of failure
Subscriber Value =
Engineering (DevOps)
+
Customer facing operations
(support, sales, outreach, training,
professional services)
Freemium means
managing tension!
Meeting current
customer needs…
…and furthering
strategic aspirations
Customer community
Delivering on requests
Product planning process
Contractual challenges
Is there a better model?
Internet2-like membership?
Network infrastructure
services provider
Research software
provider
Member fee ≈ sustainability
Governance model ≈
product influence
Do the dynamics change?
- Willingness to join/pay?
- Sufficient revenue growth?
- Greater subscriber satisfaction?
Why now?
Increasing view of Globus
as “enterprise” service
RCC → CIO
Data management needs are
increasingly pervasive
✓ Network
✓ Cycles
✓ Storage
Robust data management for all?
Expand the dialogue
HPC Management
+ IT Leadership
+ Researcher Community
From “Purchase” to “Invest”
Everyone derives more value if
Globus is a strategic partner
Intrigued?
Confused?
Amused?
Share your thoughts with us!
Thank you to our sponsors...
U.S. Department of Energy
THANK YOU, subscribers!
Program Preview
• Today
– Lightning talks
– Guest keynotes: Tom Barton, Bobby Kasthuri
– Reception
• Tomorrow
– Tutorials
– Office Hours
• Friday morning
– Customer forum
globusworld.org/conf/program
#globusworld
@globus

More Related Content

What's hot

What is Data Commons and How Can Your Organization Build One?
What is Data Commons and How Can Your Organization Build One?What is Data Commons and How Can Your Organization Build One?
What is Data Commons and How Can Your Organization Build One?
Robert Grossman
 
Data Publishing at Harvard's Research Data Access Symposium
Data Publishing at Harvard's Research Data Access SymposiumData Publishing at Harvard's Research Data Access Symposium
Data Publishing at Harvard's Research Data Access Symposium
Merce Crosas
 
A Gen3 Perspective of Disparate Data
A Gen3 Perspective of Disparate DataA Gen3 Perspective of Disparate Data
A Gen3 Perspective of Disparate Data
Robert Grossman
 
Multi-faceted Classification of Big Data Use Cases and Proposed Architecture ...
Multi-faceted Classification of Big Data Use Cases and Proposed Architecture ...Multi-faceted Classification of Big Data Use Cases and Proposed Architecture ...
Multi-faceted Classification of Big Data Use Cases and Proposed Architecture ...
Geoffrey Fox
 
Crossing the Analytics Chasm and Getting the Models You Developed Deployed
Crossing the Analytics Chasm and Getting the Models You Developed DeployedCrossing the Analytics Chasm and Getting the Models You Developed Deployed
Crossing the Analytics Chasm and Getting the Models You Developed Deployed
Robert Grossman
 
A Data Biosphere for Biomedical Research
A Data Biosphere for Biomedical ResearchA Data Biosphere for Biomedical Research
A Data Biosphere for Biomedical Research
Robert Grossman
 
Some Proposed Principles for Interoperating Cloud Based Data Platforms
Some Proposed Principles for Interoperating Cloud Based Data PlatformsSome Proposed Principles for Interoperating Cloud Based Data Platforms
Some Proposed Principles for Interoperating Cloud Based Data Platforms
Robert Grossman
 
Biomedical Clusters, Clouds and Commons - DePaul Colloquium Oct 24, 2014
Biomedical Clusters, Clouds and Commons - DePaul Colloquium Oct 24, 2014Biomedical Clusters, Clouds and Commons - DePaul Colloquium Oct 24, 2014
Biomedical Clusters, Clouds and Commons - DePaul Colloquium Oct 24, 2014
Robert Grossman
 
Services For Science April 2009
Services For Science April 2009Services For Science April 2009
Services For Science April 2009
Ian Foster
 
What is the "Big Data" version of the Linpack Benchmark? ; What is “Big Data...
What is the "Big Data" version of the Linpack Benchmark?; What is “Big Data...What is the "Big Data" version of the Linpack Benchmark?; What is “Big Data...
What is the "Big Data" version of the Linpack Benchmark? ; What is “Big Data...
Geoffrey Fox
 
Networking Materials Data
Networking Materials DataNetworking Materials Data
Networking Materials Data
Ian Foster
 
Bonazzi commons bd2 k ahm 2016 v2
Bonazzi commons bd2 k ahm 2016 v2Bonazzi commons bd2 k ahm 2016 v2
Bonazzi commons bd2 k ahm 2016 v2
Vivien Bonazzi
 
Big Data Analytics for Real Time Systems
Big Data Analytics for Real Time SystemsBig Data Analytics for Real Time Systems
Big Data Analytics for Real Time Systems
Kamalika Dutta
 
Sharing Sensitive Data With Confidence: The DataTags system
Sharing Sensitive Data With Confidence: The DataTags systemSharing Sensitive Data With Confidence: The DataTags system
Sharing Sensitive Data With Confidence: The DataTags system
Michael Bar-Sinai
 
Power of the Run Graph
Power of the Run GraphPower of the Run Graph
Power of the Run Graph
Vaticle
 
The Dataverse Commons
The Dataverse CommonsThe Dataverse Commons
The Dataverse Commons
Merce Crosas
 
CV-KS-Jun2015
CV-KS-Jun2015CV-KS-Jun2015
CV-KS-Jun2015
Kamran Sartipi
 
Or 2013-abrams-sharing-data-rich-research
Or 2013-abrams-sharing-data-rich-researchOr 2013-abrams-sharing-data-rich-research
Or 2013-abrams-sharing-data-rich-research
University of California Curation Center
 
Classification of Big Data Use Cases by different Facets
Classification of Big Data Use Cases by different FacetsClassification of Big Data Use Cases by different Facets
Classification of Big Data Use Cases by different Facets
Geoffrey Fox
 
FAIR History and the Future
FAIR History and the FutureFAIR History and the Future
FAIR History and the Future
Carole Goble
 

What's hot (20)

What is Data Commons and How Can Your Organization Build One?
What is Data Commons and How Can Your Organization Build One?What is Data Commons and How Can Your Organization Build One?
What is Data Commons and How Can Your Organization Build One?
 
Data Publishing at Harvard's Research Data Access Symposium
Data Publishing at Harvard's Research Data Access SymposiumData Publishing at Harvard's Research Data Access Symposium
Data Publishing at Harvard's Research Data Access Symposium
 
A Gen3 Perspective of Disparate Data
A Gen3 Perspective of Disparate DataA Gen3 Perspective of Disparate Data
A Gen3 Perspective of Disparate Data
 
Multi-faceted Classification of Big Data Use Cases and Proposed Architecture ...
Multi-faceted Classification of Big Data Use Cases and Proposed Architecture ...Multi-faceted Classification of Big Data Use Cases and Proposed Architecture ...
Multi-faceted Classification of Big Data Use Cases and Proposed Architecture ...
 
Crossing the Analytics Chasm and Getting the Models You Developed Deployed
Crossing the Analytics Chasm and Getting the Models You Developed DeployedCrossing the Analytics Chasm and Getting the Models You Developed Deployed
Crossing the Analytics Chasm and Getting the Models You Developed Deployed
 
A Data Biosphere for Biomedical Research
A Data Biosphere for Biomedical ResearchA Data Biosphere for Biomedical Research
A Data Biosphere for Biomedical Research
 
Some Proposed Principles for Interoperating Cloud Based Data Platforms
Some Proposed Principles for Interoperating Cloud Based Data PlatformsSome Proposed Principles for Interoperating Cloud Based Data Platforms
Some Proposed Principles for Interoperating Cloud Based Data Platforms
 
Biomedical Clusters, Clouds and Commons - DePaul Colloquium Oct 24, 2014
Biomedical Clusters, Clouds and Commons - DePaul Colloquium Oct 24, 2014Biomedical Clusters, Clouds and Commons - DePaul Colloquium Oct 24, 2014
Biomedical Clusters, Clouds and Commons - DePaul Colloquium Oct 24, 2014
 
Services For Science April 2009
Services For Science April 2009Services For Science April 2009
Services For Science April 2009
 
What is the "Big Data" version of the Linpack Benchmark? ; What is “Big Data...
What is the "Big Data" version of the Linpack Benchmark?; What is “Big Data...What is the "Big Data" version of the Linpack Benchmark?; What is “Big Data...
What is the "Big Data" version of the Linpack Benchmark? ; What is “Big Data...
 
Networking Materials Data
Networking Materials DataNetworking Materials Data
Networking Materials Data
 
Bonazzi commons bd2 k ahm 2016 v2
Bonazzi commons bd2 k ahm 2016 v2Bonazzi commons bd2 k ahm 2016 v2
Bonazzi commons bd2 k ahm 2016 v2
 
Big Data Analytics for Real Time Systems
Big Data Analytics for Real Time SystemsBig Data Analytics for Real Time Systems
Big Data Analytics for Real Time Systems
 
Sharing Sensitive Data With Confidence: The DataTags system
Sharing Sensitive Data With Confidence: The DataTags systemSharing Sensitive Data With Confidence: The DataTags system
Sharing Sensitive Data With Confidence: The DataTags system
 
Power of the Run Graph
Power of the Run GraphPower of the Run Graph
Power of the Run Graph
 
The Dataverse Commons
The Dataverse CommonsThe Dataverse Commons
The Dataverse Commons
 
CV-KS-Jun2015
CV-KS-Jun2015CV-KS-Jun2015
CV-KS-Jun2015
 
Or 2013-abrams-sharing-data-rich-research
Or 2013-abrams-sharing-data-rich-researchOr 2013-abrams-sharing-data-rich-research
Or 2013-abrams-sharing-data-rich-research
 
Classification of Big Data Use Cases by different Facets
Classification of Big Data Use Cases by different FacetsClassification of Big Data Use Cases by different Facets
Classification of Big Data Use Cases by different Facets
 
FAIR History and the Future
FAIR History and the FutureFAIR History and the Future
FAIR History and the Future
 

Similar to GlobusWorld 2019 Opening Keynote

Webinar: Q&A on Globus Subscription Features
Webinar: Q&A on Globus Subscription FeaturesWebinar: Q&A on Globus Subscription Features
Webinar: Q&A on Globus Subscription Features
Globus
 
The Discovery Cloud: Accelerating Science via Outsourcing and Automation
The Discovery Cloud: Accelerating Science via Outsourcing and AutomationThe Discovery Cloud: Accelerating Science via Outsourcing and Automation
The Discovery Cloud: Accelerating Science via Outsourcing and Automation
Ian Foster
 
Accelerating Discovery via Science Services
Accelerating Discovery via Science ServicesAccelerating Discovery via Science Services
Accelerating Discovery via Science Services
Ian Foster
 
UK Digital Curation Centre: enabling research data management at the coalface
UK Digital Curation Centre: enabling research data management at the coalfaceUK Digital Curation Centre: enabling research data management at the coalface
UK Digital Curation Centre: enabling research data management at the coalface
LizLyon
 
Science Services and Science Platforms: Using the Cloud to Accelerate and Dem...
Science Services and Science Platforms: Using the Cloud to Accelerate and Dem...Science Services and Science Platforms: Using the Cloud to Accelerate and Dem...
Science Services and Science Platforms: Using the Cloud to Accelerate and Dem...
Ian Foster
 
Simplified Research Data Management with the Globus Platform
Simplified Research Data Management with the Globus PlatformSimplified Research Data Management with the Globus Platform
Simplified Research Data Management with the Globus Platform
Globus
 
Science cloud foster june 2013
Science cloud foster june 2013Science cloud foster june 2013
Science cloud foster june 2013
Kirill Osipov
 
Science as a Service: How On-Demand Computing can Accelerate Discovery
Science as a Service: How On-Demand Computing can Accelerate DiscoveryScience as a Service: How On-Demand Computing can Accelerate Discovery
Science as a Service: How On-Demand Computing can Accelerate Discovery
Ian Foster
 
Linking Scientific Instruments and Computation
Linking Scientific Instruments and ComputationLinking Scientific Instruments and Computation
Linking Scientific Instruments and Computation
Ian Foster
 
re:Invent 2013-foster-madduri
re:Invent 2013-foster-maddurire:Invent 2013-foster-madduri
re:Invent 2013-foster-madduri
Ravi Madduri
 
Policy-based Data Management
Policy-based Data Management Policy-based Data Management
Policy-based Data Management
Gary Wilhelm
 
Science for the Future: Strategies for Moving and Sharing Data
Science for the Future: Strategies for Moving and Sharing DataScience for the Future: Strategies for Moving and Sharing Data
Science for the Future: Strategies for Moving and Sharing Data
Ian Foster
 
Accelerating data-intensive science by outsourcing the mundane
Accelerating data-intensive science by outsourcing the mundaneAccelerating data-intensive science by outsourcing the mundane
Accelerating data-intensive science by outsourcing the mundane
Ian Foster
 
Discovery Engines for Big Data: Accelerating Discovery in Basic Energy Sciences
Discovery Engines for Big Data: Accelerating Discovery in Basic Energy SciencesDiscovery Engines for Big Data: Accelerating Discovery in Basic Energy Sciences
Discovery Engines for Big Data: Accelerating Discovery in Basic Energy Sciences
Ian Foster
 
Introduction to Globus - XSEDE14 Tutorial
Introduction to Globus - XSEDE14 TutorialIntroduction to Globus - XSEDE14 Tutorial
Introduction to Globus - XSEDE14 Tutorial
Globus
 
Being FAIR: FAIR data and model management SSBSS 2017 Summer School
Being FAIR:  FAIR data and model management SSBSS 2017 Summer SchoolBeing FAIR:  FAIR data and model management SSBSS 2017 Summer School
Being FAIR: FAIR data and model management SSBSS 2017 Summer School
Carole Goble
 
Grid Computing July 2009
Grid Computing July 2009Grid Computing July 2009
Grid Computing July 2009
Ian Foster
 
Rpi talk foster september 2011
Rpi talk foster september 2011Rpi talk foster september 2011
Rpi talk foster september 2011
Ian Foster
 
Research Data Management, Challenges and Tools - Per Öster
Research Data Management, Challenges and Tools - Per Öster Research Data Management, Challenges and Tools - Per Öster
Research Data Management, Challenges and Tools - Per Öster
LEARN Project
 
Physion.PDF
Physion.PDFPhysion.PDF
Physion.PDF
Sunanda Nair
 

GlobusWorld 2019 Opening Keynote

  • 1. Manage Data with Assurance. Ian Foster, Rachana Ananthakrishnan, Steve Tuecke, Vas Vasiliadis
  • 2. Mission: Increase the efficiency and effectiveness of researchers engaged in data-driven science and scholarship through sustainable software
  • 4. Globus by the numbers: 7,400 active shared endpoints; 100+ subscribers; 600 PB moved; 22,000 active personal endpoints; 90 billion files processed; 1,800 active server endpoints; 3 months longest running transfer; 1 PB largest single transfer to date; 99.9% availability; 600+ identity providers; 2000+ most shared endpoints at a single institution; 138,000 registered users
  • 5. [Chart: Active Endpoints by Month, Jan-14 to Jan-19; y-axis 0 to 4,500; series: Free, Subscribed]
  • 7. Globus User Story Highlights File Sharing Value Improved Performance Ease of Use Connector Benefits “We needed an easy way to share terabytes of data on a regular basis with dozens of researchers. Thanks to Globus sharing, it’s easy for us to get our researchers the data they need.” Platform Development “Now Canadian researchers have a single repository where data can easily and securely be accessed, searched and shared.” “With Globus, our researchers have one less thing to worry about!” “I routinely have to move hundreds of gigabytes of data – Globus makes it easy, so I can execute these transfers with very little effort.” “Users can quickly, effectively, and securely share data with their research community or the broader public.” “WVU uses Globus to archive research data out to Google Drive.” “[BlackPearl with Globus] enables us to archive and share petabytes of information in a convenient solution.” Usage Briefs: www.globus.org/usage-brief-library User Stories: www.globus.org/user-stories What makes it all worthwhile
  • 8. “Whatever you are studying right now, if you are not getting up to speed on deep learning, neural networks, etc., you lose. We are going through the process where software will automate software, automation will automate automation.” -- Mark Cuban
  • 10. Configure apparatus/write code Run experiments Solve societal problems Create knowledge What scientists want to do Most scientist time Analyze and plan Opportunities for AI in science: Research today
  • 11. Run experiments Create knowledge Most scientist time AI assistants Analyze and plan Opportunities for AI in science: Research tomorrow Solve societal problems Configure apparatus/write code
  • 12. AI at Argonne: data-driven discovery. Strong and weak lensing in sky survey data; Prediction of antimicrobial resistance phenotypes; Prediction of radiation stopping power; Identification and tracking of storms; Parameter extraction in atom probe tomography; Learning for dynamic sampling in spectroscopy; Structure-property-process triangle in additive manufact.; Vehicle energy consumption prediction; Photometric red shift estimation; New materials for efficient solar cells; Cosmic Microwave Background emulation; Enhancement of noisy tomographic images; Nowcasting with convolutional LSTMs; Efficient climate model emulators; Defect-level prediction in semiconductors; Flying object detector for edge deployment; Discovery of new energy storage materials; Reduced order modeling of laser sintering
  • 13. Model creation Data ingest Inference HPO Data enhancement Data QA/QC Feature selection Model training UQ Model reduction Active/reinforcement learning Scientific instruments Major user facilities Laboratory equipment Automated labs … Sensors Environmental Laboratories Mobile … Simulation codes Computational results Function memorization … Databases Reference data Experimental data Computed properties Scientific literature … AI Workflows Data Models, Accelerators Compute Agile Infrastructure Surrogates Scientists Expert input Goal setting … AI industry, academia New methods Open source codes AI accelerators … Rethinking Data infrastructure for Science AI
  • 14. Model creation Data ingest Inference HPO Data enhancement Data QA/QC Feature selection Model training UQ Model reduction Active/reinforcement learning Scientific instruments Major user facilities Laboratory equipment Automated labs … Sensors Environmental Laboratories Mobile … Simulation codes Computational results Function memorization … Databases Reference data Experimental data Computed properties Scientific literature … AI Workflows Data Models, Accelerators Compute Agile Infrastructure Surrogates Scientists Expert input Goal setting … AI industry, academia New methods Open source codes AI accelerators … Agile services Data transfer Registries Data sharing Containers Integrity Automation FaaS Identifiers Rethinking Data infrastructure for Science AI
  • 15. Data ingest Inference HPO Data enhancement Data QA/QC Feature selection Model training UQ Model reduction Active/reinforcement learning Scientific instruments Major user facilities Laboratory equipment Automated labs … Sensors Environmental Laboratories Mobile … Simulation codes Computational results Function memorization … Databases Reference data Experimental data Computed properties Scientific literature … AI Workflows Data Models, Accelerators Compute Agile Infrastructure Surrogates Scientists Expert input Goal setting … AI industry, academia New methods Open source codes AI accelerators … Agile services Data transfer Registries Data sharing Containers Integrity Automation FaaS Identifiers Transfer Auth Sharing Model creation Rethinking Data infrastructure for Science AI
  • 16. Data ingest Inference HPO Data enhancement Data QA/QC Feature selection Model training UQ Model reduction Active/reinforcement learning Scientific instruments Major user facilities Laboratory equipment Automated labs … Sensors Environmental Laboratories Mobile … Simulation codes Computational results Function memorization … Databases Reference data Experimental data Computed properties Scientific literature … AI Workflows Data Models, Accelerators Compute Agile Infrastructure Surrogates Scientists Expert input Goal setting … AI industry, academia New methods Open source codes AI accelerators … Agile services Data transfer Registries Data sharing Containers Integrity Automation FaaS Identifiers funcX Transfer Automate Auth Sharing Identifiers Model creation Rethinking Data infrastructure for Science AI
  • 17. Data ingest Inference HPO Data enhancement Data QA/QC Feature selection Model training UQ Model reduction Active/reinforcement learning Scientific instruments Major user facilities Laboratory equipment Automated labs … Sensors Environmental Laboratories Mobile … Simulation codes Computational results Function memorization … Databases Reference data Experimental data Computed properties Scientific literature … AI Workflows Data Models, Accelerators Compute Agile Infrastructure Surrogates Scientists Expert input Goal setting … AI industry, academia New methods Open source codes AI accelerators … Agile services Data transfer Registries Data sharing Containers Integrity Automation FaaS Identifiers DLHub xDF funcX Parsl Transfer Automate Petrel Auth Sharing Identifiers Model creation CANDLE Rethinking Data infrastructure for Science AI
  • 18. DLHub: Organizing and Serving Models • Collect, publish, categorize models • Serve models via API with access controls to simplify sharing, consumption, and access • Leverage ALCF resources and prepare for Exascale ML • Deploy and scale automatically • Provide citable DOI for reproducible science Argonne Advanced Computing LDRD Cherukara et al. Energy Storage Tomography www.dlhub.org Models and Processing Logic as a Service X-Ray Science Ward et al. TomoGAN: Liu et al. Input Output
  • 19. funcX: Think “compute endpoints”
  • 22. Automation: Neuroanatomy Web form User input Search Ingest Share Set policy Identifier Mint DOI funcX Auth Get credentials Automate Run job Describe Get metadata Transfer Transfer data funcX Run job Transfer Transfer data
  • 25. Manage Protected Data Higher assurance levels for HIPAA and other regulated data • Support for managed data transfer of protected data such as health-related information • Share data with collaborators while meeting compliance requirements • Administration and management of access • Includes BAA option
  • 26. Globus for high assurance data management • Restricted data handling – PHI (Protected Health Information) – PII (Personally identifiable information) – Controlled Unclassified Information • University of Chicago security controls – NIST 800-53 Low – Superset of 800-171 Low • Business Associate Agreements (BAA) between University of Chicago and our subscribers
  • 27. Services in scope • Globus Services: Auth, Transfer & Sharing, Groups • Globus Connect Server v5.2 and above • Globus Connect Personal v3.x • Web app (app.globus.org) • Globus Command Line Interface (CLI) • Connectors: POSIX, Google Drive, AWS S3, CEPH
  • 28. Restricted data disclosure to Globus • Globus never sees file contents – File contents can have restricted data • File paths/names can have restricted data (e.g. PHI) • No other elements (endpoint definitions, labels, collection definitions) can contain restricted data
  • 29. Product enhancements for high assurance • Additional authentication assurance – Authenticate with specific identity within specific time within a session • Isolation of applications – Authentication context is per application, per session (~browser session) • Enforces encryption of all user data in transit • Audit logging – Both at the institution and Globus services
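The authentication rule on this slide (a specific identity, authenticated within a specific time, with context kept per application and per session) can be pictured with a small sketch. This is only an illustration under stated assumptions, not how Globus Auth implements it: the session dictionary, the function names, and the example identity are all hypothetical.

```python
from datetime import datetime, timedelta, timezone


def authentication_is_fresh(authenticated_at, max_age, now=None):
    """Return True if the authentication happened within `max_age` before `now`."""
    now = now or datetime.now(timezone.utc)
    age = now - authenticated_at
    return timedelta(0) <= age <= max_age


def may_access(session, app_id, required_identity, max_age, now=None):
    """Check one application's authentication context inside a session.

    `session` is a hypothetical mapping of (app_id, identity) -> login time.
    Keying on the application models "isolation of applications": a login
    performed through one app does not satisfy the check for another app.
    """
    authenticated_at = session.get((app_id, required_identity))
    if authenticated_at is None:
        return False
    return authentication_is_fresh(authenticated_at, max_age, now)
```

With a 30-minute policy, a login made ten minutes ago through the same application passes, while the same login does not count for a different application in the same session.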
  • 30. Product enhancements for high assurance • Additional security requirements enforced on management of all high assurance resources – Data access, and any interaction that can lead to data access – Examples: Groups, Management Console • Enhanced user interfaces for seamless management of protected data – Webapp and CLI
  • 31. Operational enhancements for high assurance • Intrusion detection and prevention • Encryption • Enhanced logging • Secure remote access, access control, and secure practices for laptops • Uniform configuration management and change control • AWS best practices for secure environment: VPCs, security groups, IAM best practices
  • 32. New subscription levels • High Assurance – 33% uplift on Standard subscription and on premium connectors used for high assurance data • BAA – All High Assurance features + BAA with University of Chicago – 50% uplift on Standard subscription and on premium connectors used under a BAA
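The uplift arithmetic on this slide can be worked through in a few lines. The sketch below assumes the uplift applies to the Standard fee plus only those premium connectors used for protected data, as the slide states; the function name and every fee in the usage note are hypothetical placeholders, not Globus prices.

```python
from decimal import Decimal

# Uplift percentages taken from the slide.
UPLIFT = {
    "standard": Decimal("0"),
    "high_assurance": Decimal("0.33"),
    "baa": Decimal("0.50"),
}


def annual_cost(level, standard_fee, protected_connector_fees=(), other_connector_fees=()):
    """Total fee: uplift on the Standard fee and on connectors used for protected data.

    Fees should be passed as strings or Decimals to avoid float rounding.
    Connectors not used for protected data are added without any uplift.
    """
    uplifted = Decimal(standard_fee) + sum(
        (Decimal(fee) for fee in protected_connector_fees), Decimal("0")
    )
    plain = sum((Decimal(fee) for fee in other_connector_fees), Decimal("0"))
    return uplifted * (1 + UPLIFT[level]) + plain
```

For a made-up Standard fee of 10000 and one made-up connector fee of 2000 used for protected data, `annual_cost("high_assurance", "10000", ["2000"])` gives 12000 × 1.33 = 15960, and the `"baa"` level gives 12000 × 1.50 = 18000.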
  • 34. Web app enhancements • Accessibility – Target WCAG 2.0 AA compliance • Responsiveness and touch • Works with new connectors collections.globus.org
  • 35. Web app enhancements • Customizable interface • Full screen view • Compact file listing display • Remember user configuration – Single vs. dual panel – Columns displayed • Continue incorporating user feedback
  • 36. CLI enhancements • Support for use with high assurance collections • '--format UNIX' flag - output suitable for line-oriented processing with typical Unix tools • 'globus rm' command • 'globus whoami --linked-identities' flag to show all linked identities • '--timeout-exit-code' flag overrides the default exit code for commands which wait on tasks • Enhancements to SDK as needed.
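The flags listed on this slide lend themselves to scripting. The following is a minimal sketch of wrapping them from Python, assuming the `globus` CLI is installed and logged in. The helper names are hypothetical; `globus task wait` and its `--timeout` option are assumed here as the "wait on tasks" command the slide refers to, and the exact layout of `--format UNIX` output should be checked against the CLI documentation rather than taken from this example.

```python
import subprocess


def globus_argv(*args, unix_format=False):
    """Build the argument list for a globus CLI call (does not run anything)."""
    argv = ["globus", *args]
    if unix_format:
        # Flag spelled as on the slide; gives line-oriented output.
        argv += ["--format", "UNIX"]
    return argv


def wait_argv(task_id, timeout_seconds, timeout_exit_code):
    """Arguments for waiting on a task with a custom exit code on timeout."""
    return globus_argv(
        "task", "wait", task_id,
        "--timeout", str(timeout_seconds),
        "--timeout-exit-code", str(timeout_exit_code),
    )


def run_lines(argv):
    """Run the command and return its standard output as a list of lines."""
    result = subprocess.run(argv, capture_output=True, text=True, check=True)
    return result.stdout.splitlines()
```

A script might call `run_lines(globus_argv("whoami", "--linked-identities", unix_format=True))` to get one line per linked identity, or run `wait_argv(...)` with `subprocess.run` and branch on the return code to tell a timeout apart from a failure.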
  • 38. Connector updates • Enhanced user experience for credential handling for several connectors (GCSv5) • AWS S3 – Automated multi-region support • Google Drive – Enhancement to retry handling for large transfers • HPSS – Support added for HPSS 7.5 (7.3 to 7.5 supported) – Improved asynchronous staging from tape – New home for documentation: docs.globus.org/premium-storage-connectors/hpss
  • 39. S3 compatible systems • Initial customer deployments • Validation, testing and vendor engagement planned • Additional systems driven by customer demand
  • 41. Globus for Box • Extends the value of your Box deployment • Unifies access to cloud and on-prem storage • Transitions protected data (HIPAA-regulated, CUI) seamlessly between Box and other storage systems
  • 43. Make Box part of your research storage ecosystem globus.org/connectors/box docs.globus.org/premium-storage-connectors/box
  • 45. Globus Connect Server v5.3 • Subsumes GCS versions 5.0, 5.1, 5.2 • Standard and high assurance guest collections (sharing) • High assurance mapped collections • Connectors: POSIX, AWS S3, CEPH, Google Drive, Box • Data access protocols: GridFTP and HTTPS • Single deployment supports both high assurance and standard gateways • Upgrade all v5.x deployments to v5.3
  • 46. Recent Transfer enhancements • Verify transfer using client provided checksums – User provided checksum used rather than source checksum for verification • Improvements for scaling transfer service – Multiple nodes for transfer service for higher availability and reliability – Allows for code updates with no downtime
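Verification against a client-provided checksum can be illustrated as follows: the data that arrived is hashed and compared with the value the user supplied, instead of with a checksum computed at the source. This is a sketch of the general idea only, not the Transfer service's code. The function names are hypothetical, and SHA-256 is an assumed default, since the slide does not name an algorithm.

```python
import hashlib
import hmac


def checksum_of_chunks(chunks, algorithm="sha256"):
    """Hash an iterable of byte chunks and return the hex digest."""
    digest = hashlib.new(algorithm)
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def file_chunks(path, chunk_size=1024 * 1024):
    """Yield a file's contents in fixed-size chunks so large files fit in memory."""
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            yield chunk


def matches_client_checksum(chunks, expected_hex, algorithm="sha256"):
    """Compare the destination data's digest with the user-supplied one."""
    actual = checksum_of_chunks(chunks, algorithm)
    return hmac.compare_digest(actual, expected_hex.strip().lower())
```

A destination file would be checked with `matches_client_checksum(file_chunks(path), expected_hex)`. Because the expected value comes from the user, a mismatch also catches corruption that happened at the source before the transfer started.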
  • 47. SSH with OAuth • Securely access resource using SSH with federated identity – Facilitates automation, eliminates SSH key management – Replacement for deprecated GSI OpenSSH • First version released – Server side PAM module with Globus Auth support – Command line client • Open source, community support – Not part of the standard subscription – OAuth SSH Client: https://pypi.org/project/oauth-ssh/ – OAuth SSH Server PAM module: https://github.com/xsede/oauth-ssh
  • 48. Where are we headed?
  • 50. Globus Transfer: A complete solution ☑ Bulk transfer and sync ☑ Good end-to-end performance in a myriad of real-world settings ☑ End-to-end reliability ☑ Robust security, with federated identities ☑ Layers onto diverse storage systems ☑ Web-compatible client/server remote access ☑ Easy to use interfaces ☑ Easy installation and administration ☑ Sharing data with guest users ☑ Dedicated professional support
  • 51. HTTPS and what it enables • Browser based up/download • Allow your (research) storage to be “on the web” • Enforce same security policies
  • 52. Globus Connect Server v5 Milestones • v5.0: Google Drive • v5.1: POSIX guest collections, HTTPS • v5.2: High assurance • v5.3 • v5.4: … • v5.x: v4 feature parity+ – Multi DTN support – Additional storage systems – Endpoint specific identity providers – … Other features
  • 54. GCSv5: Key enabling technology for the future • Challenge: Managing increasing amount of shared, dynamic state among multiple DTNs – Endpoint configuration – Multiple storage gateway configurations – Collection configurations – Credentials (user and system) • Approach: Stateless DTNs – No persistent state on DTN – Multi-DTN endpoints without a shared file system • GCS state stored in the cloud – Dynamic sync of state to each DTN – Enabled by our use of AWS AppSync • Customer managed encryption keys with optional escrow – Only you can see and modify your endpoint’s state • Facilitates creation of new Globus Connect features
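The stateless-DTN approach on this slide can be pictured with a toy in-memory model: configuration lives in one shared store, and each node keeps only a cached copy that it rebuilds by syncing. This is an illustration under loud assumptions, not the GCS design. The class names are hypothetical, a plain dictionary stands in for the cloud store, and both encryption with customer-managed keys and the AWS AppSync mechanism are left out entirely.

```python
class CloudStateStore:
    """Stand-in for the cloud-hosted endpoint state (hypothetical, in memory)."""

    def __init__(self):
        self._version = 0
        self._state = {}

    def put(self, key, value):
        """Record a configuration change and bump the state version."""
        self._version += 1
        self._state[key] = value
        return self._version

    def snapshot(self):
        """Return the current version and a copy of the full state."""
        return self._version, dict(self._state)


class DataTransferNode:
    """A DTN that persists nothing locally; its config is a cache of the store."""

    def __init__(self, store):
        self._store = store
        self.version = 0
        self.config = {}

    def sync(self):
        """Refresh the cached configuration if the store has moved on."""
        version, state = self._store.snapshot()
        if version != self.version:
            self.version, self.config = version, state
        return self.version
```

Under this model a replacement node needs only a handle to the store to reach the same configuration as its peers, which is the property behind multi-DTN endpoints without a shared file system.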
  • 55. GCSv5 has significant admin benefits • Greatly simplified multi-DTN deployment – Bootstrap DTN from only client id & secret, and encryption key – No more copy-pasting GCS config files with every change – Command line, REST API, and (eventually) web admin of GCS – Automatic synchronization amongst DTNs • Rapid recovery from failures – Restore all nodes from stored state with minimal effort – No local backups of GCS state required • Lost client ID/secret? Recover them from Auth. • Enables us to roll out new features more quickly
  • 56. What does it mean for you? • No sudden moves! • Ready for GCS v4 to v5 migration late this year • Tools will be available for migration from GCS v4 • Comprehensive documentation • Long migration period with parallel support of v5 & v4 • Only use GCS v5 today if you need its specific features, otherwise continue to use GCS v4
  • 57. Planned Features for Globus Transfer • S3 compatible HTTPS interface to GCSv5 storage • Browser based up/downloaders • Multiple checksum algorithm support • Manifest support • Automated recurring replication as a service • …
  • 58. Rethinking data publication • Limited adoption – Not easily customizable • Maintenance Challenges – Costly to maintain – JRE licensing concerns • Going forward – Code will be open source – Leverage platform • Invest in higher priorities
  • 59. Platform challenge • Transform how research applications, services, and workflows are created, delivered, used, and sustained – Scientific instrument data processing – Repositories: Make data more FAIR – Science gateways • Interoperable ecosystem
  • 60. Globus platform services • Identity and Access Management (IAM) – Federated identity login, Groups, Attributes, Access Control – Auth: OAuth authorization provider • Connect • Transfer – Will become a family of services • Execution • Search, Identifiers • Automation – Queues, Events, Actions, Triggers – Flows
  • 62. Platform status • Generally Available in a few years • Separate product with separate sustainability model • Early engagements help shape product direction – Argonne Leadership Computing Facility, Materials Data Facility, – NCAR Research Data Archive, NSO, … – Use in Globus products • Multiple integrations facilitate more complete solution – e.g. Django, JupyterHub – Follow progress: globus-integration-examples.readthedocs.io • Currently accessible via professional services team
  • 63. We are committed to doing all this sustainably
  • 64. Our focus is: You, the research community
  • 65. Why not do a for-profit? Focus: Investor ROI → can’t serve you properly!
  • 66. Sustainability >> $$ No single points of failure
  • 67. Subscriber Value = Engineering (DevOps) + Customer facing operations (support, sales, outreach, training, professional services)
  • 68. Freemium means managing tension! Meeting current customer needs…
  • 70. Customer community Delivering on requests Product planning process Contractual challenges
  • 71. Is there a better model? Internet2-like membership?
  • 73. Member fee ≈ sustainability Governance model ≈ product influence
  • 74. Do the dynamics change? - Willingness to join/pay? - Sufficient revenue growth? - Greater subscriber satisfaction?
  • 75. Why now? Increasing view of Globus as “enterprise” service RCC → CIO
  • 76. Data management needs are increasingly pervasive ✓ Network ✓ Cycles ✓ Storage Robust data management for all?
  • 77. Expand the dialogue HPC Management + IT Leadership + Researcher Community
  • 78. From “Purchase” to “Invest” Everyone derives more value if Globus is a strategic partner
  • 80. Thank you to our sponsors... U.S. Department of Energy
  • 82. Program Preview • Today – Lightning talks – Guest keynotes: Tom Barton, Bobby Kasthuri – Reception • Tomorrow – Tutorials – Office Hours • Friday morning – Customer forum globusworld.org/conf/program