"What's New With Globus" Webinar: Spring 2018Globus
In this presentation from June 26, 2018, Globus co-founder Steve Tuecke discussed Globus Connect Server 5.1 with HTTPS file access; plans for new premium storage connectors; upcoming publication services including the new Globus Search and Identifiers services; the new Globus Web App, SSH with Globus Auth, and more.
The experience of developing CRESON, support for strongly consistent remote objects in Infinispan, by Etienne Riviere (UCLouvain).
This talk will present results obtained in the context of the European project LEADS, which I coordinated and in which Red Hat was a partner. The code produced was integrated into the staging branch of the Infinispan NoSQL database, and evaluated against an open-source equivalent of Dropbox developed by CloudSpaces, another European project.
Pull vs. push is the hot topic when you start to evaluate a monitoring system. During this talk I showed how Prometheus and InfluxDB work and how you can get service discovery and a pull mechanism with InfluxDB. The demo is linked as a GitHub repository.
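For readers new to the pull model contrasted here: in pull-based monitoring the application only exposes an HTTP metrics endpoint, and the server scrapes it on its own schedule. The following is a minimal illustrative sketch using the prometheus_client Python library, not code from the talk; the metric names and port are invented:

    import random
    import time

    from prometheus_client import Counter, Gauge, start_http_server

    REQUESTS = Counter("app_requests_total", "Total requests handled")
    QUEUE_DEPTH = Gauge("app_queue_depth", "Items currently queued")

    if __name__ == "__main__":
        # Prometheus is configured to pull from http://localhost:8000/metrics;
        # the application never pushes anything to the monitoring server.
        start_http_server(8000)
        while True:
            REQUESTS.inc()
            QUEUE_DEPTH.set(random.randint(0, 10))
            time.sleep(1)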
"What's New With Globus" Webinar: Spring 2018Globus
In this presentation from June 26, 2018, Globus co-founder Steve Tuecke discussed Globus Connect Server 5.1 with HTTPS file access; plans for new premium storage connectors; upcoming publication services including the new Globus Search and Identifiers services; the new Globus Web App, SSH with Globus Auth, and more.
L'expérience du développement de CRESON, support pour des objets distants fortement cohérents dans Infinispan, par Etienne Riviere (UCLouvain).
Cet exposé présentera des résultats obtenus dans le cadre du projet européen LEADS que j'ai coordonné et où l'entreprise Red Hat était partenaire. Le code produit a été intégré dans le “staging" de la base de données NoSQL Infinispan, et évalué avec un équivalent open source de Dropbox développé par CloudSpaces, un autre projet européen.
Pull vs Push is the hot topic when you starts to evaluate a monitoring system. During this talk I showed how Prometheus and InfluxDB work and how you can get service discovery and pull mechanism with InfluxDB. The demo is linked as github repository.
We have the Bricks to Build Cloud-native Cathedrals - But do we have the mortar? - Nane Kratzke
This is some input for a panel discussion about "Challenges of Cloud Computing-based Systems" that I attended at the 9th International Conference on Cloud Computing, GRIDs, and Virtualization (CLOUD COMPUTING 2018) in Barcelona, Spain, in February 2018.
Cloud-native applications (CNA) are built more and more often according to microservice and independent system architecture (ISA) approaches. ISA involves two architecture layers: the macro and the micro architecture layer. Software engineering outcomes on the micro layer are often distributed in a standardized form as self-contained deployment units (so-called container images). There are plenty of programming languages to implement these units: Java, C, C++, JavaScript, Python, R, PHP, Ruby, ... (the list is almost endless). But on the macro layer, one might mention TOSCA and little more. TOSCA is an OASIS deployment and orchestration standard language to describe a topology of cloud-based web services, their components, relationships, and the processes that manage them. This works for static deployments. However, CNA are elastic and self-adaptive - almost the exact opposite of what can be defined efficiently using TOSCA. For these kinds of scenarios one might mention Kubernetes or Docker Swarm as container orchestrators, which are intentionally built to operate elastic services formed of containers. But these operating platforms do not provide expressive and pragmatic programming languages covering the macro layer of cloud-native applications.
So it seems there is a gap, and the question arises: do we need further macro layer languages for CNA, and if so, of what kind?
Open Tracing, to order and understand your mess - ApiConf 2017 - Gianluca Arbezzano
Think about how many API calls your applications were making 3-4 years ago, and how many integrations and different services your requests cross today before coming back to their final destination. How do you know which step of your pipeline is taking too much time? What is taking 2 seconds to answer? Is it the authentication service? Maybe it's the invoice generation service, or the notification platform. OpenTracing is a cross-vendor, open source distributed tracing specification that helps you understand bottlenecks and profile requests from the point where they reach the final user. In an ecosystem where microservices and as-a-service concepts are growing, this can be a real challenge. During this presentation, we will see how it works from a general point of view and then land in some real implementations, examples, and a demo.
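By way of illustration, here is a minimal sketch of the vendor-neutral OpenTracing Python API: starting a span and propagating its context to a downstream call. The service and tag names are invented, and the global tracer is a no-op unless a concrete tracer (e.g. Jaeger) is registered:

    import opentracing

    def handle_request():
        tracer = opentracing.global_tracer()
        with tracer.start_active_span("handle_request") as scope:
            scope.span.set_tag("service", "invoice-generation")
            # Serialize the trace context into headers so the next
            # service in the chain can continue the same trace.
            carrier = {}
            tracer.inject(scope.span.context,
                          opentracing.Format.HTTP_HEADERS, carrier)
            return carrier  # merge into the outgoing HTTP request headers

    print(handle_request())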
Are you curious about KNIME Software?
Do you know the difference between KNIME Analytics Platform and KNIME Server?
Which data sources can KNIME connect to?
Can you run an R script from within a KNIME workflow? A Python script? Which other integrations are available?
How can KNIME help with ETL, data preparation, and general data manipulation? Which machine learning algorithms can KNIME offer?
This webinar answers all of these questions! There's also information about connecting to big data clusters and how you can run all or part of your analysis on a big data platform. It also covers everything you need to know about Microsoft Azure and Amazon AWS.
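On the Python question above: KNIME's Python Script node hands the incoming table to your script and reads a table back. The snippet below is a rough sketch in the style of the legacy scripting API, where the node injects input_table as a pandas DataFrame (variable names vary by KNIME version, and the column names here are invented); the guard lets it run outside KNIME too:

    import pandas as pd

    # Outside KNIME, fake the table the node would normally inject:
    if "input_table" not in globals():
        input_table = pd.DataFrame({"price": [2.0, 3.5], "quantity": [3, 2]})

    out = input_table.copy()
    out["total"] = out["price"] * out["quantity"]  # assumed example columns
    output_table = out  # the node passes this table on to the workflow
    print(output_table)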
OpenNebulaConf2017EU: Enabling Dev and Infra teams by Lodewijk De Schuyter, De... - OpenNebula Project
At the department of environment and spatial planning we started two projects. The first was to replace our VMware-based hosting environment with an open, hardware-vendor-neutral hypervisor environment. The second project's goal was to further enable our dev teams. This is the story of the second project: what we built and how it works using OpenNebula and Ceph and our existing tooling.
At the time of writing this abstract, our OpenNebula environment is used by 4 dev teams (almost 30 developers) and an infra team, hosting 700 virtual servers and counting. We are executing 300 deploys (as part of the development cycle) per week and counting ...
I will be talking about the setup we realized, the choices we made, and the deployment tool we ended up with, integrating the toolset we already used, i.e. SVN, Ansible, OpenNebula, F5, JFrog, Ubuntu/CentOS, Zabbix, Bareos, Barman, ...
YouTube: https://youtu.be/OEftbpJ_lSY
This presentation contains an introduction to using Pachyderm as a tool to enable scalable and reproducible workflows in the life sciences. Pachyderm is an open-source workflow engine and distributed data processing tool that leverages the container ecosystem.
Our applications speak, and time series are one of their languages. During this talk I will share how to use the open source TICK Stack to spin up a modern monitoring system for your application and your infrastructure. DevOps, cloud computing, and containers have changed how we write and run our applications. This talk shows what InfluxData and the community are building to provide a modern and flexible monitoring toolkit.
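As a taste of the kind of instrumentation the TICK Stack ingests, here is a minimal sketch using the influxdb Python client to write a point to InfluxDB 1.x; the local server, the monitoring database, and the measurement and tag names are all assumptions for the example:

    from influxdb import InfluxDBClient

    client = InfluxDBClient(host="localhost", port=8086, database="monitoring")
    client.write_points([{
        "measurement": "cpu_load",       # time series name
        "tags": {"host": "web-01"},      # indexed metadata
        "fields": {"value": 0.64},       # the sampled value
    }])
    print(client.query("SELECT * FROM cpu_load LIMIT 1"))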
Caching in the Cloud. Code Camp Iași April 2016. Expert Network
With a focus on caching solutions available to programmers inside Azure, they will take a deeper dive into Redis Cache. In the context of high-performance web applications, they'll treat topics such as patterns, data management, data expiration, concurrency, and high availability.
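One of those patterns, cache-aside with expiration, fits in a few lines. This sketch uses the redis Python client against a local server (Azure Cache for Redis would additionally need its hostname, access key, and ssl=True); the key layout and loader function are invented:

    import json

    import redis

    r = redis.Redis(host="localhost", port=6379)

    def load_from_database(product_id):  # hypothetical slow backend lookup
        return {"id": product_id, "name": "widget"}

    def get_product(product_id):
        key = f"product:{product_id}"
        cached = r.get(key)
        if cached is not None:
            return json.loads(cached)              # cache hit
        product = load_from_database(product_id)   # cache miss
        r.setex(key, 300, json.dumps(product))     # expire after 5 minutes
        return product

    print(get_product(42))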
What's New in Confluent Platform 5.4 Online Talk - Confluent
To stay informed about the latest features in Confluent Platform 5.4, join Martijn Kieboom, Solutions Engineer at Confluent, for the 'What's New in Confluent 5.4?' online talk on February 12 at 11 am GMT / 12 noon CET. Martijn will talk through the new features, including:
Role-Based Access Control and how it enables highly granular control of permissions and platform access
Structured Audit Logs and how they enable the capture of authorization logs
How Multi-Region Clusters deliver asynchronous replication at the topic level, allowing companies to run a single Kafka cluster across multiple data centres
Schema Validation's role in enabling businesses that run Kafka at scale to deliver data compatibility across platforms (see the sketch after this list)
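For context on that last point, schema validation builds on producers registering and honouring schemas. This is a hedged sketch, using the confluent-kafka Python client, of producing Avro records checked against Confluent Schema Registry; the broker and registry URLs, the topic, and the schema are invented for the example:

    from confluent_kafka import SerializingProducer
    from confluent_kafka.schema_registry import SchemaRegistryClient
    from confluent_kafka.schema_registry.avro import AvroSerializer

    schema_str = """
    {"type": "record", "name": "Payment",
     "fields": [{"name": "id", "type": "string"},
                {"name": "amount", "type": "double"}]}
    """

    registry = SchemaRegistryClient({"url": "http://localhost:8081"})
    producer = SerializingProducer({
        "bootstrap.servers": "localhost:9092",
        # Serializing against the registry enforces schema compatibility
        # before records ever reach the topic.
        "value.serializer": AvroSerializer(registry, schema_str),
    })
    producer.produce(topic="payments", value={"id": "p-1", "amount": 9.99})
    producer.flush()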
On-node resource manager for containerized HPC workloads - Geoffroy Vallee
In this presentation, we present a new software design for efficient on-node resource management for HPC workloads, using Singularity containers and PMIx.
What do Netflix, NTT and Rubicon Project have in common? Apache Druid. - Rommel Garcia
Abstract:
Netflix is a media services provider and uses Apache Druid to measure the customer experience of watching videos, anywhere in the world. They have a significant investment in Apache Druid, which helps them improve their services across the board.
NTT is the fourth-largest telecommunications company in the world. They use Druid for global traffic visibility across technical, economic, and security use cases.
Rubicon Project is one of the world's largest digital advertising exchanges. To achieve accurate data calculations, great analytical performance, and fast intelligence extraction, they use Druid as the foundation of their real-time analytics platform.
In this presentation, we will discuss what their challenges were and why they moved to Apache Druid to meet their needs.
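To make the "real-time analytics" claim concrete, Druid exposes a SQL endpoint that Python can query, for example via the pydruid client. A hedged sketch follows; the broker address and the wikipedia datasource are assumptions, the latter borrowed from Druid's standard tutorial data:

    from pydruid.db import connect

    conn = connect(host="localhost", port=8082,
                   path="/druid/v2/sql/", scheme="http")
    cursor = conn.cursor()
    cursor.execute("""
        SELECT channel, COUNT(*) AS events
        FROM wikipedia
        WHERE __time >= CURRENT_TIMESTAMP - INTERVAL '1' HOUR
        GROUP BY channel
        ORDER BY events DESC
        LIMIT 5
    """)
    for row in cursor:
        print(row)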
Presenter: Rommel Garcia
Bio:
Rommel Garcia is currently the Director of Solutions Engineering at Imply, a company founded by the same people that created Druid. He is the author of the book Virtualizing Hadoop: How to Install, Deploy, and Optimize Hadoop in Virtualized Infrastructure. In the last 10 years, he has worked on data management both on-prem and in the cloud, distributed systems, big data analytics, and hardware-accelerated analytics using GPUs.
Flink for Everyone: Self Service Data Analytics with StreamPipes - Philipp Ze... - Flink Forward
This talk presents StreamPipes (https://www.streampipes.org), an open source self-service data analytics solution leveraging existing big data technologies such as Apache Flink to provide non-technical users with an easy and intuitive way to connect, analyze and exploit a variety of different streaming data sources for their use.
Newly arising IoT-driven use cases in domains such as manufacturing, smart cities, or autonomous driving often demand continuous integration and processing of sensor data in order to derive time-sensitive actions. One example is the optimization of maintenance processes based on the current condition of machines (condition-based maintenance). While this is technically already well supported by the existing big data tool landscape, building such applications still requires a crucial set of expertise, ranging from general domain expertise and programming skills to deep knowledge of distributed and scalable systems. Such skills are usually not present in hardware-focused manufacturing companies.
To mitigate these shortcomings, StreamPipes allows non-technical users to leverage a graphical editor to model and deploy analytical tasks as pipelines in a drag-and-drop manner. Pipelines are built from a toolbox of reusable data adapters, processors, and sinks. Toolbox elements encapsulate dedicated algorithms (e.g., filters, aggregations, machine learning classifiers) implemented in big data processing engines such as Apache Flink, communicating over an internal distributed messaging system (e.g., Apache Kafka).
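This is not StreamPipes code, but a plain-Python analogy for that adapter/processor/sink pipeline model may help: a source emits events, a reusable processor filters them, and a sink consumes the result. All names and values are invented:

    import random
    import statistics

    def sensor_adapter(n=20):                  # adapter: emits raw events
        for _ in range(n):
            yield {"machine": "press-1", "temp": random.uniform(60, 110)}

    def threshold_filter(events, limit=90.0):  # processor: keep hot readings
        return (e for e in events if e["temp"] > limit)

    def console_sink(events):                  # sink: deliver the results
        hot = list(events)
        print(len(hot), "readings above threshold")
        if hot:
            print("mean temp:", statistics.mean(e["temp"] for e in hot))

    console_sink(threshold_filter(sensor_adapter()))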
In this talk, we present technologies and tools enabling flexible modeling of real-time processing pipelines by domain experts. We motivate our talk by showing real-world examples we gathered from a number of industry projects during the past years in Industrial IoT domains such as manufacturing and supply chain management. For instance, we show how StreamPipes eases the accessibility of big data tools for non-technical users based on examples such as supervising a fleet of autonomous electric delivery vehicles as well as data analytics in one of the largest test areas for autonomous driving in Germany.
ODSC East 2020 Accelerate ML Lifecycle with Kubernetes and Containerized Da... - Abhinav Joshi
This deck provides an overview of containers and Kubernetes, and how these technologies can help solve the challenges faced by data scientists, ML engineers, and application developers. Next, it showcases the key capabilities required in a container and Kubernetes platform to help data scientists easily use technologies like Jupyter notebooks, ML frameworks, and programming languages to innovate faster. Finally, it discusses the available platform options (e.g., Kubeflow, Open Data Hub, etc.), and some examples of how data scientists are accelerating their ML initiatives with a container and Kubernetes platform.
Liberate Your Files with a Private Cloud Storage Solution powered by Open Source - Isaac Christoffersen
Many of today's enterprises are working under a false assumption that there is a trade-off between consumer-centric file sharing and corporate IT policy compliance. This is because most market-leading SaaS solutions for file sync and share are not designed around enterprise IT's needs. They represent growing risks with vendor lock-in, data security, compliance and data ownership.
With a track record of delivering innovative open source solutions, Vizuri has an answer to help enterprises overcome these hurdles. By leveraging innovative Red Hat and ownCloud open source solutions, this offering helps corporate IT provide a simple-to-use file sync and share solution for employees. As a result, organizations are able to retain greater control over valuable intellectual property.
The Future of Cloud Software Defined Storage with Ceph: Andrew Hatfield, Red Hat - OpenStack
Audience: Intermediate
About: Learn how cloud storage differs from traditional storage systems and how that delivers revolutionary benefits.
Starting with an overview of how Ceph integrates tightly into OpenStack, you'll see why 62% of OpenStack users choose Ceph. We'll then take a peek into the very near future to see how rapidly Ceph is advancing, and how you'll be able to achieve all your childhood hopes and dreams in ways you never thought possible.
Speaker Bio: Andrew Hatfield – Practice Lead, Cloud Storage and Big Data, Red Hat
Andrew has over 20 years' experience in the IT industry across APAC, specialising in Databases, Directory Systems, Groupware, Virtualisation and Storage for Enterprise and Government organisations. When not helping customers slash costs and increase agility by moving to the software-defined storage future, he's enjoying the subtle tones of Islay whisky and shredding pow pow on the world's best snowboard resorts.
OpenStack Australia Day - Sydney 2016
https://events.aptira.com/openstack-australia-day-sydney-2016/
OSCON 2017: Build your own container-based system with the Moby project - Patrick Chanezon
Docker Community Edition—an open source product that lets you build, ship, and run containers—is an assembly of modular components built from an upstream open source project called Moby. Moby provides a “Lego set” of dozens of components, the framework for assembling them into specialized container-based systems, and a place for all container enthusiasts to experiment and exchange ideas.
Patrick Chanezon and Mindy Preston explain how you can leverage the Moby project to assemble your own specialized container-based system, whether for IoT, cloud, or bare-metal scenarios. Patrick and Mindy explore Moby’s framework, components, and tooling, focusing on two components: LinuxKit, a toolkit to build container-based Linux subsystems that are secure, lean, and portable, and InfraKit, a toolkit for creating and managing declarative, self-healing infrastructure. Along the way, they demo how to use Moby, LinuxKit, InfraKit, and other components to quickly assemble full-blown container-based systems for several use cases and deploy them on various infrastructures.
From ECM to Content Services - Analyst Webinar - Nuxeo
Join Alan Pelz-Sharpe from Deep Analysis and Dave Jones from Nuxeo as they explore the changes in the information management space, discuss the history of the market, explore some of the failings of the past, and debate whether the move to content services is an evolutionary step, or a revolutionary leap.
Francois Martel, Solutions Architect at Portworx, explains how you can tackle data gravity and Kubernetes, along with strategies and best practices to run, scale, and leverage stateful containers in production.
Accelerate Analytics and ML in the Hybrid Cloud Era - Alluxio, Inc.
Alluxio Community Office Hour
February 23, 2021
For more Alluxio events: https://www.alluxio.io/events/
Speaker(s):
Alex Ma, Alluxio
Peter Behrakis, Alluxio
Many companies we talk to have on-premises data lakes and use the cloud(s) to burst compute. Many are now establishing new object data lakes as well. As a result, analytics workloads such as Hive, Spark, and Presto, as well as machine learning, experience sluggish response times with data and compute in multiple locations. We also know there is an immense and growing data management burden to support these workflows.
In this talk, we will walk through what Alluxio’s Data Orchestration for the hybrid cloud era is and how it solves the performance and data management challenges we see.
In this tech talk, we'll go over:
- What is Alluxio Data Orchestration?
- How does it work?
- Alluxio customer results
Swiss IPv6 Council – Case Study - Deployment of IPv6 in a Container Plat... - Digicomp Academy AG
Implementing IPv6 in container platforms such as Docker, Kubernetes, or OpenShift offers a number of possibilities, but also challenges. In his talk, Aarno Aukia explains the current state of the IPv6 implementation in these technologies.
End-to-End ML Pipelines with Beam, Flink, TensorFlow, and Hopsworks (Beam Su... - Theofilos Kakantousis
Apache Beam is a key technology for building scalable End-to-End ML pipelines, as it is the data preparation and model analysis engine for TensorFlow Extended (TFX), a framework for horizontally scalable Machine Learning (ML) pipelines based on TensorFlow. In this talk, we present TFX on Hopsworks, a fully open-source platform for running TFX pipelines on any cloud or on-premise. Hopsworks is a project-based multi-tenant platform for both data parallel programming and horizontally scalable machine learning pipelines. Hopsworks supports Apache Flink as a runner for Beam jobs and TFX pipelines are supported through Airflow support in Hopsworks. We will demonstrate how to build a ML pipeline with TFX, Beam’s Python API and the Flink Runner by using Jupyter notebooks, explain how security is transparently enabled with short-lived TLS certificates, and go through all the pipeline steps, from Data Validation, to Transformation, Model training with TensorFlow, Model Analysis, Model Serving and Monitoring with Kubernetes.
To the best of our knowledge, Hopsworks is the first fully open-source on-premise platform that supports both TFX pipelines and Apache Beam.
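For orientation, the Beam Python API mentioned above lets the same pipeline run on different runners; with the Flink runner you would pass --runner=FlinkRunner (plus cluster options) instead of the default local DirectRunner. A minimal, self-contained sketch, with invented data and transforms:

    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    # e.g. PipelineOptions(["--runner=FlinkRunner", ...]) to target Flink
    options = PipelineOptions()

    with beam.Pipeline(options=options) as p:
        (p
         | "Create" >> beam.Create(["good sample", "bad sample", "good run"])
         | "KeepGood" >> beam.Filter(lambda line: line.startswith("good"))
         | "Count" >> beam.combiners.Count.Globally()
         | "Print" >> beam.Map(print))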
OSCON 2013 - The Hitchhiker's Guide to Open Source Cloud Computing - Mark Hinkle
And while the Hitchhiker's Guide to the Galaxy (HHGTTG) is a wholly remarkable book, it doesn't cover the nuances of cloud computing. Whether you want to build a public, private, or hybrid cloud, there are free and open source tools that can provide you with a complete solution or augment your existing Amazon or other hosted cloud solution. That's why you need the Hitchhiker's Guide to (Open Source) Cloud Computing (HHGTCC), or at least to attend this talk to understand the current state of open source cloud computing. This talk will cover infrastructure-as-a-service, platform-as-a-service, developments in big data, and how to more effectively deploy and manage open source flavors of these technologies. Specifically, the guide will cover:
Infrastructure-as-a-Service – The Systems Cloud – Get a comparison of the open source cloud platforms including OpenStack, Apache CloudStack, Eucalyptus and OpenNebula
Platform-as-a-Service – The Developers Cloud – Learn about the tools that abstract away complexity for developers and are used to build portable, auto-scaling applications on Cloud Foundry, OpenShift, Stackato, and more.
Data-as-a-Service – The Analytics Cloud – Want to figure out the who, what, where, when, and why of big data? You'll get an overview of open source NoSQL databases and technologies like MapReduce to help parallelize data mining tasks and crunch massive data sets in the cloud (see the sketch after this list).
Network-as-a-Service – The Network Cloud – The final pillar for truly fungible network infrastructure is network virtualization. We will give an overview of software-defined networking, including OpenStack Quantum, Nicira, Open vSwitch, and others.
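As promised above, here is a conceptual single-machine word count in the map/shuffle/reduce style that MapReduce parallelizes across a cluster; it illustrates the programming model only and is not Hadoop code:

    from collections import defaultdict

    docs = ["the cloud is open", "open source cloud tools"]

    # Map: emit (word, 1) pairs from every document.
    pairs = [(word, 1) for doc in docs for word in doc.split()]

    # Shuffle: group the emitted values by key.
    groups = defaultdict(list)
    for word, count in pairs:
        groups[word].append(count)

    # Reduce: sum each group independently (the parallelizable step).
    counts = {word: sum(vals) for word, vals in groups.items()}
    print(counts)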
Finally, this talk will provide an overview of the tools that can help you really take advantage of the cloud. Do you want to auto-scale to serve millions of web pages and scale back down as demand fluctuates? Are you interested in automating the total lifecycle of cloud computing environments? You'll learn how to combine these tools into tool chains to provide continuous deployment systems that will help you become agile and spend more time improving your IT rather than simply maintaining it.
[Finally, for those of you that are Douglas Adams fans please accept the deepest apologies for bad analogies to the HHGTTG.]
Comparison of control plane deployment architectures in the scope of hypercon... - Miroslav Halas
The OpenStack control plane can be implemented using one of three infrastructure types: bare metal, virtual machines, and containers. Comprehensive comparisons of these approaches are not available. In the first section of our talk, we present a reference architecture for building a virtualized control plane, which supports OpenStack controller HA, file system HA, and networking HA, with enhanced performance through CPU pinning and SR-IOV. All the building blocks are based on established open source tools and their corresponding products: Red Hat Enterprise Virtualization, Red Hat Gluster Storage, and Red Hat OpenStack. Any OpenStack deployment tool which can be used on bare metal works to build this virtualized control plane. We will focus on comparisons of different OpenStack control plane infrastructures. We compare the deployment and operational aspects of different control plane implementations on the same hardware environment. We also evaluate the different control plane deployments through benchmarking tools (such as Rally), and provide a quantitative comparison.
OpenStack clusters are most often built with servers for Nova compute VMs and servers for storage, with Ceph storage requiring 3 or more nodes. It can be more cost-effective to "hyperconverge" Nova and Ceph on to the same servers, and rising processor core counts and RAM density have made this feasible. But it is important to understand the resource demand patterns of each and protect against corner cases where one starves the other. In the second section of our talk we will present our empirical approach to:
Generating realistic system & storage loads using open source test suites
Collecting and analyzing results quantitatively
Optimizing hardware configuration and resource partitioning
We will present data, analysis, and lessons learned from our hyperconverged infrastructure work.
At the technology meeting of the Association of Independent Research Centers (http://airi.org): An overview of recent Scientific Computing activities at Fred Hutch, Seattle
OpenStack and Cloud Foundry - Pair the leading open source IaaS and PaaS - Daniel Krook
OpenStack is the leading open source Infrastructure-as-a-Service, and Cloud Foundry has become the leading open source Platform-as-a-Service. Deploying them together is a natural fit for your next generation systems of engagement.
This special joint meetup of the OpenStack NY and NYC Cloud Foundry communities will give both audiences an introduction to these popular open source IaaS and PaaS projects.
The presentation will describe the compelling advantages of each technology, and then explain how they can be integrated, optimized, and scaled to provide a complete cloud application hosting solution.
Data science holds tremendous potential for organizations to uncover new insights and drivers of revenue and profitability. Big data has brought the promise of doing data science at scale to enterprises; however, this promise also comes with challenges for data scientists to continuously learn and collaborate. Data scientists have many tools at their disposal: notebooks like Jupyter and Apache Zeppelin, IDEs such as RStudio, languages like R, Python, and Scala, and frameworks like Apache Spark. Given all the choices, how do you best collaborate to build your model, and then work through the development lifecycle to deploy it from test into production?
In this session, learn the attributes of a modern data science platform that empowers data scientists to build models using all the data in their data lake and fosters continuous learning and collaboration. We will show a demo of DSX with HDP, with a focus on integration, security, and model deployment and management.
Speakers:
Sriram Srinivasan, Senior Technical Staff Member, Analytics Platform Architect, IBM
Vikram Murali, Program Director, Data Science and Machine Learning, IBM
Similar to Integrating Globus into LRZ's Data Science Storage Service
Globus Compute with IRI Workflows - GlobusWorld 2024 - Globus
As part of the DOE Integrated Research Infrastructure (IRI) program, NERSC at Lawrence Berkeley National Lab and ALCF at Argonne National Lab are working closely with General Atomics on accelerating the computing requirements of the DIII-D experiment. As part of the work, the team is investigating ways to speed up the time to solution for many different parts of the DIII-D workflow, including how they run jobs on HPC systems. One of these routes is looking at Globus Compute as a way to replace the current method for managing tasks, and we describe a brief proof of concept showing how Globus Compute could help to schedule jobs and be a tool to connect compute at different facilities.
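For readers unfamiliar with the pattern: Globus Compute ships a Python function to a remote endpoint and returns a future, much like a local executor. A hedged sketch using the globus-compute-sdk Executor follows; the endpoint UUID and function are placeholders, and an authenticated Globus session is assumed:

    from globus_compute_sdk import Executor

    def preprocess_shot(shot_number):
        # Stand-in for a real analysis step in the workflow.
        return f"processed shot {shot_number}"

    with Executor(endpoint_id="YOUR-ENDPOINT-UUID") as gce:
        future = gce.submit(preprocess_shot, 195432)  # runs on the endpoint
        print(future.result())                        # blocks until done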
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart... - Globus
The Earth System Grid Federation (ESGF) is a global network of data servers that archives and distributes the planet's largest collection of Earth system model output for thousands of climate and environmental scientists worldwide. Many of these petabyte-scale data archives are located in proximity to large high-performance computing (HPC) or cloud computing resources, but the primary workflow for data users consists of transferring data and applying computations on a different system. As a part of the ESGF 2.0 US project (funded by the United States Department of Energy Office of Science), we developed pre-defined data workflows, which can be run on demand, capable of applying many data reduction and data analysis operations to the large ESGF data archives, transferring only the resultant analysis (e.g., visualizations, smaller data files). In this talk, we will showcase a few of these workflows, highlighting how Globus Flows can be used for petabyte-scale climate analysis.
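To give a feel for "run on demand", starting a pre-defined flow from Python is a short call with the Globus SDK. This is a hedged sketch: the flow ID, token, and input document are placeholders, and the actual input schema depends entirely on how the flow was defined:

    import globus_sdk

    FLOW_ID = "YOUR-FLOW-UUID"
    authorizer = globus_sdk.AccessTokenAuthorizer("FLOW-SCOPED-TOKEN")

    flow = globus_sdk.SpecificFlowClient(FLOW_ID, authorizer=authorizer)
    run = flow.run_flow(
        body={  # assumed input schema, for illustration only
            "source_collection": "ESGF-COLLECTION-UUID",
            "variable": "tas",
            "operation": "annual_mean",
        },
        label="climate reduction demo",
    )
    print(run["run_id"], run["status"])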
We describe the deployment and use of Globus Compute for remote computation. This content is aimed at researchers who wish to compute on remote resources using a unified programming interface, as well as system administrators who will deploy and operate Globus Compute services on their research computing infrastructure.
Globus Connect Server Deep Dive - GlobusWorld 2024 - Globus
We explore the Globus Connect Server (GCS) architecture and experiment with advanced configuration options and use cases. This content is targeted at system administrators who are familiar with GCS and currently operate—or are planning to operate—broader deployments at their institution.
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus... - Globus
Large Language Models (LLMs) are currently the center of attention in the tech world, particularly for their potential to advance research. In this presentation, we'll explore a straightforward and effective method for quickly initiating inference runs on supercomputers using the vLLM tool with Globus Compute, specifically on the Polaris system at ALCF. We'll begin by briefly discussing the popularity and applications of LLMs in various fields. Following this, we will introduce the vLLM tool, and explain how it integrates with Globus Compute to efficiently manage LLM operations on Polaris. Attendees will learn the practical aspects of setting up and remotely triggering LLMs from local machines, focusing on ease of use and efficiency. This talk is ideal for researchers and practitioners looking to leverage the power of LLMs in their work, offering a clear guide to harnessing supercomputing resources for quick and effective LLM inference.
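A hedged sketch of the pattern described, combining the two tools: a function that runs vLLM inference is submitted to a remote endpoint via Globus Compute. The model name and endpoint UUID are placeholders, and vllm plus globus-compute-sdk are assumed to be installed on the endpoint:

    from globus_compute_sdk import Executor

    def run_inference(prompts):
        # Imported inside the function so it resolves on the remote node.
        from vllm import LLM, SamplingParams
        llm = LLM(model="meta-llama/Llama-2-7b-chat-hf")
        outputs = llm.generate(prompts, SamplingParams(max_tokens=64))
        return [o.outputs[0].text for o in outputs]

    with Executor(endpoint_id="POLARIS-ENDPOINT-UUID") as gce:
        future = gce.submit(run_inference,
                            ["Summarize Globus Compute in one sentence."])
        print(future.result())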
Providing Globus Services to Users of JASMIN for Environmental Data Analysis - Globus
JASMIN is the UK’s high-performance data analysis platform for environmental science, operated by STFC on behalf of the UK Natural Environment Research Council (NERC). In addition to its role in hosting the CEDA Archive (NERC’s long-term repository for climate, atmospheric science & Earth observation data in the UK), JASMIN provides a collaborative platform to a community of around 2,000 scientists in the UK and beyond, providing nearly 400 environmental science projects with working space, compute resources and tools to facilitate their work. High-performance data transfer into and out of JASMIN has always been a key feature, with many scientists bringing model outputs from supercomputers elsewhere in the UK, to analyse against observational or other model data in the CEDA Archive. A growing number of JASMIN users are now realising the benefits of using the Globus service to provide reliable and efficient data movement and other tasks in this and other contexts. Further use cases involve long-distance (intercontinental) transfers to and from JASMIN, and collecting results from a mobile atmospheric radar system, pushing data to JASMIN via a lightweight Globus deployment. We provide details of how Globus fits into our current infrastructure, our experience of the recent migration to GCSv5.4, and of our interest in developing use of the wider ecosystem of Globus services for the benefit of our user community.
First Steps with Globus Compute Multi-User Endpoints - Globus
In this presentation we will share our experiences around getting started with the Globus Compute multi-user endpoint. Working with the Pharmacology group at the University of Auckland, we have previously written an application using Globus Compute that can offload computationally expensive steps in the researcher's workflows, which they wish to manage from their familiar Windows environments, onto the NeSI (New Zealand eScience Infrastructure) cluster. Some of the challenges we have encountered were that each researcher had to set up and manage their own single-user globus compute endpoint and that the workloads had varying resource requirements (CPUs, memory and wall time) between different runs. We hope that the multi-user endpoint will help to address these challenges and share an update on our progress here.
Enhancing Research Orchestration Capabilities at ORNL - Globus
Cross-facility research orchestration comes with ever-changing constraints regarding the availability and suitability of various compute and data resources. In short, a flexible data and processing fabric is needed to enable the dynamic redirection of data and compute tasks throughout the lifecycle of an experiment. In this talk, we illustrate how we easily leveraged Globus services to instrument the ACE research testbed at the Oak Ridge Leadership Computing Facility with flexible data and task orchestration capabilities.
Understanding Globus Data Transfers with NetSage - Globus
NetSage is an open privacy-aware network measurement, analysis, and visualization service designed to help end-users visualize and reason about large data transfers. NetSage traditionally has used a combination of passive measurements, including SNMP and flow data, as well as active measurements, mainly perfSONAR, to provide longitudinal network performance data visualization. It has been deployed by dozens of networks world wide, and is supported domestically by the Engagement and Performance Operations Center (EPOC), NSF #2328479. We have recently expanded the NetSage data sources to include logs for Globus data transfers, following the same privacy-preserving approach as for Flow data. Using the logs for the Texas Advanced Computing Center (TACC) as an example, this talk will walk through several different example use cases that NetSage can answer, including: Who is using Globus to share data with my institution, and what kind of performance are they able to achieve? How many transfers has Globus supported for us? Which sites are we sharing the most data with, and how is that changing over time? How is my site using Globus to move data internally, and what kind of performance do we see for those transfers? What percentage of data transfers at my institution used Globus, and how did the overall data transfer performance compare to the Globus users?
How to Position Your Globus Data Portal for Success: Ten Good Practices - Globus
Science gateways allow science and engineering communities to access shared data, software, computing services, and instruments. Science gateways have gained a lot of traction in the last twenty years, as evidenced by projects such as the Science Gateways Community Institute (SGCI) and the Center of Excellence on Science Gateways (SGX3) in the US, The Australian Research Data Commons (ARDC) and its platforms in Australia, and the projects around Virtual Research Environments in Europe. A few mature frameworks have evolved with their different strengths and foci and have been taken up by a larger community such as the Globus Data Portal, Hubzero, Tapis, and Galaxy. However, even when gateways are built on successful frameworks, they continue to face the challenges of ongoing maintenance costs and how to meet the ever-expanding needs of the community they serve with enhanced features. It is not uncommon that gateways with compelling use cases are nonetheless unable to get past the prototype phase and become a full production service, or if they do, they don't survive more than a couple of years. While there is no guaranteed pathway to success, it seems likely that for any gateway there is a need for a strong community and/or solid funding streams to create and sustain its success. With over twenty years of examples to draw from, this presentation goes into detail for ten factors common to successful and enduring gateways that effectively serve as best practices for any new or developing gateway.
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G... - Globus
The U.S. Geological Survey (USGS) has made substantial investments in meeting evolving scientific, technical, and policy driven demands on storing, managing, and delivering data. As these demands continue to grow in complexity and scale, the USGS must continue to explore innovative solutions to improve its management, curation, sharing, delivering, and preservation approaches for large-scale research data. Supporting these needs, the USGS has partnered with the University of Chicago-Globus to research and develop advanced repository components and workflows leveraging its current investment in Globus. The primary outcome of this partnership includes the development of a prototype enterprise repository, driven by USGS Data Release requirements, through exploration and implementation of the entire suite of the Globus platform offerings, including Globus Flow, Globus Auth, Globus Transfer, and Globus Search. This presentation will provide insights into this research partnership, introduce the unique requirements and challenges being addressed and provide relevant project progress.
Developing Distributed High-performance Computing Capabilities of an Open Sci... - Globus
COVID-19 had an unprecedented impact on scientific collaboration. The pandemic and its broad response from the scientific community has forged new relationships among public health practitioners, mathematical modelers, and scientific computing specialists, while revealing critical gaps in exploiting advanced computing systems to support urgent decision making. Informed by our team’s work in applying high-performance computing in support of public health decision makers during the COVID-19 pandemic, we present how Globus technologies are enabling the development of an open science platform for robust epidemic analysis, with the goal of collaborative, secure, distributed, on-demand, and fast time-to-solution analyses to support public health.
The Department of Energy's Integrated Research Infrastructure (IRI) - Globus
We will provide an overview of DOE’s IRI initiative as it moves into early implementation, what drives the IRI vision, and the role of DOE in the larger national research ecosystem.
Listen to the keynote address and hear about the latest developments from Rachana Ananthakrishnan and Ian Foster who review the updates to the Globus Platform and Service, and the relevance of Globus to the scientific community as an automation platform to accelerate scientific discovery.
Enhancing Performance with Globus and the Science DMZ - Globus
ESnet has led the way in helping national facilities—and many other institutions in the research community—configure Science DMZs and troubleshoot network issues to maximize data transfer performance. In this talk we will present a summary of approaches and tips for getting the most out of your network infrastructure using Globus Connect Server.
Extending Globus into a Site-wide Automated Data Infrastructure - Globus
The Rosalind Franklin Institute hosts a variety of scientific instruments, which allow us to capture a multifaceted and multilevel view of biological systems, generating around 70 terabytes of data a month. Distributed solutions such as Globus and Ceph facilitate the storage, access, and transfer of large amounts of data. However, we still must deal with the heterogeneity of the file formats and directory structure at acquisition, which is optimised for fast recording rather than for efficient storage and processing. Our data infrastructure includes local storage at the instruments and workstations, distributed object stores with POSIX and S3 access, remote storage on HPCs, and taped backup. This can pose a challenge in ensuring fast, secure, and efficient data transfer. Globus allows us to handle this heterogeneity, while its Python SDK allows us to automate our data infrastructure using Globus microservices integrated with our data access models. Our data management workflows are becoming increasingly complex and heterogeneous, including desktop PCs, virtual machines, and offsite HPCs, as well as several open-source software tools with different computing and data structure requirements. This complexity demands that data is annotated with enough details about the experiments and the analysis to ensure efficient and reproducible workflows. This talk explores how we extend Globus into different parts of our data lifecycle to create a secure, scalable, and high-performing automated data infrastructure that can provide FAIR[1,2] data for all our science.
1. https://doi.org/10.1038/sdata.2016.18
2. https://www.go-fair.org/fair-principles
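As an illustration of the Python-SDK automation described above, submitting a transfer between two collections takes only a few calls. A hedged sketch follows; the collection UUIDs, paths, and token are placeholders, and a real deployment would obtain tokens through a proper auth flow:

    import globus_sdk

    authorizer = globus_sdk.AccessTokenAuthorizer("TRANSFER-SCOPED-TOKEN")
    tc = globus_sdk.TransferClient(authorizer=authorizer)

    task = globus_sdk.TransferData(
        source_endpoint="INSTRUMENT-COLLECTION-UUID",
        destination_endpoint="ARCHIVE-COLLECTION-UUID",
        label="move acquisition to object store",
    )
    task.add_item("/acquisition/run_0042/", "/archive/run_0042/",
                  recursive=True)

    result = tc.submit_transfer(task)
    print("task id:", result["task_id"])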
Globus Compute with Integrated Research Infrastructure (IRI) workflows - Globus
As part of the DOE Integrated Research Infrastructure (IRI) program, NERSC at Lawrence Berkeley National Lab and ALCF at Argonne National Lab are working closely with General Atomics on accelerating the computing requirements of the DIII-D experiment. As part of the work, the team is investigating ways to speed up the time to solution for many different parts of the DIII-D workflow, including how they run jobs on HPC systems. One of these routes is looking at Globus Compute as a way to replace the current method for managing tasks, and I will give a brief proof of concept showing how Globus Compute could help to schedule jobs and be a tool to connect compute at different facilities.
Reactive Documents and Computational Pipelines - Bridging the Gap - Globus
As scientific discovery and experimentation become increasingly reliant on computational methods, the static nature of traditional publications renders them progressively fragmented and unreproducible. How can workflow automation tools, such as Globus, be leveraged to address these issues and potentially create a new, higher-value form of publication? LivePublication leverages Globus’s custom Action Provider integrations and Compute nodes to capture semantic and provenance information during distributed flow executions. This information is then embedded within an RO-crate and interfaced with a programmatic document, creating a seamless pipeline from instruments, to computation, to publication.
State of ICS and IoT Cyber Threat Landscape Report 2024 preview - Prayukth K V
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio using data from Sectrio's cyber threat intelligence farming facilities spread across over 85 cities around the world. In addition, Sectrio also runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors and newer malware, including new variants and latent threats that are at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on countries – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
Elevating Tactical DDD Patterns Through Object CalisthenicsDorra BARTAGUIZ
After immersing yourself in the blue book and its red counterpart, attending DDD-focused conferences, and applying tactical patterns, you're left with a crucial question: How do I ensure my design is effective? Tactical patterns within Domain-Driven Design (DDD) serve as guiding principles for creating clear and manageable domain models. However, achieving success with these patterns requires additional guidance. Interestingly, we've observed that a set of constraints initially designed for training purposes remarkably aligns with effective pattern implementation, offering a more ‘mechanical’ approach. Let's explore together how Object Calisthenics can elevate the design of your tactical DDD patterns, offering concrete help for those venturing into DDD for the first time!
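To make the idea concrete, here is a small illustration (not from the talk) of two Object Calisthenics rules, "wrap all primitives" and "first-class collections", applied to a DDD value object and a fragment of an aggregate:

```python
# Illustrative sketch (not from the talk): two Object Calisthenics rules --
# "wrap all primitives" and "first-class collections" -- applied to DDD
# tactical patterns.
from dataclasses import dataclass

@dataclass(frozen=True)
class Money:
    # A wrapped primitive becomes an immutable value object with behaviour.
    amount_cents: int
    currency: str

    def add(self, other: "Money") -> "Money":
        if self.currency != other.currency:
            raise ValueError("currency mismatch")
        return Money(self.amount_cents + other.amount_cents, self.currency)

class OrderLines:
    # A first-class collection: the list never leaks, and domain rules
    # about the collection live in one place.
    def __init__(self, currency: str) -> None:
        self._currency = currency
        self._lines: list[Money] = []

    def add(self, line: Money) -> None:
        self._lines.append(line)

    def total(self) -> Money:
        total = Money(0, self._currency)
        for line in self._lines:
            total = total.add(line)
        return total
```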
DevOps and Testing slides at DASA ConnectKari Kakkonen
Slides by me and Rik Marselis from the DASA Connect conference on 30.5.2024. We discuss what testing is, then what agile testing is, and finally what testing in DevOps looks like. We closed with a lovely workshop in which participants explored different ways to think about quality and testing in different parts of the DevOps infinity loop.
Connector Corner: Automate dynamic content and events by pushing a buttonDianaGray10
Here is something new! In our next Connector Corner webinar, we will demonstrate how you can use a single workflow to:
Create a campaign using Mailchimp with merge tags/fields
Send an interactive Slack channel message (using buttons)
Have the message received by managers and peers along with a test email for review
But there’s more:
In a second workflow supporting the same use case, you’ll see:
Your campaign sent to target colleagues for approval
If the “Approve” button is clicked, a Jira/Zendesk ticket is created for the marketing design team
But—if the “Reject” button is pushed, colleagues will be alerted via Slack message
Join us to learn more about this new, human-in-the-loop capability, brought to you by Integration Service connectors.
Speakers:
Akshay Agnihotri, Product Manager
Charlie Greenberg, Host
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Ramesh Iyer
In today's fast-changing business world, companies that fail to adapt and embrace new ideas often struggle to keep up with the competition. Fostering a culture of innovation, however, takes real work: vision, leadership, and a willingness to take risks in the right proportion. Sachin Dev Duggal, co-founder of Builder.ai, has perfected the art of this balance, creating a company culture where creativity and growth are nurtured at every stage.
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Albert Hoitingh
In this session I delve into the encryption technology used in Microsoft 365 and Microsoft Purview, including the concepts of Customer Key and Double Key Encryption.
GraphRAG is All You need? LLM & Knowledge GraphGuy Korland
Guy Korland, CEO and co-founder of FalkorDB, reviews two articles on integrating language models with knowledge graphs:
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap — a review paper on the various uses of knowledge graphs with LLMs:
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
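As a toy illustration of the GraphRAG pattern these papers discuss (retrieve a relevant subgraph, then ground the LLM prompt with it), consider the sketch below. It is an illustration only, not FalkorDB's or Microsoft's implementation:

```python
# Toy sketch of the GraphRAG idea: answer questions by first retrieving
# facts from a knowledge graph, then grounding the LLM prompt with them.
# A real system would use a graph database and an actual LLM call.
KG = {
    ("Marie Curie", "won", "Nobel Prize in Physics"),
    ("Marie Curie", "born_in", "Warsaw"),
    ("Pierre Curie", "married_to", "Marie Curie"),
}

def retrieve(entity: str):
    """Return all triples mentioning the entity (its 1-hop neighbourhood)."""
    return [t for t in KG if entity in (t[0], t[2])]

def build_prompt(question: str, entity: str) -> str:
    facts = "\n".join(f"{s} {p} {o}" for s, p, o in retrieve(entity))
    return f"Answer using only these facts:\n{facts}\n\nQuestion: {question}"

# The resulting grounded prompt would then be sent to an LLM of your choice.
print(build_prompt("Where was Marie Curie born?", "Marie Curie"))
```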
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (the CI/CD process) involves many tools, distributed teams, open-source code, and cloud platforms. A constant focus on speed to market, combined with traditionally slow and manual security checks, has created gaps in continuous security, an important piece of the software supply chain. Organizations today feel more susceptible to external and internal cyber threats due to the vast attack surface of their application supply chain and the lack of end-to-end governance and risk management.
Software teams must secure the delivery process to avoid vulnerabilities and security breaches, and this needs to be achieved with existing toolchains and without extensive rework of the delivery processes. This talk presents strategies and techniques for providing visibility into the true risk of existing vulnerabilities, preventing the introduction of security issues into the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with a passion for making things work and a knack for helping others understand how they work. He brings around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms, and is known for his dynamic presentations on CI/CD and application security integrated into the software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
Epistemic Interaction - tuning interfaces to provide information for AI supportAlan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
Essentials of Automations: Optimizing FME Workflows with ParametersSafe Software
Are you looking to streamline your workflows and boost your projects’ efficiency? Do you find yourself searching for ways to add flexibility and control over your FME workflows? If so, you’re in the right place.
Join us for an insightful dive into the world of FME parameters, a critical element in optimizing workflow efficiency. This webinar marks the beginning of our three-part “Essentials of Automation” series. This first webinar is designed to equip you with the knowledge and skills to utilize parameters effectively: enhancing the flexibility, maintainability, and user control of your FME projects.
Here’s what you’ll gain:
- Essentials of FME Parameters: Understand the pivotal role of parameters, including Reader/Writer, Transformer, User, and FME Flow categories. Discover how they are the key to unlocking automation and optimization within your workflows.
- Practical Applications in FME Form: Delve into key user parameter types including choice, connections, and file URLs. Allow users to control how a workflow runs, making your workflows more reusable. Learn to import values and deliver the best user experience for your workflows while enhancing accuracy.
- Optimization Strategies in FME Flow: Explore the creation and strategic deployment of parameters in FME Flow, including the use of deployment and geometry parameters, to maximize workflow efficiency.
- Pro Tips for Success: Gain insights on parameterizing connections and leveraging new features like Conditional Visibility for clarity and simplicity.
We’ll wrap up with a glimpse into future webinars, followed by a Q&A session to address your specific questions surrounding this topic.
Don’t miss this opportunity to elevate your FME expertise and drive your projects to new heights of efficiency.
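As a small taste of user parameters in practice, the sketch below reads a hypothetical published user parameter named COORD_SYS inside a PythonCaller transformer via fme.macroValues. The parameter name and default are assumptions for illustration:

```python
# Minimal sketch: reading an FME user parameter inside a PythonCaller.
# Assumes a published user parameter named COORD_SYS exists in the
# workspace; the name and default value are illustrative only.
import fme
import fmeobjects

class FeatureProcessor(object):
    def __init__(self):
        # Published user parameters are exposed to Python via fme.macroValues.
        self.target_crs = fme.macroValues.get("COORD_SYS", "EPSG:4326")

    def input(self, feature):
        # Tag each feature with the parameter value so downstream
        # transformers can act on it.
        feature.setAttribute("target_crs", self.target_crs)
        self.pyoutput(feature)

    def close(self):
        pass
```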
Integrating Globus into LRZ's Data Science Storage Service
1. Integrating Globus into LRZ’s Data Science Storage Service
2. Integrating Globus into LRZ’s Data Science Storage Service
GlobusWorld 2019 | 2019-05-01 | Stephan Peinkofer
3. Leibniz Supercomputing Centre of the Bavarian Academy of Sciences and Humanities
IT Service Backbone for the Advancement of Science and Research
- Computer Centre for all Munich Universities
- Regional Computer Centre for all Bavarian Universities
- National Supercomputing Centre (GCS)
- European Supercomputing Centre (PRACE)
- approx. 250 employees, 57 years of IT support
4. Operating Cutting-Edge IT Infrastructure: LRZ as an IT Center of Excellence
- High Performance Computing: SuperMUC-NG, LRZ Linux Cluster
- Virtual Reality and Visualisation: V2C (CAVE, Powerwall)
- High Speed Networking: Munich Scientific Network
- Big Data
- Bavarian State Library Digital Archive
- Further services: storage, network, cloud computing, cluster, training, consultancy, email
5. Data Silos
6. Increasing User Demand
- “I need to share a 400TB dataset with someone in Canada!”
- “My experiment will generate multiple PBs that have to be analyzed and backed up! How?”
- “I want to build a WebApp that allows users to interactively analyze my 500TB SuperMUC simulation data!”
- “I need to share some data on SuperMUC between multiple projects!”
- “I want to analyze a large dataset, generated on SuperMUC, using some special OS image on the LRZ Cloud!”
7. Satisfying User Demands
So basically we need to provide:
- A file system that can be shared amongst the complete LRZ HPC ecosystem
- Some kind of external access mechanism for arbitrary entities
- A Dropbox-like data management approach
8. LRZ Data Science Storage
- High throughput batch processing on LRZ’s Linux Cluster or SuperMUC
- Batch and interactive processing on dedicated, hosted HPC clusters at LRZ
- Interactive processing on the LRZ Compute Cloud
- Remote visualisation on LRZ’s visualisation systems
- External access and sharing via Globus Online
- High performance backup and archive of data on LRZ’s Backup and Archive System
11. DSS Containers
- Example users: Maier (TUM, user tumuser1; LinuxCluster project lxpr1 as lx11xc; SuperMUC project smpr1 as sm11bb) and Huber (LMU, user lmuuser2; LinuxCluster project lxpr2 as lx22bp; SuperMUC project smpr2 as sm33sx)
- DSS POSIX group in IDM/LDAP, e.g. pr45xa-dss-0000
- DSS Container → GPFS independent fileset, e.g. /dss/dssfs01/pr45xa-dss-0000 (drwxrws--- root pr45xa-dss-0000)
12. Technical Integration of Globus to LRZ DSS
Goal: Integrate Globus Sharing into the DSSWeb self-service portal and allow Data Curators to share DSS containers with arbitrary external users.
Problem: Globus lets us control who can share and what can be shared.
Action: We need to control who can share what.
13. Technical Integration of Globus to LRZ DSS
Components: DSSWeb, LRZ MyProxy, Globus Online, the DSS Globus Endpoint, and DSS Container X (container group; container directory /dss/dssfs01/dsscontX).
1. The Data Curator enables Globus sharing for DSS Container X in DSSWeb.
2. DSSWeb’s RobotUser (aka RobotUser@globusid.org) logs in to LRZ MyProxy to get a certificate.
3. DSSWeb enables (activates) the DSS Globus Endpoint.
4. DSSWeb creates the shared endpoint “LRZ DSS Container X”.
5. Globus magic.
6. DSSWeb adds the RobotUser to the container access group.
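For reference, steps 3 and 4 map onto the Globus Transfer API roughly as follows. This is a sketch against the v2-era Python globus-sdk (matching the 2019 timeframe, when shared endpoints predated guest collections), with placeholder UUIDs and tokens rather than LRZ's actual implementation:

```python
# Sketch of steps 3-4 using the Python globus-sdk (v2-era API; all UUIDs
# and tokens are placeholders).
import globus_sdk

tc = globus_sdk.TransferClient(
    authorizer=globus_sdk.AccessTokenAuthorizer("ROBOT_USER_TRANSFER_TOKEN")
)

HOST_ENDPOINT = "DSS-GLOBUS-ENDPOINT-UUID"

# Step 3: activate the host endpoint (here via autoactivation; LRZ's flow
# uses a MyProxy certificate obtained by the RobotUser).
tc.endpoint_autoactivate(HOST_ENDPOINT)

# Step 4: create the shared endpoint rooted at the container directory.
shared = tc.create_shared_endpoint({
    "DATA_TYPE": "shared_endpoint",
    "display_name": "LRZ DSS Container X",
    "host_endpoint": HOST_ENDPOINT,
    "host_path": "/dss/dssfs01/dsscontX/",
})
print("shared endpoint id:", shared["id"])
```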
14. Technical Integration of Globus to LRZ DSS
1. The Data Curator invites bop@wherever.com to access DSS Container X via Globus in DSSWeb.
2. DSSWeb (as RobotUser@globusid.org) checks whether the identity bop@wherever.com is already known to Globus and, if not, creates it.
3. DSSWeb adds a Globus ACL on the shared endpoint “LRZ DSS Container X” for the identity bop@wherever.com.
4. Globus magic.
5. Bop is happy.
(Components as before: LRZ Data Science Storage with DSS Container X, its container group and directory /dss/dssfs01/dsscontX, the DSS Globus Endpoint, and the “LRZ DSS Container X” shared endpoint in Globus Online.)
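Steps 2 and 3 can likewise be sketched with the globus-sdk. The identity lookup relies on the provision behaviour of the Globus Auth get_identities API; all tokens and UUIDs are placeholders:

```python
# Sketch of steps 2-3: resolve (or provision) the invited identity via
# Globus Auth, then grant it read access on the shared endpoint.
# Tokens and UUIDs are placeholders.
import globus_sdk

ac = globus_sdk.AuthClient(
    authorizer=globus_sdk.AccessTokenAuthorizer("ROBOT_USER_AUTH_TOKEN")
)
tc = globus_sdk.TransferClient(
    authorizer=globus_sdk.AccessTokenAuthorizer("ROBOT_USER_TRANSFER_TOKEN")
)

# Step 2: look up bop@wherever.com; provision=True asks Globus Auth to
# create the identity if it does not exist yet.
res = ac.get_identities(usernames="bop@wherever.com", provision=True)
identity_id = res["identities"][0]["id"]

# Step 3: add a read-only ACL rule on the shared endpoint for that identity.
tc.add_endpoint_acl_rule(
    "LRZ-DSS-CONTAINER-X-SHARED-ENDPOINT-UUID",
    {
        "DATA_TYPE": "access",
        "principal_type": "identity",
        "principal": identity_id,
        "path": "/",
        "permissions": "r",
    },
)
```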
15. Legal Integration of Globus to LRZ DSS
Regulation:
- The European Union’s General Data Protection Regulation (GDPR) has been enforced since 2018-05-25.
- Use/integration of cloud services that process PII requires a formal controller-processor agreement.
- Transfer of personal data to third countries requires special safeguards.
HIPAA, NIST, and BAAs to the rescue:
- HIPAA and NIST require technical and organizational security controls roughly similar to those required by GDPR to protect PII.
- Globus agreed to sign a controller-processor agreement that contains the EU Model Clauses.