This document summarizes a presentation about building serverless machine learning applications in Python. It discusses refactoring monolithic ML pipelines into separate feature engineering, training, and inference pipelines. With this split, features can be backfilled from historical data and new models can be trained on a schedule. Online inference pipelines retrieve pre-computed features from a feature store and compute additional features from application data. As a case study, the document gives examples that use the Hopsworks feature store to build a batch prediction service for iris flower data. It promotes serverless ML with Hopsworks, which provides an unlimited free tier.
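The split into feature, training, and inference pipelines can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the code from the presentation: the in-memory `FEATURE_STORE` dictionary is a hypothetical stand-in for a real feature store such as Hopsworks, the function names are invented for this example, the nearest-centroid model is chosen only for brevity, and the iris-style measurements are illustrative values.

```python
from collections import defaultdict
from math import dist

# Hypothetical in-memory stand-in for a feature store (not the Hopsworks API).
FEATURE_STORE = {}


def feature_pipeline(raw_rows):
    """Turn raw measurements into feature vectors and write them to the store."""
    for row in raw_rows:
        FEATURE_STORE[row["id"]] = {
            "features": (
                row["sepal_length"],
                row["sepal_width"],
                row["petal_length"],
                row["petal_width"],
            ),
            # New, unlabelled flowers have no variety yet.
            "label": row.get("variety"),
        }


def training_pipeline():
    """Read labelled features from the store and fit a nearest-centroid model."""
    grouped = defaultdict(list)
    for entry in FEATURE_STORE.values():
        if entry["label"] is not None:
            grouped[entry["label"]].append(entry["features"])
    # The "model" is one mean feature vector (centroid) per variety.
    return {
        label: tuple(sum(col) / len(col) for col in zip(*rows))
        for label, rows in grouped.items()
    }


def batch_inference_pipeline(model, ids):
    """Look up pre-computed features by id and predict the closest centroid."""
    predictions = {}
    for flower_id in ids:
        features = FEATURE_STORE[flower_id]["features"]
        predictions[flower_id] = min(
            model, key=lambda label: dist(model[label], features)
        )
    return predictions


def make_row(flower_id, sl, sw, pl, pw, variety=None):
    """Helper for building illustrative raw rows."""
    row = {"id": flower_id, "sepal_length": sl, "sepal_width": sw,
           "petal_length": pl, "petal_width": pw}
    if variety is not None:
        row["variety"] = variety
    return row


# Backfill: run the feature pipeline once over historical, labelled data.
historical = [
    make_row(1, 5.1, 3.5, 1.4, 0.2, "Setosa"),
    make_row(2, 4.9, 3.0, 1.4, 0.2, "Setosa"),
    make_row(3, 6.3, 3.3, 6.0, 2.5, "Virginica"),
    make_row(4, 5.8, 2.7, 5.1, 1.9, "Virginica"),
]
feature_pipeline(historical)

# Training runs on its own schedule, reading only from the store.
model = training_pipeline()

# Later, the same feature pipeline ingests new, unlabelled flowers.
new_arrivals = [make_row(5, 5.0, 3.4, 1.5, 0.2), make_row(6, 6.5, 3.0, 5.5, 1.8)]
feature_pipeline(new_arrivals)

# Batch inference scores the new rows using features already in the store.
predictions = batch_inference_pipeline(model, [5, 6])
print(predictions)
```

Because the three functions communicate only through the store, each can be scheduled independently: the feature pipeline whenever new data arrives, training on a slower cadence, and batch inference whenever predictions are needed.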
By talking about Microsoft's journey to Cloud cadence, this talk goes through all the DevOps practices such as Infrastructure as Code, CI/CD, Release Management and Hypothesis Driven Development.
It also introduces the impact of Docker and PaaS in DevOps.
MLOps with a Feature Store: Filling the Gap in ML Infrastructure - Data Science Milan
A Feature Store enables machine learning (ML) features to be registered, discovered, and used as part of ML pipelines, thus making it easier to transform and validate the training data that is fed into machine learning systems. Feature stores can also enable consistent engineering of features between training and inference, but to do so, they need a common data processing platform. The first Feature Stores, developed at hyperscale AI companies such as Uber, Airbnb, and Facebook, enabled feature engineering using domain specific languages, providing abstractions tailored to the companies’ feature engineering domains. However, a general purpose Feature Store needs a general purpose feature engineering, feature selection, and feature transformation platform.
In this talk, we describe how we built a general purpose, open-source Feature Store for ML around dataframes and Apache Spark. We will demonstrate how data engineers can transform and engineer features from backend databases and data lakes, while data scientists can use PySpark to select and transform features into train/test data in a file format of choice (.tfrecords, .npy, .petastorm, etc.) on a file system of choice (S3, HDFS). Finally, we will show how the Feature Store enables end-to-end ML pipelines to be factored into feature engineering and data science stages that can each run at different cadences.
Bio:
Fabio Buso is the head of engineering at Logical Clocks AB, where he leads the Feature Store development. Fabio holds a master's degree in cloud computing and services with a focus on data intensive applications, awarded by a joint program between KTH Stockholm and TU Berlin.
Topics: feature store, MLOps.
Modernizing Testing as Apps Re-Architect - DevOps.com
Applications are moving to cloud and containers to boost reliability and speed delivery to production. However, if we use the same old approaches to testing, we'll fail to achieve the benefits of cloud. But what do we really need to change? We know we need to automate tests, but how do we keep our automation assets from becoming obsolete? Automatically provisioning test environments seems close, but some parts of our applications are hard to move to cloud.
Data Agility—A Journey to Advanced Analytics and Machine Learning at Scale - Databricks
This talk walks through the typical workflow of a data scientist or data analyst at Uber: how they get access to Uber's big data and fast data sources for ad hoc and experimental analysis, and how the data platforms make it easy to discover datasets, run interactive queries against our petabyte-scale data lake to identify the features you're interested in, and wrangle and prepare data for advanced analytics and machine learning. Our platforms also provide capabilities to do iterative machine learning and deep learning training seamlessly on single nodes or distributed across our big data and GPU clusters; to analyze, visualize, and share experiment results with colleagues and peers for feedback; and even to productionize data analytics jobs and ML models, all without a degree in CS. Interested? Come learn how Uber's big data platforms and Data Science Workbench put the power of Spark in the hands of our data scientists and data analysts for advanced analytics and ML/DL use cases.
Hamburg Data Science Meetup - MLOps with a Feature Store - Moritz Meister
MLOps is a trend in machine learning (ML) engineering that unifies ML system development (Dev) and ML system operation (Ops). Some ML lifecycle frameworks, such as TensorFlow Extended, are based around end-to-end pipelines that start with raw data and end in production models. During this talk we will introduce the concept of a feature store as the missing piece of ML infrastructure that enables faster, lower-cost deployment of models. We will show how the Hopsworks Feature Store factors monolithic end-to-end ML pipelines into feature and model training pipelines that can each run at different cadences. We will show examples of ingestion and training pipelines, including hyperparameter optimization and model deployment.
Building machine learning service in your business — Eric Chen (Uber) @PAPIs ... - PAPIs.io
When making machine learning applications in Uber, we identified a sequence of common practices and painful procedures, and thus built a machine learning platform as a service. We here present the key components to build such a scalable and reliable machine learning service which serves both our online and offline data processing needs.
Lessons learnt and system built while solving the last mile problem in machine learning - taking models to production. Used for the talk at - http://sched.co/BLvf
Navigating SAP’s Integration Options (Mastering SAP Technologies 2013) - Sascha Wenninger
Provides an overview of popular integration approaches, maps them to SAP's integration tools and concludes with some lessons learnt in their application.
Flink Forward San Francisco 2018: Dave Torok & Sameer Wadkar - "Embedding Fl... - Flink Forward
Operationalizing Machine Learning models is never easy. Our team at Comcast has been challenged with operationalizing predictive ML models to improve customer care experiences. Using Apache Flink we have been able to apply real-time streaming to all aspects of the Machine Learning lifecycle. This includes data feature exploration and preparation by data scientists, deploying live models to serve near-real-time predictions, and validating results for model retraining and iteration. We will share best practices and lessons learned from Flink’s role in our operationalized lifecycle including:
• Executing as the “Prediction Pipeline” – a model container environment for near-real-time streaming and batch predictions
• Preparing streaming features and data sets for model training, as input for production model predictions, and for a continually-updated customer context
• Using connected streams and savepoints for “Live in the Dark”, multi-variant testing, and validation scenarios
• Incorporating Flink’s Queryable State as an approach to the online “Feature Store” – a data catalog for reuse by multiple models and use cases
• Enabling versioned models, versioned feature sets, and versioned data through DevOps approaches.
How to Productionize Your Machine Learning Models Using Apache Spark MLlib 2.... - Databricks
Apache Spark has rapidly become a key tool for data scientists to explore, understand and transform massive datasets and to build and train advanced machine learning models. The question then becomes, how do you deploy these ML models to a production environment? How do you embed what you’ve learned into customer-facing data applications?
In this talk I will discuss best practices on how data scientists productionize machine learning models, do a deep dive with actual case studies, and show live tutorials of a few example architectures and code in Python, Scala, Java and SQL.
Flink Forward San Francisco 2018: - Jinkui Shi and Radu Tudoran "Flink real-t... - Flink Forward
CloudStream is a fully managed service in Huawei Cloud. It supports features such as on-demand billing, easy-to-use Stream SQL in an online SQL editor, real-time testing of Stream SQL, multi-tenancy, and security isolation. We chose Apache Flink as the streaming compute platform. Inside a CloudStream cluster, Flink jobs can run on YARN, Mesos, or Kubernetes. We have also extended Apache Flink to meet the needs of IoT scenarios. Flink reliability has been tested specifically in cooperation with colleges. Finally, we continuously improve the infrastructure around CloudStream, including open source projects and cloud services. CloudStream differs from other real-time analysis cloud services. We can also share the development process, along with its architecture and principles.
In this talk from DevCon TLV we covered:
● The power of HTML5 APIs and how you can use them in your next modern Web Apps.
● On the server side how you can use: Google Cloud Endpoints to scale your API and gain more productivity.
● We did some live Demos and talked about Big Query interfaces.
Near real-time anomaly detection at Lyft - markgrover
Near real-time anomaly detection at Lyft, by Mark Grover and Thomas Weise at Strata NY 2018.
https://conferences.oreilly.com/strata/strata-ny/public/schedule/detail/69155
Deploy Faster Without Failing Faster - Metrics-Driven - Dynatrace User Groups... - Andreas Grabner
Do it like the "DevOps Unicorns" Etsy, Facebook and Co: Deploy more frequently. But how and why? Challenges?
Deploying software faster without failing faster is possible through metrics-driven engineering. Identify problems early on using a "Shift-Left in Quality". This requires a level-up of Dev, Test, Ops, and Biz.
See some of the metrics that I think you need to look at, and how to upgrade your engineering team to produce better quality right from the start.
Immutable Infrastructure: Rise of the Machine Images - C4Media
Video and slides synchronized, mp3 and slide download available at URL http://bit.ly/1WlpXHF.
Axel Fontaine looks at what Immutable Infrastructure is and how it affects scaling, logging, sessions, configuration, service discovery and more. He also looks at how containers and machine images compare and why some things people took for granted may not be necessary anymore. Filmed at qconlondon.com.
Axel Fontaine is the founder and CEO of Boxfuse. Axel is also the creator and project lead of Flyway, the open source tool that makes database migration easy. He is a Continuous Delivery and Immutable Infrastructure expert, a Java Champion, a JavaOne Rockstar and a regular speaker at various large international conferences.
Delivering the power of data using Spring Cloud DataFlow and DataStax Enterpr... - VMware Tanzu
SpringOne Platform 2017
Gilbert Lau, DataStax; Wayne Lund, Pivotal
"Spring Cloud Data Flow satisfies all of the demands of modern streaming and task workloads. A growing number of customers are viewing Pivotal Cloud Foundry as an ideal runtime for these types of workloads to take advantage of all of the microservice architecture features of Spring Boot apps leveraging Spring Cloud Services. This is only half of the equation. Once the streaming data is persisted in their database, our customers want to generate actionable insights to provide the best customer experience and stay on top of the competitive marketplace. DataStax Enterprise (DSE) is a single and unified big data platform with the Apache Cassandra NoSQL database at its core. Integrated within each node of DSE is powerful indexing, search through Apache Solr, analytics through Apache Spark, and enterprise-ready graph functionality. It is by far the only operational data platform which can scale linearly in excess of 1,000 nodes, with no single point of failure, and is capable of providing real-time active-everywhere replication across many datacenters and cloud providers.
In this presentation and demo we will take a common social data set and show SCDF advantages on PCF for microservice scaling and pipelining data into a DataStax Enterprise Cassandra NoSQL database. We will then extract meaningful information through DataStax Enterprise Search, DataStax Enterprise Analytics, and DataStax Cassandra Service Broker Tile for PCF using a Spring Boot Dashboard application."
Hydrosphere.io for ODSC: Webinar on Kubeflow - Rustem Zakiev
Webinar video: https://www.youtube.com/watch?v=Y3_fcJBgpMw
Kubeflow and Beyond: Automation of Model Training, Deployment, Testing, Monitoring, and Retraining
Speakers:
Stepan Pushkarev, CTO, Hydrosphere.io, and Ilnur Garifullin, ML Engineer, Hydrosphere.io
Abstract: Very often a workflow of training models and delivering them to the production environment contains loads of manual work. This could mean building a Docker image and deploying it to a Kubernetes cluster, or packaging the model as a Python package and installing it in your Python application, or even changing your Java classes with the defined weights and recompiling the whole project. Not to mention that all of this should be followed by testing your model's performance. It could hardly be called "continuous delivery" if you do it all manually. Imagine you could run the whole process of assembling/training/deploying/testing/running a model via a single command in your terminal. In this webinar, we will present a way to build the whole workflow of data gathering/model training/model deployment/model testing into a single flow and run it with a single command.
Any startup has to have a clear go-to-market strategy from the beginning. Similarly, any data science project has to have a go-to-production strategy from its first days, so it could go beyond proof-of-concept. Machine learning and artificial intelligence in production would result in hundreds of training pipelines and machine learning models that are continuously revised by teams of data scientists and seamlessly connected with web applications for tenants and users.
In this demo-based talk we will walk through the best practices for simplifying machine learning operations across the enterprise and providing a serverless abstraction for data scientists and data engineers, so they could train, deploy and monitor machine learning models faster and with better quality.
PyData Berlin 2023 - Mythical ML Pipeline.pdf - Jim Dowling
This talk is a mental map for building ML systems as ML Pipelines that are factored into Feature Pipelines, Training Pipelines, and Inference Pipelines.
More Related Content
Similar to _Python Ireland Meetup - Serverless ML - Dowling.pdf
Building Hopsworks, a cloud-native managed feature store for machine learning - Jim Dowling
Cloud Native London talk about the control layer of Hopsworks.ai and our choice of cloud native services. We built our own multi-tenant services as cloud native services, for the most part.
Metadata and Provenance for ML Pipelines with Hopsworks - Jim Dowling
This talk describes the scale-out, consistent metadata architecture of Hopsworks and how we use it to support custom metadata and provenance for ML Pipelines with Hopsworks Feature Store, NDB, and ePipe. The talk is here: https://www.youtube.com/watch?v=oPp8PJ9QBnU&feature=emb_logo
Asynchronous Hyperparameter Search with Spark on Hopsworks and Maggy - Jim Dowling
Spark AI Summit Europe 2019 talk: Asynchronous Hyperparameter Search with Spark on Hopsworks and Maggy. How can you do directed search efficiently with Spark? The answer is Maggy - asynchronous directed search on PySpark.
Hopsworks at Google AI Huddle, Sunnyvale - Jim Dowling
Hopsworks is a platform for designing and operating End to End Machine Learning using PySpark and TensorFlow/PyTorch. Early access is now available on GCP. Hopsworks includes the industry's first Feature Store. Hopsworks is open-source.
Hopsworks in the cloud Berlin Buzzwords 2019 - Jim Dowling
This talk, given at Berlin Buzzwords 2019, describes the recent progress in making Hopsworks a cloud-native platform, with HA data-center support added for HopsFS.
State of ICS and IoT Cyber Threat Landscape Report 2024 preview - Prayukth K V
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio using data from Sectrio's cyber threat intelligence farming facilities, spread across more than 85 cities around the world. In addition, Sectrio runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors and newer malware, including new variants and latent threats that are at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on counties – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
Connector Corner: Automate dynamic content and events by pushing a button - DianaGray10
Here is something new! In our next Connector Corner webinar, we will demonstrate how you can use a single workflow to:
Create a campaign using Mailchimp with merge tags/fields
Send an interactive Slack channel message (using buttons)
Have the message received by managers and peers along with a test email for review
But there’s more:
In a second workflow supporting the same use case, you’ll see:
Your campaign sent to target colleagues for approval
If the “Approve” button is clicked, a Jira/Zendesk ticket is created for the marketing design team
But—if the “Reject” button is pushed, colleagues will be alerted via Slack message
Join us to learn more about this new, human-in-the-loop capability, brought to you by Integration Service connectors.
And...
Speakers:
Akshay Agnihotri, Product Manager
Charlie Greenberg, Host
DevOps and Testing slides at DASA ConnectKari Kakkonen
My and Rik Marselis slides at 30.5.2024 DASA Connect conference. We discuss about what is testing, then what is agile testing and finally what is Testing in DevOps. Finally we had lovely workshop with the participants trying to find out different ways to think about quality and testing in different parts of the DevOps infinity loop.
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...UiPathCommunity
💥 Speed, accuracy, and scaling – discover the superpowers of GenAI in action with UiPath Document Understanding and Communications Mining™:
See how to accelerate model training and optimize model performance with active learning
Learn about the latest enhancements to out-of-the-box document processing – with little to no training required
Get an exclusive demo of the new family of UiPath LLMs – GenAI models specialized for processing different types of documents and messages
This is a hands-on session specifically designed for automation developers and AI enthusiasts seeking to enhance their knowledge in leveraging the latest intelligent document processing capabilities offered by UiPath.
Speakers:
👨🏫 Andras Palfi, Senior Product Manager, UiPath
👩🏫 Lenka Dulovicova, Product Program Manager, UiPath
Epistemic Interaction - tuning interfaces to provide information for AI supportAlan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
UiPath Test Automation using UiPath Test Suite series, part 4DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 4. In this session, we will cover Test Manager overview along with SAP heatmap.
The UiPath Test Manager overview with SAP heatmap webinar offers a concise yet comprehensive exploration of the role of a Test Manager within SAP environments, coupled with the utilization of heatmaps for effective testing strategies.
Participants will gain insights into the responsibilities, challenges, and best practices associated with test management in SAP projects. Additionally, the webinar delves into the significance of heatmaps as a visual aid for identifying testing priorities, areas of risk, and resource allocation within SAP landscapes. Through this session, attendees can expect to enhance their understanding of test management principles while learning practical approaches to optimize testing processes in SAP environments using heatmap visualization techniques
What will you get from this session?
1. Insights into SAP testing best practices
2. Heatmap utilization for testing
3. Optimization of testing processes
4. Demo
Topics covered:
Execution from the test manager
Orchestrator execution result
Defect reporting
SAP heatmap example with demo
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Generating a custom Ruby SDK for your web service or Rails API using Smithyg2nightmarescribd
Have you ever wanted a Ruby client API to communicate with your web service? Smithy is a protocol-agnostic language for defining services and SDKs. Smithy Ruby is an implementation of Smithy that generates a Ruby SDK using a Smithy model. In this talk, we will explore Smithy and Smithy Ruby to learn how to generate custom feature-rich SDKs that can communicate with any web service, such as a Rails JSON API.
Accelerate your Kubernetes clusters with Varnish CachingThijs Feryn
A presentation about the usage and availability of Varnish on Kubernetes. This talk explores the capabilities of Varnish caching and shows how to use the Varnish Helm chart to deploy it to Kubernetes.
This presentation was delivered at K8SUG Singapore. See https://feryn.eu/presentations/accelerate-your-kubernetes-clusters-with-varnish-caching-k8sug-singapore-28-2024 for more details.
Securing your Kubernetes cluster_ a step-by-step guide to success !KatiaHIMEUR1
Today, after several years of existence, an extremely active community and an ultra-dynamic ecosystem, Kubernetes has established itself as the de facto standard in container orchestration. Thanks to a wide range of managed services, it has never been so easy to set up a ready-to-use Kubernetes cluster.
However, this ease of use means that the subject of security in Kubernetes is often left for later, or even neglected. This exposes companies to significant risks.
In this talk, I'll show you step-by-step how to secure your Kubernetes cluster for greater peace of mind and reliability.
JMeter webinar - integration with InfluxDB and GrafanaRTTS
Watch this recorded webinar about real-time monitoring of application performance. See how to integrate Apache JMeter, the open-source leader in performance testing, with InfluxDB, the open-source time-series database, and Grafana, the open-source analytics and visualization application.
In this webinar, we will review the benefits of leveraging InfluxDB and Grafana when executing load tests and demonstrate how these tools are used to visualize performance metrics.
Length: 30 minutes
Session Overview
-------------------------------------------
During this webinar, we will cover the following topics while demonstrating the integrations of JMeter, InfluxDB and Grafana:
- What out-of-the-box solutions are available for real-time monitoring JMeter tests?
- What are the benefits of integrating InfluxDB and Grafana into the load testing stack?
- Which features are provided by Grafana?
- Demonstration of InfluxDB and Grafana using a practice web application
To view the webinar recording, go to:
https://www.rttsweb.com/jmeter-integration-webinar
Knowledge engineering: from people to machines and back
_Python Ireland Meetup - Serverless ML - Dowling.pdf
1. Python Ireland Meetup, Sep 14th 2022
Jim Dowling, CEO @ Hopsworks and Assoc Prof @ KTH
Serverless ML in Python
Predict surf height at Lahinch Beach
2. Beyond Notebooks: Don’t just train models, build “Prediction Services”
❌ Static Datasets
❌ Data is downloaded from a single URL
❌ Features for ML are engineered, correct, and unbiased
❌ Use a model evaluation metric (accuracy) to communicate the value of your model
💚 Data never stops coming
💚 Data comes from different heterogeneous data sources
💚 Write code to extract and validate the features from input data
💚 Communicate the value of your model with a UI or app/service
💚 Build and deploy a reliable service around your model with MLOps
3. Serverless ML “Prediction Service”
[Diagram labels: Features Pipelines & Batch Prediction Pipelines (once or twice/day); HOPSWORKS.AI (Features, Models); train model; Predictions (twice/day); Publish to UI; Github Pages UI]
https://github.com/jimdowling/cjsurf
4. Alternatives to Github Actions for Serverless Python
Serverless Python Functions
● render.com
● pythonanywhere.com
● replit.com
● deta.sh
● linode.com
● hetzner.com
● digitalocean.com
● AWS Lambda functions
● Google Cloud Functions
Orchestration Platforms
● Astronomer (Airflow)
● Dagster
● Prefect
● Azure Data Factory
● Amazon Managed Workflows for Apache Airflow (MWAA)
● Google Cloud Composer
● Databricks Workflows
6. What height will the surf be at Lahinch this weekend?
When I lived in Dublin, I always wanted to know what I would do the next weekend…
Surf's up? No / Yes
7. We built a system called CJSurf to predict surf at Lahinch
Open Ocean Swell Predictions Lahinch Beach Surf Height Predictions
8. Swells/Waves have (1) height, (2) period, (3) direction
Height
Period is the time between waves
Direction
Wave height at the point is 4 times higher than wave height at the beach
10. Accurate Surf Height Observations by Lahinch Surf Shop
Reports are published at 10am every day by
https://www.lahinchsurfshop.com/
11. Can I write CJSurf from 2004 with free managed services?
Can we rewrite a LAMP architecture to a free serverless Python architecture in 2022?
Production Machine Learning in 2004!
[Diagram: data sources lahinchsurfshop.com and noaa.gov (62081, 62105); a Java data collector with K-NN predictions, run as a cron job, writes features and predictions to MySQL; a PHP web app looks up the precomputed predictions]
12. Serverless Analytical ML Application in Python (2022)
SERVERLESS COMPUTE: surf-report-features.ipynb, swell-features.ipynb, train-model.ipynb, batch-predict-surf.ipynb
SERVERLESS STATE: Hopsworks Feature Store, Hopsworks Model Registry
SERVERLESS UI: Github Pages (latest_lahinch.png)
Data sources: Lahinch, NOAA
Flows: insert DataFrames; add model; download model
https://github.com/jimdowling/cjsurf
14. Feature Engineering with Pandas/Spark/SQL/Flink
Feature Store
HOPSWORKS
DataFrames DataFrames/Files
Aggregations
Dimensionality Reductions
Validations
Normalization
One-hot encoding
SQL
15. Feature Store: write to Feature Groups, read from Feature Views
HOPSWORKS FEATURE STORE: Feature Groups, Feature Views
Search, Versioning, Metrics, Lineage, Source Code
Write DataFrames: Real-Time Features, Batch Data
Online API: Read Feature Vectors
Offline API: Read Files/DataFrames
17. Feature Engineering: at what time does the swell hit Lahinch (“hits_at”)?
[Diagram: Prediction at Time=0; “hits_at” Lahinch at Time=?]
The swell velocity is calculated by multiplying the swell period by 1.5. But we also need to consider swell direction.
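A minimal sketch of the arrival-time calculation, under stated assumptions. The 1.5 multiplier matches the common deep-water rule of thumb that swell speed in knots is roughly 1.5 times the period in seconds; the units, the function names, and the 150 nautical mile distance are assumptions for illustration, not values from the talk. Swell direction is ignored here.

```python
from datetime import datetime, timedelta


def swell_speed_knots(period_s: float) -> float:
    """Rule of thumb from the slide: speed = 1.5 * period (units assumed: knots, seconds)."""
    return 1.5 * period_s


def estimate_hits_at(observed_at: datetime, period_s: float, distance_nm: float) -> datetime:
    """Estimate when a swell observed offshore reaches the beach.

    distance_nm is a hypothetical buoy-to-beach distance in nautical miles.
    """
    travel_hours = distance_nm / swell_speed_knots(period_s)
    return observed_at + timedelta(hours=travel_hours)


# Example: a 10 s swell travels at about 15 knots, so 150 nm takes 10 hours.
observed = datetime(2004, 1, 1, 0, 0)
eta = estimate_hits_at(observed, period_s=10.0, distance_nm=150.0)
print(eta)
```

Longer-period swells arrive sooner, which is why the period has to be part of the feature computation rather than a fixed travel delay.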
18. Swell Direction and the Swell Window at Lahinch
SWELL WINDOW for Lahinch
Swell directions that work for Lahinch: ~15-120 degrees
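The swell window can be turned into a simple boolean feature. This is a sketch only: the bounds are the approximate values from the slide, the angle convention is whatever the slide uses, and the function and constant names are hypothetical.

```python
# Approximate swell window for Lahinch, in degrees, as given on the slide.
SWELL_WINDOW_DEG = (15.0, 120.0)


def in_swell_window(direction_deg: float, window=SWELL_WINDOW_DEG) -> bool:
    """Return True if the swell direction falls inside the window (inclusive)."""
    low, high = window
    return low <= direction_deg % 360 <= high


for direction in (88, 100, 270):
    print(direction, in_swell_window(direction))
```

A swell outside the window can then be filtered out or down-weighted before it is used as a feature.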
25. Join Features to create Point-in-time Correct Training Data
lahinch_surf_reports (updated every 24 hrs)
beach_id  obs_time          height  min  max
1         2004-01-01 10:00  1       1    1
1         2004-01-02 11:00  1.5     1    2
1         2004-01-03 12:00  3       2    4
noa_swells (updated every 6 hrs)
buoy_id  hits_at           height  direction  period
62105    2004-01-01 00:00  1.25    88         9.8
62105    2004-01-02 06:00  1.30    92         10.2
62105    2004-01-03 12:00  2.45    100        11.4
Training Data: Point-in-time Correct JOIN (no future data leakage)
obs_time => hits_at  height (swell)  direction  period  height (label)
2004-01-01 10:00     1.25            88         9.8     1.5
2004-01-02 11:00     1.30            92         10.2    2
2004-01-03 12:00     2.45            100        11.4    3
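The point-in-time correct join can be sketched in plain Python as follows. For each surf report, it picks the most recent swell whose hits_at is not later than the report's obs_time, so no future swell data leaks into a training row. The rows are taken from the two source tables above, with labels from the height column of lahinch_surf_reports; the function name and tuple layout are assumptions for illustration, and on DataFrames `pandas.merge_asof` performs the same kind of backward-looking join.

```python
from bisect import bisect_right
from datetime import datetime


def ts(text: str) -> datetime:
    return datetime.strptime(text, "%Y-%m-%d %H:%M")


# (obs_time, observed surf height), the label
surf_reports = [
    (ts("2004-01-01 10:00"), 1.0),
    (ts("2004-01-02 11:00"), 1.5),
    (ts("2004-01-03 12:00"), 3.0),
]

# (hits_at, swell height, direction, period), the features
swells = [
    (ts("2004-01-01 00:00"), 1.25, 88, 9.8),
    (ts("2004-01-02 06:00"), 1.30, 92, 10.2),
    (ts("2004-01-03 12:00"), 2.45, 100, 11.4),
]


def point_in_time_join(reports, swell_rows):
    """Join each report to the latest swell with hits_at <= obs_time."""
    swell_rows = sorted(swell_rows, key=lambda row: row[0])
    hit_times = [row[0] for row in swell_rows]
    joined = []
    for obs_time, label in sorted(reports):
        idx = bisect_right(hit_times, obs_time)
        if idx == 0:
            continue  # no swell known yet at obs_time: skip rather than leak
        _, height, direction, period = swell_rows[idx - 1]
        joined.append((obs_time, height, direction, period, label))
    return joined


training_data = point_in_time_join(surf_reports, swells)
for row in training_data:
    print(row)
```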
33. Beyond Notebooks and Monolithic ML Pipelines
[Diagram: Raw Data -> Feature Engineering -> Train Model -> Evaluate Model]
● A monolithic ML pipeline is a single program that transforms raw data into features, then trains and scores the model
● No easy path to production, so often just thrown over the wall to ops :)
34. Refactor Monolithic Pipelines into Feature, Training, Inference Pipelines
[Diagram: Data Source (new data) and Historical Data (backfill) -> Feature Pipeline (run on a schedule) -> features -> Hopsworks -> training data -> Training Pipeline (run on-demand) -> model -> Hopsworks; Hopsworks -> inference data and model -> Batch Inference Pipeline -> predictions]
● A feature pipeline to create features from new live data or to backfill features from historical data
● A training pipeline that can be run when a new model is needed
● An inference pipeline (either batch or online) that takes features from the feature store, and if the model is online, combines them with online features.
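The three-pipeline split can be sketched as three independent functions that only share state through a store. Everything here is hypothetical: `InMemoryFeatureStore` is a toy stand-in and not the Hopsworks API, and the feature and the one-parameter model are chosen only to keep the example short.

```python
from statistics import mean


class InMemoryFeatureStore:
    """Toy stand-in for a feature store (hypothetical, not the Hopsworks API)."""

    def __init__(self):
        self.rows = []

    def insert(self, rows):
        self.rows.extend(rows)

    def read(self):
        return list(self.rows)


def feature_pipeline(raw_rows, store):
    """Compute features from raw records; the same code serves backfill and new data."""
    store.insert(
        {"height_x_period": r["height"] * r["period"], "label": r.get("surf")}
        for r in raw_rows
    )


def training_pipeline(store):
    """Fit label ~ slope * feature on the labeled rows; run when a new model is needed."""
    labeled = [r for r in store.read() if r["label"] is not None]
    return {"slope": mean(r["label"] / r["height_x_period"] for r in labeled)}


def batch_inference_pipeline(store, model):
    """Score the rows that have no label yet."""
    unlabeled = [r for r in store.read() if r["label"] is None]
    return [model["slope"] * r["height_x_period"] for r in unlabeled]


store = InMemoryFeatureStore()
historical = [
    {"height": 1.0, "period": 10.0, "surf": 2.0},
    {"height": 2.0, "period": 10.0, "surf": 4.0},
]
feature_pipeline(historical, store)                            # backfill
model = training_pipeline(store)                               # run on-demand
feature_pipeline([{"height": 3.0, "period": 10.0}], store)     # new data
predictions = batch_inference_pipeline(store, model)           # run on a schedule
print(model, predictions)
```

Because the pipelines communicate only through the store and the saved model, each one can run on its own schedule and be replaced without touching the others.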
35. Online Inference Pipelines are part of Model Serving Infra
● Some features are pre-computed and retrieved from the feature store (typically those that require history and context information)
● Some features are computed on-demand (at run-time) with application-supplied data (and possibly also history/context)
[Diagram: Batch Source, Stream Source and Historical Data (backfill) -> Feature Pipeline (run on a schedule) -> features -> Hopsworks -> training data -> Training Pipeline (run on-demand) -> model -> Model Serving; an Application or Service (Operational Service) sends a request to Model Serving, which combines precomputed features from Hopsworks with on-demand features and returns a prediction]
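A minimal sketch of how an online inference pipeline might assemble its input, assuming a dictionary as the precomputed-feature lookup. All names here (`precomputed_features`, `avg_value_30d`, `value_vs_avg`, the threshold rule) are hypothetical and only illustrate the split between precomputed and on-demand features.

```python
# Precomputed features, keyed by entity id; in practice these would be read
# from the feature store's online API.
precomputed_features = {
    "entity_42": {"avg_value_30d": 50.0, "event_count_30d": 12},
}


def build_feature_vector(request: dict) -> dict:
    """Merge precomputed features with features computed from the request itself."""
    precomputed = precomputed_features[request["entity_id"]]
    on_demand = {
        # Needs both application-supplied data and history/context.
        "value_vs_avg": request["value"] / precomputed["avg_value_30d"],
    }
    return {**precomputed, **on_demand}


def predict(feature_vector: dict) -> bool:
    """Placeholder model: flags requests far above the historical average."""
    return feature_vector["value_vs_avg"] > 3.0


request = {"entity_id": "entity_42", "value": 200.0}
vector = build_feature_vector(request)
print(vector, predict(vector))
```

The application only has to send the entity id and the request data; the history-dependent features never travel with the request.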
36. Case Study: Iris Flowers as a Batch Prediction Service
Pipelines: iris-feature-pipeline.ipynb, iris-train-knn-model.ipynb, iris-batch-inference-pipeline.ipynb
Data sources: iris.csv (backfill), Synthetic Data (new data)
Hopsworks: features, training data, register model (iris_model)
Scheduling: GH Actions (once/day); Colab (run on-demand)
Output: DataFrame, Github Pages UI
https://github.com/featurestoreorg/serverless-ml-course/tree/main/src/01-module
38. Serverless ML Flywheels with Hopsworks
PyData London Exclusive: limited registrations now available at:
https://app.hopsworks.ai
Our Promise to you:
Time Unlimited Free Tier