The Data Lake Engine Data Microservices in Spark using Apache Arrow Flight
by Databricks
Machine learning pipelines are a hot topic at the moment. Moving data through the pipeline in an efficient and predictable way is one of the most important aspects of running machine learning models in production.
Vertex AI: Pipelines for your MLOps workflows
by Márton Kodok
In recent years, one of the biggest trends in application development has been the rise of Machine Learning solutions, tools, and managed platforms. Vertex AI is a managed unified ML platform for all your AI workloads. On the MLOps side, Vertex AI Pipelines lets you adopt experiment pipelining that goes beyond the classic build, train, evaluate, and deploy cycle for a model. It is engineered for data scientists and data engineers, and it’s a tremendous help for teams that don’t have DevOps or sysadmin engineers, as infrastructure management overhead has been almost completely eliminated.
Based on practical examples, we will demonstrate how Vertex AI Pipelines scores high in terms of developer experience and how it fits custom ML needs, and we will analyze the results. It’s a toolset for a fully-fledged machine learning workflow: a sequence of steps in the model development and deployment cycle, such as data preparation/validation, model training, hyperparameter tuning, model validation, and model deployment. Vertex AI comes with all standard resources plus an ML metadata store, a fully managed feature store, and a fully managed pipelines runner.
Vertex AI Pipelines is a managed serverless toolkit, which means you don't have to fiddle with infrastructure or back-end resources to run workflows.
Building an ML Platform with Ray and MLflow
by Databricks
A successful machine learning platform allows ML practitioners to focus solely on their experiments and models and minimizes the time it takes to develop ML applications and take them to production. However, building an ML Platform is typically not an easy task due to the many different components involved in the process. In this talk, we will show how two open source projects, Ray (https://ray.io/) and MLflow (https://mlflow.org/), work together to make it easy for ML platform developers to add scaling and experiment management to their platform.
We will first provide an overview of Ray and its native libraries: Ray Tune (https://tune.io) for distributed hyperparameter tuning and Ray Serve (https://docs.ray.io/en/master/serve/index.html) for scalable model serving. Then we will showcase how MLflow provides a perfect solution for managing experiments through integrations with Ray for tracking and model deployment. Finally, we will finish with a demo of an ML platform built on Ray, MLflow, and other open source tools.
by Darin Briskman, Technical Evangelist, AWS
Database Freedom means being able to use the database engine that’s right for you as your needs evolve. Being locked into a specific technology can prevent you from achieving your mission. Fortunately, AWS Database Migration Service makes it easy to switch between different database engines. We’ll look at how to use Schema Migration Tool with DMS to switch from a commercial database to open source. You’ll need a laptop with a Firefox or Chrome browser.
Mixing Analytic Workloads with Greenplum and Apache Spark
by VMware Tanzu
Apache Spark is a popular in-memory data analytics engine because of its speed, scalability, and ease of use. It also fits well with DevOps practices and cloud-native software platforms. It’s good for data exploration, interactive analytics, and streaming use cases.
However, Spark, like other data-processing platforms, is not one size fits all. Different versions of Spark support different feature sets, and Spark’s machine-learning libraries can also vary in important ways between versions, or may lack the right algorithm.
In this webinar, you’ll learn:
- How to integrate data warehouse workloads with Spark
- Which workloads are better for Greenplum and for Spark
- How to use the Greenplum-Spark connector
Presenter: Kong Yew Chan, Product Manager, Pivotal
The breadth and depth of Azure products that fall under the AI and ML umbrella can be difficult to follow. In this presentation I’ll first define exactly what AI, ML, and deep learning are, and then go over the various Microsoft AI and ML products and their use cases.
Databricks offers a Software-as-a-Service-like experience (or Spark-as-a-service): a tool for curating and processing massive amounts of data; developing, training, and deploying models on that data; and managing the whole workflow throughout the project. It is aimed at those who are comfortable with Apache Spark, as it is 100% based on Spark and is extensible with support for Scala, Java, R, and Python alongside Spark SQL, GraphX, Streaming, and the Machine Learning Library (MLlib). It has built-in integration with many data sources, includes a workflow scheduler, allows for real-time workspace collaboration, and offers performance improvements over traditional Apache Spark.
"SPARQL Cheat Sheet" is a short collection of slides intended to act as a guide to SPARQL developers. It includes the syntax and structure of SPARQL queries, common SPARQL prefixes and functions, and help with RDF datasets.
The "SPARQL Cheat Sheet" is intended to accompany the SPARQL By Example slides available at http://www.cambridgesemantics.com/2008/09/sparql-by-example/ .
Simplify Data Conversion from Spark to TensorFlow and PyTorch
by Databricks
In this talk, I would like to introduce an open-source tool built by our team that simplifies the data conversion from Apache Spark to deep learning frameworks.
Imagine you have a large dataset, say 20 GB, and you want to use it to train a TensorFlow model. Before feeding the data to the model, you need to clean and preprocess it using Spark. Now you have your dataset in a Spark DataFrame. When it comes to the training part, you may run into a problem: how can I convert my Spark DataFrame to a format recognized by my TensorFlow model?
The existing data conversion process can be tedious. For example, to convert an Apache Spark DataFrame to a TensorFlow Dataset file format, you need to either save the Apache Spark DataFrame on a distributed filesystem in Parquet format and load the converted data with third-party tools such as Petastorm, or save it directly in TFRecord files with spark-tensorflow-connector and load it back using TFRecordDataset. Both approaches take more than 20 lines of code to manage the intermediate data files, rely on different parsing syntax, and require extra attention for handling vector columns in the Spark DataFrames. In short, all these engineering frictions greatly reduce data scientists’ productivity.
The Databricks Machine Learning team contributed a new Spark Dataset Converter API to Petastorm to simplify these tedious data conversion steps. With the new API, it takes a few lines of code to convert a Spark DataFrame to a TensorFlow Dataset or a PyTorch DataLoader with default parameters.
In the talk, I will use an example to show how to use the Spark Dataset Converter to train a TensorFlow model and how simple it is to go from single-node training to distributed training on Databricks.
Azure Databricks is a fast, easy, and collaborative Apache Spark-based analytics platform optimized for Azure. Designed in collaboration with the founders of Apache Spark, Azure Databricks combines the best of Databricks and Azure to help customers accelerate innovation with one-click setup, streamlined workflows, and an interactive workspace that enables collaboration between data scientists, data engineers, and business analysts. As an Azure service, it gives customers automatic access to native integration with other Azure services such as Power BI, SQL Data Warehouse, and Cosmos DB, as well as enterprise-grade Azure security, including Active Directory integration, compliance, and enterprise-grade SLAs.
This workshop presentation from Enterprise Knowledge team members Joe Hilger, Founder and COO, and Sara Nash, Technical Analyst, was delivered on June 8, 2020 as part of the Data Summit 2020 virtual conference. The 3-hour workshop provided an interdisciplinary group of participants with a definition of what a knowledge graph is, how it is implemented, and how it can be used to increase the value of your organization’s data. This slide deck gives an overview of the KM concepts that are necessary for the implementation of knowledge graphs as a foundation for Enterprise Artificial Intelligence (AI). Hilger and Nash also outlined four use cases for knowledge graphs, including recommendation engines and natural language query on structured data.
Self-Service Data Ingestion Using NiFi, StreamSets & Kafka
by Guido Schmutz
Many Big Data and IoT use cases are based on combining data from multiple data sources and making it available on a Big Data platform for analysis. The data sources are often very heterogeneous, ranging from simple files and databases to high-volume event streams from sensors (IoT devices). It’s important to retrieve this data in a secure and reliable manner and integrate it with the Big Data platform so that it is available for analysis in real time (stream processing) as well as in batch (typical big data processing). In the past, some new tools have emerged that are especially capable of handling the process of integrating data from outside, often called Data Ingestion. From an outside perspective, they are very similar to traditional Enterprise Service Bus infrastructures, which larger organizations often use to handle message-driven and service-oriented systems. But there are also important differences: they are typically easier to scale horizontally, offer a more distributed setup, are capable of handling high volumes of data/messages, provide very detailed monitoring at the message level, and integrate very well with the Hadoop ecosystem. This session will present and compare Apache Flume, Apache NiFi, StreamSets, and the Kafka ecosystem and show how they handle data ingestion in a Big Data solution architecture.
Intro to Vertex AI, unified MLOps platform for Data Scientists & ML Engineers
by Daniel Zivkovic
#MLOps is a hot buzzword, just like #DevOps before it. It sparked a gold rush for software vendors, so it's hard to choose the best tool for your needs. Vertex AI is a unified MLOps platform for the entire #AI #workflow on #GoogleCloud. It is the 3rd iteration of the Google Cloud #ML platform (since its original launch), and we think they did it right (this time).
That's why #ServerlessTO invited 2 AI/ML gurus from #GCP (Jarek Kazmierczak & Brian Kang) to introduce #VertexAI to you.
The lecture recording with Q&A is at https://youtu.be/X1S7360ip-k
MEETUP "CODE-ALONG" RESOURCES
Vertex workbench - Managed and User-managed Notebooks
https://cloud.google.com/vertex-ai/docs/workbench/managed/quickstarts
Example that the training code was based on - Fashion MNIST dataset
https://www.tensorflow.org/tutorials/keras/classification
Hyperparameter tuning codelab
https://codelabs.developers.google.com/vertex_hyperparameter_tuning
Vertex pipeline codelabs
https://codelabs.developers.google.com/vertex-pipelines-intro
https://codelabs.developers.google.com/vertex-pipelines-custom-model
CI/CD slides
https://github.com/shivajid/MLOpsCICD/blob/master/presentation/AI%20Workshop%20Day4.pdf
CI/CD github example
https://github.com/shivajid/MLOpsCICD
Model monitoring example
https://github.com/GoogleCloudPlatform/vertex-ai-samples/blob/master/notebooks/official/model_monitoring/model_monitoring.ipynb
Best practices for MLOps
https://cloud.google.com/architecture/mlops-continuous-delivery-and-automation-pipelines-in-machine-learning
https://cloud.google.com/resources/mlops-whitepaper
Official Vertex AI Github repository
https://github.com/GoogleCloudPlatform/vertex-ai-samples/
MEETUP CHAT LINKS
https://github.com/GoogleCloudPlatform/vertex-ai-samples/blob/master/notebooks/notebook_template.ipynb
https://github.com/GoogleCloudPlatform/vertex-ai-samples/tree/master/notebooks/official/custom
https://github.com/GoogleCloudPlatform/vertex-ai-samples/tree/master/notebooks/community/sdk
https://cloud.google.com/architecture/ml-on-gcp-best-practices#model-deployment-and-serving
https://www.youtube.com/watch?v=ntBEQdD1IeQ&list=PLd31CCJlr9FrZazLqRg1Lxq7xw9b6VNP6&index=3
Getting Started with Databricks SQL Analytics
by Databricks
It has long been said that business intelligence needs a relational warehouse, but that view is changing. With the Lakehouse architecture being shouted from the rooftops, Databricks have released SQL Analytics, an alternative workspace for SQL-savvy users to interact with an analytics-tuned cluster. But how does it work? Where do you start? What does a typical Data Analyst’s user journey look like with the tool?
This session will introduce the new workspace and walk through the various key features – how you set up a SQL Endpoint, the query workspace, creating rich dashboards and connecting up BI tools such as Microsoft Power BI.
If you’re truly trying to create a Lakehouse experience that satisfies your SQL-loving Data Analysts, this is a tool you’ll need to be familiar with and include in your design patterns, and this session will set you on the right path.
Spark as a Service with Azure Databricks
by Lace Lofranco
Presented at: Global Azure Bootcamp (Melbourne)
Participants will get a deep dive into one of Azure’s newest offerings: Azure Databricks, a fast, easy, and collaborative Apache® Spark™ based analytics platform optimized for Azure. In this session, we will go through Azure Databricks’ key collaboration features, cluster management, and tight data integration with Azure data sources. We’ll also walk through an end-to-end Recommendation System Data Pipeline built using Spark on Azure Databricks.
Feature Store as a Data Foundation for Machine Learning
by Provectus
Looking to design and build a centralized, scalable Feature Store for your Data Science & Machine Learning teams to take advantage of? Come and learn how from experts at Provectus and Amazon Web Services (AWS)!
Feature Store is a key component of the ML stack and data infrastructure, which enables feature engineering and management. By having a Feature Store, organizations can save massive amounts of resources, innovate faster, and drive ML processes at scale. In this webinar, you will learn how to build a Feature Store with a data mesh pattern, how to achieve consistency between real-time and training features, and how to improve reproducibility with time travel for data.
Agenda
- Modern Data Lakes & Modern ML Infrastructure
- Existing and Emerging Architectural Shifts
- Feature Store: Overview and Reference Architecture
- AWS Perspective on Feature Store
Intended Audience
Technology executives & decision makers, manager-level tech roles, data architects & analysts, data engineers & data scientists, ML practitioners & ML engineers, and developers
Presenters
- Stepan Pushkarev, Chief Technology Officer, Provectus
- Gandhi Raketla, Senior Solutions Architect, AWS
- German Osin, Senior Solutions Architect, Provectus
Feel free to share this presentation with your colleagues and don't hesitate to reach out to us at info@provectus.com if you have any questions!
REQUEST WEBINAR: https://provectus.com/webinar-feature-store-as-data-foundation-for-ml-nov-2020/
Hyperspace: An Indexing Subsystem for Apache Spark
by Databricks
At Microsoft, we store datasets (both from internal teams and external customers) ranging from a few GBs to 100s of PBs in our data lake. The scope of analytics on these datasets ranges from traditional batch-style queries (e.g., OLAP) to explorative, ‘finding a needle in a haystack’ types of queries (e.g., point lookups, summarization, etc.).
SAP Cloud Platform - Integration, Extensibility & Services
by Andrew Harding
SAP Cloud Platform enables businesses to extend their SAP solutions to create new applications, integrate with other SAP solutions and external third parties (applications, businesses & government) with the addition of cloud services bringing access to the latest technologies such as IoT, Machine Learning, Intelligent RPA, etc.
Predix Builder Roadshow event content detailing the Industrial Internet of Things, Building the Digital Twin, Predix Edge Essential, Predix Dojo Program, and upcoming Predix events.
GE IOT Predix Time Series & Data Ingestion Service using Apache Apex (Hadoop)Apache Apex
This presentation introduces the use of Apache Apex for the Time Series & Data Ingestion Service in Predix, General Electric's Internet of Things platform. Apache Apex is a native Hadoop data-in-motion platform that customers use for both streaming and batch processing. Common use cases include ingestion into Hadoop, streaming analytics, ETL, database off-loads, alerts and monitoring, machine model scoring, etc.
Abstract: Predix is a General Electric platform for the Internet of Things. It helps users develop applications that connect industrial machines with people through data and analytics for better business outcomes. Predix offers a catalog of services that provide core capabilities required by industrial internet applications. We will deep dive into the Predix Time Series and Data Ingestion services, which leverage the fast, scalable, highly performant, and fault-tolerant capabilities of Apache Apex.
Speakers:
- Venkatesh Sivasubramanian, Sr Staff Software Engineer, GE Predix & Committer of Apache Apex
- Pramod Immaneni, PPMC member of Apache Apex, and DataTorrent Architect
E3: Edge and Cloud Connectivity (Predix Transform 2016)Predix
http://predixtransform.com
The edge is where the Industrial Internet starts (and ends). Understand the roles Predix Machine and Connectivity play for your app architecture. Then use the essential tool kits to build your own edge-connected apps. We'll cover edge management (enrollment and security), edge analytics, and data ingestion (e.g., HTTP and MQTT).
E1: Building the Digital Twin (Predix Transform 2016)Predix
http://predixtransform.com
Understand how to develop analytics models using the Asset and Analytics services within Predix. We'll start with a quick tour of the conceptual framework, and then dive deep into actual modeling and deployment examples that you can use. This session will include demo and code walk-through.
D4: Predix Cool Features (Predix Transform 2016) Predix
http://predixtransform.com
See what's brewing in the Predix architecture labs. We'll provide examples of features and additions currently under consideration. While we cannot guarantee that all ideas will eventually become products, we promise this session to be packed with interesting and perhaps awe-inspiring previews.
Topics covered: Predix Appliance; Extended Asset Service; Knowledge Graph; Blockchain for Industrial Internet of Things
Unified Analytics in GE’s Predix for the IIoT: Tying Operational Technology t...Altoros
Learn how to achieve holistic operational visibility into IIoT business environments by correlating the data from Operational Technology and IT, and organizing it as a single pane of glass in accordance with business processes.
PAM3: Machine Learning in the Railway Industry ( Predix Transform 2016)Predix
http://predixtransform.com
See how Machine Learning algorithms and video analytics, powered by Predix, have been used to detect defects in railway tracks. View demos built using Python and OpenCV, and an actual field video showing different cases of anomaly detection.
1. What does Predix bring to the table?
2. How is it different to Cloud Foundry and IBM Bluemix?
3. Predix service catalog. Which services can set Predix apart?
4. Top use cases and apps
5. Likely scenarios of Predix evolution
http://PredixTransform.com
How do you securely connect industrial devices to the Cloud? What if you could save a plant millions with a $250 thermal camera? This is what our team wanted to find out. We sent a member of the team to a power plant to capture thermal images of the site and then created a Predix-based Matlab/Python algorithm to identify potential issues.
E4: Building Your First Predix App (Predix Transform 2016)Predix
http://predixtransform.com
How do you build your first Predix app or service? This session provides the essentials. We'll provide a step-by-step demo on building a simple app using PX and consuming some of the fundamental Predix services like UAA. We'll also cover the Predix mobile, and provide a tour of the Predix.io developer portal.
GE Predix Transform 2016 - UX & Customer EngagementDavid Bingham
With the digitization of industry comes the need for a new approach to engaging customers interested in the industrial internet. The inherent complexities of the IIoT create latent problems that cannot be successfully addressed using traditional sales techniques or by fitting legacy off-the-shelf solutions. This presentation will demonstrate how GE Digital positions UX practices at the front end of customer engagements, guiding consultative discovery sessions to shape business opportunities in the industrial internet.
The materials presented include a set of methods curated in trial-by-fire situations with Emerging Vertical customers from 2014 to 2016. Combining aspects of outcome based sales and design thinking through a co-creation process, the audience will learn how to address the interests of executive stakeholders, gather and prioritize business outcomes, and derive strategic roadmaps inclusive of user-needs.
PEM1: Device Authentication in IIOT ( Predix Transform 2016)Predix
http://PredixTransform.com
How do you securely connect industrial devices to the Cloud? People speak usernames and passwords, but machines don’t. We’ll discuss requirements and technical approaches for device authentication for the Industrial IoT, including X509 certificates, cryptographically signed tokens, and two-way TLS.
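Industrial cloud platform – Predix, 2016 Smart Factory International ConferenceGE Korea
2016 Smart Factory International Conference
Industrial cloud platform – Predix
"Until yesterday we were a manufacturing-based company, but now we must be reborn as a data and analytics company." Jeff Immelt, GE Chairman and CEO
We improve the production floor through the cloud. The physical site and the digital site communicate with each other. A 1% productivity improvement can save $500MM (KRW 6 trillion) within GE alone.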
Presentation in IBM Cloud Meet-up of Toronto
https://www.meetup.com/IBM-Cloud-Toronto/events/253903913/?_xtd=gatlbWFpbF9jbGlja9oAJGU3NmM3ZjdmLWE2NzgtNGVlNC1iNGZiLTBlZGE5ZWM0NDZjOQ
(ENT211) Migrating the US Government to the Cloud | AWS re:Invent 2014Amazon Web Services
The US government has built hundreds of applications that must be refactored to take advantage of modern distributed systems. This session discusses EzBake, an open-source, secure big data platform deployed on top of Amazon EC2 and using Amazon S3 and Amazon RDS. This solution has helped speed the US government to the cloud and make big data easy. Furthermore, this session discusses critical architecture design decisions made during the creation of the platform in order to add additional security, leverage future AWS offerings, and cut total operations and maintenance costs.
Sponsored by CSC
Critical Considerations for Moving Your Core Business Applications to the Clo...Amazon Web Services
From the Amazon Web Services Singapore & Malaysia Summits 2015 Track 1 Breakout, 'Critical Considerations for Moving Your Core Business Applications to the Cloud' Presented by Leo Valaris, Director, CloudSuite Solutions - Infor
Feature drift monitoring as a service for machine learning models at scaleNoriaki Tatsumi
In this talk, you’ll learn about techniques used to build a feature drift detection as a service capability for your enterprise and beyond. Feature drift monitoring is a way to check volatility of machine learning model inputs. It can trigger investigations for potential model degradation as well as explain why models have shifted.
Modernizing Testing as Apps Re-ArchitectDevOps.com
Applications are moving to cloud and containers to boost reliability and speed delivery to production. However, if we use the same old approaches to testing, we'll fail to achieve the benefits of cloud. But what do we really need to change? We know we need to automate tests, but how do we keep our automation assets from becoming obsolete? Automatically provisioning test environments seems close, but some parts of our applications are hard to move to cloud.
Neuro-symbolic is not enough, we need neuro-*semantic*Frank van Harmelen
Neuro-symbolic (NeSy) AI is on the rise. However, simply machine learning on just any symbolic structure is not sufficient to really harvest the gains of NeSy. These will only be gained when the symbolic structures have an actual semantics. I give an operational definition of semantics as “predictable inference”.
All of this is illustrated with link prediction over knowledge graphs, but the argument is general.
Essentials of Automations: Optimizing FME Workflows with ParametersSafe Software
Are you looking to streamline your workflows and boost your projects’ efficiency? Do you find yourself searching for ways to add flexibility and control over your FME workflows? If so, you’re in the right place.
Join us for an insightful dive into the world of FME parameters, a critical element in optimizing workflow efficiency. This webinar marks the beginning of our three-part “Essentials of Automation” series. This first webinar is designed to equip you with the knowledge and skills to utilize parameters effectively: enhancing the flexibility, maintainability, and user control of your FME projects.
Here’s what you’ll gain:
- Essentials of FME Parameters: Understand the pivotal role of parameters, including Reader/Writer, Transformer, User, and FME Flow categories. Discover how they are the key to unlocking automation and optimization within your workflows.
- Practical Applications in FME Form: Delve into key user parameter types including choice, connections, and file URLs. Allow users to control how a workflow runs, making your workflows more reusable. Learn to import values and deliver the best user experience for your workflows while enhancing accuracy.
- Optimization Strategies in FME Flow: Explore the creation and strategic deployment of parameters in FME Flow, including the use of deployment and geometry parameters, to maximize workflow efficiency.
- Pro Tips for Success: Gain insights on parameterizing connections and leveraging new features like Conditional Visibility for clarity and simplicity.
We’ll wrap up with a glimpse into future webinars, followed by a Q&A session to address your specific questions surrounding this topic.
Don’t miss this opportunity to elevate your FME expertise and drive your projects to new heights of efficiency.
State of ICS and IoT Cyber Threat Landscape Report 2024 previewPrayukth K V
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio using data from Sectrio, cyber threat intelligence farming facilities spread across over 85 cities around the world. In addition, Sectrio also runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors, and newer malware including new variants and latent threats that are at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on counties – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
Connector Corner: Automate dynamic content and events by pushing a buttonDianaGray10
Here is something new! In our next Connector Corner webinar, we will demonstrate how you can use a single workflow to:
Create a campaign using Mailchimp with merge tags/fields
Send an interactive Slack channel message (using buttons)
Have the message received by managers and peers along with a test email for review
But there’s more:
In a second workflow supporting the same use case, you’ll see:
Your campaign sent to target colleagues for approval
If the “Approve” button is clicked, a Jira/Zendesk ticket is created for the marketing design team
But—if the “Reject” button is pushed, colleagues will be alerted via Slack message
Join us to learn more about this new, human-in-the-loop capability, brought to you by Integration Service connectors.
And...
Speakers:
Akshay Agnihotri, Product Manager
Charlie Greenberg, Host
GraphRAG is All You need? LLM & Knowledge GraphGuy Korland
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
DevOps and Testing slides at DASA ConnectKari Kakkonen
Slides by Rik Marselis and me from the DASA Connect conference on 30.5.2024. We discuss what testing is, then what agile testing is, and finally what testing in DevOps is. We closed with a lovely workshop in which the participants explored different ways to think about quality and testing in different parts of the DevOps infinity loop.
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Ramesh Iyer
In today's fast-changing business world, companies that adapt and embrace new ideas often need help to keep up with the competition. However, fostering a culture of innovation takes much work. It takes vision, leadership, and a willingness to take risks in the right proportion. Sachin Dev Duggal, co-founder of Builder.ai, has perfected the art of this balance, creating a company culture where creativity and growth are nurtured at each stage.
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
Generating a custom Ruby SDK for your web service or Rails API using Smithyg2nightmarescribd
Have you ever wanted a Ruby client API to communicate with your web service? Smithy is a protocol-agnostic language for defining services and SDKs. Smithy Ruby is an implementation of Smithy that generates a Ruby SDK using a Smithy model. In this talk, we will explore Smithy and Smithy Ruby to learn how to generate custom feature-rich SDKs that can communicate with any web service, such as a Rails JSON API.
7. Predix Cloud
• Scalable cloud infrastructure as PaaS
• Can handle industrial data
• Supports security and regulatory compliance
• Software Defined Infrastructure (SDI) for abstraction over
hardware
• SDI enables shared infrastructure and dynamic
automation
• Based on Cloud Foundry
9. Dev Ops
• Continually integrate and deliver new features through the
Continuous Delivery (CD) Pipeline service
• Automated software builds and application deployment
• Always be ready to deploy to production
• Always place emphasis on speed, efficiency and stability
• Source control management (SCM)
10. Biz Ops
• Subscription — The customer pays a fixed amount for the
product – monthly, quarterly, or annually.
• Utility — The customer pays as it consumes the product.
• Freemium — The customer enjoys the basic product for free
and only pays for add-on or premium services
11. Asset Services
• REST API layer —Applications can access the domain object
modeling layer using REST endpoints that provide a JSON
interface to describe all of their objects. The service translates
data from JSON to RDF triples for storage and query in the
graph database, and back to JSON again.
• Query engine — The query engine enables developers to use
Graph Expression Language (GEL) to retrieve data about any
object or property of any object in the asset service data store.
• Graph database — The Asset service data store is a graph
database that stores data as RDF triples.
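The JSON-to-triples translation described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the Asset service's actual mapping: the function names, the `/assets/turbine-1` identifier, and the sample properties are all hypothetical, and nested objects are not handled.

```python
import json


def json_to_triples(subject, obj):
    """Flatten one flat JSON object into (subject, predicate, object) triples."""
    triples = []
    for key, value in obj.items():
        if isinstance(value, list):
            # A multi-valued property becomes one triple per value.
            triples.extend((subject, key, item) for item in value)
        else:
            triples.append((subject, key, value))
    return triples


def triples_to_json(triples):
    """Rebuild {subject: {predicate: value-or-list}} from a list of triples."""
    result = {}
    for subject, predicate, value in triples:
        props = result.setdefault(subject, {})
        if predicate not in props:
            props[predicate] = value
        elif isinstance(props[predicate], list):
            props[predicate].append(value)
        else:
            # Second value for the same predicate: promote to a list.
            props[predicate] = [props[predicate], value]
    return result


# Hypothetical asset document, as an application might POST it.
asset = json.loads(
    '{"type": "turbine", "model": "T-100", "sensors": ["temp", "vibration"]}'
)
triples = json_to_triples("/assets/turbine-1", asset)
restored = triples_to_json(triples)["/assets/turbine-1"]
```

Note that this simplified round trip is lossy for single-element lists, which come back as plain values; a real mapping would need schema information to avoid that.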
12. Data Services
• Data ingestion: cleanse the data, merge the data with other data
sources, and ultimately store the data in the appropriate type of
data store
• Time series data store for sensor data
• Binary Large Object (BLOB) store for MRI images
• RDBMS – PostgreSQL database
• HTTP streaming for real- or near-real-time data (‘fast’ data)
• FTP for more batch-style processing.
• Data ingestion supports industrial formats – Historian and OSI
13. Time series sensor data
• Efficient storage of time series data
• Indexing the data for quick retrieval
• High availability
• Horizontal scalability
• Millisecond data point precision
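To make "indexing the data for quick retrieval" and "millisecond data point precision" concrete, here is a minimal in-memory sketch. It is not how the Predix Time Series service is implemented; the `TimeSeries` class and its methods are hypothetical, and it assumes timestamps are integer epoch milliseconds. Keeping the timestamps sorted lets a range query use binary search instead of a full scan.

```python
from bisect import bisect_left, bisect_right, insort


class TimeSeries:
    """In-memory series indexed by epoch-millisecond timestamp."""

    def __init__(self):
        self._timestamps = []  # kept sorted at all times
        self._values = {}      # timestamp_ms -> value

    def add(self, timestamp_ms, value):
        """Insert a data point; a repeated timestamp overwrites the value."""
        if timestamp_ms not in self._values:
            insort(self._timestamps, timestamp_ms)
        self._values[timestamp_ms] = value

    def query(self, start_ms, end_ms):
        """Return (timestamp, value) pairs with start_ms <= timestamp <= end_ms."""
        lo = bisect_left(self._timestamps, start_ms)
        hi = bisect_right(self._timestamps, end_ms)
        return [(t, self._values[t]) for t in self._timestamps[lo:hi]]


series = TimeSeries()
# Points may arrive out of order; the index keeps them sorted.
series.add(1_700_000_000_500, 21.9)
series.add(1_700_000_000_000, 21.4)
series.add(1_700_000_000_250, 21.7)
series.add(1_700_000_001_000, 22.3)

window = series.query(1_700_000_000_200, 1_700_000_000_600)
```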
14. Analytics
• Operational analytics — Data is analyzed in real time at the source
(an aircraft engine, wind turbine, MRI machine, etc.) to detect
problems so that split-second changes can be made in the
operation of the asset to prevent damage and optimize
performance.
• Historical analytics — The collection and analysis of petabytes of
historical operational data. From this analysis, it is possible to build
predictive models that can be used to more efficiently operate
entire manufacturing plants or fleets of equipment.
15. Analytics
• The Analytic Catalog service makes it easy to deploy an analytic
independently as a microservice that can be accessed through
REST APIs and the user interface.
• Each analytic is executed as a separate microservice; the
orchestration execution microservice coordinates their work.
• An orchestration is a group of analytics run as a single unit. Its
analytic workflow is defined in an orchestration BPMN file (an
XML file conforming to the BPMN 2.0 standard).
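As a rough sketch of how a BPMN 2.0 file can define the order in which analytics run, the snippet below parses a small, hand-written BPMN document and executes the tasks in sequence. Everything specific here is an assumption made for illustration: the task names, the `ANALYTICS` table, and the use of local Python functions in place of separately deployed microservices called over REST. Only a single linear flow is supported.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the BPMN 2.0 standard for model elements.
BPMN_NS = {"bpmn": "http://www.omg.org/spec/BPMN/20100524/MODEL"}

# Hypothetical orchestration: start -> clean-data -> score-anomalies -> end.
ORCHESTRATION_XML = """
<definitions xmlns="http://www.omg.org/spec/BPMN/20100524/MODEL">
  <process id="orchestration">
    <startEvent id="start"/>
    <serviceTask id="clean" name="clean-data"/>
    <serviceTask id="score" name="score-anomalies"/>
    <endEvent id="end"/>
    <sequenceFlow id="f1" sourceRef="start" targetRef="clean"/>
    <sequenceFlow id="f2" sourceRef="clean" targetRef="score"/>
    <sequenceFlow id="f3" sourceRef="score" targetRef="end"/>
  </process>
</definitions>
"""


def execution_order(xml_text):
    """Follow sequenceFlow edges from the start event; return task names in order."""
    process = ET.fromstring(xml_text.strip()).find("bpmn:process", BPMN_NS)
    next_step = {
        flow.get("sourceRef"): flow.get("targetRef")
        for flow in process.findall("bpmn:sequenceFlow", BPMN_NS)
    }
    task_names = {
        task.get("id"): task.get("name")
        for task in process.findall("bpmn:serviceTask", BPMN_NS)
    }
    current = process.find("bpmn:startEvent", BPMN_NS).get("id")
    order = []
    while current in next_step:
        current = next_step[current]
        if current in task_names:
            order.append(task_names[current])
    return order


# Local stand-ins for analytics that would each run as a microservice.
ANALYTICS = {
    "clean-data": lambda readings: [r for r in readings if r is not None],
    "score-anomalies": lambda readings: [r for r in readings if r > 100],
}


def run_orchestration(xml_text, data):
    """Run each analytic in workflow order, feeding each output to the next."""
    for name in execution_order(xml_text):
        data = ANALYTICS[name](data)
    return data


result = run_orchestration(ORCHESTRATION_XML, [20, None, 140, 75])
```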
16. Security
The UAA service enables applications to authenticate users. An application developer can bind
to the UAA service in the marketplace and then use the industry standards SCIM and
OAuth to handle identity management and authentication, respectively. Together,
these two capabilities provide the basic login and logout support that every
application needs.
UAA supports SAML (Security Assertion Markup Language), which enables users to
log in using third-party identity providers.
The basic UAA features have been extended to include the following:
• User whitelisting: Ensures only a qualified subset of authenticated users
can log in to an application.
• Client-side token validation: Eliminates extra network round trips and significantly
improves performance
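Client-side token validation saves a round trip because the application checks the token's signature and expiry itself instead of asking the authorization server. The sketch below illustrates the principle with a JWT signed using HS256 and a shared key, chosen only so the example runs on the Python standard library; a UAA deployment would more typically verify an asymmetric signature against the server's public key. The helper names, key, and claims are hypothetical.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url_encode(raw):
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def _b64url_decode(segment):
    padding = "=" * (-len(segment) % 4)
    return base64.urlsafe_b64decode(segment + padding)


def sign_token(claims, key):
    """Issue an HS256 JWT (stand-in for the authorization server)."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        _b64url_encode(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    signature = hmac.new(key, signing_input.encode("ascii"), hashlib.sha256).digest()
    return signing_input + "." + _b64url_encode(signature)


def validate_token(token, key, now=None):
    """Return the claims if signature and expiry check out locally, else None."""
    try:
        header_b64, payload_b64, signature_b64 = token.split(".")
        header = json.loads(_b64url_decode(header_b64))
        claims = json.loads(_b64url_decode(payload_b64))
        signature = _b64url_decode(signature_b64)
    except ValueError:
        return None  # malformed token
    if not isinstance(header, dict) or not isinstance(claims, dict):
        return None
    if header.get("alg") != "HS256":
        return None  # refuse algorithms this sketch does not implement
    signing_input = (header_b64 + "." + payload_b64).encode("ascii")
    expected = hmac.new(key, signing_input, hashlib.sha256).digest()
    if not hmac.compare_digest(signature, expected):
        return None
    current_time = time.time() if now is None else now
    if claims.get("exp", 0) <= current_time:
        return None  # expired or missing expiry
    return claims


key = b"demo-shared-secret"  # hypothetical key, for illustration only
token = sign_token({"sub": "operator-1", "exp": 2_000_000_000}, key)
claims = validate_token(token, key, now=1_900_000_000)
```

The trade-off is that a locally validated token stays accepted until it expires, even if it has been revoked on the server in the meantime.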
17. Security
Access Control Service:
Predix Access Control service is a policy-driven authorization
service that enables applications to create access restrictions to
resources based on a number of criteria.
The policy language is JSON-based and was developed as an answer to
the deficiencies in XACML.
The access control service is well integrated with UAA and provides a
Spring security extension to make it easy for Spring Boot applications
to make access decisions.
19. Login:
cf login -a https://api.system.aws-usw02-pr.ice.predix.io
List the services in the Cloud Foundry marketplace:
cf marketplace
Create a UAA instance by entering the following command:
cf create-service predix-uaa <plan> <my_uaa_instance> -c \
'{"adminClientSecret":"<my_secret>","subdomain":"<my_subdomain>"}'
<plan> is the plan associated with a service. For example, you can use the tiered plan
for the predix-uaa service.
The -c option specifies the following additional parameters:
adminClientSecret specifies the client secret.
subdomain specifies a subdomain you might need to use in addition to the domain
created for UAA. This parameter is optional. Do not use special characters in
the subdomain name. The subdomain value is case insensitive.
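When the command is generated from a script, building the -c payload with a JSON library avoids quoting mistakes. The following is a minimal sketch; the function name and the sample values are hypothetical, and the validation rule is an assumption: the slide only says "no special characters", which is interpreted here as letters, digits, and hyphens. Because the subdomain is case insensitive, the sketch lowercases it.

```python
import json
import re
import shlex


def build_create_uaa_command(plan, instance_name, admin_client_secret, subdomain=None):
    """Return the argument list for `cf create-service predix-uaa ...`."""
    params = {"adminClientSecret": admin_client_secret}
    if subdomain is not None:
        # Assumed rule: only letters, digits, and hyphens are allowed.
        if not re.fullmatch(r"[A-Za-z0-9-]+", subdomain):
            raise ValueError(f"subdomain contains special characters: {subdomain!r}")
        params["subdomain"] = subdomain.lower()
    return [
        "cf", "create-service", "predix-uaa", plan, instance_name,
        "-c", json.dumps(params),
    ]


command = build_create_uaa_command(
    "Tiered", "my-uaa", "s3cret", subdomain="My-Subdomain"
)
# shlex.join quotes the JSON argument so the line can be pasted into a shell.
print(shlex.join(command))
```

The argument list can also be passed directly to `subprocess.run(command, check=True)`, which bypasses shell quoting altogether.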
23. Extra reading
• Historian - http://help.geautomation.com/Historian55/Subsystems/iHistGS/content/hgs_overview_of_ihistorian.htm