Note: Download the slides and open them in PowerPoint or Google Slides to make the links clickable.
Machine Learning + Graph Databases for Better Recommendations
Presented by Chris Woodward
Unified MLOps: Feature Stores & Model Deployment (Databricks)
If you’ve brought two or more ML models into production, you know the struggle that comes from managing multiple data sets, feature engineering pipelines, and models. This talk will propose a whole new approach to MLOps that allows you to successfully scale your models, without increasing latency, by merging a database, a feature store, and machine learning.
Splice Machine is a hybrid (HTAP) database built upon HBase and Spark. The database powers a one-of-a-kind single-engine feature store, as well as the deployment of ML models as tables inside the database. A simple JDBC connection means Splice Machine can be used with any model ops environment, such as Databricks.
The HBase side allows us to serve features to deployed ML models, and generate ML predictions, in milliseconds. Our unique Spark engine allows us to generate complex training sets, as well as ML predictions on petabytes of data.
In this talk, Monte will discuss how his experience running the AI lab at NASA, and as CEO of Red Pepper, Blue Martini Software and Rocket Fuel, led him to create Splice Machine. Jack will give a quick demonstration of how it all works.
Building a MLOps Platform Around MLflow to Enable Model Productionalization i... (Databricks)
Getting machine learning models to production is notoriously difficult: it involves multiple teams (data scientists, data and machine learning engineers, operations, …), who often do not speak to each other very well; the model can be trained in one environment but then productionalized in a completely different environment; it is not just about the code, but also about the data (features) and the model itself… At DataSentics, as a machine learning and cloud engineering studio, we see this struggle firsthand – on our internal projects and on our clients’ projects as well.
Feature drift monitoring as a service for machine learning models at scale (Noriaki Tatsumi)
In this talk, you’ll learn about techniques used to build a feature-drift-detection-as-a-service capability for your enterprise and beyond. Feature drift monitoring is a way to check the volatility of machine learning model inputs. It can trigger investigations into potential model degradation as well as explain why models have shifted.
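One common way to put a number on the input volatility described above is the Population Stability Index (PSI), which compares the binned distribution of a feature in a baseline sample against a current sample. The sketch below is illustrative only and is not taken from the talk: the function name `psi`, the ten equal-width bins, and the 0.25 alert threshold (a widely quoted rule of thumb) are all assumptions.

```python
import math


def psi(expected, actual, bins=10):
    """Population Stability Index of `actual` against the baseline `expected`.

    Both arguments are non-empty sequences of numbers. Bin edges are derived
    from the baseline; values outside the baseline range are clamped into the
    first or last bin.
    """
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # avoid a zero width for constant features

    def proportions(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1
        # A small floor keeps the logarithm defined for empty bins.
        return [max(c / len(values), 1e-6) for c in counts]

    p = proportions(expected)
    q = proportions(actual)
    return sum((qi - pi) * math.log(qi / pi) for pi, qi in zip(p, q))


# Hypothetical data: a uniform baseline and a copy shifted by 0.5.
baseline = [i / 100 for i in range(100)]
current = [v + 0.5 for v in baseline]

DRIFT_THRESHOLD = 0.25  # assumed rule-of-thumb threshold, tune per feature
if psi(baseline, current) > DRIFT_THRESHOLD:
    print("feature drift detected, investigate the model inputs")
```

A monitoring service would typically run a check like this per feature on a schedule and raise an alert when the score crosses the chosen threshold.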
Machine learning operations (MLOps) brings data science to the world of DevOps. Data scientists create models on their workstations. MLOps adds automation, validation and monitoring to any environment, including machine learning on Kubernetes. In this session you will hear about the latest developments and see them in action.
MLOps (a compound of “machine learning” and “operations”) is a practice for collaboration and communication between data scientists and operations professionals to help manage the production machine learning lifecycle. Similar to the DevOps term in the software development world, MLOps looks to increase automation and improve the quality of production ML while also focusing on business and regulatory requirements. MLOps applies to the entire ML lifecycle - from integrating with model generation (software development lifecycle, continuous integration/continuous delivery), orchestration, and deployment, to health, diagnostics, governance, and business metrics.
To watch the full presentation click here: https://info.cnvrg.io/mlopsformachinelearning
In this webinar, we’ll discuss core practices in MLOps that will help data science teams scale to the enterprise level. You’ll learn the primary functions of MLOps and which tasks are suggested to accelerate your team’s machine learning pipeline. Join us in a discussion with cnvrg.io Solutions Architect, Aaron Schneider, and learn how teams use MLOps for more productive machine learning workflows.
- Reduce friction between science and engineering
- Deploy your models to production faster
- Health, diagnostics and governance of ML models
- Kubernetes as a core platform for MLOps
- Support advanced use-cases like continual learning with MLOps
"Managing the Complete Machine Learning Lifecycle with MLflow" (Databricks)
Machine Learning development brings many new complexities beyond the traditional software development lifecycle. Unlike in traditional software development, ML developers want to try multiple algorithms, tools, and parameters to get the best results, and they need to track this information to reproduce work. In addition, developers need to use many distinct systems to productionize models. To address these problems, many companies are building custom “ML platforms” that automate this lifecycle, but even these platforms are limited to a few supported algorithms and to each company’s internal infrastructure.
In this session, we introduce MLflow, a new open-source project from Databricks that aims to design an open ML platform where organizations can use any ML library and development tool of their choice to reliably build and share ML applications. MLflow introduces simple abstractions to package reproducible projects, track results, and encapsulate models that can be used with many existing tools, accelerating the ML lifecycle for organizations of any size.
Graph Databases and Machine Learning | November 2018 (TigerGraph)
Graph Database and Machine Learning: Finding a Happy Marriage. Graph databases and machine learning both represent powerful tools for getting more value from data. Learn how they can form a harmonious marriage to up-level machine learning.
What’s New with Databricks Machine Learning (Databricks)
In this session, the Databricks product team provides a deeper dive into the machine learning announcements. Join us for a detailed demo that gives you insights into the latest innovations that simplify the ML lifecycle, from preparing data and discovering features to training and managing models in production.
Processing Large Datasets for ADAS Applications using Apache Spark (Databricks)
Semantic segmentation is the classification of every pixel in an image/video. The segmentation partitions a digital image into multiple objects to simplify/change the representation of the image into something that is more meaningful and easier to analyze [1][2]. The technique has a wide variety of applications ranging from perception in autonomous driving scenarios to cancer cell segmentation for medical diagnosis.
Exponential growth in the datasets that require such segmentation is driven by improvements in the accuracy and quality of the sensors generating the data extending to 3D point cloud data. This growth is further compounded by exponential advances in cloud technologies enabling the storage and compute available for such applications. The need for semantically segmented datasets is a key requirement to improve the accuracy of inference engines that are built upon them.
Streamlining the accuracy and efficiency of these systems directly affects the value of the business outcome for organizations that are developing such functionalities as a part of their AI strategy.
This presentation details workflows for labeling, preprocessing, modeling, and evaluating performance/accuracy. Scientists and engineers leverage domain-specific features/tools that support the entire workflow from labeling the ground truth, handling data from a wide variety of sources/formats, developing models and finally deploying these models. Users can scale their deployments optimally on GPU-based cloud infrastructure to build accelerated training and inference pipelines while working with big datasets. These environments are optimized for engineers to develop such functionality with ease and then scale against large datasets with Spark-based clusters on the cloud.
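As a toy illustration of the per-pixel classification described in this abstract, the sketch below assigns each pixel the class with the highest score and then compares the result with a ground-truth label map using intersection over union (IoU), a commonly used segmentation metric. It is not taken from the presentation: the function names `segment` and `iou`, the nested-list image layout, and the example scores are all assumptions.

```python
def segment(scores):
    """Turn per-pixel class scores into a label map.

    `scores[y][x]` is a list with one score per class; each pixel is
    assigned the index of its highest-scoring class.
    """
    return [
        [max(range(len(pixel)), key=pixel.__getitem__) for pixel in row]
        for row in scores
    ]


def iou(pred, truth, cls):
    """Intersection over union for one class between two label maps."""
    inter = union = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            inter += p == cls and t == cls
            union += p == cls or t == cls
    # By convention here, a class absent from both maps scores 1.0.
    return inter / union if union else 1.0


# Hypothetical 2x2 image with two classes (0 = background, 1 = object).
scores = [
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.6, 0.4], [0.3, 0.7]],
]
ground_truth = [
    [0, 1],
    [1, 1],
]

labels = segment(scores)
print("predicted labels:", labels)
print("IoU for class 1:", iou(labels, ground_truth, cls=1))
```

Real pipelines would run the same logic on tensors produced by a trained network; plain lists are used here only to keep the example self-contained.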
MLflow: Infrastructure for a Complete Machine Learning Life Cycle with Mani ... (Databricks)
ML development brings many new complexities beyond the traditional software development lifecycle. Unlike in traditional software development, ML developers want to try multiple algorithms, tools, and parameters to get the best results, and they need to track this information to reproduce work. In addition, developers need to use many distinct systems to productionize models. To address these problems, many companies are building custom “ML platforms” that automate this lifecycle, but even these platforms are limited to a few supported algorithms and to each company’s internal infrastructure. In this session, we introduce MLflow, a new open source project from Databricks that aims to design an open ML platform where organizations can use any ML library and development tool of their choice to reliably build and share ML applications. MLflow introduces simple abstractions to package reproducible projects, track results, and encapsulate models that can be used with many existing tools, accelerating the ML lifecycle for organizations of any size. In this deep-dive session, through a complete ML model life-cycle example, you will walk away with:
- MLflow concepts and abstractions for models, experiments, and projects
- How to get started with MLflow
- Understand aspects of MLflow APIs
- Using tracking APIs during model training
- Using MLflow UI to visually compare and contrast experimental runs with different tuning parameters and evaluate metrics
- Package, save, and deploy an MLflow model
- Serve it using MLflow REST API
- What’s next and how to contribute
MLOps and Data Quality: Deploying Reliable ML Models in Production (Provectus)
Looking to build a robust machine learning infrastructure to streamline MLOps? Learn from Provectus experts how to ensure the success of your MLOps initiative by implementing Data QA components in your ML infrastructure.
For most organizations, the development of multiple machine learning models, their deployment and maintenance in production are relatively new tasks. Join Provectus as we explain how to build an end-to-end infrastructure for machine learning, with a focus on data quality and metadata management, to standardize and streamline machine learning life cycle management (MLOps).
Agenda
- Data Quality and why it matters
- Challenges and solutions of Data Testing
- Challenges and solutions of Model Testing
- MLOps pipelines and why they matter
- How to expand validation pipelines for Data Quality
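The data testing item in the agenda above can be pictured as a set of checks that run before data reaches training or inference. The following is a minimal sketch under stated assumptions, not the Provectus implementation: the function name `validate_rows`, the `(type, min, max)` schema layout, and the example columns are hypothetical.

```python
def validate_rows(rows, schema):
    """Return a list of human-readable problems found in `rows`.

    `rows` is a list of dicts. `schema` maps a column name to a tuple
    (expected_type, minimum, maximum); minimum and maximum may be None.
    """
    problems = []
    for i, row in enumerate(rows):
        for col, (expected_type, lo, hi) in schema.items():
            value = row.get(col)
            if value is None:
                problems.append(f"row {i}: {col} is missing")
            elif not isinstance(value, expected_type):
                problems.append(f"row {i}: {col} has type {type(value).__name__}")
            elif lo is not None and value < lo:
                problems.append(f"row {i}: {col}={value} below {lo}")
            elif hi is not None and value > hi:
                problems.append(f"row {i}: {col}={value} above {hi}")
    return problems


# Hypothetical schema and batch of records.
schema = {"age": (int, 0, 120), "country": (str, None, None)}
batch = [
    {"age": 34, "country": "DE"},
    {"age": -1, "country": "US"},
    {"age": 50},
]

for problem in validate_rows(batch, schema):
    print(problem)
```

In a pipeline, a non-empty result would usually fail the run or quarantine the batch rather than only print the findings.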
This workshop presentation from Enterprise Knowledge team members Joe Hilger, Founder and COO, and Sara Nash, Technical Analyst, was delivered on June 8, 2020 as part of the Data Summit 2020 virtual conference. The 3-hour workshop provided an interdisciplinary group of participants with a definition of what a knowledge graph is, how it is implemented, and how it can be used to increase the value of your organization’s data. This slide deck gives an overview of the KM concepts that are necessary for the implementation of knowledge graphs as a foundation for Enterprise Artificial Intelligence (AI). Hilger and Nash also outlined four use cases for knowledge graphs, including recommendation engines and natural language query on structured data.
MLOps Bridging the gap between Data Scientists and Ops. (Knoldus Inc.)
In this session we introduce the MLOps lifecycle and discuss the hidden loopholes that can affect an ML project. We then discuss the ML model lifecycle and the problems with training, and introduce the MLflow Tracking module for tracking experiments.
Architect’s Open-Source Guide for a Data Mesh Architecture (Databricks)
Data Mesh is an innovative concept addressing many data challenges from an architectural, cultural, and organizational perspective. But is the world ready to implement Data Mesh?
In this session, we will review the importance of core Data Mesh principles, what they can offer, and when it is a good idea to try a Data Mesh architecture. We will discuss common challenges with implementation of Data Mesh systems and focus on the role of open-source projects for it. Projects like Apache Spark can play a key part in standardized infrastructure platform implementation of Data Mesh. We will examine the landscape of useful data engineering open-source projects to utilize in several areas of a Data Mesh system in practice, along with an architectural example. We will touch on what work (culture, tools, mindset) needs to be done to ensure Data Mesh is more accessible for engineers in the industry.
The audience will leave with a good understanding of the benefits of Data Mesh architecture, common challenges, and the role of Apache Spark and other open-source projects for its implementation in real systems.
This session is targeted at architects, decision-makers, data engineers, and system designers.
Advanced Natural Language Processing with Apache Spark NLP (Databricks)
This hands-on deep-dive session uses the open-source Apache Spark NLP library to explore advanced NLP in Python. Apache Spark NLP provides state-of-the-art accuracy, speed, and scalability for language understanding by delivering production-grade implementations of some of the most recent research in applied deep learning. Apache Spark NLP is the only open-source NLP library that can natively scale to use any Apache Spark cluster, as well as take advantage of the latest processors from Intel and Nvidia. It’s the most widely used NLP library in the enterprise today.
You’ll edit and run executable Python notebooks as we walk through these common NLP tasks: document classification, named entity recognition, sentiment analysis, spell checking and correction, grammar understanding, question answering, and translation. The discussion of each NLP task includes the latest advances in deep learning and transfer learning used to tackle it – from the hundreds of BERT-based embeddings to models based on the T5 transformer, MarianNMT, multilingual and domain-specific models.
End to end Machine Learning using Kubeflow - Build, Train, Deploy and Manage (Animesh Singh)
With the sheer breadth of functionality that needs to be addressed in the machine learning world around building, training, serving and managing models, getting it done in a consistent, composable, portable, and scalable manner is hard. The Kubernetes framework is well suited to address these issues, which is why it's a great foundation for deploying ML workloads. Kubeflow is designed to take advantage of these benefits. In this talk, we are going to address how to make it easy for everyone to develop, deploy, and manage portable, scalable ML everywhere and support the full machine learning lifecycle using open source technologies like Kubeflow, TensorFlow, PyTorch, Tekton, Knative, Istio and others. We are going to discuss how to enable distributed training of models, model serving, canary rollouts, drift detection, model explainability, metadata management, pipelines and others. Additionally, we will discuss Watson productization in progress based on Kubeflow Pipelines and Tekton, and point to Kubeflow Dojo materials and follow-on workshops.
How to Utilize MLflow and Kubernetes to Build an Enterprise ML Platform (Databricks)
In large enterprises, large solutions are sometimes required to tackle even the smallest tasks, and ML is no different. At Comcast we are building a comprehensive, configuration-based, continuously integrated and deployed platform for data pipeline transformations, model development and deployment. This is accomplished using a range of tools and frameworks such as Databricks, MLflow, Apache Spark and others. With a Databricks environment used by hundreds of researchers and petabytes of data, scale is critical to Comcast, so making it all work together in a frictionless experience is a high priority. The platform consists of a number of components: an abstraction for data pipelines and transformation to allow our data scientists the freedom to combine the most appropriate algorithms from different frameworks; experiment tracking, project and model packaging using MLflow; and model serving via the Kubeflow environment on Kubernetes. The architecture, progress and current state of the platform will be discussed, as well as the challenges we had to overcome to make this platform work at Comcast scale. As a machine learning practitioner, you will gain knowledge in: an example of data pipeline abstraction; ways to package and track your ML project and experiments at scale; and how Comcast uses Kubeflow on Kubernetes to bring everything together.
PyCaret is an open source, low-code machine learning library in Python that allows you to go from preparing your data to deploying your model within minutes in your choice of environment. This talk is a practical demo of using PyCaret in your existing workflows to supercharge your data science team's productivity.
ATO 2022 - Machine Learning + Graph Databases for Better Recommendations (3).... (ArangoDB Database)
ATO 2022 - Machine Learning + Graph Databases for Better Recommendations (3).pptx
Note: You have to download the slides and use either powerpoint or google slides to make the links clickable.
Machine Learning + Graph Databases for Better Recommendations
Presented by Chris Woodward
Machine Learning operations brings data science to the world of devops. Data scientists create models on their workstations. MLOps adds automation, validation and monitoring to any environment including machine learning on kubernetes. In this session you hear about latest developments and see it in action.
MLOps (a compound of “machine learning” and “operations”) is a practice for collaboration and communication between data scientists and operations professionals to help manage the production machine learning lifecycle. Similar to the DevOps term in the software development world, MLOps looks to increase automation and improve the quality of production ML while also focusing on business and regulatory requirements. MLOps applies to the entire ML lifecycle - from integrating with model generation (software development lifecycle, continuous integration/continuous delivery), orchestration, and deployment, to health, diagnostics, governance, and business metrics.
To watch the full presentation click here: https://info.cnvrg.io/mlopsformachinelearning
In this webinar, we’ll discuss core practices in MLOps that will help data science teams scale to the enterprise level. You’ll learn the primary functions of MLOps, and what tasks are suggested to accelerate your teams machine learning pipeline. Join us in a discussion with cnvrg.io Solutions Architect, Aaron Schneider, and learn how teams use MLOps for more productive machine learning workflows.
- Reduce friction between science and engineering
- Deploy your models to production faster
- Health, diagnostics and governance of ML models
- Kubernetes as a core platform for MLOps
- Support advanced use-cases like continual learning with MLOps
"Managing the Complete Machine Learning Lifecycle with MLflow"Databricks
Machine Learning development brings many new complexities beyond the traditional software development lifecycle. Unlike in traditional software development, ML developers want to try multiple algorithms, tools, and parameters to get the best results, and they need to track this information to reproduce work. In addition, developers need to use many distinct systems to productionize models. To address these problems, many companies are building custom “ML platforms” that automate this lifecycle, but even these platforms are limited to a few supported algorithms and to each company’s internal infrastructure.
In this session, we introduce MLflow, a new open-source project from Databricks that aims to design an open ML platform where organizations can use any ML library and development tool of their choice to reliably build and share ML applications. MLflow introduces simple abstractions to package reproducible projects, track results, and encapsulate models that can be used with many existing tools, accelerating the ML lifecycle for organizations of any size.
Graph Databases and Machine Learning | November 2018TigerGraph
Graph Database and Machine Learning: Finding a Happy Marriage. Graph Databases and Machine Learning
both represent powerful tools for getting more value from data, learn how they can form a harmonious marriage to up-level machine learning.
What’s New with Databricks Machine LearningDatabricks
In this session, the Databricks product team provides a deeper dive into the machine learning announcements. Join us for a detailed demo that gives you insights into the latest innovations that simplify the ML lifecycle — from preparing data, discovering features, and training and managing models in production.
Processing Large Datasets for ADAS Applications using Apache SparkDatabricks
Semantic segmentation is the classification of every pixel in an image/video. The segmentation partitions a digital image into multiple objects to simplify/change the representation of the image into something that is more meaningful and easier to analyze [1][2]. The technique has a wide variety of applications ranging from perception in autonomous driving scenarios to cancer cell segmentation for medical diagnosis.
Exponential growth in the datasets that require such segmentation is driven by improvements in the accuracy and quality of the sensors generating the data extending to 3D point cloud data. This growth is further compounded by exponential advances in cloud technologies enabling the storage and compute available for such applications. The need for semantically segmented datasets is a key requirement to improve the accuracy of inference engines that are built upon them.
Streamlining the accuracy and efficiency of these systems directly affects the value of the business outcome for organizations that are developing such functionalities as a part of their AI strategy.
This presentation details workflows for labeling, preprocessing, modeling, and evaluating performance/accuracy. Scientists and engineers leverage domain-specific features/tools that support the entire workflow from labeling the ground truth, handling data from a wide variety of sources/formats, developing models and finally deploying these models. Users can scale their deployments optimally on GPU-based cloud infrastructure to build accelerated training and inference pipelines while working with big datasets. These environments are optimized for engineers to develop such functionality with ease and then scale against large datasets with Spark-based clusters on the cloud.
MLflow: Infrastructure for a Complete Machine Learning Life Cycle with Mani ...Databricks
ML development brings many new complexities beyond the traditional software development lifecycle. Unlike in traditional software development, ML developers want to try multiple algorithms, tools, and parameters to get the best results, and they need to track this information to reproduce work. In addition, developers need to use many distinct systems to productionize models. To address these problems, many companies are building custom “ML platforms” that automate this lifecycle, but even these platforms are limited to a few supported algorithms and to each company’s internal infrastructure. In this session, we introduce MLflow, a new open source project from Databricks that aims to design an open ML platform where organizations can use any ML library and development tool of their choice to reliably build and share ML applications. MLflow introduces simple abstractions to package reproducible projects, track results, and encapsulate models that can be used with many existing tools, accelerating the ML lifecycle for organizations of any size. In this deep-dive session, through a complete ML model life-cycle example, you will walk away with:
MLflow concepts and abstractions for models, experiments, and projects
How to get started with MLFlow
Understand aspects of MLflow APIs
Using tracking APIs during model training
Using MLflow UI to visually compare and contrast experimental runs with different tuning parameters and evaluate metrics
Package, save, and deploy an MLflow model
Serve it using MLflow REST API
What’s next and how to contribute
MLOps and Data Quality: Deploying Reliable ML Models in ProductionProvectus
Looking to build a robust machine learning infrastructure to streamline MLOps? Learn from Provectus experts how to ensure the success of your MLOps initiative by implementing Data QA components in your ML infrastructure.
For most organizations, the development of multiple machine learning models, their deployment and maintenance in production are relatively new tasks. Join Provectus as we explain how to build an end-to-end infrastructure for machine learning, with a focus on data quality and metadata management, to standardize and streamline machine learning life cycle management (MLOps).
Agenda
- Data Quality and why it matters
- Challenges and solutions of Data Testing
- Challenges and solutions of Model Testing
- MLOps pipelines and why they matter
- How to expand validation pipelines for Data Quality
This workshop presentation from Enterprise Knowledge team members Joe Hilger, Founder and COO, and Sara Nash, Technical Analyst, was delivered on June 8, 2020 as part of the Data Summit 2020 virtual conference. The 3-hour workshop provided an interdisciplinary group of participants with a definition of what a knowledge graph is, how it is implemented, and how it can be used to increase the value of your organization’s datas. This slide deck gives an overview of the KM concepts that are necessary for the implementation of knowledge graphs as a foundation for Enterprise Artificial Intelligence (AI). Hilger and Nash also outlined four use cases for knowledge graphs, including recommendation engines and natural language query on structured data.
MLOps Bridging the gap between Data Scientists and Ops.Knoldus Inc.
Through this session we're going to introduce the MLOps lifecycle and discuss the hidden loopholes that can affect the MLProject. Then we are going to discuss the ML Model lifecycle and discuss the problem with training. We're going to introduce the MLFlow Tracking module in order to track the experiments.
Architect’s Open-Source Guide for a Data Mesh ArchitectureDatabricks
Data Mesh is an innovative concept addressing many data challenges from an architectural, cultural, and organizational perspective. But is the world ready to implement Data Mesh?
In this session, we will review the importance of core Data Mesh principles, what they can offer, and when it is a good idea to try a Data Mesh architecture. We will discuss common challenges with implementation of Data Mesh systems and focus on the role of open-source projects for it. Projects like Apache Spark can play a key part in standardized infrastructure platform implementation of Data Mesh. We will examine the landscape of useful data engineering open-source projects to utilize in several areas of a Data Mesh system in practice, along with an architectural example. We will touch on what work (culture, tools, mindset) needs to be done to ensure Data Mesh is more accessible for engineers in the industry.
The audience will leave with a good understanding of the benefits of Data Mesh architecture, common challenges, and the role of Apache Spark and other open-source projects for its implementation in real systems.
This session is targeted for architects, decision-makers, data-engineers, and system designers.
Advanced Natural Language Processing with Apache Spark NLPDatabricks
5. 5
What and Why of Graph
Making Relationships a First Class Citizen
● ArangoDB turns the value of data relationships into actionable results
● Data relationships are the foundation of AI/ML models
SQL DB (e.g. Product Listing)
Product 1 | Price | Category | Description
Product 2 | Price | Category | Description
Product 3 | Price | Category | Description
Product 4 | Price | Category | Description
Graph/NoSQL DB (e.g. Co-Purchase Pattern)
[Diagram: Product 1, Product 2, Product 3 and Product 4 shown as connected nodes]
Rather than focus on individual rows or products, a graph DB captures dependencies and relationships between those products.
6. 6
Graph Database
● Collection of nodes and edges
● Naturally describes relations in data
● Feasibly handles large joins/traversals
● Built-in graph algorithms (K paths, shortest path, etc.)
● Use Cases:
○ Fraud Detection
○ Supply Chain Management
○ Recommendations
○ Customer 360
○ Network Management
○ Risk Management
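To make the "nodes and edges" and "shortest path" bullets concrete, here is a minimal pure-Python sketch of a breadth-first shortest-path search. The co-purchase edges between the four products are assumptions made up for illustration, and the code is not how ArangoDB implements its built-in traversals.

```python
from collections import deque

# Hypothetical undirected co-purchase graph: node -> list of neighbouring nodes.
graph = {
    "Product 1": ["Product 2", "Product 3"],
    "Product 2": ["Product 1", "Product 4"],
    "Product 3": ["Product 1", "Product 4"],
    "Product 4": ["Product 2", "Product 3"],
}

def shortest_path(graph, start, goal):
    """Return one shortest path (fewest edges) from start to goal, or None."""
    queue = deque([[start]])   # each queue entry is the path walked so far
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbour in graph.get(node, []):
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append(path + [neighbour])
    return None                # goal is not reachable from start

path = shortest_path(graph, "Product 1", "Product 4")
```

Because breadth-first search explores paths in order of length, the first path that reaches the goal has the fewest hops.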
7. 7
ML + Graph Databases
[Diagram labels: GraphDB; ML Ecosystem (DGL, PyG, NetworkX, ...); Data Ecosystem; Cloud; GraphQL; Knowledge Graph; MetaData; Graph Analytics; GraphML Inferences; Embeddings/Inferences; Graph data; …]
16. 16
Content-based Filtering
● Very personalized recommendation
● Uses existing data to offer predictions
● Typically requires domain knowledge
● Can be fast and ad-hoc
“Content-based filtering uses item features to recommend other items similar to what the user likes, based on their previous actions or explicit feedback.” - Google
17. 17
TFIDF
Term Frequency: How often the word shows up in a document.
Inverse Document Frequency: How rare the word is across all documents (the more documents contain it, the lower its weight).
Attempts to rank information based on how informative the words are, not just how frequent.
tfidf(t, d, D) = tf(t, d) * idf(t, D)
(D: all documents, d: document, t: term)
https://en.wikipedia.org/wiki/Tf-idf
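The formula on this slide can be sketched in a few lines of pure Python. This sketch assumes length-normalized term counts for tf and idf(t, D) = log(N / number of documents containing t); several other weighting variants exist, and the sample movie descriptions are made up for illustration.

```python
import math
from collections import Counter

def tf(term, doc_tokens):
    # Term frequency: share of the document's tokens that equal `term`.
    return Counter(doc_tokens)[term] / len(doc_tokens)

def idf(term, corpus):
    # Inverse document frequency: log(N / number of documents containing `term`).
    containing = sum(1 for doc in corpus if term in doc)
    return math.log(len(corpus) / containing) if containing else 0.0

def tfidf(term, doc_tokens, corpus):
    return tf(term, doc_tokens) * idf(term, corpus)

# Hypothetical, already tokenized movie descriptions.
corpus = [
    "a space crew explores a distant planet".split(),
    "a toy cowboy meets a space ranger".split(),
    "a pig learns to herd sheep".split(),
]

first = corpus[0]
scores = {term: tfidf(term, first, corpus) for term in set(first)}
```

In this toy corpus the word "a" appears in every document, so its idf (and therefore its score) is zero, while "planet", which appears in only one document, scores higher than "space", which appears in two.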
19. 19
Storing it in the graph
Movie/User --{ ML (Distance, Similarity, Embedding) }--> Movie/User
● Store ML outcomes on the edge
● Enrich new/existing data and queries
● Leverage benefits of ML
● Reduce complexity
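As a rough illustration of storing an ML outcome on an edge, the sketch below computes cosine similarity between feature vectors and builds edge documents that carry the score. ArangoDB edge documents do use the `_from` and `_to` attributes, but the `Movie/...` ids, the `similarity` attribute name, the vectors, and the threshold are all assumptions; writing the edges to a database is left out.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def similarity_edges(vectors, threshold=0.5):
    """Build one edge document per pair whose similarity clears the threshold."""
    keys = sorted(vectors)
    edges = []
    for i, src in enumerate(keys):
        for dst in keys[i + 1:]:
            score = cosine_similarity(vectors[src], vectors[dst])
            if score >= threshold:
                # The ML outcome lives on the edge, next to the two endpoints.
                edges.append({"_from": src, "_to": dst, "similarity": round(score, 4)})
    return edges

# Hypothetical feature vectors (for example, TF-IDF rows) keyed by document id.
vectors = {
    "Movie/1": [1.0, 0.0, 1.0],
    "Movie/2": [1.0, 0.0, 0.9],
    "Movie/3": [0.0, 1.0, 0.0],
}
edges = similarity_edges(vectors)
```

Once such edges exist, a recommendation query becomes a plain graph traversal over precomputed scores rather than a model call at query time.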
21. 21
Collaborative Filtering
● Personalized recommendation
● Predictions based on combined external patterns
● Depends on existing patterns being accurate
● Can offer predictions with limited domain knowledge
22. 22
Matrix Factorization
● Can be efficient or not
● Sparse matrix
● Dimensionality Reduction
● Combine with content-based
● Scale with FAISS
Movie | User 1 | User 2 | User 3 | User 4
Toy Story | 5 | ? | 2 | 1
GoldenEye | ? | 1 | 5 | 5
Love Actually | ? | 5 | ? | 5
Babe | 5 | ? | 1 | ?
Star Trek | 1 | ? | 5 | 5
SVD
A = UΣV^T
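Plain SVD needs a complete matrix, so the missing "?" ratings have to be handled somehow. One common alternative, sketched here, is to learn low-rank movie and user factors by stochastic gradient descent over the observed ratings only. This is a swapped-in technique rather than the SVD named on the slide, and the rank, learning rate, regularization, and epoch count are arbitrary assumptions.

```python
import random

# Observed ratings from the table on this slide; "?" cells are simply absent.
ratings = {
    ("Toy Story", "User 1"): 5, ("Toy Story", "User 3"): 2, ("Toy Story", "User 4"): 1,
    ("GoldenEye", "User 2"): 1, ("GoldenEye", "User 3"): 5, ("GoldenEye", "User 4"): 5,
    ("Love Actually", "User 2"): 5, ("Love Actually", "User 4"): 5,
    ("Babe", "User 1"): 5, ("Babe", "User 3"): 1,
    ("Star Trek", "User 1"): 1, ("Star Trek", "User 3"): 5, ("Star Trek", "User 4"): 5,
}

def predict(P, Q, movie, user):
    # Predicted rating = dot product of the movie and user factor vectors.
    return sum(p * q for p, q in zip(P[movie], Q[user]))

def factorize(ratings, k=2, steps=2000, lr=0.01, reg=0.02, seed=0):
    """Learn k-dimensional factors for every movie (P) and user (Q)."""
    rng = random.Random(seed)
    movies = sorted({m for m, _ in ratings})
    users = sorted({u for _, u in ratings})
    P = {m: [rng.uniform(0.1, 1.0) for _ in range(k)] for m in movies}
    Q = {u: [rng.uniform(0.1, 1.0) for _ in range(k)] for u in users}
    for _ in range(steps):
        for (m, u), r in ratings.items():
            err = r - predict(P, Q, m, u)
            for f in range(k):
                p, q = P[m][f], Q[u][f]
                # Gradient step on squared error with L2 regularization.
                P[m][f] += lr * (err * q - reg * p)
                Q[u][f] += lr * (err * p - reg * q)
    return P, Q

def mean_abs_error(P, Q, ratings):
    return sum(abs(r - predict(P, Q, m, u)) for (m, u), r in ratings.items()) / len(ratings)

P, Q = factorize(ratings)
guess = predict(P, Q, "Toy Story", "User 2")  # fills one of the "?" cells
```

The learned factors play the same role as the reduced dimensions in the SVD picture: each movie and user is compressed to k numbers, and a missing cell is estimated from their dot product.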
25. 25
Graph Neural Networks
Sachin Sharma
ML Research Engineer @ArangoDB
● Develop Intelligent Products
● Former Machine Learning Scientist & Engineer @Define Media GmbH
● Former Research Intern @DFKI
● AI Blogger
● Interests: Graph ML, Vision, NLP.
Graph ML, NVIDIA Triton, and ArangoDB: Thinking Beyond Euclidean Space
https://www.arangodb.com/events/graphml-nvidia-triton-and-arangodb-thinking-beyond-euclidean-space/
26. 26
Graph (Node) Representation Learning
(image credit: Stanford)
● Map network nodes to a d-dimensional embedding space
● Similar nodes in the network should remain close to each other in the embedding space
Similarity of (u, v) in the network ≈ dot product between their node embeddings
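The relation between network similarity and the dot product of embeddings can be illustrated with a tiny sketch. The three 2-dimensional embeddings below are invented for illustration; in practice they would be learned by a model.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Hypothetical learned embeddings: A and B are meant to be "similar" nodes.
embeddings = {
    "A": [0.9, 0.1],
    "B": [0.8, 0.2],
    "C": [-0.7, 0.6],
}

def most_similar(node, embeddings):
    """Return the other node whose embedding has the largest dot product with `node`."""
    others = (n for n in embeddings if n != node)
    return max(others, key=lambda n: dot(embeddings[node], embeddings[n]))

nearest = most_similar("A", embeddings)
```

Here A and B point in nearly the same direction, so their dot product is large and positive, while C points elsewhere and scores negative against A.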
27. 27
Graph Embedding
This is the key to machine learning on graphs: each node is mapped into a coordinate system so that certain properties are preserved. For example, different node types can easily be separated by a line, or neighbouring nodes stay close to each other.
28. 28
Can we Apply CNNs on Graphs?
Fixed Number of Neighbors (2D Grid - Euclidean Space): Image as 2D Grid, Text/Audio as 1D Sequence
Variable Number of Neighbors (Graph - Non-Euclidean Space)
29. 29
Graph Neural Networks
● Node classification
● Graph classification
● Link prediction
○ Predict links for users and movies
31. 31
ML + Graph Databases
● Knowledge graph serves data
● Graph naturally pairs with ML
● ML Ecosystem for graph interface
[Diagram: Movie Knowledge Graph and ML Ecosystem; labels: Input data, Embeddings/Inferences]
33. 33
Nvidia Triton Meets ArangoDB
[Diagram: an AI Model Repository deploys the Graph ML model (GraphSage); a Front-End Client Application talks to the model; ArangoDB holds a graph with nodes N1 to N6 and receives updates]
Know Surroundings: retrieve all the node embeddings of the neighbors of node ‘N5’ which are at 1-hop distance
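The "know surroundings" step can be sketched as a 1-hop neighbor lookup followed by a mean over the neighbor embeddings, in the spirit of GraphSAGE's mean aggregator. The edges between N1 to N6 and the embedding values are assumptions (the slide does not list them), and a real GraphSAGE layer would also apply learned weights and a nonlinearity.

```python
# Hypothetical adjacency for the six nodes in the diagram.
adjacency = {
    "N1": {"N2", "N3"},
    "N2": {"N1", "N4"},
    "N3": {"N1", "N4"},
    "N4": {"N2", "N3", "N5"},
    "N5": {"N4", "N6"},
    "N6": {"N5"},
}

# Hypothetical 2-dimensional node embeddings.
embeddings = {
    "N1": [0.3, 0.3], "N2": [0.1, 0.9], "N3": [0.7, 0.2],
    "N4": [1.0, 0.0], "N5": [0.2, 0.4], "N6": [0.0, 1.0],
}

def one_hop_neighbors(adjacency, node):
    # Sorted so the result is deterministic regardless of set ordering.
    return sorted(adjacency.get(node, ()))

def mean_aggregate(embeddings, nodes):
    """Element-wise mean of the embeddings of `nodes`."""
    dim = len(next(iter(embeddings.values())))
    if not nodes:
        return [0.0] * dim
    return [sum(embeddings[n][i] for n in nodes) / len(nodes) for i in range(dim)]

neighbors = one_hop_neighbors(adjacency, "N5")
context = mean_aggregate(embeddings, neighbors)
# Concatenate the node's own embedding with its neighborhood summary.
combined = embeddings["N5"] + context
```

In a deployment like the one on this slide, the neighbor lookup would be a database query and the combined vector would be the input sent to the served model.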
35. 35
Thank you!
● Notebooks: https://github.com/arangodb/interactive_tutorials
○ Collaborative Filtering with AQL
○ Content-based Recommendations with ArangoSearch and TFIDF
○ Content-based Recommendations with FAISS, TFIDF, and Python
○ Graph Neural Networks with PyTorch
○ Matrix Factorization
● https://github.com/arangoml/
● Test-drive ArangoDB and ArangoML using Oasis, 14 days for free. Register now at https://bit.ly/3blNaKR
Editor's Notes
Compress the matrix to something like this example. The data is sparse, but we can now attempt to offer predictions based on the reduced dimensions.
Learn features such as genre and how strongly an item actually expresses that genre (is it sci-fi and action, or sci-fi but more drama?).
Content-based filtering can match descriptions based on keywords, and that isn’t always enough.
Must encode information about the graph (neighbors) for message passing: graph representation learning.
The challenge of going beyond a fixed 2D Euclidean space: CNNs require that your data is represented in a fixed Euclidean space. With non-Euclidean data there is a variable number of neighbors.