This document discusses network monitoring and management tools. It begins with an overview of NetDisco for network discovery and inventory, Cacti for graphing and alerting, and Splunk for log reporting and analysis. Case studies describe how these tools were used to solve real problems, such as trending traffic usage over time with Cacti thresholds and identifying wide-scale network anomalies with weathermaps. The talk concludes by discussing how these tools are being extended to give end users and other teams additional monitoring, inventory, and limited configuration capabilities.
1. If You've Got a Problem, Yo – I'll Solve It:
Using Tools to Solve Network Problems
Derek Engi
Network Management Engineer
North Carolina State University
2. Agenda
Network Monitoring and Management Tools
Making the Most of Free / Open Source Software
Real World Problems, Real World Tools
Questions and Discussion
4. NetDisco
Network Discovery Engine
Mix of Perl and PostgreSQL
CDP / SNMP Mappings of Topology
VLAN and Port Up/Down Manipulation
Our Authoritative DB for Other Applications
API! - Woohoo!
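For a feel of what that API and "authoritative DB" role make possible, here is a minimal sketch of asking NetDisco which switch port a MAC address was last seen on. It assumes a NetDisco 2.x-style REST API; the server name, credentials, endpoint paths, and field names are illustrative assumptions, not taken from this deck, so verify them against your own install's API documentation.

```python
# Minimal sketch: find the switch/port a MAC address was last seen on via a
# NetDisco 2.x-style REST API. Server, credentials, endpoint paths, and field
# names are assumptions; check them against your NetDisco version.
import requests

NETDISCO = "https://netdisco.example.edu"   # hypothetical server

session = requests.Session()
session.headers["Accept"] = "application/json"

# NetDisco 2.x issues an API key from /login using HTTP basic auth (as I
# recall the documented flow; confirm for your version).
resp = session.post(f"{NETDISCO}/login", auth=("apiuser", "secret"))
resp.raise_for_status()
session.headers["Authorization"] = resp.json()["api_key"]

# Search the node table for a MAC address; response shape may vary by version.
mac = "00:11:22:33:44:55"
nodes = session.get(f"{NETDISCO}/api/v1/search/node", params={"q": mac}).json()
for entry in nodes:
    print(entry.get("switch"), entry.get("port"), entry.get("time_last"))
```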
5.
6. Cacti
Generate Graphs for Network Devices
PHP with MySQL Backend
Round-Robin Databases (RRDs)
Device / Interface Statistics
Plugin Architecture
Pseudo-API! - Pseudo-Woohoo!
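Because Cacti keeps its polled counters in plain RRD files, you can get at the raw numbers without going through the web UI at all. The sketch below pulls the last hour of averages from one interface RRD with the standard rrdtool CLI; the file path is a placeholder, and traffic_in/traffic_out are simply Cacti's usual interface data-source names.

```python
# Minimal sketch: read the last hour of interface counters straight out of a
# Cacti RRD using the rrdtool CLI. The RRD path is hypothetical; traffic_in /
# traffic_out are Cacti's customary interface data-source names.
import subprocess

rrd = "/var/www/cacti/rra/core1_traffic_in_123.rrd"   # hypothetical file
out = subprocess.run(
    ["rrdtool", "fetch", rrd, "AVERAGE", "--start", "-3600"],
    capture_output=True, text=True, check=True,
).stdout

lines = [l for l in out.splitlines() if l.strip()]
ds_names = lines[0].split()                 # header row: data-source names
for row in lines[1:]:
    ts, values = row.split(":", 1)
    print(ts.strip(), dict(zip(ds_names, values.split())))
```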
7. Splunk
Fancy Log Indexer / Analyzer
Free Version and Commercial License
Applications / Plugin Architecture
Scalable and Awesome
Awesome API!
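The Splunk REST API is what makes it easy to pull indexed log data into other tools. Below is a minimal sketch of a one-shot search streamed as JSON from the management port; the host, credentials, index name, and query are placeholders, and the call uses the standard search/jobs/export endpoint.

```python
# Minimal sketch: run a one-shot search against Splunk's REST API (management
# port 8089) and stream results as JSON. Host, credentials, and the example
# query are placeholders.
import json
import requests

SPLUNK = "https://splunk.example.edu:8089"

resp = requests.post(
    f"{SPLUNK}/services/search/jobs/export",
    auth=("admin", "changeme"),
    data={
        "search": "search index=netlogs sourcetype=syslog error | head 10",
        "output_mode": "json",
    },
    verify=False,   # sketch only; verify certificates in production
    stream=True,
)
resp.raise_for_status()

# The export endpoint streams one JSON object per line.
for line in resp.iter_lines():
    if line:
        event = json.loads(line)
        print(event.get("result", {}).get("_raw"))
```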
9. Case Study #1 – Trending Traffic
Conserving While Leveraging Building Fiber
Targeting Upgrade / Problem Child Area
Open Source Solution!
“My internet/backups/imaging software/$application doesn't run so hot.” Can we get some additional bandwidth?
10. Case Study #1 – Trending Traffic
Thresholds to the Rescue!
Easy to Configure and Template Cacti Plugin
Alerting Functions
Endless Possibilities
Allows Tracking, Trending, and Review
Port-Channels Cheaper Than Hardware
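To make the thresholding idea concrete, here is an illustration of the kind of check the Cacti threshold (thold) plugin performs: read recent averages from an interface RRD (as in the Cacti sketch above) and flag sustained utilization over a limit. This is not the plugin's own code; the path, link speed, and 80% figure are hypothetical, and it assumes Cacti's usual bytes-per-second counters.

```python
# Illustration only: the kind of check the Cacti thold plugin performs.
# Paths, link speed, and the 80% threshold are hypothetical.
import subprocess

RRD = "/var/www/cacti/rra/core1_traffic_in_123.rrd"   # hypothetical
LINK_BPS = 1_000_000_000        # assumed 1 Gb/s link
THRESHOLD = 0.80                # alert above 80% average utilization

out = subprocess.run(
    ["rrdtool", "fetch", RRD, "AVERAGE", "--start", "-3600"],
    capture_output=True, text=True, check=True,
).stdout

samples = []
for row in out.splitlines():
    if ":" not in row:
        continue                            # skip DS-name header / blank lines
    _, values = row.split(":", 1)
    fields = values.split()
    if not fields or fields[0].lower().endswith("nan"):
        continue
    # Cacti stores bytes/s; convert to a fraction of link capacity in bits/s.
    samples.append(float(fields[0]) * 8 / LINK_BPS)

if samples:
    avg = sum(samples) / len(samples)
    if avg > THRESHOLD:
        print(f"ALERT: {avg:.0%} average utilization over the last hour")
```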
11.
12.
13. Case Study #2 – Wide Scale Anomalies
Is Something Weird Happening in the DC?
We've Been Adding Port-Channels....
Whoa, Check the WeatherMaps!
“So why does it take our backups so long to complete?”
14. Case Study #2 – PHP Weathermaps
Open Source Cacti/Other Data Source App
Generate Large or Small Scale Snapshots
Integration w/ NetDisco
Getting back to our problem...Dude, something seems wrong.
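One way the NetDisco integration pays off is generating the weathermap configuration instead of hand-drawing it. The sketch below emits a PHP Weathermap-style config from a list of links (for example, CDP neighbors exported from NetDisco). The directive names follow PHP Weathermap's config format as commonly documented, but the node coordinates, RRD paths, and bandwidths are placeholders; treat the whole thing as an assumption to check against your Weathermap version.

```python
# Illustrative sketch: emit a PHP Weathermap-style config from a topology list
# (e.g., CDP neighbors pulled from NetDisco). Directive names are as commonly
# documented for PHP Weathermap; coordinates, paths, and bandwidths are fake.
links = [
    # (node_a, node_b, rrd_path, bandwidth)
    ("core1", "core2", "/var/www/cacti/rra/core1_to_core2.rrd", "10G"),
    ("core1", "dist1", "/var/www/cacti/rra/core1_to_dist1.rrd", "1G"),
]

nodes = sorted({n for a, b, *_ in links for n in (a, b)})

with open("netmap.conf", "w") as conf:
    for i, name in enumerate(nodes):
        conf.write(f"NODE {name}\n")
        conf.write(f"    LABEL {name}\n")
        conf.write(f"    POSITION {100 + 200 * i} 100\n\n")   # naive layout
    for a, b, rrd, bw in links:
        conf.write(f"LINK {a}-{b}\n")
        conf.write(f"    NODES {a} {b}\n")
        conf.write(f"    TARGET {rrd}\n")
        conf.write(f"    BANDWIDTH {bw}\n\n")
```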
15.
16.
17. Case Study #2 – Solution
HSRP Between DC Cores, Equal-Cost Paths Advertised Through OSPF
ARP Cache Set to 2 Hours
With asymmetric paths, a core's CAM (MAC) entry for a host can age out long before its two-hour ARP entry does, so frames to that host become unknown unicast and the switch floods them:
“I think I will forward this frame out all switch ports... yeah.....”
Lather, Rinse, Repeat.
Changing the ARP and CAM timers to match fixes the problem.
18. Where Are We Going?
Utilizing NetDisco for Inventory, Idle Port Reports, VLAN Management, etc.
Extending Cacti Graphs down to the LAN Admins, thresholding more stuff (IPTV, UPSs)
Weathermaps – Unicast and Multicast Representation down to the end-user
Splunk – Providing firewall access logs to appropriate parties via a Firewall Config Tool
19. Extending Management to the End User
Switch Admin Tool – VLAN Config, Port Descriptions, Duplex/Speed Settings. View MACs on a port. Uses NetDisco, Cacti.
Firewall Config Viewer – Extending visibility into the security side of the network.
Extending tool functionality to other OIT groups via an API.
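Purely as an illustration of that last point, here is the shape such an API could take: a thin, read-only web endpoint that lets another team look up the MAC addresses seen on a switch port, backed by NetDisco's database. The university's actual tools are not public; the table and column names are assumptions about NetDisco's PostgreSQL schema, and the framework choice (Flask) and connection details are ours.

```python
# Hypothetical sketch: a thin read-only API over NetDisco data for other teams.
# Table/column names are assumptions about NetDisco's PostgreSQL schema;
# hostnames, credentials, and the Flask framework choice are illustrative.
import psycopg2
from flask import Flask, jsonify

app = Flask(__name__)

def db():
    return psycopg2.connect(
        "dbname=netdisco user=netdisco_ro host=db.example.edu"
    )

# Port names contain slashes (e.g., GigabitEthernet1/0/1), hence <path:port>.
@app.route("/api/port-macs/<switch>/<path:port>")
def port_macs(switch, port):
    with db() as conn, conn.cursor() as cur:
        cur.execute(
            "SELECT mac, time_last FROM node "
            "WHERE switch = %s AND port = %s ORDER BY time_last DESC",
            (switch, port),
        )
        rows = [{"mac": m, "last_seen": str(t)} for m, t in cur.fetchall()]
    return jsonify(rows)

if __name__ == "__main__":
    app.run(port=8080)
```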