[db tech showcase Tokyo 2018] #dbts2018 #B24
『Speed Meets Scale: Analyzing & Visualizing Billions of Data Points with GPUs』
MapD Technologies - Aaron Williams, VP of Global Community
ADF 3D laser scanning generates in-depth point cloud data that can be used to create accurate 3D models and as-built drawings in even the most challenging facilities. Our 3D laser scanning services take site documentation to new levels of accuracy and precision with greater efficiency than ever before. We can obtain more data faster at lower costs, and with less risk to personnel and products—even in complex or hazardous environments.
Time Series with Driverless AI - Marios Michailidis and Mathias Müller - H2O ...Sri Ambati
This talk was recorded in London on October 30, 2018 and can be viewed here: https://youtu.be/EGVY7-Spv8E
Time series is a unique field in predictive modelling where standard feature engineering techniques and models are employed to get the most accurate results. In this session we will examine some of the most important features of Driverless AI’s newest recipe regarding Time Series. It will cover validation strategies, feature engineering, feature selection and modelling. The capabilities will be showcased through several cases.
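As a rough illustration of two topics the abstract lists, validation strategies and feature engineering, the sketch below shows a rolling-origin (time-ordered) validation split and simple lag features. It is not Driverless AI's recipe; the function names `rolling_origin_splits` and `lag_features` and their parameters are assumptions made for this example.

```python
from typing import Iterator, List, Tuple


def rolling_origin_splits(
    n_samples: int, n_folds: int, horizon: int
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, valid_idx) pairs where validation always follows training in time."""
    first_valid_start = n_samples - n_folds * horizon
    if first_valid_start <= 0:
        raise ValueError("not enough samples for the requested folds and horizon")
    for fold in range(n_folds):
        valid_start = first_valid_start + fold * horizon
        train_idx = list(range(valid_start))                         # everything before the cut
        valid_idx = list(range(valid_start, valid_start + horizon))  # the next `horizon` steps
        yield train_idx, valid_idx


def lag_features(series: List[float], lags: List[int]) -> List[List[float]]:
    """Build one row of lagged values for each time step that has a full history."""
    max_lag = max(lags)
    return [[series[t - lag] for lag in lags] for t in range(max_lag, len(series))]
```

The point of the split is that no validation index ever precedes a training index, which avoids leaking future information into the model, something a shuffled k-fold split would not guarantee.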
Bio: Marios Michailidis is now a Competitive Data Scientist at H2O.ai. He holds a BSc in Accounting and Finance from the University of Macedonia in Greece and an MSc in Risk Management from the University of Southampton. He has also nearly finished his PhD in machine learning at University College London (UCL), with a focus on ensemble modelling. He has worked in both the marketing and credit sectors in the UK market and has led many analytics projects with various themes, including acquisition, retention, recommenders, uplift, fraud detection, portfolio optimization and more.
He is the creator of KazAnova, a freeware GUI for credit scoring and data mining made entirely in Java, and of the StackNet Meta-Modelling Framework. In his spare time he loves competing in data science challenges and was ranked 1st out of 500,000 members on the popular Kaggle.com data competition platform. Here is a blog about Marios being ranked at the top in Kaggle and sharing his knowledge with tricks and ideas.
Finally, Marios’ LinkedIn profile can be found here, with more information about what he is working on now and his past projects.
https://www.linkedin.com/in/mariosmichailidis/
Bio: A Kaggle Grandmaster and a Data Scientist at H2O.ai, Mathias Müller holds an AI- and ML-focused diploma (eq. M.Sc.) in computer science from Humboldt University in Berlin. During his studies, he worked on computer vision in the context of bio-inspired visual navigation of autonomous flying quadrocopters. Prior to H2O.ai, he worked as a machine learning engineer for FSD Fahrzeugsystemdaten GmbH in the automotive sector. His stint with Kaggle was a chance encounter: he stumbled upon the data competition platform while looking for a more ML-focused platform than TopCoder. This is where he entered his first predictive modeling competition and climbed the ladder to become a Grandmaster. He is an active contributor to XGBoost and is working on Driverless AI with H2O.ai.
LinkedIn: https://www.linkedin.com/in/muellermat/
Capture and Use of Geo-Located Asset Information using Reality Capture Techno...David Males
Convergence of technology (lower-cost terrestrial scanners), government regulation (UAVs) and qualified service providers supports cost-effective capture of reality data for a much greater range of municipal assets – both linear and vertical.
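A quick report on attending ICLR 2020, the first edition to be held virtually.
A number of things became clear: the parts that really need to be in person, the parts that work perfectly well virtually, and the parts that actually became more convenient because the event was virtual.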
The International Conference on Learning Representations (ICLR)
2018 GIS in the Rockies Vendor Showcase (Th): ERDAS Imagine What's New and Ti...GIS in the Rockies
This presentation will cover the latest release highlights as well as tips and tricks for processing LiDAR data, ERDAS Imagine modeling capabilities and a roadmap for cloud based processing.
The session will highlight exploiting the full spectrum of LiDAR from viewing and measurements to surface and terrain modeling as well as extraction of point clouds from imagery.
In addition we will discuss the migration of our image exploitation capabilities from the desktop to the cloud.
Digital Transformation & Solvency II Simulations for L&G: Optimizing, Acceler...OW2
Legal & General is engaged in a highly innovative digital transformation that ranges from the services offered to its customers to the management of its IT applications and infrastructure.
As part of this strategic evolution, this talk presents how L&G together with ActiveEon Software and Services was able to replace 2 schedulers (Tibco DataSynapse and IBM AlgoBatch) with ProActive Workflows & Scheduling, and migrate the Solvency application to the Azure Cloud. A key aspect has been the ability to pipeline CPU-intensive tasks with I/O intensive ones. This alone allowed for 10% overall savings in runtime and grid resources, and for the high priority risk reports to be made available to customers 3x faster than the previous solution (16 hours down to 5 hours).
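The idea of pipelining CPU-intensive tasks with I/O-intensive ones can be sketched in a few lines. This is not the ProActive Workflows & Scheduling API; `load_batch`, `simulate` and `run_pipelined` are hypothetical stand-ins, and the sleep only imitates an I/O wait. While the current batch is being computed, the next one is already being loaded in a background thread.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import List


def load_batch(batch_id: int) -> List[int]:
    """Stand-in for an I/O-bound step; sleeps instead of reading real inputs."""
    time.sleep(0.05)
    return list(range(batch_id * 3, batch_id * 3 + 3))


def simulate(batch: List[int]) -> int:
    """Stand-in for a CPU-bound step; a trivial sum of squares here."""
    return sum(x * x for x in batch)


def run_pipelined(batch_ids: List[int]) -> List[int]:
    """Overlap loading of the next batch with computation on the current one."""
    results: List[int] = []
    if not batch_ids:
        return results
    with ThreadPoolExecutor(max_workers=1) as io_pool:
        pending = io_pool.submit(load_batch, batch_ids[0])
        for next_id in batch_ids[1:]:
            batch = pending.result()                       # wait for the current batch
            pending = io_pool.submit(load_batch, next_id)  # start prefetching the next one
            results.append(simulate(batch))                # compute while the prefetch runs
        results.append(simulate(pending.result()))         # last batch has nothing to prefetch
    return results
```

With a sequential loop the total time is roughly the sum of all load and compute times; with the overlap it approaches the larger of the two per batch, which is the kind of saving the abstract attributes to pipelining.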
Field Activity Planner - A cloud based digital energy platformFutureOn
Field Activity Planner offers a cloud based digital platform for enabling rapid visual workflows for your offshore engineering work. The platform allows you to easily integrate with other backend systems and offshore engineering software already in use in your organization either for field design, field planning, or activity scheduling to name a few key areas.
If you are evaluating offshore software solutions to improve your day-to-day activities, we use modern browser and cloud technology to deliver a state-of-the-art collaborative field planning platform that excels at easy-to-use 2D and 3D field layout and design of your subsea and topside projects. By using a real-time database, we ensure that you can collaborate on field design and planning with colleagues around the globe, saving both time and money by avoiding multiple revisions of proposed layouts.
We also support the most common data sources and formats used by typical offshore software solutions, e.g. bathymetry, reservoir, and well paths. Using our SaaS software, you can quickly design a field layout directly from your browser by loading your bathymetry and/or survey data. Then simply add 3D reservoir and well data for a complete overview and start to finalize your subsea layout by placing generic or company-specific subsea and topside assets in the correct locations. While you design, modify, and collaborate on possible field layouts, cost calculations are updated continuously as the design changes.
All this data is securely uploaded and processed in our cloud service and is viewable in both 2D and 3D. You can invite coworkers into your project and create a shareable, view-only URL that can be sent outside your organization to prospective clients and partners.
Why Open Source Works for DevOps MonitoringDevOps.com
Learn how to use open source tools to monitor the performance of your infrastructure, applications, and cloud in a way that is faster, easier, and scalable. In this webinar we will provide step-by-step instruction on how to use InfluxDB, a complete open source platform built from the ground up for metrics, events, and other time-based data. We will cover how to download and configure it, how to collect metrics, and how to build dashboards and alerts.
Developing Spatial Applications with CARTO for React v1.1CARTO
In this hands-on webinar, we introduce the new features of CARTO for React v1.1 and showcase how this framework can be used to accelerate the development of cloud-native geospatial applications. You can watch the recorded webinar at: https://go.carto.com/webinars/carto-react-developers
Large Scale Geospatial Indexing and Analysis on Apache SparkDatabricks
SafeGraph is a data company — just a data company — that aims to be the source of truth for data on physical places. We are focused on creating high-precision geospatial data sets specifically about places where people spend time and money. We have business listings, building footprint data, and foot traffic insights for over 7 million across multiple countries and regions.
In this talk, we will inspect the challenges with geospatial processing, running at a large scale. We will look at open-source frameworks like Apache Sedona (incubating) and its key improvements over conventional technology, including spatial indexing and partitioning. We will explore spatial data structure, data format, and open-source indexing like H3. We will illustrate how all of these fit together in a cloud-first architecture running on Databricks, Delta, MLFlow, and AWS. We will explore examples of geospatial analysis with complex geometries and practical use cases of spatial queries. Lastly, we will discuss how this is augmented by Machine Learning modeling, Human-in-the-loop (HITL) annotation, and quality validation.
Developing Spatial Applications with Google Maps and CARTOCARTO
Learn how CARTO integrates with Google Maps to unlock the advanced visualization capabilities of deck.gl and enables developers to build geospatial apps. You can watch the recorded webinar here: https://go.carto.com/webinars/google-maps-and-carto
Walking in the Cloud: A New Paradigm in Geospatial WorldICIMOD
Cloud computing allows scalable and efficient use of computing resources in the internet cloud. The use of cloud computing is increasing across all the application domains, including in the field of GIS and remote sensing.
The advent of Google Earth Engine (GEE) in particular has brought about a revolutionary change in the way we use geospatial technology. GEE is a cloud-based geospatial platform that stores petabytes of satellite imagery and geospatial data and makes it possible to carry out complex image processing tasks and spatial analyses without the need for any GIS or remote sensing software.
I²: Interactive Real-Time Visualization for Streaming Data with Apache Flink ...Jonas Traub
We present I², an interactive development environment for real-time analysis pipelines, which is based on Apache Flink and Apache Zeppelin. The sheer amount of available streaming data frequently makes it impossible to visualize all data points at the same time. I² coordinates running Flink jobs and corresponding visualizations such that only the currently depicted data points are processed in Flink and transferred to the front end. We show how Flink jobs can adapt to changed visualization properties at runtime to allow interactive data exploration on high-bandwidth data streams. Moreover, we present a data reduction technique which minimizes data transfer while providing loss-free time-series plots. We show I² in a live demonstration in which we replay recorded sensor data from a football match (ca. 12k events/s). I² was first presented at EDBT'17, where it received the best demonstration award. The demonstration is available as open source at github.com/TU-Berlin-DIMA/i2.
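The abstract does not say which data reduction technique is used. One published approach with this loss-free property for line charts is M4 aggregation, which keeps only the first, last, minimum and maximum point of each pixel column. The sketch below assumes that technique; the function name `m4_reduce` and its parameters are chosen for this example and are not taken from the I² code base.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (timestamp, value)


def m4_reduce(points: List[Point], t_start: float, t_end: float, width_px: int) -> List[Point]:
    """Keep the first, last, lowest and highest point of every pixel column."""
    if width_px <= 0 or t_end <= t_start:
        raise ValueError("need a positive width and a non-empty time range")
    span = t_end - t_start
    columns: Dict[int, List[Point]] = {}
    for t, v in points:
        if not (t_start <= t < t_end):
            continue  # outside the visible time range
        col = min(int((t - t_start) / span * width_px), width_px - 1)
        columns.setdefault(col, []).append((t, v))
    reduced: List[Point] = []
    for col in sorted(columns):
        bucket = columns[col]
        first = min(bucket, key=lambda p: p[0])
        last = max(bucket, key=lambda p: p[0])
        lowest = min(bucket, key=lambda p: p[1])
        highest = max(bucket, key=lambda p: p[1])
        # A point may play several roles; the set removes duplicates, sorting restores time order.
        reduced.extend(sorted({first, last, lowest, highest}))
    return reduced
```

At most four points per pixel column reach the front end, so the transferred volume depends on the width of the plot rather than on the rate of the incoming stream.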
The Environment Agency - Improving Incident Response - Collaborative Working ...Esri UK
The Environment Agency are continuously tackling a variety of incidents under high pressure. The introduction of an innovative new mapping system and incident role, the Incident Management Portal (IM Portal) and Mapping and Visuals Officers (MAVOs), has revolutionised the way they respond to major flooding, environmental and dry weather incidents. Join this session to learn how mobile applications, such as the Collector App, are being used to share real-time data from the field with incident rooms across the country – allowing them to better understand the situation on the ground.
The evolution of machine learning and IoT has made it possible for manufacturers to build more effective applications for predictive maintenance than ever before. Despite the huge potential that machine learning offers for predictive maintenance, it's challenging to build solutions that can handle the speed of IoT data streams and the very large datasets required to train models that can forecast rare events like mechanical failures. Solving these challenges requires knowledge about state-of-the-art dataware, such as MapR, and cluster computing frameworks, such as Spark, which give developers foundational APIs for consuming and transforming data into feature tables useful for machine learning.
Designing data pipelines for analytics and machine learning in industrial set...DataWorks Summit
Machine learning has made it possible for technologists to do amazing things with data. Its arrival coincides with the evolution of networked manufacturing systems driven by IoT. In this presentation we’ll examine the rise of IoT and ML from a practitioner's perspective to better understand how applications of AI can be built in industrial settings. We'll walk through a case study that combines multiple IoT and ML technologies to monitor and optimize an industrial HVAC (heating and cooling) system. Through this instructive example you'll see how the following components can be put into action:
1. A StreamSets data pipeline that sources from MQTT and persists to OpenTSDB
2. A TensorFlow model that predicts anomalies in streaming sensor data
3. A Spark application that derives new event streams for real-time alerts
4. A Grafana dashboard that displays factory sensors and alerts in an interactive view
By walking through this solution step-by-step, you'll learn how to build the fundamental capabilities needed in order to handle endless streams of IoT data and derive ML insights from that data:
1. How to transport IoT data through scalable publish/subscribe event streams
2. How to process data streams with transformations and filters
3. How to persist data streams with the timeliness required for interactive dashboards
4. How to collect labeled datasets for training machine learning models
At the end of this presentation you will have learned how a variety of tools can be used together to build ML enhanced applications and data products for instrumented manufacturing systems.
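The first two capabilities in the list above, publish/subscribe transport and stream processing with filters and transformations, can be illustrated with a tiny in-memory sketch. It does not use MQTT, StreamSets or Spark; the `Broker` class, the topic names, the event fields and the temperature limit are all hypothetical and exist only for this example.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List

Event = Dict[str, float]
Handler = Callable[[Event], None]


class Broker:
    """Tiny in-memory publish/subscribe hub keyed by topic name."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, event: Event) -> None:
        # Deliver the event to every handler registered for this topic.
        for handler in list(self._subscribers[topic]):
            handler(event)


def make_alert_stage(broker: Broker, limit_c: float) -> Handler:
    """Filter readings above limit_c and republish them, transformed, on 'alerts'."""

    def handle(event: Event) -> None:
        if event["temp_c"] > limit_c:  # filter step
            alert = {"sensor": event["sensor"], "excess_c": event["temp_c"] - limit_c}  # transform step
            broker.publish("alerts", alert)

    return handle
```

A usage example: subscribe `make_alert_stage(broker, 30.0)` to a `"sensors"` topic and a list's `append` method to `"alerts"`; publishing sensor readings then leaves only the over-limit readings, in their derived form, in the list.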
Speakers
Ian Downard, Sr. Developer Evangelist, MapR
William Ochandarena, Senior Director of Product Management, MapR
The relationships between data sets matter. Discovering, analyzing, and learning those relationships is a central part of expanding our understanding, and a critical step toward being able to predict and act upon the data. Unfortunately, these are not always simple or quick tasks.
To help the analyst we introduce RAPIDS, a collection of open-source libraries, incubated by NVIDIA and focused on accelerating the complete end-to-end data science ecosystem. Graph analytics is a critical piece of the data science ecosystem for processing linked data, and RAPIDS is pleased to offer cuGraph as our accelerated graph library.
Simply accelerating algorithms addresses only a portion of the problem. To address the full problem space, RAPIDS cuGraph strives to be feature-rich, easy to use, and intuitive. Rather than limiting the solution to a single graph technology, cuGraph supports property graphs, knowledge graphs, hypergraphs, bipartite graphs, and the basic directed and undirected graph.
A Python API allows the data to be manipulated as a DataFrame, similar to and compatible with pandas, with inputs and outputs being shared across the full RAPIDS suite, for example with the RAPIDS machine learning package, cuML.
This talk will present an overview of RAPIDS and cuGraph. It will discuss and show examples of how to manipulate and analyze bipartite and property graphs, and show how data can be shared with machine learning algorithms. The talk will include some performance and scalability metrics, and will conclude with a preview of upcoming features, such as graph query language support, and the general RAPIDS roadmap.
Lessons learned building a big data analytics engine, from proprietary to ope...J On The Beach
Lessons learned building a big data analytics engine, from proprietary to open source by Álvaro Santamaria & Joel Brunger
After spending four years building a proprietary all-in-one streaming analytics engine for financial services, it became clear that open-source was starting to pull ahead. Alvaro will talk about the challenges of creating an IT operations solution for financial services; what to build, what not to build, and how to use open source tools to get past the infrastructure and focus on the business problems that matter.
This presentation was given at GIStech 2010 in Rotterdam (NL) and later to students of the University of Wageningen. In this presentation we explain the choices we've made in a number of GIS projects.
The Art of Data Visualization
Agenda:
6:00 - 6:15: Welcome
6:15 - 6:45: Guidelines for Data Visualization
6:45 - 7:30: Large-scale GPU-Accelerated Data Visualization with MapD
7:30 - 8:00: 1000+ Members Giveaway / Networking + Q&A
1) NVIDIA-Iguazio Accelerated Solutions for Deep Learning and Machine Learning (30 mins):
About the speaker:
Dr. Gabriel Noaje, Senior Solutions Architect, NVIDIA
http://bit.ly/GabrielNoaje
2) GPUs in Data Science Pipelines (30 mins):
- GPU as a Service for enterprise AI
- A short demo on the usage of GPUs for model training and model inferencing within a data science workflow
About the speaker:
Anant Gandhi, Solutions Engineer, Iguazio Singapore. https://www.linkedin.com/in/anant-gandhi-b5447614/
In this deck from FOSDEM'19, Christoph Angerer from NVIDIA presents: Rapids - Data Science on GPUs.
"The next big step in data science will combine the ease of use of common Python APIs, but with the power and scalability of GPU compute. The RAPIDS project is the first step in giving data scientists the ability to use familiar APIs and abstractions while taking advantage of the same technology that enables dramatic increases in speed in deep learning. This session highlights the progress that has been made on RAPIDS, discusses how you can get up and running doing data science on the GPU, and provides some use cases involving graph analytics as motivation.
GPUs and GPU platforms have been responsible for the dramatic advancement of deep learning and other neural net methods in the past several years. At the same time, traditional machine learning workloads, which comprise the majority of business use cases, continue to be written in Python with heavy reliance on a combination of single-threaded tools (e.g., Pandas and Scikit-Learn) or large, multi-CPU distributed solutions (e.g., Spark and PySpark). RAPIDS, developed by a consortium of companies and available as open source code, allows for moving the vast majority of machine learning workloads from a CPU environment to GPUs. This allows for a substantial speed up, particularly on large data sets, and affords rapid, interactive work that previously was cumbersome to code or very slow to execute. Many data science problems can be approached using a graph/network view, and much like traditional machine learning workloads, this has been either local (e.g., Gephi, Cytoscape, NetworkX) or distributed on CPU platforms (e.g., GraphX). We will present GPU-accelerated graph capabilities that, with minimal conceptual code changes, allow both graph representations and graph-based analytics to achieve similar speed ups on a GPU platform. By keeping all of these tasks on the GPU and minimizing redundant I/O, data scientists are enabled to model their data quickly and frequently, affording a higher degree of experimentation and more effective model generation. Further, keeping all of this in compatible formats allows quick movement from feature extraction, graph representation, graph analytics, and enrichment back to the original data, and visualization of results. RAPIDS has a mission to build a platform that allows data scientists to explore data, train machine learning algorithms, and build applications while primarily staying on the GPU and GPU platforms."
Learn more: https://rapids.ai/
and
https://fosdem.org/2019/
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
RAPIDS – Open GPU-accelerated Data Science (Data Works MD)
RAPIDS is an initiative driven by NVIDIA to accelerate the complete end-to-end data science ecosystem with GPUs. It consists of several open source projects that expose familiar interfaces making it easy to accelerate the entire data science pipeline- from the ETL and data wrangling to feature engineering, statistical modeling, machine learning, and graph analysis.
Corey J. Nolet
Corey has a passion for understanding the world through the analysis of data. He is a developer on the RAPIDS open source project focused on accelerating machine learning algorithms with GPUs.
Adam Thompson
Adam Thompson is a Senior Solutions Architect at NVIDIA. With a background in signal processing, he has spent his career participating in and leading programs focused on deep learning for RF classification, data compression, high-performance computing, and managing and designing applications targeting large collection frameworks. His research interests include deep learning, high-performance computing, systems engineering, cloud architecture/integration, and statistical signal processing. He holds a Masters degree in Electrical & Computer Engineering from Georgia Tech and a Bachelors from Clemson University.
Disaster Recovery Experience at CACIB: Hardening Hadoop for Critical Financia... (DataWorks Summit)
Hadoop is becoming a standard platform for building critical financial applications such as risk reporting, trading and fraud detection. These applications require a high level of SLAs (service-level agreements) in terms of RPO (Recovery Point Objective) and RTO (Recovery Time Objective). To achieve these SLAs, organizations need to build a disaster recovery plan that covers several layers, ranging from the infrastructure to the clients, by way of the platform and the applications. In this talk, we will present the different architecture blueprints for disaster recovery as well as their corresponding SLA objectives. Then, we will focus on the stretch cluster solution that Crédit Agricole CIB is using in production. We will discuss the solution’s advantages, drawbacks and the impact of this approach on the global architecture. Finally, we will explain in detail how to configure and deploy this solution and how to integrate each layer (storage layer, processing layer...) into the architecture.
Some might think Docker is for developers only, but this is not really the case. Docker is here to stay and we will only see more of it in the future.
In this session learn what Docker is and how it works. This session will cover core areas such as volumes, and then step it up to a few tips and tricks to help you get the most out of your Docker environment. The session will dive into a few examples of how to create a database environment within just a few minutes - perfect for testing, development, and possibly even production systems.
Machine Learning explained with Examples
Everybody is talking about machine learning. What is it actually and how can I use it?
In this presentation we will see some examples of solving real-life use cases using machine learning. We will define tasks and see how each task can be addressed using machine learning.
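SQL Server 2017 added support for Linux, and as an extension of that it became possible to run SQL Server on Docker and to build availability configurations with Kubernetes. SQL Server 2019, whose release is close at hand, is slated to offer a Big Data Cluster feature built on Kubernetes, widening the range of uses for containers even further.
This session covers the basics you need to start working with SQL Server containers, along with steps and samples for trying them out yourself.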
The New Frontiers of AI in RPA with UiPath Autopilot™ (UiPathCommunity)
In this free online event, organized by the Italian UiPath Community, you can explore the new features of Autopilot, the tool that integrates artificial intelligence into the development and use of automations.
📕 Together we will look at some examples of using Autopilot in different tools of the UiPath Suite:
Autopilot for Studio Web
Autopilot for Studio
Autopilot for Apps
Clipboard AI
GenAI applied to Document Understanding
👨🏫👨💻 Speakers:
Stefano Negro, UiPath MVPx3, RPA Tech Lead @ BSP Consultant
Flavio Martinelli, UiPath MVP 2023, Technical Account Manager @UiPath
Andrei Tasca, RPA Solutions Team Lead @NTT Data
State of ICS and IoT Cyber Threat Landscape Report 2024 preview (Prayukth K V)
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio using data from Sectrio's cyber threat intelligence farming facilities, which are spread across over 85 cities around the world. In addition, Sectrio also runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors and newer malware, including new variants and latent threats that are at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on counties – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024 (Albert Hoitingh)
In this session I delve into the encryption technology used in Microsoft 365 and Microsoft Purview, including the concepts of Customer Key and Double Key Encryption.
Welcome to the first live UiPath Community Day Dubai! Join us for this unique occasion to meet our local and global UiPath Community and leaders. You will get a full view of the MEA region's automation landscape and the AI Powered automation technology capabilities of UiPath. Also, hosted by our local partners Marc Ellis, you will enjoy a half-day packed with industry insights and automation peers networking.
📕 Curious about our agenda? Wait no more!
10:00 Welcome note - UiPath Community in Dubai
Lovely Sinha, UiPath Community Chapter Leader, UiPath MVPx3, Hyper-automation Consultant, First Abu Dhabi Bank
10:20 A UiPath cross-region MEA overview
Ashraf El Zarka, VP and Managing Director MEA, UiPath
10:35 Customer Success Journey
Deepthi Deepak, Head of Intelligent Automation CoE, First Abu Dhabi Bank
11:15 The UiPath approach to GenAI with our three principles: improve accuracy, supercharge productivity, and automate more
Boris Krumrey, Global VP, Automation Innovation, UiPath
12:15 Discover how Marc Ellis leverages tech-driven solutions in recruitment and managed services
Brendan Lingam, Director of Sales and Business Development, Marc Ellis
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo... (James Anderson)
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with a passion for technology and making things work, along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations on CI/CD and application security integrated into the software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
GraphRAG is All You need? LLM & Knowledge Graph (Guy Korland)
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
In his public lecture, Christian Timmerer provides insights into the fascinating history of video streaming, starting from its humble beginnings before YouTube to the groundbreaking technologies that now dominate platforms like Netflix and ORF ON. Timmerer also presents provocative contributions of his own that have significantly influenced the industry. He concludes by looking at future challenges and invites the audience to join in a discussion.
Essentials of Automations: The Art of Triggers and Actions in FME (Safe Software)
In this second installment of our Essentials of Automations webinar series, we’ll explore the landscape of triggers and actions, guiding you through the nuances of authoring and adapting workspaces for seamless automations. Gain an understanding of the full spectrum of triggers and actions available in FME, empowering you to enhance your workspaces for efficient automation.
We’ll kick things off by showcasing the most commonly used event-based triggers, introducing you to various automation workflows like manual triggers, schedules, directory watchers, and more. Plus, see how these elements play out in real scenarios.
Whether you’re tweaking your current setup or building from the ground up, this session will arm you with the tools and insights needed to transform your FME usage into a powerhouse of productivity. Join us to discover effective strategies that simplify complex processes, enhancing your productivity and transforming your data management practices with FME. Let’s turn complexity into clarity and make your workspaces work wonders!
Transcript: Selling digital books in 2024: Insights from industry leaders - T... (BookNet Canada)
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do... (UiPathCommunity)
💥 Speed, accuracy, and scaling – discover the superpowers of GenAI in action with UiPath Document Understanding and Communications Mining™:
See how to accelerate model training and optimize model performance with active learning
Learn about the latest enhancements to out-of-the-box document processing – with little to no training required
Get an exclusive demo of the new family of UiPath LLMs – GenAI models specialized for processing different types of documents and messages
This is a hands-on session specifically designed for automation developers and AI enthusiasts seeking to enhance their knowledge in leveraging the latest intelligent document processing capabilities offered by UiPath.
Speakers:
👨🏫 Andras Palfi, Senior Product Manager, UiPath
👩🏫 Lenka Dulovicova, Product Program Manager, UiPath
Generative AI Deep Dive: Advancing from Proof of Concept to Production (Aggregage)
Join Maher Hanafi, VP of Engineering at Betterworks, in this new session where he'll share a practical framework to transform Gen AI prototypes into impactful products! He'll delve into the complexities of data collection and management, model selection and optimization, and ensuring security, scalability, and responsible use.
The Art of the Pitch: WordPress Relationships and Sales (Laura Byrne)
Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if something changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers without pulling teeth or pulling your hair out. Expect practical tips and strategies for successful relationship building that leads to closing the deal.
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs (Alex Pruden)
This paper presents Reef, a system for generating publicly verifiable succinct non-interactive zero-knowledge proofs that a committed document matches or does not match a regular expression. We describe applications such as proving the strength of passwords, the provenance of email despite redactions, the validity of oblivious DNS queries, and the existence of mutations in DNA. Reef supports the Perl Compatible Regular Expression syntax, including wildcards, alternation, ranges, capture groups, Kleene star, negations, and lookarounds. Reef introduces a new type of automata, Skipping Alternating Finite Automata (SAFA), that skips irrelevant parts of a document when producing proofs without undermining soundness, and instantiates SAFA with a lookup argument. Our experimental evaluation confirms that Reef can generate proofs for documents with 32M characters; the proofs are small and cheap to verify (under a second).
Paper: https://eprint.iacr.org/2023/1886
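As a plain, non-cryptographic illustration of the kind of statement Reef proves (that a document matches or does not match a regular expression), the sketch below simply evaluates a regex with Python's standard `re` module. It involves no commitment and no zero-knowledge proof, and both the password policy pattern and the function name `matches_policy` are assumptions chosen for illustration, not taken from the paper.

```python
import re

# Hypothetical password policy (an assumption, not from the Reef paper):
# at least 12 characters, containing a lowercase letter, an uppercase
# letter, and a digit. The lookaheads check each requirement in turn.
POLICY = re.compile(r"(?=.*[a-z])(?=.*[A-Z])(?=.*\d).{12,}")


def matches_policy(document: str) -> bool:
    """Return True if the whole document matches the policy regex."""
    return POLICY.fullmatch(document) is not None


strong = matches_policy("CorrectHorse42")    # long, mixed case, has digits
too_short = matches_policy("short1A")        # only 7 characters
no_upper = matches_policy("alllowercase123") # no uppercase letter
```

In Reef's setting the verifier would learn only whether such a match holds for a committed document, not the document itself; that cryptographic part is outside what this sketch shows.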
Removing Uninteresting Bytes in Software Fuzzing (Aftab Hussain)
Imagine a world where software fuzzing, the process of mutating bytes in test seeds to uncover hidden and erroneous program behaviors, becomes faster and more effective. A lot depends on the initial seeds, which can significantly dictate the trajectory of a fuzzing campaign, particularly in terms of how long it takes to uncover interesting behaviour in your code. We introduce DIAR, a technique designed to speed up fuzzing campaigns by pinpointing and eliminating the uninteresting bytes in the seeds. Picture this: instead of wasting valuable resources on meaningless mutations in large, bloated seeds, DIAR removes the unnecessary bytes, streamlining the entire process.
In this work, we equipped AFL, a popular fuzzer, with DIAR and examined two critical Linux libraries -- Libxml's xmllint, a tool for parsing XML documents, and Binutils' readelf, an essential debugging and security analysis command-line tool used to display detailed information about ELF (Executable and Linkable Format) files. Our preliminary results show that AFL+DIAR not only discovers new paths more quickly but also achieves higher coverage overall. This work thus showcases how starting with lean and optimized seeds can lead to faster, more comprehensive fuzzing campaigns -- and DIAR helps you find such seeds.
- These are slides of the talk given at IEEE International Conference on Software Testing Verification and Validation Workshop, ICSTW 2022.
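The idea of trimming seeds can be illustrated with a toy sketch. This is not DIAR itself, whose algorithm the abstract above does not describe. It is a simple greedy byte-removal loop, and `toy_coverage` is a hypothetical stand-in for the coverage information a real fuzzer such as AFL would report.

```python
def toy_coverage(data: bytes) -> frozenset:
    """Hypothetical stand-in for fuzzer coverage: the 'branches' an input hits."""
    hits = set()
    if data.startswith(b"<"):
        hits.add("open_tag")
    if b">" in data:
        hits.add("close_tag")
    if b"&" in data:
        hits.add("entity")
    return frozenset(hits)


def trim_seed(seed: bytes, coverage=toy_coverage) -> bytes:
    """Greedily drop bytes whose removal leaves the coverage set unchanged."""
    target = coverage(seed)
    i = 0
    while i < len(seed):
        candidate = seed[:i] + seed[i + 1:]
        if coverage(candidate) == target:
            seed = candidate  # byte i was uninteresting; retry the same index
        else:
            i += 1            # byte i matters for coverage; keep it
    return seed


seed = b"<aaaa&bbbb>cccc"
trimmed = trim_seed(seed)
print(trimmed)  # prints b'<&>'
```

The trimmed seed exercises the same toy branches as the original with far fewer bytes, which is the property a leaner seed needs in order to avoid wasted mutations.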
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -... (DanBrown980551)
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf (Paige Cruz)
Monitoring and observability aren’t traditionally found in software curriculums, and many of us cobble this knowledge together from whatever vendor or ecosystem we were first introduced to and whatever is part of our current company’s observability stack.
While the dev and ops silo continues to crumble, many organizations still relegate monitoring & observability to the purview of ops, infra and SRE teams. This is a mistake - achieving a highly observable system requires collaboration up and down the stack.
I, a former op, would like to extend an invitation to all application developers to join the observability party, and will share these foundational concepts to build on:
18. Next Steps
• mapd.com/demos
Play with our demos - every demo you saw in this talk was live!
• mapd.cloud
Get a MapD instance in less than 60 seconds
• www.mapd.com/platform/downloads/
Download the Community Edition
• community.mapd.com
Ask questions and share your experiences