Machine learning is becoming a must for many business domains and applications. H2O is a best-of-breed, open-source, distributed machine learning library written in Java. The presentation shows how to create and train machine learning models easily using the H2O Flow web interface, including Deep Learning Neural Networks (DNNs). The session provides a tutorial on how to develop and deploy a full-stack reactive face recognition demo using a React + RxJS WebSocket front end, OpenCV, a Caffe CNN for image segmentation, an OpenFace CNN for feature extraction, and H2O Flow for interactive face recognition model training and export as a POJO. The trained POJO model is incorporated into a real-time streaming web service implemented using Spring 5 WebFlux and Spring Boot. The entire demo is 100% Java!
Rapid Web API development with Kotlin and Ktor - Trayan Iliev
Introduction to Kotlin and Ktor with flow, async, and channel examples. Ktor is an async web framework with minimal ceremony that leverages the advantages of Kotlin like coroutines and extensible functional DSLs.
Spring 5 Webflux - Advances in Java 2018 - Trayan Iliev
Brief introduction to distributed stream processing, reactive programming, and novelties in Spring 5, Spring Boot 2, and reactive Spring Data, with programming examples on GitHub. More information will be provided during the upcoming Spring 5 course: http://iproduct.org/en/courses/spring-mvc-rest/
Microservices with Spring 5 Webflux - jProfessionals - Trayan Iliev
Spring 5 introduces a new functional and reactive programming model for building web applications and (micro-)services.
The session at the jProfessionals dev conference demonstrates how to build REST microservices with Spring WebFlux and Spring Boot, illustrated by code examples on GitHub. It includes:
- Introduction to reactive programming, Reactive Streams specification, and project Reactor (as WebFlux infrastructure);
- Comparison between annotation-based and functional reactive programming approaches for building REST services with WebFlux;
- Router, handler and filter functions;
- Using reactive repositories and reactive database access with Spring Data;
- Building end-to-end non-blocking reactive web services using Netty-based web runtime;
- Reactive WebClients and integration testing;
- Realtime event streaming to WebClients using JSON Streams, and to JS client using SSE.
Stream Processing with CompletableFuture and Flow in Java 9 - Trayan Iliev
Stream-based data/event/message processing is becoming the preferred way of achieving interoperability and real-time communication in distributed SOA/microservice/database architectures.
Besides lambdas, Java 8 introduced two new APIs explicitly dealing with stream data processing:
- Stream - which is PULL-based and easily parallelizable;
- CompletableFuture / CompletionStage - which allow composition of PUSH-based, non-blocking, asynchronous data processing pipelines.
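The pull/push contrast above can be sketched using only JDK classes; a minimal illustration with made-up values:

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;

public class PullVsPush {
    public static void main(String[] args) {
        // PULL-based: the Stream pipeline computes values on demand,
        // and parallelizes with a simple .parallel() switch.
        int sumOfSquares = List.of(1, 2, 3, 4).stream()
                .mapToInt(n -> n * n)
                .sum();
        System.out.println(sumOfSquares); // prints 30

        // PUSH-based: each stage runs asynchronously as soon as the
        // previous stage completes, without blocking the caller.
        CompletableFuture<Integer> pipeline = CompletableFuture
                .supplyAsync(() -> 21)
                .thenApply(n -> n * 2);
        System.out.println(pipeline.join()); // prints 42
    }
}
```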
Java 9 provides further support for stream-based data processing by extending CompletableFuture with additional functionality: support for delays and timeouts, better support for subclassing, and new utility methods.
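For instance, the Java 9 additions completeOnTimeout and delayedExecutor can be sketched as follows (timeout values are illustrative):

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.TimeUnit;

public class Java9Timeouts {
    public static void main(String[] args) {
        // completeOnTimeout (new in Java 9): a future that never completes
        // on its own falls back to a default value after the timeout.
        CompletableFuture<String> slow = new CompletableFuture<>();
        String result = slow
                .completeOnTimeout("fallback", 50, TimeUnit.MILLISECONDS)
                .join();
        System.out.println(result); // prints "fallback"

        // delayedExecutor (also new in Java 9): schedule a stage to run
        // only after the given delay has elapsed.
        Executor afterDelay = CompletableFuture.delayedExecutor(50, TimeUnit.MILLISECONDS);
        String delayed = CompletableFuture
                .supplyAsync(() -> "ran after the delay", afterDelay)
                .join();
        System.out.println(delayed);
    }
}
```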
Moreover, Java 9 provides the new java.util.concurrent.Flow API implementing the Reactive Streams specification, which enables reactive programming and interoperability with libraries like Reactor, RxJava, RabbitMQ, Vert.x, Ratpack, and Akka.
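A minimal Flow example using the JDK's built-in SubmissionPublisher might look like this (the subscriber requests one item at a time to illustrate back-pressure; the values are arbitrary):

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Flow;
import java.util.concurrent.SubmissionPublisher;

public class FlowDemo {
    public static void main(String[] args) throws InterruptedException {
        List<Integer> received = new CopyOnWriteArrayList<>();
        CountDownLatch done = new CountDownLatch(1);

        try (SubmissionPublisher<Integer> publisher = new SubmissionPublisher<>()) {
            publisher.subscribe(new Flow.Subscriber<Integer>() {
                private Flow.Subscription subscription;

                @Override public void onSubscribe(Flow.Subscription s) {
                    subscription = s;
                    s.request(1); // back-pressure: request one item at a time
                }
                @Override public void onNext(Integer item) {
                    received.add(item);
                    subscription.request(1);
                }
                @Override public void onError(Throwable t) { done.countDown(); }
                @Override public void onComplete() { done.countDown(); }
            });
            publisher.submit(1);
            publisher.submit(2);
            publisher.submit(3);
        } // close() signals onComplete once the queued items are delivered

        done.await();
        System.out.println(received); // prints [1, 2, 3]
    }
}
```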
The presentation discusses the novelties in Java 8 and Java 9 supporting stream data processing, describing the APIs, models, and practical details of asynchronous pipeline implementation, error handling, multithreaded execution, asynchronous REST service implementation, and interoperability with existing libraries.
Demo examples (code on GitHub) are provided using CompletableFuture and Flow with:
- JAX-RS 2.1 AsyncResponse, and more importantly unit-testing the async REST service method implementations;
- CDI 2.0 asynchronous observers (fireAsync / @ObservesAsync);
Sensor data is streamed in real time from an Arduino with 3D accelerometers, gyroscopes and compass, an ultrasound distance sensor, etc., using the UDP protocol. The data processing is done with alternative reactive Java implementations: callbacks, CompletableFutures, and the Spring 5 Reactor library. The 3D web visualization with Three.js is streamed using Server-Sent Events (SSE).
A video for the IoT demo is available @YouTube: https://www.youtube.com/watch?v=AB3AWAfcy9U
All source code of the demo is freely available @GitHub: https://github.com/iproduct/reactive-demos-iot
There are more reactive Java demos in the same repository: callbacks, CompletableFuture, real-time event streaming. Soon I'll add a description of how to build the device and upload the Arduino sketch, as well as describe the CompletableFuture and Reactor demos and the 3D web visualization part with Three.js. Please stay tuned :)
This presentation from the Angular Sofia Meetup focuses on integration between state-of-the-art Angular, component libraries, and supporting technologies necessary to build scalable and performant single-page apps. Topics include:
- Composing NGRX Reducers, Selectors and Middleware;
- Computing derived data using Reselect-style memoization with RxJS;
- NGRX Router integration;
- Normalization/denormalization and keeping data locally in IndexedDB;
- Processing Observable (hot) streams of async actions, and isolating the side effects using @Effect decorator with NGRX/RxJS reactive transforms;
- Integration of Material Design with third party component libraries like PrimeNG;
- more: lazy loading, AOT...
In recent times, Reactive Programming has gained a lot of popularity. It is not a “silver bullet”, nor is it a solution for every problem. Yet, it is a paradigm for building applications which are non-blocking, event-driven, and asynchronous, and which require a small number of threads to scale. Spring Framework 5 embraces Reactive Streams and Reactor for its own reactive use, as well as in many of its core APIs. It also adds the ability to code in a declarative way, as opposed to imperatively, resulting in more responsive and resilient applications. On top of that, you are given an amazing toolbox of functions to combine, create, and filter any data stream. It becomes easy to support input and output streaming scenarios for microservices, scatter/gather, data ingestion, and so on. This presentation is about the support and building blocks for reactive programming that come with the latest versions of Spring Framework 5 and Spring Boot 2.
Servlet vs Reactive Stacks in 5 Use Cases - VMware Tanzu
Rossen Stoyanchev, Spring Framework Developer
In the past year Netflix shared a story about upgrading their main gateway serving 83 million users from Servlet-stack Zuul 1 to an async and non-blocking Netty-based Zuul 2. The results were interesting and nuanced with some major benefits as well as some trade-offs. Can mere mortal web applications make this journey and moreover should they? The best way to explore the answer is through a specific use case. In this talk we'll take 5 common use cases in web application development and explore the impact of building on Servlet and Reactive web application stacks. For reactive programming we'll use RxJava and Reactor. For the web stack we'll pit Spring MVC vs Spring WebFlux (new in Spring Framework 5.0) allowing us to move easily between the Servlet and Reactive worlds and drawing a meaningful, apples-to-apples comparison. Spring knowledge is not required and not assumed for this session.
Introducing Arc: A Common Intermediate Language for Unified Batch and Stream... - Flink Forward
Today's end-to-end data pipelines need to combine many diverse workloads such as machine learning, relational operations, stream dataflows, tensor transformations, and graphs. For each of these workload types, several frontends exist (e.g., DataFrames/SQL, Beam, Keras) based on different programming languages, as well as different runtimes (e.g., Spark, Flink, TensorFlow) that target a particular frontend and possibly a hardware architecture (e.g., GPUs). Putting all the pieces of a data pipeline together leads to excessive data materialisation, type conversions and hardware utilisation, as well as mismatches of processing guarantees.
Our research group at RISE and KTH in Sweden has founded Arc, an intermediate language that bridges the gap between any frontend and a dataflow runtime (e.g., Flink) through a set of fundamental building blocks for expressing data pipelines. Arc incorporates Flink- and Beam-inspired stream semantics such as windows, state, and out-of-order processing, as well as concepts found in batch computation models. With Arc, we can cross-compile and optimise diverse tasks written in any programming language into a unified dataflow program. Arc programs can run efficiently on various hardware backends, as well as allowing seamless, distributed execution on dataflow runtimes. To that end, we showcase Arcon, a concept runtime built in Rust that can execute Arc programs natively, as well as presenting a minimal set of extensions to make Flink an Arc-ready runtime.
Chapel-on-X: Exploring Tasking Runtimes for PGAS Languages - Akihiro Hayashi
With the shift to exascale computer systems, the importance of productive programming models for distributed systems is increasing. Partitioned Global Address Space (PGAS) programming models aim to reduce the complexity of writing distributed-memory parallel programs by introducing global operations on distributed arrays, distributed task parallelism, directed synchronization, and mutual exclusion. However, a key challenge in the application of PGAS programming models is the improvement of compilers and runtime systems. In particular, one open question is how runtime systems meet the requirement of exascale systems, where a large number of asynchronous tasks are executed.
While there are various tasking runtimes such as Qthreads, OCR, and HClib, there is no existing comparative study on PGAS tasking/threading runtime systems. To explore runtime systems for PGAS programming languages, we have implemented OCR-based and HClib-based Chapel runtimes and evaluated them with an initial focus on tasking and synchronization implementations. The results show that our OCR and HClib-based implementations can improve the performance of PGAS programs compared to the existing Qthreads backend of Chapel.
Large-scale graph processing with Apache Flink @GraphDevroom FOSDEM'15 - Vasia Kalavri
Apache Flink is a general-purpose platform for batch and streaming distributed data processing. This talk describes how Flink’s powerful APIs, iterative operators and other unique features make it a competitive alternative for large-scale graph processing as well. We take a close look at how one can elegantly express graph analysis tasks, using common Flink operators and how different graph processing models, like vertex-centric, can be easily mapped to Flink dataflows. Next, we get a sneak preview into Flink's upcoming Graph API, Gelly, which further simplifies graph application development in Flink. Finally, we show how to perform end-to-end data analysis, mixing common Flink operators and Gelly, without having to build complex pipelines and combine different systems. We go through a step-by-step example, demonstrating how to perform loading, transformation, filtering, graph creation and analysis, with a single Flink program.
Building cloud-enabled genomics workflows with Luigi and Docker - Jacob Feala
Talk given at Bio-IT 2016, Cloud Computing track
Abstract:
As bioinformatics scientists, we tend to write custom tools for managing our workflows, even when viable, open-source alternatives are available from the tech community. Our field has, however, begun to adopt Docker containers to stabilize compute environments. In this talk, I will introduce Luigi, a workflow system built by engineers at Spotify to manage long-running big data processing jobs with complex dependencies. Focusing on a case study of next generation sequencing analysis in cancer genomics research, I will show how Luigi can connect simple, containerized applications into complex bioinformatics pipelines that can be easily integrated with compute, storage, and data warehousing on the cloud.
Deep learning and streaming in Apache Spark 2.2 by Matei Zaharia - GoDataDriven
Matei Zaharia is an assistant professor of computer science at Stanford University, Chief Technologist and Co-founder of Databricks. He started the Spark project at UC Berkeley and continues to serve as its vice president at Apache. Matei also co-started the Apache Mesos project and is a committer on Apache Hadoop. Matei’s research work on datacenter systems was recognized through two Best Paper awards and the 2014 ACM Doctoral Dissertation Award.
A Practical Approach to Building a Streaming Processing Pipeline for an Onlin... - Databricks
Yelp’s ad platform handles millions of ad requests every day. To generate ad metrics and analytics in real time, they built their ad event tracking and analyzing pipeline on top of Spark Streaming. It allows Yelp to manage a large number of active ad campaigns and greatly reduce over-delivery. It also enables them to share ad metrics with advertisers in a more timely fashion.
This session will start with an overview of the entire pipeline and then focus on two specific challenges in the event consolidation part of the pipeline that Yelp had to solve. The first challenge is about joining multiple data sources together to generate a single stream of ad events that feeds into various downstream systems. That involves solving several problems that are unique to real-time applications, such as windowed processing and handling of event delays. The second challenge is about state management across code deployments and application restarts. Throughout the session, the speakers will share best practices for the design and development of large-scale Spark Streaming pipelines for production environments.
Grokking Techtalk #38: Escape Analysis in Go compiler - Grokking VN
When analyzing performance, it is very helpful to understand and master the programming language and its design. Go is one of the languages widely used in high-performance distributed systems. To better understand how the Go compiler analyzes memory allocation when compiling a program, listen to Cường's insights on Escape Analysis in the Go compiler.
About the speaker:
Lê Mạnh Cường is a software engineer with 8 years of in-depth experience in backend development and Linux system administration. As an active OSS contributor, he has made many contributions to the open-source community, especially to Go and its ecosystem.
Continuous Application with Structured Streaming 2.0 - Anyscale
Introduction to Continuous Application with Apache Spark 2.0 Structured Streaming. This presentation is a culmination and curation from talks and meetups presented by Databricks engineers.
The notebooks on Structured Streaming demonstrate aspects of the Structured Streaming APIs.
Project Hydrogen: State-of-the-Art Deep Learning on Apache Spark - Databricks
Big data and AI are joined at the hip: the best AI applications require massive amounts of constantly updated training data to build state-of-the-art models. AI has always been one of the most exciting applications of big data and Apache Spark. Increasingly, Spark users want to integrate Spark with distributed deep learning and machine learning frameworks built for state-of-the-art training. On the other side, DL/AI users increasingly want to handle the large and complex data scenarios needed for their production pipelines.
This talk introduces a new project that substantially improves the performance and fault-recovery of distributed deep learning and machine learning frameworks on Spark. We will introduce the major directions and provide progress updates, including 1) barrier execution mode for distributed DL training, 2) fast data exchange between Spark and DL frameworks, and 3) accelerator-aware scheduling.
My talk from SICS Data Science Day, describing FlinkML, the Machine Learning library for Apache Flink.
I talk about our approach to large-scale machine learning and how we utilize state-of-the-art algorithms to ensure FlinkML is a truly scalable library.
You can watch a video of the talk here: https://youtu.be/k29qoCm4c_k
Reactive Microservices with Spring 5: WebFlux - Trayan Iliev
On November 27 Trayan Iliev from IPT presented “Reactive microservices with Spring 5: WebFlux” @Dev.bg in Betahaus Sofia. IPT – Intellectual Products & Technologies has been organizing Java & JavaScript trainings since 2003.
Spring 5 introduces a new model for end-to-end functional and reactive web service programming with Spring 5 WebFlux, Spring Data, and Spring Boot. The main topics include:
– Introduction to reactive programming, Reactive Streams specification, and project Reactor (as WebFlux infrastructure)
– REST services with WebFlux – comparison between annotation-based and functional reactive programming approaches for building them
– Router, handler and filter functions
– Using reactive repositories and reactive database access with Spring Data. Building end-to-end non-blocking reactive web services using Netty-based web runtime
– Reactive WebClients and integration testing. Reactive WebSocket support
– Realtime event streaming to WebClients using JSON Streams, and to JS client using SSE.
Servlet vs Reactive Stacks in 5 Use CasesVMware Tanzu
ROSSEN STOYANCHEV SPRING FRAMEWORK DEVELOPER
In the past year Netflix shared a story about upgrading their main gateway serving 83 million users from Servlet-stack Zuul 1 to an async and non-blocking Netty-based Zuul 2. The results were interesting and nuanced with some major benefits as well as some trade-offs. Can mere mortal web applications make this journey and moreover should they? The best way to explore the answer is through a specific use case. In this talk we'll take 5 common use cases in web application development and explore the impact of building on Servlet and Reactive web application stacks. For reactive programming we'll use RxJava and Reactor. For the web stack we'll pit Spring MVC vs Spring WebFlux (new in Spring Framework 5.0) allowing us to move easily between the Servlet and Reactive worlds and drawing a meaningful, apples-to-apples comparison. Spring knowledge is not required and not assumed for this session.
Introducing Arc: A Common Intermediate Language for Unified Batch and Stream...Flink Forward
Today's end-to-end data pipelines need to combine many diverse workloads such as machine learning, relational operations, stream dataflows, tensor transformations, and graphs. For each of these workload types exist several frontends (e.g., DataFrames/SQL, Beam, Keras) based on different programming languages as well as different runtimes (e.g., Spark, Flink, Tensorflow) that target a particular frontend and possibly a hardware architecture (e.g., GPUs). Putting all the pieces of a data pipeline together simply leads to excessive data materialisation, type conversions and hardware utilisation as well as miss-matches of processing guarantees.
Our research group at RISE and KTH in Sweden has founded Arc, an intermediate language that bridges the gap between any frontend and a dataflow runtime (e.g., Flink) through a set of fundamental building blocks for expressing data pipelines. Arc incorporates Flink and Beam-inspired stream semantics such as windows, state and out of order processing as well as concepts found in batch computation models. With Arc, we can cross- compile and optimise diverse tasks written in any programming language into a unified dataflow program. Arc programs can run on various hardware backends efficiently as well as allowing seamless, distributed execution on dataflow runtimes. To that end, we showcase Arcon a concept runtime built in Rust that can execute Arc programs natively as well as presenting a minimal set of extensions to make Flink an Arc-ready runtime.
Chapel-on-X: Exploring Tasking Runtimes for PGAS LanguagesAkihiro Hayashi
With the shift to exascale computer systems, the importance of productive programming models for distributed systems is increasing. Partitioned Global Address Space (PGAS) programming models aim to reduce the complexity of writing distributed-memory parallel programs by introducing global operations on distributed arrays, distributed task parallelism, directed synchronization, and mutual exclusion. However, a key challenge in the application of PGAS programming models is the improvement of compilers and runtime systems. In particular, one open question is how runtime systems meet the requirement of exascale systems, where a large number of asynchronous tasks are executed.
While there are various tasking runtimes such as Qthreads, OCR, and HClib, there is no existing comparative study on PGAS tasking/threading runtime systems. To explore runtime systems for PGAS programming languages, we have implemented OCR-based and HClib-based Chapel runtimes and evaluated them with an initial focus on tasking and synchronization implementations. The results show that our OCR and HClib-based implementations can improve the performance of PGAS programs compared to the ex- isting Qthreads backend of Chapel.
Large-scale graph processing with Apache Flink @GraphDevroom FOSDEM'15Vasia Kalavri
Apache Flink is a general-purpose platform for batch and streaming distributed data processing. This talk describes how Flink’s powerful APIs, iterative operators and other unique features make it a competitive alternative for large-scale graph processing as well. We take a close look at how one can elegantly express graph analysis tasks, using common Flink operators and how different graph processing models, like vertex-centric, can be easily mapped to Flink dataflows. Next, we get a sneak preview into Flink's upcoming Graph API, Gelly, which further simplifies graph application development in Flink. Finally, we show how to perform end-to-end data analysis, mixing common Flink operators and Gelly, without having to build complex pipelines and combine different systems. We go through a step-by-step example, demonstrating how to perform loading, transformation, filtering, graph creation and analysis, with a single Flink program.
Building cloud-enabled genomics workflows with Luigi and DockerJacob Feala
Talk given at Bio-IT 2016, Cloud Computing track
Abstract:
As bioinformatics scientists, we tend to write custom tools for managing our workflows, even when viable, open-source alternatives are available from the tech community. Our field has, however, begun to adopt Docker containers to stabilize compute environments. In this talk, I will introduce Luigi, a workflow system built by engineers at Spotify to manage long-running big data processing jobs with complex dependencies. Focusing on a case study of next generation sequencing analysis in cancer genomics research, I will show how Luigi can connect simple, containerized applications into complex bioinformatics pipelines that can be easily integrated with compute, storage, and data warehousing on the cloud.
Deep learning and streaming in Apache Spark 2.2 by Matei ZahariaGoDataDriven
Matei Zaharia is an assistant professor of computer science at Stanford University, Chief Technologist and Co-founder of Databricks. He started the Spark project at UC Berkeley and continues to serve as its vice president at Apache. Matei also co-started the Apache Mesos project and is a committer on Apache Hadoop. Matei’s research work on datacenter systems was recognized through two Best Paper awards and the 2014 ACM Doctoral Dissertation Award.
A Practical Approach to Building a Streaming Processing Pipeline for an Onlin...Databricks
Yelp’s ad platform handles millions of ad requests everyday. To generate ad metrics and analytics in real-time, they built they ad event tracking and analyzing pipeline on top of Spark Streaming. It allows Yelp to manage large number of active ad campaigns and greatly reduce over-delivery. It also enables them to share ad metrics with advertisers in a more timely fashion.
This session will start with an overview of the entire pipeline and then focus on two specific challenges in the event consolidation part of the pipeline that Yelp had to solve. The first challenge will be about joining multiple data sources together to generate a single stream of ad events that feeds into various downstream systems. That involves solving several problems that are unique to real-time applications, such as windowed processing and handling of event delays. The second challenge covered is with regards to state management across code deployments and application restarts. Throughout the session, the speakers will share best practices for the design and development of large-scale Spark Streaming pipelines for production environments.
Grokking Techtalk #38: Escape Analysis in Go compilerGrokking VN
Trong quá trình phân tích hiệu năng, hiểu và nắm vững ngôn ngữ lập trình cũng như cách thiết kế của nó là rất hữu ích. Go là một trong những ngôn ngữ được sử dụng phổ biến trong các hệ thống phân tán có hiệu năng cao. Để hiểu rõ hơn cách mà Go compiler phân tích cách cấp phát bộ nhớ khi biên dịch chương trình, hãy nghe những chia sẻ của anh Cường về Escape Analysis trong Go compiler.
Về diễn giả:
Anh Lê Mạnh Cường là một kĩ sư phần mềm có 8 năm kinh nghiệm chuyên sâu trong backend và Quản trị hệ thống Linux. Là một OSS contributor tích cực, anh Cường đã có nhiều cống hiến vào cộng đồng mã nguồn mở, đặc biệt là Go và ecosystem của Go.
Continuous Application with Structured Streaming 2.0Anyscale
Introduction to Continuous Application with Apache Spark 2.0 Structured Streaming. This presentation is a culmination and curation from talks and meetups presented by Databricks engineers.
The notebooks on Structured Streaming demonstrates aspects of the Structured Streaming APIs
Project Hydrogen: State-of-the-Art Deep Learning on Apache SparkDatabricks
Big data and AI are joined at the hip: the best AI applications require massive amounts of constantly updated training data to build state-of-the-art models AI has always been on of the most exciting applications of big data and Apache Spark. Increasingly Spark users want to integrate Spark with distributed deep learning and machine learning frameworks built for state-of-the-art training. On the other side, increasingly DL/AI users want to handle large and complex data scenarios needed for their production pipelines.
This talk introduces a new project that substantially improves the performance and fault-recovery of distributed deep learning and machine learning frameworks on Spark. We will introduce the major directions and provide progress updates, including 1) barrier execution mode for distributed DL training, 2) fast data exchange between Spark and DL frameworks, and 3) accelerator-aware scheduling.
My talk from SICS Data Science Day, describing FlinkML, the Machine Learning library for Apache Flink.
I talk about our approach to large-scale machine learning and how we utilize state-of-the-art algorithms to ensure FlinkML is a truly scalable library.
You can watch a video of the talk here: https://youtu.be/k29qoCm4c_k
Reactive Microservices with Spring 5: WebFlux Trayan Iliev
On November 27 Trayan Iliev from IPT presented “Reactive microservices with Spring 5: WebFlux” @Dev.bg in Betahaus Sofia. IPT – Intellectual Products & Technologies has been organizing Java & JavaScript trainings since 2003.
Spring 5 introduces a new model for end-to-end functional and reactive web service programming with Spring 5 WebFlux, Spring Data & Spring Boot. The main topics include:
– Introduction to reactive programming, Reactive Streams specification, and project Reactor (as WebFlux infrastructure)
– REST services with WebFlux – a comparison between annotation-based and functional reactive programming approaches
– Router, handler and filter functions
– Using reactive repositories and reactive database access with Spring Data. Building end-to-end non-blocking reactive web services using a Netty-based web runtime
– Reactive WebClients and integration testing. Reactive WebSocket support
– Real-time event streaming to web clients using JSON streams, and to JS clients using SSE.
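The Reactive Streams contract that underlies WebFlux is mirrored in the JDK as java.util.concurrent.Flow. As a library-free illustration (not code from the talk; the event names are invented), here is a minimal publisher/subscriber pair with explicit back-pressure:

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Flow;
import java.util.concurrent.SubmissionPublisher;

public class FlowDemo {
    public static void main(String[] args) throws InterruptedException {
        List<String> received = new CopyOnWriteArrayList<>();
        CountDownLatch done = new CountDownLatch(1);

        try (SubmissionPublisher<String> publisher = new SubmissionPublisher<>()) {
            publisher.subscribe(new Flow.Subscriber<String>() {
                private Flow.Subscription subscription;

                public void onSubscribe(Flow.Subscription s) {
                    subscription = s;
                    s.request(1);                  // back-pressure: ask for one item at a time
                }
                public void onNext(String item) {
                    received.add(item);
                    subscription.request(1);       // ready for the next item
                }
                public void onError(Throwable t) { done.countDown(); }
                public void onComplete()         { done.countDown(); }
            });
            publisher.submit("event-1");
            publisher.submit("event-2");
        }                                          // close() signals onComplete
        done.await();
        System.out.println(received);
    }
}
```

Reactor's Flux and Mono implement this same publisher contract, which is what lets WebFlux interoperate with any Reactive Streams source.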
Tiny Batches, in the wine: Shiny New Bits in Spark StreamingPaco Nathan
London Spark Meetup 2014-11-11 @Skimlinks
http://www.meetup.com/Spark-London/events/217362972/
To paraphrase the immortal crooner Don Ho: "Tiny Batches, in the wine, make me happy, make me feel fine." http://youtu.be/mlCiDEXuxxA
Apache Spark provides support for streaming use cases, such as real-time analytics on log files, by leveraging a model called discretized streams (D-Streams). These "micro batch" computations operate on small time intervals, generally from 500 milliseconds up. One major innovation of Spark Streaming is that it leverages a unified engine. In other words, the same business logic can be used across multiple use cases: streaming, but also interactive, iterative, machine learning, etc.
This talk will compare case studies for production deployments of Spark Streaming, emerging design patterns for integration with popular complementary OSS frameworks, plus some of the more advanced features such as approximation algorithms, and take a look at what's ahead — including the new Python support for Spark Streaming that will be in the upcoming 1.2 release.
Also, let's chat a bit about the new Databricks + O'Reilly developer certification for Apache Spark…
My jPrime 2016 presentation shows examples of Domain-Driven Design (DDD), Event Sourcing (ES) and Functional Reactive Programming (FRP) using Reactor and Redux in a showcase of Java robotics – two small robots, IPTPI (Raspberry Pi 2 + Arduino) and LeJaRo (LeJOS).
Building Reactive Real-time Data PipelineTrieu Nguyen
Topic: Building a reactive real-time data pipeline at FPT
1) What is “Data Pipeline” ?
2) Big Data Problems at FPT
+ VnExpress: pageview and heat-map
+ eClick: real-time reactive advertising
3) Solutions and Patterns
4) Fast Data Architecture at FPT
5) Wrap up
Building and deploying LLM applications with Apache AirflowKaxil Naik
Behind the growing interest in Generative AI and LLM-based enterprise applications lies an expanded set of requirements for data integrations and ML orchestration. Enterprises want to use proprietary data to power LLM-based applications that create new business value, but they face challenges in moving beyond experimentation. The pipelines that power these models need to run reliably at scale, bringing together data from many sources and reacting continuously to changing conditions.
This talk focuses on the design patterns for using Apache Airflow to support LLM applications created using private enterprise data. We’ll go through a real-world example of what this looks like, as well as a proposal to improve Airflow and to add additional Airflow Providers to make it easier to interact with LLMs such as the ones from OpenAI (such as GPT4) and the ones on HuggingFace, while working with both structured and unstructured data.
In short, this shows how these Airflow patterns enable reliable, traceable, and scalable LLM applications within the enterprise.
https://airflowsummit.org/sessions/2023/keynote-llm/
Reactive Java Robotics & IoT with Spring ReactorTrayan Iliev
On April 4th, 2017 in Cosmos Coworking Camp – Sofia, Trayan Iliev talked about “Reactive Java Robotics and IoT with Spring Reactor” (http://robolearn.org/reactive-java-robotics-iot-spring-reactor/).
The event is organized by DEV.BG and is part of the Internet of Things user group.
Program:
1. Robotics, IoT & Complexity. Domain-Driven Design (DDD). Reactive programming. Reactive Streams (java.util.concurrent.Flow);
2. High performance non-blocking asynchronous programming on JVM using Reactor project (using Disruptor/RingBuffer);
3. Implementing reactive hot event-stream processing with Reactor: Flux & Mono, Processors;
4. End-to-end reactive web applications and services: Reactor IO (REST, WebSocket) + RxJS + Angular 2;
5. IPTPI robot demo – reactive hot event streams processing on Raspberry Pi 2 + Arduino with embedded and mobile interfaces: http://robolearn.org/
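The Disruptor ring buffer mentioned in point 2 can be illustrated with a deliberately simplified, single-threaded sketch. This is a toy model of the idea only (power-of-two sizing, monotonic sequence counters, bit-mask indexing), not the LMAX implementation or Reactor's actual infrastructure:

```java
// Toy single-producer/single-consumer ring buffer illustrating the idea
// behind the LMAX Disruptor (NOT the real, lock-free multi-threaded version).
public class RingBufferSketch {
    private final long[] buffer;
    private final int mask;            // works because size is a power of two
    private long head = 0, tail = 0;   // tail = next write seq, head = next read seq

    RingBufferSketch(int size) {       // size must be a power of two
        buffer = new long[size];
        mask = size - 1;
    }

    boolean offer(long value) {
        if (tail - head == buffer.length) return false;   // buffer full
        buffer[(int) (tail & mask)] = value;              // wrap via bit mask
        tail++;
        return true;
    }

    Long poll() {
        if (head == tail) return null;                    // buffer empty
        long v = buffer[(int) (head & mask)];
        head++;
        return v;
    }

    public static void main(String[] args) {
        RingBufferSketch rb = new RingBufferSketch(4);
        for (long i = 1; i <= 5; i++)
            System.out.println("offer " + i + ": " + rb.offer(i));  // 5th offer fails: full
        System.out.println("poll: " + rb.poll());                   // FIFO: first value out
    }
}
```

The real Disruptor makes head and tail independent atomic sequences so producer and consumer threads never contend on a lock; the wrap-around arithmetic is the same.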
About the lecturer: Trayan Iliev
– founder and manager of IPT – Intellectual Products & Technologies (http://iproduct.org/) – a company for IT trainings and consultancy, specialized in Java, fullstack JavaScript, web and mobile technologies
– 15+ years training and consulting experience
– speaker at conferences organized by BGJUG and BGOUG – 9 presentations
– organizer of hackathons on Java robotics & IoT in Sofia and Plovdiv
– presenter at international developer conferences: jPrime, jProfessionals, Voxxed Days
Bobby Evans and Tom Graves, the engineering leads for Spark and Storm development at Yahoo, will talk about how these technologies are used on Yahoo's grids and the reasons to use one or the other.
Bobby Evans is the low latency data processing architect at Yahoo. He is a PMC member on many Apache projects including Storm, Hadoop, Spark, and Tez. His team is responsible for delivering Storm as a service to all of Yahoo and maintaining Spark on Yarn for Yahoo (Although Tom really does most of that work).
Tom Graves is a Senior Software Engineer on the Platform team at Yahoo. He is an Apache PMC member on Hadoop, Spark, and Tez. His team is responsible for delivering and maintaining Spark on Yarn for Yahoo.
Big Data to SMART Data: process scenario
An implementation scenario for a process that transforms raw data into exploitable, representative data, covering stream processing, distributed systems, messaging, storage in a NoSQL environment, and management within a big data ecosystem, with graphical visualization of the data, using the following technologies:
Apache Storm, Apache Zookeeper, Apache Kafka, Apache Cassandra, Apache Spark and Data-Driven Documents (D3).
Databricks Meetup @ Los Angeles Apache Spark User GroupPaco Nathan
Los Angeles Apache Spark Users Group 2014-12-11 http://meetup.com/Los-Angeles-Apache-Spark-Users-Group/events/218748643/
A look ahead at Spark Streaming in Spark 1.2 and beyond, with case studies, demos, plus an overview of approximation algorithms that are useful for real-time analytics.
Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...confluent
Tinder’s Quickfire Pipeline powers all things data at Tinder. It was originally built using AWS Kinesis Firehoses and has since been extended to use both Kafka and other event buses. It is the core of Tinder’s data infrastructure. This rich data flow of both client and backend data has been extended to service a variety of needs at Tinder, including Experimentation, ML, CRM, and Observability, allowing backend developers easier access to shared client-side data. We perform this using many systems, including Kafka, Spark, Flink, Kubernetes, and Prometheus. Many of Tinder’s systems were natively designed in an RPC-first architecture.
Topics we’ll discuss on decoupling your system at scale via event-driven architectures include:
– Powering ML, backend, observability, and analytical applications at scale, including an end-to-end walkthrough of our processes that allow non-programmers to write and deploy event-driven data flows.
– An end-to-end look at dynamic event processing that creates other stream processes, via a dynamic control-plane topology pattern and the broadcast state pattern
– How to manage the unavailability of cached data that would normally come from repeated API calls for data that’s being backfilled into Kafka, all online! (and why this is not necessarily a “good” idea)
– Integrating common OSS frameworks and libraries like Kafka Streams, Flink, Spark and friends to encourage the best design patterns for developers coming from traditional service oriented architectures, including pitfalls and lessons learned along the way.
– Why and how to avoid overloading microservices with excessive RPC calls from event-driven streaming systems
– Best practices in common data flow patterns, such as shared state via RocksDB + Kafka Streams as well as the complementary tools in the Apache Ecosystem.
– The simplicity and power of streaming SQL with microservices
28 March 2024 – Codeless Generative AI Pipelines
https://www.meetup.com/futureofdata-princeton/events/299440871/
https://www.meetup.com/real-time-analytics-meetup-ny/events/299290822/
Note: The event is seat-limited, therefore please complete your registration here. Only people completing the form will be able to attend.
We're excited to invite you to join us in-person, for a Real-Time Analytics exploration!
Join us for an evening of insights, networking as we delve into the OSS technologies shaping the field!
Agenda:
05:30-06:00: Pizza and friends
06:00- 06:40: Codeless GenAI Pipelines with Flink, Kafka, NiFi
06:40- 07:20 Real-Time Analytics in the Corporate World: How Apache Pinot® Powers Industry Leaders
07:20-07:30 Q&A
Codeless GenAI Pipelines with Flink, Kafka, NiFi | Tim Spann, Cloudera
Explore the power of real-time streaming with GenAI using Apache NiFi. Learn how NiFi simplifies data engineering workflows, allowing you to focus on creativity over technical complexities. I'll guide you through practical examples, showcasing NiFi's automation impact from ingestion to delivery. Whether you're a seasoned data engineer or new to GenAI, this talk offers valuable insights into optimizing workflows. Join us to unlock the potential of real-time streaming and witness how NiFi makes data engineering a breeze for GenAI applications!
Real-Time Analytics in the Corporate World: How Apache Pinot® Powers Industry Leaders | Viktor Gamov, StarTree
Explore how industry leaders like LinkedIn, Uber Eats, and Stripe are mastering real-time data with Viktor as your guide. Discover how Apache Pinot transforms data into actionable insights instantly. Viktor will showcase Pinot's features, including the Star-Tree Index, and explain why it's a game-changer in data strategy. This session is for everyone, from data geeks to business gurus, eager to uncover the future of tech. Join us and be wowed by the power of real-time analytics with Apache Pinot!
-------
Tim Spann is a Principal Developer Advocate in Data In Motion for Cloudera.
He works with Apache NiFi, Apache Kafka, Apache Pulsar, Apache Flink, Flink SQL, Apache Pinot, Trino, Apache Iceberg, DeltaLake, Apache Spark, Big Data, IoT, Cloud, AI/DL, machine learning, and deep learning. Tim has over ten years of experience with the IoT, big data, distributed computing, messaging, streaming technologies, and Java programming. Previously, he was a Developer Advocate at StreamNative, Principal DataFlow Field Engineer at Cloudera, a Senior Solutions Engineer at Hortonworks, a Senior Solutions Architect at AirisData, a Senior Field Engineer at Pivotal and a Team Leader at HPE. He blogs for DZone, where he is the Big Data Zone leader, and runs a popular meetup in Princeton & NYC on Big Data, Cloud, IoT, deep learning, streaming, NiFi, the blockchain, and Spark. Tim is a frequent speaker at conferences such as ApacheCon, DeveloperWeek, Pulsar Summit and many more.
IPT presentation @ jProfessionals 2016 on Java and JavaScript Reactive Robotics and IoT including: Domain Driven Design (DDD), high-performance reactive micro-services development using Spring Reactor, state-of-the-art component-based client side MVVM implementation with Angular 2, ngrx (Redux pattern), TypeScript and reactive WebSockets.
Similar to Making Machine Learning Easy with H2O and WebFlux (20)
Learning Programming Using Robots - Sofia University Conference 2018 Trayan Iliev
Learn information technologies by creating your own robots and IoT projects. Robotics and IoT offer rich opportunities for practical and active learning of core information technologies, programming languages and software architectures. Presentation includes examples of teaching practices and robotics projects, and offers suggestions why and how to use them to achieve better students' motivation, engagement, creativity, and connection between theory and practice.
Active Learning Using Connected Things - 2018 (in Bulgarian)Trayan Iliev
Learn about active learning methods and practices using Robotics, IoT, and "smart things" projects. Includes examples of teaching practices and robotics projects, and offers suggestions why and how to use them to achieve better students' motivation, engagement, creativity, and connection between theory and practice. Several blended learning models are compared - Flipped Classroom, Stations/Labs Rotation, Flex model. Project support for individual learning styles is discussed.
The Fog Computing – between IoT Devices and The Cloud presentation covers the following topics:
- Edge, Fog, Mist & Cloud Computing
- Fog domains and fog federation, wireless sensor networks, multi-layer IoT architecture
- Fog computing standards and specifications
- Practical use-case scenarios & advantages of fog
- Fog analytics and intelligence on the edge
- Technologies for distributed asynchronous event processing and analytics in real time
- Lambda architecture – Spark, Storm, Kafka, Apex, Beam, Spring Reactor & WebFlux
- Eclipse IoT platform
Presentation from BGOUG conference Nov 17, 2017.
Since September 2017, Java 9 has been generally available. It offers many enhancements:
• Modularity – provides clear separation between public and private APIs, stronger encapsulation & dependency management.
• JShell – using and customizing Java 9 interactive shell by example
• Process API updates – feature-rich, async OS process management and statistics
• Reactive Streams, CompletableFuture and Stream API updates
• Building asynchronous HTTP/2 and WebSocket pipelines using HTTP/2 Client and CompletableFuture composition
• Collection API updates
• Stack walking, and other language enhancements (Project Coin)
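The CompletableFuture composition mentioned in the list can be illustrated with a minimal JDK-only sketch (not one of the talk's demos; the values and names are invented for the example):

```java
import java.util.concurrent.CompletableFuture;

public class AsyncComposition {
    public static void main(String[] args) {
        // Two independent async computations, started concurrently
        CompletableFuture<Integer> price = CompletableFuture.supplyAsync(() -> 42);
        CompletableFuture<Integer> qty   = CompletableFuture.supplyAsync(() -> 3);

        // thenCombine joins two futures; thenApply transforms the result,
        // all without blocking any thread until the final join()
        CompletableFuture<String> total = price
                .thenCombine(qty, (p, q) -> p * q)
                .thenApply(t -> "total=" + t);

        System.out.println(total.join());   // blocks only here, at the very end
    }
}
```

The HTTP/2 Client mentioned above returns CompletableFuture from its sendAsync calls, so the same composition style applies to building asynchronous HTTP pipelines.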
Discussed topics are accompanied by live demos available for further review @ github.com/iproduct.
The presentation discusses best practices for writing higher order components (HOCs), and presents examples of how to write them according to React recommendations.
Topics include: maximizing HOC factories composability, using ES7 decorators, wrapping context DI in HOCs, handling async state changes using RxJS Observables and separation of concerns between presentation and business logic components. Examples (@GitHub projects - referenced in the presentation) are given using react, redux, react-router-redux, redux-observable, recompose, and reselect.
The presentation highlights novelties in SPA development with Angular 2 (+ Ionic 2 demo) with real code examples.
Together we created a simple Ng2 application with Angular CLI.
All the code is available on GitHub (link to demos is at the end of presentation).
Prerequisites:
1. Install NodeJS. It is better to install version 6.x or 4.x. Read about NPM.
2. Install TypeScript + editor (Visual Studio Code or Sublime 3).
3. Install Angular 2 Command Line Interface (Angular CLI):
npm install -g angular-cli
MVC 1.0 is an action-oriented framework building on experience with previous frameworks such as Struts, Spring MVC, VRaptor etc. It is based on JAX-RS, CDI and BeanValidation JavaEE technologies and provides a standard, view specification neutral way to build web applications. Among supported view template frameworks are: JSP, Facelets, Freemarker, Handlebars, Jade, Mustache, Velocity, Thymeleaf, etc.
NOTE: MVC 1.0 JavaEE 8 API Specification is in early draft stage, and is subject to change based on open community process.
IPT High Performance Reactive Programming with JAVA 8 and JavaScriptTrayan Iliev
Presentation @ jProfessionals BGJUG Conference
Sofia, November 22, 2015 by IPT – IT Education Evolved. High Performance Reactive Programming Workshop – Dec 15-17, 2015 http://iproduct.org/en/course-reactive-java-js/
You are welcome to join us!
Low-latency, high-throughput reactive and functional programming in Java using Spring Reactor, RxJava, RxJS, Facebook React, Angular 2, Reactive Streams, Disruptor (ring buffer), Reactor & Proactor design patterns, benchmarking & comparison of concurrency implementations. December 15 - 17, 2015 - Workshop: High Performance Reactive Programming with JAVA 8 and JavaScript - http://iproduct.org/en/course-reactive-java-js/
IPT Workshops on Java Robotics and IoTTrayan Iliev
Learn how to build your own robot & how to program it in Java with IPT workshops & hackathons. Two small robots: LeJaRo - "the leJOS java robot" & IPTPI - Raspberry Pi 2 enabled robot with higher processing capabilities + distance US and optical sensors, line follow sensor array, cameras, encoders and more. Get first hand experience and jump start in JAVA robotics. Find more on http://robolearn.org/. Children of ALL AGES are Welcome :)
The presentation shows some code examples for programming two small robots (Lego® & Raspberry Pi®) in Java™ – http://iproduct.org/en/, https://github.com/iproduct/course-social-robotics/wiki/Lectures
Novelties in Java EE 7: JAX-RS 2.0 + IPT REST HATEOAS Polling Demo @ BGOUG Co...Trayan Iliev
Presentation shows by example (IPT Polling Demo JAXRS20 HATEOAS, https://github.com/iproduct/IPT-Polling-Demo-JAXRS20-HATEOAS/wiki) the novelties in JAX-RS 2.0 and REST HATEOAS:
- Standardized REST Client API;
- Client and server-side asynchronous HTTP request processing;
- Integration of declarative validation using JSR 349: Bean Validation 1.1;
- Improved server-suggested content negotiation;
- Aspect-oriented extensibility of request/response processing using Filters and Interceptors;
- Dynamic extension registration using DynamicFeature interface;
- Hypermedia As The Engine Of Application State (HATEOAS) REST architectural constraint support using state transition links (support for new HTTP Link header as well as JAXB serialization of resource links).
[IPT, http://iproduct.org]
Globus Compute wth IRI Workflows - GlobusWorld 2024Globus
As part of the DOE Integrated Research Infrastructure (IRI) program, NERSC at Lawrence Berkeley National Lab and ALCF at Argonne National Lab are working closely with General Atomics on accelerating the computing requirements of the DIII-D experiment. As part of the work the team is investigating ways to speedup the time to solution for many different parts of the DIII-D workflow including how they run jobs on HPC systems. One of these routes is looking at Globus Compute as a way to replace the current method for managing tasks and we describe a brief proof of concept showing how Globus Compute could help to schedule jobs and be a tool to connect compute at different facilities.
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...Mind IT Systems
Healthcare providers often struggle with the complexities of chronic conditions and remote patient monitoring, as each patient requires personalized care and ongoing monitoring. Off-the-shelf solutions may not meet these diverse needs, leading to inefficiencies and gaps in care. This is where custom healthcare software offers a tailored solution, ensuring improved care and effectiveness.
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...Juraj Vysvader
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc. I didn't get rich from it, but they did reach 63K downloads (powering possibly tens of thousands of websites).
OpenMetadata Community Meeting - 5th June 2024OpenMetadata
The OpenMetadata Community Meeting was held on June 5th, 2024. In this meeting, we discussed the data quality capabilities that are integrated with the Incident Manager, providing a complete solution for your data observability needs. Watch the end-to-end demo of the data quality features.
* How to run your own data quality framework
* What is the performance impact of running data quality frameworks
* How to run the test cases in your own ETL pipelines
* How the Incident Manager is integrated
* Get notified with alerts when test cases fail
Watch the meeting recording here - https://www.youtube.com/watch?v=UbNOje0kf6E
Code reviews are vital for ensuring good code quality. They serve as one of our last lines of defense against bugs and subpar code reaching production.
Yet, they often turn into annoying tasks riddled with frustration, hostility, unclear feedback and lack of standards. How can we improve this crucial process?
In this session we will cover:
- The Art of Effective Code Reviews
- Streamlining the Review Process
- Elevating Reviews with Automated Tools
By the end of this presentation, you'll have the knowledge on how to organize and improve your code review process.
Essentials of Automations: The Art of Triggers and Actions in FMESafe Software
In this second installment of our Essentials of Automations webinar series, we’ll explore the landscape of triggers and actions, guiding you through the nuances of authoring and adapting workspaces for seamless automations. Gain an understanding of the full spectrum of triggers and actions available in FME, empowering you to enhance your workspaces for efficient automation.
We’ll kick things off by showcasing the most commonly used event-based triggers, introducing you to various automation workflows like manual triggers, schedules, directory watchers, and more. Plus, see how these elements play out in real scenarios.
Whether you’re tweaking your current setup or building from the ground up, this session will arm you with the tools and insights needed to transform your FME usage into a powerhouse of productivity. Join us to discover effective strategies that simplify complex processes, enhancing your productivity and transforming your data management practices with FME. Let’s turn complexity into clarity and make your workspaces work wonders!
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...Globus
The Earth System Grid Federation (ESGF) is a global network of data servers that archives and distributes the planet’s largest collection of Earth system model output for thousands of climate and environmental scientists worldwide. Many of these petabyte-scale data archives are located in proximity to large high-performance computing (HPC) or cloud computing resources, but the primary workflow for data users consists of transferring data, and applying computations on a different system. As a part of the ESGF 2.0 US project (funded by the United States Department of Energy Office of Science), we developed pre-defined data workflows, which can be run on-demand, capable of applying many data reduction and data analysis to the large ESGF data archives, transferring only the resultant analysis (ex. visualizations, smaller data files). In this talk, we will showcase a few of these workflows, highlighting how Globus Flows can be used for petabyte-scale climate analysis.
Check out the webinar slides to learn more about how XfilesPro transforms Salesforce document management by leveraging its world-class applications. For more details, please connect with sales@xfilespro.com
If you want to watch the on-demand webinar, please click here: https://www.xfilespro.com/webinars/salesforce-document-management-2-0-smarter-faster-better/
Providing Globus Services to Users of JASMIN for Environmental Data AnalysisGlobus
JASMIN is the UK’s high-performance data analysis platform for environmental science, operated by STFC on behalf of the UK Natural Environment Research Council (NERC). In addition to its role in hosting the CEDA Archive (NERC’s long-term repository for climate, atmospheric science & Earth observation data in the UK), JASMIN provides a collaborative platform to a community of around 2,000 scientists in the UK and beyond, providing nearly 400 environmental science projects with working space, compute resources and tools to facilitate their work. High-performance data transfer into and out of JASMIN has always been a key feature, with many scientists bringing model outputs from supercomputers elsewhere in the UK, to analyse against observational or other model data in the CEDA Archive. A growing number of JASMIN users are now realising the benefits of using the Globus service to provide reliable and efficient data movement and other tasks in this and other contexts. Further use cases involve long-distance (intercontinental) transfers to and from JASMIN, and collecting results from a mobile atmospheric radar system, pushing data to JASMIN via a lightweight Globus deployment. We provide details of how Globus fits into our current infrastructure, our experience of the recent migration to GCSv5.4, and of our interest in developing use of the wider ecosystem of Globus services for the benefit of our user community.
An Enterprise Resource Planning (ERP) system includes various modules that reduce a business's workload. Additionally, it organizes workflows, which enhances productivity. Here is a detailed explanation of the ERP modules; going through them will help you understand how the software is changing work dynamics.
To know more details here: https://blogs.nyggs.com/nyggs/enterprise-resource-planning-erp-system-modules/
1. 1
Making Machine Learning Easy
Deep Learning with H2O and Spring WebFlux
Licensed under the CC BY 3.0 license.
2. 2
Disclaimer
All information presented in this document and all supplementary materials
and programming code represent only my personal opinion and current
understanding and has not received any endorsement or approval by IPT -
Intellectual Products and Technologies or any third party. It should not be
taken as any kind of advice, and should not be used for making any kind of
decisions with potential commercial impact.
The information and code presented may be incorrect or incomplete. It is
provided "as is", without warranty of any kind, express or implied, including but
not limited to the warranties of merchantability, fitness for a particular purpose
and non-infringement. In no event shall the author or copyright holders be
liable for any claim, damages or other liability, whether in an action of
contract, tort or otherwise, arising from, out of or in connection with the
information, materials or code presented or the use or other dealings with this
information or programming code.
3. 3
3
Trayan Iliev
– CEO of IPT – Intellectual Products & Technologies
– Oracle® certified programmer 15+ Y
– End-to-end reactive fullstack apps with Go, Python, Java, ES6/7,
TypeScript, Angular, React and Vue.js
– 12+ years IT trainer
– Voxxed Days, jPrime, Java2Days, jProfessionals, BGOUG, BGJUG, DEV.BG
speaker
– Lecturer @ Sofia University – courses: Fullstack React, Multimedia with
Angular & TypeScript, Spring 5 Reactive, Internet of Things (with SAP),
Multiagent Systems and Social Robotics, Practical Robotics & Smart Things
– Robotics / smart-things/ IoT enthusiast, RoboLearn hackathons organizer
About Me
4. 4
Where to Find The Code and Materials?
https://github.com/iproduct/course-ml
5. 5
Machine Learning + Big Data in Real Time +
Cloud Technologies
=> The Future of Intelligent Systems
7. 7
Need for Speed :)
The Go gopher was designed by Renee French. (http://reneefrench.blogspot.com/) The design is licensed under the Creative Commons 3.0 Attributions license.
11. 11
• Message Driven – asynchronous message-passing establishes a
boundary between components that ensures loose
coupling, isolation, location transparency, and provides the means
to delegate errors as messages [Reactive Manifesto].
• The main idea is to decouple concurrent producer and consumer
workers by using message queues.
• Message queues can be unbounded or bounded (limited maximum
number of messages).
• Unbounded message queues can present a memory allocation
problem if producers outrun the consumers for a long
period → OutOfMemoryError
Scalable, Massively Concurrent
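The bounded-queue idea above can be sketched in a few lines: a bounded queue makes put() block when the consumer falls behind, which is the simplest form of backpressure. This is a minimal illustrative Python sketch, not part of the original deck:

```python
import queue
import threading

def run_producer_consumer(n_items, maxsize=4):
    """Bounded queue: put() blocks when full, so a fast producer is
    throttled instead of exhausting memory (no OutOfMemoryError)."""
    q = queue.Queue(maxsize=maxsize)   # bounded: at most `maxsize` pending messages
    results = []

    def producer():
        for i in range(n_items):
            q.put(i)                   # blocks while the queue is full -> backpressure
        q.put(None)                    # poison pill signals end of stream

    def consumer():
        while True:
            item = q.get()
            if item is None:
                break
            results.append(item * 2)   # stand-in for real "work"

    t1 = threading.Thread(target=producer)
    t2 = threading.Thread(target=consumer)
    t1.start(); t2.start()
    t1.join(); t2.join()
    return results
```

With an unbounded queue (`queue.Queue()` without `maxsize`) the producer would never block, and a persistently slow consumer would let pending messages grow without limit.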
12. 12
“Conceptually, a stream is a (potentially never-ending) flow of data
records, and a transformation is an operation that takes one or more
streams as input, and produces one or more output streams as a result.”
Apache Flink: Dataflow Programming Model
Data / Event / Message Streams
13. 13
The idea of abstracting logic from execution is hardly new -- it was the
dream of SOA. And the recent emergence of microservices and
containers shows that the dream still lives on.
For developers, the question is whether they want to learn yet one more
layer of abstraction to their coding. On one hand, there's the elusive
promise of a common API to streaming engines that in theory should let
you mix and match, or swap in and swap out.
Tony Baer (Ovum) @ ZDNet - Apache Beam and Spark:
New coopetition for squashing the Lambda Architecture?
Data Stream Programming
19. 19
Kappa Architecture
Query = K (New Data) = K (Live streaming data)
• Proposed by Jay Kreps in 2014
• Real-time processing of distinct events
• Drawbacks of Lambda architecture:
−It can result in coding overhead, as the same processing logic must be implemented in both the batch and speed layers
−Re-processes every batch cycle, which may not always be beneficial
−Lambda-architecture-modeled data can be difficult to migrate
• Canonical data store in a Kappa Architecture system is an append-only
immutable log (like Kafka, Pulsar)
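The Kappa idea — Query = K(live streaming data) over an append-only log — can be illustrated with a toy log where any consumer rebuilds its view by replaying from an offset. The `Log` class below is purely illustrative, not a Kafka or Pulsar API:

```python
class Log:
    """Toy append-only, immutable log in the spirit of Kafka/Pulsar."""
    def __init__(self):
        self._records = []

    def append(self, record):
        self._records.append(record)
        return len(self._records) - 1      # the record's offset

    def replay(self, offset=0):
        """Consumers rebuild any view by replaying from a chosen offset."""
        return list(self._records[offset:])

# Query = K(live streaming data): recompute a running sum by replaying the log
log = Log()
for v in [3, 1, 4]:
    log.append(v)
view = sum(log.replay())   # view == 8
```

Because the log is the canonical store, there is no separate batch layer to keep in sync: any new view of the data is just another replay of the same log.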
20. 20
Kappa Architecture II
Query = K (New Data) = K (Live streaming data)
• Multiple data events or queries are logged in a
queue and served against a distributed file
system storage or history.
• The order of the events and queries is not
predetermined. Stream processing platforms
can interact with the database at any time.
• It is resilient and highly available, as each node
of the system handles terabytes of storage to
support replication.
• Machine learning is done in real time
[Diagram: Input Data → Streaming Layer → Serving Layer → Data Lake]
21. 21
Zeta Architecture
• Main characteristics of Zeta architecture:
− file system (HDFS, S3, GoogleFS),
− realtime data storage (HBase, Spanner, BigTable),
− modular processing model and platform (MapReduce, Spark, Drill, BigQuery),
− containerization and deployment (cgroups, Docker, Kubernetes),
− software solution architecture (serverless computing – e.g. AWS Lambda)
• Recommender systems and machine learning
• Business applications and dynamic global resource management
(Mesos + Myriad, YARN, Diego, Borg).
22. 22
• Apache Spark is an open-source
cluster-computing framework. Spark
Streaming, Spark MLlib
• Apache Storm is a distributed
stream processing – streams DAG
• Apache Samza is a distributed real-
time stream processing framework.
Distributed Stream Processing – Apache Projects:
24. 24
Example: Internet of Things (IoT)
CC BY 2.0, Source: https://www.flickr.com/photos/wilgengebroed/8249565455/ ,
Radar, GPS, lidar for navigation and obstacle avoidance ( 2007 DARPA Urban Challenge )
26. 26
• Raspberry Pi 2 (quad-core ARMv7 @
900MHz) + Arduino Leonardo clone A-Star
32U4 Micro
• Optical encoders (custom), IR optical array,
3D accelerometers, gyros, and compass
MinIMU-9 v2
• IPTPI is programmed in Python, Java and
Go using: Wiring Pi, Numpy, Pandas,
Scikit-learn, Pi4J, Reactor, RxJava,
GPIO(Go)
IPTPI: Raspberry Pi + Arduino Robot
27. 27
IPTPI: Raspberry Pi + Arduino Robot
3D accelerometers, gyros,
and compass MinIMU-9 v2
Pololu DRV8835
Dual Motor Driver
for Raspberry Pi
Arduino Leonardo clone
A-Star 32U4 Micro
USB Stereo
Speakers - 5V
LiPo Powerbank
15000 mAh
32. 32
We live in a Connected Universe
... there is a hypothesis that all the things in
the Universe are intimately connected, and
you cannot change a bit without changing
it all.
Action – Reaction principle is the
essence of how Universe behaves.
Imperative and Reactive
33. 33
Imperative vs. Reactive
• Reactive Programming: using static or dynamic data flows and
propagation of change
• Example: a := b + c
• Functional Programming: evaluation of mathematical functions, avoids
changing-state and mutable data, declarative programming
• Side effects free => much easier to understand and predict the program
behavior.
Ex. (RxGo):
observable := rxgo.Just("Hello", "Reactive", "World", "from", "RxGo")().
    Map(ToUpper).               // map each string to upper case
    Filter(LengthGreaterThan4)  // keep only strings longer than 4 characters
for item := range observable.Observe() {
    fmt.Println(item.V)
}
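The same pipeline can be expressed with plain Python generators, which are likewise lazy — nothing is computed until the stream is consumed. A sketch mirroring the RxGo example above, not part of the original deck:

```python
def just(*values):
    """Source: a finite stream of values."""
    yield from values

def map_(f, stream):
    """Transform each element of the stream lazily."""
    return (f(x) for x in stream)

def filter_(pred, stream):
    """Keep only elements satisfying the predicate, lazily."""
    return (x for x in stream if pred(x))

# Equivalent of Just(...).Map(ToUpper).Filter(LengthGreaterThan4)
stream = filter_(lambda s: len(s) > 4,
                 map_(str.upper, just("Hello", "Reactive", "World", "from", "RxGo")))
result = list(stream)   # consuming the stream runs the whole pipeline
```

Like a cold observable, the generator pipeline runs its sequence only when (and each time) it is consumed.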
34. 34
• According to Conal Elliott's ground-breaking paper at the
Conference on Functional Programming (1997), FRP is:
(a) Denotative
(b) Temporally continuous
• FRP is asynchronous data-flow programming using the building
blocks of functional programming (map, reduce, filter, etc.)
and explicitly modeling time
Functional Reactive Programming (FRP)
36. 36
ReactiveX: Observable vs. Iterable
By ReactiveX - Creative Commons Attribution 3.0 License, Source: http://reactivex.io/intro.html
You can think of the Observable class as a “push” equivalent to Iterable, which is a “pull.” With an Iterable, the
consumer pulls values from the producer and the thread blocks until those values arrive. By contrast, with an
Observable the producer pushes values to the consumer whenever values are available. This approach is more
flexible, because values can arrive synchronously or asynchronously.
40. 40
• PULL-based (Cold Event Streams) – cold streams (e.g. RxJava
Observable / Flowable or Reactor Flux / Mono) are streams
that run their sequence when and if they are subscribed to.
They present the sequence from the start to each subscriber.
• PUSH-based (Hot Event Streams) – emit values independent of
individual subscriptions. They have their own timeline and events
occur whether someone is listening or not. When a subscription is
made, the observer receives current events as they happen.
• Example: mouse events
Hot and Cold Event Streams
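The cold/hot distinction fits in a few lines: a cold source replays its sequence for every subscriber, while a hot subject only delivers what is emitted after you subscribe. This is an illustrative sketch of the mechanics, not an Rx API:

```python
def cold(values):
    """Cold stream: runs the full sequence for each new subscriber."""
    def subscribe(observer):
        for v in values:
            observer(v)
    return subscribe

class HotSubject:
    """Hot stream: emits on its own timeline; late subscribers miss earlier events."""
    def __init__(self):
        self._observers = []

    def subscribe(self, observer):
        self._observers.append(observer)

    def emit(self, value):
        for observer in list(self._observers):
            observer(value)

source = cold([1, 2, 3])
a, b = [], []
source(a.append)           # a == [1, 2, 3]
source(b.append)           # b == [1, 2, 3] – replayed from the start

hot = HotSubject()
seen = []
hot.emit("missed")         # nobody listening yet – the event is lost
hot.subscribe(seen.append)
hot.emit("mouse-move")     # seen == ['mouse-move']
```

Mouse events are the canonical hot stream: they happen whether or not anyone is subscribed.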
41. 41
Converting Cold to Hot Stream
Source: RxJava 2 API documentation, http://reactivex.io/RxJava/2.x/javadoc/
43. 43
What Is Machine Learning?
• Machine Learning (ML) – the study of computer algorithms that improve automatically
through experience. A subset of Artificial Intelligence (AI).
• ML algorithms build a model based on sample data, known as "training data", in order
to make predictions or decisions without being explicitly programmed to do so.
• ML algorithms are used in a wide variety of applications, where it is difficult or
unfeasible to develop conventional algorithms for the task – e.g. computer vision,
email filtering, digital assistants, natural language processing (NLP), recommendations,
personalized online advertising, chatbots, fraud detection, cybersecurity, medical
image analysis, self-driving cars, self-driving databases, and much, much more ...
• ML is closely connected with computational statistics, mathematical optimization, data
mining, predictive analytics.
44. 44
Machine Learning and Artificial Intelligence
• ML learns and predicts based on passive observations, whereas AI implies an agent
interacting with the environment to learn and take actions that maximize its chance of
successfully achieving its goals.
Judea Pearl, The Book of Why, 2018
Pictures by: Yakoove - Own work, CC BY-SA 4.0, https://commons.wikimedia.org/w/index.php?curid=95589683
By Yakoove - Own work, CC BY-SA 4.0, https://commons.wikimedia.org/w/index.php?curid=95589710
45. 45
Types of Learning Algorithms: Supervised Learning
• Supervised learning – build a mathematical model of a set of data that contains both
the inputs and the desired outputs.
• Training data – a set (matrix) of training examples (feature vectors), having one or
more inputs and the desired output. Through iterative optimization of an objective
function, supervised learning algorithms learn a function that can be used to predict
the output associated with new inputs.
• Examples: active learning, classification and regression.
• Active learning – learning algorithm can interactively query a user (teacher or oracle) to
label new data points with the desired outputs (optimal experimental design).
• Classification algorithms – when the outputs are restricted to a limited set of values
• Regression algorithms – when the outputs may have any numerical value within a range.
• Similarity learning – using a similarity function measuring how similar two objects are –
ranking, recommender systems, visual identity tracking, face and speaker verification.
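As a minimal concrete instance of a supervised classification algorithm, a 1-nearest-neighbour classifier predicts the label of the closest training example. A toy sketch for illustration only (the data and names are invented):

```python
def predict_1nn(train_X, train_y, x):
    """Supervised classification: return the label of the nearest
    training example (squared Euclidean distance)."""
    dists = [sum((a - b) ** 2 for a, b in zip(row, x)) for row in train_X]
    return train_y[dists.index(min(dists))]

# Training data: feature vectors (inputs) with desired outputs (labels)
X = [(1.0, 1.0), (1.2, 0.9), (8.0, 8.0), (8.1, 7.9)]
y = ["cat", "cat", "dog", "dog"]

label = predict_1nn(X, y, (7.5, 8.2))   # classify a new, unseen input
```

Swapping the discrete labels for numeric targets (and averaging neighbours) would turn the same idea into a regression algorithm.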
46. 46
Types of Learning Algorithms: Unsupervised Learning
• Unsupervised learning – takes a data set that contains only inputs and finds structure in
the data, like grouping or clustering of data points by identifying data commonalities
• Algorithms learn from test data that has not been labeled, classified or categorized.
• Applications: density function estimation, summarizing and explaining data features.
• Cluster analysis is the assignment of a set of observations into subsets (called clusters)
so that observations within the same cluster are similar according to one or more
predesignated criteria, while observations drawn from different clusters are dissimilar.
• Different clustering techniques make different assumptions on the structure of the data,
often defined by some similarity metric and evaluated, for example, by internal
compactness, or the similarity between members of the same cluster, and separation,
the difference between clusters. Other methods are based on estimated density and
graph connectivity.
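A minimal k-means illustrates the clustering idea: alternately assign each point to its nearest centroid, then recompute centroids as cluster means. A sketch with naive initialization (first k points seed the centroids), not part of the original deck:

```python
def kmeans(points, k, iters=10):
    """Naive k-means clustering over tuples of equal dimension."""
    centroids = [tuple(p) for p in points[:k]]   # naive seeding
    for _ in range(iters):
        # assignment step: each point joins its nearest centroid's cluster
        clusters = [[] for _ in range(k)]
        for p in points:
            dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[dists.index(min(dists))].append(p)
        # update step: move each centroid to the mean of its cluster
        centroids = [tuple(sum(xs) / len(xs) for xs in zip(*cl)) if cl else centroids[i]
                     for i, cl in enumerate(clusters)]
    return centroids

pts = [(0, 0), (0, 1), (10, 10), (10, 11)]
centers = kmeans(pts, 2)   # two well-separated groups
```

The "similarity metric" mentioned above is here the squared Euclidean distance; other choices of metric yield different clusterings.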
47. 47
Types of Learning Algorithms: Semi-supervised Learning
• Semi-supervised learning – falls between unsupervised learning (without any labeled
training data) and supervised learning (with completely labeled training data). During
training, it uses a smaller labeled data set to guide classification and feature extraction
from a larger, unlabeled data set.
• In weakly supervised learning, the training labels are noisy, limited, or imprecise; however,
these labels are often cheaper to obtain, resulting in larger effective training sets.
48. 48
Types of Learning Algorithms: Reinforcement Learning
• Reinforcement learning – concerned with how software agents ought to take actions in
an environment so as to maximize some notion of cumulative reward
• Due to its generality, the field is studied in many other disciplines, such as game theory,
control theory, operations research, information theory, simulation-based optimization,
multi-agent systems, swarm intelligence, statistics and genetic algorithms.
• In ML, the environment is typically represented as a Markov
decision process (MDP). Many reinforcement learning algorithms
use dynamic programming techniques.
• Reinforcement learning algorithms do not assume knowledge of
an exact mathematical model of the MDP, and are used when
exact models are infeasible.
• Examples: reinforcement learning algorithms are used in
autonomous vehicles or in learning to play a game against human
49. 49
Types of Learning Algorithms: Feature learning – I
• Feature learning (representation learning algorithms) – discover better representations of
the inputs provided during training. Examples: principal components analysis and cluster
analysis.
• Attempt to preserve the information in their input but also transform it in a way that makes
it useful, often as a pre-processing step before performing classification or predictions.
• Allows a machine to both learn the features and use them to perform a specific task.
• Feature learning can be either supervised or unsupervised. In supervised feature learning,
features are learned using labeled input data. Examples: artificial neural networks,
multilayer perceptrons, and supervised dictionary learning. In unsupervised feature
learning, features are learned with unlabeled input data. Examples: dictionary learning,
independent component analysis, autoencoders, matrix factorization, and clustering.
50. 50
Types of Learning Algorithms: Feature learning – II
• Manifold learning algorithms attempt to do so under the constraint that the learned
representation is low-dimensional.
• Sparse coding algorithms attempt to do so under the constraint that the learned
representation is sparse, meaning that the mathematical model has many zeros.
• Multilinear subspace learning algorithms aim to learn low-dimensional representations
directly from tensor representations for multidimensional data, without reshaping them into
higher-dimensional vectors.
• Deep learning algorithms discover multiple levels of representation, or a hierarchy of
features, with higher-level, more abstract features defined in terms of (or generating)
lower-level features. It has been argued that an intelligent machine is one that learns a
representation that disentangles the underlying factors of variation that explain the
observed data.
52. 52
Top Machine Learning Libraries - I
• Apache Spark MLlib – fast, in-memory, big-data processing, which has also made it a
go-to framework for ML, highly scalable, can be coupled with any Hadoop data
source, supports Java, Scala, R and Python, new ML algorithms are constantly being
added, and existing enhanced.
• H2O + AutoML - open source, in-memory, distributed, fast, and scalable machine
learning and predictive analytics platform that allows learning from big data, written in
Java, uses Distributed Key/Value store for accessing data, models, objects, etc. across
all nodes and machines, uses distributed Map/Reduce, utilizes Java Fork/Join
multi-threading, data read in parallel, distributed across the cluster, and stored in memory in
a columnar format in a compressed way, data parser has built-in intelligence to guess
the schema of the incoming dataset and supports data ingest from multiple sources in
various formats, APIs: REST, Flow UI, H2O-R, H2O-Python, supports Deep Learning, Tree
Ensembles, and Generalized Low Rank Models (GLRM).
53. 53
Top Machine Learning Libraries - II
• Microsoft Azure ML Studio - Microsoft’s ML framework runs in their Azure cloud, offering
high-capacity data processing on a pay-as-you-go model. Its interactive, visual
workspace can be used to create, test and iterate ML “experiments” using the built-in
ML packages; they can then be published and shared as web services. Python or R
custom coding is also supported. A free trial is available for new users.
• Amazon Machine Learning - like Azure, the Amazon Machine Learning framework is
cloud-based, supporting Amazon S3, Redshift and RDS. It combines ML algorithms with
interactive tools and data visualizations which allow users to easily create, evaluate
and deploy ML models. These models can be binary, categoric or numeric – making
them useful in areas such as information filtering and predictive analytics.
54. 54
Top Machine Learning Libraries - III
• Microsoft Distributed Machine Learning Toolkit (DMTK) - uses local data caching to
make it more scalable and efficient on computing clusters that have limited resources
on each node. Its distributed ML algorithms allow for fast training of gradient boosting,
topic and word-embedding learning models:
• LightLDA: Scalable, fast and lightweight system for large-scale topic modeling.
• LightGBM: LightGBM is a fast, distributed, high performance gradient boosting (GBDT, GBRT,
GBM or MART) framework based on decision tree algorithms, used for ranking, classification
and many other machine learning tasks.
• Distributed word embedding: a parallelization of the Word2Vec algorithm
• Distributed Multi-sense Word Embedding (DMWE) - a parallelization of the Skip-Gram
Mixture algorithm used for polysemous words.
• Google TensorFlow - open-source ML toolkit that has been applied in areas ranging
from machine translation to biomedical science. Data flows are processed by a series
of algorithms described by a graph. This allows for a flexible, multi-node architecture
that supports multiple CPUs and GPUs. The recently released Tensorflow Lite adds
support for neural processing on mobile phones.
55. 55
Top Machine Learning Libraries - IV
• Caffe - developed by Berkeley AI Research, Caffe claims to offer one of the fastest
implementations of convolutional neural networks. It was originally developed for
machine vision projects but has since expanded to other applications, with its
extensible C++ code designed to foster active development. Both CPU and GPU
processing are supported.
• Veles - developed and released as open source by Samsung, Veles is a distributed ML
platform designed for rapid deep-learning application development. Users can train
various artificial neural net types – including fully connected, convolutional and
recurrent. It can be integrated with any Java application, and its “snapshot” feature
improves disaster recovery.
56. 56
Top Machine Learning Libraries - V
• Massive Online Analysis (MOA) - developed at the University of Waikato in New
Zealand, this open-source Java framework specializes in real-time mining of streaming
data and large-scale ML. It offers a wide range of ML algorithms, including
classification, regression, clustering, outlier detection and concept drift detection.
Evaluation tools are also provided.
• Torch - the Torch ML library makes ML easy and efficient, thanks to its use of the Lua
scripting language and a GPU-first implementation. It comes with a large collection of
community-driven ML packages that cover computer vision, signal processing, audio
processing, networking and more. Its neural network and optimization libraries are
designed to be both accessible and flexible.
58. 58
Deep Learning
• Deep learning (deep structured learning) - artificial neural networks with representation
learning – supervised, semi-supervised or unsupervised.
• Deep-learning architectures such as deep neural networks, deep belief networks,
recurrent neural networks (RNN) and convolutional neural networks (CNN) have been
applied to fields including computer vision, machine vision, speech recognition,
natural language processing, audio recognition, social network filtering, machine
translation, bioinformatics, drug design, medical image analysis, material inspection
and game programming, where they have produced results comparable to and in
some cases surpassing human expert performance.
• Artificial neural networks (ANNs) were inspired by information processing and
distributed communication nodes in biological systems. ANNs have various differences
from biological brains. Specifically, neural networks tend to be static and symbolic,
while the biological brain of most living organisms is dynamic (plastic) and analog.
59. 59
Artificial Neural Networks (ANNs)
• ANN is based on a collection of connected units or
nodes called artificial neurons, which model
the neurons in a biological brain.
• Like synapses, the connections transmit signals between neurons; the
"signal" is a real number, and each neuron's output is computed by some
non-linear function of the sum of its inputs -> threshold.
• The connections are called edges - typically have
weight that adjusts as learning proceeds.
• Typically, neurons are aggregated into layers. Different
layers may perform different transformations on their
inputs. Signals travel from the first layer (the input
layer), to the last layer (the output layer), possibly after
traversing the layers multiple times.
By Egm4313.s12 (Prof. Loc Vu-Quoc) - Own work, CC BY-SA 4.0, https://commons.wikimedia.org/w/index.php?curid=72816083
CC BY-SA 2.0, https://commons.wikimedia.org/w/index.php?curid=125222
60. 60
Feed Forward Networks (FFN)
• FFN - artificial neural network wherein
connections between the nodes
do not form a cycle. As such, it is different
from its descendant: recurrent neural
networks.
By Mcstrother - Own work, CC BY 3.0, https://commons.wikimedia.org/w/index.php?curid=9516009
By Mcstrother - Own work, CC BY 3.0, https://commons.wikimedia.org/w/index.php?curid=9515940
61. 61
ANNs, Multilayer Perceptron (MLP)
• MLP - class of feedforward artificial neural network (ANN),
three layers: input layer, hidden layer, and output layer,
nonlinear activation function, supervised learning
technique called backpropagation for training.
CC BY-SA 3.0, https://en.wikipedia.org/w/index.php?curid=8201514
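A forward pass through such a three-layer MLP is just two dense layers with a nonlinear activation. A minimal sketch — the weights below are arbitrary examples, and training via backpropagation is omitted:

```python
import math

def sigmoid(z):
    """Classic nonlinear activation function."""
    return 1.0 / (1.0 + math.exp(-z))

def dense(inputs, weights, biases):
    """One fully connected layer: each neuron applies the activation
    to the weighted sum of its inputs plus a bias."""
    return [sigmoid(sum(w * x for w, x in zip(ws, inputs)) + b)
            for ws, b in zip(weights, biases)]

def mlp_forward(x, hidden_w, hidden_b, out_w, out_b):
    """Input layer -> hidden layer -> output layer."""
    return dense(dense(x, hidden_w, hidden_b), out_w, out_b)

# 2 inputs -> 2 hidden neurons -> 1 output (arbitrary example weights)
y = mlp_forward([0, 0], [[1, 1], [1, 1]], [0, 0], [[1, 1]], [0])
# hidden = [sigmoid(0), sigmoid(0)] = [0.5, 0.5]; output = sigmoid(1.0) ≈ 0.731
```

Backpropagation would adjust `hidden_w`, `hidden_b`, `out_w`, `out_b` by the gradient of a loss over labeled examples — the supervised learning technique mentioned above.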
63. 63
Deep Learning
• The adjective "deep" in deep learning comes from the use of multiple layers in the
network. Early work showed that a linear perceptron cannot be a universal classifier,
and that a network with a nonpolynomial activation function with one hidden layer of
unbounded width can be so.
• Deep learning – unbounded number of layers of bounded size, which permits practical
application and optimized implementation, while retaining theoretical universality
under mild conditions.
• Layers can be heterogeneous and deviate widely from biologically informed
connectionist models, for the sake of efficiency, trainability and understandability,
whence the "structured" part.
64. 64
Convolutional Neural Networks (CNN)
• Convolutional Neural Network (CNN, or ConvNet) - a class of deep neural networks, most
commonly applied to analyzing visual imagery, also known as shift invariant or space
invariant artificial neural networks (SIANN), based on their shared-weights architecture and
translation invariance characteristics, have applications in image and video recognition,
recommender systems, image classification, medical image analysis, natural language
processing, brain-computer interfaces, and financial time series.
• CNNs are regularized versions of multilayer perceptrons – MLPs usually mean fully
connected networks, that is, each neuron in one layer is connected to all neurons in the
next layer -> overfitting; typical ways of regularization include adding a magnitude
penalty on the weights to the loss function.
• CNNs take a different approach towards regularization – they take advantage of the
hierarchical pattern in data and assemble more complex patterns using smaller and
simpler patterns -> lower connectedness and complexity.
65. 65
Picture by: By Sven Behnke - Own work, CC BY-SA 4.0, https://commons.wikimedia.org/w/index.php?curid=82466022
By Original file: Avimanyu786SVG version: Tukijaaliwa - File:AI-ML-DL.png, CC BY-SA 4.0, https://commons.wikimedia.org/w/index.php?curid=90131352
66. 66
Convolutional Neural Networks (CNN)
• Convolutional networks were inspired by biological processes in that the connectivity
pattern between neurons resembles the organization of the animal visual cortex.
Individual cortical neurons respond to stimuli only in a restricted region of the visual
field known as the receptive field. The receptive fields of different neurons partially
overlap such that they cover the entire visual field.
• CNNs use relatively little pre-processing compared to other image classification
algorithms. This means that the network learns the filters that in traditional algorithms
were hand-engineered. This independence from prior knowledge and human effort in
feature design is a major advantage.
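The core CNN operation can be sketched directly: slide a small kernel of shared weights over the input and take weighted sums ('valid' padding; like most deep learning frameworks, this actually computes cross-correlation). An illustrative sketch, not part of the original deck:

```python
def conv2d(image, kernel):
    """'Valid' 2-D convolution: slide the kernel over the image and take
    weighted sums. The same kernel weights are shared at every position,
    which gives CNNs their translation invariance."""
    kh, kw = len(kernel), len(kernel[0])
    h = len(image) - kh + 1
    w = len(image[0]) - kw + 1
    return [[sum(image[i + di][j + dj] * kernel[di][dj]
                 for di in range(kh) for dj in range(kw))
             for j in range(w)]
            for i in range(h)]

# A horizontal-difference kernel responds to vertical edges
edges = conv2d([[0, 0, 1],
                [0, 0, 1]], [[-1, 1]])
```

In a trained CNN these kernels are exactly the learned filters mentioned above; early layers tend to learn edge detectors like this one.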
71. 71
Deep Learning with H2O - I
• purely supervised training protocol for regression and classification tasks
• fast and memory-efficient Java implementations based on columnar compression and
fine-grain Map/Reduce
• multi-threaded and distributed parallel computation to be run on either a single node
or a multi-node cluster
• fully automatic per-weight adaptive learning rate for fast convergence
• optional specification of learning rate, annealing and momentum options
• regularization options include L1, L2, dropout, Hogwild! and model averaging to
prevent model overfitting
• grid search for hyperparameter optimization and model selection
72. 72
Deep Learning with H2O - II
• model checkpointing for reduced run times and model tuning
• automatic data pre- and post-processing (one-hot encoding and standardization) for
categorical and numerical data
• automatic imputation of missing values
• automatic tuning of communication vs computation for best performance
• model export in plain java code for deployment in production environments
• export of weights and biases as H2O frames
• additional expert parameters for model tuning
• deep autoencoders for unsupervised feature learning and anomaly detection
capabilities
76. 76
Receiver Operating Characteristic (ROC)
Receiver operating
characteristic (ROC) curve -
graphical plot that illustrates
the diagnostic ability of a
binary classifier system as its
discrimination threshold is
varied.
77. 77
Receiver Operating Characteristic (ROC)
• condition positive (P) / negative (N) - the number of real positive / negative cases in the data
• true positive (TP) - eqv. with hit, true negative (TN) - eqv. with correct rejection
• false positive (FP) - eqv. with false alarm, Type I error, false negative (FN) - eqv. with miss, Type II error
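From these four counts, each point on the ROC curve is just the pair (TPR = TP/P, FPR = FP/N) at one discrimination threshold. A minimal sketch for illustration:

```python
def roc_point(y_true, scores, threshold):
    """One ROC-curve point for a binary classifier at a given threshold:
    returns (true positive rate, false positive rate)."""
    tp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 0)
    p = sum(y_true)          # condition positive (P)
    n = len(y_true) - p      # condition negative (N)
    return tp / p, fp / n    # (TPR = TP/P, FPR = FP/N)

y_true = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.1]
point = roc_point(y_true, scores, 0.5)   # (0.5, 0.5) at this threshold
```

Sweeping the threshold from 1 down to 0 traces the full curve from (0, 0) to (1, 1); the area under it is the AUC that H2O reports for binary classifiers.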
79. 79
Ensemble Learning
• Ensemble methods – using multiple learning algorithms to obtain better predictive
performance than could be obtained from any of the constituent learning algorithms
alone. The term ensemble usually means using the same type of base learners.
• Even if the hypothesis space contains hypotheses that are very well-suited for a
particular problem, it may be very difficult to find a good one. Ensembles combine
multiple hypotheses to form a better hypothesis. The resultant hypothesis is not necessarily
contained within the hypothesis space of the constituent models – it can be more complex.
• An ensemble over-fits the training data more easily than a single model -> Bagging.
• Ensembles yield better results when there is diversity among the models. Using a variety
of strong learning algorithms has been shown to be more effective than using
techniques that attempt to dumb-down the models in order to promote diversity.
• number of independent classifiers == number of classes => highest accuracy
80. 80
Bagging – e.g. Random Forest
• Bagging (Bootstrap AGGregatING) – involves having each model in the ensemble vote
with equal weight. In order to promote model variance, bagging trains each model in
the ensemble using a randomly drawn subset of the training set.
• Example: Random Forest algorithm combines random decision trees with bagging to
achieve very high classification accuracy. Given a training set X = x1, ..., xn with
responses Y = y1, ..., yn, bagging repeatedly (B times) selects a random sample with
replacement of the training set and fits trees to these samples + feature bagging:
1. For b = 1, ..., B:
2. Sample, with replacement, n training examples from X, Y; call these Xb, Yb.
3. Train a classification or regression tree fb on Xb, Yb, by selecting a limited number of features
(attributes) to be used during the training (feature bagging).
4. After training, predictions for unseen samples x' can be made by averaging the predictions /
taking the majority vote from all the individual regression trees on x'.
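The bootstrap-and-vote mechanics above fit in a few lines — any base learner can be plugged in. Here the base "learner" simply predicts the majority class of its bootstrap sample, which is enough to show the procedure (feature bagging is omitted for brevity; all names are illustrative):

```python
import random
from collections import Counter

def bagging(train, fit, B, rng):
    """Fit B models on bootstrap samples of `train`; predict by majority vote."""
    models = []
    for _ in range(B):
        # sample n training examples with replacement (the bootstrap)
        sample = [rng.choice(train) for _ in train]
        models.append(fit(sample))
    def predict(x):
        return Counter(m(x) for m in models).most_common(1)[0][0]
    return predict

def majority_class_learner(sample):
    """Trivial base learner: always predicts its sample's majority class."""
    label = Counter(y for _, y in sample).most_common(1)[0][0]
    return lambda x: label

rng = random.Random(42)
train = [((0,), "a")] * 8 + [((1,), "b")] * 2
predict = bagging(train, majority_class_learner, B=11, rng=rng)
```

Replacing `majority_class_learner` with a randomized decision-tree learner (plus feature bagging) yields the Random Forest algorithm described above.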
82. 82
Boosting
• Boosting - ensemble meta-algorithm for primarily reducing bias, and also variance,
based on the question posed by Kearns and Valiant (1988, 1989): "Can a set of weak
learners create a single strong learner?"
• In supervised learning, boosting is a family of machine learning algorithms that
convert weak learners to strong ones.
• Weak learner – a classifier that is only slightly correlated with the true classification (it
can label examples better than random guessing). In contrast, a strong learner is a
classifier that is arbitrarily well-correlated with the true classification.
• "Informally, [the hypothesis boosting] problem asks whether an efficient learning
algorithm […] that outputs a hypothesis whose performance is only slightly better than
random guessing [i.e. a weak learner] implies the existence of an efficient algorithm
that outputs a hypothesis of arbitrary accuracy [i.e. a strong learner]."
83. 83
Boosting - Basic Algorithm II
By Sirakorn - Own work, CC BY-SA 4.0, https://commons.wikimedia.org/w/index.php?curid=85888769
84. 84
Boosting – Basic Algorithm
1. Equal weight (uniform probability
distribution) is given to the sample
training data (say D1) initially. Let i = 1
2. This data (Di) is then given to a base
learner (say Li).
3. Mis-classified instances by ensemble
L1, L2, …Li are assigned a weight higher
than the correctly classified instances
(keeping the total probability
distribution = 1) => Di+1 – boosted data
4. Let i = i + 1. Go to step 2 until error
becomes acceptable (or other finish
criteria is met).
[Diagram: boosted data sets D1, D2, … Di fed to base learners L1, L2, … Li]
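The re-weighting in step 3 is the heart of boosting. One AdaBoost-style round can be sketched as follows (illustrative only; assumes the weighted error satisfies 0 < err < 1):

```python
import math

def boost_round(weights, correct):
    """One AdaBoost-style round: compute the learner's weighted error,
    up-weight misclassified examples, and renormalize so the weights
    remain a probability distribution (total = 1)."""
    err = sum(w for w, ok in zip(weights, correct) if not ok)
    alpha = 0.5 * math.log((1 - err) / err)     # this learner's vote weight
    boosted = [w * math.exp(alpha if not ok else -alpha)
               for w, ok in zip(weights, correct)]
    z = sum(boosted)                            # normalization constant
    return [w / z for w in boosted], alpha

# D1: 4 examples with uniform weights; the learner misses only the last one
d2, alpha = boost_round([0.25] * 4, [True, True, True, False])
# after renormalization the misclassified example carries half the total weight
```

The next base learner is then trained against the boosted distribution D2, so it focuses on the example the previous learner got wrong.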
85. 85
AdaBoost
• AdaBoost - the output of the other learning algorithms ('weak learners') is combined into a
weighted sum that represents the final output of the boosted classifier. AdaBoost is
adaptive in the sense that subsequent weak learners are tweaked in favor of those
instances misclassified by previous classifiers. AdaBoost is sensitive to noisy data and
outliers. The individual learners can be weak, but as long as the performance of each one
is slightly better than random guessing, the final model can be proven to be a strong learner.
• Every learning algorithm tends to suit some problem types better than others, and typically
has many different parameters and configurations to adjust before it achieves optimal
performance on a dataset. AdaBoost (with decision trees as the weak learners) is often
referred to as the best out-of-the-box classifier. When used with decision tree learning,
information gathered at each stage of the AdaBoost algorithm about the relative
'hardness' of each training sample is fed into the tree growing algorithm such that later
trees tend to focus on harder-to-classify examples.
87. 87
Gradient Boosting
Hastie, T.; Tibshirani, R.; Friedman, J. H. (2009). "10. Boosting and Additive Trees". The Elements of Statistical Learning (2nd ed.). New York: Springer. pp. 337–384. ISBN
978-0-387-84857-0. Archived from the original on 2009-11-10.
88. 88
Thanks for Your Attention!
Trayan Iliev
IPT – Intellectual Products & Technologies
http://iproduct.org/
https://github.com/iproduct
https://twitter.com/trayaniliev
https://www.facebook.com/IPT.EACAD