Systems integration is everywhere, not because we want it, but because we need it.
It's the download of exchange rates, the list of yesterday's orders and the latest inventory. Not long ago, we'd pull this kind of information in overnight batches so that every system had something to work on. That was the age of printed newspapers.
Today, data needs to be there. Instantaneously. Or "as fast as possible". We don't want to transfer huge piles of data once every night; we want the updates to arrive just after the change happens. We want streaming data.
In this talk, we illustrate the path from overnight file exchanges to streaming data using Alpakka, an integration library based on Reactive Streams and Akka.
If you want to extend Apache Spark and think that you will need to maintain a separate code base in your own fork, you’re wrong. You can customize different components of the framework, like file commit protocols or state and checkpoint stores.
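As a rough, hedged illustration of that claim (not taken from the talk itself), such components can often be swapped in through configuration rather than by forking Spark. The com.example classes below are hypothetical placeholders, and the config keys should be verified against your Spark version (the commit protocol key in particular is an internal option):

    import org.apache.spark.sql.SparkSession;

    public class CustomComponentsExample {
      public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
            .appName("custom-components")
            // Plug in a custom state store provider for Structured Streaming state.
            .config("spark.sql.streaming.stateStore.providerClass",
                    "com.example.MyStateStoreProvider")
            // Plug in a custom file commit protocol (internal option; verify per version).
            .config("spark.sql.sources.commitProtocolClass",
                    "com.example.MyCommitProtocol")
            .getOrCreate();

        spark.stop();
      }
    }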
'Scalable Logging and Analytics with LogStash' - Cloud Elements
Rich Viet, Principal Engineer at Cloud Elements, presents 'Scalable Logging and Analytics with LogStash' at the All Things API meetup in Denver, CO.
Learn more about scalable logging and analytics using Logstash. This is an overview of Logstash components, including getting started, indexing, storing, and getting information from logs.
Logstash is a tool for managing events and logs. You can use it to collect logs, parse them, and store them for later use (for example, for searching).
Introduction to Pig & Pig Latin | Big Data Hadoop Spark Tutorial | CloudxLab
Big Data with Hadoop & Spark Training: http://bit.ly/2LF3pBA
This CloudxLab Introduction to Pig & Pig Latin tutorial helps you to understand Pig and Pig Latin in detail. Below are the topics covered in this tutorial:
1) Introduction to Pig
2) Why Do We Need Pig?
3) Pig - Usecases
4) Pig - Philosophy
5) Pig Latin - Data Flow Language
6) Pig - Local and MapReduce Mode
7) Pig Data Types
8) Load, Store, and Dump in Pig
9) Lazy Evaluation in Pig
10) Pig - Relational Operators - FOREACH, GROUP and FILTER
11) Hands-on with Pig - Calculate Average Dividend of NYSE
SQL on Big Data is not "one size fits all". Optiq is a framework that allows you to build a data management system on top of any back-end system, including NoSQL and Hadoop, with rules that optimize query processing for the capabilities of the data source. We show how Optiq is used in the Apache Drill and Cascading Lingual projects, and how we plan to combine Optiq materialized views, Mondrian, and a data grid to create next-generation in-memory analytics.
This presentation was given at the Real-Time Big Data meetup at RichRelevance in San Francisco, 2013-04-09.
A talk given by Julian Hyde at FlinkForward, Berlin, on 2016/09/12.
Streaming is necessary to handle data rates and latency, but SQL is unquestionably the lingua franca of data. Is it possible to combine SQL with streaming, and if so, what does the resulting language look like? Apache Calcite is extending SQL to include streaming, and Apache Flink is using Calcite to support both regular and streaming SQL. In this talk, Julian Hyde describes streaming SQL in detail and shows how you can use streaming SQL in your application. He also describes how Calcite’s planner optimizes queries for throughput and latency.
Enterprise data is moving into Hadoop, but some data has to stay in operational systems. Apache Calcite (the technology behind Hive’s new cost-based optimizer, formerly known as Optiq) is a query-optimization and data federation technology that allows you to combine data in Hadoop with data in NoSQL systems such as MongoDB and Splunk, and access it all via SQL.
Hyde shows how to quickly build a SQL interface to a NoSQL system using Calcite. He shows how to add rules and operators to Calcite to push down processing to the source system, and how to automatically build materialized data sets in memory for blazing-fast interactive analysis.
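For a flavour of what such a SQL interface looks like from the application side, here is a minimal, hedged sketch using Calcite's JDBC driver with a model file that points at an adapter (for example the MongoDB adapter); the model path, schema and table names are placeholders, not taken from the talk:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class CalciteQueryExample {
      public static void main(String[] args) throws Exception {
        // model.json describes the backing system (e.g. a MongoDB database) to Calcite.
        try (Connection connection =
                 DriverManager.getConnection("jdbc:calcite:model=src/main/resources/model.json");
             Statement statement = connection.createStatement();
             ResultSet rs = statement.executeQuery(
                 "SELECT \"state\", COUNT(*) AS \"cnt\" FROM \"zips\" GROUP BY \"state\"")) {
          while (rs.next()) {
            // Plain SQL over a NoSQL store, federated by Calcite.
            System.out.println(rs.getString("state") + ": " + rs.getLong("cnt"));
          }
        }
      }
    }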
Using Apache Spark to Solve Sessionization Problem in Batch and Streaming - Databricks
Analyzing sessions can bring a lot of useful feedback about what works and what does not. But implementing them is not easy because of data issues and operational costs that you will meet sooner or later. In this talk I will present two approaches to computing sessions with Apache Spark and AWS services. The first one uses batch processing and therefore Spark SQL, whereas the second uses streaming and the Structured Streaming module.
During the talk I will cover different problems you may encounter when creating sessions, like late data, incomplete datasets, duplicated data, reprocessing, and fault-tolerance aspects. I will try to solve them and show how Apache Spark features and AWS services (EMR, S3) can help. After the talk you should be aware of the problems you may encounter with session pipelines and understand how to address them with Apache Spark features like watermarks, the state store and checkpoints, and how to integrate your code with a cloud provider.
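The talk presents its own batch and streaming implementations; purely as a hedged illustration of the streaming side, newer Spark releases (3.2+) ship a built-in session window that combines event-time gaps with a watermark. The column names and the 30-minute gap below are assumptions, not from the talk:

    // Fragment; assumes a streaming Dataset<Row> `events` (org.apache.spark.sql.Dataset)
    // with columns userId and eventTime, plus static imports of
    // org.apache.spark.sql.functions.col and functions.session_window (Spark 3.2+).
    Dataset<Row> sessions = events
        .withWatermark("eventTime", "10 minutes")                          // tolerate late data
        .groupBy(col("userId"), session_window(col("eventTime"), "30 minutes"))
        .count();                                                          // events per session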
A presentation about the deployment of an ELK stack at bol.com
At bol.com we use Elasticsearch, Logstash and Kibana in a log-search system that allows our developers and operations people to easily access and search through log events coming from all layers of our infrastructure.
The presentation explains the initial design and its failures, continues with the latest design (mid 2014) and its improvements, and finally gives a set of tips regarding Logstash and Elasticsearch scaling.
These slides were first presented at the Elasticsearch NL meetup on September 22nd 2014 at the Utrecht bol.com HQ.
Xephon K is a time series database using Cassandra as its main backend. We talk about how to model time series data in Cassandra and compare its throughput with InfluxDB and KairosDB.
How to use Parquet as a basis for ETL and analytics - Julien Le Dem
Parquet is a columnar format designed to be extremely efficient and interoperable across the Hadoop ecosystem. Its integration in most of the Hadoop processing frameworks (Impala, Hive, Pig, Cascading, Crunch, Scalding, Spark, …) and serialization models (Thrift, Avro, Protocol Buffers, …) makes it easy to use in existing ETL and processing pipelines, while giving flexibility of choice on the query engine (whether in Java or C++). In this talk, we will describe how one can use Parquet with a wide variety of data analysis tools like Spark, Impala, Pig, Hive, and Cascading to create powerful, efficient data analysis pipelines. Data management is simplified as the format is self-describing and handles schema evolution. Support for nested structures enables more natural modeling of data for Hadoop compared to flat representations that create the need for often costly joins.
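As a small, hedged sketch of that interoperability (not from the talk), writing and reading Parquet from Spark takes a couple of lines, and because the files are self-describing the same data can then be read by Hive, Impala, Pig or Cascading; the paths below are placeholders:

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    public class ParquetRoundTrip {
      public static void main(String[] args) {
        SparkSession spark = SparkSession.builder().appName("parquet-roundtrip").getOrCreate();

        // Read some raw input (hypothetical path) and persist it as Parquet;
        // the schema travels with the data, so no external definition is needed.
        Dataset<Row> orders = spark.read().json("/data/raw/orders.json");
        orders.write().mode("overwrite").parquet("/data/parquet/orders");

        // Any Parquet-aware engine can read the result; here we read it back with Spark.
        Dataset<Row> back = spark.read().parquet("/data/parquet/orders");
        back.printSchema();

        spark.stop();
      }
    }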
InfluxDB IOx Tech Talks: Query Processing in InfluxDB IOx - InfluxData
Query Processing in InfluxDB IOx
InfluxDB IOx Query Processing: In this talk we provide an overview of query execution in IOx, describing how data becomes queryable once it has been ingested, both via SQL and via Flux and InfluxQL (through the storage gRPC APIs).
https://www.reactivesummit.org/2018/schedule/from-overnight-to-always-on
Last year, in Apache Spark 2.0, Databricks introduced Structured Streaming, a new stream processing engine built on Spark SQL, which revolutionized how developers could write stream processing applications. Structured Streaming enables users to express their computations the same way they would express a batch query on static data. Developers can express queries using powerful high-level APIs including DataFrames, Datasets and SQL. Then, the Spark SQL engine converts these batch-like transformations into an incremental execution plan that can process streaming data, while automatically handling late, out-of-order data and ensuring end-to-end exactly-once fault-tolerance guarantees.
Since Spark 2.0, Databricks has been hard at work building first-class integration with Kafka. With this new connectivity, performing complex, low-latency analytics is now as easy as writing a standard SQL query. This functionality, in addition to the existing connectivity of Spark SQL, makes it easy to analyze data using one unified framework. Users can now seamlessly extract insights from data, independent of whether it is coming from messy / unstructured files, a structured / columnar historical data warehouse, or arriving in real-time from Kafka/Kinesis.
In this session, Das will walk through a concrete example where – in less than 10 lines – you read Kafka, parse JSON payload data into separate columns, transform it, enrich it by joining with static data and write it out as a table ready for batch and ad-hoc queries on up-to-the-last-minute data. He’ll use techniques including event-time based aggregations, arbitrary stateful operations, and automatic state management using event-time watermarks.
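As a hedged sketch of what such a pipeline can look like in the Java API (the session uses its own example; the topic, schema, paths and static table below are assumptions):

    import static org.apache.spark.sql.functions.col;
    import static org.apache.spark.sql.functions.from_json;

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;
    import org.apache.spark.sql.streaming.StreamingQuery;
    import org.apache.spark.sql.types.StructType;

    public class KafkaToTable {
      public static void main(String[] args) throws Exception {
        SparkSession spark = SparkSession.builder().appName("kafka-to-table").getOrCreate();

        StructType schema = new StructType()
            .add("deviceId", "string")
            .add("temperature", "double")
            .add("eventTime", "timestamp");

        // Read from Kafka (requires the spark-sql-kafka-0-10 connector on the classpath).
        Dataset<Row> raw = spark.readStream()
            .format("kafka")
            .option("kafka.bootstrap.servers", "localhost:9092")
            .option("subscribe", "events")
            .load();

        // Parse the JSON payload into separate columns.
        Dataset<Row> parsed = raw
            .select(from_json(col("value").cast("string"), schema).alias("event"))
            .select("event.*");

        // Enrich by joining with a static dimension table (hypothetical path).
        Dataset<Row> devices = spark.read().parquet("/data/dim/devices");
        Dataset<Row> enriched = parsed.join(devices, "deviceId");

        // Write out as a continuously updated table of files for batch and ad-hoc queries.
        StreamingQuery query = enriched.writeStream()
            .format("parquet")
            .option("path", "/data/tables/events")
            .option("checkpointLocation", "/data/checkpoints/events")
            .outputMode("append")
            .start();

        query.awaitTermination();
      }
    }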
Making Structured Streaming Ready for Production - Databricks
In mid-2016, we introduced Structured Streaming, a new stream processing engine built on Spark SQL that revolutionized how developers can write stream processing applications without having to reason about streaming. It allows users to express their streaming computations the same way they would express a batch computation on static data. The Spark SQL engine takes care of running it incrementally and continuously, updating the final result as streaming data continues to arrive. It truly unifies batch, streaming and interactive processing in the same Datasets/DataFrames API and the same optimized Spark SQL processing engine.
The initial alpha release of Structured Streaming in Apache Spark 2.0 introduced the basic aggregation APIs and files as streaming source and sink. Since then, we have put in a lot of work to make it ready for production use. In this talk, Tathagata Das will cover in more detail about the major features we have added, the recipes for using them in production, and the exciting new features we have plans for in future releases. Some of these features are as follows:
- Design and use of the Kafka Source
- Support for watermarks and event-time processing
- Support for more operations and output modes
Speaker: Tathagata Das
This talk was originally presented at Spark Summit East 2017.
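As a small, hedged illustration of two of the features listed above (event-time watermarks and output modes), assuming a streaming Dataset<Row> named events with columns eventTime and word (these names are placeholders, not from the talk):

    // Fragment; assumes static imports of org.apache.spark.sql.functions.col and functions.window.
    Dataset<Row> counts = events
        .withWatermark("eventTime", "10 minutes")                 // bound how late data may arrive
        .groupBy(window(col("eventTime"), "5 minutes"), col("word"))
        .count();

    StreamingQuery query = counts.writeStream()
        .outputMode("update")                                     // emit only changed window counts
        .format("console")
        .option("checkpointLocation", "/tmp/checkpoints/word-counts")  // hypothetical path
        .start();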
The slides I prepared for https://www.meetup.com/Paris-Apache-Kafka-Meetup/events/268164461/ about Apache Kafka integration in Apache Spark Structured Streaming.
Deep Dive into Stateful Stream Processing in Structured Streaming with Tathag... - Databricks
Stateful processing is one of the most challenging aspects of distributed, fault-tolerant stream processing. The DataFrame APIs in Structured Streaming make it easy for the developer to express their stateful logic, either implicitly (streaming aggregations) or explicitly (mapGroupsWithState). However, there are a number of moving parts under the hood that make all the magic possible. In this talk, I will dive deep into different stateful operations (streaming aggregations, deduplication and joins) and how they work under the hood in the Structured Streaming engine.
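One of those stateful operations, streaming deduplication, can be sketched in a few lines. As a hedged example, assuming a streaming Dataset<Row> named events with eventId and eventTime columns (placeholders, not from the talk):

    // With the watermark, state for old event IDs is dropped instead of growing forever.
    Dataset<Row> deduped = events
        .withWatermark("eventTime", "10 minutes")
        .dropDuplicates(new String[] {"eventId", "eventTime"});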
Exploring Reactive Integrations With Akka Streams, Alpakka And Apache Kafka - Lightbend
Since its stable release in 2016, Akka Streams has quickly become the de facto standard integration layer between various streaming systems and products. Enterprises like PayPal, Intel, Samsung and Norwegian Cruise Lines see it as a game changer in terms of designing Reactive streaming applications by connecting pipelines of back-pressured asynchronous processing stages.
This comes in part from the Reactive Streams initiative, which has long been led by Lightbend and others and allows multiple streaming libraries to inter-operate with each other in a performant and resilient fashion, providing back-pressure all the way. But perhaps even more so it comes from the various integration drivers that have sprung up in the community and the Akka team, including drivers for Apache Kafka, Apache Cassandra, Streaming HTTP, WebSockets and much more.
In this webinar for JVM Architects, Konrad Malawski explores the what and why of Reactive integrations, with examples featuring technologies like Akka Streams, Apache Kafka, and Alpakka, a new community project for building Streaming connectors that seeks to “back-pressurize” traditional Apache Camel endpoints.
* An overview of Reactive Streams and what it will look like in JDK 9, and the Akka Streams API implementation for Java and Scala.
* Introduction to Alpakka, a modern, Reactive version of Apache Camel, and its growing community of Streams connectors (e.g. Akka Streams Kafka, MQTT, AMQP, Streaming HTTP/TCP/FileIO and more).
* How Akka Streams and Akka HTTP work with WebSockets, HTTP and TCP, with examples in both Java and Scala.
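A minimal, hedged sketch of such a Reactive integration in Java, reading from Kafka with the Akka Streams Kafka (Alpakka Kafka) connector; the broker address, group id and topic are placeholders, and the code follows the Akka 2.5-era javadsl:

    import akka.actor.ActorSystem;
    import akka.kafka.ConsumerSettings;
    import akka.kafka.Subscriptions;
    import akka.kafka.javadsl.Consumer;
    import akka.stream.ActorMaterializer;
    import akka.stream.Materializer;
    import akka.stream.javadsl.Sink;
    import org.apache.kafka.common.serialization.StringDeserializer;

    public class ReactiveKafkaConsumer {
      public static void main(String[] args) {
        ActorSystem system = ActorSystem.create("reactive-kafka");
        Materializer materializer = ActorMaterializer.create(system);

        ConsumerSettings<String, String> settings =
            ConsumerSettings.create(system, new StringDeserializer(), new StringDeserializer())
                .withBootstrapServers("localhost:9092")
                .withGroupId("example-group");

        // Back-pressure flows from the printing sink all the way back to the Kafka consumer.
        Consumer.plainSource(settings, Subscriptions.topics("events"))
            .map(record -> record.value())
            .runWith(Sink.foreach(System.out::println), materializer);
      }
    }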
Writing Continuous Applications with Structured Streaming Python APIs in Apac... - Databricks
Description:
We are amidst the Big Data Zeitgeist era in which data comes at us fast, in myriad forms and formats, at intermittent intervals or in a continuous stream, and we need to respond to streaming data immediately. This need has created the notion of writing a streaming application that is continuous and reacts and interacts with data in real-time. We call this a continuous application, which we will discuss.
Abstract:
We are amidst the Big Data Zeitgeist era in which data comes at us fast, in myriad forms and formats, at intermittent intervals or in a continuous stream, and we need to respond to streaming data immediately. This need has created the notion of writing a streaming application that is continuous and reacts and interacts with data in real-time. We call this a continuous application.
In this talk we will explore the concepts and motivations behind the continuous application, how Structured Streaming Python APIs in Apache Spark 2.x enable writing continuous applications, examine the programming model behind Structured Streaming, and look at the APIs that support them.
Through a short demo and code examples, I will demonstrate how to write an end-to-end Structured Streaming application that reacts and interacts with both real-time and historical data to perform advanced analytics using Spark SQL, DataFrames and Datasets APIs.
You’ll walk away with an understanding of what a continuous application is, an appreciation of the easy-to-use Structured Streaming APIs, and why Structured Streaming in Apache Spark 2.x is a step forward in developing new kinds of streaming applications.
Streaming Microservices With Akka Streams And Kafka Streams - Lightbend
One of the most frequent questions that we get asked at Lightbend is “what’s the difference between Akka Streams and Kafka Streams?” After all, there is only a 1 letter difference between these two technologies, so how different could they be?
Well, as we see in this presentation, they are actually quite different. Both tools are part of the streaming Fast Data stack, but were created with entirely different technological approaches in mind. For example, while Akka Streams emerged as a dataflow-centric abstraction for the Akka Actor model, designed for general-purpose microservices and very low-latency event processing, and supporting a wider class of application problems and third-party integrations via Alpakka, Kafka Streams is purpose-built for reading data from Kafka topics, processing it, and writing the results to new topics in a Kafka-centric way.
In this webinar by Dr. Dean Wampler, VP of Fast Data Engineering at Lightbend, we will:
* Discuss the strengths and weaknesses of Kafka Streams and Akka Streams for particular design needs in data-centric microservices
* Contrast them with Spark Streaming and Flink, which provide richer analytics over potentially huge data sets
* Help you map these streaming engines to your specific use cases, so you confidently pick the right ones for your jobs
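For contrast with the Akka Streams examples elsewhere on this page, a minimal Kafka Streams topology looks roughly like this (a hedged sketch; the topic names and broker address are placeholders):

    import java.util.Properties;
    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.KafkaStreams;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.StreamsConfig;
    import org.apache.kafka.streams.kstream.KStream;

    public class UppercaseTopology {
      public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "uppercase-example");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        // Read from one Kafka topic, transform each value, write the result to another topic.
        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> input = builder.stream("input-topic");
        input.mapValues(value -> value.toUpperCase()).to("output-topic");

        new KafkaStreams(builder.build(), props).start();
      }
    }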
Streams are often underestimated and skipped as possible solutions. In many cases, we have built solutions that are much more complex than their streaming counterparts. Why?
It's hard to answer, but in this presentation, I would like to tell you a story about how we started to use FS2, without sacrificing purity and code readability. (https://github.com/functional-streams-for-scala/fs2)
Big Data LDN 2018: STREAMING DATA MICROSERVICES WITH AKKA STREAMS, KAFKA STRE... - Matt Stubbs
Date: 13th November 2018
Location: Fast Data Theatre
Time: 15:50 - 16:20
Speaker: Dean Wampler
Organisation: Lightbend
About: What if you used microservices for streaming data processing, rather than systems like Spark? I'll examine Kafka-based, microservice applications that use Akka Streams and Kafka Streams libraries for stream processing. I'll discuss the strengths and weaknesses of each tool for particular design needs, with lessons that are applicable to other library choices, too. I'll also contrast them with Spark Streaming and Flink; when should you choose them instead?
Writing Continuous Applications with Structured Streaming in PySpark - Databricks
We are in the midst of a Big Data Zeitgeist in which data comes at us fast, in myriad forms and formats at intermittent intervals or in a continuous stream, and we need to respond to streaming data immediately. This need has created a notion of writing a streaming application that reacts and interacts with data in real-time. We call this a continuous application. In this talk we will explore the concepts and motivations behind continuous applications and how Structured Streaming Python APIs in Apache Spark 2.x enables writing them. We also will examine the programming model behind Structured Streaming and the APIs that support them. Through a short demo and code examples, Jules will demonstrate how to write an end-to-end Structured Streaming application that reacts and interacts with both real-time and historical data to perform advanced analytics using Spark SQL, DataFrames, and Datasets APIs.
Writing Continuous Applications with Structured Streaming PySpark API - Databricks
We're amidst the Big Data Zeitgeist era in which data comes at us fast, in myriad forms and formats, at intermittent intervals or in a continuous stream, and we need to respond to streaming data immediately. This need has created the notion of writing a streaming application that is continuous and reacts and interacts with data in real-time. We call this a continuous application.
In this tutorial we'll explore the concepts and motivations behind the continuous application, how Structured Streaming Python APIs in Apache Spark™ enable writing continuous applications, examine the programming model behind Structured Streaming, and look at the APIs that support them.
Through presentation, code examples, and notebooks, I will demonstrate how to write an end-to-end Structured Streaming application that reacts and interacts with both real-time and historical data to perform advanced analytics using Spark SQL, DataFrames and Datasets APIs.
You’ll walk away with an understanding of what a continuous application is, an appreciation of the easy-to-use Structured Streaming APIs, and why Structured Streaming in Apache Spark is a step forward in developing new kinds of streaming applications.
This tutorial will be both an instructor-led and a hands-on interactive session. Instructions on how to get the tutorial materials will be covered in class.
WHAT YOU’LL LEARN:
– Understand the concepts and motivations behind Structured Streaming
– How to use DataFrame APIs
– How to use Spark SQL and create tables on streaming data
– How to write a simple end-to-end continuous application
PREREQUISITES
– A fully-charged laptop (8-16GB memory) with Chrome or Firefox
– Pre-register for Databricks Community Edition
Speaker: Jules Damji
A Deep Dive into Stateful Stream Processing in Structured Streaming with Tath... - Databricks
Stateful processing is one of the most challenging aspects of distributed, fault-tolerant stream processing. The DataFrame APIs in Structured Streaming make it very easy for the developer to express their stateful logic, either implicitly (streaming aggregations) or explicitly (mapGroupsWithState). However, there are a number of moving parts under the hood that make all the magic possible. In this talk, I am going to dive deeper into how stateful processing works in Structured Streaming.
In particular, I’m going to discuss the following.
• Different stateful operations in Structured Streaming
• How state data is stored in a distributed, fault-tolerant manner using State Stores
• How you can write custom State Stores for saving state to external storage systems.
2. Akka Streams
Akka Streams is a library to model and run high-performance, non-blocking data flows supporting back-pressure, with concise APIs for Java and Scala.
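A minimal sketch of what such a flow looks like in the Java DSL (not from these slides), assuming akka.stream.javadsl.Source and Sink are imported and a Materializer named materializer is in scope, as in the code further down:

    // Source -> transformation -> Sink; demand is signalled upstream, so a slow
    // consumer automatically back-pressures the producer.
    Source.range(1, 100)
        .map(i -> i * 2)
        .runWith(Sink.foreach(System.out::println), materializer);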
3. Alpakka
Alpakka is a Reactive Enterprise Integration library for Java and Scala, based on Reactive Streams and Akka.
The short version: “Endpoints for Akka Streams”
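As one hedged example of such an endpoint, the Alpakka file connector can watch a directory for new files; something along these lines could serve as the newFileDetector source used on slide 14. The directory, poll interval and buffer size are assumptions, and the exact factory signature may differ between Alpakka versions:

    // Needs akka.stream.alpakka.file.javadsl.DirectoryChangesSource,
    // akka.stream.alpakka.file.DirectoryChange, java.nio.file.* and java.time.Duration.
    Source<Path, NotUsed> newFileDetector =
        DirectoryChangesSource.create(Paths.get("/data/incoming"), Duration.ofSeconds(1), 128)
            .filter(pair -> pair.second() == DirectoryChange.Creation)  // only newly created files
            .map(pair -> pair.first());                                  // keep just the Path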
14. Combine the source with a stream with a sink

newFileDetector
    .mapAsync(8, p -> {
        Path targetFile = targetDir.resolve(p.getFileName());
        return createFileToFile(p, targetFile);
    })
    .runWith(Sink.ignore(), materializer);

Nesting a stream execution within a stream
15. Outer and inner flows
(Diagram: detect new file -> read from file on disk -> write to file on disk)
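The createFileToFile helper itself is not included in these fragments; a plausible sketch of such an inner stream, using Akka Streams file IO (an assumption, not taken from the deck):

    // Copies one file by streaming its bytes; the materialized CompletionStage<IOResult>
    // completes when the inner stream has finished writing.
    // Needs akka.stream.IOResult, akka.stream.javadsl.FileIO and java.util.concurrent.CompletionStage.
    private CompletionStage<IOResult> createFileToFile(Path sourceFile, Path targetFile) {
        return FileIO.fromPath(sourceFile)
            .runWith(FileIO.toPath(targetFile), materializer);
    }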
18. Parse as CSV with Alpakka

byteStringSource                                            // ByteString
    .via(CsvParsing.lineScanner())                          // parse CSV lines -> Collection<ByteString>
    .via(CsvToMap.toMapAsStrings(StandardCharsets.UTF_8));  // convert CSV lines to maps -> Map<String, String>
19. Build your own flow

Flow<ByteString, Map<String, String>, NotUsed> csvBytesToMap =
    Flow.of(ByteString.class)
        .via(CsvParsing.lineScanner())
        .via(CsvToMap.toMapAsStrings(StandardCharsets.UTF_8));
20. Use your own flow and apply data mapping

Flow<ByteString, Map<String, String>, NotUsed> csvBytesToMap =
    Flow.of(ByteString.class)
        .via(CsvParsing.lineScanner())
        .via(CsvToMap.toMapAsStrings(StandardCharsets.UTF_8));

JsonNodeFactory jsonNodeFactory = JsonNodeFactory.instance;
FileIO.fromPath(p)
    .via(csvBytesToMap)
    .map(data -> { // Using raw Jackson to create JSON objects
        ObjectNode objectNode = jsonNodeFactory.objectNode();
        data.forEach(objectNode::put);
        return objectNode;
    })
26. Alpakka connectors for messaging

MQTT (Eclipse Paho and Akka native)
AMQP (RabbitMQ)
IronMQ
JMS (Java Message Service)
Apache Kafka

… not as colourful logos, but very well suited for the streaming approach.