- The document discusses ING's strategy to shift from batch processing to real-time streaming analytics using Apache Flink.
- It proposes building a streaming analytics platform to enable use cases such as fraud detection and actionable customer insights.
- The platform would process data streams with low latency and scale to handle large volumes, using technologies such as Kafka, PMML models, and Flink for fault tolerance and exactly-once processing.
Like many other industries, banking is undergoing a fundamental change because of the software revolution. Banks no longer compete only on interest rates and having the best traders; these days, customer experience and having the best engineers are the focus. In this changing world, banks compete with new start-ups, the so-called fintechs, and with large platform organisations such as Google, Facebook and Apple. At ING, we believe that staying ahead of the game means changing how we interact with our customers: no longer the traditional model of waiting for customers to come to the bank through our website or apps, but actively reaching out to the customer with information that is relevant to him or her, in order to make their financial life frictionless. Many of these changes are driven by reacting to every event that is relevant to the customer, and by using streaming analytics to reach out to the customer within milliseconds of the event occurring. Apache Flink is key for ING to achieve this. This presentation addresses how ING approaches the challenge, the role that Apache Flink plays, and the consequences regulations have for how we work with open source in general, and with Apache Flink (and data Artisans) in particular. This keynote takes place at Kino 3.
A great many IT experts know of Big Data, or at least have an idea of what it is. In practice, however, only a few in Germany currently work with it. Yet Big Data brings a whole new momentum to modern software solutions and is indispensable in the context of the mobile, cloud and social changes. Big Data makes software intelligent and thus lets users experience it in an entirely new way. With Big Data, new software architectures emerge, because information is processed in a completely different way: faster, in a more differentiated manner, and often with the goal of drawing conclusions and making predictions.
This talk explains how modern software architectures are designed so that you can successfully implement Big Data paradigms, and which advantages arise for increasingly mobile software solutions. We also take a look at the potential and options in industries such as banking, insurance and retail.
IBM i & digital transformation - Presentation & basic demo
IBM Watson Studio, IBM DSX Local w/ Open Source (Spark) & IBM Technology (OpenPower, CAPI, NVLINK)
Fast Data for Competitive Advantage: 4 Steps to Expand your Window of Opportu...VoltDB
In this webinar, you’ll hear VoltDB CEO, Bruce Reading, outline 4 steps to expand your window of opportunity while avoiding risky options that can destroy your business. Technology innovator and CEO of Emagine International, David Peters, will also share key lessons learned from establishing his company’s ability to use fast data to crush competitors and execute on Emagine’s ability to open new markets. To view the webinar in its entirety, click here: http://learn.voltdb.com/WRExecSeries1.html
Is Your Data Paying You Dividends? Data innovation is a means to an end: data as an asset can be managed, developed, monetized, and eventually expected to pay dividends to the business. While 70% of CEOs surveyed expect investments in data, analytics, ML and AI initiatives to improve their bottom line, 56% stated concerns over the integrity of their data. Data science teams are now tasked to deliver true business value, but fundamental issues remain in data preparation and data cleansing, which impedes speed to market. Join Karan Sachdeva as he demonstrates the capabilities of the all-new IBM Cloud Private for Data, a single containerized platform that bridges the gap between data consumability, governance, integration, and visualization, accelerating speed to market and dividends to your business. By Karan Sachdeva, Sales Leader Big Data Analytics, IBM Asia Pacific
APIdays Open Banking & Fintech: Workshop - Financial Services Use Cases for APIsJeremy Brown
This was a fairly high-level presentation I gave at the APIdays Open Banking and Fintech event in London, where I started to explore the main drivers for APIs, the megatrends in consumer banking, and the various use cases for APIs within financial services.
Watch here: https://bit.ly/2D1fqB6
Today’s evolving data landscape has spawned new business challenges that require innovative solutions. These challenges include:
- Strategic decision-making, which relies on multiple perspectives such as social and economic factors that require combining internal and external data.
- Accounting for the increased volume and structural complexity of today’s data, and increased frequency required in delivering data assets.
- Coping with data silos that house data that must be combined and provisioned to support decision-making.
- Exposing purpose-built analytics, such as supply chain, for consumption in order to expedite decision-making.
Attend this session to learn how Data as a Service, fueled by data virtualization, overcomes these common challenges from the three dimensions of:
- Provisioning information-rich external data assets,
- Connecting data silos, and
- Enabling pre-built and packaged analytics.
Value Innovation Labs is a technology and consulting company focused on developing products and services for the fintech space, specifically digital banking transformation and InsurTech. We have a proven track record in developing services based on blockchain, AI, Big Data and IoT.
For more information, you can visit our website: valueinnovationlabs.com
Gartner CIO & IT Executive Summit -- Event Mesh: The Architecture Layer That ...Solace
Crispin Clarke, SVP Europe at Solace and Martin Bachmann, Head of Product Management, Connected Cores SAP SE, presented at Gartner CIO & IT Executive Summit 2019 in Munich.
Building a fully managed stream processing platform on Flink at scale for Lin...Flink Forward
Apache Flink is a distributed stream processing framework that allows users to process and analyze data in real time. At LinkedIn, we developed a fully managed stream processing platform on Flink running on K8s to power hundreds of stream processing pipelines in production. This platform is the backbone for other infra systems such as Search, Espresso (our internal document store), and feature management. We provide a rich authoring and testing environment which allows users to create, test, and deploy their streaming jobs in a self-serve fashion within minutes. Users can focus on their business logic, leaving the Flink platform to take care of management aspects such as split deployment, resource provisioning, auto-scaling, job monitoring, alerting, failure recovery and much more. In this talk, we will introduce the overall platform architecture, highlight the unique value propositions that it brings to stream processing at LinkedIn and share the experiences and lessons we have learned.
Evening out the uneven: dealing with skew in FlinkFlink Forward
Flink Forward San Francisco 2022.
When running Flink jobs, skew is a common problem that results in wasted resources and limited scalability. In the past years, we have helped our customers and users solve various skew-related issues in their Flink jobs or clusters. In this talk, we will present the different types of skew that users often run into: data skew, key skew, event time skew, state skew, and scheduling skew, and discuss solutions for each of them. We hope this will serve as a guideline to help you reduce skew in your Flink environment.
by
Jun Qin & Karl Friedrich
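The key-skew case above can be made concrete with a small sketch. The following Python snippet is an illustration only (not Flink code; the function names are invented): it simulates how keys hash onto parallel subtasks and measures how unevenly the load lands.

```python
from collections import Counter

def partition_loads(keys, parallelism):
    """Simulate Flink-style key partitioning: each key is hashed to one
    of `parallelism` subtasks; count the records each subtask receives."""
    loads = Counter()
    for key in keys:
        loads[hash(key) % parallelism] += 1
    return loads

def skew_factor(loads, parallelism):
    """Ratio of the busiest subtask's load to the average load.
    1.0 means perfectly even; large values indicate key skew."""
    avg = sum(loads.values()) / parallelism
    return max(loads.values()) / avg

# A hot key ("user-0") dominates the stream, so whichever subtask owns it
# is overloaded while the others sit mostly idle.
keys = ["user-0"] * 900 + [f"user-{i}" for i in range(1, 101)]
loads = partition_loads(keys, parallelism=4)
print(skew_factor(loads, 4))
```

With 900 of 1000 records carrying one key, the busiest of four subtasks holds at least 90% of the data, so the skew factor is at least 3.6 regardless of how the remaining keys hash.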
Similar to Flink Forward Berlin 2017: Bas Geerdink, Martijn Visser - Fast Data at ING - building a streaming data platform with Flink and Kafka
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...Flink Forward
Flink Forward San Francisco 2022.
To improve Amazon Alexa experiences and support machine learning inference at scale, we built an automated end-to-end solution for incremental model building or fine-tuning machine learning models through continuous learning, continual learning, and/or semi-supervised active learning. Customer privacy is our top concern at Alexa, and as we build solutions, we face unique challenges when operating at scale such as supporting multiple applications with tens of thousands of transactions per second with several dependencies including near-real time inference endpoints at low latencies. Apache Flink helps us transform and discover metrics in near-real time in our solution. In this talk, we will cover the challenges that we faced, how we scale the infrastructure to meet the needs of ML teams across Alexa, and go into how we enable specific use cases that use Apache Flink on Amazon Kinesis Data Analytics to improve Alexa experiences to delight our customers while preserving their privacy.
by
Aansh Shah
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...Flink Forward
Flink Forward San Francisco 2022.
Probably everyone who has written stateful Apache Flink applications has used one of the fault-tolerant keyed state primitives ValueState, ListState, and MapState. With RocksDB, however, retrieving and updating items comes at an increased cost that you should be aware of. Sometimes, these costs are not avoidable with the current API, e.g., for efficient event-time stream sorting or streaming joins where you need to iterate one or two buffered streams in the right order. With FLIP-220, we are introducing a new state primitive: BinarySortedMultiMapState. This new form of state lets you (a) efficiently store lists of values for a user-provided key, and (b) iterate keyed state in a well-defined sort order. Both features can be backed efficiently by RocksDB with a 2x performance improvement over the current workarounds. This talk will go into the details of the new API and its implementation, present how to use it in your application, and talk about the process of getting it into Flink.
by
Nico Kruber
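As a rough illustration of the access pattern the new primitive enables, here is a toy Python stand-in (not the actual Flink API; the class and method names are invented) that buffers multiple values per key and releases them in key order, the shape needed for event-time stream sorting:

```python
import bisect

class SortedMultiMapState:
    """Toy stand-in for a sorted multimap state: lists of values per
    user key, iterable in sorted key order."""
    def __init__(self):
        self._keys = []      # sorted user keys (e.g. timestamps)
        self._values = {}    # key -> list of buffered values

    def add(self, key, value):
        if key not in self._values:
            bisect.insort(self._keys, key)   # keep keys sorted on insert
            self._values[key] = []
        self._values[key].append(value)

    def pop_up_to(self, watermark):
        """Emit all buffered values with key <= watermark, in key order,
        like releasing sorted events once the watermark passes them."""
        out = []
        while self._keys and self._keys[0] <= watermark:
            k = self._keys.pop(0)
            out.extend(self._values.pop(k))
        return out

state = SortedMultiMapState()
for ts, ev in [(5, "e5"), (1, "e1"), (3, "e3a"), (3, "e3b")]:
    state.add(ts, ev)
print(state.pop_up_to(3))   # → ['e1', 'e3a', 'e3b']
```

The real primitive gets this iteration order for free from RocksDB's binary-sorted layout instead of maintaining a separate sorted key list.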
Introducing the Apache Flink Kubernetes OperatorFlink Forward
Flink Forward San Francisco 2022.
The Apache Flink Kubernetes Operator provides a consistent approach to manage Flink applications automatically, without any human interaction, by extending the Kubernetes API. Given the increasing adoption of Kubernetes-based Flink deployments, the community has been working on a Kubernetes-native solution as part of Flink that can benefit from the rich experience of community members and ultimately make Flink easier to adopt. In this talk we give a technical introduction to the Flink Kubernetes Operator and demonstrate the core features and use cases through in-depth examples.
by
Thomas Weise
Flink Forward San Francisco 2022.
Resource Elasticity is a frequently requested feature in Apache Flink: Users want to be able to easily adjust their clusters to changing workloads for resource efficiency and cost saving reasons. In Flink 1.13, the initial implementation of Reactive Mode was introduced, later releases added more improvements to make the feature production ready. In this talk, we’ll explain scenarios to deploy Reactive Mode to various environments to achieve autoscaling and resource elasticity. We’ll discuss the constraints to consider when planning to use this feature, and also potential improvements from the Flink roadmap. For those interested in the internals of Flink, we’ll also briefly explain how the feature is implemented, and if time permits, conclude with a short demo.
by
Robert Metzger
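The core idea of Reactive Mode can be sketched in a few lines: the job never requests resources; it adapts its parallelism to whatever the cluster currently offers. The hypothetical Python model below (not Flink's implementation) shows a scheduler that targets the current slot count and rescales only when it changes:

```python
def plan_rescale(current_parallelism, available_slots):
    """Reactive Mode sketch: always run the job at the cluster's current
    capacity, restarting from a checkpoint whenever the number of
    available task slots changes."""
    target = max(1, available_slots)
    if target != current_parallelism:
        return ("rescale", target)
    return ("keep", current_parallelism)

# Slot counts reported over time: an autoscaler adds TaskManagers, then
# the cluster shrinks again; the job follows capacity in both directions.
p = 4
actions = []
for slots in [4, 4, 8, 8, 2]:
    action, p = plan_rescale(p, slots)
    actions.append((action, p))
print(actions)
# → [('keep', 4), ('keep', 4), ('rescale', 8), ('keep', 8), ('rescale', 2)]
```

In the real feature, the "rescale" action means stopping the job and restoring it from the latest checkpoint at the new parallelism, which is why checkpoint speed matters for elasticity.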
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...Flink Forward
Flink Forward San Francisco 2022.
Flink consumers read from Kafka as a scalable, high throughput, and low latency data source. However, there are challenges in scaling out data streams where migration and multiple Kafka clusters are required. Thus, we introduced a new Kafka source to read sharded data across multiple Kafka clusters in a way that conforms well with elastic, dynamic, and reliable infrastructure. In this presentation, we will present the source design and how the solution increases application availability while reducing maintenance toil. Furthermore, we will describe how we extended the existing KafkaSource to provide mechanisms to read logical streams located on multiple clusters, to dynamically adapt to infrastructure changes, and to perform transparent cluster migrations and failover.
by
Mason Chen
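A minimal sketch of the merging half of such a source, assuming each cluster exposes an already offset-ordered shard of the same logical stream (Python for illustration only; the real connector deals with split assignment, dynamic cluster discovery, migration, and failover):

```python
import heapq

def merged_stream(cluster_streams):
    """Merge per-cluster shards of one logical stream. Each shard is a
    list of (sequence, record) pairs, already sorted by sequence; the
    heap-based merge interleaves them into a single ordered stream."""
    return [rec for _, rec in heapq.merge(*cluster_streams)]

# During a migration, old and new clusters each hold part of the stream;
# the source reads both and presents one logical stream downstream.
old_cluster = [(1, "a"), (3, "c")]
new_cluster = [(2, "b"), (4, "d")]
print(merged_stream([old_cluster, new_cluster]))   # → ['a', 'b', 'c', 'd']
```

The point of the design is that downstream operators never notice which physical cluster a record came from, so clusters can be added, drained, or replaced without a Flink job restart.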
One sink to rule them all: Introducing the new Async SinkFlink Forward
Flink Forward San Francisco 2022.
Next time you want to integrate with a new destination for a demo, concept or production application, the Async Sink framework will bootstrap development, allowing you to move quickly without compromise. In Flink 1.15 we introduced the Async Sink base (FLIP-171), with the goal to encapsulate common logic and allow developers to focus on the key integration code. The new framework handles things like request batching, buffering records, applying backpressure, retry strategies, and at least once semantics. It allows you to focus on your business logic, rather than spending time integrating with your downstream consumers. During the session we will dive deep into the internals to uncover how it works, why it was designed this way, and how to use it. We will code up a new sink from scratch and demonstrate how to quickly push data to a destination. At the end of this talk you will be ready to start implementing your own Flink sink using the new Async Sink framework.
by
Steffen Hausmann & Danny Cranmer
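The batching, buffering, and retry behaviour described above can be sketched generically. The Python below illustrates the pattern, not the Flink Async Sink API itself; `submit` is a hypothetical stand-in for the destination client, which is the only part a sink author would write:

```python
def async_sink(records, submit, batch_size=3, max_retries=3):
    """Sketch of the Async Sink pattern: buffer records, flush in
    batches via the user-provided `submit(batch)`, and re-queue records
    the destination rejects, giving at-least-once delivery.
    `submit` must return the list of failed records."""
    buffer = []
    attempts = {}
    for rec in records:
        buffer.append(rec)
        while len(buffer) >= batch_size:
            batch, buffer = buffer[:batch_size], buffer[batch_size:]
            for failed in submit(batch):
                attempts[failed] = attempts.get(failed, 0) + 1
                if attempts[failed] > max_retries:
                    raise RuntimeError(f"gave up on {failed!r}")
                buffer.append(failed)      # retry on a later flush
    if buffer:
        submit(buffer)   # final flush (failures ignored in this sketch)

# A flaky destination that rejects record "r2" exactly once.
delivered = []
fail_once = {"r2"}

def flaky_submit(batch):
    failed = []
    for r in batch:
        if r in fail_once:
            fail_once.discard(r)           # fail only the first attempt
            failed.append(r)
        else:
            delivered.append(r)
    return failed

async_sink(["r1", "r2", "r3", "r4", "r5"], flaky_submit, batch_size=2)
print(delivered)   # → ['r1', 'r2', 'r3', 'r4', 'r5']
```

In the real framework, checkpointing the unflushed buffer is what upgrades this retry loop into at-least-once semantics across failures.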
Tuning Apache Kafka Connectors for Flink.pptxFlink Forward
Flink Forward San Francisco 2022.
In normal situations, the default Kafka consumer and producer configuration options work well. But we all know life is not all roses and rainbows, and in this session we'll explore a few knobs that can save the day in atypical scenarios. First, we'll take a detailed look at the parameters available when reading from Kafka. We'll inspect the params that help us quickly spot an application lock or crash, the ones that can significantly improve performance, and the ones to touch with gloves, since they could cause more harm than benefit. Moreover, we'll explore the partitioning options and discuss when diverging from the default strategy is needed. Next, we'll discuss the Kafka sink. After browsing the available options, we'll dive deep into understanding how to approach use cases like sinking enormous records, managing spikes, and handling small but frequent updates. If you want to understand how to make your application survive when the sky is dark, this session is for you!
by
Olena Babenko
Flink powered stream processing platform at PinterestFlink Forward
Flink Forward San Francisco 2022.
Pinterest is a visual discovery engine that serves over 433MM users. Stream processing allows us to unlock value from realtime data for Pinners. At Pinterest, we adopted Flink as the unified stream processing engine. In this talk, we will share our journey in building a stream processing platform with Flink and how we onboard critical use cases to the platform. Pinterest has supported 90+ near-realtime streaming applications. We will cover the problem statement, how we evaluate potential solutions, and our decision to build the framework.
by
Rainie Li & Kanchi Masalia
Flink Forward San Francisco 2022.
This talk will take you on the long journey of Apache Flink into the cloud-native era. It started all the way from where Hadoop and YARN were the standard way of deploying and operating data applications.
We're going to deep dive into the cloud-native set of principles and how they map to the Apache Flink internals and recent improvements. We'll cover fast checkpointing, fault tolerance, resource elasticity, minimal infrastructure dependencies, industry-standard tooling, ease of deployment and declarative APIs.
After this talk you'll get a broader understanding of the operational requirements for a modern streaming application and where the current limits are.
by
David Moravek
Where is my bottleneck? Performance troubleshooting in FlinkFlink Forward
Flink Forward San Francisco 2022.
In this talk, we will cover various topics around performance issues that can arise when running a Flink job and how to troubleshoot them. We'll start with the basics, like understanding what the job is doing and what backpressure is. Next, we will see how to identify bottlenecks and which tools or metrics can be helpful in the process. Finally, we will also discuss potential performance issues during the checkpointing or recovery process, as well as some tips and Flink features that can speed up checkpointing and recovery times.
by
Piotr Nowojski
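The usual backpressure heuristic, that the bottleneck is the first operator which is itself not backpressured while its upstream neighbours are, can be sketched like this (illustrative Python; the metric name and 10% threshold are assumptions, not Flink defaults):

```python
def find_bottleneck(pipeline):
    """Walk the pipeline downstream and return the first operator that
    is NOT backpressured while its upstream neighbour is: upstream
    operators are backpressured *by* it, so it is the bottleneck.
    `pipeline` is a list of (operator_name, backpressured_ratio)."""
    for i, (name, bp) in enumerate(pipeline):
        if bp < 0.1:                         # not significantly backpressured
            if i > 0 and pipeline[i - 1][1] >= 0.1:
                return name
    return None                              # no backpressure anywhere

# Source and parse are backpressured; the join absorbs everything and
# is busy rather than blocked, so it is the operator to optimize.
pipeline = [("source", 0.9), ("parse", 0.8), ("heavy-join", 0.0), ("sink", 0.0)]
print(find_bottleneck(pipeline))   # → heavy-join
```

Flink's web UI exposes per-task busy/backpressured ratios that support exactly this kind of reasoning by eye.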
Using the New Apache Flink Kubernetes Operator in a Production DeploymentFlink Forward
Flink Forward San Francisco 2022.
Running natively on Kubernetes, using the new Apache Flink Kubernetes Operator is a great way to deploy and manage Flink application and session deployments. In this presentation, we provide:
- A brief overview of Kubernetes operators and their benefits
- An introduction to the five levels of the operator maturity model
- An introduction to the newly released Apache Flink Kubernetes Operator and FlinkDeployment CRs
- Dockerfile modifications you can make to swap out UBI images and Java of the underlying Flink Operator container
- Enhancements we're making in versioning/upgradeability/stability and security
- A demo of the Apache Flink Operator in action, with a technical preview of an upcoming product using the Flink Kubernetes Operator
- Lessons learned
- Q&A
by
James Busche & Ted Chang
Flink Forward San Francisco 2022.
The Table API is one of the most actively developed components of Flink in recent time. Inspired by databases and SQL, it encapsulates concepts many developers are familiar with. It can be used with both bounded and unbounded streams in a unified way. But from afar it can be difficult to keep track of what this API is capable of and how it relates to Flink's other APIs. In this talk, we will explore the current state of Table API. We will show how it can be used as a batch processor, a changelog processor, or a streaming ETL tool with many built-in functions and operators for deduplicating, joining, and aggregating data. By comparing it to the DataStream API we will highlight differences and elaborate on when to use which API. We will demonstrate hybrid pipelines in which both APIs interact with one another and contribute their unique strengths. Finally, we will take a look at some of the most recent additions as a first step to stateful upgrades.
by
David Anderson
Flink Forward San Francisco 2022.
Based on the new Flink-Pulsar connector, we implemented Flink's TableAPI and Catalog to help users to interact with the Pulsar cluster via Flink SQL easily. We would like to go through the design and implementation of the SQL connector in the following aspects:
1. Two different modes of use Pulsar as a metadata store
2. Data format transformation and management
3. SQL semantics support within Pulsar context
by
Sijie Guo & Neng Lu
Dynamic Rule-based Real-time Market Data AlertsFlink Forward
Flink Forward San Francisco 2022.
At Bloomberg, we deal with high volumes of real-time market data. Our clients expect to be notified of any anomalies in this market data, which may indicate volatile movements in the markets, notable trades, forthcoming events, or system failures. The parameters for these alerts are always evolving and our clients can update them dynamically. In this talk, we'll cover how we utilized the open source Apache Flink and Siddhi SQL projects to build a distributed, scalable, low-latency and dynamic rule-based, real-time alerting system to solve our clients' needs. We'll also cover the lessons we learned along our journey.
by
Ajay Vyasapeetam & Madhuri Jain
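A stripped-down model of dynamic rule evaluation looks like this (plain Python, not Flink or Siddhi SQL; the rule shape and names are invented for illustration). In a Flink job, the rule updates would typically arrive on a control stream and be broadcast to all subtasks; here a shared dict stands in for that broadcast state:

```python
rules = {}   # rule_id -> (field, threshold); clients update these at runtime

def update_rule(rule_id, field, threshold):
    """A rule update arriving on the control stream."""
    rules[rule_id] = (field, threshold)

def evaluate(event):
    """Fire an alert for every rule the market-data event violates."""
    return [rule_id for rule_id, (field, th) in rules.items()
            if event.get(field, 0) > th]

update_rule("big-trade", "volume", 1_000_000)
update_rule("price-spike", "pct_move", 5.0)

tick = {"symbol": "XYZ", "volume": 2_500_000, "pct_move": 1.2}
print(evaluate(tick))             # → ['big-trade']

update_rule("big-trade", "volume", 5_000_000)   # client tightens the rule
print(evaluate(tick))             # → []
```

The crucial property is that the second `update_rule` call changes alerting behaviour immediately, with no job restart, which is what "dynamic" means in the talk title.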
Exactly-Once Financial Data Processing at Scale with Flink and PinotFlink Forward
Flink Forward San Francisco 2022.
At Stripe, we have created a complete end-to-end exactly-once processing pipeline to process financial data at scale, by combining the exactly-once guarantees of Flink, Kafka, and Pinot. The pipeline provides an exactly-once guarantee, end-to-end latency within a minute, deduplication against hundreds of billions of keys, and sub-second query latency against the whole dataset with trillions of rows. In this session we will discuss the technical challenges of designing, optimizing, and operating the whole pipeline, including Flink, Kafka, and Pinot. We will also share our lessons learned and the benefits gained from exactly-once processing.
by
Xiang Zhang & Pratyush Sharma & Xiaoman Dong
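The deduplication step at the heart of such a pipeline can be sketched as follows (illustrative Python; in the real system the `seen` set would be Flink keyed state holding hundreds of billions of keys, not an in-memory set):

```python
def dedupe(events):
    """Drop any event whose unique key was already processed, so that
    replays after a failure cannot double-count a financial record.
    Each event is a (unique_key, payload) pair."""
    seen = set()
    out = []
    for key, payload in events:
        if key not in seen:
            seen.add(key)
            out.append(payload)
    return out

# A replay after failure re-delivers charge "c2"; it must count only once.
events = [("c1", 10), ("c2", 25), ("c2", 25), ("c3", 5)]
print(dedupe(events), "total:", sum(dedupe(events)))
# → [10, 25, 5] total: 40
```

Combined with transactional writes to Kafka and Pinot's upsert semantics, this keyed dedup is what turns at-least-once delivery into end-to-end exactly-once accounting.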
Processing Semantically-Ordered Streams in Financial ServicesFlink Forward
Flink Forward San Francisco 2022.
What if my data is already in order? Stream Processing has given us an elegant and powerful solution for running analytic queries and logic over high volumes of continuously arriving data. However, in both Apache Flink and Apache Beam, the notion of time-ordering is baked in at a very low level, making it difficult to express computations that are interested in a semantic-, rather than time-ordering of the data. In financial services, what often matters the most about the data moving between systems is not when the data was created, but in what order, to the extent that many institutions engineer a global sequencing over all data entering and produced by their systems to achieve complete determinism. How, then, can financial institutions and others best employ Stream Processing on streams of data that are already ordered? I will cover various techniques that can make this work, as well as seek input from the community on how Flink might be improved to better support these use-cases.
by
Patrick Lucas
Tame the small files problem and optimize data layout for streaming ingestion...Flink Forward
Flink Forward San Francisco 2022.
In modern data platform architectures, stream processing engines such as Apache Flink are used to ingest continuous streams of data into data lakes such as Apache Iceberg. Streaming ingestion to iceberg tables can suffer by two problems (1) small files problem that can hurt read performance (2) poor data clustering that can make file pruning less effective. To address those two problems, we propose adding a shuffling stage to the Flink Iceberg streaming writer. The shuffling stage can intelligently group data via bin packing or range partition. This can reduce the number of concurrent files that every task writes. It can also improve data clustering. In this talk, we will explain the motivations in details and dive into the design of the shuffling stage. We will also share the evaluation results that demonstrate the effectiveness of smart shuffling.
by
Gang Ye & Steven Wu
Batch Processing at Scale with Flink & IcebergFlink Forward
Flink Forward San Francisco 2022.
Goldman Sachs's Data Lake platform serves as the firm's centralized data platform, ingesting 140K (and growing!) batches per day of Datasets of varying shape and size. Powered by Flink and using metadata configured by platform users, ingestion applications are generated dynamically at runtime to extract, transform, and load data into centralized storage where it is then exported to warehousing solutions such as Sybase IQ, Snowflake, and Amazon Redshift. Data Latency is one of many key considerations as producers and consumers have their own commitments to satisfy. Consumers range from people/systems issuing queries, to applications using engines like Spark, Hive, and Presto to transform data into refined Datasets. Apache Iceberg allows our applications to not only benefit from consistency guarantees important when running on eventually consistent storage like S3, but also allows us the opportunity to improve our batch processing patterns with its scalability-focused features.
by
Andreas Hailu
Flink Forward Berlin 2017: Bas Geerdink, Martijn Visser - Fast Data at ING - building a streaming data platform with Flink and Kafka
1. Bas Geerdink & Martijn Visser, 13 September 2017
Streaming Analytics at ING
2. IT Chapter Lead at ING Analytics
Academic background: A.I. & Informatics
Working in IT since 2004
At ING since 2013
@bgeerdink https://nl.linkedin.com/in/geerdink
Bas Geerdink
3. Product Owner / Data Analyst at ING
Academic background: ICT & Law
Working with data since 2008
At ING since 2013
@martijnvisser82 https://nl.linkedin.com/in/martijnvisser1982
Martijn Visser
5. ING is a top financial enterprise, operating since 1881
• Customers: 35 million private, corporate and institutional customers
• Countries: 40+, in Europe, Asia, Australia, North and South America
• Employees: 51,000
Market positions: market leaders (Benelux), growth markets, Commercial Banking, challengers
5
6. ING’s Think Forward strategy requires a data-driven approach
Purpose: empowering people to stay a step ahead in life and in business.
Customer Promise: Clear and Easy, Anytime, Anywhere, Empower, Keep Getting Better.
Strategic Priorities:
1. Earn the primary relationship
2. Develop analytics skills to understand our customers better
3. Increase the pace of innovation to serve changing customer needs
4. Think beyond traditional banking to develop new services and business models
Enablers: Simplify & Streamline, Operational Excellence, Performance Culture, Lending Capabilities.
Creating a differentiating customer experience.
6
7. We’re making a shift from batch to real-time
Secure and reliable · Relevant · Personal · Omnichannel · Predictive · Actionable
7
8. We’re building a Streaming Analytics platform with
high-end capabilities
8
• High performance & low latency to manage and process
users’ events.
• Scalability & availability to build new features, scenarios
and integration with other external systems.
• Fault tolerance mechanism to ensure high quality output
with strong information consistency.
• Online machine learning and business rules management
capabilities.
9. • Fraud detection:
• Detect possible fraudulent transactions
• Identify possible money laundering accounts
• Actionable Insights:
• Small financial insights and robotized advice
• Customer-defined alerts and push notifications
• Offer chat/call when customers don’t finish their customer journey
• Marketing and commercial offerings (‘Next Best Action’)
Main use cases for a Streaming Analytics Platform
9
10. • Wisdom of the crowd:
• making use of similar customers to make better predictions
• Online learning:
• making use of customer feedback in real-time to train models
• Beyond banking:
• enabling IoT
• robo advice
Future scenarios
10
12. Do people still need Banks, or just Banking?
Whoever controls the customer experience controls the customer value.
• Alibaba, the most valuable retailer, has no inventory
• Uber, the world’s largest taxi company, owns no fleet
• AirBnB, the largest accommodation provider, owns no real estate
• Facebook, the most popular media owner, creates no content
We want our environment to be the front and center of customer financial needs.
16. AI + UI + I – a formula for creating a compelling digital experience
• Actionable Insights rather than data – but select problems that are easy for machines but difficult for people, e.g. forecasting
• Real-time alerts can be the real differentiators, e.g. when a card authorization is received, or when a customer makes a payment that he has made a few times previously
• Hierarchy of alerts – absolutely tell me when things are going wrong, but occasionally celebrate with me too
• The UI adapts to confidence in the AI, e.g. the message differs when we have high confidence in a forecast
• The UI also needs to adapt to customer behaviours, e.g. easy login when a customer logs in from the same device
• The key role of AI is to help me stay a step ahead in life – so the UI needs to make it easy to find items to act upon
• The digital experience is personalised to what’s important to me
• Make the benefit of using your system clear to me immediately, rather than making me discover the benefit
• I expect a consistent experience everywhere, e.g. once I dismiss an action online, I don’t want to be prompted for it on mobile again
• I am used to a certain way of working, e.g. I am used to settings being in one place and to certain interaction patterns – so reduce my learning curve by reusing those
20. Decision Engine: Batch Data Processing & APIs – MVP: Batch implementation
[Diagram] The data warehouse determines Insights and delivers them by XFB file transfer to the Data Lake, where SQL selections are executed to determine Insights and the results are inserted in a cache (Cassandra [ODBC]). A Realtime Validator and the Insights API handle inbound requests and responses from the application: (1) API request/response for applicable Insights and rules, (2) API request/response, (3) API request/response with the System of Records, (4) API request/response for Insights Content. Results are sent back to the application as Actionable Insights.
20
24. The Streaming Data Platform has to be probabilistic in nature
• Deterministic: business rules / decision trees; fixed logic (if-then-else); rigid
• Probabilistic: machine learning; fuzzy logic; flexible
24
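To make the contrast on slide 24 concrete, here is a minimal sketch of the same alert implemented both ways; the field names, weights and threshold are invented for illustration and are not taken from ING’s platform:

```java
// Illustrative only: a hypothetical "low balance" check implemented both ways.
class AlertStyles {
    // Deterministic: a fixed if-then-else business rule. Rigid, but explainable.
    static boolean ruleBasedAlert(double balance) {
        return balance < 400.0;
    }

    // Probabilistic: a made-up logistic model producing a score in [0, 1].
    // In a real platform the weights would come from an offline-trained model.
    static double modelScore(double balance, double avgDailySpend) {
        double z = -3.0 + 0.004 * avgDailySpend - 0.005 * balance;
        return 1.0 / (1.0 + Math.exp(-z));
    }

    // The platform acts on the score only above a tunable threshold,
    // which makes the probabilistic variant flexible where the rule is rigid.
    static boolean modelBasedAlert(double balance, double avgDailySpend, double threshold) {
        return modelScore(balance, avgDailySpend) > threshold;
    }
}
```

The rule gives a yes/no answer from fixed logic; the model gives a graded score whose cut-off can be tuned without changing code.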
26. Streaming Data Platform Architecture
The diagram shows a Business Flow and an Application Flow across three stages:
• RawToRelevance (“filter and detect pattern”) – the data gatherer: loads the stream of raw events and filters/combines/aggregates them into a meaningful business event. Steps: Get Raw Events → Detect Pattern → Create Business Event.
• Model Execution Engine (“determine relevancy”) – the event scorer: scores PMML models to determine the personal and relevant follow-up actions. Steps: Get Business Event → Get Customer Profile → Apply Selection Criteria → Apply Model → Create Intermediate Event.
• Post-Processor (“produce notification/alert”) – the data generator: formats outgoing messages. Steps: Get Intermediate Event → Filter → Format Event → Send Output.
27. Streaming Data Platform Architecture – Example 1
[Diagram] Business Flow: card transactions of -150 (9:14, AMS), -100 (10:10, AMS) and -100 (10:12, BER) arrive in a sliding window, triggered by each transaction. RawToRelevance (“filter and detect pattern”) detects the pattern “magic carpet”: transactions in two different cities impossibly close together. At 10:22 the Model Execution Engine (“determine relevancy”) predicts a possible fraudulent transaction and scores the follow-up options: do nothing (0.12), stop the money flow (0.91), block the card (0.73). The Post-Processor (“produce notification/alert”) raises the alert: block the transaction.
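The “magic carpet” pattern from Example 1 can be sketched in plain Java; the `Txn` record, city codes and 60-minute gap are assumptions for illustration, and an ordinary list stands in for Flink’s sliding window:

```java
import java.util.*;

// A hypothetical sketch of the "magic carpet" (impossible travel) pattern:
// two card transactions in different cities too close together in time.
class MagicCarpet {
    record Txn(int minuteOfDay, String city, double amount) {}

    // Within one window of events, flag any consecutive pair (ordered by
    // event time) in different cities less than minGapMinutes apart.
    static boolean impossibleTravel(List<Txn> window, int minGapMinutes) {
        List<Txn> sorted = new ArrayList<>(window);
        sorted.sort(Comparator.comparingInt(Txn::minuteOfDay));
        for (int i = 1; i < sorted.size(); i++) {
            Txn a = sorted.get(i - 1), b = sorted.get(i);
            if (!a.city().equals(b.city())
                    && b.minuteOfDay() - a.minuteOfDay() < minGapMinutes) {
                return true;
            }
        }
        return false;
    }
}
```

With the slide’s three transactions (9:14 AMS, 10:10 AMS, 10:12 BER), the AMS→BER pair two minutes apart trips the check.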
29. Streaming Data Platform Architecture – Example 2
[Diagram] Business Flow: transactions of -101 (13:58), -13 (14:24), -98 (15:02) and -71 (15:22) arrive in a sliding window, triggered when the balance drops below 400. RawToRelevance (“filter and detect pattern”) detects the pattern “customer is shopping”. At 15:22 the Model Execution Engine (“determine relevancy”) predicts that the customer’s balance will be zero at 16:11 and scores the follow-up options: do nothing (0.34), send a WhatsApp message (0.13), transfer money (0.57). The Post-Processor (“produce notification/alert”) raises the alert: ask the customer to transfer money.
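The balance prediction in Example 2 could, for instance, come from a least-squares line fitted through recent (minute, balance) points and extrapolated to its zero crossing. The slides do not specify the actual model, so this is a hypothetical sketch of one simple way to produce such a forecast:

```java
// Fit a straight line through (minute, balance) points by least squares and
// extrapolate when it crosses zero. Returns NaN if the balance is not falling.
class BalanceForecast {
    static double zeroCrossingMinute(double[] minutes, double[] balances) {
        int n = minutes.length;
        double sx = 0, sy = 0, sxy = 0, sxx = 0;
        for (int i = 0; i < n; i++) {
            sx += minutes[i];
            sy += balances[i];
            sxy += minutes[i] * balances[i];
            sxx += minutes[i] * minutes[i];
        }
        double slope = (n * sxy - sx * sy) / (n * sxx - sx * sx);
        double intercept = (sy - slope * sx) / n;
        if (slope >= 0) return Double.NaN; // balance not falling: no prediction
        return -intercept / slope;         // minute at which the line hits zero
    }
}
```

A balance falling linearly from 400 at minute 0 to 200 at minute 120, for example, extrapolates to zero at minute 240.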
30. Streaming Data Platform Architecture – Example 3
[Diagram] Business Flow: page visits P1 (18:01), P2 (18:03), P3 (18:06) and P4 arrive in a sliding window, triggered by the first page hit. RawToRelevance (“filter and detect pattern”) detects the pattern “customer is interested in a loan”; the prediction is that the customer is unclear what to do. At 18:21 the Model Execution Engine (“determine relevancy”) scores the follow-up options: do nothing (0.23), call the customer (0.81), send an email (0.61). The Post-Processor (“produce notification/alert”) triggers the follow-up: the helpdesk calls the customer to give support.
32. Streaming Data Platform Architecture
32
The full diagram places the three-stage pipeline – RawToRelevance (“filter and detect pattern”), Model Execution Engine (“determine relevancy”) and Post-Processor (“produce notification/alert”), with the same application-flow steps as before – in its surroundings:
• Kafka events connect the stages: Raw Event → Business Event → Intermediate Event → Alert/Notification Event, alongside a Control Stream and an Error Stream.
• System configuration: business users manage the platform through a Configuration GUI and a DSL; data scientists work in a Machine Learning Environment with offline training and a feedback loop back into the platform (future scope).
• Online scoring is performed with openscoring.io.
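The control stream in the diagram is what lets the pipeline change behaviour without a restart. This plain-Java sketch only illustrates the idea; in Flink the current model would live in managed (broadcast) state rather than a plain field, and the `Model` shape here is invented:

```java
// Hypothetical sketch: business events are scored with the current model,
// while messages on the control stream swap the model at runtime.
class ScoringStage {
    record Model(int version, double threshold) {}

    private Model current;

    ScoringStage(Model initial) { current = initial; }

    // Handler for the control stream: replace the model in place.
    void onControl(Model update) { current = update; }

    // Handler for the event stream: score, and record which model version
    // made the decision, for traceability.
    String onEvent(double score) {
        return "v" + current.version() + ":"
             + (score > current.threshold() ? "alert" : "ok");
    }
}
```

The same event can thus yield a different decision after a control message arrives, without redeploying the job.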
34. Benefits for ING:
• true streaming: per-event processing, no micro-batching
• open source, active community
• high volumes, low latency
• connectors to Kafka
• enterprise ready
• Scala API
Features that we use:
• state management and fault tolerance: savepointing, checkpointing, exactly-once semantics
• time windowing: event time, flexible (e.g. sum all transactions in the past minute)
• Complex Event Processing
ING chose Apache Flink for its Streaming Data Platform
34
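The windowing example on slide 34 (“sum all transactions in the past minute”) is an event-time window in Flink; this plain-Java sketch only mimics those semantics so the idea is runnable without a cluster, with an invented `Event` shape:

```java
import java.util.*;

class PastMinuteSum {
    record Event(long timestampMs, double amount) {}

    // Sum of all events whose *event time* (the timestamp embedded in the
    // event, not its arrival time) falls in the minute ending at nowMs.
    static double sum(List<Event> events, long nowMs) {
        double total = 0;
        for (Event e : events) {
            if (e.timestampMs() > nowMs - 60_000L && e.timestampMs() <= nowMs) {
                total += e.amount();
            }
        }
        return total;
    }
}
```

The point of event time is that the result depends only on the timestamps inside the events, so late or out-of-order arrival does not change the sum.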
36. PMML is the de facto standard for model representation
• XML-based interchange format for predictive models
• Supports hot-replacing an algorithm (configuration) at runtime by just providing an updated PMML model and interpreting (scoring) it with openscoring.io
• PMML models can be created by a large number of tools, including Spark, SAS, R, Knime, Python and H2O.ai
• Large number of supported machine learning models
• Working together with the open source community on FlinkML
• Next steps: PFA, TensorFlow
PMML (Predictive Model Markup Language) bridges the gap between data scientists and data engineers
36
Model support in openscoring.io, by type:
• Simple business rules: rulesets, score cards
• Complex business rules: sequences, association rules
• Predictive models: Naive Bayes; k-Nearest Neighbors; regression (linear, logistic, regularized); trees (decision trees, random forest); Support Vector Machines
• Advanced models: Bayesian Network, Neural Network (Deep Learning), Gaussian Process
• Other: Time Series, Cluster Models
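To make the interchange format concrete, here is a minimal, hand-written PMML fragment for a single-feature linear regression. It is purely illustrative: the field names and coefficients are invented, and it is not one of ING’s models.

```xml
<PMML xmlns="http://www.dmg.org/PMML-4_3" version="4.3">
  <Header copyright="example" description="Illustrative model, not a production artifact"/>
  <DataDictionary numberOfFields="2">
    <DataField name="balance" optype="continuous" dataType="double"/>
    <DataField name="risk" optype="continuous" dataType="double"/>
  </DataDictionary>
  <RegressionModel modelName="risk_demo" functionName="regression">
    <MiningSchema>
      <MiningField name="balance"/>
      <MiningField name="risk" usageType="target"/>
    </MiningSchema>
    <RegressionTable intercept="0.9">
      <NumericPredictor name="balance" coefficient="-0.001"/>
    </RegressionTable>
  </RegressionModel>
</PMML>
```

Because the whole model is declarative XML like this, swapping in a retrained version is a matter of shipping a new document to the scoring engine, which is exactly the hot-replace property the slide highlights.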
38. • Cross-functional squads
• Business, dev, ops, data scientists
• Reusing ING standard building blocks and processes
• Infrastructure, monitoring
• Continuous integration & delivery in a DTAP environment
• Develop: Gitlab, Jenkins, Artifactory
• Using Flink’s savepointing mechanism
• No data loss, guaranteed
• Limited downtime
• Scalability, resilience and robustness
• Flink gives us automatic restart on failure, using the checkpointing mechanism
• Rolling updates of infrastructure and middleware lead to no downtime
Flink fits the development & operations model of the ING way of working
38
Design → Develop → Test → Deploy → Monitor
39. In ING’s one way of working, ‘business’ and ‘IT’ go hand-in-hand
[Diagram] A Tribe contains multiple Squads, with Chapters and Guilds cutting across them. Each Squad is cross-functional: Dev, Ops, data scientists (DS), data analysts (DA) and customer journey experts (CJE) work together.
41. ING is an “IT company with a banking licence”, whose competitive advantage is in its use of data and analytics. To excel, we’ll handle bigger volumes of streaming data with higher performance demands.
ING built a streaming analytics platform as a generic, multi-purpose solution, with fraud detection and actionable insights as its first use cases.
Enterprise-readiness and security of Flink are continuously improving, although major improvement points remain on the topics of data retention and lineage (GDPR requirements).
With PMML we have a standard format that is very well suited for multidisciplinary teams, where data scientists work on the same user stories as engineers.
We set up a streaming analytics community within ING, and work together with industry leaders (Lightbend, DataArtisans).
Why is Streaming Analytics with Flink the right choice for ING, moving forward?
42
43. We’re hiring!
Follow us to stay a step ahead
ING.com
YouTube.com/ING
SlideShare.net/ING
@ING_News
LinkedIn.com/company/ING
Flickr.com/INGGroup
Facebook.com/ING