Sudden spikes in load can spell disaster for stream processors. These spikes expose latent bottlenecks in otherwise well-balanced configurations and thereby introduce backpressure, increase latency, and reduce overall throughput. This problem is far from solved. While the prevailing solution of dynamic scaling (i.e., re-deploying the analytic job on an increased set of resources) offers relief in some cases, it requires free resources to be available during the spike and additionally risks violating existing SLAs during the re-deployment phase. As an alternative, we have implemented MERA. MERA prioritizes items based on their value to the result of an analytic job. When a job is at risk of creating backpressure, MERA sheds items of lower priority at selected locations in the job graph, ensuring that the job stays within its SLA. MERA relies on Flink's internal metrics to keep its overhead during normal operation to a minimum. In this talk we present MERA, a framework for trading precision for performance, and demonstrate it via a custom performance monitor for Apache Flink.
2. The Why: Compute function chains over large datasets.
Cost-efficient · Fast · Accurate
3. Backpressure
• Jobs are deployed on commodity hardware.
• Commodity hardware is failure-prone.
[Diagram: a task's input queue feeds N parallel UDFs; one blocked UDF slows down the UDFs upstream of it, producing backpressure.]
5. Prior Knowledge: Straggler Mitigation
• Learn over iterations to identify the best configuration options.
  Herodotou, Herodotos, et al. "Starfish: A self-tuning system for big data analytics." CIDR, Vol. 11, 2011.
• Use speculative execution, i.e. execute the same task multiple times in parallel, and use the result from the fastest task.
  Ananthanarayanan, Ganesh, et al. "Effective Straggler Mitigation: Attack of the Clones." NSDI, Vol. 13, 2013.
Does not work for streaming!
6. Data Streams are different
[Diagram: the same UDF pipeline receiving different load distributions over time.]
No perfect configuration!
7. Methods to deal with Backpressure
• Overprovision ❌
  Does not help against an unbalanced partitioning scheme
• Dynamically re-scale/re-partition ❌
  Requires state migration, incurs overhead
• Accept unbounded delays for requests ❌
  Consumers might require timely results
• Reduce accuracy in response to load peaks ✅
[Diagram: approximation taxonomy — Approximation branches into Aggregation (difficult for complex schemata) and Filtering, which covers Sampling and Loadshedding.]
8. Filter - Sampling
Generate a representative subset.
[Diagram: a log stream with ERROR/WARN/INFO/DEBUG levels is sampled to compute statistics.]
StreamApprox, presented by Pramod Bhatotia and Do Le Quoc at FlinkForward’17.
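To make the sampling idea concrete, here is a minimal reservoir-sampling sketch (Algorithm R) in Python. This is not StreamApprox itself, just an illustration of producing a representative subset of a stream of unknown length; the log-level data and function names are invented for this example.

```python
import random

def reservoir_sample(stream, k, seed=42):
    """Keep a uniform random sample of k items from a stream of unknown
    length (Algorithm R): every item ends up in the sample with
    probability k/n, so sample statistics approximate stream statistics."""
    rng = random.Random(seed)
    sample = []
    for i, item in enumerate(stream):
        if i < k:
            sample.append(item)          # fill the reservoir first
        else:
            j = rng.randint(0, i)        # replace a slot with prob. k/(i+1)
            if j < k:
                sample[j] = item
    return sample

# Toy log stream: 90% DEBUG, 10% ERROR. Statistics on the sample
# (e.g. the ERROR fraction) approximate those of the full stream.
logs = [("DEBUG", i) for i in range(900)] + [("ERROR", i) for i in range(100)]
sample = reservoir_sample(logs, 50)
```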
9. Filter - Loadshedding
Generate a subset fulfilling a given property.
[Diagram: a scoring function assigns each item a priority score; the load shedder accepts items whose score is above a threshold (HIGH) and sheds those below it (LOW).]
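The scoring-function-plus-threshold pipeline can be sketched as follows. This is a toy illustration, not MERA's implementation: the scoring function, the score values, and the threshold are all invented here, whereas MERA derives its shedding decisions from Flink's internal metrics.

```python
from dataclasses import dataclass

@dataclass
class LoadShedder:
    """Forward items whose priority score meets a threshold, shed the rest."""
    threshold: float

    def process(self, item, score_fn):
        if score_fn(item) >= self.threshold:
            return item   # accept: forward downstream
        return None       # shed: drop the item

# Hypothetical scoring: log severity mapped to a priority in [0, 1].
score = lambda level: {"ERROR": 1.0, "WARN": 0.7, "INFO": 0.4, "DEBUG": 0.1}[level]

shedder = LoadShedder(threshold=0.5)
kept = [x for x in ["ERROR", "DEBUG", "WARN", "INFO"]
        if shedder.process(x, score) is not None]
# High-priority ERROR and WARN items survive; INFO and DEBUG are shed.
```

Raising the threshold under load sheds more aggressively; lowering it restores accuracy once the spike passes.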
12. Insights
• Dynamic approximation is faster than scaling: no provisioning & no state migration!
• Backpressure needs to be handled everywhere in the job!
• Filtering at the source is suboptimal!
• There is more to approximation than sampling!
14. Results
• No optimization vs. optimization
  • Bottleneck latency
  • Input rate
  • Output rate
  • Drop rate
  • Latency?
• Drop only at source vs. everywhere
• Bar charts
  • Bottleneck latency
  • Mean latency
  • Without optimization / only at source / full
16. Re-Read
• Load Shedding Techniques for Data Stream Systems
• StreamApprox: Approximate Stream Analytics in Apache Flink
• Approximation papers
Future: combine aggregation with the task. [Diagram: chained UDFs.]
17. Open issues
• Problem: The ILP will select only a single or only a few places to shed items, leading to a loss of high-scoring items.
  Fix: Include the distribution factor in the ILP.
• Problem: By shedding at reducers, the distribution of data will change in one direction.
  Fix: Shed only at mappers.
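The first open issue can be illustrated with a toy simulation (no ILP involved, all numbers invented): if the whole shedding budget is concentrated at one operator, that operator is forced to discard items with much higher priority scores than if the same budget were spread across operators.

```python
import random

rng = random.Random(0)
# Three operators, each observing 100 items with uniform priority scores.
ops = [[rng.random() for _ in range(100)] for _ in range(3)]

# Backpressure forces us to drop 90 of the 300 items in total.

# Strategy A: shed everything at a single operator. It must drop 90 of
# its 100 items, sacrificing scores up to its local 90th percentile.
concentrated_loss = max(sorted(ops[0])[:90])   # best score sacrificed

# Strategy B: distribute the budget, 30 items per operator. Each drops
# only its 30 lowest-scoring items.
distributed_loss = max(max(sorted(o)[:30]) for o in ops)

# Concentrated shedding discards far higher-scoring items, which is why
# the shed locations should be spread across the job graph.
```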