Microsoft uses Apache Flink to power Anubis, their anomaly detection engine for cloud activities. Anubis ingests activity events from cloud applications at large scale, maintains behavioral models for each user, detects anomalies in real-time, and raises alerts. Flink was chosen for its ability to handle massive state sizes, provide fault tolerance, and perform real-time stream processing at scale across multiple data centers. Microsoft has customized Flink for upgrades, monitoring, and deployment across large clusters to power Anubis and other production jobs.
Automatize a detecção de ameaças e evite falsos positivosElasticsearch
Detecte ameaças e evite falsos positivos com o mecanismo de detecção no Elastic Security. Automatize a detecção de ameaças por meio de correlações e machine learning com exemplos do reais.
Security Events Logging at Bell with the Elastic StackElasticsearch
One of Canada’s largest telecommunications company is using Elastic to drive improved security analysis in their SOC. With a need to ingest all security logs, build threat detection models, and normalize many new types of logs, the Bell security team turned to Elastic. Learn how they’ve streamlined alerts, deepened log analysis, and addressed challenges unique to being an ISP.
O monitoramento da infraestrutura facilitado, da ingestão ao insightElasticsearch
Descubra como a integração simplificada de dados com integrações predefinidas, insights automatizados com alertas e aprendizado de máquina e novas ferramentas visuais estão otimizando o caso de uso de monitoramento de infraestrutura.
Log Monitoring and Anomaly Detection at Scale at ORNLElasticsearch
See how Oak Ridge National Laboratory transitioned from using COTS toolset to a more cost-effective and flexible open source model by employing NiFi, Kafka, and the Elastic Stack.
Reinventing enterprise defense with the Elastic StackElasticsearch
Tune in to hear the most impactful lessons learned from Uber's security journey, and how security practitioners everywhere can tackle pervasive enterprise security challenges using the Elastic Stack.
Turning Evidence into Insights: How NCIS Leverages Elastic Elasticsearch
Learn how NCIS data analysis uses Elasticsearch to process evidence in the form of log files, its impact on efficient law enforcement, and some lessons learned along the way.
See the video: https://www.elastic.co/elasticon/tour/2019/washington-dc/turning-evidence-into-insights-how-ncis-leverages-elastic-
Empower your security practitioners with the Elastic StackElasticsearch
How does your organization detect and respond to cyber threats? Learn how the latest security capabilities in the Elastic Stack enable interactive exploration and automated analysis with speed and at scale.
Automatize a detecção de ameaças e evite falsos positivosElasticsearch
Detecte ameaças e evite falsos positivos com o mecanismo de detecção no Elastic Security. Automatize a detecção de ameaças por meio de correlações e machine learning com exemplos do reais.
Security Events Logging at Bell with the Elastic StackElasticsearch
One of Canada’s largest telecommunications company is using Elastic to drive improved security analysis in their SOC. With a need to ingest all security logs, build threat detection models, and normalize many new types of logs, the Bell security team turned to Elastic. Learn how they’ve streamlined alerts, deepened log analysis, and addressed challenges unique to being an ISP.
O monitoramento da infraestrutura facilitado, da ingestão ao insightElasticsearch
Descubra como a integração simplificada de dados com integrações predefinidas, insights automatizados com alertas e aprendizado de máquina e novas ferramentas visuais estão otimizando o caso de uso de monitoramento de infraestrutura.
Log Monitoring and Anomaly Detection at Scale at ORNLElasticsearch
See how Oak Ridge National Laboratory transitioned from using COTS toolset to a more cost-effective and flexible open source model by employing NiFi, Kafka, and the Elastic Stack.
Reinventing enterprise defense with the Elastic StackElasticsearch
Tune in to hear the most impactful lessons learned from Uber's security journey, and how security practitioners everywhere can tackle pervasive enterprise security challenges using the Elastic Stack.
Turning Evidence into Insights: How NCIS Leverages Elastic Elasticsearch
Learn how NCIS data analysis uses Elasticsearch to process evidence in the form of log files, its impact on efficient law enforcement, and some lessons learned along the way.
See the video: https://www.elastic.co/elasticon/tour/2019/washington-dc/turning-evidence-into-insights-how-ncis-leverages-elastic-
Empower your security practitioners with the Elastic StackElasticsearch
How does your organization detect and respond to cyber threats? Learn how the latest security capabilities in the Elastic Stack enable interactive exploration and automated analysis with speed and at scale.
Palestra de abertura: Evolução e visão do Elastic ObservabilityElasticsearch
O caso de uso de Observability ajuda a conduzir o tempo médio de resolução a zero com visibilidade de ponta a ponta em uma única plataforma. Ouça sobre os recursos e capacidades mais recentes e tenha uma visão do futuro.
Search for all with Elastic Enterprise Search Elasticsearch
We reimagined search in the workplace so you can get to the information you need quickly with a unified search experience, out-of-the-box data connectors, and simple search management interfaces. Meet Elastic Enterprise Search.
See the video: https://www.elastic.co/elasticon/tour/2019/washington-dc/search-for-all-with-elastic-enterprise-search
How KeyBank Used Elastic to Build an Enterprise Monitoring SolutionElasticsearch
KeyBank is using an iterative design approach to scale their end-to-end enterprise monitoring system with Kafka and Elasticsearch at its core. See how they did it and the lessons learned along the way.
Monitoring real-life Azure applications: When to use what and whyKarl Ots
Slides from my presentation at Intelligent Cloud Conf on 29.5.2018 in Copenhagen
Modern applications leverage a variety of services, and often span across on premises, IaaS, PaaS and SaaS. Monitoring these environments is different from traditional systems. We have more and more data available from the platform with the likes of ARM Activity Logs, Azure Monitor, Log Analytics and Application Insights.
With a massive amount of signal and noise being generated in all these systems, how do we get our arms around what is happening? Is my application impacted in an ongoing Azure outage? Are my integrations intact? Which services from Azure should I use to monitor my application end-to-end? Come and hear how to answer these questions. After the session, you’ll have deeper understanding of end-to-end monitoring techniques in Azure solutions and know which services to choose for which scenario.
.
Automate Your Container Deployments SecurelyDevOps.com
Operations seeking to make their apps and APIs both performant and available to their users must bake effective application security tooling into their processes and infrastructure configurations. How can development and operations teams release at increasing velocity with app protection built into their CI/CD pipeline?
A true next-generation, holistic web application and API protection platform does just that: operations teams can integrate security into their workflows and ensure new infrastructure and app code released to production is both effective and secure in any environment from cloud using containers to datacenters to a hybrid of these.
Join application security expert Aneel Dadani from Signal Sciences to learn how your team can automate, deploy at scale safely while gaining layer 7 visibility in production environments.
Attendees will learn:
What constitutes effective application security within the context of cloud adoption and an ever expanding threat landscape
How development teams can gain visibility into how their apps and APIs are being used in production and what vulnerabilities may exist that they overlooked
How DevOps teams can scale their application footprint to meet demand while securing your codebase in production
How to inspect request traffic at the API gateway or the ingress
Infrastructure monitoring made easy, from ingest to insightElasticsearch
Discover how simplified data onboarding with prebuilt integrations, automated insights with alerting and machine learning, and new visual tools are streamlining the infrastructure monitoring use case.
Intelligent Cloud Conference 2018 - Building secure cloud applications with A...Tom Kerkhove
It is not a secret that it is hard to manage sensitive information. Azure Key Vault allows you to securely store this kind of information ranging from secrets & certificates to cryptographic keys.
Great! But how do you use it? How do I authenticate with it and how do I build robust applications with it?
Come join me and I'll walk you through the challenges and give you some recommendations.
Join the creators of the Elastic Stack and Microsoft product experts to learn best practices around deployment, scaling, and security on hosted VMs and the fully managed Elasticsearch Service in this demo-heavy session.
The full picture of Openstack in real-timeDynatrace
The full picture of Openstack in real-time
Peter Hack
Get real-time insights into resource utilization, OpenStack services, service availability and log files in one single dashboard.
Elastic APM : développez vos logs et vos indicateurs pour obtenir une vue com...Elasticsearch
Pour les organisations modernes, les applications sont souvent l'interface client principale, et influencent directement les résultats tels que le chiffre d'affaires et la fidélisation de la clientèle. Quelle que soit votre progression dans votre parcours vers les solutions cloud natives, Elastic APM peut vous aider à améliorer les expériences clients en détectant plus tôt les goulets d'étranglement des performances et en identifiant plus rapidement les régressions à partir des nouveaux déploiements. Découvrez comment obtenir une vue complète des services qui alimentent vos applications, du front-end au back-end, pour garantir un fonctionnement optimal.
Elastic APM: amplificação dos seus logs e métricas para proporcionar um panor...Elasticsearch
Não importa onde você esteja em sua jornada rumo à nuvem, o Elastic APM ajuda a oferecer melhores experiências ao cliente, identificando gargalos de desempenho e identificando regressões de novas implantações com mais rapidez.
Getting Started with Runtime Security on Azure Kubernetes Service (AKS)DevOps.com
As containers and Kubernetes are adopted in production, security is a critical concern and DevOps teams need to go beyond image scanning. Use cases such as runtime security, network visibility and segmentation, incident response and compliance become priorities as your Kubernetes security framework matures.
In this talk, we’ll share an overview of runtime security, discuss approaches used by open source and commercial tools, and hear how users are getting started quickly without impacting developer productivity.
Your Agile, Modern Data Delivery Platformsyed_javed
Lyftron eliminates traditional ETL/ELT bottlenecks with automatic data pipeline and make data instantly accessible to BI user with the modern cloud compute of Spark & Snowflake.
Lyftron connectors automatically convert any source into normalized, ready-to-query relational format and provide search capability on your enterprise data catalog.
Topics of this presentation:
- Basics and best practices of developing single-page applications (SPA) and Web API Services on Microsoft .NET -
- Core with Docker and Linux.
- PowerShell Core automated builds.
- Markdown/PDF documentation.
- Documentation of public interfaces with Swagger/OAS/YAML.
- Automated testing of SPA on Protractor and testing the Web API on Postman/Newman.
This presentation by Sergii Fradkov (Consultant, Engineering), Andrii Zarharov (Lead Software Engineer, Consultant), Igor Magdich (Lead Test Engineer, Consultant) was delivered at GlobalLogic Kharkiv .NET TechTalk #1 on May 24, 2019.
Search for All with Elastic Enterprise Search Elasticsearch
Learn how we reimagined search in the workplace so you can get to the information you need quickly with a unified search experience, out-of-the-box data connectors, and simple search management interfaces.
Intelligent Cloud Conference 2018 - Next Generation of Data Integration with ...Tom Kerkhove
Azure Data Factory is a hybrid data integration service in Azure that allows you to create, manage & operate data pipelines in Azure. It is a serverless orchestrator that allows you to create data pipelines to either move, transform, load data; a fully managed Extract, Transform, Load (ETL) & Extract, Load, Transform (ELT) service if you will.
In this talk I'll cover the basics of Azure Data Factory and show you how you can create, manage & operate data pipelines.
Our cloud-native environments are more complex than ever before! So how can we ensure that the applications we’re deploying to them are behaving as we intended them to? This is where effective observability is crucial. It enables us to monitor our applications in real-time and analyse and diagnose their behaviour in the cloud. However, until recently, we were lacking the standardization to ensure our observability solutions were applicable across different platforms and technologies. In this session, we’ll delve into what effective observability really means, exploring open source technologies and specifications, like OpenTelemetry, that can help us to achieve this while ensuring our applications remain flexible and portable.
Palestra de abertura: Evolução e visão do Elastic ObservabilityElasticsearch
O caso de uso de Observability ajuda a conduzir o tempo médio de resolução a zero com visibilidade de ponta a ponta em uma única plataforma. Ouça sobre os recursos e capacidades mais recentes e tenha uma visão do futuro.
Search for all with Elastic Enterprise Search Elasticsearch
We reimagined search in the workplace so you can get to the information you need quickly with a unified search experience, out-of-the-box data connectors, and simple search management interfaces. Meet Elastic Enterprise Search.
See the video: https://www.elastic.co/elasticon/tour/2019/washington-dc/search-for-all-with-elastic-enterprise-search
How KeyBank Used Elastic to Build an Enterprise Monitoring SolutionElasticsearch
KeyBank is using an iterative design approach to scale their end-to-end enterprise monitoring system with Kafka and Elasticsearch at its core. See how they did it and the lessons learned along the way.
Monitoring real-life Azure applications: When to use what and whyKarl Ots
Slides from my presentation at Intelligent Cloud Conf on 29.5.2018 in Copenhagen
Modern applications leverage a variety of services, and often span across on premises, IaaS, PaaS and SaaS. Monitoring these environments is different from traditional systems. We have more and more data available from the platform with the likes of ARM Activity Logs, Azure Monitor, Log Analytics and Application Insights.
With a massive amount of signal and noise being generated in all these systems, how do we get our arms around what is happening? Is my application impacted in an ongoing Azure outage? Are my integrations intact? Which services from Azure should I use to monitor my application end-to-end? Come and hear how to answer these questions. After the session, you’ll have deeper understanding of end-to-end monitoring techniques in Azure solutions and know which services to choose for which scenario.
.
Automate Your Container Deployments SecurelyDevOps.com
Operations seeking to make their apps and APIs both performant and available to their users must bake effective application security tooling into their processes and infrastructure configurations. How can development and operations teams release at increasing velocity with app protection built into their CI/CD pipeline?
A true next-generation, holistic web application and API protection platform does just that: operations teams can integrate security into their workflows and ensure new infrastructure and app code released to production is both effective and secure in any environment from cloud using containers to datacenters to a hybrid of these.
Join application security expert Aneel Dadani from Signal Sciences to learn how your team can automate, deploy at scale safely while gaining layer 7 visibility in production environments.
Attendees will learn:
What constitutes effective application security within the context of cloud adoption and an ever expanding threat landscape
How development teams can gain visibility into how their apps and APIs are being used in production and what vulnerabilities may exist that they overlooked
How DevOps teams can scale their application footprint to meet demand while securing your codebase in production
How to inspect request traffic at the API gateway or the ingress
Infrastructure monitoring made easy, from ingest to insightElasticsearch
Discover how simplified data onboarding with prebuilt integrations, automated insights with alerting and machine learning, and new visual tools are streamlining the infrastructure monitoring use case.
Intelligent Cloud Conference 2018 - Building secure cloud applications with A...Tom Kerkhove
It is not a secret that it is hard to manage sensitive information. Azure Key Vault allows you to securely store this kind of information ranging from secrets & certificates to cryptographic keys.
Great! But how do you use it? How do I authenticate with it and how do I build robust applications with it?
Come join me and I'll walk you through the challenges and give you some recommendations.
Join the creators of the Elastic Stack and Microsoft product experts to learn best practices around deployment, scaling, and security on hosted VMs and the fully managed Elasticsearch Service in this demo-heavy session.
The full picture of Openstack in real-timeDynatrace
The full picture of Openstack in real-time
Peter Hack
Get real-time insights into resource utilization, OpenStack services, service availability and log files in one single dashboard.
Elastic APM : développez vos logs et vos indicateurs pour obtenir une vue com...Elasticsearch
Pour les organisations modernes, les applications sont souvent l'interface client principale, et influencent directement les résultats tels que le chiffre d'affaires et la fidélisation de la clientèle. Quelle que soit votre progression dans votre parcours vers les solutions cloud natives, Elastic APM peut vous aider à améliorer les expériences clients en détectant plus tôt les goulets d'étranglement des performances et en identifiant plus rapidement les régressions à partir des nouveaux déploiements. Découvrez comment obtenir une vue complète des services qui alimentent vos applications, du front-end au back-end, pour garantir un fonctionnement optimal.
Elastic APM: amplificação dos seus logs e métricas para proporcionar um panor...Elasticsearch
Não importa onde você esteja em sua jornada rumo à nuvem, o Elastic APM ajuda a oferecer melhores experiências ao cliente, identificando gargalos de desempenho e identificando regressões de novas implantações com mais rapidez.
Getting Started with Runtime Security on Azure Kubernetes Service (AKS)DevOps.com
As containers and Kubernetes are adopted in production, security is a critical concern and DevOps teams need to go beyond image scanning. Use cases such as runtime security, network visibility and segmentation, incident response and compliance become priorities as your Kubernetes security framework matures.
In this talk, we’ll share an overview of runtime security, discuss approaches used by open source and commercial tools, and hear how users are getting started quickly without impacting developer productivity.
Your Agile, Modern Data Delivery Platformsyed_javed
Lyftron eliminates traditional ETL/ELT bottlenecks with automatic data pipeline and make data instantly accessible to BI user with the modern cloud compute of Spark & Snowflake.
Lyftron connectors automatically convert any source into normalized, ready-to-query relational format and provide search capability on your enterprise data catalog.
Topics of this presentation:
- Basics and best practices of developing single-page applications (SPA) and Web API Services on Microsoft .NET -
- Core with Docker and Linux.
- PowerShell Core automated builds.
- Markdown/PDF documentation.
- Documentation of public interfaces with Swagger/OAS/YAML.
- Automated testing of SPA on Protractor and testing the Web API on Postman/Newman.
This presentation by Sergii Fradkov (Consultant, Engineering), Andrii Zarharov (Lead Software Engineer, Consultant), Igor Magdich (Lead Test Engineer, Consultant) was delivered at GlobalLogic Kharkiv .NET TechTalk #1 on May 24, 2019.
Search for All with Elastic Enterprise Search Elasticsearch
Learn how we reimagined search in the workplace so you can get to the information you need quickly with a unified search experience, out-of-the-box data connectors, and simple search management interfaces.
Intelligent Cloud Conference 2018 - Next Generation of Data Integration with ...Tom Kerkhove
Azure Data Factory is a hybrid data integration service in Azure that allows you to create, manage & operate data pipelines in Azure. It is a serverless orchestrator that allows you to create data pipelines to either move, transform, load data; a fully managed Extract, Transform, Load (ETL) & Extract, Load, Transform (ELT) service if you will.
In this talk I'll cover the basics of Azure Data Factory and show you how you can create, manage & operate data pipelines.
Our cloud-native environments are more complex than ever before! So how can we ensure that the applications we’re deploying to them are behaving as we intended them to? This is where effective observability is crucial. It enables us to monitor our applications in real-time and analyse and diagnose their behaviour in the cloud. However, until recently, we were lacking the standardization to ensure our observability solutions were applicable across different platforms and technologies. In this session, we’ll delve into what effective observability really means, exploring open source technologies and specifications, like OpenTelemetry, that can help us to achieve this while ensuring our applications remain flexible and portable.
Open Blueprint for Real-Time Analytics in Retail: Strata Hadoop World 2017 S...Grid Dynamics
This presentation outlines key business drivers for real-time analytics applications in retail and describes the emerging architectures based on In-Stream Processing (ISP) technologies. The slides present a complete open blueprint for an ISP platform - including a demo application for real-time Twitter Sentiment Analytics - designed with 100% open source components and deployable to any cloud.
To learn more, read an adjoining blog series on this topic here : https://blog.griddynamics.com/in-stream-processing-service-blueprint
According to service scale, there are hundreds or thousands of running containers in your service. Should we monitor each container by microscope or monitor each microservice by magnifier? This depends which granularity can help us find and solve the problems. In this sharing, I will introduce how to use cAdvisor, Icinga2, InfluxDB and Grafana to build a self-hosted monitoring system. In addition, I also discuss with how to embrace open source and share some practical experiences.
DCSF19 Container Security: Theory & Practice at NetflixDocker, Inc.
Michael Wardrop, Netflix
Usage of containers has undergone rapid growth at Netflix and it is still accelerating. Our container story started organically with developers downloading Docker and using it to improve their developer experience. The first production workloads were simple batch jobs, pioneering micro-services followed, then status as a first class platform running critical workloads.
As the types of workloads changed and their importance increased, the security of our container ecosystem needed to evolve and adapt. This session will cover some security theory, architecture, along with practical considerations, and lessons we learnt along the way.
From the demise of conventional signature-based endpoint technologies have risen next generation solutions. These technologies have cluttered the marketplace introducing a conundrum for endpoint selection. This session will focus on the key requirements for effective security prevention, detection, and remediation. It will introduce a real-world framework for categorizing endpoint capabilities, and enable selection of solutions matching the unmet needs of security programs. The following topics will be covered:
• What do i actually need?
• Real-world framework to categorize endpoint capabilities
• Map vendors into buckets within the framework
• Housekeeping, what's needed before you even start?
• Cheat sheet of probing questions to ask vendors
• Best practices of deploying best of breed solutions
BSidesLondon 20th April 2011 - Xavier Mertens (@xme)
========================
Your IT infrastructure generates thousands(millions?) of events a day. They are stored in several places under multiple forms and contain a lot of very interesting information. Using free tools, This presentation will give you some ideas how to properly manage this continuous flow of information and how to make them more valuable.
for more about Xavier
http://blog.rootshell.be
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Building an Event-oriented...Data Con LA
While we frequently talk about how to build interesting products on top of machine and event data, the reality is that collecting, organizing, providing access to, and managing this data is where most people get stuck. In this session, we’ll follow the flow of data through an end to end system built to handle tens of terabytes per day of event-oriented data, providing real time streaming, in-memory, SQL, and batch access to this data. We’ll go into detail on how open source systems such as Hadoop, Kafka, Solr, and Impala/Hive are actually stitched together; describe how and where to perform data transformation and aggregation; provide a simple and pragmatic way of managing event metadata; and talk about how applications built on top of this platform get access to data and extend its functionality. This session is especially recommended for data infrastructure engineers and architects planning, building, or maintaining similar systems.
Microservices in action at the Dutch National Police - Bert Jan Schrijver - C...Codemotion
At the Cloud, Big Data and Internet division of the Dutch National Police, 4 DevOps teams use the latest open source technology to build high tech, cloud native web applications using Spring Boot, Angular 5, Spark, Kafka and Jenkins 2. I'll share our experiences and real-world use cases for microservices. I’ll show how 4 teams work together on one product and I’ll talk about how we apply the principles of DevOps and Continuous Delivery. I’ll show how we handle security, build pipelines, test automation, performance tests, service discovery, automated deployments, monitoring and more!
Azure 101: Shared responsibility in the Azure CloudPaulo Renato
Whether you’re working exclusively on Azure or with multiple cloud environments, there are certain things you should consider when moving assets to the public cloud. As with any cloud deployment, security is a top priority, and moving your workloads to the Azure cloud doesn’t mean you’re not responsible for the security of your operating system, applications, and data.
Building on the security of the Azure infrastructure, this shared security responsibility starts with making sure your environment is secure. In this session, we will discuss step-by-step what you need to do to secure access at the administrative, application and network layers.
Building a fully managed stream processing platform on Flink at scale for Lin...Flink Forward
Apache Flink is a distributed stream processing framework that allows users to process and analyze data in real-time. At LinkedIn, we developed a fully managed stream processing platform on Flink running on K8s to power hundreds of stream processing pipelines in production. This platform is the backbone for other infra systems like Search, Espresso (internal document store) and feature management etc. We provide a rich authoring and testing environment which allows users to create, test, and deploy their streaming jobs in a self-serve fashion within minutes. Users can focus on their business logic, leaving the Flink platform to take care of management aspects such as split deployment, resource provisioning, auto-scaling, job monitoring, alerting, failure recovery and much more. In this talk, we will introduce the overall platform architecture, highlight the unique value propositions that it brings to stream processing at LinkedIn and share the experiences and lessons we have learned.
Evening out the uneven: dealing with skew in FlinkFlink Forward
Flink Forward San Francisco 2022.
When running Flink jobs, skew is a common problem that results in wasted resources and limited scalability. In the past years, we have helped our customers and users solve various skew-related issues in their Flink jobs or clusters. In this talk, we will present the different types of skew that users often run into: data skew, key skew, event time skew, state skew, and scheduling skew, and discuss solutions for each of them. We hope this will serve as a guideline to help you reduce skew in your Flink environment.
by
Jun Qin & Karl Friedrich
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...Flink Forward
Flink Forward San Francisco 2022.
To improve Amazon Alexa experiences and support machine learning inference at scale, we built an automated end-to-end solution for incremental model building or fine-tuning machine learning models through continuous learning, continual learning, and/or semi-supervised active learning. Customer privacy is our top concern at Alexa, and as we build solutions, we face unique challenges when operating at scale such as supporting multiple applications with tens of thousands of transactions per second with several dependencies including near-real time inference endpoints at low latencies. Apache Flink helps us transform and discover metrics in near-real time in our solution. In this talk, we will cover the challenges that we faced, how we scale the infrastructure to meet the needs of ML teams across Alexa, and go into how we enable specific use cases that use Apache Flink on Amazon Kinesis Data Analytics to improve Alexa experiences to delight our customers while preserving their privacy.
by
Aansh Shah
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...Flink Forward
Flink Forward San Francisco 2022.
Probably everyone who has written stateful Apache Flink applications has used one of the fault-tolerant keyed state primitives ValueState, ListState, and MapState. With RocksDB, however, retrieving and updating items comes at an increased cost that you should be aware of. Sometimes, these may not be avoidable with the current API, e.g., for efficient event-time stream-sorting or streaming joins where you need to iterate one or two buffered streams in the right order. With FLIP-220, we are introducing a new state primitive: BinarySortedMultiMapState. This new form of state offers you to (a) efficiently store lists of values for a user-provided key, and (b) iterate keyed state in a well-defined sort order. Both features can be backed efficiently by RocksDB with a 2x performance improvement over the current workarounds. This talk will go into the details of the new API and its implementation, present how to use it in your application, and talk about the process of getting it into Flink.
by
Nico Kruber
Introducing the Apache Flink Kubernetes OperatorFlink Forward
Flink Forward San Francisco 2022.
The Apache Flink Kubernetes Operator provides a consistent approach to manage Flink applications automatically, without any human interaction, by extending the Kubernetes API. Given the increasing adoption of Kubernetes based Flink deployments the community has been working on a Kubernetes native solution as part of Flink that can benefit from the rich experience of community members and ultimately make Flink easier to adopt. In this talk we give a technical introduction to the Flink Kubernetes Operator and demonstrate the core features and use-cases through in-depth examples."
by
Thomas Weise
Flink Forward San Francisco 2022.
Resource Elasticity is a frequently requested feature in Apache Flink: Users want to be able to easily adjust their clusters to changing workloads for resource efficiency and cost saving reasons. In Flink 1.13, the initial implementation of Reactive Mode was introduced, later releases added more improvements to make the feature production ready. In this talk, we’ll explain scenarios to deploy Reactive Mode to various environments to achieve autoscaling and resource elasticity. We’ll discuss the constraints to consider when planning to use this feature, and also potential improvements from the Flink roadmap. For those interested in the internals of Flink, we’ll also briefly explain how the feature is implemented, and if time permits, conclude with a short demo.
by
Robert Metzger
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...Flink Forward
Flink Forward San Francisco 2022.
Flink consumers read from Kafka as a scalable, high throughput, and low latency data source. However, there are challenges in scaling out data streams where migration and multiple Kafka clusters are required. Thus, we introduced a new Kafka source to read sharded data across multiple Kafka clusters in a way that conforms well with elastic, dynamic, and reliable infrastructure. In this presentation, we will present the source design and how the solution increases application availability while reducing maintenance toil. Furthermore, we will describe how we extended the existing KafkaSource to provide mechanisms to read logical streams located on multiple clusters, to dynamically adapt to infrastructure changes, and to perform transparent cluster migrations and failover.
by
Mason Chen
One sink to rule them all: Introducing the new Async SinkFlink Forward
Flink Forward San Francisco 2022.
Next time you want to integrate with a new destination for a demo, concept or production application, the Async Sink framework will bootstrap development, allowing you to move quickly without compromise. In Flink 1.15 we introduced the Async Sink base (FLIP-171), with the goal to encapsulate common logic and allow developers to focus on the key integration code. The new framework handles things like request batching, buffering records, applying backpressure, retry strategies, and at least once semantics. It allows you to focus on your business logic, rather than spending time integrating with your downstream consumers. During the session we will dive deep into the internals to uncover how it works, why it was designed this way, and how to use it. We will code up a new sink from scratch and demonstrate how to quickly push data to a destination. At the end of this talk you will be ready to start implementing your own Flink sink using the new Async Sink framework.
by
Steffen Hausmann & Danny Cranmer
Tuning Apache Kafka Connectors for Flink.pptxFlink Forward
Flink Forward San Francisco 2022.
In normal situations, the default Kafka consumer and producer configuration options work well. But we all know life is not all roses and rainbows and in this session we’ll explore a few knobs that can save the day in atypical scenarios. First, we'll take a detailed look at the parameters available when reading from Kafka. We’ll inspect the params helping us to spot quickly an application lock or crash, the ones that can significantly improve the performance and the ones to touch with gloves since they could cause more harm than benefit. Moreover we’ll explore the partitioning options and discuss when diverging from the default strategy is needed. Next, we’ll discuss the Kafka Sink. After browsing the available options we'll then dive deep into understanding how to approach use cases like sinking enormous records, managing spikes, and handling small but frequent updates.. If you want to understand how to make your application survive when the sky is dark, this session is for you!
by
Olena Babenko
Flink powered stream processing platform at PinterestFlink Forward
Flink Forward San Francisco 2022.
Pinterest is a visual discovery engine that serves over 433MM users. Stream processing allows us to unlock value from realtime data for pinners. At Pinterest, we adopt Flink as the unified streaming processing engine. In this talk, we will share our journey in building a stream processing platform with Flink and how we onboarding critical use cases to the platform. Pinterest has supported 90+near realtime streaming applications. We will cover the problem statement, how we evaluate potential solutions and our decision to build the framework.
by
Rainie Li & Kanchi Masalia
Flink Forward San Francisco 2022.
This talk will take you on the long journey of Apache Flink into the cloud-native era. It started all the way from where Hadoop and YARN were the standard way of deploying and operating data applications.
We're going to deep dive into the cloud-native set of principles and how they map to the Apache Flink internals and recent improvements. We'll cover fast checkpointing, fault tolerance, resource elasticity, minimal infrastructure dependencies, industry-standard tooling, ease of deployment and declarative APIs.
After this talk you'll get a broader understanding of the operational requirements for a modern streaming application and where the current limits are.
by
David Moravek
Where is my bottleneck? Performance troubleshooting in FlinkFlink Forward
Flinkn Forward San Francisco 2022.
In this talk, we will cover various topics around performance issues that can arise when running a Flink job and how to troubleshoot them. We’ll start with the basics, like understanding what the job is doing and what backpressure is. Next, we will see how to identify bottlenecks and which tools or metrics can be helpful in the process. Finally, we will also discuss potential performance issues during the checkpointing or recovery process, as well as and some tips and Flink features that can speed up checkpointing and recovery times.
by
Piotr Nowojski
Using the New Apache Flink Kubernetes Operator in a Production DeploymentFlink Forward
Flink Forward San Francisco 2022.
Running natively on Kubernetes, using the new Apache Flink Kubernetes Operator is a great way to deploy and manage Flink application and session deployments. In this presentation, we provide: - A brief overview of Kubernetes operators and their benefits. - Introduce the five levels of the operator maturity model. - Introduce the newly released Apache Flink Kubernetes Operator and FlinkDeployment CRs - Dockerfile modifications you can make to swap out UBI images and Java of the underlying Flink Operator container - Enhancements we're making in: - Versioning/Upgradeability/Stability - Security - Demo of the Apache Flink Operator in-action, with a technical preview of an upcoming product using the Flink Kubernetes Operator. - Lessons learned - Q&A
by
James Busche & Ted Chang
Flink Forward San Francisco 2022.
The Table API is one of the most actively developed components of Flink in recent time. Inspired by databases and SQL, it encapsulates concepts many developers are familiar with. It can be used with both bounded and unbounded streams in a unified way. But from afar it can be difficult to keep track of what this API is capable of and how it relates to Flink's other APIs. In this talk, we will explore the current state of Table API. We will show how it can be used as a batch processor, a changelog processor, or a streaming ETL tool with many built-in functions and operators for deduplicating, joining, and aggregating data. By comparing it to the DataStream API we will highlight differences and elaborate on when to use which API. We will demonstrate hybrid pipelines in which both APIs interact with one another and contribute their unique strengths. Finally, we will take a look at some of the most recent additions as a first step to stateful upgrades.
by
David Andreson
Flink Forward San Francisco 2022.
Based on the new Flink-Pulsar connector, we implemented Flink's TableAPI and Catalog to help users to interact with the Pulsar cluster via Flink SQL easily. We would like to go through the design and implementation of the SQL connector in the following aspects:
1. Two different modes of use Pulsar as a metadata store
2. Data format transformation and management
3. SQL semantics support within Pulsar context
by
Sijie Guo & Neng Lu
Dynamic Rule-based Real-time Market Data AlertsFlink Forward
Flink Forward San Francisco 2022.
At Bloomberg, we deal with high volumes of real-time market data. Our clients expect to be notified of any anomalies in this market data, which may indicate volatile movements in the markets, notable trades, forthcoming events, or system failures. The parameters for these alerts are always evolving and our clients can update them dynamically. In this talk, we'll cover how we utilized the open source Apache Flink and Siddhi SQL projects to build a distributed, scalable, low-latency and dynamic rule-based, real-time alerting system to solve our clients' needs. We'll also cover the lessons we learned along our journey.
by
Ajay Vyasapeetam & Madhuri Jain
Exactly-Once Financial Data Processing at Scale with Flink and PinotFlink Forward
Flink Forward San Francisco 2022.
At Stripe we have created a complete end to end exactly-once processing pipeline to process financial data at scale, by combining the exactly-once power from Flink, Kafka, and Pinot together. The pipeline provides exactly-once guarantee, end-to-end latency within a minute, deduplication against hundreds of billions of keys, and sub-second query latency against the whole dataset with trillion level rows. In this session we will discuss the technical challenges of designing, optimizing, and operating the whole pipeline, including Flink, Kafka, and Pinot. We will also share our lessons learned and the benefits gained from exactly-once processing.
by
Xiang Zhang & Pratyush Sharma & Xiaoman Dong
Processing Semantically-Ordered Streams in Financial ServicesFlink Forward
Flink Forward San Francisco 2022.
What if my data is already in order? Stream Processing has given us an elegant and powerful solution for running analytic queries and logic over high volumes of continuously arriving data. However, in both Apache Flink and Apache Beam, the notion of time-ordering is baked in at a very low level, making it difficult to express computations that are interested in a semantic-, rather than time-ordering of the data. In financial services, what often matters the most about the data moving between systems is not when the data was created, but in what order, to the extent that many institutions engineer a global sequencing over all data entering and produced by their systems to achieve complete determinism. How, then, can financial institutions and others best employ Stream Processing on streams of data that are already ordered? I will cover various techniques that can make this work, as well as seek input from the community on how Flink might be improved to better support these use-cases.
by
Patrick Lucas
Tame the small files problem and optimize data layout for streaming ingestion...Flink Forward
Flink Forward San Francisco 2022.
In modern data platform architectures, stream processing engines such as Apache Flink are used to ingest continuous streams of data into data lakes such as Apache Iceberg. Streaming ingestion to iceberg tables can suffer by two problems (1) small files problem that can hurt read performance (2) poor data clustering that can make file pruning less effective. To address those two problems, we propose adding a shuffling stage to the Flink Iceberg streaming writer. The shuffling stage can intelligently group data via bin packing or range partition. This can reduce the number of concurrent files that every task writes. It can also improve data clustering. In this talk, we will explain the motivations in details and dive into the design of the shuffling stage. We will also share the evaluation results that demonstrate the effectiveness of smart shuffling.
by
Gang Ye & Steven Wu
Batch Processing at Scale with Flink & IcebergFlink Forward
Flink Forward San Francisco 2022.
Goldman Sachs's Data Lake platform serves as the firm's centralized data platform, ingesting 140K (and growing!) batches per day of Datasets of varying shape and size. Powered by Flink and using metadata configured by platform users, ingestion applications are generated dynamically at runtime to extract, transform, and load data into centralized storage where it is then exported to warehousing solutions such as Sybase IQ, Snowflake, and Amazon Redshift. Data Latency is one of many key considerations as producers and consumers have their own commitments to satisfy. Consumers range from people/systems issuing queries, to applications using engines like Spark, Hive, and Presto to transform data into refined Datasets. Apache Iceberg allows our applications to not only benefit from consistency guarantees important when running on eventually consistent storage like S3, but also allows us the opportunity to improve our batch processing patterns with its scalability-focused features.
by
Andreas Hailu
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
Accelerate your Kubernetes clusters with Varnish CachingThijs Feryn
A presentation about the usage and availability of Varnish on Kubernetes. This talk explores the capabilities of Varnish caching and shows how to use the Varnish Helm chart to deploy it to Kubernetes.
This presentation was delivered at K8SUG Singapore. See https://feryn.eu/presentations/accelerate-your-kubernetes-clusters-with-varnish-caching-k8sug-singapore-28-2024 for more details.
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Tobias Schneck
As AI technology is pushing into IT I was wondering myself, as an “infrastructure container kubernetes guy”, how get this fancy AI technology get managed from an infrastructure operational view? Is it possible to apply our lovely cloud native principals as well? What benefit’s both technologies could bring to each other?
Let me take this questions and provide you a short journey through existing deployment models and use cases for AI software. On practical examples, we discuss what cloud/on-premise strategy we may need for applying it to our own infrastructure to get it to work from an enterprise perspective. I want to give an overview about infrastructure requirements and technologies, what could be beneficial or limiting your AI use cases in an enterprise environment. An interactive Demo will give you some insides, what approaches I got already working for real.
Search and Society: Reimagining Information Access for Radical FuturesBhaskar Mitra
The field of Information retrieval (IR) is currently undergoing a transformative shift, at least partly due to the emerging applications of generative AI to information access. In this talk, we will deliberate on the sociotechnical implications of generative AI for information access. We will argue that there is both a critical necessity and an exciting opportunity for the IR community to re-center our research agendas on societal needs while dismantling the artificial separation between the work on fairness, accountability, transparency, and ethics in IR and the rest of IR research. Instead of adopting a reactionary strategy of trying to mitigate potential social harms from emerging technologies, the community should aim to proactively set the research agenda for the kinds of systems we should build inspired by diverse explicitly stated sociotechnical imaginaries. The sociotechnical imaginaries that underpin the design and development of information access technologies needs to be explicitly articulated, and we need to develop theories of change in context of these diverse perspectives. Our guiding future imaginaries must be informed by other academic fields, such as democratic theory and critical theory, and should be co-developed with social science scholars, legal scholars, civil rights and social justice activists, and artists, among others.
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Ramesh Iyer
In today's fast-changing business world, Companies that adapt and embrace new ideas often need help to keep up with the competition. However, fostering a culture of innovation takes much work. It takes vision, leadership and willingness to take risks in the right proportion. Sachin Dev Duggal, co-founder of Builder.ai, has perfected the art of this balance, creating a company culture where creativity and growth are nurtured at each stage.
Epistemic Interaction - tuning interfaces to provide information for AI supportAlan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf91mobiles
91mobiles recently conducted a Smart TV Buyer Insights Survey in which we asked over 3,000 respondents about the TV they own, aspects they look at on a new TV, and their TV buying preferences.
Transcript: Selling digital books in 2024: Insights from industry leaders - T...BookNet Canada
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
Knowledge engineering: from people to machines and back
Flink Forward Berlin 2018: Yonatan Most & Avihai Berkovitz - "Anomaly Detection Engine for Cloud Activities using Flink"
1. Anomaly Detection Engine for
Cloud Activities using Flink
Flink Forward Berlin 2018
Yonatan Most & Avihai Berkovitz
Microsoft Cloud App Security
2. Microsoft Cloud App Security
Discover and
assess risks
Control access
in real time
Detect
threats
Protect your
information
Identify cloud apps on your
network, gain visibility into shadow
IT, and get risk assessments and
ongoing analytics.
Manage and limit cloud app
access based on conditions and
session context, including user
identity, device, and location.
Identify high-risk usage and
detect unusual behavior using
Microsoft threat intelligence
and research.
Get granular control over data
and use built-in or custom
policies for data sharing and
data loss prevention.
Threat detection: Microsoft Intelligent Security Graph, Office ATP
Information Protection: Office 365 & Azure Information Protection
Identity: Azure AD and Conditional Access
To your cloud appsExtend Microsoft security
+ more
4. • The primary data type in the product
• 16,000+ supported SaaS applications
• Dozens of activity types (logon, upload, export, create user…)
• Locations, devices, target objects, impersonated users…
• Out of order by up to 24 hours!
Cloud activities
6. • Goal: detect anomalous user behavior and alert the admin
• The core flow is:
• Analyze all the activities and extract security-oriented insights
• Maintain a behavioral model for every user, and update it inline
• Detect outliers, suspicious behavior or potentially malicious activity
• Cross-reference with previous alerts and other users, to prevent false positives and “alert fatigue”
• Reach a decision and raise an alert within seconds
Activity-based threat protection
7. • Scalability
• Fault tolerance & recovery
• Real time processing
• Short time from ingestion to
detection
• Extendibility
Detection engine requirements
• Support for massive state size
• State persistency
• State consistency
• Fast, parallel access and update of
state
9. • We needed a stateful stream processing framework
• We really didn’t want to build one in-house
• We tested 10 frameworks
• Azure ML, Azure Stream Analytics, Microsoft Orleans, Apache Storm, Apache Samza, Apache Spark streaming, Apache Ignite, Apache Beam…
• We chose Flink!
We embarked on a search for a framework
10. Anubis
• Our anomaly detection engine
• A single Flink job running the entire flow
• Ingests activities
• Outputs alerts
11. • 8 clusters across multiple datacenters
• Our largest cluster:
• 40 machines
• 25,000 events per second
• 1.3 TB of state
Anubis
13. Anubis scalability
Scalability in… Requires…
Event rate per cluster Increase cluster as needed (stability cost)
Event rate per user group
Key by user where possible
Special treatment of group-keyed operators
Event rate per user Capping the user event rate
Features models and detections
Key by feature / model / detection +
increase cluster as needed
Number of clusters Easy cluster management
14. • Multiple clusters in multiple datacenters
• Custom standalone cluster setup
• Automation scripts
• Machine setup
• Flink version upgrade
• Job deployment and upgrade
• Kubernetes-based deployment solution is in progress
Flink @ Microsoft - cluster
15. • We use Azure EventHubs as sources and sinks
• We use Azure Blob Storage for state backend
Flink @ Microsoft – connectors
16. • The job and its state should last forever
• We also deploy new versions frequently and update the state objects
• We created a custom versioned framework based on Kryo
Flink @ Microsoft – upgrades
17. • Custom wrappers for operators, state access, serializers, collectors
• Adding log metadata about current operator, context, timing
• Adding metrics using the built-in framework
• Around 15,000 metrics per process!
• Everything flows to Splunk
• Investigation
• Dashboards
• Alerts
Flink @ Microsoft – monitoring
24. • Two other Flink jobs already in production
• Another two in development
• Kubernetes-based deployment solution is in progress
• Helping other groups within Microsoft build Flink jobs
What’s next?