Gian will offer his reflections on the Druid journey to date, plus describe his vision for what Druid will become. He will lay out the near-term Druid roadmap and take your questions.
Watch video: https://imply.io/virtual-druid-summit/apache-druid-vision-and-roadmap-gian-merlino
Apache Druid ingests many billions of events and enables instant queries on them in real time. But how? In this talk, each component of an Apache Druid cluster is described, along with the data and query optimisations at its core, to show what unlocks fresh, fast data for all.
Apache Druid®: A Dance of Distributed Processes – Imply
Apache Druid® is an open source analytics database powering fresh, fast analytics in companies from AirBnB to Zeotap on clickstream, telemetry, financial transactions, applications and more. In this talk, we open the box on the three distributed processes in Druid led by the coordinator, overlord, and broker, and the ways that these come together to deliver reliable, performant query, ingestion, and management services.
Building a Real-Time Gaming Analytics Service with Apache Druid – Imply
At GameAnalytics we receive and process real-time behavioural data from more than 100 million daily active users, helping thousands of game studios and developers understand user behaviour and improve their games. In this talk, you will learn how we migrated our legacy backend from an in-house streaming analytics service to Apache Druid, and the lessons learned along the way. By adopting Druid, we have been able to reduce development costs, increase the reliability of our systems, and implement new features that would not have been possible with our old stack. We will provide an overview of our approach to schema design, segment optimization, the creation of our query layer, caching, and datasource optimisation, which can help you understand how to successfully use Druid as a key component in your data processing and reporting infrastructure.
MoPub, a Twitter company, provides monetization solutions for mobile app publishers and developers around the globe. MoPub receives over 33 billion ad requests per day, generating more than 200 TB of raw logs daily. We built MoPub Analytics on Druid and Imply as the analytics platform for our end users: publishers, demand-side partners, and internal users.
We will talk about the architecture of the analytics platform, our Druid cluster setup, hardware choices, monitoring, use cases, limiting factors, and the challenges with lookups along with the solutions we adopted.
Watch video: https://imply.io/virtual-druid-summit/analytics-over-terabytes-of-data-at-twitter-apache-druid
Data Analytics and Processing at Snap – Druid Meetup LA – September 2018 – Charles Allen
Charles Allen covers data processing, analytics, and insights systems at Snap. Strengths of Druid for these use cases are called out, as are differences among some of the processing systems used.
This is the slide collection from the second talk at:
https://www.meetup.com/druidio-la/events/254080924/
Archmage, Pinterest’s Real-time Analytics Platform on Druid – Imply
In this talk, we will cover:
1) the motivation for switching from an HBase-backed analytics system to Druid
2) the architecture of Druid as a platform at Pinterest (Archmage, Hadoop, Kafka), including the query interface. Archmage is a Thrift service in front of Druid that exposes a Thrift API to company-wide clients, handles Druid broker host discovery, serves as a relay to broker hosts to abstract away the async HTTP connection, and provides query optimizations that are transparent to clients, including directly translating fixed-pattern SQL to Druid native JSON queries to save planning time (a sketch of this translation follows the list below). In addition, we’ll cover the production Hadoop batch and Kafka real-time ingestion pipeline setup, and why we picked a pull-based rather than a push-based solution for real-time ingestion.
3) the use cases currently running in production on this platform, including their data volume, QPS, and Druid cluster setup; the unique challenges we met while onboarding, how we addressed them with extensive tuning to meet SLAs, and the lessons learned. The use cases include partner insights, which provides partners with stats on organic pins; real-time spam detection, which catches anomalous user-login events and pin-related spam such as pin creation and repins; and migrating the backend for ads experiment analysis from Presto to Druid.
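To make the fixed-pattern translation concrete, here is a minimal sketch in Python of the idea (not Pinterest's actual code; the datasource, column names, and SQL pattern are hypothetical, while the native query shape is standard Druid):

```python
import json
import re

# Hypothetical fixed SQL pattern: daily event counts for one partner.
# A real relay like Archmage would keep a small catalog of such patterns.
PATTERN = re.compile(
    r"SELECT\s+COUNT\(\*\)\s+FROM\s+pins\s+WHERE\s+partner_id\s*=\s*'(\w+)'",
    re.IGNORECASE,
)

def sql_to_native(sql: str) -> dict:
    """Translate one fixed-pattern SQL query straight into a Druid
    native timeseries query, skipping SQL planning entirely."""
    match = PATTERN.match(sql.strip())
    if match is None:
        raise ValueError("query does not match a known fixed pattern")
    return {
        "queryType": "timeseries",
        "dataSource": "pins",                    # hypothetical datasource
        "granularity": "day",
        "intervals": ["2020-01-01/2020-02-01"],  # would come from the request
        "filter": {
            "type": "selector",
            "dimension": "partner_id",
            "value": match.group(1),
        },
        "aggregations": [{"type": "count", "name": "count"}],
    }

print(json.dumps(sql_to_native(
    "SELECT COUNT(*) FROM pins WHERE partner_id = 'p123'"), indent=2))
```

The payoff is that a query matching a known pattern goes straight to the brokers as native JSON, bypassing SQL planning on every request.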
Why data warehouses cannot support hot analytics – Imply
Check out the full webinar: https://imply.io/videos/why-data-warehouses-cannot-support-hot-analytics
Today’s data warehouses - whether traditional, specialized or cloud-based - are good at supporting cold analytics, such as reporting, where query times can take minutes. But they cannot cost-effectively support hot analytics—interactive ad hoc analytics usually performed by larger groups of users against batch or streaming data. Examples of hot analytics include clickstream analytics; service, network and application performance monitoring; and risk analytics.
Data warehouses struggle with hot analytics use cases because they are too slow, unable to scale, or too expensive. Learn how a new class of real-time data platforms overcome these limitations, and how companies implement a “temperature-based” approach to analytics.
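As a hedged illustration of what a temperature-based layout can mean in Druid specifically: coordinator retention rules can keep fresh data on a fast "hot" Historical tier and age older data onto cheaper nodes. A minimal Python sketch (the tier name, datasource, and host are hypothetical; the rules API is standard Druid):

```python
import json
import urllib.request

# Illustrative retention rules: the last week of data gets two replicas on
# a fast "hot" Historical tier; everything older keeps a single replica on
# the default tier. Tiers are just Historical processes started with
# druid.server.tier set to the matching name.
rules = [
    {"type": "loadByPeriod", "period": "P7D",
     "tieredReplicants": {"hot": 2}},
    {"type": "loadForever",
     "tieredReplicants": {"_default_tier": 1}},
]

req = urllib.request.Request(
    "http://localhost:8081/druid/coordinator/v1/rules/clickstream",
    data=json.dumps(rules).encode(),
    headers={"Content-Type": "application/json"},
)
urllib.request.urlopen(req)
```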
One of the most popular use cases for Apache Druid is building data applications. Data applications exist to put data into the hands of everyone in a business, and teams use them to make faster, better decisions. To fulfill this role, they need to support granular drill-down, because the devil is in the details, but they also need to be extremely fast, because otherwise people won't use them!
In this talk, Gian Merlino will cover:
*The unique technical challenges of powering data-driven applications
*What attributes of Druid make it a good platform for data applications
*Some real-world data applications powered by Druid
Apache Druid ingests many billions of events and enables instant queries on them in real time. But how? In this talk, each component of an Apache Druid cluster is described, along with the data and query optimisations at its core, to show what unlocks fresh, fast data for all.
Bio: Peter Marshall (https://linkedin.com/in/amillionbytes/) leads outreach and engineering across Europe for Imply (http://imply.io/), a company founded by the original developers of Apache Druid. He has 20 years of architecture experience in CRM, EDRM, ERP, EIP, Digital Services, Security, BI, Analytics, and MDM. He is TOGAF certified and has a BA (hons) degree in Theology and Computer Studies from the University of Birmingham in the United Kingdom.
Peter Marshall, Technology Evangelist at Imply
Abstract: Apache Druid® can revolutionise business decision-making with a view of the freshest of fresh data in web, mobile, desktop, and data science notebooks. In this talk, we look at key activities to integrate into Apache Druid POCs, discussing common hurdles and signposting to important information.
Bio: Peter Marshall (https://petermarshall.io) is an Apache Druid Technology Evangelist at Imply (http://imply.io/), a company founded by the original developers of Apache Druid. He has 20 years of architecture experience in CRM, EDRM, ERP, EIP, Digital Services, Security, BI, Analytics, and MDM. He is TOGAF certified and has a BA degree in Theology and Computer Studies from the University of Birmingham in the United Kingdom.
How Netflix Uses Druid in Real-time to Ensure a High Quality Streaming Experience – Imply
Ensuring a consistently great Netflix experience while continuously pushing innovative technology updates is no easy feat.
We'll look at how Netflix turns log streams into real-time metrics to provide visibility into how devices are performing in the field, and we'll share some of the lessons learned while optimizing Druid to handle our load.
Check out the webinar: https://imply.io/videos/whats-new-imply-3-3-apache-druid-0-18
The most recent Imply 3.3 release, based on Apache Druid 0.18, brings several major new features, including joins, query laning, and Clarity Alerts. These features increase flexibility during design, improve ingestion performance, and deliver sub-second response times to help accelerate data warehouse and data lake deployments and add real-time analytics in general.
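As a taste of the join support, here is a minimal Python sketch of issuing a Druid SQL join through the standard SQL HTTP endpoint (the datasource and lookup names are made up; the endpoint, request shape, and the lookup's "k"/"v" columns are standard Druid):

```python
import json
import urllib.request

# Hypothetical: enrich a fact datasource with a lookup table. Joins
# arrived in Apache Druid 0.18 for both SQL and native queries.
sql = """
SELECT c.v AS country_name, COUNT(*) AS events
FROM clickstream AS e
INNER JOIN lookup.country_names AS c ON e.country_code = c.k
GROUP BY c.v
ORDER BY events DESC
"""

req = urllib.request.Request(
    "http://localhost:8888/druid/v2/sql",          # Router's SQL endpoint
    data=json.dumps({"query": sql}).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read()))
```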
How TrafficGuard uses Druid to Fight Ad Fraud and Bots – Imply
In this session, TrafficGuard’s Head of Data Science, Raigon Jolly, will discuss how TrafficGuard uses Druid and its partnership with Imply to:
- Provide granular reporting to clients in near-real time
- Monitor rules and concept drift
- Stay ahead of the moving target that is ad fraud
- Facilitate performance tuning and right-sizing infrastructure so our team can focus on innovation of our core product
Nicolas Trésegnie, Chief Architect at SuperAwesome
Abstract: SuperAwesome's mission is to make the internet safer for kids. At the core of SuperAwesome's analytics is Druid. In this talk, we walk through how we run Druid on spot instances. We explain the consequences in terms of cost and reliability, how we managed to build a reliable system despite the risks, and how you could do the same.
Nicolas works as Chief Architect at SuperAwesome, where he looks after the overall architecture of the systems and the infrastructure. He is all about automation and how technology can be used to achieve business goals. Nicolas studied Computer Science and Bioinformatics, and he is now pursuing an MBA at Imperial.
Splunk: Druid on Kubernetes with Druid-operator – Imply
We went through the journey of deploying Apache Druid clusters on Kubernetes (K8s) and created druid-operator (https://github.com/druid-io/druid-operator). This talk introduces the Druid Kubernetes operator, how to use it to deploy Druid clusters, and how it works under the hood. We will also share how we use this operator to deploy Druid clusters at Splunk.
Kubernetes is an open-source system for automating the deployment, scaling, and management of containerized applications. Druid is a complex, stateful distributed system: a Druid cluster consists of multiple services such as the Broker, Historical, Coordinator, Overlord, and MiddleManager, each deployed with multiple replicas. Deploying a single service on K8s requires creating several K8s resources via YAML files, and this multiplies across the many services inside a Druid cluster. Doing it for multiple Druid clusters (dev, staging, and production environments) makes it even more tedious and error-prone.
K8s enables the creation of application-specific extensions, called “Operators”, that combine Kubernetes knowledge with application-specific knowledge (such as Druid's) into a reusable K8s extension that makes deploying complex applications simple.
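For a flavor of what the operator buys you, here is a minimal Python sketch of creating a Druid custom resource with the official Kubernetes Python client. The spec fields shown are illustrative and heavily abbreviated, not a complete working cluster definition; consult the druid-operator examples for real specs:

```python
from kubernetes import client, config

config.load_kube_config()  # or load_incluster_config() inside a pod

# Abbreviated, illustrative Druid custom resource: one resource describes
# the whole cluster, and the operator expands it into the many underlying
# K8s objects (StatefulSets/Deployments, Services, ConfigMaps, ...).
druid_cluster = {
    "apiVersion": "druid.apache.org/v1alpha1",
    "kind": "Druid",
    "metadata": {"name": "tiny-cluster", "namespace": "druid"},
    "spec": {
        "image": "apache/druid:0.18.0",
        # ... common runtime properties and mounts elided ...
        "nodes": {
            "brokers": {"nodeType": "broker", "replicas": 1},
            "historicals": {"nodeType": "historical", "replicas": 2},
            # ... coordinators, overlords, middlemanagers elided ...
        },
    },
}

client.CustomObjectsApi().create_namespaced_custom_object(
    group="druid.apache.org", version="v1alpha1",
    namespace="druid", plural="druids", body=druid_cluster,
)
```

One document per cluster replaces the pile of per-service resources described above.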
A Day in the Life of a Druid Implementor and Druid's Roadmap – Itai Yaffe
Benjamin Hopp (Solutions Architect) @ Imply:
Druid is an emerging standard in the data infrastructure world, designed for high-performance slice-and-dice analytics (“OLAP”-style) on large data sets.
This talk is for you if you’re interested in learning more about pushing Druid’s analytical performance to the limit.
Perhaps you’re already running Druid and are looking to speed up your deployment, or perhaps you aren’t familiar with Druid and are interested in learning the basics.
Some of the tips in this talk are Druid-specific, but many of them will apply to any operational analytics technology stack.
The most important contributor to a fast analytical setup is getting the data model right.
The talk will center around the various choices you can make to prepare your data for the best possible query performance.
We’ll look at some general best practices to model your data before ingestion such as OLAP dimensional modeling (called “roll-up” in Druid), data partitioning, and tips for choosing column types and indexes.
We’ll also look at how more can be less: often, storing copies of your data partitioned, sorted, or aggregated in different ways can speed up queries by reducing the amount of computation needed.
We’ll also look at Druid-specific optimizations that take advantage of approximations, where you can trade accuracy for performance and reduced storage.
You’ll get introduced to Druid’s features for approximate counting, set operations, ranking, quantiles, and more (sketched just below).
And we will finish with the latest and greatest Druid news, including details about the latest roadmap and releases.
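For a flavor of those approximate features, here is a minimal Python sketch of Druid SQL using approximate aggregators (the datasource and columns are hypothetical; the functions and SQL endpoint are standard Druid):

```python
import json
import urllib.request

# Hypothetical "events" datasource with user_id and latency_ms columns.
# The approximate functions trade a small, bounded error for large
# savings in memory, storage, and query time.
sql = """
SELECT
  TIME_FLOOR(__time, 'PT1H') AS hour,
  APPROX_COUNT_DISTINCT(user_id) AS unique_users,
  APPROX_QUANTILE(latency_ms, 0.95) AS p95_latency_ms
FROM events
GROUP BY 1
ORDER BY 1
"""

req = urllib.request.Request(
    "http://localhost:8888/druid/v2/sql",
    data=json.dumps({"query": sql}).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read()))
```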
How To Use Kafka and Druid to Tame Your Router Data (Rachel Pedreschi, Imply ...) – Confluent
Do you know who is knocking on your network’s door? Have new regulations left you scratching your head over how to handle what is happening in your network? Network flow data helps answer many questions across a multitude of use cases, including network security, performance, capacity planning, routing, operational troubleshooting, and more. Today’s streaming data pipelines need tools that can scale to meet the demands of these service providers while continuing to provide responsive answers to difficult questions. In addition to stream processing, data needs to be stored in a redundant, operationally focused database to provide fast, reliable answers to critical questions. Kafka and Druid work together to create such a pipeline (the Druid side of such a pipeline is sketched after the topic list below).
In this talk Eric Graham and Rachel Pedreschi will discuss these pipelines and cover the following topics:
-Network flow use cases and why this data is important.
-Reference architectures from production systems at a major international Bank.
-Why Kafka and Druid and other OSS tools for Network Flows.
-A demo of one such system.
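To ground the Druid half of the pipeline, here is a minimal Python sketch of a Kafka ingestion supervisor spec for a flow-like stream (the topic, columns, and broker address are hypothetical; the supervisor spec shape and Overlord endpoint are standard Druid Kafka indexing):

```python
import json
import urllib.request

# Hypothetical netflow topic: Druid's Kafka indexing service pulls from
# Kafka continuously and publishes queryable segments as it goes.
supervisor_spec = {
    "type": "kafka",
    "spec": {
        "dataSchema": {
            "dataSource": "netflow",
            "timestampSpec": {"column": "timestamp", "format": "iso"},
            "dimensionsSpec": {"dimensions": ["src_ip", "dst_ip", "protocol"]},
            "metricsSpec": [
                {"type": "count", "name": "flows"},
                {"type": "longSum", "name": "bytes", "fieldName": "bytes"},
            ],
            "granularitySpec": {"segmentGranularity": "hour",
                                "queryGranularity": "minute"},
        },
        "ioConfig": {
            "topic": "netflow-events",
            "inputFormat": {"type": "json"},
            "consumerProperties": {"bootstrap.servers": "kafka:9092"},
        },
    },
}

req = urllib.request.Request(
    "http://localhost:8090/druid/indexer/v1/supervisor",  # Overlord API
    data=json.dumps(supervisor_spec).encode(),
    headers={"Content-Type": "application/json"},
)
urllib.request.urlopen(req)
```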
Gregorry Letribot – Druid at Criteo – NoSQL matters 2015 – NoSQLmatters
How do you monitor performance for one of your clients on a specific user segmentation when dealing with billions of events a day? With over 2 billion ads served and 230 TB of data processed a day, we at Criteo have a comprehensive need for an interactive analytics stack. And by interactive, we mean a querying system with dynamic filtering to drill down over multiple dimensions, answering within sub-second latency. This session will take you on our journey with Druid, "an open-source data store designed for real-time exploratory analytics on large data sets". We will explore Druid's architecture and notable concepts, how relevant they are for some use cases, and how it really performs.
Big Data made easy in the era of the Cloud – Demi Ben-Ari
A talk about the ease of using and handling Big Data technologies in the cloud, with Google Cloud Platform, Amazon Web Services, and the tools around them, showing the problems that arise and how to solve them with simple tools.
Infrastructure – a journey from datacentres to cloud – Equal Experts
What is infrastructure, and how do I avoid it forever? Where does the software that runs so much of the world, actually run? In this talk, we look at the terms "infrastructure" and "platform", what they currently mean and how they are built and managed; we rant about how bad a metaphor "The Cloud" is; and we speculate wildly about the future for our servers, our planet and ourselves
Interconnection Automation For All – Extended – MPS 2023 – Chris Grundemann
Matt "Grizz" Griswold and Chris Grundemann are both IX founders, internetworking experts, and automation proponents. With over 4 decades of combined experience they are now turning to sharing what they've learned about automating BGP and interconnection through a set of open source tools, along with support and services for those that need it.
This talk will share what they have learned, both from personal experience and through dozens of recent interviews with IX operators and interconnection engineers over the past several months, including common challenges, productive methodologies, and best practices.
The highlight of the talk will be announcing and describing two open source automation tools built to make interconnection and BGP easier for everyone. One is ixCtl, which is built to automate the most common and problematic tasks involved in running an internet exchange point, particularly configuring and managing secure route servers. The other is PeerCtl, which is built to automate the most common and problematic tasks involved in interconnecting an AS; from bilateral and multilateral peering to PNI and also transit connections.
Code for both (along with several other tools) is available on GitHub: https://github.com/fullctl.
Speaker: Chris Grundemann
Speaker: Matt Griswold
GOAI: GPU-Accelerated Data Science – DataSciCon 2017 – Joshua Patterson
The GPU Open Analytics Initiative (GOAI) is accelerating data science like never before. CPUs are not improving at the same rate as networking and storage, and by leveraging GPUs, data scientists can analyze more data than ever with less hardware. Learn more about how GPUs are accelerating data science (not just deep learning), and how to get started.
Designing a Distributed Cloud Database for Dummies – DataStax
Join Designing a Distributed Cloud Database for Dummies – the webinar. The webinar "stars" industry vet Patrick McFadin, best known among developers for his seven years with Apache Cassandra, where he held pivotal community roles. Register for the webinar today to learn: why you need distributed cloud databases, the technology you need to create the best user experience, the benefits of data autonomy, and much more.
View the recording: https://youtu.be/azC7lB0QU7E
To explore all DataStax webinars: https://www.datastax.com/resources/webinars
Powering Real-Time Big Data Analytics with a Next-Gen GPU Database – Kinetica
Freed from the constraints of storage, network, and memory, many big data analytics systems are now routinely revealing themselves to be compute bound. To compensate, big data analytics systems often sprawl horizontally (300-node Spark or NoSQL clusters are not unusual!) to bring in enough compute for the task at hand. High system complexity and crushing operational costs often result. As the world shifts from physical to virtual assets and methods of engagement, there is an increasing need for systems of intelligence to live alongside the more traditional systems of record and systems of analysis. New approaches to data processing are required to support the real-time processing needed to drive these systems of intelligence.
Join 451 Research and Kinetica to learn:
•An overview of the business and technical trends driving widespread interest in real-time analytics
•Why systems of analysis need to be transformed and augmented with systems of intelligence bringing new approaches to data processing
•How a new class of solution – a GPU-accelerated, scale-out, in-memory database – can bring you orders of magnitude more compute power, a significantly smaller hardware footprint, and unrivaled analytic capabilities.
•Hear how other companies in a variety of industries, such as financial services, entertainment, pharmaceutical, and oil and gas, benefit from augmenting their legacy systems with a modern analytics database.
Learning Rust the Hard Way for a Production Kafka + ScyllaDB Pipeline – ScyllaDB
Numberly operates business-critical data pipelines and applications where failure and latency mean "lost money" in the best-case scenario. Most of their data pipelines and applications are deployed on Kubernetes and rely on Kafka and ScyllaDB, with Kafka acting as the message bus and ScyllaDB as the source of data for enrichment. The availability and latency of both systems are thus very important for the data pipelines. While most of Numberly's applications are developed in Python, they found a need to move high-performance applications to Rust to benefit from a lower-level programming language.
Learn the lessons from Numberly’s experience, including:
- Rationale in selecting a lower-level language
- Developing using a lower-level Rust code base
- Observability and analyzing latency impacts with Rust
- Tuning everything from Apache Avro to driver client settings
- How to build a mission-critical system combining Apache Kafka and ScyllaDB
- Feedback from half a year of Rust in production
Big Data in Action – Real-World Solution Showcase – Inside Analysis
The Briefing Room with Radiant Advisors and IBM
Live Webcast on February 25, 2014
Watch the archive: https://bloorgroup.webex.com/bloorgroup/lsr.php?RCID=53c9b7fa2000f98f5b236747e3602511
The power of Big Data depends heavily upon the context in which it's used, and most organizations are just beginning to figure out where, how and when to leverage it. One key to success is integration with existing information systems, many of which still rely on relational database technologies. Finding ways to blend these two worlds can help companies generate measurable business value in fairly short order.
Register for this episode of The Briefing Room to hear Analysts Lindy Ryan and John O'Brien as they explain how the combination of traditional Business Intelligence with Big Data Analytics can provide game-changing results in today's information economy. They'll be briefed by Eric Poulin and Paul Flach of Stream Integration who will share best practices for designing and implementing Big Data solutions. They'll discuss the components of IBM BigInsights, and explain how BigSheets can empower non-technical users who need to explore self-structured data.
Visit InsideAnalysis.com for more information.
EMFcamp2022 – What if apps logged into you, instead of you logging into apps? – Chris Swan
As a hacker and engineer I've been interested in identity and privacy since the dawn of the Internet and the online services it's enabled. For the past year I've been helping to build and open source The @ Platform, which inverts the usual model by giving everybody (and every thing) their own place to store data and control who (and what) has access to it. This talk will give an overview of the platform and its underlying protocol, and illustrate how it can be used to build privacy preserving apps and Internet connected things. It will also cover how the platform can be self hosted on devices like the Raspberry Pi, and how people can get involved in the open source community growing around it.
"Industrial Internet IoT bootcamp" meetup, 11-5-2015 hosted by GE Digital at HackerDojo. Discussing topics ranging from IoT architecture to connectivity and protocols, cyber security, data science and industrial UX design.
DevOps and data privacy do not need to oppose each other. Rather, they can complement one another.
The automation and audit trails that DevOps processes introduce to database development can ease compliance with data protection regulations and enable organizations to balance the need to deliver software faster with the requirement to protect and preserve personal data.
So how can the promise of releasing changes to the database faster and easier be balanced with the need to keep data safe and remain compliant with legislation?
Redgate’s Data Privacy and Protection Specialist Chris Unwin shows how the answer lies in going one step further than database DevOps and thinking about Compliant Database DevOps:
• Introduce standardized team-based development
• Automate deployments
• Monitor performance and availability
• Protect and preserve data
More than a year of extremely intensive Big Data development, with Hadoop, HBase, MapReduce, and ZooKeeper as key technologies. A new company with established infrastructure that is growing fast. Lots of experience in networking and distributed systems, but a completely new world of enterprise solutions. What tasks does this bring? What issues and traps? What lessons were learned, and what are the near-future tasks? How can an embedded developer enter this new world, and what advantages does he or she have? What challenges should you be ready to face?
Video and slides synchronized, mp3 and slide download available at URL http://bit.ly/2l2Rr6L.
Doug Daniels discusses the cloud-based platform they have built at DataDog and how it differs from a traditional datacenter-based analytics stack. He walks through the decisions they have made at each layer, covers the pros and cons of these decisions and discusses the tooling they have built. Filmed at qconsf.com.
Doug Daniels is a Director of Engineering at Datadog, where he works on high-scale data systems for monitoring, data science, and analytics. Prior to joining Datadog, he was CTO at Mortar Data and an architect and developer at Wireless Generation, where he designed data systems to serve more than 4 million students in 49 states.
Presented on April 14, 2018 at CarolinaCon (https://www.carolinacon.org). This talk will provide a quick overview of honeypots, an explanation of the cyber deception space, and the benefits of implementing deception as part of your cyber defense program. In addition, this talk will highlight the HoneyDB project, which enables anyone to get started operating deception sensors and collecting threat information. Finally, this presentation will describe how I built scalable honeypot sensor collection, employing a "Frankenstein Cloud Architecture", for minimal cost.
As Uber continues to grow, our big data systems need to grow in scalability, reliability, and performance to help Uber make business decisions, give user recommendations, and analyze experiments across all data sources. We put Presto into production in 2016. Presto now serves ~100K queries per day at Uber, and it has become a key component for interactive SQL queries on big data. In this presentation, we talk about our experiences and engineering efforts. We start with a general introduction to Hadoop infrastructure and analytics at Uber, followed by a brief introduction to Presto, the interactive SQL engine for big data. We then focus on how we built the new Parquet reader for Presto and its detailed techniques: columnar reads, lazy reads, and nested column pruning. We will show the performance improvements and Uber's use cases. Finally, we share our ongoing plan and future work for big data analytics at Uber.
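The wins from columnar, lazy, and nested-column-pruned reads are easy to demonstrate even outside Presto. Here is a minimal Python sketch using pyarrow (the file path and schema are hypothetical) of reading only the columns, and only the nested leaves, that a query actually touches:

```python
import pyarrow.parquet as pq

# Columnar read: only the requested columns are decoded from disk.
table = pq.read_table("trips.parquet", columns=["city", "fare"])

# Nested column pruning: dotted paths select individual struct leaves,
# so the rest of the "pickup" struct is never materialized.
nested = pq.read_table("trips.parquet", columns=["pickup.lat", "pickup.lng"])

# Lazy, batch-at-a-time reads let an engine stop early instead of
# materializing the whole file.
rows_seen = 0
for batch in pq.ParquetFile("trips.parquet").iter_batches(
        columns=["fare"], batch_size=65536):
    rows_seen += batch.num_rows
    if rows_seen >= 1_000_000:  # e.g., a LIMIT has been satisfied
        break
```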
Pivot 2.0 – The next generation visualization tool for your streaming data – Imply
We have rearchitected Pivot from the ground up for enhanced dimensional analysis while ensuring that it is even faster, if that was even possible.
Pivot 2.0 has plenty of new ways for you to visualize your data so that you can figure out the complex relationships in your data, plus enhanced comparative analysis to quickly gain insight.
In this webinar, we will walk you through the exciting new features that are coming soon to Pivot.
Zeotap: Data Modeling in Druid for Non-temporal and Nested Data – Imply
Druid has been the production workhorse at Zeotap for the past 2+ years, powering core audience planning across our Connect and Targeting products. Though Druid is best suited to data with time as a dimension, since it partitions data on time first, we have used it to serve ML-powered enhanced insights and estimates of potential dataset sizes, supporting our core business case of audience planning. These are datasets without a timestamp (non-temporal), at high scale, and with nested dimensions (one common way to model non-temporal data is sketched after the outline below). We achieved millisecond-latency retrieval on top of them through nuanced data modelling. The core of the presentation is the data modelling journey behind these use cases, detailing the query access patterns. We also delve into the architecture, covering ingestion into the Druid sink and processing including ML, and finish by going over the production setup and configuration and the performance tunings applied. The presentation covers the following topics:
* Business case in Ad-Tech and Mar-Tech vertical
* Audience Planner Use Case 1 - Insights
-Lambda Architecture and data flow
-Deep dive on data model
-Takeaways
* Audience Planner Use Case 2 - Estimator
-Architecture and data flow
-Stratified sampling explained
-Data model to solve nested data - deep dive
-Takeaways
* Audience Planner Use Case 3 - Skew correction
-Skew correction model
-Query Access
-Data model in Druid to accommodate output from ML models
-Takeaways
* Production setup, config, and tunings
* Production operation experience takeaways
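One common trick for non-temporal data in Druid (a sketch of the general approach, not necessarily Zeotap's exact spec) is to stamp every row with the same constant timestamp so that time partitioning collapses to a single chunk. In a batch ingestion spec that can look like the following fragment; the datasource and dimensions are hypothetical, while timestampSpec's missingValue and the "all" segment granularity are standard Druid:

```python
# Fragment of a Druid batch ingestion dataSchema (illustrative). The input
# has no timestamp column, so "missingValue" stamps every row with the same
# constant instant; combined with segmentGranularity "all", the entire
# datasource lives in one time chunk and queries are driven purely by the
# (possibly nested) dimensions.
data_schema = {
    "dataSource": "audience_estimates",           # hypothetical
    "timestampSpec": {
        "column": "timestamp",                    # absent from the input...
        "missingValue": "2010-01-01T00:00:00Z",   # ...so this constant is used
    },
    "dimensionsSpec": {
        "dimensions": ["segment_id", "country", "platform"],
    },
    "granularitySpec": {
        "segmentGranularity": "all",              # one chunk for everything
        "rollup": True,
    },
}
```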
Nielsen: Casting the Spell – Druid in Practice – Imply
At Nielsen Identity, we leverage Druid to provide our customers with real-time analytics tools for various use cases, including in-flight analytics, reporting, and building target audiences. The common challenge across these use cases is counting distinct elements at scale, in real time. We've been using Druid to solve these problems for the past 4 years and have gained a lot of experience with it.
In this talk, we will share some of the best practices and tips we've gathered over the years (a count-distinct sketch follows the list), including:
*Data modeling
*Ingestion
*Retention and deletion
*Query optimization
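As a hedged sketch of the count-distinct approach (the datasource and columns are hypothetical; the functions are standard Druid SQL from the druid-datasketches extension), Theta sketches make distinct counts approximate but cheap, and support set operations such as intersections:

```python
import json
import urllib.request

# Approximate distinct count of users who appear in BOTH campaigns,
# something exact COUNT(DISTINCT ...) cannot do cheaply at scale.
sql = """
SELECT THETA_SKETCH_ESTIMATE(
  THETA_SKETCH_INTERSECT(
    DS_THETA(user_id) FILTER (WHERE campaign = 'spring_sale'),
    DS_THETA(user_id) FILTER (WHERE campaign = 'summer_sale')
  )
) AS users_in_both
FROM impressions
"""

req = urllib.request.Request(
    "http://localhost:8888/druid/v2/sql",
    data=json.dumps({"query": sql}).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read()))
```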
Maximizing Apache Druid performance: Beyond the basics – Imply
Druid is a powerful real-time database, and part of that power is the level of control you get over cluster configuration, allowing you to get maximum performance for your specific data and query types.
In this talk, Gian Merlino, one of the original authors of Druid and CTO and co-founder of Imply, will walk you through some advanced techniques that can provide a multiplier to your Druid performance. Afterwards, he’ll take your questions about performance, or anything else Druid-related.
Building an Enterprise-Scale Dashboarding/Analytics Platform Powered by the C... – Imply
Target is one of the largest retailers in the United States, with brick-and-mortar stores in all 50 states and one of the most-visited ecommerce sites in the country. In addition to typical merchandising functions like assortment planning, pricing and inventory management, Target also operates a large supply chain, financial/banking operations and property management organizations. As a data-driven organization, we need a data analytics platform that can address the unique needs of each of these various business units, while scaling to hundreds of thousands of users and accommodating an ever-increasing amount of data.
In this talk we’ll cover why Target chose to create our own analytics platform and specifically how Druid makes this platform successful. We’ll cover how we utilize key features in Druid, such as union datasources, arbitrary granularities, real-time ingestion, complex aggregation expressions and lightning-fast query response to provide analytics to users at all levels of the organization. We’ll also cover how Druid’s speed and flexibility allow us to provide interactive analytics to front-line, edge-of-business consumers to address hundreds of unique use-cases across several business units.
As Twitch grew, both the amount of data we received and the number of employees interested in that data grew rapidly. In order to continue empowering decision making as we scaled, we turned to Druid and Imply to provide self-service analytics to both our technical and non-technical staff, allowing them to drill into high-level metrics in lieu of reading generated reports.
In this talk, learn how Twitch implemented a common analytics platform for the needs of many different teams supporting hundreds of users, thousands of queries, and ~5 billion events each day. This session will explain our Druid architecture in detail, including:
-The end-to-end architecture deployed on Amazon that includes Kinesis, RDS, S3, Druid, Pivot and Tableau
-How the data is brought together to deliver a unified view of live customer engagement and historical trends
-Operational best practices we learnt scaling Druid
-An example walk through using the platform
Apache Druid: Lightning Fast Analytics on Real-time and Historical Data (Atla...) – Imply
Talk abstract:
Users are demanding access to large, multi-petabyte, multi-dimension, real-time datasets to answer business critical questions. Providing a self-service interface that meets the performance expectations of these users can be challenging.
Enter Apache Druid: an open source analytics database powering real-time, ad hoc, lightning fast analytics. It is used for clickstream analytics, network telemetry, fraud detection, application monitoring and so much more by companies like Apple, Netflix, Twitter, and AirBnb. Druid can ingest millions of records per second and deliver sub-second response times on OLAP-style slice and dice queries.
In this talk, we will start with an overview of Apache Druid followed by a look at several examples of how Druid is being used in the real-world. We'll finish up with Q&A and some virtual networking.
Speaker Bio:
Mike McLaughlin is a senior field engineer at Imply. He helps customers run and optimize Apache Druid in production. He has 20 years of experience developing, architecting, and deploying software.
Matt Sarrel of Imply draws on his work benchmarking Apache Druid with the Star Schema Benchmark (SSB) and shows how you can performance test Druid with your workload. Virtual meetup of July 16, 2020.
Watch the video: https://www.youtube.com/watch?v=RbwMCy4GsIE
GraphRAG is All You Need? LLM & Knowledge Graph – Guy Korland
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Solutions Apricot) – Jeffrey Haguewood
Sidekick Solutions uses Bonterra Impact Management (fka Social Solutions Apricot) and automation solutions to integrate data for business workflows.
We believe integration and automation are essential to user experience and the promise of efficient work through technology. Automation is the critical ingredient to realizing that full vision. We develop integration products and services for Bonterra Case Management software to support the deployment of automations for a variety of use cases.
This video focuses on the notifications, alerts, and approval requests using Slack for Bonterra Impact Management. The solutions covered in this webinar can also be deployed for Microsoft Teams.
Interested in deploying notification automations for Bonterra Impact Management? Contact us at sales@sidekicksolutionsllc.com to discuss next steps.
JMeter webinar – integration with InfluxDB and Grafana – RTTS
Watch this recorded webinar about real-time monitoring of application performance. See how to integrate Apache JMeter, the open-source leader in performance testing, with InfluxDB, the open-source time-series database, and Grafana, the open-source analytics and visualization application.
In this webinar, we will review the benefits of leveraging InfluxDB and Grafana when executing load tests and demonstrate how these tools are used to visualize performance metrics.
Length: 30 minutes
Session Overview
-------------------------------------------
During this webinar, we will cover the following topics while demonstrating the integrations of JMeter, InfluxDB and Grafana:
- What out-of-the-box solutions are available for real-time monitoring JMeter tests?
- What are the benefits of integrating InfluxDB and Grafana into the load testing stack?
- Which features are provided by Grafana?
- Demonstration of InfluxDB and Grafana using a practice web application
To view the webinar recording, go to:
https://www.rttsweb.com/jmeter-integration-webinar
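For a feel of the wiring, here is a minimal Python sketch of querying the test metrics after a run, assuming JMeter's InfluxDB Backend Listener has been writing results into an InfluxDB 1.x database named "jmeter" (the database, measurement, and field names follow the listener's common defaults but are assumptions; adjust to your setup). Grafana panels chart essentially the same queries:

```python
from influxdb import InfluxDBClient  # pip install influxdb (InfluxDB 1.x client)

client = InfluxDBClient(host="localhost", port=8086, database="jmeter")

# Average response time per minute over the last hour, roughly what a
# Grafana panel on a JMeter dashboard would ask for.
result = client.query(
    'SELECT MEAN("avg") FROM "jmeter" '
    "WHERE time > now() - 1h GROUP BY time(1m) fill(none)"
)
for point in result.get_points():
    print(point)
```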
Transcript: Selling digital books in 2024: Insights from industry leaders - T... – BookNet Canada
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
Key Trends Shaping the Future of Infrastructure – Cheryl Hung
Keynote at DIGIT West Expo, Glasgow on 29 May 2024.
Cheryl Hung, ochery.com
Sr Director, Infrastructure Ecosystem, Arm.
The key trends across hardware, cloud and open-source; exploring how these areas are likely to mature and develop over the short and long-term, and then considering how organisations can position themselves to adapt and thrive.
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality – Inflectra
In this insightful webinar, Inflectra explores how artificial intelligence (AI) is transforming software development and testing. Discover how AI-powered tools are revolutionizing every stage of the software development lifecycle (SDLC), from design and prototyping to testing, deployment, and monitoring.
Learn about:
• The Future of Testing: How AI is shifting testing towards verification, analysis, and higher-level skills, while reducing repetitive tasks.
• Test Automation: How AI-powered test case generation, optimization, and self-healing tests are making testing more efficient and effective.
• Visual Testing: Explore the emerging capabilities of AI in visual testing and how it's set to revolutionize UI verification.
• Inflectra's AI Solutions: See demonstrations of Inflectra's cutting-edge AI tools like the ChatGPT plugin and Azure Open AI platform, designed to streamline your testing process.
Whether you're a developer, tester, or QA professional, this webinar will give you valuable insights into how AI is shaping the future of software delivery.
Accelerate your Kubernetes clusters with Varnish Caching – Thijs Feryn
A presentation about the usage and availability of Varnish on Kubernetes. This talk explores the capabilities of Varnish caching and shows how to use the Varnish Helm chart to deploy it to Kubernetes.
This presentation was delivered at K8SUG Singapore. See https://feryn.eu/presentations/accelerate-your-kubernetes-clusters-with-varnish-caching-k8sug-singapore-28-2024 for more details.
Smart TV Buyer Insights Survey 2024 – 91mobiles
91mobiles recently conducted a Smart TV Buyer Insights Survey in which we asked over 3,000 respondents about the TV they own, aspects they look at on a new TV, and their TV buying preferences.
Elevating Tactical DDD Patterns Through Object Calisthenics – Dorra BARTAGUIZ
After immersing yourself in the blue book and its red counterpart, attending DDD-focused conferences, and applying tactical patterns, you're left with a crucial question: How do I ensure my design is effective? Tactical patterns within Domain-Driven Design (DDD) serve as guiding principles for creating clear and manageable domain models. However, achieving success with these patterns requires additional guidance. Interestingly, we've observed that a set of constraints initially designed for training purposes remarkably aligns with effective pattern implementation, offering a more ‘mechanical’ approach. Let's explore together how Object Calisthenics can elevate the design of your tactical DDD patterns, offering concrete help for those venturing into DDD for the first time!
State of ICS and IoT Cyber Threat Landscape Report 2024 preview – Prayukth K V
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio, using data from Sectrio's cyber threat intelligence farming facilities spread across over 85 cities around the world. In addition, Sectrio runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors and newer malware, including new variants and latent threats at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on counties – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
2. Who am I?
Gian Merlino
Committer & PMC chair at Apache Druid
Cofounder at Imply (we’re hiring!)
3.
Druid Summit 2020
Apache Druid is an independent project of The Apache Software Foundation. More information can be found at https://druid.apache.org.
Apache Druid, Druid, and the Druid logo are either registered trademarks or trademarks of The Apache Software Foundation in the United States and other countries.
talks from...
netflix ✯ twitter ✯ ntt ✯ paypal ✯ cisco ✯ splunk ✯ central bank of turkey ✯ swisscom ✯ dbs ✯ nielsen ✯ lyft ✯ pinterest ✯ unity ✯ target ✯ expedia ✯ outbrain ✯ verizon ✯ confluent ✯ sk telecom ✯ game analytics
plus fundamental or advanced training.
November 2-4, 2020
San Francisco Waterfront Marriott
Early Bird Rates Available
druidsummit.org
6. Druid in the wild
100+ billion rows/day
1+ trillion rows, 1+ year retained
100s of servers
sub-second to few seconds query latency
mix of streaming and batch ingest
10. Thinking of real-time as “hot”
🔥
⏱ 0.1–3s query
🚰 fresh data
🏋♀ high concurrency
🚴 interactive workloads
11. Hot vs. cold
🔥
⏱ 0.1–3s query
🚰 fresh data
🏋♀ high concurrency
🚴 highly interactive
⚙
⏱ slow queries are ok
🚰 less fresh data is ok
🏋♀ low concurrency
🚴 reporting / planning
12. How about “warm”?
🍞
⏱ 5–30s query
🚰 less fresh data is ok
🏋♀ high concurrency
🚴 somewhat interactive
16. Towards Druid 1.0
◆ Coming together of many efforts
◆ Native batch ingestion
◆ New and improved query engines
◆ SQL support
◆ Stay tuned!
17. Stay in touch
@druidio
Join the community
(Mailing lists, Slack, meetups)
https://druid.apache.org/community/
Follow the Druid project on Twitter!
18. Time for questions
@gianmerlino
Thank you!
19.
Register Now for
Druid Summit
November 2-4, 2020
San Francisco, CA
druidsummit.org