Apache Druid: The Foundation of Fortune 500 “Analytical Decision-Making”

This document discusses Apache Druid, an open-source distributed real-time analytics database. It summarizes Druid's evolution, architecture, use cases, and how companies use it. The document outlines Druid's ability to handle large, high-dimensional datasets with sub-second queries and discusses its core components like segments for efficient storage and parallelism. It concludes by inviting the reader to join the Druid community.

Who Am I
Rommel Garcia
Director, Field Engineering @Imply
Author: Virtualizing Hadoop
10+ years: distributed systems, big data, security, cloud, GPU

Agenda
• Evolution of analytic platforms
• Yet, decision makers want more
• The technical challenges
• Apache Druid: The Genesis
• Architecture
• Real-time Use Cases
• Powered by Druid
• Join the community!

Yet, decision makers want more
They still have problems to solve:
• can’t get data fast enough
• interacting with data instantly is tough
• large amounts of data to slice and dice and drill down into
• need to make decisions now

The technical challenges
• Scale: when data is large, we need a lot of servers
• Speed: aiming for sub-second response time
• Complexity: too many fine-grained combinations to precompute
• High dimensionality: 10s or 100s of dimensions
• Concurrency: many users and tenants
• Freshness: load from streams
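
The complexity and high-dimensionality points above can be made concrete with a little arithmetic. This is a minimal sketch, assuming the classic data-cube model in which every subset of dimensions is a separate group-by that would need precomputing. The function name is illustrative and not part of Druid.

```python
def groupby_combinations(num_dimensions: int) -> int:
    """Number of distinct dimension subsets (group-bys) in a full data cube."""
    # Each dimension is either included in a group-by or left out: 2^d subsets.
    return 2 ** num_dimensions


for d in (10, 20, 100):
    print(f"{d} dimensions -> {groupby_combinations(d):,} possible group-bys")
```

Even before counting the distinct values inside each dimension, the number of group-bys doubles with every added dimension, which is why precomputing every answer stops being practical at tens or hundreds of dimensions.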

Apache Druid: The Genesis
Vadim Ogievetsky, Gian Merlino, Fangjin Yang

Segment
▸ Highly optimized storage unit
▸ Highly compressed bitmap indexes
▸ 150MB - 700MB size
▸ Determines parallelism
▸ Read in memory
▸ No contention between reads and writes
▸ 10x - 75x storage space savings
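
To illustrate why bitmap indexes make filtering fast, here is a toy sketch. It assumes one bitmap per distinct value of a dimension column and uses plain Python integers as uncompressed bitsets. A real segment stores compressed bitmaps, and the column names, rows, and helper functions below are hypothetical.

```python
from collections import defaultdict


def build_bitmap_index(values):
    """Map each distinct value to an int whose bit i is set when row i has that value."""
    index = defaultdict(int)
    for row, value in enumerate(values):
        index[value] |= 1 << row
    return dict(index)


def matching_rows(bitmap):
    """Return the row numbers whose bits are set in the bitmap."""
    rows = []
    row = 0
    while bitmap:
        if bitmap & 1:
            rows.append(row)
        bitmap >>= 1
        row += 1
    return rows


# Hypothetical rows of a tiny table with two dimension columns.
country = ["US", "CA", "US", "DE", "US"]
device = ["ios", "ios", "android", "ios", "ios"]

country_index = build_bitmap_index(country)
device_index = build_bitmap_index(device)

# A filter like country = 'US' AND device = 'ios' becomes one bitwise AND.
hits = country_index["US"] & device_index["ios"]
print(matching_rows(hits))
```

The filter is resolved without scanning the rows themselves: only the two bitmaps are combined, and the set bits of the result identify the rows to read.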

Real-time Use Cases
• Quality of experience
• Increasing production yield
• Cost to serve
• Pricing optimization
• Ad campaign performance
• Customer behavior analysis
• Netflow performance
• APM
• Security

Powered by Druid
Source: http://druid.io/druid-powered.html

Join the community
Druid community site (current): http://druid.io/
Druid community site (new): https://druid.apache.org/
Imply distribution: https://imply.io/get-started

Rommel Garcia, rommel.garcia@imply.io

Try this at home
