Title: Druid: under the covers
Speaker: Peter Marshall (https://linkedin.com/in/amillionbytes/)
Date: Tuesday, January 28, 2020
Event: https://meetup.com/Athens-Big-Data/events/266900242/
2. Peter Marshall
Field Engineering
https://www.linkedin.com/in/amillionbytes
peter.marshall@imply.io
20 years of enterprise architecture experience: CRM, EDRM, ERP, EIP, Digital Services, Security, BI, Analytics, and MDM. TOGAF certified, with a BA (Hons) degree in Theology (!) and Computer Studies from the University of Birmingham in the United Kingdom.
4. Mindful
I want to use my instinct and skills to make decisions while being grounded in the knowledge of my troop’s experiences in the past and their critique of my conclusions.
Digital
My world creates, and is powered by, data at a massive scale. I have hundreds of apps on tens of devices, each one for different uses.
Aware
Data is powerful. Limitations, whether in breadth, depth, or timeliness, for me or for my troop, will very quickly turn into a personal concern about liberty.
Agile
If I don’t like what you’ve got, I will go to someone else from somewhere else.
6. conversational experience
complete picture
fresh, frequent, and fast
ad-hoc, arbitrary slice-and-dice
responsive user interface
real-time, event-based
clean, enriched, joined, and normalised
many dimensions with many values likely
multiple and varied statistical metrics
voluminous & valuable
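The “ad-hoc, arbitrary slice-and-dice” requirement maps naturally onto Druid SQL, which the broker exposes over HTTP. A minimal, hedged sketch of building such a query request with the standard library only; the broker address, datasource name, and column names here are all hypothetical placeholders, not something from the talk:

```python
import json
from urllib import request

# An ad-hoc slice-and-dice query: filter on one dimension,
# group by another, aggregate a metric. The datasource
# "app_events" and its columns are hypothetical.
sql = """
SELECT "country", COUNT(*) AS events, SUM("bytes") AS total_bytes
FROM "app_events"
WHERE "__time" >= CURRENT_TIMESTAMP - INTERVAL '1' HOUR
  AND "device_type" = 'mobile'
GROUP BY "country"
ORDER BY events DESC
LIMIT 10
"""

# Druid's SQL endpoint accepts a JSON body with the query string.
payload = json.dumps({"query": sql}).encode("utf-8")
req = request.Request(
    "http://broker:8082/druid/v2/sql",  # hypothetical broker address
    data=payload,
    headers={"Content-Type": "application/json"},
)
# Uncomment when a broker is actually reachable:
# with request.urlopen(req) as resp:
#     print(json.load(resp))
```

Because every dimension is indexed, changing the filter or the GROUP BY column is just a new query string; no pre-aggregation pipeline has to be rebuilt.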
11. Largest wireless carrier in South Korea, with 50m+ connections
2 billion events per week just from mobile apps, 60TB of data per day from server infrastructure, 720M records per day, 24 hours a day
Dashboarding and visualizations, including geospatial via Lucene and GeoTools, built on top
Queries by Data Scientists through Jupyter using PyDruid
https://www.slideshare.net/kyungtaak/2019-strata-self-sevice-bi-meets-geospatial-analysis
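A PyDruid call from a notebook ultimately serialises to a native JSON query posted to the broker. A stdlib-only sketch of roughly what such a timeseries query looks like; the datasource, metric names, and interval are assumptions for illustration, not taken from the talk:

```python
import json

# Native Druid timeseries query, roughly what a PyDruid
# client.timeseries(...) call would serialise. The datasource
# "mobile_events" and the metric names are hypothetical.
query = {
    "queryType": "timeseries",
    "dataSource": "mobile_events",
    "granularity": "hour",
    "intervals": ["2020-01-27/2020-01-28"],
    "aggregations": [
        {"type": "longSum", "name": "events", "fieldName": "count"},
    ],
}

body = json.dumps(query)
# POST this body to the broker's native query endpoint,
# e.g. http://broker:8082/druid/v2 (not shown here).
print(body)
```

Working at this layer is rarely necessary in practice; PyDruid returns the same results directly into a pandas-friendly structure for the data scientists mentioned above.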
12. 23 billion records per day, with more sources and greater volumes coming (50+ billion…)
Enrichment and session windowing through Apache Flink etc., feeding into Kafka and flowing into Druid for use by BI and user apps
https://www.youtube.com/watch?v=zO1Lw7QVwFw
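The Kafka-to-Druid leg of a pipeline like that is configured with a supervisor spec posted to the Overlord. A trimmed sketch of such a spec, built as a Python dict; the topic, datasource, column names, and broker address are all hypothetical, and the real schema has many more options (see the Druid Kafka ingestion docs):

```python
import json

# Minimal Kafka ingestion supervisor spec. Every name here
# (topic, datasource, columns, brokers) is a placeholder.
supervisor_spec = {
    "type": "kafka",
    "spec": {
        "dataSchema": {
            "dataSource": "sessions",
            "timestampSpec": {"column": "ts", "format": "iso"},
            "dimensionsSpec": {"dimensions": ["user_id", "page", "country"]},
            "granularitySpec": {
                "type": "uniform",
                "segmentGranularity": "HOUR",
                "queryGranularity": "MINUTE",
            },
        },
        "ioConfig": {
            "topic": "enriched-sessions",
            "consumerProperties": {"bootstrap.servers": "kafka:9092"},
            "taskCount": 1,
        },
        "tuningConfig": {"type": "kafka"},
    },
}

# POST this JSON to the Overlord at /druid/indexer/v1/supervisor
# (not shown); Druid then runs the ingestion tasks continuously.
print(json.dumps(supervisor_spec, indent=2))
```

Once submitted, the supervisor manages the Kafka consumer tasks itself, which is what makes the Flink-to-Kafka-to-Druid flow on this slide hands-off in steady state.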
13. Growth from 34TB to 60PB, with 700 billion (1 trillion peak) events every day, into their S3 data warehouse, covering all subscriber activities: logging in, content watched, play/stop...
Druid provides more dimensions and longer data retention, plus instant statistical slice and dice.
AWS capacity planning, A/B testing, payments analysis, algorithm comparison, security, and quality of experience
https://www.youtube.com/watch?v=Qvhqe4yUKpw
14. Sensors generate billions of streaming events
Druid, Spark, and Kafka embedded
Tetration eases configuration, security hardening, data centre migrations, and DevOps with machine and human analytics
https://www.networkworld.com/article/3086250/under-the-hood-of-cisco-s-tetration-analytics-platform.html
https://www.youtube.com/watch?v=UL3oywg0o58
16. Manages data for over 75,000 game developers
Lets studios see metrics in real time
Allows for multi-dimensional filtering
Ingests 10 billion events per day
Opened up more opportunities to build advanced products
https://www.slideshare.net/secret/fqnmIUXdDkrB7d
20. Central bank responsible for stability, exchange rates, and payment systems
Needs to monitor market liquidity in real time and spot risks to inform policy decisions
Used Druid to combine historical and real-time data and provide up-to-the-minute dashboards and a queryable data set
https://www.youtube.com/watch?v=o0JsVwFEdtA
“Observing Intraday Indicators Using Real-Time Tick Data on Apache Superset and Druid”