THE POWER OF ELASTIC
AGGREGATIONS
DAN FEY
ABOUT DAN FEY
▸ Grew up in northern NJ
▸ Back-end engineer at Crowdskout in DC
▸ Work on API and data layers in PHP using Laravel
▸ Work with MySQL, Mongo, and Elasticsearch
CROWDSKOUT
▸ Data collection, analytics, and outreach platform
▸ Collects, normalizes, and matches data for customers
▸ Tools for segmenting audiences, building dynamic
charting, and acting on segments
IN THIS PRESENTATION
▸ Brief introduction to Elasticsearch
▸ Practical examples of aggregations
▸ Crowdskout’s journey to provide dynamic real-time
charting across millions of records within a web request
CHARTS SCREENSHOT
ELASTICSEARCH
Elasticsearch is a distributed, RESTful search and
analytics engine capable of solving a growing number
of use cases.
ELASTICSEARCH CONCEPTS
• Cluster - Collection of one or more nodes
• Node - One server - stores/indexes data, serves queries
• Index - Collection of documents with mappings
• Type - Category of documents within an index
• Document - JSON unit of information
• Shards & Replicas - Subdivisions of an index and copies of those subdivisions
ELASTIC AGGREGATIONS
▸ Somewhat similar to SQL Group By
▸ Can perform operations on large datasets:
▸ Terms - unique terms with counts
▸ Range - counts within a number range
▸ Date Histogram - counts by a given time interval
▸ Sum/Average/Statistics - performed on the given
dataset
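As a rough sketch of two of the operations listed above, the following builds a request body that combines a range aggregation with a stats aggregation, then mimics the bucketing locally so the semantics are visible without a cluster. The field name `age`, the boundaries, and the helper names are assumptions made for illustration; they do not come from the talk.

```python
def range_and_stats_body(field, boundaries):
    """Build a body with a range aggregation and a stats aggregation on one field."""
    ranges = [{"to": boundaries[0]}]
    ranges += [{"from": lo, "to": hi} for lo, hi in zip(boundaries, boundaries[1:])]
    ranges.append({"from": boundaries[-1]})
    return {
        "size": 0,  # return aggregation results only, no search hits
        "aggs": {
            "by_range": {"range": {"field": field, "ranges": ranges}},
            "summary": {"stats": {"field": field}},
        },
    }


def local_range_counts(values, boundaries):
    """Mimic range buckets locally: 'from' is inclusive, 'to' is exclusive."""
    edges = [float("-inf"), *boundaries, float("inf")]
    return [sum(1 for v in values if lo <= v < hi) for lo, hi in zip(edges, edges[1:])]


def local_stats(values):
    """Mimic the count/min/max/avg/sum figures a stats aggregation reports."""
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "avg": sum(values) / len(values),
        "sum": sum(values),
    }


# Hypothetical field and sample values, for illustration only.
body = range_and_stats_body("age", [30, 50])
ages = [22, 30, 41, 49, 50, 67]
counts = local_range_counts(ages, [30, 50])
summary = local_stats(ages)
```

Here `counts` holds one number per bucket (under 30, 30 to under 50, 50 and over), which is the shape a chart would consume.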
ELASTIC AGGREGATIONS
▸ Enabled by doc values, a columnar on-disk store of field values
▸ Can be nested within each other*
INDEX MAPPING DOCUMENT
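The mapping shown on this slide is not reproduced in the text. As a stand-in, here is a minimal sketch of what a mapping for a profile index might look like, written as a Python dict. The field names (`gender`, `education_level`, `created_at`) are hypothetical, and the layout is the typeless form used by Elasticsearch 7 and later; older versions nest `properties` under a type name.

```python
import json

# Hypothetical mapping for a profile index (field names are assumptions).
# "keyword" fields are stored un-analyzed, which suits exact-match filters
# and terms aggregations.
profile_mapping = {
    "mappings": {
        "properties": {
            "gender": {"type": "keyword"},
            "education_level": {"type": "keyword"},
            "created_at": {"type": "date"},
        }
    }
}

# The JSON printed here is what would be sent as the body when creating the index.
print(json.dumps(profile_mapping, indent=2))
```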
MY INTRODUCTION TO ELASTICSEARCH
▸ Started at Crowdskout over a year and a half ago
▸ Used search queries to create segments of profiles
▸ Used terms aggregations to get unique string options for
profile fields
▸ Wanted to provide charting capabilities for our data points
SEARCH QUERY EXAMPLE
SELECT COUNT(*) FROM genders WHERE value = 'female'
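A rough Elasticsearch counterpart of the SQL above is a term query sent to the `_count` endpoint. The sketch below only builds the request body and reproduces the count locally. It assumes a flat profile document with a `gender` keyword field, which may differ from how Crowdskout actually models its data.

```python
def count_body(field, value):
    """Body for the _count endpoint: exact match on a keyword field."""
    return {"query": {"term": {field: value}}}


def local_count(docs, field, value):
    """Local stand-in for what the count would be over in-memory documents."""
    return sum(1 for doc in docs if doc.get(field) == value)


# Hypothetical sample documents, for illustration only.
profiles = [{"gender": "female"}, {"gender": "male"}, {"gender": "female"}]

body = count_body("gender", "female")
matching = local_count(profiles, "gender", "female")
```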
TERMS AGGREGATION EXAMPLE
SELECT value, COUNT(*) FROM genders GROUP BY value
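The GROUP BY above corresponds roughly to a terms aggregation. The sketch below builds such a request body and turns the resulting buckets into a plain mapping. The `gender` field name, the aggregation name `options`, and the sample response are assumptions; the response is hand-written and trimmed to the `key` and `doc_count` fields the parser reads.

```python
def terms_body(field, name="options", size=10):
    """Request body for a terms aggregation; size 0 at the top skips search hits."""
    return {"size": 0, "aggs": {name: {"terms": {"field": field, "size": size}}}}


def buckets_to_counts(response, name="options"):
    """Turn the aggregation's bucket list into a {term: count} dict."""
    buckets = response["aggregations"][name]["buckets"]
    return {bucket["key"]: bucket["doc_count"] for bucket in buckets}


# Hand-written, trimmed response used only to exercise the parser.
sample_response = {
    "aggregations": {
        "options": {
            "buckets": [
                {"key": "female", "doc_count": 2},
                {"key": "male", "doc_count": 1},
            ]
        }
    }
}

body = terms_body("gender")
counts = buckets_to_counts(sample_response)
```

Unlike SQL's GROUP BY, a terms aggregation returns only the top buckets (ten by default), so the inner `size` matters for fields with many distinct values.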
FIRST - SIMPLE CHARTING
▸ Provide simple charting data while supporting segment
querying
▸ This meant combining our segment search queries with
our options terms aggregations
COMBINING SEARCH AND SEGMENT QUERIES
Count of profiles by gender for people with an
undergraduate education
SELECT genders.value, COUNT(*)
FROM genders
JOIN educations USING (profile_id)
WHERE educations.level = 'undergraduate'
GROUP BY genders.value
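In Elasticsearch terms, the combination described above can be sketched as a single request carrying both a `query` (the segment) and an `aggs` section (the chart); the aggregation then runs only over documents matching the query. The flat `gender` and `education_level` fields and the helper names are assumptions for illustration, not Crowdskout's actual schema.

```python
from collections import Counter


def chart_body(segment_query, field, agg_name="chart"):
    """One request: the query narrows the documents, the aggregation counts them."""
    return {
        "size": 0,  # only the aggregation result is needed for a chart
        "query": segment_query,
        "aggs": {agg_name: {"terms": {"field": field}}},
    }


def local_chart(docs, filter_field, filter_value, group_field):
    """Local stand-in: filter first, then count by the grouping field."""
    return Counter(d[group_field] for d in docs if d.get(filter_field) == filter_value)


# Hypothetical sample documents, for illustration only.
profiles = [
    {"gender": "female", "education_level": "undergraduate"},
    {"gender": "male", "education_level": "undergraduate"},
    {"gender": "female", "education_level": "graduate"},
    {"gender": "female", "education_level": "undergraduate"},
]

segment = {"term": {"education_level": "undergraduate"}}
body = chart_body(segment, "gender")
counts = local_chart(profiles, "education_level", "undergraduate", "gender")
```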
NEXT - DATE HISTOGRAM SUPPORT
▸ We needed to replace the terms aggregation with a date
histogram aggregation
▸ We also needed to add a filter aggregation on the date field
to limit the time period
DATE HISTOGRAM QUERY
▸ Count page views by month
SELECT MONTH(`when`) AS month, COUNT(*)
FROM pageviews
WHERE `when` >= '2017-06-01'
GROUP BY month
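A possible Elasticsearch version of this query wraps a date histogram inside a filter aggregation. The sketch below builds that body and reproduces the monthly counts locally. The field name `when` follows the SQL above; the aggregation names and sample dates are assumptions. It uses `calendar_interval`, the parameter name in Elasticsearch 7.2 and later; earlier versions use `interval` instead.

```python
from collections import Counter
from datetime import date


def monthly_views_body(date_field, since):
    """Filter aggregation limits the period; date histogram buckets by month."""
    return {
        "size": 0,
        "aggs": {
            "recent": {
                "filter": {"range": {date_field: {"gte": since}}},
                "aggs": {
                    "by_month": {
                        "date_histogram": {
                            "field": date_field,
                            "calendar_interval": "month",
                        }
                    }
                },
            }
        },
    }


def local_monthly_counts(days, since):
    """Local stand-in: keep dates on or after the cutoff, count per year-month."""
    cutoff = date.fromisoformat(since)
    return Counter(d.strftime("%Y-%m") for d in days if d >= cutoff)


# Hypothetical page view dates, for illustration only.
views = [date(2017, 5, 30), date(2017, 6, 1), date(2017, 6, 15), date(2017, 7, 4)]

body = monthly_views_body("when", "2017-06-01")
counts = local_monthly_counts(views, "2017-06-01")
```

Note that a date histogram buckets by calendar month including the year, whereas `MONTH()` in the SQL groups the same month of different years together.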
MANAGING COMPLEX REQUESTS AND RESPONSES
▸ Built a library: https://github.com/crowdskout/es-search-builder
▸ Simpler, less verbose queries
▸ Assists in building aggregations and parsing results
ES-SEARCH-BUILDER QUERY
ES-SEARCH-BUILDER AGG
USE ES-SEARCH-BUILDER WITH ELASTICSEARCH-PHP
NEXT - COMPARING TWO COLLECTIONS
▸ What if we want to view Education within Gender?
▸ This required nesting two terms aggregations
EXAMPLE REQUEST
EXAMPLE RESPONSE
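The request and response shown on these slides are not reproduced in the text. As a sketch of the idea, nesting two terms aggregations means placing the inner one under the outer one's `aggs` key; each outer bucket in the response then carries its own list of inner buckets. The field names, aggregation names, and the hand-written sample response below are assumptions for illustration.

```python
def nested_terms_body(outer_field, inner_field):
    """Terms aggregation on inner_field computed inside each outer_field bucket."""
    return {
        "size": 0,
        "aggs": {
            "outer": {
                "terms": {"field": outer_field},
                "aggs": {"inner": {"terms": {"field": inner_field}}},
            }
        },
    }


def flatten(response):
    """Flatten nested buckets into {(outer_key, inner_key): doc_count}."""
    rows = {}
    for outer in response["aggregations"]["outer"]["buckets"]:
        for inner in outer["inner"]["buckets"]:
            rows[(outer["key"], inner["key"])] = inner["doc_count"]
    return rows


# Hand-written, trimmed response used only to exercise the parser.
sample_response = {
    "aggregations": {
        "outer": {
            "buckets": [
                {
                    "key": "female",
                    "doc_count": 3,
                    "inner": {
                        "buckets": [
                            {"key": "undergraduate", "doc_count": 2},
                            {"key": "graduate", "doc_count": 1},
                        ]
                    },
                },
                {
                    "key": "male",
                    "doc_count": 1,
                    "inner": {"buckets": [{"key": "undergraduate", "doc_count": 1}]},
                },
            ]
        }
    }
}

body = nested_terms_body("gender", "education_level")
rows = flatten(sample_response)
```

The flattened `rows` mapping is one convenient shape for a grouped bar chart: one bar per (gender, education) pair.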
THERE’S NO NESTING LIMIT IN AGGREGATIONS
OTHER AGGREGATIONS
▸ Geo Distance, Geo Bounds
▸ Histogram, Range, Date Range
▸ Min, Max, Percentile
▸ Scripted
THE VALUE OF ELASTIC AGGREGATIONS
▸ Performant on large datasets for a wide variety of dynamic
charting
▸ Charts can be requested frequently, making them
real-time and always up to date
▸ Customers can build their own charts through a simple UI
and get immediate results
▸ After charts are built, customers can apply filters using
dates or segment queries
SOME DIFFICULTIES
▸ Keeping Elasticsearch up to date with the databases of record
▸ Complex nested aggregations are tricky to get right
▸ Some aggregations are less performant
▸ High entropy fields (lots of unique values)
▸ Very large numbers of documents (e.g. billions)
▸ Unsure of the query limits, e.g. requests per hour or number of
concurrent requests
QUESTIONS
CROWDSKOUT CHARTS DEMO
CROWDSKOUT QUERY LANGUAGE
▸ Simple SQL-like language for Elasticsearch queries
▸ (Gender = "Male") AND (EducationLevel = "Graduate")
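To give a feel for how an expression like the one above could map onto Elasticsearch, here is a toy translator that turns AND-joined equality criteria into a bool query with term filters. This is not Crowdskout's implementation: the grammar it accepts, the function name, and the choice to use the criterion names directly as field names are all assumptions made for illustration.

```python
import re

# One criterion of the form (Field = "Value"); hypothetical, minimal grammar.
CRITERION = re.compile(r'\(\s*(\w+)\s*=\s*"([^"]*)"\s*\)')


def to_bool_query(expression):
    """Translate AND-joined equality criteria into a bool/filter query."""
    clauses = []
    for part in re.split(r"\s+AND\s+", expression.strip()):
        match = CRITERION.fullmatch(part)
        if match is None:
            # Anything beyond simple AND-joined equality is out of scope here.
            raise ValueError(f"unsupported criterion: {part!r}")
        field, value = match.groups()
        clauses.append({"term": {field: value}})
    return {"bool": {"filter": clauses}}


query = to_bool_query('(Gender = "Male") AND (EducationLevel = "Graduate")')
```

The sketch rejects OR, nesting, and other operators, and it would mis-split a quoted value that itself contains " AND "; a real query language would need a proper parser.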
CROWDSKOUT TRAIT QUERIES FOR CHARTING
▸ Trait queries use Crowdskout Query Language criteria
TRAITS DEMO
