Practical examples written in PHP using nested aggregations to quickly segment and provide data for charting among millions of records within a web request.
ABOUT DAN FEY
▸ Grew up in northern NJ
▸ Back-end engineer at Crowdskout in DC
▸ Work on API and data layers in PHP using Laravel
▸ Work with MySQL, Mongo, and Elasticsearch
CROWDSKOUT
▸ Data collection, analytics, and outreach platform
▸ Collects, normalizes, and matches data for customers
▸ Tools for segmenting audiences, building dynamic
charting, and acting on segments
IN THIS PRESENTATION
▸ Brief introduction to Elasticsearch
▸ Practical examples of aggregations
▸ Crowdskout’s journey to provide dynamic real-time
charting among millions of records within a web request
ELASTICSEARCH
Elasticsearch is a distributed, RESTful search and
analytics engine capable of solving a growing number
of use cases.
ELASTICSEARCH CONCEPTS
• Cluster - Collection of one or more nodes
• Node - One server - stores/indexes data, serves queries
• Index - Collection of documents with mappings
• Type - Category of documents within an index
• Document - JSON unit of information
• Shards & Replicas - Subdivision of an index and copies
ELASTIC AGGREGATIONS
▸ Somewhat similar to SQL GROUP BY
▸ Can perform operations on large datasets:
▸ Terms - unique terms with counts
▸ Range - counts within a number range
▸ Date Histogram - counts by a given time interval
▸ Sum/Average/Statistics - performed on the given
dataset
MY INTRODUCTION TO ELASTICSEARCH
▸ Started at Crowdskout over a year and a half ago
▸ Used search queries to create segments of profiles
▸ Used terms aggregations to get unique string options for
profile fields
▸ Wanted to provide charting capabilities for our data points
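As a rough illustration of the terms aggregation mentioned above, the sketch below builds a request body that asks for the unique values of one field along with their counts, then flattens the response buckets. The function names, the aggregation name `options`, and the example field are hypothetical and not taken from Crowdskout's code; only the `terms` aggregation and the `buckets` / `key` / `doc_count` response shape are standard Elasticsearch.

```php
<?php
// Build a request body that returns the unique values of $field with counts.
// The field name is supplied by the caller; the real mapping is not shown in the slides.
function buildOptionsRequest(string $field, int $size = 50): array
{
    return [
        'size' => 0, // no hits needed, only the aggregation buckets
        'aggs' => [
            'options' => [ // hypothetical aggregation name
                'terms' => ['field' => $field, 'size' => $size],
            ],
        ],
    ];
}

// Flatten a terms aggregation response into value => count pairs.
function parseOptions(array $response): array
{
    $options = [];
    foreach ($response['aggregations']['options']['buckets'] ?? [] as $bucket) {
        $options[$bucket['key']] = $bucket['doc_count'];
    }
    return $options;
}
```

The array returned by `buildOptionsRequest()` can be passed through `json_encode()` and sent as the body of a search request with whichever HTTP or Elasticsearch client is in use.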
FIRST - SIMPLE CHARTING
▸ Provide simple charting data while supporting segment
querying
▸ This meant combining our segment search queries with
our options terms aggregations
COMBINING SEARCH AND SEGMENT QUERIES
Count of profiles by gender for people with an
undergraduate education
SELECT genders.value, COUNT(*)
FROM genders
JOIN educations USING (profile_id)
WHERE educations.level = "undergraduate"
GROUP BY genders.value
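One way the SQL above might map to Elasticsearch is a search query that selects the segment combined with a terms aggregation that does the grouping. This is only a sketch: it assumes a flat profile document with hypothetical `gender` and `education_level` keyword fields, whereas the actual Crowdskout mapping (which may well use nested documents) is not shown in the slides.

```php
<?php
// Count profiles by gender, restricted to one education level.
// ASSUMPTION: 'gender' and 'education_level' are hypothetical keyword fields.
function buildGenderByEducationRequest(string $level): array
{
    return [
        'size' => 0, // only the buckets are needed for a chart
        'query' => [ // the segment: plays the role of the SQL WHERE clause
            'bool' => [
                'filter' => [
                    ['term' => ['education_level' => $level]],
                ],
            ],
        ],
        'aggs' => [ // the grouping: plays the role of GROUP BY
            'by_gender' => [
                'terms' => ['field' => 'gender'],
            ],
        ],
    ];
}
```

Calling `json_encode(buildGenderByEducationRequest('undergraduate'))` yields the JSON body for the search request. Because the aggregation only sees documents matched by the query, any segment query can be swapped in without changing the charting part.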
NEXT - DATE HISTOGRAM SUPPORT
▸ We needed to replace the terms aggregation with a date
histogram aggregation
▸ We also needed to add a filter aggregation on the date
field to limit the time period
DATE HISTOGRAM QUERY
▸ Count page views by month
SELECT MONTH(`when`) AS month, COUNT(*)
FROM pageviews
WHERE `when` >= "2017-06-01"
GROUP BY month
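A comparable Elasticsearch request could wrap a date histogram inside a filter aggregation that limits the time period. The sketch below assumes a hypothetical `when` date field on page view documents, and the aggregation names `recent` and `per_month` are made up for illustration. Unlike SQL's `MONTH()`, a date histogram produces one bucket per calendar month, so the same month in different years is not merged.

```php
<?php
// Count page views per calendar month since a given date.
// ASSUMPTION: 'when' is a hypothetical date field on the page view documents.
function buildMonthlyPageViewsRequest(string $since): array
{
    return [
        'size' => 0,
        'aggs' => [
            'recent' => [ // filter aggregation limiting the time period
                'filter' => ['range' => ['when' => ['gte' => $since]]],
                'aggs' => [
                    'per_month' => [
                        'date_histogram' => [
                            'field' => 'when',
                            // Elasticsearch 7+ name; older versions used 'interval'
                            'calendar_interval' => 'month',
                        ],
                    ],
                ],
            ],
        ],
    ];
}

// Turn the nested response buckets into chart points.
function parseMonthlyCounts(array $response): array
{
    $points = [];
    $buckets = $response['aggregations']['recent']['per_month']['buckets'] ?? [];
    foreach ($buckets as $bucket) {
        $points[] = ['month' => $bucket['key_as_string'], 'count' => $bucket['doc_count']];
    }
    return $points;
}
```

Nesting the histogram under the filter keeps the date limit local to this chart, so it can sit alongside a segment query in the same request.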
MANAGING COMPLEX REQUESTS AND RESPONSES
▸ Built a library: https://github.com/crowdskout/es-search-builder
▸ Simpler, less verbose queries
▸ Assists in building aggregations and parsing results
OTHER AGGREGATIONS
▸ GeoDistance, Geo Bounds
▸ Histogram, Range, Date Range
▸ Min, Max, Percentile
▸ Scripted
THE VALUE OF ELASTIC AGGREGATIONS
▸ Performant on large datasets for a wide variety of dynamic
charting
▸ Charts can be requested frequently, making them
real-time and always up to date
▸ Customers can build their own charts through a simple UI
and get immediate results
▸ After charts are built, customers can apply filters using
dates or segment queries
SOME DIFFICULTIES
▸ Keeping Elasticsearch up to date with the databases of record
▸ Complex nested aggregations are tricky to get right
▸ Some aggregations are less performant
▸ High entropy fields (lots of unique values)
▸ Very large numbers of documents (e.g. billions)
▸ Unsure of the query limits, e.g. requests per hour or number of
concurrent requests