Elasticsearch (not just for text search)

Aleck Landgraf, VP Software Engineering and Co-founder at Building Energy Inc.
@aleck_landgraf
buildingenergy.com
Buildings use a LOT of energy
• Buildings use more energy than any other sector in the US!
• 23% wasted energy*
• $1.2 Trillion wasted
• 40% of GHG wasted (1.1 gigatons annually)**
• What’s the miles per gallon of your office building?
• So how are buildings like mine performing?
• How are my peers’ buildings performing?
*McKinsey & Co: “Unlocking energy efficiency in the US economy”
**equivalent to the entire US fleet of passenger vehicles and light trucks
The Buildings Performance Database
• With the US DOE and LBNL, we make one of the largest datasets of building data available (through statistical methods)
• A developer API that lets people build their own visualizations and fully customized applications
• Expose the DOE Building Energy Performance Taxonomy, the standard for describing buildings, through “filters”
• Provide a decision support tool
• 755k+ buildings
A Histogram Illustration
/analyze/peers/
Why Elasticsearch?
• We were choking on data with our previous solution
• It’s not just for text search
• Fast access to a denormalized set of data
• django-haystack integration into our Django stack
• It’s built to scale!
• Aggs!
Elasticsearch Aggregations
• stats aggregation
• percentile aggregation
• histogram aggregation
• facet counts
stats aggregation
• min, max, avg (std dev requires extended_stats instead of stats); determines bin width
{
  "aggs" : {
    "eui_stats" : { "stats" : { "field" : "eui" } }
  }
}

{
  ...
  "aggregations": {
    "eui_stats": {
      "count": 2194,
      "min": 0,
      "max": 120,
      "avg": 55.8,
      "sum": 122425.2
    }
  }
}
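The slide says the stats result determines the histogram bin width but does not say how. A minimal Python sketch, assuming the min..max range is simply split into a fixed number of bins (the `bin_width` helper and the default of 12 bins are hypothetical, not from the talk):

```python
def bin_width(stats, n_bins=12):
    """Derive a histogram interval from a stats aggregation result.

    Assumption (not from the talk): spread the min..max range
    evenly over n_bins bins.
    """
    if n_bins <= 0:
        raise ValueError("n_bins must be positive")
    span = stats["max"] - stats["min"]
    if span <= 0:
        # All values identical: fall back to a unit-width bin.
        return 1
    return span / n_bins


# Sample values copied from the stats response on this slide.
eui_stats = {"count": 2194, "min": 0, "max": 120, "avg": 55.8, "sum": 122425.2}
interval = bin_width(eui_stats)
```

With the sample response and 12 bins this yields an interval of 10, which happens to match the `interval` used in the histogram request later in the deck.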
percentile aggregation
• quartiles and median (the 0th and 100th percentiles, i.e. min and max, come from the stats aggregation)
{
  "aggs" : {
    "eui_quartiles" : {
      "percentiles" : {
        "field" : "eui",
        "percents" : [25, 50, 75]
      }
    }
  }
}

{
  ...
  "aggregations": {
    "eui_quartiles": {
      "values" : {
        "25.0": 40,
        "50.0": 60,
        "75.0": 85
      }
    }
  }
}
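Since min and max come from the stats result and the quartiles from the percentiles result, the two responses can be merged into one five-number summary. A small Python sketch using the sample values from the slides (the `five_number_summary` helper is a hypothetical name, not part of the talk):

```python
def five_number_summary(stats, quartiles):
    """Combine stats and percentiles results into min, Q1, median, Q3, max."""
    values = quartiles["values"]
    return {
        "min": stats["min"],
        "q1": values["25.0"],
        "median": values["50.0"],
        "q3": values["75.0"],
        "max": stats["max"],
    }


# Sample values copied from the responses shown in the slides.
eui_stats = {"count": 2194, "min": 0, "max": 120, "avg": 55.8, "sum": 122425.2}
eui_quartiles = {"values": {"25.0": 40, "50.0": 60, "75.0": 85}}
summary = five_number_summary(eui_stats, eui_quartiles)
```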
histogram aggregation
• EUI histogram
{
  "aggs" : {
    "eui_histogram" : {
      "histogram" : {
        "field" : "eui",
        "interval" : 10
      }
    }
  }
}

{
  "aggregations": {
    "eui_histogram" : {
      "buckets": [
        {
          "key": 0,
          "doc_count": 57
        },
        {
          "key": 10,
          "doc_count": 93
        },
        ...
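Each bucket `key` is the lower bound of its bin, so a client can rebuild the EUI ranges from the key and the interval. A short Python sketch over the two buckets visible in the truncated response (the `bucket_ranges` helper is hypothetical):

```python
def bucket_ranges(buckets, interval):
    """Turn histogram buckets into (low, high, count) tuples.

    Each bucket key is the lower bound of its bin; the upper bound
    is key + interval (exclusive).
    """
    return [(b["key"], b["key"] + interval, b["doc_count"]) for b in buckets]


# Only the two buckets visible in the truncated response on this slide.
buckets = [{"key": 0, "doc_count": 57}, {"key": 10, "doc_count": 93}]
ranges = bucket_ranges(buckets, interval=10)
```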
Elasticsearch Aggregations
• stats aggregation (min, max, avg; determines bin width)
• percentile aggregation (quartiles, median)
• histogram aggregation (counts per EUI range)
Learning curve
• Custom ES backend for django-haystack to add the new ES features; we hope these make it into haystack someday
• Three queries per search to get stats, percentiles, and histogram. Room for improvement (e.g. ES scripts)
• Easy to set up in dev and prod; django-haystack keeps ES and Postgres in sync.
• An order of magnitude speed improvement :-)
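On the "three queries per search" point: Elasticsearch accepts several named aggregations in one request body, so stats and percentiles can share a round trip. The histogram would still need a second request if its interval is derived from the stats result. A Python sketch that only builds the request bodies as dicts (the helper names are hypothetical, and how the bodies are sent through django-haystack or a client is not shown and not from the talk):

```python
def summary_request(field="eui"):
    """One request body carrying both the stats and percentiles aggregations."""
    return {
        "size": 0,  # aggregations only, no document hits
        "aggs": {
            f"{field}_stats": {"stats": {"field": field}},
            f"{field}_quartiles": {
                "percentiles": {"field": field, "percents": [25, 50, 75]}
            },
        },
    }


def histogram_request(interval, field="eui"):
    """Second request body; the interval comes from the first response."""
    return {
        "size": 0,
        "aggs": {
            f"{field}_histogram": {
                "histogram": {"field": field, "interval": interval}
            }
        },
    }


first = summary_request()
second = histogram_request(interval=10)
```

This would bring the count down from three requests to two; whether that is what the author had in mind by "room for improvement" is an assumption.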
Thanks!
buildingenergy.com
Questions/Comments?
@aleck_landgraf