Elasticsearch (not just for text search)

Using Elasticsearch to serve the largest database of building performance and attribute data in the world. Working with the U.S. Department of Energy and Lawrence Berkeley National Laboratory, Building Energy (buildingenergy.com) developed and manages this database and provides statistical and analytical methods via a web app and API.

Published in: Data & Analytics

  1. Elasticsearch (not just for text search) • Aleck Landgraf • @aleck_landgraf • buildingenergy.com
  2. Buildings use a LOT of energy • Buildings use more energy than any other sector in the US! • 23% wasted energy* • $1.2 trillion wasted • 40% of GHG wasted (1.1 gigatons annually)** • What’s the miles per gallon of your office building? • So how are buildings like mine performing? • How are my peers’ buildings performing? *McKinsey & Co: “Unlocking energy efficiency in the US economy” **Equivalent to the entire US fleet of passenger vehicles and light trucks
  3. The Buildings Performance Database • With the US DOE and LBNL, we make one of the largest datasets of building data available (through statistical methods) • A developer API that lets people build their own visualizations and fully customized applications • Exposes the DOE Building Energy Performance Taxonomy, the standard for describing buildings, through “filters” • Provides a decision support tool • 755k+ buildings
  4. A Histogram Illustration • /analyze/peers/
  5. Why Elasticsearch? • We were choking on data with our previous solution • It’s not just for text search • Fast access to a denormalized set of data • django-haystack integration into our Django stack • It’s built to scale! • Aggs!
  6. Elasticsearch Aggregations • stats aggregation • percentile aggregation • histogram aggregation • facet counts
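  7. stats aggregation • min, max, std dev (std dev needs the extended_stats variant; plain stats returns count, min, max, avg, sum) • min and max determine the bin width • Request: { "aggs" : { "eui_stats" : { "stats" : { "field" : "eui" } } } } • Response: { ... "aggregations": { "eui_stats": { "count": 2194, "min": 0, "max": 120, "avg": 55.8, "sum": 122425.2 } } }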
  8. percentile aggregation • quartiles and median (the 0th and 100th percentiles are the min and max from stats) • Request: { "aggs" : { "eui_quartiles" : { "percentiles" : { "field" : "eui", "percents" : [25, 50, 75] } } } } • Response: { ... "aggregations": { "eui_quartiles": { "values" : { "25.0": 40, "50.0": 60, "75.0": 85 } } } }
  9. histogram aggregation • EUI histogram • Request: { "aggs" : { "eui_histogram" : { "histogram" : { "field" : "eui", "interval" : 10 } } } } • Response: { "aggregations": { "eui_histogram" : { "buckets": [ { "key": 0, "doc_count": 57 }, { "key": 10, "doc_count": 93 }, ...
  10. Elasticsearch Aggregations • stats aggregation (min, max, std dev; determines bin width) • percentile aggregation (quartiles, median) • histogram aggregation (counts per EUI range)
  11. Learning curve • We wrote a custom ES backend for django-haystack to add the new ES features; we hope these make it into haystack someday • Three queries per search to get stats, percentiles, and histogram; room for improvement, possibly with ES scripts • Easy to set up in dev and prod; django-haystack keeps ES and Postgres in sync • An order-of-magnitude speed improvement :-)
  12. Thanks! • buildingenergy.com • Questions/Comments? • @aleck_landgraf
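
The slides describe three queries per search (stats, percentiles, histogram) and note room for improvement. As a rough sketch of one possible reduction, not the authors' implementation: Elasticsearch accepts several named aggregations in a single request body, so stats and percentiles could share one request, while the histogram still needs a second request because its interval is derived from the stats result. The function names, the default of 12 bins, and the fallback width of 1.0 are assumptions made for illustration; the field name `eui` and the stats values come from the slides. The code only builds request bodies and does not contact a cluster.

```python
import json


def build_summary_body(field="eui"):
    """Build one request body carrying both the stats and percentiles aggregations."""
    return {
        "size": 0,  # ask for aggregations only, no document hits
        "aggs": {
            f"{field}_stats": {"stats": {"field": field}},
            f"{field}_quartiles": {
                "percentiles": {"field": field, "percents": [25, 50, 75]}
            },
        },
    }


def bin_width(stats, bins=12):
    """Derive a histogram interval from the min and max of a stats result.

    The bin count and the fallback of 1.0 are illustrative assumptions.
    """
    spread = stats["max"] - stats["min"]
    if spread <= 0:
        return 1.0  # degenerate range: fall back to a fixed positive width
    return spread / bins


def build_histogram_body(field, interval):
    """Build the follow-up request body for the histogram aggregation."""
    return {
        "size": 0,
        "aggs": {
            f"{field}_histogram": {
                "histogram": {"field": field, "interval": interval}
            }
        },
    }


# Stats values copied from the slide 7 response.
slide_stats = {"count": 2194, "min": 0, "max": 120, "avg": 55.8, "sum": 122425.2}

summary_body = build_summary_body("eui")
histogram_body = build_histogram_body("eui", bin_width(slide_stats, bins=12))

print(json.dumps(summary_body, indent=2))
print(json.dumps(histogram_body, indent=2))
```

With the slide 7 values (min 0, max 120) and 12 bins, the derived interval matches the interval of 10 used on slide 9. If the bin width were fixed in advance, all three aggregations could go into a single `aggs` object and one request.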
