Energy Usage Insights
with Hadoop & HBase
July 25, 2013
Scott Kuehn, Data Architect
Oren Benjamin, Senior Software Engineer
Presentation by Oren Benjamin and Scott Kuehn at the DC Hadoop User Group on 7/23/2013.

Energy Usage Insights with Hadoop & HBase: slide transcript

  1. Energy Usage Insights with Hadoop & HBase. July 25, 2013. Scott Kuehn, Data Architect; Oren Benjamin, Senior Software Engineer
  2. Our Utility Partners: Australia, New Zealand, France, Nova Scotia, UK
  3. Energy Usage Insights (slide footer date: 26 July 2013)
  4. Home Energy Report
  5. Energy Savings. [Chart: energy saved (y-axis, 0.0% to 4.5%) against months since program start (x-axis, 1 to 38)] Average steady-state savings = ~1.5–3.5%
  6. Impact: $300,000,000; 2,500,000,000 kWh; 4,000,000,000 lbs CO2
  7. Web Portal
  8. [No slide text]
  9. Data Overview: Energy Usage Streams

      meter  usage   cost   start                end
      0001   719.23  57.52  2013-01-04T00:00:00  2013-02-11T00:00:00
      0001   742.61  59.36  2013-02-11T00:00:00  2013-03-12T00:00:00
      0002   0.2050         2013-01-01T00:00:00  2013-01-01T00:15:00
      0002   0.2250         2013-01-01T00:15:00  2013-01-01T00:30:00
      0002   0.2350         2013-01-01T00:30:00  2013-01-01T00:45:00
      0002   0.2050         2013-01-01T00:45:00  2013-01-01T01:00:00
      0002   0.2250         2013-01-01T01:00:00  2013-01-01T01:15:00

      0001 – Meter (Bills)
      0002 – Smart Meter (Quarter-hourly reads)
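The two row shapes in the table above (bill rows with a cost, smart meter rows without) can be read with one small parser. The following is a minimal sketch, not code from the presentation: the `UsageRead` class here and its field names are hypothetical, and it assumes that the single number on a smart meter row is the usage value.

```java
import java.time.Duration;
import java.time.LocalDateTime;

/** Hypothetical value type for one row of the usage stream table; not taken from the slides. */
final class UsageRead {
    final String meter;
    final double usage;
    final Double cost;            // null when the row has no cost column (assumed for smart meter reads)
    final LocalDateTime start;
    final LocalDateTime end;

    UsageRead(String meter, double usage, Double cost, LocalDateTime start, LocalDateTime end) {
        this.meter = meter;
        this.usage = usage;
        this.cost = cost;
        this.start = start;
        this.end = end;
    }

    /** Parses a whitespace-separated row with 5 fields (with cost) or 4 fields (without cost). */
    static UsageRead parse(String line) {
        String[] f = line.trim().split("\\s+");
        if (f.length != 4 && f.length != 5) {
            throw new IllegalArgumentException("Expected 4 or 5 fields: " + line);
        }
        boolean hasCost = f.length == 5;
        Double cost = hasCost ? Double.valueOf(f[2]) : null;
        int t = hasCost ? 3 : 2;  // index of the start timestamp
        return new UsageRead(f[0], Double.parseDouble(f[1]), cost,
                LocalDateTime.parse(f[t]), LocalDateTime.parse(f[t + 1]));
    }

    /** Length of the read interval, e.g. a billing period or a quarter hour. */
    Duration interval() {
        return Duration.between(start, end);
    }
}
```

Keeping the cost nullable rather than defaulting it to zero avoids confusing "no cost reported" with "zero cost".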
  10. Data Overview: Smart Meter
  11. Data Overview: Entities. [Diagram of related entities: Customer, Account, Site, Meter]
  12. Data Overview: Size
      » Billing data: 60M households
      » Smart meter data: 15M households
      » On disk: 5TB (raw)
      » More smart meter data than all other data combined
  13. Architecture: Usage Data Store. [Diagram of related entities: Customer, Account, Site, Meter]
  14. Architecture: Usage Data Store. [Diagram of related entities: Customer, Account, Site, Meter]
  15. HBase + Hadoop Architecture v1.0. [Diagram labels: meter metadata, usage data, MySQL report/AMI DBs, batch workers, web servers, Sqoop, HDFS, M/R, HBase]
  16. HBase + Hadoop Architecture v2.0. [Diagram labels: meter metadata, usage data, MySQL report/AMI DBs, batch workers, web servers, HDFS file upload, metadata requests, HDFS, M/R, HBase]
  17. Data Schema: Kiji
      Kiji Schema
      » Table layout definition
      » Schema management
      » Object serialization
      » Entity-centric data model
      Supporting Projects
      » Kiji MR
      » Kiji Hive Adapter
      » Kiji REST
      » ...
  18. Entity-centric Table: Row Key

      Hash prefix   Utility company   Site ID
      1 byte        4 bytes           8 bytes

      "keys_format": {
        "encoding": "FORMATTED",
        "salt": {
          "hash_type": "MD5",
          "hash_size": 1
        },
        "components": [
          { "name": "utility_company", "type": "INTEGER" },
          { "name": "site_id", "type": "LONG" }
        ]
      }
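The 1 + 4 + 8 byte layout above can be illustrated in plain Java. This is a sketch of the layout only and makes no claim to reproduce Kiji's actual key encoding: the class and method names are hypothetical, and the choice to hash the big-endian bytes of both components is an assumption. It shows the purpose of the salt: a one-byte hash prefix spreads otherwise sequential site IDs across the key space.

```java
import java.nio.ByteBuffer;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Illustrative row key layout: 1-byte MD5 prefix, 4-byte utility company, 8-byte site ID. */
final class SiteRowKeySketch {
    static final int KEY_LENGTH = 1 + Integer.BYTES + Long.BYTES; // 13 bytes

    /** Builds the 13-byte key; hashing the 12 component bytes is an assumption of this sketch. */
    static byte[] rowKey(int utilityCompany, long siteId) {
        byte[] components = ByteBuffer.allocate(Integer.BYTES + Long.BYTES)
                .putInt(utilityCompany)
                .putLong(siteId)
                .array();
        byte salt = md5(components)[0]; // keep only the first byte of the digest
        return ByteBuffer.allocate(KEY_LENGTH)
                .put(salt)
                .put(components)
                .array();
    }

    private static byte[] md5(byte[] input) {
        try {
            return MessageDigest.getInstance("MD5").digest(input);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available", e);
        }
    }
}
```

Because the prefix is derived from the components, a reader that knows the utility company and site ID can recompute the full key for a point lookup; the trade-off is that range scans in site ID order are no longer contiguous.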
  19. Entity-centric Table: Site. A single row. [Diagram: Usage Data column family with columns stream:0, stream:1, stream:2, stream:3 (sample values 0.12 kWh, 1.3 Therm, 24 Therm, 356 kWh); Insights column family with columns uua:0 and bill_forecast:0 (sample values UUA; June 18 - July 17, $25)]
  20. Insight Example: Rate Calculation
  21. Insights: Jobs & Services
      » M/R jobs to compute insights in batch
      » Services to access pre-computed insights / compute insights on demand
      » Insight for a Site is calculated based on the data in the Site's row
      » The calculated insight is saved back to the Site row
  22. Insight Example: Rate Calculation. [Diagram labels: site row; usage data column family (stream:0 … stream:n); Rate Calculation MapReduce; insights column family (rate calculation, bill forecast, …)]
  23. Rate Calculation: Producer

      public class RateCalculationProducer extends KijiProducer {
        @Override
        public void produce(KijiRowData siteRowData,
                            ProducerContext context) {
          RateCalculation insight = computeInsight(siteRowData);
          context.put(insight);
        }
      }

  24. Rate Calculation: Producer

      public class RateCalculationProducer extends KijiProducer {
        @Override
        public void produce(KijiRowData siteRowData,
                            ProducerContext context) {
          RateCalculation insight = computeInsight(siteRowData);
          context.put(insight);
        }

        @Override
        public String getOutputColumn() {
          return "rate_calculation";
        }
      }

  25. [No slide title]

      public class RateCalculationProducer extends KijiProducer {
        @Override
        public KijiDataRequest getDataRequest() {
          Configuration conf = getConf();
          long startTime = parseLong(conf.get(START_PARAM));
          return KijiDataRequest.builder()
              .withTimeRange(startTime, END_OF_TIME)
              .addColumns(ColumnsDef.create()
                  .withMaxVersions(ALL_VERSIONS)
                  .addFamily("usage_data"))
              .build();
        }

        @Override
        public void produce(KijiRowData siteRowData, ...
  26. In practice: Design decisions and challenges
      » ETL to an entity-centric schema
      » Bulk loading
      » Mixed workloads
  27. In practice: ETL to entity-centric schema

      meter  usage   cost   start                end
      0001   719.23  57.52  2013-01-04T00:00:00  2013-02-11T00:00:00
      0001   742.61  59.36  2013-02-11T00:00:00  2013-03-12T00:00:00
      0002   0.2050         2013-01-01T00:00:00  2013-01-01T00:15:00
      0002   0.2250         2013-01-01T00:15:00  2013-01-01T00:30:00
      0002   0.2350         2013-01-01T00:30:00  2013-01-01T00:45:00
      0002   0.2050         2013-01-01T00:45:00  2013-01-01T01:00:00
      0002   0.2250         2013-01-01T01:00:00  2013-01-01T01:15:00

      0001 – Meter (Bills)
      0002 – Smart Meter (Quarter-hourly reads)
  28. In practice: ETL to entity-centric schema
      » Use bulk loading for performance
      » Make the ingest process idempotent
      » Introduce a read-log for utility company billing corrections
      » ETL steps:
        1. Ingest all reads into a read-log table
        2. Load reads into the corresponding Site row
      [Diagram labels: billing files, M/R bulkload, read-log table, M/R bulkload, pivot, Site table; steps marked 1 and 2]
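The two ETL steps above can be sketched in memory to show what idempotent ingest means in practice. This is an illustration under stated assumptions rather than the presenters' implementation: the `ReadLogSketch` class, its `Read` type and the key of (site ID, stream, start time) are all hypothetical, and it assumes that a billing correction arrives as a new read for the same key that should replace the earlier one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** In-memory stand-in for the read-log table and the pivot into per-site rows. */
final class ReadLogSketch {
    /** One logged read; the field names are illustrative. */
    static final class Read {
        final long siteId;
        final int stream;
        final String start;   // ISO-8601 start of the read interval
        final double usage;

        Read(long siteId, int stream, String start, double usage) {
            this.siteId = siteId;
            this.stream = stream;
            this.start = start;
            this.usage = usage;
        }
    }

    // Keyed by (siteId, stream, start), so re-ingesting a read overwrites instead of duplicating.
    private final Map<String, Read> readLog = new TreeMap<>();

    /** Step 1: ingest a read. Repeating the call, or sending a correction, leaves one entry per key. */
    void ingest(Read read) {
        readLog.put(read.siteId + "/" + read.stream + "/" + read.start, read);
    }

    /** Step 2: pivot the log so that all reads for a site land in that site's row. */
    Map<Long, List<Read>> pivotBySite() {
        Map<Long, List<Read>> rows = new TreeMap<>();
        for (Read read : readLog.values()) {
            rows.computeIfAbsent(read.siteId, k -> new ArrayList<>()).add(read);
        }
        return rows;
    }

    int size() {
        return readLog.size();
    }
}
```

With a deterministic key, replaying a billing file after a failed job is harmless, which is what makes the ingest safe to retry.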
  29. In practice: bulk loading
      » Bulk loaded files are not assigned sequence numbers
      » All compactions become major compactions
      » Solution: Find a temporary fix, monitor the HBase JIRA
  30. In practice: Mixed workloads. [Diagram labels: Site table; clients: reporting apps, web servers, M/R; workloads: ad-hoc reads and forecasts, batch insight calculations, bulk scans]
  31. In practice: Mixed workloads
      » Supporting mixed workloads requires adapting jobs and configurations
      » IO: Switch to bulk loading, enable direct HDFS reads
      » Major compactions: Disabled
      » Memory: Increase heap and region sizes, use MSLAB
      » Verify performance by simulating nominal and high load scenarios
  32. In practice: Mixed workloads
  33. Results Visualized: animation of jobs in progress
  34. Mixed Workload Success. [Chart labels: 9ms, 2ms]
      » Mean read time is ~2ms
      » Nearly 200 forecasts/sec on performance testing cluster
  35. [No slide text]
  36. Recap
      Opower
      » Save energy
      » Make money
      » Big (enough) data
      We're hiring. http://opower.com/careers
      Oren Benjamin, oren.benjamin@opower.com
      Scott Kuehn, scott.kuehn@opower.com
  37. Rate Calculation: Rate Engine

      public interface RateEngine {
        /**
         * Compute the cost per usage read for the given Site
         * over the requested time interval.
         * @return a RateCalculation containing the result
         */
        RateCalculation calculate(Site site, List<UsageRead> usageReads);
      }
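To make the `RateEngine` contract above concrete, here is a minimal sketch of one possible implementation that charges a single flat price per kWh. The slides do not show the real `Site`, `UsageRead` or `RateCalculation` types, so the nested stand-ins below are hypothetical, as is the flat-rate tariff itself; actual utility tariffs are typically more involved (tiers, time of use).

```java
import java.util.ArrayList;
import java.util.List;

/** Self-contained sketch: hypothetical stand-in types plus a flat-rate engine. */
final class FlatRateSketch {
    static final class Site {
        final long id;
        Site(long id) { this.id = id; }
    }

    static final class UsageRead {
        final double kwh;
        UsageRead(double kwh) { this.kwh = kwh; }
    }

    static final class RateCalculation {
        final List<Double> costPerRead;  // one cost entry per input read, in order
        final double totalCost;
        RateCalculation(List<Double> costPerRead, double totalCost) {
            this.costPerRead = costPerRead;
            this.totalCost = totalCost;
        }
    }

    /** Same method shape as the interface on the slide, repeated here so the sketch compiles alone. */
    interface RateEngine {
        RateCalculation calculate(Site site, List<UsageRead> usageReads);
    }

    /** Assumed tariff: every kWh costs the same, regardless of site or time. */
    static final class FlatRateEngine implements RateEngine {
        private final double pricePerKwh;

        FlatRateEngine(double pricePerKwh) { this.pricePerKwh = pricePerKwh; }

        @Override
        public RateCalculation calculate(Site site, List<UsageRead> usageReads) {
            List<Double> costs = new ArrayList<>();
            double total = 0.0;
            for (UsageRead read : usageReads) {
                double cost = read.kwh * pricePerKwh;
                costs.add(cost);
                total += cost;
            }
            return new RateCalculation(costs, total);
        }
    }
}
```

Keeping the engine behind an interface means a different tariff implementation can be supplied without changing the code that calls `calculate`.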
  38. Rate Calculation: Application Context

      public class RateCalculationProducer extends KijiProducer {
        private ConfigurableApplicationContext appContext;
        private RateEngine rateEngine;

        @Override
        public void setup(KijiContext context) {
          String contextPath = getConf().get(CONTEXT_PATH_KEY);
          appContext = new XmlAppContext(contextPath);
          rateEngine = appContext.getBean(RateEngine.class);
        }

        @Override
        public void produce(KijiRowData siteRowData, …
