HBaseCon 2012 | Overcoming Data Deluge with HBase to Help Save the Environment - OPower

Opower is a fast-moving energy management SaaS company that collects sensor data from nearly all of the major utilities in the United States (covering more than 45 million American households), along with major utilities in 5 countries throughout Europe and Asia-Pacific. Opower manages more than 100 billion meter reads, including high-frequency power data (AMI), smart thermostat data, and weather data. Currently all data at Opower is stored in HBase or Hadoop (and is notably not security sensitive). This talk covers Opower’s HBase architecture, highlights potential and current uses of data in HBase, shares the vision of Opower’s future projects and directions, and reveals how Opower’s big data management has allowed the company to help its utility clients save enough energy to power a city of nearly 200,000 people and save utility customers more than $70 million since 2008.

Published in: Technology, Business
  • WARNING: THESE ARE MY WORDS, not those of FDS, Cloudera, or Opower.
    FactSet, 2005:
    - Maybe I was crazy to use it.
    - Tens of databases, tens of query languages, VMS moving towards commodity servers. Running into issues with scaling on environments like MySQL.
    - They were used to code that crashed. In fact, I would say while I was on call, a service from one of the sites was down at least once a week. Luckily they had redundancy in multiple sites, and multiple servers within those sites. The redundancy was added at a higher level, so generally, at least all of the times I remember, it was able to increase the availability, and downtime wasn't actually an issue.
    - What was an issue was scale.
    - Interestingly enough, HBase, even at that time, was a pretty highly available database. So what did they use it for?
    - Time and Sales. This is the collection of all of the quotes and trades for different securities. To translate: you put out quotes to buy or sell stocks at a certain price. If they overlap, the exchange registers a trade, and you just bought or sold a security. Not just stocks, but options and extremely high-frequency data.
    - There was some value-add on top of that, for calculating more complicated statistics on the fly through a home-grown web SaaS thing.
    Cloudera:
    - Started off in the kitchen, focusing on building the packages that y’all know and love. When I entered it was all manual; when I left it was all automated. One could think of this as sort of dev-ops-y, meets QA, meets release engineering, meets generic development.
    - Moved into our first management tools team as a developer, where we developed the Cloudera Manager. It was originally part of HUE and it became more springy.
    - Then I left Cloudera to be a founder at Drawn to Scale. We built a prototype and pitched it for about 6 to 7 months.
    - While that was going on, I became the Lead Data Architect at Opower.
    And then more recently, after funding, I have returned to Drawn to Scale as a coder in the trenches, and have changed my role to an advisor to Opower. The reason I bring this up is that I have been working with HBase in production for about 5 years.
  • Opower helps people use energy more efficiently and ultimately save money on their energy bills. It vastly improves the overall customer experience by making energy use personally relevant.
    - Behavioral Science (great marketing, understanding people, great HCI)
    - Data Science (Analytics, Data Infrastructure Teams)
    - Lobbying (yep, we do lobbying)
  • - How many of you get a bill?
    - Opower white-labeled websites. This is the interface you probably use through your energy website to view how much power you use. Bill forecasting, etc.
    - Smart thermostats
    - Gas and electric
    - Social
  • - Analytics is used to understand who we should be targeting.
    - Answering questions that our customers want answered. We can help them improve customer service, improve their marketing, etc.
    - Justifying our own existence (compliance).
  • - This is an old slide which doesn’t really include all the places we get data.
    - Story about detecting broken thermostats.
  • - But it had its upsides.
    - Spring and MVC provided a very clear and systematic way for developers to develop systems.
    - It was very easy to manage from an operations perspective.
  • - We did this at FDS as well. Of course not with R, but with specialized languages.
    - In fact our customers did as well, and they had a whole team of people to help customers do it.
  • So here are the data sizes we have, along with the costs with traditional Hadoop systems.
    - We were a Cisco shop but we ended up going with Dell, mostly because of the 3.5 inch disks. It looks like Cisco is wising up to this whole Hadoop thing.
    - These numbers are for Dell. I think this is priced out assuming a 710, then an 810, and then a 910 for the RDBMS, and 510's for Hadoop.
  • - A lot of this data just doesn’t work well with traditional databases.
    - An unnamed utility takes 3 days to mysqldump the AMI data out. Subsampling, interpolation.
  • - I should warn you, I drew almost all of my drawings in xfig, so if this isn’t clear, I’m sorry.
    - Basically the utility data has to come in from a variety of different protocols, as we integrate into the utility pipeline. It then flows into HBase, it’s validated from HBase, and then imported into our existing workflow.
    - Some of that data, for instance information about users, is still stored in MySQL.
    - All of the data is in a Hive data lake.
  • All of our high-frequency time-series data is being ported to HBase. Also, soon things like bill forecasting, and a bunch of other cool stuff I probably should mention, are being moved here. This includes data from the utilities and data that users are entering themselves. In addition, thermostat data is moving here.
  • - We still need to improve efficiency.
    - We are doubling the size of the cluster this year.
    - We have a ton of room to grow.
  • - Having all of your data is a huge thing.
    - Having a place to do M/R-based R is great.
    - No more running out of memory or being bounded to a single machine.
    - Having a cheap scratch space.
  • At Cloudera I thought all we needed was cfengine, SNMP, and syslog. Frankly, that would have made ops happy. But more and more I think we made the right decision and that these tools really aren’t the right answer. Juju looks interesting.
    - Cloudera of course built their own tool.
    - Access and auth.
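
The notes above mention moving high-frequency time-series data into HBase, where rows are sorted by key bytes. As a minimal sketch (not Opower's actual schema, which the talk does not show), one common layout is a composite row key: a fixed-width meter id followed by a reversed timestamp, so each meter's reads are stored together with the newest read first. The names `make_row_key`, `parse_row_key`, and the 16-byte id width are hypothetical choices for illustration.

```python
import struct
from typing import Tuple

MAX_TS = 2**63 - 1   # largest signed 64-bit integer
ID_WIDTH = 16        # assumed fixed width for the meter id, in bytes


def make_row_key(meter_id: str, epoch_seconds: int) -> bytes:
    """Build a sortable key: padded meter id + reversed big-endian timestamp."""
    raw_id = meter_id.encode("utf-8")
    if len(raw_id) > ID_WIDTH:
        raise ValueError("meter id longer than the fixed key width")
    if not 0 <= epoch_seconds <= MAX_TS:
        raise ValueError("timestamp out of range")
    # Subtracting from MAX_TS makes newer timestamps sort before older ones.
    reversed_ts = MAX_TS - epoch_seconds
    return raw_id.ljust(ID_WIDTH, b"\x00") + struct.pack(">q", reversed_ts)


def parse_row_key(key: bytes) -> Tuple[str, int]:
    """Recover (meter_id, epoch_seconds) from a key built by make_row_key."""
    meter_id = key[:ID_WIDTH].rstrip(b"\x00").decode("utf-8")
    (reversed_ts,) = struct.unpack(">q", key[ID_WIDTH:])
    return meter_id, MAX_TS - reversed_ts
```

With this layout, a prefix scan on the padded meter id would return that meter's reads newest-first. The trade-off is that queries across all meters for one time window have to touch every meter's key range, which is one face of the schema-versus-performance tension the slides describe.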

    1. HBase to Save the Planet. Alex Newman, posix4e@apache.org. Architect, Drawn to Scale; Strategic Advisor, Opower
    2. My life with HBase: FactSet, Cloudera, Opower, Drawn to Scale
    3. About Opower: Opower is a customer engagement platform for the utility industry
    4. About Opower: Home energy reports; customized utility bills; energy efficiency programs for utilities
    5. About Opower: Opower runs on analytics. Analytics run on Hadoop + HBase
    6. Opower analysis relies on data from a variety of sources: » Electric Utility Usage » Thermostat » Weather » Gas Utility Usage [Diagram labels: Data Storage & Processing; Shared Energy Signature Repository; Disaggregation Algorithms; OPOWER Platform]
    7. Opower’s first architecture could not support their analytic vision: MySQL. Scalability? Performance? Data integration?
    8. Opower’s first architecture could not support their analytic vision: Analytic workflow instead of analytic apps: SQL -> CSV -> R -> too little, too slow
    9. Problem #1: Data Lake Cost [Chart labels: Usage; AMI; Regional AMI; Sensor Data; Data Lake]
    10. Problem #2: Slower and slower queries. Smart-grid-scale data. Lots of supporting data: weather, demographics, etc.
    11. Problem #3: It was taking lots of “magic”: Intense analytics; strange schemas; segmented queries
    12. Hadoop + HBase at Opower: Opower determined that they needed an entirely new data architecture
    13. NexGen Architecture @ Opower
    14. Hadoop + HBase at Opower: Early success: HBase AMI
    15. What rocked: Endless, cheap scalability
    16. What rocked: The analytics team loved it!
    17. What sucked: Hard on the ops team, still trying to grok it
    18. What sucked: NoSchema p1. Creating schema; managing metadata; schema <=> performance
    19. What sucked: HA; failover; snapshots
    20. What sucked: No secondary index; aggregation is slow (Rollup/OLAP); poor client performance
    21. It would be better if only … Developers were not forced to know how the data is stored, indexed, etc.
    22. It would be better if only … There were nicer APIs and better query languages (SQL?)
    23. It would be better if only … Version migrations were easy; hierarchical tables
    24. It would be better if only … Real-time tuning
    25. It would be better if only … Did I mention HA?
    26. In summary: HBase has helped Opower achieve their analytic vision. But they’ve still got a long way to go. HBase still has a long way to go
    27. Questions? Alex Newman, posix4e@apache.org. Architect, Drawn to Scale; Strategic Advisor, Opower
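
Slide 20 lists slow aggregation (Rollup/OLAP) as a pain point. One general workaround, shown here only as a hedged sketch and not as anything the talk says Opower built, is to maintain rollups at write time so that reads become single lookups instead of scans. The class below uses in-memory dictionaries as stand-ins for a raw-reads table and a daily-rollup table; `RollupStore`, `put_read`, and `daily_total` are hypothetical names.

```python
from collections import defaultdict
from datetime import datetime, timezone


class RollupStore:
    """Keeps raw interval reads plus a daily total that is updated on every write."""

    def __init__(self) -> None:
        self.raw = {}                    # (meter_id, epoch_seconds) -> kWh
        self.daily = defaultdict(float)  # (meter_id, "YYYY-MM-DD") -> kWh

    def put_read(self, meter_id: str, epoch_seconds: int, kwh: float) -> None:
        key = (meter_id, epoch_seconds)
        # If the same interval is rewritten, only apply the difference.
        previous = self.raw.get(key, 0.0)
        self.raw[key] = kwh
        day = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc).strftime("%Y-%m-%d")
        self.daily[(meter_id, day)] += kwh - previous

    def daily_total(self, meter_id: str, day: str) -> float:
        # A single lookup, no scan over the raw reads.
        return self.daily.get((meter_id, day), 0.0)


store = RollupStore()
store.put_read("m1", 0, 1.5)       # 1970-01-01 00:00 UTC
store.put_read("m1", 3600, 2.0)    # 1970-01-01 01:00 UTC
store.put_read("m1", 86400, 4.0)   # 1970-01-02 00:00 UTC
```

The cost of this design is extra work and extra storage on every write, and in a real distributed store the raw write and the rollup update would need to be kept consistent, which this single-process sketch does not address.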
