HBaseCon 2012 | Overcoming Data Deluge with HBase to Help Save the Environment - OPower

Cloudera, Inc.
May. 30, 2012

Editor's Notes

  1. Name Email Address Title
  2. WARNING: THESE ARE MY WORDS, not those of FDS, Cloudera, or OPower.
     FactSet, 2005:
     - Maybe I was crazy to use it.
     - Tens of databases, tens of query languages, VMS moving towards commodity servers. Running into issues with scaling on environments like MySQL.
     - They were used to code that crashed. In fact, I would say that while I was on call, a service at one of the sites was down at least once a week. Luckily they had redundancy across multiple sites, and multiple servers within those sites. The redundancy was added at a higher level, so generally, at least all of the times I remember, it was able to increase availability, and downtime wasn't actually an issue.
     - What was an issue was scale.
     - Interestingly enough, HBase, even at that time, was a pretty highly available database.
     - So what did they use it for? Time and Sales. This is the collection of all of the quotes and trades for different securities. To translate: you put out quotes to buy or sell stocks at a certain price. If they overlap, the exchange registers a trade, and you just bought or sold a security. Not just stocks, but options and extremely high-frequency data.
     - There was some value-add on top of that, for calculating more complicated statistics on the fly through a home-grown Web SASS thing.
     Cloudera:
     - Started off in kitchen, focusing on building the packages that y'all know and love. When I entered it was all manual; when I left it was all automated. One could think of this as sort of like dev-opsy meets QA meets release engineering meets generic development.
     - Moved into our first management tools team as a developer, where we developed Cloudera Manager. It was originally part of HUE and it became more springy.
     - Then I left Cloudera to be a founder at Drawn to Scale. We built a prototype and started pitching it for about 6 to 7 months.
     - While that was going on, I became the Lead Data Architect at OPower.
     - More recently, after funding, I have returned to Drawn to Scale as a coder in the trenches, and have changed my role at OPower to advisor.
     The reason I bring this up is that I have been working with HBase in production for about 5 years.
  3. Opower helps people use energy more efficiently and ultimately save money on their energy bills. It vastly improves the overall customer experience by making energy use personally relevant.
     - Behavioral Science (great marketing, understanding people, great HCI)
     - Data Science (Analytics, Data Infrastructure teams)
     - Lobbying (yep, we do lobbying)
  4. - How many of you get a bill?
     - OPower white-labeled websites. So this is the interface you probably use through your energy website to view how much power you use. Bill forecasting, etc.
     - Smart thermostats
     - Gas and electric
     - Social
  5. - Analytics is used to understand who we should be targeting.
     - Answering questions that our customers want answered. We can help them improve customer service, improve their marketing, etc.
     - Justifying our own existence. (Compliance)
  6. - This is an old slide which doesn't really include all the places we get data.
     - Story about detecting broken thermostats.
  7. - But it had its ups.
     - Spring and MVC provided a very clear and systematic way for developers to develop systems.
     - It was very easy to manage from an operations perspective.
  8. - We did this at FDS as well. Of course not with R, but with specialized languages.
     - In fact our customers did as well, and they had a whole team of people to help customers do it.
  9. So here are the data sizes we have, along with the costs with traditional Hadoop systems.
     - We were a Cisco shop but we ended up going with Dell, mostly because of the 3.5 inch disks. It looks like Cisco is wising up to this whole Hadoop thing.
     - These numbers are for Dell. So I think this is priced out assuming a 710, then an 810, and then a 910 for the RDBMS, and 510s for Hadoop.
  10. - A lot of this data just doesn't work well with traditional databases.
      - An unnamed utility takes 3 days to mysqldump the AMI data out.
      - Subsampling, interpolation.
  11. - I should warn you, I drew almost all of my drawings in xfig, so if this isn't clear, I'm sorry.
      - Basically the utility data has to come in over a variety of different protocols, as we integrate into the utility pipeline. It then flows into HBase, it's validated from HBase, and then imported into our existing workflow.
      - Some of that data, for instance information about users, is still stored in MySQL.
      - All of the data is in a Hive data lake.
  12. All of our high-frequency time-series data is being ported to HBase. Also, soon things like bill forecasting, and a bunch of cool other stuff I probably should mention, are being moved here. This includes data from the utilities, and data that users are entering themselves. In addition, thermostat data is moving here.
  13. - We still need to improve efficiency.
      - We are doubling the size of the cluster this year.
      - We have a ton of room to grow.
  14. - Having all of your data is a huge thing.
      - Having a place to do M/R-based R is great.
      - No more running out of memory or being bounded to a single machine.
      - Having a cheap scratch space.
  15. At Cloudera I thought all we needed was cfengine, SNMP, and syslog. Frankly, that would have made ops happy. But more and more I think we made the right decision and that these tools really aren't the right answer. Juju looks interesting.
      - Cloudera of course built their own tool.
      - Access and auth.
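
Note 12 mentions porting high-frequency time-series data to HBase. The notes do not describe the table schema, so what follows is only a hypothetical sketch of one common approach: a composite row key of meter ID followed by timestamp, encoded big-endian so that HBase's bytewise row ordering keeps each meter's readings contiguous and in time order. The function names (`make_row_key`, `parse_row_key`, `scan_bounds`) and the 8-byte field widths are assumptions for illustration, not Opower's actual design, and no HBase client is used.

```python
import struct

# Hypothetical layout: 8-byte meter ID + 8-byte epoch seconds, both unsigned
# big-endian, so comparing keys byte by byte matches numeric (meter, time) order.
_KEY_FORMAT = ">QQ"


def make_row_key(meter_id: int, epoch_seconds: int) -> bytes:
    """Encode (meter_id, epoch_seconds) as a 16-byte sortable row key."""
    return struct.pack(_KEY_FORMAT, meter_id, epoch_seconds)


def parse_row_key(key: bytes) -> tuple:
    """Decode a row key back into (meter_id, epoch_seconds)."""
    return struct.unpack(_KEY_FORMAT, key)


def scan_bounds(meter_id: int, start_seconds: int, stop_seconds: int) -> tuple:
    """Return (start_key, stop_key) covering [start_seconds, stop_seconds)
    for one meter, suitable as the bounds of a range scan."""
    return (make_row_key(meter_id, start_seconds),
            make_row_key(meter_id, stop_seconds))


if __name__ == "__main__":
    start, stop = scan_bounds(42, 0, 3600)
    reading_key = make_row_key(42, 900)
    # A reading inside the window falls between the scan bounds.
    print(start <= reading_key < stop)
```

With a key like this, reading one meter's interval data for a time window becomes a single contiguous range scan. A real deployment would also have to consider write hot-spotting, which this sketch does not address.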