Making Every Drop Count: How i2O Addresses the Water
Crisis with the IOT and Apache Cassandra
Mike Williams, Software and IT Director, i2O Water
Thank you for joining. We will begin shortly.
Webinar Housekeeping
• All attendees placed on mute
• Input questions at any time using the online interface
Agenda
1 About i2O Water & The State of Water Consumption Today
2 Database Technology for Time Series and IoT
3 Relational vs NoSQL
4 Why NoSQL and Cassandra?
5 Impact of Cassandra
6 Q&A
About i2O Water
• Smart pressure management systems for water distribution networks
• Hardware/Firmware/Software
• IoT
• Reduces leakages, burst frequency and energy waste
• Saves over 235 million litres of water per day around the world
Water Consumption vs. Supply
• A staggering 46 billion litres (~12.15 billion gallons) of drinking water are lost globally every day
• It’s not a problem restricted to the developing world either – Montreal, for example, loses 40% of the water it produces
• 40% global shortfall between water demand and supply by 2030
• The World Economic Forum ranked water crises as the top global risk in its 2015 Global Risks Report
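How i2O Works
[Diagram: simplified water distribution network showing where i2O's solutions sit]
How do we do it?
[Graph: pressure in a water network over time, annotated with "Safety margin" and "Head loss"]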
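The relationship between the two graph labels can be illustrated with a small worked example. This is only a sketch under a simplified assumption (pressure at the critical point equals inlet pressure minus head loss); the function name, the hourly head-loss values and the pressure figures are all hypothetical and are not taken from i2O's algorithms.

```python
def required_inlet_pressure(min_customer_pressure, safety_margin, head_loss):
    """Inlet pressure needed so the far end of the zone stays just above the minimum.

    Simplified model (an assumption): far-end pressure = inlet pressure - head loss.
    """
    return min_customer_pressure + safety_margin + head_loss


# Hypothetical values in metres of head; head loss follows a diurnal demand pattern.
MIN_CUSTOMER_PRESSURE = 15.0
SAFETY_MARGIN = 2.0
head_loss_by_hour = {3: 2.0, 8: 12.0, 14: 8.0, 19: 11.0}

# Without active control the inlet is held constant, sized for the worst-case hour.
fixed_inlet = required_inlet_pressure(
    MIN_CUSTOMER_PRESSURE, SAFETY_MARGIN, max(head_loss_by_hour.values())
)

for hour, loss in sorted(head_loss_by_hour.items()):
    controlled_inlet = required_inlet_pressure(MIN_CUSTOMER_PRESSURE, SAFETY_MARGIN, loss)
    excess_fixed = fixed_inlet - loss - MIN_CUSTOMER_PRESSURE
    excess_controlled = controlled_inlet - loss - MIN_CUSTOMER_PRESSURE
    print(f"{hour:02d}:00 excess pressure: fixed={excess_fixed:.1f} controlled={excess_controlled:.1f}")
```

In this toy model the controlled case keeps only the safety margin as excess pressure at every hour, whereas the fixed inlet carries the largest excess when demand (and so head loss) is lowest.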
The Challenge with Relational Databases
• Massive volumes of time-series data (1.5TB and growing) that need to be stored and analyzed in close to real time
• Low-energy, battery-powered devices
• Must be efficient in protocol design
• Must be efficient in message sizes
• Relational database (SQL Server) couldn't adequately handle time-series data and IoT needs at scale
• Re-indexing tables causes loss of performance
• The need to scale without impacting performance or availability
• Migration from existing data management platform
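To illustrate why message size matters for battery-powered devices, the sketch below compares a text encoding with a fixed binary layout for a single sensor reading. The field names, field widths and byte order are assumptions made for this example; the slides do not describe i2O's actual wire format.

```python
import json
import struct

# Hypothetical reading; field names and units are illustrative assumptions.
reading = {"device_id": 1234, "timestamp": 1430000000, "pressure_mbar": 3050}

# Text encoding: self-describing, but every message repeats the field names.
as_json = json.dumps(reading).encode("utf-8")

# Fixed binary layout: big-endian uint32 device id, uint32 Unix timestamp,
# uint16 pressure in millibar. ">" disables padding, so each record is 10 bytes.
RECORD = struct.Struct(">IIH")
as_binary = RECORD.pack(
    reading["device_id"], reading["timestamp"], reading["pressure_mbar"]
)


def decode(payload: bytes) -> dict:
    """Unpack a binary record back into the same dictionary shape."""
    device_id, timestamp, pressure_mbar = RECORD.unpack(payload)
    return {
        "device_id": device_id,
        "timestamp": timestamp,
        "pressure_mbar": pressure_mbar,
    }


print(f"JSON: {len(as_json)} bytes, binary: {len(as_binary)} bytes")
```

The trade-off is that a fixed layout needs versioning discipline on both ends, since the payload no longer describes itself.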
Key Database Requirements
1 Strong real-time search and query capability
2 Support high reliability and security
3 Structured & unstructured real-time data
4 Flexible data model with affordable scalability
How Cassandra Compared to RDBMS
• Wide rows allowed better modeling of time series
– time sharding and rollup aggregations
• Smaller data footprint on “disk”
• Much faster write performance
• Faster read performance
• Scalable by design / architecture
• Encourages us to duplicate data
• Model what you need to query efficiently
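The "time sharding and rollup aggregations" idea can be sketched in plain Python without a database. This is a minimal illustration, assuming a per-device, per-day partition key and hourly rollups; the function names, the bucket sizes and the sample values are hypothetical and are not i2O's schema.

```python
from collections import defaultdict
from datetime import datetime, timezone


def partition_key(device_id: str, ts: datetime) -> tuple:
    """Shard one device's series by day so a single wide row cannot grow without bound."""
    return (device_id, ts.strftime("%Y-%m-%d"))


def hourly_rollup(readings):
    """Aggregate (timestamp, value) pairs into per-hour min, max, mean and count."""
    buckets = defaultdict(list)
    for ts, value in readings:
        hour = ts.replace(minute=0, second=0, microsecond=0)
        buckets[hour].append(value)
    return {
        hour: {
            "min": min(values),
            "max": max(values),
            "mean": sum(values) / len(values),
            "count": len(values),
        }
        for hour, values in sorted(buckets.items())
    }


# Hypothetical pressure readings for one device.
readings = [
    (datetime(2015, 5, 1, 10, 0, tzinfo=timezone.utc), 30.0),
    (datetime(2015, 5, 1, 10, 15, tzinfo=timezone.utc), 32.0),
    (datetime(2015, 5, 1, 11, 0, tzinfo=timezone.utc), 28.0),
]

print(partition_key("device-42", readings[0][0]))
for hour, stats in hourly_rollup(readings).items():
    print(hour.isoformat(), stats)
```

Storing the rollups alongside the raw series is one instance of the "duplicate data, model what you need to query" advice: dashboards read the small aggregate rows instead of scanning raw readings.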
How Cassandra Compared to other NoSQL
• Compared it to HBase
– Similar architectures
– C* was a better supported product
• Compared it to MongoDB
– C* was much more scalable by design
– Concerned over sharding and lossy writes
• Compared it to RavenDB
– C* was far more robust
– C* was far more performant
– C* was better supported
Why NoSQL and Apache Cassandra?
• Database platform built for IoT applications
• Optimized for storage and retrieval of time-series data
• Streaming and real-time analytics (Spark integration)
• Best performance based on internal benchmark testing
• Strong search & real-time query capability on unstructured data (Solr integration)
• Supports 100% uptime through masterless architecture and multi-datacenter replication
• Linear scalability across commodity hardware makes supporting high-velocity data a reality (and affordable)
i2O Technical Environment
• Front end: SSL offloading, load balancing, in-memory cache
• Development stack
• Middleware: message broker, queuing, protocols (AMQP), message encoding
• Data stores: Cassandra, PostgreSQL, PostGIS
The Results
• >15,000 devices
• >70 water utilities globally
• +235M litres of water saved per day
• 20+ countries
• >99.9% uptime since launch, even during upgrades & node failures
• Security audited / tested
Recommendations
• Think Security at all times - everywhere
• Think about Scalability early
• Think about APIs and Protocols early
• Use test infrastructures to practice changes to tech
– Cheap to do, consider Container technologies like Docker
Q&A
Thank you!

Editor's Notes

  • #6 So who are we? We have been in business since 2005 and we currently operate our solutions in 25 countries around the globe, helping over 70 water utilities to save water. We have about 2,000 systems installed and this currently accounts for around 2.5TB of data. Hajj festival in Mecca (6M people, drinking water and a/c): 750 systems. Our total daily savings of water for our customers is in excess of 235 Ml. How much water is that? [Reveal]
  • #7 The Global Risks Landscape 2015. Source: http://widgets.weforum.org/global-risks-2015-interactive/risk-explorer.html#landscape/// Water can be as cheap as a few cents here in the UK, to over $2 in the Middle East due to desalination and pumping costs, and much of it is lost (up to 40%).
  • #8 This is a simplified water distribution network. The boxes represent what i2O does for a water utility. We aim to help water utilities optimise this network through a combination of intelligent hardware devices, which we design, and a SaaS platform with intelligent learning algorithms and optimisers. As we add more of our solutions into this network for our customers, we can move them up a value curve towards a point where they can be confident that, with our technologies, they are delivering great value and service to their customers.
  • #9 In a simplified form, the graph shows how the pressure in a water network varies over time. Without any active control the pressure into the zone would be constant and the pressure at the CP would vary with a diurnal pattern. The gap between them is due to frictional losses. The minimum customer pressure is what they deem acceptable to deliver their service to their customers. This excess pressure leads to high leakage and bursts. When an i2O system is deployed (2 devices: 1 controller and 1 sensor), after a short (2-week) learning period the graph looks like this. [Show Example]
  • #10 We started with a traditional N-tier architecture platform around an RDBMS, which served as a viable POC, and faced many challenges and woes. Bulk of the data comes from these devices and therefore […] Not multi-tenanted. Not easy to shard data. Not easy to “cluster”.
  • #11 These were (and still are) our requirements with their relative importance marked.
  • #12 Now to Cassandra in particular. We’ve been using it in production since 2011 (v1.0.0) and are now on 2.0.6. It gives us great write performance even on spinning disks, and good read performance. We cache data elsewhere to assist with read latency where required. We also get excellent data compression for less frequently used data compared with MS SQL. And of course its distributed scaling model is well known to us, so as we migrate more of our customers or acquire new ones it can grow with us. Use cases: time-varying data (series, spot events and tree evolution); algorithm development via streaming for machine learning and optimisation; also ecosystem evolution via Auditor replay.
  • #13 Support and community are vital in any emerging tech product space. Mongo architecture and scale model was not a good fit for our architecture. RavenDB: K-V / document store in the .NET space.
  • #14 C*: write perf [100 - 60K] / s, read perf [100 - 100K] / s. RDBMS: write perf [0.5] / s, read perf [5] / s.
  • #15 So what technologies are involved? At the front-end web facing we have these for TLS, HTTP and Caching. At the data layer we use these data store technologies. To make it all hang together we use these technologies for our middleware. These allow us to be highly language agnostic, but we predominately use these.