How SkyElectric Uses Scylla
to Power Its Smart Energy
Platform
Jehannaz Khan, Director of Engineering and Meraj Rasool, DevOps Lead
Presenters
Jehannaz Khan, Director of Engineering
Jehannaz is an electrical engineer with a Master's in Sustainable Energy from Imperial
College London.
Meraj Rasool, DevOps Lead
Meraj is a Software/DevOps Engineer. He started his career as a software engineer and
switched his primary role to DevOps five years ago. He is a big fan of open source
software.
SkyElectric - Who We Are
and What We Do
Great Energy Challenge - Powering the World
in the 21st Century
❏ Gap in energy usage between developed and developing nations
❏ Climate Change
❏ Poor Quality Grid in Developing Nations
Our Vision & Mission
To make clean energy universally available by building a distributed and
intelligent solar and energy storage grid, managed via the Internet, across the world.
Our Product
❏ SkyElectric Smart Energy System
❏ Cloud-connected, remotely monitored solar energy system with A.I.
algorithms that run and learn constantly
SkyElectric - Legacy V1
To support all our desired functionality, we had our V1 SkyElectric Cloud:
■ V1 was developed in Java, with MySQL as the backend database.
SkyElectric - Legacy V1 Problems
■ Scalability
■ Performance
■ Database write latency (~2 seconds)
■ Database read latency (<1 second, or a few minutes in some cases)
■ Regular maintenance and data cleanup were needed to keep up
with data growth.
SkyElectric - v2
■ The team attempted a v2 with MongoDB + Node
■ Neither MongoDB nor Node proved to be the right fit for our
solution
■ Work on it stopped at a very early stage.
■ Moved to Elixir + Scylla + PostgreSQL. This is SkyElectric v3.
SkyElectric Upgrade
SkyElectric - V3
■ Infrastructure - AWS
■ Backend - Elixir
■ RDBMS - PostgreSQL
■ Scylla
■ Elasticsearch
Why Scylla?
■ Time Series Data
■ Comparison with Apache Cassandra and Riak TS
■ Ease of Operation
■ No worries about performance or scalability
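The slides do not show SkyElectric's actual schema. As a purely illustrative sketch of why time series data maps well onto a wide-column store such as Scylla, a common modelling approach is to bucket each system's readings by day, so that no single partition grows without bound. The function name `partition_key`, the `system_id` format and the one-day bucket size are all hypothetical choices made for this example.

```python
from datetime import datetime, timezone
from typing import Tuple


def partition_key(system_id: str, ts: datetime) -> Tuple[str, str]:
    """Return a (system_id, day) partition key for a reading taken at ts.

    Hypothetical helper: all readings from one system on one UTC day
    share a partition, and the timestamp itself would act as the
    clustering column inside that partition.
    """
    if ts.tzinfo is None:
        # Refuse naive timestamps so the bucket never depends on the
        # local time zone of the machine running this code.
        raise ValueError("ts must be timezone-aware")
    day = ts.astimezone(timezone.utc).strftime("%Y-%m-%d")
    return (system_id, day)


if __name__ == "__main__":
    reading_time = datetime(2019, 11, 5, 23, 59, tzinfo=timezone.utc)
    print(partition_key("sys-001", reading_time))
```

With a key like this, a query for one system's recent data touches only a handful of small partitions, which is the access pattern a monitoring dashboard typically needs.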
Scylla Numbers
■ One 5-node cluster
■ Running on AWS EC2
■ Single DC
■ Overall 1+ TB data across all nodes
■ Average Write Latency 1.4 ms
■ Average Read Latency <1 ms
■ Request Volume 10x MySQL throughput
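As a rough back-of-the-envelope check, the figures quoted in this deck can be combined as follows. This is a sketch under stated assumptions, not a measurement: "1+ TB" is taken as a 1000 GB lower bound, data is assumed to be spread evenly across the five nodes, and the 475 GB per-node disk size comes from the i3.large instance type mentioned in the editor's notes.

```python
# Figures quoted in the slides and editor's notes.
V1_WRITE_LATENCY_MS = 2000.0     # "~2 seconds" write latency on V1 (MySQL)
SCYLLA_WRITE_LATENCY_MS = 1.4    # average write latency on Scylla
NODE_COUNT = 5                   # one 5-node cluster
TOTAL_DATA_GB = 1000.0           # assumption: "1+ TB" treated as a 1000 GB lower bound
NODE_DISK_GB = 475.0             # i3.large NVMe SSD size


def write_latency_speedup() -> float:
    """How many times lower the Scylla write latency is than V1's."""
    return V1_WRITE_LATENCY_MS / SCYLLA_WRITE_LATENCY_MS


def data_per_node_gb() -> float:
    """Data held per node, assuming an even spread across nodes."""
    return TOTAL_DATA_GB / NODE_COUNT


def disk_utilization() -> float:
    """Fraction of each node's disk in use under the same assumption."""
    return data_per_node_gb() / NODE_DISK_GB


if __name__ == "__main__":
    print(f"write latency improvement: ~{write_latency_speedup():.0f}x")
    print(f"data per node: ~{data_per_node_gb():.0f} GB")
    print(f"disk utilization per node: ~{disk_utilization():.0%}")
```

Under these assumptions, write latency drops by roughly three orders of magnitude and each node's disk is a little under half full.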
Scylla Maintenance
■ Adding / replacing nodes
■ Upgrading software
■ Repairs
■ Backups and restoration
Scylla at SkyElectric so far
■ Operational Ease
■ Hassle-free software upgrades
■ Better performance
■ Responsive Scylla support team
Wishlist for Scylla
■ Data changelog
■ Faster node joins to the cluster
■ Support forum along with Slack channel
■ Backup / restore process improvements.
Thank you
Stay in touch
Any questions?
Jehannaz Khan
jehannazkhan@skyelectric.com
Meraj Rasool
meraj.rasool@skyelectric.com
naqoosh

Editor's Notes

  • #5 Expensive electricity. Dams and hydro are declining relative to demand, and the cost and time of adding more are very high.
  • #7 Custom-designed electronics and software. Built-in chips communicate with our cloud service. The cloud processes data sent by the systems for fault detection and rectification. Customers can review their system data and performance.
  • #9 V1 was more of a proof of concept and prototype, and was not designed with scaling in mind.
  • #10 Timeouts were very frequent in the last days before our latest solution went live. Once the number of systems crossed 50, we started having big problems. Data analysis was difficult because loading a customer's data took minutes. The customer service team would frequently call DevOps to report that the cloud service was down. At that time we also did not have a proper SLA for customers, as most systems were for our own testing and development. Even so, we had only 50 or so systems running, with the main app server having more than 34 GB of RAM. Now we have 750+ systems with 16 GB of RAM on the main system and no timeouts.
  • #11 On evaluation, MongoDB did not suit our future plan of scaling to tens of thousands of systems.
  • #13 Infrastructure has been moved to AWS for near-unlimited scale and early access to new hardware. Why Elixir? We found Elixir to be an ideal choice for the problem we are solving: it is built for distributed computing, and ours is a distributed problem. PostgreSQL: the most mature and stable open source relational database. Scylla: covered in detail later. Elasticsearch: it reads data from Scylla and indexes it, and the statistics page is populated from it. It helps us analyse and group data.
  • #14 Apache Cassandra had serious performance and resource usage issues, which we wanted to avoid (based on my previous experience with Java-based solutions). Riak TS was a close fit but had one basic limitation: the schema cannot be changed once we are live. Its active development has also stopped. Scylla is easy to set up, monitor and scale. It makes the most optimal use of hardware I have ever seen, and that saves us infrastructure cost.
  • #15 Started with a 3-node cluster and added two nodes later. We are running i3.large instances (2 vCPU / 15.25 GiB RAM / 475 GB NVMe SSD / up to 10 Gigabit network). 50 systems with V1, and now 850+.
  • #16 Adding / replacing nodes is very simple and quick. Upgrades are very smooth overall. We experienced an issue once (while on staging) and rolled back; the issue has been reported to Scylla. We started with 2.1.3 and are now on 3.0.6 (will soon move to 3.1). We have weekly automated repair sessions enabled via Scylla Manager and they are working well. We back up data from every node to S3. I believe both backup and restoration should be further simplified.
  • #17 Related to the previous slide (ease of operation, upgrades etc.). Much better performance, and confidence that we can scale further easily. We have been using Scylla’s Slack support and it has been working great for us.
  • #18 Data changelog can be removed from this list, as Change Data Capture is being announced at Scylla Summit. A node joins the cluster quickly but takes time syncing and processing data. It would be good to be able to see the progress and to speed it up. The Slack support channel is good, but I believe a forum such as Discourse would be more helpful, because we could search for issues and read up on them. Searching for previously discussed issues on Slack is not easy. Backup and restoration should be more straightforward.