SlideShare a Scribd company logo
Akka Cluster and
Auto-scaling
Ikuo Matsumura
CyberAgent, Inc.
2017/02/26
Akka Cluster
⾮中央集権的なノード群構築を⾏うAkka拡張
10ヶ⽉程運⽤してきた中からつまづいた点・学んだ点を紹介
• Decentralized cluster membership service
• no single point of failure, bottleneck
• distribute actors over multiple JVMs
• Applied to build a sub-system on AD serving
• Tens of servers
• Operations about 10 months
Requirements in our case
• Host a lot of Entity with low cost
• Fit existing Akka application
• Down-time is acceptable to some extent*
Akkaベースで多数のEntityを低コストで配備したい
多少のダウンタイムは許容できる
*online machine learning
Our application of Akka Cluster
永続ActorをCluster Shardingで配備
データ保管にコモディティサービスを使⽤
frontend frontend frontend
entities
entities
…
…
entities
frontend
• Existing app
• Tens of nodes
• Auto-scaling
• New sub-system
• Several nodes
ElastiCache
(Journal)
S3
(Snapshot)
data stores
…
Challenges
• Strategy on unreachables removal
• Lifecycle of journals
運⽤する中でつまづいた2つの課題についてお話します
Membership Lifecycle in Cluster Specification
クラスタメンバーのライフサイクルの概観
http://doc.akka.io/docs/akka/2.4/common/cluster.html#Membership_Lifecycle
joining
up
down
removed
join
leaving
exiting
unreachable
leave
Joinning and Leader Action
“leader action”を経て、他メンバと通信できるようになる
joining
up
down
join
unreachable
Joinning and Leader Action
“leader action”を経て、他メンバと通信できるようになる
joining
up
down
unreachable
leader action
Joinning and Leader Action
Scale-in発⽣時、unreachable のままになる
joining
up
down
unreachable
failure detector
leader action
Joinning and Leader Action
leader actionが⾏えなくなる。
結果、新規メンバが他のメンバと通信できないままに。
joining
up
down
unreachable
leader action
Joinning and Leader Action
Scale-inをトリガにしたdown指定が必要
joining
up
down
unreachable
leader action
scale-in
trigger
mark as down
Joinning and Leader Action
unreachableを除くことでleader actionが再開可能に
joining
up
down
unreachable
leader action
Leader actions blocked by unreachables
leader actionが⾏えない状態のログの例
Members that are “up” but have not seen the current state
“Leader can currently not perform its duties”
Causes and actions on unreachables
Type of failures Example
Possible
external action
network partitions -
wait for recovery or
abandon a part
machine crashes
scale-in mark as down
quarantined in
akka remote layer
restart
an actor system
unresponseive
process
long GC restart a JVM
CPU starvation by
credit shortage in EC2
re-create an instance
failure detector はエラーの原因までは区別できない
原因に応じてクラスタ外部からの回復措置が要る
Split Brain Resolver* (commercial add-on)
• Mark members as “down” when a part of the
cluster become unreachable for some time
• Strategies
• Static Quorum
• Keep Majority - default in Lagom
• Keep Oldest
• Keep Referee
* http://doc.akka.io/docs/akka/rp-current/scala/split-brain-resolver.html
商⽤add-onである程度包括的に⾃動のdown指定が可能
⼀定時間メンバの状態・到達可能性に変化がない時に発動
Reset cluster membership (poor man’s)
存命ノードを新しいクラスタに参加させ直す
seed(s)
old cluster (ddata) new cluster (ddata)
Reset cluster membership (poor man’s)
存命ノードを新しいクラスタに参加させ直す
seed(s)
old cluster (ddata) new cluster (ddata)
Reset cluster membership (poor man’s)
存命ノードを新しいクラスタに参加させ直す
seed(s)
old cluster (ddata) new cluster (ddata)
Caution
• Side-effect caused by app restart
• ddata is experimental (at Akka 2.4)
• Use Akka 2.4.8 or higher*
再起動による副作⽤やAkkaのバージョンに注意が必要
*has a fix on distributed pub-sub akka#20847
To keep cluster membership healthy
1. Trigger mark-as-down (or leave) on scale-in
2. Automate restart/recreation of

AcotrSystem, JVM, server instance
3. Setup a fallback mechanism such as

split brain resolver, or

rejoining into a new cluster
unreachable対策のまとめ
Challenges
• Strategy of unreachables removal
• Lifecycle of journals
次に、2つ⽬の課題についてお話しします。
Journal
entities
entities
…
entities
ElastiCache
(Journal)
S3
(Snapshot)
Event Sourcingにおけるイベントストアに対応するAPI
Journalをキャッシュのように運⽤する想定をした
Cleanup old journals in Redis plug-in*
JournalのDeleteMessageでは⼀部データが残るケースがある
snapshotとのsequenceNrの⼀貫性に注意
key in Redis removed on deleteMessages
journal:$persistenceId Yes
journal:$persistenceId.highestSequenceNr No
* https://github.com/hootsuite/akka-persistence-redis/blob/master/src/main/scala/com/hootsuite/akka/persistence/redis/journal/RedisJournal.scala
Deleting highstSequenceNr could cause loading old version of snapshot.
→ Keep only the latest snapshot.
Event Sourcing and Ecosystem
“it stores a complete history of the events
associated with the aggregates in your domain”
Reference 3: Introducing Event Sourcing, CQRS Journey[CQJ]
本来のイベントストアはイベントの完全な履歴を持つ想定
そこから逸れるとエコシステム(plug-in)のサポートも弱くなる
Summary
• Lessons learned from devops of an Akka Cluster app
• Strategy on unreachables removal
• scale-in trigger
• automatic restart/recreation
• fallback mechanism;

split-brain resolver / rejoining
• Lifecycle of journals
• cost of deviation from Event Sourcing
unreachableメンバを取り除く仕組みを各種⼊れておく
Journalのキャッシュ的運⽤は意外と⼤変(なことがある)
Reference
[CQJ] Exploring CQRS and Event Sourcing, Dominic Betts, Julian
Dominguez, Grigori Melnik, Fernando Simonazzi, Mani Subramanian,
2012, https://msdn.microsoft.com/en-us/library/jj554200.aspx
[PSE] Persistence - Schema Evolution, Akka Documentation, http://
doc.akka.io/docs/akka/2.4/scala/persistence-schema-evolution.html

More Related Content

What's hot

Understanding Akka Streams, Back Pressure, and Asynchronous Architectures
Understanding Akka Streams, Back Pressure, and Asynchronous ArchitecturesUnderstanding Akka Streams, Back Pressure, and Asynchronous Architectures
Understanding Akka Streams, Back Pressure, and Asynchronous Architectures
Lightbend
 
獨體模式
獨體模式 獨體模式
獨體模式
Jen-Hsuan Hsieh
 
Continuous Deployment with Akka.Cluster and Kubernetes (Akka.NET)
Continuous Deployment with Akka.Cluster and Kubernetes (Akka.NET)Continuous Deployment with Akka.Cluster and Kubernetes (Akka.NET)
Continuous Deployment with Akka.Cluster and Kubernetes (Akka.NET)
petabridge
 
Developing distributed applications with Akka and Akka Cluster
Developing distributed applications with Akka and Akka ClusterDeveloping distributed applications with Akka and Akka Cluster
Developing distributed applications with Akka and Akka Cluster
Konstantin Tsykulenko
 
Putting the 'I' in IoT - Building Digital Twins with Akka Microservices
Putting the 'I' in IoT - Building Digital Twins with Akka MicroservicesPutting the 'I' in IoT - Building Digital Twins with Akka Microservices
Putting the 'I' in IoT - Building Digital Twins with Akka Microservices
Lightbend
 
The do's and don'ts with java 9 (Devoxx 2017)
The do's and don'ts with java 9 (Devoxx 2017)The do's and don'ts with java 9 (Devoxx 2017)
The do's and don'ts with java 9 (Devoxx 2017)
Robert Scholte
 
What's New on AWS and What it Means to You
What's New on AWS and What it Means to YouWhat's New on AWS and What it Means to You
What's New on AWS and What it Means to You
Amazon Web Services
 
SolrCloud Cluster management via APIs
SolrCloud Cluster management via APIsSolrCloud Cluster management via APIs
SolrCloud Cluster management via APIs
Anshum Gupta
 
Testing at Stream-Scale
Testing at Stream-ScaleTesting at Stream-Scale
Testing at Stream-Scale
All Things Open
 
Akka Cluster in Production
Akka Cluster in ProductionAkka Cluster in Production
Akka Cluster in Production
bilyushonak
 
CloudStack, jclouds, Jenkins and CloudCat
CloudStack, jclouds, Jenkins and CloudCatCloudStack, jclouds, Jenkins and CloudCat
CloudStack, jclouds, Jenkins and CloudCatAndrew Bayer
 
CloudStack, jclouds and Whirr!
CloudStack, jclouds and Whirr!CloudStack, jclouds and Whirr!
CloudStack, jclouds and Whirr!
Andrew Bayer
 
Real World Java 9
Real World Java 9Real World Java 9
Real World Java 9
J On The Beach
 
Solr security frameworks
Solr security frameworksSolr security frameworks
Solr security frameworks
Anshum Gupta
 
Scala play-framework
Scala play-frameworkScala play-framework
Scala play-framework
Abdhesh Kumar
 
MySQL High Availability -- InnoDB Clusters
MySQL High Availability -- InnoDB ClustersMySQL High Availability -- InnoDB Clusters
MySQL High Availability -- InnoDB Clusters
Matt Lord
 
Apache jclouds SF Meetup, July 8, 2013
Apache jclouds SF Meetup, July 8, 2013Apache jclouds SF Meetup, July 8, 2013
Apache jclouds SF Meetup, July 8, 2013
Andrew Bayer
 
Akka 2.4 plus commercial features in Typesafe Reactive Platform
Akka 2.4 plus commercial features in Typesafe Reactive PlatformAkka 2.4 plus commercial features in Typesafe Reactive Platform
Akka 2.4 plus commercial features in Typesafe Reactive Platform
Legacy Typesafe (now Lightbend)
 
MySQL Group Replication - an Overview
MySQL Group Replication - an OverviewMySQL Group Replication - an Overview
MySQL Group Replication - an Overview
Matt Lord
 
MySQL Operator for Kubernetes
MySQL Operator for KubernetesMySQL Operator for Kubernetes
MySQL Operator for Kubernetes
Kenny Gryp
 

What's hot (20)

Understanding Akka Streams, Back Pressure, and Asynchronous Architectures
Understanding Akka Streams, Back Pressure, and Asynchronous ArchitecturesUnderstanding Akka Streams, Back Pressure, and Asynchronous Architectures
Understanding Akka Streams, Back Pressure, and Asynchronous Architectures
 
獨體模式
獨體模式 獨體模式
獨體模式
 
Continuous Deployment with Akka.Cluster and Kubernetes (Akka.NET)
Continuous Deployment with Akka.Cluster and Kubernetes (Akka.NET)Continuous Deployment with Akka.Cluster and Kubernetes (Akka.NET)
Continuous Deployment with Akka.Cluster and Kubernetes (Akka.NET)
 
Developing distributed applications with Akka and Akka Cluster
Developing distributed applications with Akka and Akka ClusterDeveloping distributed applications with Akka and Akka Cluster
Developing distributed applications with Akka and Akka Cluster
 
Putting the 'I' in IoT - Building Digital Twins with Akka Microservices
Putting the 'I' in IoT - Building Digital Twins with Akka MicroservicesPutting the 'I' in IoT - Building Digital Twins with Akka Microservices
Putting the 'I' in IoT - Building Digital Twins with Akka Microservices
 
The do's and don'ts with java 9 (Devoxx 2017)
The do's and don'ts with java 9 (Devoxx 2017)The do's and don'ts with java 9 (Devoxx 2017)
The do's and don'ts with java 9 (Devoxx 2017)
 
What's New on AWS and What it Means to You
What's New on AWS and What it Means to YouWhat's New on AWS and What it Means to You
What's New on AWS and What it Means to You
 
SolrCloud Cluster management via APIs
SolrCloud Cluster management via APIsSolrCloud Cluster management via APIs
SolrCloud Cluster management via APIs
 
Testing at Stream-Scale
Testing at Stream-ScaleTesting at Stream-Scale
Testing at Stream-Scale
 
Akka Cluster in Production
Akka Cluster in ProductionAkka Cluster in Production
Akka Cluster in Production
 
CloudStack, jclouds, Jenkins and CloudCat
CloudStack, jclouds, Jenkins and CloudCatCloudStack, jclouds, Jenkins and CloudCat
CloudStack, jclouds, Jenkins and CloudCat
 
CloudStack, jclouds and Whirr!
CloudStack, jclouds and Whirr!CloudStack, jclouds and Whirr!
CloudStack, jclouds and Whirr!
 
Real World Java 9
Real World Java 9Real World Java 9
Real World Java 9
 
Solr security frameworks
Solr security frameworksSolr security frameworks
Solr security frameworks
 
Scala play-framework
Scala play-frameworkScala play-framework
Scala play-framework
 
MySQL High Availability -- InnoDB Clusters
MySQL High Availability -- InnoDB ClustersMySQL High Availability -- InnoDB Clusters
MySQL High Availability -- InnoDB Clusters
 
Apache jclouds SF Meetup, July 8, 2013
Apache jclouds SF Meetup, July 8, 2013Apache jclouds SF Meetup, July 8, 2013
Apache jclouds SF Meetup, July 8, 2013
 
Akka 2.4 plus commercial features in Typesafe Reactive Platform
Akka 2.4 plus commercial features in Typesafe Reactive PlatformAkka 2.4 plus commercial features in Typesafe Reactive Platform
Akka 2.4 plus commercial features in Typesafe Reactive Platform
 
MySQL Group Replication - an Overview
MySQL Group Replication - an OverviewMySQL Group Replication - an Overview
MySQL Group Replication - an Overview
 
MySQL Operator for Kubernetes
MySQL Operator for KubernetesMySQL Operator for Kubernetes
MySQL Operator for Kubernetes
 

Viewers also liked

Preparing for distributed system failures using akka #ScalaMatsuri
Preparing for distributed system failures using akka #ScalaMatsuriPreparing for distributed system failures using akka #ScalaMatsuri
Preparing for distributed system failures using akka #ScalaMatsuri
TIS Inc.
 
Scala Matsuri 2017
Scala Matsuri 2017Scala Matsuri 2017
Scala Matsuri 2017
Yoshitaka Fujii
 
Developing an Akka Edge1-3
Developing an Akka Edge1-3Developing an Akka Edge1-3
Developing an Akka Edge1-3
saaaaaaki
 
Akka-chan's Survival Guide for the Streaming World
Akka-chan's Survival Guide for the Streaming WorldAkka-chan's Survival Guide for the Streaming World
Akka-chan's Survival Guide for the Streaming World
Konrad Malawski
 
Deadly Code! (seriously) Blocking & Hyper Context Switching Pattern
Deadly Code! (seriously) Blocking & Hyper Context Switching PatternDeadly Code! (seriously) Blocking & Hyper Context Switching Pattern
Deadly Code! (seriously) Blocking & Hyper Context Switching Pattern
chibochibo
 
Make your programs Free
Make your programs FreeMake your programs Free
Make your programs Free
Pawel Szulc
 
Scala Warrior and type-safe front-end development with Scala.js
Scala Warrior and type-safe front-end development with Scala.jsScala Warrior and type-safe front-end development with Scala.js
Scala Warrior and type-safe front-end development with Scala.js
takezoe
 
Van laarhoven lens
Van laarhoven lensVan laarhoven lens
Van laarhoven lens
Naoki Aoyama
 
Pratical eff
Pratical effPratical eff
Pratical eff
Eric Torreborre
 
Going bananas with recursion schemes for fixed point data types
Going bananas with recursion schemes for fixed point data typesGoing bananas with recursion schemes for fixed point data types
Going bananas with recursion schemes for fixed point data types
Pawel Szulc
 
Reducing Boilerplate and Combining Effects: A Monad Transformer Example
Reducing Boilerplate and Combining Effects: A Monad Transformer ExampleReducing Boilerplate and Combining Effects: A Monad Transformer Example
Reducing Boilerplate and Combining Effects: A Monad Transformer Example
Connie Chen
 
The state of sbt 0.13, sbt server, and sbt 1.0 (ScalaMatsuri ver)
The state of sbt 0.13, sbt server, and sbt 1.0 (ScalaMatsuri ver)The state of sbt 0.13, sbt server, and sbt 1.0 (ScalaMatsuri ver)
The state of sbt 0.13, sbt server, and sbt 1.0 (ScalaMatsuri ver)
Eugene Yokota
 
7 key recipes for data engineering
7 key recipes for data engineering7 key recipes for data engineering
7 key recipes for data engineering
univalence
 
Dockerの期待と現実~Docker都市伝説はなぜ生まれるのか~
Dockerの期待と現実~Docker都市伝説はなぜ生まれるのか~Dockerの期待と現実~Docker都市伝説はなぜ生まれるのか~
Dockerの期待と現実~Docker都市伝説はなぜ生まれるのか~
Masahito Zembutsu
 
ScalaにまつわるNewsな話
ScalaにまつわるNewsな話ScalaにまつわるNewsな話
ScalaにまつわるNewsな話
Yosuke Mizutani
 
Big Data Analytics Tokyo
Big Data Analytics TokyoBig Data Analytics Tokyo
Big Data Analytics Tokyo
Adam Gibson
 
Scala が支える医療系ウェブサービス #jissenscala
Scala が支える医療系ウェブサービス #jissenscalaScala が支える医療系ウェブサービス #jissenscala
Scala が支える医療系ウェブサービス #jissenscala
Kazuhiro Sera
 
Arquitectura barroca
Arquitectura barrocaArquitectura barroca
Arquitectura barroca
Maria Carmona
 
究極のPHP本完成
究極のPHP本完成究極のPHP本完成
究極のPHP本完成
Katsuhiro Ogawa
 
Sbtのマルチプロジェクトはいいぞ
SbtのマルチプロジェクトはいいぞSbtのマルチプロジェクトはいいぞ
Sbtのマルチプロジェクトはいいぞ
Yoshitaka Fujii
 

Viewers also liked (20)

Preparing for distributed system failures using akka #ScalaMatsuri
Preparing for distributed system failures using akka #ScalaMatsuriPreparing for distributed system failures using akka #ScalaMatsuri
Preparing for distributed system failures using akka #ScalaMatsuri
 
Scala Matsuri 2017
Scala Matsuri 2017Scala Matsuri 2017
Scala Matsuri 2017
 
Developing an Akka Edge1-3
Developing an Akka Edge1-3Developing an Akka Edge1-3
Developing an Akka Edge1-3
 
Akka-chan's Survival Guide for the Streaming World
Akka-chan's Survival Guide for the Streaming WorldAkka-chan's Survival Guide for the Streaming World
Akka-chan's Survival Guide for the Streaming World
 
Deadly Code! (seriously) Blocking & Hyper Context Switching Pattern
Deadly Code! (seriously) Blocking & Hyper Context Switching PatternDeadly Code! (seriously) Blocking & Hyper Context Switching Pattern
Deadly Code! (seriously) Blocking & Hyper Context Switching Pattern
 
Make your programs Free
Make your programs FreeMake your programs Free
Make your programs Free
 
Scala Warrior and type-safe front-end development with Scala.js
Scala Warrior and type-safe front-end development with Scala.jsScala Warrior and type-safe front-end development with Scala.js
Scala Warrior and type-safe front-end development with Scala.js
 
Van laarhoven lens
Van laarhoven lensVan laarhoven lens
Van laarhoven lens
 
Pratical eff
Pratical effPratical eff
Pratical eff
 
Going bananas with recursion schemes for fixed point data types
Going bananas with recursion schemes for fixed point data typesGoing bananas with recursion schemes for fixed point data types
Going bananas with recursion schemes for fixed point data types
 
Reducing Boilerplate and Combining Effects: A Monad Transformer Example
Reducing Boilerplate and Combining Effects: A Monad Transformer ExampleReducing Boilerplate and Combining Effects: A Monad Transformer Example
Reducing Boilerplate and Combining Effects: A Monad Transformer Example
 
The state of sbt 0.13, sbt server, and sbt 1.0 (ScalaMatsuri ver)
The state of sbt 0.13, sbt server, and sbt 1.0 (ScalaMatsuri ver)The state of sbt 0.13, sbt server, and sbt 1.0 (ScalaMatsuri ver)
The state of sbt 0.13, sbt server, and sbt 1.0 (ScalaMatsuri ver)
 
7 key recipes for data engineering
7 key recipes for data engineering7 key recipes for data engineering
7 key recipes for data engineering
 
Dockerの期待と現実~Docker都市伝説はなぜ生まれるのか~
Dockerの期待と現実~Docker都市伝説はなぜ生まれるのか~Dockerの期待と現実~Docker都市伝説はなぜ生まれるのか~
Dockerの期待と現実~Docker都市伝説はなぜ生まれるのか~
 
ScalaにまつわるNewsな話
ScalaにまつわるNewsな話ScalaにまつわるNewsな話
ScalaにまつわるNewsな話
 
Big Data Analytics Tokyo
Big Data Analytics TokyoBig Data Analytics Tokyo
Big Data Analytics Tokyo
 
Scala が支える医療系ウェブサービス #jissenscala
Scala が支える医療系ウェブサービス #jissenscalaScala が支える医療系ウェブサービス #jissenscala
Scala が支える医療系ウェブサービス #jissenscala
 
Arquitectura barroca
Arquitectura barrocaArquitectura barroca
Arquitectura barroca
 
究極のPHP本完成
究極のPHP本完成究極のPHP本完成
究極のPHP本完成
 
Sbtのマルチプロジェクトはいいぞ
SbtのマルチプロジェクトはいいぞSbtのマルチプロジェクトはいいぞ
Sbtのマルチプロジェクトはいいぞ
 

Similar to Akka Cluster and Auto-scaling

Cassandra Bootstap from Backups
Cassandra Bootstap from BackupsCassandra Bootstap from Backups
Cassandra Bootstap from Backups
Instaclustr
 
Cassandra Bootstrap from Backups
Cassandra Bootstrap from BackupsCassandra Bootstrap from Backups
Cassandra Bootstrap from Backups
Instaclustr
 
[Hic2011] using hadoop lucene-solr-for-large-scale-search by systex
[Hic2011] using hadoop lucene-solr-for-large-scale-search by systex[Hic2011] using hadoop lucene-solr-for-large-scale-search by systex
[Hic2011] using hadoop lucene-solr-for-large-scale-search by systexJames Chen
 
Netflix Global Applications - NoSQL Search Roadshow
Netflix Global Applications - NoSQL Search RoadshowNetflix Global Applications - NoSQL Search Roadshow
Netflix Global Applications - NoSQL Search Roadshow
Adrian Cockcroft
 
Accelerate Your OpenStack Deployment Presented by SolidFire and Red Hat
Accelerate Your OpenStack Deployment Presented by SolidFire and Red HatAccelerate Your OpenStack Deployment Presented by SolidFire and Red Hat
Accelerate Your OpenStack Deployment Presented by SolidFire and Red Hat
NetApp
 
SFBay Area Solr Meetup - June 18th: Benchmarking Solr Performance
SFBay Area Solr Meetup - June 18th: Benchmarking Solr PerformanceSFBay Area Solr Meetup - June 18th: Benchmarking Solr Performance
SFBay Area Solr Meetup - June 18th: Benchmarking Solr Performance
Lucidworks (Archived)
 
Coherence RoadMap 2018
Coherence RoadMap 2018Coherence RoadMap 2018
Coherence RoadMap 2018
harvraja
 
Hazelcast Essentials
Hazelcast EssentialsHazelcast Essentials
Hazelcast Essentials
Rahul Gupta
 
AppFabric Velocity
AppFabric VelocityAppFabric Velocity
AppFabric Velocity
Dennis van der Stelt
 
Openstack study-nova-02
Openstack study-nova-02Openstack study-nova-02
Openstack study-nova-02
Jinho Shin
 
Stop Worrying and Keep Querying, Using Automated Multi-Region Disaster Recovery
Stop Worrying and Keep Querying, Using Automated Multi-Region Disaster RecoveryStop Worrying and Keep Querying, Using Automated Multi-Region Disaster Recovery
Stop Worrying and Keep Querying, Using Automated Multi-Region Disaster Recovery
DoKC
 
AtlasCamp 2012 - Testing JIRA plugins smarter with TestKit
AtlasCamp 2012 - Testing JIRA plugins smarter with TestKitAtlasCamp 2012 - Testing JIRA plugins smarter with TestKit
AtlasCamp 2012 - Testing JIRA plugins smarter with TestKit
Wojciech Seliga
 
MySQL Database Architectures - 2020-10
MySQL Database Architectures -  2020-10MySQL Database Architectures -  2020-10
MySQL Database Architectures - 2020-10
Kenny Gryp
 
MySQL InnoDB Cluster - New Features in 8.0 Releases - Best Practices
MySQL InnoDB Cluster - New Features in 8.0 Releases - Best PracticesMySQL InnoDB Cluster - New Features in 8.0 Releases - Best Practices
MySQL InnoDB Cluster - New Features in 8.0 Releases - Best Practices
Kenny Gryp
 
Benchmarking Solr Performance
Benchmarking Solr PerformanceBenchmarking Solr Performance
Benchmarking Solr Performance
Lucidworks
 
Scylla Summit 2022: What’s New in ScyllaDB Operator for Kubernetes
Scylla Summit 2022: What’s New in ScyllaDB Operator for KubernetesScylla Summit 2022: What’s New in ScyllaDB Operator for Kubernetes
Scylla Summit 2022: What’s New in ScyllaDB Operator for Kubernetes
ScyllaDB
 
The Impact of Hardware and Software Version Changes on Apache Kafka Performan...
The Impact of Hardware and Software Version Changes on Apache Kafka Performan...The Impact of Hardware and Software Version Changes on Apache Kafka Performan...
The Impact of Hardware and Software Version Changes on Apache Kafka Performan...
Paul Brebner
 
Running your Java EE 6 applications in the clouds
Running your Java EE 6 applications in the clouds Running your Java EE 6 applications in the clouds
Running your Java EE 6 applications in the clouds
Arun Gupta
 
DataStax | Effective Testing in DSE (Lessons Learned) (Predrag Knezevic) | Ca...
DataStax | Effective Testing in DSE (Lessons Learned) (Predrag Knezevic) | Ca...DataStax | Effective Testing in DSE (Lessons Learned) (Predrag Knezevic) | Ca...
DataStax | Effective Testing in DSE (Lessons Learned) (Predrag Knezevic) | Ca...
DataStax
 
Effective Testing in DSE
Effective Testing in DSEEffective Testing in DSE
Effective Testing in DSE
pedjak
 

Similar to Akka Cluster and Auto-scaling (20)

Cassandra Bootstap from Backups
Cassandra Bootstap from BackupsCassandra Bootstap from Backups
Cassandra Bootstap from Backups
 
Cassandra Bootstrap from Backups
Cassandra Bootstrap from BackupsCassandra Bootstrap from Backups
Cassandra Bootstrap from Backups
 
[Hic2011] using hadoop lucene-solr-for-large-scale-search by systex
[Hic2011] using hadoop lucene-solr-for-large-scale-search by systex[Hic2011] using hadoop lucene-solr-for-large-scale-search by systex
[Hic2011] using hadoop lucene-solr-for-large-scale-search by systex
 
Netflix Global Applications - NoSQL Search Roadshow
Netflix Global Applications - NoSQL Search RoadshowNetflix Global Applications - NoSQL Search Roadshow
Netflix Global Applications - NoSQL Search Roadshow
 
Accelerate Your OpenStack Deployment Presented by SolidFire and Red Hat
Accelerate Your OpenStack Deployment Presented by SolidFire and Red HatAccelerate Your OpenStack Deployment Presented by SolidFire and Red Hat
Accelerate Your OpenStack Deployment Presented by SolidFire and Red Hat
 
SFBay Area Solr Meetup - June 18th: Benchmarking Solr Performance
SFBay Area Solr Meetup - June 18th: Benchmarking Solr PerformanceSFBay Area Solr Meetup - June 18th: Benchmarking Solr Performance
SFBay Area Solr Meetup - June 18th: Benchmarking Solr Performance
 
Coherence RoadMap 2018
Coherence RoadMap 2018Coherence RoadMap 2018
Coherence RoadMap 2018
 
Hazelcast Essentials
Hazelcast EssentialsHazelcast Essentials
Hazelcast Essentials
 
AppFabric Velocity
AppFabric VelocityAppFabric Velocity
AppFabric Velocity
 
Openstack study-nova-02
Openstack study-nova-02Openstack study-nova-02
Openstack study-nova-02
 
Stop Worrying and Keep Querying, Using Automated Multi-Region Disaster Recovery
Stop Worrying and Keep Querying, Using Automated Multi-Region Disaster RecoveryStop Worrying and Keep Querying, Using Automated Multi-Region Disaster Recovery
Stop Worrying and Keep Querying, Using Automated Multi-Region Disaster Recovery
 
AtlasCamp 2012 - Testing JIRA plugins smarter with TestKit
AtlasCamp 2012 - Testing JIRA plugins smarter with TestKitAtlasCamp 2012 - Testing JIRA plugins smarter with TestKit
AtlasCamp 2012 - Testing JIRA plugins smarter with TestKit
 
MySQL Database Architectures - 2020-10
MySQL Database Architectures -  2020-10MySQL Database Architectures -  2020-10
MySQL Database Architectures - 2020-10
 
MySQL InnoDB Cluster - New Features in 8.0 Releases - Best Practices
MySQL InnoDB Cluster - New Features in 8.0 Releases - Best PracticesMySQL InnoDB Cluster - New Features in 8.0 Releases - Best Practices
MySQL InnoDB Cluster - New Features in 8.0 Releases - Best Practices
 
Benchmarking Solr Performance
Benchmarking Solr PerformanceBenchmarking Solr Performance
Benchmarking Solr Performance
 
Scylla Summit 2022: What’s New in ScyllaDB Operator for Kubernetes
Scylla Summit 2022: What’s New in ScyllaDB Operator for KubernetesScylla Summit 2022: What’s New in ScyllaDB Operator for Kubernetes
Scylla Summit 2022: What’s New in ScyllaDB Operator for Kubernetes
 
The Impact of Hardware and Software Version Changes on Apache Kafka Performan...
The Impact of Hardware and Software Version Changes on Apache Kafka Performan...The Impact of Hardware and Software Version Changes on Apache Kafka Performan...
The Impact of Hardware and Software Version Changes on Apache Kafka Performan...
 
Running your Java EE 6 applications in the clouds
Running your Java EE 6 applications in the clouds Running your Java EE 6 applications in the clouds
Running your Java EE 6 applications in the clouds
 
DataStax | Effective Testing in DSE (Lessons Learned) (Predrag Knezevic) | Ca...
DataStax | Effective Testing in DSE (Lessons Learned) (Predrag Knezevic) | Ca...DataStax | Effective Testing in DSE (Lessons Learned) (Predrag Knezevic) | Ca...
DataStax | Effective Testing in DSE (Lessons Learned) (Predrag Knezevic) | Ca...
 
Effective Testing in DSE
Effective Testing in DSEEffective Testing in DSE
Effective Testing in DSE
 

Recently uploaded

Welocme to ViralQR, your best QR code generator.
Welocme to ViralQR, your best QR code generator.Welocme to ViralQR, your best QR code generator.
Welocme to ViralQR, your best QR code generator.
ViralQR
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
Sri Ambati
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
Product School
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
UiPathCommunity
 
Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™
Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™
Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™
UiPathCommunity
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
OnBoard
 
Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?
Nexer Digital
 
Free Complete Python - A step towards Data Science
Free Complete Python - A step towards Data ScienceFree Complete Python - A step towards Data Science
Free Complete Python - A step towards Data Science
RinaMondal9
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
Safe Software
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
Elena Simperl
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
Jemma Hussein Allen
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
DianaGray10
 
By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024
Pierluigi Pugliese
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
Guy Korland
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
BookNet Canada
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance
 
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdfSAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
Peter Spielvogel
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
Ana-Maria Mihalceanu
 

Recently uploaded (20)

Welocme to ViralQR, your best QR code generator.
Welocme to ViralQR, your best QR code generator.Welocme to ViralQR, your best QR code generator.
Welocme to ViralQR, your best QR code generator.
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
 
Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™
Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™
Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
 
Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?
 
Free Complete Python - A step towards Data Science
Free Complete Python - A step towards Data ScienceFree Complete Python - A step towards Data Science
Free Complete Python - A step towards Data Science
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
 
By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
 
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdfSAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
 

Akka Cluster and Auto-scaling

  • 1. Akka Cluster and Auto-scaling Ikuo Matsumura CyberAgent, Inc. 2017/02/26
  • 2. Akka Cluster ⾮中央集権的なノード群構築を⾏うAkka拡張 10ヶ⽉程運⽤してきた中からつまづいた点・学んだ点を紹介 • Decentralized cluster membership service • no single point of failure, bottleneck • distribute actors over multiple JVMs • Applied to build a sub-system on AD serving • Tens of servers • Operations about 10 months
  • 3. Requirements in our case • Host a lot of Entity with low cost • Fit existing Akka application • Down-time is acceptable to some extent* Akkaベースで多数のEntityを低コストで配備したい 多少のダウンタイムは許容できる *online machine learning
  • 4. Our application of Akka Cluster 永続ActorをCluster Shardingで配備 データ保管にコモディティサービスを使⽤ frontend frontend frontend entities entities … … entities frontend • Existing app • Tens of nodes • Auto-scaling • New sub-system • Several nodes ElastiCache (Journal) S3 (Snapshot) data stores …
  • 5. Challenges • Strategy on unreachables removal • Lifecycle of journals 運⽤する中でつまづいた2つの課題についてお話します
  • 6. Membership Lifecycle in Cluster Specification クラスタメンバーのライフサイクルの概観 http://doc.akka.io/docs/akka/2.4/common/cluster.html#Membership_Lifecycle joining up down removed join leaving exiting unreachable leave
  • 7. Joinning and Leader Action “leader action”を経て、他メンバと通信できるようになる joining up down join unreachable
  • 8. Joinning and Leader Action “leader action”を経て、他メンバと通信できるようになる joining up down unreachable leader action
  • 9. Joinning and Leader Action Scale-in発⽣時、unreachable のままになる joining up down unreachable failure detector leader action
  • 10. Joinning and Leader Action leader actionが⾏えなくなる。 結果、新規メンバが他のメンバと通信できないままに。 joining up down unreachable leader action
  • 11. Joinning and Leader Action Scale-inをトリガにしたdown指定が必要 joining up down unreachable leader action scale-in trigger mark as down
  • 12. Joinning and Leader Action unreachableを除くことでleader actionが再開可能に joining up down unreachable leader action
  • 13. Leader actions blocked by unreachables leader actionが⾏えない状態のログの例 Members that are “up” but have not seen the current state “Leader can currently not perform its duties”
  • 14. Causes and actions on unreachables Type of failures Example Possible external action network partitions - wait for recovery or abandon a part machine crashes scale-in mark as down quarantined in akka remote layer restart an actor system unresponseive process long GC restart a JVM CPU starvation by credit shortage in EC2 re-create an instance failure detector はエラーの原因までは区別できない 原因に応じてクラスタ外部からの回復措置が要る
  • 15. Split Brain Resolver* (commercial add-on) • Mark members as “down” when a part of the cluster become unreachable for some time • Strategies • Static Quorum • Keep Majority - default in Lagom • Keep Oldest • Keep Referee * http://doc.akka.io/docs/akka/rp-current/scala/split-brain-resolver.html 商⽤add-onである程度包括的に⾃動のdown指定が可能 ⼀定時間メンバの状態・到達可能性に変化がない時に発動
  • 16. Reset cluster membership (poor man’s) 存命ノードを新しいクラスタに参加させ直す seed(s) old cluster (ddata) new cluster (ddata)
  • 17. Reset cluster membership (poor man’s) 存命ノードを新しいクラスタに参加させ直す seed(s) old cluster (ddata) new cluster (ddata)
  • 18. Reset cluster membership (poor man’s) 存命ノードを新しいクラスタに参加させ直す seed(s) old cluster (ddata) new cluster (ddata)
  • 19. Caution • Side-effect caused by app restart • ddata is experimental (at Akka 2.4) • Use Akka 2.4.8 or higher* 再起動による副作⽤やAkkaのバージョンに注意が必要 *has a fix on distributed pub-sub akka#20847
  • 20. To keep cluster membership healthy 1. Trigger mark-as-down (or leave) on scale-in 2. Automate restart/recreation of
 AcotrSystem, JVM, server instance 3. Setup a fallback mechanism such as
 split brain resolver, or
 rejoining into a new cluster unreachable対策のまとめ
  • 21. Challenges • Strategy of unreachables removal • Lifecycle of journals 次に、2つ⽬の課題についてお話しします。
  • 23. Cleanup old journals in Redis plug-in* JournalのDeleteMessageでは⼀部データが残るケースがある snapshotとのsequenceNrの⼀貫性に注意 key in Redis removed on deleteMessages journal:$persistenceId Yes journal:$persistenceId.highestSequenceNr No * https://github.com/hootsuite/akka-persistence-redis/blob/master/src/main/scala/com/hootsuite/akka/persistence/redis/journal/RedisJournal.scala Deleting highstSequenceNr could cause loading old version of snapshot. → Keep only the latest snapshot.
  • 24. Event Sourcing and Ecosystem “it stores a complete history of the events associated with the aggregates in your domain” Reference 3: Introducing Event Sourcing, CQRS Journey[CQJ] 本来のイベントストアはイベントの完全な履歴を持つ想定 そこから逸れるとエコシステム(plug-in)のサポートも弱くなる
  • 25. Summary • Lessons learned from devops of an Akka Cluster app • Strategy on unreachables removal • scale-in trigger • automatic restart/recreation • fallback mechanism;
 split-brain resolver / rejoining • Lifecycle of journals • cost of deviation from Event Sourcing unreachableメンバを取り除く仕組みを各種⼊れておく Journalのキャッシュ的運⽤は意外と⼤変(なことがある)
  • 26. Reference [CQJ] Exploring CQRS and Event Sourcing, Dominic Betts, Julian Dominguez, Grigori Melnik, Fernando Simonazzi, Mani Subramanian, 2012, https://msdn.microsoft.com/en-us/library/jj554200.aspx [PSE] Persistence - Schema Evolution, Akka Documentation, http:// doc.akka.io/docs/akka/2.4/scala/persistence-schema-evolution.html