Incorporating new technologies into an organization with 15+ years of legacy software means walking a fine line. In this presentation, we'll look at the lifecycle and adoption of Cassandra, from a skunkworks project to a full-fledged service in a legacy organization.
C* Summit 2013: Stepping Through the Lifecycle of a Service Offering with Cassandra by Chris McEniry and Igor von Nyssen
1. Stepping Through the Lifecycle
of a Service Offering with Cassandra
Chris McEniry, Igor von Nyssen | Sr. Systems Architect, Sr. Software Architect | Sony Network Entertainment
2. Chris “Mac” McEniry
* aka Ops guy
* BS EECS, MIT
* 15 years ops experience
* Python, Perl, C, LISP, Ruby, JS, some English
* C* 0.6
* cmceniry@mit.edu
Igor von Nyssen
* aka Dev guy
* MS CS, TU Chemnitz
* MBA, USC
* 15 years dev experience
* Java, Ruby, English, German
* C* 1.0
* igor@vonnyssen.com
3. * Preconception – Solution looking for a problem
* In utero – Connect and talk about it
* Infant – Initial Development
* Kid – Production rollout for an insignificant item
* Teenager – Start really maturing
* Adult – Take over the world
What we’re talking about: Lifecycle
4. * Large entertainment ecommerce application
* Hundreds of millions of customers
* Billions of transactions per year
* Worldwide operations: 59 countries, 21 languages and dialects
* Peaks above 10,000 tps
What does SNEI look like?
5. * Global start-up that has to
comply with any three-letter
requirement known to man.
* Pressure to innovate
* Pressure to make money
What does SNEI feel like?
13. Need to provide a service that
− Provides user authentication
− Is Always-On
− Has low latency for every customer
Our Problem: Latency and Availability
33. * Large entertainment ecommerce application
* Hundreds of millions of customers
* 9-figure revenue per year
* Billions of transactions per year
* Worldwide operations: 59 countries, 21 languages and dialects
* Peaks above 10,000 tps
What does SNEI look like?
39. * How do we keep it from blowing up?
* How do we keep it cared for throughout its lifetime?
* How do we keep from being a bottleneck?
* How do we keep from being woken up at 4AM?
* How do we keep up with “gimme! gimme! gimme!”?
Ops Concerns
40. * Marketing features vs. real features
− What is on the outside of the box?
− What actually works?
* C* is young
− Our understanding of it is young
* Does it solve our problem adequately?
− POC, POC, POC
− We analyzed lots of in-memory data grids
- Some had marketing feature problems
- Some were too young
- Some were too expensive
- None of them addressed partitioning very well
- If nothing else, the cold startup time is a killer
* Adoption
− Can we implement C* in our organization?
Dev Concerns
41. * Dev – you’re not running production.
* Ops – you’re not writing the code.
* Work with someone
− Who sees the same problem
− Is excited about the technology
− Agrees on the direction
− Does not have an inherent conflict of interest
Can’t do it by yourself
44. * Be ready to move once you find a suitable problem
− anytime you have a window, you have to move
* Create the window
* But be sure that you want it
* Be sure it’s the right problem
You Have a Solution Looking for a Problem
49. * Not a problem that can be solved by slapping a cache in front of
existing technology
* Need distributed storage
* Global distribution helps with both latency and disaster recovery
− Data replication and synchronization are built-in
− Partition events are handled out of the box
* Continuous delivery is part of the C* philosophy
Why Does C* Fit Here?
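The built-in replication mentioned above can be pictured with a toy token ring. This is a minimal sketch in the spirit of Cassandra's SimpleStrategy, assuming MD5 key hashing (as RandomPartitioner uses) and exactly one token per node; it ignores data centers, racks, and virtual nodes, and the node names and tokens are hypothetical.

```python
import bisect
import hashlib


def token_for(key: str) -> int:
    """Hash a key to a 128-bit integer token (MD5, as in RandomPartitioner)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class TokenRing:
    """Toy ring: each node owns the range of tokens ending at its own token."""

    def __init__(self, node_tokens):
        # Keep (token, node) pairs sorted by token so we can binary-search them.
        self._ring = sorted((tok, node) for node, tok in node_tokens.items())
        self._tokens = [tok for tok, _ in self._ring]

    def replicas(self, key, replication_factor):
        """First node whose token is >= the key's token (wrapping past the
        highest token back to the lowest), then the next RF-1 nodes clockwise."""
        if not 1 <= replication_factor <= len(self._ring):
            raise ValueError("replication factor must be between 1 and the node count")
        start = bisect.bisect_left(self._tokens, token_for(key)) % len(self._ring)
        return [self._ring[(start + i) % len(self._ring)][1]
                for i in range(replication_factor)]


# Four hypothetical nodes spaced evenly around the 2**128 token space.
ring = TokenRing({
    "node-a": 0,
    "node-b": 2**126,
    "node-c": 2**127,
    "node-d": 3 * 2**126,
})
print(ring.replicas("user:42", 3))  # three consecutive nodes on the ring
```

Because every replica list is a run of consecutive ring positions, losing one node still leaves RF-1 copies of each key on its neighbors, which is what lets the cluster ride out partition events without a separate cache or sync layer.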
52. * Good Ally – Someone who
− Sees the same problem
− Is excited about the same technology
− Agrees on the direction
* Bad Ally – Someone who
− Will have an inherent conflict of interest
Good Allies, Bad Allies
53. * Develop your elevator pitch for the problem
* Proof it with your allies
* Keep it simple
Latency, Latency, Latency
Coin the Mantra
55. Strategically decide who you’re not going to involve
The New RACI Chart
Role | Responsible | Avoided | Circumvented | Ignored
You | X | | |
Friend | X | | |
Frenemy | | X | X | X
56. * It's new technology, so you're outside your comfort zone
* You're evangelizing, so the burden of proof is on you
* Account for different adoption speeds by different people
* Stay positive
Remember: Shiny, But Unknown and Unproven
57. * Ops
− Service
- Central
- Embedded
− Support
- Formal
- Community
- On-your-own
− Schema definition ownership
* Dev
− How many nodes?
− How many copies?
− How many data centers?
Figure Out Your Models
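The Dev questions about copies and data centers come down to a bit of quorum arithmetic. This sketch assumes the usual Cassandra definitions as we understand them: QUORUM is a strict majority of the replication factor summed across all data centers, LOCAL_QUORUM is a majority of the local data center's replication factor, and a read is guaranteed to see the latest write when read replicas plus write replicas exceed RF. The data center names are hypothetical.

```python
def quorum(replication_factor: int) -> int:
    """Strict majority of replicas."""
    return replication_factor // 2 + 1


def tolerated_failures(replication_factor: int) -> int:
    """Replicas that may be down while QUORUM reads and writes still succeed."""
    return replication_factor - quorum(replication_factor)


def read_sees_latest_write(read_replicas: int, write_replicas: int,
                           replication_factor: int) -> bool:
    """True when every read set must overlap every write set (R + W > RF)."""
    return read_replicas + write_replicas > replication_factor


# Hypothetical layout: two data centers with three copies each.
per_dc_rf = {"dc-west": 3, "dc-east": 3}
total_rf = sum(per_dc_rf.values())

for rf in range(1, 6):
    print(f"RF={rf}: quorum={quorum(rf)}, tolerates {tolerated_failures(rf)} down")
print("QUORUM across both DCs:", quorum(total_rf))              # 4 of 6
print("LOCAL_QUORUM in one DC:", quorum(per_dc_rf["dc-west"]))  # 2 of 3
```

Note that RF=2 tolerates no failures at QUORUM, while RF=3 tolerates one; that is one reason three copies per data center is a common starting point.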
58. * No budget approvals
* Easy to get started
Once you have an idea of what you're talking about, then start to ask
the better questions.
Cassandra Is Open Source – Use That Fact
62. * Operational tasks
− Setup
- Roll everything into RPMs
- There's a bit of manual setup: cluster join/remove
− Backup
− Infrastructure changes (node additions, etc.)
- There's a bit of manual setup: cluster join/remove
− Monitoring
- Jolokia to tie into existing systems, but we have to re-invent all of the metric metadata
- Availability
− Incident response
- Training (and see monitoring/visibility)
− Handling regular tasks
- nodetool repair
- Add keyspaces
- Update keyspaces for infrastructure changes
* Have a clear service description
* End user support
− Self-help community
− Email list (subscription), chat room
Minimum Viable (Go Live) Product
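The Jolokia point above (raw JMX values come back, but the metric metadata has to be re-invented) can be sketched as follows. Jolokia exposes JMX over HTTP/JSON, and a read can be issued as a GET to `<base>/read/<mbean>/<attribute>`, returning JSON with `request`, `value`, `timestamp`, and `status` fields. The host name and the metric names, units, and types in the metadata table are hypothetical; `LiveNodes` and `UnreachableNodes` are assumed to be attributes of Cassandra's StorageService MBean, so verify them against your Cassandra version. No HTTP call is made here.

```python
import json
from urllib.parse import quote

STORAGE_SERVICE = "org.apache.cassandra.db:type=StorageService"

# Hypothetical metric metadata: Jolokia returns raw JMX values, so names,
# units, and metric types have to be supplied by us.
METRIC_METADATA = {
    (STORAGE_SERVICE, "LiveNodes"):
        {"name": "cassandra.nodes.live", "unit": "nodes", "type": "gauge"},
    (STORAGE_SERVICE, "UnreachableNodes"):
        {"name": "cassandra.nodes.unreachable", "unit": "nodes", "type": "gauge"},
}


def jolokia_read_url(base_url: str, mbean: str, attribute: str) -> str:
    """Build a Jolokia GET read URL. MBean names containing '/' need Jolokia's
    own '!' escaping, which this sketch does not handle."""
    return f"{base_url.rstrip('/')}/read/{quote(mbean, safe=':=,')}/{quote(attribute)}"


def to_metric(response_body: str) -> dict:
    """Turn one Jolokia read response into a metric record with metadata."""
    resp = json.loads(response_body)
    if resp.get("status") != 200:
        raise RuntimeError(f"Jolokia error: {resp.get('error')}")
    req = resp["request"]
    meta = METRIC_METADATA[(req["mbean"], req["attribute"])]
    value = resp["value"]
    if isinstance(value, list):  # node lists become counts
        value = len(value)
    return {**meta, "value": value, "timestamp": resp["timestamp"]}


# 8778 is the Jolokia JVM agent's default port; the host is hypothetical.
url = jolokia_read_url("http://cassandra-node:8778/jolokia/", STORAGE_SERVICE, "LiveNodes")
sample = json.dumps({  # illustrative response body, not real output
    "request": {"mbean": STORAGE_SERVICE, "attribute": "LiveNodes", "type": "read"},
    "value": ["10.0.0.1", "10.0.0.2", "10.0.0.3"],
    "timestamp": 1370000000,
    "status": 200,
})
print(url)
print(to_metric(sample))
```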
63. * Store some data
− start with transient data
− TTLs help here
− If it’s lost, it’s an acceptable loss
* Select which client you will use
− Hector, Astyanax, or DataStax
* Figure out how clients connect to the server
* Understand how C* stores data
− Data model
− Serializers
− Comparators
* Free your mind
− Learn to love primary-key-only indexing
− Start de-normalizing
− Give up on transactions – they are evil to begin with
Minimum Viable (Go Live) Product
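To make "start with transient data", "primary-key-only indexing", and "start de-normalizing" concrete, here is a toy in-memory model (not Cassandra itself) of a login-session use case. The table names `sessions_by_token` and `sessions_by_user` are hypothetical, each user holds a single session for brevity, and expiry is done lazily on read; real Cassandra hides TTL-expired data from reads and removes it later via tombstones and compaction.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class ToyTable:
    """In-memory stand-in for one table: rows are reachable only by primary
    key, and a write may carry a TTL after which the row is gone."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._rows: Dict[Any, Tuple[Any, Optional[float]]] = {}
        self._clock = clock

    def put(self, key: Any, value: Any, ttl_seconds: Optional[float] = None) -> None:
        expires_at = None if ttl_seconds is None else self._clock() + ttl_seconds
        self._rows[key] = (value, expires_at)

    def get(self, key: Any) -> Any:
        entry = self._rows.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if expires_at is not None and self._clock() >= expires_at:
            del self._rows[key]  # lazily drop expired rows on read
            return None
        return value


def create_session(sessions_by_token: ToyTable, sessions_by_user: ToyTable,
                   token: str, user_id: str, ttl_seconds: float) -> None:
    """Denormalize: write the same session into one table per query pattern."""
    session = {"token": token, "user_id": user_id}
    sessions_by_token.put(token, session, ttl_seconds)
    sessions_by_user.put(user_id, session, ttl_seconds)


# A fake clock so expiry can be demonstrated without sleeping.
now = [0.0]


def clock() -> float:
    return now[0]


by_token, by_user = ToyTable(clock), ToyTable(clock)
create_session(by_token, by_user, "tok-123", "user-42", ttl_seconds=1800)
print(by_user.get("user-42"))   # found via its own primary key
now[0] = 1801.0
print(by_token.get("tok-123"))  # None: the TTL has passed
```

Writing twice trades storage for reads that never need a join or a secondary index, and because the data is transient, losing a row just means a user logs in again.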
65. * Take daddy’s credit card
* Get serious about other data classes
* Start putting something that you need to preserve in there
* Data life cycle management
* Security concerns
* Acknowledge gaps – where the technology, you, or your
implementation of the technology is just not ready
* Revisit your assumptions
Teenager
67. * Allies
− Security
− RDBMS zeal^Wusers
* Problems
− Adjacent problems – things that are close to what you’ve already
solved, but bigger and more complicated
* Key to success
− It is still early enough to be their idea
− But it is established enough that people take it seriously
Gather More
68. * More visibility → More battles
* Continue to choose wisely
− Where not to focus?
− What not to try?
− What to be compliant with?
* Remember: The burden of proof is on you
− Other groups will closely watch your success or failure
− Auditors will ask you the questions. There is no vendor certification (yet).
Teenage Angst
69. * If you really must do large multipart transactions, choose a different
product.
* Or just don’t. Change your mental model
− From: transactions and conflict prevention
− To: conflict detection and remediation
* Read this: http://www.eaipatterns.com/ramblings/18_starbucks.html
* Avoid sequences that attempt to centralize counting
Teenage Angst
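One way to act on "avoid sequences that attempt to centralize counting" and to prefer conflict remediation over prevention is a grow-only counter, a CRDT. This technique is not mentioned in the talk, and the sketch is a toy illustration of conflict-free merging, not a description of how Cassandra's own counter columns work. The replica names are hypothetical.

```python
from typing import Dict


class GCounter:
    """Grow-only counter (a CRDT): each replica bumps only its own slot, and
    merging takes the per-replica maximum, so no central sequence is needed
    and concurrent increments never conflict."""

    def __init__(self, replica_id: str):
        self.replica_id = replica_id
        self.counts: Dict[str, int] = {}

    def increment(self, amount: int = 1) -> None:
        if amount < 0:
            raise ValueError("a grow-only counter cannot be decremented")
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + amount

    def merge(self, other: "GCounter") -> None:
        # Max per slot makes merging commutative, associative, and idempotent.
        for rid, n in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), n)

    def value(self) -> int:
        return sum(self.counts.values())


# Two hypothetical data centers count independently, then exchange state.
us, eu = GCounter("dc-us"), GCounter("dc-eu")
us.increment(3)
eu.increment(2)
us.merge(eu)
eu.merge(us)
us.merge(eu)  # merging again changes nothing
print(us.value(), eu.value())  # 5 5
```

Because each replica only ever writes its own slot, two data centers can keep counting through a partition and reconcile afterwards, which is the detect-and-remediate model applied to counting.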
70. * Search (DSE)
* Solidify your backup
* Solidify your deployment model
* Solidify your support
* Solidify your security
− Audit
− Authentication and Authorization
− Encryption
Growth Spurts
76. * In terms of
− Data
− Applicable context
− Ability to adopt and adapt
* Less inertia
* Moving faster
* Less energy for the same distance: “More efficient”
Underlying Problem: Critical Mass
77. Summarize
* Preconception – Explore it
* In utero – Start small
* Infant – Be comfortably uncomfortable
* Kid – Grow
* Teenager – Get help
* Adult – Profit
We have a lot of momentum/inertia, which means it takes a while to get something changed. We have to work within some limitations and around others. Separation of duties. Be ready to move: in a slow-changing org, anytime you have a window, you have to move; push for the window.