When small problems become big problems

@adrianfcole

Agenda

• Introduction to CloudHub
• Challenges we faced building a multi-tenant architecture
• Q/A

Ego slide

Adrian Cole (@jclouds)
founded jclouds March 2009
cloudhub.io architect at MuleSoft

Platform as a Service

Automated Provisioning
Event Tracking
Centralized Logging
Secure Data Gateway


When you’ve priced yourself out of business

Cloud is utility, but your service may be more
• Measurement-based pricing exists in the infrastructure tier
• Know your customers: who they are and where in the value chain you act
• Don’t get into a race to the bottom

When 200 users become 2000 accounts

Choosing a BASIC starting point

• We already had an LDAP infrastructure
• Straightforward integration with the console and other access tools
• Easy to do BASIC authentication

Remember users (and API users)
• Basic Auth is not a good long-term choice for an API
• System integrators need delegated access
• Hard to clean up accounts when there are multiple owners
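One way to act on these points is to give integrators their own scoped, revocable tokens instead of sharing a user's Basic Auth password. This is a minimal sketch assuming an in-memory store; `TokenStore`, `ApiToken`, and scope strings such as `apps:read` are hypothetical names, and a real platform would persist tokens and tie them to its LDAP or identity service. Because each token belongs to an organization and a named grantee, cleaning up after an integrator leaves becomes a single revoke rather than a hunt through shared accounts.

```python
import hashlib
import secrets
from dataclasses import dataclass
from typing import Dict, FrozenSet, Set


@dataclass
class ApiToken:
    token_hash: str          # only a hash of the token is stored, never the token itself
    organization: str        # the tenant that owns the token, not an individual user
    granted_to: str          # e.g. a system integrator acting on the tenant's behalf
    scopes: FrozenSet[str]   # what the token may do, e.g. {"apps:read"}
    revoked: bool = False


class TokenStore:
    """Hypothetical in-memory store for delegated, scoped API tokens."""

    def __init__(self) -> None:
        self._tokens: Dict[str, ApiToken] = {}

    @staticmethod
    def _hash(raw: str) -> str:
        return hashlib.sha256(raw.encode()).hexdigest()

    def issue(self, organization: str, granted_to: str, scopes: Set[str]) -> str:
        """Create a token and return it; the caller sees the raw value only once."""
        raw = secrets.token_urlsafe(32)
        digest = self._hash(raw)
        self._tokens[digest] = ApiToken(digest, organization, granted_to, frozenset(scopes))
        return raw

    def authorize(self, raw: str, organization: str, scope: str) -> bool:
        """Allow a call only for a live token owned by this organization with this scope."""
        token = self._tokens.get(self._hash(raw))
        return (
            token is not None
            and not token.revoked
            and token.organization == organization
            and scope in token.scopes
        )

    def revoke_grantee(self, organization: str, granted_to: str) -> int:
        """Revoke every live token an organization granted to one party; return the count."""
        revoked = 0
        for token in self._tokens.values():
            if token.organization == organization and token.granted_to == granted_to and not token.revoked:
                token.revoked = True
                revoked += 1
        return revoked
```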

When myapp.cloudhub.io becomes myapp001.cloudhub.io

How to present the iApps

• X.cloudhub.io
• DNS is flexible to work with
• Clear branding

X.cloudhub.io woes

• Namespace contention
• qa.cloudhub.io isn’t really an iApp
• Need to maintain a blacklist of reserved names
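A minimal sketch of the name check that an X.cloudhub.io scheme implies, assuming each app gets exactly one DNS label under cloudhub.io. The reserved set is illustrative: only `qa` comes from the slide, and the other entries, like `check_app_name` itself, are hypothetical.

```python
import re
from typing import Optional, Set

# Illustrative reserved names; "qa" comes from the slide, the rest are assumptions.
RESERVED_SUBDOMAINS = frozenset({"qa", "www", "api", "admin", "status", "docs"})

# One DNS label: 1-63 characters of a-z, 0-9 or '-', not starting or ending with '-'.
DNS_LABEL = re.compile(r"[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?")


def check_app_name(name: str, taken: Set[str]) -> Optional[str]:
    """Return why `name` can't be used as <name>.cloudhub.io, or None if it is free.

    `taken` is assumed to hold lowercase names already in use.
    """
    candidate = name.lower()
    if not DNS_LABEL.fullmatch(candidate):
        return "not a valid DNS label"
    if candidate in RESERVED_SUBDOMAINS:
        return "reserved for platform use"
    if candidate in taken:
        return "already taken"
    return None
```

The reserved set tends to grow as the platform adds hostnames of its own, which is the maintenance cost the slide points to; contention in the shared namespace is also what pushes users toward names like myapp001.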

PaaS is more than java -jar mule.jar

• CloudHub adds integrated platform services to Mule
• Logging, Event Tracking, Replay, etc.

App stack -> platform is tricky
• Can features be transparent and still compatible?
• Network streams can be more brittle to deal with
• Matching serialization/marshalling with cloud features like streaming

When an SLA turns into a refund

Desire to rely on more services

• Cloud Infrastructure
• Cloud Search
• Cloud Scaling

Reality of relying on more services
• Uptime drops with every service dependency you add
• Services may underperform their SLAs with little financial impact on the provider
• You may need to handle service outages manually
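To make the first point concrete: if a request needs every dependency to be up, and failures are independent, overall availability is the product of the individual availabilities, A = A1 × A2 × ... × An. A small worked calculation with illustrative numbers, not any provider's actual SLA:

```python
from math import prod
from typing import Iterable

HOURS_PER_YEAR = 365 * 24


def composite_availability(availabilities: Iterable[float]) -> float:
    """Availability of a path that needs every dependency up (serial, independent failures)."""
    return prod(availabilities)


def downtime_hours_per_year(availability: float) -> float:
    return (1 - availability) * HOURS_PER_YEAR


one = composite_availability([0.999])
five = composite_availability([0.999] * 5)
print(f"1 dependency:   {one:.4%} -> {downtime_hours_per_year(one):.1f} h/yr")
print(f"5 dependencies: {five:.4%} -> {downtime_hours_per_year(five):.1f} h/yr")
```

Five dependencies at 99.9% each come out at about 99.50%, or roughly 43.7 hours of expected downtime a year versus about 8.8 hours for one, before counting the independence assumption, which correlated cloud outages often break.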

When logging turns into a big data problem

Customers want real-time search
• Need to centralize and index logs
• Using ElasticSearch can avoid service or license fees
• With a custom logging plugin, we can redirect log output to the cluster

Logging is always a big problem
• Clusters can fail for reasons beyond the servers you deployed
• API design for logging is different
• What happens if your disk fails or your cluster fails?
• What happens when you replace a worker?
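The custom logging plugin idea, combined with the failure questions above, can be sketched as a Python `logging.Handler` that ships records to the cluster in batches and keeps a bounded local buffer while the cluster is down. This is a sketch under assumptions, not CloudHub's plugin: `send_batch` is a hypothetical callable standing in for an ElasticSearch client, and the batch and buffer sizes are arbitrary.

```python
import collections
import itertools
import logging
from typing import Callable, Deque, List


class ClusterLogHandler(logging.Handler):
    """Hypothetical handler that ships formatted records to a log cluster in batches.

    `send_batch` stands in for the real cluster client and must raise on failure.
    Undelivered records wait in a bounded in-memory buffer; when it is full, the
    oldest records are dropped (and counted) rather than blocking the application.
    """

    def __init__(self, send_batch: Callable[[List[str]], None],
                 batch_size: int = 100, max_buffered: int = 10_000) -> None:
        super().__init__()
        self.send_batch = send_batch
        self.batch_size = batch_size
        self.buffer: Deque[str] = collections.deque(maxlen=max_buffered)
        self.dropped = 0

    def emit(self, record: logging.LogRecord) -> None:
        try:
            message = self.format(record)
        except Exception:
            self.handleError(record)
            return
        if len(self.buffer) == self.buffer.maxlen:
            self.dropped += 1  # appending will evict the oldest buffered record
        self.buffer.append(message)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        while self.buffer:
            batch = list(itertools.islice(self.buffer, self.batch_size))
            try:
                self.send_batch(batch)
            except Exception:
                return  # cluster unavailable: keep the records and retry on a later flush
            for _ in batch:
                self.buffer.popleft()
```

The buffer lives in memory, so it answers the cluster-failure question but not the worker-replacement one: records still buffered when a worker is replaced are lost. A local disk spool would narrow that gap, at the cost of depending on the disk.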

Testability is crucial

• Each dependency needs to be testable and mockable
• Developers need a local environment that matches production, or your test cases will suffer
• New tenants mean more money, so test tenant creation!
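A minimal sketch of keeping one dependency mockable: put the external service behind a narrow interface and give tests, and developers' local environments, an in-memory fake. `DnsProvider`, `FakeDns`, and `create_tenant` are hypothetical names chosen to tie in with tenant creation; they are not CloudHub APIs.

```python
from typing import Dict, Protocol


class DnsProvider(Protocol):
    """Hypothetical boundary around an external DNS service."""

    def create_record(self, name: str, target: str) -> None: ...


class FakeDns:
    """In-memory stand-in used by tests and local development."""

    def __init__(self) -> None:
        self.records: Dict[str, str] = {}

    def create_record(self, name: str, target: str) -> None:
        if name in self.records:
            raise ValueError(f"record {name} already exists")
        self.records[name] = target


def create_tenant(app_name: str, worker_host: str, dns: DnsProvider) -> str:
    """Provision the public hostname for a new tenant app (greatly simplified)."""
    hostname = f"{app_name}.cloudhub.io"
    dns.create_record(hostname, worker_host)
    return hostname
```

Because `create_tenant` only sees the interface, the same code path runs against the fake in tests and against the real provider in production; the fake is only useful as long as it fails the way the real service does (here, on duplicate records).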

Platform testing is really hard

• Some external dependencies don’t have sandboxes
• Can you test with 500 applications?
• Can you keep a quiet production “neighborhood” while testing in QA?

When security updates = vi ipsec.conf in a for loop

Security in a public service is hard
• Assume the user is infinitely clever and malicious
• Deny by default vs. service simplicity
• Maintain segregation and availability of tenants
• Asset value can vary widely across tenants

Security design touches everything
• IPsec is hard to maintain without proper configuration management (CM), and wasn’t built for noisy networks
• Deny by default means higher maintenance, and not all products support it
• It is easy to violate tenant segregation in a platform
• You may have to hire consultants
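Deny by default and tenant segregation can be expressed in a few lines of authorization logic. In this sketch the roles, action names, and `is_allowed` are hypothetical; what it illustrates is that a cross-tenant request is rejected before any role is consulted, and that anything not explicitly allowed is denied.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple


@dataclass(frozen=True)
class Principal:
    tenant: str
    roles: FrozenSet[str]


# Hypothetical allow-list of (role, action) pairs. Anything not listed is denied.
ALLOWED: FrozenSet[Tuple[str, str]] = frozenset({
    ("viewer", "app:read"),
    ("developer", "app:read"),
    ("developer", "app:deploy"),
    ("admin", "app:read"),
    ("admin", "app:deploy"),
    ("admin", "app:delete"),
})


def is_allowed(principal: Principal, action: str, resource_tenant: str) -> bool:
    # Tenant segregation comes first and no role can override it.
    if principal.tenant != resource_tenant:
        return False
    # Deny by default: only an explicit (role, action) entry grants access.
    return any((role, action) in ALLOWED for role in principal.roles)
```

The maintenance cost from the slide shows up directly: every new action stays unusable until someone adds it to the allow-list.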

When your management service goes haywire

automation automation automation

• A myriad of technologies to automate scaling and availability
• Policies can be fine-tuned to relaunch or scale out based on system feedback or API calls

What about network splits?
• Will your management server “heal” something that is already around?
• Is your management server on the same failure plane as your managed servers?
• Will you end up with manual intervention controls (a.k.a. the red button)?
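One way to lower the chance that automation “heals” a worker that is merely on the other side of a split is to require agreement from several vantage points, and to give operators a pause switch. The policy below is a hypothetical sketch: `WorkerView`, `should_replace`, the grace period, and the majority rule are assumptions, not a description of CloudHub's management service.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class WorkerView:
    """What one monitoring vantage point currently sees for a worker."""
    unreachable_seconds: float  # 0 if the worker answered the last health check


def should_replace(views: Dict[str, WorkerView], grace_seconds: float, paused: bool) -> bool:
    """Hypothetical relaunch policy for a management service.

    - `paused` is the manual red button: while it is set, automation does nothing.
    - Otherwise a worker is replaced only if a strict majority of vantage points
      have lost contact for longer than `grace_seconds`, so a split that cuts off
      a single observer (for example the manager itself) cannot trigger a relaunch.
    """
    if paused or not views:
        return False
    down = sum(1 for view in views.values() if view.unreachable_seconds > grace_seconds)
    return down * 2 > len(views)
```

A majority vote only lowers the odds of a duplicate; it does not prove the old worker is gone. Fencing it first, for example by cutting its access to shared resources, is what actually keeps two copies from running side by side.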

When your API design haunts you

Put an API on everything

• Allows automation and GUIs beyond the ones you’ve built
• Simplifies testing
• Eat your own dog food

Design redo is a big problem
• GUIs can change more easily because humans drive them
• Maintaining old APIs may not be worth it
• People may depend on bugs or semantic gaps
• Versioning practices in REST are not uniform
• Remember that understanding the state machine is a prerequisite for HATEOAS
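To illustrate the last point: in a hypermedia (HATEOAS) API, the links a response advertises are derived from the resource's state machine, so the state machine has to be pinned down first. The application lifecycle, the state names, and the `/applications/{id}/{action}` URLs below are hypothetical.

```python
from typing import Dict, List

# Hypothetical application lifecycle: state -> {action (link relation): next state}.
TRANSITIONS: Dict[str, Dict[str, str]] = {
    "UNDEPLOYED": {"deploy": "STARTED"},
    "STARTED": {"stop": "STOPPED", "restart": "STARTED"},
    "STOPPED": {"start": "STARTED", "undeploy": "UNDEPLOYED"},
}


def links_for(app_id: str, state: str) -> List[Dict[str, str]]:
    """Links a representation of the app would advertise in its current state."""
    return [
        {"rel": action, "href": f"/applications/{app_id}/{action}", "method": "POST"}
        for action in sorted(TRANSITIONS[state])
    ]


def transition(state: str, action: str) -> str:
    """Server-side guard: apply an action only if the state machine allows it."""
    allowed = TRANSITIONS[state]
    if action not in allowed:
        raise ValueError(f"{action!r} is not allowed in state {state}")
    return allowed[action]
```

Clients follow whichever links are present instead of hard-coding which actions are valid in which state, and the server rejects anything the state machine does not allow; changing the state machine later is exactly the kind of design redo the slide warns about.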

When 5 retries become a DDoS attack

We want to build resilient apps
• Recovery is part of the service you provide, and it matters more as you go up the value chain
• Connections should assume failure and be able to reconnect to dependencies
• Recovery is non-trivial

5 retries is a code smell
• Things that back up or fail can get worse with naive retry loops
• APIs can often be made to include data about when to retry, or to signal that you need to slow down
• Treat resilience as a requirement, not a feature
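A common alternative to a fixed retry loop is capped exponential backoff with random ("full") jitter, so that many clients retrying at once spread out instead of hitting a struggling service in lockstep, plus honoring a server-supplied hint (such as an HTTP `Retry-After` header) when one is present. In this sketch `RetryableError` and its `retry_after` field are hypothetical, and the delays are illustrative defaults.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class RetryableError(Exception):
    """Hypothetical error for failures worth retrying.

    `retry_after` carries a server-supplied hint in seconds (for example parsed
    from an HTTP Retry-After header), or None if the server gave no hint.
    """

    def __init__(self, message: str, retry_after: Optional[float] = None) -> None:
        super().__init__(message)
        self.retry_after = retry_after


def call_with_backoff(
    fn: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
    rng: Optional[random.Random] = None,
) -> T:
    """Call `fn`, retrying RetryableError with capped exponential backoff and full jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    rng = rng if rng is not None else random.Random()
    for attempt in range(max_attempts):
        try:
            return fn()
        except RetryableError as err:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the failure instead of looping forever
            if err.retry_after is not None:
                delay = err.retry_after  # the server told us how long to back off
            else:
                # Full jitter: uniform in [0, min(max_delay, base_delay * 2**attempt)].
                delay = rng.uniform(0, min(max_delay, base_delay * 2 ** attempt))
            sleep(delay)
    raise AssertionError("unreachable")
```

Injecting `sleep` and `rng` keeps the function testable without real waiting. Backoff spreads a retry storm out, but it still needs a bound on attempts and does not remove the need for the server to shed load.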

When your users ask the same questions

Wrong words suck

• Some terms seem sensible in design discussions, but the public uses something else
• Changing them requires retraining and a thorough doc review
• What goes online lingers

When a feature request implies new architecture

Platform changes
• Customers are looking for service, not explanations of why it is hard
• Adding value implies tough decisions on new features
• As the world turns, expectations rise
• Know your customer

Real-time, full-text search, streaming... oh my!
• Not all databases support full-text search, especially with partitioning
• Some data is better stored in S3; how does that affect indexing strategy?
• Real-time tools are emerging but immature

When you end up with a “lock” table in Mongo

Datastore diversity!

• NoSQL datastores like Mongo are attractive and energize developers
• Cloud-provisioned databases like RDS-driven MySQL are also attractive
• Specialized stores like CloudWatch for statistics

Don’t expect Mongo to do magic
• Database engines take time to mature
• Consistent backups are tricky and only recently supported
• Data ops and visualization tools are still emerging
• There are type-safe bridges like Morphia

Hammers and screwdrivers
• In a pinch, you can knock in a screw with a hammer, but you can’t screw in a nail with a screwdriver
• Don’t throw data into whatever store happens to be easy to grab, even if you can
• Rechecking data assumptions at T1 is better than at T3. At T6, you may have a disaster

Multi-tenant platform

• Own your dependencies or they will own you
• Add time for entropy
• Repeatedly remind yourself you are a landlord

Architecture as iterative development

• Forethought
• Critical debate
• Decision review

‣ @adrianfcole
‣ adrian.cole@mulesoft.com
‣ www.cloudhub.io
