8. Cloud is utility, but
your service may be
more
• Measurement based pricing exists in
infrastructure tier
• Know your customer, who are they and
where in the value chain you act
• Don’t get into race to the bottom
10. Choosing a BASIC
starting point
• Already had an LDAP infrastructure
• Straightforward integration with console
and other access tools
• Easy to do BASIC authentication
11. Remember users
(and api users)
• Basic Auth is not a good choice for an API
over time
• System integrators need delegated access
• Hard to cleanup accounts when there are
multiple owners
16. PaaS is more than java
-jar mule.jar
• CloudHub adds services integration to
Mule
• Logging, Event Tracking, Replay, etc.
17. appstack -> platform is
tricky
• transparent features and also compatible?
• dealing with network streams that could be
more brittle
• matching serialization/marshalling w/ cloud
features like streaming
19. Desire to rely on more
services
• Cloud Infrastructure
• Cloud Search
• Cloud Scaling
20. Reality of relying on
more services
• uptime drops with every service
dependency you add
• services may underperform their SLAs with
little financial impact
• you may need to manually deal with service
outages
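The uptime bullet above can be made concrete with a little arithmetic. This is a minimal sketch, assuming every dependency must be up at the same time and that failures are independent (rarely exactly true); the 99.9% figures are illustrative, not taken from any real SLA.

```python
from math import prod

def composite_availability(availabilities):
    """Availability of a service that needs every dependency up at once.

    Assumes independent failures, which is an approximation.
    """
    return prod(availabilities)

def downtime_hours_per_year(availability):
    """Expected unavailable hours in a 365-day year."""
    return (1.0 - availability) * 365 * 24

# Illustrative figures only: one dependency vs. five, each at 99.9%.
one = composite_availability([0.999])
five = composite_availability([0.999] * 5)
print(f"1 dependency:   {one:.5f} -> {downtime_hours_per_year(one):.1f} h/year")
print(f"5 dependencies: {five:.5f} -> {downtime_hours_per_year(five):.1f} h/year")
```

Under these assumptions, five 99.9% dependencies leave roughly 99.5% composite availability, about five times the yearly downtime of a single one.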
22. Customers desire real
time search
• need to centralize and index logs
• using ElasticSearch can avoid service fees or
license fees
• with a custom logging plugin, we can
redirect output to the cluster
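A rough idea of what a custom logging plugin could look like, sketched in Python rather than as a Mule plugin. The class name `BulkIndexHandler`, the `send` callable, and the index name are all hypothetical; the newline-delimited action/document pairing follows the general shape of Elasticsearch's bulk API, but the exact metadata fields vary by version.

```python
import json
import logging

class BulkIndexHandler(logging.Handler):
    """Buffers log records and hands them to `send` as one bulk payload.

    `send` is a hypothetical callable (e.g. an HTTP POST to the cluster)
    injected by the caller, so the handler can be tested without a cluster.
    """

    def __init__(self, send, index="app-logs", batch_size=100):
        super().__init__()
        self.send = send
        self.index = index
        self.batch_size = batch_size
        self.buffer = []

    def emit(self, record):
        self.buffer.append({
            "timestamp": record.created,
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        })
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        if not self.buffer:
            return
        lines = []
        for doc in self.buffer:
            # One action line, then one document line, per record.
            lines.append(json.dumps({"index": {"_index": self.index}}))
            lines.append(json.dumps(doc))
        # The buffer is cleared before sending, so a failed send drops the
        # batch; a real plugin would need a retry or spill-to-disk policy.
        self.buffer = []
        self.send("\n".join(lines) + "\n")

# Usage: collect payloads in a list instead of calling a real cluster.
payloads = []
logger = logging.getLogger("tenant-app")
logger.setLevel(logging.INFO)
logger.addHandler(BulkIndexHandler(payloads.append, batch_size=2))
logger.info("worker started")
logger.info("worker %s deployed", "w-1")
print(payloads[0])
```

Injecting `send` keeps the cluster out of the unit tests, and the dropped-batch comment marks the exact spot where the "what happens if your cluster fails?" question has to be answered.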
23. Logging is always a big
problem
• Clusters can fail for reasons beyond
servers deployed
• API design for logging is different
• What happens if your disk fails or your
cluster fails?
• What happens when you replace a worker?
25. Testability is crucial
• each dependency needs to be testable and
mockable
• devs need a local environment that
matches, or your test cases will suffer
• creation of new tenants means more
money.. test it!
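One way to keep each dependency testable and mockable is to inject it. The sketch below is an assumption for illustration: `TenantProvisioner`, `launch_worker`, `terminate_worker`, and `open_account` are hypothetical names, not CloudHub APIs. It tests the path where tenant creation costs money but billing fails.

```python
from unittest.mock import Mock

class TenantProvisioner:
    """Hypothetical tenant setup that depends on two external services."""

    def __init__(self, infrastructure, billing):
        self.infrastructure = infrastructure
        self.billing = billing

    def create_tenant(self, name, plan):
        worker_id = self.infrastructure.launch_worker(name)
        try:
            self.billing.open_account(name, plan)
        except Exception:
            # Roll back so a billing failure does not leave a worker running.
            self.infrastructure.terminate_worker(worker_id)
            raise
        return worker_id

# Exercise the failure path without touching a real cloud or billing system.
infrastructure = Mock()
infrastructure.launch_worker.return_value = "worker-1"
billing = Mock()
billing.open_account.side_effect = RuntimeError("billing unavailable")

provisioner = TenantProvisioner(infrastructure, billing)
try:
    provisioner.create_tenant("acme", "basic")
except RuntimeError:
    pass
infrastructure.terminate_worker.assert_called_once_with("worker-1")
```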
26. Platform testing is really
hard
• Some external deps don’t have sandboxes
• Can you try 500 applications?
• Can you maintain a quiet production
“neighborhood” while testing QA?
28. Security in a public
service is hard
• assume user is infinitely clever and
malicious
• deny by default vs service simplicity
• maintain segregation and availability of
tenants
• Asset value can vary widely across tenants
29. Security design touches
everything
• ipsec is hard to maintain without proper
CM, and wasn’t built for noisy network
• deny by default means higher maintenance,
and not all products support it
• it is easy to violate tenancy segregation in a
platform
• you may have to hire consultants
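Part of why ipsec needs proper CM: each host's ipsec.conf lists a tunnel per peer with the host itself as one side, so no two hosts share a file. A minimal sketch of generating those files from a single inventory follows. The host names and the 192.0.2.x documentation addresses are made up, and the `conn`/`left`/`right`/`authby`/`auto` keywords assume an Openswan/strongSwan-style ipsec.conf; a real deployment would also need secrets and policy settings that are omitted here.

```python
def render_ipsec_conf(local_name, hosts):
    """Render one host's ipsec.conf body from a shared {name: ip} inventory.

    The local host is always `left`; every other host becomes a `right` peer.
    """
    local_ip = hosts[local_name]
    sections = []
    for peer_name in sorted(hosts):
        if peer_name == local_name:
            continue
        sections.append("\n".join([
            f"conn {local_name}-to-{peer_name}",
            f"    left={local_ip}",
            f"    right={hosts[peer_name]}",
            "    authby=secret",
            "    auto=start",
        ]))
    return "\n\n".join(sections) + "\n"

# Hypothetical inventory; config management would own the real one.
hosts = {
    "worker-a": "192.0.2.10",
    "worker-b": "192.0.2.11",
    "worker-c": "192.0.2.12",
}
configs = {name: render_ipsec_conf(name, hosts) for name in hosts}
print(configs["worker-a"])
```

Every file is still unique per instance, but the uniqueness is derived from one inventory instead of being maintained by hand.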
31. automation automation
automation
• myriad of technology to automate scaling
and availability
• policies can be fine tuned to relaunch or
scale out based on system feedback or api
32. What about network
splits
• Will your management server “heal”
something that is already around?
• Is your management server on the same
failure plane as your managed servers?
• Will you end up with manual intervention
controls (aka red button)?
34. Put an API on
everything
• Allows automation and GUIs besides what
you’ve invented
• simplifies testing
• eat your own dogfood
35. Design redo is a big
problem
• GUIs can change easier as humans drive
them
• Maintaining old apis may not be worth it
• People may depend on bugs or semantic
gaps
• Version practices in REST are not uniform
• remember understanding state machine is a
prerequisite for HATEOAS
37. We want to build
resilient apps
• recovery is a part of the service you
provide, more important as you go up in
value chain
• connections should assume failure and be
able to reconnect to dependencies
• recovery is non-trivial
38. 5 retries is code smell
• things that back up or fail can get worse
with naive error retry loops
• APIs often can be made to include data
about when to retry or that you need to
slow down
• Treat resilience as a requirement, not a
feature
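A less naive alternative to a fixed retry loop, sketched under assumptions: `RetryableError` and its `retry_after` field are hypothetical stand-ins for whatever an API returns to say "slow down" (for example an HTTP `Retry-After` header). Without a server hint, the wait grows exponentially with random jitter so clients do not retry in lockstep.

```python
import random
import time

class RetryableError(Exception):
    """Hypothetical error carrying an optional server-suggested delay (seconds)."""

    def __init__(self, message, retry_after=None):
        super().__init__(message)
        self.retry_after = retry_after

def call_with_backoff(operation, max_attempts=5, base_delay=0.5,
                      max_delay=30.0, sleep=time.sleep, rng=random.random):
    """Call `operation`, backing off between retryable failures."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except RetryableError as err:
            if attempt == max_attempts - 1:
                raise  # Out of attempts: surface the failure to the caller.
            if err.retry_after is not None:
                # The service told us when to come back; honor it (capped).
                delay = min(err.retry_after, max_delay)
            else:
                # Exponential backoff with full jitter.
                delay = min(max_delay, base_delay * 2 ** attempt) * rng()
            sleep(delay)

# Usage: a dependency that asks for a pause once, then succeeds.
state = {"calls": 0}

def flaky():
    state["calls"] += 1
    if state["calls"] == 1:
        raise RetryableError("throttled", retry_after=0.01)
    return "ok"

print(call_with_backoff(flaky))
```

`sleep` and `rng` are parameters only so the timing can be tested deterministically.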
40. Wrong words suck
• Some terms seem sensible in design
discussions, but public use something else
• Changing requires retraining, and thorough
doc review
• What goes online lingers
42. Platform changes
• Customers are looking for service, not
explanations of why it is hard
• Adding value implies tough decisions on
new features
• As the world turns, expectations rise
• Know your customer
43. Real-time, full-text
search, streaming.. oh
my!
• Not all databases support full-text search,
esp with partitioning
• Some data is better stored in S3, how does
that affect indexing strategy?
• Real-time tools are emerging but immature
45. Datastore diversity!
• NoSQL datastores like Mongo are
attractive and energize developers
• Cloud provisioners like RDS-driven MySQL
are also attractive
• Specialized stores like CloudWatch for
statistics
46. Don’t expect mongo to
do magic
• Database Engines Mature
• Consistent backups are tricky and only
recently supported
• Data Ops and visualization tools are
emerging
• There are type safe bridges like Morphia
47. Hammers and
screwdrivers
• In a pinch, you can knock in a screw with a
hammer, but you can’t screw in a nail with a
screwdriver
• Don’t throw data into whatever store
happens to be easy to grab, even if you can.
• Rechecking data assumptions at T1 is better
than T3. At T6, you may have a disaster
Not going to focus on normal dev/ops except in context to multi-tenancy
Keep tenant safe, protect all tenants, stay in business http://whosjack.wpengine.netdna-cdn.com/wp-content/uploads/2012/05/terraced_houses_manchester_298792.jpg
Integration customers range from potato to dev/architect. High-value features are not easy to pay for on a per-message basis, esp. when some services run only 150 messages/month. Find the right pricing model. Users who just want to use Mule by itself can use EC2 or Heroku.
LDAP infrastructure existed for Mulesoft community
How do you handle lockout of users? System keys, etc.: who’s building them?! SIs need access to create apps for other users, and account conflation leads to N accounts. Some users just want access to download Mule and docs and will never become a CloudHub user.
also follows patterns like s3 which is largest cloud service
x.tenant.cloudhub.io can do more w/ a tenant
also introduces another complexity in release process and opportunity for laptop != prod
ex. dynamo says fully reliable, but if dynamo is out you can only “wait”
some problems are much harder than they seem. Searching, indexing, chronology are difficult and emerging products can suffer from reliability. Logging is also a core means to troubleshoot problems, so if logging is a problem, it is a big problem… significant effort and expertise to nail. cluster *will* eventually become split-brain; how long to restore service from rebooting?
ex. Marketo has no sandbox, neither does the billing system. Even if you reproduce prod apps, how do you reproduce their behavior? *Corner cases are the ones likely to stress your platform out.*
Desire for ssh access can thwart your firewall rules. Each ipsec.conf has tunnels configured for each of its peers, and needs to recognize one side of the tunnel as itself. This results in each host's ipsec.conf being unique to that instance, so you can't collapse the hosts into a class, but have to manage each one separately. You need to use config management to role-assign this. We moved off VPC due to many problems with using ipsec, yet still have the inter-region problem. Do you know a solution?
ex. failures in management can lead to conservative healing and scaling policies or mandatory user intervention
Sometimes people code that true = false as opposed to reporting a bug. API design can go from loveable to hateable, or be hateable first. An MVP approach may backfire when you are dealing with a *public* service. Keep it as simple as possible, and expose conservatively.
know your customer and if they are likely to be savvy enough to recover from system failures
ex. notification vs activity feed, streaming? Once information escapes your network, it can haunt you with clashing instructions, Stack Overflow, etc.
ex. what is streaming?
MySQL doesn’t support full-text search on partitioned tables. ex. Druid was just released yesterday, Twitter Storm is only a year old. Move the problem from the customer to us, which includes the technical profile and migration. A small problem becomes a big problem when a customer desires a capability that is unsupported or difficult to support with the existing (datastore|infrastructure) or indexing strategy.
Consistent backup wasn't possible until very recently (ext4). Immature by DB standards, though older than its years. Devs love it, and lack of tools is a problem; basically have to use Navicat, etc. For DataOps you are stuck with the command line, plus visualization problems. No way to do analytics on top of Mongo without telling them to write some JavaScript, and the answer might be to transform to PostgreSQL.
We should use Mongo or RDS. Mongo isn't being used correctly: it is used as a relational database, so it has transactional data. We now have event tracking, but we don't have a document for the event configuration. Have to store the whole thing or you will have tx data problems; the main problem is that it is not being used correctly. Even though it seems you can store relational data in a NoSQL store like Mongo, that doesn’t mean you should. 2 (or more) types of datastores may be the most supportable answer to your data problem. A. We use it as Tx (so clashing or overlapping writes). B. It doesn't give you mature features (like consistent backup).
Pretty much everyone has to be DevOps. Designs can be refactored, but are tough to change at scale. Your job changes often, so the right tool for the job also changes. Realize choices made now can be difficult to change a year on.