
Antoine Coetsier - billing the cloud

Antoine Coetsier - billing the cloud. Presentation from CloudStack European User Group, Thursday, April 19, London.

Antoine Coetsier - billing the cloud

  1. 1. #CloudstackCephDay Billing the cloud A tale of devops
  2. 2. #CloudstackCephDay Antoine Coetsier ● CEO & co-founder at Exoscale ● Engineer ● Since 2011
  3. 3. #CloudstackCephDay Credits: Pierre-Yves Ritschard ● CTO & co-founder at Exoscale ● Open Source Developer ● Monitoring & Distributed Systems Enthusiast ● @pyr
  4. 4. #CloudstackCephDay Billing the cloud A tale of devops
  5. 5. #CloudstackCephDay ● Our Story ● Billing resources ● Scaling methodologies ● Our approach
  6. 6. #CloudstackCephDay
  7. 7. #CloudstackCephDay
  8. 8. #CloudstackCephDay
        provider "exoscale" {
          api_key    = "${var.exoscale_api_key}"
          secret_key = "${var.exoscale_secret_key}"
        }
        resource "exoscale_instance" "web" {
          template  = "ubuntu 17.04"
          disk_size = "50g"
          profile   = "medium"
          ssh_key   = "production"
        }
  9. 9. #CloudstackCephDay Infrastructure isn’t free! (sorry)
  10. 10. #CloudstackCephDay Business Model ● Provide cloud infrastructure ● (???) ● Profit!
  11. 11. #CloudstackCephDay 10000 mile high view
  12. 12. Quantities
  13. 13. Quantities ● 10 megabytes have been sent from 159.100.251.251 over the last minute
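
Quantities are the easy case: metering them amounts to summing samples over the billing window. A minimal sketch in Python, with a made-up record shape:

    from collections import defaultdict

    # Hypothetical one-minute samples: (source address, bytes sent during that minute)
    samples = [
        ("159.100.251.251", 10_000_000),
        ("159.100.251.251", 4_000_000),
    ]

    # Billing a quantity is simply adding it up per source over the billing window.
    totals = defaultdict(int)
    for source, sent in samples:
        totals[source] += sent
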
  14. 14. Resources
  15. 15. Resources ● Account bar started instance foo with profile large today at 12:00 ● Account bar stopped instance foo today at 12:15
  16. 16. A bit closer to reality
        {:type     :usage
         :entity   :vm
         :action   :create
         :time     #inst "2017-11-12T15:48:32.000-00:00"
         :template "ubuntu-17.04"
         :source   :compute
         :account  "modern-devops"
         :uuid     "7a070a3d-66ff-4658-ab08-fe3cecd7c70f"
         :version  1
         :offering "medium"}
  17. 17. #CloudstackCephDay Theory
  18. 18. #CloudstackCephDay Quantities are simple
  19. 19. #CloudstackCephDay Resources are harder
  20. 20. #CloudstackCephDay This is per account
  21. 21. #CloudstackCephDay Solving for all events
  22. 22. resources = {}
        metering = []

        def usage_metering():
            # Walk the full event history, pairing each stop event with the
            # start time remembered for that resource.
            for event in fetch_all_events():
                uuid = event.uuid()
                time = event.time()
                if event.action() == 'start':
                    resources[uuid] = time
                else:
                    timespan = duration(resources[uuid], time)
                    usage = Usage(uuid, timespan)
                    metering.append(usage)
            return metering
  23. 23. #CloudstackCephDay In Practice
  24. 24. #CloudstackCephDay ● This is a never-ending process ● Minute-precision billing ● Applied every hour
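
One way to picture minute-precision billing applied on an hourly cadence: each hourly run turns usage spans into whole billed minutes. A sketch only; whether a partial minute rounds up is an assumption made here for illustration:

    import math
    from datetime import datetime, timedelta

    def billed_minutes(start: datetime, stop: datetime) -> int:
        """Round a usage span to whole minutes, the billing granularity (rounding up assumed)."""
        return math.ceil((stop - start).total_seconds() / 60)

    # A VM that ran for 12 minutes and 30 seconds is billed for 13 minutes.
    start = datetime(2017, 11, 12, 12, 0, 0)
    assert billed_minutes(start, start + timedelta(minutes=12, seconds=30)) == 13
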
  25. 25. #CloudstackCephDay ● Avoid overbilling at all costs ● Avoid underbilling (we need to eat!)
  26. 26. #CloudstackCephDay ● Keep a small operational footprint
  27. 27. #CloudstackCephDay A naive approach
  28. 28. 30 * * * * usage-metering >/dev/null 2>&1
  29. 29. #CloudstackCephDay Advantages
  30. 30. #CloudstackCephDay ● Low operational overhead ● Simple functional boundaries ● Easy to test
  31. 31. #CloudstackCephDay Drawbacks
  32. 32. #CloudstackCephDay ● High pressure on SQL server ● Hard to avoid overlapping jobs ● Overlaps result in longer metering intervals
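
The overlapping-jobs problem is usually handled by taking a lock around the cron job so a new run bails out while the previous one is still going. A minimal sketch, with a made-up lock path and entry point:

    import fcntl
    import sys

    # Refuse to start if a previous metering run still holds the lock.
    lock = open("/var/lock/usage-metering.lock", "w")   # hypothetical path
    try:
        fcntl.flock(lock, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        sys.exit("previous metering run still in progress, skipping this one")

    run_usage_metering()   # hypothetical entry point for the hourly job
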
  33. 33. You are in a room full of overlapping cron jobs. You can hear the screams of a dying MySQL server. An Oracle vendor is here. To the West, a door is marked “Map/Reduce”. To the East, a door is marked “Stream Processing”.
  34. 34. > Talk to Oracle
  35. 35. You’ve been eaten by a grue.
  36. 36. > Go West
  37. 37. #CloudstackCephDay
  38. 38. #CloudstackCephDay ● Conceptually simple ● Spreads easily ● Data locality aware processing
  39. 39. #CloudstackCephDay ● ETL ● High latency ● High operational overhead
  40. 40. > Go East
  41. 41. #CloudstackCephDay
  42. 42. #CloudstackCephDay ● Continuous computation on an unbounded stream ● Each record processed as it arrives ● Very low latency
  43. 43. #CloudstackCephDay ● Conceptually harder ● Where do we store intermediate results? ● How does data flow between computation steps?
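
The batch loop from slide 22 translates naturally into a per-record handler; the open question raised on this slide is where state such as the resources map lives between events. A sketch assuming plain in-memory state:

    resources = {}   # intermediate state: start time per resource uuid

    def handle(event):
        """Process a single event as it arrives, instead of scanning history in batch."""
        uuid = event.uuid()
        if event.action() == 'start':
            resources[uuid] = event.time()
        else:
            started = resources.pop(uuid)
            emit_usage(uuid, duration(started, event.time()))   # hypothetical downstream sink
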
  44. 44. #CloudstackCephDay Deciding factors
  45. 45. #CloudstackCephDay Our shopping list ● Operational simplicity ● Integration through our whole stack ● Room to grow
  46. 46. #CloudstackCephDay Operational simplicity ● Experience matters ● Spark and Storm are intimidating ● HBase & Hive discarded
  47. 47. #CloudstackCephDay Integration ● HDFS & Kafka require simple integration ● Spark goes hand in hand with Cassandra
  48. 48. #CloudstackCephDay Room to grow ● A ton of logs ● A ton of metrics
  49. 49. #CloudstackCephDay Small confession ● Previously knew Kafka
  50. 50. #CloudstackCephDay
  51. 51. #CloudstackCephDay ● Publish & Subscribe ● Processing ● Store
  52. 52. #CloudstackCephDay Publish & Subscribe ● Records are produced on topics ● Topics have a predefined number of partitions ● Records have a key which determines their partition
  53. 53. #CloudstackCephDay ● Consumers get assigned a set of partitions ● Consumers store their last consumed offset ● Brokers own partitions, handle replication
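
For illustration only, this is roughly what keyed publish and subscribe looks like with the kafka-python client; the topic, brokers and group names are placeholders, and the talk does not say which client Exoscale actually uses:

    from kafka import KafkaProducer, KafkaConsumer

    # The record key (here, the account) determines which partition the record lands on.
    producer = KafkaProducer(bootstrap_servers="localhost:9092")
    producer.send("usage-events", key=b"modern-devops", value=b'{"action": "create"}')
    producer.flush()

    # Consumers in a group get assigned a subset of the topic's partitions and
    # track the offset of the last record they consumed.
    consumer = KafkaConsumer("usage-events",
                             bootstrap_servers="localhost:9092",
                             group_id="billing")
    for record in consumer:
        print(record.partition, record.offset, record.key, record.value)
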
  54. 54. #CloudstackCephDay ● Stable consumer topology ● Memory disaggregation ● Can rely on in-memory storage ● Age expiry and log compaction
  55. 55. #CloudstackCephDay
  56. 56. #CloudstackCephDay Billing at Exoscale
  57. 57. #CloudstackCephDay Problem solved?
  58. 58. #CloudstackCephDay ● Process crashes ● Undelivered message? ● Avoiding overbilling
  59. 59. #CloudstackCephDay Reconciliation ● Snapshot of full inventory ● Converges stored resource state if necessary ● Handles failed deliveries as well
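
A reconciler of this kind can be pictured as a periodic diff between a fresh inventory snapshot and the state the pipeline believes it has; a rough sketch (names and structure are assumptions):

    def reconcile(snapshot, stored):
        """Converge stored resource state towards a full inventory snapshot."""
        for uuid, resource in snapshot.items():
            if stored.get(uuid) != resource:
                stored[uuid] = resource       # missed or out-of-order event: fix the record
        for uuid in set(stored) - set(snapshot):
            del stored[uuid]                  # resource is gone but its stop event never arrived
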
  60. 60. #CloudstackCephDay Avoiding overbilling ● Reconciler acts as logical clock ● When supplying usage, attach a unique transaction ID ● Reject multiple transaction attempts on a single ID
  61. 61. #CloudstackCephDay Avoiding overbilling ● Reconciler acts as logical clock ● When supplying usage, attach a unique transaction ID ● Reject multiple transaction attempts on a single ID
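
Rejecting repeated attempts on one transaction ID is plain idempotency; a toy version of the idea (in production the check would live in the billing store, e.g. as a unique constraint):

    settled = set()   # transaction IDs that have already been applied

    def bill_once(transaction_id, usage):
        """Apply a usage record at most once, however often it is retried."""
        if transaction_id in settled:
            return False                  # duplicate attempt, reject it
        settled.add(transaction_id)
        charge(usage)                     # hypothetical call into the billing backend
        return True
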
  62. 62. #CloudstackCephDay Parting words
  63. 63. #CloudstackCephDay Looking back ● Things stay simple (roughly 600 LoC) ● Room to grow ● Stable and resilient ● DNS, Logs, Metrics, Event Sourcing
  64. 64. #CloudstackCephDay What about batch? ● Streaming doesn’t work for everything ● Sometimes throughput matters more than latency ● Building models in batch, applying with stream processing
  65. 65. #CloudstackCephDay Thanks! Questions? (we are hiring!)
