StackWatch: A prototype CloudWatch service for CloudStack

Presented at CloudStack Collab 2014 in Denver. The presentation explores adding a Cloudwatch service to Apache CloudStack and some of the interesting design decisions and consequences.


  1. StackWatch
     Monitoring-as-a-service for Apache CloudStack
     (actually an exploration around the edges of Apache CloudStack)
     @chiradeep
  2. Disclaimer
     • Developer talk
     • No demo
     • Designed to make you think
  3. Agenda
     • Introduction to StackWatch
     • The design of StackWatch
     • Lessons learned
     • Tips for building your own service
  4. What is StackWatch?
     Monitoring-as-a-service for the users of a CloudStack cloud (like AWS CloudWatch)
     ✔ Store metrics at high fidelity
     ✔ Retrieve metric statistics
     ✔ Graph metrics
     ✔ Alarms on threshold crossings
     ✔ Alarm and metric manipulation
     ✔ Large scale (>100k metrics / min)
     ✔ Multi-tenant
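To make "retrieve metric statistics" concrete, here is a minimal Python sketch of the kind of per-period summary a CloudWatch-like service returns. The function name, the input shape (timestamp, value pairs) and the field names are assumptions for illustration; the slides do not show StackWatch's actual implementation.

```python
from collections import defaultdict
from statistics import mean


def metric_statistics(datapoints, period):
    """Group (epoch_seconds, value) pairs into fixed periods and summarize each.

    Hypothetical helper: the field names imitate CloudWatch-style statistics,
    but the shape is an assumption, not StackWatch's real response format.
    """
    buckets = defaultdict(list)
    for timestamp, value in datapoints:
        # Align every data point to the start of its period.
        buckets[timestamp - timestamp % period].append(value)
    return {
        start: {
            "SampleCount": len(values),
            "Sum": sum(values),
            "Average": mean(values),
            "Minimum": min(values),
            "Maximum": max(values),
        }
        for start, values in sorted(buckets.items())
    }


# Example: three latency samples summarized over 60-second periods.
points = [(0, 2.0), (30, 4.0), (60, 10.0)]
stats = metric_statistics(points, period=60)
```

Storing raw points and summarizing at query time is what lets the service keep "high fidelity" data while still answering coarse-grained questions.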
  5. StackWatch Motivation
     • AutoScale implementation in Apache CloudStack is adequate but limiting
     • Either
       – requires NetScaler as a load balancer
       OR
       – uses hypervisor metrics (and still requires HAProxy)
  6. AutoScale potential improvements
     • Use application metrics
       – Better indication of application load
     • Scalable implementation
       – No polling
       – Alarm driven
     • Fidelity to AWS AutoScale API
     • Independent of LB
     • Flexible scaling action (not just add / remove VM)
     Works better with Monitoring-as-a-Service
  7. Non-functional requirements
     • Develop in a different (i.e., not Java) language
       – More on this later
     • Testable independent of ACS
       – Faster development time
     • Limited changes to ACS
       – Master branch is hard to keep up with, especially if you code just a few hours every week!
  8. Digression
     • Apache CloudStack can be intimidating
       – Lots of features baked in
       – Limited test cases
       – Requirements behind every logic point
       – Well-defined extensibility, but hard to go beyond the plugin API
     • Java is the lingua franca
       – What if I want to use something else?
  9. The Narrow Waist Model of the Internet
     (Diagram labels: Innovation / Innovation / Hard to change)
  10. Apache CloudStack Narrow Waist
      (Diagram)
      • Applications (innovation): StackMate, DbaaS, LBaaS, MRaaS, PaaS, FWaaS, Analytics*aaS, MLaaS
      • ACS Core (harder to change)
      • Technology (innovation): XenServer, KVM, Hyper-V, vSphere, NFS, iSCSI, FC, VLAN, Overlay, CPU, vCenter, libvirt, WMI, SDN
      Where do StackWatch and AutoScale belong?
      Should network services be applications?
  11. Example: The VR model inside-out
      Current model (diagram: ACS, Hypervisor, VR)
        1. create network
        2. create VR
        3. create VR VM
        4. Program rules
      • Easier to consume
      • Just works
      • Harder to change
      • Harder to test VR operations in isolation
      • Requires developer discipline to not leak concerns between internal layers
      vs.
      Inside out (diagram: ACS, VR Service, Hypervisor, VR)
        1. create network
        2. create VR VM
        3. create VR VM
        4. Program rules
      • Easier to change
      • Requires more work from consumer (additional orchestration)
      • Operational challenges (HA, state storage, failure model)
  12. Micro Services?
      "a particular way of designing software applications as suites of independently deployable services."
      "common characteristics around organization around business capability, automated deployment, intelligence in the endpoints, and decentralized control of languages and data."
      - Martin Fowler, http://martinfowler.com/articles/microservices.html
  13. Monolith vs. Microservice
      • Monolith:
        – Change is hard (-)
        – Service automatically gets horizontal scale, HA, throttling, monitoring (+)
        – Easy refactoring (+)
      • Microservice:
        – Easier to change/rewrite, test and deploy (+)
        – Developer falls for the fallacies of distributed computing (-)
          • http://en.wikipedia.org/wiki/Fallacies_of_Distributed_Computing
        – Fuzzy service boundaries (-)
        – Service boundaries are harder to change / refactor (-)
  14. AWS Example
      • Service boundaries are defined by API endpoints.
      • Separate API endpoints for
        – EC2
        – AutoScale
        – CloudWatch
        – ELB
        – But not VPC, Elastic IP, etc.
  15. StackWatch Architecture
      (Diagram) Components: CloudStack, StackWatch, Riemann, OpenTSDB
      • API calls into StackWatch: PutMetrics / CreateAlarm / GetStats
      • CloudStack: GetUser, with a Credential Cache
      • DB (with Cache): Alarms, Metric Info, AlarmHistory
      • MetricData + Alarm Cfg →
      • ← Threshold Alarm
      ✔ Insert metrics
      ✔ Retrieve metric statistics
      ✔ Graph metrics
      ✔ Real-time alarms
  16. Components - OpenTSDB
      • Open Time Series Database
        – Front-end to Apache HBase
        – OSS project (LGPL license)
      • Store billions of data points
        – Indefinitely without losing resolution
        – Reliable (HDFS replication)
        – Scalable (HBase)
      • Simple API to store / query data
  17. Component: Riemann
      • High performance stream processor designed for monitoring infrastructure
        – Flexible, powerful DSL
        – Open Source (Eclipse License)
        – Written in Clojure
      • Used to generate Alarms for StackWatch
  18. Riemann DSL Example
      Send an email whenever the average web application latency exceeds 6 ms over 3 periods of 3 seconds.

      ;; assumes `email` was defined earlier in the Riemann config (e.g. with mailer)
      (streams
        (where (not (expired? event))
          ;; over time windows of 3 seconds...
          (fixed-time-window 3
            ;; calculate the average value of the metric and emit an average (summary) event
            (combine folds/mean
              ;; if there are no events in the window, we can get nil events
              (where (not (nil? event))
                ;; collect the summary event over the last 3 fixed-time-windows
                (moving-event-window 3
                  ;; find the summary event with the minimum average metric
                  (combine folds/minimum
                    ;; see if it breached the threshold
                    (where (> metric 6.0)
                      ;; send the event in an email
                      (email "me@myself.com"))))))
            ))
      )
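For readers who do not read Clojure, the alarm rule in the Riemann example can be restated as a batch computation. The following Python sketch is an illustration only: the function name and input shape are assumptions, and it is deliberately stricter than the streaming config, because it requires every one of the three most recent windows to contain data.

```python
from statistics import mean


def breached(events, window=3, periods=3, threshold=6.0):
    """Return True when the lowest of the last `periods` window averages
    exceeds `threshold`.

    `events` is an iterable of (time_in_seconds, metric) pairs. This is a
    batch restatement of the rule, not a port of Riemann's streaming engine.
    """
    by_window = {}
    for time_s, metric in events:
        # Assign each event to a fixed window of `window` seconds.
        by_window.setdefault(int(time_s // window), []).append(metric)
    if not by_window:
        return False
    last = max(by_window)
    recent = [last - i for i in range(periods)]
    # Simplification: an empty window means "not enough evidence to alarm".
    if any(w not in by_window for w in recent):
        return False
    averages = [mean(by_window[w]) for w in recent]
    # If even the minimum average is above the threshold, all windows breached.
    return min(averages) > threshold


# Three consecutive 3-second windows with averages 8.0, 7.0 and 8.0 ms.
slow = [(0, 7.0), (1, 9.0), (3, 6.5), (4, 7.5), (6, 8.0)]
# Same shape, but the middle window averages 5.0 ms.
mixed = [(0, 7.0), (1, 9.0), (3, 5.0), (6, 8.0)]
slow_alarm = breached(slow)
mixed_alarm = breached(mixed)
```

Taking the minimum of the window averages is what turns "exceeds the threshold" into "exceeds the threshold for every one of the last three periods".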
  19. Component: StackWatch
      • API frontend to CloudWatch-like API
      • Stores metric metadata, alarm history in MySQL
      • API authentication using signatures
        – Authenticated using secret key from CloudStack
      • Written in Clojure
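Signature-based API authentication generally works by having client and server compute the same keyed hash over a canonical form of the request. The Python sketch below shows that idea with HMAC-SHA256 over sorted, URL-encoded query parameters. The canonical form, the hash choice and the `Signature` parameter name are assumptions for illustration; the slides do not specify which signing scheme StackWatch uses.

```python
import base64
import hashlib
import hmac
from urllib.parse import quote


def sign(params, secret_key):
    """Compute a base64 HMAC-SHA256 over the sorted, URL-encoded parameters.

    Illustrative canonicalization only; real AWS and CloudStack signing
    schemes define their own string-to-sign rules.
    """
    canonical = "&".join(
        f"{quote(str(k), safe='')}={quote(str(v), safe='')}"
        for k, v in sorted(params.items())
    )
    digest = hmac.new(secret_key.encode(), canonical.encode(), hashlib.sha256)
    return base64.b64encode(digest.digest()).decode()


def verify(params, secret_key):
    """Recompute the signature server-side and compare in constant time."""
    unsigned = {k: v for k, v in params.items() if k != "Signature"}
    return hmac.compare_digest(sign(unsigned, secret_key), params.get("Signature", ""))


# Example request signed by a client holding the (made-up) secret key.
request = {"Action": "PutMetricData", "MetricName": "RequestLatency", "Value": "4"}
request["Signature"] = sign(request, "example-secret")
accepted = verify(request, "example-secret")
tampered = verify(dict(request, Value="400"), "example-secret")
```

The server never receives the secret key itself, which is why StackWatch has to obtain it from CloudStack before it can check a request.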
  20. Event-based integration
      (Diagram: PutMetrics → StackWatch, MetricData → Riemann, MetricData → OpenTSDB)
      • CloudWatch API (HTTP Query)
        mon-put-data --metric-name RequestLatency --namespace "WebFrontEnd" --dimensions "host=i-2c9e85,Stack=Test" --timestamp 2014-03-25T00:00:00.000Z --value 4
      • OpenTSDB API (telnet / REST)
        put RequestLatency 1395705600 4 host=i-2c9e85 Stack=Test namespace=WebFrontEnd acct_uuid=56A17202-36C2-46E8-8905-90423040AAA
      • Riemann event (ProtoBuf)
        {service: "RequestLatency", metric: 4, time: 1395705600, host: i-2c9e85, stack: Test, namespace: "WebFrontEnd", acct_uuid: '56A17202-36C2-46E8-8905-90423040AAA'}
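The three representations on this slide carry the same data point, so the translation can be sketched as two small functions. In the Python below, the input dictionary shape and the function names are assumptions for illustration; only the output formats follow the slide.

```python
from datetime import datetime, timezone


def to_epoch(iso_timestamp):
    """Convert a timestamp like 2014-03-25T00:00:00.000Z to epoch seconds (UTC)."""
    parsed = datetime.strptime(iso_timestamp, "%Y-%m-%dT%H:%M:%S.%fZ")
    return int(parsed.replace(tzinfo=timezone.utc).timestamp())


def to_opentsdb_put(datum, acct_uuid):
    """Render an OpenTSDB telnet-style `put` line; dimensions become tags."""
    tags = dict(datum["dimensions"], namespace=datum["namespace"], acct_uuid=acct_uuid)
    tag_text = " ".join(f"{key}={value}" for key, value in tags.items())
    return f"put {datum['metric_name']} {to_epoch(datum['timestamp'])} {datum['value']} {tag_text}"


def to_riemann_event(datum, acct_uuid):
    """Build a Riemann-style event map; dimension names are lower-cased as on the slide."""
    event = {
        "service": datum["metric_name"],
        "metric": datum["value"],
        "time": to_epoch(datum["timestamp"]),
    }
    event.update({key.lower(): value for key, value in datum["dimensions"].items()})
    event.update(namespace=datum["namespace"], acct_uuid=acct_uuid)
    return event


# The data point from the mon-put-data command on the slide.
datum = {
    "metric_name": "RequestLatency",
    "namespace": "WebFrontEnd",
    "dimensions": {"host": "i-2c9e85", "Stack": "Test"},
    "timestamp": "2014-03-25T00:00:00.000Z",
    "value": 4,
}
account = "56A17202-36C2-46E8-8905-90423040AAA"
put_line = to_opentsdb_put(datum, account)
event = to_riemann_event(datum, account)
```

Tagging both outputs with the account UUID is what keeps tenants separated in stores that have no notion of CloudStack accounts.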
  21. CloudStack Integration
      • Need secret key from CloudStack DB
        – GetUser Admin API returns
          • Secret key
          • UUID of Account
      • Secret key used to authenticate query API
      • Account UUID usage:
        – Tag metric events sent to OpenTSDB and Riemann
        – Part of primary key in DB
        – E.g., metric table has columns account_uuid, namespace and metric_name. Primary key is composite of these columns.
      • User information cached inside app for speed
        – Call GetUser API on cache miss
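The cache-on-miss behaviour described above can be sketched in a few lines of Python. Everything named here is hypothetical: `get_user` stands in for a client call to the GetUser admin API, and the returned dictionary shape is an assumption. A production cache would also need expiry, since secret keys can be regenerated.

```python
class CredentialCache:
    """Cache user records in-process and call GetUser only on a miss."""

    def __init__(self, get_user):
        # `get_user` is a hypothetical callable wrapping the GetUser admin API.
        self._get_user = get_user
        self._entries = {}

    def lookup(self, api_key):
        if api_key not in self._entries:
            self._entries[api_key] = self._get_user(api_key)
        return self._entries[api_key]


def metric_primary_key(account_uuid, namespace, metric_name):
    """Composite key mirroring the metric table columns named on the slide."""
    return (account_uuid, namespace, metric_name)


# Stand-in for the admin API so the sketch runs without a CloudStack server.
calls = []


def fake_get_user(api_key):
    calls.append(api_key)
    return {"secret_key": "example-secret", "account_uuid": "example-account-uuid"}


cache = CredentialCache(fake_get_user)
first = cache.lookup("example-api-key")
second = cache.lookup("example-api-key")
key = metric_primary_key(first["account_uuid"], "WebFrontEnd", "RequestLatency")
```

Because the account UUID leads the composite key, two tenants can publish a metric with the same namespace and name without colliding.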
  22. StackWatch Current Status
      • Clojure Web App
        – Uses Ring web framework
        – Easy to scale up. E.g., 1000 tenants sending 1000 events per minute each = 1 million events per minute
      • API elements that work
        – PutMetricData
        – ListMetrics
        – GetStats
      • No Web UI
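The load figure on this slide is easier to reason about as a per-second rate. A short Python calculation, using only the numbers from the slide:

```python
tenants = 1000
events_per_tenant_per_minute = 1000

# Total load across all tenants.
events_per_minute = tenants * events_per_tenant_per_minute
# The same load expressed per second, which is how ingest capacity is usually sized.
events_per_second = events_per_minute / 60
```

That works out to one million events per minute, or roughly 16,700 events per second sustained.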
  23. What about AutoScale?
      • CloudStack AutoScale API not fully compatible with AWS
      • AutoScaling service concept
        – StackScaler Service (Ruby-on-Rails app)
        – Concept only, not implemented
  24. StackScaler Architecture
      (Diagram) Components: CloudStack, StackScaler (RoR app), StackWatch (Clojure)
      • AutoScaling API into StackScaler: create autoscale group / create-launch-config / etc.
      • StackScaler and CloudStack: deployVM / listVM, GetUser (with Credential Cache)
      • DB (with Cache): AutoScale Groups, Instance Info, Launch Config, History
      • Alarm Config →
      • ← Threshold Alarm
      Service interactions always use the Public API
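StackScaler is described as a concept only, so the following Python sketch is purely illustrative of how an alarm-driven scaler could turn a threshold alarm into actions. The class, the alarm state names and the `removeVM` label are assumptions; `deployVM` is the label used on the slide. Scaling in whenever the alarm clears is a simplification of how AWS-style policies work.

```python
from dataclasses import dataclass


@dataclass
class AutoScaleGroup:
    """Hypothetical record for an autoscale group kept in the StackScaler DB."""
    name: str
    min_size: int
    max_size: int
    current_size: int


def desired_size(group, alarm_state, step=1):
    """Pick a target size from the alarm state, clamped to the group's bounds."""
    if alarm_state == "ALARM":
        target = group.current_size + step
    elif alarm_state == "OK":
        target = group.current_size - step
    else:
        # Unknown or insufficient-data states leave the group unchanged.
        target = group.current_size
    return max(group.min_size, min(group.max_size, target))


def plan_actions(group, alarm_state):
    """Translate the size change into a list of action labels."""
    delta = desired_size(group, alarm_state) - group.current_size
    if delta > 0:
        return ["deployVM"] * delta
    if delta < 0:
        return ["removeVM"] * -delta
    return []


web = AutoScaleGroup(name="web", min_size=2, max_size=4, current_size=2)
scale_out = plan_actions(web, "ALARM")
scale_in = plan_actions(web, "OK")
```

No polling is involved: the function runs only when StackWatch delivers an alarm, which is the "alarm driven" property listed among the AutoScale improvements.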
  25. Lessons learnt
      • Service oriented architecture is useful for
        – Rapid prototyping / evolution
        – Using your favorite language
        – Using the appropriate frameworks
          • E.g., undesirable to throw a million PutMetricData API requests/minute at CloudStack
      • Riemann (Eclipse License) and OpenTSDB (LGPL) both have incompatible licenses
  26. Lessons learnt
      • But
        – Reinvent API parsing, validation and authentication
        – Reinvent clustering, DB abstractions, etc.
        – Key management problem (admin keys distributed to each service)
        – Multitude of moving parts requires automated deployment and operation
        – Unified UI question
  27. Future
      • Test metric insertion at scale
        – Validate architecture
      • Support complete CloudWatch API
      • Start working on AutoScale service triggered by StackWatch
  28. The case for a separate service
      • You don’t want to code in Java
      • Your requirements aren’t clear and you want to iterate quickly
      • Your audience is different (e.g., DBaaS vs IaaS)
      • CloudStack Public API is perfectly adequate for your service
      • Your service serves a niche need
        – E.g., you want to evaluate hypervisors for patching
      • The operational envelope is sufficiently different
        – E.g., performance, API rate, DB needs, HA
      • License issues
  29. The case for an in-process service
      • Community advantages
        – Many eyes, many users
      • General purpose service with clear-cut requirements
      • Similar operational envelope to CloudStack
      • Public API is insufficient, need access to internal APIs
        – Consider enhancing the public API first
  30. Weaker case for in-process service
      • To use CloudStack clustering logic
      • To use UI plugin infrastructure
      • To use database layer but only for new tables
      • Joins with existing tables (account / host / etc.)
        – uuid column is your friend.
      • To use API framework
        – Perhaps this needs to be an independent component
  31. Niche service examples
      • Hypervisor patching service
        – Use admin API to list hypervisors and work off that list
      • Integrate with your datacenter monitoring / alarm / CMDB
      • Real-time reporting and correlation
      • Spot pricing
  32. References
      • OpenTSDB: http://opentsdb.net/
      • Riemann: http://riemann.io/index.html
      • Micro Services: http://martinfowler.com/articles/microservices.html
      • Fallacies of Distributed Computing: http://en.wikipedia.org/wiki/Fallacies_of_Distributed_Computing
