SERENE 2014 School on Engineering Resilient Cyber Physical Systems
Talk: Measurement-Driven Resilience Design of Cloud-Based Cyber-Physical Systems, by Imre Kocsis
SERENE 2014 School: Measurement-Driven Resilience Design of Cloud-Based Cyber-Physical Systems
1. Departmentof Measurementand InformationSystems
Budapest University of Technologyand Economics, Hungary
Measurement-Driven Resilience Design of Cloud-Based Cyber-Physical Systems
Imre Kocsis
ikocsis@mit.bme.hu
SERENE’14 AutumnSchool
2014.10.14.
3. Cyber-PhysicalSystems (CPSs)
3
Ubiquitousembeddedand networkedsystems thatcan monitor and control the physical world witha high level of intelligenceand dependabilityNetworkedembeddedsystemseverywhereClouds, „infusable” analytics, Big Data
7. Cyber-PhysicalSystems
Differentflavors
oNSF, EU, academia, industry…
Still: itis here
oFromsmartcities& IoTtoself- drivingcars
oScalable, reconfigurablebackendis a must
7
Health CareTransportationEnergy
8. „Classical” caseforcloudcomputing: a brainforaCPS
Video
surveillance
Citizen
devices
Env. sensors
…
Trafficcontrol
Situationalawareness
Deep analyticsNormaldayDisaster
See: Naphadeet. al(IBM), „SmarterCitiesand TheirInnovationChallanges”, Computer, 2011Elastic, reconfigurablecomputing Reconfiguration
17. Gartner, 2013
„For larger businesseswith existing internal data centers, well-managed virtualized infrastructure and efficient IT operations teams, IaaSfor steady-state workloads is often no less expensive, and may be more expensive, than an internal private cloud.”
„I needitnow, and needitfast…”?
18. Parallellizableloads
More and more embarrassinglyparallel, „scale- out” applicationcategoriesexist
NYT TimesMachine: publicdomainarchive
oConversiontoweb-friendlyformat: ApacheHadoop, a fewhundredVMs, 36 hours
Inthecloud: coststhesameaswithoneVM
Practically: „speedupforfree”
21. 1.) Big Data atRest
Distributedstorage
„Computationtodata”
„Atrest Big Data”
oNo update
oNo sampling
„Not true, but a very, very good lie!”
(T.Pratchett, Nightwatch)
49. Short transient faults –long recovery
8 sec platform overload
30 sec service outage
120 sec SLA violation
As if you unplug your desktop for a second...
50. Deterministic (?!) run-time in the public cloud...
Variance tolerable by overcapacity
Performance outage intolerable by overcapacity
56. Benchmarking (a pragmatictakeon)
(De-facto) standard applications
withwelldefinedexecutionmetrics
thatmayexercisespecificsubsystems
tocompareIT systemsviasaidmetrics.
Popularbenchmarks: e.g. PhoronixTest Suite
Benchmarking asa Service: cloudharmony.com
59. A performance featuremodel
+ exp. behavior, homogeneity, stability
Li, Z., OBrien, L., Cai, R., & Zhang, H. (2012). Towards a Taxonomy of Performance Evaluation of Commercial Cloud Services.
In 2012 IEEE Fifth International Conference on Cloud Computing (pp. 344–351). IEEE. doi:10.1109/CLOUD.2012.74
60. ModelingIaaSperformanceexperiments
Li, Z., OBrien, L., Cai, R., & Zhang, H. (2012). Towards a Taxonomy of Performance Evaluation of Commercial Cloud Services.
In 2012 IEEE Fifth International Conference on Cloud Computing (pp. 344–351). IEEE. doi:10.1109/CLOUD.2012.74
61. „Cloudmetrology” and itsapplication
Full stack instrumentationFull adaptive data acquisitionFine-grained storageExploratory Data AnalysisConfirmatory Data Analysis Mystery shoppers and routine excercisesApplication sensitivity model(Platform) fault modelPerformance/capacity modelStructural defensesDynamic defenses MONITORING BENCHMARKING
62. Example:characterizingVDI „CPU ReadyTime”
„Ready”: VM readytorun, butnotscheduled
oVDI: „stutter”
Rareevents
oSampling
Needsfinegranularity!
+ atleasta fewmonths
Very„wide” data
Result: ~QoEcapacity+ load
Big Data tooling
64. Workflow? (As of now)
Classical
tools
Slow EDA
On Big Data
Interactive EDA
On samples
statistics
on samples
Big Data
statistics
Hadoop,
Storm,
Cassandra,
…