We’ve examined how we can rebuild infrastructure from scratch, but now let’s think outside the box, and inside the clouds. Before the zombie apocalypse began, many organizations were beginning to leverage public cloud infrastructures for a number of reasons.
Season III, Challenge II
Project: Whatever clever guys are doing on Earth with cloud datacenters, do it on Mars and earn dough
Focus Area: The Cloudy Thing
Created by: Lubomir Zvolensky (lubomir@zvolensky.sk)
Contents
1. Executive summary
2. Cloud vendors
2.1 Brief details of existing major cloud infrastructures
2.2 Choosing the best cloud solution
2.3 There’s the problem with Microsoft Azure
2.4 VMware vCloud Air
3. Applications
3.1 Application requirements
3.2 Application infrastructure
3.2.A Application A : Life Supporting Critical Systems
3.2.B Application B : time tracking web plus reservation system for resources
3.2.C Application C : email communication server
3.2.D Application D : internet cache (proxy)
3.2.E Application E : social networking stuff, collaboration platform
References
1. Executive summary
Mankind, whatever is left of it, found an investor with deep pockets full of gold stripes.
Hooray. As he doesn’t understand technology at all, we need to prepare a project that will convince
him to spend a buck or two to potentially replicate there what established guys are doing down here
with clouds and datacenters. The idea is to have hybrid or public cloud infrastructure available on
Mars hosting applications, providing the benefits expected from such a solution.
When checking out online calculators for several cloud offerings, one might easily get the impression
they are so f...reaking expen$$$$ive that it is hard to believe they actually have customers. Got
numerous calls from Colombian ganja guys; they noticed there is something more milky than what
they do now, additionally requiring no private army “to protect the goods" [we’ve seen customers
charged £4,015 per MONTH for 120GB RAM and 30GHz CPU, one can buy AND OWN a server for
that; not talking about £795 for 50Mbps connectivity per month, you are kidding, right? And what
about £665 production support PER MONTH, when you can have the same for much less for a year
when you buy a box license? ref.6, and they say some offers are 83% cheaper than competition,
ref.7]
2. Cloud vendors
It is not possible to turn this pamphlet into an “Extremely detailed cloud comparison PhD. thesis”.
Let’s try to keep it short and concise, because in a recent Gartner study, 205 criteria were
evaluated across compute, storage, networking, security/access, service offerings, support levels,
management and price/billing categories.
Numerous cloud providers and vendors compete on the market with their offerings, for example, in
alphabetical order: Akamai, Amazon, AT&T, CA, Citrix, Cloudera, DataProcessing, Dell, Fujitsu,
Google, Hewlett-Packard, IBM, Microsoft, Oracle, Rackspace, RedHat, SoftLayer, T-Systems,
VMware, and that is just a short list without an overview of providers in Asia, as that market is
totally unknown to me.
2.1 Brief details of existing major cloud infrastructures
It is next to impossible to create a 100% detailed and thorough overview of functionality, features,
advantages, shortcomings and pricing models of the cloud solutions mentioned above. Out of the list,
the biggest public/hybrid offerings are Amazon Elastic Compute Cloud (EC2), which is the central part of
the Amazon Web Services platform, Google Cloud Platform, Microsoft Azure and VMware’s vCloud Air.
These are the most common choices for evaluation today.
All these vendors have some shortcomings. Microsoft Azure doesn’t support FreeBSD and RedHat
operating systems at all (ref.1 + ref.2); some others don’t support Microsoft Failover Clustering, or
Microsoft clusters generally, due to deficiencies in storage infrastructure and virtualization
platform (references not available; quite often you run into problems when talking to a sales rep only).
Some infrastructures are not certified to run widespread critical business applications, for example
SAP HANA or Oracle stuff. Several do not integrate well or easily with customers’ existing private
datacenters, which instantly disqualifies them from a business perspective. The level of automation
also differs greatly between vendors.
2.2 Choosing the best cloud solution
It is a very daunting task to choose “the right” or “the best” hybrid cloud provider, as these terms
mean very different things for different customers. Some of the reasons are:
- sheer amount of options and variants available with each vendor (ref.3)
- extremely problematic to DIRECTLY compare offers from various vendors
- bulks of CPU, RAM, storage and networking resources have to be bought and they are not aligned to
competitors’ offers, once again making “direct” comparison either utterly complicated, or
impossible, depending on the scale of the environment
- technical differences between competitors. Can you tell the EXACT difference between
VMware’s “SSD Accelerated Storage” and Amazon EC2’s “Provisioned IOPS” or “General IOPS”
(no reference at all, I haven’t found it in public materials)? How many IOPS does VMware SSD
Accelerated Storage provide for read operations, Ref.4? How many for write operations,
Ref.4? With vCloud Air, are writes accelerated with SSD Accelerated Storage or are they not?
From the look of it, this seems to be Flash Read Cache ONLY, so how much performance will my
databases have? Did this change with the introduction of ESX 6.0? What are burstable IOPS and
bandwidth in terms of MB/s, and how long is such a burst allowed to last? Will performance
of writes drop after some time due to configuration limits, but also due to technology
background? We all know several SSD models can sustain their performance for longer (and
better) than some others; there is also a huge difference in latency evenness, and you will never
know exactly what storage is used for your project in the cloud.
- What is the cost of your downtime and how much of it will you have? Nobody can tell you
these factors exactly, but see ref.5 http://blog.awe.sm/2012/12/18/aws-the-good-the-bad-and-the-ugly/#~pijAMzVRGudYi8
- Limited amount of resources. Some cloud infrastructures allow only 120GB of memory per VM,
some limit projects to 6TB of “SSD Accelerated Storage”; damn, what if I need 50TB of flash
storage? Eh, can I really have PURE SSD storage at all, or is it always some form of “pony trick”
with the super expensive SSDs? Even if I could, what the heck would be the price of it?
- Performance problems: a lot of existing BIG NAME infrastructures have very surprising and
unexpected performance problems, mainly in terms of storage performance. Storage seems to
be the most limiting, the most troublesome technology today, ESPECIALLY AT BIG SCALE, no
matter how strange that seems (ref.5). Quote: “I/O rates on virtualized hardware will
necessarily suck relative to bare metal, but in our experience EBS has been significantly worse
than local drives on the virtual host”.
- Redundancy and business continuity: the number of datacenters of a particular provider and their
geographical locality might be known, but “weak points”, let’s call them real failure zones, are
invisible, unknown and never explained to the customer, as this is top secret information. In
order to protect workloads effectively, customers must accept additional unnecessary costs,
which vary between vendors and their BC/DR possibilities. It’s extremely complicated to
thoroughly understand cloud options and their shortcomings.
- Internal infrastructure “bindings”: for example, several Amazon AWS services, such as
Elastic Load Balancer (ELB), Relational Database Service (RDS), Elastic Beanstalk and some
others, were tied to EBS storage, so when EBS crashed, these services crashed too.
Even when customers were not using EBS and were paying huge extras for other storage options, they
were affected. There is absolutely no visibility and no information available about these relations
and infrastructure shortcomings.
- Several fuzzy specifications are provided; for example, Amazon’s EC2 used “ECU”, evasive
compute units, to specify CPU speed. According to some references, e.g.
http://aws.typepad.com/aws/2013/11/a-generation-of-ec2-instances-for-compute-intensive-workloads.html,
the Intel E5-2680 v2 CPU has an ECU performance equivalent of 68. I can’t imagine a
better reason to create such terrible “performance measurement units” than making customers’
decisions as complicated as it possibly gets. There are no other words for it. How much CPU
power does my database/OLAP/OLTP need? 1000 ECUs or 10000 ECUs? Will it be faster than a
quad-socket E5-2680 system? What is it equivalent to? A single-socket 100-core CPU? Or 50 sockets
of 2 cores?
- several hidden costs, quite often not visible in official calculators, such as management costs.
There is a huge number of other variations and details, efficiently spoiling comparisons. Rough
estimates/calculations can and should be provided depending on the exact sizing of a particular project
and expected bursts or growth. TCO is all that matters.
2.3 There’s the problem with Microsoft Azure
Because we will be running a mixture of operating systems on the virtualized platform on Mars, RedHat
and FreeBSD included, Microsoft’s Azure cloud is disqualified from the competition at the beginning.
Similarly, VMware platforms are at a very early adoption stage on Azure premises, which doesn’t sound
like the right choice for the life-critical systems we need to run on Mars.
For these reasons, the Microsoft Azure platform can’t be considered for our purposes from a technical
perspective, without considering any other criteria.
2.4 VMware vCloud Air
One of the most investigated and native options for customers already running the VMware platform is, for
sure, VMware’s vCloud Air. Physical datacenters are in the USA, United Kingdom, Germany, Japan and Australia.
More than 5000 applications and 90 operating systems are certified to run on vCloud Air.
vCloud Air lets businesses move workloads between on-premise servers and the cloud products and
services using the same VMware tools that are used in-house, eliminating additional costs,
reconfiguration, or new knowledge, skills and learning curve. The service is comparable to Amazon Web
Services, but also integrates with existing virtualized environments and their management and automation
tools, which protects the investment customers have already made. The major advantage is no rewrites or
recoding when workloads are moved from customers’ internal premises to vCloud Air; ESX v6.0 is
targeted to utilize this functionality as much as possible with long-distance vMotion (up to 100ms
latency) and across-ESX-logical-datacenters migrations.
Network virtualization in the form of the VMware NSX product allows customers to configure firewalls and
networks to mirror on-site networks, including NAT rules and firewall rules, networks and public IPs, to
extend existing Layer 2 or Layer 3 networks from their datacenters to the vCloud Hybrid Service.
As a part of the service, Data Protection is available. While not exactly cheap, it is a self-service backup
offering that gives granular control to the consumer. No workarounds need to be taken in order to
PROPERLY protect workloads of customers; this is an integrated part. For the list of features and
detailed options, consult http://vcloud.vmware.com/
With ESX v6.0, there is no better ecosystem from any other vendor providing tighter integration
between private and public virtualized/cloud infrastructures. Microsoft with Windows 2012 R2 and
Hyper-V offerings doesn’t achieve what VMware easily provides in terms of functionality,
manageability and integration.
Technical details, such as networking speed, number of virtual machines per vApp, maximum
configurations of VMs, maximum number of virtual NICs per VM and maximum disk size are factors
for choosing vCloud Air over AWS:
Parameter                 Google AWS     VMware vCloud Air
Networking speed (max)    1Gbps          10Gbps
Maximum RAM per VM        244GB          1024GB
Maximum CPU per VM        32             64
Number of vNICs (max)     8              10
Maximum disk size         Approx. 44TB   62TB
We fully agree that extremely tight integration with the most recent ESX platform, v6.0, and single original
vendor support are by far the major decision points for going with vCloud Air for customers with
existing local ESX infrastructures. The same importance has been shown by the Gartner study: despite
severe shortcomings of the Microsoft Azure platform, Gartner points out that for 64% of users the biggest
reason for choosing Azure was their existing relationship with Microsoft.
For the reasons discussed in paragraphs 2.3 and 2.4, VMware vCloud Air has been chosen as the platform
to build on.
3. Applications
The following applications will be running in hybrid or public premises:
a) life supporting critical systems: command and control centre for oxygen/water supply
b) time tracking web application for botanists in greenhouses plus reservation system for resources
c) email communication server
d) internet cache: list of favorite web pages mirrored nightly so Marsonauts can read their favorite
content
e) social networking stuff, collaboration platform
3.1 Application requirements
Application requirements are:
R01 : performance in terms of user experience and low hardware demands (HW power/cooling, space)
R02 : expandable capacity (online without downtime)
R03 : high availability, crucially important for life-supporting systems, RTO 20 minutes max.
R04 : compatibility: web-based apps must run in any browser, such as Internet Explorer, Chrome,
Firefox, Opera, and on any platform (Windows OS, Apple, Android)
R06 : scalability (number of Mars citizens expected to grow exponentially)
R07 : low bandwidth usage
R08 : resistance to network disruptions and communication outages
R09 : effective storage usage
R10 : if possible, use dockerable applications to save resources (RAM, storage)
3.2 Application infrastructure
3.2.A Application A : Life Supporting Critical Systems
This is by far the most critical technology we will ever run on Mars, as it is the command and control centre
for oxygen/water supply. This application can only tolerate a 20-minute recovery time objective; after that,
building objects on Mars will run out of oxygen and people will die.
High-level application functionality:
- collect data from sensors
- transfer data from each object, each sensor to servers for processing
- store data on servers in protected database
- react on events (less oxygen)
- application-based redundancy, multiple technology controls can be bound to single instance
- master/slaves architecture (think active/multi-passive config)
All three datacenters available on Mars run this application simultaneously in replicated mode,
providing active-active-active redundancy. The application runs concurrently in all three datacenters,
behaving as active for the closest objects and passive (stand-by) for all distant objects. Technology available
at each site can be controlled by ANY application instance, local or remote; this is used for redundancy purposes.
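The active/stand-by split can be sketched as a simple selection rule. This is only an illustration, not the vendor's logic: the datacenter names, the distance table and the rule that the nearest surviving datacenter takes over are assumptions made for the example.

```python
def controlling_datacenter(distances_km, alive):
    """Pick the datacenter that acts as active for one object.

    distances_km: hypothetical mapping of datacenter name -> distance to the object.
    alive: set of datacenter names that are currently reachable.
    Returns the nearest reachable datacenter, or None if all are down.
    """
    candidates = [dc for dc in distances_km if dc in alive]
    if not candidates:
        return None
    return min(candidates, key=lambda dc: distances_km[dc])


# Hypothetical distances from one greenhouse to the three datacenters.
greenhouse = {"dc1": 2, "dc2": 40, "dc3": 75}

print(controlling_datacenter(greenhouse, {"dc1", "dc2", "dc3"}))  # dc1
print(controlling_datacenter(greenhouse, {"dc2", "dc3"}))         # dc2
```

Every other reachable datacenter stays passive for that object until the active one disappears from the `alive` set.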
This application is NOT latency sensitive, as it does not matter much whether data about oxygen
composition in each object is delivered in 2ms or 89ms. At the same time, bandwidth required by this
application is extremely low, too, because control and status messages are only 240 bytes large and they
are sent only every 30 seconds.
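To put a number on "extremely low", the per-sensor bandwidth can be worked out from the two figures given in the text. The sketch counts payload only; protocol overhead is ignored, and the 1,000,000 bit/s link used in the usage lines is an assumed reading of the "1Mbit guaranteed" figure requested for each Oxygen VM.

```python
MESSAGE_BYTES = 240   # size of one control/status message (from the text)
INTERVAL_S = 30       # one message per sensor every 30 seconds (from the text)


def sensor_bandwidth_bps(message_bytes=MESSAGE_BYTES, interval_s=INTERVAL_S):
    """Average payload bandwidth of one sensor in bits per second."""
    return message_bytes * 8 / interval_s


def sensors_per_link(link_bps, per_sensor_bps=None):
    """How many sensors fit into a link, counting payload only (no protocol overhead)."""
    if per_sensor_bps is None:
        per_sensor_bps = sensor_bandwidth_bps()
    return int(link_bps // per_sensor_bps)


print(sensor_bandwidth_bps())       # 64.0
print(sensors_per_link(1_000_000))  # 15625
```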
Local sensors are built with cache memory: if transmission fails, up to 34 results are cached, because
local cache memory is 8192 bytes and each message takes 240 bytes. When the connection is established again,
all cache content is transferred. This allows for a 1020-second outage = 17 minutes, because messages are
sent in 30-second intervals and 34 of them can be cached (34 x 30 = 1020). Each message contains an
incremental identifier, which serves as an “order arbiter” in case of a communication outage; think of
something like the TCP/IP packet ordering mechanism.
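The cache arithmetic and the role of the incremental identifier can be checked with a few lines. This is a minimal sketch: the constants come from the text, while the message layout (a dict with an "id" key) is a hypothetical stand-in for the real message format.

```python
CACHE_BYTES = 8192    # local cache memory of a sensor (from the text)
MESSAGE_BYTES = 240   # size of one message (from the text)
INTERVAL_S = 30       # seconds between messages (from the text)


def cache_slots(cache_bytes=CACHE_BYTES, message_bytes=MESSAGE_BYTES):
    """Number of whole messages that fit into the sensor cache."""
    return cache_bytes // message_bytes


def max_outage_s(slots=None, interval_s=INTERVAL_S):
    """Longest outage that can be bridged without losing a message."""
    if slots is None:
        slots = cache_slots()
    return slots * interval_s


def in_send_order(messages):
    """Restore the original order of messages using their incremental identifier."""
    return sorted(messages, key=lambda m: m["id"])


print(cache_slots())        # 34
print(max_outage_s())       # 1020
print(max_outage_s() / 60)  # 17.0
print([m["id"] for m in in_send_order([{"id": 7}, {"id": 5}, {"id": 6}])])  # [5, 6, 7]
```

Note that 17 minutes of buffering is below the 20-minute RTO, so an outage that fits into the RTO can still overflow the cache.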
Confirmation of message reception is sent by servers back to the technology in each object; when
data are not received within five seconds of the expected arrival (30-second timing), caching of data and
retransmission occurs.
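The confirm-or-cache behaviour on the sensor side can be sketched as follows. This is an illustration under assumptions, not the vendor's firmware: the `send` callable (returning True when a confirmation arrives within the timeout), the dict message layout and the choice to drop the oldest entry once all 34 cache slots are full are all hypothetical.

```python
from collections import deque

ACK_TIMEOUT_S = 5   # confirmation window (from the text)
CACHE_SLOTS = 34    # messages that fit into the 8192-byte cache (from the text)


class Sensor:
    def __init__(self, send):
        # send(message, timeout_s) -> bool is a hypothetical transport hook:
        # True means the server confirmed reception within timeout_s.
        self.send = send
        self.next_id = 0
        # Assumption: when the cache is full, the oldest message is dropped.
        self.cache = deque(maxlen=CACHE_SLOTS)

    def report(self, payload):
        """Queue one measurement and try to deliver everything that is pending."""
        self.cache.append({"id": self.next_id, "payload": payload})
        self.next_id += 1
        self.flush()

    def flush(self):
        """Send cached messages oldest first; stop at the first missing confirmation."""
        while self.cache:
            if not self.send(self.cache[0], ACK_TIMEOUT_S):
                break  # keep the message cached and retry on the next report
            self.cache.popleft()


# Usage with a fake link that can be switched on and off.
link_up = {"value": False}
delivered = []

def fake_send(message, timeout_s):
    if link_up["value"]:
        delivered.append(message["id"])
    return link_up["value"]

sensor = Sensor(fake_send)
sensor.report("O2 20.9%")
sensor.report("O2 20.8%")
link_up["value"] = True
sensor.report("O2 20.8%")
print(delivered)  # [0, 1, 2]
```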
Operating system chosen for application: RedHat version 6.6, recommended by Honeeyvell vendor
Clustered database: MySQL Cluster edition, requires no operating system clustering, NO SHARED
STORAGE, can be geographically distributed, can be backed up ONLINE
Clustered web front-end: NGINX server is used to provide data to operators and administrators. If it
fails in one datacenter, the remaining two can both control technology at each site and provide status
messages.
Clustered data collection and distribution: after a message is received by any single application instance
(remember there are THREE running, each in a separate datacenter, each with a separate non-shared
database), the applications communicate with each other to verify that all remaining partners received the same
message.
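The cross-check between instances can be illustrated by comparing the sets of message identifiers each instance holds. This is a sketch only: how the instances actually exchange this information is not described in the text, and the instance names and the dict-of-sets structure are assumptions.

```python
def missing_messages(received):
    """Find which message identifiers each instance still lacks.

    received: hypothetical mapping of instance name -> set of message ids it holds.
    Returns a mapping of instance name -> sorted list of ids it is missing;
    instances that hold everything are left out.
    """
    all_ids = set().union(*received.values())
    gaps = {}
    for name, ids in received.items():
        lacking = all_ids - ids
        if lacking:
            gaps[name] = sorted(lacking)
    return gaps


state = {
    "dc1": {101, 102, 103},
    "dc2": {101, 103},
    "dc3": {101, 102, 103},
}
print(missing_messages(state))  # {'dc2': [102]}
```

An instance with a gap would then request the missing messages from a partner that holds them.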
Configuration of 1 vCPU, 8GB RAM and 50GB storage is requested for each Oxygen VM:
CPU                               1 vCPU
RAM                               8GB
HDD / tier / IOPS                 50GB / fastest / 1000 IOPS guaranteed
Networking bandwidth / priority   1Mbit guaranteed, no max, highest priority
A total of SIX Oxygen VMs will be running in the datacenters:
Datacenter 1   Oxygen VM1
Datacenter 1   Oxygen VM2
Datacenter 2   Oxygen VM3
Datacenter 2   Oxygen VM4
Datacenter 3   Oxygen VM5
Datacenter 3   Oxygen VM6
Easy backup and restore of configuration is mandatory. Applications can be restarted independently
in each datacenter; each datacenter also runs TWO separate copies (synchronized on application and
database levels).
Remember we have perfect infrastructure running on Mars, based on my ~12% complete design from
Challenge 1 (with missing vCenter, network setup, redundancy, business continuity/disaster recovery and
some other “unnecessary minor things”), so there is no problem with storage performance thanks to the
all-flash VSAN configuration, network throughput (multiple 10Gbit Ethernet interfaces, some 40GbE) or
resources (huge 3TB RAM physical servers).
In terms of storage IO, a maximum of 50 IOPS is requested every 30 seconds for the application, as each
transaction message can fit into a single IO. This means very low demands despite extreme criticality.
3.2.B Application B : time tracking web plus reservation system for resources
For the time tracking and resource reservation system, an internal application has been created by one of
the Marsonauts. It provides phenomenal time tracking possibilities and is easy to use; the reservation system
is also built on the same principles, with definable resources.
Each category has several sub-categories and/or projects as defined by approved users.
Lean, small, blatantly fast; new items can be defined by users based on their privileges (for example,
when a new dock is built for rockets, approved users can enter it into this application so everybody can
reserve it for their vehicles). Another example is the gas station: it takes long to fill the tanks of rockets,
so proper planning and queueing is necessary !!, and yeah, we got a new cinema, EEMax style (because we
only have 12 seats, bring your laptop, Red Camera, android tablet or anything with a vga, dvi, hdmi or
displayport connection and play it BIG BIG BIG!).
One day, we will be bribed errrr convinced we need a 50 mil $ S.A.P. to track attendance for citizens and
plan resources on Mars, but not now.
Configuration of 1 vCPU, 4GB RAM and 10GB storage is requested for this VM:
CPU                               1 vCPU
RAM                               4GB
HDD / tier / IOPS                 10GB / slowest / no guarantee necessary
Networking bandwidth / priority   128kBit guaranteed, 1Mbit max, lowest priority
This thing is any linux distribution plus NGINX on top. A small directory with scripted software on top. As
easy as that. No special requirements are necessary in terms of resources or performance; this is
extremely, extremely lean software. We will never be able to overload the web server, even if there are
a million people on Mars: let’s be realistic… http://g-wan.ch/benchmark/babel.html ->
http://www.statisticbrain.com/google-searches/
No java, no flash, runs in any browser we can think of, even on Apple iEverything. Yikes.
3.2.C Application C : email communication server
3.2.D Application D : internet cache (proxy)
3.2.E Application E : social networking stuff, collaboration platform
Marsonauts like to read content available down on Earth on the internet; they have their preferred web
pages they would like to follow even while being so terribly far away.
Moreover, Challenge 1 specifies that some form of social collaboration is preferred for Mars citizens. Of
course, everybody needs email today and that’s not going to disappear anytime soon.
In order to save resources, which are very scarce on Mars, we decided to integrate three
“applications” into one virtual machine running Citadel software.
The following is the feature list of the Citadel platform:
- Email, calendaring, address books, bulletin boards, instant messaging
- Wiki and blog engines built in. Citadel is a collaboration server and a content management system
- Web browser, telnet/SSH, local client software accessible
- Standards-compliant e-mail built in: IMAP, POP3, ESMTP
- Group calendaring and scheduling (WebDAV, GroupDAV, and Kolab-1 compatible)
- Built-in listserv (mailing list server)
- Built-in RSS Feed Aggregation
- Support for push e-mail and mobile devices
- Database-driven, single-instance message store
- Authenticated SMTP for remote email submission
- Multiple domain support
- Built-in integration with perimeter email filtering technologies such as Realtime Blackhole Lists (RBL's),
SpamAssassin, and ClamAV antivirus
- Server-to-server replication. Users in any number of domains can be spread out across any number of
Citadel servers, allowing you to put data where you need it, and enabling infinite horizontal scalability.
- Web-based access to email, calendars, and everything else through a powerful AJAX-style front end
- Very strong support for “public folders” and message forums.
- Built-in instant messenger service
- SSL/TLS encryption for all protocols
Configuration of 1 vCPU, 8GB RAM and 50GB storage is requested for the Citadel VM:
CPU                               1 vCPU
RAM                               8GB
HDD / tier / IOPS                 50GB / slowest / no guarantee necessary
Networking bandwidth / priority   128kBit guaranteed, 16Mbit max, lowest priority
Due to the low number of citizens on Mars and their low usage of “Internet”, one virtual CPU and 8GB RAM
are more than enough for Citadel software. Storage capacity depends only on the amount of data to
store; we chose 14GB for the proxy cache in the beginning, as no more than 0.5GB can be transferred during
a night [!!] and no more than 28 days worth of “caching” is considered necessary, which equals 14GB of
consumption (0.5GB x 28). The remaining capacity will be consumed by the local operating system (RedHat) and
database. There is no need to separate the database from the front-end web services, because one can’t exist
without the other and there is no significant additional risk in putting all eggs into one basket. This isn’t a
critical application at all.
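The proxy cache sizing is simple enough to verify in a few lines. The figures are the ones quoted in the text and in the Citadel VM table; the function names are illustrative, and compression is deliberately ignored here.

```python
def proxy_cache_gb(nightly_gb=0.5, retention_days=28):
    """Uncompressed cache size needed to keep every nightly transfer for the retention period."""
    return nightly_gb * retention_days


def remaining_gb(disk_gb=50, cache_gb=None):
    """Disk space left for the operating system and database on the Citadel VM."""
    if cache_gb is None:
        cache_gb = proxy_cache_gb()
    return disk_gb - cache_gb


print(proxy_cache_gb())  # 14.0
print(remaining_gb())    # 36.0
```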
Please note web content and social media TYPICALLY compress extremely well, so a compressed
filesystem will be used on RedHat, which will allow holding much more than the projected 28 days of data
cached from the internet. If disk space gets to critically low levels, the oldest data (three days) will simply be
thrown away automatically.
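The clean-up rule can be sketched as follows. This is an illustration under assumptions: the cache is modelled as a mapping from download date to bytes used, and the critical threshold is a hypothetical parameter, since the text does not say what "critically low" means.

```python
from datetime import date, timedelta


def evict_oldest_days(cache_by_day, days_to_drop=3):
    """Remove the oldest days from the cache in place and return the bytes freed."""
    freed = 0
    for day in sorted(cache_by_day)[:days_to_drop]:
        freed += cache_by_day.pop(day)
    return freed


def enforce_free_space(cache_by_day, free_bytes, critical_bytes):
    """Drop the oldest three days at a time until free space is above the threshold."""
    while free_bytes < critical_bytes and cache_by_day:
        free_bytes += evict_oldest_days(cache_by_day)
    return free_bytes


# Five nights of cached content, 100 bytes each (toy numbers).
first_night = date(2015, 1, 1)
cache = {first_night + timedelta(days=i): 100 for i in range(5)}

print(enforce_free_space(cache, free_bytes=50, critical_bytes=200))  # 350
print(len(cache))                                                    # 2
```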
All resources can be expanded online if necessary. In terms of storage IOPS, 500 is the expected maximum
in normal usage, but because we have a phenomenal all-flash VSAN in place, we don’t need to limit this at
the VM level.
Network traffic demands are pretty low, with no huge bursts expected at all. A 16Mbit maximum should
be more than adequate for the “intra-LAN” Mars network; this is absolutely enough for displaying web
pages, social activities, forums, wiki, blogs and similar activities. Lowest priority was chosen due to the
non-critical nature of these applications.
Because we initially have enough compute resources and this is a pretty lean VM, disaster recovery
relies on simply restarting this VM in the surviving datacenters in case of catastrophe. No special
protection is necessary.
Over time, when the number of citizens on Mars grows and the configuration of this VM is no longer
sufficient, it can easily be expanded by adding CPU, RAM or storage space.