Wicsa2011 cloud tutorial

Published in: Technology, Business
  • Reduce cost, reduce complexity
  • Need to cut out more words on this slide – just tell the story!!
      • Still need to do good EA, planning, monitoring, governance and management
      • Risk management approach to security, privacy
      • Plan for integration with existing assets
      • Come pick our brains at UNSW/NICTA
  • NICTA will focus on six research groups of significant scale and focus in which we have a genuine opportunity to be ranked in the top five in an area in the world. Research groups have been selected on the basis of current NICTA strengths in research and research leadership.
      • Software Systems: aims to develop game-changing techniques, frameworks and methodologies for the design of integrated, secure, reliable, performant and adaptive software architectures. Software systems have pervasive real-world applications, ranging from enterprise ecosystems to embedded systems.
      • Networks: the networks research group will develop new theories, models and methods to support future networked applications and services. Networked systems will address issues such as radio spectrum scarcity, wired bandwidth abundance, context and content, improvements to computing, energy constraints, and data privacy.
      • Machine Learning: the science of interpreting and understanding data. The core problems are jointly statistical and computational. NICTA research will aim to develop machine learning as an engineering discipline, drawing on a spectrum of work from conceptual theory through algorithmics. Machine learning applications will aim to identify commonalities between problems, developing implementation frameworks that genuinely encourage reuse across different domains.
      • Computer Vision: aims to understand the world through images and video. NICTA will focus on areas including geometry, detection and recognition, optimisation, segmentation, scene understanding, shape/illumination and reflectance, biologically inspired approaches and the interfaces between them, drawing from approaches including statistical methods, learning and optimisation. Computer vision is a key enabling research discipline for many applications, including visual surveillance, the bionic eye and mapping of the environment.
      • Control and Signal Processing: comprises a substantial group of sub-disciplines dealing with optimisation, estimation, detection, identification, behaviour modification, feedback control and stability of a very large class of dynamical systems. It is likely that NICTA will focus on problems of control and signal processing in large-scale decentralised systems, which are core to many new ICT systems. Techniques from information theory, Bayesian networks, large-scale optimisation, etc. are employed to address this important class of problem.
      • Optimisation: the "science of better". Research will focus on the interface between constraint programming, operations research, satisfiability, search, automated reasoning, machine learning, simulation and game theory, exploring methods that combine algorithms from these different areas. Optimisation applications will address multi-faceted questions such as how best to schedule in a network, whether there is a better folding for a protein, or how best to operate a supply chain.
  • Also comment on Public vs Private, and the need to prepare for Hybrid
      • Rapid Elasticity: Elasticity is defined as the ability to scale resources both up and down as needed. To the consumer, the cloud appears to be infinite, and the consumer can purchase as much or as little computing power as they need. This is one of the essential characteristics of cloud computing in the NIST definition.
      • Measured Service: In a measured service, aspects of the cloud service are controlled and monitored by the cloud provider. This is crucial for billing, access control, resource optimization, capacity planning and other tasks.
      • On-Demand Self-Service: The on-demand and self-service aspects of cloud computing mean that a consumer can use cloud services as needed without any human interaction with the cloud provider.
      • Ubiquitous Network Access: Ubiquitous network access means that the cloud provider’s capabilities are available over the network and can be accessed through standard mechanisms by both thick and thin clients.
      • Resource Pooling: Resource pooling allows a cloud provider to serve its consumers via a multi-tenant model. Physical and virtual resources are assigned and reassigned according to consumer demand. There is a sense of location independence in that the customer generally has no control or knowledge over the exact location of the provided resources but may be able to specify location at a higher level of abstraction (e.g., country, state, or datacenter).
  • Cloud Computing Interoperability Forum (CCIF)
      • Amazon/Google/MS missing from the CCIF sponsor list
      • The Cloud Computing Interoperability Forum (CCIF) was formed in order to enable a global cloud computing ecosystem whereby organizations are able to seamlessly work together for the purpose of wider industry adoption of cloud computing technology and related services. A key focus will be placed on the creation of a common agreed-upon framework / ontology that enables two or more cloud platforms to exchange information in a unified manner.
      • Mission: CCIF is an open, vendor-neutral, not-for-profit community of technology advocates and consumers dedicated to driving the rapid adoption of global cloud computing services. CCIF shall accomplish this through the use of open forums (physical and virtual) focused on building community consensus, exploring emerging trends, and advocating best practices / reference architectures for the purposes of standardized cloud computing.
  • Service Bus: The Microsoft .NET Service Bus makes it easy to connect applications together over the Internet. Services that register on the Bus can easily be discovered and accessed, across any network topology. The Service Bus provides the familiar Enterprise Service Bus application pattern, while helping to solve some of the hard issues that arise when implementing this pattern across network, security, and organizational boundaries, at Internet scale.
  • Quotas are resource constraints configured by the vendors. You can probably contact the vendor for more resources beyond the quotas, but communication takes time and brings opportunity cost. Limitations are mostly functional restrictions; you probably can’t get past them by making a phone call.
      Amazon Web Services
      • Manually set up all applications: large maintenance and operation cost, including upgrading systems, installing applications and configuration.
      • Maximum 5 GB per file in S3: e.g. terabyte-scale files cannot be put into S3 directly. Extra effort is needed: the file has to be divided into small chunks (5 GB each) before storing. The same effort is required during retrieval: all retrieved chunks have to be merged manually.
      • Maximum 5 seconds query execution time in SimpleDB: no long-running queries in SimpleDB. If thousands of items are queried in SimpleDB, the query may fail due to timeout. Developers need to estimate the query time beforehand, separate a large query into small queries, and combine/merge the query results on the client side.
      • 20 On-Demand or Reserved Instances and 100 Spot Instances by default: you can have more instances by contacting Amazon, but that will definitely increase your opportunity cost if you need to scale out immediately.
      • 1 GB free outgoing bandwidth per month in SimpleDB, S3 and EC2: you need to pay for extra usage.
      Microsoft Windows Azure
      • 2 deployments per service (production and staging): the two deployments are used for deploying the production version and the staging version separately, targeting end users and test users respectively. It is not efficient enough for running multiple test versions at the same time.
      • .NET, PHP or Java programming language: limited to .NET, PHP and Java developers.
      • Up to 50 GB for SQL Azure: the maximum size of a single SQL Azure database is 50 GB. If your data is more than 50 GB, you probably have to consider data partitioning to scale out your database to multiple databases.
      • 20 concurrent small compute instances or equivalent per month: 1 clock hour on an extra large instance equates to 8 small instance hours. Therefore, you can only have …
      • 10 TB of total data transfers per month: you can probably get more if you send a request to Microsoft.
      • Up to 750 GB SQL Azure databases per month: for SQL Azure, it originally states 150 Web Edition databases (not sure whether it is or/and, see http://www.microsoft.com/windowsazure/offers/popup/popup.aspx?lang=en&locale=en-us&offer=MS-AZR-0013P) 15 Business Edition databases. Since the maximum size of each Web Edition database is 5 GB and of each Business Edition database is 50 GB, simple math (150*5 or 15*50) gives 750 GB.
      Google App Engine
      • Java or Python programming language: PHP developers can do nothing on Google App Engine.
      • Maximum 30 seconds for each request: each request has to be answered within 30 seconds, otherwise exceptions are returned instead of results. Computationally heavy tasks are therefore not applicable in GAE. The alternative is still splitting the task. GAE has made an early experimental release of MapReduce to support this, but only the Mapper is implemented at this stage.
      • 1 MB for each Datastore entity: only 1 MB for each data item. You will probably find it hard to store a photo in GAE. Also, due to the 30-second limitation, your query has to be processed within 30 seconds.
      • Maximum 2 GB per file in Blobstore: the same reason as AWS. In addition, the maximum size of Blobstore data that can be read by the app with one API call is only 1 MB. So even if you store 2 GB in Blobstore, it is still difficult to manipulate the data in GAE.
      • 10 web applications per user: since the case of bush fire in 2009. I think all the following parameters can be adjusted by Google.
      • 43,200,000 requests per day
      • 1 GB (1,046 GB maximum if billing enabled) incoming/outgoing bandwidth per day
      • 6.5 CPU-hours (1,729 CPU-hours maximum if billing enabled) per day
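The split-before-store and merge-after-retrieve workaround described for the 5 GB S3 per-file limit can be sketched roughly as follows. This is only an illustration on local files: the function names `split_file` and `merge_files` and the `part-NNNNN` naming are assumptions, not part of any AWS tooling, and the demo uses a 10-byte chunk size instead of 5 GB.

```python
import os
import tempfile


def split_file(path, chunk_size, out_dir):
    """Split `path` into numbered part files of at most `chunk_size` bytes each."""
    parts = []
    with open(path, "rb") as src:
        index = 0
        while True:
            # A real 5 GB chunk should be streamed in smaller buffers
            # rather than read into memory in one call.
            data = src.read(chunk_size)
            if not data:
                break
            part_path = os.path.join(out_dir, f"part-{index:05d}")
            with open(part_path, "wb") as dst:
                dst.write(data)
            parts.append(part_path)
            index += 1
    return parts


def merge_files(parts, out_path):
    """Concatenate the part files, in order, back into a single file."""
    with open(out_path, "wb") as dst:
        for part in parts:
            with open(part, "rb") as src:
                dst.write(src.read())


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        original = os.path.join(tmp, "big.bin")
        with open(original, "wb") as f:
            f.write(os.urandom(25))
        parts_dir = os.path.join(tmp, "parts")
        os.mkdir(parts_dir)
        parts = split_file(original, chunk_size=10, out_dir=parts_dir)
        merge_files(parts, os.path.join(tmp, "merged.bin"))
        print(len(parts), "parts written and merged")
```

Each part would be uploaded and later downloaded as its own object; the zero-padded index keeps the parts in the right order for the merge.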
  • Reference – Saaland paper at VLDB
  • Figure 1 shows a typical setup of the Amazon VPC. This VPC setup allows a company’s infrastructure to be connected with the Amazon EC2 infrastructure via a VPN connection. It requires setting up two VPN gateways (one on each of the local and remote sides). A secure VPN connection is established between the two gateways via the IPsec protocol. EC2 instances on the remote side (Amazon side) are operated within subnets behind the remote VPN gateway. That is, these EC2 instances are isolated from the rest of the EC2 network and only these instances can access the hosts on the local side. Similarly, hosts can be added on the local side behind the customer gateway (local VPN gateway) and only these hosts have access to the remote EC2 instances. A typical VPC connection meets the following security requirements:
      • Utilise the AES 128-bit encryption function
      • Utilise the SHA-1 hashing function
  • An example business report query took 16 min 30 sec
      • It takes less than 1 min in the existing on-premise dev environment
      • Data transfer over SSIS takes 14 min (only 42 KB/sec of throughput)
      • No bottleneck observed on CPU (3-10%), memory (6 GB free), disk (low activity) or network (0.03% usage of 1 Gbps)
      • SSIS protocol?
    Done. It works! I did the following:
      1. Start an EC2 micro instance outside the VPC and attach an EBS volume to it
      2. Copy the file from S3 to the EBS volume attached to the micro instance
      3. Detach the EBS volume from the micro instance
      4. Attach the EBS volume to an instance inside the VPC
    Note that we did NOT route through NICTA here at all. The file I used for this experiment is ~700 MB in size. Step 2 took 130 s (i.e. 5.39 MB/s).
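The throughput figures in this note can be checked with a few lines of arithmetic. The sketch below uses only the numbers quoted above; the helper names are illustrative, the file size is the approximate "~700 MB", and 1 MB is taken as 1024 KB (an assumption, since the note does not say which convention it uses).

```python
def throughput_mb_per_s(size_mb, seconds):
    """Average throughput in MB/s for a transfer of `size_mb` megabytes."""
    return size_mb / seconds


def transferred_mb(rate_kb_per_s, seconds):
    """Data volume in MB moved at `rate_kb_per_s` for `seconds` (1 MB = 1024 KB)."""
    return rate_kb_per_s * seconds / 1024


if __name__ == "__main__":
    s3_to_ebs = throughput_mb_per_s(700, 130)      # S3 -> EBS copy in step 2
    ssis_volume = transferred_mb(42, 14 * 60)      # SSIS transfer over the VPN
    speedup = s3_to_ebs * 1024 / 42                # ratio of the two rates
    print(f"S3 to EBS: {s3_to_ebs:.2f} MB/s")
    print(f"SSIS moved about {ssis_volume:.1f} MB in 14 min")
    print(f"S3 to EBS is roughly {speedup:.0f}x faster than the SSIS path")
```

With a nominal 700 MB the first figure comes out slightly below the 5.39 MB/s in the note, which is consistent with the file being only approximately 700 MB.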
  • References:
      • http://aws.amazon.com/ec2/
      • http://code.google.com/appengine/whyappengine.html#scale
      • http://www.microsoft.com/windowsazure/appliance/
  • An article (with a link to his paper) by Huan Liu discussing limitations of load balancers and autoscaling:
      • http://huanliu.wordpress.com/tag/auto-scaling/
      • http://codecrafter.wordpress.com/2008/10/03/google-app-engine-scalability-that-doesnt-just-work/
    An example of scaling in Azure:
      • http://code.msdn.microsoft.com/azurescale/Release/ProjectReleases.aspx?ReleaseId=4167
  • The Australian Prudential Regulation Authority (APRA) is the prudential regulator of the Australian financial services industry. It oversees banks, credit unions, building societies, general insurance and reinsurance companies, life insurance, friendly societies, and most members of the superannuation industry. APRA is funded largely by the industries that it supervises. It was established on 1 July 1998. APRA currently supervises institutions holding approximately $3.6 trillion in assets for 22 million Australian depositors, policyholders and superannuation fund members.
    Australia: In Australia, the federal Privacy Act 1988 sets out principles in relation to the collection, use, disclosure, security and access to personal information. The Act applies to the Australian Government and Australian Capital Territory agencies and private sector organisations (except some small businesses). The Office of the Privacy Commissioner is the complaints handler for alleged breaches of the Act. Some Australian States have enacted privacy laws.
    The Australian Law Reform Commission [1] completed an inquiry into the state of Australia's privacy laws in 2008. The report, entitled For Your Information: Australian Privacy Law and Practice [2], recommended that significant changes be made to the Privacy Act, as well as the introduction of a statutory cause of action for breach of privacy [3]. The Australian Government committed in October 2009 to implementing a large number of the recommendations that the Australian Law Reform Commission had made in its report [4].
  • P.36 of the DIAC slides mentions that they have a large amount of data (e.g. 100 TB of documentation) as well as structured data. Large documents can be stored in Azure Blob, and structured data (depending on its type) can be stored in Azure Table.
  • Parallelised frameworks (such as MapReduce) can be used to perform usage analytics (P.27 of the DIAC slides), such as abandonment rate. Azure Table is indexed by time (as well as partition and row) keys, which makes it suitable for time-based queries.
  • Adaptation engine patent pending
  • Seeking collaboration with industry to source ‘use inspiration’ and trial partnership

    1. 1. From imagination to impact
    2. 2. Architecting Cloud Applications
          Dr. Anna Liu, Research Group Leader, Software Systems, National ICT Australia
    3. 3. The Land Down Under
    4. 4. Sydney
    5. 5. About NICTA
          National ICT Australia
          • Federal and state funded research company established in 2002
          • Largest ICT research resource in Australia
          • National impact is an important success metric
          • ~700 staff/students working in 5 labs across major capital cities
          • 7 university partners
          • Providing R&D services, knowledge transfer to Australian (and global) ICT industry
          NICTA technology is in over 1 billion mobile phones
    11. 11. Research Areas at NICTA
          • Networks: Aruna Seneviratne
          • Machine Learning: Bob Williamson
          • Software Systems: Anna Liu, Gernot Heiser
          • Computer Vision: Nick Barnes, Richard Hartley, Peter Corke
          • Optimisation: Mark Wallace, Sylvie Thiebaux, Toby Walsh
          • Control & Signal Processing: Rob Evans
    12. 12. NICTA’s mission: to be an enduring world-class ICT research institute that generates national benefit.
          • Australia’s National Centre of Excellence in ICT Research
          • Research focused on areas of importance to Australia
          • Publicly funded, not for profit
          • Best of breed research teams (400 staff + 300 students)
          • Industry engagement
          • Industry outcomes
          • Enduring solutions
          • ‘Spinout’ companies
          Engagement models include:
          • Contract R&D
          • Consulting services
          • Strategic Partnerships
          • Licensing
    16. 16. Our team’s mission: help enterprises take full advantage as software extends into the cloud!
          • Keywords: cost optimised, high availability, hybrid cloud, onsite/offsite, real-time monitoring, disaster recovery, actionable analytics, business continuity, intelligent management, systems resilience, elastic, dynamic, real time, high performance
          • Our applied R&D capability spans cloud computing, web, SOA, distributed systems, data management, analytics, performance monitoring, DR, automated reasoning, ontologies, AI…
    17. 17. Agenda
          • Introduction to Cloud Computing
          • Characteristics, Deployment and Delivery Models
          • Enterprise Architecture and Migration Framework
          • Usage Scenarios
          • Evaluating Cloud Computing
          • Enterprise context, Business opportunities, risks
          • Technical qualities of platforms
          • Platform Architectural Insights
          • Proof of Concept Experiences
          • Advanced Architecture Issues
          • Future Directions
          • Industry happenings
          • Research Agenda
    18. 18. What is Cloud Computing?
          “Cloud computing is a model for enabling convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction. This cloud model is composed of five essential characteristics, three service models, and four deployment models.”
          (US National Institute of Standards and Technology)
    19. 19. Characterising Cloud Computing
    20. 20. Five Characteristics – NIST Definition
          • On-demand self-service: a consumer can provision computing capabilities without human interaction
          • Broad network access: computing capabilities are available over the network and accessed through standard mechanisms
          • Resource pooling: the provider’s computing resources are pooled to serve multiple consumers, with different resources dynamically assigned according to consumers’ demands
          • Rapid elasticity: computing capabilities can be rapidly and elastically provisioned to quickly scale out, and rapidly released to scale in
          • Measured service: resource usage can be monitored, controlled, and reported, providing transparency for both the provider and the consumer
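To make "scale out" and "scale in" concrete, here is a minimal sketch of the sizing decision an elastic deployment makes. The function name, the per-instance capacity and the instance limits are all assumptions for illustration; no provider's autoscaling API is being described.

```python
import math


def desired_instances(load_rps, capacity_per_instance_rps,
                      min_instances=1, max_instances=20):
    """Return how many instances to run for the current load.

    All parameters are hypothetical: `load_rps` is the observed request rate,
    `capacity_per_instance_rps` is what one instance is assumed to handle.
    """
    needed = math.ceil(load_rps / capacity_per_instance_rps)
    # Clamp to the allowed range: never below the floor, never above the quota.
    return max(min_instances, min(max_instances, needed))


if __name__ == "__main__":
    for load in (0, 250, 5000):
        print(load, "req/s ->", desired_instances(load, 100), "instances")
```

Scaling out happens when the result exceeds the current instance count and scaling in when it drops below it; the upper bound stands in for a provider quota.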
    21. 21. Leading Provider: Amazon EC2
          Let’s see how Amazon EC2, a leading commercial cloud, looks
          I want my cloud!
    22. 22. 1. Grab your credit card and create an account (10 min), then access the console
          2. Select where you want to create your virtual machines (US East, US West, Ireland or Singapore)
          3. Hit this button
    23. 23. 4. Select a machine image
          • Many pre-configured images are available
          • You can register your own machine images as well
          5. Determine the amount of resources to allocate
          • <1.0 GHz CPU + 600 MB RAM: 0.01 USD/hour
          • 1.0 GHz CPU + 1.7 GB RAM: 0.04 USD/hour
          • 3.0 GHz x 8 CPUs + 68 GB RAM: 1.1 USD/hour
          • You can pay Windows/SQL Server license fees on a pay-per-hour basis
          6. Define a set of access control rules
    28. 28. 7. Done! (< 5 minutes in total)
          • You have your virtual machine at ec2-184-74-14-28.us-west-1.compute.amazonaws.com
          I got my virtual machine!
    29. 29. 8. Connect to my virtual machine
          • Just SSH to the address
          • You have root access!!
          You’re in an Amazon Datacenter in CA
          This is my desktop in Sydney
    31. 31. If you like Windows, just launch a Windows virtual machine and remote-desktop to it
          Connected through a VPN connection
          You’re in an Amazon Datacenter in NV
          This is my desktop in Sydney
    32. 32. 9. Terminate or hibernate virtual machines when they are not in use
          • In some systems, we use a script to hibernate virtual machines at 8:00 PM
          • Restart instances in the morning if necessary. It takes just a couple of minutes
          10. Check the bill in real time
          • Hours to run virtual machines
          • Network in/out
          • VPN
          • Disk access
          • # of requests made
          • …
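The hibernate-at-8:00-PM script mentioned in step 9 boils down to a time-window check. The sketch below shows only that decision logic; the 8:00 AM start time, the function names and the automatic morning restart are assumptions (the slide says instances are restarted "if necessary"), and the actual stop/start calls to the provider are deliberately left out.

```python
from datetime import time


def should_be_running(now, start=time(8, 0), stop=time(20, 0)):
    """True if `now` falls inside the working window [start, stop).

    `stop` is the 8:00 PM cut-off from the slide; `start` is an assumed value.
    """
    return start <= now < stop


def planned_action(now, is_running):
    """Decide what a scheduled job should do with one virtual machine."""
    wanted = should_be_running(now)
    if wanted and not is_running:
        return "start"
    if not wanted and is_running:
        return "stop"
    return "none"


if __name__ == "__main__":
    print(planned_action(time(20, 0), is_running=True))   # evening run of the job
    print(planned_action(time(9, 0), is_running=False))   # morning run of the job
```

A job run periodically (for example from cron) would call `planned_action` for each machine and then invoke whatever stop or start operation the provider's tooling offers.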
    38. 38. Three Service Models – NIST definition
          [Diagram: technology exposed to customers by providers, in layers: Software as a Service, Platform as a Service, Infrastructure as a Service, Datacenter Infrastructure]
    39. 39. Three Delivery Models
          • Infrastructure as a Service (IaaS)
              • The consumer has control over operating systems, storage and deployed applications
          • Platform as a Service (PaaS)
              • Consumers can deploy applications created using programming languages and tools supported by the provider (e.g., Java Servlet)
              • The provider shields the complexity of its infrastructure: scale up/down, load balancing, replication, disaster recovery, database management, …
          • Software as a Service (SaaS)
              • Consumers use the provider’s applications
              • The consumer does not manage the underlying cloud infrastructure
    40. 40. Leading Provider: Google App Engine
          PaaS is the hottest area because many players (e.g., VMware) are currently moving into PaaS.
          Let’s see how Google App Engine, a leading commercial PaaS, looks
          I want my PaaS!
    41. 41. 1. Create an account (5 min). GAE offers a large amount of quota for free
          2. Write an application using GAE’s framework
    42. 42. 3. Deploy your application on GAE!
          Scale up/down, load balancing, replication, disaster recovery, database management, … many functions are implemented by GAE’s framework
    43. 43. 4. Check your resource usage (CPU, storage, # of API calls, …)
          Pay only when usage exceeds the free quota
    44. 44. Four Deployment Models – NIST Definition
          • Private Cloud: exclusively own, maintain and use (an organization)
          • Public Cloud: share and just use when needed (the general public)
          • Community Cloud: share and maintain by a group (a group of organizations)
          • Hybrid Cloud
    45. 45. Four Deployment Models
          • Public cloud: generally very secure, but risk of discontinuity and less control
          • Community cloud: enjoy cost savings and relatively easy to retain governance
          • Private cloud: more control but less cost savings compared to public cloud
          • Hybrid cloud: hot area in the next couple of years. Enjoy both high security and cost savings from private and public cloud, respectively
    46. 46. Why Cloud Computing?
          • High Elasticity/Scalability
              • A virtually infinite amount of resources is available on demand
          • Reduce cost and complexity
              • Pay per usage, economies of scale
              • Generally speaking, non-7x24x365 systems with higher resource usage bring large cost savings
              • No in-house IT maintenance
              • No up-front cost for geographically distributed disaster recovery
          • Innovation Possibilities
              • Ease of Use: you can implement your idea with minimum overhead and cost
              • Processing Big Data: cost of 1 machine for 100 hours = cost of 100 machines for 1 hour
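The "1 machine for 100 hours = 100 machines for 1 hour" point follows directly from per-hour pricing. A small sketch, assuming the 0.04 USD/hour rate quoted earlier in the deck, billing per started hour (an assumption), and a workload that parallelises perfectly (also an assumption):

```python
import math


def job_cost(machines, hours_each, hourly_rate):
    """Total cost in USD, assuming each started machine-hour is billed in full."""
    return machines * math.ceil(hours_each) * hourly_rate


if __name__ == "__main__":
    rate = 0.04  # USD per machine-hour, taken from the EC2 walkthrough slides
    sequential = job_cost(machines=1, hours_each=100, hourly_rate=rate)
    parallel = job_cost(machines=100, hours_each=1, hourly_rate=rate)
    print("1 machine x 100 h:", sequential, "USD, results after 100 hours")
    print("100 machines x 1 h:", parallel, "USD, results after 1 hour")
```

The cost is the same in both cases, but the parallel run returns results a hundred times sooner, which is the argument the slide makes for big data processing.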
    47. 47. Issues and What NICTA is doing? #1
          • Benefits and risks trade-off analysis
              • Helps with decision making: use cloud or not? Suitable architecture for hybrid cloud?
              • A model to show benefits (e.g., operational cost savings and elasticity) and risks (e.g., security, performance degradation, migration cost)
          • Cost estimation
              • Everybody’s question: OK, then what is the actual cost for me?
              • A model to estimate the actual initial and operational cost from an application’s profile
          • We’re collaborating with various Australian organizations to answer these questions through building and migrating systems in and to the cloud
    48. 48. Issues and What NICTA is doing? #2
          • Automatic reconfiguration in hybrid cloud
              • “Outsource” your workload to public clouds only when needed
              • Move some components/VMs/data to or from a public cloud to achieve certain performance
          • Monitoring and management
              • Monitor whether Service Level Agreements are guaranteed
              • Secure the transparency of SLA monitoring
              • Developing the new yardstick for cloud platforms, measuring elasticity for SPEC RG
          • Exploring the possibility of new applications
              • What can we do using huge computing resources?
              • Collaborating with Microsoft Research on the Azure Cloud Platform
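The idea of outsourcing workload to a public cloud only when needed can be illustrated with a very small placement rule. This is a sketch under simplifying assumptions (a single scalar "demand", a fixed private capacity, hypothetical names); it is not the adaptation engine mentioned in the notes.

```python
def place_workload(demand_units, private_capacity_units):
    """Fill private capacity first and send only the overflow to the public cloud."""
    private = min(demand_units, private_capacity_units)
    public = max(0, demand_units - private_capacity_units)
    return {"private": private, "public": public}


if __name__ == "__main__":
    for demand in (40, 100, 160):
        print(demand, "->", place_workload(demand, private_capacity_units=100))
```

A real reconfiguration policy would also weigh data transfer cost, performance targets and SLA constraints, which is where the open research questions on this slide lie.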
    49. 49. Agenda
          • Introduction to Cloud Computing
          • Characteristics, Deployment and Delivery Models
          • Enterprise Architecture and Migration Framework
          • Usage Scenarios
          • Evaluating Cloud Computing
          • Enterprise context, Business opportunities, risks
          • Technical qualities of platforms
          • Platform Architectural Insights
          • Proof of Concept Experiences
          • Advanced Architecture Issues
          • Future Directions
          • Industry happenings
          • Research Agenda
    50. 50. Cloud, Cloud, Cloud,...
          • Cloud Computing is No. 1 in the top 10 strategic technologies for 2011
          • Cloud is everywhere? No
          • Middle to large enterprises see huge opportunity in public cloud but also anxiety/pain due to…
              • The lack of governance, i.e., visibility and control
              • The lack of “architectures for (hybrid) cloud”
              • The lack of migration methodology
              • The lack of common cost structure
              • The lack of automation across cloud and in-house
              • …
    51. 51. Cloud Computing - The Enterprise Context
          STATE OF PLAY
          • Clear benefits in cloud adoption
              • Reduced IT cost, agility, efficiency, innovation opportunities
          • Top risks/adoption issues:
              • Security & privacy
              • Migration challenges
              • Ownership of data
              • Service levels
              • Lock-in / interoperability
              • Performance
              • Availability / reliability
              • Cost and ROI
              • Monitoring & control
              • Governance
              • Compliance and regulation
              • Competencies
              • Software licensing in the cloud
              • Operational challenges
              • Contracts and commercials
              • New roles and responsibilities
              • Payment model, metering/charge backs
          • Risks vary with service model and provider
          • Many progressive organisations evaluating cloud
              • Proof of concepts, pilots, cloud computing strategy papers
          • Some good adoptions in certain verticals, SME, Software as a service…
          CIOs need greater visibility and control over their assets running in local servers and in the cloud before reaping the benefits of cloud computing.
    52. 52. Integration Challenges
          • Integration Challenges
              • UI Integration
              • Data Integration
              • Process Integration
          • Identity Challenges
              • Access Control
              • AuthN, SSO, AuthZ
              • Identity Lifecycle
              • Identity Portability
              • Interoperability
          • Management Challenges
              • SLA Monitoring
              • Halting, Pausing, Throttling…
              • Programmatic access to health model
          Standards and Interoperability
          • Cloud Computing Interoperability Forum (CCIF), OMG effort, The Open Group, Open Cloud Manifesto...
          • Are standards THE solution?
              • Competing standards? Timing? Design by committee?
              • In fact, does it make sense when cloud platform architecture varies significantly?
          • Individual services already surfaced on the internet
          • Still want to orchestrate services within a long-running workflow, across/from different clouds
    57. 57. Internet Service Bus
          • REST on .NET Service Bus
              • Simple to implement for interop across different languages
              • Less overhead packages
          • SOAP on .NET Service Bus
              • Only available for .NET Framework communications at the moment
              • Other languages are not fully supported (Java can only pass Access Control on .NET Service)
              • More overhead packages when communicating between C# and Java than between C# and C#
    58. 58. Overview of Cloud Computing Offerings
    59. 59. Overview of Three Leading Cloud Computing Platforms
    60. 60. Cloud Computing Environment from AWS
          • On-demand instances operate on a virtual environment
          • EC2 is an IaaS offering
          • Scaling computing environment
          • Datacenters located in different regions, including US (North and East), EU and APAC
          • Types of instances:
              • Standard
                  • Small (1 ECU, 1 Core, 1.7 GB memory)
                  • Large (4 ECUs, 2 Cores, 7.5 GB memory)
                  • Extra Large (8 ECUs, 4 Cores, 15 GB memory)
              • High-Memory
                  • Extra Large (6.5 ECUs, 2 Cores, 17.1 GB memory)
                  • Double Extra Large (13 ECUs, 4 Cores, 34.2 GB memory)
                  • Quadruple Extra Large (26 ECUs, 8 Cores, 68.4 GB memory)
              • High-CPU
                  • Medium (5 ECUs, 2 Cores, 1.7 GB memory)
                  • Extra Large (20 ECUs, 8 Cores, 7 GB memory)
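Choosing among the instance types listed on this slide is a simple filtering exercise. The sketch below encodes the slide's table and picks the type with the fewest ECUs that still meets a core and memory requirement; using ECUs as a stand-in for price is an assumption, since the slide gives no prices for these types, and the function name is illustrative.

```python
# (name, ECUs, cores, memory in GB), as listed on the slide
INSTANCE_TYPES = [
    ("Standard Small", 1, 1, 1.7),
    ("Standard Large", 4, 2, 7.5),
    ("Standard Extra Large", 8, 4, 15),
    ("High-Memory Extra Large", 6.5, 2, 17.1),
    ("High-Memory Double Extra Large", 13, 4, 34.2),
    ("High-Memory Quadruple Extra Large", 26, 8, 68.4),
    ("High-CPU Medium", 5, 2, 1.7),
    ("High-CPU Extra Large", 20, 8, 7),
]


def pick_instance(min_cores, min_memory_gb):
    """Return the name of the smallest type (by ECUs) meeting both requirements, or None."""
    candidates = [t for t in INSTANCE_TYPES
                  if t[2] >= min_cores and t[3] >= min_memory_gb]
    if not candidates:
        return None
    return min(candidates, key=lambda t: t[1])[0]


if __name__ == "__main__":
    print(pick_instance(min_cores=2, min_memory_gb=10))
    print(pick_instance(min_cores=8, min_memory_gb=4))
```

With real prices in hand, the `key` function would sort on hourly cost instead of ECUs.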
    61. 61. Cloud Computing Environment from AWS (contd)
          • Database Support
              • S3: bucket storage
              • Relational Database Service (RDS): scalable SQL database
              • Elastic Block Store (EBS): disk partition (< 1 TB)
          • Supported environment
              • Operating System
                  • Linux (e.g. Fedora, Ubuntu & Debian)
                  • Windows (e.g. Windows 7 & Windows Server 2008)
              • Other licensed environment
                  • IBM WebSphere: Application Server, sMash, Portal Server
                  • Oracle Database
                  • Oracle Enterprise Linux
    62. 62. Cloud Computing Environment from GAE
          • Cloud hosting environment for web applications
          • GAE is a PaaS offering
          • Automatic scaling and load balancing
          • Hardware specification is unknown
          • No notion of geographical regions
          • Database Support
              • BigTable
          • Other Support
              • Google Documents
              • Google Calendar
          • Upcoming Products
              • AppEngine for Business: SLA and SQL support
              • Data store (bucket storage): used in conjunction with Prediction and BigQuery API
              • Prediction and BigQuery API: analytics support
    63. 63. Cloud Computing Environment from Azure
          • Windows Azure has 3 main components: Compute, Storage and Fabric
          • Compute is based on Web Role and Worker Role
          • Storage is scalable storage (see below)
          • Azure is a PaaS offering
          • Database Support
              • Small (1 CPU, 1.75 GB memory)
              • Medium (2 CPUs, 3.5 GB memory)
              • Large (4 CPUs, 7 GB memory)
              • Extra Large (8 CPUs, 14 GB memory)
          • Storage support
              • Types of storage: Blob, Queue, Table, Drive
    64. 64. Details of Storage Offerings
    65. 65. Storage Offerings from AWS<br />AWS S3<br />Stores blobs (up to 5GB per blob)<br />Access via REST/SOAP. Sneakernet option (i.e., fedexing) is offered<br />AWS EBS<br />Network attached disk storage<br />Used as external disks for EC2 instances (up to 1TB per volume)<br />No direct access from the outside<br />Allows point-in-time snapshots of volumes to be stored in S3<br />High performance <br />Sequential access is reported to be faster than 70MB/sec (0.54Gbps)<br />Allows disk striping by attaching multiple volumes to an EC2 instance<br />
    66. 66. RDB Offerings from AWS<br />Amazon RDS<br />An EC2 instance with pre-installed MySQL 5.1 (up to 1TB storage)<br />Automatic patching<br />Automated transaction log backup covering up to the last eight days, plus user-initiated DB snapshots<br />Replication between multiple Availability Zones<br />Amazon Relational Database Offerings<br />IBM DB2 9.5, Informix Dynamic Server<br />Oracle 11g, 10g<br />SQL Server Express, 2005<br />Sybase SQL Anywhere 11<br />Postgres Plus<br />Vertica Analytic Database<br />
    67. 67. Storage Offerings from Azure<br />Windows Azure Blob<br />Stores blobs (up to 1TB per blob)<br />Reads/writes a blob in 4MB pieces<br />Access via REST/SOAP/ADO.NET<br />Windows Azure Drive<br />NTFS volume on Azure Blob accessed from Azure instances<br />Azure SQL<br />Supports a subset of Transact-SQL, which SQL Server fully supports (up to 50GB per database)<br />Automatic patching<br />Automatic high availability (no details are available)<br />SQL Azure Data Sync is offered to sync an on-premise DB with Azure SQL<br />
    68. 68. Storage Offerings from GAE<br />GAE Blobstore<br />Stores blobs (up to 2GB per blob)<br />Read/write a blob 1MB piece by piece<br />Access via HTTP and no access control<br />GAE Datastore<br />Support SQL-like language and JDO (no storage size limit?)<br />Services on the way<br />Google Storage for Developers<br />Extended version of GAE Blobstore<br />Store blobs (100GB per blob), REST interface, fine access control<br />BigQuery<br />Analyze massive data in Google Storage using SQL-like language<br />A query against 60TB of data takes less than 1 min<br />
    69. 69. Comparison of Storages<br />
    70. 70. Comparison of Storages (con’t)<br />
    71. 71. Cost Example<br />Have 400GB data. Transfer 7GB/day log data into cloud and add 0.5 GB/day to a storage. Read 0.1GB/day from a storage. 1M requests/day on a storage. Cost for one year?<br />AWS S3<br />(400*12 + Σ_{i=1}^{365} 0.5*i/30)*0.15 + (0.1*7 + 0.15*0.1)*365 + 0.1*365 ≈ $1,185<br />Amazon RDS<br />(400*12 + Σ_{i=1}^{365} 0.5*i/30)*0.1 + (0.1*7 + 0.15*0.1)*365 + 0.1*365 ≈ $889 + CPU fees (min $1,000/year)<br />Azure Blob<br />(400*12 + Σ_{i=1}^{365} 0.5*i/30)*0.15 + (0.1*7 + 0.15*0.1)*365 + 1*365 ≈ $1,513<br />Azure SQL<br />(400*12 + Σ_{i=1}^{365} 0.5*i/30)*10 + (0.1*7 + 0.15*0.1)*365 ≈ $59,427<br />Azure SQL is quite expensive as a data storage but cheap as a small-mid scale high-performance and reliable SQL server<br />
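The arithmetic in this cost example can be checked with a short script. This is a minimal sketch that follows the slide's expressions term by term; the function and parameter names are illustrative, and reading 0.1 and 0.15 as per-GB inbound and outbound transfer rates is an assumption drawn from the shape of the formula.

```python
import math

def storage_gb_months(initial_gb, added_gb_per_day, days=365):
    """GB-months billed over a year: the fixed data plus the daily additions."""
    base = initial_gb * 12
    growth = sum(added_gb_per_day * i / 30 for i in range(1, days + 1))
    return base + growth

def yearly_cost(storage_rate, request_cost_per_day, *, initial_gb=400,
                added_gb_per_day=0.5, in_gb_per_day=7, out_gb_per_day=0.1,
                in_rate=0.1, out_rate=0.15, days=365):
    """Storage + transfer + request cost in dollars, as on the slide."""
    storage = storage_gb_months(initial_gb, added_gb_per_day, days) * storage_rate
    transfer = (in_rate * in_gb_per_day + out_rate * out_gb_per_day) * days
    requests = request_cost_per_day * days
    return storage + transfer + requests

costs = {
    "AWS S3": yearly_cost(0.15, 0.1),
    "Amazon RDS": yearly_cost(0.10, 0.1),   # CPU fees not included
    "Azure Blob": yearly_cost(0.15, 1.0),
    "Azure SQL": yearly_cost(10, 0.0),
}
rounded = {name: math.ceil(cost) for name, cost in costs.items()}
for name, dollars in rounded.items():
    print(f"{name}: ${dollars:,}")
```

Rounded up to whole dollars, the first three results match the slide. The Azure SQL expression as written evaluates to about $59,394, slightly below the figure on the slide, which may reflect inputs not shown there.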
    72. 72. Cost Example 2<br />Have 5GB data. Transfer 0.01GB/day log data into cloud and add 0.01 GB/day to a storage. Read 0.1GB/day from a storage. 0.1M requests/day on a storage. Cost for one year?<br />AWS S3<br />(5*12 + Σ_{i=1}^{365} 0.01*i/30)*0.15 + (0.01*7 + 0.15*0.1)*365 + 0.1*0.1*365 ≈ $235<br />Amazon RDS<br />(5*12 + Σ_{i=1}^{365} 0.01*i/30)*0.1 + (0.01*7 + 0.15*0.1)*365 + 0.1*0.1*365 ≈ $167 + CPU fees (min $1,000/year)<br />Azure Blob<br />(5*12 + Σ_{i=1}^{365} 0.01*i/30)*0.15 + (0.01*7 + 0.15*0.1)*365 + 0.1*1*365 ≈ $268<br />Azure SQL<br />(5*12 + Σ_{i=1}^{365} 0.01*i/30)*10 + (0.01*7 + 0.15*0.1)*365 ≈ $853<br />
    73. 73. Security Support<br />AWS<br />Firewall support to control network access to and from instances<br />Amazon Virtual Private Cloud<br />Isolates instances by IP range<br />Connects to existing private infrastructure via encrypted IPsec VPN<br />Charged based on number of VPN connections and duration, as well as data transfer through the VPN connection<br />S3<br />Bucket policies<br />Access Control List (ACL)<br />Query string authentication<br />GAE<br />Google Secure Data Connector<br />Encrypted connection from Google Apps to internal applications behind a firewall<br />Filters traffic by users and applications<br />OAuth<br />Denial of Service (DoS) protection<br />Blacklist IP addresses or subnets<br />Impose limits<br />
    74. 74. Security Support (contd)<br />Azure<br />AppFabric Service Bus<br />Connects Azure applications and databases to internal infrastructure<br />AppFabric Access Control<br />Provide federated authorisation to applications and servers<br />
    75. 75. Elastic Compute Capability <br />Elasticity is the defining characteristic of cloud computing<br />The aim is to allocate enough resources to do the job, but not so much that resources are wasted<br />There are broadly 2 architectures that achieve elastic compute capability<br />Push architecture<br />Pull architecture<br />
    76. 76. Elastic Compute Capability Reference Architecture –Push Architecture<br />The Push architecture is typically used for web applications<br />Web browser (client) sends a request to the web application side<br />Load balancer receives the request and “pushes” it to one of the web servers running on a compute node<br />Requests are forwarded immediately (or at a certain rate)<br />Load balancer is aware of the intensity of the workload<br />
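The push flow on this slide can be sketched as a tiny in-process load balancer. This is only an illustration under stated assumptions: round-robin is one possible forwarding policy (the slide does not name one), and `PushLoadBalancer` and `make_server` are hypothetical names.

```python
import itertools

class PushLoadBalancer:
    """Forwards each request immediately to the next server in turn."""

    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)
        self.requests_seen = 0  # the balancer can observe workload intensity

    def handle(self, request):
        server = next(self._cycle)
        self.requests_seen += 1
        return server(request)

def make_server(name):
    """Hypothetical web server: a function that answers a request."""
    return lambda request: f"{name} handled {request}"

lb = PushLoadBalancer([make_server("web-1"), make_server("web-2")])
responses = [lb.handle(f"req-{i}") for i in range(4)]
for line in responses:
    print(line)
```

Because every request passes through the balancer, its request counter is a natural signal for deciding when to add or remove compute nodes.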
    77. 77. Elastic Compute Capability Reference Architecture –Push Architecture<br />Fig 1. Push Architecture Pattern<br />
    78. 78. Elastic Compute Capability Reference Architecture –Pull Architecture<br />The Pull architecture is often seen as an application-level architecture<br />Also known as the Producer-Consumer design pattern<br />Requests are sent to a queue<br />In contrast to the Push architecture, it does not forward the request (hence less suitable for web applications)<br />Compute nodes poll the queue periodically for jobs<br />Requests are processed one at a time<br />Polling frequently can induce overhead<br />Easier to implement a fail-safe mechanism<br />Compute nodes need NOT inform the queue in case of failure<br />A typical fail-safe mechanism involves a queue (e.g., AWS SQS or Azure Queue) that attaches a timed lock to each message. A message is locked when polled by a node. If the node fails, the lock expires and the message returns to the queue.<br />
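The timed-lock mechanism described on this slide can be illustrated with a small in-memory queue. This is a sketch of the idea only, not the AWS SQS or Azure Queue API: `TimedLockQueue` and its methods are hypothetical, and the clock is passed in explicitly so the behaviour is easy to follow.

```python
import collections

class TimedLockQueue:
    """In-memory stand-in for a queue that locks polled messages for a while."""

    def __init__(self, lock_seconds):
        self.lock_seconds = lock_seconds
        self._ready = collections.deque()
        self._locked = {}  # message id -> (body, lock expiry time)
        self._next_id = 0

    def send(self, body):
        self._ready.append((self._next_id, body))
        self._next_id += 1

    def _release_expired(self, now):
        # Messages whose lock has expired become visible again.
        for msg_id, (body, expiry) in list(self._locked.items()):
            if expiry <= now:
                del self._locked[msg_id]
                self._ready.append((msg_id, body))

    def poll(self, now):
        """Return (id, body) and lock the message, or None if none is visible."""
        self._release_expired(now)
        if not self._ready:
            return None
        msg_id, body = self._ready.popleft()
        self._locked[msg_id] = (body, now + self.lock_seconds)
        return msg_id, body

    def delete(self, msg_id):
        """Called by a node once it has finished the job."""
        self._locked.pop(msg_id, None)

queue = TimedLockQueue(lock_seconds=30)
queue.send("job-A")
first = queue.poll(now=0)    # node 1 takes the job, then crashes silently
hidden = queue.poll(now=10)  # still locked, so node 2 sees nothing
retry = queue.poll(now=31)   # lock expired, node 2 receives the job
queue.delete(retry[0])       # node 2 finishes and removes the message
done = queue.poll(now=100)
```

The failed node never has to report anything: the job reappears simply because nobody deleted it before the lock ran out.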
    79. 79. Elastic Compute Capability Reference Architecture<br />Fig 2. Pull Architecture Pattern<br />
    80. 80. Using Cloud for Business Continuity<br />Two main usages of cloud for Business Continuity:<br />Provides highly available systems for day-to-day business<br />Serves as a technology platform to implement disaster recovery<br />Some definitions:<br />Business Continuity: “Activity performed by an organisation to ensure that critical business functions will be available to customers, suppliers, regulators and other entities…”<br />Disaster Recovery: “A small subset of business continuity. The process, policies and procedures related to preparing for recovery or continuation of technology infrastructure critical to an organisation after a natural or human-induced disaster”<br />Fault Tolerance: “The property that enables a system to continue operating properly, possibly at a reduced quality level…” <br />62<br />
    81. 81. Building Highly Reliable Systems with Cloud<br />Must address potential failures at two levels:<br />Hardware/Infrastructure<br />Prevent a Single Point of Failure (SPOF) by adding redundancy in all hardware components (i.e., redundant disks, redundant network devices, redundant power supply, etc.)<br />NOT all cloud providers provide enterprise-grade availability. Check your SLA!<br />Application<br />Prepare a fail-over system to take over in case of a failure<br />Replicate databases to minimise downtime and loss of data<br />Replicate to a geographically different location (e.g., to avoid natural disasters such as floods)<br />
    82. 82. Case Study: Building Reliable System using EC2<br />The highly replicated architecture of clouds makes them a good foundation for business continuity solutions<br />Their globally distributed nature further enhances the disaster recovery capability of cloud<br />Availability limitations mean you need to be realistic about Hot vs Warm vs Cold standby options<br />
    83. 83. Case Study: Building Reliable System using EC2 (Contd) <br />Data backup in AWS<br />Amazon S3 is best for off-site data backup<br />Stores large binary files<br />Designed to provide 99.999999999% durability<br />Objects are redundantly stored in multiple facilities in a Region<br />Back up using EBS<br />Uses a regular file system<br />Takes an image (or snapshot) of the partition<br />VM Import<br />Allows for easy replication from on-premise to cloud<br />Not trivial to replicate various configurations such as network configuration and disk drives<br />
    84. 84. 10 Things You Didn’t Know About Cloud Platforms: Azure, GAE and AWS<br />Dr. Anna Liu, Dr. Hiroshi Wada, Kevin Lee<br />National ICT Australia<br />
    85. 85. The 10 Things are...<br />How long does it take for data in cloud to become consistent? <br />Limitations and quotas <br />How unpredictable/variable is the cloud?<br />Distributed transaction support in cloud <br />Pricing variations over time and space <br />Sticky session support <br />The new matrix of roles and responsibilities for cloud providers, consumers and system integrators <br />Secure connections to the cloud <br />Time to get a new instance<br /> Auto-scaling is not all magic <br />
    87. 87. 69<br />The Reality of Eventual Consistency in Amazon SimpleDB<br />The probability to read updated data in SimpleDB in US West<br />An application reads data X (ms) after it has written data<br />Eventual Consistent<br />Consistent Read<br /><ul><li>SimpleDB has two read operations
    88. 88. Eventual Consistent Read
    89. 89. Consistent Read
    90. 90. This pattern is consistent regardless of the time of day</li></li></ul><li>Consistent vs. Eventual Consistent Read<br />SimpleDB’s consistent read guarantees to read updated data<br />What is the cost you need to pay for consistency?<br />RTT is the same as that of eventual consistent read<br />Monetary cost (usage fee) is exactly the same as eventual consistent read<br /> Trade-off is not clear! We suspect consistent read is less scalable and slower under datacenter failures. However, we have not observed any differences<br />
    91. 91. Other Commercial NoSQL Databases<br />Google App Engine<br />Offers eventual consistent read and consistent read<br />Behavior of eventual consistent read is completely different from Amazon’s<br />In GAE, both types of reads behave exactly the same unless a data center fails<br />Windows Azure<br />Offers no options for read<br />Always consistent<br />Reference: H Wada, A Fekete, L Zhao, K Lee, A Liu, “Data Consistency Properties and the Trade-offs in Commercial Cloud Storage: The Consumers’ Perspective”, CIDR 2011. http://www.cidrdb.org/cidr2011/Papers/CIDR11_Paper15.pdf<br />
    93. 93. Limitations and Quotas<br />
    95. 95. Performance Unpredictability in Cloud<br />Performance unpredictability is one of the major obstacles <br />Performance variance of a MapReduce job for a 50-node EC2 cluster and a 50-node local cluster<br />Examples (time as performance metric)<br />Repeatability of results for researchers<br />Time critical tasks for enterprises<br />
    96. 96. Benchmark Details<br />
    97. 97. Benchmark Results in EC2<br />The COV of the large instance is higher than that of the small instance. However, both are at least an order of magnitude less stable than a physical cluster.<br />The COV of S3 access may be influenced by other traffic on the network; this experiment is shown only for completeness.<br />Reference: Schad, Jörg, Jens Dittrich, and Jorge-Arnulfo Quiané-Ruiz. 2010. Runtime Measurements in the Cloud: Observing, Analyzing, and Reducing Variance. In Proceedings of the 36th International Conference on Very Large Data Bases. Vol. 3, No. 1. Singapore: VLDB Endowment.<br />
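COV here is the coefficient of variation, i.e. the standard deviation divided by the mean, which makes the variability of runs with different average runtimes comparable. A minimal sketch follows; the runtime samples are made up for illustration and are not measurements from the cited paper.

```python
import statistics

def coefficient_of_variation(samples):
    """Sample standard deviation divided by the mean (dimensionless)."""
    return statistics.stdev(samples) / statistics.mean(samples)

# Hypothetical runtimes in seconds for the same job, repeated five times.
cloud_runs = [100, 130, 85, 120, 95]
local_runs = [100, 101, 99, 100, 100]

cloud_cov = coefficient_of_variation(cloud_runs)
local_cov = coefficient_of_variation(local_runs)
print(f"cloud COV: {cloud_cov:.3f}, local COV: {local_cov:.4f}")
```

With these invented numbers the cloud COV is more than ten times the local one, which is the kind of gap the slide calls an order of magnitude.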
    99. 99. Distributed Transactions in Cloud<br />There is now a range of Cloud Database types<br />NoSQL (Azure Table, GAE Datastore, Amazon SimpleDB...)<br />Much more ‘shardable’ architecture; no joins, no full ACID support<br />SQL (Azure SQL, Amazon RDS, Oracle on EC2...)<br />Variable distributed transactional support compared to their traditional RDBMS counterparts<br />Experience with porting PetShop<br />Challenge with porting the data access layer <br />Some JDO interfaces are not supported by App Engine, e.g. ‘Join query’ <br />No distributed transaction support in Azure SQL at the moment<br />
    101. 101. Pricing fluctuates over space and time<br />On demand pricing (hourly, per GB, per ‘000 requests)<br />Reserved instances (1 or 3 year term + unit cost)<br />Spot pricing (typically cheaper in US-East!)<br />Similar pricing schemes observed for GAE and Azure <br />81<br />
    103. 103. Sticky Session Support<br />Autoscaling alone does not guarantee that clients of the same session will always contact the same instance<br />Without that guarantee, clients cannot reliably perform a series of connected operations<br />Amazon ELB supports Session Affinity<br />Session affinity allows a session-to-instance mapping to be created at the ELB<br />Limitations<br />Session affinity cannot handle HTTPS<br />Autoscaling down an instance with a live session<br />MS Azure advocates stateless sessions<br />If you must, store session state in, e.g., table storage<br />Design issue: should the server remember the conversation context, or should the client remind it every time? How long should it ‘stick’? Too long compromises the server’s ability to distribute load<br />
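The affinity idea, and the scale-down limitation listed above, can be sketched with a toy balancer. This is not how Amazon ELB is implemented (the real feature tracks sessions with cookies); the dictionary below is only a stand-in, and `AffinityBalancer` is a hypothetical name.

```python
class AffinityBalancer:
    """Routes every request of a session to the same instance while it exists."""

    def __init__(self, instances):
        self.instances = list(instances)
        self.sessions = {}  # session id -> instance name
        self._next = 0

    def route(self, session_id):
        instance = self.sessions.get(session_id)
        if instance not in self.instances:
            # New session, or its instance was removed: pick one round-robin.
            instance = self.instances[self._next % len(self.instances)]
            self._next += 1
            self.sessions[session_id] = instance
        return instance

    def scale_down(self, instance):
        self.instances.remove(instance)

lb = AffinityBalancer(["i-1", "i-2"])
alice_first = lb.route("alice")
bob_first = lb.route("bob")
alice_second = lb.route("alice")      # sticks to the same instance
lb.scale_down(alice_first)            # autoscaling removes that instance
alice_after = lb.route("alice")       # remapped; in-memory state is gone
```

After the scale-down, the session lands on a different instance, so any state held only in the removed instance's memory is lost. That is the argument for stateless sessions or for keeping session state in external storage.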
    105. 105. Customers’ Responsibility in IaaS Cloud<br />Application<br />Patching<br />App Data<br />Backup<br />Application<br />Monitoring<br />Application Installation/Configuration<br />OS/Application Security<br />(e.g., Active Directory)<br />Billing<br />(Cost Center Charging)<br />Antivirus<br />OS<br />Backup<br />OS<br />Monitoring<br />OS<br />Patching<br />OS/Middleware Installation/Configuration<br />Customers’<br />Responsibility<br />Infrastructure Configuration<br />(VPN, VMs, Disk, …)<br />Access Control<br />to IaaS<br />Infrastructure<br />Monitoring<br />(CPU, Disk, Net, …)<br />Usage Report<br />and<br />Basic Billing<br />Amazon EC2<br />(IaaS providers)<br />
    107. 107. Secure Connection to the Cloud<br />87<br />
    108. 108. Performance Implications<br />Low Security Option: max throughput is 5.6MB/sec<br />High Security Option: connection throughput is 4MB/sec<br />Performance hit due to encryption, decryption and firewall<br />Other interesting observations:<br />VPC is only available in US-East-1 and EU-West-1<br />in a single availability zone only<br />S3 does not work well with VPC yet (very slow); EBS is a workaround<br />MS Azure VPN support is expected next year<br />Google Secure Connector<br />
    110. 110. Time to Get a New Instance<br />It typically takes minutes to create an instance from its image on EC2<br />Trick to “create” instances quicker<br />Create a pool of instances in advance, and stop (hibernate) them all<br />Pay no instance cost, but pay the storage cost for stopped instances<br />Revive stopped instances when new instances are needed<br />
    112. 112. Autoscaling is Not All Magic<br />Amazon EC2<br />“… your application can automatically scale itself up and down depending on its needs.”<br />Windows Azure<br />“Optimized for scale-out applications-designed so that developers can easily build scale-out applications…”<br />Google App Engine<br /> “No matter how many users you have or how much data your application stores, App Engine can scale to meet your needs”<br />
    113. 113. Autoscaling is Not All Magical (contd)<br />
    115. 115. Additional Slides<br />
    116. 116. Virtual Machine ‘Stolen Time’<br />Using traditional system resource monitoring tools in cloud<br />Measuring system performance within a virtual instance (using tools such as vmstat and top) can give misleading information<br />Example: An EC2 instance (e.g. m1.small with 1 EC2 compute unit) does not go above around 40% CPU load as observed from vmstat<br />A certain percentage (around 50-60%) appears in vmstat as ‘st’<br />“st – Time stolen from a virtual machine” (from vmstat manpage) <br />Does it mean I am not getting what I paid for? No, not really<br />Amazon instances are measured by EC2 compute units<br />“One EC2 Compute Unit provides the equivalent CPU capacity of a 1.0-1.2GHz 2007 Opteron or 2007 Xeon processor” <br />Monitoring system performance in cloud<br />Use cloud monitoring tools such as CloudWatch and RightScale <br />
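On Linux, the ‘st’ column that vmstat reports comes from the steal counter in `/proc/stat`. The sketch below computes the percentage split between two snapshots of the aggregate `cpu` line; the two sample lines are invented to resemble the situation on the slide, and the function name is illustrative.

```python
def cpu_percentages(before, after):
    """Percent of CPU time per state between two 'cpu' lines of /proc/stat."""
    names = ["user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal"]
    start = [int(x) for x in before.split()[1:9]]
    end = [int(x) for x in after.split()[1:9]]
    deltas = [e - s for s, e in zip(start, end)]
    total = sum(deltas)
    return {name: 100.0 * d / total for name, d in zip(names, deltas)}

# Hypothetical snapshots taken a few seconds apart on a busy small instance.
before = "cpu  1000 0 500 8000 100 0 0 400 0 0"
after = "cpu  1400 0 600 8000 100 0 0 1000 0 0"

pct = cpu_percentages(before, after)
print(f"user {pct['user']:.1f}%  system {pct['system']:.1f}%  steal {pct['steal']:.1f}%")
```

In this made-up sample the guest is fully busy (idle does not advance), yet user plus system time stays under 50% because the hypervisor accounts the remainder as steal.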
    117. 117. Limitation of Virtual Private Cloud (VPC)<br />VPC hosts are logically detached from (but physically attached to) the Amazon network<br />No direct connection to and from S3 via the Amazon local network<br />Connection via internet only<br />What happens if we need to transfer data from S3 to a VPC host?<br />E.g. if we ship removable media to Amazon, the data is uploaded to S3. How do we transfer the data to a VPC host?<br />Option 1: Direct transfer from S3 to VPC host<br />Traffic routes through the remote side and comes back (high latency)<br />Option 2: Transfer to EBS and mount EBS on the VPC host <br />Traffic routes through the local network (low latency)<br />
    118. 118. 98<br />How Long You Need to Wait to Get Updated with Eventual Consistent Read?<br />Result of the “5 minutes run” for one week<br /><ul><li>t1: the first time to read updated data
    119. 119. t2: the first time at which 100% of reads return updated data
    120. 120. t3: the last time to read stale data</li></ul> Mostly updated after 600ms but no guarantee<br />
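A measurement of this kind (write a value, wait X ms, read it back, repeat many times) can be sketched against a toy store. Everything here is an assumption for illustration: `LaggyStore` is hypothetical, and the uniformly distributed lag of up to 500 ms is an invented model, not SimpleDB's behaviour.

```python
import random

class LaggyStore:
    """Toy store: a write becomes visible to eventual reads after a random lag."""

    def __init__(self, max_lag_ms, rng):
        self.max_lag_ms = max_lag_ms
        self.rng = rng
        self.old = None
        self.new = None
        self.visible_at = 0.0

    def write(self, value, now_ms):
        self.old, self.new = self.new, value
        self.visible_at = now_ms + self.rng.uniform(0, self.max_lag_ms)

    def eventual_read(self, now_ms):
        return self.new if now_ms >= self.visible_at else self.old

    def consistent_read(self, now_ms):
        return self.new  # always the latest write

def fresh_read_probability(delay_ms, trials=2000, max_lag_ms=500, seed=1):
    """Fraction of eventual reads, issued delay_ms after a write, that are fresh."""
    rng = random.Random(seed)
    store = LaggyStore(max_lag_ms, rng)
    fresh = 0
    for i in range(trials):
        store.write(i, now_ms=0)
        fresh += store.eventual_read(now_ms=delay_ms) == i
    return fresh / trials

for delay in (0, 100, 250, 400, 600):
    print(delay, fresh_read_probability(delay))
```

Plotting the probability against the delay gives the same kind of curve as the measurement slides; against a real service the store would be replaced by actual write and read calls and wall-clock timing.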
    121. 121. Let’s Switch Gears… what’s happening in industry?<br />
    122. 122. Australian Cloud Adoption<br />Software as a service<br />Enterprise and SME<br />Productivity suites, CRM<br />Telco and SaaS vendor partnership<br />emerging tier 2 System integrator<br />Platform and Infrastructure as a Service<br />SME, startups well on their way<br />Enterprise doing evaluation<br />Government Cloud, Community Cloud<br />Data centre consolidation<br />SOA, shared services<br />Financial industry leadership<br />100<br />
    123. 123. Some Australian Enterprise Proof of Concepts<br />Internet scale web applications<br />User base from around the world<br />Integration with existing web APIs <br />Transient campaigns<br />Many Mobile devices connecting to cloud<br />Good adoption in utilities industries <br />Development/Test environment<br />Dynamic provisioning of dev/test resources<br />Pay for usage<br />Bursty workload<br />Web apps<br />Large scale data analysis<br />eScience, Financial risk calculations, Government statistical data<br />101<br />
    124. 124. One example POC detail findings<br />An Example POC Experience<br />102<br />
    125. 125. 103<br />Proof of Concept Overview<br />Objective<br />reduce IT cost<br />evaluate cloud opportunity and risks <br />Test and Dev environment, as opposed to production<br />Maximise re-applicability of learning experience across other apps<br />Evaluation dimensions<br />Performance, security, feasibility<br />cost and license, flexibility and elasticity<br />integration with existing environment, migration effort<br />disaster recovery and backup, new roles and responsibilities<br />…<br />
    126. 126. POC Solution Design Rationale<br />Standard 3 tier web application, with backend and authentication server integration <br />Location of data tier<br />Keep as much of the dev/test configuration in common as possible<br />PaaS or IaaS<br /> Selection of cloud platform for POC<br />Project Management<br />Governance: CIO/Director level sponsorship<br />Project participants: enterprise architect, solution developer, security specialist, commercial specialist<br />NICTA: cloud computing experience and evaluation framework<br />2 wks POC selection; 6 wks POC; 2 wks consolidate findings<br />
    127. 127. 105<br />Architecture of a Hybrid Dev Environment<br />NICTA Corporate Network<br />Internet<br />Remote-desktop to XX.XX.0.*<br />(No direct access to Amazon VPC)<br />Amazon Cloud (US-East Datacenter)<br />IPSec VPN<br />approx 230ms RTT<br />Enterprise Data store<br />Authentication server<br />Business Web application<br />On-Premise Servers<br />Virtual Machines<br />Private Cloud (Isolated Network)<br />Only accessible from NICTA<br />Isolated Network in Amazon<br />
    128. 128. Security<br />‘Secure integration to cloud’ solutions are emerging<br />Amazon VPC, Google Secure Data Connector, Azure App Fabric, etc<br />Standard IPsec VPN brings peace of mind to enterprise users<br />One of the key enablers for enterprise use<br />Fits in with an existing security policy<br />Data masking could increase the cost/effort<br />An automated method is necessary for further cost/effort reduction<br />Secure Software Development Lifecycle<br />Process change required <br />
    129. 129. Performance<br />The performance of each component (network, VMs, …) in cloud is comparable to or better than current on-premise components<br />Adequate for dev/test environments; suitable for production systems?<br />Do not underestimate the latency in hybrid environments<br />Many traditional applications and protocols are not optimized for a high-latency/WAN environment<br />E.g., a protocol is too “chatty” and we observed that the network usage never exceeds 0.1% in some cases<br />There are performance improvement opportunities<br />Alternative solution design, configuration and tuning<br />
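A back-of-the-envelope calculation shows why a chatty protocol leaves a WAN link nearly idle: if each exchange must wait for a reply before the next one starts, throughput is capped at the payload per round trip divided by the round-trip time. The 230 ms RTT is the figure from the hybrid architecture slide; the 1 KB payload and the 100 Mbit/s link are assumptions chosen for illustration.

```python
def chatty_throughput_bytes_per_s(bytes_per_round_trip, rtt_s):
    """Upper bound when every exchange waits one full round trip."""
    return bytes_per_round_trip / rtt_s

def link_utilisation(bytes_per_round_trip, rtt_s, link_bits_per_s):
    """Fraction of the link capacity that such a protocol can use."""
    bits_per_s = chatty_throughput_bytes_per_s(bytes_per_round_trip, rtt_s) * 8
    return bits_per_s / link_bits_per_s

rtt_s = 0.230                 # approx RTT over the IPsec VPN (from the slides)
payload = 1024                # hypothetical: 1 KB exchanged per round trip
link = 100_000_000            # hypothetical: 100 Mbit/s link

throughput = chatty_throughput_bytes_per_s(payload, rtt_s)
utilisation = link_utilisation(payload, rtt_s, link)
print(f"{throughput:.0f} bytes/s, {utilisation:.4%} of the link")
```

Under these assumptions the link is used at well below 0.1%, so batching requests or pipelining them is likely to help far more than adding bandwidth.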
    130. 130. 108<br />Cost<br />Many companies use ‘private cloud’; however, current offering is seen to be more expensive and less flexible<br />increasingly Pay-as-you-go options are available<br />unit price is typically ~100 times more costly for storage <br />SLA & management services usually included<br />Cost of keeping data/VMs is larger<br /><ul><li>Current Cost would vary depending on the SLA tiers of service</li></li></ul><li>Customers’ Responsibility in IaaS Cloud<br />Application<br />Patching<br />App Data<br />Backup<br />Application<br />Monitoring<br />Application Installation/Configuration<br />OS/Application Security<br />(e.g., Active Directory)<br />Billing<br />(Cost Center Charging)<br />Antivirus<br />OS<br />Backup<br />OS<br />Monitoring<br />OS<br />Patching<br />OS/Middleware Installation/Configuration<br />Customers’<br />Responsibility<br />Infrastructure Configuration<br />(VPN, VMs, Disk, …)<br />Access Control<br />to IaaS<br />Infrastructure<br />Monitoring<br />(CPU, Disk, Net, …)<br />Usage Report<br />and<br />Basic Billing<br />Amazon EC2<br />(IaaS providers)<br />
    131. 131. Commercial Implications<br />Software Licensing in the cloud?<br />Reuse enterprise license<br />Pay for usage software license model<br />Payment model?<br />enterprise governance model<br />Metering and chargeback<br />Service level agreement?<br />Monitoring and management<br />Contracts<br />Backup, disaster recovery<br />New roles and responsibility?<br />Existing IT outsourcing arrangements<br />110<br />
    132. 132. POC Experience Summary<br />Cloud Computing has the potential to reduce existing enterprise IT cost<br />There are technical solutions for managing performance, security risks<br />Need some fresh approach to manage:<br />Enterprise architecture and governance<br />Commercial implications such as SLA, new roles and responsibility<br />111<br />
    133. 133. Other Global Challenges<br />Policy and Procedure<br />Procurement strategy?<br />Pricing strategy?<br />Governance and Control<br />Financial control vs shared model<br />Taxation and legal<br />Federal and state based taxation, sales and payroll tax<br />Compliance and assessment<br />112<br />
    134. 134. Other Challenges Australians Face<br />The Tyranny of Distance<br />Latency: ~250ms to Singapore, ~220ms to US west coast, ~500-600ms to US east coast and Europe<br />No business case for an Australian data centre<br />22 million population, 12 million internet users<br />National Broadband Network<br />The rise of oz cloud innovations<br />Strong Privacy Laws<br />Federal Privacy Act<br />APRA – Australian Prudential Regulation Authority<br />EU Safe Harbour <> oz Safe Harbour<br />
    135. 135. Agenda<br />Introduction to Cloud Computing<br />Characteristics, Deployment and Delivery Models<br />Enterprise Architecture and Migration Framework<br />Usage Scenarios<br />Evaluating Cloud Computing<br />Enterprise context, Business opportunities, risks<br />Technical qualities of platforms<br />Platform Architectural Insights<br />Proof of Concept Experiences<br />Advanced Architecture Issues<br />Future Directions<br />Industry happenings<br />Research Agenda<br />
    136. 136. Other Industry Happenings<br />Specialist cloud<br />New types of System integrators<br />Innovative Scenarios<br />115<br />
    137. 137. Research Agenda<br />Enterprise Architecture Framework<br />Evaluation, acquisition, effort estimation, project and risk management<br />Software Development Lifecycle<br />Requirement solicitation for cloud, design for interoperable services, MDA/MDD/DSL, testing at massively parallel scale, cloud design patterns<br />Interoperability and Integration<br />Hybrid cloud, integration challenges across clouds<br />Performance Engineering<br />Monitoring and measurement, performance modelling, prediction and analysis, quality of service, SLA and assurance<br />Many more…<br />116<br />
    138. 138. Cost Effort Estimation for Cloud Migration<br />Cost implication/estimation for cloud migration is especially challenging because:<br />Applications and migration projects vary in terms of: size/complexity, functionality, quality requirements, target deployment platforms...<br />Cloud computing is new and different from traditional software engineering paradigm: different development and deployment models, non-functional characteristics, pricing models...<br />Migration effort/cost estimation is not trivial<br />Little Empirical Data in cloud<br />V Tran, K Lee, A Fekete, A Liu, J Keung, “Size Estimation of Cloud Migration Projects with Cloud Migration Point (CMP)”, 5th Intl Symposium on Empirical Software Engineering and Measurement<br />V Tran, J Keung, A Liu, A Fekete, “Application Migration to Cloud: A Taxonomy of Critical Factors”, ICSE Software Engineering For Cloud Computing Workshop 2011.<br />117<br />
    139. 139. Adaptive Cloud Middleware Research<br />Evaluating Cloud Performance – Measuring Elasticity<br />Achieving Cloudburst – Integrated monitoring and management<br />Cloud Data Management – Elastic Data Store<br />S Sakr, L Zhao, H Wada, A Liu, “CloudDB AutoAdmin: Towards a Truly Elastic Cloud-Based Data Store”, 9th IEEE Intl Conf on Web Service ICWS 2011.<br />S Islam, J Keung, K Lee, A Liu, “An Empirical Study into Adaptive Resource Provisioning in the Cloud”, IEEE Intl Conf on Utility and Cloud Computing UCC2010.<br />L Zhao, A Liu, J Keung, “Evaluating Cloud Platform Architecture with the CARE Framework”, APSEC 2010.<br />P Brebner, A Liu, “Modeling Cloud Cost and Performance”, Cloud Computing and Virtualisation (CCV 2010)<br />H Wada, A Fekete, L Zhao, K Lee, A Liu, “Data Consistency Properties And the Trade-offs in Commercial Cloud Storage: The Consumers’ Perspective”, CiDR 2011.<br />118<br />
    140. 140. Elasticity Measure<br />Elasticity is the defining characteristic of cloud<br />Challenge: No existing metrics to measure elasticity<br />Not the same as ‘scalability’ or ‘throughput’ measures<br />Users care about running cost, agility<br />Understanding elasticity<br />“the ability of software to meet changing capacity demands, deploying and releasing relevant necessary resources on-demand” <br />Varying elasticity behaviour across platforms<br />SPEC Standardisation effort<br />
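The slide notes that no established elasticity metric exists. As a rough illustration only (not the metric proposed by the authors or by SPEC), one way to reason about elasticity is to compare demanded capacity with supplied capacity over time and total the shortfall and the excess. The function name and the sample numbers below are hypothetical.

```python
def provisioning_gaps(demand, supply):
    """Return (under, over): total unmet demand and total idle capacity.

    demand and supply are equal-length sequences sampled at the same
    intervals and expressed in the same capacity unit.
    """
    if len(demand) != len(supply):
        raise ValueError("demand and supply must have the same length")
    under = sum(max(d - s, 0) for d, s in zip(demand, supply))
    over = sum(max(s - d, 0) for d, s in zip(demand, supply))
    return under, over


# Hypothetical samples: demand spikes and the platform reacts one step late,
# then releases the extra capacity one step late as well.
demand = [100, 100, 300, 300, 100]
supply = [120, 120, 120, 320, 320]
under, over = provisioning_gaps(demand, supply)
print(under, over)
```

A smaller `under` points to better agility (less unmet demand), while a smaller `over` points to lower running cost, the two things the slide says users care about.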
    141. 141. Data Consistency in Cloud<br />Inconsistent views of data are common in cloud<br />Due to the distributed nature and support of massive scalability<br />Understanding data inconsistency is a new and big challenge for the software industry<br />What are the exact characteristics? When (not) to use them? How to use them? <br /><ul><li>Conducted scientific measurements and theoretical analysis
    142. 142. Working on a decision making algorithm involving large number of parameters
    143. 143. CiDR 2011 paper for more details
    144. 144. H Wada, A Fekete, L Zhao, K Lee, A Liu, “Data Consistency Properties And the Trade-offs in Commercial Cloud Storage: The Consumers’ Perspective”, CiDR 2011.</li></ul>120<br />
    145. 145. Cloud Data Management<br />One of the main goals of the next wave of Cloud Computing is to make it easier to implement every application as a distributed, scalable and widely-accessible service on the Web. <br />Recently, a new generation of low-cost, high-performance database software, known as NoSQL (Not Only SQL), has emerged to challenge the dominance of RDBMS.<br />Examples are BigTable, Dynamo, Cassandra, HBase, Hypertable,…<br />The main features of these systems include: the ability to scale horizontally, support for weaker consistency models, flexible schemas and data models, and simple low-level query interfaces.<br />121<br />
    146. 146. Cloud Data Management<br />122<br />
    147. 147. Cloud Data Management: NoSQL Limitations<br />In practice, many obstacles still need to be overcome before these systems can appeal to mainstream enterprises, such as:<br />Simple Programming Model: Even a simple query requires significant programming expertise.<br />Transaction Support: limited support (if any) for transactions in NoSQL database systems<br />Maturity: NoSQL alternatives are in pre-production versions with many key features either not stable enough or yet to be implemented.<br />Support: small start-ups without the global reach, support resources, or credibility of an Oracle, Microsoft, or IBM.<br />123<br />
    148. 148. Database-as-a-service (DaaS)<br />DaaS is a new paradigm for data management in which a third party service provider hosts a database as a service.<br />The service provides data management for its customers and thus alleviates the need for the service user to purchase expensive hardware and software, deal with software upgrades and hire professionals for administrative and maintenance tasks.<br />Examples: Amazon RDS, Windows SQL Azure<br />124<br />
    149. 149. Database-as-a-service (DaaS)<br />In general, the service level agreements (SLA) of cloud database services mainly focus on providing customers with high availability (99.99%) of the hosted databases. <br />On the other hand, they provide no guarantees or support for performance and scalability.<br />Consumer applications of cloud-based database services therefore have to take on additional responsibilities and challenges to achieve their performance, scalability and elasticity goals<br />125<br />
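To put the 99.99% availability figure in perspective, the sketch below converts an availability percentage into the downtime it still permits. It assumes a 365-day year and treats the SLA as a simple uptime fraction; real SLAs define measurement periods and exclusions differently, so this is an approximation.

```python
def allowed_downtime_minutes(availability_percent, period_hours=365 * 24):
    """Minutes of downtime permitted over the period at the given availability."""
    unavailable_fraction = 1 - availability_percent / 100
    return unavailable_fraction * period_hours * 60


yearly = allowed_downtime_minutes(99.99)
print(f"{yearly:.2f} minutes per year")
```

Under these assumptions, 99.99% availability still allows a little under an hour of downtime per year, and it says nothing about how fast the database responds while it is up.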
    150. 150. DaaS: Challenges<br />Handling the Performance and Cost Aspects of Application-Defined SLAs<br />Data Spike<br />Distributed Transactions<br />Geo-Distributed User and Geo-Replicated Databases<br />126<br />
    151. 151. DaaS: Challenges<br />127<br />
    152. 152. DaaS: Challenges<br />128<br />
    153. 153. Our Solution: CloudDB AutoAdmin<br />129<br />
    154. 154. CloudDB AutoAdmin: Goals<br />Declarative Specification of Replication Management Strategies<br />Declarative Specification of Data Partitioning and Re-distribution<br />Declarative Specification of Consistency Management<br />Logging and Monitoring<br />130<br />
    155. 155. Our Solution: CloudDB AutoAdmin Architecture<br />131<br />
    156. 156. Measuring Elasticity - Cloud Benchmark<br />Performance Evaluation and Analysis<br />L Zhao, A Liu, J Keung, “Evaluating Cloud Platform Architecture with the CARE Framework”, APSEC 2010.<br />Modelling Cost and Performance<br />P Brebner, A Liu, “Modeling Cloud Cost and Performance”, Cloud Computing and Virtualisation (CCV 2010)<br />Measuring Elasticity, research contribution to SPEC<br />Submission to SOCC 2010<br />132<br />
    157. 157. Storing and Processing Large Datasets<br />Scalable Cloud Storage<br />Stores billions of records (e.g. user/application profiles and status)<br />Partitions automatically to preserve scalability<br />Supports structured data such as RDF and OWL (W3C recommendations) <br />Retains rich semantic information<br />Processing large datasets with parallelised frameworks<br />Supports real-time reasoning<br />Checks consistency against rules<br />Infers implicit knowledge from dataset<br />Enables efficient data analytics <br />
    158. 158. “always-on” costs in cloud. Also, very hot one is not feasible<br />Figure: Cost versus Downtime for Traditional DR and Cloud DR<br />Hot Standby: downtime of seconds – minutes (automatic failover, minimum data loss)<br /><ul><li>Run transactions on multiple sites but use only one</li><li>Mirror data via dedicated high speed network (e.g., SANs)</li></ul>Warm Standby: downtime of minutes – hours (manual failover, few data loss)<br /><ul><li>Regularly backup app/data in a backup site</li><li>Launch systems upon a disaster</li></ul>Cold Standby: downtime of hours – days (large data loss)<br /><ul><li>Ship backup to offsite</li><li>Hardware is not already set up</li><li>Recover systems after disaster</li></ul>Cost of cold and warm is comparable<br />134<br />
    165. 165. Automated Business Continuity<br />For standard application stacks, automatically builds a backup site in cloud and keeps in sync<br />Given application architecture/implementation, suggests the best DR solutions<br />In-house<br />Failover<br />Config, launch only propagate changes<br />Config, replicate always<br />Build or pick, config<br />135<br />
    166. 166. 2. Hybrid Cloud Control Centre<br />Extensible architectures supporting various plug-ins<br />Diagnose and suggest optimal system configurations<br />Auto generation of reconfiguration workflows<br /><ul><li>Integrated monitoring across local and remote public clouds
    167. 167. Works with existing enterprise monitoring and mgmt tools</li></ul>6/24/2011<br />136<br />
    168. 168. What Is Cloudburst?<br /><ul><li>Dynamic reconfiguration of applications to use a public cloud when a private cloud cannot provide enough computing resources</li></ul>Spikes in demand for App. C but your private cloud has no resources!<br />Cloudburst reconfiguration: rent computing resources in public cloud(s) and replicate App. C to meet the (short-time) demand<br />If App. C has huge amount of data or has sensitive data to transfer<br />Figure: Private Cloud (Application A, Application B, Application C) and Public Cloud (Application C)<br />137<br />
    169. 169. 1: Monitoring Cloud Applications<br />Cloud management tools should monitor performance of cloud(s) and support writing of rules to trigger cloudburst<br />Problem: Many limitations to existing tools<br />Difficult to come up with appropriate rules manually<br />Rules do not automatically adapt to changes over time<br />No way to ensure quality of these rules<br />Our solutions:<br />Generate rules automatically from historical data<br />Reconfigure rules automatically over time<br />Provide guarantees on the quality of generated rules<br />138<br />
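The slides do not describe how rules are generated from historical data. As a loose sketch of the general idea only, the example below derives a utilisation threshold from past samples and fires a cloudburst trigger when recent samples stay above it. The percentile rule, the function names and the data are all assumptions, and the sketch offers none of the quality guarantees the slide mentions.

```python
def derive_threshold(history, percentile=80):
    """Pick the utilisation value at the given percentile of past samples."""
    if not history:
        raise ValueError("history must not be empty")
    ordered = sorted(history)
    index = min(percentile * len(ordered) // 100, len(ordered) - 1)
    return ordered[index]


def should_burst(recent, threshold, min_consecutive=3):
    """True when the last min_consecutive samples all exceed the threshold."""
    tail = recent[-min_consecutive:]
    return len(tail) == min_consecutive and all(v > threshold for v in tail)


# Hypothetical CPU utilisation samples (percent) from the private cloud.
history = [40, 45, 50, 55, 60, 62, 65, 70, 75, 90]
threshold = derive_threshold(history)
print(threshold, should_burst([70, 80, 85, 92], threshold))
```

Re-running `derive_threshold` on a sliding window of recent history is one simple way a rule could adapt over time.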
    170. 170. 2: Determining a Reconfiguration<br />Many possible ways to reconfigure applications<br />Scale-up/down? Scale-out/back? Which application components to migrate to a public cloud?<br />Problem: Difficult to find the best reconfiguration(s) due to conflicting objectives<br />Performance and cost of applications after reconfiguration<br />Time and cost to reconfigure applications<br />Our solutions:<br />Analyse trade-offs of possible reconfigurations with respect to performance, cost and time requirements<br />Determine series of steps for automatic reconfiguration<br />139<br />
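One common way to analyse trade-offs between conflicting objectives is to keep only the Pareto-optimal candidates, that is, those not beaten on every objective by another candidate. The sketch below applies that idea to performance, cost and reconfiguration time. It is a generic technique, not necessarily the one used by the authors, and the option names and figures are invented for illustration.

```python
from typing import NamedTuple


class Option(NamedTuple):
    name: str
    response_ms: float       # performance after reconfiguration, lower is better
    monthly_cost: float      # running cost after reconfiguration
    reconfig_minutes: float  # time needed to carry out the reconfiguration


def dominates(a, b):
    """True if a is no worse than b on every objective and better on at least one."""
    pairs = list(zip(a[1:], b[1:]))
    return all(x <= y for x, y in pairs) and any(x < y for x, y in pairs)


def pareto_front(options):
    """Keep the options that no other option dominates."""
    return [o for o in options if not any(dominates(p, o) for p in options)]


# Hypothetical candidate reconfigurations.
options = [
    Option("scale-up", 200, 500, 10),
    Option("scale-out", 150, 700, 20),
    Option("burst-one-tier", 120, 900, 45),
    Option("burst-everything", 150, 950, 60),
]
front = [o.name for o in pareto_front(options)]
print(front)
```

The remaining options still need a final choice driven by the application's performance, cost and time requirements.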
    171. 171. 3: Selecting Cloud Technologies and Architectures<br />Many cloud technologies and architectures with different characteristics. E.g., for data storage:<br />RDB: strong consistency, low scalability<br />Distributed RDB + cache: high scalability, high maintenance<br />Key-value stores: low consistency, high scalability<br />Problem: Difficult to select appropriate technologies & architectures satisfying applications’ requirements<br />Consistency level, data portability, scalability, throughput, …<br />Our solutions:<br />Determine the best mixture of cloud technologies (e.g., data storage) and architectures depending on requirements<br />140<br />
    172. 172. Adaptive Cloud Technologies<br />Extensible monitoring engine across local and cloud<br />Diagnose and suggest optimal system configurations<br />Auto generation of reconfiguration workflows<br /><ul><li>Integrated with existing enterprise monitoring and management tools
    173. 173. SPEC standardisation lead for cloud computing
    174. 174. Adaptation engine patent pending</li></ul>3. Cloud Computing Cost Estimator<br />Application Profile<br /><ul><li>Resource consumption per business transaction
    175. 175. Daily, weekly, monthly, yearly usage patterns
    176. 176. Possible deployment locations - US, EU, Asia or Australia</li></ul>Live Usage Pattern or “What-If” Scenarios<br />IT Administrator<br />System Monitoring<br />(ACT Monitor) <br />Cloud Computing Providers<br />Cloud Cost Estimator<br /><ul><li>Calculate operating cost of applications</li></ul>Knowledge base on<br />cost model, SLA, …<br /><ul><li>Total operating cost on each vendor
    177. 177. Monthly cost and break-down</li></ul>Estimated Operating Cost<br />
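The cost estimator combines an application profile (resource consumption per business transaction), a usage pattern and each vendor's cost model. The sketch below shows that calculation in its simplest form. The vendor names, prices and profile values are made up, and the three-part cost model (compute, data transfer, storage) is an assumption rather than the tool's actual model.

```python
def monthly_cost(transactions, profile, prices):
    """Break down the estimated monthly operating cost for one vendor."""
    cpu_hours = transactions * profile["cpu_seconds_per_txn"] / 3600
    breakdown = {
        "compute": cpu_hours * prices["per_cpu_hour"],
        "transfer": transactions * profile["gb_out_per_txn"] * prices["per_gb_out"],
        "storage": profile["gb_stored"] * prices["per_gb_month"],
    }
    breakdown["total"] = sum(breakdown.values())
    return breakdown


# Hypothetical application profile and vendor price lists.
profile = {"cpu_seconds_per_txn": 0.5, "gb_out_per_txn": 0.0001, "gb_stored": 200}
vendors = {
    "vendor_a": {"per_cpu_hour": 0.10, "per_gb_out": 0.10, "per_gb_month": 0.10},
    "vendor_b": {"per_cpu_hour": 0.12, "per_gb_out": 0.05, "per_gb_month": 0.08},
}

# A "what-if" scenario: 3.6 million business transactions in a month.
estimates = {name: monthly_cost(3_600_000, profile, p) for name, p in vendors.items()}
cheapest = min(estimates, key=lambda name: estimates[name]["total"])
for name, est in estimates.items():
    print(name, {k: round(v, 2) for k, v in est.items()})
print("cheapest:", cheapest)
```

Swapping the transaction count for a live usage pattern, or for several what-if values, gives the per-vendor monthly cost and break-down the slide describes.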
    178. 178. Standing on the shoulders of giants<br />The team<br />Hiroshi Wada, Kevin Lee, Adnene Guabtni, Sherif Sakr, Alan Fekete, Quanqing Xu, Sean Xiong, Bruce McCabe, Jacky Keung, Paul Bannerman, Liang Zhao, Sadeka Islam, Van Tran, Xiaomin Wu…<br />
    179. 179. Getting Involved<br />Linkage with National ICT Australia<br />Research Collaboration<br />Researcher exchanges<br />Expert Advisory Services, Architecture Reviews<br />Public and In-house Training Courses <br />Market Surveys, Case Studies<br />Professional in Research Residence<br />Anna.Liu@nicta.com.au, @annaliu<br />http://blogs.unsw.edu.au/annaliu/<br />
    180. 180.
    181. 181. 146<br />Alternative Architecture of a Hybrid Dev Environment (Non-VPN based)<br />NICTA Corporate Network<br />Internet<br />Remote-desktop to XX.XX.0.*<br />(Possible direct access to Amazon VPC)<br />Amazon Cloud (US-East Datacenter)<br />Secure connection (e.g., SSL)<br />Enterprise Data store<br />Authentication server<br />Business Web application<br />On-Premise Servers<br />Virtual Machines<br />Private Cloud (Isolated Network)<br />Only accessible from NICTA<br />Isolated Network in Amazon<br />
    182. 182. 147<br />Alternative Architecture of a Hybrid Dev Environment (contd)<br />Characteristics of a non-VPN based architecture:<br />Simpler to set up and more light-weight<br />No special hardware required<br />Preserves isolated network in Amazon (i.e., cloud hosts with private IPs)<br />VPC host can directly access the internet<br />Assign elastic IP (i.e., public IP) to VPC host if internet access is required<br />Arguably less secure (because there are two firewalls to take care of)<br />Yields better throughput to internet hosts (because no rerouting through in-house network)<br />Suitable for applications with fewer connection points between in-house and cloud<br />
    183. 183. Bondi Beach<br />
