[NetPonto] Architecture of the Windows Azure Platform Services


Windows Azure is a platform that provides highly available, scalable services. In this session we look at the architecture of the platform's base services (Compute, Storage, and SQL Azure) to understand how scalability and high availability are achieved. We also look at how it differs from "traditional" platforms and at some consequences for developing solutions for this environment.

Published in: Technology
  • Slide Objectives: Explain the differences and relationship between IaaS, PaaS, and SaaS in more detail.
    Speaking Points: Here’s another way to look at the cloud services taxonomy and how this taxonomy maps to the components in an IT infrastructure.
    Packaged Software: With packaged software, a customer is responsible for managing the entire stack, from the network connectivity to the applications.
    IaaS: With Infrastructure as a Service, the lower levels of the stack are managed by a vendor. Some of these components can be provided by traditional hosters; in fact, most of them have moved to a virtualized offering. Very few actually provide an OS. The customer is still responsible for managing the OS through the Applications. For the developer, an obvious benefit of IaaS is that it frees the developer from many concerns when provisioning physical or virtual machines. This was one of the earliest and primary use cases for Amazon Web Services Elastic Compute Cloud (EC2). Developers were able to readily provision virtual machines (from AMIs) on EC2, develop and test solutions and, often, run the results ‘in production’. The only requirement was a credit card to pay for the services.
    PaaS: With Platform as a Service, everything from the network connectivity through the runtime is provided and managed by the platform vendor. The Windows Azure Platform best fits in this category today. In fact, because we don’t provide access to the underlying virtualization or operating system today, we’re often referred to as not providing IaaS. PaaS offerings further reduce the developer burden by additionally supporting the platform runtime and related application services. With PaaS, the developer can, almost immediately, begin creating the business logic for an application. Potentially, the increases in productivity are considerable and, because the hardware and operational aspects of the cloud platform are also managed by the cloud platform provider, applications can be taken from an idea to reality very quickly.
    SaaS: Finally, with SaaS, a vendor provides the application and abstracts you from all of the underlying components.
  • Speaking Points: At PDC10 in just over a month, we will introduce several new services including Caching and Reporting. We will also have a new CTP for the Data Sync Service, and Project Dallas will finally be available. Let’s drill into these services in a bit more detail.
    Speaking Points: I suspect most if not all of you in this room are familiar with the Windows Azure Platform today. Today the platform consists of a set of foundational services: SQL Azure relational database; AppFabric provides services that can be used by any apps, whether hosted in Windows Azure, on-premises, or hosted in another environment.
    Questions: How many of you are building applications for Windows Azure? How many are using SQL Azure? How many are using the Access Control service today? The Service Bus?
    Notes: Windows Azure Story. We are building an open platform to run your applications in the cloud. Your apps are .NET, Java, PHP, etc. We love everyone. We are going to help you migrate your existing apps to the cloud. The cloud platform is the future. Enables scale, self-service, lowers friction, etc. We provide the best cloud platform for building new apps (aka n-tier, web services, etc.).
  • Slide Objective: Use this slide to transition into an explanation of SQL Azure Database (Reporting and Data Sync will be covered later). Explain at a high level how SQL Azure works.
    Speaker Notes: Design principle of SQL Azure: focus on combining the best features of SQL Server running at scale with low friction. SQL Azure is a high availability database. There are always three transactionally consistent replicas of the database: one primary replica and two slave replicas. Failure of a replica results in another replica being spun up immediately by the fabric. Failure of the primary replica means a slave replica becomes the primary and a new slave spins up. Down time is minimal, typically just a few dropped connections. The failover scenario is easy to code for: if you are doing good connection management and error handling, you will be fine. A clustered index is required on all tables to allow replication.
    Notes: Useful article from the SQL Azure team: http://msdn.microsoft.com/en-us/magazine/ee321567.aspx

    1. 1. 4th Coimbra Meeting - 11/02/2012 http://netponto.org Architecture of the Windows Azure Platform Services. Vítor Tomaz
    2. 2. Vítor Tomaz • ISEL – LEIC • Independent Consultant • NetPonto • AzurePT • Revista Programar • Portugal@Programar • SQLPort
    3. 3. “GOLD” Sponsors
    4. 4. “Silver” Sponsors
    5. 5. Agenda • Introduction • Datacenter and Windows Azure Architecture • Windows Azure Storage Architecture • SQL Azure Architecture
    6. 6. Deploying A Service Manually • Resource allocation – Machines must be chosen to host roles of the service – Procure additional hardware if necessary – IP addresses must be acquired • Provisioning – Machines must be set up – Virtual machines created – Applications configured – DNS setup – Load balancers must be programmed • Upgrades – Locate appropriate machines – Update the software/settings as necessary – Only bring down a subset of the service at a time • Maintaining service health – Software faults must be handled – Hardware failures will occur – Logging infrastructure is provided to diagnose issues. (Callout: This is ongoing work… you’re never done)
    7. 7. [Chart: capacity over time, comparing predicted capacity with actual capacity. Labels: available resources, too few resources, too many resources.]
    8. 8. [Chart: capacity over time, comparing predicted capacity with actual capacity under capacity on demand. Labels: scalability, elasticity, no wasted resources, low investment.]
    9. 9. Scalability • Costs
    10. 10. Stack layers, top to bottom: Applications, Data, Runtime, Middleware, O/S, Virtualization, Servers, Storage, Networking.
        Packaged Software: you manage every layer.
        Infrastructure (as a Service): you manage Applications, Data, Runtime, Middleware, O/S; managed by vendor: Virtualization, Servers, Storage, Networking.
        Platform (as a Service): you manage Applications, Data; managed by vendor: Runtime, Middleware, O/S, Virtualization, Servers, Storage, Networking.
        Software (as a Service): every layer is managed by vendor.
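The split of responsibilities on this slide can be written down as a small lookup. This is only an illustrative sketch: the layer names and the four models come from the slide, while `STACK`, `CUSTOMER_LAYERS`, and `responsibilities` are hypothetical names introduced here.

```python
# Layers of the stack, top to bottom, as listed on the slide.
STACK = ["Applications", "Data", "Runtime", "Middleware", "O/S",
         "Virtualization", "Servers", "Storage", "Networking"]

# How many layers, counted from the top, the customer manages in each model.
# (Hypothetical encoding of the slide's "You manage" / "Managed by vendor" bands.)
CUSTOMER_LAYERS = {
    "Packaged Software": 9,
    "Infrastructure (as a Service)": 5,
    "Platform (as a Service)": 2,
    "Software (as a Service)": 0,
}


def responsibilities(model):
    """Return who manages which layers for the given delivery model."""
    n = CUSTOMER_LAYERS[model]
    return {"you_manage": STACK[:n], "vendor_manages": STACK[n:]}
```

For example, `responsibilities("Platform (as a Service)")` leaves only Applications and Data with the customer, which is the point the speaker notes make about PaaS.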
    11. 11. Windows Azure Automation. [Diagram: the service owner states “what” is needed; the Fabric Controller “makes it happen”, working with switches, load balancers, and an agent on each node.]
    12. 12. Reporting Data Sync Virtual Network
    13. 13. [Diagram: System Center AppManager and the Windows Azure Portal sit above three Fabric Controllers, one per datacenter.]
    14. 14. Datacenter network. [Diagram: aggregation routers (AGG) and load balancers (LB) above top-of-rack switches (TOR); each rack has a power distribution unit (PDU).]
    15. 15. Datacenter network. [Diagram: aggregation routers (Agg) and load balancers (LB) above top-of-rack switches (TOR); under each TOR, a rack of nodes and a power distribution unit (PDU).]
    16. 16. [Diagram: analogy between a server and a datacenter. Server: Kernel, Process (Word, SQL Server). Datacenter: Fabric Controller, Service (Exchange Online, SQL Azure).]
    17. 17. TOR…PDU
    18. 18. [Diagram: node provisioning. Components shown: Fabric Controller, Windows Deployment Server, PXE Server, and an Image Repository (maintenance OS, parent OS, Windows Azure OS, and role images); on the node: Windows Azure Hypervisor, Windows Azure OS, and FC Host Agent.]
    19. 19. [Diagram: www.mycloudapp.net behind a Load Balancer. Role B: Worker Role, Count: 2, Update Domains: 2, Size: Medium.]
    20. 20. My Service
        Role: Front-End. Definition: Type: Web, VM Size: Large, Endpoints: External-1. Configuration: Instances: 3, Update Domains: 3, Fault Domains: 3.
        Role: Middle-Tier. Definition: Type: Worker, VM Size: Medium, Endpoints: Internal-1. Configuration: Instances: 2, Update Domains: 2, Fault Domains: 2.
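The service model on this slide (a definition plus a configuration per role) can be sketched as plain data. The values below are the ones on the slide; the `Role` class and its field names are hypothetical and are not the actual Windows Azure service definition or configuration file format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Role:
    # "Definition" part of the slide
    name: str
    role_type: str
    vm_size: str
    endpoints: tuple
    # "Configuration" part of the slide
    instances: int
    update_domains: int
    fault_domains: int


# The two roles of "My Service", exactly as listed on the slide.
MY_SERVICE = [
    Role("Front-End", "Web", "Large", ("External-1",), 3, 3, 3),
    Role("Middle-Tier", "Worker", "Medium", ("Internal-1",), 2, 2, 2),
]


def total_instances(roles):
    """Number of VM instances the fabric has to place for the service."""
    return sum(role.instances for role in roles)
```

Separating definition from configuration mirrors the slide: the instance count can change without touching the role's type or VM size.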
    21. 21. Physical node. [Diagram: three guest partitions, each running a role instance and a guest agent; a trust boundary separates them from the host partition, which runs the FC host agent. An image repository holds OS VHDs and role ZIP files. The Fabric Controller runs as one primary and several replicas.]
    22. 22. Role Virtual Machine. [Diagram: C: Resource Disk; Windows VHD; Role VHD.]
    23. 23. OS Volume • Resource Volume • Role Volume • Guest Agent • Role Host • Role Entry Point
    24. 24. [Diagram: www.mycloudapp.net behind a Load Balancer. Role B: Worker Role, Count: 2, Update Domains: 2, Size: Medium.]
    25. 25. Problem | How Detected | Fabric Response
        Role instance crashes | FC guest agent monitors role termination | FC restarts role
        Guest VM or agent crashes | FC host agent notices missing guest agent heartbeats | FC restarts VM and hosted role
        Host OS or agent crashes | FC notices missing host agent heartbeat | Tries to recover node; FC reallocates roles to other nodes
        Detected node hardware issue | Host agent informs FC | FC migrates roles to other nodes; marks node “out for repair”
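The table above is essentially a dispatch from failure type to fabric response. A minimal sketch of that mapping follows; the texts are taken from the slide, while the `Problem` enum and function names are hypothetical and do not describe the Fabric Controller's real implementation.

```python
from enum import Enum


class Problem(Enum):
    ROLE_INSTANCE_CRASH = "Role instance crashes"
    GUEST_VM_OR_AGENT_CRASH = "Guest VM or agent crashes"
    HOST_OS_OR_AGENT_CRASH = "Host OS or agent crashes"
    NODE_HARDWARE_ISSUE = "Detected node hardware issue"


# problem -> (how it is detected, list of fabric responses), as on the slide
HEALTH_TABLE = {
    Problem.ROLE_INSTANCE_CRASH: (
        "FC guest agent monitors role termination",
        ["FC restarts role"]),
    Problem.GUEST_VM_OR_AGENT_CRASH: (
        "FC host agent notices missing guest agent heartbeats",
        ["FC restarts VM and hosted role"]),
    Problem.HOST_OS_OR_AGENT_CRASH: (
        "FC notices missing host agent heartbeat",
        ["Tries to recover node", "FC reallocates roles to other nodes"]),
    Problem.NODE_HARDWARE_ISSUE: (
        "Host agent informs FC",
        ["FC migrates roles to other nodes", "Marks node 'out for repair'"]),
}


def fabric_response(problem):
    """Return the list of responses the slide gives for a problem."""
    return HEALTH_TABLE[problem][1]
```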
    26. 26. [Diagram: two fault domains, each a rack. The first rack hosts a Web Role and a Worker Role instance in upgrade (U/G) domain #1; the second rack hosts a Web Role and a Worker Role instance in upgrade (U/G) domain #2.]
    27. 27. [Diagram: Front-End instances (Front-End-1, Front-End-2) and Middle-Tier instances (Middle Tier-1, Middle Tier-2, Middle Tier-3) spread across Update Domain 1, Update Domain 2, and Update Domain 3.]
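The idea behind update domains, bringing down only a subset of the service at a time, can be sketched as follows. The round-robin assignment is an assumption made for illustration; the slides do not say how the Fabric Controller actually allocates instances to domains, and all names here are hypothetical.

```python
def assign_update_domains(instance_names, domain_count):
    """Spread instances over update domains (round-robin; an assumption)."""
    domains = [[] for _ in range(domain_count)]
    for position, name in enumerate(instance_names):
        domains[position % domain_count].append(name)
    return domains


def rolling_upgrade(domains, upgrade):
    """Upgrade one update domain at a time, so the rest keeps serving.

    `upgrade` is a caller-supplied callable applied to each instance name.
    Returns the (domain index, instance) pairs in the order they were upgraded.
    """
    order = []
    for index, members in enumerate(domains):
        for name in members:
            upgrade(name)
            order.append((index, name))
    return order
```

With three middle-tier instances and three update domains, each domain holds one instance, so an upgrade never takes more than one instance of the role offline.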
    28. 28. Production VIP – VIP1, <dnsname>.cloudapp.net: ports 80, 3389, 3390 map to Role A and Role B (Deployment A). Staging VIP – VIP2, <guid>.cloudapp.net: ports 80, 3389, 3390 map to Role A’ and Role B’ (Deployment A’).
    29. 29. Account > Container > Blobs: http://<account>.blob.core.windows.net/<container> • Account > Table > Entities: http://<account>.table.core.windows.net/<table> • Account > Queue > Messages: http://<account>.queue.core.windows.net/<queue>
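The three URL patterns on this slide share one shape, which a small helper makes explicit. The patterns are from the slide; `storage_url` and its parameter names are hypothetical.

```python
SERVICES = ("blob", "table", "queue")


def storage_url(account, service, resource=""):
    """Build http://<account>.<service>.core.windows.net/<resource>.

    `resource` is the container, table, or queue name from the slide.
    """
    if service not in SERVICES:
        raise ValueError(f"unknown storage service: {service!r}")
    return f"http://{account}.{service}.core.windows.net/{resource}"
```

For instance, `storage_url("myaccount", "blob", "pictures")` yields the container URL for a hypothetical account named `myaccount`.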
    30. 30. • Blobs • Tables • Queues • Drives
    31. 31. Design Goals
    32. 32. Access blob storage via the URL: http://<account>.blob.core.windows.net/. [Diagram: a Location Service and data access through load balancers (LB) into two storage stamps. Each stamp has Front-Ends, a Partition Layer, and a Stream Layer. Intra-stamp replication happens inside each stamp; inter-stamp (geo) replication happens between stamps.]
    33. 33. [Diagram: an incoming write request enters the Front End layer (FE) and is acknowledged (Ack) after passing through the Partition Layer (Partition Master, Lock Service, Partition Servers) and the Stream Layer (Extent Nodes, EN).]
    34. 34. FE FE FE FE FE
    35. 35. Partition Master • Partition Servers
    36. 36. Extent Nodes (EN) • Distributed, append-only file system • Data is stored in files (extents) • Every extent is replicated 3 times across different fault and upgrade domains • All data is checksummed • Data is re-replicated after a disk/node/rack failure or a checksum failure
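The placement rule on this slide (three replicas in different fault and upgrade domains) can be illustrated with a simple greedy selection. This is a sketch under stated assumptions: the greedy strategy and every name here are hypothetical, and the slides do not describe how Windows Azure Storage actually chooses extent nodes.

```python
def place_replicas(nodes, copies=3):
    """Pick `copies` nodes so that no two share a fault or upgrade domain.

    `nodes` is a list of (name, fault_domain, upgrade_domain) tuples.
    Greedy first-fit; it may fail on inputs where a smarter search succeeds.
    """
    chosen, used_fault, used_upgrade = [], set(), set()
    for name, fault_domain, upgrade_domain in nodes:
        if fault_domain in used_fault or upgrade_domain in used_upgrade:
            continue  # would share a failure or upgrade unit with a replica
        chosen.append(name)
        used_fault.add(fault_domain)
        used_upgrade.add(upgrade_domain)
        if len(chosen) == copies:
            return chosen
    raise RuntimeError("not enough distinct fault/upgrade domains")
```

Keeping replicas apart on both axes means neither a rack failure nor a rolling upgrade can take all copies of an extent offline at once.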
    37. 37. Partition Layer
    38. 38. Blob Index. [Diagram: within a storage stamp, the blob index is keyed by Account Name, Container Name, and Blob Name, running from aaaa/aaaa/aaaaa to zzzz/zzzz/zzzzz; sample rows: harry/pictures/sunrise, harry/pictures/sunset, richard/videos/soccer, richard/videos/tennis. The index is split into ranges, each served by one Partition Server: A-H: PS1, H’-R: PS2, R’-Z: PS3. The Partition Master holds the partition map, and the Front-End Server uses a copy of it to route requests.]
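Routing a request through the partition map amounts to a range lookup on the (account, container, blob) key. The sketch below assumes the three ranges are split at the sample rows shown on the slide; those exact boundaries, and all names in the code, are assumptions made for illustration.

```python
import bisect

# Inclusive upper bound of each range except the last (assumed boundaries).
UPPER_BOUNDS = [
    ("harry", "pictures", "sunrise"),   # end of range A-H, served by PS1
    ("richard", "videos", "soccer"),    # end of range H'-R, served by PS2
]
SERVERS = ["PS1", "PS2", "PS3"]         # PS3 serves the remaining range R'-Z


def partition_server(account, container, blob):
    """Return the partition server whose key range contains the blob."""
    key = (account, container, blob)
    # Index of the first range whose upper bound is >= key.
    return SERVERS[bisect.bisect_left(UPPER_BOUNDS, key)]
```

Because the key starts with the account name, all blobs of a small account tend to land on the same partition server, while a large account can span several ranges.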
    39. 39. Stream Layer
    40. 40. Stream //foo/myfile.data. [Diagram: the stream is a list of pointers (Ptr E1, Ptr E2, Ptr E3, Ptr E4) to Extent E1, Extent E2, Extent E3, and Extent E4; each extent is a sequence of blocks.]
    41. 41. Create Stream/Extent. [Diagram: the Partition Layer asks the Stream Master (SM, replicated with Paxos) to create a stream/extent; the SM allocates the extent replica set: EN1 Primary; EN2, EN3 Secondary (Secondary A, Secondary B).]
    42. 42. Append. [Diagram: the Partition Layer appends to the extent replica set (EN1 Primary; EN2, EN3 Secondary) and receives an Ack; the Stream Master (SM, Paxos) is not on the append path.]
    43. 43. Stream //foo/myfile.dat. [Diagram: the stream now points to five extents: Ptr E1 to Ptr E5, Extent E1 to Extent E5.]
    44. 44. Seal Extent. [Diagram: an append is in progress; the Stream Master (SM, Paxos) asks the extent nodes for the current length, receives 120 and 120, and seals the extent at 120. Nodes shown: EN 1 Primary, EN 2 Secondary A, EN 3 Secondary B, EN 4.]
    45. 45. Seal Extent. [Diagram: the extent is sealed at 120; a replica that missed the seal syncs with the SM to 120.]
    46. 46. Seal Extent. [Diagram: an append is in progress; the SM asks for the current length and receives 120 and 100; the extent is sealed at 100.]
    47. 47. Seal Extent. [Diagram: the extent is sealed at 100; the remaining replica syncs with the SM to 100.]
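The stream slides can be summarised in a few lines: a stream is an ordered list of extents, only the last extent accepts appends, and sealing fixes an extent's length. The sketch below is illustrative only. The rule "seal at the smallest length reported" is an assumption read off the two examples on the slides (120 and 120 give 120; 120 and 100 give 100), and every class and function name is hypothetical.

```python
class Extent:
    def __init__(self, name):
        self.name = name
        self.blocks = []
        self.sealed = False

    def append(self, block):
        if self.sealed:
            raise ValueError(f"extent {self.name} is sealed")
        self.blocks.append(block)


class Stream:
    """An ordered list of extent pointers; only the last extent is writable."""

    def __init__(self, name):
        self.name = name
        self.extents = [Extent("E1")]

    def append(self, block):
        self.extents[-1].append(block)

    def seal_and_add_extent(self):
        """Seal the current extent and start a new one at the end."""
        self.extents[-1].sealed = True
        self.extents.append(Extent(f"E{len(self.extents) + 1}"))


def seal_length(reported_lengths):
    """Length to seal at, given the lengths the reachable replicas report.

    Assumption: the smallest reported length is used, matching the slides.
    """
    return min(reported_lengths)
```

Sealing at the smallest reported length would guarantee that every reachable replica already holds all the data up to the seal point.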
    48. 48. Account • Server • Database
    49. 49. One logical database, three physical databases: Primary DB, Secondary 1, Secondary 2
    50. 50. Apps use standard SQL client libraries: ODBC, ADO.Net, PHP, … The load balancer forwards ‘sticky’ sessions to the TDS protocol tier. [Diagram: Application, Internet, TDS (tcp), LB, security boundary, Gateways, TDS (tcp), SQL nodes.] Scalability and Availability: Fabric, Failover, Replication, and Load balancing
    51. 51. Client Layer • Infrastructure Layer
    52. 52. Client Layer: PHP, ASP.NET, WCF Data Services, ODBC, ADO.NET • Tabular Data Stream (TDS)
    53. 53. Services Layer (TDS Gateway, TDS session) • Checks the commands (parser) • SSL handshake • “Denial of Service” guard • Validates access credentials • Validates the firewall rules • Maps the database name used by the client to the internal name • Creates the session between the physical database and the client • Acts as a proxy for the session
    54. 54. Platform Layer • Each node contains – A single SQL Server instance – With a single database instance (SQL DB) – With several partitions (up to 650) – Each partition is a SQL Azure database – Which can be primary or secondary • One SQL Azure Fabric instance – Failure detection – Reconfiguration Agent – Engine Throttling – Ring Topology – Partition Manager Location Resolution. [Diagram: Node 14 and Node 15, each with a SQL Instance hosting SQL DB with User DB1 to User DB4, plus SQL Azure Fabric.]
    55. 55. • Failure detection – Detects failures in a primary or secondary replica in order to trigger the Reconfiguration Agent • Reconfiguration Agent – Manages the re-establishment of replicas after a node failure • Engine Throttling – Manages resource usage • Ring Topology – Mechanism that helps with failure detection • Partition Manager Location Resolution – Manages communication with the Partition Manager
    56. 56. • Failure detection – A logical ring topology gives each machine two neighbouring machines that can detect failures on that machine. – Each transaction has to be committed by the primary and by at least one secondary • Reconfiguration – Hardware failure, operating system crash, problems in the SQL Server instance, updates (OS, SQL Server, SQL Azure)
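Two rules from this slide are easy to state in code: each machine in the logical ring has two neighbours that watch it, and a transaction counts as committed only when the primary and at least one secondary have committed it. A minimal sketch, with hypothetical function names:

```python
def ring_neighbours(index, node_count):
    """Indices of the two neighbours of a node in a logical ring."""
    return ((index - 1) % node_count, (index + 1) % node_count)


def is_committed(primary_committed, secondaries_committed):
    """True when the primary and at least one secondary have committed."""
    return bool(primary_committed) and any(secondaries_committed)
```

With one primary and two secondaries, this rule lets a transaction complete while one secondary is down or slow, which is what keeps a single replica failure from blocking writes.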
    57. 57. • Failure of the primary replica – The secondary replica with the least load becomes the primary – The client receives a disconnection – It may take 30 seconds to propagate the change to the gateways • Failure of a secondary replica – If the failure is permanent, a new secondary replica is created and the data is copied from the primary. – This copy is one of the main reasons for the database size limit in SQL Azure
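Since a primary failover shows up on the client as a dropped connection, and the change may take 30 seconds to reach the gateways, client code should be prepared to retry. The sketch below is a generic retry loop, not the API of any SQL client library: `run_with_retry`, its defaults, and the use of `ConnectionError` as the transient error type are all assumptions. `pick_new_primary` illustrates the slide's "least loaded secondary" rule with a hypothetical load mapping.

```python
import time


def run_with_retry(operation, attempts=5, delay_seconds=5.0,
                   transient=(ConnectionError,), sleep=time.sleep):
    """Call `operation`, retrying after a pause when a transient error occurs.

    `operation` should open its own connection, so that a retry reconnects
    and can be routed to the new primary.
    """
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except transient:
            if attempt == attempts:
                raise  # give up after the last attempt
            sleep(delay_seconds)


def pick_new_primary(load_by_secondary):
    """Return the name of the secondary replica with the least load."""
    return min(load_by_secondary, key=load_by_secondary.get)
```

With the hypothetical defaults, five attempts spaced five seconds apart leave 20 seconds of waiting, so a caller that wants to outlast the 30-second propagation window from the slide would pass a larger `attempts` or `delay_seconds`.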
    59. 59. Decoding Throttling Code 131075. Check the Transient Fault Handling Framework. [Diagram: bit-field positions 8 7 6 5 4 3 2 1 0.]
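The slide only names the code 131075, so the decoding below rests on an assumed layout, recalled from the SQL Azure documentation of the time and not confirmed by the slides: the two lowest bits give the throttling mode, and from bit 8 upwards each of nine resources gets two bits (01 soft, 10 hard). The resource order and all names in the code should be treated as assumptions to verify against the official documentation.

```python
# Assumed meaning of the two lowest bits.
MODES = {0: "no throttling", 1: "reject update/insert",
         2: "reject all writes", 3: "reject all"}

# Assumed resource order, two bits per resource starting at bit 8.
RESOURCES = ["physical database space", "physical log space",
             "log write IO delay", "data read IO delay", "CPU",
             "database size", "internal", "SQL worker threads",
             "internal (second slot)"]

TYPES = {1: "soft", 2: "hard"}


def decode_reason_code(code):
    """Split a throttling reason code into (mode, {resource: type})."""
    mode = MODES[code & 0b11]
    resource_bits = code >> 8
    throttled = {}
    for slot, resource in enumerate(RESOURCES):
        kind = (resource_bits >> (2 * slot)) & 0b11
        if kind in TYPES:
            throttled[resource] = TYPES[kind]
    return mode, throttled
```

Under these assumptions, 131075 (binary 10 0000 0000 0000 0011) decodes to mode "reject all" with hard throttling on CPU.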
    60. 60. • Scenario 1 – Customer A is using 30% CPU on a machine – Customer B kicks off a load of 70% additional CPU on the same machine – Customer B gets throttled • Scenario 2 – Customer A is using 70% CPU on a machine – Customer B kicks off a load of 30% additional CPU on the same machine – Customer A gets throttled • Scenario 3 – The machine has no active workload – Customer A kicks off a load of 100% CPU – Customer A gets throttled repeatedly
    61. 61. -- Space used by the current database, in MB (pages are 8 KB)
        select sum(reserved_page_count) * 8.0 / 1024 AS [Storage_in_MB]
        from sys.dm_db_partition_stats
    62. 62. -- Top 50 queries by total CPU (worker) time, with their text
        select highest_cpu_queries.total_worker_time,
               q.text AS [Query_Text],
               highest_cpu_queries.plan_handle
        from (select top 50 qs.plan_handle, qs.total_worker_time
              from sys.dm_exec_query_stats qs
              order by qs.total_worker_time desc) as highest_cpu_queries
        cross apply sys.dm_exec_sql_text(plan_handle) as q
        order by highest_cpu_queries.total_worker_time desc
    63. 63. -- Top 25 queries by total logical I/O, with per-execution averages
        select top 25
               (total_logical_reads / execution_count) as avg_logical_reads,
               (total_logical_writes / execution_count) as avg_logical_writes,
               (total_physical_reads / execution_count) as avg_phys_reads,
               execution_count, sql_handle, plan_handle
        from sys.dm_exec_query_stats
        order by (total_logical_reads + total_logical_writes) desc
    64. 64. “Silver” Sponsors
    65. 65. • 11/02/2012 – February (Coimbra). Save these dates in your calendar! :)
    66. 66. Vítor Tomaz • vitorbstomaz AT gmail.com • http://twitter.com/vitortomaz