Cnam Azure 2014 storage
 


Course material on Azure storage, taught at Cnam by Aymeric Weinbach

  • Block blobs: suited to data "streaming". Page blobs: suited to data with random read/write access.

Cnam Azure 2014 storage: Presentation Transcript

  • Cnam Azure ZeCloud 17/01/2014
  • SQL Azure Database • SQL Azure hosts one or more databases • [Diagram: applications connected to their databases]
  • Implementation • Applications use the standard SQL access libraries: ODBC, ADO.NET, PHP, … • They connect over the Internet using TDS (TCP) • Load balancers (LB) spread the load across the TDS gateways, taking session affinity into account • Gateway: TDS protocol gateway, enforces AUTHN/AUTHZ policy; proxy to backend SQL • Backend SQL nodes: Scalability and Availability through Fabric, Failover, Replication, and Load balancing
  • SQL Azure: SQL Server in the cloud, with its advantages • Simple provisioning: via the portal or via the REST API • High availability • Load balancing • TDS protocol (the same as SQL Server) for everything else, over SSL (encrypted)
  • Differences from SQL Server • No access to anything physical (filegroups, …) • No CLR • No distributed transactions • No Service Broker
  • Developing with SQL Azure • Implement a retry policy • Bandwidth is billed, so use the following whenever possible: lazy loading and caching
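The retry policy recommended on the previous slide can be sketched as follows. This is a minimal illustration in Python, not the course's own code: `TransientError`, `with_retry` and the delay values are hypothetical names and numbers chosen for the example, and a real application would map its driver's timeout and throttling errors onto the retried exception type.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (timeouts, throttling)."""


def with_retry(operation, max_attempts=4, base_delay=0.1, sleep=time.sleep):
    """Call operation(); on TransientError, wait and retry with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error to the caller
            # Double the delay on each attempt and add jitter so that many
            # clients do not all retry at the same instant.
            delay = base_delay * (2 ** (attempt - 1))
            sleep(delay + random.uniform(0, delay / 2))
```

Only the operations that are safe to repeat should be wrapped this way; the `sleep` parameter exists so the waiting can be replaced in tests.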
  • Windows Azure Storage • Cloud Storage - Anywhere and anytime access • Blobs, Disks, Tables and Queues • Highly Durable, Available and Massively Scalable • Easily build “internet scale” applications • 8.5 trillion stored objects • 900K request/sec on average (2.3+ trillion per month) • Pay for what you use • Exposed via easy and open REST APIs • Client libraries in .NET, Java, Node.js, Python, PHP, Ruby
  • Abstractions – Blobs and Disks
  • Abstractions – Tables and Queues
  • Blobs • URL: http://<account>.blob.core.windows.net/<container>/<blobname> • Hierarchy: Account, Container, Blob • Example: account cohowinery has a container images (PIC01.JPG, PIC02.JPG) and a container videos (VID1.AVI)
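The account/container/blob addressing above can be made concrete with a small helper. This is a sketch only: `blob_url` is a hypothetical function written for this example, not part of any Azure SDK, and it simply fills in the URL pattern shown on the slide.

```python
from urllib.parse import quote


def blob_url(account, container, blob_name, scheme="http"):
    """Build a blob URL following <account>.blob.core.windows.net/<container>/<blobname>."""
    # quote() percent-encodes characters that are not URL-safe (spaces, etc.)
    # and leaves "/" alone, so virtual folder paths in blob names survive.
    return f"{scheme}://{account}.blob.core.windows.net/{container}/{quote(blob_name)}"


# Example with the names used on the slide.
print(blob_url("cohowinery", "images", "PIC01.JPG"))
```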
  • Blob Storage: for storing your files, small or very large • Block blobs for image files, video, etc.: 200 GB max • Page blobs, optimized for fast read/write: 1 TB max • Azure Drives: an NTFS disk that you can "mount" in your role and that is automatically saved in a page blob
  • CDN with smooth streaming for videos • Blobs live in containers • Public or private access • Snapshot • Shared access signature • Lease
  • Table Storage • A single index: the PartitionKey/RowKey pair • Transactions are possible within a single partition • OData + authentication • Open-source .NET SDK: https://github.com/WindowsAzure/azure-sdk-for-net • REST API • Non-relational table • Flexible schema (several schema versions can coexist in the same table)
  • Queue typical usage • 1) Web Role (ASP.NET, WCF, etc.) receives work • 2) Web Role puts message in queue • 3) Worker Role (main() { … }) gets message from queue • 4) Worker Role does work • 5) Worker Role deletes message from queue
  • Data centers
  • Windows Azure Data Storage Concepts • Account • Container, holding Blobs: https://<account>.blob.core.windows.net/<container> • Table, holding Entities: https://<account>.table.core.windows.net/<table> • Queue, holding Messages: https://<account>.queue.core.windows.net/<queue>
  • How is Azure Storage used by Microsoft?
  • Internals
  • Design Goals • Highly Available with Strong Consistency: provide access to data in face of failures/partitioning • Durability: replicate data several times within and across regions • Scalability: need to scale to zettabytes; provide a global namespace to access data around the world; automatically scale out and load balance data to meet peak traffic demands • Additional details can be found in the SOSP paper: “Windows Azure Storage: A Highly Available Cloud Storage Service with Strong Consistency”, ACM Symposium on Operating Systems Principles (SOSP), Oct. 2011
  • Windows Azure Storage Stamps • Access blob storage via the URL: http://<account>.blob.core.windows.net/ • [Diagram: the Storage Location Service and data access in front of two storage stamps; each stamp contains a load balancer (LB), Front-Ends, a Partition Layer and a DFS Layer; labels: intra-stamp replication, inter-stamp (geo) replication]
  • Architecture Layers inside Stamps • [Diagram labels: Partition Layer, Index]
  • Availability with Consistency for Writing (Partition Layer) • All writes are appends to the end of a log, which is an append to the last extent in the log • Write Consistency across all replicas for an extent: appends are ordered the same across all 3 replicas for an extent (file); only return success if all 3 replica appends are committed to storage; when an extent gets to a certain size or on write failure/LB, seal the extent’s replica set and never append any more data to it • Write Availability, to handle failures during write: seal the extent’s replica set; append immediately to a new extent (replica set) on 3 other available nodes; add this new extent to the end of the partition’s log (stream)
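The append, seal and continue behaviour described above can be illustrated with a toy in-memory model. This is a heavily simplified sketch and not the storage service's implementation: the `Extent` and `Stream` classes, the `failed` set used to simulate node failures and the `max_records` limit are all assumptions made for the example, and the failure check happens before the append rather than in the middle of it.

```python
class Extent:
    """A replica set: the same ordered records kept on 3 nodes."""

    def __init__(self, nodes):
        self.replicas = {node: [] for node in nodes}
        self.sealed = False


class Stream:
    """A partition's log: an ordered list of extents; only the last one accepts appends."""

    def __init__(self, all_nodes, max_records=4):
        self.all_nodes = list(all_nodes)
        self.max_records = max_records
        self.failed = set()  # nodes currently failing writes (simulated)
        self.extents = [self._new_extent()]

    def _new_extent(self):
        healthy = [n for n in self.all_nodes if n not in self.failed]
        if len(healthy) < 3:
            raise RuntimeError("need 3 available nodes for a new extent")
        return Extent(healthy[:3])

    def append(self, record):
        extent = self.extents[-1]
        full = len(next(iter(extent.replicas.values()))) >= self.max_records
        broken = any(node in self.failed for node in extent.replicas)
        if full or broken:
            # Seal the current replica set and continue on a fresh extent.
            extent.sealed = True
            extent = self._new_extent()
            self.extents.append(extent)
        # Success is reported only after every replica holds the record,
        # in the same order on all of them.
        for log in extent.replicas.values():
            log.append(record)
        return True
```

Because sealed extents are never modified again, every replica of an extent stays identical, which is the property the read path relies on.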
  • Availability with Consistency for Reading (Partition Layer) • Read Consistency: can read from any replica, since data in each replica for an extent is bitwise identical • Read Availability: send out parallel read requests if the first read is taking longer than the 95th-percentile latency
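The read-availability rule above (start extra reads when the first one is slow) is sometimes called a hedged read. The following Python sketch shows the idea under stated assumptions: `hedged_read` and its `hedge_after` threshold are hypothetical, replicas are modelled as plain callables returning identical bytes, and the threshold would in practice come from measured latency percentiles.

```python
import concurrent.futures as cf


def hedged_read(replicas, hedge_after):
    """Read from the first replica; if it is slow, also ask the others.

    replicas: list of zero-argument callables that all return the same data.
    hedge_after: seconds to wait before sending the parallel reads.
    """
    pool = cf.ThreadPoolExecutor(max_workers=len(replicas))
    try:
        futures = [pool.submit(replicas[0])]
        done, _ = cf.wait(futures, timeout=hedge_after)
        if not done:
            # The first read exceeded the latency threshold: read in parallel
            # from the remaining replicas and take whichever answers first.
            futures += [pool.submit(read) for read in replicas[1:]]
            done, _ = cf.wait(futures, return_when=cf.FIRST_COMPLETED)
        return next(iter(done)).result()
    finally:
        pool.shutdown(wait=False)  # do not block on the slower reads
```

Taking any completed answer is only safe because the replicas of an extent are bitwise identical.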
  • Dynamic Load Balancing – Partition Layer • Spreads index/transaction processing across partition servers • Master monitors traffic load/resource utilization on partition servers • Dynamically load balance partitions across servers to achieve better performance/availability • Does not move data around, only reassigns what part of the index a partition server is responsible for
  • Dynamic Load Balancing – DFS Layer • DFS read load balancing across replicas • Monitor latency/load on each node/replica; dynamically select which replica to read from and start additional reads in parallel based on 95th-percentile latency
  • Architecture Summary • Durability: All data stored with at least 3 replicas • Consistency: All committed data across all 3 replicas are identical • Availability: Can read from any of the 3 replicas; if there are any issues writing, seal the extent and continue appending to a new extent • Performance/Scale: Retry based on 95th-percentile latencies; auto scale out and load balance based on load/capacity • Additional details can be found in the SOSP paper: “Windows Azure Storage: A Highly Available Cloud Storage Service with Strong Consistency”, ACM Symposium on Operating Systems Principles (SOSP), Oct. 2011
  • What’s Coming
  • What’s Coming by end of 2013 • Geo-Replication: Queue Geo-Replication, Secondary Read-Only Access • Windows Azure Import/Export • Real-Time Metrics for Blobs, Tables and Queues • CORS for Azure Blobs, Tables and Queues • JSON for Azure Tables • New .NET 2.1 Library
  • Two Types of Durability Offered • Local Redundant Storage Accounts: maintain 3 copies of data within a given region; ~ 2/3 price of Geo Redundant Storage • Geo Redundant Storage Accounts: maintain 6 copies of data spread over 2 regions at least 400 miles apart from each other (3 copies are kept at each region)
  • Geo Redundant Storage • Data geo-replicated across regions 400+ miles apart: provide data durability in face of potential major regional disasters; provided for Blob, Tables and Queues (NEW) • User chooses primary region during account creation: each primary region has a predefined secondary region • Asynchronous geo-replication: off critical path of live requests • [Map: geo-replication links between regions; regions shown: North Central US, South Central US, East US, West US, North Europe, Europe West, East Asia, South East Asia]
  • Geo-Rep & Geo-Failover • Existing URL works after failover • Failover Trigger – failover would only be used by MS if primary could not be recovered • Asynchronous Geo-replication – may lose recent updates during failover • Typically geo-replicate data within minutes, though no SLA for how long it will take • [Diagram: DNS lookup of account.blob.core.windows.net through Azure DNS, data access, geo-replication between East US and West US, and a DNS update on failover]
  • Geo Redundant Storage Roadmap • Customer Controlled Failover (Future) • Provide APIs to allow clients to switch the primary and secondary regions for a storage account • Queue Geo-Replication (Done) • Secondary Read Only Access (by end of CY13)
  • Secondary Read-Only Access – Scenarios • Read-only access to data even if primary is unavailable: access to an eventually consistent copy of the data in the other region • Provides another read source for geographically distributed applications/customers: allows lower latency access to data in secondary region; have compute at both primary and secondary region and use the storage stored in that region • For these, the application semantics need to allow for eventually consistent reads
  • Secondary RO Access – How it Works • Customers using Geo Redundant Storage can opt to have read-only access to the eventually consistent copy of the data on the Secondary tenant • Customers choose the primary region, and the secondary region is fixed • Get two endpoints for accessing your storage account: primary endpoint accountname.<service>.core.windows.net and secondary endpoint accountname-secondary.<service>.core.windows.net • Same storage keys work for both endpoints • Consistency: all writes go to the Primary; reads to Primary are strongly consistent; reads to Secondary are eventually consistent
  • Secondary RO Access – Capabilities Application will be able to control which location they read data from • Use one of the two endpoints • Primary: accountname.<service>.core.windows.net • Secondary: accountname-secondary.<service>.core.windows.net • Our client library SDK will provide features for reading from the secondary • PrimaryOnly, SecondaryOnly, PrimaryThenSecondary, etc Application will be able to query the current max geo-replication delay for each service (blob, table, queue) for a storage account There will be separate storage account metrics for primary and secondary locations
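The two endpoints and the PrimaryThenSecondary read mode named above can be sketched as follows. This is an illustration only and does not use the client library: `endpoints`, `read_primary_then_secondary` and the caller-supplied `fetch` callable are hypothetical, and treating `ConnectionError` as the signal that the primary is unavailable is an assumption of the example.

```python
def endpoints(account, service):
    """Return the (primary, secondary) host names for a storage service."""
    primary = f"{account}.{service}.core.windows.net"
    secondary = f"{account}-secondary.{service}.core.windows.net"
    return primary, secondary


def read_primary_then_secondary(account, service, fetch):
    """Try the primary first; fall back to the read-only secondary.

    fetch(host) is supplied by the caller and performs the actual read.
    Returns (data, location) so the caller knows whether the data is
    strongly consistent ("primary") or eventually consistent ("secondary").
    """
    primary, secondary = endpoints(account, service)
    try:
        return fetch(primary), "primary"
    except ConnectionError:
        return fetch(secondary), "secondary"
```

Returning the location alongside the data lets the application decide whether a possibly stale answer is acceptable, which is the condition the scenarios slide insists on.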
  • Windows Azure Import/Export • Move TBs of data into and out of Windows Azure Blobs by shipping disks
  • Import Process Support Staff
  • Export Process Support Staff
  • Import/Export Features • Accessible via REST with Portal integration • Each Job imports/exports data for a single storage account • Each Job can be up to 10 disks • Support 3.5” SATA HDDs • All Disks must be encrypted with BitLocker
  • $MetricsRealtimeTransactionsBlob, $MetricsRealtimeTransactionsTable and $MetricsRealtimeTransactionsQueue
  • [Chart: Average of TransactionCount and Average of TPS, hourly, from 6/24/2013 to 6/25/2013]
  • [Chart: Average of TransactionCount and Average of TPS, every 3 minutes, from 6/24/2013 0:00 to 6/24/2013 1:00]
  • <RealtimeMetrics> <Version>1.0</Version> <Enabled>true</Enabled> <IncludeAPIs>true</IncludeAPIs> <RetentionPolicy> <Enabled>true</Enabled> <Days>7</Days> </RetentionPolicy> </RealtimeMetrics>
  • CORS (Cross Origin Resource Sharing) • What? Browsers by default prevent scripts from accessing resources from a different origin; CORS is a mechanism that enables cross-origin access for scripts; set CORS rules via SetServiceProperties for Blobs, Tables and Queues; can control the origins that can access resources; can control the headers that can be accessed • Why? Web apps do not need to run a proxy service to access the storage service
  • CORS Settings • <Cors> <CorsRule> <AllowedMethods>GET,PUT</AllowedMethods> <AllowedOrigins>*</AllowedOrigins> <AllowedHeaders>*</AllowedHeaders> <ExposedHeaders>*</ExposedHeaders> <MaxAgeInSeconds>180</MaxAgeInSeconds> </CorsRule> </Cors>
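To show what such a rule expresses, the sketch below parses the settings from the slide and checks whether a given origin and HTTP method would be allowed. It is a simplified reading of the rule written for illustration: `is_allowed` is a hypothetical helper, it ignores headers and max age, and it is not the matching algorithm the storage service actually runs.

```python
import xml.etree.ElementTree as ET

# The CORS settings shown on the slide.
CORS_XML = """
<Cors>
  <CorsRule>
    <AllowedMethods>GET,PUT</AllowedMethods>
    <AllowedOrigins>*</AllowedOrigins>
    <AllowedHeaders>*</AllowedHeaders>
    <ExposedHeaders>*</ExposedHeaders>
    <MaxAgeInSeconds>180</MaxAgeInSeconds>
  </CorsRule>
</Cors>
"""


def is_allowed(cors_xml, origin, method):
    """Return True if any rule permits this origin and HTTP method."""
    for rule in ET.fromstring(cors_xml).iter("CorsRule"):
        methods = [m.strip() for m in rule.findtext("AllowedMethods", "").split(",")]
        origins = [o.strip() for o in rule.findtext("AllowedOrigins", "").split(",")]
        if method in methods and ("*" in origins or origin in origins):
            return True
    return False
```

With the rule above, a script served from any origin may issue GET and PUT requests, while other methods such as DELETE are not covered.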
  • JSON (JavaScript Object Notation) • What? A popular concise format for REST protocols; OData supports two formats: AtomPub (we currently support this, but it is too verbose) and JSON (OData has released multiple flavors of JSON) • Why? Improves COGS for applications: lower bandwidth consumption (approx. 70% savings), lower CPU utilization and hence better responsiveness; many applications use JSON to represent their object model: efficient object data model to wire protocol
  • .NET Library 2.1 • New Features: Async Task methods with support for cancellation; Byte Array, Text, File upload/download APIs for blobs; IQueryable provider for Tables; Compiled Expressions for Table Entities • Performance Improvements: Buffer Pooling; Multi-Buffer Memory Stream for consistent performance when buffering unknown length data; .NET MD5 now default (~20% faster than invoking native one) • Available Soon @ http://www.nuget.org/packages/WindowsAzure.Storage
  • Demo – CORS, JSON and Realtime Metrics
  • Best Practices – Account, Blobs, Tables and Queues
  • General .NET Best Practices For Azure • Disable Nagle for small messages (< 1400 b): ServicePointManager.UseNagleAlgorithm = false; • Disable Expect 100-Continue*: ServicePointManager.Expect100Continue = false; • Increase default connection limit: ServicePointManager.DefaultConnectionLimit = 100; (Or More) • Take advantage of .NET 4.5 GC: GC performance is greatly improved; Background GC: http://msdn.microsoft.com/en-us/magazine/hh882452.aspx
  • General Best Practices • Locate Storage accounts close to compute/users • Understand Account Scalability targets: use multiple storage accounts to get more; distribute your storage accounts across regions • Cache critical data sets: as a backup data set to fall back on; to get more request/sec than the account/partition targets • Distribute load over many partitions and avoid spikes
  • General Best Practices (cont.) • Use HTTPS • Optimize what you send & receive: Blobs (Range reads, Metadata, Head Requests); Tables (Upsert, Merge, Projection, Point Queries); Queues (Update Message, Batch size) • Control Parallelism at the application layer: unbounded parallelism can lead to high latencies and throttling
  • General Best Practices (cont.) • Enable Logging & Metrics on each storage service: can be done via REST, Client API, or Portal; enables clients to self-diagnose issues, including performance-related ones; data can be automatically GC’d according to a user-specified retention interval • For example, have longer retention for hourly metrics and shorter retention for realtime metrics
  • Blob Best Practice • Try to match your read size with your write size: avoid reading small ranges on blobs with large blocks; CloudBlockBlob.StreamMinimumReadSizeInBytes / StreamWriteSizeInBytes • How do I upload a folder the fastest? Upload multiple blobs simultaneously • How do I upload a blob the fastest? Use parallel block upload • Concurrency (C): multiple workers upload different blobs • Parallelism (P): multiple workers upload different blocks for the same blob
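Parallel block upload (P) with a bounded number of workers can be sketched like this. The code is an in-memory illustration under stated assumptions: `split_blocks` and `upload_blob` are hypothetical helpers, and `put_block` and `commit` are caller-supplied stand-ins for the service's block upload and block-list commit operations rather than real SDK calls.

```python
from concurrent.futures import ThreadPoolExecutor


def split_blocks(data, block_size):
    """Cut a byte string into consecutive blocks of at most block_size bytes."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def upload_blob(data, put_block, commit, block_size=4, parallelism=3):
    """Upload the blocks of one blob in parallel, then commit them in order.

    put_block(block_id, block) and commit(block_ids) are supplied by the caller.
    max_workers bounds the parallelism, as the best practices recommend.
    """
    blocks = split_blocks(data, block_size)
    block_ids = [f"block-{i:06d}" for i in range(len(blocks))]
    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        # list() forces completion and re-raises any upload error.
        list(pool.map(put_block, block_ids, blocks))
    # Blocks may arrive in any order; the committed id list fixes the final order.
    commit(block_ids)
    return block_ids
```

Concurrency (C) is the same pattern one level up: a bounded pool in which each worker calls `upload_blob` for a different blob.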
  • Concurrency Vs. Blob Parallelism • XL VM uploading 512 blobs of 256 MB each (total upload size = 128 GB) • C=1, P=1 => averaged ~ 13.2 MB/s • C=1, P=30 => averaged ~ 50.72 MB/s • C=30, P=1 => averaged ~ 96.64 MB/s • P=30 vs. C=30: test completed almost twice as fast! • Single TCP connection is bound by TCP rate control & RTT • Single blob is bound by the limits of a single partition • Accessing multiple blobs concurrently scales • [Chart: Time (s), 0 to 10000, for the three configurations]
  • Blob Download • XL VM downloading 50 blobs of 256 MB each (total download size = 12.5 GB) • C=1, P=1 => averaged ~ 96 MB/s • C=30, P=1 => averaged ~ 130 MB/s • [Chart: Time (s), 0 to 140, for C=1, P=1 and C=30, P=1]
  • Table Best Practice • Critical Queries: Select PartitionKey, RowKey to avoid hotspots • Table Scans are expensive – avoid them at all costs for latency sensitive scenarios • Batch: Same PartitionKey for entities that need to be updated together • Schema-less: Store multiple types in same table • Single Index – {PartitionKey, RowKey}: If needed, concatenate columns to form composite keys • Entity Locality: {PartitionKey, RowKey} determines sort order; store related entities together to reduce IO and improve performance • Table Service Client Layer in 2.1: Dramatic performance improvements and better NoSQL interface
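The advice on composite keys and entity locality can be illustrated with a toy in-memory table. This is a sketch, not the table service: `composite_key` and the `Table` class are hypothetical, and the zero-padding reflects the fact that keys are strings and therefore sort lexicographically.

```python
def composite_key(*parts, width=8):
    """Concatenate columns into one key; integers are zero-padded so that
    the string sort order matches the numeric order."""
    return "_".join(f"{p:0{width}d}" if isinstance(p, int) else str(p) for p in parts)


class Table:
    """Toy table indexed only by (PartitionKey, RowKey)."""

    def __init__(self):
        self.rows = {}

    def upsert(self, partition_key, row_key, **properties):
        self.rows[(partition_key, row_key)] = properties

    def point_query(self, partition_key, row_key):
        # Both keys known: a direct index lookup, the cheapest query.
        return self.rows.get((partition_key, row_key))

    def partition_query(self, partition_key, row_key_prefix=""):
        # Stays inside one partition; results come back in RowKey order.
        keys = sorted(k for k in self.rows
                      if k[0] == partition_key and k[1].startswith(row_key_prefix))
        return [(k[1], self.rows[k]) for k in keys]
```

Entities that share a PartitionKey and a RowKey prefix end up next to each other, so a prefix query returns related entities together without scanning the whole table.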
  • Queue Best Practice • Make message processing idempotent: messages become visible again if the client worker fails to delete the message • Benefit from Update Message: extend visibility time based on message or save intermediate state • Message Count: use this to scale workers • Dequeue Count: use it to identify poison messages or validity of invisibility time used • Blobs to store large messages: increase throughput by having larger batches • Multiple Queues: to get more than a single queue (partition) target
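The interaction between visibility timeout, idempotent processing and dequeue count can be sketched with a toy queue. Everything here is an assumption made for illustration: the `Queue` class, `work_once`, the explicit `now` clock and the `max_dequeue` threshold are hypothetical and only mimic the behaviour described on the slide.

```python
import itertools


class Queue:
    """Toy queue with a visibility timeout and a per-message dequeue count."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.messages = []

    def put(self, body):
        self.messages.append({"id": next(self._ids), "body": body,
                              "visible_at": 0.0, "dequeue_count": 0})

    def get(self, now, visibility_timeout=30.0):
        for msg in self.messages:
            if msg["visible_at"] <= now:
                # Hide the message instead of removing it: it reappears
                # if the worker never deletes it.
                msg["visible_at"] = now + visibility_timeout
                msg["dequeue_count"] += 1
                return msg
        return None

    def delete(self, msg):
        self.messages.remove(msg)


def work_once(queue, now, handler, done, poison, max_dequeue=3):
    """Process at most one message; return False if none was visible."""
    msg = queue.get(now)
    if msg is None:
        return False
    if msg["dequeue_count"] > max_dequeue:
        poison.append(msg["body"])  # keeps failing: set it aside
        queue.delete(msg)
        return True
    if msg["id"] not in done:  # idempotence: skip work already done
        try:
            handler(msg["body"])
        except Exception:
            return True  # not deleted: visible again after the timeout
        done.add(msg["id"])
    queue.delete(msg)
    return True
```

The `done` set stands in for whatever durable record makes the handler safe to run twice; without it, a worker that crashes after doing the work but before deleting the message would cause the work to be repeated.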
  • Resources • Windows Azure Developer Website: http://www.windowsazure.com/en-us/develop/net/ • Windows Azure Storage Blog: http://blogs.msdn.com/b/windowsazurestorage/ • SOSP Paper/Talk: http://blogs.msdn.com/b/windowsazurestorage/archive/2011/11/20/windows-azure-storage-a-highly-available-cloud-storage-service-with-strong-consistency.aspx