NoSQL em Windows Azure Table Storage - Vitor Tomaz

942 views
826 views

Published on

Nesta sessão vamos analisar as características deste serviço fazer uma breve introdução à arquitectura que a suporta. Iremos verificar as considerações que devem ser tidas em conta na criação e utilização deste tipo de armazenamento, analisando o impacto que as decisões tomadas têm no que respeita a performance e objectivos de escalabilidade.

Serão ainda mostrados alguns exemplos de utilização em cenários distintos, incluindo algumas optimizações que se podem fazer para melhorar a performance.

Comunidade NetPonto, a comunidade .NET em Portugal!
http://netponto.org

Published in: Technology
0 Comments
0 Likes
Statistics
Notes
  • Be the first to comment

  • Be the first to like this

No Downloads
Views
Total views
942
On SlideShare
0
From Embeds
0
Number of Embeds
0
Actions
Shares
0
Downloads
9
Comments
0
Likes
0
Embeds 0
No embeds

No notes for slide
  • Slide Objectives:Explain the different Storage Libraries and languages that can be used to work with Windows Azure Storage. VALUE PROPProgrammatic access to the Blob, Queue, and Table services is available via the Windows Azure client libraries and the Windows Azure storage services REST API.Speaking Points:Windows Azure is an open cloud platform that enables you to quickly build, deploy and manage applications across a global network of Microsoft-managed datacenters.You can build applications using any language, tool or framework.Notes:
  • Slide ObjectivesUnderstand TablesVALUE PROPEnable customers to easily migrate, maintain, and monitor their existing SQL Server applications to Windows Azure VM role, and run them with competitive reliability, performance, and TCO characteristics.Speaker NotesThe Table service provides structured storage in the form of tables. The Table service supports a REST API that is compliant with the ADO.NET Data Services REST API. Developers may also use the .NET Client Library for ADO.NET Data Services to access the Table service.NotesWithin a storage account, a developer may create named tables. Tables store data as entities. An entity is a collection of named properties and their values, similar to a row. Tables are partitioned to support load balancing across storage nodes. Each table has as its first property a partition key that specifies the partition an entity belongs to. The second property is a row key that identifies an entity within a given partition. The combination of the partition key and the row key forms a primary key that identifies each entity uniquely within the table.
  • Slide ObjectivesUnderstand Flexible EntitiesVALUE PROPEnable customers to easily migrate, maintain, and monitor their existing SQL Server applications to Windows Azure VM role, and run them with competitive reliability, performance, and TCO characteristics.Speaker NotesTables store data as entities. A table can contain entities of any shapeThere is no fixed schemaThere is no schema checkingThere is no strong typing- not that Birthdate is stored as both a datetime value and as a stringNot that we can add additional columnsNoteshttp://msdn.microsoft.com/en-us/library/dd573356.aspx
  • Slide ObjectivesUnderstand the Windows Azure Storage scalability modelVALUE PROPWindows Azure Storage scales automatically to provide the best performanceSpeaker NotesFanout is automatic, handles by Windows AzureThe key here is “elasticity”. The ability to automatically scale based on load.Fanout is based on the load. Fanout isn’t immediate…Windows Azure will wait several seconds to ensure that the load is a true load and not just a temporary spikePartitioning is based on Partition Key – the choice of the partition key is criticalPartitions can be condensed when load increasesReads are load balanced against the three replicasNotes
  • Slide ObjectivesUnderstand the importance of Windows Azure Table scalability model and how Partition Key and Row Key are critical for table scalabilityVALUE PROPEnable customers to easily migrate, maintain, and monitor their existing SQL Server applications to Windows Azure VM role, and run them with competitive reliability, performance, and TCO characteristics.Speaker NotesTable entities represent the units of data stored in a table and are similar to rows in a typical relational database table. Each entity defines a collection of properties. Each property is key/value pair defined by its name, value, and the value's data type. Entities must define the following three system properties as part of the property collection:PartitionKey – The PartitionKey property stores string values that identify the partition that an entity belongs to. This means that entities with the same PartitionKey values belong in the same partition. Partitions, as discussed later, are integral to the scalability of the table.RowKey – The RowKey property stores string values that uniquely identify entities within each partition.NotesTables are partitioned to support load balancing across storage nodes. A table's entities are organized by partition. A partition is a consecutive range of entities possessing the same partition key value. The partition key is a unique identifier for the partition within a given table, specified by the PartitionKey property. The partition key forms the first part of an entity's primary key. The partition key may be a string value up to 1 KB in size.
  • Slide ObjectiveMore detail that Discusses horizontal partitioning in Windows Azure Table storageSpeaking notesUnderstanding the sequential nature of cross partition queries is importantContinuation tokens may be returned at any time (i.e. data comes back in multiple pages)You will always get a continuation token if you cross a hardware boundary- i.e. you move between partitions that sit on different nodesThe Storage API handles continuation tokens elegantly, but, it may mask a poor architecture- YOU DO NOT WANT TO RUN A QUERY THAT CROSSES HUNDRED OF SERVERS!Be aggressive with partitioning- if you’ll only ever query something by a single key use an empty Row key and a unique partition key for a partition of 1.Can also just use blob storage which is already partitioned by Blob nameNotesQueue storage is partitioned by Queue nameBlob storage is partitioned by Bob name (i.e. partition size of 1)http://www.syringe.net.nz/2009/08/08/SimplePartitioningWithWindowsAzureTableStorage.aspxhttp://nmackenzie.spaces.live.com/Blog/cns!B863FF075995D18A!417.entry Good article from Julie Lerman. Worth reading when discussing table storagehttp://msdn.microsoft.com/en-us/magazine/ff796231.aspx
  • Slide ObjectiveUnderstand why we need to partitionUnderstand the cloud specific driversSpeaking notesPartitioning is hardly a new topicDBAs have been partitioning databases for a long long timeTwo main reasons to partition Data volume.There are just too many bytes to fit.For example SQL Azure has a maximum DB size of 50GB. If you have more data than that then you’ll need to partitionWork loadEach partition can only handle so many transactions per secondIn Windows Azure tables for example partitioning is used to spread the request load over nodes in the storage systemThere are some new cloud focussed reasons tooCostDifferent types of storage have different costsArguably we’ve been doing cost driven partitioning on premise for some time too- for example partitioning a table across both expensive 15k RPM drives and cheaper 7200 RPM drivesIn the cloud the cost difference can be far more pronouncedThe cloud also provides a concept of elastic partitioningWhereas on premise a partition is often a separate server or separate disks with the related capital cost and lead timeA partition in the cloud can be created and destroyed in a matter of secondsThis presents the opportunity to create partitions just for a short period of time- say a period of peak loadNotes
  • Slide ObjectiveDiscusses how to choose a partition keySpeaking notesNatural keys are often very good for partitioning.For example you may choose to break up data by geographical regionNatural keys can also cause problemsPartitioning by things like first letter last name can be badYour ‘S’ partition will be too full and your ‘Z’ partition will be all but empty… unless you’re in an Asian country where the opposite is trueYou may want to use a mathematical operator to assist in partitioningWe’ll discuss these shortlyFinally you may want to use a lookup tableYou may for example in an SaaS application partition each customer into their own database and then lookup the database to use at runtime based on the host header that was used to visit the site NotesSQL Azure Horizontal partitioninghttp://blogs.msdn.com/b/sqlazure/archive/2010/06/24/10029719.aspx
  • Slide ObjectiveDescribes Modulo partitioning Speaking notesThe module operator is very useful for partitioning exercisesThe important thing here is having a good distributionNoteshttp://social.msdn.microsoft.com/Forums/en-US/windowsazure/thread/985a3198-ba54-4dcc-932c-0e6bdb166a46
  • Slide ObjectiveDiscusses how to choose a partition keySpeaking notesNatural keys are often very good for partitioning.For example you may choose to break up data by geographical regionNatural keys can also cause problemsPartitioning by things like first letter last name can be badYour ‘S’ partition will be too full and your ‘Z’ partition will be all but empty… unless you’re in an Asian country where the opposite is trueYou may want to use a mathematical operator to assist in partitioningWe’ll discuss these shortlyFinally you may want to use a lookup tableYou may for example in an SaaS application partition each customer into their own database and then lookup the database to use at runtime based on the host header that was used to visit the site NotesSQL Azure Horizontal partitioninghttp://blogs.msdn.com/b/sqlazure/archive/2010/06/24/10029719.aspx
  • Slide ObjectiveDescribes the challenge of managing partitions over timeSpeaking notesAs applications grow and change so may our partitioning needsHow do we deal with thisWhat happens if we need to re-partition our data?We will need to process it into a new partitioning schemeWe can also version our partitioning scheme such that our partition keys include an identifier to resolve the partition scheme to be usedIN the example above we’ll end up with 14 partitions- 4 for the v1 scheme, 10 for the v2 scheme Notes
  • Slide ObjectiveThe next few slides build on each otherRun through the worked exampleSpeaking notesSuppose we want to build a tweet search engineTwitter creates quite a bit of data; it’s well suited to storing in Windows Azure tablesIn SQL land we might start with a simple like query. This table scans every time…. We soon realize this is no goodNotesSee also SririamKrishnans Programming Windows Azure title from O’Reilly which contains a more detailed example of this
  • Next we’d probably pull the words out into a separate table, i.e. spit each tweet into separate wordsWe’d soon realize that we could collapse the Word table back into the index as we’d end up in a situation where the primary keys on the associative table were longer than the word itself- so we’re better to duplicate the word as rows in the word table
  • IN Windows Azure tables we take this one step further.We basically use worker roles to create indexes for usSo in the above example I canRetrieve all the Tweets made y a certain user by querying the Tweet table and including the user ID (there is a partition per user)Retrieve all the Tweets that contain a particular word by querying from the TweetIndex table and including the Word (there is a partition per word)
  • We may the choose to create a MentionIndex where the data is not partitioned by the person who wrote the tweet but rather by the person(s) who were mentioned in a tweet. If a tweet mentions 4 users it’ll appear 4 times in the MentionIndex table in four different partitions
  • Slide ObjectiveProvide some final notes on Tables data modeling Speaking notesThere are no secondary indexes so querying on any variable other than the Row key will result in a partition scan- keep partitions of manageable size for thisYou should ALWAYS include the partition key in your queries- build your data model top support thisIf you are building your own indexes then you can often include related data if it is small enough- Tweets are conveniently small for our example!NotesSee also SririamKrishnans Programming Windows Azure title from O’Reilly which contains a more detailed example of this
  • NoSQL em Windows Azure Table Storage - Vitor Tomaz

    1. 1. NoSQL em Windows Azure Table StorageVítor Tomazhttp://netponto.org37ª Reunião Presencial @ Lisboa - 23/03/2013
    2. 2. Vítor TomazISEL – LEICSAFIRANetPontoAzurePTRevista ProgramarPortugal@ProgramarSQLPortMSDN
    3. 3. Agenda• Characteristics & Concepts• Service Architecture• Scalability Targets• Non-Relational Data Modeling• Best Practices
    4. 4. South Central USWest US East US
    5. 5. Table Details
    6. 6. Service Architecture
    7. 7. Extent Nodes (EN)Front End Layer FEIncoming Write RequestPartitionServerPartitionServerPartitionServerPartitionServerPartitionMasterFE FE FE FELockServiceAckPartition LayerStream Layer
    8. 8. http://tinyurl.com/ContToken
    9. 9. Scalability Targets
    10. 10. Scalability Targets -Storage AccountGeo RedundantLocally Redundant
    11. 11. Scalability Targets – Partition
    12. 12. Non-Relational Data Modeling
    13. 13. :
    14. 14. You’d soon realize that LIKE isn’t so wonderful.You’d do a little normalization
    15. 15. Entity Group Transactions
    16. 16. Best Practices
    17. 17. Common Design & ScalabilityAccess pattern lexically sorted byPartition Key values
    18. 18. Common Design & Scalability• Turn on analytics & take control of your investigations– Logging and Metrics• Who deleted my container? – Look at the client IP for delete container request• Why is my request latency increased? - Look at E2E vs. Server latency• What is my user demographics? – Use client request id to trace requests & client IP• How can I tune my service usage? – Use metrics to analyze API usage & peak trafficstats• And many more…• Use appropriate retry policy for intermittent errors• Storage client uses exponential retry by default
    19. 19. Storage Accounts
    20. 20. Storage Accounts
    21. 21. 0204060801001201401600510152025303540Storage Client 1.7 Storage Client 2.0 :DataServicesStorage Client 2.0 :ReflectionStorage Client 2.0 : NoReflectionTime(ms)Batch Stress Scenario Per Entity LatenciesDeleteQueryInsertProcessor Time (s)Test Duration (s)Faster NoSQL table accessUpto 72.06% reduction in execution timeUpto 31.92% reduction in processor timeUpto 69-90% reduction in latency
    22. 22. 05,00010,00015,00020,00025,00030,000Storage Client 1.7 Storage Client 2.0Time(s)Large Blob Scenario (256MB) ResourceUtilizationTotal Test Time (s)Total Processor Time (s)010203040506070Storage Client 1.7 Storage Client 2.0Time(s)Large Blob Scenario (256MB) LatenciesUploadDownloadFaster uploads and downloads31.46% reduction in processor timeUpto 22.07% reduction in latency
    23. 23. http://blogs.msdn.com/b/windowsazurestorage/https://www.windowsazure.com/en-us/develop/overview/https://www.windowsazure.com/en-us/pricing/details
    24. 24. Questões?
    25. 25. Próximas reuniões presenciais23/03/2013 – Março (Lisboa)20/04/2013 – Abril (Lisboa)22/06/2013 – Junho (Lisboa)??/??/2013 – ? (Porto)??/??/2013 – ? (Coimbra)Reserva estes dias na agenda! :)
    26. 26. Patrocinador “GOLD”Twitter: @PTMicrosoft http://www.microsoft.com/portugal
    27. 27. Patrocinadores “Silver”
    28. 28. Patrocinadores “Bronze”
    29. 29. Obrigado!Vítor Tomazvitorbstomaz AT gmail.comhttp://twitter.com/vitortomaz

    ×