NoSQL in Windows Azure Table Storage
Vítor Tomaz
http://netponto.org
37th In-Person Meeting @ Lisbon - 23/03/2013
Vítor Tomaz
ISEL – LEIC
SAFIRA
NetPonto
AzurePT
Revista Programar
Portugal@Programar
SQLPort
MSDN
Agenda
• Characteristics & Concepts
• Service Architecture
• Scalability Targets
• Non-Relational Data Modeling
• Best Pra...
South Central US
West US East US
Table Details
Service Architecture
Extent Nodes (EN)
Front End Layer FE
Incoming Write Request
Partition
Server
Partition
Server
Partition
Server
Partition
S...
http://tinyurl.com/ContToken
Scalability Targets
Scalability Targets -Storage Account
Geo Redundant
Locally Redundant
Scalability Targets – Partition
Non-Relational Data Modeling
:
You’d soon realize that LIKE isn’t so wonderful.
You’d do a little normalization
Entity Group Transactions
Best Practices
Common Design & Scalability
Access pattern lexically sorted by
Partition Key values
Common Design & Scalability
• Turn on analytics & take control of your investigations– Logging and Metrics
• Who deleted m...
Storage Accounts
Storage Accounts
[Chart: Storage Client 1.7 vs. Storage Client 2.0 : DataServices, Storage Client...; axis tick values omitted]
[Chart: Large Blob Scenario (256MB), Storage Client 1.7 vs. Storage Client 2.0, Time (s), Resou...; axis tick values omitted]
http://blogs.msdn.com/b/windowsazurestorage/
https://www.windowsazure.com/en-us/develop/overview/
https://www.windowsazur...
Questions?
Evaluation of today's sessions
http://bit.ly/netponto-aval-37
* For those who cannot fill it in during the meeting,
we will se...
Upcoming in-person meetings
23/03/2013 – March (Lisbon)
20/04/2013 – April (Lisbon)
22/06/2013 – June (Lisbon)
??/??/20...
“GOLD” Sponsor
Twitter: @PTMicrosoft http://www.microsoft.com/portugal
“Silver” Sponsors
“Bronze” Sponsors
Thank you!
Vítor Tomaz
vitorbstomaz AT gmail.com
http://twitter.com/vitortomaz
  • Slide Objectives: Explain the different storage libraries and languages that can be used to work with Windows Azure Storage. Value Prop: Programmatic access to the Blob, Queue, and Table services is available via the Windows Azure client libraries and the Windows Azure storage services REST API. Speaking Points: Windows Azure is an open cloud platform that enables you to quickly build, deploy, and manage applications across a global network of Microsoft-managed datacenters. You can build applications using any language, tool, or framework.
  • Slide Objectives: Understand Tables. Value Prop: Enable customers to easily migrate, maintain, and monitor their existing SQL Server applications to Windows Azure VM role, and run them with competitive reliability, performance, and TCO characteristics. Speaker Notes: The Table service provides structured storage in the form of tables. The Table service supports a REST API that is compliant with the ADO.NET Data Services REST API. Developers may also use the .NET Client Library for ADO.NET Data Services to access the Table service. Notes: Within a storage account, a developer may create named tables. Tables store data as entities. An entity is a collection of named properties and their values, similar to a row. Tables are partitioned to support load balancing across storage nodes. Each entity has as its first property a partition key that specifies the partition the entity belongs to. The second property is a row key that identifies the entity within that partition. The combination of the partition key and the row key forms a primary key that identifies each entity uniquely within the table.
  • Slide Objectives: Understand Flexible Entities. Value Prop: Enable customers to easily migrate, maintain, and monitor their existing SQL Server applications to Windows Azure VM role, and run them with competitive reliability, performance, and TCO characteristics. Speaker Notes: Tables store data as entities. A table can contain entities of any shape. There is no fixed schema. There is no schema checking. There is no strong typing: note that Birthdate is stored both as a datetime value and as a string. Note that we can add additional columns. Notes: http://msdn.microsoft.com/en-us/library/dd573356.aspx
  • Slide Objectives: Understand the Windows Azure Storage scalability model. Value Prop: Windows Azure Storage scales automatically to provide the best performance. Speaker Notes: Fanout is automatic, handled by Windows Azure. The key here is “elasticity”: the ability to automatically scale based on load. Fanout is based on the load. Fanout isn’t immediate; Windows Azure will wait several seconds to ensure that the load is a true load and not just a temporary spike. Partitioning is based on the Partition Key, so the choice of the partition key is critical. Partitions can be condensed when load decreases. Reads are load balanced against the three replicas.
  • Slide Objectives: Understand the importance of the Windows Azure Table scalability model and how Partition Key and Row Key are critical for table scalability. Value Prop: Enable customers to easily migrate, maintain, and monitor their existing SQL Server applications to Windows Azure VM role, and run them with competitive reliability, performance, and TCO characteristics. Speaker Notes: Table entities represent the units of data stored in a table and are similar to rows in a typical relational database table. Each entity defines a collection of properties. Each property is a key/value pair defined by its name, value, and the value's data type. Entities must define the following three system properties as part of the property collection. PartitionKey: stores string values that identify the partition that an entity belongs to. This means that entities with the same PartitionKey value belong to the same partition. Partitions, as discussed later, are integral to the scalability of the table. RowKey: stores string values that uniquely identify entities within each partition. Timestamp: the third system property, maintained by the service to record when the entity was last modified. Notes: Tables are partitioned to support load balancing across storage nodes. A table's entities are organized by partition. A partition is a consecutive range of entities possessing the same partition key value. The partition key is a unique identifier for the partition within a given table, specified by the PartitionKey property. The partition key forms the first part of an entity's primary key. The partition key may be a string value up to 1 KB in size.
  • Slide Objective: More detail on horizontal partitioning in Windows Azure Table storage. Speaking notes: Understanding the sequential nature of cross-partition queries is important. Continuation tokens may be returned at any time (i.e. data comes back in multiple pages). You will always get a continuation token if you cross a hardware boundary, i.e. you move between partitions that sit on different nodes. The Storage API handles continuation tokens elegantly, but it may mask a poor architecture: YOU DO NOT WANT TO RUN A QUERY THAT CROSSES HUNDREDS OF SERVERS! Be aggressive with partitioning: if you’ll only ever query something by a single key, use an empty Row key and a unique partition key for a partition of 1. You can also just use blob storage, which is already partitioned by Blob name. Notes: Queue storage is partitioned by Queue name. Blob storage is partitioned by Blob name (i.e. partition size of 1). http://www.syringe.net.nz/2009/08/08/SimplePartitioningWithWindowsAzureTableStorage.aspx http://nmackenzie.spaces.live.com/Blog/cns!B863FF075995D18A!417.entry Good article from Julie Lerman, worth reading when discussing table storage: http://msdn.microsoft.com/en-us/magazine/ff796231.aspx
  • Slide Objective: Understand why we need to partition. Understand the cloud-specific drivers. Speaking notes: Partitioning is hardly a new topic; DBAs have been partitioning databases for a long, long time. There are two main reasons to partition. Data volume: there are just too many bytes to fit. For example, SQL Azure has a maximum DB size of 50GB; if you have more data than that, you’ll need to partition. Workload: each partition can only handle so many transactions per second. In Windows Azure tables, for example, partitioning is used to spread the request load over nodes in the storage system. There are some new cloud-focussed reasons too. Cost: different types of storage have different costs. Arguably we’ve been doing cost-driven partitioning on premise for some time too, for example partitioning a table across both expensive 15k RPM drives and cheaper 7200 RPM drives. In the cloud the cost difference can be far more pronounced. The cloud also provides a concept of elastic partitioning. Whereas on premise a partition is often a separate server or separate disks, with the related capital cost and lead time, a partition in the cloud can be created and destroyed in a matter of seconds. This presents the opportunity to create partitions just for a short period of time, say a period of peak load.
  • Slide Objective: Discusses how to choose a partition key. Speaking notes: Natural keys are often very good for partitioning. For example, you may choose to break up data by geographical region. Natural keys can also cause problems. Partitioning by things like the first letter of the last name can be bad: your ‘S’ partition will be too full and your ‘Z’ partition will be all but empty… unless you’re in an Asian country where the opposite is true. You may want to use a mathematical operator to assist in partitioning; we’ll discuss these shortly. Finally, you may want to use a lookup table. In a SaaS application, for example, you may partition each customer into their own database and then look up the database to use at runtime based on the host header that was used to visit the site. Notes: SQL Azure horizontal partitioning: http://blogs.msdn.com/b/sqlazure/archive/2010/06/24/10029719.aspx
  • Slide Objective: Describes modulo partitioning. Speaking notes: The modulo operator is very useful for partitioning exercises. The important thing here is having a good distribution. Notes: http://social.msdn.microsoft.com/Forums/en-US/windowsazure/thread/985a3198-ba54-4dcc-932c-0e6bdb166a46
  • Slide Objective: Describes the challenge of managing partitions over time. Speaking notes: As applications grow and change, so may our partitioning needs. How do we deal with this? What happens if we need to re-partition our data? We will need to process it into a new partitioning scheme. We can also version our partitioning scheme, such that our partition keys include an identifier to resolve the partition scheme to be used. In the example above we’ll end up with 14 partitions: 4 for the v1 scheme and 10 for the v2 scheme.
  • Slide Objective: The next few slides build on each other. Run through the worked example. Speaking notes: Suppose we want to build a tweet search engine. Twitter creates quite a bit of data; it’s well suited to storing in Windows Azure tables. In SQL land we might start with a simple LIKE query. This scans the table every time, and we soon realize this is no good. Notes: See also Sriram Krishnan’s Programming Windows Azure title from O’Reilly, which contains a more detailed example of this.
  • Next we’d probably pull the words out into a separate table, i.e. split each tweet into separate words. We’d soon realize that we could collapse the Word table back into the index, as we’d end up in a situation where the primary keys on the associative table were longer than the word itself, so we’re better off duplicating the word as rows in the word table.
  • In Windows Azure tables we take this one step further. We basically use worker roles to create indexes for us. So in the above example I can: retrieve all the Tweets made by a certain user by querying the Tweet table and including the user ID (there is a partition per user); retrieve all the Tweets that contain a particular word by querying the TweetIndex table and including the Word (there is a partition per word).
  • We may then choose to create a MentionIndex where the data is partitioned not by the person who wrote the tweet but by the person(s) who were mentioned in a tweet. If a tweet mentions 4 users it’ll appear 4 times in the MentionIndex table, in four different partitions.
  • Slide Objective: Provide some final notes on Tables data modeling. Speaking notes: There are no secondary indexes, so querying on any variable other than the Row key will result in a partition scan; keep partitions of a manageable size for this. You should ALWAYS include the partition key in your queries; build your data model to support this. If you are building your own indexes then you can often include related data if it is small enough. Tweets are conveniently small for our example! Notes: See also Sriram Krishnan’s Programming Windows Azure title from O’Reilly, which contains a more detailed example of this.
  • For those who can, please start filling it in now, so that I don’t have to nag you later. It is important so that we receive feedback, and so that we can give feedback to our speakers.
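The speaker notes on horizontal partitioning say that query results may come back in multiple pages, each carrying a continuation token. The loop a client needs can be sketched as follows. This is only an in-memory illustration: `ROWS`, `query_page`, and `query_all` are hypothetical names, the integer token is an assumption made for readability, and none of this is the Storage Client API.

```python
from typing import Iterator, Optional

# Hypothetical in-memory stand-in for a table; not a real storage endpoint.
ROWS = [{"PartitionKey": f"p{i // 3}", "RowKey": str(i)} for i in range(8)]


def query_page(token: Optional[int] = None, page_size: int = 3):
    """Return (entities, continuation_token); the token is None on the last page."""
    start = token or 0
    page = ROWS[start:start + page_size]
    next_start = start + page_size
    return page, (next_start if next_start < len(ROWS) else None)


def query_all() -> Iterator[dict]:
    """Keep requesting pages until the service stops returning a token."""
    token = None
    while True:
        page, token = query_page(token)
        yield from page
        if token is None:
            break
```

Each pass through the loop is one round trip, which is why a query that crosses many partitions can hide a large number of sequential requests behind a single call.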
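The notes on modulo partitioning and on versioning the partitioning scheme can be combined in one small sketch. It assumes a scheme table with 4 buckets for v1 and 10 for v2 (the counts quoted in the notes) and a CRC32 hash of the entity ID. The names `SCHEMES`, `partition_key`, and `all_partition_keys`, the hash choice, and the `v2-7` key format are illustrative assumptions, not something the deck specifies.

```python
import zlib

# Hypothetical scheme table: version label -> number of partitions (buckets).
SCHEMES = {"v1": 4, "v2": 10}


def partition_key(entity_id: str, version: str = "v2") -> str:
    """Return a versioned, modulo-based partition key such as 'v2-7'."""
    buckets = SCHEMES[version]
    # crc32 is stable across processes, unlike Python's built-in hash() for str.
    bucket = zlib.crc32(entity_id.encode("utf-8")) % buckets
    return f"{version}-{bucket}"


def all_partition_keys() -> list[str]:
    """Enumerate every partition key that can exist across all scheme versions."""
    return [f"{v}-{b}" for v, n in SCHEMES.items() for b in range(n)]
```

Because the version is part of the key, entities written under v1 stay addressable while new writes go to v2, and the two schemes together yield 4 + 10 = 14 possible partitions.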

    1. NoSQL in Windows Azure Table Storage. Vítor Tomaz. http://netponto.org 37th In-Person Meeting @ Lisbon - 23/03/2013
    2. Vítor Tomaz: ISEL – LEIC, SAFIRA, NetPonto, AzurePT, Revista Programar, Portugal@Programar, SQLPort, MSDN
    3. Agenda • Characteristics & Concepts • Service Architecture • Scalability Targets • Non-Relational Data Modeling • Best Practices
    4. South Central US, West US, East US
    5. Table Details
    6. Service Architecture
    7. Extent Nodes (EN), Front End Layer (FE ×4), Incoming Write Request, Partition Server ×4, Partition Master, Lock Service, Ack, Partition Layer, Stream Layer
    8. http://tinyurl.com/ContToken
    9. Scalability Targets
    10. Scalability Targets - Storage Account: Geo Redundant, Locally Redundant
    11. Scalability Targets – Partition
    12. Non-Relational Data Modeling
    13. :
    14. You’d soon realize that LIKE isn’t so wonderful. You’d do a little normalization
    15. Entity Group Transactions
    16. Best Practices
    17. Common Design & Scalability: Access pattern lexically sorted by Partition Key values
    18. Common Design & Scalability
        • Turn on analytics & take control of your investigations – Logging and Metrics
        • Who deleted my container? – Look at the client IP for the delete container request
        • Why has my request latency increased? – Look at E2E vs. Server latency
        • What are my user demographics? – Use client request id to trace requests & client IP
        • How can I tune my service usage? – Use metrics to analyze API usage & peak traffic stats
        • And many more…
        • Use an appropriate retry policy for intermittent errors
        • Storage client uses exponential retry by default
    19. Storage Accounts
    20. Storage Accounts
    21. Faster NoSQL table access: up to 72.06% reduction in execution time, up to 31.92% reduction in processor time, up to 69-90% reduction in latency. [Charts: Batch Stress Scenario, Per Entity Latencies (Delete, Query, Insert), Time (ms); Processor Time (s) and Test Duration (s). Series: Storage Client 1.7, Storage Client 2.0 : DataServices, Storage Client 2.0 : Reflection, Storage Client 2.0 : No Reflection]
    22. Faster uploads and downloads: 31.46% reduction in processor time, up to 22.07% reduction in latency. [Charts: Large Blob Scenario (256MB) Resource Utilization (Total Test Time (s), Total Processor Time (s)); Large Blob Scenario (256MB) Latencies (Upload, Download), Time (s). Series: Storage Client 1.7, Storage Client 2.0]
    23. http://blogs.msdn.com/b/windowsazurestorage/ https://www.windowsazure.com/en-us/develop/overview/ https://www.windowsazure.com/en-us/pricing/details
    24. Questions?
    25. Evaluation of today's sessions: http://bit.ly/netponto-aval-37 * For those who cannot fill it in during the meeting, we will send an email with the link in the afternoon
    26. Upcoming in-person meetings: 23/03/2013 – March (Lisbon), 20/04/2013 – April (Lisbon), 22/06/2013 – June (Lisbon), ??/??/2013 – ? (Porto), ??/??/2013 – ? (Coimbra). Save these dates in your calendar! :)
    27. “GOLD” Sponsor. Twitter: @PTMicrosoft http://www.microsoft.com/portugal
    28. “Silver” Sponsors
    29. “Bronze” Sponsors
    30. Thank you! Vítor Tomaz vitorbstomaz AT gmail.com http://twitter.com/vitortomaz
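The tweet search example from slides 13 and 14, together with its speaker notes, describes a Tweet table partitioned by user, a TweetIndex table partitioned by word, and a MentionIndex table partitioned by mentioned user. A minimal in-memory sketch of that fan-out follows. The nested dictionaries stand in for tables, `index_tweet` stands in for the worker role that builds the indexes, and the tokenisation rules are assumptions rather than anything the deck specifies.

```python
import re
from collections import defaultdict

# Each "table" is modelled as {partition_key: {row_key: entity}}.
tweets = defaultdict(dict)         # one partition per author
tweet_index = defaultdict(dict)    # one partition per word
mention_index = defaultdict(dict)  # one partition per mentioned user


def index_tweet(tweet_id: str, user: str, text: str) -> None:
    """Store a tweet and duplicate it into the word and mention indexes."""
    entity = {"user": user, "text": text}
    tweets[user][tweet_id] = entity
    # One copy of the tweet per distinct mentioned user.
    for mentioned in set(re.findall(r"@(\w+)", text)):
        mention_index[mentioned.lower()][tweet_id] = entity
    # One copy of the tweet per distinct word.
    for word in set(re.findall(r"\w+", text.lower())):
        tweet_index[word][tweet_id] = entity
```

Every lookup then names exactly one partition key (a user, a word, or a mentioned user), which is the property the final data modeling notes ask for. The price is duplicated data written at insert time.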

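Slide 18 recommends an appropriate retry policy for intermittent errors and notes that the storage client uses exponential retry by default. The general technique can be sketched as below. This is a generic illustration, not the storage client's own policy: `TransientError`, `with_retries`, the attempt count, the base delay, and the jitter are all assumptions.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for an intermittent failure worth retrying."""


def with_retries(operation, max_attempts=4, base_delay=0.5, sleep=time.sleep):
    """Call operation(), retrying transient failures with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # The delay doubles on each attempt; a little jitter keeps many
            # clients from retrying at exactly the same moment.
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay / 10))
```

Passing `sleep` as a parameter keeps the sketch testable without real waiting. Only errors marked as transient are retried, so permanent failures still surface immediately.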