
NoSQL Database in Azure for IoT and Business


IoT and business solutions depend not on data, but on processes.
So a relational DB is not always the right choice. In an IoT scenario, it is better to find a data solution that stores data with higher performance: NoSQL databases. We'll look at DocumentDB, Microsoft's NoSQL database in Azure. But there are other alternatives too!

Published in: Software

  1. IoT day 2015 NoSQL in Azure for IoT (and Business) Marco Parenzan, Microsoft Azure MVP, @marco_parenzan, marco [dot] parenzan [at] 1nn0va [dot] it
  2. Sponsor
  3. Speaker info / Marco Parenzan: www.slideshare.net/marco.parenzan; www.github.com/marcoparenzan; marco [dot] parenzan [at] 1nn0va [dot] it; www.1nnova.it; @marco_parenzan. Training, outreach and consulting with 1nn0va. Microsoft MVP 2014 for Microsoft Azure. Cloud architect, .NET developer. Loves functional programming, HTML5 game programming and the Internet of Things. Microservices Saturday 2015: a journey with NServiceBus. LIVE AZURE COMMUNITY BOOTCAMP 2015
  4. IoT as a hobby (now…?)
  5. Data Ecosystem: where do I put the data received in EventHub?
  6. Microsoft Relational Storage Options: a continuous offering, from private to public cloud
  7. SQL Azure features: SQL Server database technology “as a service”; a fully managed database-as-a-service built on SQL with near-zero administration; enterprise-ready, with automatic support for HA, DR, backups, replication and more; highly available and elastically scalable for unpredictable SaaS workloads; uptime SLA of 99.99%; predictable performance and pricing; built-in regional database geo-replication for additional protection; all core search capabilities (faceting, suggestions, geospatial); secure and compliant for your sensitive data; fully compatible with SQL Server 2014 databases.
  8. The Microsoft data platform: data (relational, non-relational, NoSQL, streaming, internal & external); information management; orchestration; complex event processing; modeling; machine learning; applications; dashboards; reports; mobile; natural language query.
  9. The traditional world
  10. Paradigm Shift: business, no longer data, is the foundation of software design. DDD != OOP. Don't start from data. Data are not unique. No more ACID: ACID transactions are not useful with a distributed model over different storages.
  11. From theory to practice: how many queries can be determined at the analysis level? “A repository should offer an explicit and well-defined contract and avoid arbitrary queries.” In business … don't delete anything (the repository doesn't delete anything).
  12. Classic MVC: View, Controller, Contract BL/P, Business Logic
  13. CQRS (Service Bus powered): UI, Command Handler, Queue, Event, Event Handler, Topics/Subscription
  14. CQRS for IoT (Service Bus powered): Device, Event Hub, UI, Command Handler, Queue, Event, Event Handler, Topics/Subscription, Write Model, Read/Search Model
  15. No longer built on data… but on “what happens”. No more one single data store. Data store types: logs, persistence, sagas (long transactions), search, event-based systems.
  16. The Big Picture: a modern view
  17. The traditional world in Azure
  18. Why use a NoSQL technology on Azure?
  19. Choosing a data technology
  20. A DB for what? To store data? To manipulate data? A long-term theme
  21. NoSQL Introduction
  22. Not Only SQL paradigms: key/value, table, blob, queue, graph, document
  23. What is a document database? Definitely NOT this kind of document!
  24. What is a document database? Not ideal, but it can work: { "id": "13244_post", "text": "Lorizzle ghetto dolor tellivizzle boofron, stuff pimpin' elizzle. Nullam sapizzle velizzle, … (several paragraphs of placeholder free text) …" }
  25. What is a document database? Ideally suited to this kind of document: { "id": "13244_user", "firstName": "John", "lastName": "Smith", "age": 25, "employmentHistory": [ { "company": "Contoso Inc", "start": {"date": "Thu, 02 Apr 2015 20:54:45 GMT", "epoch": 1428008086}, "position": "CEO" }, { "start": {"date": "Thu, 02 Apr 2012 20:54:45 GMT", "epoch": 1428008086}, "end": {"date": "Thu, 01 Apr 2015 20:54:45 GMT", "epoch": 1428008086}, "position": "GM" } ], "address": { "streetAddress": "21 2nd Str", "city": "New York", "state": "NY", "postalCode": "10021" }, "children": [ {"name": "Megan", "age": 10}, {"name": "Bruce", "age": 7}, {"name": "Angus", "sports": ["football", "basketball", "hockey"]} ], "mobileNumber": "212 555-1234" }
  26. Why JSON? JSON can represent complex containment relationships that are difficult to represent in an RDBMS. It is schema-less: great for growing requirements during development, unlike an RDBMS, where you must know the structure up front and it's painful to modify it. It is the native notation for JavaScript.
  27. Embedding (better read performance): try to treat your entities as self-contained documents represented in JSON. When working with relational databases, we've been taught for years to normalize, normalize, normalize. Embed when there are “contains” relationships between entities; when there are one-to-few relationships between entities; when there is embedded data that changes infrequently; when there is embedded data that won't grow without bound; when there is embedded data that is integral to the data in a document.
  28. Referencing (normalizing typically provides better write performance): use it for representing one-to-many relationships, for representing many-to-many relationships, when related data changes frequently, and when referenced data could be unbounded. It provides more flexibility than embedding, but needs more round trips to read data.
  29. No magic bullet: think about how your data is going to be written and read, and model it accordingly. Hybrid models ~ denormalize + reference + aggregate. An author: { "id": "1", "firstName": "Thomas", "lastName": "Andersen", "countOfBooks": 3, "books": [1, 2, 3], "images": [ {"thumbnail": "http://....png"}, {"profile": "http://....png"} ] } and a book: { "id": 1, "name": "DocumentDB 101", "authors": [ {"id": 1, "name": "Thomas Andersen", "thumbnail": "http://....png"}, {"id": 2, "name": "William Wakefield", "thumbnail": "http://....png"} ] }
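A minimal JavaScript sketch of keeping such a hybrid model consistent, assuming plain in-memory objects shaped like the author and book documents above: the author holds references (`books`) plus a denormalized aggregate (`countOfBooks`), and each book embeds a short author summary. `addBookToAuthor` is a hypothetical helper, not part of any SDK; against DocumentDB the two updates touch two documents, so they are only atomic if done together, for example in a stored procedure on a collection that holds both.

```javascript
// Hypothetical helper for the hybrid model (in-memory objects, no database calls).
// author: { id, firstName, lastName, countOfBooks, books: [bookId, ...], images: [...] }
// book:   { id, name, authors: [{ id, name, thumbnail }, ...] }
function addBookToAuthor(author, book) {
  // Reference: the author stores only the book id.
  if (!author.books.includes(book.id)) {
    author.books.push(book.id);
  }
  // Aggregate: keep the denormalized count in sync with the references.
  author.countOfBooks = author.books.length;

  // Denormalize: the book embeds a small, rarely changing author summary.
  if (!book.authors.some((a) => a.id === author.id)) {
    const thumbnailEntry = author.images.find((img) => img.thumbnail);
    book.authors.push({
      id: author.id,
      name: `${author.firstName} ${author.lastName}`,
      thumbnail: thumbnailEntry ? thumbnailEntry.thumbnail : undefined,
    });
  }
  return { author, book };
}

const author = { id: "1", firstName: "Thomas", lastName: "Andersen", countOfBooks: 0, books: [], images: [{ thumbnail: "thomas.png" }] };
const book = { id: 4, name: "Hypothetical Book", authors: [] };
addBookToAuthor(author, book);
addBookToAuthor(author, book); // idempotent: no duplicate reference or summary
```

The embedded summary is a copy: if the author's name changes, every book that embeds it must be rewritten, which is the write-side cost of denormalizing.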
  30. Developer Appeal: promotes code-first development (mapping objects to JSON); resilient to iterative schema changes; richer query and indexing (compared to key/value stores); low impedance as an object/JSON store, no ORM required; it just works; it's fast.
  31. DocumentDB Introduction
  32. What is DocumentDB? It stores schema-less JSON documents; excels at search with SQL syntax; uses JavaScript for stored procedures, triggers and UDFs; offers elastic capacity (not in the specific Azure sense, up to now) and multi-document transactions (batch); lets you tweak everything (read/write performance vs. consistency, index performance, security); is designed for massive scale.
  33. DocumentDB: DbaaS. For applications that need managed elastic scale, when the customer does not want to add IT resources for support and maintenance, avoiding CAPEX and OPEX. A built-for-the-cloud database technology, accessed via a RESTful HTTP API or a client library.
  34. Typical usage: catalog data; preferences and state; event store; user-generated content; data exchange.
  35. Resource Model
  36. Database Account
  37. Database
  38. Collections (* collection != table of homogeneous entities; collection ~ a data partition)
  39. Documents: { "id": "123", "name": "joe", "age": 30, "address": { "street": "some st" } }
  40. Users, Server Scripts, Attachments
  41. Collections
  42. What is a collection? A container of JSON documents and the associated JavaScript application logic; the JSON documents inside a collection can vary dramatically. It is a unit of scale for transaction and query throughput (capacity units are allocated uniformly across all collections), a unit of scale for capacity, and a unit of replication.
  43. Collections: collections in DocumentDB are not just logical containers, but also physical containers. They are the transaction boundary for stored procedures and triggers, and the entry point to queries and CRUD operations. Each collection is assigned a reserved amount of throughput that is not shared with other collections in the same account. Collections do not enforce schema.
  44. Partitioning
  45. Design: Partitioning. Why partition? Data size: a single collection (currently*) holds 10 GB. Throughput: 3 performance tiers, with a max of 2,500 RU/sec.
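Using the two per-collection limits on this slide (10 GB of data, at most 2,500 RU/sec), a rough estimate of how many collections a workload needs is the larger of the two ratios, rounded up. A back-of-the-envelope sketch; the function name is hypothetical, and the limits are the 2015 figures quoted above, passed as a parameter so they can be updated.

```javascript
// Per-collection limits as quoted on the slide (2015); treat them as parameters.
const DEFAULT_LIMITS = { gb: 10, ruPerSec: 2500 };

// Collections needed so that neither the size limit nor the throughput limit is exceeded.
function collectionsNeeded(dataGb, requiredRuPerSec, limits = DEFAULT_LIMITS) {
  const bySize = Math.ceil(dataGb / limits.gb);
  const byThroughput = Math.ceil(requiredRuPerSec / limits.ruPerSec);
  // At least one collection, and enough for whichever limit is tighter.
  return Math.max(1, bySize, byThroughput);
}
```

For example, 25 GB at 1,000 RU/sec is bound by size (3 collections), while 5 GB at 6,000 RU/sec is bound by throughput (also 3).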
  46. Hash Partitioning: in hash partitioning, partitions are assigned based on the value of a hash function, allowing you to evenly distribute requests and data across a number of partitions. This is commonly used to partition data produced or consumed by a large number of distinct clients, and is useful for storing user profiles, catalog items, and IoT ("Internet of Things") telemetry data.
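A minimal sketch of hash partitioning, assuming the application (not the database) routes each document to one of N collections by hashing a partition key such as a device id. FNV-1a is used here only because it is short and deterministic; any stable hash works, but it must never change once data is written, or documents will be looked up in the wrong collection. The function names are hypothetical.

```javascript
// 32-bit FNV-1a hash of a string's UTF-8 bytes.
function fnv1a32(key) {
  let hash = 0x811c9dc5; // FNV offset basis
  for (const byte of new TextEncoder().encode(key)) {
    hash ^= byte;
    hash = Math.imul(hash, 0x01000193) >>> 0; // multiply by the FNV prime, keep 32 bits
  }
  return hash >>> 0;
}

// Map a partition key (e.g. a device id) to one of `partitionCount` collections.
function hashPartition(key, partitionCount) {
  return fnv1a32(key) % partitionCount;
}

// The same device always lands in the same collection (an index in 0..3 here).
const partition = hashPartition("device-42", 4);
```

The catch of plain modulo hashing is rebalancing: changing the partition count remaps most keys, so growing from 4 to 5 collections means moving data.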
  47. Range Partitioning: in range partitioning, partitions are assigned based on whether the partition key is within a certain range. This is commonly used for partitioning on timestamp properties: keep current data hot, keep historical data warm, scale down older data, purge/archive.
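A sketch of range partitioning by time, assuming one collection per calendar month named `telemetry-YYYY-MM` (a hypothetical naming scheme) and timestamps stored as Unix epoch seconds, like the `epoch` fields in the documents on these slides. `partitionsForRange` lists the collections a time-range query has to touch.

```javascript
// Hypothetical naming scheme: one collection per calendar month (UTC).
function partitionName(year, monthIndex) {
  return `telemetry-${year}-${String(monthIndex + 1).padStart(2, "0")}`;
}

// Which collection a document with this epoch (seconds) belongs to.
function partitionForTimestamp(epochSeconds) {
  const d = new Date(epochSeconds * 1000);
  return partitionName(d.getUTCFullYear(), d.getUTCMonth());
}

// All monthly collections a query over [fromEpoch, toEpoch] must visit.
function partitionsForRange(fromEpoch, toEpoch) {
  const from = new Date(fromEpoch * 1000);
  const to = new Date(toEpoch * 1000);
  const endYear = to.getUTCFullYear();
  const endMonth = to.getUTCMonth();
  const names = [];
  let year = from.getUTCFullYear();
  let month = from.getUTCMonth();
  while (year < endYear || (year === endYear && month <= endMonth)) {
    names.push(partitionName(year, month));
    month += 1;
    if (month === 12) {
      month = 0;
      year += 1;
    }
  }
  return names;
}
```

For example, `partitionForTimestamp(1428008086)` gives `"telemetry-2015-04"`. Because each month is its own collection, "scale down older data" and "purge/archive" become operations on whole collections rather than on individual documents.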
  48. Lookup Partitioning: in lookup partitioning, partitions are assigned based on a lookup map that assigns discrete partition values to specific partitions (a.k.a. a partition or shard map). This is commonly used for partitioning by region. Example map (Tenant → Partition Id): Customer → 1; Big Customer → 2; Another → 3.
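A sketch of the shard map on this slide as a JavaScript `Map`, assuming the application loads it from somewhere at startup (where it is stored is not specified on the slide). Unlike hash partitioning, a lookup map lets you move one tenant, say a "Big Customer", to a dedicated partition just by editing its entry. `lookupPartition` is a hypothetical name.

```javascript
// Shard map from the slide: tenant -> partition id.
const shardMap = new Map([
  ["Customer", 1],
  ["Big Customer", 2],
  ["Another", 3],
]);

// Resolve a tenant to its partition; unknown tenants are an error, not a default.
function lookupPartition(tenant, map = shardMap) {
  const partitionId = map.get(tenant);
  if (partitionId === undefined) {
    throw new Error(`No partition assigned to tenant "${tenant}"`);
  }
  return partitionId;
}
```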
  49. Partitioning - Fan-out Queries: the query SELECT * FROM root r WHERE r.created.epoch BETWEEN 1376779786 AND 1401662986 is sent to every partition, whose documents look like { "record": "1", "created": { "date": "6/1/2014", "epoch": 1401662986 } }, { "record": "3", "created": { "date": "9/23/2014", "epoch": 1411512586 } }, { "record": "123", "created": { "date": "8/17/2013", "epoch": 1376779786 } }, { "record": "43233", "created": { "epoch": 1411512586 } }, { "record": "1123", "created": { "date": "8/17/2013", "epoch": 1376779786 } }, { "record": "43234", "created": { "epoch": 1376779786 } }
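A sketch of a fan-out query over the documents above, assuming they are split across two in-memory partitions (the split itself is hypothetical). The same predicate runs against every partition and the partial results are merged and sorted on the client. It runs sequentially for clarity; a real client would issue the per-partition queries in parallel.

```javascript
// Two hypothetical partitions holding the documents shown on the slide.
const partitions = [
  [
    { record: "1", created: { date: "6/1/2014", epoch: 1401662986 } },
    { record: "3", created: { date: "9/23/2014", epoch: 1411512586 } },
    { record: "123", created: { date: "8/17/2013", epoch: 1376779786 } },
  ],
  [
    { record: "43233", created: { epoch: 1411512586 } },
    { record: "1123", created: { date: "8/17/2013", epoch: 1376779786 } },
    { record: "43234", created: { epoch: 1376779786 } },
  ],
];

// Equivalent of: SELECT * FROM root r WHERE r.created.epoch BETWEEN lo AND hi
function queryPartition(docs, lo, hi) {
  return docs.filter((d) => d.created && d.created.epoch >= lo && d.created.epoch <= hi);
}

// Fan-out: run the same query on every partition, then merge the partial results.
function fanOutQuery(allPartitions, lo, hi) {
  return allPartitions
    .flatMap((docs) => queryPartition(docs, lo, hi))
    .sort((a, b) => a.created.epoch - b.created.epoch);
}

const matches = fanOutQuery(partitions, 1376779786, 1401662986);
```

Note that `"43234"` matches even though it has no `date` field: the filter only depends on `created.epoch`, which is what a schema-less collection allows.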
  50. Consistency
  51. Why worry about consistency? Query/transaction throughput (and reliability, i.e. resilience to hardware failure) depends on replication! All writes to the primary are replicated across two secondary replicas, and all reads are distributed across three copies. “Scalability of throughput”: allowing different clients to read from different replicas helps prevent bottlenecks. BUT replication takes time! Potential scenario: some clients are reading while another is writing. Now the data they read is out of date, inconsistent!
  52. Tweakable Consistency: the trade-off is speed (performance & availability) or consistency (data correctness). “Does every read need the MOST current data?” “Or do I need every request to be handled, and handled quickly?” There is no “one size fits all” answer … so it's up to you! There are 4 options, set for the entire DB (“In a future release, we intend to support overriding the default consistency level on a per collection basis.”)
  53. Strong: the client always sees completely consistent data. Slowest reads/writes. Mission critical: e.g. stock market, banking, airline reservations.
  54. Session: the default; an even trade-off between performance & availability and data correctness. A client reads its own writes, but other clients reading the same data might see older values.
  55. Bounded Staleness: a client might see old data, but it can specify a limit for how old that data can be (e.g. 2 seconds). Updates happen in the order received. Similar to Session consistency, but speeds up reads while still preserving the order of updates.
  56. Eventual: a client might see old data for as long as it takes a write to propagate to all replicas. High performance & availability, but a client might sometimes read out-of-date information or see updates out of order.
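A toy model to make the four levels concrete, assuming every write gets an increasing log sequence number (LSN) and each replica knows the highest LSN it has applied. This is an intuition aid, not DocumentDB's implementation; in particular the slide bounds staleness by time (e.g. 2 seconds), whereas this sketch bounds it by a number of writes (`maxLagOps`, a hypothetical parameter).

```javascript
// Toy model: each write gets an increasing log sequence number (LSN),
// and each replica knows the highest LSN it has applied.
function canServeRead(replicaLsn, level, { primaryLsn = 0, sessionLsn = 0, maxLagOps = 0 } = {}) {
  switch (level) {
    case "Strong": // must reflect every committed write
      return replicaLsn >= primaryLsn;
    case "BoundedStaleness": // may lag the primary by at most maxLagOps writes
      return primaryLsn - replicaLsn <= maxLagOps;
    case "Session": // must at least reflect this client's own writes
      return replicaLsn >= sessionLsn;
    case "Eventual": // any replica will do; the data may be stale
      return true;
    default:
      throw new Error(`Unknown consistency level: ${level}`);
  }
}

// Three replicas have applied writes up to LSN 10, 9 and 6; this client last wrote LSN 7.
const replicas = [10, 9, 6];
const sessionReadable = replicas.filter((lsn) => canServeRead(lsn, "Session", { sessionLsn: 7 }));
```

The weaker the level, the more replicas qualify to serve a read, which is where the better read latency and availability come from.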
  57. Setting Consistency: at the database level (see the preview portal), or on a per-read or per-query basis (optional parameter on the CreateDocumentQuery method).
  58. Consistency Tips: use weaker consistency levels for better read latencies, e.g. for IoT and data analysis. http://azure.microsoft.com/blog/2015/01/27/performance-tips-for-azure-documentdb-part-2/
  59. Indexing
  60. Indexing: efficient, rich hierarchical and relational queries without any schema or index definitions. Consistent query results while handling a sustained volume of writes: for high write throughput workloads with consistent queries, the index is updated incrementally, efficiently, and online. Storage efficiency: for cost effectiveness, the on-disk storage overhead of the index is bounded and predictable.
  61. Indexing – Modes and policies. Indexing modes: Consistent (default mode; the index is updated synchronously on writes); Lazy (useful for bulk ingestion scenarios). Indexing policies: Automatic (default); Manual (you can choose to index documents via RequestOptions, and read non-indexed documents via their selflink).
      Set indexing mode:
        var collection = new DocumentCollection { Id = "lazyCollection" };
        collection.IndexingPolicy.IndexingMode = IndexingMode.Lazy;
        await client.CreateDocumentCollectionAsync(databaseLink, collection);
      Set indexing policy:
        var collection = new DocumentCollection { Id = "manualCollection" };
        collection.IndexingPolicy.Automatic = false;
        await client.CreateDocumentCollectionAsync(databaseLink, collection);
  62. Indexing – Paths and types. Index paths: include and/or exclude paths. Index types: Hash (supported for strings and numbers; optimized for equality matches); Range (supported for numbers; optimized for comparison queries). Index precision: string precision, default is 3; numeric precision, default is 3 (increase it for larger number fields).
      Setting paths, types, and precision:
        var collection = new DocumentCollection { Id = "Orders" };
        collection.IndexingPolicy.ExcludedPaths.Add(@"/""metaData""/*");
        collection.IndexingPolicy.IncludedPaths.Add(new IndexingPath { IndexType = IndexType.Hash, Path = "/" });
        collection.IndexingPolicy.IncludedPaths.Add(new IndexingPath { IndexType = IndexType.Range, Path = @"/""shippedTimestamp""/?", NumericPrecision = 7 });
        await client.CreateDocumentCollectionAsync(databaseLink, collection);
  63. Indexing tips: use lazy indexing for faster peak-time ingestion rates; exclude unused paths from indexing for faster writes; specify the range index path type for all paths used in range queries; vary index precision to trade off write performance, query performance and storage. http://azure.microsoft.com/blog/2015/01/27/performance-tips-for-azure-documentdb-part-2/
  64. Querying
  65. Querying: optimize for queries with small result sets for scalability; limit the use of scans (no range index, NOT, UDFs in WHERE); use page size (MaxItemCount) and continuation tokens; for large result sets, use a larger page size (1000).
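A sketch of the page-size-plus-continuation-token loop described above, assuming a hypothetical `fetchPage` function that returns up to `maxItemCount` items and an opaque continuation token (undefined when there are no more results). The fake feed stands in for the service; it is synchronous for clarity, whereas real client calls are asynchronous.

```javascript
// Hypothetical page source: returns up to maxItemCount items and an opaque continuation token.
function makeFakeFeed(items) {
  return function fetchPage({ maxItemCount, continuation }) {
    const start = continuation ? Number(continuation) : 0;
    const page = items.slice(start, start + maxItemCount);
    const next = start + page.length;
    return { items: page, continuation: next < items.length ? String(next) : undefined };
  };
}

// Drain a query page by page, passing each continuation token back until none is returned.
function* readAllPages(fetchPage, maxItemCount) {
  let continuation;
  do {
    const response = fetchPage({ maxItemCount, continuation });
    yield response.items;
    continuation = response.continuation;
  } while (continuation);
}

const feed = makeFakeFeed(Array.from({ length: 2500 }, (_, i) => ({ id: String(i) })));
const pageSizes = [...readAllPages(feed, 1000)].map((page) => page.length); // [1000, 1000, 500]
```

Because the token is the only state, a client can stop after any page and resume later, which keeps each request (and its request charge) small.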
  66. Query: query over heterogeneous documents without defining schema or managing indexes; query arbitrary paths, properties and values without specifying secondary indexes or indexing hints; execute queries with consistent results. Supported SQL features: predicates, iterations (arrays), sub-queries, logical operators, UDFs, intra-document JOINs, JSON transforms. In general, more predicates result in a larger request charge; additional predicates can help if they narrow the overall result set.
      LINQ query:
        from book in client.CreateDocumentQuery<Book>(collectionSelfLink) where book.Title == "War and Peace" select book;
        from book in client.CreateDocumentQuery<Book>(collectionSelfLink) where book.Author.Name == "Leo Tolstoy" select book.Author;
      SQL query grammar:
        -- Nested lookup against index
        SELECT B.Author FROM Books B WHERE B.Author.Name = "Leo Tolstoy"
        -- Transformation, filters, array access
        SELECT { Name: B.Title, Author: B.Author.Name } FROM Books B WHERE B.Price > 10 AND B.Language[0] = "English"
        -- Joins, user-defined functions (UDF)
        SELECT udf.CalculateRegionalTax(B.Price, "USA", "WA") FROM Books B JOIN L IN B.Languages WHERE L.Language = "Russian"
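An intra-document JOIN, as in the last query above, pairs each document only with the elements of its own array; it never joins across documents. A sketch of those semantics in plain JavaScript, with hypothetical book data and the UDF replaced by a simple projection of the title.

```javascript
// Hypothetical books; each document carries its own array of languages.
const books = [
  { Title: "War and Peace", Price: 15, Languages: [{ Language: "Russian" }, { Language: "English" }] },
  { Title: "Anna Karenina", Price: 12, Languages: [{ Language: "Russian" }] },
  { Title: "Hamlet", Price: 8, Languages: [{ Language: "English" }] },
];

// FROM Books B JOIN L IN B.Languages: one (B, L) row per element of B's OWN array.
function joinLanguages(docs) {
  return docs.flatMap((B) => B.Languages.map((L) => ({ B, L })));
}

// ... WHERE L.Language = "Russian", projecting B.Title instead of the UDF
const russianTitles = joinLanguages(books)
  .filter(({ L }) => L.Language === "Russian")
  .map(({ B }) => B.Title);
```

A document whose array matches twice would appear twice in the output, just as it would produce two rows in the SQL JOIN.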
  67. Programmability
  68. Query with user-defined function:
        function region(doc) {
            switch (doc.Location.Region) {
                case 0: return "North";
                case 1: return "Middle";
                case 2: return "South";
            }
        }
      The complexity of a query impacts the request units consumed by an operation, including the use of user-defined functions (UDFs) in SELECT or WHERE clauses. To take advantage of indexing, try to have at least one filter against an indexed property when using a UDF in the WHERE clause.
  69. Executing Stored Procedures: execute “explicit” JavaScript code on a collection:
        function count(filterQuery, continuationToken) {
            var collection = getContext().getCollection();
            // MAX number of docs to process in one batch; when reached, return to the client/request continuation.
            // Intentionally set low to demonstrate the concept. This can be much higher. Try experimenting.
            // We've had it into the high thousands before seeing the stored procedure time out.
            var maxResult = 25;
            // The number of documents counted.
            var result = 0;
            tryQuery(continuationToken);
            // tryQuery() and its paging helpers are not shown on the slide.
        }
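The slide elides the body of `tryQuery`. A self-contained sketch of how such a count procedure typically pages through results and hands a continuation token back to the client, run here against a hypothetical in-memory stand-in for `getContext()`. The mock is an assumption for local testing, not the DocumentDB server runtime; the procedure checks the boolean returned by `queryDocuments` because the real runtime returns `false` when it will not accept more work.

```javascript
// --- Hypothetical stand-in for the server-side runtime (assumption, for local testing only) ---
let currentContext;
function getContext() {
  return currentContext;
}

function makeMockContext(docs) {
  let body;
  const collection = {
    getSelfLink: () => "dbs/demo/colls/demo",
    // Ignores the query text; pages through `docs` using a numeric continuation token.
    queryDocuments(link, query, options, callback) {
      const start = options.continuation ? Number(options.continuation) : 0;
      const page = docs.slice(start, start + options.pageSize);
      const next = start + page.length;
      callback(undefined, page, { continuation: next < docs.length ? String(next) : undefined });
      return true; // request accepted
    },
  };
  return {
    getCollection: () => collection,
    getResponse: () => ({ setBody: (value) => { body = value; } }),
    lastBody: () => body,
  };
}

// --- The stored procedure ---
function count(filterQuery, continuationToken) {
  const collection = getContext().getCollection();
  const maxResult = 25; // documents to count per call before returning a continuation
  let result = 0;
  tryQuery(continuationToken);

  function tryQuery(nextToken) {
    const options = { continuation: nextToken, pageSize: maxResult };
    // Stop when this batch is full or the runtime refuses more work.
    if (result >= maxResult || !collection.queryDocuments(collection.getSelfLink(), filterQuery, options, onPage)) {
      setBody(nextToken);
    }
  }

  function onPage(err, docs, responseOptions) {
    if (err) throw err;
    result += docs.length;
    if (responseOptions.continuation) {
      tryQuery(responseOptions.continuation);
    } else {
      setBody(undefined); // no more documents
    }
  }

  function setBody(token) {
    getContext().getResponse().setBody({ count: result, continuationToken: token });
  }
}

// --- Client side: call the procedure until no continuation token is returned ---
function countAll(docs) {
  currentContext = makeMockContext(docs);
  let total = 0;
  let token;
  do {
    count("SELECT * FROM c", token);
    const { count: partial, continuationToken } = currentContext.lastBody();
    total += partial;
    token = continuationToken;
  } while (token);
  return total;
}
```

The client, not the procedure, owns the loop: each call does a bounded amount of work, so no single execution runs long enough to be cut off.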
  70. Triggers! Execute “implicit” JavaScript code on CRUD operations (insert, update, delete) on collections:
        function normalize() {
            var collection = getContext().getCollection();
            var collectionLink = collection.getSelfLink();
            var doc = getContext().getRequest().getBody();
            var newDoc = {
                "Sensor": { "Id": doc.sensorId, "Class": 0 },
                "Degree": { "Value": doc.degreeValue, "Type": 0 },
                "Location": { "Name": doc.locationName, "Region": doc.locationRegion, "Longitude": doc.locationLong, "Latitude": doc.locationLat },
                "id": doc.id
            };
            // Update the request -- this is what is going to be inserted.
            getContext().getRequest().setBody(newDoc);
        }
  71. Performance
  72. DocumentDB Performance: data is saved on SSD. All writes to the primary are replicated across two secondary replicas (replicas are spread on different hardware in the same region to protect against failures). All reads are distributed across the three copies (when and how depends on the consistency level of the DB account and of the query).
  73. Performance Tips. Measure and tune for lower request units/second usage: DocumentDB offers a rich set of database operations, including relational and hierarchical queries with UDFs, stored procedures and triggers, all operating on the documents within a database collection. The cost of each of these operations varies with the CPU, IO and memory required to complete it. Instead of thinking about and managing hardware resources, you can think of a request unit (RU) as a single measure of the resources required to perform various database operations and service an application request. Handle server throttles (request rate too large): when a client attempts to exceed the reserved throughput for an account, there is no performance degradation at the server and no use of throughput capacity beyond the reserved level. The server preemptively ends the request with RequestRateTooLarge (HTTP status code 429) and returns the x-ms-retry-after-ms header, indicating the amount of time, in milliseconds, that the client must wait before retrying the request. Delete empty collections to use all provisioned throughput: every document collection created in a DocumentDB account is allocated reserved throughput capacity based on the number of Capacity Units (CUs) provisioned and the number of collections created. A single CU makes available 2,000 request units (RUs) per second and supports up to 3 collections. Design for smaller documents for higher throughput: the request charge (i.e. request processing cost) of a given operation is directly correlated to the size of the document. http://azure.microsoft.com/blog/2015/01/27/performance-tips-for-azure-documentdb-part-2/
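A sketch of honoring the 429 / `x-ms-retry-after-ms` contract described above, assuming a hypothetical `sendRequest` async function that resolves to `{ status, headers }` with lower-case header names. Client libraries may already retry throttled requests for you, so check before adding your own loop.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Retry a request while the server answers 429 (RequestRateTooLarge),
// waiting the number of milliseconds given in x-ms-retry-after-ms.
// `sendRequest` is hypothetical: an async function resolving to { status, headers }.
async function sendWithRetry(sendRequest, { maxAttempts = 5, wait = sleep } = {}) {
  for (let attempt = 1; ; attempt++) {
    const response = await sendRequest();
    if (response.status !== 429 || attempt >= maxAttempts) {
      return response; // success, a non-throttling error, or out of attempts
    }
    const retryAfterMs = Number(response.headers["x-ms-retry-after-ms"]) || 0;
    await wait(retryAfterMs);
  }
}
```

The `wait` parameter is injectable so tests can record delays instead of sleeping; capping the attempts keeps a persistently throttled client from looping forever.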
  74. Considerations
  75. Usage: what is DocumentDB for? User-generated content; many specific data (varbinary(MAX) in SQL); catalog data; log data; user preferences data; device sensor data. IoT use cases commonly share some patterns in how they ingest, process and store data. First, these systems allow for data intake that can ingest bursts of data from device sensors in various locales. Next, these systems process and analyze streaming data to derive real-time insights. And last but not least, most if not all data will eventually land in a data store for ad hoc querying and offline analytics.
  76. Limits from DocumentDB: maturity; balancing embedding (ok) and relating (limits); searching and denormalizing (an opportunity); storing transient data (better opportunities); storing files; append-only (Table) storage.
  77. Alternatives for some scenarios: logs, attachments, transient data, search.
  78. Azure Storage Blob: Block Blob. Targeted at streaming workloads (e.g. files read from beginning to end, like media files). Each blob consists of a sequence of blocks; each block is identified by a block ID; each block can be a maximum of 64 MB in size; size limit 200 GB per blob.
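The block model is what makes large uploads resumable: blocks are uploaded independently and then committed as an ordered list of block IDs. A Node.js sketch of the client-side splitting step only, assuming the actual upload and commit (Put Block and Put Block List in the REST API) happen elsewhere. The `block-NNNNNN` ID scheme is hypothetical; the rule that all block IDs within one blob are Base64 strings of equal length is a real service requirement, hence the zero padding.

```javascript
// Split a payload into blocks and give each one a block ID.
// Block IDs must be Base64 strings of the same length within a blob,
// so the index is zero-padded before encoding.
function splitIntoBlocks(data, blockSize) {
  if (!(blockSize > 0)) throw new RangeError("blockSize must be positive");
  const blocks = [];
  for (let offset = 0, index = 0; offset < data.length; offset += blockSize, index++) {
    const id = Buffer.from(`block-${String(index).padStart(6, "0")}`).toString("base64");
    blocks.push({ id, data: data.subarray(offset, offset + blockSize) });
  }
  return blocks;
}

// The ordered ids are what the final commit would list.
const parts = splitIntoBlocks(Buffer.from("hello block blob"), 4);
const commitOrder = parts.map((p) => p.id);
```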
  79. Azure Storage Blob: Page Blob. Targeted at random read/write workloads (e.g. backing storage for the VHDs used in Azure VMs). Each blob consists of an array of pages; each page is identified by its offset from the start of the blob; size limit 1 TB per blob.
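Page blobs are made of 512-byte pages, and writes must cover whole pages, so an arbitrary byte range has to be widened to page boundaries before it is written (reading the existing bytes of the edge pages first if only part of them changes). A small sketch of that alignment arithmetic; `alignToPages` is a hypothetical name.

```javascript
const PAGE_SIZE = 512; // page blobs are made of 512-byte pages

// Expand an arbitrary byte range [offset, offset + length) to whole pages.
function alignToPages(offset, length) {
  const start = Math.floor(offset / PAGE_SIZE) * PAGE_SIZE;
  const end = Math.ceil((offset + length) / PAGE_SIZE) * PAGE_SIZE; // exclusive
  return { start, end, pages: (end - start) / PAGE_SIZE };
}
```

For example, updating 100 bytes at offset 1000 touches bytes 512 to 1535, i.e. two pages.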
  80. Azure Table Storage Details. Not an RDBMS table! The mental picture is ‘entities’: an entity can have up to 255 properties, up to 1 MB per entity. Partitioning: PartitionKey & RowKey are mandatory properties; together they form a composite key that uniquely identifies an entity; they are the only indexed properties; they define the sort order. Purpose of the PartitionKey: entity locality (entities in the same partition are stored together, for efficient querying and cache locality) and entity group transactions. Table scalability: target throughput of 500 tps per partition, several thousand tps per account. Microsoft Azure monitors the usage patterns of partitions and automatically load-balances them; each partition can be served by a different storage node, scaling to meet the traffic needs of your table. Supports full manipulation (CRUD).
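Because PartitionKey and RowKey are the only indexed properties and define the sort order, key design is the schema design. A sketch of one common IoT layout, assuming the device id becomes the PartitionKey (locality per device) and the RowKey is a "reverse ticks" timestamp so that a scan within a partition returns the newest readings first. The function and property names (`value`, `timestamp`) are hypothetical; the tick constants are the .NET `DateTime` values conventionally used for this pattern.

```javascript
// .NET tick constants (100 ns units since 0001-01-01), used by the "reverse ticks" convention.
const TICKS_AT_UNIX_EPOCH = 621355968000000000n;
const MAX_TICKS = 3155378975999999999n; // DateTime.MaxValue.Ticks

// RowKey that sorts newest-first as a string: zero-padded (MaxTicks - ticks).
function reverseTicksRowKey(date) {
  const ticks = TICKS_AT_UNIX_EPOCH + BigInt(date.getTime()) * 10000n;
  return (MAX_TICKS - ticks).toString().padStart(19, "0");
}

// One telemetry reading as a table entity: all readings of a device share a partition.
function toTelemetryEntity(deviceId, reading) {
  return {
    PartitionKey: deviceId,
    RowKey: reverseTicksRowKey(reading.timestamp),
    value: reading.value,
  };
}
```

The trade-off: one partition per device keeps each device's history together, but every device is then limited by the per-partition throughput target quoted on the slide.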
  81. Azure Search: embed a sophisticated search experience into web and mobile applications without having to worry about the complexities of full-text search and without having to deploy, maintain or manage any infrastructure. Perfect for enterprise cloud developers, cloud software vendors and cloud architects who need a fully managed search solution. Search is a natural back end for Cortana: take a bunch of words → apply linguistics → return relevant results.
  82. Azure Search Service: a “search service” is the scope for capacity; it is bound to a region and has keys, indexes, indexers and data sources. Provisioning: Azure Portal, Azure Resource Management API. Elastic scale: capacity can be changed dynamically; replicas ~ more QPS, HA; partitions ~ more documents, write throughput.
  83. Search Functionality: a simple HTTP/JSON API for creating indexes, pushing documents and searching; keyword search with user-friendly operators (+, -, *, “”, etc.); hit highlighting; faceting (histograms over ranges, typically used in catalog browsing). Based on ElasticSearch.
  84. Linguistics: linguistics are key in search. Support for 50 languages: word breaking, stop words, inflections. Lucene analyzers: the well-known analyzer stack, stemming. Microsoft analyzers: the same NLP stack used by parts of Office and Bing; lemmatization in many languages.
  85. Search Functionality: suggestions (auto-complete); rich structured queries (filter, select, sort) that combine with search; scoring profiles to model search result relevance; geo-spatial support integrated in filtering, sorting and ranking (such as finding all restaurants within 5 km of your current location).
  86. Caching: Redis. Redis is an open source, BSD-licensed, networked, single-threaded, in-memory key-value cache and store. Key-value cache and store (the value can be a couple of things); in-memory (persistence is optional); single-threaded (atomic operations & transactions); networked (it's a server and it does master/slave replication); some other stuff (scripting, pub/sub, Sentinel, snapshots).
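A common way to put Redis in front of DocumentDB or SQL is the cache-aside pattern: read from the cache, fall back to the store on a miss, then populate the cache with an expiry. A synchronous sketch in which a `Map` stands in for Redis (an assumption for illustration); with real Redis the expiry would normally live on the key itself, e.g. `SET key value PX <ttl>`. `makeCache` and the loader are hypothetical names.

```javascript
// Cache-aside with a per-entry TTL. A Map stands in for Redis here (assumption):
// with a real client the get/set calls would go over the network.
function makeCache({ ttlMs, now = () => Date.now() }) {
  const entries = new Map(); // key -> { value, expiresAt }

  return function getOrLoad(key, load) {
    const hit = entries.get(key);
    if (hit && hit.expiresAt > now()) return hit.value; // fresh hit
    const value = load(key); // miss or expired: go to the backing store
    entries.set(key, { value, expiresAt: now() + ttlMs });
    return value;
  };
}
```

The TTL bounds how stale a cached value can get; writes to the backing store should also delete or overwrite the cached key if readers need fresher data.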
  87. Conclusions
  88. So DocumentDB… Pros: partitioning, replicas and scaling at its core; self-contained documents; programmability in JavaScript; SQL-like “intra-document” queries. Cons: no generic SQL queries; it can work alone in just a few scenarios.
  89. Other Not Only SQL alternatives: great storage opportunities in Azure for logs, search, transient data, files/attachments, SQL!, and all the new data analysis/machine learning opportunities.
  90. Pricing (http://bit.do/documentdb-pricing): Capacity Units (CU) provide capacity and throughput (in terms of rate of transactions/second). One CU provides 2,000 request units (RU) per second. The cost of a request depends on the size of the document: e.g. uploading 1000 large JSON documents might consume more than 1000 request units.
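A back-of-the-envelope sketch tying these numbers together, assuming (as in the collection slides) that an account's request units are split evenly across its collections. The per-operation cost is a hypothetical input: the actual request charge is reported per response and depends on document size, indexing and query complexity, so it should be measured rather than guessed.

```javascript
const RU_PER_CU_PER_SEC = 2000; // one capacity unit, as quoted on the slides (2015)

// Sustainable operations/sec for one collection, assuming RUs are split evenly
// across collections and every operation costs `ruPerOperation`.
function opsPerSecondPerCollection({ capacityUnits, collections, ruPerOperation }) {
  const ruPerCollection = (capacityUnits * RU_PER_CU_PER_SEC) / collections;
  return Math.floor(ruPerCollection / ruPerOperation);
}
```

For example, 1 CU shared by 2 collections at a hypothetical 5 RU per write sustains about 200 writes/sec per collection, which is also why deleting empty collections frees throughput for the others.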
  91. What does DocumentDB cost? Standard pricing tier with hourly billing, from just $0.034 per hour! Performance levels can be adjusted. Each collection = 10 GB of SSD; collection* performance is set by S1, S2, S3. Limit of 100 collections (1 TB) per account: a soft limit that can be lifted as needed. (* collection != table of homogeneous entities; collection ~ a data partition)
  92. NoSQL in Azure for IoT (and Business) Marco Parenzan, Microsoft Azure MVP, @marco_parenzan, marco [dot] parenzan [at] 1nn0va [dot] it
