Data liberty in an age post-SQL - with pizazz - as presented at CloudBurst

As presented at CloudBurst SE 2013

  • Slide objectives: understand the hierarchy of Blob storage.
    Speaker notes:
    Put Blob - Creates a new blob or replaces an existing blob within a container.
    Get Blob - Reads or downloads a blob from the system, including its metadata and properties.
    Delete Blob - Deletes a blob.
    Copy Blob - Copies a source blob to a destination blob within the same storage account.
    Snapshot Blob - Creates a read-only snapshot of a blob.
    Lease Blob - Establishes an exclusive one-minute write lock on a blob. To write to a leased blob, a client must provide the lease ID.
    Using the REST API for the Blob service, developers can create a hierarchical namespace similar to a file system. Blob names may encode a hierarchy by using a configurable path separator. For example, the blob names MyGroup/MyBlob1 and MyGroup/MyBlob2 imply a virtual level of organization for blobs. The enumeration operation for blobs supports traversing this virtual hierarchy much as you would a file system, so you can return the set of blobs organized beneath a group; for example, you can enumerate all blobs under MyGroup/.
    Notes:
    The Blob service provides storage for entities such as binary files and text files. The REST API for the Blob service exposes two resources: containers and blobs. A container is a set of blobs, and every blob must belong to a container. The Blob service defines two types of blobs:
    Block blobs, which are optimized for streaming. This is the only blob type available in service versions prior to 2009-09-19.
    Page blobs, which are optimized for random read/write operations and can write to a range of bytes within a blob. Page blobs are available only in version 2009-09-19 and later.
    Containers and blobs support user-defined metadata in the form of name-value pairs, specified as headers on a request.
    A block blob may be created in one of two ways. Block blobs of 64 MB or less can be uploaded with a single Put Blob operation. Block blobs larger than 64 MB must be uploaded as a set of blocks, each of which must be 4 MB or less. A set of successfully uploaded blocks can then be assembled, in a specified order, into a single contiguous blob by calling Put Block List. The maximum size currently supported for a block blob is 200 GB.
    Page blobs are created, and initialized with a maximum size, by calling Put Blob. To write content to a page blob, you call the Put Page operation. The maximum size currently supported for a page blob is 1 TB.
    Blobs support conditional update operations that can be useful for concurrency control and efficient uploading. Blobs are read by calling the Get Blob operation; a client may read the entire blob or an arbitrary range of bytes. For the Blob service API reference, see Blob Service API.
  • Microsoft’s technology leadership in this area takes best-of-breed technology from the industry and makes it enterprise-ready. Microsoft has also made it possible to reuse existing IT skills on a new big data platform: the code for expressing this logic has a shallow learning curve for experienced Microsoft .NET developers.
  • “Burst” provisioning of data technologies, for only as long as a particular query runs, lets us think in terms of “the commoditised query”: a well-understood cost that a profit centre within the business can weigh against the benefits the query brings. This frees BI technology from being the sunk cost it previously was.
  • A relational database joins “tables” of different data together to form a single picture of something. A document database holds all the details of that something in a single document.
  • Familiar operational style: insert, update, delete, read.
    Ad-hoc querying capability, from many programming languages.
    Standard MapReduce model.
    Also supports SQL-style aggregation capabilities.
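The virtual hierarchy described in the speaker notes comes entirely from blob names; the container itself is flat. The sketch below is a local, in-memory model of prefix-plus-delimiter enumeration, not a call to the Blob service or to any storage client library. The blob names and the `ListHierarchy` helper are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical blob names in one flat container; "/" is only a naming convention.
var blobNames = new[]
{
    "MyGroup/MyBlob1",
    "MyGroup/MyBlob2",
    "MyGroup/Archive/Old1",
    "Other/Readme",
    "TopLevel",
};

// Hypothetical helper (in-memory model, not the storage client API):
// names under the prefix with no further delimiter are returned as blobs;
// deeper names are rolled up into one virtual "directory" prefix each.
(List<string> Blobs, List<string> Prefixes) ListHierarchy(
    IEnumerable<string> names, string prefix, char delimiter)
{
    var blobs = new List<string>();
    var prefixes = new SortedSet<string>(StringComparer.Ordinal);
    foreach (var name in names.Where(n => n.StartsWith(prefix, StringComparison.Ordinal)))
    {
        var rest = name.Substring(prefix.Length);
        var cut = rest.IndexOf(delimiter);
        if (cut < 0)
            blobs.Add(name);
        else
            prefixes.Add(prefix + rest.Substring(0, cut + 1));
    }
    blobs.Sort(StringComparer.Ordinal);
    return (blobs, prefixes.ToList());
}

var (blobsUnderGroup, subGroups) = ListHierarchy(blobNames, "MyGroup/", '/');
Console.WriteLine(string.Join(", ", blobsUnderGroup)); // MyGroup/MyBlob1, MyGroup/MyBlob2
Console.WriteLine(string.Join(", ", subGroups));       // MyGroup/Archive/
```

Calling `ListHierarchy` again with the prefix `MyGroup/Archive/` descends one level further, which is how a file-system-style browser can be layered over a flat namespace.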
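The upload rule in the notes (a single Put Blob up to 64 MB, otherwise blocks of at most 4 MB committed in order with Put Block List) amounts to a small planning step. This sketch only computes the plan and makes no network calls. The limits are the ones quoted in the notes for the service at the time of the talk, and `PlanBlocks` is a hypothetical helper. The zero-padded Base64 block IDs reflect the service's requirement that block IDs be Base64 strings of equal length within one blob.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Limits as quoted in the speaker notes (service limits at the time of the talk).
const long SinglePutLimit = 64L * 1024 * 1024; // one Put Blob call, up to 64 MB
const int MaxBlockSize = 4 * 1024 * 1024;      // each Put Block call, up to 4 MB

// Hypothetical planner: an empty list means "use a single Put Blob";
// otherwise each entry is one Put Block, and the IDs are then committed
// in this order with one Put Block List call.
List<(string BlockId, long Offset, int Length)> PlanBlocks(long blobLength)
{
    var blocks = new List<(string BlockId, long Offset, int Length)>();
    if (blobLength <= SinglePutLimit)
        return blocks;

    long offset = 0;
    int index = 0;
    while (offset < blobLength)
    {
        int length = (int)Math.Min(MaxBlockSize, blobLength - offset);
        // Zero-pad the index so every Base64-encoded ID has the same length.
        string id = Convert.ToBase64String(Encoding.UTF8.GetBytes(index.ToString("D6")));
        blocks.Add((id, offset, length));
        offset += length;
        index++;
    }
    return blocks;
}

var plan = PlanBlocks(100L * 1024 * 1024); // a hypothetical 100 MB file
Console.WriteLine($"{plan.Count} Put Block calls, then one Put Block List"); // 25 Put Block calls, then one Put Block List
```

A 65 MB file, for instance, would be planned as 16 full 4 MB blocks followed by one 1 MB block.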
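The contrast between joining tables and reading a single document can be sketched with plain C# data: two in-memory "tables" joined with LINQ on a key, and one nested object holding the same details. The rows are hypothetical, reusing a few fields from the contact document in slide 11, and `System.Text.Json` is used only to show the nested shape; no database is involved.

```csharp
using System;
using System.Linq;
using System.Text.Json;

// Relational shape (hypothetical rows): the contact is spread across two
// "tables" and reassembled with a join on PersonId.
var people = new[] { (PersonId: 1, FirstName: "Owen", LastName: "Grzegorek", Company: "Howard Miller Co") };
var addresses = new[] { (PersonId: 1, Line1: "15410 Minnetonka Industrial Rd", Line4: "MN") };

var joined = (from p in people
              join a in addresses on p.PersonId equals a.PersonId
              select new { p.FirstName, p.LastName, p.Company, a.Line1, a.Line4 }).Single();

// Document shape: the same details nested together in one object,
// so reading the contact back needs no join.
var document = new
{
    Name = new { FirstName = "Owen", LastName = "Grzegorek" },
    Company = "Howard Miller Co",
    Address = new { Line1 = "15410 Minnetonka Industrial Rd", Line4 = "MN" },
};

Console.WriteLine(JsonSerializer.Serialize(document));
```

The trade-off is the usual one: the join keeps each fact in one place, while the document keeps everything a reader needs in one place.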

Data liberty in an age post-SQL - with pizazz - as presented at CloudBurst: Presentation Transcript

  • 1. Alternatives to the shackles of limited scale in data solutions. Andy Cross, Windows Azure MVP, Elastacloud
  • 2. IBM have been a leader in Big Data for years. (Image: Wikimedia Commons)
  • 3. C# integration: Remote Data & Jobs; Hive in C#; Serialization
  • 4.
    using Microsoft.Hadoop.MapReduce;
    public class SwedishSessionsJob : HadoopJob<SwedishSessionsMapper, SessionsReducer>
    {
        public override HadoopJobConfiguration Configure(ExecutorContext context)
        {
            var config = new HadoopJobConfiguration()
            {
                InputPath = "/AllSessions/*.gz",
                OutputFolder = "/SwedishSessions/"
            };
            return config;
        }
    }
  • 5.
    using Microsoft.Hadoop.MapReduce;
    public class SwedishSessionsMapper : MapperBase
    {
        public override void Map(string inputLine, MapperContext context)
        {
            if (inputLine.Contains("Country=Sweden"))
            {
                context.IncrementCounter("SwedishSession");
                context.EmitKeyValue("SE", "1");
            }
        }
    }
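  • 6.
    using System.Collections.Generic;
    using System.Linq;
    using Microsoft.Hadoop.MapReduce;
    public class SessionsReducer : ReducerCombinerBase
    {
        public override void Reduce(string key, IEnumerable<string> values, ReducerContext context)
        {
            // Sum the emitted "1"s (EmitKeyValue takes strings); summing rather
            // than counting stays correct if this class is also used as a combiner.
            context.EmitKeyValue(key, values.Sum(v => int.Parse(v)).ToString());
        }
    }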
  • 7.
    var inputData = "Country=Sweden&Name=Magnus";
    var result = StreamingUnit.Execute<Jobs.SwedishJob>(new[] { inputData });
    // Streaming output is the key and value separated by a tab.
    Assert.AreEqual("SE\t1", result.ReducerResult.First());
  • 8. Your existing development team can immediately realise value. The frameworks facilitate deterministic testing for highly reliable queries. Complex logic is best expressed in programmatic form.
  • 9. Provision Execute De-provision
  • 10. * Tools are great but not friendly
  • 11.
    { "_id" : ObjectId("51fccc57f82352d76653bdae"),
      "Name" : { "FirstName" : "Owen", "LastName" : "Grzegorek" },
      "Company" : "Howard Miller Co",
      "Address" : { "Line1" : "15410 Minnetonka Industrial Rd", "Line2" : "Minnetonka", "Line3" : "Hennepin", "Line4" : "MN", "Line5" : "55345" },
      "ContactDetails" : { "Phone" : "952-939-2973", "Fax" : "952-939-4663", "Email" : "owen@grzegorek.com", "Web" : "http://www.owengrzegorek.com" } }
    { "Name" : { "FirstName" : "Owen", "LastName" : "Grzegorek" }, "Company" : "Howard Miller Co", "Address" : { "Line1" : "15410 Minnetonka Industrial Rd", "Line2" : "Minnetonka", "Line3" : "Hennepin", "Line4" : "MN", "Line5" : "55345" }, "ContactDetails" : { "Phone" : "952-939-2973", "Fax" : "952-939-4663", "Email" : "owen@grzegorek.com", "Web" : "http://www.owengrzegorek.com" } }
    { "Name" : "Richard Conway", "Books Published" : "12", "Specialises in" : "Data Science" }
    { "Name" : "Andy Cross", "Hometown" : "Blackpool" }
    { "Name" : "Isaac Abraham", "Age" : "33", "Football Team" : "Tottenham", "Icon" : }
  • 12. There are many different ways to connect to MongoDB from a .NET project: Official, Wrapper, Alternative, Tool.
  • 13.
    using MongoDB.Bson;
    using MongoDB.Driver;
    public class Book
    {
        public ObjectId Id { get; set; } // maps to the document's _id
        public string Author { get; set; }
        public string Title { get; set; }
    }
    // "books" is the name of the collection; database is a MongoDatabase instance
    var books = database.GetCollection<Book>("books");
    Book book = new Book { Author = "Ernest Hemingway", Title = "For Whom the Bell Tolls" };
    books.Insert(book);
  • 14.
    BsonDocument person = new BsonDocument
    {
        { "name", "John Doe" },
        { "address", new BsonDocument
            {
                { "street", "123 Main St." },
                { "city", "Centerville" },
                { "state", "PA" },
                { "zip", 12345 }
            }
        }
    };
    var people = database.GetCollection<BsonDocument>("people");
    people.Insert(person);
  • 15. http://www.apcjones.com/arrows/#
  • 16. Open source Neo4j Client
  • 17.
    var query = neo4Jclient.Cypher
        .Start(new { sweden = Node.ByIndexLookup("countryIdx", "country", "sweden") })
        .Match("sweden-[:FRIENDS]->friend-[:FRIENDS]->friendoffriend")
        .Return<Node<Friend>>("friendoffriend");
  • 18. Info@elastacloud.com
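To make the data flow of slides 4 to 7 concrete without a cluster, the sketch below replays the map, shuffle and reduce phases in memory with LINQ. It is not the Hadoop SDK or its `StreamingUnit` test harness; the `Map` and `Run` helpers and the Norwegian input line are hypothetical. It produces the same tab-separated `SE\t1` line that slide 7 expects.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Counters, like those incremented by context.IncrementCounter in slide 5.
var counters = new Dictionary<string, int>();

// Hypothetical map helper mirroring slide 5: a matching line emits
// ("SE", "1") and bumps a counter; other lines emit nothing.
List<(string Key, string Value)> Map(string inputLine)
{
    var emitted = new List<(string Key, string Value)>();
    if (inputLine.Contains("Country=Sweden"))
    {
        counters.TryGetValue("SwedishSession", out var n);
        counters["SwedishSession"] = n + 1;
        emitted.Add(("SE", "1"));
    }
    return emitted;
}

// Shuffle groups emitted values by key; reduce adds up the "1"s per key
// (the count slide 6 emits) and formats a tab-separated streaming line.
IEnumerable<string> Run(IEnumerable<string> input) =>
    input.SelectMany(line => Map(line))
         .GroupBy(kv => kv.Key, kv => kv.Value)
         .OrderBy(g => g.Key, StringComparer.Ordinal)
         .Select(g => $"{g.Key}\t{g.Sum(v => int.Parse(v))}");

var result = Run(new[] { "Country=Sweden&Name=Magnus", "Country=Norway&Name=Ola" }).ToList();
Console.WriteLine(result.Single()); // SE, a tab, then 1
```

Because every phase is an ordinary function over in-memory data, the same deterministic-testing argument from slide 8 applies: the logic can be checked before any cluster is provisioned.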
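Slide 9's provision, execute, de-provision cycle (the "burst" idea from the speaker notes) ties a cluster's lifetime to a single query. A minimal sketch, assuming hypothetical `provision` and `deprovision` delegates in place of any real cloud provider API; `RunInBurst` is likewise a hypothetical helper.

```csharp
using System;
using System.Collections.Generic;

var log = new List<string>();

// Hypothetical stand-ins for a provider's cluster API: plain delegates
// that only record what happened, so the control flow runs locally.
Func<string> provision = () => { log.Add("provision"); return "cluster-1"; };
Action<string> deprovision = id => log.Add($"deprovision {id}");

// Burst pattern: the cluster exists only for one query, and try/finally
// tears it down even if the query throws.
T RunInBurst<T>(Func<string, T> query)
{
    var clusterId = provision();
    try
    {
        log.Add("execute");
        return query(clusterId);
    }
    finally
    {
        deprovision(clusterId);
    }
}

var answer = RunInBurst(id => 42); // placeholder for a real query
Console.WriteLine(string.Join(" -> ", log)); // provision -> execute -> deprovision cluster-1
```

The `finally` block is the design point: a failed query still releases the cluster, so the cost stays bounded by the query's own duration.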
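The Cypher pattern in slide 17 follows exactly two `FRIENDS` hops from a start node. An in-memory equivalent over an adjacency list shows what that pattern returns; the graph data and the `FriendsOfFriends` helper are hypothetical, and this is not the Neo4jClient API. As with a Cypher match, a node reachable by several paths is returned once per path.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical directed FRIENDS edges, as in -[:FRIENDS]-> in the Cypher pattern.
var friends = new Dictionary<string, string[]>
{
    ["sweden"] = new[] { "Magnus", "Anna" },
    ["Magnus"] = new[] { "Lars", "Anna" },
    ["Anna"] = new[] { "Karin" },
};

// Two hops, mirroring sweden-[:FRIENDS]->friend-[:FRIENDS]->friendoffriend.
IEnumerable<string> FriendsOfFriends(string start)
{
    // Outgoing FRIENDS edges of a node (none if the node has no entry).
    IEnumerable<string> Out(string node) =>
        friends.TryGetValue(node, out var next) ? next : Array.Empty<string>();

    return Out(start).SelectMany(friend => Out(friend));
}

var result = FriendsOfFriends("sweden").ToList();
Console.WriteLine(string.Join(", ", result)); // Lars, Anna, Karin
```

Note that Anna appears both as a friend and as a friend of a friend; a graph query expresses that overlap directly, where a relational version would need a self-join on a friendship table.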