Where is my Data? (In the Cloud), by Tamir Dresher

Azure storage options, together with best practices and methods for handling large amounts of data.

Slides and recording can be found on my blog: http://blogs.microsoft.co.il/iblogger/2014/05/22/slides-from-where-is-my-data-in-the-cloud-webinar-19052014/

Published in: Software, Technology

  1. Where is my Data? (In the Cloud)
      Tamir Dresher, Senior Software Architect
      May 19, 2014
  2. About Me
      • Software architect, consultant and instructor
      • Software Engineering Lecturer @ Ruppin Academic Center
      • Technology addict
      • 10 years of experience
      • .NET and Native Windows Programming
      @tamir_dresher, tamirdr@codevalue.net, http://www.TamirDresher.com
  3. Agenda
      • Storage
      • Blob
      • Azure SQL Server
      • Azure Tables
      • HDInsight
  4. Agenda (repeated)
  5. Storage
  6. Storage Prices
  7. Types of information
  8. Windows Azure: Growing Global Presence
      • Datacenters in North America, Europe and Asia Pacific
      • Storage SLA – 99.99%, which allows at most 52.56 minutes of downtime per year
      • http://azure.microsoft.com/en-us/support/legal/sla
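
The 52.56-minute figure follows directly from the SLA percentage. A minimal sketch of the arithmetic, assuming a 365-day year; the class and method names are illustrative only and are not part of any Azure SDK:

```csharp
using System;
using System.Globalization;

public static class SlaDowntime
{
    // Minutes in the period during which the service may be unavailable
    // without breaching the given availability percentage.
    public static double AllowedDowntimeMinutes(double availabilityPercent, double periodDays)
    {
        double minutesInPeriod = periodDays * 24 * 60;
        return minutesInPeriod * (100.0 - availabilityPercent) / 100.0;
    }

    public static void Main()
    {
        // 365 days * 24 h * 60 min = 525,600 minutes; 0.01% of that is 52.56.
        double minutes = AllowedDowntimeMinutes(99.99, 365);
        Console.WriteLine(minutes.ToString("F2", CultureInfo.InvariantCulture)); // prints "52.56"
    }
}
```

The same function shows how quickly the budget grows when a "nine" is dropped: 99.9% over the same year allows roughly ten times as much downtime.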
  9. AZURE BLOBS
  10. What is a BLOB
      • BLOB – Binary Large OBject
      • Storage for any type of entity, such as binary files and text documents
      • Distributed File Service (DFS) – scalability and high availability
      • A BLOB file is distributed between multiple servers and replicated at least 3 times
  11. Blob Storage Concepts
  12. Blob Operations: REST
  13. DEMO: Creating a Blob
  14. BLOBS
      • Block blob – up to 200 GB in size
      • Page blob – up to 1 TB in size
      • Total account capacity – 500 TB
      • Pricing
        – Storage capacity used
        – Replication option (LRS, GRS, RA-GRS)
        – Number of requests
        – Data egress
        – http://azure.microsoft.com/en-us/pricing/details/storage/
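
The Blob Operations slide names REST as the access model. As a rough sketch of what that means in practice: each blob is addressable by a URL built from the storage account, the container and the blob name, and HTTP verbs such as GET, PUT and DELETE act on that URL. The account, container and blob names below are placeholders, and a real request also needs authorization (a signed header or a shared access signature) unless the container allows public read access.

```csharp
using System;

public static class BlobAddress
{
    // Builds the public endpoint URL of a blob from its three coordinates.
    // All three arguments are hypothetical names chosen for illustration.
    public static Uri Build(string account, string container, string blobName)
    {
        return new Uri($"https://{account}.blob.core.windows.net/{container}/{blobName}");
    }

    public static void Main()
    {
        Uri uri = Build("myaccount", "images", "photos/cat.jpg");
        // A GET on this URL downloads the blob, a PUT uploads it, a DELETE removes it.
        Console.WriteLine(uri.AbsoluteUri);
    }
}
```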
  15. SQL AZURE
  16. SQL Azure
      • SQL Server in the cloud
      • No administrative overheads
      • High availability
      • Pay-as-you-grow pricing
      • Familiar development model*
      * Despite missing features and some limitations - http://msdn.microsoft.com/en-us/library/ff394115.aspx
  17. DEMO: Creating and Using SQL Azure
  18. SQL Azure – Pricing
  19. Case Study - https://haveibeenpwned.com/
  20. Case Study - https://haveibeenpwned.com/
      • http://www.troyhunt.com/2013/12/working-with-154-million-records-on.html
      • How do I make querying 154 million email addresses as fast as possible?
      • "If I want 100GB of SQL Server and I want to hit it 10 million times, it'll cost me $176 a month" (now it's ~$20)
  21. AZURE TABLES
  22. Table Storage Concepts
  23. Table Storage
      • Not RDBMS
        – No relationships between entities
        – NoSQL
      • An entity can have up to 255 properties, up to 1MB per entity
      • Mandatory properties for every entity
        – PartitionKey & RowKey (the only indexed properties)
          • Uniquely identify an entity
          • The same RowKey can be used in different PartitionKeys
          • Define the sort order
        – Timestamp – optimistic concurrency
  24. No Fixed Schema
  25. Table Object Model
      • ITableEntity interface – PartitionKey, RowKey, Timestamp, and ETag properties
        – Implemented by TableEntity and DynamicTableEntity

          // This class defines one additional property of integer type;
          // since it derives from TableEntity it will be automatically
          // serialized and deserialized.
          public class SampleEntity : TableEntity
          {
              public int SampleProperty { get; set; }
          }
  26. Sample – Inserting an Entity into a Table

          // You will need the following using statements
          using Microsoft.WindowsAzure.Storage;
          using Microsoft.WindowsAzure.Storage.Table;

          // Create the table client.
          CloudTableClient tableClient = storageAccount.CreateCloudTableClient();
          CloudTable peopleTable = tableClient.GetTableReference("people");
          peopleTable.CreateIfNotExists();

          // Create a new customer entity.
          CustomerEntity customer1 = new CustomerEntity("Harp", "Walter");
          customer1.Email = "Walter@contoso.com";
          customer1.PhoneNumber = "425-555-0101";

          // Create an operation to add the new customer to the people table.
          TableOperation insertCustomer1 = TableOperation.Insert(customer1);

          // Submit the operation to the table service.
          peopleTable.Execute(insertCustomer1);
  27. Retrieve

          // Create the table client.
          CloudTableClient tableClient = storageAccount.CreateCloudTableClient();
          CloudTable peopleTable = tableClient.GetTableReference("people");

          // Retrieve the entity with partition key of "Smith" and row key of "Jeff"
          TableOperation retrieveJeffSmith =
              TableOperation.Retrieve<CustomerEntity>("Smith", "Jeff");

          // Retrieve entity
          CustomerEntity specificEntity =
              (CustomerEntity)peopleTable.Execute(retrieveJeffSmith).Result;
  28. Table Storage – Important Points
      • Azure Tables can store TBs of data
      • Table operations are fast
      • Tables are distributed
        – PartitionKey defines the partition
        – A table might be stored in different partitions on different storage devices
  29. Pricing
  30. Case Study - https://haveibeenpwned.com/
  31. Case Study - https://haveibeenpwned.com/
      • How do I make querying 154 million email addresses as fast as possible?
      • foo@bar.com – the domain (bar.com) is the partition key and the alias (foo) is the row key
      • "If I want 100GB of storage and I want to hit it 10 million times, it'll cost me $8 a month"
      • SQL Server will cost $176 a month, 22 times more expensive
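
The key design from the case study can be sketched as a small helper that turns an email address into a (PartitionKey, RowKey) pair, so that a lookup becomes a single point query on both keys. This is only an illustration: the class and method names are hypothetical, and lower-casing both parts is an assumption made here so that differently cased addresses land on the same entity.

```csharp
using System;

public static class EmailKeys
{
    // Splits "alias@domain" into (PartitionKey = domain, RowKey = alias).
    public static (string PartitionKey, string RowKey) FromEmail(string email)
    {
        int at = email.LastIndexOf('@');
        if (at <= 0 || at == email.Length - 1)
        {
            throw new ArgumentException("Expected an address of the form alias@domain.", nameof(email));
        }

        string alias = email.Substring(0, at);
        string domain = email.Substring(at + 1);

        // Assumption: normalize case so lookups are case-insensitive.
        return (domain.ToLowerInvariant(), alias.ToLowerInvariant());
    }

    public static void Main()
    {
        var keys = FromEmail("foo@bar.com");
        Console.WriteLine($"PartitionKey={keys.PartitionKey}, RowKey={keys.RowKey}");
    }
}
```

The two values could then be passed to a point lookup such as the TableOperation.Retrieve call shown on the Retrieve slide, which addresses exactly one partition and one row.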
  32. HDINSIGHT
  33. Hadoop in the cloud
      • Hadoop on Azure Cloud
      • Some facts:
        – Bing ingests > 7 petabytes a month
        – The Twitter community generates over 1 terabyte of tweets every day
        – Cisco predicts that by 2013 annual internet traffic will reach 667 exabytes
      Sources: The Economist, Feb '10; DBMS2; Microsoft Corp
  34. MapReduce – The BigData Power
      • Map – takes input and outputs (key, value) pairs
          (Key1, Value1)
          (Key2, Value2)
          :
          (KeyN, ValueN)
  35. MapReduce – The BigData Power
      • Reduce – takes the group of values for each key and produces a new group of values
          Key1: [value1-1, value1-2…] -> [new_value1-1, new_value1-2…]
          Key2: [value2-1, value2-2…] -> [new_value2-1, new_value2-2…]
          :
          KeyN: [valueN-1, valueN-2…] -> [new_valueN-1, new_valueN-2…]
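
The two phases can be imitated in memory with LINQ, which may help before looking at a real Hadoop job. The sketch below is not the HDInsight SDK: `MiniMapReduce`, `Run` and `WordCount` are made-up names, and the grouping step stands in for the shuffle that Hadoop performs between the mappers and the reducers.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MiniMapReduce
{
    // Map every input line to (key, value) pairs, group the values by key,
    // then reduce each group to a single result.
    public static Dictionary<string, TResult> Run<TValue, TResult>(
        IEnumerable<string> lines,
        Func<string, IEnumerable<KeyValuePair<string, TValue>>> map,
        Func<string, IEnumerable<TValue>, TResult> reduce)
    {
        return lines
            .SelectMany(line => map(line))
            .GroupBy(pair => pair.Key, pair => pair.Value)
            .ToDictionary(group => group.Key, group => reduce(group.Key, group));
    }

    // Classic word count: the mapper emits (word, 1), the reducer sums the ones.
    public static Dictionary<string, int> WordCount(IEnumerable<string> lines)
    {
        return Run<int, int>(
            lines,
            line => line.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
                        .Select(word => new KeyValuePair<string, int>(word, 1)),
            (word, ones) => ones.Sum());
    }

    public static void Main()
    {
        var counts = WordCount(new[] { "blob table blob", "table sql blob" });
        foreach (var entry in counts.OrderBy(e => e.Key))
        {
            Console.WriteLine($"{entry.Key}: {entry.Value}");
        }
    }
}
```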
  36. MapReduce - How Does It Work?
  37. So How Does It Work?
  38. Finding common friends
      • Facebook shows you how many common friends you have with someone
      • There were 1,310,000,000 active users on Facebook, with 130 friends on average (01.01.2014)
      • Calculating the mutual friends
  39. Finding common friends
      • We can represent a Friend relationship as:
          Someone -> [list of his/her friends]
      • Note that a Friend relationship is symmetrical
        – if A is a friend of B then B is a friend of A
  40. Example of Friends file
      • U1 -> U2 U3 U4
      • U2 -> U1 U3 U4 U5
      • U3 -> U1 U2 U4 U5
      • U4 -> U1 U2 U3 U5
      • U5 -> U2 U3 U4
  41. Designing our MapReduce job
      • Each line from the file will be an input line to the Mapper
      • The Mapper will output key-value pairs
      • Key: (user, friend) – sorted, so friend might come before user
      • Value: list of friends
  42. Designing our MapReduce job - Mapper
      • Each line from the file will be an input line to the Mapper
      • The Mapper will output key-value pairs
      • Key: (user, friend) – sorted, so friend might come before user
      • Value: list of friends
      • Having the key sorted helps the reducer: the two entries for the same pair will be provided together
  43. Mapper Example
      Given the line: U1 -> U2 U3 U4
      Mapper output:
          (U1 U2) -> U2 U3 U4
          (U1 U3) -> U2 U3 U4
          (U1 U4) -> U2 U3 U4
  44. Mapper Example
      Given the line: U2 -> U1 U3 U4 U5
      Mapper output:
          (U1 U2) -> U1 U3 U4 U5
          (U2 U3) -> U1 U3 U4 U5
          (U2 U4) -> U1 U3 U4 U5
          (U2 U5) -> U1 U3 U4 U5
  45. Mapper Example – final result
      Given the line: U1 -> U2 U3 U4
          (U1 U2) -> U2 U3 U4
          (U1 U3) -> U2 U3 U4
          (U1 U4) -> U2 U3 U4
      Given the line: U2 -> U1 U3 U4 U5
          (U1 U2) -> U1 U3 U4 U5
          (U2 U3) -> U1 U3 U4 U5
          (U2 U4) -> U1 U3 U4 U5
          (U2 U5) -> U1 U3 U4 U5
      Given the line: U3 -> U1 U2 U4 U5
          (U1 U3) -> U1 U2 U4 U5
          (U2 U3) -> U1 U2 U4 U5
          (U3 U4) -> U1 U2 U4 U5
          (U3 U5) -> U1 U2 U4 U5
      Given the line: U4 -> U1 U2 U3 U5
          (U1 U4) -> U1 U2 U3 U5
          (U2 U4) -> U1 U2 U3 U5
          (U3 U4) -> U1 U2 U3 U5
          (U4 U5) -> U1 U2 U3 U5
      Given the line: U5 -> U2 U3 U4
          (U2 U5) -> U2 U3 U4
          (U3 U5) -> U2 U3 U4
          (U4 U5) -> U2 U3 U4
  46. Designing our MapReduce job - Reducer
      • The input for the reducer will be structured as:
          (friend1, friend2) -> (friend1 friends) (friend2 friends)
      • The reducer will find the intersection between the lists
      • Output: (friend1, friend2) -> (intersection of friend1 and friend2 friends)
  47. Reducer Example
      Given the line:                                 Reducer output:
      (U1 U2) -> (U1 U3 U4 U5) (U2 U3 U4)             (U1 U2) -> (U3 U4)
      (U1 U3) -> (U1 U2 U4 U5) (U2 U3 U4)             (U1 U3) -> (U2 U4)
      (U1 U4) -> (U1 U2 U3 U5) (U2 U3 U4)             (U1 U4) -> (U2 U3)
      (U2 U3) -> (U1 U2 U4 U5) (U1 U3 U4 U5)          (U2 U3) -> (U1 U4 U5)
      (U2 U4) -> (U1 U2 U3 U5) (U1 U3 U4 U5)          (U2 U4) -> (U1 U3 U5)
      (U2 U5) -> (U1 U3 U4 U5) (U2 U3 U4)             (U2 U5) -> (U3 U4)
      (U3 U4) -> (U1 U2 U3 U5) (U1 U2 U4 U5)          (U3 U4) -> (U1 U2 U5)
      (U3 U5) -> (U1 U2 U4 U5) (U2 U3 U4)             (U3 U5) -> (U2 U4)
      (U4 U5) -> (U1 U2 U3 U5) (U2 U3 U4)             (U4 U5) -> (U2 U3)
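
The whole common-friends design can be checked locally, without a cluster, by running the mapper and reducer logic over the example friends file in memory. This is a sketch under a few assumptions: the input lines are space-separated with the user first (no "->"), `CommonFriendsLocal` is a made-up name, a LINQ `GroupBy` stands in for Hadoop's shuffle, and keys that do not receive exactly two friend lists yield an empty result.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CommonFriendsLocal
{
    // Mapper: "U1 U2 U3 U4" -> ("U1 U2", "U2 U3 U4"), ("U1 U3", "U2 U3 U4"), ...
    public static IEnumerable<KeyValuePair<string, string>> Map(string line)
    {
        var parts = line.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        if (parts.Length == 0) yield break;

        var user = parts[0];
        var friends = parts.Skip(1).ToArray();
        var friendList = string.Join(" ", friends);

        foreach (var friend in friends)
        {
            // Sort the pair so both users produce the same key.
            var pair = new[] { user, friend };
            Array.Sort(pair, StringComparer.Ordinal);
            yield return new KeyValuePair<string, string>(string.Join(" ", pair), friendList);
        }
    }

    // Reducer: intersect the two friend lists emitted for one pair.
    public static string Reduce(IEnumerable<string> friendLists)
    {
        var lists = friendLists.Select(list => list.Split(' ')).ToList();
        if (lists.Count != 2) return string.Empty; // assumption: asymmetric data yields nothing
        return string.Join(" ", lists[0].Intersect(lists[1]));
    }

    public static SortedDictionary<string, string> Run(IEnumerable<string> lines)
    {
        var result = new SortedDictionary<string, string>(StringComparer.Ordinal);
        var groups = lines.SelectMany(line => Map(line))
                          .GroupBy(pair => pair.Key, pair => pair.Value);
        foreach (var group in groups)
        {
            result[group.Key] = Reduce(group);
        }
        return result;
    }

    public static void Main()
    {
        var friendsFile = new[]
        {
            "U1 U2 U3 U4", "U2 U1 U3 U4 U5", "U3 U1 U2 U4 U5",
            "U4 U1 U2 U3 U5", "U5 U2 U3 U4"
        };
        foreach (var entry in Run(friendsFile))
        {
            Console.WriteLine($"({entry.Key}) -> ({entry.Value})");
        }
    }
}
```

For the five-user example this produces nine pairs, and the intersections agree with the Reducer Example slide, for instance (U1 U2) -> (U3 U4) and (U2 U3) -> (U1 U4 U5).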
  48. Creating C# MapReduce
  49. Creating C# MapReduce - Mapper

          public class CommonFriendsMapper : MapperBase
          {
              public override void Map(string inputLine, MapperContext context)
              {
                  var strings = inputLine.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
                  if (strings.Any())
                  {
                      var currentUser = strings[0];
                      var friends = strings.Skip(1);
                      foreach (var friend in friends)
                      {
                          var keyArr = new[] { currentUser, friend };
                          Array.Sort(keyArr);
                          var key = String.Join(" ", keyArr);
                          context.EmitKeyValue(key, string.Join(" ", friends));
                      }
                  }
              }
          }
  50. Creating C# MapReduce - Reduce

          public class CommonFriendsReducer : ReducerCombinerBase
          {
              public override void Reduce(string key, IEnumerable<string> strings,
                                          ReducerCombinerContext context)
              {
                  var friendsLists = strings
                      .Select(friendList => friendList.Split(' '))
                      .ToList();
                  var intersection = friendsLists[0].Intersect(friendsLists[1]);
                  context.EmitKeyValue(key, string.Join(" ", intersection));
              }
          }
  51. Creating C# MapReduce – Hadoop Job

          HadoopJobConfiguration myConfig = new HadoopJobConfiguration();
          myConfig.InputPath = "wasb:///example/data/friends/friends";
          myConfig.OutputFolder = "wasb:////example/data/friends/output";

          Environment.SetEnvironmentVariable("HADOOP_HOME", @"c:\hadoop");
          Environment.SetEnvironmentVariable("Java_HOME", @"c:\hadoop\jvm");

          var hadoop = Hadoop.Connect(clusterUri, clusterUserName, hadoopUserName,
              clusterPassword, azureStorageAccount, azureStorageKey,
              azureStorageContainer, createContinerIfNotExist);

          var jobResult = hadoop.MapReduceJob.Execute<CommonFriendsMapper, CommonFriendsReducer>(myConfig);
          int exitCode = jobResult.Info.ExitCode; // (0 – success, otherwise – failure)
  52. Pricing
      10 node cluster that will exist for 24 hours:
      • Secure Gateway Node - free
      • Head node - 15.36 USD per 24-hour day
      • 1 data node - 7.68 USD per 24-hour day
      • 10 data nodes - 76.80 USD per 24-hour day
      • Total: 92.16 USD
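
The total on the pricing slide is the head node plus ten data nodes for one day. A small sketch of that calculation, using only the 2014 per-day rates quoted on the slide (current prices will differ); the class and method names are illustrative:

```csharp
using System;
using System.Globalization;

public static class HdInsightCost
{
    // Per-day rates as quoted on the slide; the gateway node is free.
    private const decimal HeadNodePerDay = 15.36m;
    private const decimal DataNodePerDay = 7.68m;

    public static decimal ClusterCost(int dataNodes, int days)
    {
        return (HeadNodePerDay + dataNodes * DataNodePerDay) * days;
    }

    public static void Main()
    {
        // 15.36 + 10 * 7.68 = 92.16 USD for a 10-node cluster kept for 24 hours.
        Console.WriteLine(ClusterCost(10, 1).ToString("F2", CultureInfo.InvariantCulture));
    }
}
```

Because the cluster is billed for as long as it exists, deleting it when the job finishes keeps the cost proportional to the work actually done.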
  53. WRAP UP
  54. Comparing the alternatives
      Storage Type | When Should You Use | Implications
      BLOB | Unstructured data; files | Application logic responsibility; consider using HDInsight (Hadoop)
      SQL Server | Structured relational data; ACID transactions; max 150GB (500GB in preview) | SQL DML+DDL; could affect scalability; BI abilities; reporting
      Azure Tables | Structured data; loose schema; geo replication (high DR); auto sharding | OData, REST; application logic responsibility (multiple schemas)
  55. What have we seen
      • Azure Blobs
      • Azure Tables
      • Azure SQL Server
      • HDInsight
  56. What's Next
      • NoSQL – MongoDB, Cassandra, CouchDB, RavenDB
      • Hadoop ecosystem – Hive, Pig, Sqoop, Mahout
      • http://blogs.msdn.com/b/windowsazure/
      • http://blogs.msdn.com/b/windowsazurestorage/
      • http://blogs.msdn.com/b/bigdatasupport/
  57. Presenter contact details
      c: +972-52-4772946
      t: @tamir_dresher
      e: tamirdr@codevalue.net
      b: TamirDresher.com
      w: www.codevalue.net
