Thesis presentation

  1. Automatic Data Migration into the Cloud
     MS Thesis: Computer Science & Software Engineering
     By: Kushal Mehra
     Supervisors: Dr. Yuhong Yan, Dr. Daniel Lemire
     Concordia University
  2. MOTIVATION
     • Well-known applications (Facebook, Google Blogger, Twitter) depend on NoSQL.
     • Some advantages of NoSQL databases:
       • High scalability.
       • High read and write performance.
       • Availability at low cost.
     • Suitable applications:
       • Big data.
       • Geographical data.
  3. Agenda
     • Introduction
     • Review of Previous Studies
     • Proposed Model
     • Experiment and Results
     • Future Work and Conclusion
  4. Section 1: Introduction
  5. INTRODUCTION AND PROBLEM
     Relational Database vs. NoSQL Database
     • Cloud database definition.
     • Existing problem in relational databases:
       • Scale up
       • Scale out
  6. INTRODUCTION AND PROBLEM
     Relational Database vs. NoSQL Database
     • Scale up
  7. INTRODUCTION AND PROBLEM
     Relational Database vs. NoSQL Database
     • Scale out
  8. INTRODUCTION AND PROBLEM
     Data Migration
     • Enterprises seek to migrate their massive relational databases to NoSQL databases.
     • Data migration is the process of transferring data between storage types, formats, or computer systems.
  9. INTRODUCTION AND PROBLEM
     Importance of Data Migration
     • One survey estimated that the data migration market would reach $906 million by 2012.
  10. REVIEW OF PREVIOUS STUDIES
      Previous Studies
      • A large body of work exists on data migration.
      • Some of the approaches are:
        • Schema conversion.
        • ETL.
        • Integrated model.
  11. REVIEW OF PREVIOUS STUDIES
      Previous Studies
      • Thakar et al. and Chanchary et al. migrated a large relational database to a cloud database (2010).
      • Calil et al. proposed SimpleSQL, a relational layer over Amazon SimpleDB (2012).
  12. Limitations of Existing Work
      • Existing migration methods are not sufficient for data migration. They lack:
        • A migration strategy.
        • Application adaptation.
        • Sharding.
      • Existing methods migrate data from a legacy system to a relational database.
  13. Amazon SimpleDB
      • SimpleDB is a web service that provides structured data storage in the cloud.
      • Multi-valued attributes.
  14. Amazon SimpleDB
      Table 1: Relational database and SimpleDB equivalence
      Relational Database | SimpleDB
      Table | Domain
      Row | Item
      Column | Attribute
      Value | Value(s)
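As a rough illustration of the equivalence in Table 1, the sketch below turns one relational row into a SimpleDB-style item. The function name `row_to_item`, the choice of the primary key as the item name, and the sample `book` row are assumptions made for illustration; they are not taken from the thesis.

```python
def row_to_item(row, primary_key):
    """Map one relational row (column -> value) to an item (name, attributes).

    Each column becomes an attribute; every attribute holds a list because
    SimpleDB attributes may carry more than one value. NULL columns are
    skipped, since a schemaless item can simply omit the attribute.
    """
    item_name = str(row[primary_key])
    attributes = {
        column: [str(value)]
        for column, value in row.items()
        if value is not None
    }
    return item_name, attributes


# Hypothetical row from a "book" table.
name, attrs = row_to_item(
    {"book_id": 17, "title": "Dune", "price": 9.5, "subtitle": None},
    primary_key="book_id",
)
print(name, attrs)
```

Storing every value as a list of strings keeps single-valued and multi-valued attributes in the same shape, which makes later merging of rows simpler.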
  15. Characteristics of NoSQL Databases
      • No normalization.
      • No joins.
      • Schemaless.
      • Data type.
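Assuming the "Data type" bullet refers to SimpleDB storing every attribute value as a string and comparing values lexicographically, numbers need zero padding so that string order matches numeric order. The helper name `encode_int` and the width of 10 digits are illustrative assumptions; negative numbers would need an offset and are not handled here.

```python
def encode_int(value, width=10):
    """Zero-pad a non-negative integer so string order equals numeric order."""
    if value < 0:
        raise ValueError("negative values need an offset before padding")
    return str(value).zfill(width)


# Plain numeric strings sort lexicographically, which misorders them.
raw = sorted(["9", "10", "100"])
# Padded strings sort in the same order as the numbers they encode.
padded = sorted(encode_int(n) for n in (9, 10, 100))
print(raw)
print(padded)
```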
  16. Characteristics of NoSQL Databases
      • Some cloud databases that share the same data model and characteristics:
        • Amazon SimpleDB
        • MongoDB
        • CouchDB
        • Oracle NoSQL
  17. Section 2: Proposed Model
  18. PROPOSED MODEL
      Data Migration Model
      Relational-Cloud Mapping
  19. PROPOSED MODEL
      Migration Methods
      • We propose four migration methods:
        • Type 1: complete relational database to one domain.
        • Type 2: multiple tables to one domain.
        • Type 3: a table to one domain.
        • Type 4: normalization to denormalization, then tables to domain.
      • Each method is independent of the others and is capable of migrating an entire relational database.
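One way to read the migration types is as rules for assigning tables to domains. The sketch below covers Types 1 to 3 only, since Type 4 denormalizes the tables first. The function name, the `groups` parameter, and the domain names are hypothetical.

```python
def assign_domains(tables, method, groups=None):
    """Return a {table: domain} plan for migration Types 1, 2, and 3.

    Type 1: every table goes to a single domain.
    Type 2: each group of tables shares one domain (groups given by caller).
    Type 3: each table gets its own domain.
    """
    if method == 1:
        return {table: "db_domain" for table in tables}
    if method == 2:
        if not groups:
            raise ValueError("Type 2 needs table groups")
        plan = {}
        for domain, members in groups.items():
            for table in members:
                plan[table] = domain
        missing = set(tables) - set(plan)
        if missing:
            raise ValueError(f"tables without a group: {sorted(missing)}")
        return plan
    if method == 3:
        return {table: f"{table}_domain" for table in tables}
    raise ValueError("only Types 1-3 are covered by this sketch")


tables = ["book", "author", "orders"]
print(assign_domains(tables, 1))
print(assign_domains(tables, 2, {"catalog": ["book", "author"], "sales": ["orders"]}))
print(assign_domains(tables, 3))
```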
  20. PROPOSED MODEL
      Migration Methods
  21. Mapping Strategies
  22. PROPOSED MODEL
      Mapping Strategy 1 (MS1)
  23. PROPOSED MODEL
      Mapping Strategy 2 (MS2)
  24. PROPOSED MODEL
      Mapping Strategy 3 (MS3)
  25. PROPOSED MODEL
      Type 1 Migration
      • Uses Mapping Strategy 2 (MS2).
      • Migrates the entire relational database.
      • Only a single domain exists in the cloud database.
      • Number of items = number of rows in the entire relational database.
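The item-count property of Type 1 can be checked with a small sketch. Prefixing each item name with its table name and adding a `_table` attribute are assumptions made here to keep rows from different tables distinguishable; the slides do not show how Mapping Strategy 2 handles this.

```python
def migrate_type1(database):
    """Flatten {table: [rows]} into one domain: {item_name: attributes}."""
    domain = {}
    for table, rows in database.items():
        for index, row in enumerate(rows):
            item_name = f"{table}:{index}"      # assumed naming scheme
            attributes = {col: [str(val)] for col, val in row.items()}
            attributes["_table"] = [table]      # assumed origin marker
            domain[item_name] = attributes
    return domain


# Hypothetical two-table database.
database = {
    "book": [{"book_id": 1, "title": "Dune"}, {"book_id": 2, "title": "Emma"}],
    "author": [{"author_id": 7, "name": "Austen"}],
}
domain = migrate_type1(database)
total_rows = sum(len(rows) for rows in database.values())
print(len(domain) == total_rows)
```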
  26. PROPOSED MODEL
      Type 2 Migration
      • Uses Mapping Strategy 1 (MS1) and Mapping Strategy 2 (MS2).
      • Migrates tables and their data to one domain.
      • Migrates a table to one domain.
  27. PROPOSED MODEL
      Type 3 Migration
      • Uses Mapping Strategy 1 (MS1).
      • Migrates a table to one domain in a cloud database.
      • Implicit conversion.
  28. PROPOSED MODEL
      Type 4 Migration
      • Uses Mapping Strategy 1 (MS1) and Mapping Strategy 3 (MS3).
      • Migrates denormalized tables to one domain in a cloud database.
      • Migrates a single table and its data to one domain.
      • Explicit conversion of columns.
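A minimal sketch of the denormalization step in Type 4, assuming a parent table and a child table linked by a foreign key. Folding the child rows into multi-valued attributes of the parent item is one possible design, chosen here because SimpleDB attributes can hold several values. The table and column names are hypothetical, and the thesis may denormalize differently.

```python
def denormalize(parents, children, key, child_columns):
    """Fold child rows into their parent row as multi-valued attributes."""
    items = {}
    for parent in parents:
        attributes = {col: [str(val)] for col, val in parent.items()}
        for col in child_columns:
            attributes[col] = []
        items[str(parent[key])] = attributes
    for child in children:
        item = items.get(str(child[key]))
        if item is None:
            continue  # orphan child row: no parent to attach to
        for col in child_columns:
            item[col].append(str(child[col]))
    return items


# Hypothetical parent and child tables joined on order_id.
orders = [{"order_id": 1, "customer": "Ann"}]
order_lines = [
    {"order_id": 1, "book": "Dune"},
    {"order_id": 1, "book": "Emma"},
]
merged = denormalize(orders, order_lines, key="order_id", child_columns=["book"])
print(merged)
```

With the child values stored on the parent item, a query that needed a join in the relational schema can be answered from a single domain.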
  29. PROPOSED MODEL
      Migration Method Usage
      • Type 1: data size is less than 10 GB.
      • Type 2: data size is more than 10 GB and joins are to be performed.
      • Type 3: the same semantics as the relational database are needed and the database size is more than 10 GB.
      • Type 4: denormalization; data size is more than 10 GB and joins are to be performed.
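The usage guidance can be read as a simple decision rule. The sketch below is one interpretation; the slides do not say what to do for a database of exactly 10 GB or for a large database with none of the listed needs, so those branches are assumptions, as are the function and parameter names.

```python
def choose_method(size_gb, needs_joins=False, denormalize=False,
                  keep_relational_semantics=False):
    """Suggest a migration type from the thresholds on the usage slide."""
    if size_gb < 10:
        return 1
    if keep_relational_semantics:
        return 3
    if needs_joins and denormalize:
        return 4
    if needs_joins:
        return 2
    return 3  # assumption: default to one domain per table


print(choose_method(4))
print(choose_method(25, needs_joins=True))
print(choose_method(25, keep_relational_semantics=True))
print(choose_method(25, needs_joins=True, denormalize=True))
```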
  30. Sharding and Redundancy in Migration Methods
      • Sharding: the process of storing data records across multiple domains.
        • Type 1 does not support sharding.
        • Type 2, Type 3, and Type 4 support sharding.
      • Redundancy: data redundancy is the superfluity of data, that is, the same data stored more than once.
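Sharding as defined on this slide can be sketched with hash-based routing of item names to domains. The hashing scheme, the domain naming pattern, and the shard count are assumptions; the slides do not describe how the thesis picks a domain for a record.

```python
import hashlib


def shard_domain(item_name, base, shard_count):
    """Pick the domain for an item using a stable hash of its name."""
    digest = hashlib.sha256(item_name.encode("utf-8")).hexdigest()
    return f"{base}_{int(digest, 16) % shard_count}"


def shard_items(item_names, base, shard_count):
    """Group item names by the domain they are routed to."""
    shards = {}
    for name in item_names:
        shards.setdefault(shard_domain(name, base, shard_count), []).append(name)
    return shards


# Hypothetical item names spread over three domains.
names = [f"book:{i}" for i in range(10)]
shards = shard_items(names, "book", 3)
print(shards)
```

A stable hash means the same item name always routes to the same domain, so reads can find an item without a lookup table.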
  31. EXPERIMENTS
      Implementation Details
      • Source system: can be Oracle, MySQL, or Microsoft SQL Server.
      • Destination system: a cloud database that supports key-value pairs.
      • We use Microsoft .NET Framework 3.5, Microsoft IIS 7.0, and Microsoft SQL Server 2008 R2.
      • We use the C# library for SimpleDB to perform all actions needed to migrate the data.
  32. EXPERIMENTS
      Experiment
      • Migrated the relational database to Amazon SimpleDB.
      • The source is a relational database of an "online bookstore" application.
      • The sample database consists of thirteen tables and sample data.
  33. EXPERIMENTS
      Type 1 Migration
  34. EXPERIMENTS
      Type 2 Migration
  35. EXPERIMENTS
      Type 3 Migration
  36. EXPERIMENTS
      Type 4 Migration
  37. Application Adaptation
      Code Generation
      • We propose an interface that assists the developer in generating code automatically.
      • This includes the basic usage of:
        • Select.
        • Insert.
        • Delete.
        • Update queries.
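As an illustration of what generated query code might produce, the sketch below builds a SimpleDB select expression from a domain name and equality conditions. It assumes the SimpleDB select syntax (`select ... from domain where attribute = 'value'`, with single quotes inside values doubled and backticks inside names doubled). The function names are hypothetical, and this is not the interface proposed in the thesis.

```python
def quote_value(value):
    """Quote a value for a select expression, doubling single quotes."""
    return "'" + str(value).replace("'", "''") + "'"


def quote_name(name):
    """Quote a domain or attribute name with backticks."""
    return "`" + name.replace("`", "``") + "`"


def build_select(domain, conditions=None, attributes=None):
    """Build a select expression with ANDed equality conditions."""
    output = ", ".join(quote_name(a) for a in attributes) if attributes else "*"
    query = f"select {output} from {quote_name(domain)}"
    if conditions:
        clauses = [f"{quote_name(k)} = {quote_value(v)}" for k, v in conditions.items()]
        query += " where " + " and ".join(clauses)
    return query


print(build_select("book", {"author": "O'Brien"}, ["title"]))
```

Values are quoted as strings in every case, which matches a store where all attribute values are strings.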
  38. EXPERIMENTS
      Performance Analysis
      • Performance model:
        • Computation time.
        • Storage cost.
  39. EXPERIMENTS
      Average Computation Time
  40. EXPERIMENTS
      Storage Cost of 10GB, Amazon SimpleDB 2013
  41. EXPERIMENTS
      Storage Cost of 25GB, Amazon SimpleDB 2013
  42. Comparison of Migration Methods
      Migration Methods | Type 1 | Type 2 | Type 3 | Type 4
      Joins | Limited to one domain | Limited to one domain | Cross domain | Limited to one domain
      Storage cost | Nearly the same as Type 2 and Type 3 | Nearly the same as Type 1 and Type 3 | Nearly the same as Type 1 and Type 2 | Less than Type 1, Type 2, and Type 3
      Storage Space / Computation Time | Smallest | Larger than Type 1 | Highest | Larger than Type 2
      The rows <10GB, >10GB, Sharding, and Denormalized Data carry per-method marks whose symbols are not preserved in this transcript.
  43. Limitations
      • Stored procedures.
      • User-defined functions.
      • Triggers.
  44. CONCLUSION AND FUTURE WORK
      Conclusion and Future Direction
      • This thesis proposes four diverse methods to migrate relational databases to cloud databases.
      • Each method is independent of the others.
      • A relational database was successfully migrated to a NoSQL database.
      • An interface for code generation is proposed.
  45. CONCLUSION AND FUTURE WORK
      Future Direction
      • Migration of:
        • Stored procedures.
        • Triggers.
        • User-defined functions.
  46. Publications
      • K. Mehra, Y. Yan and D. Lemire. Automatic data migration to the cloud. In the Sixth International Workshop on Cloud Data Management (CloudDB 2014), submitted.
      • K. Mehra, Y. Yan and D. Lemire. Automatic data migration into the cloud. IEEE Services 2014, manuscript.