
Guide to SQL to NoSQL migration

Is your legacy database infrastructure struggling to meet the demand of customer Service Level Agreements? If you, like many companies, are discovering that your infrastructure is not robust enough to deal with the speed and scale required of today's Internet-scale applications, it may be time to consider a switch to NoSQL storage.

Changing storage systems can be a daunting process and, with all the buzz surrounding NoSQL, it can be difficult to know where to start. As a Solutions Architect at Thumbtack Technology, Anton Yazovskiy has helped many companies through the selection and deployment of NoSQL technologies. In this webinar, Anton will explain the main advantages of NoSQL and the common use cases in which migrating to NoSQL makes sense. You will learn the key questions to ask before a migration, as well as important differences in data modeling and architectural approaches. Finally, you will look at a typical application based on a relational database management system (RDBMS) and migrate it to NoSQL step by step.

Key topics that will be covered:

> Why you would want to migrate to NoSQL
> Conceptual differences between RDBMS and NoSQL
> Data modeling and architectural best practices
> "I got it. But what exactly do I need to do?" - Practical migration steps

ABOUT THE PRESENTER
Anton Yazovskiy is a Software Engineer at Thumbtack Technology, where he focuses on high-performance enterprise architecture. He has presented at a variety of IT conferences and “DevDays” on topics such as NoSQL and MarkLogic.

Published in: Engineering, Technology

Guide to SQL to NoSQL migration

  1. GUIDE TO SQL - NOSQL MIGRATION
     Anton Yazovskiy, Solution Architect, Thumbtack Technology
  2. AGENDA
     • Why would you want to migrate to NoSQL
     • Conceptual difference between RDBMS and NoSQL
     • Data modeling and architectural best practices
     • Practical migration steps / questions you have to ask
  3. WHY?
     • scalability
     • performance
     • developer productivity
  4. CONCEPTUAL DIFFERENCE BETWEEN RDBMS AND NOSQL
     • a relational schema allows you to query data in many different ways in different contexts
     • accessible for many types of applications and separate dev teams
     • the schema helps to control rules common for everybody
     • always remember that in most cases you run queries across the cluster
     • NoSQL is about focusing on a particular need and goal
     • model your data for a specific use case
     • define what you are willing to sacrifice to achieve better results
  5. DATA MODELING AND ARCHITECTURAL BEST PRACTICES
  6. POLYGLOT PERSISTENCE
     • different solutions are designed to solve different problems
       • session & fast transactions
       • cache
       • aggregations
       • analytical ad hoc queries
       • graph traversal
     • the requirements for OLTP and OLAP storages are very different
  7. POLYGLOT PERSISTENCE
  8. NOSQL DATA STRUCTURES
     • Key-Value: Riak, Redis, MemcacheDB, Aerospike and Amazon DynamoDB (Cloud)
     • Key-Document: MongoDB and Couchbase
     • Column-Family: Cassandra, HBase
     • Graph Databases: Neo4j and OrientDB
  9. PRACTICAL MIGRATION STEPS
     • what would you like to achieve
     • learn your traffic
     • learn your data set
     • what are you willing to sacrifice
     • apply polyglot persistence
     • model your data
     • synchronization
  10. WHAT WOULD YOU LIKE TO ACHIEVE
     • better performance
     • scale the current solution
     • process more and/or different data
     • speed up development
     • I heard of it
  11. LEARN YOUR TRAFFIC
     • what the workload looks like:
       • OLTP (simple lookups, short transactions)
       • OLAP (aggregations, analytical queries, ad hoc scans, etc.)
       • heavy-read, heavy-write
     • what kind of queries do you perform in order to address the application's questions:
       • simple lookups, uncertain search, inner requests, traversal, BI/Analysis
  12. LEARN YOUR DATA SET
     • what kind of data types do you operate with
       • simple key-value
       • structured, semi-structured
       • nested/hierarchical
       • graph-oriented
     • what size of each data type do you have
  13. WHAT ARE YOU WILLING TO SACRIFICE
     • what data doesn't require strong consistency
     • where transactional guarantees aren't required
     • what data are you willing to lose in case of hardware failure
     • where are you willing to sacrifice joins
  14. APPLY POLYGLOT PERSISTENCE
     • Based on the answers you discovered, define the most obvious types of storage that you may need
       • fast & simple storage for lookups, non-critical data and short transactions
       • RDBMS for data that fits on a single server
       • document-oriented storage for nested/hierarchical data and aggregate-oriented reads & writes
       • graph-oriented storage for traversal queries, social relations, etc.
       • highly scalable storage for Big Data background processing
  15. DEFINE A DATA MODEL
  16. DATA MODELING: BEFORE YOU START
     • from “what data do I have” to “what questions do I have”
     • denormalization & duplication are your best friends
     • hierarchical and embedded structures make your life easier, but they are your worst enemy
  17. REFERENCES
     • in-application joins
     • nothing to be ashamed about
     • apply carefully
     { user_name: ayazovskiy, contact: {..}, access: { level: 523, group: dev } }
     { access_level: 523, rules: [...] }
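An in-application join over the two documents on the References slide can be sketched roughly as below. This is only an illustration: the two dictionaries stand in for separate collections in some document store, the function name is hypothetical, and the contents of `rules` are placeholders because the slide elides them.

```python
# Hypothetical in-memory stand-ins for two document collections.
users = {
    "ayazovskiy": {
        "user_name": "ayazovskiy",
        "access": {"level": 523, "group": "dev"},
    },
}
access_levels = {
    # The rule names are placeholders; the slide shows only "rules: [...]".
    523: {"access_level": 523, "rules": ["placeholder_rule_a", "placeholder_rule_b"]},
}


def load_user_with_rules(user_name):
    """Resolve the reference in application code: two lookups, merged by hand."""
    user = users[user_name]                           # first round trip
    level = access_levels[user["access"]["level"]]    # second round trip
    return {**user, "rules": level["rules"]}


result = load_user_with_rules("ayazovskiy")
```

Each reference followed costs another round trip to the store, which is one reason the slide says to apply references carefully.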
  18. DUPLICATION
     • Duplication is a technique of copying pieces of data between structures in order to either optimize query processing time or convert data into a particular business model.
     • The main advantage of denormalization is the ability to:
       1. reduce the number of I/O operations and query time
       2. reduce the complexity of query processing in distributed systems
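The trade-off behind duplication can be sketched as follows. The comment documents, field names and helper functions here are invented for illustration and do not come from the talk: the author's name is copied into every comment so that a read needs a single lookup, at the price of an update that has to touch every copy.

```python
# Hypothetical collections; plain dicts stand in for a document store.
users = {"u1": {"user_name": "ayazovskiy"}}
comments = {
    # author_name is a duplicated copy of users["u1"]["user_name"].
    "c1": {"text": "hey", "author_id": "u1", "author_name": "ayazovskiy"},
    "c2": {"text": "bye", "author_id": "u1", "author_name": "ayazovskiy"},
}


def render_comment(comment_id):
    """Read path: one lookup, no join, because the name is duplicated."""
    comment = comments[comment_id]
    return f'{comment["author_name"]}: {comment["text"]}'


def rename_user(user_id, new_name):
    """Write path: every duplicated copy has to be rewritten."""
    users[user_id]["user_name"] = new_name
    touched = 0
    for comment in comments.values():
        if comment["author_id"] == user_id:
            comment["author_name"] = new_name
            touched += 1
    return touched
```

In a real cluster the loop in `rename_user` would become a scan or a batch of writes across nodes, which is why duplication suits data that is read far more often than it changes.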
  19. AGGREGATES
     • simplify data processing logic
     • optimize read/write time
     • ability to distribute the data across the cluster
     • reduce the number of requests across the cluster
     • perform atomic updates
     { user_name: ayazovskiy, contact: { phone: 123, email: @thumbtack.net }, access: { level: 5, group: dev } }
  20. AGGREGATES
     • updates of duplicated data are heavy and complex
     • querying across aggregates is heavy and complex
     { user_name: ayazovskiy, contact: { phone: 123, email: @thumbtack.net }, access: { level: 5, group: dev } }
  21. COUNTERS
     • NoSQL auto-increment analog
     • distributed consistent auto-increment is tricky
     • counters aren't always reliable *
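One common way to get a consistent counter without a database-side auto-increment is an optimistic compare-and-set retry loop. The sketch below is not taken from the talk: `VersionedStore` is a hypothetical in-memory stand-in for a store that offers compare-and-set on a single key, and a lock simulates the per-key atomicity such a store would provide.

```python
import threading


class VersionedStore:
    """Hypothetical stand-in for a store with compare-and-set on a key."""

    def __init__(self):
        self._data = {}                 # key -> (value, version)
        self._lock = threading.Lock()   # simulates per-key atomicity

    def get(self, key):
        with self._lock:
            return self._data.get(key, (0, 0))

    def compare_and_set(self, key, new_value, expected_version):
        with self._lock:
            _, version = self._data.get(key, (0, 0))
            if version != expected_version:
                return False            # another writer got there first
            self._data[key] = (new_value, version + 1)
            return True


def next_id(store, key):
    """Retry until our increment wins; return the newly allocated value."""
    while True:
        value, version = store.get(key)
        if store.compare_and_set(key, value + 1, version):
            return value + 1


store = VersionedStore()
ids = []


def worker():
    for _ in range(100):
        ids.append(next_id(store, "user_id"))


threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Every writer contends on the same key, so under heavy load the retries pile up. That contention is one way to read the slide's remark that a distributed consistent auto-increment is tricky.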
  22. COMPOSITE KEYS
     {
       "ID": "chat#user_1#user_2#december_12_2014",
       "messages": [
         { "user_1": "hey" },
         { "user_1": "how is going?" },
         { "user_2": "thanks, pretty well!" }
       ]
     }
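A composite key like the one on this slide can be built deterministically from its parts, so that a whole conversation for a given day is a single key lookup. The sketch below assumes a plain dictionary as the store. Sorting the two user names so the key does not depend on who writes first is an assumption made here, not something the slide states, and the helper names are hypothetical.

```python
from datetime import date

MONTHS = (
    "january", "february", "march", "april", "may", "june", "july",
    "august", "september", "october", "november", "december",
)

store = {}  # hypothetical key-value store


def chat_key(user_a, user_b, day):
    """Build a key of the form chat#<user>#<user>#<month>_<day>_<year>."""
    first, second = sorted((user_a, user_b))  # assumption: order-independent
    return f"chat#{first}#{second}#{MONTHS[day.month - 1]}_{day.day}_{day.year}"


def append_message(user_from, user_to, day, text):
    """Append a message to the per-pair, per-day chat document."""
    key = chat_key(user_from, user_to, day)
    doc = store.setdefault(key, {"ID": key, "messages": []})
    doc["messages"].append({user_from: text})
    return key


day = date(2014, 12, 12)
append_message("user_1", "user_2", day, "hey")
key = append_message("user_2", "user_1", day, "thanks, pretty well!")
```

Because the date is part of the key, each document stays bounded in size and older days can be read or expired independently.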
  23. APPEND
     {
       ID: account#User_A,
       account_total: $100,
       account_total_calculation_time: ..,
       changes_since_last_calculation: [
         1399493200: +$10,
         1399892139: -$25
       ]
     }
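The append pattern on this slide can be read as follows: writes only append a change, reads add the pending changes to the last computed total, and a periodic compaction folds the changes back into the total. The sketch below follows that reading. The function names are hypothetical, amounts are plain integers, and the initial calculation time is a placeholder because the slide elides it.

```python
account = {
    "ID": "account#User_A",
    "account_total": 100,
    # Placeholder value; the slide does not show the calculation time.
    "account_total_calculation_time": 0,
    "changes_since_last_calculation": [
        (1399493200, +10),
        (1399892139, -25),
    ],
}


def record_change(account, timestamp, delta):
    """Write path: append only, never rewrite the stored total."""
    account["changes_since_last_calculation"].append((timestamp, delta))


def current_balance(account):
    """Read path: last computed total plus all changes recorded since."""
    pending = sum(delta for _, delta in account["changes_since_last_calculation"])
    return account["account_total"] + pending


def compact(account, now):
    """Fold pending changes into the total and reset the change list."""
    account["account_total"] = current_balance(account)
    account["account_total_calculation_time"] = now
    account["changes_since_last_calculation"] = []
```

With the values from the slide, the balance before compaction is 100 + 10 - 25 = 85.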
  24. THINK OF DATA SYNCHRONIZATION
     • application-level synchronization:
       • e.g. update the user profile in document-oriented storage, its social network in graph storage, and its session in a key-value cache
     • regular synchronization:
       • this may be an hourly/daily/weekly process that takes updated data and propagates it across the system
     • incremental background synchronization:
       • solutions like the Tungsten synchronizer allow you to track changes in an RDBMS via the transactional log and apply these changes immediately to NoSQL storage
       • e.g. user profiles in MySQL synchronized with Aerospike via a properly configured Tungsten Replicator
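The idea behind incremental background synchronization can be sketched as replaying an ordered change log against the target store while remembering the last applied position. The code below does not use Tungsten or any real replicator API: the log format, the `seq` field and the function name are all assumptions made for illustration, and a plain dictionary stands in for the NoSQL store.

```python
# Hypothetical change log, standing in for entries read from an RDBMS
# transactional log. Field names are invented for this sketch.
change_log = [
    {"seq": 1, "op": "upsert", "key": "user:1", "row": {"name": "anton"}},
    {"seq": 2, "op": "upsert", "key": "user:2", "row": {"name": "maria"}},
    {"seq": 3, "op": "delete", "key": "user:1", "row": None},
]


def apply_changes(log, target, last_applied_seq):
    """Apply entries newer than last_applied_seq; return the new position."""
    for entry in log:
        if entry["seq"] <= last_applied_seq:
            continue                      # already applied, skip on replay
        if entry["op"] == "upsert":
            target[entry["key"]] = entry["row"]
        elif entry["op"] == "delete":
            target.pop(entry["key"], None)
        last_applied_seq = entry["seq"]
    return last_applied_seq


kv_store = {}
position = apply_changes(change_log[:2], kv_store, 0)    # first batch
position = apply_changes(change_log, kv_store, position)  # replay plus new entry
```

Tracking the position makes a replay harmless: entries that were already applied are skipped, so the same log can be re-read after a restart.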
  25. “always remember that in most cases you run queries across the cluster”
     – Anton Yazovskiy
  26. Any questions? Thank you
     @yazovsky
     ayazovksiy@thumbtack.net
     www.thumbtack.net
  27. THANKS / REFERENCES
     • NoSQL Distilled: A Brief Guide to the Emerging World of Polyglot Persistence by Pramod J. Sadalage and Martin Fowler
     • NoSQL Data Modeling Techniques (http://highlyscalable.wordpress.com)
     • MongoDB documentation (http://docs.mongodb.org)
     • Couchbase documentation (http://docs.couchbase.com)
     • FoundationDB Blog (http://blog.foundationdb.com)
