Data Storage
Moving Away from a Single
System Solution
Steve Perry
A brief history of data
storage
• File Storage
• Documents
• Spreadsheets
• Images
• Emails
• Databases
The rise of the database
• Relational data models first proposed in 1970
• By the mid ’80s, computing hardware had
become powerful enough to allow the wide
deployment of relational databases
• By the early ’90s, relational databases had
achieved total dominance
Data in the modern world
• Move towards centralisation
• Users are encouraged to live their lives online
• Where you would have stored pictures and MP3s on your
local hard drive, you now use Instagram or Spotify
• This has given rise to new problems due to the volume
of data, the speed at which it is required or its
semi-structured nature
• Many of these issues can be solved in a traditional relational
database, but new products are also available
The new breed of data
storage
• New problems need new tools
• Mongo
• Couchbase
• Neo4j
• Hadoop
• These new tools address specific issues
• But do they have to replace your existing tools?
What is MemSQL?
• In-memory, distributed relational database
• Fully ACID compliant
• Scales horizontally
• Compatible with the MySQL wire protocol
• Designed to provide real-time analytics
Demo
• A SQL Server 2012 instance hosting the
AdventureWorks database
• A 3-machine MemSQL cluster
• An in-house Java sync program running on my
local instance
So what just happened?
1. We had a slow-running query in SQL Server
2. We spun up a MemSQL cluster using Docker. The cluster
comprised one aggregator and two leaf nodes.
3. We turned Change Data Capture (CDC) on in SQL Server
4. We started our Java sync service on my local instance
5. We synced all of the data from SQL Server into MemSQL
6. We ran our slow-running query in MemSQL and found that it ran
much faster
7. We then saw how CDC and the sync service were able to
ensure that the cache was always up to date.
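
The in-house sync service from the demo is not reproduced here. As a rough sketch of the idea behind step 7, the Java below replays CDC change rows, in order, against a target. The operation codes are the ones SQL Server CDC records in its `__$operation` column (1 = delete, 2 = insert, 3 = update, old values, 4 = update, new values). Everything else is an assumption made for illustration: the `ChangeRow` and `CdcApplier` names are hypothetical, the rows are reduced to a key and one value, and a plain `Map` stands in for the MemSQL table.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Simplified, hypothetical view of one row read from a CDC change table. */
final class ChangeRow {
    final int operation;   // value of the __$operation column
    final int key;         // primary key of the source row
    final String value;    // single payload column (illustrative only)

    ChangeRow(int operation, int key, String value) {
        this.operation = operation;
        this.key = key;
        this.value = value;
    }
}

final class CdcApplier {
    // SQL Server CDC __$operation codes.
    static final int DELETE = 1;
    static final int INSERT = 2;
    static final int UPDATE_BEFORE = 3;
    static final int UPDATE_AFTER = 4;

    /** Replays the changes in order; the map stands in for the MemSQL copy of the table. */
    static void apply(Map<Integer, String> target, List<ChangeRow> changes) {
        for (ChangeRow change : changes) {
            switch (change.operation) {
                case DELETE:
                    target.remove(change.key);
                    break;
                case INSERT:
                case UPDATE_AFTER:
                    // Insert and "new values" rows both become an upsert.
                    target.put(change.key, change.value);
                    break;
                case UPDATE_BEFORE:
                    // Old values of an update: nothing to write.
                    break;
                default:
                    throw new IllegalArgumentException(
                            "Unknown CDC operation: " + change.operation);
            }
        }
    }

    public static void main(String[] args) {
        Map<Integer, String> target = new LinkedHashMap<>();
        apply(target, Arrays.asList(
                new ChangeRow(INSERT, 1, "Road Bike"),
                new ChangeRow(INSERT, 2, "Helmet"),
                new ChangeRow(UPDATE_BEFORE, 1, "Road Bike"),
                new ChangeRow(UPDATE_AFTER, 1, "Mountain Bike"),
                new ChangeRow(DELETE, 2, null)));
        System.out.println(target); // prints {1=Mountain Bike}
    }
}
```

A real service would read the change rows from SQL Server and write them to MemSQL over a database connection, and would need to remember how far through the change log it had got; none of that is shown here.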
What we learnt
• You are going to make mistakes… Lots of
mistakes.
• If you are using a bleeding-edge product, it’s
imperative that you know its limitations.
• Know at what point it is time to cut your losses
About Me
• Email - sperry1985@gmail.com
• Skype - Live:Steve.Perry
• Twitter - @DefiantDBA
• Head of Database Development at Kurtosys
System

Why would I store my data in more than one database?
