Your SlideShare is downloading. ×
Lessons learnt coverting from SQL to NoSQL
Upcoming SlideShare
Loading in...5
×

Thanks for flagging this SlideShare!

Oops! An error has occurred.

×

Introducing the official SlideShare app

Stunning, full-screen experience for iPhone and Android

Text the download link to your phone

Standard text messaging rates apply

Lessons learnt coverting from SQL to NoSQL

4,483
views

Published on

Enda Farrell's presentation to the NoSQL track at QCon 2011. It's not full of technical details, but describes some of the "different" complexity that NoSQL projects bring, and it gives some lessons …

Enda Farrell's presentation to the NoSQL track at QCon 2011. It's not full of technical details, but describes some of the "different" complexity that NoSQL projects bring, and it gives some lessons learnt and top tips that have been helpful.

Published in: Technology

0 Comments
1 Like
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total Views
4,483
On Slideshare
0
From Embeds
0
Number of Embeds
1
Actions
Shares
0
Downloads
59
Comments
0
Likes
1
Embeds 0
No embeds

Report content
Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
No notes for slide
  • There may be large chunks of this talk that simply do not apply to your situation.If so, sorry. Perhaps you can identify situations that you will want to continue to avoid?I’ve tried to be “general” in that I hope it isn’t team/project/tool specific, but I’ve also tried to be at least a little specific in the lessons learntThe mail goal of this talk is to give you food for thought so that when you do this there will be a little less to be uncertain about or fewer “gotchas”I probably have not done a brilliant job of removing all of the “my situation” specifics, so if it’s confusing, please be kind ;-)
  • There may be large chunks of this talk that simply do not apply to your situation.If so, sorry. Perhaps you can identify situations that you will want to continue to avoid?I probably have not done a brilliant job of removing all of the “my situation” specifics, so if it’s confusing, please be kind ;-)
  • Nokia’s “Places Registry” aims to be the largest validated “point of interest” repository in the world.Question: hands up if you have you ever owned a Nokia phone?Question: who knows that you have free navigation on the Nokia devices?You know that navigation needs some sort of destination and that they are often interesting places (not just your friend’s address  )Hence Nokia built an ecosystem of applications/services to deal with this, and has bought some big names in the Geo/Mapping/SatNav world
  • Names - one mandatory 'default' name and many optional alternative names in multiple languagesCategories - at least one predefined core categoryTags - free text keywordsLocation information – the longitude and latitude of the place on Earth as well as the postal addressContact data - information such as phone number, email address and website URL that may help matching and de-duplication
  • The registry is from where all place information comes from for maps on the web and on Nokia devices – and soon on Windows Phone devices too.
  • Very large? The financial industries will think not …Is it something that any of you will be dealing with? I think so …XX = I can’t say this number ;-o There’s a “Where 2.0” conference coming up soon and those folks have asked that I don’t steal their thunder!
  • These are odd graphs. It’s an unusually quiet Monday morning, but clearly shows a background/underlying pattern.OPR is often a “write heavy” application. Read operations generally take place in constant time, the response times are generally consistent. Write times are pretty quick (200ms or so) for updates to existing POIs and much longer for registrations of new POIS (we check the existing places to see if we already know about that place – a process known as “matching”).The fact that we are (a) write heavy and (b) write costly was something that affected our decision to move to NoSQL.
  • There are more complications here – I had to build another KV (which had the same API as the Nokia vShards) which could be replicated across multiple data centres (as I am in two and will soon be in three) and the Nokia vShards does not yet support multiple data centres.Apologies if you were hoping to see CouchDB-specific lessons
  • Learnt about the business, the first version
  • Moving to custom JPA-based DAOs helped enormously, but writing so many rows in many different tables (some of which were UNINDEXED to speed up insert performance) things were slow
  • If everything “just works” this won’t be an issue, but, …When problems arise, as the ops team still see a regular SQL database and unless you have “real dev-ops” (another top tip) you’re app teams will have to check people’s assumptions
  • In the first releases, when the NoSQL is a synchronised “slave” of the SQL the answer is clearIn later releases if you have a single “cutover” release the answer is clearIf the two co-exist side by side, you will want to have a predefined policy for handling thisbut how do you know that your two systems disagree?
  • If you’re at this conference and you’re in the state of putting a new storage engine under an existing application you probably have more code than you want. Code to check two stores on each request is extraordinarily hard to write and have unit/integrations tests for
  • It can be easy if you can have some sort of cross-system two phased commit
  • What's wrong with what you have? It will take some time - during which things will get worseDo you have to maintain operational capability/up time/processing/new features/customers/clients etcetc while rebuilding? If so there may wellbe senior people in your organisation who do not understand the pros and cons of NoSQL and will be skeptical.
  • It’s just as easy to make bad KV JSON design decisions as it is to make bad schema designs!The team argued over the JSON and we changed the structure a few times. This then meant re-writes of code that accessed things. For other reasons we did not use model objects – and these changes therefore took more and more time.
  • We needed it as we had a wetware/pilot error when one DBA splatted our production database and also managed to corrupt the backup. Good work there. 1.2 million POIs were lost from the SQL database – but as they were in the NoSQL we were able eventually to repair.What are seeders?
  • We needed it as we had a wetware/pilot error when one DBA splatted our production database and also managed to corrupt the backup. Good work there. 1.2 million POIs were lost from the SQL database – but as they were in the NoSQL we were able eventually to
  • Transcript

    • 1. SQL to NoSQL
      Lessons learnt migrating a large and highly-relational database into a "classic" NoSQL
      Enda Farrell @endafarrell
    • 2. “It’s not you, it’s me …”
      This doesn’t apply to you
      … possibly … probably …
      2
    • 3. Here’s what’s coming
      What
      Why
      Complexity
      People
      Tools
      Data
      3
      Lessons
    • 4. What is this service?
      Nokia’s “Ovi Places Registry” aims to be the largest validated point of interest repository in the world
      4
    • 5. What kind of data?
      Names
      Categories
      Tags
      Location information
      longitude and latitude
      postal address
      Contact data
      5
    • 6. 6
    • 7. 7
    • 8. What about it is large and highly relational?
      10s of millions points of interest
      Many many 100s of millions of contributing records
      MySQL DB is 600 GB on disk
      32 tables, 202 columns
      46 non-PRIMARY constraints
      8
    • 9. What usage patterns do you have?
      9
    • 10. “Classic” NoSQL? What did you use?
      It isn’t CouchDB but a variation of a Nokia internal one
      It’s a Key Value store holding JSON
      Without the key you cannot access the value
      It isn’t a “document store” as the store does nothing with the structure
      10
    • 11. Why did you port this to NoSQL?
      bigger and bigger
      Nokia maps – web and phone
      Yahoo! and soon Bing, => Facebook
      Postponed sharding by bigger HDD
      We learnt a lot over the last 3 years
      11
    • 12. Why did you port this to NoSQL? (continued)
      SQL databases can be rigid
      The world is a messy place
      “State field”?
      Integrating other organisations’ data
      12
    • 13. Complicated?
      The SQL and NoSQL databases will need to run in parallel for some time
      Ops &Disk space
      Truth or System of Record
      Reconciliation
      Syncronisation
      Querying
      13
    • 14. Complicated Ops & disk
      Releases, QA, staging and live deployments are more complex when there are two concurrent data storage systems
      What assumptions are other people making?
      2 x HDD
      14
    • 15. Complicated truth
      When the two systems disagree, which one is “right”?
      15
    • 16. Complicated reconciliation
      How do you know that your two data stores disagree?
      Do you check each on on each read/write, or do you have some “batch” code to check equivalence?
      Top tip: build a batch reconciler to check keys and revision/etags
      you _do_ have etags don’t you?!
      16
    • 17. Complicated synchronisation
      Have you ever tried to keep two different calendars synchronised?
      Ever get two email clients telling you you have different numbers of unread mail for the same email account?
      17
    • 18. Complicated querying
      KV stores generally don’t do querying
      Some NoSQL stores allow some, but usually more restricted than SQL
      We used Solr for performance even though it isn’t as powerful as SQL
      The synchronisation complexity is here to stay for us
      18
    • 19. Complexity
      Complexity is often mistaken for “cleverness”
      19
    • 20. Lessons learnt: people
      Why.
      It’s a question you will be asked and you will have to answer
      20
    • 21. Lesson learnt: people
      The “DB~A” role is still needed
      Here the “~A” is more to do with data/information architecture than with administration
      Top tip: design your JSON. Print it out.
      21
    • 22. Lesson learnt: people
      The effect on your team
      You may have a team of enterprise Java-types who are used to writing Eclipse-enabled code
      In our case we wanted to keep the flexibility that JSON gives us, but it meant we no longer had the same sort of model objects
      22
    • 23. Lessons learnt: tools
      Build “SQL to NoSQL” and “NoSQL to SQL” seeders
      You will need to seed your NoSQL from your SQL. You probably have existing DAOs which can form the basis – but this assumes your entities are essentially the same (top tip: keep them so!)
      23
    • 24. Lessons learnt: tools
      Build “SQL to NoSQL” and “NoSQL to SQL” seeders
      “NoSQL to SQL” was a seeder we learnt the hard way.
      24
    • 25. Lessons learnt: tools
      What’s your unit/integration test coverage?
      5 releases post initial launch, is your test data still exercising all code paths?
      25
    • 26. Lesson learnt: tools
      You may find that not everything fits into a Key Value engine
      Even with a queryable index, some data sets really are relational ;-)
      The down-side is that you may therefore have to keep long term the SQL database
      26
    • 27. Lesson learnt: tools
      Visualise your system
      Monitoring: calls, load, response times
      Volumetrics: num docs, HDD, milestones
      Context: draw a systems context diagram
      (which reminds me …)
      27
    • 28. Lesson learnt: data
      Key generation – it’s not sequence numbers nor “auto-increments” anymore
      Many are UUIDs and they are long and ugly
      But “guess and check” is ugly too
      Consistent hash?
      28
    • 29. Lesson learnt: data
      Make the revision/etag of the JSON data visible in the JSON
      Not just in a header (assuming HTTP here)
      The data will be taken off-platform, if you want to change it you will need to know that the revision is (or is not) still the same
      29
    • 30. Lesson learnt: data
      Version of the JSON “schema” in the KV
      You have many docs
      You might have to “upgrade” the structure of the KV doc
      Keep the schema version in the JSON
      30
    • 31. Lesson learnt: data
      Many little not few big?
      Easier to replicate
      “Big” docs can be tough on networks
      Trade-off with more client calls (esp error handling)
      31
    • 32. Probably good ideas
      If you’re _thinking_ about doing this, do use one of the open source ones
      Get one that replicates easily
      Build POCs
      32
    • 33. Thank you!Other questions?
      @endafarrell
      http://endafarrell.net