Taking Diaspora from MongoDB to MySQL (RubyConf 2011)

Diaspora is the crowd-funded open-source decentralized social network built on Rails. Full buzzword compliance: on by default. We have many thousands of active users and they generate a lot of social data. But after nine months of full-time development with MongoDB as our primary storage engine, a few months ago we converted it all to MySQL. Wait...what? Most people are going the other way, dropping Mongo into a project in place of MySQL or PostgreSQL. Plus, conventional wisdom says that social data is ill-suited to a traditional data store. Come hear a story about a large-scale Rails project that tried it both ways. You'll see crisis and redemption, facts and figures, nerds, kittens, ponycorns, and, of course, the secret sauce. Hecklers will be piped to /dev/null.

    1. Image copyright Yukihiro Matsumoto
    6. MongoDB to MySQL • Data • Ways to store it • Youthful indiscretion • Data (Image copyright Yukihiro Matsumoto)
    7. Sarah Mei @sarahmei
    8. Sarah Allen (not me) (Photo copyright Lee Lundrigan)
    10. Images copyright Pivotal Labs, Inc.
    12. Photo by Henrik Moltke
    13. Eben Moglen (Photo from The Internet Society, original: http://www.flickr.com/photos/internetsociety/5876916246)
    16. Images copyright (clockwise from top) Diaspora Inc., 10gen Inc., David Heinemeier Hansson.
    18. “Social data isn’t relational” -Twitter
    19. OOH SHINY (Photo from flickr user helen61, original: http://www.flickr.com/photos/helen61/2873090945/)
    20. “Social data isn’t relational” -Twitter
    21. OOH SHINY (Photo from flickr user helen61, original: http://www.flickr.com/photos/helen61/2873090945/)
    26. Mismatches: • At the data layer (We are here) • At the mapping layer. Test-driving a data migration
    27. “document-oriented database” (Image copyright 10gen Inc.)
    28. tv_shows • many seasons • many reviews • many episodes • many cast_members
    32. {title: Babylon 5, seasons: [{season_number: 1, episodes: [{ordinal_within_season: 1, title: Midnight on the Firing, reviews: [{...}], cast_members: [{...}]}]}]}
    35. users • many friends • many comments • many posts • many commenters • many likes • many likers
    39. {name: Sarah Mei, friends: [{name: Thing 1, posts: [{message: go fly a kite, comments: [{...}], likes: [{...}], reshares: [{...}]}]}]}
    40. tv_shows • many seasons • many reviews • many episodes • many cast_members
    41. users • many friends • many comments • many posts • many commenters • many likes • many likers
    42. users • many friends • many comments • many posts • many commenters • many likes • many likers
    43. tv_shows • many seasons • many reviews • many episodes • many cast_members
    46. {name: Sarah Mei, friends: [{name: Thing 1, posts: [{message: go fly a kite, comments: [{...}], likes: [{...}], reshares: [{...}]}]}]}
    47. {name: Sarah Mei, friends: [{user_id: 4b866f08234ae01d21d8960, posts: [{message: go fly a kite, comments: [{...}], likes: [{...}], reshares: [{...}]}]}]}
    48. Photo from flickr user befuddledsenses, original: http://www.flickr.com/photos/befuddledsenses/2903185831
    50. (1) “Social data isn’t relational” “Eight-table joins ZOMG”
    51. (2) When they say documents, they really mean documents. (Photo from flickr user horrgakx, original: http://www.flickr.com/photos/horrgakx/2963449465)
    52. Mismatches: • At the data layer • At the mapping layer (We are here). Test-driving a data migration
    57. Faking It • Faux relational interface bad • JS map/reduce good • But lower-level • And harder to test
    63. Y U SO HARD? • Less documentation • Harder to google for stuff • Smaller community • Less gem support • Developers had no Mongo experience
    64. Illustration from flickr user katy_tresedder, original: http://www.flickr.com/photos/katy_tresedder/4902794900
    65. Y U SO HARD? In General: • Less documentation • Lower googlability • Smaller community • Less gem support • Team of developers with no Mongo experience
    67. Photo from flickr user nswmaritime, original: http://www.flickr.com/photos/nswmaritime/2963649924
    68. Every non-standard technology choice reduces our ability to iterate quickly.
    69. Photo from flickr user nzgabriel, original: http://www.flickr.com/photos/nzgabriel/2607065194
    71. Mismatches: • At the data layer • At the mapping layer. Test-driving a data migration (We are finally here)
    72. Photo from flickr user werkunz, original: http://www.flickr.com/photos/werkunz/5160818883
    73. Photo from the Seattle Municipal Archives, accessed at: http://www.flickr.com/photos/seattlemunicipalarchives/4328053793
    74. Test-driving the migration
    81. “Test-driving” the migration: The Easy Way 1. Test-drive conversion of a single sub-document from JSON to CSV 2. Test-drive import of that CSV into AR 3. Run it on a copy of the production database* 4. Add more test cases
    85. “Test-driving” the migration: The Hard(er) Way • Batching • activerecord-import • LOAD DATA INFILE
    89. “Test-driving” the migration: Gotchas • character encoding • converting IDs • character encoding
    90. Code: github.com/diaspora/diaspora, tag last-data-conversion
    91. Diaspora @joindiaspora • Sarah Mei @sarahmei (Images not otherwise credited are copyright Sarah Mei and licensed cc-attribution)
    92. tv_shows • many seasons • many reviews • many episodes • many cast_members
    93. surprise! (Photo from flickr user industry_is_virtue, original: http://www.flickr.com/photos/industry_is_virtue/3209194592)
    94. tv_shows • many seasons • many reviews • many episodes • many cast_members
    95. cast_members {character_name: “Cmdr Jeffrey Sinclair”, actor_name: “Michael O’Hare”}
    99. So far: • Why MongoDB wasn’t a good fit for Diaspora’s data • The mechanics of the migration • Why we switched
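
The two user documents in the slides differ in one detail: in one, each friend is embedded by value (`name: Thing 1`), and in the other the friend is stored as a `user_id`. The following is a minimal Python sketch of what that difference may mean in practice. It is not Diaspora's code: the `users` lookup table, the short ids `u1` and `u2`, and both helper functions are hypothetical and exist only for illustration.

```python
import copy

# Hypothetical sample documents; ids are made-up placeholders.
thing1 = {"_id": "u2", "name": "Thing 1", "posts": [{"message": "go fly a kite"}]}
sarah = {"_id": "u1", "name": "Sarah Mei", "friends": []}

# Shape 1: embed a full copy of the friend inside the user document.
embedded = copy.deepcopy(sarah)
embedded["friends"].append(copy.deepcopy(thing1))

# Shape 2: store only a reference to the friend's own document.
referenced = copy.deepcopy(sarah)
referenced["friends"].append({"user_id": thing1["_id"]})

# Stand-in for a users collection, keyed by id (an assumption of this sketch).
users = {"u1": referenced, "u2": thing1}


def friend_names_embedded(user):
    """Read names straight from the embedded copies: one lookup, possibly stale."""
    return [friend["name"] for friend in user["friends"]]


def friend_names_referenced(user, users):
    """Resolve each reference with a second lookup, i.e. a join done by hand."""
    return [users[friend["user_id"]]["name"] for friend in user["friends"]]


# The friend renames themselves in their own document only.
thing1["name"] = "Thing One"

stale = friend_names_embedded(embedded)              # reads the old embedded copy
fresh = friend_names_referenced(referenced, users)   # follows the reference
```

In this sketch the embedded copy keeps the old name until every document holding a copy is rewritten, while the referenced shape stays current at the cost of an extra lookup per friend.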

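
The "Easy Way" steps and the gotchas listed in the slides (converting IDs, character encoding, batching) can be sketched roughly as below. This is an illustrative Python version under stated assumptions, not the code at the `last-data-conversion` tag: the field names, the `IdMapper` class, the helper functions, and the sample data are all hypothetical. It covers only the JSON-to-CSV step; the import into ActiveRecord is left out.

```python
import csv
import io
from itertools import islice


class IdMapper:
    """Hypothetical helper: maps string document ids to sequential integers."""

    def __init__(self):
        self._seen = {}

    def to_int(self, doc_id):
        # Reuse the integer already assigned to this id, else assign the next one.
        if doc_id not in self._seen:
            self._seen[doc_id] = len(self._seen) + 1
        return self._seen[doc_id]


def comment_to_row(comment, post_id, comment_ids, post_ids):
    """Flatten one embedded comment into [id, post_id, text] for a comments table."""
    return [comment_ids.to_int(comment["_id"]), post_ids.to_int(post_id), comment["text"]]


def rows_to_csv(rows):
    """Serialize rows to CSV and encode explicitly as UTF-8 bytes."""
    buf = io.StringIO(newline="")
    csv.writer(buf).writerows(rows)
    return buf.getvalue().encode("utf-8")


def batches(items, size):
    """Yield lists of at most `size` items, so each import step stays bounded."""
    it = iter(items)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


# Made-up sample data; the ids are shortened placeholders, not real ObjectIds.
post = {
    "_id": "4f0a",
    "comments": [
        {"_id": "9b1c", "text": "héllo, world"},
        {"_id": "9b1d", "text": "second"},
    ],
}
post_ids, comment_ids = IdMapper(), IdMapper()
rows = [comment_to_row(c, post["_id"], comment_ids, post_ids) for c in post["comments"]]
csv_bytes = rows_to_csv(rows)
```

Keeping the conversion as small pure functions is what makes it test-drivable: each function can be checked against a single sub-document before anything runs on a copy of the production data.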