Strip your TEXT fields - Exeter Web Feb/2016

We use TEXT fields to store JSON, plain text and sometimes even HTML content. But why is this kind of field so harmful to your database? What can we use instead to get the same flexibility? And if it can't be avoided, what is the best way to use it?

Published in: Internet

  1. 1. strip your `TEXT` fields Gabi @gabidavila
 http://gabriela.io
  2. 2. about • Data Engineer • @Crowdcube
  3. 3. uses • serialised Arrays • JSON • large strings • images*
  4. 4. how does it work?
  5. 5. alter table • creates a temporary table with the new structure • copies the data from the old table to the new one • consolidates the new table
  6. 6. example
  7. 7. alter PK: INT -> BIGINT • INT: -2.147.483.648 to 2.147.483.647 • BIGINT: -9.223.372.036.854.775.808 to 9.223.372.036.854.775.807 (illustration out of scale)
  8. 8. case • > 750 GB • > 380 million rows • 3 TEXT fields • Auto increment: 898.191.090
  9. 9. how long did it take?
  10. 10. 2 days
  11. 11. why?
  12. 12. speed fast slow
  13. 13. storage • large TEXT values are stored off-page, one allocation per row for each TEXT field • stored in a different location than the table data itself • each field up to 64 KB (TEXT), 16 MB (MEDIUMTEXT) or 4 GB (LONGTEXT)
  14. 14. engines
  15. 15. MyISAM • fastest read speed • supports FULLTEXT indexes • non-transactional • less data reliability
  16. 16. InnoDB • transactional • better data integrity • did not support FULLTEXT indexes before MySQL 5.6
  17. 17. querying TEXT fields • inefficient search with the LIKE operator • slow DDL operations (like alter table) • unnecessary increase of the table size
  18. 18. possible alternatives
  19. 19. fastest search
  20. 20. search servers • indexes large bodies • api bindings • decoupled
  21. 21. smart retrieval
  22. 22. content delivery network • high availability • cheap
  23. 23. TEXT field within the RDBMS
  24. 24. conclusion • there is no silver bullet solution • some implementations may add an additional layer of complexity to the application • some implementations work better with decoupled applications
  25. 25. thanks! come say hi :)
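
The three steps on slide 5 (create a table with the new structure, copy the data across, consolidate) can be sketched roughly as below. This is only an illustration, not MySQL's actual implementation: it uses Python's built-in `sqlite3` module as a stand-in engine, and the `alter_by_copy` helper, the `posts` table and the `__new` suffix are all hypothetical names. In SQLite the `INT` and `BIGINT` type names are nominal, so the sketch shows the copy pattern rather than a real change in integer width. The point is that every row is rewritten, which is why a table with hundreds of millions of rows and large TEXT values can take days.

```python
import sqlite3


def alter_by_copy(conn, table, new_ddl, columns):
    """Rebuild `table` with a new structure by copying every row.

    `new_ddl` is a CREATE TABLE statement with a `{name}` placeholder
    for the temporary table name (a hypothetical convention).
    """
    tmp = f"{table}__new"
    cols = ", ".join(columns)
    with conn:  # commit on success, roll back on error
        # 1. create a temporary table with the new structure
        conn.execute(new_ddl.format(name=tmp))
        # 2. copy the data from the old table to the new one (the slow part)
        conn.execute(f"INSERT INTO {tmp} ({cols}) SELECT {cols} FROM {table}")
        # 3. consolidate: drop the old table and rename the new one into place
        conn.execute(f"DROP TABLE {table}")
        conn.execute(f"ALTER TABLE {tmp} RENAME TO {table}")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INT PRIMARY KEY, body TEXT)")
conn.executemany("INSERT INTO posts VALUES (?, ?)", [(1, "a"), (2, "b")])

alter_by_copy(
    conn,
    "posts",
    "CREATE TABLE {name} (id BIGINT PRIMARY KEY, body TEXT)",
    ["id", "body"],
)

rows = conn.execute("SELECT id, body FROM posts ORDER BY id").fetchall()
schema = conn.execute(
    "SELECT sql FROM sqlite_master WHERE type = 'table' AND name = 'posts'"
).fetchone()[0]
print(rows)
print(schema)
```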

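The ranges on slide 7 and the auto-increment value on slide 8 can be checked with a little arithmetic. The sketch below assumes signed two's-complement integers of 32 bits (INT) and 64 bits (BIGINT); the `signed_range` helper and the variable names are illustrative only.

```python
def signed_range(bits):
    """Return (min, max) for a signed two's-complement integer of `bits` bits."""
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


INT_MIN, INT_MAX = signed_range(32)
BIGINT_MIN, BIGINT_MAX = signed_range(64)

# Auto-increment value quoted on slide 8.
AUTO_INCREMENT = 898_191_090

remaining = INT_MAX - AUTO_INCREMENT   # ids left before a signed INT overflows
used = AUTO_INCREMENT / INT_MAX        # fraction of the INT range already consumed

print(f"INT:    {INT_MIN:,} to {INT_MAX:,}")
print(f"BIGINT: {BIGINT_MIN:,} to {BIGINT_MAX:,}")
print(f"remaining INT ids: {remaining:,} ({used:.1%} used)")
```

Under these assumptions the table had consumed a little over 40% of the signed INT range when the primary key was widened.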