Postgres vs Elasticsearch while enriching data - Vlad Somov | Ruby Meditation #23

1,082 views

Talk at Ruby Meditation #23, September 14, 2018, Odessa

Published in: Technology

  1. 1. Postgres vs Elasticsearch while enriching data. Vlad Somov @ Salt Edge Inc.
  2. 2-3. Unstructured Data Enrichment (diagram): incoming raw data (some transaction description, website) is matched against stored records of keywords, website, name and tag, and becomes structured identified data (name, tag, description, matched keyword).
  4. 4-6. Basic setup. Performance on ~4 mln. records (seconds; min / average / max): Postgres 2.10 / 9.88 / 28.73 • Elasticsearch 0.73 / 0.99 / 1.37
  7. 7-12. B-tree index structure (diagram: a meta page, upper-level pages and leaf pages holding the sorted keys 3 to 98; key 39 is highlighted step by step across the slides)
  13. 13. Why is it useful? • A B-tree index keeps values sorted inside each node. • A B-tree is balanced. • Nodes on the same level are connected by a doubly linked list.
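The properties on this slide are what make range lookups cheap: find the first matching leaf, then walk sideways through the linked pages. Below is a minimal Python sketch of that idea. The `Leaf` class, the `link` and `range_scan` helpers and the grouping of keys into pages are illustrative assumptions, not PostgreSQL internals; the root-to-leaf descent is replaced by a binary search over the pages.

```python
from bisect import bisect_left
from dataclasses import dataclass
from typing import List, Optional


@dataclass(eq=False)
class Leaf:
    """Hypothetical leaf page: sorted keys plus links to both neighbours."""
    keys: List[int]
    prev: Optional["Leaf"] = None
    next: Optional["Leaf"] = None


def link(leaves: List[Leaf]) -> None:
    """Connect same-level pages into a doubly linked list."""
    for left, right in zip(leaves, leaves[1:]):
        left.next, right.prev = right, left


def range_scan(leaves: List[Leaf], low: int, high: int) -> List[int]:
    """Return every key in [low, high] by walking the linked leaves."""
    # Stand-in for the tree descent: first page whose largest key is >= low.
    last_keys = [leaf.keys[-1] for leaf in leaves]
    idx = bisect_left(last_keys, low)
    if idx == len(leaves):
        return []
    page: Optional[Leaf] = leaves[idx]
    pos = bisect_left(page.keys, low)  # keys are sorted inside the page
    found: List[int] = []
    while page is not None:
        for key in page.keys[pos:]:
            if key > high:
                return found
            found.append(key)
        page, pos = page.next, 0  # follow the sibling link to the right
    return found


# Keys borrowed from the slide; the split into pages is an assumption.
leaves = [Leaf([3, 9, 15, 21]), Leaf([29, 32, 39, 42]),
          Leaf([48, 55, 68, 77]), Leaf([89, 93, 94, 98])]
link(leaves)
print(range_scan(leaves, 39, 55))
```

Because the pages are sorted and linked, the scan never has to go back up the tree once the starting leaf is found.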
  14. 14-17. After multicolumn index on country_id and merchant_type. Performance on ~4 mln. records (seconds; min / average / max): Postgres 2.1 / 9.88 / 28.73 • Elasticsearch 0.73 / 0.99 / 1.37 • Postgres + multicolumn index 2.28 / 5.09 / 10.19
  18. 18. What is GiST? Generalized Search Tree • In GiST, each leaf entry holds a logical expression (predicate) and a pointer to a TID; the indexed data must satisfy that expression. • Faster on insert and update. What is GIN? Generalized Inverted Index • It is a B-tree of elements, where each element links to another B-tree or a plain list of TIDs. • Faster and more accurate on select.
  19. 19-21. GIN example (diagram). Rows: "Welcome to ruby meditation." (TID 0,1), "All of us love ruby." (TID 1,5), "Does everyone love meditation?" (TID 2,1). Each indexed word (all, does, everyone, love, meditation, of, ruby, to, welcome) points to the TIDs of the rows that contain it. Yellow rectangles are TIDs: the first number is the page number and the second is the position on the page. Searching for "ruby" and "love": ruby → 0,1 and 1,5; love → 1,5 and 2,1; both → 1,5.
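The lookup in this diagram can be sketched as a plain inverted index: each word maps to the set of TIDs that contain it, and a multi-word search intersects those sets. This is a toy model in Python, not how PostgreSQL stores GIN pages; the TID assigned to each sentence is read off the slide diagram and should be treated as an assumption, and `build_index` and `search_all` are illustrative names.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

TID = Tuple[int, int]  # (page number, position on the page)

# The three sentences from the slide; TID assignment is an assumption.
ROWS: Dict[TID, str] = {
    (0, 1): "Welcome to ruby meditation.",
    (1, 5): "All of us love ruby.",
    (2, 1): "Does everyone love meditation?",
}


def build_index(rows: Dict[TID, str]) -> Dict[str, Set[TID]]:
    """Map every lowercased word to the set of TIDs whose text contains it."""
    index: Dict[str, Set[TID]] = defaultdict(set)
    for tid, text in rows.items():
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(tid)
    return index


def search_all(index: Dict[str, Set[TID]], words: Iterable[str]) -> List[TID]:
    """Return the TIDs that contain every requested word."""
    postings = [index.get(word.lower(), set()) for word in words]
    if not postings:
        return []
    return sorted(set.intersection(*postings))


index = build_index(ROWS)
print(search_all(index, ["ruby"]))
print(search_all(index, ["ruby", "love"]))
```

Only the posting lists are touched during a search; the rows themselves are read afterwards, by TID.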
  22. 22. gin_trgm_ops A trigram is a group of three consecutive characters taken from a string. We can measure the similarity of two strings by counting the number of trigrams they share.
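A small Python sketch of trigram similarity may make the definition concrete. It lowercases the input, pads each word with two leading spaces and one trailing space (mirroring the padding rule described in the pg_trgm documentation, as far as I understand it) and divides the number of shared trigrams by the number of distinct trigrams in both strings. The function names and the sample strings are illustrative, and the sketch is not guaranteed to reproduce pg_trgm scores exactly.

```python
import re
from typing import Set


def trigrams(text: str) -> Set[str]:
    """Collect the three-character groups of every word in the text."""
    grams: Set[str] = set()
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        padded = f"  {word} "  # two spaces in front, one behind
        grams.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams


def similarity(a: str, b: str) -> float:
    """Shared trigrams divided by all distinct trigrams of both strings."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


print(sorted(trigrams("cat")))
print(similarity("amazon.com", "amazon.co.uk"))
```

An index built on such trigrams can narrow a similarity or substring search to rows that share enough trigrams with the search string before any row is compared in full.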
  23. 23-27. Performance after gin index on websites, ~4 mln. records (seconds; min / average / max): Postgres 2.1 / 9.88 / 28.73 • Elasticsearch 0.75 / 0.99 / 1.37 • Postgres + multicolumn index 2.28 / 5.09 / 10.19 • Postgres + gin index with trgm_ops on websites 0.26 / 0.34 / 0.55
  28. 28. How Elasticsearch works • It runs all incoming data through an analyzer (custom or the default one). • Each analyzer has exactly one tokenizer. • The tokenizer may be followed by zero or more token filters. • The tokenizer may be preceded by one or more char filters.
  29. 29-34. How does an analyzer work? Input → (string) → Char Filter → (string) → Tokenizer → (tokens) → Token Filter → (tokens) → Output
  35. 35-41. Example • Input: The 2 QUICK <p>Brown-Foxes</p> jumped over the lazy dog's bone. • html_strip: The 2 QUICK Brown-Foxes jumped over the lazy dog's bone. • standard tokenizer: The, 2, QUICK, Brown, Foxes, jumped, over, the, lazy, dog's, bone • lowercase: the, 2, quick, brown, foxes, jumped, over, the, lazy, dog's, bone • stop (drops both occurrences of "the"): 2, quick, brown, foxes, jumped, over, lazy, dog's, bone • snowball: 2, quick, brown, fox, jump, over, lazi, dog, bone
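The analyzer chain from these slides (char filter, tokenizer, token filters) can be imitated in a few lines of Python. This is only a sketch of the data flow, not Elasticsearch code: the regular expressions are rough stand-ins for html_strip and the standard tokenizer, the stop-word list is a tiny assumed sample, and the snowball stemming step is left out because it needs a real stemmer.

```python
import re
from typing import Callable, List

STOP_WORDS = {"the", "a", "an", "and", "of"}  # tiny illustrative list


def html_strip(text: str) -> str:
    """Char filter: replace HTML tags with a space."""
    return re.sub(r"<[^>]+>", " ", text)


def tokenize(text: str) -> List[str]:
    """Tokenizer: split on non-word characters, keeping inner apostrophes."""
    return re.findall(r"\w+(?:'\w+)*", text)


def lowercase(tokens: List[str]) -> List[str]:
    """Token filter: lowercase every token."""
    return [token.lower() for token in tokens]


def stop(tokens: List[str]) -> List[str]:
    """Token filter: drop stop words."""
    return [token for token in tokens if token not in STOP_WORDS]


def analyze(text: str,
            char_filters: List[Callable[[str], str]],
            tokenizer: Callable[[str], List[str]],
            token_filters: List[Callable[[List[str]], List[str]]]) -> List[str]:
    """Run the text through char filters, one tokenizer, then token filters."""
    for char_filter in char_filters:
        text = char_filter(text)
    tokens = tokenizer(text)
    for token_filter in token_filters:
        tokens = token_filter(tokens)
    return tokens


sentence = "The 2 QUICK <p>Brown-Foxes</p> jumped over the lazy dog's bone."
tokens = analyze(sentence, [html_strip], tokenize, [lowercase, stop])
print(tokens)
```

The order of the token filters matters: `stop` runs after `lowercase`, otherwise the capitalized "The" would survive.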
  42. 42. Postgres full-text search implementation • We can get almost the same functionality with the tsvector type, produced by the to_tsvector function. • To improve performance, we can store the to_tsvector values in a separate tsvector column. • To build a query, use to_tsquery with the operators & (AND), | (OR) and <-> (followed by). • plainto_tsquery works with plain text, so you don't need to insert any special symbols; it inserts & between tokens. • phraseto_tsquery also works with plain text but requires the tokens to follow each other; it inserts <-> between tokens.
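To show what & and <-> mean in practice, here is a toy Python model of a tsvector (word → positions) and of the two plain-text query builders. It is a sketch under loud assumptions: the real to_tsvector, plainto_tsquery and phraseto_tsquery also stem words and drop stop words, which is skipped here, and all function names below are made up for illustration.

```python
import re
from typing import Dict, List


def words_of(text: str) -> List[str]:
    """Lowercased words of a plain-text string."""
    return re.findall(r"\w+", text.lower())


def to_vector(text: str) -> Dict[str, List[int]]:
    """Toy tsvector: every word with its 1-based positions in the text."""
    vector: Dict[str, List[int]] = {}
    for position, word in enumerate(words_of(text), start=1):
        vector.setdefault(word, []).append(position)
    return vector


def plain_query(text: str) -> str:
    """Like plainto_tsquery: join the words with & (all must be present)."""
    return " & ".join(words_of(text))


def phrase_query(text: str) -> str:
    """Like phraseto_tsquery: join the words with <-> (must be adjacent)."""
    return " <-> ".join(words_of(text))


def matches_all(vector: Dict[str, List[int]], words: List[str]) -> bool:
    """Semantics of a & b: every word occurs somewhere in the document."""
    return all(word in vector for word in words)


def matches_phrase(vector: Dict[str, List[int]], words: List[str]) -> bool:
    """Semantics of a <-> b: the words occur at consecutive positions."""
    if not words or words[0] not in vector:
        return False
    return any(
        all(start + offset in vector.get(word, [])
            for offset, word in enumerate(words))
        for start in vector[words[0]]
    )


doc = to_vector("All of us love ruby")
print(plain_query("love ruby"), "|", phrase_query("love ruby"))
print(matches_all(doc, ["ruby", "love"]), matches_phrase(doc, ["ruby", "love"]))
```

The phrase check needs word positions, while the & check only needs to know that the words are present; that difference is why phrase search costs more than plain matching.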
  43. 43. RUM access method • Based on the GIN access method code • Solves slow ranking • Solves slow phrase search (tsquery with the <-> operator) • Supports an index on a tsquery column
  44. 44-46. RUM example (diagram): the same three sentences and word tree as in the GIN example, plus the terms ruby, meditation, love and TID 8,4; each TID now also carries word positions. The number in the green rectangle is the word position in the document. The slides again highlight the lookup of "ruby" and "love", ending at TID 1,5.
  47. 47. Conclusion • Postgres can also be fast. • Multicolumn indexes can improve performance if your search has multicolumn constraints. • For fast text search, prefer GIN when the table is rarely updated; otherwise use GiST. • Use GIN with gin_trgm_ops for full-text search. If full-text search is still slow, try the tsvector data type with a GIN index on it. • When you have an "inverse full-text search" problem, store the queries in your table as a tsquery column and treat incoming data as the document. Add a RUM index on the query column with rum_tsquery_ops for fast classification. • Before moving to another tool, analyze both the current and the new tool and verify whether the move is worth it.
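The "inverse full-text search" idea from the conclusion can be sketched in Python: the table holds the queries, and each incoming transaction description plays the role of the document. Everything below is hypothetical, including the `Rule` type, the sample tags and keywords, and the linear scan; the talk's approach indexes the stored queries with RUM so that not every stored query has to be checked.

```python
import re
from typing import FrozenSet, List, NamedTuple


class Rule(NamedTuple):
    """Hypothetical stored query: all required words must be in the document."""
    tag: str
    required: FrozenSet[str]


# Made-up sample rules, standing in for rows with a tsquery column.
RULES = [
    Rule("coffee", frozenset({"starbucks"})),
    Rule("fuel", frozenset({"shell", "station"})),
]


def classify(description: str, rules: List[Rule]) -> List[str]:
    """Return the tags of every stored query matched by the description."""
    words = set(re.findall(r"\w+", description.lower()))
    return [rule.tag for rule in rules if rule.required <= words]


print(classify("POS SHELL STATION 1234 ODESSA", RULES))
print(classify("STARBUCKS #42", RULES))
```

The roles are swapped compared with ordinary search: the queries are the long-lived data and the documents are the short-lived input.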
  48. 48. email: vlad.somov@icloud.com twitter: @vsomov93 Questions?
