TAGGING SCHEMA DESIGN
FOR HIGH PERFORMANCE
Alexander Tokarev
Senior Developer, DataArt
atokarev@dataart.com
25 November 2016
Plan
• Tagging basis
• Database challenges
• Tagging solutions
• Pros and cons
• Q&A session
Tagging terms
• A tag is a non-hierarchical keyword or term assigned to a piece of
information
• Tags are generally chosen informally and personally by the item's
creator or by its viewer
• If tags are assigned by the creator and come from a limited set, it is a taxonomy
• If tags are assigned by the viewer and are unlimited, it is a folksonomy
• Tags came into wide use from 2003 with the Flickr and Delicious websites
• Tags are usually shown inline as well as in a tag cloud
Tagging terms: inline tags and tag cloud (figure)
Tagging challenges
+ (advantages)
1. The vocabulary used reflects the user's vocabulary directly
2. Flexibility: the user can add or remove tags
3. Multi-dimensional nature: users can assign any number and
combination of tags to express a concept
– (drawbacks)
1. Specialized tags or tags that mean nothing to anyone but their author,
misspellings, singular/plural forms, compound words
2. Tags are often ambiguous, overly personalized, or poorly applied
3. Synonyms, acronyms and homonyms are not handled well
Database challenges
1. Performance
2. Query awkwardness
3. Database size
4. Housekeeping
Highly normalized approach
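The diagram for this slide did not survive in the transcript. A minimal sketch of what a normalized tagging schema usually looks like is three tables: posts, tags, and a link table between them. The table names, column names, and helper functions below are assumptions made for illustration, and SQLite is used only because it ships with Python; the talk does not say which RDBMS or schema was measured.

```python
import sqlite3

# Hypothetical schema: every name here is illustrative, not taken from the talk.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE post (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
    CREATE TABLE tag  (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
    CREATE TABLE post_tag (
        post_id INTEGER NOT NULL REFERENCES post(id),
        tag_id  INTEGER NOT NULL REFERENCES tag(id),
        PRIMARY KEY (post_id, tag_id)
    );
    -- Second index so that lookups by tag do not scan the link table.
    CREATE INDEX post_tag_by_tag ON post_tag (tag_id, post_id);
""")

def add_post(conn, post_id, title, tags):
    """Insert a post, create any missing tags, and link them."""
    conn.execute("INSERT INTO post (id, title) VALUES (?, ?)", (post_id, title))
    for name in tags:
        conn.execute("INSERT OR IGNORE INTO tag (name) VALUES (?)", (name,))
        conn.execute(
            "INSERT INTO post_tag (post_id, tag_id) "
            "SELECT ?, id FROM tag WHERE name = ?",
            (post_id, name),
        )

def tags_of(conn, post_id):
    """Return all tag names of one post, sorted alphabetically."""
    rows = conn.execute(
        "SELECT t.name FROM tag t JOIN post_tag pt ON pt.tag_id = t.id "
        "WHERE pt.post_id = ? ORDER BY t.name",
        (post_id,),
    )
    return [name for (name,) in rows]

add_post(conn, 1, "Storing GUIDs", ["php", "mysql", "guid"])
add_post(conn, 2, "Apache setup", ["php", "apache2"])
```

Each tag name is stored once, so renaming a tag is a single-row update, at the cost of joins on every read.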
Denormalized approach
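The slide's diagram is missing from the transcript, so the exact denormalized layout used in the talk is unknown. One common reading, sketched below under that assumption, keeps all tags of a post in a single text column of the post row, using the angle-bracket encoding seen in Stack Overflow data. The table, column, and function names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical layout: tags live in one delimited text column of the post row.
conn.execute("CREATE TABLE post (id INTEGER PRIMARY KEY, title TEXT, tags TEXT)")

def encode_tags(tags):
    """Encode a tag list as '<a><b><c>' (sorted, duplicates removed)."""
    return "".join(f"<{t}>" for t in sorted(set(tags)))

def decode_tags(encoded):
    """Decode '<a><b><c>' back into ['a', 'b', 'c']."""
    return [t for t in encoded.strip("<>").split("><") if t]

conn.executemany(
    "INSERT INTO post (id, title, tags) VALUES (?, ?, ?)",
    [
        (1, "Storing GUIDs", encode_tags(["php", "mysql", "guid"])),
        (2, "Apache setup", encode_tags(["php", "apache2"])),
    ],
)

def posts_with_tag(conn, tag):
    """Find posts by substring match; the delimiters prevent partial-name hits."""
    rows = conn.execute(
        "SELECT id FROM post WHERE tags LIKE ? ORDER BY id", (f"%<{tag}>%",)
    )
    return [post_id for (post_id,) in rows]
```

Reading all tags of a post needs no join, but a leading-wildcard `LIKE` generally cannot use an ordinary index, and tag names containing `%` or `_` would need escaping.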
Complex data type approach
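The transcript does not show which complex data type the talk used (arrays, nested tables, JSON, and so on). As a rough illustration only, the sketch below stores tags as a JSON array and queries it with SQLite's `json_each` table-valued function. It assumes the SQLite build bundled with Python includes the JSON functions, which is typical for recent builds but not guaranteed; all table and function names are hypothetical.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical layout: the tags column holds a JSON array of tag names.
conn.execute("CREATE TABLE post (id INTEGER PRIMARY KEY, title TEXT, tags TEXT)")
conn.executemany(
    "INSERT INTO post (id, title, tags) VALUES (?, ?, ?)",
    [
        (1, "Storing GUIDs", json.dumps(["php", "mysql", "guid"])),
        (2, "Apache setup", json.dumps(["php", "apache2", "openinviter"])),
    ],
)

def posts_with_tag(conn, tag):
    """Return ids of posts whose JSON tag array contains the given tag."""
    rows = conn.execute(
        "SELECT p.id FROM post p "
        "WHERE EXISTS (SELECT 1 FROM json_each(p.tags) "
        "              WHERE json_each.value = ?) "
        "ORDER BY p.id",
        (tag,),
    )
    return [post_id for (post_id,) in rows]

def tags_of(conn, post_id):
    """Return the tag list of one post, decoded from its JSON column."""
    (raw,) = conn.execute(
        "SELECT tags FROM post WHERE id = ?", (post_id,)
    ).fetchone()
    return json.loads(raw)
```

How such a value is stored and indexed differs widely between database engines, which is why the behaviour of this model has to be checked on the target DBMS.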
Full-text-search oriented solutions
Stack Overflow: <php><mysql><guid><encryption>
JSON: {"tags":["php", "apache2", "openinviter"]}
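Full-text-search engines are typically built around an inverted index that maps each term to the documents containing it. The toy class below is a sketch of that idea applied to tags, not the implementation of any particular engine; the class name `TagIndex` and its methods are made up for illustration.

```python
from collections import defaultdict

class TagIndex:
    """Toy inverted index: tag -> set of post ids."""

    def __init__(self):
        self._postings = defaultdict(set)

    def add(self, post_id, tags):
        """Index one post under each of its tags (case-insensitive)."""
        for tag in tags:
            self._postings[tag.lower()].add(post_id)

    def search_all(self, *tags):
        """Return sorted ids of posts carrying every given tag (AND search)."""
        if not tags:
            return []
        # Intersect starting from the smallest posting set.
        sets = sorted(
            (self._postings.get(t.lower(), set()) for t in tags), key=len
        )
        result = set(sets[0])
        for postings in sets[1:]:
            result &= postings
        return sorted(result)

index = TagIndex()
index.add(1, ["php", "mysql", "guid", "encryption"])
index.add(2, ["php", "apache2", "openinviter"])
```

An AND search becomes a set intersection over posting lists, which hints at why multi-tag searches can be cheap for an FTS engine.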
Full-text-search approaches
Approach 1: Application server; FTS inside DB + FTS model
Approach 2: Application server; Relational/denormalized/FTS model; FTS server (Lucene, Sphinx, Elastic, Solr, Xapian, etc.)
Housekeeping
Denormalized/FTS
1. Change all affected tags in all documents if a tag name changed
FTS
1. FTS index rebuild due to fragmentation
2. FTS index refresh if it isn’t refreshed on COMMIT
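To make the first housekeeping item concrete: when tags are stored inside each document, renaming a tag means rewriting every affected row. A minimal sketch follows, assuming tags are kept as a `<tag>`-delimited string in a hypothetical `post.tags` column and using SQLite's built-in `replace` and `instr` functions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical denormalized table: tags are stored as '<a><b>' strings.
conn.execute("CREATE TABLE post (id INTEGER PRIMARY KEY, tags TEXT)")
conn.executemany(
    "INSERT INTO post (id, tags) VALUES (?, ?)",
    [(1, "<php><mysql>"), (2, "<php><apache2>"), (3, "<python>")],
)

def rename_tag(conn, old, new):
    """Rewrite every post that carries `old`; return the number of rows touched."""
    cur = conn.execute(
        "UPDATE post SET tags = replace(tags, ?, ?) WHERE instr(tags, ?) > 0",
        (f"<{old}>", f"<{new}>", f"<{old}>"),
    )
    conn.commit()
    return cur.rowcount
```

The cost grows with the number of posts that use the tag, and this sketch does not handle the case where a post already carries the new name.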
Test example
Stack Overflow posts via
http://data.stackexchange.com/
From 31/07/2008 to 21/12/2012
Posts: 2 680 474
Applied tags: 7 791 527
Used unique tags: 30 485
Max tags count for a post: 5
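Two averages follow directly from these figures and help in reading the later results. The short calculation below uses only the numbers listed above; the variable names are illustrative.

```python
# Figures from the test data set described above.
posts = 2_680_474
applied_tags = 7_791_527
unique_tags = 30_485

tags_per_post = applied_tags / posts        # average tags attached to a post
uses_per_tag = applied_tags / unique_tags   # average applications of one tag

print(f"{tags_per_post:.2f} tags per post on average")    # 2.91
print(f"{uses_per_tag:.1f} uses per unique tag on average")  # 255.6
```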
Comparison
Comparison.
Initial population time
Model              Insert time, sec
Relational         1048
Denormalized       1205
Complex data type  2086
Full text search   1950
Comparison.
Database size
Model              Size total, MB  Data size, MB  Index size, MB
Relational         1166            338            828
Denormalized       1080            376            704
Complex data type  1134            256            878
Full text search   1055            416            639
Comparison.
Search by document id and all tag retrieval
Model              Speed with cold cache, sec  Speed with hot cache, sec
Relational         0.2                         0.003
Denormalized       0.07                        0.002
Complex data type  0.9                         0.002
Full text search   0.3                         0.001
Comparison.
Search using 1 tag and all tag retrieval
Model              Speed with cold cache, sec  Speed with hot cache, sec
Relational         1                           0.005
Denormalized       0.7                         0.004
Complex data type  1.7                         0.005
Full text search   0.7                         0.002
Comparison.
Search by AND using 2 tags & all tag retrieval
Model              Speed with cold cache, sec  Speed with hot cache, sec
Relational         40                          34
Denormalized       34                          20
Complex data type  34                          14
Full text search   20                          2
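For reference, an AND search over a normalized schema is often written as a join filtered with `IN` and a `HAVING COUNT` check, as in the sketch below. The query actually benchmarked in the talk is not shown in the transcript; the tables, sample rows, and function name here are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical normalized tables with a few sample rows.
conn.executescript("""
    CREATE TABLE tag (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
    CREATE TABLE post_tag (post_id INTEGER, tag_id INTEGER,
                           PRIMARY KEY (post_id, tag_id));
    INSERT INTO tag VALUES (1, 'php'), (2, 'mysql'), (3, 'guid');
    INSERT INTO post_tag VALUES (10, 1), (10, 2), (11, 1), (12, 2), (12, 3);
""")

def posts_with_all_tags(conn, tags):
    """Return ids of posts that carry every tag in `tags` (AND semantics)."""
    tags = sorted(set(tags))
    if not tags:
        return []
    placeholders = ", ".join("?" for _ in tags)
    rows = conn.execute(
        "SELECT pt.post_id FROM post_tag pt "
        "JOIN tag t ON t.id = pt.tag_id "
        f"WHERE t.name IN ({placeholders}) "
        "GROUP BY pt.post_id HAVING COUNT(*) = ? "
        "ORDER BY pt.post_id",
        (*tags, len(tags)),
    )
    return [post_id for (post_id,) in rows]
```

The `HAVING` check relies on the link table's primary key, which guarantees a post cannot be linked to the same tag twice.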
Comparison.
Tag cloud population
Model                  Speed, seconds
Relational             20
Relational simplified  18
Relational without FK  202
Denormalized           18
Complex data type      21
Full text search       40
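Populating a tag cloud boils down to counting how often each tag is used. A sketch for a normalized schema follows; the talk's own queries are not in the transcript, so the tables, sample rows, and the `tag_cloud` function are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical normalized tables with a few sample rows.
conn.executescript("""
    CREATE TABLE tag (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
    CREATE TABLE post_tag (post_id INTEGER, tag_id INTEGER,
                           PRIMARY KEY (post_id, tag_id));
    INSERT INTO tag VALUES (1, 'php'), (2, 'mysql'), (3, 'guid');
    INSERT INTO post_tag VALUES (10, 1), (10, 2), (11, 1), (12, 2), (12, 3);
""")

def tag_cloud(conn, limit=10):
    """Return (tag name, usage count) pairs, most used first."""
    rows = conn.execute(
        "SELECT t.name, COUNT(*) AS uses "
        "FROM post_tag pt JOIN tag t ON t.id = pt.tag_id "
        "GROUP BY t.id, t.name "
        "ORDER BY uses DESC, t.name "
        "LIMIT ?",
        (limit,),
    )
    return list(rows)
```

Because the whole link table is aggregated, results like this are commonly cached or precomputed rather than recalculated on every page view.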
Pros & Cons
Model              Space consumption  Search performance  Insert performance  Maintenance  Additional housekeeping  Risk of failure  Search queries development
Relational         worst              worst               highest             minimal      not required             no               worst
Denormalized       moderate           moderate            good                required     required                 no               moderate
Complex data type  moderate           moderate            worst               required     required                 no               moderate
Full text search   optimal            optimal             moderate            required     required                 yes              optimal
There is no silver bullet for the tag storage model!
Conclusion
1. Choose your best model based on:
• Performance (search/insert/update)
• Space consumption
• Engineer experience
• Hardware cost
• Software cost
2. Check each storage model on your own RDBMS: don't be afraid to try
and measure
3. Understanding how complex data types are stored internally is crucial
4. Understanding how FTS works internally is crucial
5. Investigate your DBMS's unique features
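As a starting point for "try and measure", a small timing helper like the one below can be wrapped around any query function. It is a sketch with a hypothetical name, `measure`: the first run only approximates a cold cache, since a truly cold measurement usually requires restarting the database or flushing caches.

```python
import time

def measure(run_query, repeats=5):
    """Time a callable; return (first_run_seconds, best_later_run_seconds)."""
    if repeats < 2:
        raise ValueError("repeats must be at least 2")
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        timings.append(time.perf_counter() - start)
    # First run stands in for "cold", the fastest later run for "hot".
    return timings[0], min(timings[1:])

# Usage with a stand-in workload; replace the lambda with a real query call.
first, best = measure(lambda: sum(range(100_000)), repeats=3)
```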
Q&A
THANK YOU!
IT talk SPb: Everything Will Be Found

More Related Content

Viewers also liked

«Azure Mobile Apps: и снова о мобильных сервисах», Анастасия Белокурова (.NE...
 «Azure Mobile Apps: и снова о мобильных сервисах», Анастасия Белокурова (.NE... «Azure Mobile Apps: и снова о мобильных сервисах», Анастасия Белокурова (.NE...
«Azure Mobile Apps: и снова о мобильных сервисах», Анастасия Белокурова (.NE...DataArt
 
Jkd indoor & outdoor signage catalog
Jkd indoor & outdoor signage catalogJkd indoor & outdoor signage catalog
Jkd indoor & outdoor signage catalogSatyendra Gupta
 
Артур Чеканов «Microframeworks» (Python Meetup)
Артур Чеканов  «Microframeworks» (Python Meetup)Артур Чеканов  «Microframeworks» (Python Meetup)
Артур Чеканов «Microframeworks» (Python Meetup)DataArt
 
Estrategika nuevos productos proteccion
Estrategika nuevos productos proteccionEstrategika nuevos productos proteccion
Estrategika nuevos productos proteccionJUAN CARLOS CALDERON
 
Лилия Зданевич "Automation testing save time and money"
Лилия Зданевич "Automation testing save time and money"Лилия Зданевич "Automation testing save time and money"
Лилия Зданевич "Automation testing save time and money"DataArt
 
Яна Пролис "Business value: developers against product owner"
Яна Пролис "Business value: developers against product owner"Яна Пролис "Business value: developers against product owner"
Яна Пролис "Business value: developers against product owner"DataArt
 
180 blue dining room training
180 blue dining room training180 blue dining room training
180 blue dining room trainingBill Buffalo
 
Дмитрий Паньков ( DataArt) "Business intelligence: перспективы для будущих сп...
Дмитрий Паньков ( DataArt) "Business intelligence: перспективы для будущих сп...Дмитрий Паньков ( DataArt) "Business intelligence: перспективы для будущих сп...
Дмитрий Паньков ( DataArt) "Business intelligence: перспективы для будущих сп...DataArt
 
Jornal HOJE! - Edição 840
Jornal HOJE! - Edição 840Jornal HOJE! - Edição 840
Jornal HOJE! - Edição 840Jornal HOJE!
 
Constitucionalidad internet
Constitucionalidad internetConstitucionalidad internet
Constitucionalidad internetJoel Quintana
 
Андрей Беляев - 20 лет Java
Андрей Беляев - 20 лет JavaАндрей Беляев - 20 лет Java
Андрей Беляев - 20 лет JavaDataArt
 
Big data school demo
Big data school demoBig data school demo
Big data school demoDataArt
 

Viewers also liked (17)

Kudzu and Palmer Amaranth Weed Pests
Kudzu and Palmer Amaranth Weed PestsKudzu and Palmer Amaranth Weed Pests
Kudzu and Palmer Amaranth Weed Pests
 
Mobile Flex-Cool Brochure
Mobile Flex-Cool BrochureMobile Flex-Cool Brochure
Mobile Flex-Cool Brochure
 
«Azure Mobile Apps: и снова о мобильных сервисах», Анастасия Белокурова (.NE...
 «Azure Mobile Apps: и снова о мобильных сервисах», Анастасия Белокурова (.NE... «Azure Mobile Apps: и снова о мобильных сервисах», Анастасия Белокурова (.NE...
«Azure Mobile Apps: и снова о мобильных сервисах», Анастасия Белокурова (.NE...
 
Jkd indoor & outdoor signage catalog
Jkd indoor & outdoor signage catalogJkd indoor & outdoor signage catalog
Jkd indoor & outdoor signage catalog
 
Артур Чеканов «Microframeworks» (Python Meetup)
Артур Чеканов  «Microframeworks» (Python Meetup)Артур Чеканов  «Microframeworks» (Python Meetup)
Артур Чеканов «Microframeworks» (Python Meetup)
 
Estrategika nuevos productos proteccion
Estrategika nuevos productos proteccionEstrategika nuevos productos proteccion
Estrategika nuevos productos proteccion
 
Лилия Зданевич "Automation testing save time and money"
Лилия Зданевич "Automation testing save time and money"Лилия Зданевич "Automation testing save time and money"
Лилия Зданевич "Automation testing save time and money"
 
Яна Пролис "Business value: developers against product owner"
Яна Пролис "Business value: developers against product owner"Яна Пролис "Business value: developers against product owner"
Яна Пролис "Business value: developers against product owner"
 
180 blue dining room training
180 blue dining room training180 blue dining room training
180 blue dining room training
 
Boxwood Blight
Boxwood BlightBoxwood Blight
Boxwood Blight
 
Дмитрий Паньков ( DataArt) "Business intelligence: перспективы для будущих сп...
Дмитрий Паньков ( DataArt) "Business intelligence: перспективы для будущих сп...Дмитрий Паньков ( DataArt) "Business intelligence: перспективы для будущих сп...
Дмитрий Паньков ( DataArt) "Business intelligence: перспективы для будущих сп...
 
Bean Plataspid
Bean PlataspidBean Plataspid
Bean Plataspid
 
Jornal HOJE! - Edição 840
Jornal HOJE! - Edição 840Jornal HOJE! - Edição 840
Jornal HOJE! - Edição 840
 
Constitucionalidad internet
Constitucionalidad internetConstitucionalidad internet
Constitucionalidad internet
 
git - the basics
git - the basicsgit - the basics
git - the basics
 
Андрей Беляев - 20 лет Java
Андрей Беляев - 20 лет JavaАндрей Беляев - 20 лет Java
Андрей Беляев - 20 лет Java
 
Big data school demo
Big data school demoBig data school demo
Big data school demo
 

Similar to IT talk SPb: Найдется все

The Power of Semantic Technologies to Explore Linked Open Data
The Power of Semantic Technologies to Explore Linked Open DataThe Power of Semantic Technologies to Explore Linked Open Data
The Power of Semantic Technologies to Explore Linked Open DataOntotext
 
Go fast in a graph world
Go fast in a graph worldGo fast in a graph world
Go fast in a graph worldAndrea Giuliano
 
Metadata and the Power of Pattern-Finding
Metadata and the Power of Pattern-FindingMetadata and the Power of Pattern-Finding
Metadata and the Power of Pattern-FindingDATAVERSITY
 
How to Reveal Hidden Relationships in Data and Risk Analytics
How to Reveal Hidden Relationships in Data and Risk AnalyticsHow to Reveal Hidden Relationships in Data and Risk Analytics
How to Reveal Hidden Relationships in Data and Risk AnalyticsOntotext
 
Big Data & Data Science
Big Data & Data ScienceBig Data & Data Science
Big Data & Data ScienceBrijeshGoyani
 
Northeastern DB Class Introduction to Marklogic NoSQL april 2016
Northeastern DB Class Introduction to Marklogic NoSQL april 2016Northeastern DB Class Introduction to Marklogic NoSQL april 2016
Northeastern DB Class Introduction to Marklogic NoSQL april 2016Matt Turner
 
Metadata Rules Folders Drool
Metadata Rules Folders DroolMetadata Rules Folders Drool
Metadata Rules Folders DroolTamara Bredemus
 
Taking the Performance of your Data Warehouse to the Next Level with Amazon R...
Taking the Performance of your Data Warehouse to the Next Level with Amazon R...Taking the Performance of your Data Warehouse to the Next Level with Amazon R...
Taking the Performance of your Data Warehouse to the Next Level with Amazon R...Amazon Web Services
 
AWS re:Invent 2016: Tableau Rules of Engagement in the Cloud (STG306)
AWS re:Invent 2016: Tableau Rules of Engagement in the Cloud (STG306)AWS re:Invent 2016: Tableau Rules of Engagement in the Cloud (STG306)
AWS re:Invent 2016: Tableau Rules of Engagement in the Cloud (STG306)Amazon Web Services
 
Data Day Seattle 2017: Scaling Data Science at Stitch Fix
Data Day Seattle 2017: Scaling Data Science at Stitch FixData Day Seattle 2017: Scaling Data Science at Stitch Fix
Data Day Seattle 2017: Scaling Data Science at Stitch FixStefan Krawczyk
 
SEO Audit of JLITTLEFORD.COM.pdf
SEO Audit of JLITTLEFORD.COM.pdfSEO Audit of JLITTLEFORD.COM.pdf
SEO Audit of JLITTLEFORD.COM.pdfSQ Expertise
 
OLAP on the Cloud with Azure Databricks and Azure Synapse
OLAP on the Cloud with Azure Databricks and Azure SynapseOLAP on the Cloud with Azure Databricks and Azure Synapse
OLAP on the Cloud with Azure Databricks and Azure SynapseAtScale
 
Boost your data analytics with open data and public news content
Boost your data analytics with open data and public news contentBoost your data analytics with open data and public news content
Boost your data analytics with open data and public news contentOntotext
 
Part 2 - Data Warehousing Lecture at BW Cooperative State University (DHBW)
Part 2 - Data Warehousing Lecture at BW Cooperative State University (DHBW)Part 2 - Data Warehousing Lecture at BW Cooperative State University (DHBW)
Part 2 - Data Warehousing Lecture at BW Cooperative State University (DHBW)Andreas Buckenhofer
 
Wolfgang Epting – IT-Tage 2015 – Testdaten – versteckte Geschäftschance oder ...
Wolfgang Epting – IT-Tage 2015 – Testdaten – versteckte Geschäftschance oder ...Wolfgang Epting – IT-Tage 2015 – Testdaten – versteckte Geschäftschance oder ...
Wolfgang Epting – IT-Tage 2015 – Testdaten – versteckte Geschäftschance oder ...Informatik Aktuell
 
Is Revolution R Enterprise Faster than SAS? Benchmarking Results Revealed
Is Revolution R Enterprise Faster than SAS? Benchmarking Results RevealedIs Revolution R Enterprise Faster than SAS? Benchmarking Results Revealed
Is Revolution R Enterprise Faster than SAS? Benchmarking Results RevealedRevolution Analytics
 
Cheatsheet: Google Search
Cheatsheet: Google SearchCheatsheet: Google Search
Cheatsheet: Google SearchKasper de Waard
 
Amrapali builders -- google cheatsheet.pdf
Amrapali builders -- google cheatsheet.pdfAmrapali builders -- google cheatsheet.pdf
Amrapali builders -- google cheatsheet.pdfamrapalibuildersreviews
 

Similar to IT talk SPb: Найдется все (20)

The Power of Semantic Technologies to Explore Linked Open Data
The Power of Semantic Technologies to Explore Linked Open DataThe Power of Semantic Technologies to Explore Linked Open Data
The Power of Semantic Technologies to Explore Linked Open Data
 
Go fast in a graph world
Go fast in a graph worldGo fast in a graph world
Go fast in a graph world
 
Metadata and the Power of Pattern-Finding
Metadata and the Power of Pattern-FindingMetadata and the Power of Pattern-Finding
Metadata and the Power of Pattern-Finding
 
How to Reveal Hidden Relationships in Data and Risk Analytics
How to Reveal Hidden Relationships in Data and Risk AnalyticsHow to Reveal Hidden Relationships in Data and Risk Analytics
How to Reveal Hidden Relationships in Data and Risk Analytics
 
Big Data & Data Science
Big Data & Data ScienceBig Data & Data Science
Big Data & Data Science
 
Northeastern DB Class Introduction to Marklogic NoSQL april 2016
Northeastern DB Class Introduction to Marklogic NoSQL april 2016Northeastern DB Class Introduction to Marklogic NoSQL april 2016
Northeastern DB Class Introduction to Marklogic NoSQL april 2016
 
Metadata Rules Folders Drool
Metadata Rules Folders DroolMetadata Rules Folders Drool
Metadata Rules Folders Drool
 
Taking the Performance of your Data Warehouse to the Next Level with Amazon R...
Taking the Performance of your Data Warehouse to the Next Level with Amazon R...Taking the Performance of your Data Warehouse to the Next Level with Amazon R...
Taking the Performance of your Data Warehouse to the Next Level with Amazon R...
 
AWS re:Invent 2016: Tableau Rules of Engagement in the Cloud (STG306)
AWS re:Invent 2016: Tableau Rules of Engagement in the Cloud (STG306)AWS re:Invent 2016: Tableau Rules of Engagement in the Cloud (STG306)
AWS re:Invent 2016: Tableau Rules of Engagement in the Cloud (STG306)
 
Data Day Seattle 2017: Scaling Data Science at Stitch Fix
Data Day Seattle 2017: Scaling Data Science at Stitch FixData Day Seattle 2017: Scaling Data Science at Stitch Fix
Data Day Seattle 2017: Scaling Data Science at Stitch Fix
 
SEO Audit of JLITTLEFORD.COM.pdf
SEO Audit of JLITTLEFORD.COM.pdfSEO Audit of JLITTLEFORD.COM.pdf
SEO Audit of JLITTLEFORD.COM.pdf
 
OLAP on the Cloud with Azure Databricks and Azure Synapse
OLAP on the Cloud with Azure Databricks and Azure SynapseOLAP on the Cloud with Azure Databricks and Azure Synapse
OLAP on the Cloud with Azure Databricks and Azure Synapse
 
Boost your data analytics with open data and public news content
Boost your data analytics with open data and public news contentBoost your data analytics with open data and public news content
Boost your data analytics with open data and public news content
 
Part 2 - Data Warehousing Lecture at BW Cooperative State University (DHBW)
Part 2 - Data Warehousing Lecture at BW Cooperative State University (DHBW)Part 2 - Data Warehousing Lecture at BW Cooperative State University (DHBW)
Part 2 - Data Warehousing Lecture at BW Cooperative State University (DHBW)
 
Wolfgang Epting – IT-Tage 2015 – Testdaten – versteckte Geschäftschance oder ...
Wolfgang Epting – IT-Tage 2015 – Testdaten – versteckte Geschäftschance oder ...Wolfgang Epting – IT-Tage 2015 – Testdaten – versteckte Geschäftschance oder ...
Wolfgang Epting – IT-Tage 2015 – Testdaten – versteckte Geschäftschance oder ...
 
Is Revolution R Enterprise Faster than SAS? Benchmarking Results Revealed
Is Revolution R Enterprise Faster than SAS? Benchmarking Results RevealedIs Revolution R Enterprise Faster than SAS? Benchmarking Results Revealed
Is Revolution R Enterprise Faster than SAS? Benchmarking Results Revealed
 
Introduction to PIG
Introduction to PIG Introduction to PIG
Introduction to PIG
 
Cheatsheet: Google Search
Cheatsheet: Google SearchCheatsheet: Google Search
Cheatsheet: Google Search
 
Google Search Cheat Sheet
Google Search Cheat SheetGoogle Search Cheat Sheet
Google Search Cheat Sheet
 
Amrapali builders -- google cheatsheet.pdf
Amrapali builders -- google cheatsheet.pdfAmrapali builders -- google cheatsheet.pdf
Amrapali builders -- google cheatsheet.pdf
 

More from DataArt

DataArt Custom Software Engineering with a Human Approach
DataArt Custom Software Engineering with a Human ApproachDataArt Custom Software Engineering with a Human Approach
DataArt Custom Software Engineering with a Human ApproachDataArt
 
DataArt Healthcare & Life Sciences
DataArt Healthcare & Life SciencesDataArt Healthcare & Life Sciences
DataArt Healthcare & Life SciencesDataArt
 
DataArt Financial Services and Capital Markets
DataArt Financial Services and Capital MarketsDataArt Financial Services and Capital Markets
DataArt Financial Services and Capital MarketsDataArt
 
About DataArt HR Partners
About DataArt HR PartnersAbout DataArt HR Partners
About DataArt HR PartnersDataArt
 
Event management в IT
Event management в ITEvent management в IT
Event management в ITDataArt
 
Digital Marketing from inside
Digital Marketing from insideDigital Marketing from inside
Digital Marketing from insideDataArt
 
What's new in Android, Igor Malytsky ( Google Post I|O Tour)
What's new in Android, Igor Malytsky ( Google Post I|O Tour)What's new in Android, Igor Malytsky ( Google Post I|O Tour)
What's new in Android, Igor Malytsky ( Google Post I|O Tour)DataArt
 
DevOps Workshop:Что бывает, когда DevOps приходит на проект
DevOps Workshop:Что бывает, когда DevOps приходит на проектDevOps Workshop:Что бывает, когда DevOps приходит на проект
DevOps Workshop:Что бывает, когда DevOps приходит на проектDataArt
 
IT Talk Kharkiv: «‎Soft skills в IT. Польза или вред? Максим Бастион, DataArt
IT Talk Kharkiv: «‎Soft skills в IT. Польза или вред? Максим Бастион, DataArtIT Talk Kharkiv: «‎Soft skills в IT. Польза или вред? Максим Бастион, DataArt
IT Talk Kharkiv: «‎Soft skills в IT. Польза или вред? Максим Бастион, DataArtDataArt
 
«Ноль копеек. Спастись от выгорания» — Сергей Чеботарев (Head of Design, Han...
 «Ноль копеек. Спастись от выгорания» — Сергей Чеботарев (Head of Design, Han... «Ноль копеек. Спастись от выгорания» — Сергей Чеботарев (Head of Design, Han...
«Ноль копеек. Спастись от выгорания» — Сергей Чеботарев (Head of Design, Han...DataArt
 
Communication in QA's life
Communication in QA's lifeCommunication in QA's life
Communication in QA's lifeDataArt
 
Нельзя просто так взять и договориться, или как мы работали со сложными людьми
Нельзя просто так взять и договориться, или как мы работали со сложными людьмиНельзя просто так взять и договориться, или как мы работали со сложными людьми
Нельзя просто так взять и договориться, или как мы работали со сложными людьмиDataArt
 
Знакомьтесь, DevOps
Знакомьтесь, DevOpsЗнакомьтесь, DevOps
Знакомьтесь, DevOpsDataArt
 
DevOps in real life
DevOps in real lifeDevOps in real life
DevOps in real lifeDataArt
 
Codeless: автоматизация тестирования
Codeless: автоматизация тестированияCodeless: автоматизация тестирования
Codeless: автоматизация тестированияDataArt
 
Selenoid
SelenoidSelenoid
SelenoidDataArt
 
Selenide
SelenideSelenide
SelenideDataArt
 
A. Sirota "Building an Automation Solution based on Appium"
A. Sirota "Building an Automation Solution based on Appium"A. Sirota "Building an Automation Solution based on Appium"
A. Sirota "Building an Automation Solution based on Appium"DataArt
 
Эмоциональный интеллект или как не сойти с ума в условиях сложного и динамичн...
Эмоциональный интеллект или как не сойти с ума в условиях сложного и динамичн...Эмоциональный интеллект или как не сойти с ума в условиях сложного и динамичн...
Эмоциональный интеллект или как не сойти с ума в условиях сложного и динамичн...DataArt
 
IT talk: Как я перестал бояться и полюбил TestNG
IT talk: Как я перестал бояться и полюбил TestNGIT talk: Как я перестал бояться и полюбил TestNG
IT talk: Как я перестал бояться и полюбил TestNGDataArt
 

More from DataArt (20)

DataArt Custom Software Engineering with a Human Approach
DataArt Custom Software Engineering with a Human ApproachDataArt Custom Software Engineering with a Human Approach
DataArt Custom Software Engineering with a Human Approach
 
DataArt Healthcare & Life Sciences
DataArt Healthcare & Life SciencesDataArt Healthcare & Life Sciences
DataArt Healthcare & Life Sciences
 
DataArt Financial Services and Capital Markets
DataArt Financial Services and Capital MarketsDataArt Financial Services and Capital Markets
DataArt Financial Services and Capital Markets
 
About DataArt HR Partners
About DataArt HR PartnersAbout DataArt HR Partners
About DataArt HR Partners
 
Event management в IT
Event management в ITEvent management в IT
Event management в IT
 
Digital Marketing from inside
Digital Marketing from insideDigital Marketing from inside
Digital Marketing from inside
 
What's new in Android, Igor Malytsky ( Google Post I|O Tour)
What's new in Android, Igor Malytsky ( Google Post I|O Tour)What's new in Android, Igor Malytsky ( Google Post I|O Tour)
What's new in Android, Igor Malytsky ( Google Post I|O Tour)
 
DevOps Workshop:Что бывает, когда DevOps приходит на проект
DevOps Workshop:Что бывает, когда DevOps приходит на проектDevOps Workshop:Что бывает, когда DevOps приходит на проект
DevOps Workshop:Что бывает, когда DevOps приходит на проект
 
IT Talk Kharkiv: «‎Soft skills в IT. Польза или вред? Максим Бастион, DataArt
IT Talk Kharkiv: «‎Soft skills в IT. Польза или вред? Максим Бастион, DataArtIT Talk Kharkiv: «‎Soft skills в IT. Польза или вред? Максим Бастион, DataArt
IT Talk Kharkiv: «‎Soft skills в IT. Польза или вред? Максим Бастион, DataArt
 
«Ноль копеек. Спастись от выгорания» — Сергей Чеботарев (Head of Design, Han...
 «Ноль копеек. Спастись от выгорания» — Сергей Чеботарев (Head of Design, Han... «Ноль копеек. Спастись от выгорания» — Сергей Чеботарев (Head of Design, Han...
«Ноль копеек. Спастись от выгорания» — Сергей Чеботарев (Head of Design, Han...
 
Communication in QA's life
Communication in QA's lifeCommunication in QA's life
Communication in QA's life
 
Нельзя просто так взять и договориться, или как мы работали со сложными людьми
Нельзя просто так взять и договориться, или как мы работали со сложными людьмиНельзя просто так взять и договориться, или как мы работали со сложными людьми
Нельзя просто так взять и договориться, или как мы работали со сложными людьми
 
Знакомьтесь, DevOps
Знакомьтесь, DevOpsЗнакомьтесь, DevOps
Знакомьтесь, DevOps
 
DevOps in real life
DevOps in real lifeDevOps in real life
DevOps in real life
 
Codeless: автоматизация тестирования
Codeless: автоматизация тестированияCodeless: автоматизация тестирования
Codeless: автоматизация тестирования
 
Selenoid
SelenoidSelenoid
Selenoid
 
Selenide
SelenideSelenide
Selenide
 
A. Sirota "Building an Automation Solution based on Appium"
A. Sirota "Building an Automation Solution based on Appium"A. Sirota "Building an Automation Solution based on Appium"
A. Sirota "Building an Automation Solution based on Appium"
 
Эмоциональный интеллект или как не сойти с ума в условиях сложного и динамичн...
Эмоциональный интеллект или как не сойти с ума в условиях сложного и динамичн...Эмоциональный интеллект или как не сойти с ума в условиях сложного и динамичн...
Эмоциональный интеллект или как не сойти с ума в условиях сложного и динамичн...
 
IT talk: Как я перестал бояться и полюбил TestNG
IT talk: Как я перестал бояться и полюбил TestNGIT talk: Как я перестал бояться и полюбил TestNG
IT talk: Как я перестал бояться и полюбил TestNG
 

Recently uploaded

Computer Presentation.pptx ecommerce advantage s
Computer Presentation.pptx ecommerce advantage sComputer Presentation.pptx ecommerce advantage s
Computer Presentation.pptx ecommerce advantage sMAQIB18
 
Business update Q1 2024 Lar España Real Estate SOCIMI
Business update Q1 2024 Lar España Real Estate SOCIMIBusiness update Q1 2024 Lar España Real Estate SOCIMI
Business update Q1 2024 Lar España Real Estate SOCIMIAlejandraGmez176757
 
一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单enxupq
 
Webinar One View, Multiple Systems No-Code Integration of Salesforce and ERPs
Webinar One View, Multiple Systems No-Code Integration of Salesforce and ERPsWebinar One View, Multiple Systems No-Code Integration of Salesforce and ERPs
Webinar One View, Multiple Systems No-Code Integration of Salesforce and ERPsCEPTES Software Inc
 
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单ewymefz
 
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单ewymefz
 
Q1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year ReboundQ1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year ReboundOppotus
 
社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .NABLAS株式会社
 
Exploratory Data Analysis - Dilip S.pptx
Exploratory Data Analysis - Dilip S.pptxExploratory Data Analysis - Dilip S.pptx
Exploratory Data Analysis - Dilip S.pptxDilipVasan
 
2024-05-14 - Tableau User Group - TC24 Hot Topics - Tableau Pulse and Einstei...
2024-05-14 - Tableau User Group - TC24 Hot Topics - Tableau Pulse and Einstei...2024-05-14 - Tableau User Group - TC24 Hot Topics - Tableau Pulse and Einstei...
2024-05-14 - Tableau User Group - TC24 Hot Topics - Tableau Pulse and Einstei...elinavihriala
 
Investigate & Recover / StarCompliance.io / Crypto_Crimes
Investigate & Recover / StarCompliance.io / Crypto_CrimesInvestigate & Recover / StarCompliance.io / Crypto_Crimes
Investigate & Recover / StarCompliance.io / Crypto_CrimesStarCompliance.io
 
How can I successfully sell my pi coins in Philippines?
How can I successfully sell my pi coins in Philippines?How can I successfully sell my pi coins in Philippines?
How can I successfully sell my pi coins in Philippines?DOT TECH
 
Tabula.io Cheatsheet: automate your data workflows
Tabula.io Cheatsheet: automate your data workflowsTabula.io Cheatsheet: automate your data workflows
Tabula.io Cheatsheet: automate your data workflowsalex933524
 
Opendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptxOpendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptxOpendatabay
 
一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单enxupq
 
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单yhkoc
 
Using PDB Relocation to Move a Single PDB to Another Existing CDB
Using PDB Relocation to Move a Single PDB to Another Existing CDBUsing PDB Relocation to Move a Single PDB to Another Existing CDB
Using PDB Relocation to Move a Single PDB to Another Existing CDBAlireza Kamrani
 
Jpolillo Amazon PPC - Bid Optimization Sample
Jpolillo Amazon PPC - Bid Optimization SampleJpolillo Amazon PPC - Bid Optimization Sample
Jpolillo Amazon PPC - Bid Optimization SampleJames Polillo
 
Innovative Methods in Media and Communication Research by Sebastian Kubitschk...
Innovative Methods in Media and Communication Research by Sebastian Kubitschk...Innovative Methods in Media and Communication Research by Sebastian Kubitschk...
Innovative Methods in Media and Communication Research by Sebastian Kubitschk...correoyaya
 


IT talk SPb: Найдется все (Everything Can Be Found)

  • 1.
  • 2. TAGGING SCHEMA DESIGN FOR HIGH PERFORMANCE Alexander Tokarev Senior Developer, DataArt atokarev@dataart.com
  • 3. Plan
      • Tagging basis
      • Database challenges
      • Tagging solutions
      • Pros and cons
      • Q&A session
  • 4. Tagging terms
      • A tag is a non-hierarchical keyword or term assigned to a piece of information
      • Tags are generally chosen informally and personally by the item's creator or by its viewer
      • If tags are assigned by the creator and are limited, it is a taxonomy
      • If tags are assigned by the viewer and are unlimited, it is a folksonomy
      • Widely used since 2003, starting with the Flickr and Delicious web sites
      • Tags are usually shown inline or as a tag cloud
  • 5. Tagging terms (figure labels: inline, tag cloud)
  • 6. Tagging challenges
      Pros (+):
      1. The vocabulary used reflects the user's vocabulary directly
      2. Flexibility: the user can add or remove tags
      3. Multi-dimensional nature: users can assign any number and combination of tags to express a concept
      Cons (–):
      1. Specialized tags or tags meaningless to anyone but their author, misspellings, singular/plural forms, compound words
      2. Tags are often ambiguous, overly personalized, or poorly applied
      3. Synonyms, acronyms and homonyms, which aren't handled well
  • 7. Database challenges
      1. Performance
      2. Query awkwardness
      3. Database size
      4. Housekeeping
  • 8. Highly normalized approach
  • 9. Denormalized approach
  • 10. Complex data type approach
  • 11. Full-text-search oriented solutions
      Stackoverflow: <php><mysql><guid><encryption>
      JSON: {"tags":["php", "apache2", "openinviter"]}
  • 12. Full-text-search approaches
      Approach 1: application server; FTS inside DB + FTS model
      Approach 2: application server; relational/denormalized/FTS model; FTS server (Lucene, Sphinx, Elastic, Solr, Xapian, etc.)
  • 13. Housekeeping
      Denormalized/FTS:
      1. Change all affected tags in all documents if a tag name changes
      FTS:
      1. FTS index rebuild due to fragmentation
      2. FTS index refresh if it isn't refreshed on COMMIT
  • 14. Test example
      StackOverflow posts via http://data.stackexchange.com/
      From 31/07/2008 to 21/12/2012
      Posts: 2 680 474
      Applied tags: 7 791 527
      Used unique tags: 30 485
      Max tags count for a post: 5
  • 15. Comparison
  • 16. Comparison. Initial population time
      Model | Insert time, sec
      Relational | 1048
      Denormalized | 1205
      Complex data type | 2086
      Full text search | 1950
  • 17. Comparison. Database size
      Model | Size total, MB | Data size, MB | Index size, MB
      Relational | 1166 | 338 | 828
      Denormalized | 1080 | 376 | 704
      Complex data type | 1134 | 256 | 878
      Full text search | 1055 | 416 | 639
  • 18. Comparison. Search by document id and all tag retrieval
      Model | Speed with cold cache, sec | Speed with hot cache, sec
      Relational | 0.2 | 0.003
      Denormalized | 0.07 | 0.002
      Complex data type | 0.9 | 0.002
      Full text search | 0.3 | 0.001
  • 19. Comparison. Search using 1 tag and all tag retrieval
      Model | Speed with cold cache, sec | Speed with hot cache, sec
      Relational | 1 | 0.005
      Denormalized | 0.7 | 0.004
      Complex data type | 1.7 | 0.005
      Full text search | 0.7 | 0.002
  • 20. Comparison. Search by AND using 2 tags and all tag retrieval
      Model | Speed with cold cache, sec | Speed with hot cache, sec
      Relational | 40 | 34
      Denormalized | 34 | 20
      Complex data type | 34 | 14
      Full text search | 20 | 2
  • 21. Comparison. Tag cloud population
      Model | Speed, sec
      Relational | 20
      Relational simplified | 18
      Relational without FK | 202
      Denormalized | 18
      Complex data type | 21
      Full text search | 40
  • 22. Pros & Cons
      Model | Space consumption | Search performance | Insert performance | Maintenance | Additional housekeeping | Risk of failure | Search queries development
      Relational | worst | worst | highest | minimal | not required | no | worst
      Denormalized | moderate | moderate | good | required | required | no | moderate
      Complex data type | moderate | moderate | worst | required | required | no | moderate
      Full text search | optimal | optimal | moderate | required | required | yes | optimal
  • 23. Conclusion: There is no silver bullet for a tag storage model!
  • 24. Conclusion
      1. Choose your best model based on:
          • Performance (search/insert/update)
          • Space consumption
          • Engineer experience
          • Hardware cost
          • Software cost
      2. Each storage model should be checked on your RDBMS: don't be afraid to try and measure
      3. Understanding how complex data types are stored inside is crucial
      4. Understanding how FTS works inside is crucial
      5. Investigate your DBMS's unique features
  • 25. Q&A
  • 26. THANK YOU! Alexander Tokarev Senior Developer, DataArt atokarev@dataart.com
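
The schema diagrams for the normalized and denormalized approaches (slides 8 and 9) did not survive in this transcript, so the following is only a minimal sketch of what the two models and the "search by AND using 2 tags" query from slide 20 might look like. It uses Python's built-in sqlite3 rather than Oracle, and all table names, column names and sample posts are assumptions made for illustration; they are not the tables used in the talk.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Normalized ("relational") model: posts, tags and a link table.
    -- All names here are hypothetical.
    CREATE TABLE post (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE tag (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
    CREATE TABLE post_tag (
        post_id INTEGER REFERENCES post(id),
        tag_id INTEGER REFERENCES tag(id),
        PRIMARY KEY (post_id, tag_id)
    );
    -- Denormalized model: all tags of a post kept in one text column.
    CREATE TABLE post_denorm (id INTEGER PRIMARY KEY, title TEXT, tags TEXT);
""")

# Made-up sample data, loaded into both models.
posts = {
    1: ("Hash passwords", ["php", "mysql", "encryption"]),
    2: ("Generate a GUID", ["php", "guid"]),
    3: ("Tune a slow join", ["mysql"]),
}
for post_id, (title, tags) in posts.items():
    conn.execute("INSERT INTO post VALUES (?, ?)", (post_id, title))
    for name in tags:
        conn.execute("INSERT OR IGNORE INTO tag (name) VALUES (?)", (name,))
        conn.execute(
            "INSERT INTO post_tag SELECT ?, id FROM tag WHERE name = ?",
            (post_id, name),
        )
    # Store tags as a "<php><mysql>" string, the format shown on slide 11.
    conn.execute(
        "INSERT INTO post_denorm VALUES (?, ?, ?)",
        (post_id, title, "".join(f"<{t}>" for t in tags)),
    )


def search_relational(tag_a, tag_b):
    """Ids of posts carrying both tags (assumes tag_a != tag_b)."""
    rows = conn.execute(
        """
        SELECT pt.post_id
        FROM post_tag pt JOIN tag t ON t.id = pt.tag_id
        WHERE t.name IN (?, ?)
        GROUP BY pt.post_id
        HAVING COUNT(DISTINCT t.name) = 2
        ORDER BY pt.post_id
        """,
        (tag_a, tag_b),
    ).fetchall()
    return [r[0] for r in rows]


def search_denormalized(tag_a, tag_b):
    """Same search as two substring matches on the tags column.

    Tag names containing the LIKE wildcards % or _ would need escaping.
    """
    rows = conn.execute(
        "SELECT id FROM post_denorm WHERE tags LIKE ? AND tags LIKE ? ORDER BY id",
        (f"%<{tag_a}>%", f"%<{tag_b}>%"),
    ).fetchall()
    return [r[0] for r in rows]
```

Calling `search_relational("php", "mysql")` and `search_denormalized("php", "mysql")` should return the same post ids. The relational query needs a join plus grouping, while the denormalized one scans a single column, which is one way to read the "queries awkwardness" challenge from slide 7. Timings on a toy SQLite database say nothing about the Oracle figures in the comparison slides.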

Editor's Notes

  1. Maybe show the uber-optimized version
  2. Synonyms are bad for reporting
  3. To respond to these challenges, an appropriate database design should be applied. HK (housekeeping): indexing, reindexing, tag name changes, the trade-off between real-time behaviour and so on
  4. Talk about clusters or IOTs (index-organized tables)
  5. Talk about clusters
  6. Explain how they are set up in Oracle, plus index tricks. It is important to understand how complex data types are implemented in your database and where complex data is actually stored.
  7. Tags are stored in a structured format. Using full text search improves search by tags via native language. It is dead simple to deal with the previously mentioned data models, but it is worth dwelling on FTS in detail.
  8. The SQL search approach is rather straightforward, so let's consider the FTS approach. The full text search index is maintained either in the DB or on a dedicated server. The app server uses the FTS dialect of either the DB or the server. We will look into Approach 1. Pros and cons are outside the scope of the ItTalk. StackOverflow, for instance, uses MSSQL and Elastic, i.e. Approach 2 with the FTS model.
  9. The index becomes fragmented because a delete/insert usually adds new records and invalidates old ones
  10. We took real-world data via the SQL-like interface to StackOverflow. Please pay attention to the maximum tag count for a post: I presume the limit is intentional. I presume they use the 4th data model and a VARCHAR field rather than CLOB/BLOB. The interface permits exports in batches of 50000, and a captcha is required. Let's have a look at how we created the tables.
  11. For some models the difference is more than 2 times. The reason is clear: FTS maintenance and parsing.
  12. Please note that this is for Oracle DB only. These figures are completely DB-dependent. 5 years of data is about 1 GB, so it is worth thinking about in-memory solutions. Let's have a look at the queries and see them in the tables.
  13. The difference in time between cold and hot cache is huge, so I put it in 2 diagrams. Sophisticated plan. Model 2 starts from the tag, whereas the complex data type starts from the document. Model 4 could be faster using varchar2 and the USE CACHE option, which is switched off by default. Models 1, 2 and 3 could be faster and consume less space using Oracle tricks like IOT/clusters (joined values are located closer together), but these aren't used so as not to make the test too Oracle-tailored.
  14. There is an opinion that arrays are extremely fast in Postgres because they work completely differently than in Oracle. Please note that the first attempt in FTS is slightly different from the second; the second is the same as cold cache. It seems Oracle initializes some structures on the first attempt, so it is 2-3 times slower than the second, which is why the second is the one reported here. The complex data type does an FTS-like sort of initialization when we search by it, so it is slower.
  15. Please note that the extra table is omitted, so the performance is nearly equal to denormalized. If we drop the PK we use an index, so it takes extra time.
  16. By maintenance I mean the additional actions needed when a tag changes
  17. 1. Because results can be very different across databases
  18. I would be happy if someone could repeat the cases in another DBMS, plus some additional features like full document list fetch as well as paging, IOT/clusters/in-memory. I'm ready to share the table structure as well as the dataset, or you could speak with DataArt PR and I'll do it myself.
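
Note 16 and the housekeeping slide describe the extra work a tag rename causes in the denormalized and FTS models. A minimal sketch of that difference follows, again using Python's sqlite3 with hypothetical table names and made-up rows rather than the Oracle setup from the talk: in the normalized model a rename touches one row in the tag table, while in the denormalized model every document carrying the tag has to be rewritten.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical tables: a normalized pair and a denormalized one.
    CREATE TABLE tag (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
    CREATE TABLE post_tag (post_id INTEGER, tag_id INTEGER,
                           PRIMARY KEY (post_id, tag_id));
    CREATE TABLE post_denorm (id INTEGER PRIMARY KEY, tags TEXT);
""")
conn.executemany("INSERT INTO tag VALUES (?, ?)", [(1, "js"), (2, "php")])
conn.executemany(
    "INSERT INTO post_tag VALUES (?, ?)", [(1, 1), (2, 1), (2, 2), (3, 2)]
)
conn.executemany(
    "INSERT INTO post_denorm VALUES (?, ?)",
    [(1, "<js>"), (2, "<js><php>"), (3, "<php>")],
)


def rename_relational(old, new):
    """Rename a tag in the normalized model; returns the rows touched."""
    cur = conn.execute("UPDATE tag SET name = ? WHERE name = ?", (new, old))
    return cur.rowcount


def rename_denormalized(old, new):
    """Rewrite the tag inside every document that carries it."""
    cur = conn.execute(
        "UPDATE post_denorm SET tags = replace(tags, ?, ?) "
        "WHERE instr(tags, ?) > 0",
        (f"<{old}>", f"<{new}>", f"<{old}>"),
    )
    return cur.rowcount


relational_rows = rename_relational("js", "javascript")
denormalized_rows = rename_denormalized("js", "javascript")
print(relational_rows, denormalized_rows)
```

With this sample data the relational rename updates a single row and the denormalized rename updates every post tagged `js`. On a dataset the size of the one on slide 14, a popular tag could mean rewriting a large share of the documents, and in an FTS setup an index refresh or rebuild may follow as well.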