SlideShare a Scribd company logo
Alexander Tokarev
In-Memory Computing Summit Europe 2018
Oracle In-Memory from trenches
Speed-of-light faceted search
• Alexander Tokarev
• Age 38
• Database performance architect at DataArt:
1. Solution design
2. Performance tuning
3. POCs and crazy ideas
• First experience with Oracle - 2001, Oracle 8.1.7
• First experience with In-Memory solutions - 2015
• Lovely In-Memory databases:
1. Oracle InMemory
2. Exasol
3. Tarantool
• Hobbies: spearfishing, drums
Who am I
DataArt
Consulting, solution design, re-engineering
20 development centers
>2500 employees
Famous clients: NASDAQ
S&P
Ocado
JacTravel
Maersk
Regus and etc
Who is my employer
 Faceted search
 Project
 Architecture
 Faceted search place
 Performance issues
 In-memory internals
 Implementation steps and traps
 Key findings
 Conclusion
 Q&A
Overview
The presentation may include predictions, estimates or other information that
might be considered forward-looking. While these forward-looking
statements represent our current judgment on what the future holds, they are
subject to risks and uncertainties that could cause actual results to differ
materially. You are cautioned not to place undue reliance on these forward-
looking statements, which reflect our opinions only as of the date of this
presentation.
We are not obligating ourselves to revise or publicly release the results of any
revision to these statements in light of new information or future events.
Safe Harbor Statement
Faceted search
Facet
Constraint
Facet count
Facet types
Values/Terms
Interval
Range
1. Filter by multiple taxonomies
2. Combine text, taxonomy terms and numeric values
3. Discover relationships or trends between objects
4. Make huge items volume navigable
5. Simplify "advanced" search UI
What for
Tags
• Keyword assigned to an object
• Chosen informally and personally by item's creator or viewer
• Assigned by the viewer + unlimited = Folksonomy
• Assigned by the creator + limited = Taxonomy
• Tag-based
• Plain-structure based
Faceted search base
Object Author Category Format Price
Days to
deliver
The Oracle Book: Answers to
Life's Questions
Cerridwen
Greenleaf Fortune telling Paper 18 4
Ask a Question, Find Your Fate:
The Oracle Book Ben Hale Fantasy Kindle 12 2
Object Tags
Book 1 Paper, Fortune telling, Very good book, worth reading
Book 2 Ben Halle, Fantasy, Kindle
1. Facet source: taxonomy + folksonomy
2. Facet types: terms mostly
3. Implementation type: tag-based
Our case facets
Our case statistics
Extracted entities: Objects, Tags, Tags of objects, Facet types
Date: 2016 to 2017
Tagged objects: 3 000 000
Applied tags: 42 000 000
Unique tags count: 100 000
Max tags count for an object: 15
Max tag length: 50
Facets count: 150
Data volume = StackOveflow x 3!
Architecture
Oracle
main
Oracle
DG
Application
server
Application
server
Load
balancer
Client
browser
10000 RPS
In-memory
cluster
Architecture
~20 fine-tuned SQL queries
UI
Database structure
OLTP DWH
ONLY ACTIVE VALUES
Search table structure
0.0010.010.11
Search by PK
Search by 1 tag
Search by 5 tags
No In-Memory
Performance
Full Text Search server
Act III
To FTS, or not to FTS, that is the
question
Solution design
Implementation
It isn’t our case completely!
1. Limited POC resources: people + time
2. Customer has a license
3. Wide searсh table
4. A lot of rows
5. A lot of equal values:
• Object types
• Facet types
• Tag names
6. Size is fine for InMemory
7. Queries use a lot of aggregate functions
Why to try
Dual storage format!
Internals
Search table structure
Naive implementation
InMemory size
Options Volume, Gb
data in row format 6,5
no compress InMememory 7,2
Performance
Performance
Performance
Performance profit
2x <> 21x!
Where is our performance?!
InMemory internals
1. IMCU – InMemory Compression Unit
Size = 1 Mb
Columnar Data
2. SMU – Snapshot Metadata Unit
Size = 64 Kb
Zone Map Based Index
Zone Maps
ZoneMap on
State column
InMemory Zone Maps
Implementation
Implementation
New InMemory Zone Maps
InMemory size
Options
Volume,
Gb
Load time,
seconds
data in row format 6,5 300
no compress InMememory 7,2 40
memcompress for dml 6 45
memcompress for query low 4 45
memcompress for query high 2,5 49
memcompress for capacity low 3,5 48
memcompress for capacity high 2 50
No significant differences for loading!
FastStart
3x loading speed boost!
Columnar format
Test rerun
Where is our performance?!
Same metrics!
Zone Maps nuances
Zone Maps not efficient!
Table structure
+ all indexes are dropped
Final table structure
Searchable via Full Text Search indexes!
Final table structure
+ all indexes are dropped
No InMemory
• Sporadic search degradation – 5-10%
• Happens when a lot of records inserted
Performance spikes
1. Changed records -> Mark as stale
2. Stale records -> Read from row storage
• buffer cache
• disk
3. Stale records -> repopulate IMCU:
• Staleness threshold
• Trickle repopulation process - each 2 minutes
• processes count - INMEMORY_MAX_REPOPULATE_SERVERS
• processes utilization - INMEMORY_TRICKLE_REPOPULATE_SERVERS_PERCENT
Transaction processing
1. INMEMORY_MAX_REPOPULATE_SERVERS = 4
2. INMEMORY_TRICKLE_REPOPULATE_SERVERS_PERCENT = 8
Performance spikes elimination
Troubleshooting
More than 500 statistics!
More than 190 parameters!
Troubleshooting
8 statistics covers 90% cases!
Troubleshooting
3 parameters covers 90% cases!
Troubleshooting
View name Description
V$IM_COL_CU SMU detailed information per column
V$IM_SMU_HEAD SMU header statistics
v$im_segments Inmemory segment parameters
3 views covers 90% cases!
Performance with In-Memory final
Performance with In-Memory final
Performance with In-Memory final
 InMemory size <> table data size
 All data InMemory <> High performance
 Decent time to be loaded
 8 IM statistics is enough
 Trickle parameters relevant to workload
DBAs findings
 Advanced IM features <> significant profit our case
 Zone maps = numeric and date data type only
 Dictionary pruning – not in Oracle InMemory
 Simple data types = High perfomance
 High compression <> Slow ingestion
Developers finding
 Add extra memory 
 Implement a POC with IM DWH
 Understand all 523 IM statistics + 190 parameters
 Use FastStart to speedup database wakeup
 Play with Oracle 18c features
Furthers plans
 Always try and measure
 IM works for short queries as well
 Understanding of IM internals is a must
 Application changes are required
 No extra software/hardware introduced
 Fast POC followed by production deploy
 5x times performance boost
Conclusion
Thank you for your time!
Alexander Tokarev
Database expert
DataArt
shtock@mail.ru
https://github.com/shtock
https://www.linkedin.com/in/alexander-tokarev-14bab230

More Related Content

What's hot

Inside sql server in memory oltp sql sat nyc 2017
Inside sql server in memory oltp sql sat nyc 2017Inside sql server in memory oltp sql sat nyc 2017
Inside sql server in memory oltp sql sat nyc 2017
Bob Ward
 
Sql server 2016 it just runs faster sql bits 2017 edition
Sql server 2016 it just runs faster   sql bits 2017 editionSql server 2016 it just runs faster   sql bits 2017 edition
Sql server 2016 it just runs faster sql bits 2017 edition
Bob Ward
 
Why PostgreSQL for Analytics Infrastructure (DW)?
Why PostgreSQL for Analytics Infrastructure (DW)?Why PostgreSQL for Analytics Infrastructure (DW)?
Why PostgreSQL for Analytics Infrastructure (DW)?
Huy Nguyen
 
Optimising Geospatial Queries with Dynamic File Pruning
Optimising Geospatial Queries with Dynamic File PruningOptimising Geospatial Queries with Dynamic File Pruning
Optimising Geospatial Queries with Dynamic File Pruning
Databricks
 
DB Time, Average Active Sessions, and ASH Math - Oracle performance fundamentals
DB Time, Average Active Sessions, and ASH Math - Oracle performance fundamentalsDB Time, Average Active Sessions, and ASH Math - Oracle performance fundamentals
DB Time, Average Active Sessions, and ASH Math - Oracle performance fundamentals
John Beresniewicz
 
Delta: Building Merge on Read
Delta: Building Merge on ReadDelta: Building Merge on Read
Delta: Building Merge on Read
Databricks
 
SQL Server R Services: What Every SQL Professional Should Know
SQL Server R Services: What Every SQL Professional Should KnowSQL Server R Services: What Every SQL Professional Should Know
SQL Server R Services: What Every SQL Professional Should Know
Bob Ward
 
How to performance tune spark applications in large clusters
How to performance tune spark applications in large clustersHow to performance tune spark applications in large clusters
How to performance tune spark applications in large clusters
Omkar Joshi
 
Top 10 tips for Oracle performance
Top 10 tips for Oracle performanceTop 10 tips for Oracle performance
Top 10 tips for Oracle performance
Guy Harrison
 
Building a Large Scale SEO/SEM Application with Apache Solr: Presented by Rah...
Building a Large Scale SEO/SEM Application with Apache Solr: Presented by Rah...Building a Large Scale SEO/SEM Application with Apache Solr: Presented by Rah...
Building a Large Scale SEO/SEM Application with Apache Solr: Presented by Rah...
Lucidworks
 
Columnstore Customer Stories 2016 by Sunil Agarwal
Columnstore Customer Stories 2016 by Sunil AgarwalColumnstore Customer Stories 2016 by Sunil Agarwal
Columnstore Customer Stories 2016 by Sunil Agarwal
Brent Ozar
 
Operating and Supporting Delta Lake in Production
Operating and Supporting Delta Lake in ProductionOperating and Supporting Delta Lake in Production
Operating and Supporting Delta Lake in Production
Databricks
 
Building Operational Data Lake using Spark and SequoiaDB with Yang Peng
Building Operational Data Lake using Spark and SequoiaDB with Yang PengBuilding Operational Data Lake using Spark and SequoiaDB with Yang Peng
Building Operational Data Lake using Spark and SequoiaDB with Yang Peng
Databricks
 
Average Active Sessions - OaktableWorld 2013
Average Active Sessions - OaktableWorld 2013Average Active Sessions - OaktableWorld 2013
Average Active Sessions - OaktableWorld 2013
John Beresniewicz
 
Oracle Database 12c - Features for Big Data
Oracle Database 12c - Features for Big DataOracle Database 12c - Features for Big Data
Oracle Database 12c - Features for Big Data
Abishek V S
 
Top 10 tips for Oracle performance (Updated April 2015)
Top 10 tips for Oracle performance (Updated April 2015)Top 10 tips for Oracle performance (Updated April 2015)
Top 10 tips for Oracle performance (Updated April 2015)
Guy Harrison
 
Dynamo db pros and cons
Dynamo db  pros and consDynamo db  pros and cons
Dynamo db pros and consSaniya Khalsa
 
Database Performance Tuning
Database Performance Tuning Database Performance Tuning
Database Performance Tuning
Arno Huetter
 
Common Strategies for Improving Performance on Your Delta Lakehouse
Common Strategies for Improving Performance on Your Delta LakehouseCommon Strategies for Improving Performance on Your Delta Lakehouse
Common Strategies for Improving Performance on Your Delta Lakehouse
Databricks
 

What's hot (20)

Inside sql server in memory oltp sql sat nyc 2017
Inside sql server in memory oltp sql sat nyc 2017Inside sql server in memory oltp sql sat nyc 2017
Inside sql server in memory oltp sql sat nyc 2017
 
Sql server 2016 it just runs faster sql bits 2017 edition
Sql server 2016 it just runs faster   sql bits 2017 editionSql server 2016 it just runs faster   sql bits 2017 edition
Sql server 2016 it just runs faster sql bits 2017 edition
 
Why PostgreSQL for Analytics Infrastructure (DW)?
Why PostgreSQL for Analytics Infrastructure (DW)?Why PostgreSQL for Analytics Infrastructure (DW)?
Why PostgreSQL for Analytics Infrastructure (DW)?
 
Optimising Geospatial Queries with Dynamic File Pruning
Optimising Geospatial Queries with Dynamic File PruningOptimising Geospatial Queries with Dynamic File Pruning
Optimising Geospatial Queries with Dynamic File Pruning
 
DB Time, Average Active Sessions, and ASH Math - Oracle performance fundamentals
DB Time, Average Active Sessions, and ASH Math - Oracle performance fundamentalsDB Time, Average Active Sessions, and ASH Math - Oracle performance fundamentals
DB Time, Average Active Sessions, and ASH Math - Oracle performance fundamentals
 
Delta: Building Merge on Read
Delta: Building Merge on ReadDelta: Building Merge on Read
Delta: Building Merge on Read
 
SQL Server R Services: What Every SQL Professional Should Know
SQL Server R Services: What Every SQL Professional Should KnowSQL Server R Services: What Every SQL Professional Should Know
SQL Server R Services: What Every SQL Professional Should Know
 
How to performance tune spark applications in large clusters
How to performance tune spark applications in large clustersHow to performance tune spark applications in large clusters
How to performance tune spark applications in large clusters
 
Top 10 tips for Oracle performance
Top 10 tips for Oracle performanceTop 10 tips for Oracle performance
Top 10 tips for Oracle performance
 
Building a Large Scale SEO/SEM Application with Apache Solr: Presented by Rah...
Building a Large Scale SEO/SEM Application with Apache Solr: Presented by Rah...Building a Large Scale SEO/SEM Application with Apache Solr: Presented by Rah...
Building a Large Scale SEO/SEM Application with Apache Solr: Presented by Rah...
 
Columnstore Customer Stories 2016 by Sunil Agarwal
Columnstore Customer Stories 2016 by Sunil AgarwalColumnstore Customer Stories 2016 by Sunil Agarwal
Columnstore Customer Stories 2016 by Sunil Agarwal
 
Operating and Supporting Delta Lake in Production
Operating and Supporting Delta Lake in ProductionOperating and Supporting Delta Lake in Production
Operating and Supporting Delta Lake in Production
 
Building Operational Data Lake using Spark and SequoiaDB with Yang Peng
Building Operational Data Lake using Spark and SequoiaDB with Yang PengBuilding Operational Data Lake using Spark and SequoiaDB with Yang Peng
Building Operational Data Lake using Spark and SequoiaDB with Yang Peng
 
Average Active Sessions - OaktableWorld 2013
Average Active Sessions - OaktableWorld 2013Average Active Sessions - OaktableWorld 2013
Average Active Sessions - OaktableWorld 2013
 
Oracle Database 12c - Features for Big Data
Oracle Database 12c - Features for Big DataOracle Database 12c - Features for Big Data
Oracle Database 12c - Features for Big Data
 
Top 10 tips for Oracle performance (Updated April 2015)
Top 10 tips for Oracle performance (Updated April 2015)Top 10 tips for Oracle performance (Updated April 2015)
Top 10 tips for Oracle performance (Updated April 2015)
 
Dynamo db pros and cons
Dynamo db  pros and consDynamo db  pros and cons
Dynamo db pros and cons
 
Voldemort Nosql
Voldemort NosqlVoldemort Nosql
Voldemort Nosql
 
Database Performance Tuning
Database Performance Tuning Database Performance Tuning
Database Performance Tuning
 
Common Strategies for Improving Performance on Your Delta Lakehouse
Common Strategies for Improving Performance on Your Delta LakehouseCommon Strategies for Improving Performance on Your Delta Lakehouse
Common Strategies for Improving Performance on Your Delta Lakehouse
 

Similar to Faceted search with Oracle InMemory option

Silicon Valley Code Camp 2015 - Advanced MongoDB - The Sequel
Silicon Valley Code Camp 2015 - Advanced MongoDB - The SequelSilicon Valley Code Camp 2015 - Advanced MongoDB - The Sequel
Silicon Valley Code Camp 2015 - Advanced MongoDB - The Sequel
Daniel Coupal
 
EM12c: Capacity Planning with OEM Metrics
EM12c: Capacity Planning with OEM MetricsEM12c: Capacity Planning with OEM Metrics
EM12c: Capacity Planning with OEM Metrics
Maaz Anjum
 
BSSML17 - Deepnets
BSSML17 - DeepnetsBSSML17 - Deepnets
BSSML17 - Deepnets
BigML, Inc
 
Hundreds of queries in the time of one - Gianmario Spacagna
Hundreds of queries in the time of one - Gianmario SpacagnaHundreds of queries in the time of one - Gianmario Spacagna
Hundreds of queries in the time of one - Gianmario Spacagna
Spark Summit
 
MongoDB World 2019: Finding the Right MongoDB Atlas Cluster Size: Does This I...
MongoDB World 2019: Finding the Right MongoDB Atlas Cluster Size: Does This I...MongoDB World 2019: Finding the Right MongoDB Atlas Cluster Size: Does This I...
MongoDB World 2019: Finding the Right MongoDB Atlas Cluster Size: Does This I...
MongoDB
 
How to Achieve Scale with MongoDB
How to Achieve Scale with MongoDBHow to Achieve Scale with MongoDB
How to Achieve Scale with MongoDB
MongoDB
 
Boosting the Performance of your Rails Apps
Boosting the Performance of your Rails AppsBoosting the Performance of your Rails Apps
Boosting the Performance of your Rails Apps
Matt Kuklinski
 
Lessons Learned Replatforming A Large Machine Learning Application To Apache ...
Lessons Learned Replatforming A Large Machine Learning Application To Apache ...Lessons Learned Replatforming A Large Machine Learning Application To Apache ...
Lessons Learned Replatforming A Large Machine Learning Application To Apache ...
Databricks
 
Dimensional Modelling Session 2
Dimensional Modelling Session 2Dimensional Modelling Session 2
Dimensional Modelling Session 2akitda
 
Hardware Provisioning
Hardware ProvisioningHardware Provisioning
Hardware Provisioning
MongoDB
 
Webinar: Scaling MongoDB
Webinar: Scaling MongoDBWebinar: Scaling MongoDB
Webinar: Scaling MongoDB
MongoDB
 
Maxis Alchemize imug 2017
Maxis Alchemize imug 2017Maxis Alchemize imug 2017
Maxis Alchemize imug 2017
BrandonWilhelm4
 
Time Series Databases for IoT (On-premises and Azure)
Time Series Databases for IoT (On-premises and Azure)Time Series Databases for IoT (On-premises and Azure)
Time Series Databases for IoT (On-premises and Azure)
Ivo Andreev
 
Automated product categorization
Automated product categorizationAutomated product categorization
Automated product categorization
Andreas Loupasakis
 
Automated product categorization
Automated product categorization   Automated product categorization
Automated product categorization
Warply
 
MongoDB performance tuning and monitoring with MMS
MongoDB performance tuning and monitoring with MMSMongoDB performance tuning and monitoring with MMS
MongoDB performance tuning and monitoring with MMS
Nicholas Tang
 
Benchmarking at Parse
Benchmarking at ParseBenchmarking at Parse
Benchmarking at Parse
Travis Redman
 
Advanced Benchmarking at Parse
Advanced Benchmarking at ParseAdvanced Benchmarking at Parse
Advanced Benchmarking at Parse
MongoDB
 
Back to FME School - Day 1: Your Data and FME
Back to FME School - Day 1: Your Data and FMEBack to FME School - Day 1: Your Data and FME
Back to FME School - Day 1: Your Data and FME
Safe Software
 
Apache IOTDB: a Time Series Database for Industrial IoT
Apache IOTDB: a Time Series Database for Industrial IoTApache IOTDB: a Time Series Database for Industrial IoT
Apache IOTDB: a Time Series Database for Industrial IoT
jixuan1989
 

Similar to Faceted search with Oracle InMemory option (20)

Silicon Valley Code Camp 2015 - Advanced MongoDB - The Sequel
Silicon Valley Code Camp 2015 - Advanced MongoDB - The SequelSilicon Valley Code Camp 2015 - Advanced MongoDB - The Sequel
Silicon Valley Code Camp 2015 - Advanced MongoDB - The Sequel
 
EM12c: Capacity Planning with OEM Metrics
EM12c: Capacity Planning with OEM MetricsEM12c: Capacity Planning with OEM Metrics
EM12c: Capacity Planning with OEM Metrics
 
BSSML17 - Deepnets
BSSML17 - DeepnetsBSSML17 - Deepnets
BSSML17 - Deepnets
 
Hundreds of queries in the time of one - Gianmario Spacagna
Hundreds of queries in the time of one - Gianmario SpacagnaHundreds of queries in the time of one - Gianmario Spacagna
Hundreds of queries in the time of one - Gianmario Spacagna
 
MongoDB World 2019: Finding the Right MongoDB Atlas Cluster Size: Does This I...
MongoDB World 2019: Finding the Right MongoDB Atlas Cluster Size: Does This I...MongoDB World 2019: Finding the Right MongoDB Atlas Cluster Size: Does This I...
MongoDB World 2019: Finding the Right MongoDB Atlas Cluster Size: Does This I...
 
How to Achieve Scale with MongoDB
How to Achieve Scale with MongoDBHow to Achieve Scale with MongoDB
How to Achieve Scale with MongoDB
 
Boosting the Performance of your Rails Apps
Boosting the Performance of your Rails AppsBoosting the Performance of your Rails Apps
Boosting the Performance of your Rails Apps
 
Lessons Learned Replatforming A Large Machine Learning Application To Apache ...
Lessons Learned Replatforming A Large Machine Learning Application To Apache ...Lessons Learned Replatforming A Large Machine Learning Application To Apache ...
Lessons Learned Replatforming A Large Machine Learning Application To Apache ...
 
Dimensional Modelling Session 2
Dimensional Modelling Session 2Dimensional Modelling Session 2
Dimensional Modelling Session 2
 
Hardware Provisioning
Hardware ProvisioningHardware Provisioning
Hardware Provisioning
 
Webinar: Scaling MongoDB
Webinar: Scaling MongoDBWebinar: Scaling MongoDB
Webinar: Scaling MongoDB
 
Maxis Alchemize imug 2017
Maxis Alchemize imug 2017Maxis Alchemize imug 2017
Maxis Alchemize imug 2017
 
Time Series Databases for IoT (On-premises and Azure)
Time Series Databases for IoT (On-premises and Azure)Time Series Databases for IoT (On-premises and Azure)
Time Series Databases for IoT (On-premises and Azure)
 
Automated product categorization
Automated product categorizationAutomated product categorization
Automated product categorization
 
Automated product categorization
Automated product categorization   Automated product categorization
Automated product categorization
 
MongoDB performance tuning and monitoring with MMS
MongoDB performance tuning and monitoring with MMSMongoDB performance tuning and monitoring with MMS
MongoDB performance tuning and monitoring with MMS
 
Benchmarking at Parse
Benchmarking at ParseBenchmarking at Parse
Benchmarking at Parse
 
Advanced Benchmarking at Parse
Advanced Benchmarking at ParseAdvanced Benchmarking at Parse
Advanced Benchmarking at Parse
 
Back to FME School - Day 1: Your Data and FME
Back to FME School - Day 1: Your Data and FMEBack to FME School - Day 1: Your Data and FME
Back to FME School - Day 1: Your Data and FME
 
Apache IOTDB: a Time Series Database for Industrial IoT
Apache IOTDB: a Time Series Database for Industrial IoTApache IOTDB: a Time Series Database for Industrial IoT
Apache IOTDB: a Time Series Database for Industrial IoT
 

More from Alexander Tokarev

Rate limits and all about
Rate limits and all aboutRate limits and all about
Rate limits and all about
Alexander Tokarev
 
rnd teams.pptx
rnd teams.pptxrnd teams.pptx
rnd teams.pptx
Alexander Tokarev
 
FinOps for private cloud
FinOps for private cloudFinOps for private cloud
FinOps for private cloud
Alexander Tokarev
 
Graph ql and enterprise
Graph ql and enterpriseGraph ql and enterprise
Graph ql and enterprise
Alexander Tokarev
 
FinOps introduction
FinOps introductionFinOps introduction
FinOps introduction
Alexander Tokarev
 
Open Policy Agent for governance as a code
Open Policy Agent for governance as a code Open Policy Agent for governance as a code
Open Policy Agent for governance as a code
Alexander Tokarev
 
Row Level Security in databases advanced edition
Row Level Security in databases advanced editionRow Level Security in databases advanced edition
Row Level Security in databases advanced edition
Alexander Tokarev
 
Row level security in enterprise applications
Row level security in enterprise applicationsRow level security in enterprise applications
Row level security in enterprise applications
Alexander Tokarev
 
Inmemory BI based on opensource stack
Inmemory BI based on opensource stackInmemory BI based on opensource stack
Inmemory BI based on opensource stack
Alexander Tokarev
 
Tagging search solution design Advanced edition
Tagging search solution design Advanced editionTagging search solution design Advanced edition
Tagging search solution design Advanced edition
Alexander Tokarev
 
Oracle JSON treatment evolution - from 12.1 to 18 AOUG-2018
Oracle JSON treatment evolution - from 12.1 to 18 AOUG-2018Oracle JSON treatment evolution - from 12.1 to 18 AOUG-2018
Oracle JSON treatment evolution - from 12.1 to 18 AOUG-2018
Alexander Tokarev
 
Oracle JSON internals advanced edition
Oracle JSON internals advanced editionOracle JSON internals advanced edition
Oracle JSON internals advanced edition
Alexander Tokarev
 
Oracle Result Cache deep dive
Oracle Result Cache deep diveOracle Result Cache deep dive
Oracle Result Cache deep dive
Alexander Tokarev
 
Oracle result cache highload 2017
Oracle result cache highload 2017Oracle result cache highload 2017
Oracle result cache highload 2017
Alexander Tokarev
 
Oracle json caveats
Oracle json caveatsOracle json caveats
Oracle json caveats
Alexander Tokarev
 
Apache Solr for begginers
Apache Solr for begginersApache Solr for begginers
Apache Solr for begginers
Alexander Tokarev
 
Data structures for cloud tag storage
Data structures for cloud tag storageData structures for cloud tag storage
Data structures for cloud tag storage
Alexander Tokarev
 
Oracle High Availabiltity for application developers
Oracle High Availabiltity for application developersOracle High Availabiltity for application developers
Oracle High Availabiltity for application developers
Alexander Tokarev
 

More from Alexander Tokarev (18)

Rate limits and all about
Rate limits and all aboutRate limits and all about
Rate limits and all about
 
rnd teams.pptx
rnd teams.pptxrnd teams.pptx
rnd teams.pptx
 
FinOps for private cloud
FinOps for private cloudFinOps for private cloud
FinOps for private cloud
 
Graph ql and enterprise
Graph ql and enterpriseGraph ql and enterprise
Graph ql and enterprise
 
FinOps introduction
FinOps introductionFinOps introduction
FinOps introduction
 
Open Policy Agent for governance as a code
Open Policy Agent for governance as a code Open Policy Agent for governance as a code
Open Policy Agent for governance as a code
 
Row Level Security in databases advanced edition
Row Level Security in databases advanced editionRow Level Security in databases advanced edition
Row Level Security in databases advanced edition
 
Row level security in enterprise applications
Row level security in enterprise applicationsRow level security in enterprise applications
Row level security in enterprise applications
 
Inmemory BI based on opensource stack
Inmemory BI based on opensource stackInmemory BI based on opensource stack
Inmemory BI based on opensource stack
 
Tagging search solution design Advanced edition
Tagging search solution design Advanced editionTagging search solution design Advanced edition
Tagging search solution design Advanced edition
 
Oracle JSON treatment evolution - from 12.1 to 18 AOUG-2018
Oracle JSON treatment evolution - from 12.1 to 18 AOUG-2018Oracle JSON treatment evolution - from 12.1 to 18 AOUG-2018
Oracle JSON treatment evolution - from 12.1 to 18 AOUG-2018
 
Oracle JSON internals advanced edition
Oracle JSON internals advanced editionOracle JSON internals advanced edition
Oracle JSON internals advanced edition
 
Oracle Result Cache deep dive
Oracle Result Cache deep diveOracle Result Cache deep dive
Oracle Result Cache deep dive
 
Oracle result cache highload 2017
Oracle result cache highload 2017Oracle result cache highload 2017
Oracle result cache highload 2017
 
Oracle json caveats
Oracle json caveatsOracle json caveats
Oracle json caveats
 
Apache Solr for begginers
Apache Solr for begginersApache Solr for begginers
Apache Solr for begginers
 
Data structures for cloud tag storage
Data structures for cloud tag storageData structures for cloud tag storage
Data structures for cloud tag storage
 
Oracle High Availabiltity for application developers
Oracle High Availabiltity for application developersOracle High Availabiltity for application developers
Oracle High Availabiltity for application developers
 

Recently uploaded

APIs for Browser Automation (MoT Meetup 2024)
APIs for Browser Automation (MoT Meetup 2024)APIs for Browser Automation (MoT Meetup 2024)
APIs for Browser Automation (MoT Meetup 2024)
Boni García
 
AI Genie Review: World’s First Open AI WordPress Website Creator
AI Genie Review: World’s First Open AI WordPress Website CreatorAI Genie Review: World’s First Open AI WordPress Website Creator
AI Genie Review: World’s First Open AI WordPress Website Creator
Google
 
Launch Your Streaming Platforms in Minutes
Launch Your Streaming Platforms in MinutesLaunch Your Streaming Platforms in Minutes
Launch Your Streaming Platforms in Minutes
Roshan Dwivedi
 
Mobile App Development Company In Noida | Drona Infotech
Mobile App Development Company In Noida | Drona InfotechMobile App Development Company In Noida | Drona Infotech
Mobile App Development Company In Noida | Drona Infotech
Drona Infotech
 
Enterprise Resource Planning System in Telangana
Enterprise Resource Planning System in TelanganaEnterprise Resource Planning System in Telangana
Enterprise Resource Planning System in Telangana
NYGGS Automation Suite
 
A Study of Variable-Role-based Feature Enrichment in Neural Models of Code
A Study of Variable-Role-based Feature Enrichment in Neural Models of CodeA Study of Variable-Role-based Feature Enrichment in Neural Models of Code
A Study of Variable-Role-based Feature Enrichment in Neural Models of Code
Aftab Hussain
 
openEuler Case Study - The Journey to Supply Chain Security
openEuler Case Study - The Journey to Supply Chain SecurityopenEuler Case Study - The Journey to Supply Chain Security
openEuler Case Study - The Journey to Supply Chain Security
Shane Coughlan
 
Navigating the Metaverse: A Journey into Virtual Evolution"
Navigating the Metaverse: A Journey into Virtual Evolution"Navigating the Metaverse: A Journey into Virtual Evolution"
Navigating the Metaverse: A Journey into Virtual Evolution"
Donna Lenk
 
OpenMetadata Community Meeting - 5th June 2024
OpenMetadata Community Meeting - 5th June 2024OpenMetadata Community Meeting - 5th June 2024
OpenMetadata Community Meeting - 5th June 2024
OpenMetadata
 
E-commerce Application Development Company.pdf
E-commerce Application Development Company.pdfE-commerce Application Development Company.pdf
E-commerce Application Development Company.pdf
Hornet Dynamics
 
AI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI App
AI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI AppAI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI App
AI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI App
Google
 
Fundamentals of Programming and Language Processors
Fundamentals of Programming and Language ProcessorsFundamentals of Programming and Language Processors
Fundamentals of Programming and Language Processors
Rakesh Kumar R
 
Orion Context Broker introduction 20240604
Orion Context Broker introduction 20240604Orion Context Broker introduction 20240604
Orion Context Broker introduction 20240604
Fermin Galan
 
Empowering Growth with Best Software Development Company in Noida - Deuglo
Empowering Growth with Best Software  Development Company in Noida - DeugloEmpowering Growth with Best Software  Development Company in Noida - Deuglo
Empowering Growth with Best Software Development Company in Noida - Deuglo
Deuglo Infosystem Pvt Ltd
 
Graspan: A Big Data System for Big Code Analysis
Graspan: A Big Data System for Big Code AnalysisGraspan: A Big Data System for Big Code Analysis
Graspan: A Big Data System for Big Code Analysis
Aftab Hussain
 
Using Xen Hypervisor for Functional Safety
Using Xen Hypervisor for Functional SafetyUsing Xen Hypervisor for Functional Safety
Using Xen Hypervisor for Functional Safety
Ayan Halder
 
Quarkus Hidden and Forbidden Extensions
Quarkus Hidden and Forbidden ExtensionsQuarkus Hidden and Forbidden Extensions
Quarkus Hidden and Forbidden Extensions
Max Andersen
 
May Marketo Masterclass, London MUG May 22 2024.pdf
May Marketo Masterclass, London MUG May 22 2024.pdfMay Marketo Masterclass, London MUG May 22 2024.pdf
May Marketo Masterclass, London MUG May 22 2024.pdf
Adele Miller
 
Vitthal Shirke Java Microservices Resume.pdf
Vitthal Shirke Java Microservices Resume.pdfVitthal Shirke Java Microservices Resume.pdf
Vitthal Shirke Java Microservices Resume.pdf
Vitthal Shirke
 
GOING AOT WITH GRAALVM FOR SPRING BOOT (SPRING IO)
GOING AOT WITH GRAALVM FOR  SPRING BOOT (SPRING IO)GOING AOT WITH GRAALVM FOR  SPRING BOOT (SPRING IO)
GOING AOT WITH GRAALVM FOR SPRING BOOT (SPRING IO)
Alina Yurenko
 

Recently uploaded (20)

APIs for Browser Automation (MoT Meetup 2024)
APIs for Browser Automation (MoT Meetup 2024)APIs for Browser Automation (MoT Meetup 2024)
APIs for Browser Automation (MoT Meetup 2024)
 
AI Genie Review: World’s First Open AI WordPress Website Creator
AI Genie Review: World’s First Open AI WordPress Website CreatorAI Genie Review: World’s First Open AI WordPress Website Creator
AI Genie Review: World’s First Open AI WordPress Website Creator
 
Launch Your Streaming Platforms in Minutes
Launch Your Streaming Platforms in MinutesLaunch Your Streaming Platforms in Minutes
Launch Your Streaming Platforms in Minutes
 
Mobile App Development Company In Noida | Drona Infotech
Mobile App Development Company In Noida | Drona InfotechMobile App Development Company In Noida | Drona Infotech
Mobile App Development Company In Noida | Drona Infotech
 
Enterprise Resource Planning System in Telangana
Enterprise Resource Planning System in TelanganaEnterprise Resource Planning System in Telangana
Enterprise Resource Planning System in Telangana
 
A Study of Variable-Role-based Feature Enrichment in Neural Models of Code
A Study of Variable-Role-based Feature Enrichment in Neural Models of CodeA Study of Variable-Role-based Feature Enrichment in Neural Models of Code
A Study of Variable-Role-based Feature Enrichment in Neural Models of Code
 
openEuler Case Study - The Journey to Supply Chain Security
openEuler Case Study - The Journey to Supply Chain SecurityopenEuler Case Study - The Journey to Supply Chain Security
openEuler Case Study - The Journey to Supply Chain Security
 
Navigating the Metaverse: A Journey into Virtual Evolution"
Navigating the Metaverse: A Journey into Virtual Evolution"Navigating the Metaverse: A Journey into Virtual Evolution"
Navigating the Metaverse: A Journey into Virtual Evolution"
 
OpenMetadata Community Meeting - 5th June 2024
OpenMetadata Community Meeting - 5th June 2024OpenMetadata Community Meeting - 5th June 2024
OpenMetadata Community Meeting - 5th June 2024
 
E-commerce Application Development Company.pdf
E-commerce Application Development Company.pdfE-commerce Application Development Company.pdf
E-commerce Application Development Company.pdf
 
AI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI App
AI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI AppAI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI App
AI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI App
 
Fundamentals of Programming and Language Processors
Fundamentals of Programming and Language ProcessorsFundamentals of Programming and Language Processors
Fundamentals of Programming and Language Processors
 
Orion Context Broker introduction 20240604
Orion Context Broker introduction 20240604Orion Context Broker introduction 20240604
Orion Context Broker introduction 20240604
 
Empowering Growth with Best Software Development Company in Noida - Deuglo
Empowering Growth with Best Software  Development Company in Noida - DeugloEmpowering Growth with Best Software  Development Company in Noida - Deuglo
Empowering Growth with Best Software Development Company in Noida - Deuglo
 
Graspan: A Big Data System for Big Code Analysis
Graspan: A Big Data System for Big Code AnalysisGraspan: A Big Data System for Big Code Analysis
Graspan: A Big Data System for Big Code Analysis
 
Using Xen Hypervisor for Functional Safety
Using Xen Hypervisor for Functional SafetyUsing Xen Hypervisor for Functional Safety
Using Xen Hypervisor for Functional Safety
 
Quarkus Hidden and Forbidden Extensions
Quarkus Hidden and Forbidden ExtensionsQuarkus Hidden and Forbidden Extensions
Quarkus Hidden and Forbidden Extensions
 
May Marketo Masterclass, London MUG May 22 2024.pdf
May Marketo Masterclass, London MUG May 22 2024.pdfMay Marketo Masterclass, London MUG May 22 2024.pdf
May Marketo Masterclass, London MUG May 22 2024.pdf
 
Vitthal Shirke Java Microservices Resume.pdf
Vitthal Shirke Java Microservices Resume.pdfVitthal Shirke Java Microservices Resume.pdf
Vitthal Shirke Java Microservices Resume.pdf
 
GOING AOT WITH GRAALVM FOR SPRING BOOT (SPRING IO)
GOING AOT WITH GRAALVM FOR  SPRING BOOT (SPRING IO)GOING AOT WITH GRAALVM FOR  SPRING BOOT (SPRING IO)
GOING AOT WITH GRAALVM FOR SPRING BOOT (SPRING IO)
 

Faceted search with Oracle InMemory option

Editor's Notes

  1. Hello, guys. My name is Alex and I work for DataArt as a performance architect. Does someone use Oracle database in your applications? Have you ever used Oracle InMemory option to bring a performance profit? Perfect. Today we are going to discuss how we improved faceted search performance with Oracle InMemory option for one of our client.
  2. Let me introduce myself I work for DataArt as a performance architect and my main focus is data warehousing for big clients. I have huge Oracle experience but I works with Exasol and Tarantool databases either.
  3. My employer is a big software house which is spread all over the world. We have a lot of clients in many industries. Probably you know some of them.
  4. We have rather straightforward agenda for today discussion We’ll discuss what is faceted search, client application overview and its performance issues I try to explain how Oracle InMemory option is implemented internally but without digging into details. My main focus is how we made the customer happy and share with you our issues you could face with as well For sure I’ll answer your questions.
  5. Probably you’ve already seen it 5 times today but what I would like to stress that our findings are completely for our case and may be not relevant for you without an thoroughly assesment.
  6. Everybody know what is faceted search. We have groups named facets, Values named constraints and Figures names facet counts
  7. Common facet types are terms Intervals And ranges
  8. We need facets if Users need to filter content using multiple taxonomy terms at the same time. Users want to combine text searches, taxonomy term filtering, and other search criteria. trying to discover relationships or trends between contents. Navigate inside tons of data And your end users are already frightened by "advanced" search forms in enterprise applications
  9. There are many ways how to implement faceted search internally and one of them is using tags. Everyone knows what is tags but let me reming. Tag is a keyword assigned to an object Mostly is is chosen informaly If it is assigned by the viewer and tags count is unlimited it is folksonomy When assigned by the creator and a set of tags is limited it is taxonomy
  10. So we could store data for facets in tags which is extremely convenient for terms faceting by folksonomies and taxonomies Or in a plain table which is very convenient for range, interval facets defined by taxonomies. I think everything is clear enough but may be you have some questions about theory? So, let’s move on
  11. We use taxonomy and foksonomy, so the best part of our facets are terms and we choosen tag-based storage.
  12. Let me show some statistics. We extract objects, tags, facets for 2 years We have about 3 000 000 objects tagged, 42 000 000 terms applied, 100 000 unique terms, maximum tags count is 15 and there are about 150 facets. Is it whether decent volume or not? I think it is decent because we are bigger than StackOverflow by tags in 3 times.
  13. We have ordinal architecture for enterprise. Web browser, balancer, a set of application servers which deal with inmemory grid and oracle with the replica on the backend. Pay attention that there is no full text search server which has been prohibited in accordance with customer policies.
  14. We have microservice architecture and one of the services it tagging service which is responsible for data ingestion and faceted search. It is widely used by many other services and receives about 10000 requests per second. In order to serve search requests there are about 20 fine-tuned sql queries.
  15. I can’t show you real printscreens from the application because of NDA so I implemented the mockup in Paint. As we could see UI is full of features as Predefined faceted search templates Full text search by random tags. Here it is done by names. We could search via audit information and we have faceted navigation for sure. We could filter out objects where some tags are present (not tag values) and user paging. It is rather sophisticated UI.
  16. Without digging into details I could say that the best part of search is made on a wide Denormalized table with tags which stores only current tag values
  17. We could have a look into the table in details. It contains tagged object types, facet information, tags information and values. There is Denormalized object name and object description tags because they are used in full text search mostly.
  18. Let’s have a look into performance metrics. Pay attention the insert performance is very good so I used logarithmic scale to put everything in one diagram. On a first glance figures look good. We do complicated searches less than zero point two seconds but actually it is bad because we need to serve more and more users and data so queries become slow
  19. In accordance with one of my otherspresentations the best option for faceted search is full text search dedicated server but we had a lack of time to implement it as well as reluctant to introduce additional layers
  20. We started looking into InMemory solutions especialy regarding the customer already had Oracle InMemory license. In accordance with Oracle documentation it should be used for analytical queries which isn’t our case completely but we decided to try.
  21. Why we should try Because we are under limitations preasure The customer has a license The table is rather wide, full of data which has a lot of equal values So it will fit in memory especialy regarding compression features And last but not least queries are full of aggregate functions
  22. In accordance with the Oracle documentation InMemory feature is sort of silver bullets so we decided not waste time understanding how it works internaly. What we knew that the option stores row and columnar data in dual mode maintaining their consistency automatically.
  23. Let me remind you our table
  24. We set InMemory area size to 10 gigabytes and assigned the table into the area. We decided that performance will be faster without compression so no compression option was used.
  25. What is significant to stress that InMemory brought some overhead without compression
  26. So we started our tests and faced with the difference in performance was extremely unsignificant which is expect for the search by primary keys but not for search by tags
  27. So we started our tests and faced with the difference in performance was extremely unsignificant which is expect for the search by primary keys but not for search by tags
  28. So we started our tests and faced with the difference in performance was extremely unsignificant which is expect for the search by primary keys but not for search by tags
  29. When we started our POC we relied on Lufthansa usecase when they got average 21 performance profit but our profit was 10 times less. It means that we do something wrong and we need to understand how InMemory works internaly.
  30. Do you recall I mentioned InMemory area? It consists from 2 parts actualy. The first one in IMCU – InMemory Compression Unit which stores data in columnar format And SMU – Snapshot Metadata Unit which stores transaction related metadata, dictionaries and Zone Maps. Do you know what is zone map?
  31. Let’s have a look into simplified example. We have sales table which is split by 1 Mb IMCU blocks. Each block stores minimal and maximum values for each column so when we need to search sales for California we scans zone maps and check does our value is located in min/max boundaries so we do not scan extra blocks.
  32. I converted SMU zone maps in readable format. As you could see they are stored per-column base as I told before. You see numeric zone maps and character ones. Please pay attention that dictionary_entries column contains only zero.
  33. As we understood it happens because no compression were switched on.
  34. Let’s enable the compression
  35. We’ve seen that dictionary extries had been populated which should give dictionary-based pruning in accordance with the documentation
  36. Compression ration is rather high but it depends from you data for sure. You could see that compression doesn’t influence on ingestion so high compression could be used as the default. Frankly speaking loading 6 gigabytes for 1 minute it is to much so we decided to have a look into FastStart feature.
  37. The feature writes InMemory area in a columnar format and once database is rebooted it loads the data in memory with the speed of file reading so we got 3 times loading profit.
  38. It is time for rerun. Try to guess which performance profit we got once we used high compression? Actually no profit at all.
  39. If we return to our table we will see that we use binary format because of global unique identifiers. Zone Maps were not efficient so we
  40. Changed all types to bigint and removed indexes
  41. We understood that no reasons to store all columns inmemory because we search by them via Oracle Full Text Search indexes
  42. So the final structure is simple data types without b-tree indexes and 2 fields are removed from inmemory storage
  43. We did some performance tests and found sporadic search degradation which happens where records are inserted
  44. Let’s understand how Oracle provides read consistency for InMemory Once a record is changed it is marked as stale. All stale records are read either from buffer cache or disk. Oracle repopulates stale records by staleness threshold or each 2 minutes by so named trickle repopulation process. It could be done in parallel plus we could state how much resources could be consumed for stale records repopulation
  45. As soon as we set 4 repopulation servers and 8 percent of time to repopulate stale records and performance spikes disappeared
  46. InMemory option is very easily to trace and maintain but let me share a secret information with you. We needn’t to know all of them at all.
  47. 8 parameters covers the best part of InMemory performance tuning. Let me share one more secret. only 1 parameter is enough. Try to guess which one. It states how efficient zone maps indexes are used.
  48. 3 settings covers the best part of dba activities
  49. And 2 views give us enough information for basic InMemory undertanding
  50. Let’s have a look into final figures. As you could see we have about 4-5 times performance profit which isn’t so good but regarding our use case is fine.
  51. Let’s have a look into final figures. As you could see we have about 4-5 times performance profit which isn’t so good but regarding our use case is fine.
  52. Let’s have a look into final figures. As you could see we have about 4-5 times performance profit which isn’t so good but regarding our use case is fine.
  53. Let’s elaborate DBA findings. InMemory area size should be estimated taking into account compression and your data If we have all data in memory it doesn’t mean we are extremely fast InMemory takes time to be loaded after server reboot 8 statistics is enough to help developers with the best part of their issues Alling tackle processes settings with your workload
  54. Findings for developers are that in case of our usecase extremely advanced inmemory features like join groups, inmemory aggregations hasn’t brought significant performance profit. Zone maps work perfect with numeric and date data types. Dictionary based pruning doesn’t work in our oracle version at least. More simple data types are used – more performance we have And extremely significant that high compression doesn’t affect ingestion speed
  55. Faststart Add some star schema data with star – try what for IM is intended initialy Tell about cool stuff with multithread scans
  56. Our main conclusion is that you shouldn’t rely on documentation and experts – try and measure your particular use-case IM works with short queries as well Without understanding how it works you will not get performance profit You have to change application to use simple data types What is good no extra layers, message queues, hardware need to be added We had managed to put our IM POC in production very fast (1 week actualy) And business got 5 times performance boost. Probably that’s it from my side for today.
  57. Thank you for your time and I’m ready answer your questions