This presentation was prepared for the Austrian Oracle User Group's 30th anniversary. It covers the challenges Oracle developers face when implementing high-load JSON processing pipelines.
2. "Oracle Database provides all of the benefits of SQL and relational databases to JSON data, which you store and manipulate in the same ways and with the same confidence as any other type of database data."
– Oracle JSON Developer's Guide
Disclaimer: The contents of this presentation are for informal guidance and discussion purposes only.
22. JSON ingestion performance (seconds per run)

Run | JSON STRICT + UNIQUE names | JSON | JSON STRICT + CACHE | JSON LAX | no constraints | no constraints + CACHE
1   | 132 | 115 | 110 | 121 | 83 | 78
2   | 142 | 119 | 110 | 117 | 80 | 78
3   | 132 | 119 | 106 | 115 | 91 | 84
4   | 136 | 115 | 111 | 110 | 90 | 75
5   | 138 | 117 | 109 | 125 | 92 | 78
6   | 135 | 122 | 108 | 117 | 90 | 75
7   | 134 | 116 | 102 | 117 | 88 | 75
8   | 142 | 127 | 105 | 120 | 81 | 78
9   | 152 | 115 | 105 | 125 | 80 | 77
10  | 147 | 118 | 110 | 114 | 83 | 77
AVG | 139 | 118 | 107 | 118 | 85 | 77
23. Storage recommendations
Data types
Small values up to 4000 characters – VARCHAR2
More than 4000 characters – BLOB
BLOB
2x less space consumption
2x less I/O due to the smaller size
No implicit character-set conversion if the database character set is not AL32UTF8
Constraints
IS JSON STRICT without unique keys
Column settings
CACHE=YES
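The recommendations above can be sketched as DDL (a minimal sketch; table and column names are hypothetical):

```sql
CREATE TABLE invoices (
  id           NUMBER PRIMARY KEY,
  invoice_json BLOB,                       -- BLOB for documents over 4000 characters
  CONSTRAINT invoice_is_json
    CHECK (invoice_json IS JSON (STRICT))  -- strict validation, no WITH UNIQUE KEYS
)
LOB (invoice_json) STORE AS (CACHE);       -- CACHE=YES: LOB reads go through the buffer cache
```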
24. JSON limitations
Path length - 4000 bytes
In-memory JSON length – 32767 bytes max
SQL JSON functions' return value length – 32 KB max
48. Retrieval: common issues
Views often become non-mergeable
ORA-600 and ORA-7445 "No data to be read from socket"
COUNT(DISTINCT) = ORA-7445 "No data to be read from socket"
2 or more json_table calls = wrong results in aggregates
49. Remediation
dbms_utility.expand_sql_text + SQL plan
use a single json_table
/*+ NO_MERGE */ hint
/*+ NO_QUERY_TRANSFORMATION */ hint
materialize JSON – MATERIALIZE hint
materialize JSON – via CTAS
apex_json package + JSON-to-XML transformation
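Two of the remediation steps can be sketched as follows (view and table names are hypothetical):

```sql
-- keep the JSON-parsing view from being merged into the outer query
SELECT /*+ NO_MERGE(v) */ v.*
FROM   invoice_view v
WHERE  v.total > 100;

-- or materialize the parsed JSON once via CTAS and query the relational copy
CREATE TABLE invoice_flat AS
SELECT i.id, jt.total
FROM   invoices i,
       json_table(i.invoice_json, '$'
         COLUMNS (total NUMBER PATH '$.total')) jt;
```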
121. Fast search KEEP pool
1. Allocate KEEP pool memory area using DB_KEEP_CACHE_SIZE
2. Properly pin in the KEEP pool all of:
DR$ tables
DR$ indexes
LOB segments of DR$ tables
3. Set CACHE for DR$ tables
4. Load data into the KEEP pool once via a stored procedure, or do nothing – the KEEP pool will populate itself during queries
5. Be happy with 5x performance boost
Until server reboot
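As the speaker notes stress later, the pinning should be done through Oracle Text storage preferences rather than ALTER TABLE, so the settings survive index rebuilds. A minimal sketch with a hypothetical preference name:

```sql
-- size the KEEP pool first
ALTER SYSTEM SET db_keep_cache_size = 2G SCOPE = SPFILE;

-- pin DR$ tables and their LOBs via *_TABLE_CLAUSE storage attributes
BEGIN
  ctx_ddl.create_preference('json_fts_storage', 'BASIC_STORAGE');
  ctx_ddl.set_attribute('json_fts_storage', 'I_TABLE_CLAUSE',
    'storage (buffer_pool keep) lob (token_info) store as (cache)');
  ctx_ddl.set_attribute('json_fts_storage', 'R_TABLE_CLAUSE',
    'storage (buffer_pool keep) lob (data) store as (cache)');
END;
/
-- then reference it: CREATE INDEX ... PARAMETERS ('storage json_fts_storage ...');
```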
126. Ultra-fast search InMemory option
Oracle 12.2
Extended data types should be enabled
INMEMORY_EXPRESSIONS_USAGE=ENABLE
INMEMORY_VIRTUAL_COLUMNS=ENABLE
IS JSON check constraint is a must
The whole table should be marked as INMEMORY
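The setup can be sketched as (table and constraint names are hypothetical):

```sql
ALTER SYSTEM SET inmemory_expressions_usage = ENABLE SCOPE = SPFILE;
ALTER SYSTEM SET inmemory_virtual_columns   = ENABLE SCOPE = SPFILE;

-- the IS JSON check constraint is a must, and the whole table goes in-memory
ALTER TABLE invoices ADD CONSTRAINT invoice_is_json CHECK (invoice_json IS JSON);
ALTER TABLE invoices INMEMORY;
```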
127. JSON storage InMemory option
Stores JSON in binary OSON format (32 Kb max)
Tries to create in-memory virtual columns
Doesn’t affect json_textcontains operator
Use JSON-based InMemory materialized views instead!
137. Data structure maintenance
Do not base any checks on DBA_JSON_COLUMNS view!
Suffix columns holding JSON data with _JSON, like INVOICE_JSON
Create daily checks:
JSON format (strict/lax)
Field type (clob/blob/varchar2)
CACHE option
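The daily checks can be sketched as dictionary queries instead of relying on DBA_JSON_COLUMNS (a sketch assuming the _JSON suffix convention; SEARCH_CONDITION_VC requires 12.2+):

```sql
-- _JSON columns lacking a strict IS JSON constraint
SELECT c.table_name, c.column_name
FROM   user_tab_columns c
WHERE  c.column_name LIKE '%\_JSON' ESCAPE '\'
AND    NOT EXISTS (
         SELECT 1
         FROM   user_constraints uc
         WHERE  uc.table_name = c.table_name
         AND    UPPER(uc.search_condition_vc) LIKE '%IS JSON%STRICT%');

-- _JSON LOB columns stored without CACHE
SELECT l.table_name, l.column_name
FROM   user_lobs l
WHERE  l.column_name LIKE '%\_JSON' ESCAPE '\'
AND    l.cache = 'NO';
```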
143. Index optimization
1. Gather index statistics via CTX_REPORT.INDEX_STAT
2. Collect fragmented indexes – estimated row fragmentation
3. Collect indexes with many deleted rows – estimated garbage size
4. Run ctx_ddl.optimize_index in FULL mode: SERIAL or PARALLEL
5. Oracle 18c Automatic Background Index Maintenance doesn't optimize the index
No optimization = up to 10x search performance degradation!
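Steps 1 and 4 can be sketched as (the index name is hypothetical):

```sql
-- 1. gather and inspect index statistics (fragmentation, garbage)
SELECT ctx_report.index_stat('IX_INVOICE_FTS') FROM dual;

-- 4. full optimization, serial or parallel
BEGIN
  ctx_ddl.optimize_index('IX_INVOICE_FTS', 'FULL');
  -- parallel variant:
  -- ctx_ddl.optimize_index('IX_INVOICE_FTS', 'FULL', parallel_degree => 4);
END;
/
```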
147. Maintenance for the STAGE_ITAB option
12.1-12.2
Enable the AUTO_OPTIMIZE option
12.2-18
If you are on 12.2, set up proper parallelism via stage_itab_max_parallel
If you are on 12.2, set up how often optimization starts via stage_itab_max_rows
18
Disable the AUTO_OPTIMIZE option
Use the stage_itab_auto_opt preference
149. Conclusion
JSON = tradeoff
row-per-row scenario is safe
knowledge of Oracle Text is required
.notation isn't production-ready
Only Oracle 18 looks mature
Dedicated JSON search solutions are faster than Oracle's
150. THANK YOU FOR YOUR TIME!
Alexander Tokarev
Database expert
DataArt
shtock@mail.ru
https://github.com/shtock
https://www.linkedin.com/in/alexander-tokarev-14bab230
Editor's Notes
Hello, guys.
My name is Alex and we are going to discuss JSON processing today.
Just a small survey:
who has Oracle JSON features in production? Have you ever faced strange JSON parsing results? Perfect. It means this presentation will help you.
There are a lot of blogs which state that JSON is extremely powerful and seamlessly integrated into the Oracle database. Nevertheless, the better part of their experience is based on checking the New Features section of the Oracle JSON guide.
Today I'll share pure production experience: the challenges and how we addressed them while implementing various projects on Oracle databases full of JSON processing.
It is our experience, but we suspect you will face the same issues we did.
Let’s start from the ground up.
We'll discuss why we need JSON in a database and how to configure, store, and process it. We will touch on JSON maintenance, which is crucial for stable database performance. And of course I'll answer your questions in the Q&A session.
Actually, JSON processing and performance tuning are extremely broad topics, so I'm going to run a sort of master class full of technical details.
So, let’s start
Because all operations are consistent and there is a willingness to denormalize complex objects.
Which objects are we talking about? Logs, configuration, and complex structured or semi-structured objects are the best candidates to be stored as JSON.
And do not think about analytics on pure JSON data
If you would like to have stable json processing
install json fixes
Do regression testing, because JSON issues resurface often.
If you're on 12.1, 3 patches must be installed, otherwise JSON processing doesn't work in a predictable fashion at all. Also consider the document with the latest JSON patches.
If you're on 12.2, JSON fixes can be found in the release updates.
And we don't need to care about patches in 18c, for now at least.
RFC 4627 dictates how JSON should be encoded.
In order to check how it affects database efficiency let’s create 3 tables with varchar, clob and blob fields and populate all of them
And populate the tables by 10000 equal jsons
As we can see, CLOB storage occupies 2x more space. That happens because CLOBs are stored in UCS-2, so 2 bytes are required per character; that's why it is recommended to store JSON as BLOB whenever possible.
That’s about disk space.
Let’s discuss constraints. Assume we have a JSON which fails during processing by JAVA application.
Let’s add IS JSON constraint to ensure JSON is valid.
Inserts with bad JSON fail, so the previous JSON is valid.
So what's the reason for the Java failure?
If we check the JSON in a JSON validator, we see a number without a leading digit, so the JSON actually isn't valid – but Oracle says it is. Where is the truth?
The answer can be found in the documentation. The default Oracle check is LAX, which isn't strict about JSON syntax.
If you want to be completely sure that your JSON is valid, use the JSON STRICT clause explicitly.
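A minimal sketch of the explicit strict check (table and constraint names are hypothetical):

```sql
-- the bare IS JSON check is LAX by default; STRICT enforces proper JSON syntax
ALTER TABLE invoices ADD CONSTRAINT invoice_is_json_strict
  CHECK (invoice_json IS JSON (STRICT));
```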
Let's try to enable constraints with more and more severe limitations to see how they influence ingestion performance.
And measure how long it takes to add 10,000 different records to a table with a BLOB field.
So it's clearly seen that JSON constraints can introduce a 2x overhead on ingestion performance in the worst case. Can we make our insert statements faster? Yes, we can.
To see how, we should look into the USER_LOBS view. What is bad here?
Exactly. If we want fast inserts, we need to use CACHE=YES so we don't bypass the buffer cache.
To summarize, CACHE brings a 10% performance profit.
We recommend you our configuration for JSON storage. It provides decent performance and reliable json in-database validation
We use both blob and varchar2
Because blob doesn’t consume much space and is rather fast
We use json strict constraints and cache option set to Yes
There are a lot of JSON limitations in Oracle, but we can neglect most of them. They are mostly about the maximum JSON field name length, JSON path length, and levels of array nesting. There is only one annoying limitation – SQL JSON functions can't return more than 32 KB. Moreover, you have to enable the extended VARCHAR2 data type explicitly even to achieve that.
What’s about ingestion?
Oracle treats JSON as a string, so inserts work fine, but it is very complicated to update data inside JSON documents.
We use 12.1 now, so we use Java to update data inside JSON, even for data migration scripts. It is much faster than hand-made PL/SQL JSON parsing and piece-by-piece updates.
As I said, we can also change JSON in PL/SQL. We can do it rather simply.
Create an object from the textual representation, upsert new fields, and remove the useless ones. Once we've done that, we serialize the JSON back into a string for further usage.
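On 12.2 and later, the same parse–modify–serialize cycle can be sketched with the JSON_OBJECT_T PL/SQL API (the field names are hypothetical):

```sql
DECLARE
  txt VARCHAR2(4000) := '{"id":1,"status":"NEW","tmp":"x"}';
  doc JSON_OBJECT_T;
BEGIN
  doc := JSON_OBJECT_T.parse(txt);   -- object from the textual representation
  doc.put('status', 'PROCESSED');    -- upsert a field
  doc.remove('tmp');                 -- remove a useless one
  txt := doc.to_string;              -- serialize back to a string
  dbms_output.put_line(txt);
END;
/
```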
Let’s simulate ETL-loading pipeline which transfers JSON data
In order to do it we create source table with json check constraint and populate by random data
Let’s populate a target table from the source one.
We have APPEND hint and we see direct path insert which is fine.
Let’s add a json constraint into destination table
Once we add it our direct path insert goes nuts
Json checks are complicated enough so it is more or less clear but it isn’t mentioned in the documentation at all
Let’s make the case more complicated
Let's drop the constraint which screws up the direct path insert, and assume that we need an arbitrary WHERE clause for our insert into the destination table.
Let’s have a look into the plan.
The direct path insert hasn’t returned back even though we dropped the constraint.
Why does the WHERE clause kill the direct path insert?
What we need to do is drop the IS JSON constraint on the SOURCE table. The main point here is that if you have IS JSON constraints, they can kill direct path inserts at an arbitrary moment.
Why? Actually, no idea. Even tracing the direct path decision didn't give any results.
If we want to load JSON data from a file, we need some preparation first, like creating a directory object and the file itself. Let it be a file with 2 rows, in accordance with the documentation.
Let’s select data from our file using BFILENAME clause. The construction looks extremely simple but how many records do you expect to see?
You will see only one record.
Why?
I did some tests; let me share the conclusion I came up with: json_table treats the file as one huge single JSON, so the file should look like an array of JSON objects.
Once you use that format, our query returns 2 rows as expected.
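The whole flow can be sketched as follows (directory path and file name are hypothetical), with the file formatted as a JSON array per the finding above:

```sql
-- /data/orders.json contains: [ {"id":1,"name":"a"}, {"id":2,"name":"b"} ]
CREATE DIRECTORY json_dir AS '/data';

SELECT jt.*
FROM   json_table(
         bfilename('JSON_DIR', 'orders.json') FORMAT JSON,
         '$[*]'                                -- iterate over the array elements
         COLUMNS (
           id   NUMBER        PATH '$.id',
           name VARCHAR2(100) PATH '$.name')) jt;
```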
There are many options for json retrieval and parsing
Extract 1 row with raw JSON data and pass it to application server as is
Parse by oracle feature
Use virtual columns
Parse in a view which hide complexity of parsing
The first one was the only option before Oracle 12, so let's stick to the other ones.
Oracle has a simplified (dot) syntax for JSON.
We can select the id of the menu. It looks very simple, but it doesn't work.
.notation works only if check constraints are defined. When we have them in place, the parsing works perfectly.
18c doesn't require a constraint, but TREAT (... AS JSON) must be added first.
.notation can't work with arrays in 12.1, but it is in good shape in 12.2.
It is possible to use virtual columns for json retrieval. You could use embedded sql functions or your own plsql functions.
But if you would like to use them in production, you should disable adaptive optimization or install the patch which fixes the virtual column issues.
The json_table feature lets you parse very complex JSON structures in a way close to XMLTABLE.
Oracle 18c even lets you skip verbose JSON path expressions and use .notation here as well.
When you deal with messages from integration buses, you can get rather strange JSONs which may not be parsable.
I would recommend always using single quotes around field names in your JSON path; otherwise you will face reserved-word issues over time.
I wouldn't recommend using many json_value, json_exists, or json_query functions in one query, because each of them parses the JSON all over again. If you have small documents it doesn't cause any disturbance, but once your documents exceed 4 KB it gets annoying, so try to use json_table from the beginning.
Oracle 12.2 has an optimizer transformation which tries to rewrite many JSON function calls into a single json_table call.
Frankly speaking, json_table is full of issues because of its LATERAL internals, so if you work with JSON extensively you have probably noticed that:
JSON statements tend to make SQL non-mergeable, which leads to performance deterioration
Arbitrary ORA-600 and "No data to be read from socket" errors
Bad COUNT(DISTINCT) processing
And even wrong, mixed-up results if there are 2 or more json_table calls in one query
The good part: in case of ORA-600 and "no data to be read from socket", Oracle creates a core dump and an incident file with a stack trace, which we can investigate and use to open an SR.
Let me suggest remediation steps, ordered by their power.
Identify the issue via expand_sql_text and the SQL plan's access and filter predicates first.
Try to use a single json_table in the query.
When results are mixed up, use the NO_MERGE or NO_QUERY_TRANSFORMATION hints. The latter fixes most of the issues for queries with joins.
Materialize the JSON parsing with the MATERIALIZE hint or with CREATE TABLE AS SELECT. The hint works more or less fine starting from Oracle 12.2.
Use the apex_json package, which transforms the data to XML, bypassing the error-prone JSON code path.
It can give you a clue how Oracle rewrote the query.
We can see here that Oracle added an extra erroneous condition into the JSON_TEXTCONTAINS function.
Always pay attention to the full text search access predicates – it often happens that they are not relevant to the actual search condition.
We can simulate the DISTINCT clause via an analytic function, so the query doesn't fail with the "No data to be read from socket" error.
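A sketch of the analytic workaround (table and column names are hypothetical; note that DENSE_RANK, unlike COUNT(DISTINCT), also ranks NULLs):

```sql
-- instead of COUNT(DISTINCT jt.region), which can crash the session:
SELECT MAX(rnk) AS distinct_regions
FROM  (SELECT DENSE_RANK() OVER (ORDER BY jt.region) AS rnk
       FROM   orders o,
              json_table(o.doc, '$'
                COLUMNS (region VARCHAR2(50) PATH '$.region')) jt);
```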
It is clear how to use no_merge
Or no_query_transformation hints
As I said, MATERIALIZE doesn't work in 12.1, but it even provides in-memory cursors for the JSON during temporary table materialization.
And the worst case – transform the JSON to XML via apex_json and parse the XML.
JSON parsing complexity can be hidden in materialized views; nevertheless, when we try to create fast refresh materialized views in Oracle 12.1, we face an issue even though the materialized view log is created properly.
When we try ON DEMAND views, so as not to depend on commit statements and not deal with stale results, we get
an error. It works in neither 12.2 nor 18c.
Oracle introduced a new materialized view type, ON STATEMENT, which doesn't require materialized view logs and doesn't depend on the COMMIT clause, but it doesn't work with JSON either.
Oracle 18c fixed these issues, and we don't need any commits to see fresh data.
If you need fast search, you need indexes. Please be careful with the JSON filter clause – the indexes should use it.
When you create ordinary indexes, please be careful with the index expressions. They should be equal to the expressions in the filter clause.
Once we have that, the indexes work fine.
A small question – which of the indexes will be used by .notation?
Exactly – it ignores the indexes.
If we try to create the index as is, we face an error, but once we add an alias in the index, it works fine.
Does it mean that we should create a separate set of indexes to deal with .notation?
We can make it work even with ordinary indexes, but we have to use the ERROR ON ERROR clause. That's bad, actually, because it kills the schema-less nature of JSON.
But it can serve as a sort of validation: if we create an ERROR ON ERROR index and try to insert data which doesn't fit it, we get an Oracle exception – a free JSON validator.
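A sketch of such a validating index (table, column, and index names are hypothetical):

```sql
CREATE INDEX ix_invoice_total ON invoices (
  json_value(invoice_json, '$.total' RETURNING NUMBER ERROR ON ERROR));

-- an insert whose document lacks a numeric $.total now raises an exception:
-- a free JSON validator, at the price of schema-less flexibility
```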
If we need to index many fields, it works, but as you can see the syntax is over-verbose for CREATE INDEX
As well as for queries
So we should consider virtual column indexing
But please don't forget about the virtual columns patch I mentioned before
So let's try to guess which of the indexes will be used for json_table.
The answer: no indexes will be used – json_table works with ordinary indexes in a very reluctant fashion.
According to the manual it should work with ERROR ON ERROR indexes, but mostly it doesn't.
To support ad-hoc queries, Oracle suggests using context indexes with a JSON section. Oracle 12.2 hides this behind new syntax, but they are equal internally.
They are mostly used with the json_textcontains function, which internally rewrites the JSON path to an ordinary Oracle CONTAINS call.
They work perfectly for json_value
json_exists
Moreover, multi-column search works fine
And even json_table works properly
There is only one exception.
Which one?
Exactly. .notation is not supported
It seems that it is a little bit boring part so let’s have a quiz
We have JSON from RS-485 devices. If we try to find these JSONs by protocol name, they can't be found; when we search by other keywords, we find the records successfully.
If we look into the index token table, we will see that the protocol name consists of 2 tokens, so we need to tell the Oracle index to treat the protocol name as a whole word, or just escape the underscore.
There is a bunch of reserved words and characters you should be aware of if you deal with Oracle Text indexes and JSON.
Let’s imagine we have json about cartoons with 3 sentences and a full text index based on it
When we search by a whole sentence, we find it successfully.
When we search for Mr Jerry cartoons, we find all the sentences.
The same goes for Ms Jerry.
The reason is that there are no titles in the token table.
Let’s try to figure it out. We need to extract index DDL first.
But do not use get_ddl for full text search indexes
Use ctx_report.create_index_script instead.
You will see a stoplist, which is a set of words that are not indexed. All the titles are there.
To fix it, we should drop the index, create an empty stoplist, and use it in the index create script.
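The fix can be sketched as (index, table, and stoplist names are hypothetical):

```sql
BEGIN
  ctx_ddl.create_stoplist('empty_stoplist', 'BASIC_STOPLIST'); -- no stopwords at all
END;
/
DROP INDEX ix_cartoons_fts;
CREATE INDEX ix_cartoons_fts ON cartoons (doc)
  INDEXTYPE IS CTXSYS.CONTEXT
  PARAMETERS ('section group CTXSYS.JSON_SECTION_GROUP stoplist empty_stoplist');
```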
So we see that we find the whole sentence, Mr Jerry and Ms Jerry in proper single rows
Moreover we see titles in token list
I think it is the latest quiz for search.
You need to find records where class type Country and 640 occur in a single JSON object, so only the first JSON should be found.
Let's try to solve it the obvious way – it doesn't work, because it treats the JSON as a string rather than a collection of objects.
If we use json_table, we can parse each JSON object as a row, so the filter works fine.
But it requires 5 steps. The good part – the index is used.
To solve such issues, Oracle introduced JSON search expressions, which permit filtering at different object levels.
With a JSON search expression, 3 steps are enough.
Oracle 12-18 now permits adding function calls after a JSON path expression. Most of them are for data conversion only, but some can be very useful and allow less verbose code.
For instance, an array's size can be calculated without significant effort.
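For example, the size() item method (12.2+; table and column names are hypothetical):

```sql
-- doc: {"items":[{"sku":1},{"sku":2},{"sku":3}]}
SELECT json_value(o.doc, '$.items.size()') AS item_count
FROM   orders o;
```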
Let’s investigate how search indexes affect ingestion performance and their transaction scope
If we search for data right after the insert statement, we find nothing, so we have to call the sync procedure manually.
Let's set up the index to sync on commit.
It works perfectly but brings some performance overhead.
Let's measure how much.
It takes 100 seconds to insert 10,000 records.
Try to guess which operation is the most time-consuming here.
Actually, no. If we look into the AWR report, we see a stored procedure which is invoked after the commit statement behind the scenes.
We can set up an index update schedule instead, for instance every hour.
It is implemented as a simple job which can be found in the index and scheduler metadata.
So it takes 10 seconds to load the data and 5 seconds to refresh the index.
There is an option which permits using context indexes in transaction scope.
Inserts work fast, but let's try to search.
None of the searches work.
But if you use a simple CONTAINS without JSON features, it starts working. It runs 2 times slower, but it works. So if you need transactional access to your JSON data, you have to invent something yourself.
If we want even faster inserts, we should use the STAGE_ITAB option. It creates a $G table which is separate from the main index tables.
It should be kept in KEEP pool in accordance with Oracle recommendations
If you have 12.2 you could control the size of the table to not consume all keep pool size. When given boundaries are reached data is moved to $I table
ITAB works very fast,
5 seconds without any tweak
2 seconds after keep pool
And a merge job from G to I table works for 2 seconds
If your application inserts data in many threads, you will face concurrency issues due to user locks.
It happens because Oracle exclusively locks a table used for reverse index lookup, which consists of 1 row by default. A new row is added once it exceeds 20,000,000 records.
To avoid such huge locks, the SMALL_R_ROW property should be enabled. It instructs Oracle to split rows once 35,000 records are exceeded. When we enable this option, we usually get 3 times fewer locks.
If we try to do it in 12.1, we get an exception, because it isn't documented there.
To use SMALL_R_ROW in 12.1, please set the event before the index script.
When you have full text search indexes and look at the AWR report's top statements by executions, it is very likely you'll see statements trying to get all tokens into the buffer cache. Oracle has an internal cache-warming algorithm, but it isn't enough to provide very fast search.
One of the options is to put the full text search index tables in the KEEP pool.
You need to allocate KEEP pool memory
Pin the DR$ tables, indexes, and LOB segments
Set CACHE for the DR$ tables and populate the KEEP pool
It usually gives a 5x performance boost
Never try to pin DR$ tables via ALTER commands, otherwise the settings will disappear after an index rebuild.
KEEP pool settings should be set up via the *_TABLE_CLAUSE storage attributes for all your DR$ tables.
If we do it via properties keep pool settings are intact after rebuild
If you'd like to implement a loading stored procedure, the example is on the slide.
Please pay attention to the first command. If we don't use it, Oracle bypasses the buffer cache via direct path reads.
Don't forget to load the BLOB data too, otherwise it will stay out of the KEEP pool.
To preload data into the KEEP pool, I would recommend creating a generic stored procedure which performs the set of actions mentioned on the slides.
You can put your JSON in memory by doing the setup steps above,
which encourage Oracle to store the JSON in binary format.
But from what we've seen, it works well only when virtual columns are created and materialized into in-memory virtual columns. We haven't seen a significant benefit for array parsing by the In-Memory engine.
We would recommend using JSON-based InMemory materialized views instead.
In Oracle 12.1 we had to build JSON with plain SQL: concatenation is used for flat structures and LISTAGG for nested structures.
You could use plsql functions which do concatenation as well
Oracle 12.2 and later significantly simplified JSON generation introducing dedicated functions
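A sketch with the dedicated functions (12.2+; the classic emp table stands in for real data):

```sql
-- one flat object per row
SELECT json_object('id' VALUE e.empno, 'name' VALUE e.ename) AS emp_json
FROM   emp e;

-- a nested structure: one array aggregating all rows
SELECT json_arrayagg(json_object('id' VALUE e.empno, 'name' VALUE e.ename)) AS all_emps
FROM   emp e;
```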
Why do I have rownum <= 5? If I don’t it will fail with
Even if you enable extended data types, you are not permitted to return JSON larger than 32 KB.
It isn't fixed in Oracle 18 either. The RETURNING CLOB clause doesn't work.
There is only 1 workaround – dirty games with XML and manual concatenation
The most interesting question for me is what is faster – concatenation or the built-in functions. To find out, let's collect the full fetch time for all cases.
So when we create flat JSON, the overhead of the built-in JSON functions is about 15%.
When we create nested JSON, the built-in functions are extremely efficient, but if we need big JSON, the XMLAGG approach works rather slowly.
As I told before we need some housekeeping.
Do not base it on DBA_JSON_COLUMNS. It can miss some JSON columns or bring in extra data.
That's why you should simply suffix your JSON fields.
Create daily checks for strict/lax constraints
Field types
And CACHE option
Let me show what I meant when I talked about DBA_JSON_COLUMNS.
We add a constraint with an AND condition.
As we see from the view, it reports 2 columns, and one of them isn't connected with JSON at all.
The other case is an OR condition. Here we've just lost all the JSON columns.
Check for CLOB is rather simple
As well as for IS JSON STRICT
And CACHE option
Context indexes become fragmented over time. Why? Unfortunately we have no time to discuss it, but we must keep them defragmented to provide fast search.
The Oracle 18 manuals say it provides Automatic Background Index Maintenance, but actually you have to run OPTIMIZE_INDEX on your own.
You need to identify the fragmentation and garbage volume first. If they are too big, run OPTIMIZE_INDEX in either serial or parallel mode.
Note that fragmentation is reported in percent and you have to strip the percent sign via regexp, while garbage size is in megabytes, so you have to combine the total index size with the garbage size via regexp parsing as well.
Serial optimization makes indexes less fragmented, but parallel is faster.
Check indexes on JSON fields after index create/rebuild, TRUNCATE, or huge data loads. They sometimes become corrupted.
If you use the STAGE_ITAB option, you should take care of its maintenance as well.
Auto-optimize works by an unknown algorithm in Oracle 12.1 and sometimes runs when it isn't needed at all.
12.2 has a sort of threshold.
In 18, I haven't figured out whether it provides a merge of $G only, or a merge of $G and $I plus $I optimization.
The AUTO_OPTIMIZE procedure creates a job which runs either by an unknown algorithm in 12.1 or after reaching a threshold in 12.2.
The default threshold is 1,000,000 rows with parallel degree 16, which is actually bad. It depends on your workload, but for our more or less structured JSONs, 10,000 records looks better.
JSON is always a tradeoff between performance, processing convenience, and integrity.
JSON processing is mostly acceptable in a row-by-row scenario.
JSON search requires perfect knowledge of Oracle Text nuances
2 JSON notations in one project could lead to a mess
Only Oracle 18 looks mature in terms of JSON treatment
Dedicated JSON search solutions are much faster than Oracle
So, thank you for your time. I'm ready to answer your questions.
Follow me on LinkedIn – I'll share the master class schedule there.