1 of 55
Managing Unstructured Data:
LOBs in the World of JSON
Michael Rosenblum
www.dulcian.com
2 of 55
Who Am I? – “Misha”
Oracle ACE
Co-author of 3 books
 PL/SQL for Dummies
 Expert PL/SQL Practices
 Oracle PL/SQL Performance Tuning Tips & Techniques
Known for:
 SQL and PL/SQL tuning
 Complex functionality
 Code generators
 Repository-based development
3 of 55
Once upon a time…
There was a company that…
Received JSON files from external sources
Stored them in the database as VARCHAR2(4000) column
Manipulated JSON locally
Relied on triggers to do the audit
And SUDDENLY…
4 of 55
Service provider changed
APIs!
5 of 55
JSON files are now
BIGGER THAN 4000 CHARs
6 of 55
Sad story begins
Somewhere in the dungeon in DBA land:
Let’s convert all storage to CLOB!
 … well, it’s designed to store unlimited data volume, isn’t it?
Somewhere in the Never Never Land management office:
Let’s make all changes in 24 hours – or ELSE!!
 … well, we are agile, aren’t we?
Somewhere in the castle in Developer land:
Let’s assign whoever available to make changes NOW!!!
 … well, SQL and PL/SQL are easy, aren’t they?
7 of 55
Sad Story continues…
“Whoever was available” was the Sorcerer’s Apprentice
… AKA Junior Developer
8 of 55
… and continues…
 First 24 hours:
PL/SQL is not a strongly typed language, so…
the same functions can use both VARCHAR2 and CLOB, so…
everything should work without changes , so I am …
… DONE!
9 of 55
End of the World #1
 Action:
Changes to storage are deployed to PROD / No code changes
 Impact:
Major problems that bring the whole system to a standstill
10 of 55
Rescue #1
Action:
Call a friend!
Suggestion from friend:
H-m-m-m, are you using concatenation? BAD IDEA!
11 of 55
Rescue #1: Original
create or replace procedure p_concat is
v_cl clob;
begin
for c in (select object_name
from all_objects where rownum<=20000)
loop
v_cl:=v_cl||c.object_name;
end loop;
end;
12 of 55
Rescue #1: Proposal
create or replace procedure p_append_buffer is
v_cl clob;
v_buffer_tx varchar2(32767);
procedure p_flush is
begin
dbms_lob.writeappend(v_cl,length(v_buffer_tx), v_buffer_tx);
v_buffer_tx:=null;
end;
procedure p_add (i_tx varchar2) is
begin
if length(i_tx)+length(v_buffer_tx)>32767 then
p_flush;
v_buffer_tx:=i_tx;
else
v_buffer_tx:=v_buffer_tx||i_tx;
end if;
end;
13 of 55
Rescue #1: Proposal (continued)
begin
dbms_lob.createtemporary(v_cl,true,dbms_lob.call);
for c in (select object_name
from all_objects where rownum<=20000)
loop
p_add(c.object_name);
end loop;
p_flush;
end;
14 of 55
Rescue #1: impact
SQL> exec runstats_pkg.rs_start;
SQL> exec p_concat;
SQL> exec runstats_pkg.rs_middle;
SQL> exec p_append_buffer;
SQL> exec runstats_pkg.rs_stop(100);
----------------------------------------------------------------------------------
Type Name Run1 Run2 Diff
----- -------------------------------------- ------------ ------------ -----------
TIMER cpu time (hsecs) 214 100 -114
TIMER elapsed time (hsecs) 227 106 -121
STAT lob writes 10,912 22 -10,890
STAT db block changes 19,767 424 -19,343
STAT db block gets 56,985 1,141 -55,844
STAT session logical reads 119,538 52,796 -66,742
STAT logical read bytes from cache 979,255,296 432,504,832 -46,750,464
Way faster
and
cheaper!
15 of 55
… and continues?
 Next 24 hours:
All concatenation operations are quickly changed to
DBMS_LOB all over the system
… DONE???
16 of 55
End of the World #2
 Action:
Changes deployed to PROD with minimal testing
 Impact:
Audit records are not generated!
17 of 55
Rescue #2
Action:
Call a DIFFERENT friend!
Suggestion from that (different) friend:
Are you using DBMS_LOB.WHRITEAPPEND?? BAD IDEA!
18 of 55
Rescue #2: Analysis
CREATE TABLE json_tab_cl
(id number primary key,
demo_cl CLOB,
CONSTRAINT json_tab_cl_chk CHECK (demo_cl IS JSON) );
CREATE OR REPLACE TRIGGER json_tab_cl_biud
BEFORE INSERT OR DELETE OR UPDATE ON json_tab_cl FOR EACH ROW
BEGIN
IF inserting THEN dbms_output.put_line
('Row-before-INSERT');
ELSIF updating THEN dbms_output.put_line
('Row-before-UPDATE');
ELSIF deleting THEN dbms_output.put_line
('Row-before-DELETE');
END IF;
END;
/
19 of 55
Rescue #2: analysis (continued)
SQL> INSERT INTO trigger_tab(id,demo_cl) VALUES (1,empty_clob());
Row-before-INSERT
1 row created.
SQL> DECLARE
2 v_cl CLOB;
3 BEGIN
4 SELECT demo_cl
5 INTO v_cl
6 FROM json_tab_cl WHERE id = 1 FOR UPDATE;
7 dbms_lob.writeAppend(v_cl,1,'{"name":"Misha"}');
8 END;
9 /
SQL> SELECT id, demo_cl FROM trigger_tab;
ID DEMO_CL
---------- --------------------------------
1 {"name":"Misha"}
INSERT trigger
fired
UPDATE trigger
didn’t!
20 of 55
NOT Using DBMS_LOB  BAD IDEA
Using DBMS_LOB  also BAD IDEA???
21 of 55
The Moral of the Story
CLOBs are NOT just larger VARCHAR2:
Access mechanisms are different
Storage mechanisms are different
Processing mechanisms are different
Yes, you need to know about all of that, because…
22 of 55
Data is growing!
23 of 55
JSON is winning over XML
24 of 55
Databases are the same
25 of 55
What about Oracle (0)?
20c and above – JSON datatype! But…
20c will not be release for OnPrem at all
21c has been just recently announced
Key point: “innovation releases” are fun to experiment,
but questionable as production environments
26 of 55
What about Oracle (1)?
Production-ready (19c and below) - JSON still doesn’t
have its own STORAGE datatype – the only option is:
Store as Varchar2/CLOB/BLOB + CHECK constraint
Manually manage all storage properties
CREATE TABLE trigger_json_tab
(id number primary key,
demo_cl CLOB,
CONSTRAINT trg_json_tab_chk CHECK (demo_cl IS JSON)
);
27 of 55
What about Oracle (2)?
JSON PL/SQL types (JSON_ELEMENT_T,
JSON_OBJECT_T etc) are about granular manipulation
in procedural code
SQL/JSON elements (JSON_VALUE, JSON_ARRAY
etc) are about efficient data extraction in SQL queries
28 of 55
So?
If you store JSON data
and DON’T manipulate it within Oracle  you need to
understand CLOB architecture
And DO manipulate it within Oracle  you REALLY need
to understand CLOB architecture
29 of 55
CLOB Architecture
30 of 55
Data Access
 Problem
 How can you access gigabytes of data?
 Solution
 Separate LOB data from LOB locator
LOB locator – special logical entity that points to LOB data and allows
communication with it.
 Types of LOB operations:
 Copy semantics - Data alone is copied from source to destination and a new locator
is created for the new LOB.
 Reference semantics - Only the locator is copied (without changing the underlying
data).
31 of 55
Data Access
Declare
v_cl CLOB;
Begin
select a_cl
into v_cl
from t_table;
End;
Data
Locator
32 of 55
LOB Data States
Variable/column in the row can be in a number of different
states:
Null – exists but is not initialized
Empty – exists and has locator that doesn't point to any data
 Since you can only access LOBs via locators, you must first create
them.
 In some cases, an initial NULL value is needed.
Populated – exists, has locator, and contains real data
33 of 55
Internal LOB Data Storage
Persistent LOBs:
 Represented as values in the table column
 Participate in transactions (Commit, Rollback, generate logs)
 Each LOB has its own storage structure separate from the table in
which it is located.
Temporary LOBs:
 Created in temporary tablespace and released when no longer
needed
 Created when LOB variable is instantiated
 Become permanent when inserted into table
Variables cause IO!
34 of 55
IO Proof (1)
exec runstats_pkg.rs_start;
declare
v_cl CLOB;
begin
for i in 1..100 loop
v_cl:=v_cl||lpad('A',32000,'A');
end loop;
end;
/
exec runstats_pkg.rs_middle; end;
declare
v_cl CLOB;
begin
dbms_lob.createTemporary(v_cl, true,dbms_lob.call);
for i in 1..100 loop
dbms_lob.writeappend(v_cl, 32000, lpad('A',32000,'A'));
end loop;
end;
/
exec runstats_pkg.rs_stop(0);
Load 3.2 MB
35 of 55
IO Proof (2)
...
----------------------------------------------------------------------------------
2. Statistics report
----------------------------------------------------------------------------------
Type Name Run1 Run2 Diff
----- ------------------------------------- ------------ ------------ ------------
STAT consistent gets 91 99 8
STAT lob writes 92 100 8
STAT db block changes 1,803 1,963 160
STAT db block gets 4,872 5,304 432
STAT db block gets from cache 4,872 5,304 432
STAT session logical reads 4,963 5,403 440
Lots of activities!
36 of 55
LOB Implementations
Contemporary implementation (SecureFiles)
Introduced in Oracle Database 11g
Optimized for significant data flow
Constantly being extended
Old implementation (BasicFile)
Backward compatible in all versions 11g+
Well-understood by DBAs, but more resource-heavy
37 of 55
Retention
LOBs have a special way to manage UNDO
 BasicFile
 Disabled (Default) – only to support consistent reads
 Enabled – apply the same UNDO_RETENTION parameter as the regular data
SecureFile
 Auto (Default) – used only to support consistent reads
 None – no UNDO is required
 MAX <N> - keep up to N MB of UNDO
 MIN <N> - guarantee up to N seconds of retention
38 of 55
Navigation Using Indexes
 Data is stored in chunks
 Consist of one or more data blocks up to 32K
 Each I/O operation works with one chunk.
 SecureFiles: Chunks are dynamic and cannot be managed.
 Chunks are navigated using indexes
 Each LOB column is represented by 2 segments:
 One for storing data; one to store the index
 Each segment has the same storage properties as regular tables.
 Some restrictions:
 Cannot drop or rebuild LOB indexes
 Cannot specify different properties for index and data segments
39 of 55
LOB Operations
 Require physical I/O
 May have a high number of wait events
 Nice cheat - storage “in row”
 Enabled – data less than 4000 characters will look like VARCHAR2(4000)
 Disabled – everything goes directly to LOB segment
40 of 55
LOB Caching
 Caching options:
 NOCACHE (default) – OK for very large LOBs or those with infrequent access
 CACHE – best option but requires a lot of read/write activity
 CACHE READS – useful when creating LOB once and reading data from it
often.
 NOCACHE implementation
 BasicFile – Direct I/O. May cause major “hiccups” in the database
 SecureFile – special Shared Pool area (SHARED_IO_POOL)
41 of 55
LOB Logging
 Logging options:
 Logging
 Nologging (BasicFiles) – the same meaning as for table
 FILESYSTEM_LIKE_LOGGING (SecureFiles) –keep the catalogue of LOBs
without generating logs on the data itself
 Faster initial recovery (table is accessible)
 LOBs have to be reloaded.
 CACHE always implies LOGGING.
 By default, logging option is inherited from the table.
42 of 55
SecureFile extras: Free
Fragment operations – direct modifications to LOBs
without reloading (32K limit)
dbms._lob.Fragment_Insert
dbms._lob.Fragment_Delete
dbms._lob.Fragment_Move
dbms._lob.Fragment_Replace
43 of 55
SecureFile Extras: Options
Oracle Advanced Compression Option
De-duplication - keep only one copy of LOB if they match
exactly
Compression (High/Medium/Low) – trade-off between storage
and CPU
Oracle Advanced Security Option
Encryption – direct implementationTransparent Data
Encryption (TDE)
44 of 55
Impact
45 of 55
Playground (1)
CREATE TABLE json_tab_vc2
(id number primary key,
demo_tx varchar2(4000),
CONSTRAINT json_tab_vc2_chk CHECK (demo_tx IS JSON)
);
insert into json_tab_vc2 values (1,'{"name":"Misha"}');
CREATE TABLE json_tab_cl
(id number primary key,
demo_cl CLOB,
CONSTRAINT json_tab_cl_chk CHECK (demo_cl IS JSON)
);
46 of 55
Playground (2)
declare
v_cl CLOB;
procedure p_add (i_tx varchar2) is
begin
dbms_lob.writeAppend(v_cl,length(i_tx),i_tx);
end;
begin
dbms_lob.createTemporary(v_cl, true,dbms_lob.call);
p_add('{');
for i in 1..1000 loop
p_add('"tag'||i||'":"value'||i||'",');
end loop;
p_add('"name":"ClobDemo"');
p_add('}');
insert into trigger_json_tab values (100,v_cl);
end;
Above 4K!
47 of 55
Single operation
exec runstats_pkg.rs_start;
select json_value(demo_cl,'$.name') from json_tab_cl;
exec runstats_pkg.rs_middle;
select json_value(demo_tx,'$.name') from json_tab_vc2;
exec runstats_pkg.rs_stop(1);
-------------------------------------------------------------------------------------
2. Statistics report
-------------------------------------------------------------------------------------
Type Name Run1 Run2 Diff
----- ---------------------------------------- ------------ ------------ ------------
STAT physical read IO requests 1 0 -1
STAT securefile direct read ops 1 0 -1
STAT physical reads direct (lob) 3 0 -3
STAT physical read bytes 24,576 0 -24,576
STAT securefile direct read bytes 24,576 0 -24,576
48 of 55
Two operations
…
select json_value(demo_cl,'$.name') from json_tab_cl
where json_exists(demo_cl,'$.name' false on error);
…
select json_value(demo_tx,'$.name') from json_tab_vc2
where json_exists(demo_tx,'$.name' false on error);
…
-------------------------------------------------------------------------------------
2. Statistics report
-------------------------------------------------------------------------------------
Type Name Run1 Run2 Diff
----- ---------------------------------------- ------------ ------------ ------------
STAT physical read IO requests 2 0 -2
STAT securefile direct read ops 2 0 -2
STAT physical reads direct (lob) 6 0 -6
STAT physical read bytes 49,152 0 -49,152
STAT securefile direct read bytes 49,152 0 -49,152
IO impact
for every operator?
49 of 55
Multi-column (1)
…
select name_tx, tag10_tx, tag100_tx
from json_tab_cl,
json_table(demo_cl,'$'
columns (name_tx varchar2(2000) path '$.name',
tag10_tx varchar2(2000) path '$.tag10',
tag100_tx varchar2(2000) path '$.tag100')
);
…
select name_tx, tag10_tx, tag100_tx
from json_tab_vc2,
json_table(demo_tx,'$'
columns (name_tx varchar2(2000) path '$.name',
tag10_tx varchar2(2000) path '$.tag10',
tag100_tx varchar2(2000) path '$.tag100')
);
…
50 of 55
Multi-column (2)
-------------------------------------------------------------------------
2. Statistics report
-------------------------------------------------------------------------
Type Name Run1 Run2 Diff
----- ---------------------------- ------------ ------------ ------------
STAT physical read IO requests 1 0 -1
STAT securefile direct read ops 1 0 -1
STAT physical reads direct (lob) 3 0 -3
STAT physical read bytes 24,576 0 -24,576
STAT securefile direct read bytes 24,576 0 -24,576
Exactly as a single call
51 of 55
JSON_VALUE vs JSON_TABLE
select name_tx, tag10_tx, tag100_tx
from json_tab_cl,
json_table(demo_cl,'$'
columns (name_tx varchar2(2000) path '$.name',
tag10_tx varchar2(2000) path '$.tag10',
tag100_tx varchar2(2000) path '$.tag100'
)
);
…
select json_value(demo_cl,'$.name'),
json_value(demo_cl,'$.tag10'),
json_value(demo_cl,'$.tag100')
from json_tab_cl;
-------------------------------------------------------------------------------------
Type Name Run1 Run2 Diff
----- ---------------------------------------- ------------ ------------ ------------
STAT physical read IO requests 1 1 0
STAT securefile direct read ops 1 1 0
STAT physical reads direct (lob) 3 3 0
STAT physical read bytes 24,576 24,576 0
STAT securefile direct read bytes 24,576 24,576 0
Multiple
JSON_VALUE calls
are combined!
52 of 55
JSON Storage Best Practices
SecureFiles are strongly recommended
Compression can significantly decrease the database footprint
(If you can get that extra option)
Always keep CACHE enabled
But you can play with LOGGING parameter if there is a way to
quickly recover JSON from the external source.
Always have IN ROW enabled (small CLOBs would be
treated as VARCHAR2)
53 of 55
JSON Manipulating Best Practices
PL/SQL JSON types are very resource-intensive:
 May not be recommended if your resources are limited
 … But they guarantee the data quality
 Keep the number of operations to a minimum
 … Especially for large JSON files (parsing is expensive!)
Manual changes to JSON
 Keep CLOB operations to a minimum
 Any CLOB operation on >4K of data is causing IO
 Use buffers whenever possible
 PL/SQL VARCHAR2 limit is 32K!
54 of 55
Summary
JSON is the new norm
… and we need to be able to efficiently store/manipulate it
Oracle provides everything you need to do it
… as long as you read footprints
Newer versions of Oracle database will make JSON
operations easier
… but we are not there yet
55 of 55
Contact Information
 Michael Rosenblum – mrosenblum@dulcian.com
 Dulcian, Inc. website - www.dulcian.com
 Blog: wonderingmisha.blogspot.com
Available NOW:
Oracle PL/SQL Performance Tuning Tips & Techniques

Managing Unstructured Data: Lobs in the World of JSON

  • 1.
    1 of 55 ManagingUnstructured Data: LOBs in the World of JSON Michael Rosenblum www.dulcian.com
  • 2.
    2 of 55 WhoAm I? – “Misha” Oracle ACE Co-author of 3 books  PL/SQL for Dummies  Expert PL/SQL Practices  Oracle PL/SQL Performance Tuning Tips & Techniques Known for:  SQL and PL/SQL tuning  Complex functionality  Code generators  Repository-based development
  • 3.
    3 of 55 Onceupon a time… There was a company that… Received JSON files from external sources Stored them in the database as VARCHAR2(4000) column Manipulated JSON locally Relied on triggers to do the audit And SUDDENLY…
  • 4.
    4 of 55 Serviceprovider changed APIs!
  • 5.
    5 of 55 JSONfiles are now BIGGER THAN 4000 CHARs
  • 6.
    6 of 55 Sadstory begins Somewhere in the dungeon in DBA land: Let’s convert all storage to CLOB!  … well, it’s designed to store unlimited data volume, isn’t it? Somewhere in the Never Never Land management office: Let’s make all changes in 24 hours – or ELSE!!  … well, we are agile, aren’t we? Somewhere in the castle in Developer land: Let’s assign whoever available to make changes NOW!!!  … well, SQL and PL/SQL are easy, aren’t they?
  • 7.
    7 of 55 SadStory continues… “Whoever was available” was the Sorcerer’s Apprentice … AKA Junior Developer
  • 8.
    8 of 55 …and continues…  First 24 hours: PL/SQL is not a strongly typed language, so… the same functions can use both VARCHAR2 and CLOB, so… everything should work without changes , so I am … … DONE!
  • 9.
    9 of 55 Endof the World #1  Action: Changes to storage are deployed to PROD / No code changes  Impact: Major problems that bring the whole system to a standstill
  • 10.
    10 of 55 Rescue#1 Action: Call a friend! Suggestion from friend: H-m-m-m, are you using concatenation? BAD IDEA!
  • 11.
    11 of 55 Rescue#1: Original create or replace procedure p_concat is v_cl clob; begin for c in (select object_name from all_objects where rownum<=20000) loop v_cl:=v_cl||c.object_name; end loop; end;
  • 12.
    12 of 55 Rescue#1: Proposal create or replace procedure p_append_buffer is v_cl clob; v_buffer_tx varchar2(32767); procedure p_flush is begin dbms_lob.writeappend(v_cl,length(v_buffer_tx), v_buffer_tx); v_buffer_tx:=null; end; procedure p_add (i_tx varchar2) is begin if length(i_tx)+length(v_buffer_tx)>32767 then p_flush; v_buffer_tx:=i_tx; else v_buffer_tx:=v_buffer_tx||i_tx; end if; end;
  • 13.
    13 of 55 Rescue#1: Proposal (continued) begin dbms_lob.createtemporary(v_cl,true,dbms_lob.call); for c in (select object_name from all_objects where rownum<=20000) loop p_add(c.object_name); end loop; p_flush; end;
  • 14.
    14 of 55 Rescue#1: impact SQL> exec runstats_pkg.rs_start; SQL> exec p_concat; SQL> exec runstats_pkg.rs_middle; SQL> exec p_append_buffer; SQL> exec runstats_pkg.rs_stop(100); ---------------------------------------------------------------------------------- Type Name Run1 Run2 Diff ----- -------------------------------------- ------------ ------------ ----------- TIMER cpu time (hsecs) 214 100 -114 TIMER elapsed time (hsecs) 227 106 -121 STAT lob writes 10,912 22 -10,890 STAT db block changes 19,767 424 -19,343 STAT db block gets 56,985 1,141 -55,844 STAT session logical reads 119,538 52,796 -66,742 STAT logical read bytes from cache 979,255,296 432,504,832 -46,750,464 Way faster and cheaper!
  • 15.
    15 of 55 …and continues?  Next 24 hours: All concatenation operations are quickly changed to DBMS_LOB all over the system … DONE???
  • 16.
    16 of 55 Endof the World #2  Action: Changes deployed to PROD with minimal testing  Impact: Audit records are not generated!
  • 17.
    17 of 55 Rescue#2 Action: Call a DIFFERENT friend! Suggestion from that (different) friend: Are you using DBMS_LOB.WHRITEAPPEND?? BAD IDEA!
  • 18.
    18 of 55 Rescue#2: Analysis CREATE TABLE json_tab_cl (id number primary key, demo_cl CLOB, CONSTRAINT json_tab_cl_chk CHECK (demo_cl IS JSON) ); CREATE OR REPLACE TRIGGER json_tab_cl_biud BEFORE INSERT OR DELETE OR UPDATE ON json_tab_cl FOR EACH ROW BEGIN IF inserting THEN dbms_output.put_line ('Row-before-INSERT'); ELSIF updating THEN dbms_output.put_line ('Row-before-UPDATE'); ELSIF deleting THEN dbms_output.put_line ('Row-before-DELETE'); END IF; END; /
  • 19.
    19 of 55 Rescue#2: analysis (continued) SQL> INSERT INTO trigger_tab(id,demo_cl) VALUES (1,empty_clob()); Row-before-INSERT 1 row created. SQL> DECLARE 2 v_cl CLOB; 3 BEGIN 4 SELECT demo_cl 5 INTO v_cl 6 FROM json_tab_cl WHERE id = 1 FOR UPDATE; 7 dbms_lob.writeAppend(v_cl,1,'{"name":"Misha"}'); 8 END; 9 / SQL> SELECT id, demo_cl FROM trigger_tab; ID DEMO_CL ---------- -------------------------------- 1 {"name":"Misha"} INSERT trigger fired UPDATE trigger didn’t!
  • 20.
    20 of 55 NOTUsing DBMS_LOB  BAD IDEA Using DBMS_LOB  also BAD IDEA???
  • 21.
    21 of 55 TheMoral of the Story CLOBs are NOT just larger VARCHAR2: Access mechanisms are different Storage mechanisms are different Processing mechanisms are different Yes, you need to know about all of that, because…
  • 22.
    22 of 55 Datais growing!
  • 23.
    23 of 55 JSONis winning over XML
  • 24.
    24 of 55 Databasesare the same
  • 25.
    25 of 55 Whatabout Oracle (0)? 20c and above – JSON datatype! But… 20c will not be release for OnPrem at all 21c has been just recently announced Key point: “innovation releases” are fun to experiment, but questionable as production environments
  • 26.
    26 of 55 Whatabout Oracle (1)? Production-ready (19c and below) - JSON still doesn’t have its own STORAGE datatype – the only option is: Store as Varchar2/CLOB/BLOB + CHECK constraint Manually manage all storage properties CREATE TABLE trigger_json_tab (id number primary key, demo_cl CLOB, CONSTRAINT trg_json_tab_chk CHECK (demo_cl IS JSON) );
  • 27.
    27 of 55 Whatabout Oracle (2)? JSON PL/SQL types (JSON_ELEMENT_T, JSON_OBJECT_T etc) are about granular manipulation in procedural code SQL/JSON elements (JSON_VALUE, JSON_ARRAY etc) are about efficient data extraction in SQL queries
  • 28.
    28 of 55 So? Ifyou store JSON data and DON’T manipulate it within Oracle  you need to understand CLOB architecture And DO manipulate it within Oracle  you REALLY need to understand CLOB architecture
  • 29.
    29 of 55 CLOBArchitecture
  • 30.
    30 of 55 DataAccess  Problem  How can you access gigabytes of data?  Solution  Separate LOB data from LOB locator LOB locator – special logical entity that points to LOB data and allows communication with it.  Types of LOB operations:  Copy semantics - Data alone is copied from source to destination and a new locator is created for the new LOB.  Reference semantics - Only the locator is copied (without changing the underlying data).
  • 31.
    31 of 55 DataAccess Declare v_cl CLOB; Begin select a_cl into v_cl from t_table; End; Data Locator
  • 32.
    32 of 55 LOBData States Variable/column in the row can be in a number of different states: Null – exists but is not initialized Empty – exists and has locator that doesn't point to any data  Since you can only access LOBs via locators, you must first create them.  In some cases, an initial NULL value is needed. Populated – exists, has locator, and contains real data
  • 33.
    33 of 55 InternalLOB Data Storage Persistent LOBs:  Represented as values in the table column  Participate in transactions (Commit, Rollback, generate logs)  Each LOB has its own storage structure separate from the table in which it is located. Temporary LOBs:  Created in temporary tablespace and released when no longer needed  Created when LOB variable is instantiated  Become permanent when inserted into table Variables cause IO!
  • 34.
    34 of 55 IOProof (1) exec runstats_pkg.rs_start; declare v_cl CLOB; begin for i in 1..100 loop v_cl:=v_cl||lpad('A',32000,'A'); end loop; end; / exec runstats_pkg.rs_middle; end; declare v_cl CLOB; begin dbms_lob.createTemporary(v_cl, true,dbms_lob.call); for i in 1..100 loop dbms_lob.writeappend(v_cl, 32000, lpad('A',32000,'A')); end loop; end; / exec runstats_pkg.rs_stop(0); Load 3.2 MB
  • 35.
    35 of 55 IOProof (2) ... ---------------------------------------------------------------------------------- 2. Statistics report ---------------------------------------------------------------------------------- Type Name Run1 Run2 Diff ----- ------------------------------------- ------------ ------------ ------------ STAT consistent gets 91 99 8 STAT lob writes 92 100 8 STAT db block changes 1,803 1,963 160 STAT db block gets 4,872 5,304 432 STAT db block gets from cache 4,872 5,304 432 STAT session logical reads 4,963 5,403 440 Lots of activities!
  • 36.
    36 of 55 LOBImplementations Contemporary implementation (SecureFiles) Introduced in Oracle Database 11g Optimized for significant data flow Constantly being extended Old implementation (BasicFile) Backward compatible in all versions 11g+ Well-understood by DBAs, but more resource-heavy
  • 37.
    37 of 55 Retention LOBshave a special way to manage UNDO  BasicFile  Disabled (Default) – only to support consistent reads  Enabled – apply the same UNDO_RETENTION parameter as the regular data SecureFile  Auto (Default) – used only to support consistent reads  None – no UNDO is required  MAX <N> - keep up to N MB of UNDO  MIN <N> - guarantee up to N seconds of retention
  • 38.
    38 of 55 NavigationUsing Indexes  Data is stored in chunks  Consist of one or more data blocks up to 32K  Each I/O operation works with one chunk.  SecureFiles: Chunks are dynamic and cannot be managed.  Chunks are navigated using indexes  Each LOB column is represented by 2 segments:  One for storing data; one to store the index  Each segment has the same storage properties as regular tables.  Some restrictions:  Cannot drop or rebuild LOB indexes  Cannot specify different properties for index and data segments
  • 39.
    39 of 55 LOBOperations  Require physical I/O  May have a high number of wait events  Nice cheat - storage “in row”  Enabled – data less than 4000 characters will look like VARCHAR2(4000)  Disabled – everything goes directly to LOB segment
  • 40.
    40 of 55 LOBCaching  Caching options:  NOCACHE (default) – OK for very large LOBs or those with infrequent access  CACHE – best option but requires a lot of read/write activity  CACHE READS – useful when creating LOB once and reading data from it often.  NOCACHE implementation  BasicFile – Direct I/O. May cause major “hiccups” in the database  SecureFile – special Shared Pool area (SHARED_IO_POOL)
  • 41.
    41 of 55 LOBLogging  Logging options:  Logging  Nologging (BasicFiles) – the same meaning as for table  FILESYSTEM_LIKE_LOGGING (SecureFiles) –keep the catalogue of LOBs without generating logs on the data itself  Faster initial recovery (table is accessible)  LOBs have to be reloaded.  CACHE always implies LOGGING.  By default, logging option is inherited from the table.
  • 42.
    42 of 55 SecureFileextras: Free Fragment operations – direct modifications to LOBs without reloading (32K limit) dbms._lob.Fragment_Insert dbms._lob.Fragment_Delete dbms._lob.Fragment_Move dbms._lob.Fragment_Replace
  • 43.
    43 of 55 SecureFileExtras: Options Oracle Advanced Compression Option De-duplication - keep only one copy of LOB if they match exactly Compression (High/Medium/Low) – trade-off between storage and CPU Oracle Advanced Security Option Encryption – direct implementationTransparent Data Encryption (TDE)
  • 44.
  • 45.
    45 of 55 Playground(1) CREATE TABLE json_tab_vc2 (id number primary key, demo_tx varchar2(4000), CONSTRAINT json_tab_vc2_chk CHECK (demo_tx IS JSON) ); insert into json_tab_vc2 values (1,'{"name":"Misha"}'); CREATE TABLE json_tab_cl (id number primary key, demo_cl CLOB, CONSTRAINT json_tab_cl_chk CHECK (demo_cl IS JSON) );
  • 46.
    46 of 55 Playground(2) declare v_cl CLOB; procedure p_add (i_tx varchar2) is begin dbms_lob.writeAppend(v_cl,length(i_tx),i_tx); end; begin dbms_lob.createTemporary(v_cl, true,dbms_lob.call); p_add('{'); for i in 1..1000 loop p_add('"tag'||i||'":"value'||i||'",'); end loop; p_add('"name":"ClobDemo"'); p_add('}'); insert into trigger_json_tab values (100,v_cl); end; Above 4K!
  • 47.
    47 of 55 Singleoperation exec runstats_pkg.rs_start; select json_value(demo_cl,'$.name') from json_tab_cl; exec runstats_pkg.rs_middle; select json_value(demo_tx,'$.name') from json_tab_vc2; exec runstats_pkg.rs_stop(1); ------------------------------------------------------------------------------------- 2. Statistics report ------------------------------------------------------------------------------------- Type Name Run1 Run2 Diff ----- ---------------------------------------- ------------ ------------ ------------ STAT physical read IO requests 1 0 -1 STAT securefile direct read ops 1 0 -1 STAT physical reads direct (lob) 3 0 -3 STAT physical read bytes 24,576 0 -24,576 STAT securefile direct read bytes 24,576 0 -24,576
  • 48.
    48 of 55 Twooperations … select json_value(demo_cl,'$.name') from json_tab_cl where json_exists(demo_cl,'$.name' false on error); … select json_value(demo_tx,'$.name') from json_tab_vc2 where json_exists(demo_tx,'$.name' false on error); … ------------------------------------------------------------------------------------- 2. Statistics report ------------------------------------------------------------------------------------- Type Name Run1 Run2 Diff ----- ---------------------------------------- ------------ ------------ ------------ STAT physical read IO requests 2 0 -2 STAT securefile direct read ops 2 0 -2 STAT physical reads direct (lob) 6 0 -6 STAT physical read bytes 49,152 0 -49,152 STAT securefile direct read bytes 49,152 0 -49,152 IO impact for every operator?
  • 49.
    49 of 55 Multi-column(1) … select name_tx, tag10_tx, tag100_tx from json_tab_cl, json_table(demo_cl,'$' columns (name_tx varchar2(2000) path '$.name', tag10_tx varchar2(2000) path '$.tag10', tag100_tx varchar2(2000) path '$.tag100') ); … select name_tx, tag10_tx, tag100_tx from json_tab_vc2, json_table(demo_tx,'$' columns (name_tx varchar2(2000) path '$.name', tag10_tx varchar2(2000) path '$.tag10', tag100_tx varchar2(2000) path '$.tag100') ); …
  • 50.
    50 of 55 Multi-column(2) ------------------------------------------------------------------------- 2. Statistics report ------------------------------------------------------------------------- Type Name Run1 Run2 Diff ----- ---------------------------- ------------ ------------ ------------ STAT physical read IO requests 1 0 -1 STAT securefile direct read ops 1 0 -1 STAT physical reads direct (lob) 3 0 -3 STAT physical read bytes 24,576 0 -24,576 STAT securefile direct read bytes 24,576 0 -24,576 Exactly as a single call
  • 51.
    51 of 55 JSON_VALUEvs JSON_TABLE select name_tx, tag10_tx, tag100_tx from json_tab_cl, json_table(demo_cl,'$' columns (name_tx varchar2(2000) path '$.name', tag10_tx varchar2(2000) path '$.tag10', tag100_tx varchar2(2000) path '$.tag100' ) ); … select json_value(demo_cl,'$.name'), json_value(demo_cl,'$.tag10'), json_value(demo_cl,'$.tag100') from json_tab_cl; ------------------------------------------------------------------------------------- Type Name Run1 Run2 Diff ----- ---------------------------------------- ------------ ------------ ------------ STAT physical read IO requests 1 1 0 STAT securefile direct read ops 1 1 0 STAT physical reads direct (lob) 3 3 0 STAT physical read bytes 24,576 24,576 0 STAT securefile direct read bytes 24,576 24,576 0 Multiple JSON_VALUE calls are combined!
  • 52.
    52 of 55 JSONStorage Best Practices SecureFiles are strongly recommended Compression can significantly decrease the database footprint (If you can get that extra option) Always keep CACHE enabled But you can play with LOGGING parameter if there is a way to quickly recover JSON from the external source. Always have IN ROW enabled (small CLOBs would be treated as VARCHAR2)
  • 53.
    53 of 55 JSONManipulating Best Practices PL/SQL JSON types are very resource-intensive:  May not be recommended if your resources are limited  … But they guarantee the data quality  Keep the number of operations to a minimum  … Especially for large JSON files (parsing is expensive!) Manual changes to JSON  Keep CLOB operations to a minimum  Any CLOB operation on >4K of data is causing IO  Use buffers whenever possible  PL/SQL VARCHAR2 limit is 32K!
  • 54.
    54 of 55 Summary JSONis the new norm … and we need to be able to efficiently store/manipulate it Oracle provides everything you need to do it … as long as you read footprints Newer versions of Oracle database will make JSON operations easier … but we are not there yet
  • 55.
    55 of 55 ContactInformation  Michael Rosenblum – mrosenblum@dulcian.com  Dulcian, Inc. website - www.dulcian.com  Blog: wonderingmisha.blogspot.com Available NOW: Oracle PL/SQL Performance Tuning Tips & Techniques

Editor's Notes