8th TUC Meeting - Eugene I. Chong (Oracle USA). Balancing Act to improve RDF Query Performance in Oracle Database.

1Confidential – Oracle Restricted

<Insert Picture Here>
Balancing Act to improve RDF Query Performance
in Oracle Database
Eugene I. Chong

3
Agenda
• RDF Query processing Issues
• RDF Order-By and Filter Processing
• RDF In-Memory Processing
• RDF In-Memory Virtual Columns
• Conclusion
Confidential – Oracle Restricted

4
Oracle RDF
• RDF_LINK$ table (triples)
– normalized
– subject, predicate, object IDs
• RDF_VALUE$ table (ID to value mapping)
– value, type, etc.
• Issues
– frequent joins with RDF_VALUE$ table to present results,
process filters and order-by queries
– complete de-normalization incurs large storage requirements
– self-joins: large intermediate join results

5
Oracle RDF Filters and Order-By Processing
• SPARQL order-by semantics
– order: no values, blank nodes, IRIs, literals
– case statement: value type, numeric value, date value, string
value
– ORDER BY CASE WHEN (V4.VALUE_TYPE IS NULL)
THEN 0
WHEN (V4.VALUE_TYPE IN ('BLN','BN')) THEN 1
WHEN (V4.VALUE_TYPE IN ('URI','UR')) THEN 2
WHEN (V4.VALUE_TYPE IN ('PL', 'PLL', 'CPLL', 'PL@',
'PLL@', 'CPLL@', 'TL', 'TLL', 'CTLL', 'LIT'))
THEN (CASE WHEN (V4.LANGUAGE_TYPE IS NOT NULL)
THEN 5
……..

6
Oracle RDF Filters and Order-By Processing
– literal type - numeric: TO_NUMBER( )
– literal type - date/time: TO_TIMESTAMP_TZ ( ), DECODE( )
– use function calls to generate SQL for order-by
– case statements executed for every row at runtime
– same problem for filters
• Solution
– materialize value type and values in RDF_VALUE$ table
– stored as ORDER_TYPE, ORDER_NUM, ORDER_DATE
– filled in at load time
– generate SQL: ORDER BY order_type, order_num,
order_date, value_name
– filter clause: WHERE order_num < to_number(89)

7
Oracle RDF Order-By and Filter Performance using
BSBM Benchmark Queries (in secs)
0
10
20
30
40
50
60
70
80
90
Q1 Q2 Q3 Q4 Q5 Q6 Q7 Q8 Q9 Q10 Q11 Q12 BI1 BI2 BI3 BI4 BI5 BI6 BI7 BI8
Without Order Columns With Order Columns

8
Oracle RDF In-Memory Processing
• Utilize Oracle IMC
– load frequently accessed columns in memory
• RDF_LINK$ table: subject, predicate, object IDs
• RDF_VALUE$: id, value
– fast full scan of the table: good for hash join
• Experiment
– 32GB memory, 2TB disk space
– LUBM benchmark queries (8,763,829 rows including
entailment)
– varying the size of the memory: 6G(100%), 4G(56%),
2G(27%), 1G(12%)

9
Oracle RDF In-Memory Query Times (in sec) for
LUBM Benchmark Queries
• 100% : 4x – 6x gain
• 56%
0
10
20
30
40
Q1 Q2 Q3 Q4 Q5 Q6 Q7 Q8 Q9 Q10 Q11 Q12 Q13 Q14
No IM
IM (100%)
0
10
20
30
40
Q1 Q2 Q3 Q4 Q5 Q6 Q7 Q8 Q9 Q10 Q11 Q12 Q13 Q14
No IM
IM (56%)

10
Oracle RDF In-Memory Query Times (in sec) for
LUBM Benchmark Queries
• 27%
• 12%
0
10
20
30
40
Q1 Q2 Q3 Q4 Q5 Q6 Q7 Q8 Q9 Q10 Q11 Q12 Q13 Q14
No IM
IM (27%)
0
10
20
30
40
Q1 Q2 Q3 Q4 Q5 Q6 Q7 Q8 Q9 Q10 Q11 Q12 Q13 Q14
No IM
IM (12%)

11
Oracle RDF In-Memory Full Scan Performance
(in sec)
• Fetching 3 IDs from RDF_LINK$ table
• 100% - 190x gain
0
0.5
1
1.5
2
2.5
IM(100%) IM(56%) IM(27%) IM(12%)
NoIM
IM

12
Oracle RDF In-Memory Virtual Columns
• In-memory complete de-normalization without
incurring disk storage requirements
– define virtual columns in RDF_LINK$ table for values, types,
etc. : VALUE_NAME_S, VALUE_NAME_P,
VALUE_NAME_O, etc.
– useful for fully populated data in memory: virtual model
Virtual column in-memory performance (in min) –fetching 3 IDs & 3 VCs
0
5
10
15
20
25
IM (100%) IM (56%) IM (27%) IM (12%)
No IM
IM-No VC
IM-VC

13
– remove joins with RDF_VALUE$ table
– queries are processed on RDF_LINK$ table only
– compression, smart scans (in-memory storage index),
dictionary code for values, SIMD vector processing

14
Oracle RDF In-memory Virtual Column Performance
using LUBM Benchmark Queries (in secs)
• Up to 8x gain
• As the number of joins increases, a bigger gain is achievable
0
10
20
30
40
50
60
70
80
90
100
Q9 (3 joins) Q2 (2 joins) Q6 (2 joins) Q13 (1 join) Q14 (1 join)
No IMVC IMVC

15
• Can apply to data mart/data warehousing star/
snowflake schema
– remove joins with dimension tables
• Can apply to any applications where joined tables
have one-to-one mapping on their join keys

16
Conclusion
• Significant performance improvement
– use order columns in place of complex logic in the query for
RDF filter and order-by processing
– improve hash joins by in-memory processing of frequently
accessed columns
– remove costly joins using in-memory virtual columns by
complete de-normalization for fully populated data

17Confidential – Oracle Restricted
<Insert Picture Here>
Your Questions

8th TUC Meeting - Eugene I. Chong (Oracle USA). Balancing Act to improve RDF Query Performance in Oracle Database.

Recommended

Recommended

More Related Content

What's hot

What's hot (20)

Viewers also liked

Viewers also liked (16)

Similar to 8th TUC Meeting - Eugene I. Chong (Oracle USA). Balancing Act to improve RDF Query Performance in Oracle Database.

Similar to 8th TUC Meeting - Eugene I. Chong (Oracle USA). Balancing Act to improve RDF Query Performance in Oracle Database. (20)

More from LDBC council

More from LDBC council (20)

Recently uploaded

Recently uploaded (20)

8th TUC Meeting - Eugene I. Chong (Oracle USA). Balancing Act to improve RDF Query Performance in Oracle Database.