SQL is the best language for
Big Data
Thomas Kyte
http://asktom.oracle.com/
Copyright © 2014, Oracle and/or its affiliates. All rights reserved. |
Safe Harbor Statement
The following is intended to outline our general product direction. It is intended for
information purposes only, and may not be incorporated into any contract. It is not a
commitment to deliver any material, code, or functionality, and should not be relied upon
in making purchasing decisions. The development, release, and timing of any features or
functionality described for Oracle’s products remains at the sole discretion of Oracle.
Oracle Database Support for All Data
• Structured Data
• Numeric, String, Date, …
• Row and column formats
• Unstructured Data
• LOB
• Text
• XML
• JSON
• Spatial
• Graph
Oracle Support for Any Data Management System
Relational: Run the Business
• Scale-out and scale-up
• Collect any data
• SQL
• Transactional and analytic applications for the enterprise
• Secure and highly available
Hadoop: Change the Business
• Scale-out, low cost store
• Collect any data
• Map-reduce, SQL
• Analytic applications
NoSQL: Scale the Business
• Scale-out, low cost store
• Collect key-value data
• Find data by key
• Web applications
SQL is Critical
“….the complexity of dealing with a
non-ACID data store in every part of
our business logic would be too
great, and there was simply no way
our business could function without
SQL queries.”
Google, VLDB 2013
“[Facebook] started in the Hadoop
world. We are now bringing in
relational to enhance that. ... [we]
realized that using the wrong
technology for certain kinds of
problems can be difficult.”
Ken Rudin, Facebook, TDWI 2013
http://tdwi.org/articles/2013/05/06/facebooks-relational-platform.aspx
https://www.linkedin.com/groups/Find-out-why-Google-decided-4434815.S.273792742
Program Agenda
1 Analytics
2 Model
3 Pattern Matching
4 External Tables
5 SQL over Hadoop
Analytics
Ordered Array Semantics in SQL queries
Select deptno, ename, sal,
       Row_number() over (partition by deptno
                          Order by sal desc)
  from emp
Deptno Ename Sal Row_number()
10 King 5000 1
10 Clark 2450 2
10 Miller 1300 3
20 SCOTT 3000 1
20 FORD 3000 2
20 JONES 2975 3
20 ADAMS 1100 4
20 SMITH 800 5
30 (rows not shown)
Why Analytics
• A running total
• Percentages within a group
• Top-N queries
• Moving Averages
• Ranking Queries
• Medians
• And the list is infinitely long
– "Analytics are the coolest thing to happen to SQL since the keyword Select"
Find the average amount of time between
patient visits
TKYTE@ORA12C> create table patient_visits
2 as
3 select distinct
4 object_type||
5 trunc(row_number() over
6 (partition by object_type
7 order by created) / 250) patient_id,
8 created+rownum visit_date
9 from all_objects;
Table created.
TKYTE@ORA12C> alter table patient_visits
2 add constraint
3 patient_visits_pk
4 primary key(patient_id,visit_date);
Table altered.
TKYTE@ORA12C> select avg(visit_date-last_visit_date)
2 from (
3 select t1.visit_date, max(t2.visit_date) last_visit_date
4 from patient_visits t1, patient_visits t2
5 where t1.patient_id = t2.patient_id
6 and t2.visit_date < t1.visit_date
7 group by t1.patient_id, t1.visit_date
8 )
9 /
AVG(VISIT_DATE-LAST_VISIT_DATE)
-------------------------------
23.5868092
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 16 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 2 5.69 5.78 0 598 0 1
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 4 5.70 5.79 0 614 0 1
Rows (1st) Row Source Operation
---------- ---------------------------------------------------
1 SORT AGGREGATE (cr=598 pr=0 pw=0 time=5782851 us)
89224 VIEW (cr=598 pr=0 pw=0 time=6920605 us cost=37226 size=1612980 card
89224 HASH GROUP BY (cr=598 pr=0 pw=0 time=6021974 us cost=37226 size=340
11045765 HASH JOIN (cr=598 pr=0 pw=0 time=3936745 us cost=488 size=3952522
89610 TABLE ACCESS FULL PATIENT_VISITS (cr=299 pr=0 pw=0 time=11514 us
89610 TABLE ACCESS FULL PATIENT_VISITS (cr=299 pr=0 pw=0 time=38901 us
TKYTE@ORA12C> select avg( visit_date-last_visit_date )
2 from (
3 select visit_date,
4 (select max(visit_date)
5 from patient_visits t2
6 where t2.patient_id = t1.patient_id
7 and t2.visit_date < t1.visit_date) last_visit_date
8 from patient_visits t1
9 )
10 /
AVG(VISIT_DATE-LAST_VISIT_DATE)
-------------------------------
23.5868092
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 2 6.85 6.90 360 30919 0 1
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 4 6.85 6.91 360 30919 0 1
Rows (1st) Row Source Operation
---------- ---------------------------------------------------
89610 SORT AGGREGATE (cr=30620 pr=360 pw=0 time=5432914 us)
89224 FIRST ROW (cr=30620 pr=360 pw=0 time=3154562 us cost=2 size=19 card
89224 INDEX RANGE SCAN (MIN/MAX) PATIENT_VISITS_PK (cr=30620 pr=360 pw=0
1 SORT AGGREGATE (cr=30919 pr=360 pw=0 time=6908877 us)
89610 TABLE ACCESS FULL PATIENT_VISITS (cr=299 pr=0 pw=0 time=403891 us co
TKYTE@ORA12C> select avg(visit_date-last_visit_date)
2 from (
3 select visit_date,
4 lag(visit_date) over
5 (partition by patient_id
6 order by visit_date)
7 as last_visit_date
8 from patient_visits
9 )
10 /
AVG(VISIT_DATE-LAST_VISIT_DATE)
-------------------------------
23.5868092
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 2 0.19 0.19 0 361 0 1
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 4 0.19 0.20 0 361 0 1
Rows (1st) Row Source Operation
---------- ---------------------------------------------------
1 SORT AGGREGATE (cr=361 pr=0 pw=0 time=197264 us)
89610 VIEW (cr=361 pr=0 pw=0 time=2751836 us cost=362 size=1612980 card=8
89610 WINDOW BUFFER (cr=361 pr=0 pw=0 time=771339 us cost=362 size=170259
89610 INDEX FULL SCAN PATIENT_VISITS_PK (cr=361 pr=0 pw=0 time=640345 us
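The three queries above return the same average, but the LAG version reads the data once in (patient_id, visit_date) order and compares each row with its predecessor. As a rough procedural sketch of that idea (written in Python rather than SQL, with a made-up sample and a hypothetical function name `average_gap_days`), the computation looks like this:

```python
from collections import defaultdict
from datetime import date


def average_gap_days(visits):
    """Average number of days between consecutive visits of the same patient.

    visits: iterable of (patient_id, visit_date) pairs, in any order.
    A patient's first visit has no predecessor and contributes no gap, which
    mirrors AVG() ignoring the NULL that LAG() returns for the first row of
    each partition.
    """
    by_patient = defaultdict(list)
    for patient_id, visit_date in visits:
        by_patient[patient_id].append(visit_date)

    gaps = []
    for dates in by_patient.values():
        dates.sort()  # ORDER BY visit_date within the partition
        for previous, current in zip(dates, dates[1:]):
            gaps.append((current - previous).days)
    return sum(gaps) / len(gaps) if gaps else None


# Hypothetical sample data, not taken from the slides.
SAMPLE_VISITS = [
    ("A", date(2014, 1, 1)),
    ("A", date(2014, 1, 11)),
    ("A", date(2014, 1, 31)),
    ("B", date(2014, 2, 1)),
    ("B", date(2014, 2, 5)),
]

print(average_gap_days(SAMPLE_VISITS))
```

Each row is touched once after sorting, which is the same reason the LAG plan above needs only a single index scan instead of a self-join or one index probe per row.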
Oracle Advanced Analytics Database Option
Fastest Way to Deliver Scalable Enterprise-wide Predictive Analytics
Key Features
• In-database data mining algorithms and open source R algorithms
• SQL, PL/SQL, R languages
• Scalable, parallel in-database execution
• Workflow GUI and IDEs
• Integrated component of Database
• Enables enterprise analytical applications
Be Specific in Problem Statement
Poorly Defined | Better | Data Mining Technique
Predict employees that leave | Based on past employees that voluntarily left: create new attribute EmplTurnover → 0/1
Predict customers that churn | Based on past customers that have churned: create new attribute Churn → YES/NO
Target “best” customers | Recency, Frequency, Monetary (RFM) analysis; specific dollar amount over a time window: who has spent $500+ in the most recent 18 months
How can I make more $$? | What helps me sell soft drinks & coffee?
Which customers are likely to buy? | How much is each customer likely to spend?
Who are my “best customers”? | What descriptive “rules” describe “best customers”?
How can I combat fraud? | Which transactions are the most anomalous? Then roll up to physician, claimant, employee, etc.
12c New Features
• Predictive Queries
– Immediate build/apply of ODM
models in SQL query
• Classification & regression
– Multi-target problems
• Clustering query
• Anomaly query
• Feature extraction query
New Server Functionality
Select
cust_income_level, cust_id,
round(probanom,2) probanom, round(pctrank,3)*100 pctrank from (
select
cust_id, cust_income_level, probanom,
percent_rank()
over (partition by cust_income_level order by probanom desc) pctrank
from (
select
cust_id, cust_income_level,
prediction_probability(of anomaly, 0 using *)
over (partition by cust_income_level) probanom
from customers
)
)
where pctrank <= .05
order by cust_income_level, probanom desc;
OAA automatically creates multiple anomaly
detection models “Grouped_By” and “scores” by
partition via powerful SQL query
Fraud Prediction Demo
drop table CLAIMS_SET;
exec dbms_data_mining.drop_model('CLAIMSMODEL');
create table CLAIMS_SET (setting_name varchar2(30), setting_value varchar2(4000));
insert into CLAIMS_SET values ('ALGO_NAME','ALGO_SUPPORT_VECTOR_MACHINES');
insert into CLAIMS_SET values ('PREP_AUTO','ON');
commit;
begin
dbms_data_mining.create_model('CLAIMSMODEL', 'CLASSIFICATION',
'CLAIMS', 'POLICYNUMBER', null, 'CLAIMS_SET');
end;
/
-- Top 5 most suspicious fraud policy holder claims
select * from
(select POLICYNUMBER, round(prob_fraud*100,2) percent_fraud,
rank() over (order by prob_fraud desc) rnk from
(select POLICYNUMBER, prediction_probability(CLAIMSMODEL, '0' using *) prob_fraud
from CLAIMS
where PASTNUMBEROFCLAIMS in ('2to4', 'morethan4')))
where rnk <= 5
order by percent_fraud desc;
Automated In-DB Analytical Methodology
POLICYNUMBER PERCENT_FRAUD RNK
------------ ------------- ----------
6532 64.78 1
2749 64.17 2
3440 63.22 3
654 63.1 4
12650 62.36 5
Automated Monthly “Application”! Just add:
Create View CLAIMS2_30
As
Select * from CLAIMS2
Where mydate > SYSDATE - 30
Time measure: set timing on;
SQL Developer/Oracle Data Miner 4.0
New Features
• SQL Script Generation
– Deploy entire methodology as a SQL script
– Immediate deployment of data analyst’s methodologies
Model Clause
• A feature of the database since 10g (2003!)
• Spreadsheet like construct
• Procedural processing in a non-procedural language
I need to have running totals that group
rows into groups such that the total for that
group does not exceed some threshold
Model Clause
Threshold = 65,000
Site cnt
----- --------
1001 10000
1002 20000
1003 30500
1004 50000
1005 25000
1006 36000
1007 28000
1008 21000
St_key end_key total
------ ------- -----
1001 1003 60500
1004 1004 50000
1005 1006 61000
1007 1008 49000
TKYTE@ORA12C> select start_site, max(end_site), max(running_total)
2 from
3 (
4 select *
5 from
6 ( select start_site, end_site, cnt, running_total, rn
7 from site_data
8 model dimension by(row_number()
9 over(order by site) rn)
10 measures(site start_site, site end_site, cnt, cnt running_total)
11 rules(running_total[rn > 1] =
12 case when (running_total[cv() - 1] + cnt[cv()]) > 65000
13 or cnt[cv()] > 65000
14 then cnt[cv()]
15 else running_total[cv() - 1] + cnt[cv()]
16 end,
17 start_site[rn > 1] =
18 case when(running_total[cv() - 1] + cnt[cv()]) > 65000
19 or cnt[cv()] > 65000
20 then start_site[cv()]
21 else start_site[cv() - 1]
22 end
23 )
24 )
25 )
26 group by start_site
27 order by start_site
28 /
START_SITE MAX(END_SITE) MAX(RUNNING_TOTAL)
---------- ------------- ------------------
1001 1003 60500
1004 1004 50000
1005 1006 61000
1007 1008 49000
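The MODEL rules above are evaluated row by row in rn order: a row either extends the running total of the row before it or, if that would exceed the threshold, starts a new group. A minimal procedural sketch of the same logic follows, in Python rather than SQL; the function name `group_by_threshold` is an illustrative choice, and the input is the site_data listing from the slides.

```python
def group_by_threshold(rows, threshold=65000):
    """Group consecutive (site, cnt) rows so no group total exceeds threshold.

    rows must already be ordered by site. A new group starts when adding the
    row would push the running total over the threshold, or when the row
    alone exceeds it (the same two conditions as the CASE expressions in the
    MODEL rules). Returns a list of (start_site, end_site, total).
    """
    groups = []
    for site, cnt in rows:
        fits = bool(groups) and groups[-1][2] + cnt <= threshold and cnt <= threshold
        if fits:
            start_site, _, running_total = groups[-1]
            groups[-1] = (start_site, site, running_total + cnt)
        else:
            groups.append((site, site, cnt))
    return groups


# The site_data rows shown on the slide.
SITE_DATA = [
    (1001, 10000), (1002, 20000), (1003, 30500), (1004, 50000),
    (1005, 25000), (1006, 36000), (1007, 28000), (1008, 21000),
]

for start_site, end_site, total in group_by_threshold(SITE_DATA):
    print(start_site, end_site, total)
```

The point of the MODEL clause is that this kind of "carry state forward, reset on a condition" loop can be expressed inside a single SQL statement instead of in client code.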
SITE CNT
---------- ----------
1001 10000
1002 20000
1003 30500
1004 50000
1005 25000
1006 36000
1007 28000
1008 21000
START_SITE END_SITE CNT RUNNING_TOTAL RN
---------- ---------- ---------- ------------- ----------
1001 1001 10000 10000 1
1001 1002 20000 30000 2
1001 1003 30500 60500 3
1004 1004 50000 50000 4
1005 1005 25000 25000 5
1005 1006 36000 61000 6
1007 1007 28000 28000 7
1007 1008 21000 49000 8
TKYTE@ORA12C> select *
2 from
3 ( select start_site, end_site, cnt, running_total, rn
4 from site_data
5 model dimension by(row_number()
6 over(order by site) rn)
7 measures(site start_site, site end_site, cnt, cnt running_total)
8 rules(running_total[rn > 1] =
9 case when (running_total[cv() - 1] + cnt[cv()]) > 65000
10 or cnt[cv()] > 65000
11 then cnt[cv()]
12 else running_total[cv() - 1] + cnt[cv()]
13 end,
14 start_site[rn > 1] =
15 case when(running_total[cv() - 1] + cnt[cv()]) > 65000
16 or cnt[cv()] > 65000
17 then start_site[cv()]
18 else start_site[cv() - 1]
19 end
20 )
21 )
I need to group a series of audit trail records
together based on how close they are to
each other. All records within three seconds
of each other should be in a group
Row Pattern Matching
• This can be done with analytics
• Requires three passes
– Tag records with ROW_NUMBER as RN that have a timestamp more than 3 seconds away from the prior record
– Carry down the maximum RN; all records in a group will now have the same RN
– Aggregate
• Can be done with modeling in multiple passes as well
– One to create the group identifier
– One to aggregate
• Can be done in a single pass *easily* with pattern matching
ops$tkyte%ORA12CR1> select *
2 from t
3 match_recognize
4 ( order by x
5 measures first(x) start_time,
6 last(x) end_time,
7 sum(y) sum_y
8 one row per match
9 after match skip past last row
10 pattern (any_row another_row_within_3_secs*)
11 define
12 another_row_within_3_secs as (x-prev(x)) <= 3/24/60/60
13 );
TKYTE@ORA12C> select *
2 from (select created x, 1 y
3 from all_objects)
4 match_recognize
5 ( order by x
6 measures first(x) start_time,
7 last(x) end_time,
8 sum(y) sum_y
9 one row per match
10 after match skip past last row
11 pattern (any_row another_row_within_3_secs*)
12 define
13 another_row_within_3_secs as (x-prev(x)) <= 3/24/60/60
14 )
15 order by start_time
16 /
START_TIME END_TIME SUM_Y
-------------------- -------------------- ----------
01-aug-2014 16:28:39 01-aug-2014 16:28:42 182
01-aug-2014 16:28:46 01-aug-2014 16:28:47 62
01-aug-2014 16:28:55 01-aug-2014 16:29:32 1075
01-aug-2014 16:31:13 01-aug-2014 16:32:30 4484
01-aug-2014 16:49:15 01-aug-2014 16:52:14 8315
01-aug-2014 16:52:18 01-aug-2014 16:53:20 1582
01-aug-2014 16:53:56 01-aug-2014 16:54:34 1204
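The pattern `any_row another_row_within_3_secs*` reads as: take one row, then keep absorbing rows for as long as each one is within three seconds of the row before it. A small procedural sketch of that grouping, in Python rather than SQL, is shown below; the function name `group_by_gap` and the sample timestamps are hypothetical.

```python
from datetime import datetime, timedelta


def group_by_gap(rows, max_gap=timedelta(seconds=3)):
    """Group (timestamp, y) rows whose neighbours are at most max_gap apart.

    Returns a list of (start_time, end_time, sum_y), one entry per group,
    which corresponds to ONE ROW PER MATCH with first(x), last(x) and sum(y).
    """
    groups = []
    previous = None
    for ts, y in sorted(rows, key=lambda row: row[0]):  # ORDER BY x
        if previous is not None and ts - previous <= max_gap:
            start_time, _, sum_y = groups[-1]
            groups[-1] = (start_time, ts, sum_y + y)  # another_row_within_3_secs
        else:
            groups.append((ts, ts, y))  # any_row: starts a new match
        previous = ts
    return groups


# Hypothetical audit timestamps, given as second offsets from a base time.
BASE = datetime(2014, 8, 1, 16, 28, 39)
SAMPLE_ROWS = [(BASE + timedelta(seconds=s), 1) for s in (0, 2, 3, 7, 8, 16)]

for start_time, end_time, sum_y in group_by_gap(SAMPLE_ROWS):
    print(start_time, end_time, sum_y)
```

Note that the comparison is always against the immediately preceding row, not against the first row of the group, so a group can span far more than three seconds, as the result listing above shows.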
ops$tkyte@ora12cr1> SELECT *
2 FROM stocks MATCH_RECOGNIZE
3 ( PARTITION BY symbol
4 ORDER BY tstamp
5 MEASURES
6 STRT.tstamp AS start_tstamp,
7 LAST(DOWN.tstamp) AS bottom_tstamp,
8 LAST(UP.tstamp) AS end_tstamp
9 ONE ROW PER MATCH
10 AFTER MATCH SKIP TO LAST UP
11 PATTERN (STRT DOWN+ UP+)
12 DEFINE
13 DOWN AS DOWN.price < PREV(DOWN.price),
14 UP AS UP.price > PREV(UP.price)
15 ) MR
16 ORDER BY MR.symbol, MR.start_tstamp;
SYMBOL START_TST BOTTOM_TS END_TSTAM
---------- --------- --------- ---------
ORCL 01-SEP-12 03-SEP-12 07-SEP-12
ORCL 07-SEP-12 10-SEP-12 13-SEP-12
SYMBOL TSTAMP PRICE HIST
---------- --------- ---------- ----------------------------------------
ORCL 01-SEP-12 35 ***********************************
ORCL 02-SEP-12 34 **********************************
ORCL 03-SEP-12 33 *********************************
ORCL 04-SEP-12 34 **********************************
ORCL 05-SEP-12 35 ***********************************
ORCL 06-SEP-12 36 ************************************
ORCL 07-SEP-12 37 *************************************
ORCL 08-SEP-12 36 ************************************
ORCL 09-SEP-12 35 ***********************************
ORCL 10-SEP-12 34 **********************************
ORCL 11-SEP-12 35 ***********************************
ORCL 12-SEP-12 36 ************************************
ORCL 13-SEP-12 37 *************************************
13 rows selected.
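The pattern `STRT DOWN+ UP+` describes a V shape: a starting row, one or more falling prices, then one or more rising prices, with the next search resuming at the last rising row. The sketch below walks one symbol's rows the same way in Python rather than SQL. It is an illustration under assumptions, not how the database evaluates MATCH_RECOGNIZE: the function name `find_v_shapes` is made up, and the simple greedy scan is only adequate for this particular pattern.

```python
def find_v_shapes(prices):
    """Find V shapes (fall then rise) in one symbol's price history.

    prices: list of (tstamp, price) ordered by tstamp.
    Returns (start_tstamp, bottom_tstamp, end_tstamp) tuples.
    """
    matches = []
    i, n = 0, len(prices)
    while i < n - 2:  # a match needs at least STRT, one DOWN, one UP
        # DOWN+: consume rows while the price is lower than the previous row.
        j = i + 1
        while j < n and prices[j][1] < prices[j - 1][1]:
            j += 1
        bottom = j - 1
        if bottom == i:  # no DOWN row: retry from the next row
            i += 1
            continue
        # UP+: consume rows while the price is higher than the previous row.
        k = j
        while k < n and prices[k][1] > prices[k - 1][1]:
            k += 1
        end = k - 1
        if end == bottom:  # no UP row: retry from the next row
            i += 1
            continue
        matches.append((prices[i][0], prices[bottom][0], prices[end][0]))
        i = end  # AFTER MATCH SKIP TO LAST UP
    return matches


# The ORCL rows from the listing above.
ORCL = [
    ("01-SEP-12", 35), ("02-SEP-12", 34), ("03-SEP-12", 33), ("04-SEP-12", 34),
    ("05-SEP-12", 35), ("06-SEP-12", 36), ("07-SEP-12", 37), ("08-SEP-12", 36),
    ("09-SEP-12", 35), ("10-SEP-12", 34), ("11-SEP-12", 35), ("12-SEP-12", 36),
    ("13-SEP-12", 37),
]

for match in find_v_shapes(ORCL):
    print(match)
```

Because the search resumes at the last UP row, the peak on 07-SEP-12 ends the first match and starts the second one.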
External Tables
SQL*Loader (sqlldr) is the legacy data loading tool from the 20th century
• Query flat files
• Query datapump format files
• Query output of programs (10.2.0.5 and above)
– Load compressed files without uncompressing
– Query program output, like ls, ps, df, etc
• Query HDFS/Hive
• Infinite possibilities
ops$tkyte%ORA12C> CREATE TABLE EMP_ET
2 (
3 "EMPNO" NUMBER(4),
4 "ENAME" VARCHAR2(10),
5 "JOB" VARCHAR2(9),
6 "MGR" NUMBER(4),
7 "HIREDATE" DATE,
8 "SAL" NUMBER(7,2),
9 "COMM" NUMBER(7,2),
10 "DEPTNO" NUMBER(2)
11 )
12 ORGANIZATION external
13 ( TYPE oracle_loader
14 DEFAULT DIRECTORY load_dir
15 ACCESS PARAMETERS
16 ( RECORDS DELIMITED BY NEWLINE
17 preprocessor exec_dir:'run_gunzip.sh'
18 FIELDS TERMINATED BY "|" LDRTRIM
19 )
20 location ( 'emp.dat.gz')
21 )
22 /
Table created.
ops$tkyte%ORA12C> !file emp.dat.gz
emp.dat.gz: gzip compressed data, was "emp.dat", from Unix, last …
ops$tkyte%ORA12C> !cat run_gunzip.sh
#!/bin/bash
/usr/bin/gunzip -c $*
ops$tkyte%ORA11GR2> select empno, ename from emp_et where rownum <= 5;
EMPNO ENAME
---------- ----------
7369 SMITH
7499 ALLEN
7521 WARD
7566 JONES
7654 MARTIN
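Conceptually, the preprocessor turns the compressed file into a stream of text, and the access parameters then split each record on "|" and trim the fields. A rough Python sketch of that two-step flow is given below; the function name `parse_gz_records` is hypothetical, and the two sample records are an assumption about the file layout, since the slides do not show the contents of emp.dat.

```python
import gzip


def parse_gz_records(compressed, delimiter="|"):
    """Decompress gzip data and split each newline-delimited record into fields.

    This only imitates the combination of a gunzip preprocessor, RECORDS
    DELIMITED BY NEWLINE and FIELDS TERMINATED BY "|" with trimming; it is
    not a model of how the ORACLE_LOADER driver works internally.
    """
    text = gzip.decompress(compressed).decode("utf-8")
    return [
        [field.strip() for field in line.split(delimiter)]
        for line in text.splitlines()
        if line  # ignore empty lines
    ]


# Assumed layout: empno|ename, using two names from the query output above.
SAMPLE_GZ = gzip.compress(b"7369|SMITH\n7499|ALLEN\n")

print(parse_gz_records(SAMPLE_GZ))
```

The table definition never needs an uncompressed copy of the file on disk; the data is expanded only while the query runs.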
SQL> !cat /home/tkyte/run_df.sh
#!/bin/bash
/bin/df -Pl
SQL> !/home/tkyte/run_df.sh
Filesystem 1024-blocks Used Available Capacity Mounted on
/dev/mapper/VolGr... 18156292 10827600 6391528 63% /
/dev/sda1 101086 12062 83805 13% /boot
tmpfs 517520 0 517520 0% /dev/shm
SQL> create table df
2 (
3 fsname varchar2(100),
4 blocks number,
5 used number,
6 avail number,
7 capacity varchar2(10),
8 mount varchar2(100)
9 )
10 organization external
11 (
12 type oracle_loader
13 default directory exec_dir
14 access parameters
15 (
16 records delimited
17 by newline
18 preprocessor
19 exec_dir:'run_df.sh'
20 skip 1
21 fields terminated by
22 whitespace ldrtrim
23 )
24 location
25 (
26 exec_dir:'run_df.sh'
27 )
28 )
29 /
Table created.
SQL> select * from df;
FSNAME BLOCKS USED AVAIL CAPACITY MOUNT
------------------------------- -------- -------- ------- ------- ------
/dev/mapper/VolGroup00-LogVol00 18156292 10827600 6391528 63% /
/dev/sda1 101086 12062 83805 13% /boot
tmpfs 517520 0 517520 0% /dev/shm
with fs_data
as
(select /*+ materialize */ * from df)  -- materialize the external table rows once
select mount,
       file_name,
       bytes,
       tot_bytes,
       avail_bytes,
       case
         when 0.2 * tot_bytes < avail_bytes
         then 'OK'
         else 'Short on disk space'
       end status
  from (
        -- total datafile bytes per mount point
        select file_name, mount, avail_bytes, bytes,
               sum(bytes) over
                 (partition by mount) tot_bytes
          from (
                -- rn = 1 marks the longest mount point that prefixes each file name
                select a.file_name,
                       b.mount,
                       b.avail*1024 avail_bytes, a.bytes,
                       row_number() over
                         (partition by a.file_name
                          order by length(b.mount) DESC) rn
                  from dba_data_files a,
                       fs_data b
                 where a.file_name
                       like b.mount || '%'
               )
         where rn = 1
       )
 order by mount, file_name
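The logic of the query above can be sketched outside the database as well. The following is a minimal illustration, not the author's code: each datafile is assigned to the longest mount point that prefixes its name, datafile bytes are totalled per mount, and a mount is flagged when free space is not more than 20% of that total. The function name and the sample file names and sizes are hypothetical; the two filesystem rows reuse the Available figures from the df output.

```python
def disk_space_report(data_files, filesystems, threshold=0.2):
    """data_files: list of (file_name, bytes); filesystems: list of (mount, avail_kb)."""
    matched = []
    for file_name, size in data_files:
        # Keep the longest mount point that prefixes the file name (rn = 1 in the query).
        candidates = [fs for fs in filesystems if file_name.startswith(fs[0])]
        if not candidates:
            continue
        mount, avail_kb = max(candidates, key=lambda fs: len(fs[0]))
        matched.append((mount, file_name, size, avail_kb * 1024))

    # Total datafile bytes per mount (sum(bytes) over (partition by mount)).
    totals = {}
    for mount, _, size, _ in matched:
        totals[mount] = totals.get(mount, 0) + size

    report = []
    for mount, file_name, size, avail_bytes in sorted(matched):
        ok = threshold * totals[mount] < avail_bytes
        status = "OK" if ok else "Short on disk space"
        report.append((mount, file_name, size, totals[mount], avail_bytes, status))
    return report


# Hypothetical datafiles; avail_kb values are taken from the df output.
filesystems = [("/", 6391528), ("/boot", 83805)]
data_files = [
    ("/u01/oradata/system01.dbf", 700 * 1024 * 1024),
    ("/boot/misplaced01.dbf", 500 * 1024 * 1024),
]
report = disk_space_report(data_files, filesystems)
for line in report:
    print(line)
```

Every file name starts with "/", so the root mount always matches; ordering candidates by length is what lets "/boot" win for files that live there.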
Program Agenda
1 Analytics
2 Model
3 Pattern Matching
4 External Tables
5 SQL over Hadoop
Publish Hadoop Metadata to Oracle Catalog
[Diagram: Big Data Appliance + Hadoop (HDFS NameNode, HDFS DataNodes, Hive metadata) and Exadata + Oracle Database (Oracle Catalog, external table)]
create table customer_address
( ca_customer_id number(10,0)
, ca_street_number char(10)
, ca_state char(2)
, ca_zip char(10)
)
organization external (
  TYPE ORACLE_HIVE
  DEFAULT DIRECTORY DEFAULT_DIR
  ACCESS PARAMETERS
    (com.oracle.bigdata.cluster hadoop_cl_1)
  LOCATION ('hive://customer_address')
)
• SerDe
• RecordReader
• InputFormat
Executing Queries on Hadoop
[Diagram: Oracle Catalog and external table on the database side; HDFS NameNode, HDFS DataNodes, and Hive metadata on the Hadoop side]
Select c_customer_id
     , c_customer_last_name
     , ca_county
  From customers
     , customer_address
 where c_customer_id = ca_customer_id
   and ca_state = 'CA'
Determine:
• Data locations
• Data structure
• Parallelism
Send to specific data nodes:
• Data request
• Context
Do I/O and Smart Scan:
• Filter rows
• Project columns
Move only relevant data
• Relevant rows
• Relevant columns
Apply join with database data
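The division of labor in these steps can be sketched in a few lines. This is a simplified illustration rather than Oracle's implementation: one function stands in for the work done on a data node (filter rows, project columns), and another for the join applied on the database side. The function names and sample rows are hypothetical, and the join assumes at most one address row per customer.

```python
def smart_scan(block_rows, predicate, columns):
    """Data-node side: filter rows, then keep only the requested columns."""
    return [{c: row[c] for c in columns} for row in block_rows if predicate(row)]


def join_with_database(customers, address_rows):
    """Database side: join the reduced Hadoop rows to locally stored customers."""
    by_id = {row["ca_customer_id"]: row for row in address_rows}
    return [
        {**cust, **by_id[cust["c_customer_id"]]}
        for cust in customers
        if cust["c_customer_id"] in by_id
    ]


# Hypothetical rows held on a data node.
address_block = [
    {"ca_customer_id": 1, "ca_state": "CA", "ca_county": "Marin", "ca_zip": "94901"},
    {"ca_customer_id": 2, "ca_state": "NY", "ca_county": "Kings", "ca_zip": "11201"},
    {"ca_customer_id": 3, "ca_state": "CA", "ca_county": "Yolo", "ca_zip": "95616"},
]
# Hypothetical rows held in the database.
customers = [
    {"c_customer_id": 1, "c_customer_last_name": "Smith"},
    {"c_customer_id": 2, "c_customer_last_name": "Jones"},
]

reduced = smart_scan(
    address_block,
    lambda row: row["ca_state"] == "CA",
    ["ca_customer_id", "ca_county"],
)
result = join_with_database(customers, reduced)
print(result)
```

Only two narrow rows leave the data node here, instead of three full rows, which is the whole purpose of filtering and projecting before data moves.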
Optimizing Scans on Hadoop
Storage Indexes
• Automatically collect and store the minimum and maximum value within a storage unit
• Before scanning a storage unit, verify whether the data required falls within the min-max range
• If not, skip scanning the block and reduce scan time
[Diagram: HDFS NameNode with Hive metadata; HDFS DataNodes holding "Blocks", each tagged with its Min and Max]
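The min-max idea can be shown with a small sketch. It assumes non-empty blocks and an equality predicate, and it is only an illustration of the technique, not the product's code; the function names and the sample blocks are hypothetical.

```python
def build_storage_index(blocks, column):
    """Record the (min, max) of a column for every block; blocks must be non-empty."""
    return [
        (min(row[column] for row in block), max(row[column] for row in block))
        for block in blocks
    ]


def scan_with_index(blocks, index, column, value):
    """Scan only the blocks whose min-max range could contain the value."""
    hits, scanned = [], 0
    for block, (low, high) in zip(blocks, index):
        if value < low or value > high:
            continue  # the value cannot be in this block, so skip the I/O
        scanned += 1
        hits.extend(row for row in block if row[column] == value)
    return hits, scanned


# Hypothetical data: three blocks with non-overlapping id ranges.
blocks = [
    [{"id": 1}, {"id": 5}, {"id": 9}],
    [{"id": 10}, {"id": 14}, {"id": 19}],
    [{"id": 20}, {"id": 25}, {"id": 29}],
]
index = build_storage_index(blocks, "id")
hits, scanned = scan_with_index(blocks, index, "id", 14)
print(hits, scanned)
```

The benefit depends on how the data is laid out: when values are clustered, most blocks are skipped, and when every block spans the full range, nothing is.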
What if You Could Query All Data without Conversion?
Store unconverted JSON data in Hadoop
Store business-critical data in Oracle (JSON or Relational)
Select customers_document.address.state
     , sum(revenue)
  from customers, sales
 where customers_document.id=sales.custID
 group by customers_document.address.state;
Push Down to Hadoop
• JSON parsing
• Column projection
• Bloom filter for faster join
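To make the Bloom filter step concrete, here is a minimal sketch under stated assumptions: the filter is built from the join keys on the database side (sales.custID) and used on the Hadoop side to discard customer documents that cannot join. Which side builds the filter, the class itself, its sizing, and the sample data are all assumptions for illustration. A Bloom filter can report false positives but never false negatives, so an exact join still runs afterwards.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter; a Python int serves as the bit array."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = 0

    def _positions(self, key):
        # Derive num_hashes bit positions by salting SHA-256 with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


# Hypothetical join keys from the sales table held in the database.
sales_cust_ids = [101, 102, 105]
bloom = BloomFilter()
for cust_id in sales_cust_ids:
    bloom.add(cust_id)

# Hypothetical JSON documents held in Hadoop.
customer_docs = [{"id": i, "address": {"state": "CA"}} for i in range(100, 107)]

# Hadoop side: ship only documents whose id might have a matching sale.
shipped = [doc for doc in customer_docs if bloom.might_contain(doc["id"])]

# Database side: the exact join removes any false positives.
wanted = set(sales_cust_ids)
joined = [doc for doc in shipped if doc["id"] in wanted]
print(sorted(doc["id"] for doc in joined))
```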
What if You Could Govern All Data?
DBMS_REDACT.ADD_POLICY(
  object_schema => 'txadp_hive_01',
  object_name   => 'customer_address_ext',
  column_name   => 'ca_street_name',
  policy_name   => 'customer_address_redaction',
  function_type => DBMS_REDACT.RANDOM,
  expression    => 'SYS_CONTEXT(''SYS_SESSION_ROLES'', ''REDACTION_TESTER'') = ''TRUE'''
);
Store unconverted JSON data in Hadoop
Store business-critical data in Oracle (JSON or Relational)
Apply advanced security on Hadoop-resident data
• Masking/Redaction
• Virtual Private Database
• Fine-Grained Access Controls
Data lives in even more places
[Diagram: SQL spanning Relational, Hadoop, NoSQL, and more…]
The magic of Storage Handlers
BIWA Summit
January 27-29, 2015
Oracle HQ Conference Center
www.biwasummit.org
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 

Recently uploaded (20)

Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsSnow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning era
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 

SQL is the best language for analyzing Big Data

  • 2. SQL is the best language for Big Data Thomas Kyte http://asktom.oracle.com/ Copyright © 2014, Oracle and/or its affiliates. All rights reserved. |
  • 3. Safe Harbor Statement: The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.
  • 4. Oracle Database Support for All Data
      • Structured Data: Numeric, String, Date, …; row and column formats
      • Unstructured Data: LOB, Text, XML, JSON, Spatial, Graph
  • 5. Oracle Support for Any Data Management System
      • Relational (Run the Business): scale-out and scale-up; collect any data; SQL; transactional and analytic applications for the enterprise; secure and highly available
      • Hadoop (Change the Business): scale-out, low cost store; collect any data; Map-reduce, SQL; analytic applications
      • NoSQL (Scale the Business): scale-out, low cost store; collect key-value data; find data by key; web applications
  • 6. SQL is Critical
      “... the complexity of dealing with a non-ACID data store in every part of our business logic would be too great, and there was simply no way our business could function without SQL queries.” Google, VLDB 2013
      “[Facebook] started in the Hadoop world. We are now bringing in relational to enhance that. ... [we] realized that using the wrong technology for certain kinds of problems can be difficult.” Ken Rudin, Facebook, TDWI 2013
      http://tdwi.org/articles/2013/05/06/facebooks-relational-platform.aspx
      https://www.linkedin.com/groups/Find-out-why-Google-decided-4434815.S.273792742
  • 7. Program Agenda
      1. Analytics
      2. Model
      3. Pattern Matching
      4. External Tables
      5. SQL over Hadoop
  • 9. Analytics: Ordered Array Semantics in SQL queries
        select deptno, ename, sal,
               row_number() over (partition by deptno
                                  order by sal desc)
          from emp

        Deptno  Ename     Sal  Row_number()
        ------  ------  -----  ------------
        10      King     5000  1
        10      Clark    2450  2
        10      Miller   1300  3
        20      SCOTT    3000  1
        20      FORD     3000  2
        20      JONES    2975  3
        20      ADAMS    1100  4
        20      SMITH     800  5
        30      …
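To make the partition-and-order idea on this slide concrete, here is a minimal Python sketch of what `row_number() over (partition by deptno order by sal desc)` computes. It is an illustration of the semantics only, not how the database executes the query; the function name `row_number_by_dept` is hypothetical, and the rows are the department 10 and 20 values shown on the slide.

```python
from itertools import groupby
from operator import itemgetter

# (deptno, ename, sal) rows as they appear on the slide.
emp = [
    (10, "King", 5000), (10, "Clark", 2450), (10, "Miller", 1300),
    (20, "SCOTT", 3000), (20, "FORD", 3000), (20, "JONES", 2975),
    (20, "ADAMS", 1100), (20, "SMITH", 800),
]

def row_number_by_dept(rows):
    """Number rows 1..n inside each deptno, highest salary first."""
    # Sort by the partition key, then by salary descending (the ORDER BY).
    # Ties (SCOTT and FORD) keep their input order here; in SQL the order
    # of tied rows is not guaranteed.
    ordered = sorted(rows, key=lambda r: (r[0], -r[2]))
    numbered = []
    for _, group in groupby(ordered, key=itemgetter(0)):
        # The counter restarts at 1 for every partition.
        for rn, (deptno, ename, sal) in enumerate(group, start=1):
            numbered.append((deptno, ename, sal, rn))
    return numbered

for row in row_number_by_dept(emp):
    print(row)
```

The point of the analytic function is that this numbering happens alongside the detail rows, without collapsing them the way GROUP BY would.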
  • 10. Why Analytics
      • A running total
      • Percentages within a group
      • Top-N queries
      • Moving averages
      • Ranking queries
      • Medians
      • And the list is infinitely long
      "Analytics are the coolest thing to happen to SQL since the keyword Select"
  • 11. Find the average amount of time between patient visits
  • 12.
        TKYTE@ORA12C> create table patient_visits
        as
        select distinct
               object_type ||
               trunc(row_number() over
                       (partition by object_type
                        order by created) / 250) patient_id,
               created + rownum visit_date
          from all_objects;

        Table created.

        TKYTE@ORA12C> alter table patient_visits
          add constraint
          patient_visits_pk
          primary key (patient_id, visit_date);

        Table altered.
  • 13.
        TKYTE@ORA12C> select avg(visit_date - last_visit_date)
          from (
                select t1.visit_date, max(t2.visit_date) last_visit_date
                  from patient_visits t1, patient_visits t2
                 where t1.patient_id = t2.patient_id
                   and t2.visit_date < t1.visit_date
                 group by t1.patient_id, t1.visit_date
               )
        /

        AVG(VISIT_DATE-LAST_VISIT_DATE)
        -------------------------------
                             23.5868092
  • 14.
        call     count       cpu    elapsed       disk      query    current       rows
        ------- ------  -------- ---------- ---------- ---------- ---------- ----------
        Parse        1      0.00       0.00          0         16          0          0
        Execute      1      0.00       0.00          0          0          0          0
        Fetch        2      5.69       5.78          0        598          0          1
        ------- ------  -------- ---------- ---------- ---------- ---------- ----------
        total        4      5.70       5.79          0        614          0          1

        Rows (1st)  Row Source Operation
        ----------  ---------------------------------------------------
                 1  SORT AGGREGATE (cr=598 pr=0 pw=0 time=5782851 us)
             89224   VIEW (cr=598 pr=0 pw=0 time=6920605 us cost=37226 size=1612980 card
             89224    HASH GROUP BY (cr=598 pr=0 pw=0 time=6021974 us cost=37226 size=340
          11045765     HASH JOIN (cr=598 pr=0 pw=0 time=3936745 us cost=488 size=3952522
             89610      TABLE ACCESS FULL PATIENT_VISITS (cr=299 pr=0 pw=0 time=11514 us
             89610      TABLE ACCESS FULL PATIENT_VISITS (cr=299 pr=0 pw=0 time=38901 us
  • 15.
        TKYTE@ORA12C> select avg(visit_date - last_visit_date)
          from (
                select visit_date,
                       (select max(visit_date)
                          from patient_visits t2
                         where t2.patient_id = t1.patient_id
                           and t2.visit_date < t1.visit_date) last_visit_date
                  from patient_visits t1
               )
        /

        AVG(VISIT_DATE-LAST_VISIT_DATE)
        -------------------------------
                             23.5868092
  • 16.
        call     count       cpu    elapsed       disk      query    current       rows
        ------- ------  -------- ---------- ---------- ---------- ---------- ----------
        Parse        1      0.00       0.00          0          0          0          0
        Execute      1      0.00       0.00          0          0          0          0
        Fetch        2      6.85       6.90        360      30919          0          1
        ------- ------  -------- ---------- ---------- ---------- ---------- ----------
        total        4      6.85       6.91        360      30919          0          1

        Rows (1st)  Row Source Operation
        ----------  ---------------------------------------------------
             89610  SORT AGGREGATE (cr=30620 pr=360 pw=0 time=5432914 us)
             89224   FIRST ROW (cr=30620 pr=360 pw=0 time=3154562 us cost=2 size=19 card
             89224    INDEX RANGE SCAN (MIN/MAX) PATIENT_VISITS_PK (cr=30620 pr=360 pw=0
                 1  SORT AGGREGATE (cr=30919 pr=360 pw=0 time=6908877 us)
             89610   TABLE ACCESS FULL PATIENT_VISITS (cr=299 pr=0 pw=0 time=403891 us co
  • 17.
        TKYTE@ORA12C> select avg(visit_date - last_visit_date)
          from (
                select visit_date,
                       lag(visit_date) over
                         (partition by patient_id
                          order by visit_date)
                         as last_visit_date
                  from patient_visits
               )
        /

        AVG(VISIT_DATE-LAST_VISIT_DATE)
        -------------------------------
                             23.5868092
  • 18.
        call     count       cpu    elapsed       disk      query    current       rows
        ------- ------  -------- ---------- ---------- ---------- ---------- ----------
        Parse        1      0.00       0.00          0          0          0          0
        Execute      1      0.00       0.00          0          0          0          0
        Fetch        2      0.19       0.19          0        361          0          1
        ------- ------  -------- ---------- ---------- ---------- ---------- ----------
        total        4      0.19       0.20          0        361          0          1

        Rows (1st)  Row Source Operation
        ----------  ---------------------------------------------------
                 1  SORT AGGREGATE (cr=361 pr=0 pw=0 time=197264 us)
             89610   VIEW (cr=361 pr=0 pw=0 time=2751836 us cost=362 size=1612980 card=8
             89610    WINDOW BUFFER (cr=361 pr=0 pw=0 time=771339 us cost=362 size=170259
             89610     INDEX FULL SCAN PATIENT_VISITS_PK (cr=361 pr=0 pw=0 time=640345 us
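The three queries above return the same average; the LAG version gets there in one ordered pass per patient instead of joining the table to itself. As a rough illustration of that single-pass logic, here is a Python sketch. The function name `average_gap_days` and the sample rows are hypothetical and are not the `patient_visits` data from the slides.

```python
from collections import defaultdict
from datetime import date

def average_gap_days(visits):
    """Average days between consecutive visits of the same patient.

    `visits` is an iterable of (patient_id, visit_date) pairs.
    """
    by_patient = defaultdict(list)
    for patient_id, visit_date in visits:
        by_patient[patient_id].append(visit_date)

    gaps = []
    for dates in by_patient.values():
        dates.sort()  # ORDER BY visit_date within the partition
        # Pair each visit with the one before it, like LAG(visit_date).
        # A patient's first visit has no predecessor and contributes no gap,
        # just as AVG ignores the NULL that LAG returns for that row.
        for previous, current in zip(dates, dates[1:]):
            gaps.append((current - previous).days)
    return sum(gaps) / len(gaps) if gaps else None

# Hypothetical sample data for illustration only.
sample = [
    ("A", date(2014, 1, 1)), ("A", date(2014, 1, 11)), ("A", date(2014, 1, 31)),
    ("B", date(2014, 2, 1)), ("B", date(2014, 2, 4)),
]
print(average_gap_days(sample))
```

Each row is visited once after sorting, which mirrors why the WINDOW BUFFER plan above does far less work than the hash join that produced about 11 million intermediate rows.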
  • 19. Oracle Advanced Analytics Database Option: Fastest Way to Deliver Scalable Enterprise-wide Predictive Analytics
      Key Features
      • In-database data mining algorithms and open source R algorithms
      • SQL, PL/SQL, R languages
      • Scalable, parallel in-database execution
      • Workflow GUI and IDEs
      • Integrated component of Database
      • Enables enterprise analytical applications
  • 20. Be Specific in Problem Statement (Poorly Defined | Better | Data Mining Technique)
      • Predict employees that leave → Based on past employees that voluntarily left: create new attribute EmplTurnover → 0/1
      • Predict customers that churn → Based on past customers that have churned: create new attribute Churn → YES/NO
      • Target “best” customers → Recency, Frequency, Monetary (RFM) analysis; specific dollar amount over time window: who has spent $500+ in most recent 18 months
      • How can I make more $$? → What helps me sell soft drinks & coffee?
      • Which customers are likely to buy? → How much is each customer likely to spend?
      • Who are my “best customers”? → What descriptive “rules” describe “best customers”?
      • How can I combat fraud? → Which transactions are the most anomalous? Then roll-up to physician, claimant, employee, etc.
  • 21. 12c New Features: New Server Functionality
      • Predictive Queries: immediate build/apply of ODM models in SQL query
      • Classification & regression: multi-target problems
      • Clustering query
      • Anomaly query
      • Feature extraction query
        select cust_income_level, cust_id,
               round(probanom,2) probanom, round(pctrank,3)*100 pctrank
          from ( select cust_id, cust_income_level, probanom,
                        percent_rank() over (partition by cust_income_level
                                             order by probanom desc) pctrank
                   from ( select cust_id, cust_income_level,
                                 prediction_probability(of anomaly, 0 using *)
                                   over (partition by cust_income_level) probanom
                            from customers ) )
         where pctrank <= .05
         order by cust_income_level, probanom desc;
      OAA automatically creates multiple anomaly detection models “Grouped_By” and “scores” by partition via powerful SQL query
  • 22. Fraud Prediction Demo: Automated In-DB Analytical Methodology
        drop table CLAIMS_SET;
        exec dbms_data_mining.drop_model('CLAIMSMODEL');
        create table CLAIMS_SET (setting_name varchar2(30), setting_value varchar2(4000));
        insert into CLAIMS_SET values ('ALGO_NAME','ALGO_SUPPORT_VECTOR_MACHINES');
        insert into CLAIMS_SET values ('PREP_AUTO','ON');
        commit;

        begin
          dbms_data_mining.create_model('CLAIMSMODEL', 'CLASSIFICATION',
            'CLAIMS', 'POLICYNUMBER', null, 'CLAIMS_SET');
        end;
        /

        -- Top 5 most suspicious fraud policy holder claims
        select * from
          (select POLICYNUMBER, round(prob_fraud*100,2) percent_fraud,
                  rank() over (order by prob_fraud desc) rnk
             from (select POLICYNUMBER,
                          prediction_probability(CLAIMSMODEL, '0' using *) prob_fraud
                     from CLAIMS
                    where PASTNUMBEROFCLAIMS in ('2to4', 'morethan4')))
         where rnk <= 5
         order by percent_fraud desc;

        POLICYNUMBER PERCENT_FRAUD        RNK
        ------------ ------------- ----------
                6532         64.78          1
                2749         64.17          2
                3440         63.22          3
                 654          63.1          4
               12650         62.36          5

      Automated Monthly “Application”! Just add:
        Create View CLAIMS2_30 As
        Select * from CLAIMS2
        Where mydate > SYSDATE - 30

      Time measure: set timing on;
  • 23. SQL Developer/Oracle Data Miner 4.0 New Features
      • SQL Script Generation
          – Deploy entire methodology as a SQL script
          – Immediate deployment of data analyst’s methodologies
  • 24. Program Agenda: 1. Analytics, 2. Model, 3. Pattern Matching, 4. External Tables, 5. SQL over Hadoop
  • 25. Model Clause
      • A feature of the database since 10g (2003!)
      • Spreadsheet-like construct
      • Procedural processing in a non-procedural language
  • 26. I need to have running totals that group rows into groups such that the total for that group does not exceed some threshold
  • 27. Model Clause (Threshold = 65,000)
        Input:
        Site     cnt
        -----  -----
        1001   10000
        1002   20000
        1003   30500
        1004   50000
        1005   25000
        1006   36000
        1007   28000
        1008   21000

        Desired output:
        St_key  end_key  total
        ------  -------  -----
        1001    1003     60500
        1004    1004     50000
        1005    1006     61000
        1007    1008     49000
  • 28.
        TKYTE@ORA12C> select start_site, max(end_site), max(running_total)
          from
          (
            select *
              from
              ( select start_site, end_site, cnt, running_total, rn
                  from site_data
                 model dimension by (row_number()
                                       over (order by site) rn)
                       measures (site start_site, site end_site, cnt, cnt running_total)
                       rules (running_total[rn > 1] =
                                case when (running_total[cv() - 1] + cnt[cv()]) > 65000
                                       or cnt[cv()] > 65000
                                     then cnt[cv()]
                                     else running_total[cv() - 1] + cnt[cv()]
                                end,
                              start_site[rn > 1] =
                                case when (running_total[cv() - 1] + cnt[cv()]) > 65000
                                       or cnt[cv()] > 65000
                                     then start_site[cv()]
                                     else start_site[cv() - 1]
                                end
                             )
              )
          )
         group by start_site
         order by start_site
        /

        START_SITE MAX(END_SITE) MAX(RUNNING_TOTAL)
        ---------- ------------- ------------------
              1001          1003              60500
              1004          1004              50000
              1005          1006              61000
              1007          1008              49000
  • 29.
        TKYTE@ORA12C> select *
          from
          ( select start_site, end_site, cnt, running_total, rn
              from site_data
             model dimension by (row_number()
                                   over (order by site) rn)
                   measures (site start_site, site end_site, cnt, cnt running_total)
                   rules (running_total[rn > 1] =
                            case when (running_total[cv() - 1] + cnt[cv()]) > 65000
                                   or cnt[cv()] > 65000
                                 then cnt[cv()]
                                 else running_total[cv() - 1] + cnt[cv()]
                            end,
                          start_site[rn > 1] =
                            case when (running_total[cv() - 1] + cnt[cv()]) > 65000
                                   or cnt[cv()] > 65000
                                 then start_site[cv()]
                                 else start_site[cv() - 1]
                            end
                         )
          )

        Input (site_data):
              SITE        CNT
        ---------- ----------
              1001      10000
              1002      20000
              1003      30500
              1004      50000
              1005      25000
              1006      36000
              1007      28000
              1008      21000

        Output:
        START_SITE   END_SITE        CNT RUNNING_TOTAL         RN
        ---------- ---------- ---------- ------------- ----------
              1001       1001      10000         10000          1
              1001       1002      20000         30000          2
              1001       1003      30500         60500          3
              1004       1004      50000         50000          4
              1005       1005      25000         25000          5
              1005       1006      36000         61000          6
              1007       1007      28000         28000          7
              1007       1008      21000         49000          8
  • 32-37. Animation builds of slide 29: the same input, query, and output, stepping through output rows 1 to 6 in turn.
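The two MODEL rules walk the rows in site order: the running total either carries on from the previous row or restarts when the threshold would be exceeded, and the start site is carried down or reset in the same way. A minimal Python sketch of that row-by-row logic follows, using the `site_data` values and the 65,000 threshold from the slides. The function name `group_by_threshold` is hypothetical, and the sketch assumes non-negative counts.

```python
def group_by_threshold(rows, threshold):
    """Walk (site, cnt) rows in site order, starting a new group whenever
    adding the next cnt would push the running total above `threshold`."""
    groups = []  # each entry: [start_site, end_site, running_total]
    for site, cnt in sorted(rows):
        if groups and groups[-1][2] + cnt <= threshold:
            # Still under the threshold: extend the current group.
            groups[-1][1] = site
            groups[-1][2] += cnt
        else:
            # First row, or the threshold would be exceeded: restart.
            groups.append([site, site, cnt])
    return [tuple(g) for g in groups]

# Values copied from the site_data table on the slides.
site_data = [
    (1001, 10000), (1002, 20000), (1003, 30500), (1004, 50000),
    (1005, 25000), (1006, 36000), (1007, 28000), (1008, 21000),
]
for start_site, end_site, total in group_by_threshold(site_data, 65000):
    print(start_site, end_site, total)
```

The loop depends on the result of the previous iteration, which is exactly the kind of procedural dependency that plain aggregates cannot express and that the MODEL clause's `cv() - 1` references provide.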
  • 40. Program Agenda: 1. Analytics, 2. Model, 3. Pattern Matching, 4. External Tables, 5. SQL over Hadoop
  • 41. I need to group a series of audit trail records together based on how close they are to each other. All records within three seconds of each other should be in a group
  • 42. Row Pattern Matching
      • This can be done with analytics
      • Requires three passes
          – Tag records with ROW_NUMBER as RN that have a timestamp more than 3 seconds away from the prior record
          – Carry down the maximum RN; all records in a group will now have the same RN
          – Aggregate
      • Can be done with modeling in multiple passes as well
          – One to create the group identifier
          – One to aggregate
      • Can be done in a single pass *easily* with pattern matching
  • 43.
        ops$tkyte%ORA12CR1> select *
          from t
          match_recognize
          ( order by x
            measures first(x) start_time,
                     last(x) end_time,
                     sum(y) sum_y
            one row per match
            after match skip past last row
            pattern (any_row another_row_within_3_secs*)
            define
              another_row_within_3_secs as (x - prev(x)) <= 3/24/60/60
          );
  • 52. TKYTE@ORA12C> select *
          from (select created x, 1 y
                  from all_objects)
          match_recognize
          ( order by x
            measures first(x) start_time,
                     last(x) end_time,
                     sum(y) sum_y
            one row per match
            after match skip past last row
            pattern (any_row another_row_within_3_secs*)
            define
              another_row_within_3_secs as (x-prev(x)) <= 3/24/60/60
          )
          order by start_time
        /

        START_TIME           END_TIME                  SUM_Y
        -------------------- -------------------- ----------
        01-aug-2014 16:28:39 01-aug-2014 16:28:42        182
        01-aug-2014 16:28:46 01-aug-2014 16:28:47         62
        01-aug-2014 16:28:55 01-aug-2014 16:29:32       1075
        01-aug-2014 16:31:13 01-aug-2014 16:32:30       4484
        01-aug-2014 16:49:15 01-aug-2014 16:52:14       8315
        01-aug-2014 16:52:18 01-aug-2014 16:53:20       1582
        01-aug-2014 16:53:56 01-aug-2014 16:54:34       1204
  • 53. ops$tkyte@ora12cr1> SELECT *
          FROM stocks MATCH_RECOGNIZE
          ( PARTITION BY symbol
            ORDER BY tstamp
            MEASURES
              STRT.tstamp AS start_tstamp,
              LAST(DOWN.tstamp) AS bottom_tstamp,
              LAST(UP.tstamp) AS end_tstamp
            ONE ROW PER MATCH
            AFTER MATCH SKIP TO LAST UP
            PATTERN (STRT DOWN+ UP+)
            DEFINE
              DOWN AS DOWN.price < PREV(DOWN.price),
              UP AS UP.price > PREV(UP.price)
          ) MR
          ORDER BY MR.symbol, MR.start_tstamp;

        SYMBOL     START_TST BOTTOM_TS END_TSTAM
        ---------- --------- --------- ---------
        ORCL       01-SEP-12 03-SEP-12 07-SEP-12
        ORCL       07-SEP-12 10-SEP-12 13-SEP-12

        SYMBOL     TSTAMP         PRICE HIST
        ---------- --------- ---------- -------------------------------------
        ORCL       01-SEP-12         35 ***********************************
        ORCL       02-SEP-12         34 **********************************
        ORCL       03-SEP-12         33 *********************************
        ORCL       04-SEP-12         34 **********************************
        ORCL       05-SEP-12         35 ***********************************
        ORCL       06-SEP-12         36 ************************************
        ORCL       07-SEP-12         37 *************************************
        ORCL       08-SEP-12         36 ************************************
        ORCL       09-SEP-12         35 ***********************************
        ORCL       10-SEP-12         34 **********************************
        ORCL       11-SEP-12         35 ***********************************
        ORCL       12-SEP-12         36 ************************************
        ORCL       13-SEP-12         37 *************************************

        13 rows selected.
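The V-shape pattern (STRT DOWN+ UP+) can be read as a small state machine. The Python sketch below is an illustration under stated assumptions, not Oracle's implementation: it handles a single symbol, uses greedy matching, and restarts the next search at the last UP row to mimic AFTER MATCH SKIP TO LAST UP. The function name `find_v_shapes` is hypothetical; the prices are the ORCL rows listed above.

```python
from datetime import date


def find_v_shapes(prices):
    """Find STRT DOWN+ UP+ runs in a list of (tstamp, price) ordered by tstamp.

    Returns (start_tstamp, bottom_tstamp, end_tstamp) tuples.
    """
    matches = []
    i, n = 0, len(prices)
    while i < n:
        j = i + 1
        # DOWN+: one or more rows priced strictly below the previous row
        while j < n and prices[j][1] < prices[j - 1][1]:
            j += 1
        bottom = j - 1
        if bottom == i:          # no DOWN row, so try the next starting row
            i += 1
            continue
        # UP+: one or more rows priced strictly above the previous row
        while j < n and prices[j][1] > prices[j - 1][1]:
            j += 1
        end = j - 1
        if end == bottom:        # no UP row, so try the next starting row
            i += 1
            continue
        matches.append((prices[i][0], prices[bottom][0], prices[end][0]))
        i = end                  # the last UP row may start the next match
    return matches


# The ORCL prices from the listing above, 01-SEP-12 through 13-SEP-12.
orcl = [35, 34, 33, 34, 35, 36, 37, 36, 35, 34, 35, 36, 37]
series = [(date(2012, 9, day), price) for day, price in enumerate(orcl, start=1)]
v_shapes = find_v_shapes(series)
for start, bottom, end in v_shapes:
    print(start, bottom, end)
```

Because the search resumes at the last UP row, 07-SEP-12 is both the end of the first V and the start of the second, which matches the two rows in the query output.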
  • 54. Program Agenda: 1. Analytics, 2. Model, 3. Pattern Matching, 4. External Tables, 5. SQL over Hadoop
  • 55. External Tables
        Sqlldr is the legacy data loading tool from the 20th century
        • Query flat files
        • Query datapump format files
        • Query output of programs (10.2.0.5 and above)
          – Load compressed files without uncompressing
          – Query program output, like ls, ps, df, etc.
        • Query HDFS/Hive
        • Infinite possibilities
  • 56. ops$tkyte%ORA12C> CREATE TABLE EMP_ET
          (
            "EMPNO" NUMBER(4),
            "ENAME" VARCHAR2(10),
            "JOB" VARCHAR2(9),
            "MGR" NUMBER(4),
            "HIREDATE" DATE,
            "SAL" NUMBER(7,2),
            "COMM" NUMBER(7,2),
            "DEPTNO" NUMBER(2)
          )
          ORGANIZATION external
          ( TYPE oracle_loader
            DEFAULT DIRECTORY load_dir
            ACCESS PARAMETERS
            ( RECORDS DELIMITED BY NEWLINE
              preprocessor exec_dir:'run_gunzip.sh'
              FIELDS TERMINATED BY "|" LDRTRIM
            )
            location ( 'emp.dat.gz')
          )
        /

        Table created.
  • 57. ops$tkyte%ORA12C> !file emp.dat.gz
        emp.dat.gz: gzip compressed data, was "emp.dat", from Unix, last …

        ops$tkyte%ORA12C> !cat run_gunzip.sh
        #!/bin/bash
        /usr/bin/gunzip -c $*

        ops$tkyte%ORA11GR2> select empno, ename from emp_et where rownum <= 5;

             EMPNO ENAME
        ---------- ----------
              7369 SMITH
              7499 ALLEN
              7521 WARD
              7566 JONES
              7654 MARTIN
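The preprocessor idea, decompressing as a stream and parsing records on the fly without writing an uncompressed copy to disk, can be sketched outside the database as well. The Python below is only an analogy under stated assumptions: `read_gzipped_records` is a hypothetical helper, and the two-row sample file is made up for the demo (it is not the real emp.dat).

```python
import csv
import gzip
import os
import tempfile


def read_gzipped_records(path, columns, delimiter="|"):
    """Yield one dict per record, decompressing the file as a stream."""
    with gzip.open(path, mode="rt", newline="") as stream:
        for fields in csv.reader(stream, delimiter=delimiter):
            # Trim surrounding whitespace from each field before mapping it
            yield dict(zip(columns, (f.strip() for f in fields)))


# Demo: write a tiny gzip file with made-up rows, then read it back.
tmp_dir = tempfile.mkdtemp()
sample_path = os.path.join(tmp_dir, "emp_sample.dat.gz")
with gzip.open(sample_path, mode="wt", newline="") as out:
    out.write("7369|SMITH|CLERK\n7499|ALLEN|SALESMAN\n")

rows = list(read_gzipped_records(sample_path, ["empno", "ename", "job"]))
for row in rows:
    print(row["empno"], row["ename"])
```

In the external table, the same role is played by run_gunzip.sh, whose standard output the ORACLE_LOADER driver reads in place of the file.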
  • 58. SQL> !cat /home/tkyte/df
        #!/bin/bash
        /bin/df -Pl

        SQL> !/home/tkyte/run_df.sh
        Filesystem           1024-blocks     Used Available Capacity Mounted on
        /dev/mapper/VolGr...    18156292 10827600   6391528      63% /
        /dev/sda1                 101086    12062     83805      13% /boot
        tmpfs                     517520        0    517520       0% /dev/shm
  • 59. SQL> create table df
          (
            fsname varchar2(100),
            blocks number,
            used number,
            avail number,
            capacity varchar2(10),
            mount varchar2(100)
          )
          organization external
          (
            type oracle_loader
            default directory exec_dir
            access parameters
            (
              records delimited by newline
              preprocessor exec_dir:'run_df.sh'
              skip 1
              fields terminated by whitespace ldrtrim
            )
            location
            (
              exec_dir:'run_df.sh'
            )
          )
        /

        Table created.
  • 60. SQL> select * from df;

        FSNAME                            BLOCKS     USED    AVAIL CAPACITY MOUNT
        ------------------------------- -------- -------- -------- -------- --------
        /dev/mapper/VolGroup00-LogVol00 18156292 10827600  6391528 63%      /
        /dev/sda1                         101086    12062    83805 13%      /boot
        tmpfs                             517520        0   517520 0%       /dev/shm
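What the access parameters do to the program output (skip the header line, split each record on whitespace, map the fields to columns) can be sketched in a few lines of Python. This is an illustration only: `parse_df` is a hypothetical name, and the sample text is the df output shown on the slides rather than a live call to df.

```python
SAMPLE_DF_OUTPUT = """\
Filesystem 1024-blocks Used Available Capacity Mounted on
/dev/mapper/VolGroup00-LogVol00 18156292 10827600 6391528 63% /
/dev/sda1 101086 12062 83805 13% /boot
tmpfs 517520 0 517520 0% /dev/shm
"""


def parse_df(text):
    """Turn `df -Pl` style output into a list of row dicts."""
    rows = []
    for line in text.splitlines()[1:]:          # "skip 1": drop the header line
        if not line.strip():
            continue
        # "fields terminated by whitespace": six columns per record
        fsname, blocks, used, avail, capacity, mount = line.split(None, 5)
        rows.append({
            "fsname": fsname,
            "blocks": int(blocks),
            "used": int(used),
            "avail": int(avail),
            "capacity": capacity,
            "mount": mount,
        })
    return rows


df_rows = parse_df(SAMPLE_DF_OUTPUT)
for row in df_rows:
    print(row["mount"], row["avail"])
```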
  • 61. with fs_data
        as
        (select /*+ materialize */ *
           from df)
        select mount, file_name, bytes, tot_bytes, avail_bytes,
               case
                 when 0.2 * tot_bytes < avail_bytes
                 then 'OK'
                 else 'Short on disk space'
               end status
          from (
        select file_name, mount, avail_bytes, bytes,
               sum(bytes) over (partition by mount) tot_bytes
          from (
        select a.file_name,
               b.mount,
               b.avail*1024 avail_bytes,
               a.bytes,
               row_number() over
                 (partition by a.file_name
                  order by length(b.mount) DESC) rn
          from dba_data_files a,
               fs_data b
         where a.file_name like b.mount || '%'
               )
         where rn = 1
               )
         order by mount, file_name
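The logic of this query is easier to follow in procedural form: assign each data file to the mount point with the longest matching prefix, total the file bytes per mount, and flag a mount as short on space when its free bytes are not more than 20% of that total. The Python sketch below makes several assumptions: `disk_status` and the two sample data files are hypothetical, and a plain prefix test stands in for the SQL LIKE (which would also treat `%` and `_` inside a mount name as wildcards).

```python
def disk_status(data_files, filesystems):
    """data_files: [(file_name, bytes)], filesystems: [(mount, avail_kb)].

    Returns rows of (mount, file_name, bytes, tot_bytes, avail_bytes, status).
    """
    placed = []
    for file_name, size in data_files:
        candidates = [(m, kb) for m, kb in filesystems if file_name.startswith(m)]
        if not candidates:
            continue                     # the join drops files with no mount
        # rn = 1 after ordering by length(mount) DESC: longest prefix wins
        mount, avail_kb = max(candidates, key=lambda c: len(c[0]))
        placed.append((mount, file_name, size, avail_kb * 1024))

    # sum(bytes) over (partition by mount)
    tot_bytes = {}
    for mount, _, size, _ in placed:
        tot_bytes[mount] = tot_bytes.get(mount, 0) + size

    report = []
    for mount, file_name, size, avail in sorted(placed):   # order by mount, file_name
        ok = 0.2 * tot_bytes[mount] < avail
        status = "OK" if ok else "Short on disk space"
        report.append((mount, file_name, size, tot_bytes[mount], avail, status))
    return report


# Mounts and free space from the df output; the data files are made up.
mounts = [("/", 6391528), ("/boot", 83805), ("/dev/shm", 517520)]
files = [("/u01/oradata/system01.dbf", 800 * 2**20),
         ("/boot/tiny.dbf", 500 * 2**20)]
report = disk_status(files, mounts)
for line in report:
    print(line)
```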
  • 63. Program Agenda: 1. Analytics, 2. Model, 3. Pattern Matching, 4. External Tables, 5. SQL over Hadoop
  • 64. Publish Hadoop Metadata to Oracle Catalog
        [Diagram: Big Data Appliance + Hadoop (HDFS NameNode with Hive metadata, HDFS DataNodes) next to Exadata + Oracle Database (Oracle Catalog holding the External Table and Hive metadata)]
        create table customer_address
        ( ca_customer_id number(10,0)
        , ca_street_number char(10)
        , ca_state char(2)
        , ca_zip char(10)
        )
        organization external
        ( TYPE ORACLE_HIVE
          DEFAULT DIRECTORY DEFAULT_DIR
          ACCESS PARAMETERS (com.oracle.bigdata.cluster hadoop_cl_1)
          LOCATION ('hive://customer_address')
        )
  • 65. Publish Hadoop Metadata to Oracle Catalog
        [Diagram: Big Data Appliance + Hadoop (HDFS NameNode with Hive metadata, HDFS DataNodes) next to Exadata + Oracle Database (Oracle Catalog holding the External Table and Hive metadata)]
        create table customer_address
        ( ca_customer_id number(10,0)
        , ca_street_number char(10)
        , ca_state char(2)
        , ca_zip char(10)
        )
        organization external
        ( TYPE ORACLE_HIVE
          DEFAULT DIRECTORY DEFAULT_DIR
          ACCESS PARAMETERS (com.oracle.bigdata.cluster hadoop_cl_1)
          LOCATION ('hive://customer_address')
        )
        • SerDe
        • RecordReader
        • InputFormat
  • 66. Executing Queries on Hadoop
        [Diagram: Oracle Catalog (External Table, Hive metadata), HDFS NameNode (Hive metadata), HDFS DataNodes]
        Select c_customer_id
             , c_customer_last_name
             , ca_county
        From customers
           , customer_address
        where c_customer_id = ca_customer_id
        and ca_state = 'CA'
        Determine:
        • Data locations
        • Data structure
        • Parallelism
        Send to specific data nodes:
        • Data request
        • Context
  • 67. Executing Queries on Hadoop
        [Diagram: Oracle Catalog (External Table, Hive metadata), HDFS NameNode (Hive metadata), HDFS DataNodes holding the “Tables”]
        Select c_customer_id
             , c_customer_last_name
             , ca_county
        From customers
           , customer_address
        where c_customer_id = ca_customer_id
        and ca_state = 'CA'
        Do I/O and Smart Scan:
        • Filter rows
        • Project columns
        Move only relevant data:
        • Relevant rows
        • Relevant columns
        Apply join with database data
  • 68. Storage Indexes: Optimizing Scans on Hadoop
        • Automatically collect and store the minimum and maximum value within a storage unit
        • Before scanning a storage unit, verify whether the requested data falls within its min-max range
        • If not, skip scanning the block and reduce scan time
        [Diagram: HDFS NameNode (Hive metadata) and HDFS DataNodes whose “Blocks” each carry a Min and a Max value]
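The min/max check described above can be sketched in Python as follows. This is a toy model under stated assumptions, not the product's implementation: each "block" is a non-empty list of values for one column, and the names `build_storage_index` and `scan_equal` are hypothetical.

```python
def build_storage_index(blocks):
    """Record (min, max) for every block; blocks are assumed non-empty."""
    return [(min(block), max(block)) for block in blocks]


def scan_equal(blocks, index, value):
    """Return (matching values, number of blocks actually scanned)."""
    hits = []
    scanned = 0
    for block, (lo, hi) in zip(blocks, index):
        if value < lo or value > hi:
            continue                    # value cannot be in this block: skip it
        scanned += 1
        hits.extend(v for v in block if v == value)
    return hits, scanned


# Made-up column values spread over three blocks.
blocks = [[1, 5, 9], [20, 25, 22], [7, 30, 8]]
index = build_storage_index(blocks)
hits, scanned = scan_equal(blocks, index, 22)
print(hits, scanned)
```

A block whose range contains the value still has to be scanned even when it holds no match (the third block here), so the index prunes work but never changes the result.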
  • 69. What if You Could Query All Data without Conversion?
        • Store unconverted JSON data in Hadoop
        • Store business-critical data in Oracle (JSON or Relational)
        Select customers_document.address.state
             , revenue
        from customers, sales
        where customers_document.id = sales.custID
        group by customers_document.address.state;
        Push Down to Hadoop
        • JSON parsing
        • Column projection
        • Bloom filter for faster join
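The "Bloom filter for faster join" bullet refers to a general technique: build a compact bit set from the join keys on one side, ship it to the other side, and discard rows whose key is definitely absent before moving them. The Python below is a generic sketch of that technique, not Oracle's implementation; the class name, sizes, and sample rows are assumptions made for illustration.

```python
import hashlib


class BloomFilter:
    """A small Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0                       # a Python int used as a bit array

    def _positions(self, key):
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


# Database side: build the filter from the customer ids taking part in the join.
customer_ids = [1, 2, 3]
bloom = BloomFilter()
for cust_id in customer_ids:
    bloom.add(cust_id)

# Hadoop side: made-up (custID, revenue) rows, filtered before being shipped.
sales = [(1, 10.0), (7, 5.0), (3, 2.5), (42, 9.0), (2, 1.0)]
shipped = [row for row in sales if bloom.might_contain(row[0])]
print(len(shipped), "of", len(sales), "rows shipped")
```

Rows that pass the filter still go through the real join afterwards, so a false positive costs only a little extra data movement and never a wrong answer.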
  • 70. What if You Could Govern All Data?
        DBMS_REDACT.ADD_POLICY(
          object_schema => 'txadp_hive_01',
          object_name   => 'customer_address_ext',
          column_name   => 'ca_street_name',
          policy_name   => 'customer_address_redaction',
          function_type => DBMS_REDACT.RANDOM,
          expression    => 'SYS_CONTEXT(''SYS_SESSION_ROLES'', ''REDACTION_TESTER'') = ''TRUE'''
        );
        • Store unconverted JSON data in Hadoop
        • Store business-critical data in Oracle (JSON or Relational)
        Apply advanced security on Hadoop-resident data
        • Masking/Redaction
        • Virtual Private Database
        • Fine-Grained Access Controls
  • 71. Data lives in even more places
        [Diagram: SQL at the center of Relational, Hadoop, NoSQL, and more…]
        The magic of Storage Handlers
  • 72. Program Agenda: 1. Analytics, 2. Model, 3. Pattern Matching, 4. External Tables, 5. SQL over Hadoop
  • 73. BIWA Summit, January 27-29, 2015, Oracle HQ Conference Center, www.biwasummit.org