SQL: The one language to rule all your data
1. 06/03/2018
www.oralytics.com t : @brendantierney e : brendan.tierney@oralytics.com
The one language to rule all your data
Brendan Tierney
§ Data Warehousing since 1997
§ Data Mining since 1998
§ Analytics since 1993
April 2017 : http://blog.sqlizer.io/posts/sql-43/
Analyze
SELECT product, SUM(sale) AS "Total Sales"
FROM order_details
GROUP BY product;
SELECT product, SUM(sale) AS "Total Sales"
FROM order_details
GROUP BY product
HAVING SUM(sale) >= 10000;
COUNT
AVG
MIN
MAX
MEAN
MODE
…
SELECT
gcc.segment1 AS bal_seg,
hi.parent_flex_value AS bu_rollup_group,
gcc.segment2 AS business_unit,
gcc.segment3 AS LOB,
gcc.segment4 AS ACCOUNT,
gcc.segment5 AS department,
gcc.segment6 AS product,
gcc.segment7 AS responsibility_center,
gcc.segment8 AS sub_department,
TRUNC (gjl.creation_date) AS je_line_creation_date,
gjh.name AS je_name,
gjb.name AS je_batch_name,
TO_CHAR (prds.end_date, 'yyyy/mm') AS period_name,
gjb.status batch_status,
gjl.je_line_num AS JE_LINE_NUMBER,
CASE NVL(xdl.application_id,0) WHEN 0 THEN NVL(gjl.entered_dr,0) ELSE NVL(xdl.unrounded_entered_dr,0) END entered_dr,
CASE NVL(xdl.application_id,0) WHEN 0 THEN NVL(gjl.entered_cr,0) ELSE NVL(xdl.unrounded_entered_cr,0) END entered_cr,
gjl.description AS je_line_description,
aps.segment1 AS vendor_number,
NVL(aps.vendor_name, gjl.attribute1) AS vendor_name,
NVL(aia.invoice_num, gjl.attribute3) AS invoice_number,
aia.invoice_date,
NVL(pha.segment1, gjl.attribute2) AS po_number,
NVL(aida.attribute5, gjl.attribute4) AS beginning_service_date,
NVL(aida.attribute6, gjl.attribute5) AS ending_service_date,
ppa.segment1 AS project_number,
gjl.attribute6 AS payroll_check_number,
gjc.user_je_category_name AS JE_CATEGORY_NAME,
gjh.posted_date,
gjh.description AS JE_HEADER_DESCRIPTION,
DECODE (gjh.actual_flag, 'A', 'A', 'B') actual_flag,
TRUNC (gjh.creation_date) AS JE_CREATED_ON_DATE,
gjs.user_je_source_name AS JE_SOURCE_NAME,
gjl.code_combination_id,
NVL (gjl.attribute7, papf.full_name) AS created_by,
T1.file_name AS ipm_image_id,
T1.url,
hi.division AS division,
hi.parent_flex_value AS region,
aia.invoice_id,
NVL(MAIN_DOC.ipm_image_flg,'N') AS ipm_image_flg,
NVL(MAIN_DOC.IMAGE_CNT,0) AS ipm_image_cnt,
gjl.je_line_num,
gjl.je_header_id
FROM apps.gl_je_headers gjh
inner join apps.gl_je_batches gjb on gjh.je_batch_id = gjb.je_batch_id and gjb.status = 'P'
inner join apps.gl_je_sources gjs on gjh.je_source = gjs.je_source_name
Let us start with some Basics
SUM(x)
AVG(x)
STDDEV(x)
CORR(x, y)
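A minimal sketch of these four aggregates in a single query, assuming Oracle 12c. The inline `order_details` rows and the `quantity` column are hypothetical sample data, not part of the original schema:

```sql
-- Hypothetical sample data standing in for the order_details table
WITH order_details AS (
  SELECT 'A' AS product, 10 AS quantity, 100 AS sale FROM dual UNION ALL
  SELECT 'A', 20, 180 FROM dual UNION ALL
  SELECT 'A', 30, 310 FROM dual UNION ALL
  SELECT 'B',  5,  60 FROM dual UNION ALL
  SELECT 'B', 15, 140 FROM dual
)
SELECT product,
       SUM(sale)            AS total_sales,
       AVG(sale)            AS avg_sale,
       STDDEV(sale)         AS stddev_sale,    -- sample standard deviation (0 for a single row)
       CORR(quantity, sale) AS corr_qty_sale   -- Pearson correlation coefficient
FROM   order_details
GROUP  BY product
ORDER  BY product;
```

All four are ordinary aggregate functions, so they sit alongside GROUP BY and HAVING exactly like SUM in the earlier examples.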
Creating a story about our data.
How do we do Analytics?
How we are sometimes told to do Analytics
Do we really need to use other tools & languages?
But!
Our data no longer fits on our laptop.
A Big Data issue?
Creating Data Silos is BAD
This kind of approach is BAD
This approach does not scale – this is BAD
R - The Challenges
§ Scalability
§ Regardless of the number of cores on your CPU, R will only use one on a default build
§ Performance
§ R reads data into memory by default. It is easy to exhaust RAM by storing unnecessary data. Typically R will throw an exception at 2GB.
§ Parallelization can be a challenge. It is not the default, although packages are available
§ Production Deployment
§ Difficulties deploying R in production
§ Typically need to re-code in …..
I’m getting too old for this new stuff!
Can you teach an old dog new tricks?
What if you could use the language and skills you already have?
Did you know?
Statistical Functions in Oracle
All of these are FREE with the Database
These are often forgotten about
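For example, a sketch (assuming Oracle 12c; the `scores` rows are made-up sample data) that profiles one column and fits a simple linear regression, all in plain SQL:

```sql
-- Hypothetical sample data; replace the WITH clause with a real table
WITH scores AS (
  SELECT 1 AS x, 2 AS y FROM dual UNION ALL
  SELECT 2, 4 FROM dual UNION ALL
  SELECT 3, 6 FROM dual UNION ALL
  SELECT 4, 8 FROM dual UNION ALL
  SELECT 4, 8 FROM dual
)
SELECT MEDIAN(x)                                      AS median_x,
       STATS_MODE(x)                                  AS mode_x,      -- most frequent value
       VARIANCE(x)                                    AS variance_x,  -- sample variance
       PERCENTILE_CONT(0.9) WITHIN GROUP (ORDER BY x) AS p90_x,
       REGR_SLOPE(y, x)                               AS slope,       -- least-squares fit of y on x
       REGR_INTERCEPT(y, x)                           AS intercept,
       REGR_R2(y, x)                                  AS r_squared
FROM   scores;
```

Because y = 2x exactly in this sample, the regression columns should report a slope of 2, an intercept of 0 and an R-squared of 1.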
R for Data Profiling
I didn’t
DBMS_STAT_FUNC
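`DBMS_STAT_FUNC.SUMMARY` returns a whole statistical profile of a numeric column in one call. A minimal sketch, assuming the SH sample schema is installed; the owner, table and column names are placeholders to swap for your own:

```sql
-- Assumes the SH sample schema; swap in your own owner, table and column
SET SERVEROUTPUT ON
DECLARE
  s DBMS_STAT_FUNC.SummaryType;   -- record that receives the summary statistics
BEGIN
  -- arguments: owner, table, column, number of standard deviations for the sigma bounds, OUT record
  DBMS_STAT_FUNC.SUMMARY('SH', 'CUSTOMERS', 'CUST_CREDIT_LIMIT', 3, s);
  DBMS_OUTPUT.PUT_LINE('Count   : ' || s.count);
  DBMS_OUTPUT.PUT_LINE('Min     : ' || s.min);
  DBMS_OUTPUT.PUT_LINE('Max     : ' || s.max);
  DBMS_OUTPUT.PUT_LINE('Mean    : ' || ROUND(s.mean, 2));
  DBMS_OUTPUT.PUT_LINE('Median  : ' || s.median);
  DBMS_OUTPUT.PUT_LINE('Std Dev : ' || ROUND(s.stddev, 2));
  DBMS_OUTPUT.PUT_LINE('Q1 / Q3 : ' || s.quantile_25 || ' / ' || s.quantile_75);
END;
/
```

The same record also carries further fields (for example the 5th and 95th percentiles and the extreme values), so one call covers much of a basic data profile.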
Scalable
Highly Secure
No Data Movement
Real Time
Production Deployment
Faster
Comprehensive Machine Learning Platform
Oracle Data Mining
§ PL/SQL Packages
§ DBMS_DATA_MINING
§ DBMS_DATA_MINING_TRANSFORM
§ DBMS_PREDICTIVE_ANALYTICS
§ SQL Functions
– PREDICTION
– PREDICTION_PROBABILITY
– PREDICTION_BOUNDS
– PREDICTION_COST
– PREDICTION_DETAILS
– PREDICTION_SET
– CLUSTER_ID
– CLUSTER_DETAILS
– CLUSTER_DISTANCE
– CLUSTER_PROBABILITY
– CLUSTER_SET
– FEATURE_ID
– FEATURE_DETAILS
– FEATURE_SET
– FEATURE_VALUE
§ 12c – Predictive Queries
§ aka Dynamic Queries
§ Transient dynamic Data Mining models
§ Can scale to 100+ models, all in one statement
select cust_id, affinity_card,
PREDICTION( FOR to_char(affinity_card) USING *) OVER () pred_affinity_card
from mining_data_build_v;
PQ to predict the AFFINITY_CARD value, using all the data (USING *)
With PQs we can dynamically create new DM models based on one or more attributes
select cust_id, affinity_card,
PREDICTION( FOR to_char(affinity_card) USING *) OVER
(PARTITION BY "COUNTRY_NAME") pred_affinity_card
from mining_data_build_v;
A new DM model will be created for each Country (19 models)
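Predictive queries can return more than the predicted label. A sketch, assuming the same `mining_data_build_v` view and Oracle 12c; note that each analytic call may build its own transient model, so this favours readability over efficiency:

```sql
-- Predicted class plus the probability of that prediction, both from transient models
SELECT cust_id, affinity_card, pred_affinity_card,
       ROUND(pred_prob, 3) AS pred_prob
FROM  (SELECT cust_id, affinity_card,
              PREDICTION(FOR TO_CHAR(affinity_card) USING *) OVER ()
                AS pred_affinity_card,
              -- with no class argument, the probability of the predicted class is returned
              PREDICTION_PROBABILITY(FOR TO_CHAR(affinity_card) USING *) OVER ()
                AS pred_prob
       FROM   mining_data_build_v)
ORDER  BY pred_prob DESC
FETCH FIRST 10 ROWS ONLY;   -- 12c row-limiting clause
```

Adding PARTITION BY "COUNTRY_NAME" inside both OVER clauses would give the per-country variant shown on the slide.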
Analytic Functions (in 12c)
More than 46 analytic functions in 12c
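A few of these in action. A sketch assuming Oracle 12c; the `sales` rows are hypothetical sample data:

```sql
-- Hypothetical daily sales rows; replace the WITH clause with a real table
WITH sales AS (
  SELECT 'A' AS product, DATE '2018-01-01' AS sale_date, 100 AS sale FROM dual UNION ALL
  SELECT 'A', DATE '2018-01-02', 150 FROM dual UNION ALL
  SELECT 'A', DATE '2018-01-03', 120 FROM dual UNION ALL
  SELECT 'B', DATE '2018-01-01',  80 FROM dual UNION ALL
  SELECT 'B', DATE '2018-01-02',  90 FROM dual
)
SELECT product, sale_date, sale,
       -- running total within each product, in date order
       SUM(sale) OVER (PARTITION BY product ORDER BY sale_date)    AS running_total,
       -- the previous day's sale for the same product (NULL on the first row)
       LAG(sale) OVER (PARTITION BY product ORDER BY sale_date)    AS previous_sale,
       -- 1 = the best sale for that product
       RANK()    OVER (PARTITION BY product ORDER BY sale DESC)    AS sale_rank,
       -- this row's share of the product total
       ROUND(RATIO_TO_REPORT(sale) OVER (PARTITION BY product), 3) AS share_of_product
FROM   sales
ORDER  BY product, sale_date;
```

Unlike GROUP BY, analytic functions keep every input row, so detail and summary values appear side by side.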
What about R?
--
-- There are two ways to use the GLM model: in batch and in real-time mode
--
-- First step: build the in-database R script to score your new data
--
Begin
sys.rqScriptDrop('Demo_GLM_Batch');
sys.rqScriptCreate('Demo_GLM_Batch',
'function(dat, datastore_name) {
ore.load(datastore_name)
prd <- predict(mod, newdata=dat)
prd[as.integer(rownames(prd))] <- prd
res <- cbind(dat, PRED = prd)
res}');
end;
/
--
-- Now you can run the script to score the new data in batch mode
-- The data is located in the table MINING_DATA_APPLY
--
select * from table(rqTableEval(
cursor(select CUST_GENDER, AGE, CUST_MARITAL_STATUS, COUNTRY_NAME, CUST_INCOME_LEVEL, EDUCATION,
HOUSEHOLD_SIZE, YRS_RESIDENCE
from MINING_DATA_APPLY_V
where rownum <= 10),
cursor(select 1 as "ore.connect", 'myDatastore' as "datastore_name" from dual),
'select CUST_GENDER, AGE, CUST_MARITAL_STATUS, COUNTRY_NAME, CUST_INCOME_LEVEL, EDUCATION,
HOUSEHOLD_SIZE, YRS_RESIDENCE, 1 PRED from MINING_DATA_APPLY_V','Demo_GLM_Batch'))
order by 1, 2, 3;
Store
Access
Analyze
Protect
Oracle is no longer just a Relational Database
It is more like a Polyglot or Multi-model Database
[Diagram: external views over a conceptual schema and a physical schema]
CREATE TABLE countries_ext (
country_code VARCHAR2(5),
country_name VARCHAR2(50),
country_language VARCHAR2(50)
)
ORGANIZATION EXTERNAL (
TYPE ORACLE_LOADER
DEFAULT DIRECTORY ext_tab_data
ACCESS PARAMETERS (
RECORDS DELIMITED BY NEWLINE
FIELDS TERMINATED BY ','
MISSING FIELD VALUES ARE NULL
(
country_code CHAR(5),
country_name CHAR(50),
country_language CHAR(50)
)
)
LOCATION ('Countries1.txt','Countries2.txt')
)
PARALLEL 5
REJECT LIMIT UNLIMITED;
SELECT * FROM countries_ext ORDER BY country_name;
COUNT COUNTRY_NAME COUNTRY_LANGUAGE
----- ---------------------------- -----------------------------
ENG England English
FRA France French
GER Germany German
IRE Ireland English
-- Non-partitioned: a query must scan all four files to find the rows it needs
CREATE TABLE CUSTOMER_RAWDATA (
  customer_number NUMBER,
  customer_name   VARCHAR2(50),
  postal_code     VARCHAR2(5)
)
ORGANIZATION EXTERNAL (
  type oracle_hdfs
  default directory TEMP
  access parameters
  (
    com.oracle.bigdata.cluster = hadoop_clust
    com.oracle.bigdata.rowformat = delimited fields terminated by ','
  )
  location('hdfs/p1a.dat',
           'hdfs/p1b.dat',
           'hdfs/p2.dat',
           'hdfs/p3.dat')
);
Partitioned External tables (new in 12.2)
Data is stored on our Hadoop cluster, in many files.
Problem: we still need to scan all the files for the data we need, and we may not get the degree of parallelism we want.
But with Partitioned External tables we can provide metadata describing how the data is split across the files.
CREATE TABLE CUSTOMER_RAWDATA (
  customer_number NUMBER,
  customer_name   VARCHAR2(50),
  postal_code     VARCHAR2(5)
)
ORGANIZATION EXTERNAL (
  type oracle_hdfs
  default directory TEMP
  access parameters
  (
    com.oracle.bigdata.cluster = hadoop_clust
    com.oracle.bigdata.rowformat = delimited fields terminated by ','
  )
)
-- The partitioning clause follows the ORGANIZATION EXTERNAL clause;
-- each partition lists only the files holding its range of customer_number
partition by range(customer_number)
(
  partition p1 values less than (100) location('hdfs/p1a.dat', 'hdfs/p1b.dat'),
  partition p2 values less than (200) location('hdfs/p2.dat'),
  partition p3 values less than (300) location('hdfs/p3.dat')
);
Now we get partition elimination.
This only really works if the data is natively partitioned when the files are created, and if that is done correctly every time!
It doesn't have to be on Hadoop; it also works with files on the database server.
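One way to check that pruning is happening, assuming the partitioned CUSTOMER_RAWDATA table has been created: ask the optimizer for its plan on a query that filters on the partition key. If elimination applies, the Pstart/Pstop columns of the plan should point at partition 1 (p1) only, rather than at all three.

```sql
-- Explain a query whose predicate falls entirely within partition p1
EXPLAIN PLAN FOR
SELECT customer_name, postal_code
FROM   customer_rawdata
WHERE  customer_number < 100;

-- Display the plan; check the Pstart/Pstop columns
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```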
CREATE TABLE json_dump_file_contents (
json_document CLOB
)
ORGANIZATION EXTERNAL (
TYPE ORACLE_LOADER
DEFAULT DIRECTORY order_entry_dir
ACCESS PARAMETERS (
RECORDS DELIMITED BY 0x'0A'
DISABLE_DIRECTORY_LINK_CHECK
BADFILE loader_output_dir: 'JSONDumpFile.bad'
LOGFILE order_entry_dir: 'JSONDumpFile.log'
FIELDS (
json_document CHAR(5000)
)
)
LOCATION (order_entry_dir:'PurchaseOrders.dmp')
)
PARALLEL
REJECT LIMIT UNLIMITED;
SELECT count(*)
FROM json_dump_file_contents po
WHERE to_number(json_value(json_document, '$.PONumber')) > 1500;
CREATE TABLE json_documents (
id RAW(16) NOT NULL,
data CLOB,
CONSTRAINT json_documents_pk PRIMARY KEY (id),
CONSTRAINT json_documents_json_chk CHECK (data IS JSON (STRICT) )
);
INSERT INTO json_documents (id, data)
VALUES (SYS_GUID(),
'{ "FirstName" : "Brendan",
"LastName" : "Tierney",
"Job" : "Clerk",
"Address" : { "Street" : "1 Main Street",
"City" : "Dublin",
"Country" : "Ireland"},
"ContactDetails" : { "Email" : "xyz@oralytics.com",
"Phone" : "353 123 1234567",
"Twitter" : "@brendantierney" },
"DateOfBirth" : "01-JAN-2000",
"Active" : null }');
SELECT a.data.FirstName,
a.data.LastName,
a.data.ContactDetails.Email AS Email
FROM json_documents a
ORDER BY a.data.FirstName, a.data.LastName;
FIRSTNAME LASTNAME EMAIL
--------------- --------------- -------------------------
Brendan Tierney xyz@oralytics.com
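Dot notation works well for single values; to turn a document into relational rows and columns, `JSON_TABLE` (available since 12.1.0.2) can be used instead. A sketch against the `json_documents` table above; the output column names and sizes are illustrative:

```sql
-- Project selected JSON fields, including nested ones, as relational columns
SELECT jt.first_name, jt.last_name, jt.city, jt.email
FROM   json_documents d,
       JSON_TABLE(d.data, '$'
         COLUMNS (first_name VARCHAR2(50)  PATH '$.FirstName',
                  last_name  VARCHAR2(50)  PATH '$.LastName',
                  city       VARCHAR2(50)  PATH '$.Address.City',
                  email      VARCHAR2(100) PATH '$.ContactDetails.Email')) jt
ORDER  BY jt.last_name, jt.first_name;
```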
CREATE TABLE customer (
id NUMBER(38),
name VARCHAR2(100),
address VARCHAR2(100),
city VARCHAR2(40),
country VARCHAR2(50),
location MDSYS.SDO_GEOMETRY
);
-- Assumes the sequence cust_seq already exists
INSERT INTO customer VALUES (
cust_seq.nextval, 'Brendan Tierney', '1 Main Street', 'Dublin', 'Ireland',
SDO_GEOMETRY
(2001, -- Geometry Type: 2-D Point
8307, -- SRID, Datum: WGS84
SDO_POINT_TYPE
(-6.2603, -- X = Longitude for Dublin (west of Greenwich, so negative)
53.3498, -- Y = Latitude for Dublin
NULL),
NULL,
NULL
)
);
SELECT sdo_geom.sdo_distance(c1.location, c2.location, 0.5, 'unit=kilometer') AS distance_km
FROM customer c1,
customer c2
WHERE c1.id = 1
AND c2.id = 2;
[Diagram: the schema layers, now extended with Spatial & Graph]
Using Oracle Big Data SQL, organizations can:
• Combine data from Oracle Database, Apache Hadoop and NoSQL in a single SQL query
• Query and analyze data in Apache Hadoop and NoSQL
• Maximize query performance on all data using advanced techniques like Smart Scan,
Partition Pruning, Storage Indexes, Bloom Filters and Predicate Push-Down in a
distributed architecture
• Integrate big data analyses into existing applications and architectures
[Diagram: the schema layers, now extended with Spatial & Graph and Oracle NoSQL]
Lots more data sources becoming available
Accessing data on Hadoop or in an Oracle NoSQL Database requires access via Hive/HCatalog.
To use this Hadoop or NoSQL data:
• Create a NoSQL store and a table (or Hadoop data)
• Configure Hive/HCatalog to access the NoSQL table or other data
• Configure Oracle Database to talk to HCatalog via an external table
CREATE TABLE movieapp_log_json (
custid INTEGER ,
movieid INTEGER ,
genreid INTEGER ,
time VARCHAR2 (20) ,
recommended VARCHAR2 (4) ,
activity NUMBER,
rating INTEGER,
price NUMBER
)
ORGANIZATION EXTERNAL
(
TYPE ORACLE_HIVE
DEFAULT DIRECTORY DEFAULT_DIR
)
REJECT LIMIT UNLIMITED;
SELECT f.custid, m.title, m.year, m.gross, f.rating
FROM movieapp_log_json f, movie m
WHERE f.movieid = m.movie_id
AND f.rating > 4;
This joins Hadoop data (via Hive) with in-database data
Data Security
§ We can apply all the typical data security that comes with Oracle to all our data
– Masking/Redaction
– Virtual Private Databases
– Fine-grained access control
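As an illustration of masking/redaction, a sketch using `DBMS_REDACT.ADD_POLICY`. The schema name `APP`, the choice of the `CUSTOMER` table (borrowed from the spatial example) and the policy name are all assumptions; an expression of `1=1` applies the policy to every query:

```sql
BEGIN
  DBMS_REDACT.ADD_POLICY(
    object_schema => 'APP',                     -- hypothetical schema owning CUSTOMER
    object_name   => 'CUSTOMER',
    policy_name   => 'redact_customer_address', -- hypothetical policy name
    column_name   => 'ADDRESS',
    function_type => DBMS_REDACT.FULL,          -- replace the whole value with a default
    expression    => '1=1');                    -- always apply the policy
END;
/
```

The stored data is unchanged; users without an exemption simply see the redacted value when they query the column.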
Store
Access
Analyze
Protect
SQL
One Ring to rule them all, One Ring to find them,
One Ring to bring them all and in the darkness bind them
Sauron – Lord of the Rings
SQL
One SQL to rule them all, One SQL to find them,
One SQL to bring them all and in the Database bind them
Sauron – Lord of the Rings
Brendan Tierney
SQL
One Language to rule them all, One Language to find them,
One Language to bring them all and in the Database bind them
Sauron – Lord of the Rings
Brendan Tierney