Teradata Big Data London Seminar
2. Need for a Unified Data Architecture for New Insights
Enabling Any User for Any Data Type from Data Capture to Analysis
Java, C/C++, Python, R, SAS, SQL, Excel, BI, Visualization
Reporting and Execution in the Enterprise
Discover and Explore
Capture, Store and Refine
Audio/Video, Images, Docs, Text, Web & Social, Machine Logs, CRM, SCM, ERP
2 4/23/12 Teradata Confidential
3. UNIFIED DATA ARCHITECTURE
Data Scientists Quants Customers / Partners Front-Line Workers
Engineers Business Analysts Executives Operational Systems
LANGUAGES MATH & STATS DATA MINING BUSINESS INTELLIGENCE APPLICATIONS
DISCOVERY PLATFORM | INTEGRATED DATA WAREHOUSE
AUDIO & VIDEO IMAGES TEXT WEB & SOCIAL MACHINE LOGS CRM SCM ERP
3 Confidential and proprietary. Copyright © 2012 Teradata Corporation.
4. Requirements for an Integrated Data Warehouse
• Single View of Your Business
• Cross-Functional Analysis
• Shared Source for Analytics
• Load Once, Use Many Times
• Highest Business Value
• Lowest Total Cost of Ownership
• Fastest Time-to-Market For New Apps
Customers/Partners, Marketing, Business Analysts, Front-line Workers, Executives, Knowledge Workers, Operational Systems
BUSINESS INTELLIGENCE DATA MINING APPLICATIONS
INTEGRATED DATA WAREHOUSE
5. Requirements of a Discovery Platform
DATA SOURCES: Non-Relational Data; Multi-Structured Data; Structured Data; OLTP DBMS’s
DISCOVERY PLATFORM
• Structured and multi-structured data
• Doesn’t require extensive data modeling
• Doesn’t balance the books
• Data completeness can be good enough
• No stringent SLAs
DISCOVERY TOOLS: SQL; MapReduce; Statistical Functions
USERS: Data Scientist; Business Analyst
• Fraud patterns
• Customer behavior
• Digital marketing optimization
• Supply chain and supply line sensors
6. UNIFIED DATA ARCHITECTURE
Data Scientists Quants Customers / Partners Front-Line Workers
Engineers Business Analysts Executives Operational Systems
LANGUAGES MATH & STATS DATA MINING BUSINESS INTELLIGENCE APPLICATIONS
Big Data Analytics: DISCOVERY PLATFORM | INTEGRATED DATA WAREHOUSE
Big Data Management: CAPTURE | STORE | REFINE
AUDIO & VIDEO IMAGES TEXT WEB & SOCIAL MACHINE LOGS CRM SCM ERP
7. TERADATA UNIFIED DATA ARCHITECTURE
Data Scientists Quants Customers / Partners Front-Line Workers
Engineers Business Analysts Executives Operational Systems
LANGUAGES MATH & STATS DATA MINING BUSINESS INTELLIGENCE APPLICATIONS
Productionize
Analytic Score with Path Variable
Golden Path Application Submit Event Triggers
Fraud Sentiment Analysis Marketing Integration
Multi-Channel Customer Behavior Customer Behavior Analysis
Channel Hopping MySpending Report
Attrition Paths Customer Segmentation
Fraudulent Paths Credit Risk Analysis
Digital Marketing Attribution DISCOVERY INTEGRATED Customer profitability
PLATFORM DATA WAREHOUSE Portfolio Analysis
Consumerization
Sessionization
Cross Platform Aggregation
CAPTURE | STORE | REFINE
E-MAIL STORE SVP SURVEY ON-LINE BRANCH DATA CALL CENTER ATM PROFILE
8. TERADATA UNIFIED DATA ARCHITECTURE
Data Scientists Quants Customers / Partners Front-Line Workers
Engineers Business Analysts Executives Operational Systems
LANGUAGES MATH & STATS DATA MINING BUSINESS INTELLIGENCE APPLICATIONS
DISCOVERY PLATFORM | INTEGRATED DATA WAREHOUSE
SQL-H
CAPTURE | STORE | REFINE
AUDIO & VIDEO IMAGES TEXT WEB & SOCIAL MACHINE LOGS CRM SCM ERP
9. SQL-H In Action
Join Teradata, Hadoop, Aster tables; feed into Map Reduce
-- SQL manipulation for calculation
SELECT qrd_focus_area, count(*)
FROM nPath(
  ON (
    SELECT * FROM
      -- TD Connector to get OWNERSHIP data
      ( SELECT * FROM load_from_teradata(
          ON mr_driver TDPID('dbc')
          USERNAME('name1') PASSWORD('password1')
          QUERY('SELECT * FROM owner.prod_own_fact') ) ) AS td
      -- Include local Aster tables in JOIN
      JOIN owner.prod_dim proddim ON td.prod_id = proddim.product_id
      JOIN
      -- Hadoop Connector to get WARRANTY data
      ( SELECT * FROM load_from_hadoop(
          ON mr_driver SERVER ('10.10.3.139')
          USERNAME ('name2') DBNAME ('repair')
          TABLENAME ('transaction') ) ) AS sqlh
      ON sqlh.prod_ident_nbr = proddim.id )
  PARTITION BY party_id, prod_id ORDER BY repair_dt
  -- Any path you want, specified with the power of regular expressions!
  MODE (OVERLAPPING)
  PATTERN ( 'REPAIR{3}' )
  SYMBOLS ( event = 'REPAIR' AS REPAIR )
  RESULT (ACCUMULATE(qrd_focus_area OF ANY(REPAIR)) AS qrd_focus_area_path )
) n
GROUP BY 1 ORDER BY 2 desc ;
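To make the path-matching step above concrete, here is a minimal Python sketch of what PARTITION BY, ORDER BY, MODE (OVERLAPPING) and PATTERN ('REPAIR{3}') do together: within each (party_id, prod_id) partition, every window of three consecutive REPAIR rows yields one accumulated path. It is an illustration only, not Aster's nPath implementation; the function names and the sample rows are invented for the example.

```python
from collections import Counter, defaultdict


def repair_paths(rows, run_length=3):
    """Return one path per window of `run_length` consecutive REPAIR rows."""
    partitions = defaultdict(list)
    for row in rows:
        # Equivalent of PARTITION BY party_id, prod_id
        partitions[(row["party_id"], row["prod_id"])].append(row)

    paths = []
    for events in partitions.values():
        # Equivalent of ORDER BY repair_dt (ISO date strings sort chronologically)
        ordered = sorted(events, key=lambda r: r["repair_dt"])
        # Overlapping mode: a match may start at every row position.
        for start in range(len(ordered) - run_length + 1):
            window = ordered[start:start + run_length]
            if all(r["event"] == "REPAIR" for r in window):
                # Equivalent of ACCUMULATE(qrd_focus_area OF ANY(REPAIR))
                paths.append(tuple(r["qrd_focus_area"] for r in window))
    return paths


def make_row(party_id, prod_id, repair_dt, event, area):
    """Build one hypothetical joined row (invented sample data)."""
    return {"party_id": party_id, "prod_id": prod_id, "repair_dt": repair_dt,
            "event": event, "qrd_focus_area": area}


sample_rows = [
    make_row(1, 10, "2012-01-05", "REPAIR", "engine"),
    make_row(1, 10, "2012-02-11", "REPAIR", "engine"),
    make_row(1, 10, "2012-03-02", "REPAIR", "brakes"),
    make_row(1, 10, "2012-04-20", "REPAIR", "engine"),
    make_row(2, 20, "2012-01-09", "REPAIR", "engine"),
    make_row(2, 20, "2012-02-01", "SALE", "none"),
    make_row(2, 20, "2012-03-15", "REPAIR", "brakes"),
    make_row(2, 20, "2012-04-01", "REPAIR", "brakes"),
]

# Equivalent of the outer GROUP BY 1 ORDER BY 2 desc
path_counts = Counter(repair_paths(sample_rows))
for path, count in path_counts.most_common():
    print(path, count)
```

The first product has four repairs in a row, so overlapping mode produces two paths from it; the second product never has three consecutive repairs and produces none.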
10. TERADATA UNIFIED DATA ARCHITECTURE
Data Scientists Quants Customers / Partners Front-Line Workers
Engineers Business Analysts Executives Operational Systems
VIEWPOINT LANGUAGES MATH & STATS DATA MINING BUSINESS INTELLIGENCE APPLICATIONS SUPPORT
DISCOVERY PLATFORM | Aster Teradata Connector | INTEGRATED DATA WAREHOUSE
Aster Connector for Hadoop | SQL-H | Teradata Connector for Hadoop
Aster Loader | Teradata Loader
CAPTURE | STORE | REFINE
AUDIO & VIDEO IMAGES TEXT WEB & SOCIAL MACHINE LOGS CRM SCM ERP
11. When to Use Which?
The best approach by workload and data type
Processing as a Function of Schema Requirements and Stage of Data Pipeline
Stages of the data pipeline: Low Cost Storage and Fast Loading; Data Pre-Processing, Refining, Cleansing; “Simple math at scale” (Score, filter, sort, avg., count...); Joins, Unions, Aggregates; Analytics (Iterative and data mining); Reporting
Stable Schema
  Platforms: Teradata/Hadoop, Teradata
  Example workloads: Financial Analysis, Ad-Hoc/OLAP, Enterprise-Wide BI and Reporting, Spatial/Temporal, Active Execution
Evolving Schema
  Platforms: Hadoop, Aster/Hadoop, Aster, Aster (SQL + MapReduce Analytics)
  Example workloads: Interactive Data Discovery; Web Clickstream, Set-Top Box Analysis; CDRs, Sensor Logs, JSON
Format, No Schema
  Platforms: Hadoop, Aster, Aster (MapReduce Analytics)
  Example workloads: Social Feeds, Text, Image Processing; Audio/Video Storage and Refining; Storage and Batch Transformations
12. When to Use Which?
The best approach by workload and data type
Processing as a Function of Schema Requirements and Stage of Data Pipeline
Stages of the data pipeline: Low Cost Storage and Fast Loading; Data Pre-Processing, Refining, Cleansing; “Simple math at scale” (Score, filter, sort, avg., count...); Joins, Unions, Aggregates; Analytics (Iterative and data mining); Reporting
Stable Schema: Teradata/Hadoop, Teradata
Evolving Schema: Hadoop, Aster/Hadoop, Aster, Aster (SQL + MapReduce Analytics)
Format, No Schema: Hadoop, Aster, Aster (MapReduce Analytics)
15. Churners – and data quality
16. What events lead up to a reboot?
Note the number of paths with a reboot following another reboot!

CREATE dimension table wrk.npath_reboot_5events
AS SELECT path, COUNT(*) AS path_count
FROM nPath
(ON wrk.w_event_f
PARTITION BY srv_id
ORDER BY evt_ts desc
MODE (NONOVERLAPPING )
PATTERN ('X{0,5}.reboot')
SYMBOLS
(true as X,
evt_name = 'REBOOT' AS reboot)
RESULT
(FIRST( srv_id OF X) AS srv_id,
ACCUMULATE (evt_name OF ANY (X,reboot))
AS path)
) GROUP BY 1 ;

SELECT *
FROM GraphGen (ON
(SELECT * from wrk.npath_reboot_5events
ORDER BY path_count
LIMIT 30 )
PARTITION BY 1
ORDER BY path_count desc
item_format('npath')
item1_col('path')
score_col('path_count')
output_format('sankey')
justify('right'));
17. View events data in Tableau
There appears to be an issue with the data from 30th September onwards:
the reboot data for October seems to have been aggregated and added
to 30th September.
18. Address data quality
• Remove paths with all reboots and exclude data from 30th September
It would appear that events with suffixes 1 and 2 can be added together
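The clean-up steps on this slide can be sketched in Python as three small filters. This is only an illustration under stated assumptions: the field name evt_date, the year 2012 in the cutoff, the reading of "from 30th September" as "on or after that date", and the "_1" / "_2" suffix format are all hypothetical, since the slide does not show the actual column names or event codes.

```python
import re
from collections import Counter
from datetime import date

# Assumed cutoff: the slide only says "30th September"; the year is a guess.
CUTOFF = date(2012, 9, 30)


def exclude_from_cutoff(events, cutoff=CUTOFF):
    """Keep only events dated strictly before the cutoff."""
    return [e for e in events if e["evt_date"] < cutoff]


def drop_all_reboot_paths(paths):
    """Remove paths that consist of nothing but REBOOT events."""
    return [p for p in paths if any(name != "REBOOT" for name in p)]


def merge_suffixed_counts(counts):
    """Add together counts of events that differ only by a _1 or _2 suffix."""
    merged = Counter()
    for name, count in counts.items():
        merged[re.sub(r"_[12]$", "", name)] += count
    return dict(merged)


events = [
    {"evt_name": "REBOOT", "evt_date": date(2012, 9, 29)},
    {"evt_name": "REBOOT", "evt_date": date(2012, 9, 30)},
    {"evt_name": "REBOOT", "evt_date": date(2012, 10, 1)},
]
paths = [("REBOOT", "REBOOT"), ("SYNC_LOSS", "REBOOT")]
counts = {"LOS_1": 3, "LOS_2": 4, "REBOOT": 2}

kept_events = exclude_from_cutoff(events)
kept_paths = drop_all_reboot_paths(paths)
merged_counts = merge_suffixed_counts(counts)
print(len(kept_events), kept_paths, merged_counts)
```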
19. Visualise as a Graph using Aster GraphGen
Size of Node = number
of customers
Width of Edge = number
of errors
SELECT *
FROM graphgen
(ON
(SELECT DISTINCT dmt_act_dslam,
nra_id,
nbr_of_srvid,
errorspersrv,
nbr_of_dslam
FROM wrk.srvid_dslam_err)
PARTITION BY 1
ORDER BY errorspersrv
item_format('cfilter')
item1_col('dmt_act_dslam')
item2_col('nra_id')
score_col('errorspersrv')
cnt1_col('nbr_of_srvid')
cnt2_col('nbr_of_dslam')
output_format('sigma')
directed('false')
width_max(10)
width_min(1)
nodesize_max (3)
nodesize_min (1));
20. Synch Issues by Hub Type
21. Error and Complaint rates by equipment type
24. Input Data
create table wrk.cih_dshb_ads as
SELECT srv_id, sav_flag, offer, inseecode, code_postal, libelle, nom_dep, nom_region, longitude, latitude,
coalesce(topo_nra, 'Unknown') as topo_nra, topo_dslam, coalesce(iad_hardwareversion, 'Unknown') as iad_hardwareversion,
coalesce(iad_manufacturer, 'Unknown') as iad_manufacturer,
coalesce(iad_modelname , 'Unknown') as iad_modelname,
coalesce(iad_modemfirmwareversion , 'Unknown') as iad_modemfirmwareversion,
coalesce(iad_productclass , 'Unknown') as iad_productclass,
coalesce(iad_provisioningcode , 'Unknown') as iad_provisioningcode,
coalesce(iad_softwareversion , 'Unknown') as iad_softwareversion,
coalesce(iad_vendorconfigfiledescription_1 , 'Unknown') as iad_vendorconfigfiledescription_1,
coalesce(iad_vendorconfigfilename_1 , 'Unknown') as iad_vendorconfigfilename_1,
coalesce(iad_vendorconfigfilenumbofentries , 0) as iad_vendorconfigfilenumbofentries,
coalesce(iad_vendorconfigfileversion_1 , 'Unknown') as iad_vendorconfigfileversion_1,
coalesce(iad_x_000e50_boardversion , 'Unknown') as iad_x_000e50_boardversion,
coalesce(stb_description , 'Unknown') as stb_description,
coalesce(stb_devicestatus , 'Unknown') as stb_devicestatus,
coalesce(stb_gwinfoproductclass , 'Unknown') as stb_gwinfoproductclass,
coalesce(stb_hardwareversion , 'Unknown') as stb_hardwareversion,
coalesce(stb_manufacturer , 'Unknown') as stb_manufacturer,
coalesce(stb_productclass , 'Unknown') as stb_productclass,
coalesce( stb_softwareversion, 'Unknown') as stb_softwareversion,
dev_iad_uptime_diff,dsl_showtime_diff,dev_stb_uptime_diff,
kpi_iad_uptime,kpi_iad_synctime,kpi_stb_uptime,
dev_iad_uptime,dsl_showtime,dev_stb_uptime,
dsl_downstr_att,dsl_downstr_cur,dsl_downstr_max,
kpi_voip_nb_dropped_calls_diff,kpi_voip_nb_dropped_calls,kpi_dsl_nb_crc,kpi_dsl_dscurrate_ratio_qualite,
kpi_voip_tx_appels_coupes,kpi_voip_qualite,kpi_voip_qualite_diff,kpi_iptv_plr_nb_bon,kpi_iptv_plr_nb_moyen,
kpi_iptv_conso_heures,kpi_iptv_packetslosts,kpi_iptv_packetsreceived, kpi_dsl_dscurrate_before,kpi_dsl_dscurrate_after
FROM wrk.cih_dshb_bis
where network = 'BYT'
and stb_manufacturer is not null
and topo_dslam is not null
25. Decision Trees
SELECT *
FROM forest_drive
(ON (SELECT 1)
PARTITION BY 1
DATABASE('beehive')
USERID('beehive')
PASSWORD('beehive')
INPUTTABLE('wrk.cih_dshb_tree_in')
OUTPUTTABLE('wrk.cih_dshb_tree_out')
RESPONSE('sav_flag')
NUMERICINPUTS('KPI_SIGNAL')
CATEGORICALINPUTS('offer', 'nom_dep', 'nom_region',
'topo_nra','topo_dslam' , 'iad_modemfirmwareversion',
'iad_vendorconfigfiledescription_1', 'iad_x_000e50_boardversion',
'stb_description', 'stb_productclass', 'stb_softwareversion',
'topo_dslam_brand')
NUMTREES(4)
)
26. Naïve Bayes
CREATE TABLE wrk.cih_dshb_model (PARTITION KEY(class)) AS
SELECT * FROM naiveBayesReduce(
ON(SELECT * FROM naiveBayesMap(
ON (select * from wrk.cih_dshb_ads_in_11 where kpi_iad_uptime is not null)
RESPONSE('sav_flag')
NUMERICINPUTS('dev_iad_uptime','dsl_showtime','dev_stb_uptime',
'dsl_downstr_att','dsl_downstr_cur','dsl_downstr_max',
'kpi_voip_nb_dropped_calls_diff','kpi_voip_nb_dropped_calls','kpi_dsl_nb_crc',
'kpi_dsl_dscurrate_ratio_qualite','kpi_voip_tx_appels_coupes','kpi_voip_qualite',
'kpi_voip_qualite_diff','kpi_iptv_plr_nb_bon','kpi_iptv_plr_nb_moyen','kpi_iptv_plr_nb_mauvais',
'kpi_iptv_packetslosts','kpi_iptv_packetsreceived','kpi_stb_uptime','kpi_iad_synctime',
'kpi_iad_uptime')
CATEGORICALINPUTS('offer', 'nom_dep', 'nom_region', 'topo_nra','topo_dslam' ,
'iad_modemfirmwareversion','iad_vendorconfigfiledescription_1','iad_x_000e50_boardversion',
'stb_description', 'stb_productclass', 'stb_softwareversion', 'topo_dslam_brand')
)
)PARTITION BY class
);
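The map/reduce split in the statement above can be illustrated with a small Python sketch. It assumes (this is not confirmed by the slide) that the map step summarises each data partition into per-class counts, sums and sums of squares, and that the reduce step merges those summaries into a per-class mean and variance for each numeric input. The function names nb_map and nb_reduce are hypothetical, categorical inputs are left out for brevity, and the sample rows are invented.

```python
from collections import defaultdict


def nb_map(rows, response, numeric_inputs):
    """Summarise one partition: (class, feature) -> [count, sum, sum of squares]."""
    stats = defaultdict(lambda: [0, 0.0, 0.0])
    for row in rows:
        for feature in numeric_inputs:
            value = row[feature]
            entry = stats[(row[response], feature)]
            entry[0] += 1
            entry[1] += value
            entry[2] += value * value
    return dict(stats)


def nb_reduce(partials):
    """Merge partition summaries into per-class mean and variance."""
    merged = defaultdict(lambda: [0, 0.0, 0.0])
    for partial in partials:
        for key, (n, total, total_sq) in partial.items():
            entry = merged[key]
            entry[0] += n
            entry[1] += total
            entry[2] += total_sq
    model = {}
    for key, (n, total, total_sq) in merged.items():
        mean = total / n
        # Population variance from the merged sufficient statistics.
        model[key] = {"count": n, "mean": mean,
                      "variance": total_sq / n - mean * mean}
    return model


# Two invented partitions, as if the rows lived on two workers.
partition_a = [{"sav_flag": 1, "kpi_iad_uptime": 1.0},
               {"sav_flag": 1, "kpi_iad_uptime": 3.0}]
partition_b = [{"sav_flag": 1, "kpi_iad_uptime": 5.0},
               {"sav_flag": 0, "kpi_iad_uptime": 2.0}]

partials = [nb_map(p, "sav_flag", ["kpi_iad_uptime"])
            for p in (partition_a, partition_b)]
model = nb_reduce(partials)
print(model)
```

Because the summaries are simple sums, they can be computed independently on each partition and merged in any order, which is what makes this kind of model fit a map/reduce pattern.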
27. Support Vector Machine
create table wrk.cih_svm_train2 distribute by hash(srv_id) as
select srv_id, 'topo_nra_insee' as attr, topo_nra_insee::varchar as attr_value, sav_all_tgt
FROM wrk.cih_sav_train union all
select srv_id, 'code_postal' as attr, code_postal::varchar as attr_value, sav_all_tgt
FROM wrk.cih_sav_train union all
select srv_id, 'kpi_iad_uptime_avg' as attr, kpi_iad_uptime_avg::varchar as attr_value, sav_all_tgt
FROM wrk.cih_sav_train union all
select srv_id, 'dev_iad_uptime_diff_avg' as attr, dev_iad_uptime_diff_avg::varchar as attr_value, sav_all_tgt
FROM wrk.cih_sav_train union all
select srv_id, 'kpi_voip_nb_dropped_calls_diff_avg' as attr, kpi_voip_nb_dropped_calls_diff_avg::varchar as attr_value, sav_all_tgt
FROM wrk.cih_sav_train union all
select srv_id, 'sav_nb_contacts' as attr, sav_nb_contacts::varchar as attr_value, sav_all_tgt
FROM wrk.cih_sav_train union all
select srv_id, 'nb_tr' as attr, nb_tr::varchar as attr_value, sav_all_tgt FROM wrk.cih_sav_train union all
select srv_id, 'kpi_dsl_nb_crc_avg' as attr, kpi_dsl_nb_crc_avg::varchar as attr_value, sav_all_tgt
FROM wrk.cih_sav_train;
/*Run SVM*/
CREATE TABLE wrk.cih_svm_model3 (PARTITION KEY(vec_index)) AS
SELECT vec_index, avg(vec_value) as vec_value FROM
svm( ON wrk.cih_svm_train2
PARTITION BY srv_id
OUTCOME( 'sav_flag' )
ATTRIBUTE_NAME( 'attr' )
ATTRIBUTE_VALUE( 'attr_value' )
)GROUP BY vec_index;
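The chain of UNION ALL selects above reshapes a wide training table into one row per (srv_id, attribute) pair, with every value cast to text. A minimal Python sketch of the same reshaping follows; the unpivot helper and the sample rows are invented for illustration, and only the column names are taken from the slide.

```python
def unpivot(rows, id_col, target_col, attr_cols):
    """Turn wide rows into long (id, attr, attr_value, target) rows."""
    long_rows = []
    for row in rows:
        for attr in attr_cols:
            long_rows.append({
                id_col: row[id_col],
                "attr": attr,
                # Same idea as the ::varchar cast in the SQL
                "attr_value": str(row[attr]),
                target_col: row[target_col],
            })
    return long_rows


# Invented sample of the wide training table
wide_rows = [
    {"srv_id": 101, "code_postal": 75001, "nb_tr": 3, "sav_all_tgt": 1},
    {"srv_id": 102, "code_postal": 69002, "nb_tr": 0, "sav_all_tgt": 0},
]

long_rows = unpivot(wide_rows, "srv_id", "sav_all_tgt", ["code_postal", "nb_tr"])
for long_row in long_rows:
    print(long_row)
```

Each input row yields one output row per attribute, so two rows with two attributes give four long rows.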
28. Lift Chart to View Predictive Model Performance
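For readers unfamiliar with lift charts, the sketch below shows one common way to compute cumulative lift: rank customers by model score, then compare the positive rate in the top slice with the overall positive rate. The slide does not show how its chart was produced, so this is a generic illustration; the function name, bin count and sample scores are assumptions.

```python
def cumulative_lift(scores, labels, n_bins=10):
    """Return cumulative lift for the top 1/n_bins, 2/n_bins, ... of ranked rows."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if len(scores) < n_bins:
        raise ValueError("need at least one row per bin")
    total_positives = sum(labels)
    if total_positives == 0:
        raise ValueError("lift is undefined without positive labels")

    # Rank rows from highest to lowest score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    base_rate = total_positives / len(ranked)

    lifts = []
    for b in range(1, n_bins + 1):
        cutoff = round(len(ranked) * b / n_bins)
        top = ranked[:cutoff]
        top_rate = sum(label for _, label in top) / len(top)
        lifts.append(top_rate / base_rate)
    return lifts


# Invented scores and outcomes (1 = customer contacted support)
sample_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.0]
sample_labels = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]

lifts = cumulative_lift(sample_scores, sample_labels, n_bins=5)
print([round(value, 2) for value in lifts])
```

A lift above 1 in the first bins means the model concentrates positives at the top of the ranking; the last value is always 1 because the whole population is included.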
Editor's Notes
We want to help companies manage all of their data and get the best analytics value.
People define big data around 3 V’s (volume, velocity, variety).
Teradata sees the most value in “Big A”: Analytics. New analytics is what solves business problems which couldn’t be addressed before.
To leverage Big Data you must give all the business analysts in your organization the right analytical tool on all the existing and new data available.
Operationalizing these new insights drives competitive advantage.
To do this we’ve developed the Unified Data Architecture™, an architecture that applies the right technology to the right analytical problems, leveraging best-of-breed technologies.
Good slide. Important. Could be made prettier.