Teradata Big Data London Seminar
Unified Data Architecture - Teradata presentation on the topic of Big Data and Apache Hadoop.


Usage Rights

© All Rights Reserved


  • I like your Big Data presentation.
    I would like to share with you document about application of Big Data and Data Science in retail banking. http://www.slideshare.net/LadislavUrban/syoncloud-big-data-for-retail-banking-syoncloud
  • We want to help companies manage all of their data and get the best analytics value. People define big data around 3 V’s (volume, velocity, variety). Teradata sees the most value in “Big A”: Analytics. New analytics is what solves business problems which couldn’t be addressed before. To leverage Big Data you must give all the business analysts in your organization the right analytical tool on all the existing and new data available. Operationalizing these new insights drives competitive advantage. To do this we’ve developed the Unified Data Architecture™, an architecture that applies the right technology to the right analytical problems, using best-of-breed technologies.
  • Good slide. Important. Could be made prettier.

Teradata Big Data London Seminar Presentation Transcript

  • UNIFIED DATA ARCHITECTURE
    Chris Hillman, Teradata Principal Data Scientist
  • Need for a Unified Data Architecture for New Insights
    Enabling Any User for Any Data Type from Data Capture to Analysis
    Tools: Java, C/C++, Python, R, SAS, SQL, Excel, BI, Visualization
    Layers: Reporting and Execution in the Enterprise; Discover and Explore; Capture, Store and Refine
    Data sources: Audio/Video, Images, Docs, Text, Web & Social, Machine Logs, CRM, SCM, ERP
    4/23/12 Teradata Confidential
  • UNIFIED DATA ARCHITECTURE
    Users: Data Scientists, Quants, Customers / Partners, Front-Line Workers, Engineers, Business Analysts, Executives, Operational Systems
    Tools: LANGUAGES, MATH & STATS, DATA MINING, BUSINESS INTELLIGENCE, APPLICATIONS
    Platforms: DISCOVERY PLATFORM, INTEGRATED DATA WAREHOUSE
    Data sources: AUDIO & VIDEO, IMAGES, TEXT, WEB & SOCIAL, MACHINE LOGS, CRM, SCM, ERP
    Confidential and proprietary. Copyright © 2012 Teradata Corporation.
  • Requirements for an Integrated Data Warehouse
    Requirements: Single View of Your Business; Cross-Functional Analysis; Shared Source for Analytics; Load Once, Use Many Times; Highest Business Value; Lowest Total Cost of Ownership; Fastest Time-to-Market For New Apps
    Users: Customers/Partners, Marketing, Business Analysts, Front-line Workers, Executives, Knowledge Workers, Operational Systems
    Tools: BUSINESS INTELLIGENCE, DATA MINING, APPLICATIONS
    Platform: INTEGRATED DATA WAREHOUSE
  • Requirements of a Discovery Platform
    Data sources: Non-Relational Data, Multi-Structured Data, Structured Data, OLTP DBMS’s
    Discovery Platform: Structured and multi-structured data; Doesn’t require extensive data modeling; Doesn’t balance the books; Data completeness can be good enough; No stringent SLAs
    Discovery tools: SQL, MapReduce, Statistical Functions
    Users: Data Scientist, Business Analyst
    Discovery examples: Fraud patterns; Customer behavior; Digital marketing optimization; Supply chain and supply line sensors
  • UNIFIED DATA ARCHITECTURE
    Users: Data Scientists, Quants, Customers / Partners, Front-Line Workers, Engineers, Business Analysts, Executives, Operational Systems
    Tools: LANGUAGES, MATH & STATS, DATA MINING, BUSINESS INTELLIGENCE, APPLICATIONS
    Big Data Analytics: DISCOVERY PLATFORM, INTEGRATED DATA WAREHOUSE
    Big Data Management: CAPTURE | STORE | REFINE
    Data sources: AUDIO & VIDEO, IMAGES, TEXT, WEB & SOCIAL, MACHINE LOGS, CRM, SCM, ERP
  • TERADATA UNIFIED DATA ARCHITECTURE
    Users: Data Scientists, Quants, Customers / Partners, Front-Line Workers, Engineers, Business Analysts, Executives, Operational Systems
    Tools: LANGUAGES, MATH & STATS, DATA MINING, BUSINESS INTELLIGENCE, APPLICATIONS
    Analytic use cases: Productionize Analytic Score with Path Variable Golden Path Application Submit Event Triggers Fraud Sentiment Analysis Marketing Integration Multi-Channel Customer Behavior Customer Behavior Analysis Channel Hopping MySpending Report Attrition Paths Customer Segmentation Fraudulent Paths Credit Risk Analysis Digital Marketing Attribution Customer profitability Portfolio Analysis Consumerization Sessionization Cross Platform Aggregation
    Platforms: DISCOVERY PLATFORM, INTEGRATED DATA WAREHOUSE
    CAPTURE | STORE | REFINE
    Data sources: E-MAIL STORE SVP SURVEY ON-LINE BRANCH DATA CALL CENTER ATM PROFILE
  • TERADATA UNIFIED DATA ARCHITECTURE
    Users: Data Scientists, Quants, Customers / Partners, Front-Line Workers, Engineers, Business Analysts, Executives, Operational Systems
    Tools: LANGUAGES, MATH & STATS, DATA MINING, BUSINESS INTELLIGENCE, APPLICATIONS
    Platforms: DISCOVERY PLATFORM, INTEGRATED DATA WAREHOUSE, linked by SQL-H
    CAPTURE | STORE | REFINE
    Data sources: AUDIO & VIDEO, IMAGES, TEXT, WEB & SOCIAL, MACHINE LOGS, CRM, SCM, ERP
  • SQL-H In Action
    Join Teradata, Hadoop, Aster tables; feed into MapReduce
    -- SQL manipulation for calculation
    SELECT qrd_focus_area, count(*)
    FROM nPath(
      ON (
        -- TD Connector to get OWNERSHIP data
        SELECT * FROM (
          SELECT * FROM load_from_teradata(
            ON mr_driver
            TDPID('dbc')
            USERNAME('name1')
            PASSWORD('password1')
            QUERY('SELECT * FROM owner.prod_own_fact')
          )
        ) AS td
        -- Include local Aster tables in JOIN
        JOIN owner.prod_dim proddim ON td.prod_id = proddim.product_id
        -- Hadoop Connector to get WARRANTY data
        JOIN (
          SELECT * FROM load_from_hadoop(
            ON mr_driver
            SERVER('10.10.3.139')
            USERNAME('name2')
            DBNAME('repair')
            TABLENAME('transaction')
          )
        ) AS sqlh ON sqlh.prod_ident_nbr = proddim.id
      )
      PARTITION BY party_id, prod_id
      ORDER BY repair_dt
      MODE (OVERLAPPING)
      -- Any path you want, specified with the power of regular expressions!
      PATTERN ('REPAIR{3}')
      SYMBOLS (event = 'REPAIR' AS REPAIR)
      RESULT (ACCUMULATE(qrd_focus_area OF ANY(REPAIR)) AS qrd_focus_area_path)
    )
    GROUP BY 1 ORDER BY 2 desc;
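The path-matching step of the query above can be illustrated outside Aster. The following is a minimal Python sketch, not Aster's implementation: it assumes the joined rows are already in memory as dictionaries, and it reads MODE (OVERLAPPING) as "try a match starting at every row". The sample rows and the `repair_paths` helper are hypothetical.

```python
from collections import defaultdict


def repair_paths(rows, run_length=3):
    """Collect qrd_focus_area paths for every run of `run_length` REPAIR events."""
    # PARTITION BY party_id, prod_id
    partitions = defaultdict(list)
    for row in rows:
        partitions[(row["party_id"], row["prod_id"])].append(row)

    paths = []
    for key in sorted(partitions):
        # ORDER BY repair_dt (ISO date strings sort chronologically)
        events = sorted(partitions[key], key=lambda r: r["repair_dt"])
        # Overlapping matching: try a match starting at every row.
        for start in range(len(events) - run_length + 1):
            window = events[start:start + run_length]
            if all(r["event"] == "REPAIR" for r in window):
                # ACCUMULATE(qrd_focus_area OF ANY(REPAIR))
                paths.append([r["qrd_focus_area"] for r in window])
    return paths


# Hypothetical sample data: four repairs on one product, two on another.
sample = [
    (1, 10, "2012-01-05", "engine"),
    (1, 10, "2012-02-11", "brakes"),
    (1, 10, "2012-03-20", "engine"),
    (1, 10, "2012-04-02", "gearbox"),
    (2, 20, "2012-01-09", "brakes"),
    (2, 20, "2012-02-01", "engine"),
]
rows = [
    {"party_id": p, "prod_id": d, "repair_dt": dt, "event": "REPAIR", "qrd_focus_area": area}
    for p, d, dt, area in sample
]

for path in repair_paths(rows):
    print(" -> ".join(path))
```

With four consecutive repairs, the overlapping reading yields two three-event paths for the first product and none for the second, which has only two repairs.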
  • TERADATA UNIFIED DATA ARCHITECTURE
    Users: Data Scientists, Quants, Customers / Partners, Front-Line Workers, Engineers, Business Analysts, Executives, Operational Systems
    Tools: LANGUAGES, MATH & STATS, DATA MINING, BUSINESS INTELLIGENCE, APPLICATIONS
    Management: VIEWPOINT, SUPPORT
    Platforms: DISCOVERY PLATFORM, INTEGRATED DATA WAREHOUSE
    Connectors and loaders: Aster Teradata Connector, SQL-H, Aster Connector for Hadoop, Teradata Connector for Hadoop, Aster Loader, Teradata Loader
    CAPTURE | STORE | REFINE
    Data sources: AUDIO & VIDEO, IMAGES, TEXT, WEB & SOCIAL, MACHINE LOGS, CRM, SCM, ERP
  • When to Use Which? The best approach by workload and data type
    Processing as a Function of Schema Requirements and Stage of Data Pipeline
    Pipeline stages: Low Cost Storage and Fast Loading; Data Pre-Processing, Refining, Cleansing; “Simple math at scale” (Score, filter, sort, avg., count...); Joins, Unions, Aggregates; Analytics (Iterative and data mining); Reporting
    Stable Schema (Financial Analysis, Ad-Hoc/OLAP, Enterprise-Wide BI and Reporting, Spatial/Temporal, Active Execution): Teradata, Teradata/Hadoop
    Evolving Schema (Interactive Data Discovery, Web Clickstream, Set-Top Box Analysis, CDRs, Sensor Logs, JSON): Aster (SQL + MapReduce Analytics), Aster / Hadoop, Hadoop
    Format, No Schema (Social Feeds, Text, Image Processing, Audio/Video Storage and Refining, Storage and Batch Transformations): Hadoop, Aster (MapReduce Analytics)
  • When to Use Which? The best approach by workload and data type
    Processing as a Function of Schema Requirements and Stage of Data Pipeline
    Pipeline stages: Low Cost Storage and Fast Loading; Data Pre-Processing, Refining, Cleansing; “Simple math at scale” (Score, filter, sort, avg., count...); Joins, Unions, Aggregates; Analytics (Iterative and data mining); Reporting
    Stable Schema: Teradata, Teradata/Hadoop
    Evolving Schema: Aster (SQL + MapReduce Analytics), Aster / Hadoop, Hadoop
    Format, No Schema: Hadoop, Aster (MapReduce Analytics)
  • UDA IN PRACTICE: IPTV QUALITY OF SERVICE
  • Starting point: Complaints Data
  • Churners – and data quality
  • What events lead up to a reboot?
    Note the number of paths with a reboot following another reboot!
    CREATE dimension table wrk.npath_reboot_5events AS
    SELECT path, COUNT(*) AS path_count
    FROM nPath (
      ON wrk.w_event_f
      PARTITION BY srv_id
      ORDER BY evt_ts desc
      MODE (NONOVERLAPPING)
      PATTERN ('X{0,5}.reboot')
      SYMBOLS (true as X, evt_name = 'REBOOT' AS reboot)
      RESULT (FIRST(srv_id OF X) AS srv_id,
              ACCUMULATE(evt_name OF ANY (X, reboot)) AS path)
    )
    GROUP BY 1;

    SELECT *
    FROM GraphGen (
      ON (SELECT * from wrk.npath_reboot_5events ORDER BY path_count LIMIT 30)
      PARTITION BY 1
      ORDER BY path_count desc
      item_format('npath')
      item1_col('path')
      score_col('path_count')
      output_format('sankey')
      justify('right')
    );
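The pattern X{0,5}.reboot reads as "up to five events of any kind, then a reboot". A rough Python sketch of that idea follows, assuming nPath behaves like a greedy, leftmost, non-overlapping regular-expression match (which is how Python's `re.finditer` works); whether Aster resolves ties the same way is an assumption. The event names other than REBOOT and the service ids are hypothetical.

```python
import re
from collections import Counter

# X{0,5}.reboot with X matching any event: encode each event as one character
# ("R" for a reboot, "x" for anything else) and reuse the regex engine.
PATTERN = re.compile(r".{0,5}R")


def reboot_paths(event_names):
    """Return non-overlapping paths of up to five events ending in a reboot."""
    encoded = "".join("R" if name == "REBOOT" else "x" for name in event_names)
    return [tuple(event_names[m.start():m.end()]) for m in PATTERN.finditer(encoded)]


def count_paths(partitions):
    """Count identical paths across partitions (the GROUP BY path step)."""
    counts = Counter()
    for events in partitions.values():
        counts.update(reboot_paths(events))
    return counts


# Hypothetical, already-ordered event streams keyed by service id.
partitions = {
    "srv-1": ["A", "B", "C", "D", "E", "F", "G", "REBOOT"],
    "srv-2": ["REBOOT"],
    "srv-3": ["REBOOT"],
}

for path, n in count_paths(partitions).most_common():
    print(n, " > ".join(path))
```

Only the five events immediately before the reboot are kept for srv-1, so long histories collapse into comparable paths that can then be counted and ranked.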
  • View events data in Tableau
    There looks to be an issue with the data on 30th September and beyond: the Reboot data for October seems to have been aggregated and added to 30th September.
  • Address data quality
    • Remove paths with all reboots and exclude data from 30th September
    It would appear that events with suffix 1 and 2 can be added together
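The three clean-up steps on this slide can be sketched in Python as follows. This is an illustration under stated assumptions, not the code used in the project: the cutoff year, the "_1" / "_2" suffix format, the event names and all helper names are hypothetical, and "from 30th September" is read as "on or after that date".

```python
import re
from collections import Counter
from datetime import date


def drop_all_reboot_paths(path_counts):
    """Remove paths in which every event is a reboot."""
    return {
        path: n for path, n in path_counts.items()
        if not all(event == "REBOOT" for event in path)
    }


def exclude_from(events, cutoff):
    """Keep only events dated before the cutoff (events are (date, name) pairs)."""
    return [(day, name) for day, name in events if day < cutoff]


def merge_suffixes(event_counts):
    """Add together counts of events whose names differ only by a _1 or _2 suffix."""
    merged = Counter()
    for name, n in event_counts.items():
        merged[re.sub(r"_[12]$", "", name)] += n
    return dict(merged)


# Hypothetical inputs.
paths = {("REBOOT", "REBOOT"): 40, ("SYNC_LOSS", "REBOOT"): 12}
events = [(date(2012, 9, 29), "REBOOT"), (date(2012, 9, 30), "REBOOT")]
counts = {"SYNC_LOSS_1": 3, "SYNC_LOSS_2": 4, "REBOOT": 5}

print(drop_all_reboot_paths(paths))
print(exclude_from(events, cutoff=date(2012, 9, 30)))
print(merge_suffixes(counts))
```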
  • Visualise as a Graph using Aster GraphGen
    Size of Node = number of customers
    Width of Edge = number of errors
    SELECT *
    FROM graphgen (
      ON (SELECT DISTINCT dmt_act_dslam, nra_id, nbr_of_srvid, errorspersrv, nbr_of_dslam
          FROM wrk.srvid_dslam_err)
      PARTITION BY 1
      ORDER BY errorspersrv
      item_format('cfilter')
      item1_col('dmt_act_dslam')
      item2_col('nra_id')
      score_col('errorspersrv')
      cnt1_col('nbr_of_srvid')
      cnt2_col('nbr_of_dslam')
      output_format('sigma')
      directed('false')
      width_max(10)
      width_min(1)
      nodesize_max(3)
      nodesize_min(1)
    );
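The width and node-size limits in the call above imply that raw counts are rescaled into a fixed drawing range. How GraphGen does this internally is not stated on the slide; the Python sketch below assumes plain linear min-max scaling purely to illustrate the idea, and the `scale` helper and sample values are hypothetical.

```python
def scale(value, lo, hi, out_min, out_max):
    """Linearly map value from [lo, hi] onto [out_min, out_max]."""
    if hi == lo:
        # All inputs identical: fall back to the smallest drawing size.
        return float(out_min)
    return out_min + (value - lo) / (hi - lo) * (out_max - out_min)


# Hypothetical errors-per-service scores for three DSLAM/NRA edges.
errors = [0, 5, 10]
widths = [scale(e, min(errors), max(errors), 1, 10) for e in errors]
print(widths)  # edge widths between width_min(1) and width_max(10)
```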
  • Synch Issues by Hub Type
  • Error and Complaint rates by equipment type
  • UDA IN PRACTICE: PREDICTIVE MODELS
  • Input Data
    create table wrk.cih_dshb_ads as
    SELECT srv_id, sav_flag, offer, inseecode, code_postal, libelle, nom_dep, nom_region, longitude, latitude,
      coalesce(topo_nra, 'Unknown') as topo_nra,
      topo_dslam,
      coalesce(iad_hardwareversion, 'Unknown') as iad_hardwareversion,
      coalesce(iad_manufacturer, 'Unknown') as iad_manufacturer,
      coalesce(iad_modelname, 'Unknown') as iad_modelname,
      coalesce(iad_modemfirmwareversion, 'Unknown') as iad_modemfirmwareversion,
      coalesce(iad_productclass, 'Unknown') as iad_productclass,
      coalesce(iad_provisioningcode, 'Unknown') as iad_provisioningcode,
      coalesce(iad_softwareversion, 'Unknown') as iad_softwareversion,
      coalesce(iad_vendorconfigfiledescription_1, 'Unknown') as iad_vendorconfigfiledescription_1,
      coalesce(iad_vendorconfigfilename_1, 'Unknown') as iad_vendorconfigfilename_1,
      coalesce(iad_vendorconfigfilenumbofentries, 0) as iad_vendorconfigfilenumbofentries,
      coalesce(iad_vendorconfigfileversion_1, 'Unknown') as iad_vendorconfigfileversion_1,
      coalesce(iad_x_000e50_boardversion, 'Unknown') as iad_x_000e50_boardversion,
      coalesce(stb_description, 'Unknown') as stb_description,
      coalesce(stb_devicestatus, 'Unknown') as stb_devicestatus,
      coalesce(stb_gwinfoproductclass, 'Unknown') as stb_gwinfoproductclass,
      coalesce(stb_hardwareversion, 'Unknown') as stb_hardwareversion,
      coalesce(stb_manufacturer, 'Unknown') as stb_manufacturer,
      coalesce(stb_productclass, 'Unknown') as stb_productclass,
      coalesce(stb_softwareversion, 'Unknown') as stb_softwareversion,
      dev_iad_uptime_diff, dsl_showtime_diff, dev_stb_uptime_diff,
      kpi_iad_uptime, kpi_iad_synctime, kpi_stb_uptime,
      dev_iad_uptime, dsl_showtime, dev_stb_uptime,
      dsl_downstr_att, dsl_downstr_cur, dsl_downstr_max,
      kpi_voip_nb_dropped_calls_diff, kpi_voip_nb_dropped_calls, kpi_dsl_nb_crc, kpi_dsl_dscurrate_ratio_qualite,
      kpi_voip_tx_appels_coupes, kpi_voip_qualite, kpi_voip_qualite_diff, kpi_iptv_plr_nb_bon, kpi_iptv_plr_nb_moyen,
      kpi_iptv_conso_heures, kpi_iptv_packetslosts, kpi_iptv_packetsreceived,
      kpi_dsl_dscurrate_before, kpi_dsl_dscurrate_after
    FROM wrk.cih_dshb_bis
    where network = 'BYT' and stb_manufacturer is not null and topo_dslam is not null;
  • Decision Trees
    SELECT * FROM forest_drive (
      ON (SELECT 1)
      PARTITION BY 1
      DATABASE('beehive')
      USERID('beehive')
      PASSWORD('beehive')
      INPUTTABLE('wrk.cih_dshb_tree_in')
      OUTPUTTABLE('wrk.cih_dshb_tree_out')
      RESPONSE('sav_flag')
      NUMERICINPUTS('KPI_SIGNAL')
      CATEGORICALINPUTS('offer', 'nom_dep', 'nom_region', 'topo_nra', 'topo_dslam',
        'iad_modemfirmwareversion', 'iad_vendorconfigfiledescription_1', 'iad_x_000e50_boardversion',
        'stb_description', 'stb_productclass', 'stb_softwareversion', 'topo_dslam_brand')
      NUMTREES(4)
    );
  • Naïve Bayes
    CREATE TABLE wrk.cih_dshb_model (PARTITION KEY(class)) AS
    SELECT * FROM naiveBayesReduce(
      ON (SELECT * FROM naiveBayesMap(
        ON (select * from wrk.cih_dshb_ads_in_11 where kpi_iad_uptime is not null)
        RESPONSE('sav_flag')
        NUMERICINPUTS('dev_iad_uptime', 'dsl_showtime', 'dev_stb_uptime',
          'dsl_downstr_att', 'dsl_downstr_cur', 'dsl_downstr_max',
          'kpi_voip_nb_dropped_calls_diff', 'kpi_voip_nb_dropped_calls', 'kpi_dsl_nb_crc',
          'kpi_dsl_dscurrate_ratio_qualite', 'kpi_voip_tx_appels_coupes', 'kpi_voip_qualite',
          'kpi_voip_qualite_diff', 'kpi_iptv_plr_nb_bon', 'kpi_iptv_plr_nb_moyen', 'kpi_iptv_plr_nb_mauvais',
          'kpi_iptv_packetslosts', 'kpi_iptv_packetsreceived', 'kpi_stb_uptime', 'kpi_iad_synctime',
          'kpi_iad_uptime')
        CATEGORICALINPUTS('offer', 'nom_dep', 'nom_region', 'topo_nra', 'topo_dslam',
          'iad_modemfirmwareversion', 'iad_vendorconfigfiledescription_1', 'iad_x_000e50_boardversion',
          'stb_description', 'stb_productclass', 'stb_softwareversion', 'topo_dslam_brand')
      ))
      PARTITION BY class
    );
  • Support Vector Machine
    create table wrk.cih_svm_train2 distribute by hash(srv_id) as
    select srv_id, 'topo_nra_insee' as attr, topo_nra_insee::varchar as attr_value, sav_all_tgt FROM wrk.cih_sav_train
    union all
    select srv_id, 'code_postal' as attr, code_postal::varchar as attr_value, sav_all_tgt FROM wrk.cih_sav_train
    union all
    select srv_id, 'kpi_iad_uptime_avg' as attr, kpi_iad_uptime_avg::varchar as attr_value, sav_all_tgt FROM wrk.cih_sav_train
    union all
    select srv_id, 'dev_iad_uptime_diff_avg' as attr, dev_iad_uptime_diff_avg::varchar as attr_value, sav_all_tgt FROM wrk.cih_sav_train
    union all
    select srv_id, 'kpi_voip_nb_dropped_calls_diff_avg' as attr, kpi_voip_nb_dropped_calls_diff_avg::varchar as attr_value, sav_all_tgt FROM wrk.cih_sav_train
    union all
    select srv_id, 'sav_nb_contacts' as attr, sav_nb_contacts::varchar as attr_value, sav_all_tgt FROM wrk.cih_sav_train
    union all
    select srv_id, 'nb_tr' as attr, nb_tr::varchar as attr_value, sav_all_tgt FROM wrk.cih_sav_train
    union all
    select srv_id, 'kpi_dsl_nb_crc_avg' as attr, kpi_dsl_nb_crc_avg::varchar as attr_value, sav_all_tgt FROM wrk.cih_sav_train;

    /*Run SVM*/
    CREATE TABLE wrk.cih_svm_model3 (PARTITION KEY(vec_index)) AS
    SELECT vec_index, avg(vec_value) as vec_value
    FROM svm(
      ON wrk.cih_svm_train2
      PARTITION BY srv_id
      OUTCOME('sav_flag')
      ATTRIBUTE_NAME('attr')
      ATTRIBUTE_VALUE('attr_value')
    )
    GROUP BY vec_index;
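The chain of UNION ALL selects above reshapes a wide training table into one row per service and attribute, which is the layout the svm call reads through ATTRIBUTE_NAME and ATTRIBUTE_VALUE. A small Python sketch of the same reshaping follows; the `to_attribute_rows` helper and the sample record are hypothetical, and only three of the slide's columns are used.

```python
def to_attribute_rows(records, attributes, id_col="srv_id", target_col="sav_all_tgt"):
    """Turn wide records into (id, attr, attr_value, target) rows."""
    long_rows = []
    for record in records:
        for name in attributes:
            long_rows.append({
                id_col: record[id_col],
                "attr": name,                     # attribute name as a literal
                "attr_value": str(record[name]),  # mirrors the ::varchar cast
                target_col: record[target_col],
            })
    return long_rows


# Hypothetical single training record.
wide = [{"srv_id": 42, "code_postal": 75001, "nb_tr": 3, "sav_all_tgt": 1}]

for row in to_attribute_rows(wide, ["code_postal", "nb_tr"]):
    print(row)
```

Casting every value to text lets numeric and categorical attributes share one column, at the cost of the consumer having to interpret each value by its attribute name.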
  • Lift Chart to View Predictive Model Performance
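The slide shows the chart only as an image, so the calculation behind it is not given. As a general illustration of how a cumulative lift curve is usually computed, here is a short Python sketch: customers are ranked by model score, and for each top slice the positive rate is divided by the overall positive rate. The function name, bin count, scores and labels are all hypothetical.

```python
def cumulative_lift(scores, labels, bins=10):
    """Return cumulative lift for the top 1/bins, 2/bins, ... of ranked rows."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    total = len(ranked)
    positives = sum(label for _, label in ranked)
    if total < bins or positives == 0:
        raise ValueError("need at least `bins` rows and one positive label")
    base_rate = positives / total

    lifts = []
    for b in range(1, bins + 1):
        cutoff = total * b // bins  # number of rows in the top b bins
        hits = sum(label for _, label in ranked[:cutoff])
        lifts.append((hits / cutoff) / base_rate)
    return lifts


# Hypothetical model scores and true outcomes (1 = customer contacted support).
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

for i, lift in enumerate(cumulative_lift(scores, labels, bins=5), start=1):
    print(f"top {i * 20}%: lift {lift:.2f}")
```

A lift above 1 in the first bins means the model concentrates positives at the top of the ranking; the last bin always has lift 1 because it covers every row.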