Copyright © 2015, SAS Institute Inc. All rights reserved.
DATA MANAGEMENT FOR HIGH-PERFORMANCE ANALYTICS
DAN SOCEANU
SENIOR SOLUTIONS ARCHITECT, DATA MANAGEMENT
BEFORE WE BEGIN: SAS ACKNOWLEDGEMENTS
Ron Agresta, Product Director, Data Management
Lisa Dodson, Global Technology Practice Manager, Data Management
David Pope, Pre-Sales Manager, Energy & Manufacturing
DATA MANAGEMENT: WHY ARE WE HERE?
• Data is rarely fit for analytic purposes
• End-users are overwhelmed
o What data do I use?
o How do I load data?
o How can I find only the data I need?
• Real-time needs
• The rise of “self-service analytics”
CAN YOU LEVERAGE OPEN SOURCE ANALYTICS?
CAN YOU SCALE YOUR DATA AND YOUR ANALYTICS?
DO YOU GROW A CULTURE OF INNOVATION?
CAN YOU ANALYZE ALL OF YOUR DATA?
CAN YOU MODERNIZE YOUR LEGACY BI STRATEGY?
DATA MANAGEMENT: BRIDGING THE GAP
[Diagram: Data Management for High Performance Analytics bridging data sources (IoT, operational, unstructured, web) and high-performance analytics (text, optimization, forecasting, mining)]
DATA MANAGEMENT: CONVERGENCE OF DATA PREP, ANALYTICAL PROCESSING AND PROVISIONING
[Diagram: Data Access Tier, Data Preparation Tier, Analytical Tier, Visualization Tier]
DATA MANAGEMENT: DATA FLOW FOR HIGH-PERFORMANCE ANALYTICS
[Diagram components. Sources: operational, MQ, XML, cloud. Data management: exploration, quality, integration, monitoring, MDM, repository. Stores: data warehouse, analytical data warehouse, data marts. Access: ETL, dynamic reporting (read), dynamic visualization, model development, high-performance analytics.]
ANALYTICS: HISTORICAL VS. ADVANCED
Descriptive
• What happened? When? Why?
• Methods: frequency distributions, correlation measures, event study, association rules
Predictive
• What will happen? When? Why? How does that affect us? What actions should I take?
• Methods: estimation & forecasting, segmentation, optimization
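To make the descriptive side concrete, here is a minimal Python sketch (not SAS code) of two of the listed methods: a frequency distribution and the support and confidence of a single association rule. The basket data and item names are invented for illustration.

```python
from collections import Counter

# Hypothetical market-basket transactions (illustrative data only)
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]

# Frequency distribution: number of baskets containing each item
item_counts = Counter(item for basket in transactions for item in basket)

def support(itemset, baskets):
    """Fraction of baskets that contain every item in itemset."""
    return sum(1 for b in baskets if itemset <= b) / len(baskets)

def confidence(antecedent, consequent, baskets):
    """Estimated P(consequent | antecedent) for the rule antecedent -> consequent."""
    return support(antecedent | consequent, baskets) / support(antecedent, baskets)

rule_support = support({"bread", "milk"}, transactions)
rule_confidence = confidence({"bread"}, {"milk"}, transactions)
```

A confidence of 0.75 here means three of the four baskets containing bread also contain milk; in practice, rules are usually filtered by minimum support and confidence thresholds chosen for the use case.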
HIGH-PERFORMANCE ANALYTICS: SAS SOLUTIONS
• SAS High-Performance Data Mining: predictive models using thousands of variables to produce more accurate and timely insights
• SAS High-Performance Econometrics: analytical models using complete data, not just a subset
• SAS High-Performance Optimization: model and solve optimization problems that are very large or cumbersome to solve
• SAS High-Performance Statistics: statistical models using big data to produce more accurate and timely insights
• SAS High-Performance Text Mining: better understand communications and create new value from big text data
HIGH-PERFORMANCE ANALYTICS: SAS ANALYTIC PROCESSING APPROACHES
• Traditional: move data from the source to the SAS server, process it there, and write the results back (single server or SAS Grid Manager).
• In-database: move SAS processing to the data source so that it runs under the control of the source environment (e.g., a relational database or Hadoop). The analytic code executes inside the database process.
• In-memory “alongside” the database: move SAS processing to the data source, but run it in a separate SAS process alongside the database. The analytic and database processes are co-located and share resources.
• In-memory “next to” the database: move data from the source to a dedicated SAS environment for processing. No physical copy of the data has to be made before processing, and the data does not need to be kept in the SAS environment once processing completes. This separates the resources used for data storage and processing from those used for SAS advanced analytical processing.
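The practical difference between the traditional and in-database approaches is where the computation runs. The Python sketch below illustrates that contrast using an in-memory SQLite database as an assumed stand-in for the source system; the table and column names are hypothetical, it does not use any SAS in-database technology, and the two in-memory approaches are not modeled.

```python
import sqlite3

# In-memory SQLite stands in for the source database (assumption for illustration)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east", 100.0), ("west", 250.0), ("east", 50.0), ("west", 25.0)],
)

def traditional_totals(connection):
    """Traditional: pull every detail row to the client, then aggregate there."""
    totals = {}
    for region, amount in connection.execute("SELECT region, amount FROM sales"):
        totals[region] = totals.get(region, 0.0) + amount
    return totals

def in_database_totals(connection):
    """Pushdown: the database aggregates, so only one row per region moves."""
    cursor = connection.execute("SELECT region, SUM(amount) FROM sales GROUP BY region")
    return dict(cursor.fetchall())

traditional = traditional_totals(conn)
pushed_down = in_database_totals(conn)
```

Both functions return the same totals; the difference is that the pushdown version transfers summary rows instead of every detail row, which is what matters once the source table is large.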
DATA MANAGEMENT VS. DATA PREPARATION?
DATA MANAGEMENT
• Business need: support analytical methods for decision making, use cases and required actions
• Data governance: gap assessment (people, process and technology); auditability, traceability, automated rules, monitoring, collaboration
• Productivity: data preparation, provisioning, reporting
DATA PREPARATION
• Identify: profile; data types (numeric, character, contextual); cardinality
• Access: ETL; batch; real-time; latency; data movement; connectivity; data sources
• Data Quality: de-duplicate; standardize; missing values; imputation; enrich; binning; matching; identify anomalies
• Reshape: wide & flat; long & lean; transformation logic; transpositions; frequency analysis; appending data; partitioning data; summarization
• Metadata: lineage; semantic glossary; data relationships; impact analysis; hierarchy management; collaboration; repeatability; entity management
DATA MANAGEMENT: THE ROLE OF DATA GOVERNANCE
Data management: data lifecycle; reference and master data; data security; data architecture; metadata; data quality; data administration; data warehousing & BI/analytics
Data governance: data stewardship roles & tasks; decision-making bodies; guiding principles; program objectives; decision rights
DG without DM = only an academic exercise
DM without DG = the continued culture of “I know a guy”
DATA MANAGEMENT: THE IMPORTANCE OF DATA GOVERNANCE
POSITIONS ENTERPRISE DATA ISSUES AS CROSS-FUNCTIONAL
• Establishes guiding principles for data sharing
• Eliminates data ownership issues and “turf wars”
• Ensures appropriate stakeholders have a say in decision making
ESTABLISHES BUSINESS STAKEHOLDERS AS INFORMATION OWNERS
• Aligns data policy with business strategies and priorities
• Aligns data quality with business measures and acceptance
• Helps identify ROI for data-related activity
FORMALIZES DATA STEWARDSHIP
• Clarifies accountability for data definitions, rules, and quality
• Ensures data is managed separately from applications
• Formalizes monitoring and measurement of critical data
FOSTERS IMPROVED ALIGNMENT BETWEEN BUSINESS AND IT
• Links IT-driven data management activities with business unit activity
PARADIGM SHIFT: DATA PREPARATION IS ABOUT THE BUSINESS NEED & USE CASE
80%: Identify, Access, Data Quality, Reshape, Metadata
20%: Business Use
DATA PREPARATION: FIVE KEY FOCUS AREAS
Identify, Access, Data Quality, Reshape, Metadata
IDENTIFY: WHAT DO I HAVE AND HOW USEFUL IS IT?
• Is my data consistent?
• Is my data complete?
• Is my data highly unique?
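These profiling questions map onto simple column metrics. A minimal Python sketch, assuming hypothetical records and an assumed house format (`ddd-dddd`) for phone numbers: completeness is the share of populated values, uniqueness is distinct values over non-missing values, and consistency is the share of values matching the expected pattern.

```python
import re

# Hypothetical customer records; None marks a missing value
records = [
    {"id": 1, "phone": "555-0101", "state": "NC"},
    {"id": 2, "phone": None, "state": "nc"},
    {"id": 3, "phone": "5550103", "state": "NC"},
    {"id": 4, "phone": "555-0104", "state": None},
]

PHONE_PATTERN = re.compile(r"^\d{3}-\d{4}$")  # assumed expected format

def completeness(rows, column):
    """Share of rows where the column is populated."""
    return sum(r[column] is not None for r in rows) / len(rows)

def uniqueness(rows, column):
    """Distinct non-missing values divided by the count of non-missing values."""
    values = [r[column] for r in rows if r[column] is not None]
    return len(set(values)) / len(values) if values else 0.0

def consistency(rows, column, pattern):
    """Share of non-missing values that match the expected pattern."""
    values = [r[column] for r in rows if r[column] is not None]
    return sum(bool(pattern.match(v)) for v in values) / len(values) if values else 0.0

profile = {
    "phone_completeness": completeness(records, "phone"),
    "id_uniqueness": uniqueness(records, "id"),
    "state_uniqueness": uniqueness(records, "state"),
    "phone_consistency": consistency(records, "phone", PHONE_PATTERN),
}
```

The state column shows why casing matters: `NC` and `nc` count as two distinct values until they are standardized.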
IDENTIFY: WHAT DO I HAVE AND HOW USEFUL IS IT?
• Is my data normal?
• Is my data linear?
• What are the associations in the data?
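Quick first answers to the normality and linearity questions can come from moment statistics and the Pearson correlation. This Python sketch uses invented data. Skewness and excess kurtosis near 0 are only rough hints of normality (a formal test such as Shapiro-Wilk is a separate step), and Pearson's r measures linear association only.

```python
import math
import statistics

def skewness(xs):
    """Population skewness: third standardized moment (about 0 for symmetric data)."""
    mu, sigma = statistics.fmean(xs), statistics.pstdev(xs)
    return sum((x - mu) ** 3 for x in xs) / (len(xs) * sigma ** 3)

def excess_kurtosis(xs):
    """Population excess kurtosis: fourth standardized moment minus 3 (0 for a normal)."""
    mu, sigma = statistics.fmean(xs), statistics.pstdev(xs)
    return sum((x - mu) ** 4 for x in xs) / (len(xs) * sigma ** 4) - 3.0

def pearson_r(xs, ys):
    """Pearson correlation: strength of the linear relationship between xs and ys."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

x = [1, 2, 3, 4, 5, 6]
y_linear = [2 * v + 1 for v in x]        # exactly linear in x
y_curved = [(v - 3.5) ** 2 for v in x]   # fully determined by x, but U-shaped
```

`y_curved` depends entirely on `x`, yet its Pearson r is 0, which is why associations should also be checked with plots or non-linear measures. The evenly spaced `x` has negative excess kurtosis (flatter than a normal distribution).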
ACCESS: SO MANY DATA TYPES AND SOURCES
Data type | Access | SQL Server | Oracle | MySQL | PostgreSQL
Boolean | Yes/No | Bit | Byte | N/A | Boolean
Integer | Number | Int | Number | Int | Int
Float | Number (single) | Float | Number | Float | Numeric
Currency | Currency | Money | N/A | N/A | Money
String (fixed length) | N/A | Char | Char | Char | Char
String (variable length) | Text | VarChar | VarChar | VarChar | VarChar
Binary | OLE Object, Memo | Binary, Varbinary, Image | Long Raw | Blob, Text | Binary, Varbinary
DATA QUALITY: THE FOUNDATION
• Standardization
• Parsing
• Casing
• Identification
• De-duplication
• “Fuzzy” matching
• Clustering
• Entity resolution
• Survivorship
• Gender Analysis
• Locale Guessing
• Address Verification
• Address Enrichment (geocoding)
Business Logic & Rules
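Standardization, parsing, casing and de-duplication often run as one small pipeline. The Python sketch below is an illustration under simple assumptions (hypothetical person-name records written either as `LAST, FIRST` or `First Last`); it is not how SAS data quality tools implement these steps.

```python
import re
import unicodedata

def standardize_name(raw):
    """Standardize a person name to 'First Last' in proper case."""
    text = unicodedata.normalize("NFKC", raw)
    text = re.sub(r"\s+", " ", text).strip()   # collapse runs of whitespace
    if "," in text:                            # parsing: 'LAST, FIRST' -> 'FIRST LAST'
        last, first = (part.strip() for part in text.split(",", 1))
        text = f"{first} {last}"
    return text.title()                        # casing: proper case

# Hypothetical raw records containing duplicates in different forms
raw_names = ["SMITH, JOHN", "john   smith", "John Smith", "DOE,  jane", "Jane Doe "]

# De-duplication: keep the first record for each standardized key
seen = set()
deduplicated = []
for name in raw_names:
    key = standardize_name(name)
    if key not in seen:
        seen.add(key)
        deduplicated.append(key)
```

`str.title()` is a crude casing rule (it turns `McDonald` into `Mcdonald`); production data quality tools use locale-aware rule sets instead.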
DATA QUALITY: FILLING IN THE GAPS AND STANDARDIZING
• Standardizing text
• De-duplication
• Standardizing numeric data
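For numeric columns, filling gaps and standardizing can be sketched in a few lines of Python. The column values are hypothetical, and median imputation followed by z-scores is shown as one common choice, not the only option.

```python
import statistics

# Hypothetical numeric column with gaps (None = missing)
incomes = [42000.0, None, 58000.0, 61000.0, None, 39000.0]

def impute_median(values):
    """Fill missing values with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = statistics.median(observed)
    return [fill if v is None else v for v in values]

def z_scores(values):
    """Standardize to mean 0 and population standard deviation 1."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    return [(v - mu) / sigma for v in values]

filled = impute_median(incomes)
standardized = z_scores(filled)
```

Imputed values shrink the column's variance, so it is often worth keeping an indicator flag that records which values were filled.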
DATA QUALITY: FILLING IN THE GAPS AND STANDARDIZING
• Dropping outliers
• Grouping or binning data
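A minimal Python sketch of both steps on invented data: outliers are dropped with Tukey's fences (1.5 × IQR beyond the quartiles, an assumed rule of thumb), and the remaining values are grouped into equal-width bins.

```python
import statistics

values = [12, 15, 14, 10, 13, 16, 11, 95]  # 95 is a deliberate outlier (illustrative data)

def drop_iqr_outliers(xs, k=1.5):
    """Keep values inside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _, q3 = statistics.quantiles(xs, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [x for x in xs if low <= x <= high]

def equal_width_bins(xs, n_bins):
    """Assign each value a bin index 0..n_bins-1 over equal-width intervals."""
    lo, hi = min(xs), max(xs)
    if hi == lo:                      # all values identical: one bin
        return [0] * len(xs)
    width = (hi - lo) / n_bins
    return [min(int((x - lo) / width), n_bins - 1) for x in xs]

cleaned = drop_iqr_outliers(values)
bins = equal_width_bins(cleaned, 3)
```

`statistics.quantiles` uses the exclusive method by default; other quartile definitions give slightly different fences.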
RESHAPE: FIT FOR PURPOSE?
• Schema/view or flat table?
• Format of data
• Data quality dimensions?
RESHAPE: FLATTENING THE DATA
Schema/view: efficient storage; fast retrieval; defined schema
Flat table: WIDE tables / time series data; iteration (build, test, repeat); schema-less
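Flattening usually means joining normalized tables into one row per analysis subject. This Python sketch assumes two hypothetical tables (customers and accounts) and produces one wide row per customer with derived columns.

```python
# Hypothetical normalized source tables (a defined schema)
customers = [
    {"customer_id": 1, "name": "Acme", "segment": "SMB"},
    {"customer_id": 2, "name": "Globex", "segment": "Enterprise"},
]
accounts = [
    {"account_id": 10, "customer_id": 1, "balance": 500.0},
    {"account_id": 11, "customer_id": 1, "balance": 250.0},
    {"account_id": 12, "customer_id": 2, "balance": 1000.0},
]

def flatten(customer_rows, account_rows):
    """Join customers to their accounts and emit one wide row per customer."""
    balances_by_customer = {}
    for acct in account_rows:
        balances_by_customer.setdefault(acct["customer_id"], []).append(acct["balance"])
    flat = []
    for cust in customer_rows:
        balances = balances_by_customer.get(cust["customer_id"], [])
        flat.append({
            **cust,                          # keep the customer attributes
            "n_accounts": len(balances),     # derived column
            "total_balance": sum(balances),  # derived column
        })
    return flat

flat_table = flatten(customers, accounts)
```

The normalized tables stay efficient for storage and retrieval, while the flat table is the shape most modeling steps consume and is cheap to rebuild on each build-test-repeat iteration.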
RESHAPE: SUMMARIZATION
Add up the quantities for each product purchased, in each product category.
RESHAPE: TRANSPOSITION FOR DATA MINING
Each product category becomes its own row, and each product purchased becomes its own distinct column.
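The two reshaping steps described on these slides, adding up quantities per product within each category and then turning each category into a row with one column per product, can be sketched in plain Python. The purchase data is hypothetical.

```python
from collections import defaultdict

# Hypothetical purchase line items: (product category, product, quantity)
purchases = [
    ("dairy", "milk", 2),
    ("dairy", "milk", 1),
    ("dairy", "cheese", 4),
    ("bakery", "bread", 1),
    ("bakery", "bread", 1),
]

# Summarization: add up the quantities for each product within each category
summary = defaultdict(int)
for category, product, qty in purchases:
    summary[(category, product)] += qty

# Transposition: one row per category, one column per product (0 if never purchased)
products = sorted({product for _, product in summary})
categories = sorted({category for category, _ in summary})
transposed = {
    category: {product: summary.get((category, product), 0) for product in products}
    for category in categories
}
```

Missing combinations are filled with 0 so that every row has the same columns, which is the shape most data mining procedures expect from a wide input table.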
METADATA: MANAGE DATA HIERARCHIES AND RELATIONSHIPS
[Diagram of related entities: customer types (individual, organization), hierarchy, coverage, products, product party, financial accounts, accounts, address, inquiries, transactions, authorizations, loans, terms, collaterals, ratings, external assets]
METADATA: ENTITY RESOLUTION
Example: entity resolution of employer name
EMPLOYER_NAME = Name of the client employer (SOL0003n_Employer_Name)
EMPLOYER_NAME_GRPID: 28296
EMPLOYER_NAME | cnt
ČESKOSLOVENSKÁ OBCHODNÍ BANKA, A. S | 6
ČESKOSLOVENSKÁ OBCHODNÍ BANKA A.S. | 182
ČSKOSLOVENSKÁ OBCHODNÍ BANKA A.S. | 1
ČESKOSLOVENSKÁ OBCHODNÍ BANKA. A.S. | 2
ČESKOSLOVENSKÁ OBCHODNÍ BANKA,A.S. | 78
ČESKOSLOVENSKÁ OBCHODNÍ BANKA A. S. | 9
ČESKOSLOVENSKÁ OBCHODNÍ BANKA ,A.S. | 2
ČESKOSLOVENSKÁ OBCHODNÍ BANKA, A.S | 6
ČESKOSLOVENSKÁ OBCHODNÍ BANKA, A.S . | 1
ČESKOSLOVENSKÁ OBCHODNÍ BANKA, A.S. | 717
ČESKOSLOVENSKUÁ OBCHODNÍ BANKA A.S. | 1
ČESKOSLOVENSKÁ OBCHODNÍ BANKA A.S | 1
ČESKOSLOVENSKÁ OBCHODNÍ BANKA, S.R.O. | 3
ČESKOSLOVENSKÁ OBCHODNÍ BAŃKA, A.S. | 1
ČESKOSLOVENSKÁ OBCHODNÍ BANKA, A. S. | 587
ČESKOSLOVENSKÁOBCHODNÍBANKA, A.S. | 1
ČESKOSLOVENSKÁ OBCHODNÍBANKA A.S. | 1
ČESKOSLOVENSKÁ OBCHODNÍ BANLA | 1
ČESKOSLOVENSKÁ OBCHODNÍ BÁNKA, A.S. | 1
ČESKOSLOVENSKÁ OBCHODNÍ BANKA,A.S | 2
ČESKOSLOVENSKÁ OBCHODNÍ BANKA | 27
ČESKOSLOVENSKÁOBCHODNÍBANKA,A.S. | 1
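The employer-name example can be approximated by standardizing names and then fuzzy-matching them. This Python sketch uses a subset of the variants and counts listed on the slide, plus one hypothetical non-matching employer (`KOMERČNÍ BANKA, A.S.`, count invented) to show that unrelated names stay separate. The normalization rules, the 0.9 similarity threshold, and the survivorship rule (the most frequent spelling survives) are assumptions, not SAS's entity resolution logic.

```python
import re
import unicodedata
from difflib import SequenceMatcher

# Subset of the slide's variants with their counts, plus one hypothetical employer
variants = {
    "ČESKOSLOVENSKÁ OBCHODNÍ BANKA, A.S.": 717,
    "ČESKOSLOVENSKÁ OBCHODNÍ BANKA, A. S.": 587,
    "ČESKOSLOVENSKÁ OBCHODNÍ BANKA A.S.": 182,
    "ČESKOSLOVENSKÁ OBCHODNÍ BANKA": 27,
    "KOMERČNÍ BANKA, A.S.": 5,  # hypothetical, not from the slide
    "ČSKOSLOVENSKÁ OBCHODNÍ BANKA A.S.": 1,
    "ČESKOSLOVENSKÁOBCHODNÍBANKA,A.S.": 1,
    "ČESKOSLOVENSKÁ OBCHODNÍ BANLA": 1,
}

def match_key(name):
    """Comparison key: strip accents, keep only A-Z, drop a trailing 'AS' (a.s.)."""
    ascii_name = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode()
    compact = re.sub(r"[^A-Z]", "", ascii_name.upper())
    return re.sub(r"AS$", "", compact)

def resolve(names_with_counts, threshold=0.9):
    """Greedy clustering: frequent variants seed clusters and survive as the best record."""
    clusters = []
    for name, count in sorted(names_with_counts.items(), key=lambda kv: -kv[1]):
        key = match_key(name)
        for cluster in clusters:
            if SequenceMatcher(None, key, cluster["key"]).ratio() >= threshold:
                cluster["members"].append(name)
                cluster["total"] += count
                break
        else:  # no existing cluster is similar enough: start a new one
            clusters.append({"survivor": name, "key": key, "members": [name], "total": count})
    return clusters

clusters = resolve(variants)
```

Removing a trailing `AS` after punctuation is stripped is deliberately crude; it would also truncate names that legitimately end in those letters, so real rules would parse the legal form explicitly.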
METADATA: SEMANTIC RECONCILIATION AND BUSINESS GLOSSARY
[Screenshots: business glossary and terms; technical architecture diagram]
METADATA: LINEAGE & TRACEABILITY
A view into existing data sources/targets, jobs and the associated ‘owners’
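Lineage is naturally a directed graph, and impact analysis is a downstream traversal of it. A minimal Python sketch, assuming a hypothetical graph of sources, jobs and targets and a hypothetical owners lookup:

```python
from collections import deque

# Hypothetical lineage graph: each node lists the nodes that consume it downstream
lineage = {
    "crm.customers": ["job_load_customers"],
    "job_load_customers": ["dw.dim_customer"],
    "dw.dim_customer": ["job_build_abt", "report_churn"],
    "job_build_abt": ["mart.churn_abt"],
    "mart.churn_abt": [],
    "report_churn": [],
}

# Hypothetical owners for some of the nodes
owners = {"job_load_customers": "ETL team", "dw.dim_customer": "DW team"}

def impact_analysis(graph, start):
    """Breadth-first list of every node downstream of start."""
    seen, order = {start}, []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for child in graph.get(node, []):
            if child not in seen:
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order

affected = impact_analysis(lineage, "crm.customers")
affected_owners = {node: owners[node] for node in affected if node in owners}
```

Reversing the edges and running the same traversal gives upstream lineage, i.e. where a target's data came from.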
METADATA: COLLABORATION AND REPEATABILITY
• Collaboration & role-based dashboarding
• Workflow & data remediation
• Process orchestration
• Unified lineage
• Job monitoring
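Process orchestration and job monitoring can be sketched as running jobs in dependency order and recording a status for each. This Python example uses the standard-library `graphlib` module (Python 3.9+) with hypothetical job names and a simulated failure; it is an illustration, not a SAS scheduling API.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical data-preparation jobs: job -> set of jobs that must finish first
dependencies = {
    "profile_sources": set(),
    "standardize": {"profile_sources"},
    "deduplicate": {"standardize"},
    "build_abt": {"deduplicate"},
    "refresh_dashboard": {"build_abt"},
    "quality_report": {"deduplicate"},
}

def run_pipeline(deps, run_job):
    """Run jobs in dependency order and record a status per job for monitoring."""
    status = {}
    for job in TopologicalSorter(deps).static_order():
        # Do not run a job unless every upstream job succeeded
        if any(status.get(up) != "ok" for up in deps.get(job, ())):
            status[job] = "skipped"
            continue
        try:
            run_job(job)
            status[job] = "ok"
        except Exception:
            status[job] = "failed"
    return status

def simulated_job(name):
    """Stand-in for real work; fails on purpose for one job."""
    if name == "deduplicate":
        raise RuntimeError("simulated failure")

status = run_pipeline(dependencies, simulated_job)
```

Because downstream jobs are skipped rather than run on incomplete data, rerunning after the failing job is fixed replays the same dependency order, which supports repeatability.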
SAS DATA MANAGEMENT: FRAMEWORK FOR SUCCESS
Corporate drivers: decision making; customer focus; compliance mandates; mergers & acquisitions; at-risk projects; operational efficiencies
Solutions: data quality; data integration; reference data management; master data management; data visualization; data monitoring; metadata management; business glossary; data virtualization; data profiling & exploration
Data management: data lifecycle; reference and master data; data security; data architecture; metadata; data quality; data administration; data warehousing & BI/analytics
Data governance: data stewardship roles & tasks; decision-making bodies; guiding principles; program objectives; decision rights
Methods: people; process; technology
QUESTIONS & ANSWERS THANK YOU!
DAN.SOCEANU@SAS.COM

 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDGMarianaLemus7
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 

Recently uploaded (20)

Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning era
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptxVulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDG
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 

Data Management for High Performance Analytics

  • 1. Copyright © 2015, SAS Institute Inc. All rights reserved.
    DATA MANAGEMENT FOR HIGH-PERFORMANCE ANALYTICS
    DAN SOCEANU, SENIOR SOLUTIONS ARCHITECT, DATA MANAGEMENT
  • 2. BEFORE WE BEGIN: SAS ACKNOWLEDGEMENTS
    • Ron Agresta, Product Director, Data Management
    • Lisa Dodson, Global Technology Practice Manager, Data Management
    • David Pope, Pre-Sales Manager, Energy & Manufacturing
  • 3. DATA MANAGEMENT: WHY ARE WE HERE?
    • Data is rarely fit for analytic purposes
    • End-users are overwhelmed
      o What data do I use?
      o How do I load data?
      o How can I find only the data I need?
    • Real-time needs
    • The rise of “self-service analytics”
  • 4.
    • CAN YOU LEVERAGE OPEN SOURCE ANALYTICS?
    • CAN YOU SCALE YOUR DATA AND YOUR ANALYTICS?
    • DO YOU GROW A CULTURE OF INNOVATION?
    • CAN YOU ANALYZE ALL OF YOUR DATA?
    • CAN YOU MODERNIZE YOUR LEGACY BI STRATEGY?
  • 5. DATA MANAGEMENT: BRIDGING THE GAP
    Diagram: Data Sources (IoT, Operational, Unstructured, Web, Text) → Data Management for High Performance Analytics → High Performance Analytics (Optimization, Forecasting, Mining)
  • 6. DATA MANAGEMENT: CONVERGENCE OF DATA PREP, ANALYTICAL PROCESSING AND PROVISIONING
    Diagram tiers: Data Access Tier (Access), Data Preparation Tier (Preparation), Analytical Tier (Analytics), Visualization Tier (Visualization)
  • 7. DATA MANAGEMENT: DATA FLOW FOR HIGH PERFORMANCE ANALYTICS
    Diagram labels: SOURCES (Operational, MQ, XML, Cloud); Data Management (Integration, Quality, Exploration, Data Monitoring, MDM, Repository); Data Warehouse; Analytical Data Warehouse; Data Marts; ACCESS (Read, ETL); Dynamic Reporting; Dynamic Visualization; Model Development; High Performance Analytics
  • 8. ANALYTICS: HISTORICAL VS. ADVANCED
    • Descriptive: What happened? When? Why?
      o Frequency distributions
      o Correlation measures
      o Event study
      o Association rules
    • Predictive: What will happen? When? Why? How does that affect us? What actions should I take?
      o Estimation & forecasting
      o Segmentation
      o Optimization
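Two of the descriptive methods above, frequency distributions and association rules, can be illustrated with a small sketch. It assumes hypothetical market-basket data and computes the standard support and confidence measures for a single rule in plain Python; it is not the SAS implementation.

```python
from collections import Counter
from itertools import combinations

# Hypothetical market-basket transactions (illustrative data only)
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter"},
    {"bread", "butter"},
    {"bread", "butter", "milk"},
]

# Frequency distribution: number of baskets containing each item
item_freq = Counter(item for basket in baskets for item in basket)

# Co-occurrence counts for every item pair (pairs stored in sorted order)
pair_freq = Counter(
    pair for basket in baskets for pair in combinations(sorted(basket), 2)
)


def rule_stats(antecedent: str, consequent: str) -> tuple[float, float]:
    """Return (support, confidence) for the rule antecedent -> consequent."""
    pair = tuple(sorted((antecedent, consequent)))
    support = pair_freq[pair] / len(baskets)           # P(A and B)
    confidence = pair_freq[pair] / item_freq[antecedent]  # P(B | A)
    return support, confidence


print(item_freq.most_common())
print(rule_stats("milk", "bread"))
```

Confidence is not symmetric: every basket with milk also has bread, but not every basket with bread has milk, which is why association rules are read in one direction.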
  • 9. HIGH-PERFORMANCE ANALYTICS: SAS SOLUTIONS
    • SAS High-Performance Data Mining: predictive models using thousands of variables to produce more accurate and timely insights
    • SAS High-Performance Econometrics: analytical models using complete data, not just a subset
    • SAS High-Performance Optimization: model and solve optimization problems that are very large or cumbersome to solve
    • SAS High-Performance Statistics: statistical models using big data to produce more accurate and timely insights
    • SAS High-Performance Text Mining: better understand communications and create new value from big text data
  • 10. HIGH-PERFORMANCE ANALYTICS: SAS ANALYTIC PROCESSING APPROACHES
    • Traditional: Move data from the source to the SAS server, process it, and write back the results (single server or SAS Grid Manager).
    • In-Database: Move SAS processing to the data source and let it run under the control of the source environment (e.g., a relational database or Hadoop). The analytic code executes in the database process.
    • In-Memory “Alongside” the Database: Move SAS processing to the data source, but let a SAS process run alongside the database. The analytic processes and the database processes are co-located and share resources.
    • In-Memory “Next to” the Database: Move data from the source to a dedicated SAS environment for processing. No physical copy of the data is required before processing, and once processing is complete the data does not need to be kept in the dedicated SAS environment. This separates the resources used for data storage and processing from those used for SAS advanced analytical processing.
  • 11. DATA MANAGEMENT VS. DATA PREPARATION?
    DATA MANAGEMENT
    • Business Need: support analytical methods for decision making, use cases and required actions
    • Data Governance: gap assessment (people, process and technology); auditability, traceability, automated rules, monitoring, collaboration
    • Productivity: data preparation, provisioning, reporting
  • 12. DATA MANAGEMENT VS. DATA PREPARATION?
    DATA MANAGEMENT
    • Business Need: support analytical methods for decision making, use cases and required actions
    • Data Governance: gap assessment (people, process and technology); auditability, traceability, automated rules, monitoring, collaboration
    • Productivity: data preparation, provisioning, reporting
    DATA PREPARATION
    • Identify: Profile, Data types, Numeric, Character, Contextual, Cardinality
    • Access: ETL, Batch, Real-time, Latency, Data Movement, Connectivity, Data Sources
    • Data Quality: De-duplicate, Standardize, Missing values, Imputation, Enrich, Binning, Matching, Identify anomalies
    • Reshape: Wide & flat, Long & lean, Transformation logic, Transpositions, Frequency analysis, Appending data, Partitioning data, Summarization
    • Metadata: Lineage, Semantic glossary, Data relationships, Impact analysis, Hierarchy management, Collaboration, Repeatability, Entity management
  • 13. DATA MANAGEMENT: THE ROLE OF DATA GOVERNANCE
    • DATA MANAGEMENT: Data Lifecycle, Reference and Master Data, Data Security, Data Architecture, Metadata, Data Quality, Data Administration, Data Warehousing & BI/Analytics
    • DATA GOVERNANCE: Data Stewardship Roles & Tasks, Decision-making Bodies, Guiding Principles, Program Objectives, Decision Rights
    • Data governance (DG) without data management (DM) = only an academic exercise
    • DM without DG = the continued culture of “I know a guy”
  • 14. DATA MANAGEMENT: THE IMPORTANCE OF DATA GOVERNANCE
    • POSITIONS ENTERPRISE DATA ISSUES AS CROSS-FUNCTIONAL
      o Establishes guiding principles for data sharing
      o Eliminates data ownership issues and “turf wars”
      o Ensures appropriate stakeholders have a say in decision making
    • ESTABLISHES BUSINESS STAKEHOLDERS AS INFORMATION OWNERS
      o Aligns data policy with business strategies and priorities
      o Aligns data quality with business measures and acceptance
      o Helps to identify ROI for data-related activity
    • FORMALIZES DATA STEWARDSHIP
      o Clarifies accountability for data definitions, rules, and quality
      o Ensures data is managed separately from applications
      o Formalizes monitoring and measurement of critical data
    • FOSTERS IMPROVED ALIGNMENT BETWEEN BUSINESS AND IT
      o Links IT-driven data management activities with business unit activity
  • 15. PARADIGM SHIFT: DATA PREPARATION IS ABOUT THE BUSINESS NEED & USE CASE
    Chart: 80% / 20% split; segments labeled Identify, Access, Data Quality, Reshape, Metadata, and Business Use
  • 16. DATA PREPARATION: FIVE KEY FOCUS AREAS
    • Identify: Profile, Data types, Numeric, Character, Contextual, Cardinality
    • Access: ETL, Batch, Real-time, Latency, Data Movement, Connectivity, Data Sources
    • Data Quality: De-duplicate, Standardize, Missing values, Imputation, Enrich, Binning, Matching, Identify anomalies
    • Reshape: Wide & flat, Long & lean, Transformation logic, Transpositions, Frequency analysis, Appending data, Partitioning data, Summarization
    • Metadata: Lineage, Semantic glossary, Data relationships, Impact analysis, Hierarchy management, Collaboration, Repeatability, Entity management
  • 17. IDENTIFY: WHAT DO I HAVE AND HOW USEFUL IS IT?
    • Is my data consistent?
    • Is my data complete?
    • Is my data highly unique?
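The three profiling questions above map onto simple column metrics. The sketch below uses hypothetical records and makes its own assumptions: completeness is the share of non-missing values, uniqueness is distinct/populated values, and consistency is the share of populated values that fully match a caller-supplied regular expression. It is an illustration, not how SAS profiling computes these measures.

```python
import re

# Hypothetical customer records; None marks a missing value
records = [
    {"id": "C001", "state": "NC", "phone": "919-555-0101"},
    {"id": "C002", "state": "nc", "phone": "9195550102"},
    {"id": "C003", "state": None, "phone": "919-555-0103"},
    {"id": "C004", "state": "NC", "phone": None},
]


def profile(rows, column, pattern=None):
    """Return completeness, uniqueness and (optionally) consistency for a column."""
    values = [row.get(column) for row in rows]
    present = [v for v in values if v is not None]
    stats = {
        # Completeness: share of rows where the value is populated
        "completeness": len(present) / len(values),
        # Uniqueness: distinct populated values / populated values
        "uniqueness": len(set(present)) / len(present) if present else 0.0,
    }
    if pattern is not None:
        # Consistency: share of populated values matching the expected format
        regex = re.compile(pattern)
        hits = sum(1 for v in present if regex.fullmatch(v))
        stats["consistency"] = hits / len(present) if present else 0.0
    return stats


print(profile(records, "state", r"[A-Z]{2}"))
print(profile(records, "phone", r"\d{3}-\d{3}-\d{4}"))
```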
  • 18. IDENTIFY: WHAT DO I HAVE AND HOW USEFUL IS IT?
    • Is my data normal?
    • Is my data linear?
    • What are the associations in the data?
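Two quick numeric checks related to these questions are skewness (a rough signal that a variable departs from a symmetric, normal-like shape) and the Pearson correlation coefficient (the strength of a linear association between two variables). This is a minimal standard-library sketch; a full normality assessment would use a formal test, which is out of scope here.

```python
import math
from statistics import mean, pstdev


def pearson(xs, ys):
    """Pearson correlation: +1/-1 for perfectly linear, ~0 for no linear trend."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def skewness(xs):
    """Population skewness (third standardized moment); 0 for symmetric data."""
    m, s = mean(xs), pstdev(xs)
    return sum(((x - m) / s) ** 3 for x in xs) / len(xs)


print(pearson([1, 2, 3, 4], [2, 4, 6, 8]))
print(skewness([1, 1, 1, 10]))  # positive: long right tail
```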
  • 19. ACCESS: SO MANY DATA TYPES AND SOURCES
    Type-mapping table (header: Access, Excel, SQLServer, Oracle, MySQL)
    • Boolean: Yes/No, Bit, Byte, N/A, Boolean
    • integer: Number, Int, Number, Int, Int
    • float: Number (single), Float, Number, Float, Numeric
    • currency: Currency, Money, NA, NA, Money
    • string: NA, Char, Char, Char, Char
    • string: Text, VarChar, VarChar, VarChar, VarChar
    • binary: OLE Obj, Memo, Binary, Varbinary, Image, Long Raw, Blob, Text, Binary, Varbinary
  • 20. DATA QUALITY: THE FOUNDATION
    • Standardization
    • Parsing
    • Casing
    • Identification
    • De-duplication
    • “Fuzzy” matching
    • Clustering
    • Entity resolution
    • Survivorship
    • Gender analysis
    • Locale guessing
    • Address verification
    • Address enrichment (geocoding)
    Business Logic & Rules
  • 21. DATA QUALITY: FILLING IN THE GAPS AND STANDARDIZING
    • Standardizing text
    • De-duplication
    • Standardizing numeric
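A minimal sketch of these three steps, plus filling gaps by mean imputation. The choices here are illustrative assumptions rather than SAS behavior: text is standardized by trimming, collapsing whitespace and title-casing; duplicates are detected on the standardized form; numeric values are standardized as z-scores using the population standard deviation.

```python
import re
from statistics import mean, pstdev


def standardize_text(value: str) -> str:
    # Trim, collapse internal whitespace, and apply one consistent (title) case
    return re.sub(r"\s+", " ", value.strip()).title()


def dedupe(values):
    # Keep the first occurrence of each standardized value, preserving order
    seen, out = set(), []
    for v in values:
        key = standardize_text(v)
        if key not in seen:
            seen.add(key)
            out.append(key)
    return out


def impute_and_zscore(values):
    # Fill missing (None) entries with the mean of the observed values,
    # then rescale to zero mean and unit (population) standard deviation
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    filled = [fill if v is None else v for v in values]
    mu, sigma = mean(filled), pstdev(filled)
    return [(v - mu) / sigma for v in filled]


print(dedupe(["  new york", "New  York", "NEW YORK ", "Boston"]))
print(impute_and_zscore([1.0, None, 3.0]))
```

Mean imputation is the simplest option; it shrinks the variance of the filled column, so median or model-based imputation is often preferred in practice.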
  • 22. DATA QUALITY: FILLING IN THE GAPS AND STANDARDIZING
    • Dropping outliers
    • Grouping or binning data
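One common way to drop outliers is Tukey's fences (keep values within 1.5 × IQR of the quartiles), and one common way to bin is equal-width intervals. The sketch below assumes both techniques; the slide does not say which outlier rule or binning scheme SAS applies. It uses `statistics.quantiles` (Python 3.8+) with its default method.

```python
from statistics import quantiles


def drop_outliers_iqr(values, k=1.5):
    # Tukey's fences: keep values within [Q1 - k*IQR, Q3 + k*IQR]
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]


def equal_width_bins(values, n_bins):
    # Assign each value a 0-based bin label; the maximum falls in the last bin
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    if width == 0:
        return [0] * len(values)
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


print(drop_outliers_iqr([10, 12, 11, 13, 12, 11, 95]))
print(equal_width_bins([0, 2.5, 5, 7.5, 10], 4))
```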
  • 23. RESHAPE: FIT FOR PURPOSE?
    • Schema/view or flat table?
    • Format of data
    • Data quality dimensions?
  • 24. RESHAPE: FLATTENING THE DATA
    • Efficient storage
    • Fast retrieval
    • Defined schema
    • WIDE tables / time series data
    • Iteration (build, test, repeat)
    • Schema-less
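The difference between the “wide & flat” and “long & lean” layouts can be shown as a round trip between them. This is a plain-Python sketch using hypothetical monthly values per customer; the field names `customer`, `variable` and `value` are assumptions chosen for illustration.

```python
def wide_to_long(rows, id_field):
    # "Long & lean": one row per (id, variable, value)
    long_rows = []
    for row in rows:
        for field, value in row.items():
            if field != id_field:
                long_rows.append(
                    {id_field: row[id_field], "variable": field, "value": value}
                )
    return long_rows


def long_to_wide(long_rows, id_field):
    # "Wide & flat": one row per id, one column per variable
    wide = {}
    for r in long_rows:
        wide.setdefault(r[id_field], {id_field: r[id_field]})[r["variable"]] = r["value"]
    return list(wide.values())


# Hypothetical time-series data in wide form (one column per month)
wide_rows = [
    {"customer": "A", "jan": 10, "feb": 12},
    {"customer": "B", "jan": 7, "feb": 9},
]
long_rows = wide_to_long(wide_rows, "customer")
print(long_rows)
print(long_to_wide(long_rows, "customer"))
```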
  • 25. RESHAPE: SUMMARIZATION
    Each product category becomes its own row, and each purchased product becomes its own distinct column.
  • 26. RESHAPE: TRANSPOSITION FOR DATA MINING
    Add up the quantities for each product purchased in each product category.
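Slides 25 and 26 together describe two operations: adding up quantities per product within each category, and laying the result out with one row per category and one column per product. A minimal sketch of both, assuming hypothetical purchase lines of the form (category, product, quantity) and filling absent products with 0:

```python
from collections import defaultdict

# Hypothetical purchase lines: (product_category, product, quantity)
purchases = [
    ("dairy", "milk", 2),
    ("dairy", "cheese", 1),
    ("dairy", "milk", 3),
    ("bakery", "bread", 1),
    ("bakery", "bagel", 6),
    ("bakery", "bread", 2),
]


def summarize(lines):
    # Add up the quantities for each product within each category
    totals = defaultdict(int)
    for category, product, qty in lines:
        totals[(category, product)] += qty
    return dict(totals)


def transpose(totals):
    # One row per category; each product becomes its own column (0 if absent)
    products = sorted({product for _, product in totals})
    categories = sorted({category for category, _ in totals})
    return {
        category: {p: totals.get((category, p), 0) for p in products}
        for category in categories
    }


print(transpose(summarize(purchases)))
```

The wide result is the shape many data-mining procedures expect: one observation per row, with each candidate predictor in its own column.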
  • 27. METADATA: MANAGE DATA HIERARCHIES AND RELATIONSHIPS
    Diagram labels: Customer, Types, Hierarchy, Coverage, Products, Financial Accounts, Address, Inquiries, Product, Party, Accounts, Transactions, Authorizations, Individual, Organization, Inquiries, Loans, Terms, Collaterals, Ratings, External Assets
  • 28. METADATA: ENTITY RESOLUTION
    Example: Entity Resolution, Employer Name
    EMPLOYER_NAME = Name of the client employer (SOL0003n_Employer_Name)
    EMPLOYER_NAME_GRPID | EMPLOYER_NAME | cnt
    28296 | ČESKOSLOVENSKÁ OBCHODNÍ BANKA, A. S | 6
          | ČESKOSLOVENSKÁ OBCHODNÍ BANKA A.S. | 182
          | ČSKOSLOVENSKÁ OBCHODNÍ BANKA A.S. | 1
          | ČESKOSLOVENSKÁ OBCHODNÍ BANKA. A.S. | 2
          | ČESKOSLOVENSKÁ OBCHODNÍ BANKA,A.S. | 78
          | ČESKOSLOVENSKÁ OBCHODNÍ BANKA A. S. | 9
          | ČESKOSLOVENSKÁ OBCHODNÍ BANKA ,A.S. | 2
          | ČESKOSLOVENSKÁ OBCHODNÍ BANKA, A.S | 6
          | ČESKOSLOVENSKÁ OBCHODNÍ BANKA, A.S . | 1
          | ČESKOSLOVENSKÁ OBCHODNÍ BANKA, A.S. | 717
          | ČESKOSLOVENSKUÁ OBCHODNÍ BANKA A.S. | 1
          | ČESKOSLOVENSKÁ OBCHODNÍ BANKA A.S | 1
          | ČESKOSLOVENSKÁ OBCHODNÍ BANKA, S.R.O. | 3
          | ČESKOSLOVENSKÁ OBCHODNÍ BAŃKA, A.S. | 1
          | ČESKOSLOVENSKÁ OBCHODNÍ BANKA, A. S. | 587
          | ČESKOSLOVENSKÁOBCHODNÍBANKA, A.S. | 1
          | ČESKOSLOVENSKÁ OBCHODNÍBANKA A.S. | 1
          | ČESKOSLOVENSKÁ OBCHODNÍ BANLA | 1
          | ČESKOSLOVENSKÁ OBCHODNÍ BÁNKA, A.S. | 1
          | ČESKOSLOVENSKÁ OBCHODNÍ BANKA,A.S | 2
          | ČESKOSLOVENSKÁ OBCHODNÍ BANKA | 27
          | ČESKOSLOVENSKÁOBCHODNÍBANKA,A.S. | 1
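The employer-name example above is a classic entity-resolution problem: many spellings of one organization. The sketch below is one simple approach, not the SAS matching engine. It normalizes each name (uppercase, strip diacritics with `unicodedata`, drop non-alphanumerics), then clusters greedily with `difflib.SequenceMatcher` at an assumed similarity threshold of 0.9. Variants are visited from most to least frequent, so the most common spelling survives as the cluster representative. The sample uses a subset of the slide's variants plus one hypothetical unrelated employer.

```python
import difflib
import re
import unicodedata


def normalize(name: str) -> str:
    """Uppercase, strip diacritics, and keep only letters and digits."""
    decomposed = unicodedata.normalize("NFKD", name.upper())
    no_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return re.sub(r"[^A-Z0-9]", "", no_marks)


def resolve(counts: dict[str, int], threshold: float = 0.9) -> dict[str, str]:
    """Map each raw variant to a surviving representative (greedy clustering)."""
    reps: list[tuple[str, str]] = []  # (normalized key, representative spelling)
    mapping: dict[str, str] = {}
    for raw in sorted(counts, key=counts.get, reverse=True):
        key = normalize(raw)
        for rep_key, rep_raw in reps:
            if difflib.SequenceMatcher(None, key, rep_key).ratio() >= threshold:
                mapping[raw] = rep_raw
                break
        else:  # no close representative found: start a new cluster
            reps.append((key, raw))
            mapping[raw] = raw
    return mapping


variants = {
    "ČESKOSLOVENSKÁ OBCHODNÍ BANKA, A.S.": 717,
    "ČESKOSLOVENSKÁ OBCHODNÍ BANKA, A. S.": 587,
    "ČESKOSLOVENSKÁ OBCHODNÍ BANKA A.S.": 182,
    "ČESKOSLOVENSKÁ OBCHODNÍ BANKA": 27,
    "ČESKOSLOVENSKÁ OBCHODNÍ BANKA, S.R.O.": 3,
    "ČSKOSLOVENSKÁ OBCHODNÍ BANKA A.S.": 1,
    "ČESKOSLOVENSKÁ OBCHODNÍ BANLA": 1,
    "ČESKOSLOVENSKÁOBCHODNÍBANKA,A.S.": 1,
    "KOMERČNÍ BANKA, A.S.": 5,  # hypothetical unrelated employer
}
for raw, rep in resolve(variants).items():
    print(f"{raw!r} -> {rep!r}")
```

A fixed threshold is a blunt instrument: set too low it merges genuinely different employers, set too high it misses typos, so production matching typically adds rules (e.g., for legal-form suffixes such as A.S. and S.R.O.) and human review.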
  • 29. METADATA: SEMANTIC RECONCILIATION AND BUSINESS GLOSSARY
    • Business Glossary and Terms
    • Technical Architecture Diagram
  • 30. METADATA: LINEAGE & TRACEABILITY
    A view into existing data sources/targets, jobs and the associated ‘owners’
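Lineage metadata of this kind (sources, targets, jobs, owners) supports impact analysis: given a table, find every downstream target and whom to notify. A minimal breadth-first sketch, assuming a hypothetical lineage dictionary that maps each job to (sources read, targets written, owner); real metadata repositories store this differently.

```python
from collections import deque

# Hypothetical lineage metadata: job -> (sources read, targets written, owner)
jobs = {
    "load_customers": (["crm.customers"], ["stg.customers"], "etl_team"),
    "build_mart": (["stg.customers", "stg.orders"], ["mart.customer_value"], "bi_team"),
    "score_model": (["mart.customer_value"], ["analytics.churn_scores"], "data_science"),
}


def downstream(table):
    """Return (job, owner, target) for every target reachable from `table`."""
    impacted = []
    seen = {table}
    queue = deque([table])
    while queue:
        current = queue.popleft()
        for job, (sources, targets, owner) in jobs.items():
            if current in sources:
                for target in targets:
                    if target not in seen:
                        seen.add(target)
                        impacted.append((job, owner, target))
                        queue.append(target)
    return impacted


for job, owner, target in downstream("stg.orders"):
    print(f"{target} (job {job}, owner {owner})")
```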
  • 31. METADATA: COLLABORATION AND REPEATABILITY
    • Collaboration & Role-based Dashboarding
    • Workflow & Data Remediation
    • Process Orchestration
    • Unified Lineage
    • Job Monitoring
  • 32. SAS DATA MANAGEMENT FRAMEWORK FOR SUCCESS
    • CORPORATE DRIVERS: Decision Making, Customer Focus, Compliance Mandates, Mergers & Acquisitions, At-Risk Projects, Operational Efficiencies
    • DATA GOVERNANCE: Data Stewardship Roles & Tasks, Decision-making Bodies, Guiding Principles, Program Objectives, Decision Rights
    • DATA MANAGEMENT: Data Lifecycle, Reference and Master Data, Data Security, Data Architecture, Metadata, Data Quality, Data Administration, Data Warehousing & BI/Analytics
    • METHODS: People, Process, Technology
    • SOLUTIONS: Data Quality, Data Integration, Reference Data Management, Master Data Management, Data Visualization, Data Monitoring, Metadata Management, Business Glossary, Data Virtualization, Data Profiling & Exploration
  • 33. QUESTIONS & ANSWERS
    THANK YOU!
    DAN.SOCEANU@SAS.COM