5/26/2022
Enterprise-Level Transition from SAS to Open-Source Programming for the Whole Department
Kevin Lee
Disclaimer
The views and opinions presented here represent those of the
speaker and should not be considered to represent any
companies or organizations.
• Scope of the Open-Source Transition Project
• Challenges of the Open-Source Transition Project
• How to Overcome the Challenges
• Differences between SAS and Open-Source Programming
• Change Management by Leadership
• Benefits of the Open-Source Transition Project
• Lessons Learned
• Conclusion
According to the Burtch Works survey, adoption of open-source programming (R and Python) among data scientists increased from 35% in 2014 to 77% in 2020.
[Chart: SAS vs. open-source adoption among data scientists, 2013 to 2021 (percent)]
How have we, as individuals, transitioned (or expanded) from SAS to open-source programming?
• Learned a new programming language: R and/or Python
• Learned a new analytical system: RStudio/Jupyter
• Converted existing SAS code to open-source code
• Adjusted to the open-source culture (no dedicated customer support, unlike SAS)
What about the entire department (150 SAS programmers)?
High-Level Infrastructure Transition from SAS on Windows to an AWS Open-Source Analytical System
• From: Windows Server, SAS Studio, Oracle Database
• To: AWS Cloud Computing, RStudio Pro Server, Redshift Data Warehouse
Scope of the Open-Source Transition Project (SAS programming → open-source programming)
• Analytical interface: SAS Studio → RStudio Pro Server (RStudio, Jupyter)
• Infrastructure: Windows Server → AWS Cloud Computing (Linux)
• Language: SAS → R, Python, SQL
• People: SAS programmers (150) → data scientists
• Code: SAS programs (230) → R, Python, and SQL programs
• Database: Oracle relational database → AWS Redshift Data Warehouse
New AWS Open-Source Analytical System
• AWS Cloud
• Redshift Data Warehouse
• S3
• EC2/Linux
• RStudio
• Jupyter
• SQL Workbench
• Git
• Tableau
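SQL sits alongside R and Python in this stack. As a minimal sketch of what set-based SQL looks like when called from Python, the example below recodes a sex variable with a `CASE` expression and reads the result into pandas. An in-memory SQLite database stands in for Redshift here (an assumption made for portability; a real Redshift connection would need its own driver and credentials), and the `demo` table and its values are hypothetical.

```python
import sqlite3

import pandas as pd

# In-memory SQLite stands in for Redshift; table name and values are hypothetical.
con = sqlite3.connect(":memory:")
pd.DataFrame({
    "subjid": ["001", "002", "003"],
    "sexc": ["Male", "Female", "Unknown"],
}).to_sql("demo", con, index=False)

# Set-based recode: the CASE expression is applied to the whole column at once.
# Values with no matching WHEN branch would come back as NULL (missing).
query = """
SELECT subjid,
       sexc,
       CASE sexc
           WHEN 'Male'    THEN 'M'
           WHEN 'Female'  THEN 'F'
           WHEN 'Unknown' THEN 'U'
       END AS sex
FROM demo
ORDER BY subjid
"""
dm = pd.read_sql_query(query, con)
con.close()

print(dm)
```

A simple `CASE` expression like this is standard SQL, but Redshift and SQLite differ in many other details, so queries written against the stand-in should be re-checked on the real warehouse.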
Differences between SAS and Open-Source Programming
• Analytic interface. SAS: SAS Studio. Open source: RStudio, Jupyter, SQL Workbench.
• Data processing. SAS: row by row. Open source: column by column.
• Comments. SAS: *. R/Python: #.
• Processing structure. SAS: DATA/PROC steps. Open source: single statements.
• Functions. SAS: macros. Open source: functions.
• Macro variables. SAS: yes. Open source: no (at least, I have not found an equivalent yet).
• Data structure. SAS: SAS datasets. Open source: data frames.
• Object-oriented. SAS: no. Open source: yes.
• Customer support. SAS: dedicated support. Open source: community support.
• Feature updates. SAS: less frequent. Open source: very frequent, since anyone can develop new features.
• Speed. SAS: fast (high-level language). Open source: SQL and Python are very fast and more suitable for big data.
• System. SAS: closed system, thoroughly validated. Open source: open and transparent, but validation could be questioned.
• Machine learning. SAS: at an early stage. Open source: the most popular languages.
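To make the macros-versus-functions and macro-variables rows more concrete, here is a rough sketch of how a small SAS macro might map to a Python function. The SAS macro shown in the comments, the `freq` helper, and the study identifier are all hypothetical; pandas `value_counts` is used as a simple stand-in for a one-way PROC FREQ and does not reproduce its full output.

```python
import pandas as pd

# Hypothetical SAS original, for comparison:
#   %let study = ABC-101;
#   %macro freq(ds, var);
#     proc freq data=&ds; tables &var; run;
#   %mend freq;
#   %freq(DEMO, sexc)

study = "ABC-101"  # hypothetical study ID; a plain variable plays the %let role


def freq(df: pd.DataFrame, var: str) -> pd.DataFrame:
    """Hypothetical helper: one-way counts and percents for `var`.

    A simplified stand-in for PROC FREQ, showing a macro becoming a function.
    """
    counts = df[var].value_counts(dropna=False)
    return pd.DataFrame({
        "count": counts,
        "percent": (100 * counts / counts.sum()).round(1),
    })


demo = pd.DataFrame({"sexc": ["Male", "Female", "Female", "Unknown"]})
print(f"Study {study}")  # f-string interpolation instead of &study
print(freq(demo, "sexc"))
```

In this sketch, an ordinary variable interpolated with an f-string does the job a `%let` macro variable would do in SAS, and the function takes its inputs as regular arguments rather than text substitution.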
Difference between SAS and Open Source: Row vs. Column Processing
[Diagram: the DEMO dataset (studyid, subjid, siteid, age, sexc) and the derived variable sex]
• SAS processes the data one row at a time.
• R/Python/SQL process the data one column at a time.
Column-by-column processing:
• Uses less memory
• Is faster
• Makes it more difficult to process/create multiple variables
SAS (row by row):
data DM;
  set DEMO;
  if SEXC = 'Male' then SEX = 'M';
  else if SEXC = 'Female' then SEX = 'F';
  else if SEXC = 'Unknown' then SEX = 'U';
run;
Python/pandas (column by column):
DM = DEMO.copy()  # assumes DEMO is a pandas DataFrame
DM['SEX'] = DM['SEXC'].replace(['Male', 'Female', 'Unknown'], ['M', 'F', 'U'])
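A slightly fuller sketch of the SEXC-to-SEX recode, assuming pandas and a small hypothetical DEMO table, contrasts the column-wise and row-wise styles side by side. It uses `Series.map` with a dictionary for the column-wise version because `map` turns unmatched values into missing values, which is closer to the DATA step (where SEX stays blank when no IF branch matches), whereas `Series.replace` leaves unmatched values unchanged.

```python
import pandas as pd

# Hypothetical DEMO data; variable names follow the slide example.
DEMO = pd.DataFrame({
    "STUDYID": ["S1", "S1", "S1", "S1"],
    "SUBJID": ["001", "002", "003", "004"],
    "SEXC": ["Male", "Female", "Unknown", "Other"],
})

SEX_CODES = {"Male": "M", "Female": "F", "Unknown": "U"}

# Column-wise: one vectorized statement derives the whole SEX column.
# map() turns unmatched values (here "Other") into NaN, i.e. missing.
DM = DEMO.copy()
DM["SEX"] = DM["SEXC"].map(SEX_CODES)

# For contrast, replace() leaves unmatched values untouched.
replaced = DEMO["SEXC"].replace(list(SEX_CODES), list(SEX_CODES.values()))


# Row-wise: mirrors the DATA step's IF/ELSE IF logic, one row at a time.
def derive_sex(sexc):
    if sexc == "Male":
        return "M"
    elif sexc == "Female":
        return "F"
    elif sexc == "Unknown":
        return "U"
    return None  # no branch matched: leave SEX missing


DM["SEX_ROWWISE"] = [derive_sex(value) for value in DEMO["SEXC"]]

print(DM)
print(replaced.tolist())  # ['M', 'F', 'U', 'Other']
```

Both styles give the same codes for the matched values; the column-wise version is usually shorter and faster in pandas, while the row-wise function reads more like the original SAS logic, which can help during conversion and review.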
Challenges of the Open-Source Transition Project
• Inexperience in open-source programming (e.g., R, Python)
• Inexperience with the new system/environment (AWS cloud computing, RStudio, Jupyter, Linux)
• Learning curve
• Handling daily work while also converting existing SAS code
• Lack of enterprise customer support for open-source programming
• Version control
• Uncertainty about the new culture created by open-source programming
How to Overcome the Challenges
• Training in open-source programming (R and Python)
• Continuous workshops on the new analytical system
  • RStudio Pro Server (RStudio, Jupyter)
  • Redshift Data Warehouse
  • Linux
  • S3
• Conversion of 230+ SAS programs to R/Python/SQL by the Transition Support Team
• Change Management by Leadership
Sprint Cycle of SAS Code Conversion
• Sprint 1: conversion for Team 1 (40 programs), UAT 1, SOP 1
• Sprint 2: conversion for Team 2 (50 programs), UAT 2, SOP 2
• Sprint 3: conversion for Team 3 (50 programs), UAT 3, SOP 3
• Sprint 4: conversion for Team 4 (50 programs), UAT 4, SOP 4
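Each sprint includes UAT of the converted programs. One way to support that step is to compare the converted program's output with the legacy SAS output for the same inputs. The function below is a hypothetical helper, loosely modelled on the idea of PROC COMPARE rather than its actual output, and it assumes both outputs have already been loaded as pandas DataFrames (for example, SAS datasets can be read with `pandas.read_sas`).

```python
import pandas as pd


def compare_outputs(legacy: pd.DataFrame, converted: pd.DataFrame,
                    keys: list[str]) -> list[str]:
    """Hypothetical UAT helper, loosely modelled on PROC COMPARE.

    Returns human-readable discrepancies; an empty list means the outputs match.
    """
    issues = []
    missing = sorted(set(legacy.columns) - set(converted.columns))
    extra = sorted(set(converted.columns) - set(legacy.columns))
    if missing:
        issues.append(f"columns missing from converted output: {missing}")
    if extra:
        issues.append(f"unexpected columns in converted output: {extra}")
    if len(legacy) != len(converted):
        issues.append(f"row counts differ: {len(legacy)} vs {len(converted)}")
        return issues
    if any(k not in legacy.columns or k not in converted.columns for k in keys):
        issues.append(f"key variables not present in both outputs: {keys}")
        return issues

    # Sort both outputs by the key variables so that row order does not matter.
    left = legacy.sort_values(keys).reset_index(drop=True)
    right = converted.sort_values(keys).reset_index(drop=True)

    # Compare shared columns value by value; two missing values count as equal.
    for col in [c for c in left.columns if c in right.columns]:
        both_missing = left[col].isna() & right[col].isna()
        differs = ~((left[col] == right[col]) | both_missing)
        if differs.any():
            issues.append(f"{col}: {int(differs.sum())} value(s) differ")
    return issues


# Usage with hypothetical outputs: same content, different row order.
legacy = pd.DataFrame({"subjid": ["001", "002", "003"], "sex": ["M", "F", None]})
converted = pd.DataFrame({"subjid": ["003", "001", "002"], "sex": [None, "M", "F"]})
print(compare_outputs(legacy, converted, ["subjid"]))  # → []
```

This sketch checks exact equality; numeric outputs converted from SAS may also need a tolerance for floating-point differences, which would be an extension to the helper above.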
Change Management by Leadership
• Clear goals, plan, and timeline for the change (open-source programming)
• Executive support
• Resources and budget for the change
• Continuous support (e.g., for new programming languages, systems, and infrastructure) during the transition from SAS to open source
• Dedicated support team/SMEs
• Frequent, transparent communication on project progress and updates
• Full participation by the whole team
Benefits of the Open-Source Programming Transition
• Cultural changes
  • Less dependent on tools (enterprise systems)
  • Proactive
  • Open to new technologies
  • Faster to adopt new changes
  • Collaborative
• More opportunities
  • To learn new languages (R, Python) and systems (cloud computing)
  • To retain top talent
• Cost-effective
• Innovative, advanced analytics (big data, data visualization, machine learning)
Lessons Learned from the Open-Source Transition Project
• Support for programmers during the transition is critical.
  • Training
  • Workshops
• A dedicated support team during the transition is critical.
• Know-how and experience with both the existing system and the open-source system are critical.
• Change management by leadership is the key to success.
Conclusion
• A successful transition from SAS programming to open-source programming at the enterprise level (e.g., a Biometric Department) takes more than learning new languages.
• A successful enterprise-level transition requires participation from the whole team, especially support from the top (leadership).
• The transition brings cultural change toward a more open, proactive, advanced analytics environment (data visualization, machine learning, big data).
• The open-source programming transition project has given SAS programmers more opportunities to learn new technologies.
Any Questions?
19

More Related Content

Similar to Enterprise-level Transition from SAS to Open-source Programming for the whole department

What is DevOps?
What is DevOps?What is DevOps?
What is DevOps?
Mesut Güneş
 
Subhoshree_ETLDeveloper
Subhoshree_ETLDeveloperSubhoshree_ETLDeveloper
Subhoshree_ETLDeveloperSubhoshree Deo
 
Sangram Nayak_22Jan15
Sangram Nayak_22Jan15Sangram Nayak_22Jan15
Sangram Nayak_22Jan15Sangram Nayak
 
Horses for Courses: Database Roundtable
Horses for Courses: Database RoundtableHorses for Courses: Database Roundtable
Horses for Courses: Database Roundtable
Eric Kavanagh
 
451 Research: Data Is the Key to Friction in DevOps
451 Research: Data Is the Key to Friction in DevOps451 Research: Data Is the Key to Friction in DevOps
451 Research: Data Is the Key to Friction in DevOps
Delphix
 
Large-Scale Enterprise Platform Transformation with Microservices, DevOps, an...
Large-Scale Enterprise Platform Transformation with Microservices, DevOps, an...Large-Scale Enterprise Platform Transformation with Microservices, DevOps, an...
Large-Scale Enterprise Platform Transformation with Microservices, DevOps, an...
VMware Tanzu
 
Team Data Science Process Presentation (TDSP), Aug 29, 2017
Team Data Science Process Presentation (TDSP), Aug 29, 2017Team Data Science Process Presentation (TDSP), Aug 29, 2017
Team Data Science Process Presentation (TDSP), Aug 29, 2017
Debraj GuhaThakurta
 
Creating a DevOps Practice for Analytics -- Strata Data, September 28, 2017
Creating a DevOps Practice for Analytics -- Strata Data, September 28, 2017Creating a DevOps Practice for Analytics -- Strata Data, September 28, 2017
Creating a DevOps Practice for Analytics -- Strata Data, September 28, 2017
Caserta
 
Webinar - Devops platform for the evolving enterprise
Webinar - Devops platform for the evolving enterpriseWebinar - Devops platform for the evolving enterprise
Webinar - Devops platform for the evolving enterprise
DBmaestro - Database DevOps
 
sonam_new _resume - Copy
sonam_new _resume - Copysonam_new _resume - Copy
sonam_new _resume - CopySonam Dubey
 
Ellucian Live 2014 Presentation on Reporting and BI
Ellucian Live 2014 Presentation on Reporting and BIEllucian Live 2014 Presentation on Reporting and BI
Ellucian Live 2014 Presentation on Reporting and BI
Kent Brooks
 
Resume_SAP PI_Debasish Choudhury
Resume_SAP PI_Debasish ChoudhuryResume_SAP PI_Debasish Choudhury
Resume_SAP PI_Debasish Choudhurydebasish choudhury
 
Building enterprise platforms - off the beaten path - SharePoint User Group U...
Building enterprise platforms - off the beaten path - SharePoint User Group U...Building enterprise platforms - off the beaten path - SharePoint User Group U...
Building enterprise platforms - off the beaten path - SharePoint User Group U...
Andy Talbot
 
Architecting Agile Data Applications for Scale
Architecting Agile Data Applications for ScaleArchitecting Agile Data Applications for Scale
Architecting Agile Data Applications for Scale
Databricks
 
Continuous Database Delivery - 7/12/2018
Continuous Database Delivery - 7/12/2018Continuous Database Delivery - 7/12/2018
Continuous Database Delivery - 7/12/2018
David P. Moore
 
Salesforce Application Lifecycle Management presented to EA Forum by Sam Garf...
Salesforce Application Lifecycle Management presented to EA Forum by Sam Garf...Salesforce Application Lifecycle Management presented to EA Forum by Sam Garf...
Salesforce Application Lifecycle Management presented to EA Forum by Sam Garf...
Sam Garforth
 

Similar to Enterprise-level Transition from SAS to Open-source Programming for the whole department (20)

What is DevOps?
What is DevOps?What is DevOps?
What is DevOps?
 
Subhoshree_ETLDeveloper
Subhoshree_ETLDeveloperSubhoshree_ETLDeveloper
Subhoshree_ETLDeveloper
 
Sangram Nayak_22Jan15
Sangram Nayak_22Jan15Sangram Nayak_22Jan15
Sangram Nayak_22Jan15
 
Horses for Courses: Database Roundtable
Horses for Courses: Database RoundtableHorses for Courses: Database Roundtable
Horses for Courses: Database Roundtable
 
Rahul_Vijay_Gosewade
Rahul_Vijay_GosewadeRahul_Vijay_Gosewade
Rahul_Vijay_Gosewade
 
451 Research: Data Is the Key to Friction in DevOps
451 Research: Data Is the Key to Friction in DevOps451 Research: Data Is the Key to Friction in DevOps
451 Research: Data Is the Key to Friction in DevOps
 
Large-Scale Enterprise Platform Transformation with Microservices, DevOps, an...
Large-Scale Enterprise Platform Transformation with Microservices, DevOps, an...Large-Scale Enterprise Platform Transformation with Microservices, DevOps, an...
Large-Scale Enterprise Platform Transformation with Microservices, DevOps, an...
 
Team Data Science Process Presentation (TDSP), Aug 29, 2017
Team Data Science Process Presentation (TDSP), Aug 29, 2017Team Data Science Process Presentation (TDSP), Aug 29, 2017
Team Data Science Process Presentation (TDSP), Aug 29, 2017
 
Creating a DevOps Practice for Analytics -- Strata Data, September 28, 2017
Creating a DevOps Practice for Analytics -- Strata Data, September 28, 2017Creating a DevOps Practice for Analytics -- Strata Data, September 28, 2017
Creating a DevOps Practice for Analytics -- Strata Data, September 28, 2017
 
sonal
sonalsonal
sonal
 
Webinar - Devops platform for the evolving enterprise
Webinar - Devops platform for the evolving enterpriseWebinar - Devops platform for the evolving enterprise
Webinar - Devops platform for the evolving enterprise
 
sonam_new _resume - Copy
sonam_new _resume - Copysonam_new _resume - Copy
sonam_new _resume - Copy
 
Laxmikant_Resume
Laxmikant_ResumeLaxmikant_Resume
Laxmikant_Resume
 
Ellucian Live 2014 Presentation on Reporting and BI
Ellucian Live 2014 Presentation on Reporting and BIEllucian Live 2014 Presentation on Reporting and BI
Ellucian Live 2014 Presentation on Reporting and BI
 
Resume_SAP PI_Debasish Choudhury
Resume_SAP PI_Debasish ChoudhuryResume_SAP PI_Debasish Choudhury
Resume_SAP PI_Debasish Choudhury
 
Building enterprise platforms - off the beaten path - SharePoint User Group U...
Building enterprise platforms - off the beaten path - SharePoint User Group U...Building enterprise platforms - off the beaten path - SharePoint User Group U...
Building enterprise platforms - off the beaten path - SharePoint User Group U...
 
Architecting Agile Data Applications for Scale
Architecting Agile Data Applications for ScaleArchitecting Agile Data Applications for Scale
Architecting Agile Data Applications for Scale
 
Continuous Database Delivery - 7/12/2018
Continuous Database Delivery - 7/12/2018Continuous Database Delivery - 7/12/2018
Continuous Database Delivery - 7/12/2018
 
Easter150127
Easter150127Easter150127
Easter150127
 
Salesforce Application Lifecycle Management presented to EA Forum by Sam Garf...
Salesforce Application Lifecycle Management presented to EA Forum by Sam Garf...Salesforce Application Lifecycle Management presented to EA Forum by Sam Garf...
Salesforce Application Lifecycle Management presented to EA Forum by Sam Garf...
 

More from Kevin Lee

Patient’s Journey using Real World Data and its Advanced Analytics
Patient’s Journey using Real World Data and its Advanced AnalyticsPatient’s Journey using Real World Data and its Advanced Analytics
Patient’s Journey using Real World Data and its Advanced Analytics
Kevin Lee
 
Introduction of AWS Cloud Computing and its future for Biometric Department
Introduction of AWS Cloud Computing and its future for Biometric DepartmentIntroduction of AWS Cloud Computing and its future for Biometric Department
Introduction of AWS Cloud Computing and its future for Biometric Department
Kevin Lee
 
A fear of missing out and a fear of messing up : A Strategic Roadmap for Chat...
A fear of missing out and a fear of messing up : A Strategic Roadmap for Chat...A fear of missing out and a fear of messing up : A Strategic Roadmap for Chat...
A fear of missing out and a fear of messing up : A Strategic Roadmap for Chat...
Kevin Lee
 
Prompt it, not Google it - Prompt Engineering for Data Scientists
Prompt it, not Google it - Prompt Engineering for Data ScientistsPrompt it, not Google it - Prompt Engineering for Data Scientists
Prompt it, not Google it - Prompt Engineering for Data Scientists
Kevin Lee
 
Leading into the Unknown? Yes, we need Change Management Leadership
Leading into the Unknown? Yes, we need Change Management LeadershipLeading into the Unknown? Yes, we need Change Management Leadership
Leading into the Unknown? Yes, we need Change Management Leadership
Kevin Lee
 
How to create SDTM DM.xpt using Python v1.1
How to create SDTM DM.xpt using Python v1.1How to create SDTM DM.xpt using Python v1.1
How to create SDTM DM.xpt using Python v1.1
Kevin Lee
 
How I became ML Engineer
How I became ML Engineer How I became ML Engineer
How I became ML Engineer
Kevin Lee
 
Artificial Intelligence in Pharmaceutical Industry
Artificial Intelligence in Pharmaceutical IndustryArtificial Intelligence in Pharmaceutical Industry
Artificial Intelligence in Pharmaceutical Industry
Kevin Lee
 
Tell stories with jupyter notebook
Tell stories with jupyter notebookTell stories with jupyter notebook
Tell stories with jupyter notebook
Kevin Lee
 
Perfect partnership - machine learning and CDISC standard data
Perfect partnership - machine learning and CDISC standard dataPerfect partnership - machine learning and CDISC standard data
Perfect partnership - machine learning and CDISC standard data
Kevin Lee
 
Machine Learning : why we should know and how it works
Machine Learning : why we should know and how it worksMachine Learning : why we should know and how it works
Machine Learning : why we should know and how it works
Kevin Lee
 
Big data for SAS programmers
Big data for SAS programmersBig data for SAS programmers
Big data for SAS programmers
Kevin Lee
 
Big data in pharmaceutical industry
Big data in pharmaceutical industryBig data in pharmaceutical industry
Big data in pharmaceutical industry
Kevin Lee
 
How FDA will reject non compliant electronic submission
How FDA will reject non compliant electronic submissionHow FDA will reject non compliant electronic submission
How FDA will reject non compliant electronic submission
Kevin Lee
 
End to end standards driven oncology study (solid tumor, Immunotherapy, Leuke...
End to end standards driven oncology study (solid tumor, Immunotherapy, Leuke...End to end standards driven oncology study (solid tumor, Immunotherapy, Leuke...
End to end standards driven oncology study (solid tumor, Immunotherapy, Leuke...
Kevin Lee
 
Are you ready for Dec 17, 2016 - CDISC compliant data?
Are you ready for Dec 17, 2016 - CDISC compliant data?Are you ready for Dec 17, 2016 - CDISC compliant data?
Are you ready for Dec 17, 2016 - CDISC compliant data?
Kevin Lee
 
SAS integration with NoSQL data
SAS integration with NoSQL dataSAS integration with NoSQL data
SAS integration with NoSQL data
Kevin Lee
 
Introduction of semantic technology for SAS programmers
Introduction of semantic technology for SAS programmersIntroduction of semantic technology for SAS programmers
Introduction of semantic technology for SAS programmers
Kevin Lee
 
Standards Metadata Management (system)
Standards Metadata Management (system)Standards Metadata Management (system)
Standards Metadata Management (system)
Kevin Lee
 
Data centric SDLC for automated clinical data development
Data centric SDLC for automated clinical data developmentData centric SDLC for automated clinical data development
Data centric SDLC for automated clinical data development
Kevin Lee
 

More from Kevin Lee (20)

Patient’s Journey using Real World Data and its Advanced Analytics
Patient’s Journey using Real World Data and its Advanced AnalyticsPatient’s Journey using Real World Data and its Advanced Analytics
Patient’s Journey using Real World Data and its Advanced Analytics
 
Introduction of AWS Cloud Computing and its future for Biometric Department
Introduction of AWS Cloud Computing and its future for Biometric DepartmentIntroduction of AWS Cloud Computing and its future for Biometric Department
Introduction of AWS Cloud Computing and its future for Biometric Department
 
A fear of missing out and a fear of messing up : A Strategic Roadmap for Chat...
A fear of missing out and a fear of messing up : A Strategic Roadmap for Chat...A fear of missing out and a fear of messing up : A Strategic Roadmap for Chat...
A fear of missing out and a fear of messing up : A Strategic Roadmap for Chat...
 
Prompt it, not Google it - Prompt Engineering for Data Scientists
Prompt it, not Google it - Prompt Engineering for Data ScientistsPrompt it, not Google it - Prompt Engineering for Data Scientists
Prompt it, not Google it - Prompt Engineering for Data Scientists
 
Leading into the Unknown? Yes, we need Change Management Leadership
Leading into the Unknown? Yes, we need Change Management LeadershipLeading into the Unknown? Yes, we need Change Management Leadership
Leading into the Unknown? Yes, we need Change Management Leadership
 
How to create SDTM DM.xpt using Python v1.1
How to create SDTM DM.xpt using Python v1.1How to create SDTM DM.xpt using Python v1.1
How to create SDTM DM.xpt using Python v1.1
 
How I became ML Engineer
How I became ML Engineer How I became ML Engineer
How I became ML Engineer
 
Artificial Intelligence in Pharmaceutical Industry
Artificial Intelligence in Pharmaceutical IndustryArtificial Intelligence in Pharmaceutical Industry
Artificial Intelligence in Pharmaceutical Industry
 
Tell stories with jupyter notebook
Tell stories with jupyter notebookTell stories with jupyter notebook
Tell stories with jupyter notebook
 
Perfect partnership - machine learning and CDISC standard data
Perfect partnership - machine learning and CDISC standard dataPerfect partnership - machine learning and CDISC standard data
Perfect partnership - machine learning and CDISC standard data
 
Machine Learning : why we should know and how it works
Machine Learning : why we should know and how it worksMachine Learning : why we should know and how it works
Machine Learning : why we should know and how it works
 
Big data for SAS programmers
Big data for SAS programmersBig data for SAS programmers
Big data for SAS programmers
 
Big data in pharmaceutical industry
Big data in pharmaceutical industryBig data in pharmaceutical industry
Big data in pharmaceutical industry
 
How FDA will reject non compliant electronic submission
How FDA will reject non compliant electronic submissionHow FDA will reject non compliant electronic submission
How FDA will reject non compliant electronic submission
 
End to end standards driven oncology study (solid tumor, Immunotherapy, Leuke...
End to end standards driven oncology study (solid tumor, Immunotherapy, Leuke...End to end standards driven oncology study (solid tumor, Immunotherapy, Leuke...
End to end standards driven oncology study (solid tumor, Immunotherapy, Leuke...
 
Are you ready for Dec 17, 2016 - CDISC compliant data?
Are you ready for Dec 17, 2016 - CDISC compliant data?Are you ready for Dec 17, 2016 - CDISC compliant data?
Are you ready for Dec 17, 2016 - CDISC compliant data?
 
SAS integration with NoSQL data
SAS integration with NoSQL dataSAS integration with NoSQL data
SAS integration with NoSQL data
 
Introduction of semantic technology for SAS programmers
Introduction of semantic technology for SAS programmersIntroduction of semantic technology for SAS programmers
Introduction of semantic technology for SAS programmers
 
Standards Metadata Management (system)
Standards Metadata Management (system)Standards Metadata Management (system)
Standards Metadata Management (system)
 
Data centric SDLC for automated clinical data development
Data centric SDLC for automated clinical data developmentData centric SDLC for automated clinical data development
Data centric SDLC for automated clinical data development
 

Recently uploaded

Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project PresentationPredicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Boston Institute of Analytics
 
一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单
enxupq
 
FP Growth Algorithm and its Applications
FP Growth Algorithm and its ApplicationsFP Growth Algorithm and its Applications
FP Growth Algorithm and its Applications
MaleehaSheikh2
 
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
Tiktokethiodaily
 
一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单
ocavb
 
一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单
enxupq
 
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdfCh03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
haila53
 
一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单
ewymefz
 
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
ewymefz
 
SOCRadar Germany 2024 Threat Landscape Report
SOCRadar Germany 2024 Threat Landscape ReportSOCRadar Germany 2024 Threat Landscape Report
SOCRadar Germany 2024 Threat Landscape Report
SOCRadar
 
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
axoqas
 
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Subhajit Sahu
 
tapal brand analysis PPT slide for comptetive data
tapal brand analysis PPT slide for comptetive datatapal brand analysis PPT slide for comptetive data
tapal brand analysis PPT slide for comptetive data
theahmadsaood
 
Innovative Methods in Media and Communication Research by Sebastian Kubitschk...
Innovative Methods in Media and Communication Research by Sebastian Kubitschk...Innovative Methods in Media and Communication Research by Sebastian Kubitschk...
Innovative Methods in Media and Communication Research by Sebastian Kubitschk...
correoyaya
 
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
John Andrews
 
standardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghhstandardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghh
ArpitMalhotra16
 
Tabula.io Cheatsheet: automate your data workflows
Tabula.io Cheatsheet: automate your data workflowsTabula.io Cheatsheet: automate your data workflows
Tabula.io Cheatsheet: automate your data workflows
alex933524
 
Jpolillo Amazon PPC - Bid Optimization Sample
Jpolillo Amazon PPC - Bid Optimization SampleJpolillo Amazon PPC - Bid Optimization Sample
Jpolillo Amazon PPC - Bid Optimization Sample
James Polillo
 
一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单
ewymefz
 
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
yhkoc
 

Recently uploaded (20)

Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project PresentationPredicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
 
一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单
 
FP Growth Algorithm and its Applications
FP Growth Algorithm and its ApplicationsFP Growth Algorithm and its Applications
FP Growth Algorithm and its Applications
 
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
 
一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单
 
一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单
 
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdfCh03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
 
一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单
 
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
 
SOCRadar Germany 2024 Threat Landscape Report
SOCRadar Germany 2024 Threat Landscape ReportSOCRadar Germany 2024 Threat Landscape Report
SOCRadar Germany 2024 Threat Landscape Report
 
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
 
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
 
tapal brand analysis PPT slide for comptetive data
tapal brand analysis PPT slide for comptetive datatapal brand analysis PPT slide for comptetive data
tapal brand analysis PPT slide for comptetive data
 
Innovative Methods in Media and Communication Research by Sebastian Kubitschk...
Innovative Methods in Media and Communication Research by Sebastian Kubitschk...Innovative Methods in Media and Communication Research by Sebastian Kubitschk...
Innovative Methods in Media and Communication Research by Sebastian Kubitschk...
 
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
 
standardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghhstandardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghh
 
Tabula.io Cheatsheet: automate your data workflows
Tabula.io Cheatsheet: automate your data workflowsTabula.io Cheatsheet: automate your data workflows
Tabula.io Cheatsheet: automate your data workflows
 
Jpolillo Amazon PPC - Bid Optimization Sample
Jpolillo Amazon PPC - Bid Optimization SampleJpolillo Amazon PPC - Bid Optimization Sample
Jpolillo Amazon PPC - Bid Optimization Sample
 
一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单
 
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
 

Enterprise-level Transition from SAS to Open-source Programming for the whole department

  • 1. 5/26/2022 1 Enterprise-levelTransition from SAS to Open-Source Programing for the whole department Kevin Lee Disclaimer The views and opinions presented here represent those of the speaker and should not be considered to represent any companies or organizations. 1 2
  • 2. 5/26/2022 2 • Scope of Open-Source Transition Project • Challenges of Open-Source Transition Project • How to overcome Challenges • Difference between SAS and Open-source Programming • Change Management by Leadership • Benefit of Open-Source Transition Project • Lessons Learned • Conclusion Open-Source Programming ( R & Python) Adoption by Data Scientist has increased from 35% in 2014 to 77% in 2020 based on Burtch Works Survey. 0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 2013 2014 2015 2016 2017 2018 2019 2020 2021 SAS Open Source 3 4
  • 3. 5/26/2022 3 How has we, as individuals, transitioned/expanded from SAS to Open-Source Programmer? • Learned New Programming Language – R or/and Python • Learned New Analytical System – R Studio / Jupyter • Converted previous SAS codes to new Open-source codes • Adjusted to new culture in Open-source (no dedicated customer service from SAS) What about the entire department (150 SAS programmers)? 5 6
  • 4. 5/26/2022 4 High Level Infrastructure Transition from SAS Window to AWS Open-SourceAnalytical System Window Server SAS Studio Oracle Database AWS Cloud Computing R Studio Pro Server Redshift Data Warehouse 8 Scope of Open-Source Transition Project SAS Programming SAS Studio Window Server SAS programming SAS Programmers(150) SAS codes(230) Oracle Relational Database Open-Source Programming R Pro Server ( R Studio, Jupyter) AWS Cloud Computing(Linux) Open-Source Programming ( R, Python, SQL) Data Scientists R, Python & SQL codes AWS Redshift Data Warehouse 7 8
  • 5. 5/26/2022 5 9 New AWS Open-Source Analytical System AWS Cloud Redshift Data Wareho use S3 EC2/ Linux R Studio Jupyter SQL Workbe nch Git Tableau Difference between SAS and Open-Source Programming SAS Programming Open-Source Programming Analytic Interface SAS Studio R Studio, Jupyter, SQL Workbench Data Processing Row by Row Column by Column comments * # Processing Structure DATA / PROC steps Single statement Function Macros function Macro variables Yes No (at least, I have not found one yet) Data Structure SAS datasets Data Frame Object-oriented No Yes Customer Support Dedicated Support Community Support Feature Update Less Frequent Very frequent since new features could be developed by anyone. Speed High Level Language, so fast SQL & Python are very fast – more suitable for Big Data System Closed System, thoroughly validated Open System, transparent but validation could be questioned Machine Learning At early stage The most popular language 9 10
Difference between SAS and Open Source
[Figure: the variables studyid, subjid, siteid, age, and sexc, plus the derived variable sex, shown processed by row and by column]
• SAS processes data row by row.
• R, Python, and SQL process data column by column.
• Uses less memory
• Faster to process
• Harder to process or create multiple variables

SAS (DATA step):
    data DM;
      set DEMO;
      if SEXC = 'Male' then SEX = 'M';
      else if SEXC = 'Female' then SEX = 'F';
      else if SEXC = 'Unknown' then SEX = 'U';
    run;

Python (pandas):
    DM = DEMO.copy()  # DEMO: pandas DataFrame holding the same data as the SAS dataset
    DM['SEX'] = DM['SEXC'].replace(['Male', 'Female', 'Unknown'], ['M', 'F', 'U'])

Challenges of the Open-Source Transition Project
• Inexperience in open-source programming (e.g., R, Python)
• Inexperience with the new system and environment (AWS cloud computing, RStudio, Jupyter, Linux)
• Learning curve
• Daily work on top of converting existing SAS code
• Lack of enterprise customer support for open-source programming
• Version control
• Uncertainty about the new culture created by open-source programming
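The row-by-row versus column-by-column distinction in the SEX derivation can be made concrete in Python. The sketch below, assuming pandas and hypothetical data, derives SEX from SEXC twice: once with an explicit loop over observations (the way a DATA step steps through rows) and once with a single vectorized `map()` over the whole column. Unlike `replace()` in the pandas example, which keeps unmatched values unchanged, `map()` returns missing values for them; the sketch fills those with blanks so both versions behave like the SAS step, where SEX stays blank for unmatched values.

```python
import pandas as pd

# Hypothetical demographics data; "Other" is deliberately not in the lookup.
demo = pd.DataFrame({"sexc": ["Male", "Female", "Unknown", "Female", "Other"]})
SEX_CODES = {"Male": "M", "Female": "F", "Unknown": "U"}


def derive_sex_rowwise(df: pd.DataFrame) -> pd.Series:
    """Process one observation at a time, like the implicit loop of a SAS DATA step."""
    values = []
    for sexc in df["sexc"]:
        # Unmatched values get a blank, as an unassigned SAS character variable would.
        values.append(SEX_CODES.get(sexc, ""))
    return pd.Series(values, index=df.index, name="sex")


def derive_sex_columnwise(df: pd.DataFrame) -> pd.Series:
    """Apply one vectorized lookup to the whole column at once."""
    # map() gives NaN for unmatched values; fill with a blank to match the row-wise version.
    return df["sexc"].map(SEX_CODES).fillna("").rename("sex")


rowwise = derive_sex_rowwise(demo)
columnwise = derive_sex_columnwise(demo)
print(rowwise.tolist())
print(columnwise.tolist())
```

Both functions give the same result; the column-wise version avoids a Python-level loop, which is where the speed difference on large data usually comes from.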
How to Overcome the Challenges
• Training in open-source programming (R and Python)
• Continuous workshops on the new analytical system:
  • RStudio Pro Server (RStudio, Jupyter)
  • Redshift data warehouse
  • Linux
  • S3
• Conversion of 230+ SAS programs to R/Python/SQL by the Transition Support Team
• Change management by leadership

Sprint Cycle of SAS Code Conversion
• Sprint 1: conversion for Team 1 (40 programs), UAT 1, SOP 1
• Sprint 2: conversion for Team 2 (50 programs), UAT 2, SOP 2
• Sprint 3: conversion for Team 3 (50 programs), UAT 3, SOP 3
• Sprint 4: conversion for Team 4 (50 programs), UAT 4, SOP 4
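Each sprint ends with user acceptance testing (UAT). One way to check a converted program, sketched here as an assumption rather than the team's actual procedure, is to compare its output with the output of the original SAS program, roughly as PROC COMPARE would. The helper name `compare_outputs` and the data are hypothetical; `DataFrame.compare` is a real pandas method (pandas 1.1 or later) that returns an empty frame when every cell matches.

```python
import pandas as pd


def compare_outputs(sas_df: pd.DataFrame, py_df: pd.DataFrame, keys: list) -> pd.DataFrame:
    """Rough PROC COMPARE analog: return cell-level differences between two outputs."""
    if set(sas_df.columns) != set(py_df.columns) or len(sas_df) != len(py_df):
        raise ValueError("Structure differs: column names or row counts do not match")
    # Align both outputs on the key variables and use the same column order.
    left = sas_df.sort_values(keys).reset_index(drop=True)
    right = py_df[list(sas_df.columns)].sort_values(keys).reset_index(drop=True)
    if left[keys].values.tolist() != right[keys].values.tolist():
        raise ValueError("Key values differ between the two outputs")
    # DataFrame.compare returns an empty frame when every cell matches.
    return left.compare(right)


# Hypothetical reference output exported from the SAS program.
sas_out = pd.DataFrame({"subjid": ["001", "002", "003"], "sex": ["M", "F", "U"]})
# Converted program's output: same content, different row order.
py_out = pd.DataFrame({"subjid": ["003", "001", "002"], "sex": ["U", "M", "F"]})
# A faulty conversion that lost the "Unknown" mapping for subject 003.
py_bad = pd.DataFrame({"subjid": ["001", "002", "003"], "sex": ["M", "F", ""]})

print(compare_outputs(sas_out, py_out, ["subjid"]).empty)
print(compare_outputs(sas_out, py_bad, ["subjid"]))
```

Sorting on the key variables first mirrors the ID statement of PROC COMPARE, so a difference in row order alone is not reported as a discrepancy.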
Change Management by Leadership
• Clear goals, plan, and timeline for the change (open-source programming)
• Executive support
• Resources and budget for the change
• Continuous support (e.g., new programming languages, system, infrastructure) during the transition from SAS to open source
• Dedicated support team / SMEs
• Frequent, transparent communication on project progress and updates
• Full participation from the whole team

Benefits of the Open-Source Programming Transition
• Cultural changes
  • Less dependent on tools (enterprise systems)
  • Proactive
  • Open to new technologies
  • Faster to adopt new changes
  • Collaborative
• More opportunities
  • To learn new languages (R, Python) and systems (cloud computing)
  • To keep top talent
• Cost-effective
• Innovative, advanced analytics (big data, data visualization, machine learning)
Lessons Learned from the Open-Source Transition Project
• Support for programmers during the transition is critical:
  • Training
  • Workshops
• A dedicated support team during the transition is critical.
• Know-how and experience with both the existing system and the open-source system are critical.
• Change management by leadership is the key to success.

Conclusion
• A successful enterprise-level transition from SAS programming to open-source programming (e.g., in a Biometric Department) is more than learning new languages.
• It requires full participation from the whole team, especially from the top (leadership).
• It brings cultural change toward a more open, proactive, advanced analytics environment (data visualization, machine learning, big data).
• The open-source programming transition project has given SAS programmers more opportunities to learn new technologies.