This SAS Programming for Beginners tutorial from Edureka takes you through the programming concepts in SAS, such as DATA and PROC steps, formats, informats, loops, data set operations and important procedures like PROC MEANS, PROC FREQ, PROC SUMMARY and many more. We have also implemented a randomness-testing demo that uses the SAS FREQ procedure and a chi-square test to check the randomness of a given sample of data. Below are the topics covered in this tutorial:
1. Data Analytics Tools
2. Why SAS?
3. What is SAS?
4. SAS Features
5. Programming Concepts in SAS
6. Use Case – Testing Randomness
7. SAS Job Trends
EDUREKA SAS CERTIFICATION TRAINING (www.edureka.co/sas-training)
Why Data Analytics?
Cost Reduction: Data Analytics helps manage resources so as to reduce costs.
Faster and Better Decision Making: Analytics enables better work-related decisions.
Improved Services or Products: Analytics helps meet customer needs through better services.
Next Generation Products: Data Analytics paves the way for the creation of next-gen products.
Data Analytics Tools
There are many tools for performing Data Analytics; the popular ones are:
Paid tools: Tableau, Excel, QlikView, Splunk, SAS
Open source tools: Python, Apache Spark, Apache Storm, Pig & Hive, R
Why SAS?
We will compare SAS with the popular alternatives in the market on the following aspects:
1. Ease of Learning
2. Data Handling Capabilities
3. Graphical Capabilities
4. Advancements in the Tool
5. Job Scenario
Ease of Learning: SAS is easy to learn and offers an easy option (PROC SQL) for people who already know SQL. R, on the other hand, has a steeper learning curve, since almost everything in R is done by writing code.
Data Handling Capabilities: SAS is on par with all leading tools, including R and Python, when it comes to handling huge amounts of data and offering options for parallel computation.
Graphical Capabilities: SAS provides functional graphical capabilities, and with a little learning it is possible to customize these plots.
Advancements in the Tool: SAS releases updates in a controlled environment, so they are well tested. R and Python, on the other hand, accept open contributions, so their latest developments are more likely to contain errors.
Job Scenario: Globally, SAS is the market leader in available corporate jobs. In India, SAS holds about 70% of the data analytics market, compared to 15% for R.
What is SAS?
SAS (originally Statistical Analysis System) is a software suite for advanced analytics, multivariate analysis, business intelligence, data management and predictive analytics.
It is developed by SAS Institute.
SAS provides a graphical point-and-click user interface for non-technical users and more advanced options through the SAS language.
SAS Features
Base SAS: flexible, extensible, integrated and powerful
Capability areas: Business Solutions; Analytics; Reporting and Graphics; Data Access and Management; Visualization and Discovery
Let us look at some of the features of Base SAS in detail: data access, transformation and reporting.
SAS Framework
Data flows through four stages: Access, Manage, Analyze and Present.
Analyze: Forecasting, Regression, Averages, Frequency
Present: List Reports, Summary Reports, Graphic Reports
SAS Components
Some of the SAS components include:
Base SAS: the most widely used component; it provides data management facilities and also lets you analyze data.
SAS/STAT: lets you perform statistical analysis, including analysis of variance, regression, multivariate analysis, survival analysis and psychometric analysis.
SAS/ETS: suited for time series analysis.
SAS/GRAPH: graphs and presentations make data easier to understand, and SAS/GRAPH produces them for you.
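SAS Program
Let us write a SAS program to print the first ten Fibonacci numbers. The program has two parts: a DATA step, which builds the data set, and a PROC step, which prints it.
/* Data step: build the data set fibonacci */
data fibonacci;
    do i = 1 to 10;
        fib = sum(fib, lag(fib));
        if i eq 1 then fib = 1;
        output;
    end;
run;

/* Proc step: print the data set */
proc print data=fibonacci;
run;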
Now let us understand the DATA step in our SAS program:
1. We define a variable fib to hold the current Fibonacci number.
2. Each new value of fib is the sum of the current fib value and the previous Fibonacci number.
3. The LAG function returns the value that fib had the previous time LAG was called, which is the previous Fibonacci number. On the first pass both fib and lag(fib) are missing, so SUM returns a missing value, and the statement if i eq 1 then fib = 1 seeds the sequence. On the second pass LAG still returns the missing value it was given on the first call, so fib stays 1, and from then on each value is the sum of the two before it: 1, 1, 2, 3, 5, ...
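To make the LAG behaviour concrete, here is a minimal Python sketch of the same DATA step. It is not SAS code: the helpers make_lag and sas_sum are hypothetical names that imitate SAS LAG (return the argument from the previous call) and SAS SUM (ignore missing values), with None standing in for a SAS missing value.

```python
def make_lag():
    """Return a function that mimics SAS LAG: each call returns the argument
    passed on the previous call (None, i.e. missing, on the first call)."""
    previous = [None]

    def lag(value):
        result = previous[0]
        previous[0] = value
        return result

    return lag


def sas_sum(*values):
    """Mimic SAS SUM: add the non-missing arguments; missing if all are missing."""
    present = [v for v in values if v is not None]
    return sum(present) if present else None


def fibonacci_data_step(n=10):
    """Replay the DATA step: do i = 1 to n; ... output; end;"""
    lag = make_lag()
    fib = None                      # fib starts out missing
    rows = []
    for i in range(1, n + 1):
        fib = sas_sum(fib, lag(fib))
        if i == 1:
            fib = 1                 # seed the sequence
        rows.append((i, fib))       # output;
    return rows


for i, fib in fibonacci_data_step():
    print(i, fib)
```

The key design point is that LAG is a queue fed on every call, not a lookup of the previous row, which is why the sequence starts 1, 1 rather than 1, 2.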
Moving on to the PROC step:
1. The PROC PRINT step prints the data set fibonacci.
2. The output has three columns: Obs, i and fib.
3. Obs and i both run from 1 to 10, whereas the fib column contains the first 10 Fibonacci numbers.
SAS Data
Data sets are central to SAS. A SAS data set stores data in tabular form: variables occupy the columns and observations occupy the rows.
Data types
SAS treats numbers as numeric data, and everything else falls under character data. Hence SAS has two data types: numeric and character.
SAS Data - Date
Apart from these, SAS represents dates in a special way compared to other languages.
A SAS date value is a number equal to the number of days between January 1, 1960 and the given date; dates before 1960 have negative values.
Apart from date values, SAS provides many tools for working with dates, such as informats for reading dates, functions for manipulating dates and formats for printing dates.
Dates in SAS
Date               SAS date value
January 1, 1959    -365
January 1, 1960    0
January 1, 1961    366
January 1, 2003    15706
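The date values in the table are plain day counts from January 1, 1960, so they can be checked with ordinary date arithmetic. A short Python sketch (to_sas_date and from_sas_date are illustrative names, not SAS functions):

```python
from datetime import date, timedelta

SAS_EPOCH = date(1960, 1, 1)  # day 0 of the SAS date scale


def to_sas_date(d):
    """Days from January 1, 1960 to d (negative for earlier dates)."""
    return (d - SAS_EPOCH).days


def from_sas_date(n):
    """Inverse: turn a SAS date value back into a calendar date."""
    return SAS_EPOCH + timedelta(days=n)


for d in [date(1959, 1, 1), date(1960, 1, 1), date(1961, 1, 1), date(2003, 1, 1)]:
    print(d, to_sas_date(d))
```

Note that January 1, 1961 maps to 366 rather than 365 because 1960 was a leap year.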
SAS Data – Informat
Informat
Informats tell SAS how to read a value, both when you read data from an external file with the INPUT statement in a DATA step and when you create a new variable by converting values with the INPUT function.
Every value SAS reads is read with an informat; if you do not specify one, SAS uses a default.
There are three main classes of informats: character, numeric and date.
Type        Informat name   What it does
Character   $w.             Reads character data of width w
Numeric     w.d             Reads numeric data of width w with d decimal places (d applies only when the value has no decimal point of its own)
Date        MMDDYYw.        Reads dates written in month-day-year order, such as 01-31-03 or 01/31/2003
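The w.d rule is easy to misread, so here is a small Python sketch of it, assuming only the behaviour described in the table: take the first w characters, and divide by 10 to the power d only when no decimal point is present. read_w_d is a hypothetical helper; it ignores SAS details such as missing values and scientific notation.

```python
def read_w_d(field, w, d=0):
    """Sketch of the w.d informat rule: read the first w characters as a number;
    if they contain no decimal point, divide by 10**d (an explicit point wins)."""
    text = field[:w].strip()
    value = float(text)
    if "." not in text:
        value /= 10 ** d
    return value


print(read_w_d("12345", 5, 2))   # no decimal point, so d = 2 is applied
print(read_w_d("123.4", 5, 2))   # explicit decimal point, so d is ignored
```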
SAS Data – Format
Format
A format tells SAS how to display the values of a variable.
Formats fall into the same three classes as informats (character, numeric and date-time), and their names always contain a dot.
The FORMAT statement can be used in either a DATA step or a PROC step.
The general form of a FORMAT statement is:
FORMAT variable-name FORMAT-NAME.;
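A format changes only how a stored value is displayed. As an illustration, this Python sketch renders a SAS date value in the ddMONyyyy layout of the DATA9.-style date format; the function name format_date9 is hypothetical, and the month abbreviations are spelled out so the output does not depend on the system locale.

```python
from datetime import date, timedelta

MONTHS = ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"]


def format_date9(sas_value):
    """Display a SAS date value (days since 1960-01-01) as ddMONyyyy."""
    d = date(1960, 1, 1) + timedelta(days=sas_value)
    return f"{d.day:02d}{MONTHS[d.month - 1]}{d.year}"


print(format_date9(15706))
print(format_date9(0))
```

The stored number (15706) never changes; only its printed form does, which is exactly what a FORMAT statement controls.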
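SAS Data – Loops
Loops in SAS are of three types:
Iterative DO statement (BY increment is optional and defaults to 1)
DO index-variable = start TO stop BY increment;
    more SAS statements;
END;
DO UNTIL statement (the condition is checked at the bottom of the loop, so the body always runs at least once)
DO UNTIL (expression);
    more SAS statements;
END;
DO WHILE statement (the condition is checked at the top of the loop, so the body may not run at all)
DO WHILE (expression);
    more SAS statements;
END;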
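The practical difference between the three loops is where the test happens and whether the stop value is included. The Python sketch below imitates each one; the function names and the simple counter body are illustrative assumptions, not SAS constructs.

```python
def do_until_count(x, limit):
    """Mimic DO UNTIL (x >= limit): run the body first, then test."""
    runs = 0
    while True:
        runs += 1          # loop body
        x += 1
        if x >= limit:     # condition evaluated at the bottom
            break
    return runs


def do_while_count(x, limit):
    """Mimic DO WHILE (x < limit): test before each pass."""
    runs = 0
    while x < limit:       # condition evaluated at the top
        runs += 1
        x += 1
    return runs


def iterative_do(start, stop, by=1):
    """Mimic DO i = start TO stop BY by (positive by): stop is inclusive."""
    return list(range(start, stop + 1, by))


print(do_until_count(10, 5))   # condition already true, body still runs once
print(do_while_count(10, 5))   # condition false at the top, body never runs
print(iterative_do(1, 10, 3))
```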
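SAS Data – Dataset Operations
Data sets in SAS can be worked upon in the following ways:
Merging Datasets (match-merge on a common key; both inputs must first be sorted by that key)
PROC SORT data=input1; by key; RUN;
PROC SORT data=input2; by key; RUN;
DATA out1;
    merge input1 input2;
    by key;
RUN;
Appending Datasets (stack the observations of several data sets one after another)
DATA out;
    set input1 input2 input3;
RUN;
Adding a BY key; statement after the SET statement interleaves the data sets in key order instead of stacking them; this requires each input to be sorted by key.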
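The difference between the two operations can be sketched in Python with lists of dictionaries standing in for data sets. This is a simplification under stated assumptions: append and match_merge are hypothetical helpers, each key appears at most once per input (a one-to-one merge), and a variable a row never received is simply absent, where SAS would set it to missing.

```python
def append(*datasets):
    """Like SET a b c; with no BY: all observations, one data set after another."""
    out = []
    for ds in datasets:
        out.extend(ds)
    return out


def match_merge(left, right, key):
    """Sketch of a one-to-one MERGE left right; BY key;
    Keys found in only one input are kept; for a variable present in both,
    the value from the data set listed later (right) wins."""
    merged = {}
    for row in left + right:
        merged.setdefault(row[key], {}).update(row)
    return [merged[k] for k in sorted(merged)]


a = [{"key": 1, "x": "a1"}, {"key": 2, "x": "a2"}]
b = [{"key": 2, "x": "b2", "y": 20}, {"key": 3, "y": 30}]
print(append(a, b))
print(match_merge(a, b, "key"))
```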
SAS Procedures
Procedures in SAS are invoked with PROC statements.
Each procedure serves its own purpose, but they share a common structure: every procedure has required statements, and most also accept optional statements.
IMPORT, DATASETS, CONTENTS, PRINT, FREQ, SORT, FORMAT, SURVEYSELECT, TRANSPOSE, MEANS, SUMMARY, RANK, OPTIONS, EXPORT
Figure: Important SAS Procedures
SAS Means Procedure
PROC MEANS is one of the most powerful and flexible procedures in the SAS System.
It lets us analyze the values of numeric variables quickly and efficiently, and place the results either in the Output window or in a SAS data set.
Figure: Example of the MEANS procedure
/* Statistics for age, height and weight in SASHELP.CLASS, computed separately for each sex */
proc means data=sashelp.class n min max sum mean median stddev range;
    var age height weight;
    class sex;
run;
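What the CLASS statement does is split the rows into groups and compute every requested statistic per group. A Python sketch of that idea for one numeric variable, using the standard statistics module; means_by_class is a hypothetical helper, and the sample rows are made-up values, not the contents of SASHELP.CLASS.

```python
import statistics
from collections import defaultdict


def means_by_class(rows, class_var, var):
    """Group rows by class_var and compute the statistics requested in the
    PROC MEANS example (n min max sum mean median stddev range) for var."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[class_var]].append(row[var])
    result = {}
    for level in sorted(groups):
        values = groups[level]
        result[level] = {
            "n": len(values),
            "min": min(values),
            "max": max(values),
            "sum": sum(values),
            "mean": statistics.mean(values),
            "median": statistics.median(values),
            # sample standard deviation (n - 1 divisor), PROC MEANS' default;
            # undefined (missing) for a single value
            "stddev": statistics.stdev(values) if len(values) > 1 else None,
            "range": max(values) - min(values),
        }
    return result


rows = [{"sex": "F", "age": 11}, {"sex": "F", "age": 13},
        {"sex": "M", "age": 12}, {"sex": "M", "age": 14}, {"sex": "M", "age": 16}]
for level, stats in means_by_class(rows, "sex", "age").items():
    print(level, stats)
```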
Demo – Testing Randomness
Introduction
Our demo uses SAS to check the randomness of a particular sequence of numbers.
Random number generation is a key requirement for many security systems around the world.
We will check the randomness of the following two sets of numbers:
1. 10,000 random numbers generated from Random.org
2. 10 million digits in the decimal expansion of Pi
Figure: 10,000 random numbers from Random.org; 10 million digits of the decimal expansion of Pi
Demo – What is Randomness?
A numeric sequence is said to be statistically random when it contains no recognizable patterns or regularities; sequences such as the results of an ideal dice roll or the digits of π exhibit statistical randomness.
Statistical randomness does not necessarily imply true randomness, i.e., objective unpredictability.
Some popular algorithms for generating pseudorandom numbers include Blum Blum Shub, Blum-Micali, CBRNG, Mersenne Twister, Rule 30 and Yarrow.
We will use the chi-square goodness-of-fit test, which checks whether each digit from 0 to 9 occurs about equally often, to test the randomness of our two data sets.
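For digit data, the chi-square statistic compares the observed count of each digit with the count expected if all ten digits were equally likely. Writing N for the number of digits, O_k for the observed count of digit k and E_k for its expected count, the standard form is:

```latex
\chi^2 = \sum_{k=0}^{9} \frac{(O_k - E_k)^2}{E_k},
\qquad E_k = \frac{N}{10},
\qquad \text{df} = 10 - 1 = 9
```

A large value of the statistic (a small p-value) means some digits occur too often or too rarely; at the 5% level with 9 degrees of freedom the critical value is about 16.92. Note that the test looks only at digit frequencies, not at the order in which the digits appear.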
Demo – SAS Code
The SAS code used in the demo performs the following steps:
1. Reads a file with 10 million inputs.
2. Sets the width of each number to one digit.
3. Places each number on a new line.
4. Runs a chi-square test on the input.
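For readers without a SAS session, here is a rough Python equivalent of these steps. It is our sketch, not the demo's SAS code: it treats each character 0 to 9 as one observation (the one-digit width), counts the digits, and applies the chi-square formula with a hard-coded 5% critical value for 9 degrees of freedom. The function names are hypothetical, and unlike PROC FREQ it reports no p-value.

```python
from collections import Counter

CRITICAL_5PCT_DF9 = 16.919  # chi-square critical value, alpha = 0.05, df = 9


def digit_chi_square(text):
    """Read every character 0-9 in text as one observation, count each digit
    and return the chi-square statistic against equal digit frequencies."""
    digits = [ch for ch in text if ch in "0123456789"]
    n = len(digits)
    if n == 0:
        raise ValueError("no digits to test")
    expected = n / 10
    counts = Counter(digits)
    return sum((counts.get(str(k), 0) - expected) ** 2 / expected
               for k in range(10))


def looks_uniform(text):
    """True if uniformity is not rejected at the 5% level."""
    return digit_chi_square(text) < CRITICAL_5PCT_DF9


print(digit_chi_square("0123456789" * 100))   # perfectly balanced counts
print(digit_chi_square("0" * 100))            # every digit is a zero
```

To test a file of digits, pass its contents, for example digit_chi_square(open(path).read()), where path is wherever the digits are stored.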
Demo – Results
Let us look at the results of running our SAS program on the two sets of numbers.
Figure: results for the 10 million Pi digits and for the 10,000 numbers from Random.org
Demo – Results
Based on this chi-square test, we conclude that the decimal digits of Pi look more random than the set of numbers from Random.org.
The digits in the decimal expansion of Pi are widely believed to be statistically random, although this has never been proven.
Job Trends in SAS
Figure: job trend for SAS and SAS Modeling across the world (source: www.indeed.com)
SAS has been a market leader when it comes to data analytics jobs.