Introduction to the SAS Analytics Software

An overview of what SAS programming offers. Event held by Axcess Financial.

Published in: Technology

  1. SAS 9.3: Solve next-generation problems with SAS®
  2. Give a man a fish and you feed him for a day; teach a man to fish and you feed him for a lifetime.
  3. Presentation Outline
  4. Introduction to the SAS Environment
     1. SAS Introduction
     2. SAS Programs
     3. SAS Data Sets and Data Libraries
     4. Creating SAS Data Sets
  5. What is SAS?
     • SAS is a comprehensive statistical software system that integrates utilities for storing, modifying, analyzing, and graphing data.
     • SAS runs on both Windows and UNIX platforms.
     • SAS is used in a wide range of industries, such as healthcare, education, financial services, life sciences, …
     • Check out the webpage to learn more: http://www.sas.com/
  6. Who is SAS?
  7. 2012 Worldwide Results: Breakdown by Industry Sector (2012 SAS Annual Report)
  8. More than 1,500 banks with SAS
  9. Evolution of SAS
  10. SAS Banking Analytics Architecture
  11. SAS User Interface
      • Editor Window
      • Log Window
      • Explorer Window
      • Output Window (not shown)
      • Results Window (not shown)
      • Run button: click this button to run SAS code
      • Help button: click here for SAS help
      • New Window button
      • Save button
      • Tool bar similar to Windows applications
  12. Editor Window
      The Editor Window contains entered data and SAS programs.
  13. Explorer Window
      The Libraries folder contains data sets created in SAS.
  14. Libraries Folder
      • Contents of the Libraries folder: the Work folder contains data sets created in SAS.
      • Contents of the Work folder: the data sets that have been created in SAS, either by entering data or by creating data sets in SAS programs.
  15. Log Window
      The Log Window contains a record of all commands submitted to SAS and shows any errors in those commands.
  16. Output Window
      The Output Window contains the output of the SAS programs submitted from the Editor Window.
  17. Results Window
      The Results Window lists the SAS procedures that have been submitted, in the order in which they were submitted. Click on any procedure to see all of its output parts, and click on an individual part to view the actual output.
  18. SAS Help
  19. SAS Programs
      • File extension: .sas
      • The Editor window has four uses:
        – Access and edit existing SAS programs
        – Write new SAS programs
        – Submit SAS programs for execution
        – Save SAS programs
      • SAS program
        – Sequence of steps that the user submits for execution
      • Submitting SAS programs
        – Entire program
        – Selection of the program
      • Two basic steps in SAS programs:
        – DATA steps
          • Typically used to create SAS data sets and manipulate data
          • Begin with a DATA statement
        – PROC steps
          • Typically used to process SAS data sets
          • Begin with a PROC statement
      • The end of a DATA or PROC step is indicated by:
        – RUN statement (most steps)
        – QUIT statement (some steps)
        – Beginning of another step (DATA or PROC statement)
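The two kinds of steps described on this slide can be sketched as a minimal program. It assumes the SASHELP.CLASS sample data set that ships with SAS; the output data set name class_copy is only illustrative.

```sas
/* DATA step: create a temporary data set from an existing one */
data work.class_copy;
    set sashelp.class;
run;

/* PROC step: process the data set created above */
proc print data=work.class_copy;
run;
```

Each step ends at its RUN statement, so the two steps can also be submitted separately by selecting only part of the program.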
  20. SAS Data Sets and Data Libraries
      • SAS data libraries
        – Contain SAS data sets
        – Identified by assigning a library reference name (libref)
        – Temporary
          • Work library
          • SAS data files are deleted when the session ends
          • Library reference name not necessary
        – Permanent
          • SAS data sets are saved after the session ends
          • SASUSER library
          • You can create and access your own libraries
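As a rough illustration of temporary versus permanent libraries, the sketch below assigns a libref with a LIBNAME statement. The libref mylib and the directory path are hypothetical; the path must point to a folder that exists on your system.

```sas
/* Hypothetical libref and path: replace with an existing directory */
libname mylib 'C:\sasdata';

/* Permanent: stored in the mylib library and kept after the session ends */
data mylib.class_perm;
    set sashelp.class;
run;

/* Temporary: by default a one-level name goes to WORK and is deleted at session end */
data class_temp;
    set sashelp.class;
run;
```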
  21. Presentation Outline
  22. Working With SAS Data Sets
      1. Data Set Information
      2. Data Set Manipulation
      3. Combining Data Sets
         A. Concatenating/Appending
         B. Merging
  23. Data Set Information
      • PROC CONTENTS
        – Output contains a table of contents of the specified data set
        – Data set information
          • Data set name
          • Number of observations
          • Number of variables
        – Variable information
          • Type (numeric or character)
          • Length
        – Syntax:
            PROC CONTENTS DATA=input_data_set;
            RUN;
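For a concrete instance of the syntax above, the call below describes the SASHELP.CLASS sample data set (chosen here only because it ships with SAS).

```sas
/* Lists the data set name, number of observations and variables,
   and the type and length of each variable */
proc contents data=sashelp.class;
run;
```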
  24. Data Set Manipulation
      • Create a new SAS data set using an existing SAS data set as input
        – Specify the name of the new SAS data set in the DATA statement
        – Use the SET statement to identify the SAS data set being read
        – Syntax:
            DATA output_data_set;
              SET input_data_set;
              <additional SAS statements>;
            RUN;
        – By default, the SET statement reads all observations and variables from the input data set into the output data set.
  25. Data Set Manipulation
      • Assignment statements
        – Evaluate an expression
        – Assign the resulting value to a variable
        – General form: variable = expression;
        – Example: miles_per_hour = distance/time;
      • SAS functions
        – Perform arithmetic, compute simple statistics, manipulate dates, etc.
        – General form: variable = function_name(argument1, argument2, …);
        – Example: Time_worked = sum(Day1, Day2, Day3, Day4, Day5);
  26. Data Set Manipulation
      • Conditional processing
        – Uses IF-THEN-ELSE logic
        – General form:
            IF <expression1> THEN <statement>;
            ELSE IF <expression2> THEN <statement>;
            ELSE <statement>;
        – <expression> is a condition that evaluates to true or false, such as:
          • Day1=Day2, Day1 > Day2, Day1 < Day2
          • Day1+Day2=10
          • Sum(day1,day2)=10
          • Day1=5 and Day2=5
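The three data set manipulation slides (SET, assignment statements and functions, IF-THEN-ELSE) can be combined in one short DATA step. This is a sketch on the SASHELP.CLASS sample data; the new variable names and the age cut-offs are made up for illustration.

```sas
data work.class_metric;
    set sashelp.class;                       /* reads all observations and variables */

    /* Assignment statement using a SAS function (height is stored in inches) */
    height_cm = round(height * 2.54, 0.1);

    /* Conditional processing with IF-THEN-ELSE; cut-offs are illustrative */
    length age_group $ 8;
    if age < 13 then age_group = 'Child';
    else if age < 16 then age_group = 'Teen';
    else age_group = 'Older';
run;
```

Declaring age_group with a LENGTH statement first avoids truncation, since otherwise the length would be taken from the first value assigned.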
  27. Data Set Manipulation
      • Conditional processing

      Symbolic    Mnemonic          Example
      =           EQ                IF region = 'Spain';
      ~= or ^=    NE                IF region ne 'Spain';
      >           GT                IF rainfall > 20;
      <           LT                IF rainfall lt 20;
      >=          GE                IF rainfall ge 20;
      <=          LE                IF rainfall <= 20;
      &           AND               IF rainfall ge 20 & temp < 90;
      | or !      OR                IF rainfall ge 20 OR temp < 90;
                  IS NOT MISSING    WHERE region IS NOT MISSING;
                  BETWEEN-AND       WHERE region BETWEEN 'Plain' AND 'Spain';
                  CONTAINS          WHERE region CONTAINS 'ain';
                  IN                IF region IN ('Rain', 'Spain', 'Plain');

      Note: IS NOT MISSING, BETWEEN-AND, and CONTAINS are available only in WHERE expressions, so those examples use WHERE rather than IF.
  28. Data Set Manipulation
      • PROC SORT sorts data according to the specified variables
      • General form:
          PROC SORT DATA=input_data_set <options>;
            BY Variable1 Variable2;
          RUN;
      • Sorts data by Variable1 and then by Variable2
      • By default, SAS sorts data in ascending order
        – Numbers low to high
        – Letters A to Z
      • Put the DESCENDING keyword before a variable to sort numbers high to low and letters Z to A
        – BY City DESCENDING Population;
        – SAS sorts the data first by City, A to Z, and then by Population, high to low
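A small sketch of the ascending and descending sort described above, using the SASHELP.CLASS sample data. The OUT= option is added here so that the sorted copy is written to WORK instead of overwriting the input; the name class_sorted is illustrative.

```sas
/* Sort by sex (A to Z), then by age from high to low within each sex */
proc sort data=sashelp.class out=work.class_sorted;
    by sex descending age;
run;
```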
  29. Combining Data Sets
      • Merging data sets
        – One-to-one match merge
          • A single record in a data set corresponds to a single record in all other data sets
          • Example: patient and billing information
        – One-to-many match merge
          • Matches one observation from one data set to multiple observations in other data sets
          • Example: county and state information
        – Note: the data sets must be sorted by the matching variables before merging (PROC SORT)
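The slide gives no syntax for a match merge, so here is a minimal sketch using the MERGE and BY statements. The patients and billing data sets and their values are invented for illustration.

```sas
/* Hypothetical data: one row per patient in each data set */
data work.patients;
    input patient name $;
    datalines;
101 Ana
102 Luis
103 Rosa
;
run;

data work.billing;
    input patient amount;
    datalines;
101 250
103 75
;
run;

/* Both inputs must be sorted by the BY variable before merging */
proc sort data=work.patients; by patient; run;
proc sort data=work.billing;  by patient; run;

/* Match merge: rows with the same patient value are combined */
data work.patient_billing;
    merge work.patients work.billing;
    by patient;
run;
```

Patient 102 has no billing row, so amount is missing for that observation in the merged data set.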
  30. Combining Data Sets
      • Concatenating (or appending)
        – Stacks each data set upon the other
        – If one data set does not have a variable that the other data sets have, that variable is set to missing in the new data set for the observations from that data set.
        – General form:
            DATA output_data_set;
              SET data1 data2;
            RUN;
        – PROC APPEND may also be used
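Both approaches from this slide are sketched below on two invented data sets, q1 and q2. Only q2 has the region variable, which shows the missing-value behaviour described above.

```sas
/* Hypothetical inputs: q2 has one variable (region) that q1 lacks */
data work.q1;
    input id sales;
    datalines;
1 100
2 150
;
run;

data work.q2;
    input id sales region $;
    datalines;
3 200 North
;
run;

/* Concatenate: rows of q1 followed by rows of q2;
   region is missing for the rows that came from q1 */
data work.all_quarters;
    set work.q1 work.q2;
run;

/* PROC APPEND adds the rows of DATA= to the end of BASE=.
   FORCE is needed because q2 has a variable that q1 lacks;
   that variable is dropped from the appended rows. */
proc append base=work.q1 data=work.q2 force;
run;
```

The DATA step keeps every variable from both inputs, while PROC APPEND keeps the structure of the BASE= data set, so the two results differ when the inputs do not share the same variables.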
  31. Presentation Outline
  32. Summary Procedures
      1. Print Procedure
      2. Plot Procedure
      3. Univariate Procedure
      4. Means Procedure
      5. Freq Procedure
  33. Print Procedure
      • PROC PRINT is used to print data to the output window
      • By default, it prints all observations and variables in the SAS data set
      • General form:
          PROC PRINT DATA=input_data_set <options>;
            <optional SAS statements>;
          RUN;
      • Some options
        – input_data_set (OBS=n): specifies the number of observations to be printed in the output
        – NOOBS: suppresses printing of the observation number
        – LABEL: prints the labels instead of the variable names
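The options listed above can be combined as in this sketch on the SASHELP.CLASS sample data; the label text is made up for illustration.

```sas
/* Print the first 5 observations, without observation numbers,
   using labels as column headings where a label is defined */
proc print data=sashelp.class (obs=5) noobs label;
    var name age height;
    label name = 'Student name';
run;
```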
  34. Plot Procedure
      • Used to create basic scatter plots of the data
      • Use PROC GPLOT or PROC SGPLOT for more sophisticated plots
      • General form:
          PROC PLOT DATA=input_data_set;
            PLOT vertical_variable * horizontal_variable / <options>;
          RUN;
      • By default, SAS uses letters to mark points on plots
        – A for a single observation, B for two observations at the same point, etc.
      • To specify a different character to represent a point
        – PLOT vertical_variable * horizontal_variable = '*';
      • To specify a third variable to use to mark points
        – PLOT vertical_variable * horizontal_variable = third_variable;
      • To plot more than one variable on the vertical axis
        – PLOT vertical_variable1 * horizontal_variable = '2'
               vertical_variable2 * horizontal_variable = '1' / OVERLAY;
  35. Univariate Procedure
      • PROC UNIVARIATE is used to examine the distribution of data
      • Produces summary statistics for a single variable
        – Includes mean, median, mode, standard deviation, skewness, kurtosis, quantiles, etc.
      • General form:
          PROC UNIVARIATE DATA=input_data_set <options>;
            VAR variable1 variable2 variable3;
          RUN;
      • If the VAR statement is not used, summary statistics are produced for all numeric variables in the input data set.
      • Options include:
        – PLOT: produces a stem-and-leaf plot, box plot, and normal probability plot
        – NORMAL: produces tests of normality
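As an illustrative instance of the general form above, this call examines one variable of the SASHELP.CLASS sample data with both options from the slide.

```sas
/* Distribution of height, with normality tests and the basic plots */
proc univariate data=sashelp.class normal plot;
    var height;
run;
```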
  36. Means Procedure
      • Similar to the Univariate procedure
      • General form:
          PROC MEANS DATA=input_data_set <options>;
            <optional SAS statements>;
          RUN;
      • With no options or optional SAS statements, the Means procedure prints the number of non-missing values, mean, standard deviation, minimum, and maximum for all numeric variables in the input data set
      • Optional SAS statements
        – VAR Variable1 Variable2;
          • Specifies the numeric variables for which statistics are produced
        – BY Variable1 Variable2;
          • Calculates statistics for each combination of the BY variables
        – OUTPUT OUT=output_data_set;
          • Creates a data set with the default statistics
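The VAR, BY, and OUTPUT statements can be used together as sketched below on the SASHELP.CLASS sample data. The data is sorted first because BY processing expects sorted input; the output names class_stats, mean_height, and mean_weight are illustrative.

```sas
/* BY-group processing requires the input to be sorted by the BY variable */
proc sort data=sashelp.class out=work.class_by_sex;
    by sex;
run;

proc means data=work.class_by_sex n mean std min max;
    by sex;
    var height weight;
    /* One output row per BY group, holding the mean of each VAR variable */
    output out=work.class_stats mean=mean_height mean_weight;
run;
```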
  37. Means Procedure
      • Options
        – Statistics available:

      Keyword           Statistic                      Keyword     Statistic
      CLM               Two-sided confidence limits    RANGE       Range
      CSS               Corrected sum of squares       SKEWNESS    Skewness
      CV                Coefficient of variation       STDDEV      Standard deviation
      KURTOSIS          Kurtosis                       STDERR      Standard error of the mean
      LCLM              Lower confidence limit         SUM         Sum
      MAX               Maximum value                  SUMWGT      Sum of weight variables
      MEAN              Mean                           UCLM        Upper confidence limit
      MIN               Minimum value                  USS         Uncorrected sum of squares
      N                 Number of non-missing values   VAR         Variance
      NMISS             Number of missing values       PROBT       Probability for Student's t
      MEDIAN (or P50)   Median                         T           Student's t
      Q1 (P25)          25% quantile                   Q3 (P75)    75% quantile
      P1                1% quantile                    P5          5% quantile
      P10               10% quantile                   P90         90% quantile
      P95               95% quantile                   P99         99% quantile

        – Note: the default confidence level for confidence limits is 95% (ALPHA=0.05). Use the ALPHA= option to specify a different level.
  38. Freq Procedure
      • PROC FREQ is used to generate frequency tables
      • Its most common use is to create a table showing the distribution of categorical variables
      • General form:
          PROC FREQ DATA=input_data_set;
            TABLE variable1*variable2*variable3 / <options>;
          RUN;
      • Options
        – LIST: prints cross tabulations in list format rather than as a grid
        – MISSING: specifies that missing values should be included in the tabulations
        – OUT=output_data_set: creates a data set containing the frequencies, in list format
        – NOPRINT: suppresses printing in the output window
      • Use a BY statement to get percentages within each category of a variable
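A short sketch of a two-way frequency table with the LIST, MISSING, and OUT= options, again on the SASHELP.CLASS sample data; the output data set name is illustrative.

```sas
/* Cross tabulation of sex by age in list format,
   with the counts also saved to a data set */
proc freq data=sashelp.class;
    tables sex*age / list missing out=work.sex_age_freq;
run;
```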
  39. Presentation Outline
  40. Introduction: What is PROC SQL?
      • PROC SQL is the SAS implementation of SQL
      • PROC SQL is a powerful SAS procedure that combines the functionality of the SAS DATA step with the SQL language
      • PROC SQL can sort, subset, merge, and summarize data, all at once
      • PROC SQL can combine standard SQL functions with virtually all SAS functions
      • PROC SQL can work remotely with an RDBMS such as Oracle
  41. PROC SQL: What can it do?
      – Perform a query, using the SELECT statement
      – Save the query result into a SAS data set, using the CREATE TABLE statement
      – Save the query itself, using the CREATE VIEW statement
      – Sort a data set
      – Merge two or more data sets in a number of ways
      – Import a data set from Oracle Clinical into SAS
      – Enter new records into a SAS data set
      – Modify or edit a SAS data set
  42. PROC SQL: Why?
      • The advantages of using SQL
        – Combined functionality
        – Faster for smaller tables
        – SQL code is more portable to non-SAS applications
        – Does not require presorting
        – Does not require common variable names to join on (the columns need the same type and length)
  43. Performing a Query: SELECT Statement
      • The SELECT statement is used to perform a query. It does not create a data set.
      • The simplest SQL code needs three statements
      • By default, the query result is printed; use the NOPRINT option to suppress this
      • Begin with PROC SQL and end with QUIT, not RUN
      • At least one SELECT … FROM statement is needed
  44. Performing a Query: SELECT Statement
        PROC SQL;
          SELECT *
          FROM VITALS;
        QUIT;
      To select all the variables, use '*' after SELECT.
  45. Performing a Query: SELECT Statement
        PROC SQL;
          SELECT Patient, pulse
          FROM VITALS;
        QUIT;
      To select only particular variables, write the variable names after SELECT. Variable names must be separated by commas.
  46. Performing a Query: SELECT Statement
        PROC SQL;
          SELECT DISTINCT Patient
          FROM VITALS;
        QUIT;
      DISTINCT selects only distinct observations and removes duplicates.
  47. Ordering/Sorting Query Results
        PROC SQL;
          SELECT *
          FROM Vitals
          ORDER BY date;
        QUIT;
      • SELECT * means all variables are selected from the data set VITALS
      • Put ORDER BY after FROM
      • This example sorts by date
  48. Subsetting: Character Searching in WHERE
        PROC SQL;
          SELECT *
          FROM vitals
          WHERE Name CONTAINS 'J';
        QUIT;
      • Always put WHERE after FROM
      • CONTAINS in a WHERE clause works only for character variables
      • This example prints the observations whose name contains 'J'
  49. Subsetting: Character Searching in WHERE
        PROC SQL;
          SELECT *
          FROM vitals
          WHERE Name LIKE '%o%';
        QUIT;
      • LIKE in a WHERE clause works only for character variables
      • This example prints the observations whose name contains 'o'
  50. Creating New Data
      • With SELECT alone, the results of a query are converted to an output object (printed).
      • Query results can also be stored as data.
      • The CREATE TABLE statement creates a table with the results of a query.
      • The CREATE VIEW statement stores the query itself as a view. Either way, the data identified in the query can be used in later SQL statements or in other SAS steps.
  51. Creating New Data: Create Table
      The CREATE TABLE … AS statement creates a new table from an existing table.
        PROC SQL;
          CREATE TABLE bp AS
          SELECT patient, date, pulse
          FROM Vitals
          WHERE temp > 98.5;
        QUIT;
      The following statements copy all the variables to the new data set:
        PROC SQL;
          CREATE TABLE bp AS
          SELECT *
          FROM Vitals
          WHERE temp > 98.5;
        QUIT;
  52. Creating New Data: Create Table
      A different variable name, label, length, and format can also be assigned:
        PROC SQL;
          CREATE TABLE bp AS
          SELECT patient AS Patient LABEL='Subject number' LENGTH=5,
                 date AS Date LABEL='Date of Expt' FORMAT=WORDDATE8.,
                 pulse,
                 temp
          FROM Vitals
          WHERE temp > 98.5;
        QUIT;
  53. Creating New Data: Create View
        PROC SQL;
          CREATE VIEW bp AS
          SELECT patient, date, pulse, temp
          FROM Vitals
          WHERE temp > 98.5;
        QUIT;
      • Creating a view is only the first step; no output is produced.
      • When a table is created, the query is executed and the resulting data is stored in a file. When a view is created, the query itself is stored in the file. The data is not accessed at all in the process of creating a view.
  54. CASE Logic: Reassigning/Recategorizing
      • The order of the clauses is important
      • CASE … END AS must be placed between SELECT and FROM
      • Use WHEN … THEN … ELSE … to redefine variables
      • The new variable GENDER is created from the source variable PATIENT
        PROC SQL;
          CREATE TABLE BP AS
          SELECT Patient, Pulse,
                 CASE Patient
                   WHEN 101 THEN 'Male'
                   WHEN 102 THEN 'Female'
                   WHEN 103 THEN 'Female'
                   ELSE 'Male'
                 END AS Gender
          FROM Vitals;
        QUIT;
  55. Combining Datasets: Joins
      Join type     Equivalent DATA step MERGE condition
      Full join     If a or b;
      Inner join    If a and b;
      Left join     If a;
      Right join    If b;
  56. Combining Datasets: Joins (data set: Dosing)
  57. Combining Datasets: Joins (data set: Vitals)
  58. Join Tables (Merge Datasets): Inner Join Using WHERE
      • No prior sorting is required, which is one advantage over a DATA step MERGE
      • Use a comma (,) to separate the two data sets in FROM
      • Without WHERE, all possible combinations of rows from the two tables are produced, and all columns are included
        PROC SQL;
          CREATE TABLE new AS
          SELECT dosing.patient, dosing.date, dosing.med, vitals.pulse, vitals.temp
          FROM dosing, vitals
          WHERE dosing.patient = vitals.patient
            AND dosing.date = vitals.date;
        QUIT;
  59. Join Tables (Merge Datasets): Inner Join
  60. Join Tables (Merge Datasets): Left Join Using ON
      The resulting data set contains all observations from the DOSING data set, and only those.
        PROC SQL;
          CREATE TABLE new1 AS
          SELECT dosing.patient, dosing.date, dosing.med, vitals.pulse, vitals.temp
          FROM dosing LEFT JOIN vitals
            ON dosing.patient = vitals.patient
           AND dosing.date = vitals.date;
        QUIT;
  61. Join Tables (Merge Datasets): Left Join Using ON
  62. Join Tables (Merge Datasets): Right Join Using ON
      The resulting data set contains all observations from the VITALS data set, and only those.
        PROC SQL;
          CREATE TABLE new1 AS
          SELECT dosing.patient, dosing.date, dosing.med, vitals.pulse, vitals.temp
          FROM dosing RIGHT JOIN vitals
            ON dosing.patient = vitals.patient
           AND dosing.date = vitals.date;
        QUIT;
  63. Join Tables (Merge Datasets): Right Join Using ON
  64. Join Tables (Merge Datasets): Full Join Using ON
      The resulting data set contains every observation that comes from at least one of the data sets.
        PROC SQL;
          CREATE TABLE new1 AS
          SELECT dosing.patient, dosing.date, dosing.med, vitals.pulse, vitals.temp
          FROM dosing FULL JOIN vitals
            ON dosing.patient = vitals.patient
           AND dosing.date = vitals.date;
        QUIT;
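One point worth illustrating for the right and full joins: rows that exist only in VITALS have missing values in dosing.patient and dosing.date. A common way to handle this, not shown in the slides, is the COALESCE function, which returns the first non-missing argument. The sketch below uses small invented stand-ins for the DOSING and VITALS data sets, since the originals are only shown as images.

```sas
/* Hypothetical stand-ins for the DOSING and VITALS data sets */
data work.dosing;
    input patient date :date9. med $;
    format date date9.;
    datalines;
101 01JAN2013 A
102 01JAN2013 B
;
run;

data work.vitals;
    input patient date :date9. pulse temp;
    format date date9.;
    datalines;
101 01JAN2013 72 98.6
103 02JAN2013 80 99.1
;
run;

proc sql;
    create table work.full_joined as
    /* COALESCE fills the key columns from whichever table has the row */
    select coalesce(d.patient, v.patient) as patient,
           coalesce(d.date, v.date)       as date format=date9.,
           d.med,
           v.pulse,
           v.temp
    from work.dosing as d
         full join work.vitals as v
         on d.patient = v.patient
        and d.date = v.date;
quit;
```

With these inputs the result has three rows (patients 101, 102, and 103), and patient and date are filled in on every row.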
  65. Join Tables (Merge Datasets): Full Join Using ON
  66. SQL Functions
      • PROC SQL supports almost all the functions available in the SAS DATA step; they can be used in a PROC SQL SELECT statement
      • Common functions:
        COUNT, DISTINCT, MAX, MIN, SUM, AVG, VAR, STD, STDERR, NMISS, RANGE, SUBSTR, LENGTH, UPPER, LOWER, CONCAT, ROUND, MOD
  67. PROC SQL Functions
        PROC SQL;
          SELECT avg(Age) AS mean,
                 std(Age) AS sd,
                 min(Age) AS min,
                 max(Age) AS max,
                 count(Age) AS count,
                 N(Age) AS n
          FROM sashelp.class;
        QUIT;
  68. PROC SQL Functions
        PROC SQL;
          SELECT sex,
                 avg(Age) AS mean,
                 std(Age) AS sd,
                 min(Age) AS min,
                 max(Age) AS max,
                 count(Age) AS count,
                 N(Age) AS n
          FROM sashelp.class
          GROUP BY Sex;
        QUIT;
  69. Editing Data: Deleting Rows and Dropping Columns
        /* Deleting rows */
        PROC SQL;
          DELETE FROM class
          WHERE age le 13;
        QUIT;

        /* Dropping variables */
        PROC SQL;
          CREATE TABLE New (DROP=age) AS
          SELECT *
          FROM Class;
        QUIT;
      • Columns can be dropped either by listing only the wanted columns in SELECT or by using the DROP= option on the created table
  70. Importing data from OC to SAS
  71. Importing data from OC to SAS
  72. Presentation Outline
  73. Learning SAS
  74. Learning SAS
  75. Learning SAS
  76. Learning SAS
  77. Learning SAS
  78. Learning SAS
  79. Learning SAS
  80. Learning SAS
  81. Learning SAS
  82. Presentation Outline
  83. SAS Global Certification Program
  84. SAS Global Certification Program
  85. Presentation Outline
  86. Questions and comments
  87. Thank you very much!
      Luis Barragán Scavino
      Jorge Rodríguez Mamani
      Calle Alcanfores 1255, Miraflores, Lima 18, Perú
      +51 99 417 6340
      luis.barragan@bigdata.pe
      jorge.rodriguez@bigdata.pe
