
Improving Efficiency with Options in SAS


Published on

Base SAS,
Advanced SAS,
Proc SQL,
SAS in financial industry,
Clinical trials,
SAS Macros,
SAS on Unix,
SAS on Mainframe,
SAS interview Questions and Answers,
SAS Tips and Techniques,
SAS Resources,
SAS Certification questions...


Published in: Technology

Improving Efficiency with Options in SAS

  1. 1. SAS Techies [email_address]
  2. 2. <ul><li>When you store your data in a SAS data file, the total storage space used is the sum of the space required for the following: </li></ul><ul><ul><li>the descriptor portion </li></ul></ul><ul><ul><li>the observations </li></ul></ul><ul><ul><li>any storage overhead </li></ul></ul><ul><ul><li>any associated indexes. </li></ul></ul><ul><li>Techniques for reducing storage space: </li></ul><ul><ul><li>Eliminate wasted space </li></ul></ul><ul><ul><li>Compress data sets </li></ul></ul><ul><ul><li>Use views </li></ul></ul>11/13/09 SAS Techies 2009
  3. 3. <ul><li>LENGTH variable(s) $ length ; </li></ul><ul><ul><li>where </li></ul></ul><ul><ul><li>variable(s) specifies the name of one or more SAS variables, separated by spaces. </li></ul></ul><ul><ul><li>length is an integer from 1 to 32,767 that specifies the length, in bytes, of the variable(s). </li></ul></ul><ul><li>SAS character variables store data as 1 character per byte. A SAS character variable can be from 1 to 32,767 bytes in length. </li></ul><ul><li>SAS assigns a default length of 8 bytes to a character variable, for example when its values are read with list input and no LENGTH statement is given. </li></ul><ul><li>One way to reduce the amount of data storage space that you need is to reduce the length of character variables, thereby eliminating wasted space. For example, instead of storing a complete name in the data set, you could store a short code or abbreviation. </li></ul>
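As a minimal sketch of the LENGTH statement for character variables, the step below stores a two-letter state code and a one-letter gender code instead of longer text. The data set name work.customers, its variables, and the sample values are hypothetical and chosen only for illustration.

```sas
/* Hypothetical example: store short codes instead of full names. */
data work.customers;
   /* LENGTH must appear before the first reference to the variables. */
   length State $ 2 Gender $ 1;
   input State $ Gender $;
   datalines;
NC F
CA M
;
run;

/* The Length column of the variable list shows the stored lengths. */
proc contents data=work.customers;
run;
```

Without the LENGTH statement, list input would give both variables the default length of 8 bytes, so each observation would carry 13 bytes of blank padding.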
  4. 4. <ul><li>The default length for a numeric variable is 8 bytes. </li></ul><ul><li>SAS stores all numeric values using double-precision floating-point representation, which stores multiple digits per byte. A SAS numeric variable can be from 2 to 8 bytes or from 3 to 8 bytes in length, depending on your operating environment. </li></ul><ul><li>LENGTH variable(s) length <DEFAULT= n >; </li></ul><ul><ul><li>where </li></ul></ul><ul><ul><li>length is the number of bytes used to store the listed numeric variable(s). </li></ul></ul><ul><ul><li>the optional DEFAULT= n argument changes the default number of bytes that SAS uses to store the values of any newly created numeric variables. If you use the DEFAULT= argument, you do not need to list any variable(s) . </li></ul></ul>
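The following is a sketch of both forms of the numeric LENGTH statement, under the assumption that the variables hold only small integers. Shortening a numeric variable is generally safe only for integers small enough to be represented exactly in the chosen length; fractional or very large values can lose precision. The data set and variable names are hypothetical.

```sas
/* Hypothetical example: Age and Answered hold small integers,
   so 3 bytes each is assumed to be enough. */
data work.survey;
   length Age 3 Answered 3;
   input Age Answered;
   datalines;
34 1
58 0
;
run;

/* DEFAULT= changes the length used for numeric variables
   that are newly created in this step. */
data work.counts;
   length default=4;
   set work.survey;            /* existing variables keep their stored lengths */
   Total = Age + Answered;     /* new variable, expected to be stored in 4 bytes */
run;

proc contents data=work.counts;
run;
```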
  5. 5. <ul><li>Compressing a data file is a process that reduces the number of bytes that are required in order to represent each observation in a data file. </li></ul><ul><li>Reading from or writing to a compressed file during data processing requires fewer I/O operations because there are fewer data set pages in a compressed data file. </li></ul><ul><li>However, in order to read a compressed file, each observation must be uncompressed. This requires more CPU resources than reading an uncompressed file. </li></ul><ul><li>Also, in some cases, compressing a file might actually increase its size rather than decreasing it. </li></ul>
  6. 6. <ul><li>By default, a SAS data file is not compressed. In uncompressed data files, </li></ul><ul><ul><li>each data value of a particular variable occupies the same number of bytes as any other data value of that variable. </li></ul></ul><ul><ul><li>character values are padded with blanks. </li></ul></ul><ul><ul><li>numeric values are padded with binary zeros. </li></ul></ul><ul><ul><li>there is a 16-byte overhead at the beginning of each page. </li></ul></ul><ul><ul><li>there is a 1-bit-per-observation overhead (rounded up to the nearest byte) at the end of each page; this bit denotes an observation's status as deleted or not deleted. </li></ul></ul><ul><ul><li>new observations are added at the end of the file. If a new observation won't fit on the current last page of the file, a whole new data set page is added. </li></ul></ul><ul><ul><li>the descriptor portion of the data file is stored at the end of the first page of the file. </li></ul></ul><ul><li>Compressed data files </li></ul><ul><ul><li>treat an observation as a single string of bytes by ignoring variable types and boundaries. </li></ul></ul><ul><ul><li>collapse consecutive repeating characters and numbers into fewer bytes. </li></ul></ul><ul><ul><li>contain a 28-byte overhead at the beginning of each page. </li></ul></ul><ul><ul><li>contain a 12-byte or 24-byte per-observation overhead following the page overhead. This space is used for deletion status, compressed length, pointers, and flags. </li></ul></ul>
  7. 7. <ul><li>A data file is not a good candidate for compression if it has </li></ul><ul><ul><li>few repeated characters </li></ul></ul><ul><ul><li>small physical size </li></ul></ul><ul><ul><li>few missing values </li></ul></ul><ul><ul><li>short text strings. </li></ul></ul><ul><li>Compression can be beneficial when the data file has one or more of the following properties: </li></ul><ul><ul><li>It is large. </li></ul></ul><ul><ul><li>It contains many long character values. </li></ul></ul><ul><ul><li>It contains many values that have repeated characters or binary zeros. </li></ul></ul><ul><ul><li>It contains many missing values. </li></ul></ul><ul><ul><li>It contains repeated values in variables that are physically stored next to one another. </li></ul></ul>
  8. 8. <ul><ul><li>To compress a data file, you use either the COMPRESS= data set option or the COMPRESS= system option. </li></ul></ul><ul><ul><li>You use the COMPRESS= system option to compress all data files that you create during a SAS session. </li></ul></ul><ul><ul><li>You use the COMPRESS= data set option to compress an individual data file. </li></ul></ul><ul><ul><li>CHAR or YES uses the Run Length Encoding (RLE) compression algorithm, which compresses repeating consecutive bytes such as trailing blanks or repeated zeros. </li></ul></ul><ul><ul><li>BINARY uses Ross Data Compression (RDC), which combines run-length encoding and sliding-window compression. </li></ul></ul>
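The two ways of requesting compression can be sketched as follows. The data set names and the generated sample data are hypothetical; the long, mostly blank character variable is chosen on the assumption that it gives RLE many trailing blanks to collapse.

```sas
/* Hypothetical sample data: a 200-byte character variable
   that holds only a short value, so most bytes are blanks. */
data work.notes_raw;
   length Comment $ 200;
   do i = 1 to 10000;
      Comment = 'short text';
      output;
   end;
   drop i;
run;

/* Data set option: compress only this output data file. */
data work.notes_char (compress=char);
   set work.notes_raw;
run;

/* System option: compress every data file created from here on. */
options compress=binary;

data work.notes_binary;
   set work.notes_raw;
run;

/* Restore the default so that later steps are not affected. */
options compress=no;
```

After each compressed data file is created, the SAS log reports by what percentage compression decreased or increased its size, which is a quick way to check whether a given file was a good candidate.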
  9. 9. <ul><ul><li>Another way to save disk space is to leave your data in its original location and use a SAS data view to access it. </li></ul></ul><ul><ul><li>A SAS data file and a SAS data view are both types of SAS data sets. The first type, a SAS data file, contains both descriptor information about the data and the data values. The second type, a SAS data view, contains only descriptor information about the data and instructions on how to retrieve data values that are stored elsewhere. </li></ul></ul>
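As an illustration, a DATA step view can be defined as sketched below. The view name work.teens and the filter are hypothetical; sashelp.class is the sample data set that ships with SAS and stands in for data that should stay in its original location.

```sas
/* Define a DATA step view: only the instructions are stored,
   not a second copy of the data values. */
data work.teens / view=work.teens;
   set sashelp.class;
   where Age >= 13;
run;

/* The view is executed, and the data retrieved, only when it is used. */
proc print data=work.teens;
run;
```

The trade-off is that the view's instructions run every time the view is referenced, so a view saves disk space at the cost of repeated processing.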
  11. 11. <ul><li>You can use options and the SASFILE statement to control the size and number of data buffers, which in turn can affect your programs' execution times by reducing the number of I/O operations that SAS must perform. </li></ul><ul><li>When you create a SAS data set using a DATA step, </li></ul><ul><ul><li>SAS copies the data from the input data set to a buffer in memory </li></ul></ul><ul><ul><li>one observation at a time is loaded into the program data vector </li></ul></ul><ul><ul><li>each observation is written to an output buffer when processing is complete </li></ul></ul><ul><ul><li>the contents of the output buffer are written to the disk when the buffer is full. </li></ul></ul>
  12. 12. <ul><li>options bufsize=30720 bufno=10; </li></ul><ul><li>filename orders 'c:orders.dat'; </li></ul><ul><li>data company.orders_fact; </li></ul><ul><li>infile orders; </li></ul><ul><li><more SAS code> </li></ul><ul><li>run; </li></ul><ul><ul><li>You can use the BUFSIZE= system or data set option to control the page (buffer) size of an output SAS data set. Choosing a page size that is larger than the default can speed up execution time by reducing the number of times that SAS must read from or write to the storage medium. </li></ul></ul><ul><ul><li>You can use the BUFNO= system or data set option to control the number of buffers that are available for reading or writing a SAS data set. By increasing the number of buffers, you can control how many pages of data are loaded into memory with each I/O transfer. </li></ul></ul><ul><ul><li>The product of BUFNO= and BUFSIZE=, rather than the specific value of either option, determines how much data can be transferred in one I/O operation. Increasing the value of either option increases the amount of data that can be transferred in one I/O operation. </li></ul></ul>
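The same settings can also be applied to a single data set rather than to the whole session. The sketch below uses the data set option forms; the output name work.class_copy is hypothetical and sashelp.class is used only as a convenient input. With BUFSIZE=30720 and BUFNO=10, the product is 30,720 x 10 = 307,200 bytes that can be transferred in one I/O operation.

```sas
/* Data set options: apply the page size and buffer count
   to this output data set only. */
data work.class_copy (bufsize=30720 bufno=10);
   set sashelp.class;
run;

/* PROC CONTENTS reports the page size that was actually used,
   which may differ if the operating environment adjusts the request. */
proc contents data=work.class_copy;
run;
```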
  13. 13. <ul><li>sasfile company.sales load; </li></ul><ul><li>proc print data=company.sales; </li></ul><ul><li>var Customer_Age_Group; </li></ul><ul><li>run; </li></ul><ul><li>proc tabulate data=company.sales; </li></ul><ul><li>class Customer_Age_Group; </li></ul><ul><li>var Customer_BirthDate; </li></ul><ul><li>table Customer_Age_Group,Customer_BirthDate*(mean median); </li></ul><ul><li>run; </li></ul><ul><li>sasfile company.sales close; </li></ul><ul><li>Another way of improving performance is to use the SASFILE statement to hold a SAS data file in memory so that the data is available to multiple program steps. Keeping the data file open reduces open/close operations, including the allocation and freeing of memory for buffers. </li></ul><ul><li>It is important to note that I/O processing is reduced only if there is sufficient real memory. If there is not sufficient real memory, the operating environment might </li></ul><ul><ul><li>use virtual memory </li></ul></ul><ul><ul><li>use the default number of buffers. </li></ul></ul><ul><li>If SAS uses virtual memory, there might be a degradation in performance. </li></ul>
  17. 17. <ul><li>Before you test the programming techniques, turn on the SAS system options that report resource usage. </li></ul><ul><li>Execute the code for each programming technique in a separate SAS session. </li></ul><ul><li>In each programming technique that you are testing, include only the SAS code that is essential for performing the task. </li></ul><ul><li>Run your benchmarking tests under the conditions in which your final program will run. </li></ul><ul><li>After testing is finished, consider turning off the options that report resource usage. </li></ul>
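A minimal sketch of these benchmarking guidelines, assuming the FULLSTIMER system option is the resource-reporting option of interest. The test step itself is a hypothetical placeholder; in practice it would contain only the code for the one technique being measured.

```sas
/* Turn on detailed resource reporting in the SAS log. */
options fullstimer;

/* Hypothetical technique under test: keep only the essential code. */
data work.test_technique;
   set sashelp.class;
run;

/* Turn detailed reporting off again when testing is finished. */
options nofullstimer;
```

With FULLSTIMER in effect, the log note for each step lists resource statistics such as real time, CPU time, and memory, which can then be compared across separate SAS sessions, one per technique.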