Compressing Output Files in a SAS® job, on a UNIX platform
1.
Compressing Output Files in a SAS® job, on a UNIX platform
Leslie J. Somos
Great Works Informatics, LLC
© 2008 by Great Works Informatics, LLC
2.
Original situation
[Diagram: on UNIX, SAS writes a file; the file goes by FTP to MS Windows and is then emailed.]
Done for multiple clients. No problem, until the file is sometimes bigger for clients with more data. As the file gets bigger, the FTP from UNIX to MS Windows takes longer, and the email administrators get unhappy.
3.
Original situation
[Diagram: on UNIX, SAS writes a big file; the big file goes by FTP to MS Windows, is zipped there, and the zipped file is emailed.]
We (manually) zip the file so as not to annoy the email administrators. Everything is fine until we hit a wall with one particular request for a multi-year extract, and run out of UNIX space to create the file. (Two million records, fixed length, record length 2000 bytes: the resulting file would have been ~4 GBytes, if there had been enough UNIX space.)
In each case, after the file is created, we don't do anything further with it on UNIX, so it would not be a problem if we could somehow get SAS to write out an already-zipped file.
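The size estimate above can be checked with a little shell arithmetic. This is only a sketch: the variable names are made up for illustration, and it assumes 1 GByte means 10^9 bytes and ignores any record terminators.

```shell
# Figures from the slide: two million fixed-length records of 2000 bytes each.
records=2000000
record_length=2000

# Total uncompressed size in bytes.
total_bytes=$((records * record_length))

# Whole gigabytes, taking 1 GB as 10^9 bytes (an assumption about the unit).
total_gb=$((total_bytes / 1000000000))

echo "uncompressed size: ${total_bytes} bytes (~${total_gb} GB)"
```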
4.
Goal
[Diagram: on UNIX, SAS writes a zipped file; the zipped file goes by FTP to MS Windows and is then emailed.]
So this picture shows what we want to happen: have SAS write an already-compressed file, which takes up less disk space and also takes less time to FTP from UNIX over to MS Windows. (4 GBytes produces a zipped file of ~74 MBytes. FTP from UNIX to MS Windows takes ~22 minutes.)
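As a rough illustration of what those figures imply, the ratio between the two sizes can be worked out in the shell. The variable names are hypothetical, and the sketch assumes 1 GByte is 1000 MBytes; the sizes themselves are the approximate ones quoted above.

```shell
# Approximate sizes quoted on the slide.
original_mb=4000   # ~4 GBytes, taking 1 GB as 1000 MB (assumption)
zipped_mb=74       # ~74 MBytes

# Integer ratio of uncompressed to compressed size.
ratio=$((original_mb / zipped_mb))

echo "the zipped file is roughly 1/${ratio} of the original size"
```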
5.
Items we will touch on
• 'pipe' access method of FILENAME statement
• 'nobs=' and 'point=' options of SET statement
• compilation phase v. execution phase of DATA Step
6.
Coding Conventions
• SAS keywords in lowercase
• User-chosen words in uppercase
• Spaces supplied even where not required by the syntax
• Optional period after a macro variable always supplied
7.
Original code
    filename OUT 'BIG.TXT' lrecl=32760 ;   /* ordinary file */

    data _null_ ;
      set OURDATA ;
      file OUT ;
      put <field>…<field> ;
    run ;
8.
Modified code
    filename OUT pipe 'compress > NOTSOBIG.Z' lrecl=32760 ;   /* "unnamed pipe" */

    data _null_ ;
      set OURDATA ;
      file OUT ;
      put <field>…<field> ;
    run ;

Program compress reads its STDIN, receiving the data written to filename OUT, and writes to its STDOUT, which is redirected to NOTSOBIG.Z.
Modify the "filename" statement to use an 'unnamed pipe' to write the data to the "compress" program, which then writes a compressed version to disk. The full-size file only ever exists on-the-fly, as it flows through the OUT fileref. Only the compressed version of the file ever exists on disk, so peak disk space usage is less.
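The same data flow can be tried outside SAS with a plain shell pipeline. This is a sketch under stated assumptions: gzip stands in for compress (which may not be installed everywhere), and the function write_records is a made-up stand-in for the data step.

```shell
workdir=$(mktemp -d)

# Hypothetical stand-in for the data step: write 1000 records to standard output.
write_records() {
    i=1
    while [ "$i" -le 1000 ]; do
        echo "record $i with some padding text"
        i=$((i + 1))
    done
}

# The records flow through the pipe; only the compressed file reaches the disk.
# gzip is used here in place of compress (assumption).
write_records | gzip > "$workdir/NOTSOBIG.gz"

# The directory holds the compressed file and nothing else.
ls "$workdir"

# Count the records that come back out when the file is decompressed.
line_count=$(gzip -dc < "$workdir/NOTSOBIG.gz" | wc -l | tr -d ' ')
echo "records recovered: $line_count"
```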
9.
Original code :: Modified code
Original code:

    filename OUT 'BIG.TXT' lrecl=32760 ;   /* ordinary file */

    data _null_ ;
      set OURDATA ;
      file OUT ;
      put <field>…<field> ;
    run ;

Modified code:

    filename OUT pipe 'compress > NOTSOBIG.Z' lrecl=32760 ;   /* "unnamed pipe" */

    data _null_ ;
      set OURDATA ;
      file OUT ;
      put <field>…<field> ;
    run ;

No changes to data step, only to filename. When you uncompress, supply name BIG.TXT.
A simple modification: No changes were necessary to the data step, only to the filename statement. Since the "compress" command only received a bytestream, it couldn't and didn't record any file name within the compressed file. When the file is subsequently uncompressed, the program will prompt for a file name to be supplied.
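On the UNIX side, one way to supply the name yourself is to decompress to standard output and redirect it. A minimal sketch, assuming gzip as a stand-in for compress; the two sample lines are made up.

```shell
workdir=$(mktemp -d)

# Compress a byte stream arriving on standard input (gzip in place of compress, an assumption).
printf 'line one\nline two\n' | gzip > "$workdir/NOTSOBIG.gz"

# Decompress to standard output and choose the output name with a redirection.
gzip -dc < "$workdir/NOTSOBIG.gz" > "$workdir/BIG.TXT"

restored=$(cat "$workdir/BIG.TXT")
echo "$restored"
```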
10.
Original code
    filename OUT 'BIG.TXT' lrecl=32760 ;

    data _null_ ;
      set ONEDATA ;
      file OUT ;
      put <field>…<field> ;
    run ;

    data _null_ ;
      set TWODATA ;
      file OUT mod ;
      put <field>…<field> ;
    run ;

Slightly more complex code, closer to the actual original code: multiple data steps write to the same output file. (The original code had six data steps; two are sufficient to demonstrate the issues we will discuss.) Each data step after the first one appends to the output file, by using the "mod" option of the file statement.
11.
Original code :: Modified code
Original code:

    filename OUT 'BIG.TXT' lrecl=32760 ;

    data _null_ ;
      set ONEDATA ;
      file OUT ;
      put <field>…<field> ;
    run ;

    data _null_ ;
      set TWODATA ;
      file OUT mod ;
      put <field>…<field> ;
    run ;

Modified code:

    filename OUT pipe 'compress > NOTSOBIG.Z' lrecl=32760 ;

    data _null_ ;
      set ONEDATA ;
      file OUT ;
      put <field>…<field> ;
    run ;

    data _null_ ;
      set TWODATA ;
      file OUT mod ;
      put <field>…<field> ;
    run ;

No changes to data steps: problem! Program compress gets called a 2nd time, and overwrites NOTSOBIG.Z. Only TWODATA is in the resulting file.
If we follow the pattern we know, we modify just the "filename" statement. Each data step starts the "compress" program anew, and overwrites the previous content of the output file NOTSOBIG.Z. The "mod" option of the "file" statement has no effect here, when it refers to an unnamed pipe.
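The overwrite can be reproduced without SAS, since it comes from the shell redirection inside the pipe command. A small sketch, assuming gzip as a stand-in for compress; the echoed strings are placeholders for the two data sets' records.

```shell
workdir=$(mktemp -d)
out="$workdir/NOTSOBIG.gz"

# First invocation of the compressor (gzip in place of compress, an assumption).
echo "rows from ONEDATA" | gzip > "$out"

# Second invocation: '>' truncates the file before the new output is written.
echo "rows from TWODATA" | gzip > "$out"

# Only the second invocation's data survives.
contents=$(gzip -dc < "$out")
echo "$contents"
```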
12.
Original code :: Modified code
Original code:

    filename OUT 'BIG.TXT' lrecl=32760 ;

    data _null_ ;
      set ONEDATA ;
      file OUT ;
      put <field>…<field> ;
    run ;

    data _null_ ;
      set TWODATA ;
      file OUT mod ;
      put <field>…<field> ;
    run ;

Modified code:

    filename OUT pipe 'compress > NOTSOBIG.Z' lrecl=32760 ;

    data _null_ ;
      file OUT ;
      set ONEDATA nobs=CONSTNOBS1 ;
      do POINTVAR = 1 to CONSTNOBS1 ;
        set ONEDATA point=POINTVAR ;
        put <field>…<field> ;
      end ;
      set TWODATA nobs=CONSTNOBS2 ;
      do POINTVAR = 1 to CONSTNOBS2 ;
        set TWODATA point=POINTVAR ;
        put <field>…<field> ;
      end ;
      stop ; /*<=REMEMBER!*/
    run ;

Only one data step, one call to compress. Problem if ONEDATA has zero observations.
Solution attempt: Combine the multiple data steps into one single data step, so there is only one invocation of the "compress" program. [When you take over from the normal SAS data cycle, you also have to include logic to stop execution.]
Note: the single variable "POINTVAR" is used in multiple loops with no problem. The two "nobs=" variables in the two separate "set" statements must be different from each other, else one interferes with the other at compile time.
Works in most cases, but: If any data set before the last has zero observations, execution stops when its "set" statement is executed, and the output file is incomplete.
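The idea of a single invocation has a direct shell analogue: group everything that produces output and pipe the group into one compressor process. A sketch only, with gzip assumed in place of compress and echoed placeholder rows in place of the two data sets.

```shell
workdir=$(mktemp -d)

# Both producers write into the same pipe, so the compressor starts only once.
# gzip is used here in place of compress (assumption).
{
    echo "ONEDATA row 1"
    echo "TWODATA row 1"
} | gzip > "$workdir/NOTSOBIG.gz"

# Both sets of rows are present in the single compressed file.
combined=$(gzip -dc < "$workdir/NOTSOBIG.gz")
echo "$combined"
```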
13.
Original code :: Modified code
Original code:

    filename OUT 'BIG.TXT' lrecl=32760 ;

    data _null_ ;
      set ONEDATA ;
      file OUT ;
      put <field>…<field> ;
    run ;

    data _null_ ;
      set TWODATA ;
      file OUT mod ;
      put <field>…<field> ;
    run ;

Modified code:

    filename OUT pipe 'compress > NOTSOBIG.Z' lrecl=32760 ;

    data _null_ ;
      file OUT ;
      if 0 then
        set ONEDATA nobs=CONSTNOBS1 ;
      do POINTVAR = 1 to CONSTNOBS1 ;
        set ONEDATA point=POINTVAR ;
        put <field>…<field> ;
      end ;
      if 0 then
        set TWODATA nobs=CONSTNOBS2 ;
      do POINTVAR = 1 to CONSTNOBS2 ;
        set TWODATA point=POINTVAR ;
        put <field>…<field> ;
      end ;
      stop ; /*<=REMEMBER!*/
    run ;

Works fine, looks a little cluttered.
The "set" statements where the values of CONSTNOBS1 and CONSTNOBS2 are set have their effect at compile time, not at execution time. So they never have to be executed; they simply have to be present at compile time, and could be anywhere within the data step. As an alternative to prefixing each "set ... nobs=" statement with "if 0" or "if 1 = 2" etc., we could move them to after the "stop" statement.
14.
Original code :: Modified code
Original code:

    filename OUT 'BIG.TXT' lrecl=32760 ;

    data _null_ ;
      set ONEDATA ;
      file OUT ;
      put <field>…<field> ;
    run ;

    data _null_ ;
      set TWODATA ;
      file OUT mod ;
      put <field>…<field> ;
    run ;

Modified code:

    filename OUT pipe 'compress > NOTSOBIG.Z' lrecl=32760 ;

    data _null_ ;
      file OUT ;
      do POINTVAR = 1 to CONSTNOBS1 ;
        set ONEDATA point=POINTVAR nobs=CONSTNOBS1 ;
        put <field>…<field> ;
      end ;
      do POINTVAR = 1 to CONSTNOBS2 ;
        set TWODATA point=POINTVAR nobs=CONSTNOBS2 ;
        put <field>…<field> ;
      end ;
      stop ; /*<=REMEMBER!*/
    run ;

The compile-time "nobs=" and the execution-time "point=" options of the "set" statement can both be present on a single "set" statement. And, if any data set before the last has zero observations, its "set" statement is not actually executed at run time, because its "nobs=" variable is set to zero at compile time, so that "do" loop reduces to "do POINTVAR = 1 to 0 ;" and execution does not enter it.
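The zero-observation case can be mimicked in the shell with a counting loop whose upper bound is known before the loop starts. This is an analogy rather than SAS behavior: the function write_rows and its arguments are hypothetical, and gzip is assumed in place of compress.

```shell
# Hypothetical helper: $1 = data set name, $2 = number of observations.
write_rows() {
    pointvar=1
    while [ "$pointvar" -le "$2" ]; do
        echo "$1 row $pointvar"
        pointvar=$((pointvar + 1))
    done
}

workdir=$(mktemp -d)

# The first loop runs zero times, so nothing is read for ONEDATA,
# and processing carries on to TWODATA in the same single pipe.
{
    write_rows ONEDATA 0
    write_rows TWODATA 2
} | gzip > "$workdir/NOTSOBIG.gz"

result=$(gzip -dc < "$workdir/NOTSOBIG.gz")
echo "$result"
```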
15.
DATA Step -- compilation phase v. execution phase
http://v8doc.sas.com
SAS OnlineDoc > Base SAS Software > SAS Language Reference: Concepts > DATA Step Concepts > DATA Step Processing > Overview of DATA Step Processing
Flow of Action: When you submit a DATA step for execution, it is first compiled and then executed.
16.
SET
http://v8doc.sas.com
SAS OnlineDoc > Base SAS Software > SAS Language Reference: Dictionary > Dictionary of Language Elements > Statements > SET
NOBS=variable
At compilation time, SAS reads the descriptor portion of each data set and assigns the value of the NOBS= variable automatically. Thus, you can refer to the NOBS= variable before the SET statement.
POINT=variable
POINT= causes the SET statement to use random (direct) access to read a SAS data set.
CAUTION: Continuous loops can occur when you use the POINT= option. When you use the POINT= option, you must include a STOP statement to stop DATA step processing, programming logic that checks for an invalid value of the POINT= variable, or both.
17.
Reading from and Writing to UNIX Commands (PIPE)
http://v8doc.sas.com
SAS OnlineDoc > Base SAS Software > Host Specific Information > UNIX Environments (Companion) > Running the SAS System Under UNIX > Using External Files and Devices > Reading from and Writing to UNIX Commands (PIPE)

    FILENAME fileref PIPE 'UNIX-command' <options>;

Under UNIX, you can use the FILENAME statement to assign filerefs not only to external files and I/O devices, but also to a pipe. Pipes enable your SAS application to receive input from any UNIX command that writes to standard output and to route output to any UNIX command that reads from standard input.
18.
Using Unnamed Pipes (MS Windows)
http://v8doc.sas.com
SAS OnlineDoc > Base SAS Software > Host Specific Information > Microsoft Windows Environment (Companion) > Using SAS with Other Windows Applications > Using Unnamed and Named Pipes > Using Unnamed Pipes

    FILENAME fileref PIPE 'program-name' option-list

Example:

    filename DIR pipe 'dir /?' ;

    data _null_ ;
      infile DIR ;
      input ;
      put _infile_ ;
    run ;

Log:

    NOTE: The infile DIR is:
          Unnamed Pipe Access Device,
          PROCESS=dir /?,RECFM=V,LRECL=256

    Displays a list of files and subdirectories in a directory.
    ...
    NOTE: 38 records were read from the infile DIR.
          The minimum record length was 0.
          The maximum record length was 75.
    NOTE: DATA statement used (Total process time):
          real time 0.09 seconds
          cpu time 0.03 seconds

Unfortunately, I couldn't find any compression program under MS Windows which would read a file to be compressed from its standard input.
19.
Questions?
SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are registered trademarks or trademarks of their respective companies.