To COMPRESS or Not, to COMPRESS or ZIP
David B. Horvath, CCP, MS
PhilaSUG Spring 2017 Meeting
2
To COMPRESS or Not, to COMPRESS or ZIP
The Author can be contacted at:
504 Longbotham Drive, Aston PA 19014-2502, USA
Phone: 1-610-859-8826
Email: dhorvath@cobs.com
Web: http://www.cobs.com/
LinkedIn: https://www.linkedin.com/in/dbhorvath/ (will post presentation)
All trademarks and servicemarks are the
property of their respective owners.
Copyright © 2017, David B. Horvath, CCP — All Rights Reserved
3
Introductions
• My Background
• SAS Compress Basics
• SAS Compress Examples
• Operating System/Tool Compression
• Compression Comparison
• Taking Advantage of Parallelism – Piping
Abstract
• SAS supports both basic (character) and advanced (binary) compression
• Operating systems and tools support additional compression.
• This session reviews the processing tradeoffs between uncompressed and
SAS-compressed datasets as well as dealing with operating system
compressed files and datasets.
• Is it better to process an uncompressed dataset or use SAS compression?
What are the factors that influence the decision to compress (or not)? What
are the considerations around applying operating system based compression
(for example, Winzip or UNIX zip or GNU gzip) to regular files and SAS
datasets? What are the tradeoffs? How can files in those formats be best
processed in SAS?
4
5
My Background
• Base SAS on Mainframe, UNIX, and PC Platforms
• SAS is primarily an ETL tool or Programming Language for me
• My background is IT – I am not a modeler
• Far from my first User Group presentation – presented sessions and
seminars in Australia, France, the US, and Canada.
• Undergraduate: Computer and Information Sciences, Temple Univ.
• Graduate: Organizational Dynamics, University of Pennsylvania
• Most of my career was in consulting (in-house last 11 years)
• Have written several books (none SAS-related, yet)
• Online Instructor for University of Phoenix covering IT topics.
• Currently working in Data Analytics for a regional bank
6
SAS Compress Basics
• Initially added with Version 6
• Initially only removed extra spaces from strings
• Significant improvements with Version 8
• Char or Yes: remove repeating blanks, characters, or numbers
• Binary: Char plus Compress Numeric Variables
• Silent improvements with Version 9:
• Much faster (less I/O) now that compression takes place “on the fly”
• Version 8 would create the initial file and then run the compression
• Which required yet another pass through the data and additional disk I/O
7
SAS Compress Basics
• Even with Version 9, compression can make your process run slower
• You are trading reduced storage space for increased CPU
• With some forms of compression, you can reduce I/O time
• Less data is being read
• I have seen this demonstrated with other tools
• SAS Compression seems to be single threaded
• Same CPU that is performing your process is performing the compression
• SAS Compression may not be the most space efficient
• UNIX/Linux and Windows compression tools may save more space
• There will be increased code complexity to use those tools
• You may save elapsed time since they can run in a separate thread
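The storage-vs-CPU tradeoff above can be seen outside SAS as well. A small Python sketch, using zlib compression levels as a stand-in for light vs. heavy compression (the sample data is made up):

```python
# Not SAS: illustrating the space-vs-CPU tradeoff with zlib levels.
import time
import zlib

data = (b"PhilaSUG spring meeting " * 40 + b"xyz") * 2000  # ~1.9 MB, semi-repetitive

def compress_at(level):
    start = time.perf_counter()
    packed = zlib.compress(data, level)
    return len(packed), time.perf_counter() - start

fast_size, fast_cpu = compress_at(1)  # cheapest CPU, least savings
best_size, best_cpu = compress_at(9)  # most CPU, best savings

assert zlib.decompress(zlib.compress(data, 1)) == data  # lossless either way
print(fast_size >= best_size)  # heavier compression saves at least as much space
```

The same principle drives the SAS choice: a cheaper method finishes faster, a heavier one saves more disk.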
8
SAS Compress Basics
• Compress=Yes
• Same as Compress=Char
• Compress=No
• Disables Compression even if options are set
• Compress=Binary
• Heaviest Compression, Highest CPU usage, Highest space savings
• Can also set via Options at system level, command line, in program,
or, as will be shown, within the dataset.
• Proc Options result for the system I ran these on:
• COMPRESS=BINARY Specifies the type of compression to use for
observations in output SAS data sets.
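SAS documentation describes CHAR compression as run-length encoding (RLE); this toy encoder (not SAS's actual on-disk format) shows why runs of blanks and repeated characters shrink so well:

```python
# Toy run-length encoder: the idea behind COMPRESS=CHAR, not SAS's real format.
def rle_encode(s):
    runs = []
    for ch in s:
        if runs and runs[-1][0] == ch:
            runs[-1][1] += 1          # extend the current run
        else:
            runs.append([ch, 1])      # start a new run
    return runs

def rle_decode(runs):
    return "".join(ch * n for ch, n in runs)

record = "SMITH" + " " * 15 + "PA" + "0" * 8   # padded fixed-width fields
packed = rle_encode(record)
print(len(packed), len(record))   # far fewer runs than characters
```

BINARY compression (Ross Data Compression) goes further, also exploiting patterns in numeric fields, which is why it costs more CPU.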
9
SAS Compress – Simple Write Example
• An example to compare results:
libname test "/just/some/directory";
%macro RandBetween(min, max); (&min + floor((1+&max-&min)*rand("uniform")))
%mend;
data test.test_no (compress=no drop=text1-text44) test.test_yes (compress=yes drop=text1-text44)
     test.test_char (compress=char drop=text1-text44) test.test_bin (compress=binary drop=text1-text44);
array text[44] $20 (/* 44 different words and phrases */);
format longstring $200. ;
DO indexvariable=1 TO 20000000;
word1=text[%RandBetween(1,44)];
num1=%RandBetween(1,9999999999);
word2=text[%RandBetween(1,44)];
num2=rand("uniform");
word3=text[%RandBetween(1,44)];
word4=text[%RandBetween(1,44)];
num3=%RandBetween(1,9999999999);
word5=text[%RandBetween(1,44)];
num4=rand("uniform");
num5=%RandBetween(1,9999999999);
word6=text[%RandBetween(1,44)];
num6=rand("uniform");
stringlength=%RandBetween(1,179); /* build a random length string */
longstring=trim(text[%RandBetween(1,44)]);
do while (length(longstring) < stringlength);
longstring=trim(longstring)||" " || text[%RandBetween(1,44)];
end;
num7=%RandBetween(1,9999999999);
word7=text[%RandBetween(1,44)];
output test.test_no; output test.test_yes; output test.test_char; output test.test_bin;
END;
run;
10
SAS Compress – Simple Write Example
• Individual File Size Results:
11:38:58 test_bin.sas7bdat.lck 4907139072
11:38:58 test_char.sas7bdat.lck 5317066752
11:38:58 test_no.sas7bdat.lck 8326414336
11:38:58 test_yes.sas7bdat.lck 5317066752
11:38:59 test_bin.sas7bdat.lck 4914216960
11:38:59 test_char.sas7bdat.lck 5324668928
11:38:59 test_no.sas7bdat.lck 8338407424
11:38:59 test_yes.sas7bdat.lck 5324734464
11:39:00 test_bin.sas7bdat 4920377344
11:39:00 test_char.sas7bdat 5331353600
11:39:00 test_no.sas7bdat 8348631040
11:39:00 test_yes.sas7bdat 5331353600
11:39:01 test_bin.sas7bdat 4,920,377,344
11:39:01 test_char.sas7bdat 5,331,353,600
11:39:01 test_no.sas7bdat 8,348,631,040
11:39:01 test_yes.sas7bdat 5,331,353,600
• We can see that the files grow together – compression is no longer a
separate step
11
SAS Compress – Simple Write Example
• Individual File Results:
NOTE: The data set TEST.TEST_NO has 20000000 observations and 17 variables.
NOTE: The data set TEST.TEST_YES has 20000000 observations and 17 variables.
NOTE: Compressing data set TEST.TEST_YES decreased size by 36.14 percent.
Compressed is 81349 pages; un-compressed would require 127389 pages.
NOTE: The data set TEST.TEST_CHAR has 20000000 observations and 17 variables.
NOTE: Compressing data set TEST.TEST_CHAR decreased size by 36.14 percent.
Compressed is 81349 pages; un-compressed would require 127389 pages.
NOTE: The data set TEST.TEST_BIN has 20000000 observations and 17 variables.
NOTE: Compressing data set TEST.TEST_BIN decreased size by 41.06 percent.
Compressed is 75078 pages; un-compressed would require 127389 pages.
NOTE: DATA statement used (Total process time):
real time 8:22.39
user cpu time 2:52.89
system cpu time 26.94 seconds
memory 1516.40k
OS Memory 21152.00k
Timestamp 04/17/2017 12:05:00 PM
Step Count 265 Switch Count 222
Page Faults 0
Page Reclaims 426
Page Swaps 0
Voluntary Context Switches 623546
Involuntary Context Switches 128208
Block Input Operations 0
Block Output Operations 0
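The percentages in the log follow directly from the page counts it reports; checking the arithmetic:

```python
# Reproducing the log's savings figures from its page counts.
uncompressed_pages = 127389
char_pages = 81349    # COMPRESS=YES and COMPRESS=CHAR
binary_pages = 75078  # COMPRESS=BINARY

char_savings = (1 - char_pages / uncompressed_pages) * 100
binary_savings = (1 - binary_pages / uncompressed_pages) * 100
print(round(char_savings, 2), round(binary_savings, 2))  # 36.14 41.06
```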
12
SAS Compress – A Warning
• With small files, compress can make the file larger
• In this case, running the example code for only 20 observations:
Size File
131,072 test_no.sas7bdat
196,608 test_yes.sas7bdat
196,608 test_char.sas7bdat
196,608 test_bin.sas7bdat
• Even when nothing is actually compressed, the file ends up larger
• SAS Warns you in the log with a NOTE:
NOTE: The data set TEST.TEST_NO has 20 observations and 17 variables.
NOTE: The data set TEST.TEST_YES has 20 observations and 17 variables.
NOTE: Compressing data set TEST.TEST_YES increased size by 100.00 percent.
Compressed is 2 pages; un-compressed would require 1 pages.
NOTE: The data set TEST.TEST_CHAR has 20 observations and 17 variables.
NOTE: Compressing data set TEST.TEST_CHAR increased size by 100.00 percent.
Compressed is 2 pages; un-compressed would require 1 pages.
NOTE: The data set TEST.TEST_BIN has 20 observations and 17 variables.
NOTE: Compressing data set TEST.TEST_BIN increased size by 100.00 percent.
Compressed is 2 pages; un-compressed would require 1 pages.
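The same overhead effect appears outside SAS: gzip's fixed header and trailer make a tiny input larger after "compression".

```python
# Compressing a tiny payload grows it: fixed gzip overhead dominates.
import gzip

tiny = b"just 20 observations"
packed = gzip.compress(tiny)
print(len(tiny), len(packed))  # the packed version is bigger than the original
```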
13
SAS Compress – Read Example
• Read Times will Vary based on compression method
• In each case, the read code is the same except for the input table
• Uncompressed Read (baseline):
libname test "/just/some/directory";
data _null_;
set test.test_no; /* Different datasets for each test */
retain total 0;
total=total+num1;
run;
NOTE: There were 20000000 observations read from the data set TEST.TEST_NO.
NOTE: DATA statement used (Total process time):
real time 4.99 seconds
user cpu time 1.22 seconds
system cpu time 3.43 seconds
memory 920.25k
OS Memory 21152.00k
14
SAS Compress – Read Example
• Compress=Char and Compress=Yes produced similar results:
NOTE: There were 20000000 observations read from the data
set TEST.TEST_YES.
NOTE: DATA statement used (Total process time):
real time 12.56 seconds
user cpu time 9.93 seconds
system cpu time 2.52 seconds
memory 1137.56k
OS Memory 21152.00k
• Compress=Binary used more resources:
NOTE: There were 20000000 observations read from the data
set TEST.TEST_BIN.
NOTE: DATA statement used (Total process time):
real time 24.18 seconds
user cpu time 21.75 seconds
system cpu time 2.25 seconds
memory 1151.34k
OS Memory 21152.00k
15
SAS Compress – Read Example
• A quick comparison:
Example Elapsed System User Memory
None 4.99 sec 3.43 sec 1.22 sec 920.25k
Yes 12.56 sec 2.52 sec 9.93 sec 1137.56k
Char 12.68 sec 2.44 sec 10.11 sec 1137.25k
Binary 24.18 sec 2.25 sec 21.75 sec 1151.34k
16
gzip Compression
• GNU Zip (gzip and gunzip) commands
• Are available on most systems including UNIX, Windows, and Linux (by default).
• WinZip is available under Windows (and can be read by gzip)
• Some UNIX zip can read WinZip files
• Significant improvement in space usage:
• Strangely enough, you get less compression on files SAS has already
compressed
          size before    size after:    size after:    size after:
                         gzip fastest   gzip default   gzip max
test_bin  4,920,377,344  3,053,723,102  2,794,141,358  2,780,018,371
test_char 5,331,353,600  2,036,590,374  1,814,911,246  1,796,243,109
test_no   8,348,631,040  2,120,174,601  1,758,621,239  1,737,218,569
test_yes  5,331,353,600  2,036,590,374  1,814,911,246  1,796,264,146
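The "strangely enough" effect has a simple cause: compressed bytes look nearly random, and gzip cannot shrink randomness. A Python sketch (random bytes stand in for already-compressed content):

```python
# Why SAS-compressed files gzip worse than their uncompressed originals.
import gzip
import random

text = b"longstring phrase " * 100000   # repetitive, like the raw test data
random.seed(2017)
noise = bytes(random.randrange(256) for _ in range(100000))  # high entropy,
                                        # a stand-in for pre-compressed bytes

print(len(gzip.compress(text)) < len(text) // 10)   # big win on raw text
print(len(gzip.compress(noise)) > len(noise))       # no win, slight growth
```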
17
gzip Compression
• There Ain’t No Such Thing As A Free Lunch (TANSTAAFL: Robert A.
Heinlein)
• The space savings comes at a cost:
• And a significant cost in elapsed time:
• But there are ways to reduce these costs
Elapsed time  Zip fastest  Unzip fastest  Zip default  Unzip default  Zip max  Unzip max
test_bin      04:04.2      02:48.0        07:56.2      02:11.5        15:26.2  02:14.2
test_char     02:41.8      02:44.7        06:03.5      02:04.8        12:00.2  02:09.1
test_no       03:04.3      03:44.2        06:13.3      02:56.9        14:06.9  03:07.9
test_yes      02:44.4      02:40.7        06:10.7      02:08.7        11:25.5  02:11.1

CPU seconds   Zip fastest  Unzip fastest  Zip default  Unzip default  Zip max  Unzip max  Average
test_bin      143.7        59.2           358.7        52.1           803.7    51.8       244.8
test_char      92.0        46.6           281.2        43.0           627.7    43.0       188.9
test_no       108.3        63.5           293.2        55.0           755.4    54.2       221.6
test_yes       92.4        46.4           281.5        43.4           592.5    43.0       183.2
Average       109.1        53.9           303.6        48.4           694.8    48.0
18
Compression Comparison
• Compression in any form makes sense when:
• Space is at a premium (just about always)
• File sizes are large
• Processing cost is high (data isn't just being read and reported)
• SAS Compression makes more sense when:
• Processing time is important
• Want simplicity of code
• Want immediate access to data
• gzip makes sense when:
• File is infrequently used – especially when it is kept because you're afraid to get rid
of it (or regulatory requirements)
• Maximum space savings is important
• File sizes are really large
19
Taking Advantage of Parallelism – Piping
• You can take advantage of multiple CPU/cores to process
compressed data through the use of Pipes.
• SAS supports piping natively for flat files
• SAS requires operating system support for "named pipes"
• Makes use of the "Sequential Data Engine" – often referred to as the
"TAPE" engine.
• You can only write one dataset to it
• You can only read once
• proc contents information limited (no 'NOBS' for instance)
• You can't do both at the same time
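The sequential-engine restrictions above are the generic limits of any stream. In miniature (plain Python, not SAS):

```python
# A stream, like a sequential library member, can be read through only once.
stream = iter(range(5))          # stands in for a sequential data source
first_pass = list(stream)        # works
second_pass = list(stream)       # nothing left to read
print(first_pass, second_pass)   # [0, 1, 2, 3, 4] []
```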
20
Taking Advantage of Parallelism – Piping
• Let's start with an example – minor changes to the earlier
Compression Write:
libname test "/just/some/directory/base_no_fifo";
/* In UNIX Command Line, execute: mknod /just/some/directory/base_no_fifo p */
%macro RandBetween(min, max); (&min + floor((1+&max-&min)*rand("uniform")))
%mend;
X "gzip < /just/some/directory/base_no_fifo > /just/some/directory/base6_no_via_fifo.sas7bdat.gz &";
data test.test_no (compress=no drop=text1-text44) ;
array text[44] $20 (/* list of 44 words or phrases */);
format longstring $200. ;
DO indexvariable=1 TO 20000000;
/* Nothing changed here */
output test.test_no; /* Only creating one this time */
END;
run;
/* These will not work; I'll explain why!
proc print data=test.test_no (obs=10);
run;
proc contents data=test.test_no; run;
*/
21
Taking Advantage of Parallelism – Piping
• Minor changes to the earlier Compression Read example:
libname test "/just/some/directory/base_no_fifo";
/* In UNIX Command Line, execute: mknod /just/some/directory/base_no_fifo p */
X "gunzip --stdout /just/some/directory/base6_no_via_fifo.sas7bdat.gz > /just/some/directory/base_no_fifo &";
data _null_;
set test.test_no;
retain total 0;
total=total+num1;
run;
22
Taking Advantage of Parallelism – Piping
• Timing Results:
• I've included a Direct Read for comparison purposes
• Note that SAS does not report the gzip/gunzip CPU usage
• Separate Process
• Separate CPU/Core/Thread
• There are times you can get a "nearly free" lunch.
             zip CPU  unzip CPU  Zip ET   Unzip ET  pipe zip  pipe unzip  pipe zip  pipe unzip  File Size
                                                    CPU       CPU         ET        ET
gzip Max     755.40   54.20      14:06.9  03:07.9   61.22     5.00        10:33.0   48.14       1,737,218,569
gzip Default 293.20   55.00      06:13.0  02:57.0   59.20     5.11        04:50.9   01:01.9     1,758,621,239
gzip Min     108.30   63.50      03:04.3  03:44.0   59.35     5.12        03:49.0   58.03       2,120,174,601
cat                                                 64.23     5.09        01:34.0   11.03       8,348,631,040
Direct Read                                         61.08     4.65        02:44.5   4.99        8,348,631,040
23
Taking Advantage of Parallelism – Piping
• What are Pipes?
• Very similar to the water pipes in your home
• There is a pump and faucet
• You are able to pick the direction
• Data can only flow one way at a time
• Data can only flow when the pipe program is executing
• There is a creator and consumer
• In the Write Example, SAS is the pump, gzip is the faucet
• In the Read Example, gzip is the pump, SAS is the faucet
• Data is not stored in the pipe itself
• Data may be briefly buffered on disk or held entirely in memory
• Won't typically cross networks
24
Taking Advantage of Parallelism – Piping
• What are Pipes?
• Requires an entry on disk
• Created via the mknod (make node) or mkfifo (make first-in first-out):
mknod /just/some/directory/base_no_fifo p
mkfifo /just/some/directory/base_no_fifo
• Pipes (the infrastructure) remain around unless removed
• Disk entry will look like (using ls -al command)
prw-rw-r-- 1 MYID my_group_name 0 Apr 02 09:48 base_no_fifo
• "p" tells you this is a Pipe
• "0" tells you it isn't holding any data
• You can also run the external command in a script or by hand
• Useful if X Command not allowed
• Will not work in Grid environment
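The same fifo lifecycle can be exercised from Python on UNIX-like systems; `os.mkfifo` wraps the facility that `mknod ... p` exposes (the path and payload here are made up):

```python
# One side pumps, the other drains; the fifo entry on disk never holds the data.
import os
import tempfile
import threading

fifo = os.path.join(tempfile.mkdtemp(), "base_no_fifo")
os.mkfifo(fifo)                      # same role as: mknod base_no_fifo p

def pump():
    with open(fifo, "wb") as w:      # blocks until a reader attaches
        w.write(b"20000000 observations\n")

writer = threading.Thread(target=pump)
writer.start()
with open(fifo, "rb") as r:          # the "faucet"
    received = r.read()
writer.join()

print(received)
print(os.path.getsize(fifo))         # 0: the pipe itself stored nothing
```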
25
Taking Advantage of Parallelism – Piping
• Why won't they work?
• In the Pipe Compression Write I included:
/* These will not work; I'll explain why!
proc print data=test.test_no (obs=10); run;
proc contents data=test.test_no; run;
*/
• In the program, Libname test is a pipe.
• Data flowed through that pipe, and having flowed, is no longer available.
• At least not in this context
• The data is still available on the disk (written out by gzip)
• But not to this program unless we reprime, and in this case, reverse the pump:
X "gunzip --stdout /just/some/directory/base6_no_via_fifo.sas7bdat.gz > /just/some/directory/base_no_fifo &";
proc print data=test.test_no (obs=10); run;
X "gunzip --stdout /just/some/directory/base6_no_via_fifo.sas7bdat.gz > /just/some/directory/base_no_fifo &";
proc contents data=test.test_no; run;
26
Taking Advantage of Parallelism – Piping
• Common Error:
• Attempting to write multiple datasets to (or read multiple from) a sequential
library
output test.test_no test.test_yes test.test_char test.test_bin;
• Will result in an error:
ERROR: Attempt to open two sequential members in the same sequential library. File TEST.TEST_YES.DATA cannot be opened.
NOTE: The SAS System stopped processing this step because of errors.
WARNING: The data set TEST.TEST_NO may be incomplete. When this step was stopped there were 0 observations and 17
variables.
27
Taking Advantage of Parallelism – Piping
• External Command Example – Write:
• UNIX/Linux commands:
mknod mypipe p
gzip < mypipe > input.gz &    # runs in background/parallel
sas writepipe.sas
• writepipe.sas Program:
libname test "mypipe";
%macro RandBetween(min, max); (&min + floor((1+&max-&min)*rand("uniform")))
%mend;
/* X command removed */
data test.test_no (compress=no drop=text1-text44) ;
array text[44] $20 (/* list of 44 words or phrases */);
format longstring $200. ;
DO indexvariable=1 TO 20000000;
/* Nothing changed here */
output test.test_no;
END;
run;
28
Taking Advantage of Parallelism – Piping
• External Command Example – Read:
• UNIX/Linux commands:
mknod mypipe p                       # not needed if created earlier
gunzip --stdout input.gz > mypipe &  # runs in background/parallel
sas readpipe.sas
• readpipe.sas Program:
libname test "mypipe";
/* X command removed */
data _null_;
set test.test_no;
retain total 0;
total=total+num1;
run;
29
Taking Advantage of Parallelism – Piping
• No real timing differences between external and internal (X) command
approaches
• Minor Advantages for External Commands:
• Can trap errors within the gzip command
• Missing file for instance
• Control at the shell level
• Same SAS program able to work for different files
• Minor Disadvantages for External Commands:
• Increased code complexity
• Both SAS and UNIX/Linux code required
• Major Disadvantage for External Commands:
• External command difficult to implement in Grid environment
30
Personal Note
• I seem to learn quite a lot when working on presentations, new
classes, and writings
• It wasn’t until I was gathering data for this presentation that:
• I realized that SAS Compression had gotten smarter (compressing on the fly
rather than making a second pass over the file).
• I found that separate (external) commands would not work with pipes on a
Grid. I should've realized that since that command is running on my local
(login) machine while the SAS code runs anywhere on the Grid. Although
the Pipe was on shared storage, the data movement was in memory only.
• In any commands in this presentation, the single and double quotation
marks should be plain, not the "smart quotes" forced by Microsoft.
The same applies to dashes: minus signs should be plain hyphens,
not the en or em dashes (- versus –) that autocorrect substitutes.
31
Wrap Up
Questions
and
Answers
?! ?!
?! ?!
?
? ?
?
!
!
!
!
32
Filename Piping
• If we have some extra time...
• It is possible to process INFILE or FILE with pipes
• Much like processing with SET or DATA
• Can be used with Internal or External commands
• SAS also supports the PIPE keyword on the FILENAME statement to
allow piping in/out data:
• FILENAME fileref PIPE 'UNIX-command' <options>;
• Your INFILE or FILE command will include the fileref. Whatever you
INPUT or PUT in that data step will involve the specified UNIX
command.
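The FILENAME ... PIPE idea maps directly onto running a child process and feeding its stdin. A Python sketch of the write direction (the upcasing command is purely illustrative, standing in for gzip or any UNIX filter):

```python
# Rough analog of FILENAME fileref PIPE 'command': every PUT goes to the
# command's stdin, and the command runs as a separate process.
import subprocess
import sys

upcase = [sys.executable, "-c",
          "import sys; sys.stdout.write(sys.stdin.read().upper())"]
result = subprocess.run(upcase, input="word1 num1 longstring\n",
                        capture_output=True, text=True)
print(result.stdout)  # WORD1 NUM1 LONGSTRING
```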
33
Filename Piping
• A Writing Example (should look fairly familiar by now):
filename testref PIPE "cat > /just/some/directory/output.txt";
%macro RandBetween(min, max); (&min + floor((1+&max-&min)*rand("uniform")))
%mend;
data _null_;
file testref;
array text[44] $20 (/* 44 words and phrases */);
format longstring $200. ;
DO indexvariable=1 TO 200;
word1=text[%RandBetween(1,44)];
num1=%RandBetween(1,9999999999);
word2=text[%RandBetween(1,44)];
num2=rand("uniform");
word3=text[%RandBetween(1,44)];
word4=text[%RandBetween(1,44)];
num3=%RandBetween(1,9999999999);
word5=text[%RandBetween(1,44)];
num4=rand("uniform");
num5=%RandBetween(1,9999999999);
word6=text[%RandBetween(1,44)];
num6=rand("uniform");
stringlength=%RandBetween(1,179);
longstring=trim(text[%RandBetween(1,44)]);
do while (length(longstring) < stringlength);
longstring=trim(longstring)||" " || text[%RandBetween(1,44)];
end;
num7=%RandBetween(1,9999999999);
word7=text[%RandBetween(1,44)];
put word1 num1 word2 num2 longstring;
END;
run;
34
Filename Piping
• A Reading Example (should look fairly familiar by now):
filename testref PIPE "cat /just/some/directory/output.txt";
data out;
infile testref;
input name $;
run;
proc print data=work.out (obs=10); run;
• Produces the following
Obs  name
  1  with commas 63344454
  2  and enclose 58066050
  3  or double 882972945
  4  of an array 97957098
  5  To do 368188872 init
  6  and enclose 19271463
  7  and enclose 90992099
  8  or spaces 8165156291
  9  with commas 42546153
 10  or spaces 96397033 i
35
Compression References
• NOTES
• Indexing and Compressing SAS® Data Sets:
http://www2.sas.com/proceedings/sugi28/003-28.pdf
• SAS(R) 9.2 Language Reference: Dictionary, Fourth Edition:
http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#
a001288760.htm
• Programming Tricks For Reducing Storage And Work Space:
http://www2.sas.com/proceedings/sugi27/p023-27.pdf
• How to Reduce the Disk Space Required by a SAS® Data Set:
http://www.lexjansen.com/nesug/nesug06/io/io18.pdf
• Accessing Sequential-Format Data Libraries (pipes):
http://technology.msb.edu/old/training/statistics/sas/books/unix/z0386494.htm
• Smokin’ With UNIX Pipes (FILENAME):
http://www2.sas.com/proceedings/sugi25/25/cc/25p103.pdf
• SAS® 9.4 Companion for UNIX Environments, Sixth Edition (X command):
http://support.sas.com/documentation/cdl/en/hostunx/69602/PDF/default/hostunx.pd
f
• Using SAS with Pipes or as a Filter under UNIX:
https://www.linkedin.com/pulse/using-sas-pipes-filter-under-unix-david-
horvath?published=t
37
Compression References
• My Word/Phrase array:
array text[44] $20 ('For some' 'applications' 'it can be'
'beneficial' 'to assign' 'initial' 'values to the' 'variables or'
'elements' 'of an array' 'at the' 'time that' 'the array' 'is defined'
'To do' 'this' 'enclose' 'the initial' 'values in' 'parentheses'
'at the end' 'of the' 'ARRAY' 'statement' 'Separate' 'the values'
'either' 'with commas' 'or spaces' 'and enclose' 'character'
'values in' 'either single' 'or double' 'quotation' 'marks'
'The following' 'statements' 'illustrate' 'the' 'initialization'
'of numeric' 'and' 'character values');

PostgreSQL Configuration for Humans / Alvaro Hernandez (OnGres)
 
MongoDB World 2019: Finding the Right MongoDB Atlas Cluster Size: Does This I...
MongoDB World 2019: Finding the Right MongoDB Atlas Cluster Size: Does This I...MongoDB World 2019: Finding the Right MongoDB Atlas Cluster Size: Does This I...
MongoDB World 2019: Finding the Right MongoDB Atlas Cluster Size: Does This I...
 
CollabSphere 2019 - Dirty Secrets of the Notes Client
CollabSphere 2019 - Dirty Secrets of the Notes ClientCollabSphere 2019 - Dirty Secrets of the Notes Client
CollabSphere 2019 - Dirty Secrets of the Notes Client
 
Dba tuning
Dba tuningDba tuning
Dba tuning
 
Sql server troubleshooting
Sql server troubleshootingSql server troubleshooting
Sql server troubleshooting
 
PostgreSQL Table Partitioning / Sharding
PostgreSQL Table Partitioning / ShardingPostgreSQL Table Partitioning / Sharding
PostgreSQL Table Partitioning / Sharding
 
Performance Tuning
Performance TuningPerformance Tuning
Performance Tuning
 
pm1
pm1pm1
pm1
 
Configuring sql server - SQL Saturday, Athens Oct 2014
Configuring sql server - SQL Saturday, Athens Oct 2014Configuring sql server - SQL Saturday, Athens Oct 2014
Configuring sql server - SQL Saturday, Athens Oct 2014
 
DataStax | Building a Spark Streaming App with DSE File System (Rocco Varela)...
DataStax | Building a Spark Streaming App with DSE File System (Rocco Varela)...DataStax | Building a Spark Streaming App with DSE File System (Rocco Varela)...
DataStax | Building a Spark Streaming App with DSE File System (Rocco Varela)...
 
Database Administration & Management - 01
Database Administration & Management - 01Database Administration & Management - 01
Database Administration & Management - 01
 
DBAM-01.pdf
DBAM-01.pdfDBAM-01.pdf
DBAM-01.pdf
 
Azure Data Factory Data Flow Performance Tuning 101
Azure Data Factory Data Flow Performance Tuning 101Azure Data Factory Data Flow Performance Tuning 101
Azure Data Factory Data Flow Performance Tuning 101
 
Optimize SQL server performance for SharePoint
Optimize SQL server performance for SharePointOptimize SQL server performance for SharePoint
Optimize SQL server performance for SharePoint
 
Best And Worst Practices Deploying IBM Connections
Best And Worst Practices Deploying IBM ConnectionsBest And Worst Practices Deploying IBM Connections
Best And Worst Practices Deploying IBM Connections
 
VLDB Administration Strategies
VLDB Administration StrategiesVLDB Administration Strategies
VLDB Administration Strategies
 
WMS Performance Shootout 2010
WMS Performance Shootout 2010WMS Performance Shootout 2010
WMS Performance Shootout 2010
 
Espc17 make your share point fly by tuning and optimising sql server
Espc17 make your share point  fly by tuning and optimising sql serverEspc17 make your share point  fly by tuning and optimising sql server
Espc17 make your share point fly by tuning and optimising sql server
 
Make your SharePoint fly by tuning and optimizing SQL Server
Make your SharePoint  fly by tuning and optimizing SQL ServerMake your SharePoint  fly by tuning and optimizing SQL Server
Make your SharePoint fly by tuning and optimizing SQL Server
 

More from David Horvath

20190413 zen and the art of programming
20190413 zen and the art of programming20190413 zen and the art of programming
20190413 zen and the art of programmingDavid Horvath
 
(SAS) UNIX X Command Tips and Tricks
(SAS) UNIX X Command Tips and Tricks(SAS) UNIX X Command Tips and Tricks
(SAS) UNIX X Command Tips and TricksDavid Horvath
 
20180414 nevil shute no highway modern metal fatigue
20180414 nevil shute no highway modern metal fatigue20180414 nevil shute no highway modern metal fatigue
20180414 nevil shute no highway modern metal fatigueDavid Horvath
 
20180410 sasgf2018 2454 lazy programmers xml ppt
20180410 sasgf2018 2454 lazy programmers xml ppt20180410 sasgf2018 2454 lazy programmers xml ppt
20180410 sasgf2018 2454 lazy programmers xml pptDavid Horvath
 
20180324 leveraging unix tools
20180324 leveraging unix tools20180324 leveraging unix tools
20180324 leveraging unix toolsDavid Horvath
 
20180324 zen and the art of programming
20180324 zen and the art of programming20180324 zen and the art of programming
20180324 zen and the art of programmingDavid Horvath
 
20171106 sesug bb 184 zen and the art of problem solving
20171106 sesug bb 184 zen and the art of problem solving20171106 sesug bb 184 zen and the art of problem solving
20171106 sesug bb 184 zen and the art of problem solvingDavid Horvath
 
20171106 sesug bb 180 proc import ppt
20171106 sesug bb 180 proc import ppt20171106 sesug bb 180 proc import ppt
20171106 sesug bb 180 proc import pptDavid Horvath
 
20150904 "A Few Words About 'In The Wet' by Nevil Shute"
20150904 "A Few Words About 'In The Wet' by Nevil Shute"20150904 "A Few Words About 'In The Wet' by Nevil Shute"
20150904 "A Few Words About 'In The Wet' by Nevil Shute"David Horvath
 
20120606 Lazy Programmers Write Self-Modifying Code /or/ Dealing with XML Ord...
20120606 Lazy Programmers Write Self-Modifying Code /or/ Dealing with XML Ord...20120606 Lazy Programmers Write Self-Modifying Code /or/ Dealing with XML Ord...
20120606 Lazy Programmers Write Self-Modifying Code /or/ Dealing with XML Ord...David Horvath
 
20150312 NOBS for Noobs
20150312 NOBS for Noobs20150312 NOBS for Noobs
20150312 NOBS for NoobsDavid Horvath
 
20140612 phila sug proc import
20140612 phila sug proc import20140612 phila sug proc import
20140612 phila sug proc importDavid Horvath
 

More from David Horvath (12)

20190413 zen and the art of programming
20190413 zen and the art of programming20190413 zen and the art of programming
20190413 zen and the art of programming
 
(SAS) UNIX X Command Tips and Tricks
(SAS) UNIX X Command Tips and Tricks(SAS) UNIX X Command Tips and Tricks
(SAS) UNIX X Command Tips and Tricks
 
20180414 nevil shute no highway modern metal fatigue
20180414 nevil shute no highway modern metal fatigue20180414 nevil shute no highway modern metal fatigue
20180414 nevil shute no highway modern metal fatigue
 
20180410 sasgf2018 2454 lazy programmers xml ppt
20180410 sasgf2018 2454 lazy programmers xml ppt20180410 sasgf2018 2454 lazy programmers xml ppt
20180410 sasgf2018 2454 lazy programmers xml ppt
 
20180324 leveraging unix tools
20180324 leveraging unix tools20180324 leveraging unix tools
20180324 leveraging unix tools
 
20180324 zen and the art of programming
20180324 zen and the art of programming20180324 zen and the art of programming
20180324 zen and the art of programming
 
20171106 sesug bb 184 zen and the art of problem solving
20171106 sesug bb 184 zen and the art of problem solving20171106 sesug bb 184 zen and the art of problem solving
20171106 sesug bb 184 zen and the art of problem solving
 
20171106 sesug bb 180 proc import ppt
20171106 sesug bb 180 proc import ppt20171106 sesug bb 180 proc import ppt
20171106 sesug bb 180 proc import ppt
 
20150904 "A Few Words About 'In The Wet' by Nevil Shute"
20150904 "A Few Words About 'In The Wet' by Nevil Shute"20150904 "A Few Words About 'In The Wet' by Nevil Shute"
20150904 "A Few Words About 'In The Wet' by Nevil Shute"
 
20120606 Lazy Programmers Write Self-Modifying Code /or/ Dealing with XML Ord...
20120606 Lazy Programmers Write Self-Modifying Code /or/ Dealing with XML Ord...20120606 Lazy Programmers Write Self-Modifying Code /or/ Dealing with XML Ord...
20120606 Lazy Programmers Write Self-Modifying Code /or/ Dealing with XML Ord...
 
20150312 NOBS for Noobs
20150312 NOBS for Noobs20150312 NOBS for Noobs
20150312 NOBS for Noobs
 
20140612 phila sug proc import
20140612 phila sug proc import20140612 phila sug proc import
20140612 phila sug proc import
 

Recently uploaded

Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfNeo4j
 
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS:  6 Ways to Automate Your Data IntegrationBridging Between CAD & GIS:  6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integrationmarketing932765
 
React Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkReact Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkPixlogix Infotech
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Farhan Tariq
 
Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024TopCSSGallery
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesMuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesManik S Magar
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Alkin Tezuysal
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterMydbops
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...Wes McKinney
 
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...itnewsafrica
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsRavi Sanghani
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfpanagenda
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
QCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesQCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesBernd Ruecker
 

Recently uploaded (20)

Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdf
 
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS:  6 Ways to Automate Your Data IntegrationBridging Between CAD & GIS:  6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
 
React Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkReact Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App Framework
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...
 
Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesMuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL Router
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
 
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and Insights
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
QCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesQCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architectures
 

20170419 To COMPRESS or Not, to COMPRESS or ZIP

5
My Background
• Base SAS on Mainframe, UNIX, and PC Platforms
• SAS is primarily an ETL tool or Programming Language for me
• My background is IT – I am not a modeler
• Far from my first User Group presentation – presented sessions and seminars in Australia, France, the US, and Canada.
• Undergraduate: Computer and Information Sciences, Temple Univ.
• Graduate: Organizational Dynamics, University of Pennsylvania
• Most of my career was in consulting (in-house last 11 years)
• Have written several books (none SAS-related, yet)
• Online Instructor for University of Phoenix covering IT topics.
• Currently working in Data Analytics for a regional bank
6
SAS Compress Basics
• Initially added with Version 6
• Initially only removed extra spaces from strings
• Significant improvements with Version 8
  • Char or Yes: remove repeating blanks, characters, or numbers
  • Binary: Char plus compress numeric variables
• Silent improvements with Version 9:
  • Much faster (less I/O) now that compression takes place “on the fly”
  • Version 8 would create the initial file and then run the compression
  • Which required yet another pass through the data and additional disk I/O
7
SAS Compress Basics
• Even with Version 9, compression can make your process run slower
  • You are trading reduced storage space for increased CPU
• With some forms of compression, you can reduce I/O time
  • Less data is being read
  • I have seen this demonstrated with other tools
• SAS Compression seems to be single threaded
  • The same CPU that is performing your process is performing the compression
• SAS Compression may not be the most space efficient
  • UNIX/Linux and Windows compression tools may save more space
  • There will be increased code complexity to use those tools
  • You may save elapsed time since they can run in a separate thread
8
SAS Compress Basics
• Compress=Yes
  • Same as Compress=Char
• Compress=No
  • Disables compression even if options are set
• Compress=Binary
  • Heaviest compression, highest CPU usage, highest space savings
• Can also set via Options at the system level, on the command line, in the program, or, as will be shown, within the dataset.
• Proc Options result for the system I ran these on:
  COMPRESS=BINARY Specifies the type of compression to use for observations in output SAS data sets.
9
SAS Compress – Simple Write Example
• An example to compare results:

libname test "/just/some/directory";

%macro RandBetween(min, max);
   (&min + floor((1+&max-&min)*rand("uniform")))
%mend;

data test.test_no   (compress=no     drop=text1-text44)
     test.test_yes  (compress=yes    drop=text1-text44)
     test.test_char (compress=char   drop=text1-text44)
     test.test_bin  (compress=binary drop=text1-text44);
   array text[44] $20 ( /* 44 different words and phrases */ );
   format longstring $200. ;
   DO indexvariable=1 TO 20000000;
      word1=text[%RandBetween(1,44)];
      num1=%RandBetween(1,9999999999);
      word2=text[%RandBetween(1,44)];
      num2=rand("uniform");
      word3=text[%RandBetween(1,44)];
      word4=text[%RandBetween(1,44)];
      num3=%RandBetween(1,9999999999);
      word5=text[%RandBetween(1,44)];
      num4=rand("uniform");
      num5=%RandBetween(1,9999999999);
      word6=text[%RandBetween(1,44)];
      num6=rand("uniform");
      stringlength=%RandBetween(1,179);
      /* build a random length string */
      longstring=trim(text[%RandBetween(1,44)]);
      do while (length(longstring) < stringlength);
         longstring=trim(longstring)||" "||text[%RandBetween(1,44)];
      end;
      num7=%RandBetween(1,9999999999);
      word7=text[%RandBetween(1,44)];
      output test.test_no;
      output test.test_yes;
      output test.test_char;
      output test.test_bin;
   END;
run;
10
SAS Compress – Simple Write Example
• Individual File Size Results:

11:38:58 test_bin.sas7bdat.lck   4907139072
11:38:58 test_char.sas7bdat.lck  5317066752
11:38:58 test_no.sas7bdat.lck    8326414336
11:38:58 test_yes.sas7bdat.lck   5317066752
11:38:59 test_bin.sas7bdat.lck   4914216960
11:38:59 test_char.sas7bdat.lck  5324668928
11:38:59 test_no.sas7bdat.lck    8338407424
11:38:59 test_yes.sas7bdat.lck   5324734464
11:39:00 test_bin.sas7bdat       4920377344
11:39:00 test_char.sas7bdat      5331353600
11:39:00 test_no.sas7bdat        8348631040
11:39:00 test_yes.sas7bdat       5331353600
11:39:01 test_bin.sas7bdat       4,920,377,344
11:39:01 test_char.sas7bdat      5,331,353,600
11:39:01 test_no.sas7bdat        8,348,631,040
11:39:01 test_yes.sas7bdat       5,331,353,600

• We can see that the files grow together – compression is no longer a separate step
11
SAS Compress – Simple Write Example
• Individual File Results:

NOTE: The data set TEST.TEST_NO has 20000000 observations and 17 variables.
NOTE: The data set TEST.TEST_YES has 20000000 observations and 17 variables.
NOTE: Compressing data set TEST.TEST_YES decreased size by 36.14 percent.
      Compressed is 81349 pages; un-compressed would require 127389 pages.
NOTE: The data set TEST.TEST_CHAR has 20000000 observations and 17 variables.
NOTE: Compressing data set TEST.TEST_CHAR decreased size by 36.14 percent.
      Compressed is 81349 pages; un-compressed would require 127389 pages.
NOTE: The data set TEST.TEST_BIN has 20000000 observations and 17 variables.
NOTE: Compressing data set TEST.TEST_BIN decreased size by 41.06 percent.
      Compressed is 75078 pages; un-compressed would require 127389 pages.
NOTE: DATA statement used (Total process time):
      real time                     8:22.39
      user cpu time                 2:52.89
      system cpu time               26.94 seconds
      memory                        1516.40k
      OS Memory                     21152.00k
      Timestamp                     04/17/2017 12:05:00 PM
      Step Count                    265
      Switch Count                  222
      Page Faults                   0
      Page Reclaims                 426
      Page Swaps                    0
      Voluntary Context Switches    623546
      Involuntary Context Switches  128208
      Block Input Operations        0
      Block Output Operations       0
12
SAS Compress – A Warning
• With small files, compress can make the file larger
• In this case, running the example code for only 20 observations:

   Size     File
   131,072  test_no.sas7bdat
   196,608  test_yes.sas7bdat
   196,608  test_char.sas7bdat
   196,608  test_bin.sas7bdat

• Even without actual compression, the file size is larger
• SAS warns you in the log with a NOTE:

NOTE: The data set TEST.TEST_NO has 20 observations and 17 variables.
NOTE: The data set TEST.TEST_YES has 20 observations and 17 variables.
NOTE: Compressing data set TEST.TEST_YES increased size by 100.00 percent.
      Compressed is 2 pages; un-compressed would require 1 pages.
NOTE: The data set TEST.TEST_CHAR has 20 observations and 17 variables.
NOTE: Compressing data set TEST.TEST_CHAR increased size by 100.00 percent.
      Compressed is 2 pages; un-compressed would require 1 pages.
NOTE: The data set TEST.TEST_BIN has 20 observations and 17 variables.
NOTE: Compressing data set TEST.TEST_BIN increased size by 100.00 percent.
      Compressed is 2 pages; un-compressed would require 1 pages.
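The same small-file overhead effect is easy to demonstrate outside SAS. As a side note (not from the slides; file names are invented), this shell sketch shows gzip making a one-byte file larger for the analogous reason – fixed header and trailer bytes dominate tiny inputs:

```shell
# Illustration only: compression overhead can make very small files
# larger, analogous to the SAS page overhead shown above.
set -e
printf 'x' > tiny.txt                # a 1-byte file
gzip -c tiny.txt > tiny.txt.gz       # -c writes to stdout, keeping the original
wc -c tiny.txt tiny.txt.gz           # the .gz copy is larger than the original
```

The gzip container alone costs roughly 18 bytes (10-byte header plus 8-byte CRC/length trailer), so any input that small is guaranteed to grow.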
13
SAS Compress – Read Example
• Read times will vary based on compression method
• In each case, the read code is the same except for the input table
• Uncompressed Read (baseline):

libname test "/just/some/directory";
data _null_;
   set test.test_no;   /* Different datasets for each test */
   retain total 0;
   total=total+num1;
run;

NOTE: There were 20000000 observations read from the data set TEST.TEST_NO.
NOTE: DATA statement used (Total process time):
      real time        4.99 seconds
      user cpu time    1.22 seconds
      system cpu time  3.43 seconds
      memory           920.25k
      OS Memory        21152.00k
14
SAS Compress – Read Example
• Compress=Char and Compress=Yes produced similar results:

NOTE: There were 20000000 observations read from the data set TEST.TEST_YES.
NOTE: DATA statement used (Total process time):
      real time        12.56 seconds
      user cpu time    9.93 seconds
      system cpu time  2.52 seconds
      memory           1137.56k
      OS Memory        21152.00k

• Compress=Binary used more resources:

NOTE: There were 20000000 observations read from the data set TEST.TEST_BIN.
NOTE: DATA statement used (Total process time):
      real time        24.18 seconds
      user cpu time    21.75 seconds
      system cpu time  2.25 seconds
      memory           1151.34k
      OS Memory        21152.00k
15
SAS Compress – Read Example
• A quick comparison:

Example   Elapsed    System    User       Memory
None       4.99 sec  3.43 sec   1.22 sec   920.25k
Yes       12.56 sec  2.52 sec   9.93 sec  1137.56k
Char      12.68 sec  2.44 sec  10.11 sec  1137.25k
Binary    24.18 sec  2.25 sec  21.75 sec  1151.34k
16
gzip Compression
• GNU Zip (gzip and gunzip) commands
  • Are available on most systems including UNIX, Windows, and Linux (by default).
  • WinZip is available under Windows (and can be read by gzip)
  • Some UNIX zip can read WinZip files
• Significant improvement in space usage:
  • Strangely enough, you get less compression on files SAS has already compressed

            size before     size after:     size after:     size after:
                            gzip fastest    gzip default    gzip max
test_bin    4,920,377,344   3,053,723,102   2,794,141,358   2,780,018,371
test_char   5,331,353,600   2,036,590,374   1,814,911,246   1,796,243,109
test_no     8,348,631,040   2,120,174,601   1,758,621,239   1,737,218,569
test_yes    5,331,353,600   2,036,590,374   1,814,911,246   1,796,264,146
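gzip's fastest, default, and maximum settings correspond to its -1, -6, and -9 levels. A minimal shell sketch of how such a size comparison can be run (the sample file and names here are invented, not the datasets from the slides):

```shell
# Sketch: compress one file at gzip's fastest (-1), default (-6),
# and maximum (-9) levels, keeping the original for comparison.
set -e
# Generate a compressible sample file of repeated text
yes "the quick brown fox jumps over the lazy dog" | head -n 10000 > sample.txt

for level in 1 6 9; do
    gzip -c -"$level" sample.txt > "sample.$level.gz"   # -c leaves sample.txt intact
done

ls -l sample.txt sample.1.gz sample.6.gz sample.9.gz    # compare the sizes
```

On a real workload you would also wrap each gzip in `time`, since, as the next slide shows, the higher levels trade substantial CPU for the extra space savings.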
17
gzip Compression
• There Ain’t No Such Thing As A Free Lunch (TANSTAAFL: Robert A. Heinlein)
• The space savings comes at a cost:
• And a significant cost in elapsed time:
• But there are ways to reduce these costs

Elapsed time:
            Zip        Unzip      Zip        Unzip      Zip        Unzip
            fastest    fastest    default    default    max        max
test_bin    04:04.2    02:48.0    07:56.2    02:11.5    15:26.2    02:14.2
test_char   02:41.8    02:44.7    06:03.5    02:04.8    12:00.2    02:09.1
test_no     03:04.3    03:44.2    06:13.3    02:56.9    14:06.9    03:07.9
test_yes    02:44.4    02:40.7    06:10.7    02:08.7    11:25.5    02:11.1

CPU time:
            Zip      Unzip    Zip      Unzip    Zip     Unzip
            fastest  fastest  default  default  max     max      Average
test_bin    143.7    59.2     358.7    52.1     803.7   51.8     244.8
test_char    92.0    46.6     281.2    43.0     627.7   43.0     188.9
test_no     108.3    63.5     293.2    55.0     755.4   54.2     221.6
test_yes     92.4    46.4     281.5    43.4     592.5   43.0     183.2
Average     109.1    53.9     303.6    48.4     694.8   48.0
18
Compression Comparison
• Compression in any form makes sense when:
  • Space is at a premium (just about always)
  • File sizes are large
  • Processing cost is high (data isn't just being read and reported)
• SAS Compression makes more sense when:
  • Processing time is important
  • Want simplicity of code
  • Want immediate access to data
• gzip makes sense when:
  • File is infrequently used – especially when it is kept because you're afraid to get rid of it (or regulatory requirements)
  • Maximum space savings is important
  • File sizes are really large
19
Taking Advantage of Parallelism – Piping
• You can take advantage of multiple CPUs/cores to process compressed data through the use of pipes
• SAS supports piping natively for flat files
• SAS requires operating system support for "named pipes" when reading/writing datasets
• Makes use of the "Sequential Data Engine" – often referred to as the "TAPE" engine:
  • You can only write one dataset to it
  • You can only read once
  • PROC CONTENTS information is limited (no 'NOBS', for instance)
  • You can't read and write at the same time
20
Taking Advantage of Parallelism – Piping
• Let's start with an example – minor changes to the earlier Compression Write:

  libname test "/just/some/directory/base_no_fifo";
  /* In the UNIX command line, execute:
     mknod /just/some/directory/base_no_fifo p */

  %macro RandBetween(min, max);
    (&min + floor((1+&max-&min)*rand("uniform")))
  %mend;

  X "gzip < /just/some/directory/base_no_fifo > /just/some/directory/base6_no_via_fifo.sas7bdat.gz &";

  data test.test_no (compress=no drop=text1-text44);
    array text[44] $20 (/* list of 44 words or phrases */);
    format longstring $200.;
    DO indexvariable=1 TO 20000000;  /* Nothing changed here */
      output test.test_no;           /* Only creating one this time */
    END;
  run;

  /* These will not work; I'll explain why!
  proc print data=test.test_no (obs=10); run;
  proc contents data=test.test_no; run;
  */
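The write-through-a-fifo pattern above can be sketched outside SAS as well. In this hedged Python version (all names are illustrative, not from the slides), a background thread plays the role of "gzip < fifo > file.gz &" while the main thread plays the SAS data step writing into the pipe:

```python
import gzip
import os
import tempfile
import threading

workdir = tempfile.mkdtemp()
fifo = os.path.join(workdir, "base_no_fifo")
os.mkfifo(fifo)  # same role as: mknod base_no_fifo p

out_gz = os.path.join(workdir, "rows.gz")

def compressor():
    # Plays the role of: gzip < base_no_fifo > rows.gz &
    with open(fifo, "rb") as src, gzip.open(out_gz, "wb") as dst:
        for chunk in iter(lambda: src.read(65536), b""):
            dst.write(chunk)

t = threading.Thread(target=compressor)
t.start()

# Plays the role of the SAS data step writing into the pipe:
# rows flow straight into the compressor, never landing uncompressed
with open(fifo, "wb") as sink:
    for i in range(1000):
        sink.write(f"row {i}\n".encode())

t.join()

# Round-trip check: the compressed file holds everything that flowed through
with gzip.open(out_gz, "rb") as fh:
    lines = fh.read().splitlines()
```

Note that both sides must open the fifo for the data to flow; that is exactly why the commented-out PROC PRINT and PROC CONTENTS fail later in the deck.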
21
Taking Advantage of Parallelism – Piping
• Minor changes to the earlier Compression Read example:

  libname test "/just/some/directory/base_no_fifo";
  /* In the UNIX command line, execute:
     mknod /just/some/directory/base_no_fifo p */

  X "gunzip --stdout /just/some/directory/base6_no_via_fifo.sas7bdat.gz > /just/some/directory/base_no_fifo &";

  data _null_;
    set test.test_no;
    retain total 0;
    total=total+num1;
  run;
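The read side's running total (retain total; total=total+num1) maps directly onto streaming decompression. A rough Python equivalent (file name and values are illustrative) using the standard gzip module:

```python
import gzip
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "nums.gz")

# Stand-in for the compressed dataset: one num1 value per line
with gzip.open(path, "wt") as fh:
    for i in range(1, 101):
        fh.write(f"{i}\n")

# Equivalent of: data _null_; set ...; retain total 0; total=total+num1;
# gzip.open decompresses as a stream, so the full file never has to
# sit uncompressed on disk
total = 0
with gzip.open(path, "rt") as fh:
    for line in fh:
        total += int(line)

print(total)  # sum of 1..100
```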
22
Taking Advantage of Parallelism – Piping
• Timing Results:
• I've included a Direct Read for comparison purposes
• Note that SAS does not report the gzip/gunzip CPU usage:
  • Separate process
  • Separate CPU/core/thread
• There are times you can get a "nearly free" lunch.

                zip     unzip   Zip      Unzip    pipe zip  pipe unzip  pipe zip  pipe unzip
                CPU     CPU     ET       ET       CPU       CPU         ET        ET          File Size
  gzip Max      755.40  54.20   14:06.9  03:07.9  61.22     5.00        10:33.0   48.14       1,737,218,569
  gzip Default  293.20  55.00   06:13.0  02:57.0  59.20     5.11        04:50.9   01:01.9     1,758,621,239
  gzip Min      108.30  63.50   03:04.3  03:44.0  59.35     5.12        03:49.0   58.03       2,120,174,601
  cat                                             64.23     5.09        01:34.0   11.03       8,348,631,040
  Direct Read                                     61.08     4.65        02:44.5   4.99        8,348,631,040
23
Taking Advantage of Parallelism – Piping
• What are pipes?
  • Very similar to the water pipes in your home:
    • There is a pump and a faucet
    • You are able to pick the direction
    • Data can only flow one way at a time
    • Data can only flow when the pipe program is executing
  • There is a creator and a consumer:
    • In the Write example, SAS is the pump and gzip is the faucet
    • In the Read example, gzip is the pump and SAS is the faucet
  • Data is not stored in the pipe itself:
    • May be buffered a bit on disk, or may be entirely in memory
    • Won't typically cross networks
24
Taking Advantage of Parallelism – Piping
• What are pipes?
  • A named pipe requires an entry on disk
  • Created via the mknod (make node) or mkfifo (make FIFO: first-in, first-out) commands:
      mknod /just/some/directory/base_no_fifo p
      mkfifo /just/some/directory/base_no_fifo
  • Pipes (the infrastructure) remain around unless removed
  • The disk entry will look like this (using the ls -al command):
      prw-rw-r-- 1 MYID my_group_name 0 Apr 02 09:48 base_no_fifo
    • The "p" tells you this is a pipe
    • The "0" tells you it isn't holding any data
• You can also run the external command in a script or by hand:
  • Useful if the X command is not allowed
  • Will not work in a Grid environment
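The "p" type and zero size from the ls -al listing can also be checked programmatically. A small Python sketch (the path is illustrative) using the standard os and stat modules:

```python
import os
import stat
import tempfile

path = os.path.join(tempfile.mkdtemp(), "base_no_fifo")
os.mkfifo(path)  # equivalent of: mkfifo base_no_fifo

st = os.stat(path)
is_pipe = stat.S_ISFIFO(st.st_mode)  # the leading "p" in ls -al
size = st.st_size                    # 0: a pipe holds no data at rest
print(is_pipe, size)
```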
25
Taking Advantage of Parallelism – Piping
• Why won't they work? In the Pipe Compression Write I included:

  /* These will not work; I'll explain why!
  proc print data=test.test_no (obs=10); run;
  proc contents data=test.test_no; run;
  */

• In the program, libname test is a pipe
• Data flowed through that pipe and, having flowed, is no longer available:
  • At least not in this context
  • The data is still available on disk (written out by gzip)
  • But not to this program unless we reprime – and in this case, reverse – the pump:

  X "gunzip --stdout /just/some/directory/base6_no_via_fifo.sas7bdat.gz > /just/some/directory/base_no_fifo &";
  proc print data=test.test_no (obs=10); run;

  X "gunzip --stdout /just/some/directory/base6_no_via_fifo.sas7bdat.gz > /just/some/directory/base_no_fifo &";
  proc contents data=test.test_no; run;
26
Taking Advantage of Parallelism – Piping
• Common error: attempting to write multiple datasets to (or read multiple datasets from) a sequential library:

  output test.test_no test.test_yes test.test_char test.test_bin;

• Will result in an error:

  ERROR: Attempt to open two sequential members in the same sequential library.
         File TEST.TEST_YES.DATA cannot be opened.
  NOTE: The SAS System stopped processing this step because of errors.
  WARNING: The data set TEST.TEST_NO may be incomplete. When this step was
           stopped there were 0 observations and 17 variables.
27
Taking Advantage of Parallelism – Piping
• External Command Example – Write:
• UNIX/Linux commands:

  mknod mypipe p
  gzip < mypipe > input.gz &   # runs in background/parallel
  sas writepipe.sas

• The writepipe.sas program:

  libname test "mypipe";

  %macro RandBetween(min, max);
    (&min + floor((1+&max-&min)*rand("uniform")))
  %mend;

  /* X command removed */
  data test.test_no (compress=no drop=text1-text44);
    array text[44] $20 (/* list of 44 words or phrases */);
    format longstring $200.;
    DO indexvariable=1 TO 20000000;  /* Nothing changed here */
      output test.test_no;
    END;
  run;
28
Taking Advantage of Parallelism – Piping
• External Command Example – Read:
• UNIX/Linux commands:

  mknod mypipe p                        # not needed if created before
  gunzip --stdout input.gz > mypipe &   # runs in background/parallel
  sas readpipe.sas

• The readpipe.sas program:

  libname test "mypipe";
  /* X command removed */
  data _null_;
    set test.test_no;
    retain total 0;
    total=total+num1;
  run;
29
Taking Advantage of Parallelism – Piping
• No real timing differences between the external and internal (X) command approaches
• Minor advantages for external commands:
  • Can trap errors within the gzip command (a missing file, for instance)
  • Control at the shell level
  • The same SAS program is able to work for different files
• Minor disadvantages for external commands:
  • Increased code complexity
  • Both SAS and UNIX/Linux code are required
• Major disadvantage for external commands:
  • External commands are difficult to implement in a Grid environment
30
Personal Note
• I seem to learn quite a lot when working on presentations, new classes, and writings
• It wasn't until I was gathering data for this presentation that:
  • I realized that SAS compression had gotten smarter (rather than processing the file again)
  • I found that separate (external) commands would not work with pipes on a Grid. I should've realized that, since the command runs on my local (login) machine while the SAS code runs anywhere on the Grid. Although the pipe was on shared storage, the data movement was in memory only.
• In any commands in this presentation, the single and double quotation marks should be simple, not the "smart quotes" forced by Microsoft. The same applies to dashes: they should be plain minus signs, not long dashes (- versus –).
32
Filename Piping
• If we have some extra time...
• It is possible to process INFILE or FILE statements with pipes:
  • Much like the SET/DATA processing shown earlier
  • Can be used with internal or external commands
• SAS also supports the PIPE keyword on the FILENAME statement to allow piping data in/out:

  FILENAME fileref PIPE 'UNIX-command' <options>;

• Your INFILE or FILE statement will include the fileref. Whatever you INPUT or PUT in that data step will involve the specified UNIX command.
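FILENAME ... PIPE is the same popen-style idea found in other environments: the program's output stream is a command's input (or vice versa). A rough Python analogue (the command and data are illustrative, not from the slides) using subprocess:

```python
import subprocess

# Like: filename testref PIPE "cat > output.txt"; ... file testref; put ...;
# Each line we write goes to the command's stdin; here "cat" simply
# echoes it back so we can observe what flowed through the pipe.
proc = subprocess.Popen(
    ["cat"],  # any filter command could sit here (gzip, sort, ...)
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
)
out, _ = proc.communicate(b"word1 123\nword2 456\n")
print(out.decode())
```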
33
Filename Piping
• A writing example (should look fairly familiar by now):

  filename testref PIPE "cat > /just/some/directory/output.txt";

  %macro RandBetween(min, max);
    (&min + floor((1+&max-&min)*rand("uniform")))
  %mend;

  data _null_;
    file testref;
    array text[44] $20 (/* 44 words and phrases */);
    format longstring $200.;
    DO indexvariable=1 TO 200;
      word1=text[%RandBetween(1,44)];
      num1=%RandBetween(1,9999999999);
      word2=text[%RandBetween(1,44)];
      num2=rand("uniform");
      word3=text[%RandBetween(1,44)];
      word4=text[%RandBetween(1,44)];
      num3=%RandBetween(1,9999999999);
      word5=text[%RandBetween(1,44)];
      num4=rand("uniform");
      num5=%RandBetween(1,9999999999);
      word6=text[%RandBetween(1,44)];
      num6=rand("uniform");
      stringlength=%RandBetween(1,179);
      longstring=trim(text[%RandBetween(1,44)]);
      do while (length(longstring) < stringlength);
        longstring=trim(longstring)||" "||text[%RandBetween(1,44)];
      end;
      num7=%RandBetween(1,9999999999);
      word7=text[%RandBetween(1,44)];
      put word1 num1 word2 num2 longstring;
    END;
  run;
34
Filename Piping
• A reading example (should look fairly familiar by now):

  filename testref PIPE "cat /just/some/directory/output.txt";

  data out;
    infile testref;
    input name $;
  run;

  proc print data=work.out (obs=10);
  run;

• Produces the following:

  Obs    name
    1    with commas 63344454
    2    and enclose 58066050
    3    or double 882972945
    4    of an array 97957098
    5    To do 368188872 init
    6    and enclose 19271463
    7    and enclose 90992099
    8    or spaces 8165156291
    9    with commas 42546153
   10    or spaces 96397033
35
Compression References
• Indexing and Compressing SAS® Data Sets:
  http://www2.sas.com/proceedings/sugi28/003-28.pdf
• SAS® 9.2 Language Reference: Dictionary, Fourth Edition:
  http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a001288760.htm
• Programming Tricks For Reducing Storage And Work Space:
  http://www2.sas.com/proceedings/sugi27/p023-27.pdf
• How to Reduce the Disk Space Required by a SAS® Data Set:
  http://www.lexjansen.com/nesug/nesug06/io/io18.pdf
• Accessing Sequential-Format Data Libraries (pipes):
  http://technology.msb.edu/old/training/statistics/sas/books/unix/z0386494.htm
• Smokin' With UNIX Pipes (FILENAME):
  http://www2.sas.com/proceedings/sugi25/25/cc/25p103.pdf
• SAS® 9.4 Companion for UNIX Environments, Sixth Edition (X command):
  http://support.sas.com/documentation/cdl/en/hostunx/69602/PDF/default/hostunx.pdf
• Using SAS with Pipes or as a Filter under UNIX:
  https://www.linkedin.com/pulse/using-sas-pipes-filter-under-unix-david-horvath?published=t
36
To COMPRESS or Not, to COMPRESS or ZIP
The Author can be contacted at:
504 Longbotham Drive, Aston PA 19014-2502, USA
Phone: 1-610-859-8826
Email: dhorvath@cobs.com
Web: http://www.cobs.com/
LinkedIn: https://www.linkedin.com/in/dbhorvath/ (will post presentation)
All trademarks and servicemarks are the property of their respective owners.
Copyright © 2017, David B. Horvath, CCP — All Rights Reserved
37
Compression References
• My Word/Phrase array:

  array text[44] $20 ('For some' 'applications' 'it can be' 'beneficial'
    'to assign' 'initial' 'values to the' 'variables or' 'elements'
    'of an array' 'at the' 'time that' 'the array' 'is defined'
    'To do' 'this' 'enclose' 'the initial' 'values in' 'parentheses'
    'at the end' 'of the' 'ARRAY' 'statement' 'Separate' 'the values'
    'either' 'with commas' 'or spaces' 'and enclose' 'character'
    'values in' 'either single' 'or double' 'quotation' 'marks'
    'The following' 'statements' 'illustrate' 'the' 'initialization'
    'of numeric' 'and' 'character values');