20180410 sasgf2018 2454 lazy programmers xml ppt
1. SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are trademarks of their respective companies.
SGF2018 2454:
Lazy Programmers Write Self-Modifying Code
Or Dealing with XML Ordinals
#SASGF
XML - Background
• XML stands for eXtensible Markup Language
• First drafted in 1996; became a W3C Recommendation in 1998
• Consists of Markup and Content
• Markup defines the items (fields, etc.) – represented with tags
• Content is the data
• Is transportable and human readable
• If the provider supplies a schema (XSD – XML Schema Definition), you’ll have a definition of the data
• If there is no schema, you’ll only have the XML data file itself
• Easy for data provider to change layout: update XSD, add data to XML file
• An easy way to think of XML is “CSV on steroids”
• Very flexible: Advantage and Disadvantage
XML
XML File Sample
<?xml version="1.0" encoding="UTF-8" standalone="no" ?><gpx
xmlns="http://www.topografix.com/GPX/1/1"
xmlns:gpxx="http://www.garmin.com/xmlschemas/GpxExtensions/v3"
xmlns:gpxtpx="http://www.garmin.com/xmlschemas/TrackPointExtension/v2"
creator="nüvi 2370" version="1.1"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.topografix.com/GPX/1/1
http://www.topografix.com/GPX/1/1/gpx.xsd
http://www.garmin.com/xmlschemas/TrackPointExtensionv2.xsd"><metadata>
<link href="http://www.garmin.com"><text>Garmin International</text></link>
<time>2012-04-12T04:31:39Z</time></metadata><wpt lat="40.247249"
lon="-75.513001"><ele>28.72</ele><name>002</name><sym>Waypoint</sym></wpt><wpt
lat="39.764033" lon="-75.551346"><ele>61.17</ele>
• Not very helpful viewed that way
XML
XML File Sample
• Treating the file as text may be helpful
• Elements like <trk> can have repeating sub-elements like <trkseg>
• But we don’t know the “data model”
• I’m using examples from my Garmin GPS
• I don’t have to sanitize the data like I would with a file from work…
• I’m not going to teach you XML coding today
• http://en.wikipedia.org/wiki/Xml provides good background
gpx_small_xml.txt
XML
XSD File Sample
• Treating the file as text may be helpful
• Describes exactly what is expected in the XML data file
Gpx_xsd.txt
SAS XML Mapping Tool
• Free download from http://support.sas.com/kb/33/584.html
• Creates a Map for SAS to read XML as a “SAS Dataset”
• Can process an XML data file to create Map
• Works with subset of large file
• Not all elements appear for any particular key
• Tool has to guess data type of elements (like proc import CSV)
• Better to process an XSD file to create the Map
• Full definition (no “guessing” required)
• Not always available
• In easiest usage, will create keys to connect elements (“Ordinals”)
• The map is in XML format (Gpx_map.txt)
SAS XML Engine
Code to Access Data in XML
• Really very simple to use once Map is built:
filename test "/export/home/myid/gpx.xml";
filename SXLEMAP "/export/home/myid/gpx.map";
libname test xml xmlmap=SXLEMAP access=READONLY;
• And then use it much like any other SAS Dataset:
proc contents data=test._all_ ;
run;
• Or
data tableonly;
set test.member END=EOF;
output;
run;
SAS XML Engine
Code to access data in XML
• Modifying the XML file is more difficult (and not part of
this presentation):
ERROR: XMLMap= has been specified on the XML Libname
assignment. The output produced via this option will
change in upcoming releases. Correct the XML
Libname(remove XMLMap= option) and resubmit. Output
generation aborted.
• Everything you ever wanted to know about the SAS XML
engine is available at
• http://support.sas.com/rnd/base/xmlengine/index.html
Our Problem
• Vendor provided XML file:
• Limited documentation
• No internal experience
• Short timeline
• Hundreds of internal “objects” (mapped to hundreds of SAS
datasets)
• Needed to be able to “see” data to learn about it
• Once in production:
• Daily input file
• Concatenated output Datasets
Dealing with Ordinals over Multiple Files
• Every object in the SAS mapped XML file will have an
Ordinal to ensure uniqueness (gpx):
• Child objects contain their parent Ordinals (rte):
Dealing with Ordinals over Multiple Files
• Those Ordinals are only unique to a specific XML data file, not over
time.
• In order to append today’s data to yesterday’s the Ordinals need to
change.
• Simple solution:
• Find yesterday’s maximum Ordinal for each table
• Add it to today’s values
• Append to yesterday’s accumulated file
• A better solution would be to build records in your desired format
• But you have to understand the data in order to do that
• Reduces flexibility
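The simple solution above can be sketched for a single table. This is a minimal illustration, not the author's production code: it assumes the `test` (today's XML) and `output` (accumulated history) librefs used elsewhere in these slides, and it assumes a top-level `METADATA` table keyed by `METADATA_ORDINAL`. The macro variable `max_ord` and the `work.METADATA` staging table are hypothetical names.

```sas
/* Sketch only: assumes librefs TEST (today's XML) and OUTPUT (history)  */
/* are already assigned, and that METADATA is keyed by METADATA_ORDINAL. */
proc sql noprint;
  /* Yesterday's maximum ordinal; COALESCE covers an empty history table */
  select coalesce(max(METADATA_ORDINAL), 0)
    into :max_ord trimmed
    from output.METADATA;
quit;

data work.METADATA;
  set test.METADATA;
  /* Shift today's ordinals past everything already stored */
  METADATA_ORDINAL = METADATA_ORDINAL + &max_ord;
run;

/* Append the shifted rows to the accumulated history */
proc append base=output.METADATA data=work.METADATA;
run;
```

Child tables that carry `METADATA_ORDINAL` as a parent key would need the same offset applied, so that parent and child rows still match after the append.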
Dealing with Ordinals over Multiple Files
• GPX example converts to 19 SAS datasets
Dealing with Ordinals over Multiple Files
• Repeating the same code for 19 elements (XML pseudo SAS
Datasets) is a pain.
• Can you imagine doing it for hundreds?
• I’m lazy and I make mistakes.
• I’d really rather not copy & paste & edit the same code 19 (or 190)
times.
• I’d rather not have to repeat the process every time the file
changes (or a new element appears – no XSD)
• Since the same process applies for every one of the elements, why
not let code do the work for me?
Using SAS Code to Generate SAS Code
Self-modifying Code
• The mechanism for creating self-modifying code in SAS is rather
simple, since SAS is an interpreted language. You use the FILE and PUT
statements and %include:
filename sourcecd "gpxxml_generated&DATADATE..sas";
data _null_;
file sourcecd;
set temp.maxvals end=EOF;
/* use put statements */
run;
%include sourcecd;
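A minimal, self-contained sketch of the same FILE / PUT / %include pattern follows. It is an illustration rather than the presentation's code: the driver table `work.tables` and the fileref `gencode` are hypothetical names, and it generates one PROC PRINT per row against the SASHELP sample tables instead of touching XML data.

```sas
/* Sketch: generate one PROC PRINT per row of a small driver table, then run it */
data work.tables;
  length memname $32;
  input memname $;
  datalines;
class
cars
;

filename gencode temp;          /* temporary file that holds the generated code */

data _null_;
  file gencode;
  set work.tables;
  /* Each PUT writes one line of SAS source into the generated file */
  put "proc print data=sashelp." memname "(obs=3); run;";
run;

%include gencode / source2;     /* SOURCE2 echoes the generated code to the log */
filename gencode clear;
```

Writing to a `temp` fileref keeps the sketch free of paths; writing to a named, dated file as the slides do leaves a record of exactly what ran each day.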
Using SAS Code to Generate SAS Code
Creating Code File
• The maxvals dataset contains the maximum ordinals from yesterday.
• I create a file that contains SAS code, and by including it, cause execution of that
new code. In detail:
filename sourcec2 "xml_generated&DATADATE._2.sas";
data _null_;
file sourcec2;
set temp.maxvals end=EOF;
/* Emit the libname once, at the top of the generated file */
if (_n_ = 1) then put "libname temp 'temp';";
/* One DATA step per table: shift today's ordinals by yesterday's maximum */
put "data temp." tableonly "; set " member "END=EOF;";
if prikey ne "" and prival ne . then put prikey " = " prikey " + " prival ";";
if parkey ne "" and parval ne . then put parkey " = " parkey " + " parval ";";
put "output;";
put "run;";
/* Append the shifted rows to the accumulated history */
put "proc datasets; append base=output." tableonly " data=temp." tableonly "; run;";
/* PROC DATASETS is interactive, so close the last one explicitly */
if EOF then put "quit;";
run;
Using SAS Code to Generate SAS Code
The Generated Code
• The resulting code snippet defined in sourcec2:
/* Libname test and output defined in main program */
libname temp '/export/home/myid/temp';
data temp.AUTHOR ;
set test.AUTHOR END=EOF;
METADATA_ORDINAL = METADATA_ORDINAL + 257 ;
output;
run;
proc datasets;
append base=output.AUTHOR data=temp.AUTHOR ;
run;
data temp.BOUNDS ;
set test.BOUNDS END=EOF;
METADATA_ORDINAL = METADATA_ORDINAL + 257 ;
output;
run;
proc datasets;
append base=output.BOUNDS data=temp.BOUNDS ;
run;
Using SAS Code to Generate SAS Code
• I also could have created a full program and
executed with a new sas command.
• This code is executed by including it back into the
main program.
• Each day, the generated code is a little different
because the maximum ordinals (the value 257 in the
example above) change each day.
• That way, the history contains unique ordinals over
time.
Afterthoughts
• Used dynamic code rather than merging the maximum ordinal into
each record or macro – fewer passes through the data
• Encountered issues with the raw data – had to deal with invalid
tags that broke the XML engine.
• One disadvantage of the engine is the need to parse the entire file
for each object (200 objects, 200 passes through the file)
• We resorted to rsubmit to process 10 sets in parallel
• Seriously impacted I/O capabilities of system
• It would have been better to save off the maximum ordinal at each
stage rather than getting it from history.
• Had concerns about recovery and building new objects
Afterthoughts
• I used proc contents; now that I know more about them
I could have used the dictionary tables instead
• Although these examples were written with version 1 of
the XML engine, when we moved to version 2, the only
code change (besides the map) was the following line:
libname test xmlv2 xmlmap=SXLEMAP access=readonly;
• Set nobs= does not work with XML.
• It compiles (no warning/error) and executes but returns the value
zero.
• I learned this while researching a presentation for PhilaSUG.
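Two of the afterthoughts above can be sketched together: looking up ordinal columns through the dictionary tables rather than proc contents, and counting rows without relying on nobs=. This is a hedged illustration only. It assumes the `test` libref from the earlier slides is assigned, that ordinal column names end in `ORDINAL` (as `METADATA_ORDINAL` does), and that the dictionary tables populate for an XML-engine library, which should be confirmed on your release. The names `work.ordinal_cols` and `row_count` are hypothetical.

```sas
/* Sketch only: assumes the TEST libref from the slides is assigned */
proc sql;
  /* Ordinal columns per table; dictionary LIBNAME values are uppercase */
  create table work.ordinal_cols as
  select memname, name
    from dictionary.columns
   where libname = 'TEST'
     and upcase(name) like '%ORDINAL'
   order by memname, name;
quit;

proc sql noprint;
  /* COUNT(*) reads the rows, so it does not rely on NOBS= */
  select count(*) into :row_count trimmed
    from test.AUTHOR;
quit;
%put NOTE: test.AUTHOR has &row_count rows.;
```

Counting this way costs a full pass through the XML file, which matters given the one-pass-per-object behavior noted earlier.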
Questions
and
Answers
Wrap Up
Using SAS Code to Generate SAS Code
Example Files
• Walking through the code:
• We can also look at the log:
• The generated SAS code (in case anyone cares):
xml_process_generator.txt
xml_process_generator.txt
gpxxml_generated20120404.txt gpxxml_generated20120404_2.txt
Contact Information
The Speaker can be contacted at:
David B. Horvath, CCP
504 Longbotham Drive, Aston PA 19014-2502, USA
Phone: 1-610-859-8826
Email: dhorvath@cobs.com
Web: http://www.cobs.com/
Presenter
David B. Horvath, CCP, MS
David is an IT professional who has worked with various platforms since the
1980s, using a variety of development and analysis tools.
He has previously presented sessions at PhilaSUG and SESUG, and has
presented workshops and seminars in Australia, France, the US, Canada,
and Oxford, England (about the British author Nevil Shute) for various
organizations.
He holds an undergraduate degree in Computer and Information Sciences from
Temple University and a master's in Organizational Dynamics from UPENN.
He achieved the Certified Computing Professional designation with honors.
Most of his career has been in consulting (although recently he has been
in-house) in the Philadelphia, PA area.
He has several books to his credit (none directly SAS related) and is an
Adjunct Instructor at the University of Phoenix covering IT topics. He is
currently working in Data Analytics Infrastructure for a regional bank.