20120606 Lazy Programmers Write Self-Modifying Code /or/ Dealing with XML Ordinals

Lazy Programmers Write Self-Modifying Code
or
Dealing with XML Ordinals
David B. Horvath, CCP, MS
PhilaSUG Spring 2012 Meeting

Contact Information
Copyright © 2012, David B. Horvath, CCP — All Rights Reserved
The Author can be contacted at:
504 Longbotham Drive, Aston PA 19014-2502, USA
Phone: 1-610-859-8826
Email: dhorvath@cobs.com
Web: http://www.cobs.com/
All trademarks and servicemarks are the
property of their respective owners.

Abstract
• The XML engine within SAS is very powerful but it does convert every object
into a SAS dataset with generated keys to implement the parent/child
relationships between these objects. Those keys (Ordinals in SAS-speak) are
guaranteed to be unique within a specific XML file. However, they restart at 1
with each file. When concatenating the individual tables together, those keys
are no longer unique.
• We received an XML file with over 110 objects, resulting in over 110 SAS
datasets that our internal customer wanted concatenated across multiple days.
Rather than copy and paste the code to handle this process 110+ times,
knowing that I would make mistakes along the way and that the objects would
change over time, I created SAS code to create the SAS code to handle the
XML. I consider myself a Lazy Programmer.
• As the classic "Real Programmers…" sheet tells us, Real Programmers are
Lazy.
• This session reviews XML (briefly), SAS XML Mapper, SAS XML Engine,
techniques for handling the Ordinals over multiple days, and finally discusses a
technique for using SAS code to generate SAS code.

Introductions
• My Background
• XML
• SAS XML Mapping Tool
• SAS XML Engine (Code to access data in XML)
• Our Problem
• Dealing with Ordinals over Multiple Files
• Using SAS Code to Generate SAS Code

My Background
• Base SAS on Mainframe, UNIX, and PC Platforms
• SAS is primarily an ETL tool or Programming Language for me
• My background is IT – I am not a modeler
• My first SUG presentation
• Not my first User Group presentation – presented workshops and
seminars in Australia, France, the US, and Canada.
• Undergraduate: Computer and Information Sciences, Temple Univ.
• Graduate: Organizational Dynamics, UPenn
• Most of my career was in consulting
• Have written several books (none SAS-related)
• Adjunct Instructor covering IT topics.

XML - Background
• XML stands for eXtensible Markup Language
• Originally created in 1996
• Consists of Markup and Content
• Markup defines the items (fields, etc.) – represented with tags
• Content is the data
• Is transportable and human readable
• Ideally the provider supplies a definition (XSD – XML Schema Definition)
that the data file can be validated against
• Without one, you’ll only have the XML data file itself
• Easy for data provider to change layout: update XSD, add data to
XML file
• An easy way to think of XML is “CSV on steroids”
• Very flexible: Advantage and Disadvantage

XML – XML File Sample
<?xml version="1.0" encoding="UTF-8" standalone="no" ?><gpx
xmlns="http://www.topografix.com/GPX/1/1"
xmlns:gpxx="http://www.garmin.com/xmlschemas/GpxExtensions/v3"
xmlns:gpxtpx="http://www.garmin.com/xmlschemas/TrackPointExtension/v2"
creator="nüvi 2370" version="1.1"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.topografix.com/GPX/1/1
http://www.topografix.com/GPX/1/1/gpx.xsd
http://www.garmin.com/xmlschemas/TrackPointExtensionv2.xsd"><metadata><link
href="http://www.garmin.com"><text>Garmin
International</text></link><time>2012-04-12T04:31:39Z</time></metadata><wpt
lat="40.247249" lon="-75.513001"><ele>28.72</ele><name>002</name><sym>Waypoint</sym></wpt><wpt
lat="39.764033" lon="-75.551346"><ele>61.17</ele>
• Not very helpful viewed that way

XML – XML File Sample
• Treating the file as text may be helpful
• Elements like <trk> can have repeating sub-elements like <trkseg>
• But we don’t know the “data model”
• I’m using examples from my Garmin GPS
• I don’t have to sanitize the data like I would with a file from work…
• I’m not going to teach you XML coding today
• http://en.wikipedia.org/wiki/Xml provides good background
gpx_small_xml.txt

XML – XSD File Sample
• Treating the file as text may be helpful
• Describes exactly what is expected in the XML data file
Gpx_xsd.txt

SAS XML Mapping Tool
• Free download from http://support.sas.com/kb/33/584.html
• Creates a Map for SAS to read XML as a “SAS Dataset”
• Can process an XML data file to create Map
• Works with subset of large file
• Not all elements appear for any particular key
• Tool has to guess data type of elements (like proc import CSV)
• Better to process an XSD file to create the Map
• Full definition (no “guessing” required)
• Not always available
• In easiest usage, will create keys to connect elements (“Ordinals”)
• The map is in XML format
Gpx_map.txt

SAS XML Engine (Code to access data in XML)
• Really very simple to use once Map is built:
filename CDAtest2 "/export/home/fw03606/gpx.xml";
filename SXLEMAP "/export/home/fw03606/gpx.map";
libname CDAtest2 xml xmlmap=SXLEMAP access=READONLY;
• And then use it much like any other SAS Dataset:
proc contents data=CDAtest2._all_ ;
run;
• Or
data tableonly;
set CDAtest2.member END=EOF;
output;
run;

SAS XML Engine (Code to access data in XML)
• Modifying the XML file is more difficult (and not part of this
presentation):
ERROR: XMLMap= has been specified on the XML Libname
assignment. The output produced via this option will
change in upcoming releases. Correct the XML
Libname(remove XMLMap= option) and resubmit. Output
generation aborted.
• Everything you ever wanted to know about the SAS XML engine is
available at
• http://support.sas.com/rnd/base/xmlengine/index.html

Our Problem
• Vendor provided XML file:
• Limited documentation
• No internal experience
• Short timeline
• Hundreds of internal “objects” (mapped to hundreds of SAS datasets)
• Needed to be able to “see” data to learn about it
• Once in production:
• Daily input file
• Concatenated output Datasets

Dealing with Ordinals over Multiple Files
• Every object in the SAS mapped XML file will have an Ordinal to
ensure uniqueness (gpx):
Variable       Type   Len   Format   Informat
creator        Char   32    $32.     $32.
extensions     Char   32    $32.     $32.
gpx            Char   32    $32.     $32.
gpx_ORDINAL    Num    8     F8.      F8.
version        Char   32    $32.     $32.
• Child objects contain their parent Ordinals (rte):
Variable       Type   Len   Format   Informat
cmt            Char   32    $32.     $32.
desc           Char   32    $32.     $32.
extensions     Char   32    $32.     $32.
gpx_ORDINAL    Num    8     F8.      F8.
name           Char   32    $32.     $32.
number         Num    8     F8.      F8.
rte            Char   32    $32.     $32.
rte_ORDINAL    Num    8     F8.      F8.
src            Char   32    $32.     $32.
type           Char   32    $32.     $32.
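
Because rte carries gpx_ORDINAL, a child row can be joined back to its parent. The sketch below is only an illustration: the WORK tables and their rows are invented stand-ins, and in practice they would first be copied out of the XML libref with a DATA step, since the XML engine reads sequentially.

```sas
/* Stand-in rows, invented for illustration only. In practice:      */
/*   data work.gpx; set CDAtest2.gpx; run;   (and likewise for rte) */
data work.gpx;
  length creator version $32;
  gpx_ORDINAL = 1; creator = 'nuvi 2370'; version = '1.1'; output;
run;

data work.rte;
  length name $32;
  gpx_ORDINAL = 1; rte_ORDINAL = 1; name = 'Route A'; output;
  gpx_ORDINAL = 1; rte_ORDINAL = 2; name = 'Route B'; output;
run;

/* Join each route (child) to its gpx record (parent) on the parent Ordinal */
proc sql;
  create table work.rte_with_gpx as
  select r.rte_ORDINAL, r.name, g.gpx_ORDINAL, g.creator, g.version
    from work.rte as r
         inner join work.gpx as g
         on r.gpx_ORDINAL = g.gpx_ORDINAL;
quit;
```

The same join stops being reliable once several days of data are concatenated without adjusting the Ordinals, because the key values repeat from file to file.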

• Those Ordinals are only unique to a specific XML data file, not over
time.
• In order to append today’s data to yesterday’s, the Ordinals need to
change.
• Simple solution:
• Find yesterday’s maximum Ordinal for each table
• Add it to today’s values
• Append to yesterday’s accumulated file
• A better solution would be to build records in your desired format
• But you have to understand the data in order to do that
• Reduces flexibility
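
A minimal sketch of the simple solution for one parent/child pair, not the code from the presentation. The table names (gpx_accum, rte_accum, gpx_today, rte_today) and their rows are invented stand-ins. The point it tries to show is that a child's parent key must be shifted by the parent table's offset, and that all maxima are taken before anything is appended.

```sas
/* Stand-in data, invented for illustration only */
data work.gpx_accum;
  gpx_ORDINAL = 1; output;
  gpx_ORDINAL = 2; output;
run;
data work.rte_accum;
  gpx_ORDINAL = 1; rte_ORDINAL = 1; output;
  gpx_ORDINAL = 2; rte_ORDINAL = 2; output;
  gpx_ORDINAL = 2; rte_ORDINAL = 3; output;
run;
data work.gpx_today;
  gpx_ORDINAL = 1; output;
run;
data work.rte_today;
  gpx_ORDINAL = 1; rte_ORDINAL = 1; output;
  gpx_ORDINAL = 1; rte_ORDINAL = 2; output;
run;

/* 1. Find yesterday's maximum Ordinal for each table (0 if the table is empty) */
proc sql noprint;
  select coalesce(max(gpx_ORDINAL), 0) format=20. into :max_gpx
    from work.gpx_accum;
  select coalesce(max(rte_ORDINAL), 0) format=20. into :max_rte
    from work.rte_accum;
quit;

/* 2. Add the offsets to today's values */
data work.gpx_shifted;
  set work.gpx_today;
  gpx_ORDINAL = gpx_ORDINAL + &max_gpx;
run;
data work.rte_shifted;
  set work.rte_today;
  rte_ORDINAL = rte_ORDINAL + &max_rte;  /* the table's own key          */
  gpx_ORDINAL = gpx_ORDINAL + &max_gpx;  /* parent key: parent's offset  */
run;

/* 3. Append to yesterday's accumulated tables */
proc append base=work.gpx_accum data=work.gpx_shifted;
run;
proc append base=work.rte_accum data=work.rte_shifted;
run;
```

With these stand-in rows, today's single gpx record becomes gpx_ORDINAL 3, and its two routes become rte_ORDINAL 4 and 5, both pointing at gpx_ORDINAL 3.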

• GPX example converts to 19 SAS datasets
Name Member Type
AUTHOR DATA
BOUNDS DATA
COPYRIGHT DATA
EMAIL DATA
GPX DATA
LINK DATA
LINK1 DATA
LINK2 DATA
LINK3 DATA
LINK4 DATA
LINK5 DATA
LINK6 DATA
METADATA DATA
RTE DATA
RTEPT DATA
TRK DATA
TRKPT DATA
TRKSEG DATA
WPT DATA

• Repeating the same code for 19 elements (XML pseudo SAS
Datasets) is a pain.
• Can you imagine doing it for hundreds?
• I’m lazy and I make mistakes.
• I’d really rather not copy & paste & edit the same code 19 (or 190)
times.
• I’d rather not have to repeat the process every time the file changes
(or a new element appears – no XSD)
• Since the same process applies for every one of the elements, why
not let code do the work for me?

Using SAS Code to Generate SAS Code
• Mechanism is fairly simple: File, Put, and %include:
filename sourcecd "gpxxml_generated&DATADATE..sas";
data _null_;
file sourcecd;
set temp.prikeys end=EOF;
/* use put statements */
run;
%include sourcecd;
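
To make the File, Put, and %include pattern concrete, here is a small sketch under stated assumptions; it is not the generator from the presentation. The control table work.prikeys and its columns memname and keyvar are hypothetical, the two data tables are invented stand-ins, and a TEMP fileref replaces the dated .sas file to keep the example short. Each control row produces one generated PROC SQL step that stores a table's maximum Ordinal in a macro variable.

```sas
/* Stand-in tables, invented for illustration only */
data work.gpx;
  gpx_ORDINAL = 1; output;
  gpx_ORDINAL = 2; output;
run;
data work.rte;
  gpx_ORDINAL = 1; rte_ORDINAL = 1; output;
  gpx_ORDINAL = 2; rte_ORDINAL = 2; output;
  gpx_ORDINAL = 2; rte_ORDINAL = 3; output;
run;

/* Hypothetical control table: one row per table and its own Ordinal column */
data work.prikeys;
  length memname keyvar $32;
  input memname keyvar;
  datalines;
gpx gpx_ORDINAL
rte rte_ORDINAL
;
run;

/* TEMP fileref is a simplification of the dated file name used on the slide */
filename sourcecd temp;

data _null_;
  file sourcecd;
  set work.prikeys;
  /* Each PUT writes one line of generated code. +(-1) backs the pointer  */
  /* up over the blank that list output places after a variable's value.  */
  put 'proc sql noprint;';
  put '  select coalesce(max(' keyvar +(-1) '), 0) format=20.';
  put '    into :max_' memname;
  put '    from work.' memname +(-1) ';';
  put 'quit;';
run;

/* Run the generated code; SOURCE2 echoes the included lines to the log */
%include sourcecd / source2;
```

Adding a table then means adding a row to the control table rather than copying and editing another block of code.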

Using SAS Code to Generate SAS Code
• Walking through the code:
• We can also look at the log:
• The generated SAS code (in case anyone cares):
xml_process_generator.txt
xml_p_g_ppt_log.txt
gpxxml_generated20120404.txt gpxxml_generated20120404_2.txt

Wrap Up
Questions and Answers
