Lazy Programmers Write Self-Modifying Code
or
Dealing with XML Ordinals
David B. Horvath, CCP, MS
PhilaSUG Spring 2012 Meeting
Contact Information
Copyright © 2012, David B. Horvath, CCP — All Rights Reserved
The Author can be contacted at:
504 Longbotham Drive, Aston PA 19014-2502, USA
Phone: 1-610-859-8826
Email: dhorvath@cobs.com
Web: http://www.cobs.com/
All trademarks and servicemarks are the
property of their respective owners.
Abstract
• The XML engine within SAS is very powerful, but it converts every object
into a SAS dataset and generates keys to implement the parent/child
relationships between those objects. Those keys (Ordinals in SAS-speak) are
guaranteed to be unique within a specific XML file. However, they restart at 1
with each file, so when the individual tables from several files are
concatenated, the keys are no longer unique.
• We received an XML file with over 110 objects, resulting in over 110 SAS
datasets, which our internal customer wanted concatenated across multiple days.
Rather than copying and pasting the code to handle this process 110+ times,
knowing that I would make mistakes along the way and that the objects would
also change along the way, I created SAS code to create the SAS code to
handle the XML. I consider myself a Lazy Programmer.
• As the classic "Real Programmers…" sheet tells us, Real Programmers are
Lazy.
• This session reviews XML (briefly), SAS XML Mapper, SAS XML Engine,
techniques for handling the Ordinals over multiple days, and finally discusses a
technique for using SAS code to generate SAS code.
Introductions
• My Background
• XML
• SAS XML Mapping Tool
• SAS XML Engine (Code to access data in XML)
• Our Problem
• Dealing with Ordinals over Multiple Files
• Using SAS Code to Generate SAS Code
My Background
• Base SAS on Mainframe, UNIX, and PC Platforms
• SAS is primarily an ETL tool or Programming Language for me
• My background is IT – I am not a modeler
• My first SUG presentation
• Not my first User Group presentation – presented workshops and
seminars in Australia, France, the US, and Canada.
• Undergraduate: Computer and Information Sciences, Temple Univ.
• Graduate: Organizational Dynamics, UPenn
• Most of my career was in consulting
• Have written several books (none SAS-related)
• Adjunct Instructor covering IT topics.
XML - Background
• XML stands for eXtensible Markup Language
• First drafted in 1996; a W3C Recommendation since 1998
• Consists of Markup and Content
• Markup defines the items (fields, etc.) – represented with tags
• Content is the data
• Is transportable and human readable
• If the provider supplies a schema (XSD – XML Schema Definition), you’ll have a definition to validate against
• Without a schema, you’ll only have the XML data file itself
• Easy for data provider to change layout: update XSD, add data to
XML file
• An easy way to think of XML is “CSV on steroids”
• Very flexible: Advantage and Disadvantage
XML – XML File Sample
<?xml version="1.0" encoding="UTF-8" standalone="no" ?><gpx
xmlns="http://www.topografix.com/GPX/1/1"
xmlns:gpxx="http://www.garmin.com/xmlschemas/GpxExtensions/v3"
xmlns:gpxtpx="http://www.garmin.com/xmlschemas/TrackPointExtensio
n/v2" creator="nüvi 2370" version="1.1"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.topografix.com/GPX/1/1
http://www.topografix.com/GPX/1/1/gpx.xsd
http://www.garmin.com/xmlschemas/TrackPointExtensionv2.xsd"><meta
data><link href="http://www.garmin.com"><text>Garmin
International</text></link><time>2012-04-
12T04:31:39Z</time></metadata><wpt lat="40.247249" lon="-
75.513001"><ele>28.72</ele><name>002</name><sym>Waypoint</sym></w
pt><wpt lat="39.764033" lon="-75.551346"><ele>61.17</ele>
• Not very helpful viewed that way
XML – XML File Sample
• Treating the file as text may be helpful
• Elements like <trk> can have repeating sub-elements like <trkseg>
• But we don’t know the “data model”
• I’m using examples from my Garmin GPS
• I don’t have to sanitize the data like I would with a file from work…
• I’m not going to teach you XML coding today
• http://en.wikipedia.org/wiki/Xml provides good background
gpx_small_xml.txt
XML – XSD File Sample
• Treating the file as text may be helpful
• Describes exactly what is expected in the XML data file
Gpx_xsd.txt
SAS XML Mapping Tool
• Free download from http://support.sas.com/kb/33/584.html
• Creates a Map for SAS to read XML as a “SAS Dataset”
• Can process an XML data file to create Map
• Works with subset of large file
• Not all elements appear for any particular key
• Tool has to guess data type of elements (like proc import CSV)
• Better to process an XSD file to create the Map
• Full definition (no “guessing” required)
• Not always available
• In easiest usage, will create keys to connect elements (“Ordinals”)
• The map is in XML format
Gpx_map.txt
SAS XML Engine (Code to access data in XML)
• Really very simple to use once Map is built:
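/* XML data file; the fileref has the same name as the libref below */
filename CDAtest2 "/export/home/fw03606/gpx.xml";
/* Map file built with the SAS XML Mapping Tool */
filename SXLEMAP "/export/home/fw03606/gpx.map";
/* Read-only libref that uses the XML engine with the map */
libname CDAtest2 xml xmlmap=SXLEMAP access=READONLY;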
• And then use it much like any other SAS Dataset:
proc contents data=CDAtest2._all_ ;
run;
• Or
data tableonly;
set CDAtest2.member END=EOF;
output;
run;
SAS XML Engine (Code to access data in XML)
• Modifying the XML file is more difficult (and not part of this
presentation):
ERROR: XMLMap= has been specified on the XML Libname
assignment. The output produced via this option will
change in upcoming releases. Correct the XML
Libname(remove XMLMap= option) and resubmit. Output
generation aborted.
• Everything you ever wanted to know about the SAS XML engine is
available at
• http://support.sas.com/rnd/base/xmlengine/index.html
Our Problem
• Vendor provided XML file:
• Limited documentation
• No internal experience
• Short timeline
• Hundreds of internal “objects” (mapped to hundreds of SAS datasets)
• Needed to be able to “see” data to learn about it
• Once in production:
• Daily input file
• Concatenated output Datasets
Dealing with Ordinals over Multiple Files
• Every object in the SAS mapped XML file will have an Ordinal to
ensure uniqueness (gpx):
Variable     Type  Len  Format  Informat
creator      Char   32  $32.    $32.
extensions   Char   32  $32.    $32.
gpx          Char   32  $32.    $32.
gpx_ORDINAL  Num     8  F8.     F8.
version      Char   32  $32.    $32.
• Child objects contain their parent Ordinals (rte):
Variable     Type  Len  Format  Informat
cmt          Char   32  $32.    $32.
desc         Char   32  $32.    $32.
extensions   Char   32  $32.    $32.
gpx_ORDINAL  Num     8  F8.     F8.
name         Char   32  $32.    $32.
number       Num     8  F8.     F8.
rte          Char   32  $32.    $32.
rte_ORDINAL  Num     8  F8.     F8.
src          Char   32  $32.    $32.
type         Char   32  $32.    $32.
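To make the parent/child link concrete, here is a minimal sketch of joining a child table back to its parent on the parent Ordinal. The rows are made up for illustration, and the tables are WORK stand-ins (an assumption) rather than the XML libref itself, so the sketch runs on its own.

```sas
/* Hypothetical sample rows standing in for the XML-mapped gpx table */
data work.gpx;
  input gpx_ORDINAL creator :$32. version :$32.;
  datalines;
1 nuvi2370 1.1
;
run;

/* Hypothetical sample rows standing in for the XML-mapped rte table */
data work.rte;
  input gpx_ORDINAL rte_ORDINAL name :$32.;
  datalines;
1 1 HomeToWork
1 2 WorkToHome
;
run;

/* Join each route (child) to its gpx record (parent) on the parent Ordinal */
proc sql;
  create table work.rte_with_parent as
  select r.rte_ORDINAL, r.name, g.creator, g.version
  from work.rte as r
       inner join work.gpx as g
       on r.gpx_ORDINAL = g.gpx_ORDINAL
  order by r.rte_ORDINAL;
quit;
```

Both routes pick up the creator and version of the single gpx record because they carry gpx_ORDINAL = 1.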
Dealing with Ordinals over Multiple Files
• Those Ordinals are only unique to a specific XML data file, not over
time.
• To append today’s data to yesterday’s, the Ordinals need to
change.
• Simple solution:
• Find yesterday’s maximum Ordinal for each table
• Add it to today’s values
• Append to yesterday’s accumulated file
• A better solution would be to build records in your desired format
• But you have to understand the data in order to do that
• Reduces flexibility
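The simple solution can be sketched for one child table as follows. This is an illustration under stated assumptions, not the presenter's production code: the dataset names (gpx_all, rte_all, rte_today, rte_shifted), the macro variable names, and the sample rows are all hypothetical.

```sas
/* Hypothetical accumulated parent table (yesterday and earlier) */
data work.gpx_all;
  input gpx_ORDINAL creator :$32.;
  datalines;
1 nuvi2370
2 nuvi2370
;
run;

/* Hypothetical accumulated child table */
data work.rte_all;
  input gpx_ORDINAL rte_ORDINAL name :$32.;
  datalines;
1 1 HomeToWork
1 2 WorkToHome
2 3 Errand
;
run;

/* Hypothetical rows from today's XML file; Ordinals restart at 1 */
data work.rte_today;
  input gpx_ORDINAL rte_ORDINAL name :$32.;
  datalines;
1 1 Airport
1 2 Hotel
;
run;

/* Step 1: yesterday's maximum Ordinal, taken from the table that owns it */
proc sql noprint;
  select coalesce(max(gpx_ORDINAL), 0) format=20. into :max_gpx
    from work.gpx_all;
  select coalesce(max(rte_ORDINAL), 0) format=20. into :max_rte
    from work.rte_all;
quit;

/* Step 2: add the maximums to today's own and parent Ordinals */
data work.rte_shifted;
  set work.rte_today;
  gpx_ORDINAL = gpx_ORDINAL + &max_gpx;
  rte_ORDINAL = rte_ORDINAL + &max_rte;
run;

/* Step 3: append to the accumulated table */
proc append base=work.rte_all data=work.rte_shifted;
run;
```

With these sample rows, today's routes are appended as (3, 4) and (3, 5). Today's gpx rows would need the same max_gpx offset, captured before anything is appended, so that parent and child keys stay in step.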
Dealing with Ordinals over Multiple Files
• GPX example converts to 19 SAS datasets
Name Member Type
AUTHOR DATA
BOUNDS DATA
COPYRIGHT DATA
EMAIL DATA
GPX DATA
LINK DATA
LINK1 DATA
LINK2 DATA
LINK3 DATA
LINK4 DATA
LINK5 DATA
LINK6 DATA
METADATA DATA
RTE DATA
RTEPT DATA
TRK DATA
TRKPT DATA
TRKSEG DATA
WPT DATA
Dealing with Ordinals over Multiple Files
• Repeating the same code for 19 elements (XML pseudo SAS
Datasets) is a pain.
• Can you imagine doing it for hundreds?
• I’m lazy and I make mistakes.
• I’d really rather not copy & paste & edit the same code 19 (or 190)
times.
• I’d rather not have to repeat the process every time the file changes
(or a new element appears – no XSD)
• Since the same process applies for every one of the elements, why
not let code do the work for me?
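One way to let code do the work is to ask SAS which tables carry which Ordinal columns instead of listing them by hand. The sketch below queries DICTIONARY.COLUMNS for two WORK stand-in tables; the stand-ins, the output table name ordinal_cols, and the name test (any column containing _ORDINAL) are assumptions for illustration, and the presentation does not say how its own driver table was built.

```sas
/* Hypothetical stand-ins for two of the XML-mapped tables */
data work.gpx;
  gpx_ORDINAL = 1;
  creator = 'nuvi2370';
run;

data work.rte;
  gpx_ORDINAL = 1;
  rte_ORDINAL = 1;
  name = 'HomeToWork';
run;

/* List every Ordinal column, table by table, from the SAS metadata */
proc sql;
  create table work.ordinal_cols as
  select memname, name
  from dictionary.columns
  where libname = 'WORK'
    and memname in ('GPX', 'RTE')
    and index(upcase(name), '_ORDINAL') > 0
  order by memname, name;
quit;
```

The result has one row per table and Ordinal column (gpx_ORDINAL for GPX; gpx_ORDINAL and rte_ORDINAL for RTE), which is the kind of list a code generator can loop over.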
Using SAS Code to Generate SAS Code
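• The mechanism is fairly simple: FILE, PUT, and %INCLUDE:
/* File that receives the generated program; &DATADATE is set elsewhere */
filename sourcecd "gpxxml_generated&DATADATE..sas";
data _null_;
file sourcecd;              /* PUT output goes to the generated program */
set temp.prikeys end=EOF;   /* driver dataset, read row by row */
/* use put statements */
run;
%include sourcecd;          /* run the generated code */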
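A small self-contained sketch of the same FILE, PUT, and %INCLUDE pattern follows. The driver table work.prikeys, its columns memname and keyvar, the fileref gencode, and the sample tables are all hypothetical; the layout of the presenter's temp.prikeys is not shown on the slide.

```sas
/* Hypothetical driver table: one row per table with its own Ordinal column */
data work.prikeys;
  length memname $32 keyvar $32;
  input memname keyvar;
  datalines;
GPX gpx_ORDINAL
RTE rte_ORDINAL
;
run;

/* Hypothetical stand-ins for two XML-mapped tables */
data work.gpx;
  gpx_ORDINAL = 1;
run;

data work.rte;
  rte_ORDINAL = 1; output;
  rte_ORDINAL = 2; output;
run;

/* Temporary file that holds the generated program */
filename gencode temp;

/* Write one PROC SQL step per driver row */
data _null_;
  file gencode;
  set work.prikeys;
  put 'proc sql noprint;';
  put '  select max(' keyvar ') into :max_' memname ' from work.' memname ';';
  put 'quit;';
run;

/* Run the generated code; SOURCE2 echoes it to the log */
%include gencode / source2;
%put max_GPX=&max_GPX max_RTE=&max_RTE;
```

Adding a table to the process then means adding a row to the driver table rather than copying and editing another block of code.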
Using SAS Code to Generate SAS Code
• Walking through the code: xml_process_generator.txt
• We can also look at the log: xml_p_g_ppt_log.txt
• The generated SAS code (in case anyone cares): gpxxml_generated20120404.txt,
gpxxml_generated20120404_2.txt
Wrap Up
Questions and Answers

The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
 

20120606 Lazy Programmers Write Self-Modifying Code /or/ Dealing with XML Ordinals

  • 1. Lazy Programmers Write Self-Modifying Code or Dealing with XML Ordinals David B. Horvath, CCP, MS PhilaSUG Spring 2012 Meeting
  • 2. Contact Information Copyright © 2012, David B. Horvath, CCP — All Rights Reserved The Author can be contacted at: 504 Longbotham Drive, Aston PA 19014-2502, USA Phone: 1-610-859-8826 Email: dhorvath@cobs.com Web: http://www.cobs.com/ All trademarks and servicemarks are the property of their respective owners.
  • 3. Abstract • The XML engine within SAS is very powerful but it does convert every object into a SAS dataset with generated keys to implement the parent/child relationships between these objects. Those keys (Ordinals in SAS-speak) are guaranteed to be unique within a specific XML file. However, they restart at 1 with each file. When concatenating the individual tables together, those keys are no longer unique. • We received an XML file with over 110 objects resulting in over 110 SAS datasets our internal customer wanted concatenated for multiple days. Rather than copying and pasting the code to handle this process 110+ times, and knowing that I would make mistakes along the way – and knowing that the objects would also change along the way, I created SAS code to create the SAS code to handle the XML. I consider myself a Lazy Programmer. • As the classic "Real Programmers…" sheet tells us, Real Programmers are Lazy. • This session reviews XML (briefly), SAS XML Mapper, SAS XML Engine, techniques for handling the Ordinals over multiple days, and finally discusses a technique for using SAS code to generate SAS code.
  • 4. Introductions • My Background • XML • SAS XML Mapping Tool • SAS XML Engine (Code to access data in XML) • Our Problem • Dealing with Ordinals over Multiple Files • Using SAS Code to Generate SAS Code
  • 5. My Background • Base SAS on Mainframe, UNIX, and PC Platforms • SAS is primarily an ETL tool or Programming Language for me • My background is IT – I am not a modeler • My first SUG presentation • Not my first User Group presentation – presented workshops and seminars in Australia, France, the US, and Canada. • Undergraduate: Computer and Information Sciences, Temple Univ. • Graduate: Organizational Dynamics, UPenn • Most of my career was in consulting • Have written several books (none SAS-related) • Adjunct Instructor covering IT topics.
  • 6. XML - Background • XML stands for eXtensible Markup Language • Originally created in 1996 • Consists of Markup and Content • Markup defines the items (fields, etc.) – represented with tags • Content is the data • Is transportable and human readable • If the provider supplies a schema, you’ll have a definition (XSD – XML Schema Definition) • If not, you’ll only have the XML data file itself • Easy for data provider to change layout: update XSD, add data to XML file • An easy way to think of XML is “CSV on steroids” • Very flexible: Advantage and Disadvantage
  • 7. XML – XML File Sample <?xml version="1.0" encoding="UTF-8" standalone="no" ?><gpx xmlns="http://www.topografix.com/GPX/1/1" xmlns:gpxx="http://www.garmin.com/xmlschemas/GpxExtensions/v3" xmlns:gpxtpx="http://www.garmin.com/xmlschemas/TrackPointExtension/v2" creator="nüvi 2370" version="1.1" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.topografix.com/GPX/1/1 http://www.topografix.com/GPX/1/1/gpx.xsd http://www.garmin.com/xmlschemas/TrackPointExtensionv2.xsd"><metadata><link href="http://www.garmin.com"><text>Garmin International</text></link><time>2012-04-12T04:31:39Z</time></metadata><wpt lat="40.247249" lon="-75.513001"><ele>28.72</ele><name>002</name><sym>Waypoint</sym></wpt><wpt lat="39.764033" lon="-75.551346"><ele>61.17</ele> • Not very helpful viewed that way
  • 8. XML – XML File Sample
  • 9. XML – XML File Sample • Treating the file as text may be helpful • Elements like <trk> can have repeating sub-elements like <trkseg> • But we don’t know the “data model” • I’m using examples from my Garmin GPS • I don’t have to sanitize the data like I would with a file from work… • I’m not going to teach you XML coding today • http://en.wikipedia.org/wiki/Xml provides good background gpx_small_xml.txt
  • 10. XML – XSD File Sample • Treating the file as text may be helpful • Describes exactly what is expected in the XML data file Gpx_xsd.txt
  • 11. SAS XML Mapping Tool • Free download from http://support.sas.com/kb/33/584.html • Creates a Map for SAS to read XML as a “SAS Dataset” • Can process an XML data file to create Map • Works with subset of large file • Not all elements appear for any particular key • Tool has to guess data type of elements (like proc import CSV) • Better to process an XSD file to create the Map • Full definition (no “guessing” required) • Not always available • In easiest usage, will create keys to connect elements (“Ordinals”) • The map is in XML format Gpx_map.txt
  • 12. SAS XML Engine (Code to access data in XML) • Really very simple to use once Map is built: filename CDAtest2 "/export/home/fw03606/gpx.xml"; filename SXLEMAP "/export/home/fw03606/gpx.map"; libname CDAtest2 xml xmlmap=SXLEMAP access=READONLY; • And then use it much like any other SAS Dataset: proc contents data=CDAtest2._all_ ; run; • Or data tableonly; set CDAtest2.member END=EOF; output; run;
  • 13. SAS XML Engine (Code to access data in XML) • Modifying the XML file is more difficult (and not part of this presentation): ERROR: XMLMap= has been specified on the XML Libname assignment. The output produced via this option will change in upcoming releases. Correct the XML Libname(remove XMLMap= option) and resubmit. Output generation aborted. • Everything you ever wanted to know about the SAS XML engine is available at • http://support.sas.com/rnd/base/xmlengine/index.html
  • 14. Our Problem • Vendor provided XML file: • Limited documentation • No internal experience • Short timeline • Hundreds of internal “objects” (mapped to hundreds of SAS datasets) • Needed to be able to “see” data to learn about it • Once in production: • Daily input file • Concatenated output Datasets
  • 15. Dealing with Ordinals over Multiple Files • Every object in the SAS mapped XML file will have an Ordinal to ensure uniqueness (gpx):

        Variable      Type   Len   Format   Informat
        creator       Char    32   $32.     $32.
        extensions    Char    32   $32.     $32.
        gpx           Char    32   $32.     $32.
        gpx_ORDINAL   Num      8   F8.      F8.
        version       Char    32   $32.     $32.

    • Child objects contain their parent Ordinals (rte):

        Variable      Type   Len   Format   Informat
        cmt           Char    32   $32.     $32.
        desc          Char    32   $32.     $32.
        extensions    Char    32   $32.     $32.
        gpx_ORDINAL   Num      8   F8.      F8.
        name          Char    32   $32.     $32.
        number        Num      8   F8.      F8.
        rte           Char    32   $32.     $32.
        rte_ORDINAL   Num      8   F8.      F8.
        src           Char    32   $32.     $32.
        type          Char    32   $32.     $32.
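To illustrate how the Ordinals tie the tables together, the sketch below joins each route (rte) to its parent gpx record on gpx_ORDINAL. It assumes the CDAtest2 libref from the XML Engine slide is already assigned. The WORK dataset names, including rte_with_gpx, are hypothetical and not from the presentation.

```sas
/* Sketch only: attach parent (gpx) attributes to each child (rte) row.  */
/* Assumes libref CDAtest2 is assigned as on the XML Engine slide.       */
/* The XML engine is a sequential engine, so copy the tables to WORK     */
/* first and join the copies.                                            */
data work.gpx;
  set CDAtest2.gpx;
run;

data work.rte;
  set CDAtest2.rte;
run;

proc sql;
  create table work.rte_with_gpx as   /* hypothetical output name */
  select r.rte_ORDINAL,
         r.name,
         r.number,
         g.creator,
         g.version
    from work.rte as r
         inner join work.gpx as g
         on r.gpx_ORDINAL = g.gpx_ORDINAL;   /* child carries the parent's Ordinal */
quit;
```

Because the join key restarts at 1 in every file, a join like this is only safe within a single day's data until the Ordinals have been adjusted.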
  • 16. Dealing with Ordinals over Multiple Files • Those Ordinals are only unique to a specific XML data file, not over time. • In order to append today’s data to yesterday’s, the Ordinals need to change. • Simple solution: • Find yesterday’s maximum Ordinal for each table • Add it to today’s values • Append to yesterday’s accumulated file • A better solution would be to build records in your desired format • But you have to understand the data in order to do that • Reduces flexibility
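The three steps of the simple solution could look roughly like this for a single table (rte). This is a sketch under assumptions, not the presentation's actual code: the libref accum (a permanent library holding the accumulated tables) and the macro variable names are hypothetical, and CDAtest2 is taken to be today's XML libref.

```sas
/* Sketch for one table (rte). Hypothetical names: libref ACCUM holds    */
/* the accumulated history; CDAtest2 points at today's XML file.         */

/* 1. Find yesterday's maximum Ordinals (0 if nothing accumulated yet).  */
/*    The parent key gpx_ORDINAL takes its offset from the parent table. */
proc sql noprint;
  select coalesce(max(gpx_ORDINAL), 0) into :max_gpx trimmed from accum.gpx;
  select coalesce(max(rte_ORDINAL), 0) into :max_rte trimmed from accum.rte;
quit;

/* 2. Add them to today's values. */
data work.rte_today;
  set CDAtest2.rte;
  gpx_ORDINAL = gpx_ORDINAL + &max_gpx;
  rte_ORDINAL = rte_ORDINAL + &max_rte;
run;

/* 3. Append to yesterday's accumulated file. */
proc append base=accum.rte data=work.rte_today;
run;
```

One design point worth noting: all maxima should be captured before any table is appended. Otherwise a child table processed after its parent would pick up an offset that already includes today's rows.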
  • 17. Dealing with Ordinals over Multiple Files • GPX example converts to 19 SAS datasets:

        Name        Member Type
        AUTHOR      DATA
        BOUNDS      DATA
        COPYRIGHT   DATA
        EMAIL       DATA
        GPX         DATA
        LINK        DATA
        LINK1       DATA
        LINK2       DATA
        LINK3       DATA
        LINK4       DATA
        LINK5       DATA
        LINK6       DATA
        METADATA    DATA
        RTE         DATA
        RTEPT       DATA
        TRK         DATA
        TRKPT       DATA
        TRKSEG      DATA
        WPT         DATA
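A list like this does not have to be typed by hand. As a sketch, assuming today's tables have first been copied into a Base SAS library with the hypothetical libref TODAY, the SAS dictionary tables can report every table together with its Ordinal columns. The output dataset name ordinal_cols is also hypothetical, and how the presentation built its own driver dataset is not shown.

```sas
/* Sketch: one row per table and *_ORDINAL column in library TODAY.      */
/* TODAY and work.ordinal_cols are hypothetical names for illustration.  */
proc sql;
  create table work.ordinal_cols as
  select memname,                      /* table name, e.g. RTE           */
         name                          /* column name, e.g. rte_ORDINAL  */
    from dictionary.columns
   where libname = 'TODAY'             /* librefs are stored in uppercase */
     and prxmatch('/_ORDINAL\s*$/i', name) > 0
   order by memname, name;
quit;
```

A dataset of this shape can then drive code generation, so a new element in tomorrow's file is picked up without editing any code.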
  • 18. Dealing with Ordinals over Multiple Files • Repeating the same code for 19 elements (XML pseudo SAS Datasets) is a pain. • Can you imagine doing it for hundreds? • I’m lazy and I make mistakes. • I’d really rather not copy & paste & edit the same code 19 (or 190) times. • I’d rather not have to repeat the process every time the file changes (or a new element appears – no XSD) • Since the same process applies for every one of the elements, why not let code do the work for me?
  • 19. Using SAS Code to Generate SAS Code • Mechanism is fairly simple: File, Put, and %include: filename sourcecd "gpxxml_generated&DATADATE..sas"; data _null_; file sourcecd; set temp.prikeys end=EOF; /* use put statements */ run; %include sourcecd;
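To make the File / Put / %include mechanism concrete, here is a small self-contained sketch. It is not the presentation's generator: the driver dataset work.ordinal_cols, the demo tables (accum_* for yesterday's accumulated rows, today_* for today's rows), and the use of a temporary fileref are all assumptions made for illustration.

```sas
/* Demo data (hypothetical): accumulated rows and today's rows. */
data work.accum_gpx; gpx_ORDINAL = 1; run;
data work.accum_rte; gpx_ORDINAL = 1; do rte_ORDINAL = 1 to 2; output; end; run;
data work.today_gpx; gpx_ORDINAL = 1; run;
data work.today_rte; gpx_ORDINAL = 1; rte_ORDINAL = 1; run;

/* Hypothetical driver: one row per table and Ordinal column, sorted by table. */
data work.ordinal_cols;
  length memname name $32;
  input memname $ name $;
  datalines;
GPX gpx_ORDINAL
RTE gpx_ORDINAL
RTE rte_ORDINAL
;
run;

filename sourcecd temp;   /* temporary file instead of a dated .sas file */

/* Pass 1: generate PROC SQL that captures every table's current maximum. */
data _null_;
  file sourcecd;
  set work.ordinal_cols end=EOF;
  if _n_ = 1 then put 'proc sql noprint;';
  /* a table "owns" the Ordinal named <table>_ORDINAL */
  if upcase(name) = cats(upcase(memname), '_ORDINAL') then
    put '  select coalesce(max(' name +(-1) '), 0) into :max_' memname
        'trimmed from work.accum_' memname +(-1) ';';
  if EOF then put 'quit;';
run;

/* Pass 2: generate one adjust-and-append step per table. */
data _null_;
  file sourcecd mod;      /* append to the code written by pass 1 */
  set work.ordinal_cols;
  by memname;
  length owner $32;
  owner = substr(name, 1, length(name) - 8);   /* strip "_ORDINAL" */
  if first.memname then do;
    put 'data work.adj_' memname +(-1) ';';
    put '  set work.today_' memname +(-1) ';';
  end;
  /* single quotes keep &max_... unresolved until the %include runs */
  put '  ' name '= ' name '+ &max_' owner +(-1) ';';
  if last.memname then do;
    put 'run;';
    put 'proc append base=work.accum_' memname 'data=work.adj_' memname +(-1) '; run;';
  end;
run;

%include sourcecd / source2;   /* source2 echoes the generated code to the log */
```

The `+(-1)` pointer control backs up over the blank that list-style PUT writes after a variable, so the table name and the text that follows it join into one token. Writing all the maximum lookups in a first pass keeps every offset based on yesterday's data, even when a child table sorts ahead of its parent.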
  • 20. Using SAS Code to Generate SAS Code • Walking through the code: xml_process_generator.txt • We can also look at the log: xml_p_g_ppt_log.txt • The generated SAS code (in case anyone cares): gpxxml_generated20120404.txt gpxxml_generated20120404_2.txt