This document summarizes David B. Horvath's presentation on dealing with XML ordinals over multiple files. The SAS XML engine converts each XML object into a SAS dataset and generates keys (ordinals) to represent the parent-child relationships between them. These ordinals are unique within a single XML file but restart at 1 with each file, so they are no longer unique once datasets from multiple files are concatenated. The presentation describes how to handle this by finding the maximum ordinal from the previous file and adding it to the current file's values. It also covers how the presenter handled more than 110 datasets by writing SAS code that generates the SAS code needed for the XML processing, rather than copying and pasting code manually.
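The offset idea can be sketched in a few lines. The example below is in Python rather than SAS and is not the presenter's code: the table names (`trk`, `trkseg`), the `<table>_ORDINAL` column naming, and the `append_file` helper are all assumptions made for illustration. Each table's key is shifted by the highest value already stored in the table that owns that key, so child rows keep pointing at the right parent.

```python
# Illustrative sketch only; names and layout are assumptions, not the presenter's SAS code.

def append_file(combined, new_tables):
    """Append one file's tables to the combined tables, shifting every ordinal.

    Both arguments map a table name to a list of row dicts. Each table is
    assumed to own a key column named '<table>_ORDINAL'; child tables also
    carry their parent's ordinal column.
    """
    # Offset for each key = highest value already stored in the owning table.
    offsets = {
        f"{name}_ORDINAL": max((row[f"{name}_ORDINAL"] for row in rows), default=0)
        for name, rows in combined.items()
    }
    for name, rows in new_tables.items():
        target = combined.setdefault(name, [])
        for row in rows:
            target.append({
                col: (val + offsets.get(col, 0)) if col.endswith("_ORDINAL") else val
                for col, val in row.items()
            })
    return combined


# Two hypothetical daily files; ordinals restart at 1 in each.
day1 = {
    "trk": [{"trk_ORDINAL": 1, "name": "A"}, {"trk_ORDINAL": 2, "name": "B"}],
    "trkseg": [
        {"trkseg_ORDINAL": 1, "trk_ORDINAL": 1},
        {"trkseg_ORDINAL": 2, "trk_ORDINAL": 2},
        {"trkseg_ORDINAL": 3, "trk_ORDINAL": 2},
    ],
}
day2 = {
    "trk": [{"trk_ORDINAL": 1, "name": "C"}],
    "trkseg": [{"trkseg_ORDINAL": 1, "trk_ORDINAL": 1}],
}

combined = {}
append_file(combined, day1)
append_file(combined, day2)
```

The offsets are computed before any rows are appended, so the parent key in a child table is shifted by the parent table's maximum and not by whatever values happen to appear in the child.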
3. Abstract
• The XML engine within SAS is very powerful but it does convert every object
into a SAS dataset with generated keys to implement the parent/child
relationships between these objects. Those keys (Ordinals in SAS-speak) are
guaranteed to be unique within a specific XML file. However, they restart at 1
with each file. When concatenating the individual tables together, those keys
are no longer unique.
• We received an XML file with over 110 objects, resulting in over 110 SAS
datasets that our internal customer wanted concatenated across multiple days.
Rather than copying and pasting the code to handle this process 110+ times,
knowing that I would make mistakes along the way and that the
objects would also change along the way, I created SAS code to create the
SAS code to handle the XML. I consider myself a Lazy Programmer.
• As the classic "Real Programmers…" sheet tells us, Real Programmers are
Lazy.
• This session reviews XML (briefly), SAS XML Mapper, SAS XML Engine,
techniques for handling the Ordinals over multiple days, and finally discusses a
technique for using SAS code to generate SAS code.
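The "code that writes code" idea can be illustrated with a small generator. This is a Python sketch under stated assumptions, not the SAS-generating SAS program from the session: the librefs (`day`, `hist`, `work`), the `&max_<table>` macro variable names, and the template itself are hypothetical, and the template only shifts each table's own ordinal (parent ordinals in child tables are left out for brevity).

```python
# Illustrative sketch only; librefs, macro variable names, and the template are assumptions.

TEMPLATE = """\
data work.{table}_shifted;
  set day.{table};
  {table}_ORDINAL = {table}_ORDINAL + &max_{table};
run;
proc append base=hist.{table} data=work.{table}_shifted;
run;
"""


def generate_steps(table_names):
    """Return one block of SAS source text per table name."""
    return "\n".join(TEMPLATE.format(table=name) for name in table_names)


# With 110+ tables the list would come from metadata, not be typed by hand.
code = generate_steps(["gpx", "wpt"])
print(code)
```

Because the per-table logic lives in one template, a change to the objects in the XML only changes the list of names, not 110 hand-edited copies.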
4. Introductions
• My Background
• XML
• SAS XML Mapping Tool
• SAS XML Engine (Code to access data in XML)
• Our Problem
• Dealing with Ordinals over Multiple Files
• Using SAS Code to Generate SAS Code
5. My Background
• Base SAS on Mainframe, UNIX, and PC Platforms
• SAS is primarily an ETL tool or Programming Language for me
• My background is IT – I am not a modeler
• My first SUG presentation
• Not my first User Group presentation – presented workshops and
seminars in Australia, France, the US, and Canada.
• Undergraduate: Computer and Information Sciences, Temple Univ.
• Graduate: Organizational Dynamics, UPenn
• Most of my career was in consulting
• Have written several books (none SAS-related)
• Adjunct Instructor covering IT topics.
6. 6
XML - Background
• XML stands for eXtensible Markup Language
• Originally created in 1996
• Consists of Markup and Content
• Markup defines the items (fields, etc.) – represented with tags
• Content is the data
• Is transportable and human readable
• If the provider supplies a schema, you’ll have a definition (XSD: XML Schema Definition)
• If not, you’ll only have the XML data file itself
• Easy for data provider to change layout: update XSD, add data to
XML file
• An easy way to think of XML is “CSV on steroids”
• Very flexible: Advantage and Disadvantage
7. 7
XML – XML File Sample
<?xml version="1.0" encoding="UTF-8" standalone="no" ?><gpx
xmlns="http://www.topografix.com/GPX/1/1"
xmlns:gpxx="http://www.garmin.com/xmlschemas/GpxExtensions/v3"
xmlns:gpxtpx="http://www.garmin.com/xmlschemas/TrackPointExtensio
n/v2" creator="nüvi 2370" version="1.1"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.topografix.com/GPX/1/1
http://www.topografix.com/GPX/1/1/gpx.xsd
http://www.garmin.com/xmlschemas/TrackPointExtensionv2.xsd"><meta
data><link href="http://www.garmin.com"><text>Garmin
International</text></link><time>2012-04-
12T04:31:39Z</time></metadata><wpt lat="40.247249" lon="-
75.513001"><ele>28.72</ele><name>002</name><sym>Waypoint</sym></w
pt><wpt lat="39.764033" lon="-75.551346"><ele>61.17</ele>
• Not very helpful viewed that way
9. 9
XML – XML File Sample
• Treating the file as text may be helpful
• Elements like <trk> can have repeating sub-elements like <trkseg>
• But we don’t know the “data model”
• I’m using examples from my Garmin GPS
• I don’t have to sanitize the data like I would with a file from work…
• I’m not going to teach you XML coding today
• http://en.wikipedia.org/wiki/Xml provides good background
gpx_small_xml.txt
10. 10
XML – XSD File Sample
• Treating the file as text may be helpful
• Describes exactly what is expected in the XML data file
Gpx_xsd.txt
11. 11
SAS XML Mapping Tool
• Free download from http://support.sas.com/kb/33/584.html
• Creates a Map for SAS to read XML as a “SAS Dataset”
• Can process an XML data file to create Map
• Works with subset of large file
• Not all elements appear for any particular key
• Tool has to guess data type of elements (like proc import CSV)
• Better to process an XSD file to create the Map
• Full definition (no “guessing” required)
• Not always available
• In easiest usage, will create keys to connect elements (“Ordinals”)
• The map is in XML format
Gpx_map.txt
12. 12
SAS XML Engine (Code to access data in XML)
• Really very simple to use once Map is built:
filename CDAtest2 "/export/home/fw03606/gpx.xml";
filename SXLEMAP "/export/home/fw03606/gpx.map";
libname CDAtest2 xml xmlmap=SXLEMAP access=READONLY;
• And then use it much like any other SAS Dataset:
proc contents data=CDAtest2._all_ ;
run;
• Or
data tableonly;
set CDAtest2.member END=EOF;
output;
run;
13. 13
SAS XML Engine (Code to access data in XML)
• Modifying the XML file is more difficult (and not part of this
presentation):
ERROR: XMLMap= has been specified on the XML Libname
assignment. The output produced via this option will
change in upcoming releases. Correct the XML
Libname(remove XMLMap= option) and resubmit. Output
generation aborted.
• Everything you ever wanted to know about the SAS XML engine is
available at
• http://support.sas.com/rnd/base/xmlengine/index.html
14. 14
Our Problem
• Vendor provided XML file:
• Limited documentation
• No internal experience
• Short timeline
• Hundreds of internal “objects” (mapped to hundreds of SAS datasets)
• Needed to be able to “see” data to learn about it
• Once in production:
• Daily input file
• Concatenated output Datasets
15. 15
Dealing with Ordinals over Multiple Files
• Every object in the SAS mapped XML file will have an Ordinal to
ensure uniqueness (gpx):
Variable      Type  Len  Format  Informat
creator       Char   32  $32.    $32.
extensions    Char   32  $32.    $32.
gpx           Char   32  $32.    $32.
gpx_ORDINAL   Num     8  F8.     F8.
version       Char   32  $32.    $32.
• Child objects contain their parent Ordinals (rte):
Variable      Type  Len  Format  Informat
cmt           Char   32  $32.    $32.
desc          Char   32  $32.    $32.
extensions    Char   32  $32.    $32.
gpx_ORDINAL   Num     8  F8.     F8.
name          Char   32  $32.    $32.
number        Num     8  F8.     F8.
rte           Char   32  $32.    $32.
rte_ORDINAL   Num     8  F8.     F8.
src           Char   32  $32.    $32.
type          Char   32  $32.    $32.
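A minimal sketch of how a child table can be tied back to its parent through the shared Ordinal. The WORK tables and their values are hypothetical stand-ins; in practice they would be copies of the XML engine members (for example, the gpx and rte members behind the XML libref). Only the variable names come from the listings above.

```sas
/* Hypothetical stand-in for the parent table (gpx). */
data work.gpx;
  length creator $ 32 version $ 32;
  input gpx_ORDINAL creator $ version $;
  datalines;
1 demo 1.1
;
run;

/* Hypothetical stand-in for the child table (rte).       */
/* Each row carries its own key plus its parent's key.    */
data work.rte;
  length name $ 32;
  input gpx_ORDINAL rte_ORDINAL number name $;
  datalines;
1 1 1 Home
1 2 2 Work
;
run;

/* Join child rows to their parent on the parent Ordinal. */
proc sql;
  create table work.rte_with_gpx as
  select g.creator, g.version, r.rte_ORDINAL, r.number, r.name
    from work.rte as r
         inner join work.gpx as g
           on r.gpx_ORDINAL = g.gpx_ORDINAL
   order by r.rte_ORDINAL;
quit;
```

Copying the members into WORK first keeps the join away from the XML file itself, which seems the safer choice when the source is read through a map.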
16. 16
Dealing with Ordinals over Multiple Files
• Those Ordinals are only unique to a specific XML data file, not over
time.
• In order to append today’s data to yesterday’s, the Ordinals need to
change.
• Simple solution:
• Find yesterday’s maximum Ordinal for each table
• Add it to today’s values
• Append to yesterday’s accumulated file
• A better solution would be to build records in your desired format
• But you have to understand the data in order to do that
• Reduces flexibility
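The simple solution can be sketched roughly as follows for one parent table (gpx) and one child table (rte). All dataset names and values here are hypothetical stand-ins, and this is not necessarily the code used in the presentation. The one detail worth noting is that the child's copy of the parent Ordinal is shifted by the parent table's maximum, and that every maximum is captured before anything is appended.

```sas
/* Hypothetical accumulated history (yesterday and earlier). */
data work.accum_gpx;
  input gpx_ORDINAL;
  datalines;
1
2
;
run;

data work.accum_rte;
  input gpx_ORDINAL rte_ORDINAL;
  datalines;
1 1
1 2
2 3
;
run;

/* Hypothetical tables for today's file: Ordinals restart at 1. */
data work.today_gpx;
  input gpx_ORDINAL;
  datalines;
1
;
run;

data work.today_rte;
  input gpx_ORDINAL rte_ORDINAL;
  datalines;
1 1
1 2
;
run;

/* Step 1: find yesterday's maximum Ordinal for each table (0 if empty). */
proc sql noprint;
  select coalesce(max(gpx_ORDINAL), 0) format=20. into :max_gpx
    from work.accum_gpx;
  select coalesce(max(rte_ORDINAL), 0) format=20. into :max_rte
    from work.accum_rte;
quit;

/* Step 2: add the maxima to today's values. */
data work.shift_gpx;
  set work.today_gpx;
  gpx_ORDINAL = gpx_ORDINAL + &max_gpx;
run;

data work.shift_rte;
  set work.today_rte;
  rte_ORDINAL = rte_ORDINAL + &max_rte;  /* own key: own table's offset     */
  gpx_ORDINAL = gpx_ORDINAL + &max_gpx;  /* parent key: parent table's offset */
run;

/* Step 3: append to the accumulated tables. */
proc append base=work.accum_gpx data=work.shift_gpx;
run;

proc append base=work.accum_rte data=work.shift_rte;
run;
```

With these stand-in values, today's gpx row becomes Ordinal 3 and today's rte rows become Ordinals 4 and 5, both pointing at parent 3.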
17. 17
Dealing with Ordinals over Multiple Files
• GPX example converts to 19 SAS datasets
Name Member Type
AUTHOR DATA
BOUNDS DATA
COPYRIGHT DATA
EMAIL DATA
GPX DATA
LINK DATA
LINK1 DATA
LINK2 DATA
LINK3 DATA
LINK4 DATA
LINK5 DATA
LINK6 DATA
METADATA DATA
RTE DATA
RTEPT DATA
TRK DATA
TRKPT DATA
TRKSEG DATA
WPT DATA
18. 18
Dealing with Ordinals over Multiple Files
• Repeating the same code for 19 elements (XML pseudo SAS
Datasets) is a pain.
• Can you imagine doing it for hundreds?
• I’m lazy and I make mistakes.
• I’d really rather not copy & paste & edit the same code 19 (or 190)
times.
• I’d rather not have to repeat the process every time the file changes
(or a new element appears – no XSD)
• Since the same process applies for every one of the elements, why
not let code do the work for me?
19. 19
Using SAS Code to Generate SAS Code
• Mechanism is fairly simple: File, Put, and %include:
filename sourcecd "gpxxml_generated&DATADATE..sas";
data _null_;
file sourcecd;
set temp.prikeys end=EOF;
/* use put statements */
run;
%include sourcecd;
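To make the File / Put / %include mechanism concrete, here is a small self-contained sketch. It does not reproduce the presenter's generator: the driver table work.tables, the accumulated tables, and the temporary fileref gencode are all hypothetical. It writes one SELECT per table into a generated program that captures each table's maximum Ordinal, then runs that program.

```sas
/* Hypothetical accumulated tables to run the generated code against. */
data work.accum_gpx;
  input gpx_ORDINAL;
  datalines;
1
2
;
run;

data work.accum_rte;
  input gpx_ORDINAL rte_ORDINAL;
  datalines;
1 1
1 2
2 3
;
run;

/* Hypothetical driver table: one row per dataset and its own Ordinal. */
data work.tables;
  length memname $ 28 ordvar $ 32;
  input memname $ ordvar $;
  datalines;
gpx gpx_ORDINAL
rte rte_ORDINAL
;
run;

filename gencode temp;  /* temporary file that receives the generated program */

/* Write one SELECT statement per driver row, wrapped in a single PROC SQL. */
data _null_;
  file gencode;
  set work.tables end=eof;
  length line $ 200;
  if _n_ = 1 then put 'proc sql noprint;';
  line = catx(' ',
              cats('select coalesce(max(', ordvar, '), 0) format=20.'),
              cats('into :max_', memname),
              cats('from work.accum_', memname, ';'));
  put line;
  if eof then put 'quit;';
run;

/* Run the generated program; SOURCE2 echoes its statements to the log. */
%include gencode / source2;

%put NOTE: max_gpx=&max_gpx max_rte=&max_rte;
```

Adding a row to the driver table is then the only change needed when a new element appears, which is the point of letting code do the repetitive work.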
20. 20
Using SAS Code to Generate SAS Code
• Walking through the code: xml_process_generator.txt
• We can also look at the log: xml_p_g_ppt_log.txt
• The generated SAS code (in case anyone cares):
gpxxml_generated20120404.txt, gpxxml_generated20120404_2.txt