Data Format Description Language (DFDL)
and the Open Source Apache Daffodil™ Project
2023-03-15
Agenda
▪ DFDL (aka "DaFfoDiL")
• Why is DFDL Needed?
• Quick overview of DFDL
• Use Cases for DFDL
▪ DFDL Tutorial
• Examples: CSV, PCAP, MIL-STD-2045
• What DFDL Doesn't do (today)
▪ DFDL Schemas
▪ Implementations
• Open Source: Apache Daffodil, Daffodil VSCode Extension
Copyright 2023 Owl Cyber Defense 2
Why is DFDL Needed?
Every Enterprise Software Company
• IBM (10+)
• Oracle (10+)
• SAP (10+)
• Microsoft
• SAS
• Informatica
• SyncSort
• AbInitio
• Pervasive
• Qlik/Expressor
• Pentaho
• .... Dozens more
Every kind of software that takes in data:
• data directed routing (msg brokers)
• database
• data analysis and/or data mining
• data cleansing
• master data management
• application integration
• data visualization
• …
There are hundreds of ad-hoc data format description systems
All these data format descriptions are:
• proprietary
• ad-hoc
• incompatible
Even within products of the same company!
Why is DFDL Needed?
▪ Hundreds of data format description systems… means that:
▪ Vendor investment is spread too thin
• Tools for creating data formats are inadequate
• No product is comprehensive enough
• Some products aren’t fast enough
▪ Customers get locked in to proprietary solutions
▪ High training cost, low career value
▪ Inflexible packaging
• not libraries - you must embed some product in your application data flow
Why is DFDL Needed?
▪ Result… we are stuck with:
▪ Data parsers continue to be written as programs
▪ High cost to write & maintain, high error rate.
▪ When needed, certification (C&A) costs = $$$$$
Solving the Data Format Problem
▪ An Open Standard: DFDL
• Multiple implementations that interoperate
▪ Commercial & Open Source
• Long-term sponsors
▪ IBM – has their own DFDL implementations
▪ US DoD, Canada DND
• Cybersecurity
• Available DFDL schemas for important data formats
▪ A High-Quality Open Source Library Implementation
• With a supporting community of developers
• With available commercial support (Owl)
▪ A standard from the Open Grid Forum (OGF)
• Being proposed for an ISO standard
▪ Started 2001, OGF Ratified 2021
▪ Big - 200+ pages
▪ DFDL is not a data format.
▪ Write a DFDL Schema to describe a format.
▪ DFDL is sometimes pronounced "DaFfoDiL"
DFDL = Data Format Description Language
Data Format Description Language
▪ DFDL is a way of describing data formats
▪ It is NOT a data format itself!
▪ DFDL combines State-of-the-Art
▪ Union of capabilities across many marketplace data integration
products/tools/packages
▪ DFDL adds a small number of real innovations
▪ Overcomes limitations of prior-generation tools, e.g.,
• Computed Elements Capability - Unparse as well as Parse!
• Expression language
• BitOrder
Data Format Description Language
▪ Core Concepts
▪ Leverage XML Schema (XSDL)
• Grammar scaffolding
• Describes the logical data model
• DFDL uses only a subset of XML schema
• Provides standard ways to annotate
▪ Add annotations
• Describe the physical representation.
▪ Read and write from same DFDL Schema
▪ Because Developers [ Love | Hate ] XML
▪ The DFDL Schema is based on XSDL
▪ The Infoset created when parsing data does NOT have to be XML => Can be JSON, can be adapted to others
Example – Delimited Text Data
rlimit=5;rpngx=-7.1E8
▪ "rlimit=" and "rpngx=" are initiators (ASCII text)
▪ "5" is an integer; "-7.1E8" is a floating point number
▪ ";" is the separator
▪ Separators, initiators (aka tags), & terminators are all examples in DFDL of delimiters
DFDL Schema
<xs:complexType name="rValues">
<xs:sequence>
<xs:element name="rlimit" type="xs:int"/>
<xs:element name="rpngx" type="xs:float"/>
</xs:sequence>
</xs:complexType>
▪ Logical elements: rlimit (xs:int), rpngx (xs:float)
DFDL schema
<xs:annotation>
<xs:appinfo source="http://www.ogf.org/dfdl/">
<dfdl:format representation="text"
textNumberRep="standard" encoding="ascii"
lengthKind="delimited" .../>
</xs:appinfo>
</xs:annotation>
<xs:complexType name="rValues">
<xs:sequence dfdl:separator=";" ... >
<xs:element name="rLimit" type="xs:int"
dfdl:initiator="rLimit=" ... />
<xs:element name="rpngx" type="xs:float"
dfdl:initiator="rpngx=" ... />
</xs:sequence>
</xs:complexType>
▪ DFDL properties (initiator, separator) describe the physical data: rLimit=5 ; rpngx=-7.1E8
DFDL Data and Infoset Lifecycle
▪ Parse: Data (rLimit=5;rpngx=-7.1E8) + DFDL Schema -> DFDL Implementation -> Infoset
▪ Unparse: Infoset + DFDL Schema -> DFDL Implementation -> Data
▪ Infoset
• Element, Name: rValues
• Element, Name: rLimit, Value: 5, Type: Int
• Element, Name: rpngx, Value: -7.1E8, Type: Double
▪ The Infoset can be XML or JSON or other
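To make the parse/unparse lifecycle concrete, here is a minimal hand-written Java sketch of what a DFDL implementation derives from the rValues schema. It is not Daffodil code: the class name RValuesSketch and its methods are hypothetical, and a Map stands in for the infoset.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustration only: hard-codes what the rValues schema declares. */
final class RValuesSketch {
    static final String SEPARATOR = ";";

    /** Parse direction: data text to an ordered name/value map (a stand-in for the infoset). */
    static Map<String, Object> parse(String data) {
        String[] parts = data.split(SEPARATOR, -1);
        if (parts.length != 2) {
            throw new IllegalArgumentException("expected exactly 2 separated elements");
        }
        Map<String, Object> infoset = new LinkedHashMap<>();
        infoset.put("rLimit", Integer.valueOf(stripInitiator(parts[0], "rLimit=")));
        infoset.put("rpngx", Float.valueOf(stripInitiator(parts[1], "rpngx=")));
        return infoset;
    }

    /** Each element must start with its initiator; the rest is the value text. */
    static String stripInitiator(String text, String initiator) {
        if (!text.startsWith(initiator)) {
            throw new IllegalArgumentException("missing initiator " + initiator);
        }
        return text.substring(initiator.length());
    }

    /** Unparse direction: write initiators and the separator back around the values. */
    static String unparse(Map<String, Object> infoset) {
        return "rLimit=" + infoset.get("rLimit") + SEPARATOR + "rpngx=" + infoset.get("rpngx");
    }

    public static void main(String[] args) {
        Map<String, Object> infoset = parse("rLimit=5;rpngx=-7.1E8");
        System.out.println(infoset);
        System.out.println(unparse(infoset));
    }
}
```

The point of DFDL is that none of this code has to be written by hand: the same schema drives both directions.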
More DFDL Properties
initiator
terminator
documentFinalTerminatorCanBeMissing
outputNewLine
length
lengthPattern
textStringPadCharacter
textNumberPadCharacter
textCalendarPadCharacter
textBooleanPadCharacter
escapeCharacter
escapeBlockStart
escapeBlockEnd
escapeEscapeCharacter
extraEscapedCharacters
textNumberPattern
textStandardGroupingSeparator
textStandardDecimalSeparator
textStandardExponentRep
textStandardInfinityRep
textStandardNaNRep
textStandardZeroRep
textBooleanTrueRep
textBooleanFalseRep
calendarPattern
calendarLanguage
binaryCalendarEpoch
nilValue
separator
occursStopValue
inputValueCalc
outputValueCalc
byteOrder
bitOrder
encoding
encodingErrorPolicy
utf16Width
ignoreCase
alignment
alignmentUnits
fillByte
leadingSkip
trailingSkip
lengthKind
lengthUnits
prefixIncludesPrefixLength
prefixLengthType
representation
textPadKind
textTrimKind
textOutputMinLength
escapeKind
generateEscapeBlock
textStringJustification
textNumberRep
textNumberJustification
textNumberCheckPolicy
textStandardBase
textNumberRoundingMode
textNumberRounding
textNumberRoundingIncrement
textZonedSignStyle
binaryNumberRep
binaryDecimalVirtualPoint
binaryNumberCheckPolicy
binaryPackedSignCodes
binaryFloatRep
textBooleanJustification
binaryBooleanTrueRep
binaryBooleanFalseRep
textCalendarJustification
calendarPatternKind
calendarCheckPolicy
calendarTimeZone
calendarObserveDST
calendarFirstDayOfWeek
calendarDaysInFirstWeek
calendarCenturyStart
binaryCalendarRep
nilKind
nilValueDelimiterPolicy
useNilForDefault
emptyValueDelimiterPolicy
sequenceKind
hiddenGroupRef
initiatedContent
separatorPosition
separatorPolicy
separatorSuppressionPolicy
choiceLengthKind
choiceLength
choiceDispatchKey
choiceBranchKey
occursCountKind
occursCount
floating
truncateSpecifiedLengthString
decimalSigned
failureType
DFDL Use Cases
▪ Deep Content Inspection/Sanitization
• CDS, Network Guards
• Data Filtering
• Rip-and-Rebuild Firewalls
▪ Data-Directed Routing
▪ Data Intake
• Data Warehouse or Analytical Data Store
▪ Interoperability / Data Translation
▪ Conversion of Legacy Data to Modern Formats
▪ Open Data / Data Export
• Recasting data as XML for export/publishing
Examples
CSV (ASCII text delimited)
PCAP (variable length binary)
MIL-STD-2045 (dense bit-packed binary, LSBF)
Example: CSV
Comma-Separated Values
Example - CSV
▪ Textual Format (ASCII)
▪ Comma-Separated Values
▪ Tabular data
▪ Rows separated by newlines
▪ Fields separated by commas
Smith,Charles,36,6.2
Williams,Mary,51,5.5
Example - CSV
<xs:schema>
<xs:annotation>
<xs:appinfo source="http://www.ogf.org/dfdl/">
<dfdl:format ref="defaults"
representation="text"
encoding="ASCII"
lengthKind="delimited"
escapeSchemeRef="quoted"
occursCountKind="parsed" />
<dfdl:defineEscapeScheme name="quoted">
<dfdl:escapeScheme escapeKind="escapeBlock"
escapeBlockStart="&quot;"
escapeBlockEnd="&quot;"
escapeEscapeCharacter="&quot;" />
</dfdl:defineEscapeScheme>
</xs:appinfo>
</xs:annotation>
</xs:schema>
Example - CSV
<xs:element name="csv">
<xs:complexType>
<xs:sequence dfdl:separator="%NL;"
dfdl:separatorPosition="postfix">
<xs:element name="record" maxOccurs="unbounded">
<xs:complexType>
<xs:sequence dfdl:separator=","
dfdl:separatorPosition="infix">
<xs:element name="item" type="xs:string"
maxOccurs="unbounded"/>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:sequence>
</xs:complexType>
</xs:element>
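As a rough illustration of what this CSV schema asks a DFDL processor to do, the following Java sketch splits records on newlines and items on commas, and treats double-quoted escape blocks the way the "quoted" escape scheme describes. The class CsvSketch is hypothetical and is not part of Daffodil.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustration only: a hand-written equivalent of the CSV schema's delimiters. */
final class CsvSketch {

    /**
     * Split one record into items. Commas inside a "..." escape block are data,
     * and a doubled quote inside a block is a literal quote character.
     */
    static List<String> parseRecord(String line) {
        List<String> items = new ArrayList<>();
        StringBuilder item = new StringBuilder();
        boolean inBlock = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inBlock) {
                if (c == '"' && i + 1 < line.length() && line.charAt(i + 1) == '"') {
                    item.append('"');   // escaped quote
                    i++;
                } else if (c == '"') {
                    inBlock = false;    // end of escape block
                } else {
                    item.append(c);
                }
            } else if (c == '"' && item.length() == 0) {
                inBlock = true;         // start of escape block
            } else if (c == ',') {
                items.add(item.toString());
                item.setLength(0);
            } else {
                item.append(c);
            }
        }
        items.add(item.toString());
        return items;
    }

    /** Records are terminated by a newline (the postfix %NL; separator). */
    static List<List<String>> parse(String data) {
        List<List<String>> records = new ArrayList<>();
        for (String line : data.split("\\R")) {
            if (!line.isEmpty()) {
                records.add(parseRecord(line));
            }
        }
        return records;
    }

    public static void main(String[] args) {
        System.out.println(parse("Smith,Charles,36,6.2\nWilliams,Mary,51,5.5\n"));
    }
}
```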
Example - CSV
▪ CSV Input + CSV DFDL Schema -> DFDL Implementation -> DFDL Infoset
CSV Input:
Smith,Charles,36,6.2
Williams,Mary,51,5.5
DFDL Infoset:
csv
  record
    item: Smith
    item: Charles
    item: 36
    item: 6.2
  record
    item: Williams
    item: Mary
    item: 51
    item: 5.5
Example: PCAP
Packet Capture File Format
Example - PCAP
▪ PCAP Data + PCAP DFDL Schema -> DFDL Implementation -> DFDL Infoset
DFDL Infoset:
pcap
  global header
    field
    field
    field
  packet
    header
    data
Example - PCAP
▪ Network Packet Capture
▪ Binary Format
▪ Global Header
▪ Packet Header/Data
▪ Variable Endianness and Data Length
▪ File layout: Global Header, Packet Header, Packet Data, Packet Header, Packet Data, …
Example - PCAP
<xs:schema>
<xs:annotation>
<xs:appinfo source="http://www.ogf.org/dfdl/">
<dfdl:format ref="defaults"
representation="binary"
alignment="8"
alignmentUnits="bits"
lengthUnits="bits"
binaryNumberRep="binary"
lengthKind="implicit"
byteOrder="{ $byte_order }" />
<dfdl:defineVariable name="byte_order" type="xs:string"/>
</xs:appinfo>
</xs:annotation>
<xs:simpleType name="uint32" dfdl:lengthKind="explicit"
dfdl:length="32">
<xs:restriction base="xs:unsignedInt"/>
</xs:simpleType>
</xs:schema>
Example - PCAP
<xs:element name="magic_number" type="ex:uint32"
dfdl:byteOrder="bigEndian">
<xs:annotation>
<xs:appinfo source="http://www.ogf.org/dfdl/">
<dfdl:setVariable ref="byte_order"><![CDATA[{
if (. eq dfdl:unsignedInt('xA1B2C3D4')) then 'bigEndian'
else if (. eq dfdl:unsignedInt('xD4C3B2A1')) then 'littleEndian'
else daf:error() }]]>
</dfdl:setVariable>
<dfdl:property name="outputValueCalc"><![CDATA[{
if ($dfdl:byteOrder eq 'bigEndian')
then dfdl:unsignedInt('xA1B2C3D4')
else dfdl:unsignedInt('xD4C3B2A1')
}]]></dfdl:property>
</xs:appinfo>
</xs:annotation>
</xs:element>
Example - PCAP
<xs:element name="magic_number" type="ex:uint32"
dfdl:byteOrder="bigEndian">
<xs:annotation>
<xs:appinfo source="http://www.ogf.org/dfdl/">
<dfdl:setVariable ref="byte_order"><![CDATA[{
if (. eq 2712847316) then 'bigEndian'
else if (. eq 3569595041) then 'littleEndian'
else daf:error() }]]>
</dfdl:setVariable>
<dfdl:property name="outputValueCalc"><![CDATA[{
if ($dfdl:byteOrder eq 'bigEndian')
then dfdl:unsignedInt('xA1B2C3D4')
else dfdl:unsignedInt('xD4C3B2A1')
}]]></dfdl:property>
</xs:appinfo>
</xs:annotation>
</xs:element>
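The magic_number logic can be sketched in plain Java: read the first four bytes as big-endian, as the element's byteOrder property says, and infer the byte order of the rest of the file from which constant appears. This is only an illustration with a hypothetical class name (PcapMagicSketch); it is not how Daffodil implements setVariable or outputValueCalc.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Illustration only: byte-order detection from the PCAP magic number. */
final class PcapMagicSketch {
    static final int MAGIC = 0xA1B2C3D4;          // 2712847316 as an unsigned value
    static final int MAGIC_SWAPPED = 0xD4C3B2A1;  // 3569595041 as an unsigned value

    /** Parse direction: mirrors the setVariable expression on the slide. */
    static ByteOrder detectByteOrder(byte[] fileStart) {
        int magic = ByteBuffer.wrap(fileStart, 0, 4).order(ByteOrder.BIG_ENDIAN).getInt();
        if (magic == MAGIC) {
            return ByteOrder.BIG_ENDIAN;
        }
        if (magic == MAGIC_SWAPPED) {
            return ByteOrder.LITTLE_ENDIAN;
        }
        throw new IllegalArgumentException("not a PCAP magic number");
    }

    /** Unparse direction: mirrors the outputValueCalc, writing the value big-endian. */
    static byte[] magicBytes(ByteOrder order) {
        int value = (order == ByteOrder.BIG_ENDIAN) ? MAGIC : MAGIC_SWAPPED;
        return ByteBuffer.allocate(4).order(ByteOrder.BIG_ENDIAN).putInt(value).array();
    }

    public static void main(String[] args) {
        byte[] littleEndianFile = {(byte) 0xD4, (byte) 0xC3, (byte) 0xB2, (byte) 0xA1};
        System.out.println(detectByteOrder(littleEndianFile));
    }
}
```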
Example - PCAP
<xs:element name="PacketHeader">
<xs:complexType>
<xs:sequence>
<xs:element name="Seconds" type="pcap:uint32"/>
<xs:element name="USeconds" type="pcap:uint32"/>
<xs:element name="InclLen" type="pcap:uint32"
...
/>
<xs:element name="OrigLen" type="pcap:uint32"
...
/>
</xs:sequence>
</xs:complexType>
</xs:element>
...
<xs:element ref="pcap:LinkLayer"
dfdl:lengthUnits="bytes" dfdl:lengthKind="explicit"
dfdl:length="{ ../PacketHeader/InclLen }"/>
Example - PCAP
<xs:element name="PacketHeader">
<xs:complexType>
<xs:sequence>
<xs:element name="Seconds" type="pcap:uint32"/>
<xs:element name="USeconds" type="pcap:uint32"/>
<xs:element name="InclLen" type="pcap:uint32"
dfdl:outputValueCalc="{
if (dfdl:valueLength(
../../pcap:LinkLayer/pcap:Ethernet,
'bytes') le 60) then 60
else
dfdl:valueLength(
../../pcap:LinkLayer/pcap:Ethernet,
'bytes') }"
/>
<xs:element name="OrigLen" type="pcap:uint32"
dfdl:outputValueCalc="{ ../InclLen }"/>
</xs:sequence>
</xs:complexType>
</xs:element>
...
<xs:element ref="pcap:LinkLayer"
dfdl:lengthUnits="bytes" dfdl:lengthKind="explicit"
dfdl:length="{ ../PacketHeader/InclLen }"/>
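The relationship between InclLen and the packet data can be sketched in Java as follows. The sketch assumes the four 32-bit header fields shown above, in that order, and uses a hypothetical class name (PcapRecordSketch); it is an illustration, not Daffodil's implementation.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Illustration only: a length field in the header governs the size of the data that follows. */
final class PcapRecordSketch {

    /** Parse direction: InclLen, read from the header, is the explicit length of the payload. */
    static byte[] readPayload(byte[] file, int offset, ByteOrder order) {
        ByteBuffer buf = ByteBuffer.wrap(file).order(order);
        buf.position(offset + 8);                      // skip Seconds and USeconds (4 bytes each)
        long inclLen = Integer.toUnsignedLong(buf.getInt());
        buf.getInt();                                  // OrigLen is not needed to locate the payload
        if (inclLen > buf.remaining()) {
            throw new IllegalArgumentException("InclLen exceeds remaining data");
        }
        byte[] payload = new byte[(int) inclLen];
        buf.get(payload);
        return payload;
    }

    /** Unparse direction: mirrors the outputValueCalc, where lengths of 60 or less become 60. */
    static int inclLenForUnparse(int ethernetLengthBytes) {
        return Math.max(60, ethernetLengthBytes);
    }

    public static void main(String[] args) {
        ByteBuffer rec = ByteBuffer.allocate(19).order(ByteOrder.LITTLE_ENDIAN);
        rec.putInt(1).putInt(2).putInt(3).putInt(3).put(new byte[] {9, 8, 7});
        System.out.println(readPayload(rec.array(), 0, ByteOrder.LITTLE_ENDIAN).length);
    }
}
```

On unparse the dependency runs the other way: the length field is computed from the data, which is why the schema needs outputValueCalc.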
Example: MIL-STD-2045
A dense bit-packed binary data format similar to many MIL-STD formats, but publicly available.
Uses Least-Significant-Bit-First bit order – typical of MIL formats, different from most non-MIL formats.
Example - MIL-STD-2045
▪ Densely packed binary with unique bit order
dfdl:representation="binary"
dfdl:alignment="1"
dfdl:alignmentUnits="bits"
dfdl:lengthUnits="bits"
dfdl:lengthKind="explicit"
dfdl:length
dfdl:byteOrder="littleEndian"
dfdl:bitOrder="leastSignificantBitFirst"
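To show what least-significant-bit-first order means for fields that are not byte-aligned, here is a small Java sketch of a bit reader. It assumes the interpretation that bits are consumed starting at the low-order bit of each byte and that the first bit consumed is the field's least significant bit. The class LsbFirstBitReader is hypothetical and far simpler than Daffodil's I/O layer.

```java
/** Illustration only: reads unsigned fields of arbitrary bit length, least significant bit first. */
final class LsbFirstBitReader {
    private final byte[] data;
    private long bitPos = 0;   // absolute bit position; bit 0 is the low-order bit of data[0]

    LsbFirstBitReader(byte[] data) {
        this.data = data;
    }

    /** Read an unsigned field of 1 to 32 bits with no alignment requirement. */
    long readBits(int length) {
        if (length < 1 || length > 32) {
            throw new IllegalArgumentException("length must be 1..32");
        }
        long value = 0;
        for (int i = 0; i < length; i++) {
            int byteIndex = (int) (bitPos >>> 3);
            int bitInByte = (int) (bitPos & 7);        // 0 = least significant bit of the byte
            if (byteIndex >= data.length) {
                throw new IllegalStateException("ran out of data");
            }
            long bit = (data[byteIndex] >>> bitInByte) & 1;
            value |= bit << i;                         // earlier bits are less significant
            bitPos++;
        }
        return value;
    }

    public static void main(String[] args) {
        LsbFirstBitReader reader = new LsbFirstBitReader(new byte[] {(byte) 0xB5, 0x01});
        System.out.println(reader.readBits(3));   // a 3-bit field
        System.out.println(reader.readBits(7));   // a 7-bit field that spans the byte boundary
    }
}
```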
Example - MIL-STD-2045
▪ 7-bit strings embedded in binary
dfdl:encoding="X-DFDL-US-ASCII-7BIT-PACKED"
▪ 0x7F (DEL) terminated, unless max-length met
<xs:element name="str50" type="xs:string"
dfdl:lengthPattern="[^\x7F]{0,49}(?=\x7F)|.{50}" />
<xs:sequence dfdl:terminator="{
if (fn:string-length(./str50) eq 50)
then '%ES;'
else '%DEL;'
}" />
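The 7-bit packed string rule (stop at DEL, or stop after the maximum length with no terminator) can be sketched in Java as below. The sketch assumes least-significant-bit-first packing as on the previous slide; the class Packed7BitString is hypothetical and not part of Daffodil.

```java
/** Illustration only: decodes 7-bit characters packed least significant bit first. */
final class Packed7BitString {
    static final int DEL = 0x7F;

    /**
     * Decode characters starting at bit offset startBit. Decoding stops at a DEL
     * character (which is consumed but not returned) or after maxChars characters,
     * in which case no terminator is expected.
     */
    static String decode(byte[] data, int startBit, int maxChars) {
        StringBuilder text = new StringBuilder();
        int bitPos = startBit;
        for (int n = 0; n < maxChars; n++) {
            if (bitPos + 7 > data.length * 8) {
                throw new IllegalArgumentException("ran out of data before DEL or max length");
            }
            int code = 0;
            for (int i = 0; i < 7; i++, bitPos++) {
                int bit = (data[bitPos >>> 3] >>> (bitPos & 7)) & 1;
                code |= bit << i;
            }
            if (code == DEL) {
                break;
            }
            text.append((char) code);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        // 'H', 'i', DEL packed into 21 bits, padded with zero bits to 3 bytes.
        byte[] packed = {(byte) 0xC8, (byte) 0xF4, 0x1F};
        System.out.println(decode(packed, 0, 50));
    }
}
```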
Example - MIL-STD-2045
▪ Presence indicators
<dfdl:discriminator test="{ . eq 1 }" />
▪ Layout: Presence Indicator, then the Optional Field
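A presence indicator is easy to picture in code. In this Java sketch the iterator stands for already-decoded field values in wire order, which is an assumption made to keep the example short; PresenceIndicatorSketch is a hypothetical name.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.OptionalInt;

/** Illustration only: an optional field guarded by a one-bit presence indicator. */
final class PresenceIndicatorSketch {

    /** Reads the indicator, then reads the optional field only when the indicator equals 1. */
    static OptionalInt readOptional(Iterator<Integer> fields) {
        int indicator = fields.next();
        if (indicator == 1) {
            return OptionalInt.of(fields.next());
        }
        return OptionalInt.empty();   // field absent: nothing more is consumed
    }

    public static void main(String[] args) {
        System.out.println(readOptional(Arrays.asList(1, 42).iterator()));
        System.out.println(readOptional(Arrays.asList(0, 99).iterator()));
    }
}
```

In the DFDL schema the same decision is expressed declaratively by the discriminator test on the indicator element.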
Example - MIL-STD-2045
▪ Repeat indicators
dfdl:occursCountKind="implicit"
minOccurs="1"
maxOccurs="unbounded"
<dfdl:discriminator test="{
if (dfdl:occursIndex() eq 1)
then fn:true()
else ../fields[dfdl:occursIndex() - 1]/repeat_indicator eq 1
}" />
<xs:element name="GRI" type="tns:repeatIndicator"
dfdl:outputValueCalc="{
if (dfdl:occursIndex() lt fn:count(..))
then 1 else 0
}" />
▪ Layout: [Repeat Indicator = 1][Data] (another one follows), [Repeat Indicator = 1][Data] (another one follows), [Repeat Indicator = 0][Data] (last one)
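The repeat-indicator convention can be sketched in Java for both directions. As in the layout above, each group is an indicator followed by data; the iterator of already-decoded values and the class name RepeatIndicatorSketch are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

/** Illustration only: groups repeat while the repeat indicator is 1. */
final class RepeatIndicatorSketch {

    /** Parse direction: keep reading groups until a group's indicator is 0. */
    static List<Integer> parseGroups(Iterator<Integer> fields) {
        List<Integer> data = new ArrayList<>();
        int indicator;
        do {
            indicator = fields.next();
            data.add(fields.next());
        } while (indicator == 1);
        return data;
    }

    /** Unparse direction: the indicator is computed, 1 for every group except the last. */
    static List<Integer> unparseGroups(List<Integer> data) {
        List<Integer> out = new ArrayList<>();
        for (int i = 0; i < data.size(); i++) {
            out.add(i < data.size() - 1 ? 1 : 0);
            out.add(data.get(i));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> wire = Arrays.asList(1, 10, 1, 20, 0, 30);
        System.out.println(parseGroups(wire.iterator()));
        System.out.println(unparseGroups(Arrays.asList(10, 20, 30)));
    }
}
```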
More DFDL Properties
dfdl:assert
dfdl:discriminator
dfdl:newVariableInstance
dfdl:setVariable
dfdl:defineEscapeScheme
dfdl:defineVariable
dfdl:ignoreCase
dfdl:encodingErrorPolicy
dfdl:alignment
dfdl:alignmentUnits
dfdl:fillByte
dfdl:leadingSkip
dfdl:trailingSkip
dfdl:emptyValueDelimiterPolicy
dfdl:outputNewLine
dfdl:prefixIncludesPrefixLength
dfdl:prefixLengthType
dfdl:lengthPattern
dfdl:textPadKind
dfdl:textTrimKind
dfdl:textOutputMinLength
dfdl:escapeSchemeRef
dfdl:escapeKind
dfdl:escapeCharacter
dfdl:escapeBlockStart
dfdl:escapeBlockEnd
dfdl:escapeEscapeCharacter
dfdl:extraEscapedCharacters
dfdl:generateEscapeBlock
dfdl:textBidi
dfdl:textStringJustification
dfdl:textStringPadCharacter
dfdl:textNumberJustification
dfdl:textStandardNaNRep
dfdl:textStandardZeroRep
dfdl:textStandardBase
dfdl:binaryNumberRep
dfdl:binaryPackedSignCodes
dfdl:binaryFloatRep
dfdl:calendarPattern
dfdl:binaryCalendarRep
dfdl:nilValueDelimiterPolicy
dfdl:initiatedContent
dfdl:separatorPosition
dfdl:separatorSuppressionPolicy
dfdl:floating
dfdl:hiddenGroupRef
dfdl:occursCountKind
dfdl:occursStopValue
dfdl:inputValueCalc
dfdl:outputValueCalc
DFDL Doesn't Do...
Limitations, Intended or Otherwise
Things DFDL (v1.0) Does
▪ DFDL is for Data Sets
▪ Things typically thought of as files/streams full of data
about XYZ.
• Rows, Tables
• Header-Body-Trailer
• Hierarchical / Nested record-oriented data
• Messages
▪ Industry & Military data interchange formats
• HL7, HIPAA-5010, SWIFT, NACHA, X25, Link16, etc.
Current DFDL v1.0 Language Limitations
▪ Recursive types
• DFDL v1.0 is not a Turing-complete language
• On purpose - it's a feature, not a bug
▪ Position of elements "by offset"
• Random jumping around data
• Ex: TIFF file format
▪ TIFF cannot be described in DFDL v1.0
Thanks to
http://langsec.org/occupy/
Things DFDL (v1.0) Doesn't Do
▪ DFDL v1.0 was not originally intended for:
• Document file formats
▪ e.g., MS Office documents (.doc), or RTF, PDF
• Images, Video, Audio
• Archives like zip files or tar files
• Core dump/memory image format
• Storage format of a RDBMS table
• Graphs of pointers - object graphs dumped from memory
▪ DFDL is being extended (v2.0) to cover more
• BLOBs - for images, video, audio
▪ Already available in Daffodil from v2.5.0
• Offsets/Pointers (prototype) - for TIFF, ZIP
DFDL Schemas
Apache Daffodil™ Project
DFDL Standard
Status Report
Apache Daffodil is a Trademark of the Apache Software Foundation
DFDL Schemas
Status Report
DFDL Schemas
Public (many are on GitHub)
MIL-STD-2045
PCAP
NITF
PNG
JPEG
NACHA
VCard
QuasiXML
Geonames
EDIFACT
IBM4690-TLOG
ISO8583
BMP
GIF
Praat TextGrid
ARINC429 (PoC)
IPFIX
Syslog
iCalendar
IMF
SHP (shape file)
KNXNet/IP(indust. control)
Siemens S7 (indust. control)
Asterix (Cat 034, 048)
MagVar
AFTN Flight Plan
FOUO / CUI VMF (MIL-STD-6017 subset)
USMTF ATO (MIL-STD-6040)
LINK16 (NATO STANAG 5516)
LINK16Subset (MIL-STD-6016F subset)
A-GNOSC REMEDY
ARMY DRRS
USCG UCOP
CEF-R1965
GMTIF (STANAG 4607 subset)
SOTF
JICD
NACT
DISV6
Commercial
License $$$
SWIFT-MT (IBM)
HIPAA-5010 (IBM)
HL7-2.7 (IBM)
OTH-Gold (Owl)
USMTF (Owl)
LINK16 (MIL-STD-6016 E, F, G) (Owl)
VMF (MIL-STD-6017 A, B, C, D) (Owl)
JREAP-C (MIL-STD-3011) (Owl)
If you have DFDL schemas available (open, licensed, or just to show) I will add them to this list.
Apache Daffodil™
Status Report
Avoid Version Confusion
▪ DFDL Language Specification: v1.0
▪ Apache Daffodil™ Software: v1.0.0, v1.1.0, v2.0.0, v2.1.0, ..., v2.4.0, ..., v3.0.0, ..., v3.4.0, ...
▪ DFDL Schemas have their own individual release version numbers, e.g., mil-std-2045 is release 0.0.3
Other DFDL Implementations
▪ IBM DFDL - The First DFDL Implementation - v1.0 Nov 2011
• Embeddable as Library, C and Java
• Includes Eclipse based tooling - graphical DFDL schema editor/test environment
• Found in IBM products:
▪ IBM App Connect Enterprise product family
▪ IBM Integration Bus
▪ IBM InfoSphere Master Data Management
▪ IBM z/TPF product family
▪ European Space Agency - DFDL4S = DFDL for Space
• Embeddable as Library, Java and C++ versions.
• Binary-data-only subset of DFDL
• Created by ESA for satellite data descriptions
• Evolving to be a fully compatible DFDL subset.
• More info at
▪ http://eop-cfi.esa.int/index.php/applications/dfdl4s
Interoperability – Apache Daffodil & IBM
[Venn diagram: DFDL schemas that work with Daffodil, with IBM DFDL, or with both (Interoperable)]
▪ Schemas shown: MIL-STD-2045, VMF, GMTIF, Link16, PCAP, SHP, GIF, BMP, PNG, JPEG, NITF, NACHA, ISO8583, CSV, QuasiXML, Geonames, IBM4690-TLOG, EDIFACT, vCard, HL7-v2.7, MagVar, A-GNOSC REMEDY, iCalendar, USMTF, ARMY DRRS, USCG UCOP, CEF-R1965, IPFIX
▪ Some schemas use optional DFDL features not supported (yet) by IBM DFDL
▪ COBOL-defined (COBOL/FINSERV) data requires optional DFDL features not available until the Daffodil 3.5.0 release
Apache Daffodil
▪ Open Source
• Apache Daffodil - Announced in March 2021
• Apache = A Permissive/Commercial Friendly License – embed how you wish, into anything.
▪ Runs on Java 8+ Virtual Machine
▪ Started as research project by UofI ~2009
▪ Originally developed by Owl (then Tresys) ~2012
• Funded by the USG
• Became Apache Incubator project in 2017
• Graduated from Apache Incubator in March 2021
▪ Highly interoperable with IBM DFDL (since v2.4.0)
▪ Active development
▪ https://daffodil.apache.org
Apache Daffodil - Recent Major Features
▪ EXI support
▪ Pluggable Character Sets
▪ Embedded XML Strings in data
▪ C-Code generator backend
▪ Pluggable 'layers' for Checksums, line-folding, CRC, etc.
▪ Schematron Support - pluggable Validators
▪ SAX API
Apache Daffodil VSCode Extension
▪ Data Format Development and Debug IDE
• Set data breakpoints
• Step through parsing
• Save TDML test cases
• Edit large data files
▪ Extension to VSCode IDE
• Front-end - TypeScript
• Back-end server - Scala
▪ Version 1.2.0 available
Apache Daffodil
▪ Jar libraries – runs on JVM
• Compiler, utilities, TDML runner
• Signed Jars available from Maven Central
• Runtime 1 - Scala - runs on JVM - supports nearly all of DFDL v1.0
• Runtime 2 - C-generator - creates fast C code - Small subset of the DFDL language
▪ Command Line Interface
• Interactive CLI debugger and trace
▪ Java & Scala API with documentation
▪ XML and JSON for parse-output, unparse-input
▪ You must get your DFDL schemas somewhere...
▪ github (DFDLSchemas, others)
▪ Write them! (and share them!)
If you download it, what do you get?
Apache Daffodil Internal Components
▪ Written in Scala
▪ Compiler
• Compiles DFDL schemas to runtime data structures
• Issues diagnostics
▪ Runtime 1
▪ DPath - XPath-like language
• compiler
• runtime
▪ Infoset
• Convert to/from XML, JSON
• Fast, constant-time access
▪ Parser primitives
▪ Unparser primitives
▪ TDML Runner
• Test (& Tutorial) Data Markup Language
▪ Utility Libraries
• for compiler
• for runtime
▪ Command Line Interpreter
▪ Java API
▪ Scala API
▪ Tests - Unit (Scala)
▪ Tests - System (TDML)
▪ Breakpoint debugger
▪ I/O library
• bits (not bytes), bitOrder
• streaming
• non-8-bit-characters (7, 6, 5)
• unbounded lookahead - parsing/backtracking, and unparsing
▪ Runtime 2 (C-code Gen)
• Primitives (C)
• Code Generator
Apache Daffodil Integrations
▪ EXI - binary form of XML - much faster/denser
▪ XProc - Calabash
▪ Apache Spark
▪ Apache NiFi
▪ Software AG™ Integration Server (aka WebMethods™)
▪ Owl CDS products/services
▪ Other companies also!
▪ Potential
• Apache Tika
• Other data-handling frameworks - Flink, Hadoop,....
EXI (binary XML) - Data Size
▪ EXI is a W3C standard for binary XML
• Same Infoset, same API (SAX)
▪ Requires massively less space than textual XML
▪ See examples on github.com OpenDFDL
[Chart: Size Increase (XML Text) - Source Size % vs. XML Size % for bmp, cef, iCal, Jpeg, Pcap, shp, vmf, Link16; vertical axis 0% to 7000%]
[Chart: Size Increase (EXI) - Source Size % vs. EXI Size % (SA) for bmp, cef, iCal, Jpeg, Pcap, shp, vmf, Link16; vertical axis 0% to 140%]
Apache Daffodil Code Base
Lines of Code = 397K*
▪ source (scala + java): 115602
▪ unit tests (scala + java): 62661
▪ test suite (tdml, xsd, scala): 218998
* Data as of 2023-03-15 master branch. Main Daffodil library only; does not include the VSCode Extension. Excludes DFDL Schemas (the Link16 DFDL Schema is 800K+ LoC just by itself).
Apache Daffodil Roadmap
▪ Roadmap on https://s.apache.org/daffodil-wiki
▪ Depends on community needs, available developers
Apache Daffodil Community
▪ Scala programmers wanted!
▪ Java programmers who want to learn Scala wanted!
▪ TypeScript/JavaScript programmers who want to work on the Daffodil VSCode Extension wanted!
Owl DFDL Services & Products
We would like to know how to make DFDL and Daffodil solve
your data challenges!
▪ Example data challenges
• Convert Legacy Data formats to Modern
• Data Intake to analysis systems
• Research Systems - increase credibility by using real MIL data formats
• Data Publishing – e.g., USG Open Data requirements
• Data Security – CDS / rip, inspect/transform, rebuild to specification
▪ Examples of services Owl provides:
• Create (or help you create) & Test DFDL Schemas for
▪ Your specific in-house data formats
▪ Industry/Military standard data formats
• Commercial support for Apache Daffodil software
▪ Fix, Enhance Daffodil to meet your needs
▪ Customize, add features, APIs, ….
Owl Cross-Domain Solutions using DFDL
Provide the highest level of network security.
1U CDS Appliances
▪ XD-Bridge
▪ XD-Guardian
▪ www.owlcyberdefense.com
Questions?
DFDL Specification:
http://ogf.org/dfdl
DFDL Schemas:
https://github.com/DFDLSchemas
Daffodil Open Source:
https://daffodil.apache.org
END
Example: Java API
Yes, it is 'Hello world!'
Hello world!
<helloWorld>
<word>Hello</word>
<word>world!</word>
</helloWorld>
Daffodil Java API – Hello World
<!-- helloWorld.dfdl.xsd -->
…
<dfdl:format ref="daffodilTest1"
representation="text" encoding="utf-8"
lengthKind="delimited"/>
…
<xs:element name="helloWorld">
<xs:complexType><xs:sequence>
<xs:element name="word" type="xs:string"
maxOccurs="unbounded"
dfdl:lengthKind="delimited"
dfdl:terminator="%WSP+;"
dfdl:occursCountKind="parsed"/>
</xs:sequence></xs:complexType>
</xs:element>
public class HelloWorld {
public static void main(String[] args) throws IOException {
... // let's fill this in
}
}
Daffodil Java API – Hello World
//
// First compile the DFDL Schema
// Creates a Daffodil ProcessorFactory object
//
Compiler c = Daffodil.compiler();
c.setValidateDFDLSchemas(true);
File schemaFile = new File("helloWorld.dfdl.xsd");
ProcessorFactory pf = c.compileFile(schemaFile);
// list any compile-time diagnostics - warnings and errors
List<Diagnostic> diags = pf.getDiagnostics();
for (Diagnostic d : diags) {
System.err.println(d.getSomeMessage());
}
if (pf.isError()) { System.exit(1); }
Daffodil Java API – Hello World
//
// Use ProcessorFactory to create data processors
// As many as you want. They can be run (to parse/unparse) on
// different threads.
//
DataProcessor dp = pf.onPath("/");
ReadableByteChannel rbc =
Channels.newChannel(new FileInputStream("helloWorld.dat"));
//
// Setup JDOM outputter
//
JDOMInfosetOutputter outputter = new JDOMInfosetOutputter();
ParseResult res = dp.parse(rbc, outputter);
boolean err = res.isError();
if (err) { … list diagnostics and exit(2) … }
org.jdom2.Document doc =
outputter.getResult(); // JDOM2 XML Infoset
Daffodil Java API – Hello World
//
// Pretty print the XML infoset.
//
XMLOutputter xo = new XMLOutputter(Format.getPrettyFormat());
xo.output(doc, System.out);
// or transform/inspect/etc.
Daffodil Java API – Hello World
//
// unparse
//
WritableByteChannel wbc =
Channels.newChannel(new FileOutputStream(…));
JDOMInfosetInputter inputter = new JDOMInfosetInputter(doc);
UnparseResult res2 = dp.unparse(inputter, wbc);
if (res2.isError()) { … same as before … }
// File contains unparsed data
Question
▪ Q: Instead of DFDL why not use ...
• Apache Avro?
• Apache Thrift?
• Google GPBs?
• ASN.1 BER (or PER/DER/XER) ?
▪ A: Those are great, but are prescriptive.
• They don't describe formats, they are data formats
• We need a descriptive language.
Question: Why is DFDL Needed? - ASN.1 ECN
▪ What about ASN.1 Encoding Control Notation? Doesn't it do what DFDL Does?
▪ Yes, ASN.1 ECN is very interesting. It is...
• Already an ISO Standard (since 2008)
• Conceptually similar
▪ Logical schema language + notations for physical representation
• Very different in the details.
▪ Developers [ Love | Hate ] [ ASN.1 | XML ]
▪ Differences that matter:
• ASN.1 ECN
▪ No current open source implementation of ECN (as of 2021-03-08)
• Sourceforge lvmaiAsn last update was 2013.
▪ Extension of a binary data standard ASN.1 BER/PER/DER
▪ Goal to describe legacy protocol messages
• DFDL
▪ Open source Daffodil implementation
▪ Extension of a textual data standard XML
▪ Goal to be union of data integration tool capabilities for format description
▪ No thorough side-by-side comparison has been done
Daffodil Performance
▪ Uses Daffodil Message Streaming API
▪ The K03.2 message is 49 bytes. Expands to 4555 bytes of XML
• Note: EXI eliminates this text expansion - see earlier slide on EXI.
• This test was done using textual XML
▪ The file was 131072 repeats of that 49 byte message
▪ Intel Core i7-6820HQ CPU @ 2.70 GHz (a laptop)
▪ Running on 2 processes (parse, unparse), 1 thread each.
▪ 81.218 seconds => ~ 0.6 milliseconds per message
▪ Conclusion:
• DFDL/Daffodil is not going to kill your 10 ms latency budget.
• Cost will be from additional XML filtering/validation operations
▪ Test pipeline: K03.2 messages -> Daffodil Parse & Validate -> XML text pipe -> Daffodil Unparse -> K03.2 messages
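The per-message figure follows directly from the numbers on this slide. A short Java sketch of the arithmetic, using a hypothetical class name (ThroughputSketch) and only the values quoted above:

```java
/** Illustration only: reproduces the arithmetic behind the per-message latency figure. */
final class ThroughputSketch {

    /** Average elapsed time per message, in milliseconds. */
    static double millisPerMessage(double totalSeconds, long messages) {
        return totalSeconds * 1000.0 / messages;
    }

    /** How many times larger the textual XML is than the source message. */
    static double expansionFactor(long xmlBytes, long sourceBytes) {
        return (double) xmlBytes / sourceBytes;
    }

    public static void main(String[] args) {
        // 131072 messages of 49 bytes each, processed in 81.218 seconds.
        System.out.println(millisPerMessage(81.218, 131072) + " ms per message");
        // Each 49-byte K03.2 message expands to 4555 bytes of XML text.
        System.out.println(expansionFactor(4555, 49) + "x expansion as XML text");
    }
}
```

This works out to about 0.62 ms per message for the combined parse and unparse pipeline, with each message growing roughly 93 times when represented as XML text.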

DFDL and Apache Daffodil(tm) Overview from Owl Cyber Defense

  • 1. Data Format Description Language (DFDL) and the Open Source Apache DaffodilTM Project 2023-03-15
  • 2. Agenda ▪ DFDL (aka "DaFfoDiL") • Why is DFDL Needed? • Quick overview of DFDL • Use Cases for DFDL ▪ DFDL Tutorial • Examples: CSV, PCAP, MIL-STD-2045 • What DFDL Doesn't do (today) ▪ DFDL Schemas ▪ Implementations • Open Source: Apache Daffodil, Daffodil VSCode Extension Copyright 2023 Owl Cyber Defense 2
  • 3. Why is DFDL Needed? Copyright 2023 Owl Cyber Defense 3 Every Enterprise Software Company • IBM (10+) • Oracle(10+) • SAP(10+) • Microsoft • SAS • Informatica • SyncSort • AbInitio • Pervasive • Qlik/Expressor • Pentaho • .... Dozens more Every kind of software that takes in data: • data directed routing (msg brokers) • database • data analysis and/or data mining • data cleansing • master data management • application integration • data visualization • … There are hundreds of ad-hoc data format description systems All these data format descriptions are: • proprietary • ad-hoc • incompatible Even within products of the same company!
  • 4. Why DFDL is Needed? ▪ Hundreds of data format description systems… means that: ▪ Vendor investment is spread too thin • Tools for creating data formats are inadequate • No product is comprehensive enough • Some products aren’t fast enough ▪ Customers get locked in to proprietary solution ▪ High training cost, low career value ▪ Inflexible packaging • not libraries - must embed some product in your application data flow Copyright 2023 Owl Cyber Defense 4
  • 5. Why DFDL is Needed? ▪ Result… we are stuck with: ▪ Data parsers continue to be written as programs ▪ High cost to write & maintain, high error rate. ▪ When needed, certification (C&A) costs = $$$$$ Copyright 2023 Owl Cyber Defense 5
  • 6. Solving the Data Format Problem ▪ An Open Standard DFDL • Multiple implementations that interoperate ▪ Commercial & Open Source • Long-term sponsors ▪ IBM – has their own DFDL implementations ▪ US DoD, Canada DND • Cybersecurity • Available DFDL schemas for important data formats ▪ A High-Quality Open Source Library Implementation • With a supporting community of developers • With available commercial support (Owl) Copyright 2023 Owl Cyber Defense 6
  • 7. DFDL = Data Format Description Language ▪ A standard from the Open Grid Forum (OGF) • Being proposed as an ISO standard ▪ Started 2001, ratified by OGF in 2021 ▪ Big: 200+ pages ▪ DFDL is not a data format. ▪ Write a DFDL Schema to describe a format. ▪ DFDL is sometimes pronounced "DaFfoDiL"
  • 8. Data Format Description Language ▪ DFDL is a way of describing data formats ▪ It is NOT a data format itself! ▪ DFDL combines the state of the art ▪ Union of capabilities across many marketplace data integration products/tools/packages ▪ DFDL adds a small number of real innovations ▪ Overcomes limitations of prior-generation tools, e.g., • Computed elements capability: unparse as well as parse! • Expression language • BitOrder
  • 9. Data Format Description Language ▪ Core Concepts ▪ Leverage XML Schema (XSDL) • Grammar scaffolding • Describes the logical data model • DFDL uses only a subset of XML schema • Provides standard ways to annotate ▪ Add annotations • Describe the physical representation. ▪ Read and write from same DFDL Schema ▪ Because Developers [ Love | Hate ] XML ▪ The DFDL Schema is based on XSDL ▪ The Infoset created when parsing data does NOT have to be XML => Can be JSON, Can be adapted to others Copyright 2023 Owl Cyber Defense 9
  • 10. Example – Delimited Text Data rLimit=5;rpngx=-7.1E8
  • 11. Example – Delimited Text Data
      rLimit=5;rpngx=-7.1E8
      "rLimit=" and "rpngx=" are initiators; ";" is the separator
      "5" is an ASCII text integer; "-7.1E8" is an ASCII text floating point
      Separators, initiators (aka tags), and terminators are all examples of delimiters in DFDL
  • 12. DFDL Schema: Logical Elements
      <xs:complexType name="rValues">
        <xs:sequence>
          <xs:element name="rLimit" type="xs:int"/>
          <xs:element name="rpngx" type="xs:float"/>
        </xs:sequence>
      </xs:complexType>
  • 13. DFDL Schema: DFDL Properties
      <xs:annotation>
        <xs:appinfo source="http://www.ogf.org/dfdl/v1.0">
          <dfdl:format representation="text" textNumberRep="standard" encoding="ascii" lengthKind="delimited" .../>
        </xs:appinfo>
      </xs:annotation>
      <xs:complexType name="rValues">
        <xs:sequence dfdl:separator=";" ... >
          <xs:element name="rLimit" type="xs:int" dfdl:initiator="rLimit=" ... />
          <xs:element name="rpngx" type="xs:float" dfdl:initiator="rpngx=" ... />
        </xs:sequence>
      </xs:complexType>
      Data described: rLimit=5;rpngx=-7.1E8
  • 14. DFDL Data and Infoset Lifecycle
      Data: rLimit=5;rpngx=-7.1E8
      Parse: Data -> DFDL Implementation (using the DFDL Schema) -> Infoset
      Unparse: Infoset -> DFDL Implementation (using the DFDL Schema) -> Data
      Infoset (XML or JSON or other):
        Element Name: rValues
          Element Name: rLimit, Value: 5, Type: Int
          Element Name: rpngx, Value: -7.1E8, Type: Double
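
The parse and unparse directions in the lifecycle slide can be illustrated with a minimal Python sketch. It is not Daffodil and does not read a DFDL schema: the `FIELDS` table and `SEPARATOR` constant are hypothetical stand-ins for the `dfdl:initiator` and `dfdl:separator` properties shown on the schema slide, and the infoset is modelled as a plain dictionary.

```python
from typing import Dict, Union

Value = Union[int, float]

# Hypothetical stand-in for the schema: (element name, initiator, logical type).
FIELDS = [("rLimit", "rLimit=", int), ("rpngx", "rpngx=", float)]
SEPARATOR = ";"


def parse(data: str) -> Dict[str, Value]:
    """Turn delimited text into an infoset-like dictionary."""
    parts = data.split(SEPARATOR)
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, found {len(parts)}")
    infoset: Dict[str, Value] = {}
    for part, (name, initiator, logical_type) in zip(parts, FIELDS):
        if not part.startswith(initiator):
            raise ValueError(f"missing initiator {initiator!r} in {part!r}")
        # Strip the initiator and convert the remaining text to the logical type.
        infoset[name] = logical_type(part[len(initiator):])
    return infoset


def unparse(infoset: Dict[str, Value]) -> str:
    """Write the infoset back out using the same initiators and separator."""
    return SEPARATOR.join(
        f"{initiator}{infoset[name]}" for name, initiator, _ in FIELDS
    )


if __name__ == "__main__":
    tree = parse("rLimit=5;rpngx=-7.1E8")
    print(tree)
    print(unparse(tree))
```

In this sketch the unparsed text uses Python's default number formatting, so it is not guaranteed to be character-for-character identical to the input; a real DFDL implementation controls that with properties such as textNumberPattern. Parsing the unparsed text does yield the same values.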
  • 15. More DFDL Properties Copyright 2023 Owl Cyber Defense 15 initiator terminator documentFinalTerminatorCanBeMissing outputNewLine length lengthPattern textStringPadCharacter textNumberPadCharacter textCalendarPadCharacter textBooleanPadCharacter escapeCharacter escapeBlockStart escapeBlockEnd escapeEscapeCharacter extraEscapedCharacters textNumberPattern textStandardGroupingSeparator textStandardDecimalSeparator textStandardExponentRep textStandardInfinityRep textStandardNaNRep textStandardZeroRep textBooleanTrueRep textBooleanFalseRep calendarPattern calendarLanguage binaryCalendarEpoch nilValue separator occursStopValue inputValueCalc outputValueCalc byteOrder bitOrder encoding encodingErrorPolicy utf16Width ignoreCase alignment alignmentUnits fillByte leadingSkip trailingSkip lengthKind lengthUnits prefixIncludesPrefixLength prefixLengthType representation textPadKind textTrimKind textOutputMinLength escapeKind generateEscapeBlock textStringJustification textNumberRep textNumberJustification textNumberCheckPolicy textStandardBase textNumberRoundingMode textNumberRounding textNumberRoundingIncrement textZonedSignStyle binaryNumberRep binaryDecimalVirtualPoint binaryNumberCheckPolicy binaryPackedSignCodes binaryFloatRep textBooleanJustification binaryBooleanTrueRep binaryBooleanFalseRep textCalendarJustification calendarPatternKind calendarCheckPolicy calendarTimeZone calendarObserveDST calendarFirstDayOfWeek calendarDaysInFirstWeek calendarCenturyStart binaryCalendarRep nilKind nilValueDelimiterPolicy useNilForDefault emptyValueDelimiterPolicy sequenceKind hiddenGroupRef initiatedContent separatorPosition separatorPolicy separatorSuppressionPolicy choiceLengthKind choiceLength choiceDispatchKey choiceBranchKey occursCountKind occursCount floating truncateSpecifiedLengthString decimalSigned failureType
  • 16. DFDL Use Cases ▪ Deep Content Inspection/Sanitization • CDS, Network Guards • Data Filtering • Rip-and-Rebuild Firewalls ▪ Data-Directed Routing ▪ Data Intake • Data Warehouse or Analytical Data Store ▪ Interoperability / Data Translation ▪ Conversion of Legacy Data to Modern Formats ▪ Open Data / Data Export • Recasting data as XML for export/publishing Copyright 2023 Owl Cyber Defense 16
  • 17. Examples CSV (ASCII text delimited) PCAP (variable length binary) MIL-STD-2045 (dense bit-packed binary, LSBF) Copyright 2023 Owl Cyber Defense 17
  • 19. Example - CSV ▪ Textual Format (ASCII) ▪ Comma-Separated Values ▪ Tabular data ▪ Rows separated by newlines ▪ Fields separated by commas Copyright 2023 Owl Cyber Defense 19 Smith,Charles,36,6.2 Williams,Mary,51,5.5
  • 20. Example - CSV
      <xs:schema>
        <xs:annotation>
          <xs:appinfo source="http://www.ogf.org/dfdl/">
            <dfdl:format ref="defaults" representation="text" encoding="ASCII"
                         lengthKind="delimited" escapeSchemeRef="quoted"
                         occursCountKind="parsed" />
            <dfdl:defineEscapeScheme name="quoted">
              <dfdl:escapeScheme escapeKind="escapeBlock" escapeBlockStart="&quot;"
                                 escapeBlockEnd="&quot;" escapeEscapeCharacter="&quot;" />
            </dfdl:defineEscapeScheme>
          </xs:appinfo>
        </xs:annotation>
      </xs:schema>
  • 22. Example - CSV
      <xs:element name="csv">
        <xs:complexType>
          <xs:sequence dfdl:separator="%NL;" dfdl:separatorPosition="postfix">
            <xs:element name="record" maxOccurs="unbounded">
              <xs:complexType>
                <xs:sequence dfdl:separator="," dfdl:separatorPosition="infix">
                  <xs:element name="item" type="xs:string" maxOccurs="unbounded"/>
                </xs:sequence>
              </xs:complexType>
            </xs:element>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
  • 24. Example - CSV
      CSV Input:
        Smith,Charles,36,6.2
        Williams,Mary,51,5.5
      CSV Input + CSV DFDL Schema -> DFDL Implementation -> DFDL Infoset
      DFDL Infoset:
        csv
          record
            item: Smith
            item: Charles
            item: 36
            item: 6.2
          record
            item: Williams
            item: Mary
            item: 51
            item: 5.5
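
The shape of the CSV infoset can be approximated in Python as a rough sketch. This uses the standard `csv` module rather than a DFDL processor; the nested dictionary layout (`csv` -> `record` -> `item`) and the function names are assumptions chosen to mirror the element names in the schema. The module's default double-quote handling is close in spirit to the "quoted" escape scheme, but it is not the same engine.

```python
import csv
import io
from typing import Dict, List


def parse_csv(data: str) -> Dict[str, Dict[str, List[Dict[str, List[str]]]]]:
    """Build a csv/record/item tree from newline-terminated, comma-separated text."""
    reader = csv.reader(io.StringIO(data, newline=""))
    # Every item stays a string, matching type="xs:string" in the schema.
    records = [{"item": row} for row in reader if row]
    return {"csv": {"record": records}}


def unparse_csv(infoset: Dict[str, Dict[str, List[Dict[str, List[str]]]]]) -> str:
    """Write the tree back as text: commas between items, a newline after each record."""
    out = io.StringIO(newline="")
    writer = csv.writer(out, lineterminator="\n")
    for record in infoset["csv"]["record"]:
        writer.writerow(record["item"])
    return out.getvalue()


if __name__ == "__main__":
    text = "Smith,Charles,36,6.2\nWilliams,Mary,51,5.5\n"
    tree = parse_csv(text)
    for record in tree["csv"]["record"]:
        print(record["item"])
    print(unparse_csv(tree) == text)
```

The newline acts as a postfix separator (one after every record) and the comma as an infix separator (only between items), which is the distinction the two `dfdl:separatorPosition` values express.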
  • 25. Example: PCAP Packet Capture File Format Copyright 2023 Owl Cyber Defense 25
  • 26. Example - PCAP
      PCAP Data + PCAP DFDL Schema -> DFDL Implementation -> DFDL Infoset
      DFDL Infoset:
        pcap
          global header
            field
            field
            field
          packet
            header
            data
  • 27. Example - PCAP ▪ Network Packet Capture ▪ Binary Format ▪ Global Header ▪ Packet Header/Data ▪ Variable Endianness and Data Length Copyright 2023 Owl Cyber Defense 27 Global Header Packet Header Packet Data Packet Header Packet Data …
  • 28. Example - PCAP
      <xs:schema>
        <xs:annotation>
          <xs:appinfo source="http://www.ogf.org/dfdl/">
            <dfdl:format ref="defaults" representation="binary" alignment="8"
                         alignmentUnits="bits" lengthUnits="bits" binaryNumberRep="binary"
                         lengthKind="implicit" byteOrder="{ $byte_order }" />
            <dfdl:defineVariable name="byte_order" type="xs:string"/>
          </xs:appinfo>
        </xs:annotation>
        <xs:simpleType name="uint32" dfdl:lengthKind="explicit" dfdl:length="32">
          <xs:restriction base="xs:unsignedInt"/>
        </xs:simpleType>
      </xs:schema>
  • 31. Example - PCAP <xs:element name="magic_number" type="ex:uint32" dfdl:byteOrder="bigEndian"> <xs:annotation> <xs:appinfo source="http://www.ogf.org/dfdl/"> <dfdl:setVariable ref="byte_order"><![CDATA[{ if (. eq dfdl:unsignedInt('xA1B2C3D4')) then 'bigEndian' else if (. eq dfdl:unsignedInt('xD4C3B2A1')) then 'littleEndian' else daf:error() }]]> </dfdl:setVariable> <dfdl:property name="outputValueCalc"><![CDATA[{ if ($dfdl:byteOrder eq 'bigEndian') then dfdl:unsignedInt('xA1B2C3D4') else dfdl:unsignedInt('xD4C3B2A1') }]]></dfdl:property> </xs:appinfo> </xs:annotation> </xs:element>
  • 32. Example - PCAP <xs:element name="magic_number" type="ex:uint32" dfdl:byteOrder="bigEndian"> <xs:annotation> <xs:appinfo source="http://www.ogf.org/dfdl/"> <dfdl:setVariable ref="byte_order"><![CDATA[{ if (. eq 2712847316) then 'bigEndian' else if (. eq 3569595041) then 'littleEndian' else daf:error() }]]> </dfdl:setVariable> <dfdl:property name="outputValueCalc"><![CDATA[{ if ($dfdl:byteOrder eq 'bigEndian') then dfdl:unsignedInt('xA1B2C3D4') else dfdl:unsignedInt('xD4C3B2A1') }]]></dfdl:property> </xs:appinfo> </xs:annotation> </xs:element>
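The magic_number logic above can be sketched in plain Java to show what the two expressions do: the first four bytes are always read big-endian, and the value decides the byte order for the rest of the file. This is an illustration only, not the Daffodil implementation; the class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical sketch of the magic_number setVariable / outputValueCalc pair. */
final class PcapMagicSketch {
    static final long MAGIC = 0xA1B2C3D4L;         // decimal 2712847316
    static final long MAGIC_SWAPPED = 0xD4C3B2A1L; // decimal 3569595041

    /** Parse direction: read 4 bytes as a big-endian unsigned int and infer the byte order. */
    static ByteOrder detect(byte[] header) {
        if (header.length < 4) {
            throw new IllegalArgumentException("need at least 4 bytes");
        }
        long value = ((header[0] & 0xFFL) << 24) | ((header[1] & 0xFFL) << 16)
                   | ((header[2] & 0xFFL) << 8) | (header[3] & 0xFFL);
        if (value == MAGIC) return ByteOrder.BIG_ENDIAN;
        if (value == MAGIC_SWAPPED) return ByteOrder.LITTLE_ENDIAN;
        // Plays the role of daf:error() in the schema.
        throw new IllegalArgumentException("not a PCAP magic number: " + value);
    }

    /** Unparse direction: choose the value from the byte order, then write it big-endian. */
    static byte[] magicBytes(ByteOrder order) {
        long value = (order == ByteOrder.BIG_ENDIAN) ? MAGIC : MAGIC_SWAPPED;
        return ByteBuffer.allocate(4).order(ByteOrder.BIG_ENDIAN).putInt((int) value).array();
    }
}
```

The decimal constants on slide 32 are the same two values written without the hexadecimal constructor, which is why both forms of the schema behave identically.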
  • 33. Example - PCAP <xs:element name="PacketHeader"> <xs:complexType> <xs:sequence> <xs:element name="Seconds" type="pcap:uint32"/> <xs:element name="USeconds" type="pcap:uint32"/> <xs:element name="InclLen" type="pcap:uint32" ... /> <xs:element name="OrigLen" type="pcap:uint32" ... /> </xs:sequence> </xs:complexType> </xs:element> ... <xs:element ref="pcap:LinkLayer" dfdl:lengthUnits="bytes" dfdl:lengthKind="explicit" dfdl:length="{ ../PacketHeader/InclLen }"/>
  • 35. Example - PCAP <xs:element name="PacketHeader"> <xs:complexType> <xs:sequence> <xs:element name="Seconds" type="pcap:uint32"/> <xs:element name="USeconds" type="pcap:uint32"/> <xs:element name="InclLen" type="pcap:uint32" dfdl:outputValueCalc="{ if (dfdl:valueLength( ../../pcap:LinkLayer/pcap:Ethernet, 'bytes') le 60) then 60 else dfdl:valueLength( ../../pcap:LinkLayer/pcap:Ethernet, 'bytes') }" /> <xs:element name="OrigLen" type="pcap:uint32" dfdl:outputValueCalc="{ ../InclLen }"/> </xs:sequence> </xs:complexType> </xs:element> ... <xs:element ref="pcap:LinkLayer" dfdl:lengthUnits="bytes" dfdl:lengthKind="explicit" dfdl:length="{ ../PacketHeader/InclLen }"/>
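The PacketHeader slides show a length that is stored in one field (InclLen) and used to bound a later element (LinkLayer). A minimal Java sketch of the same idea follows, assuming the four 32-bit header fields shown above and a byte order that has already been determined; the class and method names are hypothetical and this is not Daffodil code.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical sketch of a stored-length field driving the length of the following data. */
final class PcapPacketSketch {

    /** Parse direction: read the 16-byte packet header, then take exactly InclLen bytes. */
    static byte[] readPacketData(ByteBuffer buf, ByteOrder order) {
        buf.order(order);
        buf.getInt();                              // Seconds
        buf.getInt();                              // USeconds
        long inclLen = buf.getInt() & 0xFFFFFFFFL; // InclLen, treated as unsigned 32-bit
        buf.getInt();                              // OrigLen
        if (inclLen > buf.remaining()) {
            throw new IllegalArgumentException("InclLen exceeds remaining data");
        }
        byte[] data = new byte[(int) inclLen];     // dfdl:length="{ ../PacketHeader/InclLen }"
        buf.get(data);
        return data;
    }

    /** Unparse direction, as in the outputValueCalc: payload length, but never below 60. */
    static long inclLenFor(int ethernetLengthBytes) {
        return Math.max(60, ethernetLengthBytes);
    }
}
```

After `readPacketData` returns, the buffer is positioned at the next packet header, so calling it in a loop walks the Packet Header / Packet Data pairs of the file.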
  • 37. Example: MIL-STD-2045 A dense, bit-packed binary data format similar to many MIL-STD formats, but publicly available. It uses least-significant-bit-first bit order, which is typical of MIL formats and different from most non-MIL formats.
  • 38. Example - MIL-STD-2045 ▪ Densely packed binary with unique bit order dfdl:representation="binary" dfdl:alignment="1" dfdl:alignmentUnits="bits" dfdl:lengthUnits="bits" dfdl:lengthKind="explicit" dfdl:length dfdl:byteOrder="littleEndian" dfdl:bitOrder="leastSignificantBitFirst"
  • 39. Example - MIL-STD-2045 ▪ 7-bit strings embedded in binary dfdl:encoding="X-DFDL-US-ASCII-7-BIT-PACKED" • 0x7F (DEL) terminated, unless max-length met <xs:element name="str50" type="xs:string" dfdl:lengthPattern="[^\x7F]{0,49}(?=\x7F)|.{50}" /> <xs:sequence dfdl:terminator="{ if (fn:string-length(./str50) eq 50) then '%ES;' else '%DEL;' }" />
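A small Java sketch may help picture what "7-bit strings, DEL terminated unless max-length met" means at the bit level. It assumes least-significant-bit-first packing as described on the earlier MIL-STD-2045 slides; the class and method names are hypothetical, and bounds checking is omitted for brevity.

```java
/** Hypothetical sketch of packed 7-bit strings with a DEL terminator, LSB-first bit order. */
final class SevenBitPackedSketch {
    static final int DEL = 0x7F;

    /** Bit number bitPos of the stream; bit 0 is the least significant bit of byte 0. */
    static int bitAt(byte[] data, long bitPos) {
        return (data[(int) (bitPos / 8)] >> (int) (bitPos % 8)) & 1;
    }

    /** Decodes 7-bit characters from startBit; stops at DEL or after maxChars characters. */
    static String decode(byte[] data, long startBit, int maxChars) {
        StringBuilder sb = new StringBuilder();
        long pos = startBit;
        for (int n = 0; n < maxChars; n++) {
            int ch = 0;
            for (int i = 0; i < 7; i++) {
                ch |= bitAt(data, pos++) << i; // first bit read is the least significant
            }
            if (ch == DEL) break;              // terminator is consumed, not part of the value
            sb.append((char) ch);
        }
        return sb.toString();
    }

    /** Encodes text, appending DEL only when the text is shorter than maxChars. */
    static byte[] encode(String text, int maxChars) {
        if (text.length() > maxChars) {
            throw new IllegalArgumentException("text longer than maxChars");
        }
        int count = text.length() < maxChars ? text.length() + 1 : maxChars;
        byte[] out = new byte[(count * 7 + 7) / 8];
        long pos = 0;
        for (int n = 0; n < count; n++) {
            int ch = n < text.length() ? (text.charAt(n) & 0x7F) : DEL;
            for (int i = 0; i < 7; i++, pos++) {
                if (((ch >> i) & 1) != 0) {
                    out[(int) (pos / 8)] |= (byte) (1 << (int) (pos % 8));
                }
            }
        }
        return out;
    }
}
```

Because characters are 7 bits wide, they straddle byte boundaries: the second character starts in the top bit of the first byte. That is the reason the format cannot be handled with ordinary byte-oriented string decoding.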
  • 40. Example - MIL-STD-2045 ▪ Presence indicators <dfdl:discriminator test="{ . eq 1 }" /> [Figure: Presence Indicator followed by Optional Field]
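The presence-indicator pattern can be sketched in Java as a one-bit flag that decides whether the optional field is read at all. This is a hand-written illustration under the assumption of least-significant-bit-first bit order; the class name, method names, and field width are hypothetical.

```java
import java.util.OptionalInt;

/** Hypothetical sketch of a 1-bit presence indicator guarding an optional field. */
final class PresenceIndicatorSketch {
    private final byte[] data;
    private long bitPos = 0;

    PresenceIndicatorSketch(byte[] data) {
        this.data = data;
    }

    /** Reads an unsigned value of the given width, least significant bit first. */
    int readBits(int width) {
        int value = 0;
        for (int i = 0; i < width; i++, bitPos++) {
            int bit = (data[(int) (bitPos / 8)] >> (int) (bitPos % 8)) & 1;
            value |= bit << i;
        }
        return value;
    }

    /** Reads the indicator; the optional field follows only when the indicator is 1. */
    OptionalInt readOptionalField(int fieldWidthBits) {
        return readBits(1) == 1
                ? OptionalInt.of(readBits(fieldWidthBits))
                : OptionalInt.empty();
    }
}
```

When the indicator is 0 the field occupies no bits at all, so the next item in the message starts immediately after the single indicator bit.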
  • 41. Example - MIL-STD-2045 ▪ Repeat indicators dfdl:occursCountKind="implicit" minOccurs="1" maxOccurs="unbounded" <dfdl:discriminator test="{ if (dfdl:occursIndex() eq 1) then fn:true() else ../fields[dfdl:occursIndex() - 1]/repeat_indicator eq 1 }" /> <xs:element name="GRI" type="tns:repeatIndicator" dfdl:outputValueCalc="{ if (dfdl:occursIndex() lt fn:count(..)) then 1 else 0 }" /> [Figure: Repeat Indicator = 1, Data (another one follows); Repeat Indicator = 1, Data (another one follows); Repeat Indicator = 0, Data (last one)]
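The repeat-indicator rule reads naturally as a loop: the first occurrence is always present, and each later occurrence exists only if the previous indicator was 1. The Java sketch below models the stream as alternating indicator and data integers rather than real bit fields, which is a simplifying assumption; the names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of repeat indicators: 1 means another item follows, 0 means last. */
final class RepeatIndicatorSketch {

    /** Parse: stream is [indicator, data, indicator, data, ...]; stop after indicator 0. */
    static List<Integer> parse(int[] stream) {
        List<Integer> items = new ArrayList<>();
        int pos = 0;
        int indicator;
        do {
            indicator = stream[pos++];
            items.add(stream[pos++]);
        } while (indicator == 1);
        return items;
    }

    /** Unparse: the indicator is computed, 1 for every item except the last. */
    static int[] unparse(List<Integer> items) {
        int[] out = new int[items.size() * 2];
        for (int i = 0; i < items.size(); i++) {
            // i + 1 plays the role of the 1-based dfdl:occursIndex().
            out[2 * i] = (i + 1 < items.size()) ? 1 : 0;
            out[2 * i + 1] = items.get(i);
        }
        return out;
    }
}
```

The unparse half mirrors the outputValueCalc on the slide: the indicator is not stored in the infoset by the caller but derived from the position of each item in the array.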
  • 42. More DFDL Properties dfdl:assert dfdl:discriminator dfdl:newVariableInstance dfdl:setVariable dfdl:defineEscapeScheme dfdl:defineVariable dfdl:ignoreCase dfdl:encodingErrorPolicy dfdl:alignment dfdl:alignmentUnits dfdl:fillByte dfdl:leadingSkip dfdl:trailingSkip dfdl:emptyValueDelimiterPolicy dfdl:outputNewLine dfdl:prefixIncludesPrefixLength dfdl:prefixLengthType dfdl:lengthPattern dfdl:textPadKind dfdl:textTrimKind dfdl:textOutputMinLength dfdl:escapeSchemeRef dfdl:escapeKind dfdl:escapeCharacter dfdl:escapeBlockStart dfdl:escapeBlockEnd dfdl:escapeEscapeCharacter dfdl:extraEscapedCharacters dfdl:generateEscapeBlock dfdl:textBidi dfdl:textStringJustification dfdl:textStringPadCharacter dfdl:textNumberJustification dfdl:textStandardNaNRep dfdl:textStandardZeroRep dfdl:textStandardBase dfdl:binaryNumberRep dfdl:binaryPackedSignCodes dfdl:binaryFloatRep dfdl:calendarPattern dfdl:binaryCalendarRep dfdl:nilValueDelimiterPolicy dfdl:initiatedContent dfdl:separatorPosition dfdl:separatorSuppressionPolicy dfdl:floating dfdl:hiddenGroupRef dfdl:occursCountKind dfdl:occursStopValue dfdl:inputValueCalc dfdl:outputValueCalc
  • 43. DFDL Doesn't Do... Limitations, Intended or Otherwise
  • 44. Things DFDL (v1.0) Does ▪ DFDL is for Data Sets ▪ Things typically thought of as files/streams full of data about XYZ. • Rows, Tables • Header-Body-Trailer • Hierarchical / Nested record-oriented data • Messages ▪ Industry & Military data interchange formats • HL7, HIPAA-5010, SWIFT, NACHA, X25, Link16, etc.
  • 45. Current DFDL v1.0 Language Limitations ▪ Recursive types • DFDL v1.0 is not a Turing-complete language • On purpose - it's a feature, not a bug ▪ Position of elements "by offset" • Random jumping around data • Ex: TIFF file format ▪ TIFF cannot be described in DFDL v1.0 Thanks to http://langsec.org/occupy/
  • 46. Things DFDL (v1.0) Doesn't Do ▪ DFDL v1.0 was not originally intended for: • Document file formats ▪ e.g., MS Office documents (.doc), or RTF, PDF • Images, Video, Audio • Archives like zip files or tar files • Core dump/memory image format • Storage format of an RDBMS table • Graphs of pointers - object graphs dumped from memory ▪ DFDL is being extended (v2.0) to cover more • BLOBs - for images, video, audio ▪ Already available in Daffodil from v2.5.0 • Offsets/Pointers (prototype) - for TIFF, ZIP
  • 47. DFDL Schemas Apache Daffodil™ Project DFDL Standard Status Report Apache Daffodil is a Trademark of the Apache Software Foundation
  • 48. DFDL Schemas Status Report
  • 49. DFDL Schemas ▪ Public (many are on GitHub): MIL-STD-2045 PCAP NITF PNG JPEG NACHA VCard QuasiXML Geonames EDIFACT IBM4690-TLOG ISO8583 BMP GIF Praat TextGrid ARINC429 (PoC) IPFIX Syslog iCalendar IMF SHP (shape file) KNXNet/IP (indust. control) Siemens S7 (indust. control) Asterix (Cat 034, 048) MagVar AFTN Flight Plan ▪ FOUO / CUI: VMF (MIL-STD-6017 subset) USMTF ATO (MIL-STD-6040) LINK16 (NATO STANAG 5516) LINK16Subset (MIL-STD-6016F subset) A-GNOSC REMEDY ARMY DRRS USCG UCOP CEF-R1965 GMTIF (STANAG 4607 subset) SOTF JICD NACT DISV6 ▪ Commercial License ($$$): SWIFT-MT (IBM) HIPAA-5010 (IBM) HL7-2.7 (IBM) OTH-Gold (Owl) USMTF (Owl) LINK16 (MIL-STD-6016 E, F, G) (Owl) VMF (MIL-STD-6017 A, B, C, D) (Owl) JREAP-C (MIL-STD-3011) (Owl) ▪ If you have DFDL schemas available (open, licensed, or just to show) I will add them to this list.
  • 51. Avoid Version Confusion ▪ DFDL Language Specification: v1.0 ▪ Apache Daffodil™ Software: v1.0.0, v1.1.0, v2.0.0, v2.1.0, ..., v2.4.0, ..., v3.0.0, ..., v3.4.0, ... ▪ DFDL Schemas have their own individual release version numbers, e.g., mil-std-2045 is release 0.0.3
  • 52. Other DFDL Implementations ▪ IBM DFDL - The First DFDL Implementation - v1.0 Nov 2011 • Embeddable as Library, C and Java • Includes Eclipse based tooling - graphical DFDL schema editor/test environment • Found in IBM products: ▪ IBM App Connect Enterprise product family ▪ IBM Integration Bus ▪ IBM InfoSphere Master Data Management ▪ IBM z/TPF product family ▪ European Space Agency - DFDL4S = DFDL for Space • Embeddable as Library, Java and C++ versions. • Binary-data-only subset of DFDL • Created by ESA for satellite data descriptions • Evolving to be a fully compatible DFDL subset. • More info at ▪ http://eop-cfi.esa.int/index.php/applications/dfdl4s
  • 53. Interoperability - Apache Daffodil & IBM [Figure: diagram placing DFDL schemas relative to Daffodil and IBM DFDL; the region each schema belongs to is not recoverable from this transcript] ▪ Schemas shown: MIL-STD-2045 VMF GMTIF Link16 PCAP SHP GIF BMP PNG JPEG NITF NACHA ISO8583 CSV QuasiXML Geonames IBM4690-TLOG EDIFACT vCard HL7-v2.7 MagVar A-GNOSC REMEDY iCalendar USMTF ARMY DRRS USCG UCOP CEF-R1965 IPFIX COBOL-defined ▪ Diagram labels: Daffodil; IBM DFDL; Interoperable; "These use optional DFDL features not supported (yet) by IBM DFDL" ▪ COBOL/FINSERV Data requires optional DFDL features not available until Daffodil 3.5.0 release.
  • 54. Apache Daffodil ▪ Open Source • Apache Daffodil - Announced in March 2021 • Apache = A Permissive/Commercial Friendly License - embed how you wish, into anything. ▪ Runs on Java 8+ Virtual Machine ▪ Started as research project by UofI ~2009 ▪ Originally developed by Owl (then Tresys) ~2012 • Funded by the USG • Became Apache Incubator project in 2017 • Graduated from Apache Incubator in March 2021 ▪ Highly interoperable with IBM DFDL (since v2.4.0) ▪ Active development ▪ https://daffodil.apache.org
  • 55. Apache Daffodil - Recent Major Features ▪ EXI support ▪ Pluggable Character Sets ▪ Embedded XML Strings in data ▪ C-Code generator backend ▪ Pluggable 'layers' for Checksums, line-folding, CRC, etc. ▪ Schematron Support - pluggable Validators ▪ SAX API
  • 56. Apache Daffodil VSCode Extension ▪ Data Format Development and Debug IDE • Set data breakpoints • Step through parsing • Save TDML test cases • Edit large data files ▪ Extension to VSCode IDE • Front-end - TypeScript • Back-end server - Scala ▪ Version 1.2.0 available
  • 57. Apache Daffodil: If you download it, what do you get? ▪ Jar libraries - runs on JVM • Compiler, utilities, TDML runner • Signed Jars available from Maven Central • Runtime 1 - Scala - runs on JVM - supports nearly all of DFDL v1.0 • Runtime 2 - C-generator - creates fast C code - Small subset of the DFDL language ▪ Command Line Interface • Interactive CLI debugger and trace ▪ Java & Scala API with documentation ▪ XML and JSON for parse-output, unparse-input ▪ You must get your DFDL schemas somewhere... • GitHub (DFDLSchemas, others) • Write them! (and share them!)
  • 58. Apache Daffodil Internal Components (written in Scala) ▪ Compiler • compiles DFDL schemas to runtime data structures • issues diagnostics ▪ Runtime 1 ▪ DPath - XPath-like language • compiler • runtime ▪ Infoset • Convert to/from XML, JSON • Fast, constant-time access ▪ Parser primitives ▪ Unparser primitives ▪ TDML Runner • Test (& Tutorial) Data Markup Language ▪ Utility Libraries • for compiler • for runtime ▪ Command Line Interpreter ▪ Java API ▪ Scala API ▪ Tests - Unit (Scala) ▪ Tests - System (TDML) ▪ Breakpoint debugger ▪ I/O library • bits (not bytes), bitOrder • streaming • non-8-bit characters (7, 6, 5) • unbounded lookahead - parsing/backtracking, and unparsing ▪ Runtime2 (C-code Gen) • Primitives (C) • Code Generator
  • 59. Apache Daffodil Integrations ▪ EXI - binary form of XML - much faster/denser ▪ XProc - Calabash ▪ Apache Spark ▪ Apache NiFi ▪ Software AG™ Integration Server (aka webMethods™) ▪ Owl CDS products/services ▪ Other companies also! ▪ Potential • Apache Tika • Other data-handling frameworks - Flink, Hadoop, ...
  • 60. EXI (binary XML) - Data Size ▪ EXI is a W3C standard for binary XML • Same Infoset, same API (SAX) ▪ Requires massively less space than textual XML ▪ See examples on github.com OpenDFDL [Chart 1: Size Increase (XML Text); series Source Size % and XML Size %; formats bmp, cef, iCal, Jpeg, Pcap, shp, vmf, Link16; axis 0% to 7000%] [Chart 2: Size Increase EXI; series Source Size % and EXI Size % (SA); same formats; axis 0% to 140%]
  • 61. Apache Daffodil Code Base ▪ Lines of Code = 397K* [Pie chart values: 115602, 62661, 218998; categories: source (scala + java), unit tests (scala + java), test suite (tdml, xsd, scala)] ▪ Data as of 2023-03-15 master branch ▪ Main Daffodil library - does not include VSCode Extension ▪ Excludes DFDL Schemas (Link16 DFDL Schema is 800K+ LoC just by itself)
  • 62. Apache Daffodil Roadmap ▪ Roadmap on https://s.apache.org/daffodil-wiki ▪ Depends on community needs, available developers
  • 63. Apache Daffodil Community ▪ Scala programmers wanted! ▪ Java programmers who want to learn Scala wanted! ▪ TypeScript/JavaScript programmers who want to work on the Daffodil VSCode Extension wanted!
  • 64. Owl DFDL Services & Products We would like to know how to make DFDL and Daffodil solve your data challenges! ▪ Example data challenges • Convert Legacy Data formats to Modern • Data Intake to analysis systems • Research Systems - increase credibility by using real MIL data formats • Data Publishing - e.g., USG Open Data requirements • Data Security - CDS / rip, inspect/transform, rebuild to specification ▪ Examples of services Owl provides: • Create (or help you create) & Test DFDL Schemas for ▪ Your specific in-house data formats ▪ Industry/Military standard data formats • Commercial support for Apache Daffodil software ▪ Fix, Enhance Daffodil to meet your needs ▪ Customize, add features, APIs, … ▪ Owl Cross-Domain Solutions using DFDL provide the highest level of network security. • 1U CDS Appliances ▪ XD-Bridge ▪ XD-Guardian • www.owlcyberdefense.com
  • 65. Questions? DFDL Specification: http://ogf.org/dfdl DFDL Schemas: https://github.com/DFDLSchemas Daffodil Open Source: https://daffodil.apache.org
  • 66. END
  • 67. Example: Java API Yes, it is 'Hello world!' Input data: Hello world! Resulting infoset: <helloWorld> <word>Hello</word> <word>world!</word> </helloWorld>
  • 68. Daffodil Java API - Hello World <!-- helloWorld.dfdl.xsd --> … <dfdl:format ref="daffodilTest1" representation="text" encoding="utf-8" lengthKind="delimited"/> … <xs:element name="helloWorld"> <xs:complexType><xs:sequence> <xs:element name="word" type="xs:string" maxOccurs="unbounded" dfdl:lengthKind="delimited" dfdl:terminator="%WSP+;" dfdl:occursCountKind="parsed"/> </xs:sequence></xs:complexType> </xs:element> public class HelloWorld { public static void main(String[] args) throws IOException { ... // let's fill this in } }
  • 69. Daffodil Java API - Hello World // // First compile the DFDL Schema // Creates a Daffodil ProcessorFactory object // Compiler c = Daffodil.compiler(); c.setValidateDFDLSchemas(true); File schemaFile = new File("helloWorld.dfdl.xsd"); ProcessorFactory pf = c.compileFile(schemaFile); // list any compile-time diagnostics - warnings and errors List<Diagnostic> diags = pf.getDiagnostics(); for (Diagnostic d : diags) { System.err.println(d.getSomeMessage()); } if (pf.isError()) { System.exit(1); }
  • 70. Daffodil Java API - Hello World // // Use ProcessorFactory to create data processors // As many as you want. They can be run (to parse/unparse) on // different threads. // DataProcessor dp = pf.onPath("/"); ReadableByteChannel rbc = Channels.newChannel(new FileInputStream("helloWorld.dat")); // // Setup JDOM outputter // JDOMInfosetOutputter outputter = new JDOMInfosetOutputter(); ParseResult res = dp.parse(rbc, outputter); boolean err = res.isError(); if (err) { … list diagnostics and exit(2) … } org.jdom2.Document doc = outputter.getResult(); // JDOM2 XML Infoset
  • 71. Daffodil Java API - Hello World // // Pretty print the XML infoset. // XMLOutputter xo = new XMLOutputter(Format.getPrettyFormat()); xo.output(doc, System.out); // or transform/inspect/etc.
  • 72. Daffodil Java API - Hello World // // unparse // WritableByteChannel wbc = Channels.newChannel(new FileOutputStream(…)); JDOMInfosetInputter inputter = new JDOMInfosetInputter(doc); UnparseResult res2 = dp.unparse(inputter, wbc); if (res2.isError()) { … same as before … } // File contains unparsed data
  • 73. Question ▪ Q: Instead of DFDL why not use ... • Apache Avro? • Apache Thrift? • Google GPBs? • ASN.1 BER (or PER/DER/XER)? ▪ A: Those are great, but they are prescriptive. • They don't describe formats; they are data formats • We need a descriptive language.
  • 74. Question: Why is DFDL Needed? - ASN.1 ECN ▪ What about ASN.1 Encoding Control Notation? Doesn't it do what DFDL Does? ▪ Yes, ASN.1 ECN is very interesting. It is... • Already an ISO Standard (since 2008) • Conceptually similar ▪ Logical schema language + notations for physical representation • Very different in the details. ▪ Developers [ Love | Hate ] [ ASN.1 | XML ] ▪ Differences that matter: • ASN.1 ECN ▪ No current open source implementation of ECN (as of 2021-03-08) • SourceForge lvmaiAsn last update was 2013. ▪ Extension of a binary data standard ASN.1 BER/PER/DER ▪ Goal to describe legacy protocol messages • DFDL ▪ Open source Daffodil implementation ▪ Extension of a textual data standard XML ▪ Goal to be union of data integration tool capabilities for format description ▪ No thorough side-by-side comparison has been done
  • 75. Daffodil Performance [Figure: K03.2 messages, Daffodil Parse & Validate, XML text pipe, Daffodil Unparse, K03.2 messages] ▪ Uses Daffodil Message Streaming API ▪ The K03.2 message is 49 bytes. It expands to 4555 bytes of XML • Note: EXI eliminates this text expansion - see earlier slide on EXI. • This test was done using textual XML ▪ The file was 131072 repeats of that 49-byte message ▪ Intel Core i7-6820HQ CPU @ 2.70 GHz (a laptop) ▪ Running on 2 processes (parse, unparse), 1 thread each. ▪ 81.218 seconds => ~ 0.6 milliseconds per message ▪ Conclusion: • DFDL/Daffodil is not going to kill your 10 ms latency budget. • Cost will be from additional XML filtering/validation operations
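The per-message figure follows directly from the numbers on the slide. A few lines of Java reproduce the arithmetic; the class and method names are hypothetical and nothing here measures Daffodil itself.

```java
/** Hypothetical helper that reproduces the arithmetic on the performance slide. */
final class ThroughputSketch {

    /** Average wall-clock time per message, in milliseconds. */
    static double millisPerMessage(double totalSeconds, long messages) {
        return totalSeconds * 1000.0 / messages;
    }

    /** How many times larger the textual XML is than the source message. */
    static double expansionFactor(long xmlBytes, long sourceBytes) {
        return (double) xmlBytes / sourceBytes;
    }
}
```

With the slide's inputs, `millisPerMessage(81.218, 131072)` is roughly 0.62 ms, and `expansionFactor(4555, 49)` is roughly 93, which is the text expansion that EXI is said to eliminate.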