
Tresys DFDL Data Format Description Language & Daffodil Open Source - Public Overview

A public review of the status and capabilities of Daffodil, the open source software system for describing data formats.

Published in: Software


  1. 1. DFDL and the Open Source Daffodil Project Copyright 2018 Tresys Technology
  2. 2. Agenda ▪ DFDL • Why is DFDL Needed? • Quick overview of DFDL • Use Cases for DFDL ▪ DFDL Tutorial • Examples: CSV, PCAP, MIL-STD-2045, USMTF*, VMF* • Java API ▪ Demonstrations ▪ DFDL Schemas ▪ Implementations • Open Source: Apache Daffodil (Incubating) ▪ Tresys DFDL Products and Services
  3. 3. Why is DFDL Needed? Every Enterprise Software Company: ▪ IBM (10+) ▪ Oracle (10+) ▪ SAP (10+) ▪ Microsoft ▪ SAS ▪ Informatica ▪ SyncSort ▪ AbInitio ▪ Pervasive ▪ Qlik/Expressor ▪ Pentaho ▪ ... Dozens more. Every kind of software that takes in data: ▪ data directed routing (msg brokers) ▪ database ▪ data analysis and/or data mining ▪ data cleansing ▪ master data management ▪ application integration ▪ data visualization ▪ … There are hundreds of ad-hoc data format description systems. All these data format descriptions are: • proprietary • ad-hoc • incompatible, even within products of the same company!
  4. 4. Why is DFDL Needed? Hundreds of data format description systems… means that: ▪ Vendor investment is spread too thin • Tools for creating data formats are inadequate • No product is comprehensive enough • Some products aren’t fast enough ▪ Customers get locked into proprietary solutions ▪ High training cost, low career value ▪ Inflexible packaging • not libraries - must embed some product in your application data flow
  5. 5. Why is DFDL Needed? Result… we are stuck with: ▪ Data parsers continue to be written as programs ▪ High cost to write, high error rate ▪ Certification (C&A) costs = $$$$$ A standard DFDL fixes all these problems
  6. 6. Data Format Description Language ▪ DFDL: A new open standard • From the Open Grid Forum (OGF) • Version 1.0 – Sept. 2014 • Proposed Recommendation Status (as of Oct 2016) ▪ DFDL is a way of describing data… • It is NOT a data format itself! ▪ DFDL can describe … • Textual and binary • Commercial record-oriented • Scientific and numeric • Modern and legacy • Industry/Military standards ▪ While allowing high performance • You can choose the right data format for the job ▪ Leverage XML technology and concepts • Use W3C XML Schema subset & type system to describe the logical format of the data • Use annotations within the XSD to describe the physical representation of the data ▪ Read and write from same DFDL Schema ▪ Keep simple cases simple • Simple DFDL Schemas are human readable
  7. 7. Example – Delimited Text Data rLimit=5;rpngx=-7.1E8
  8. 8. Example – Delimited Text Data rLimit=5;rpngx=-7.1E8 Callouts: "rLimit=" and "rpngx=" are initiators; "5" is an ASCII text integer; "-7.1E8" is ASCII text floating point; ";" is the separator. Separators, initiators (aka tags), and terminators are all examples of delimiters in DFDL.
  9. 9. DFDL Schema <xs:complexType name="rValues"> <xs:sequence> <xs:element name="rLimit" type="xs:int"/> <xs:element name="rpngx" type="xs:float"/> </xs:sequence> </xs:complexType> Logical Elements
  10. 10. DFDL schema <xs:annotation> <xs:appinfo source="http://www.ogf.org/dfdl/v1.0"> <dfdl:format representation="text" textNumberRep="standard" encoding="ascii" lengthKind="delimited" .../> </xs:appinfo> </xs:annotation> <xs:complexType name="rValues"> <xs:sequence dfdl:separator=";" ... > <xs:element name="rLimit" type="xs:int" dfdl:initiator="rLimit=" ... /> <xs:element name="rpngx" type="xs:float" dfdl:initiator="rpngx=" ... /> </xs:sequence> </xs:complexType> DFDL properties: rLimit=5 rpngx=-7.1E8 ;
  11. 11. DFDL Lifecycle Data: rLimit=5;rpngx=-7.1E8 Parse: a DFDL Implementation uses the DFDL Schema to turn the data into an infoset. Unparse: a DFDL Implementation uses the same DFDL Schema to turn the infoset back into data. Infoset: Element Name: rValues, containing Element Name: rLimit, Value: 5, Type: Int and Element Name: rpngx, Value: -7.1E8, Type: Double
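To make the parse/unparse round trip concrete, here is a minimal hand-written Java sketch of what a DFDL processor would do for this one format. It is only an illustration: the class and method names (`RValuesSketch`, `parse`, `unparse`) are hypothetical, and a real DFDL implementation derives this behavior from the schema instead of hard-coding it.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustration only: hard-codes what the rValues schema declares.
class RValuesSketch {
    // Element names in sequence order, as declared in the schema.
    private static final String[] NAMES = {"rLimit", "rpngx"};

    // Parse "rLimit=5;rpngx=-7.1E8" into an ordered name -> text value map.
    static Map<String, String> parse(String data) {
        String[] parts = data.split(";", -1);              // dfdl:separator=";"
        if (parts.length != NAMES.length) {
            throw new IllegalArgumentException("expected " + NAMES.length + " elements");
        }
        Map<String, String> infoset = new LinkedHashMap<>();
        for (int i = 0; i < NAMES.length; i++) {
            String initiator = NAMES[i] + "=";              // dfdl:initiator
            if (!parts[i].startsWith(initiator)) {
                throw new IllegalArgumentException("missing initiator " + initiator);
            }
            infoset.put(NAMES[i], parts[i].substring(initiator.length()));
        }
        return infoset;
    }

    // Unparse: write each element back with its initiator, separated by ";".
    static String unparse(Map<String, String> infoset) {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<String, String> e : infoset.entrySet()) {
            if (out.length() > 0) {
                out.append(';');
            }
            out.append(e.getKey()).append('=').append(e.getValue());
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> infoset = parse("rLimit=5;rpngx=-7.1E8");
        int rLimit = Integer.parseInt(infoset.get("rLimit"));   // xs:int
        float rpngx = Float.parseFloat(infoset.get("rpngx"));   // xs:float
        System.out.println(rLimit + " " + rpngx);
        System.out.println(unparse(infoset));
    }
}
```

Every new format would need another class like this one, which is the cost the slides argue a declarative schema avoids.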
  12. 12. DFDL Use Cases ▪ Deep Content Inspection/Sanitization • CDS, Network Guards • Data Filtering • Rip-and-Rebuild Firewalls ▪ Data-Directed Routing ▪ Data Intake • Data Warehouse or Analytical Data Store ▪ Interoperability / Data Translation ▪ Conversion of Legacy Data to Modern Formats ▪ Open Data / Data Export • Recasting data as XML for export/publishing
  13. 13. EXAMPLES ▪ CSV (ASCII text delimited) ▪ PCAP (variable length binary) ▪ MIL-STD-2045 (dense bit-packed binary, LSBF)
  14. 14. EXAMPLE: CSV Comma-Separated Values
  15. 15. Example - CSV ▪ Textual Format (ASCII) ▪ Comma-Separated Values ▪ Tabular data ▪ Rows separated by newlines ▪ Fields separated by commas Smith,Charles,36,6.2 Williams,Mary,51,5.5
  16. 16. Example - CSV <xs:schema> <xs:annotation> <xs:appinfo source="http://www.ogf.org/dfdl/"> <dfdl:format ref="defaults" representation="text" encoding="ASCII" lengthKind="delimited" escapeSchemeRef="quoted" occursCountKind="parsed" /> <dfdl:defineEscapeScheme name="quoted"> <dfdl:escapeScheme escapeKind="escapeBlock" escapeBlockStart="&quot;" escapeBlockEnd="&quot;" escapeEscapeCharacter="&quot;" /> </dfdl:defineEscapeScheme> </xs:appinfo> </xs:annotation> </xs:schema>
  18. 18. Example - CSV <xs:element name="csv"> <xs:complexType> <xs:sequence dfdl:separator="%NL;" dfdl:separatorPosition="postfix"> <xs:element name="record" maxOccurs="unbounded"> <xs:complexType> <xs:sequence dfdl:separator="," dfdl:separatorPosition="infix"> <xs:element name="item" type="xs:string" maxOccurs="unbounded"/> </xs:sequence> </xs:complexType> </xs:element> </xs:sequence> </xs:complexType> </xs:element>
  20. 20. Example - CSV CSV Input: Smith,Charles,36,6.2 Williams,Mary,51,5.5 The DFDL Implementation, using the CSV DFDL Schema, produces the DFDL Infoset: csv, containing record (item Smith, item Charles, item 36, item 6.2) and record (item Williams, item Mary, item 51, item 5.5)
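The csv/record/item structure above can be mimicked in plain Java. This is a simplified sketch, not Daffodil code: the name `CsvSketch` is hypothetical, records are assumed to end with a newline (the postfix separator), items are separated by commas (the infix separator), and double quotes act as an escape block in which a doubled quote stands for a literal quote. Unlike the schema, it also tolerates a final record with no trailing newline and treats a quote anywhere in an item as opening a block.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: a hand-written equivalent of the csv/record/item schema.
class CsvSketch {
    static List<List<String>> parse(String data) {
        List<List<String>> records = new ArrayList<>();
        List<String> record = new ArrayList<>();
        StringBuilder item = new StringBuilder();
        boolean inBlock = false;                       // inside "..." escape block
        for (int i = 0; i < data.length(); i++) {
            char c = data.charAt(i);
            if (inBlock) {
                if (c == '"' && i + 1 < data.length() && data.charAt(i + 1) == '"') {
                    item.append('"');                  // escapeEscapeCharacter: "" -> "
                    i++;
                } else if (c == '"') {
                    inBlock = false;                   // escapeBlockEnd
                } else {
                    item.append(c);
                }
            } else if (c == '"') {
                inBlock = true;                        // escapeBlockStart
            } else if (c == ',') {
                record.add(item.toString());           // infix item separator
                item.setLength(0);
            } else if (c == '\n') {
                record.add(item.toString());           // postfix record separator
                item.setLength(0);
                records.add(record);
                record = new ArrayList<>();
            } else if (c != '\r') {
                item.append(c);
            }
        }
        if (item.length() > 0 || !record.isEmpty()) {  // lenient: no final newline
            record.add(item.toString());
            records.add(record);
        }
        return records;
    }

    public static void main(String[] args) {
        System.out.println(parse("Smith,Charles,36,6.2\nWilliams,Mary,51,5.5\n"));
    }
}
```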
  21. 21. EXAMPLE: PCAP Packet Capture File Format
  22. 22. Example - PCAP PCAP Data is read by the DFDL Implementation, using the PCAP DFDL Schema, to produce the DFDL Infoset: pcap, containing global header (field, field, field) and packet (header, data)
  23. 23. Example - PCAP ▪ Network Packet Capture ▪ Binary Format ▪ Global Header ▪ Packet Header/Data ▪ Variable Endianness and Data Length File layout: Global Header, Packet Header, Packet Data, Packet Header, Packet Data, …
  24. 24. Example - PCAP <xs:schema> <xs:annotation> <xs:appinfo source="http://www.ogf.org/dfdl/"> <dfdl:format ref="defaults" representation="binary" alignment="8" alignmentUnits="bits" lengthUnits="bits" binaryNumberRep="binary" lengthKind="implicit" byteOrder="{ $byte_order }" /> <dfdl:defineVariable name="byte_order" type="xs:string"/> </xs:appinfo> </xs:annotation> <xs:simpleType name="uint32" dfdl:lengthKind="explicit" dfdl:length="32"> <xs:restriction base="xs:unsignedInt"/> </xs:simpleType> </xs:schema>
  27. 27. Example - PCAP <xs:element name="magic_number" type="ex:uint32" dfdl:byteOrder="bigEndian"> <xs:annotation> <xs:appinfo source="http://www.ogf.org/dfdl/"> <dfdl:setVariable ref="byte_order"><![CDATA[{ if (. eq 2712847316) then 'bigEndian' else if (. eq 3569595041) then 'littleEndian' else daf:error() }]]> </dfdl:setVariable> <dfdl:property name="outputValueCalc"><![CDATA[{ if ($dfdl:byteOrder eq 'bigEndian') then xs:hexBinary('A1B2C3D4') else xs:hexBinary('D4C3B2A1') }]]></dfdl:property> </xs:appinfo> </xs:annotation> </xs:element>
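The magic-number test in the schema reads the first four bytes as a big-endian unsigned integer and picks the byte order for the rest of the file from the result. A plain-Java sketch of the same decision follows; the names `PcapMagicSketch`, `byteOrderOf`, and `magicFor` are hypothetical, and the constants are the decimal values from the slide (2712847316 = 0xA1B2C3D4, 3569595041 = 0xD4C3B2A1).

```java
import java.nio.ByteOrder;

// Illustration only: mirrors the setVariable / outputValueCalc expressions.
class PcapMagicSketch {
    // Parse direction: interpret the first 4 bytes as big-endian, then decide.
    static ByteOrder byteOrderOf(byte[] first4) {
        long magic = ((first4[0] & 0xFFL) << 24)
                   | ((first4[1] & 0xFFL) << 16)
                   | ((first4[2] & 0xFFL) << 8)
                   |  (first4[3] & 0xFFL);
        if (magic == 2712847316L) {            // 0xA1B2C3D4
            return ByteOrder.BIG_ENDIAN;
        }
        if (magic == 3569595041L) {            // 0xD4C3B2A1
            return ByteOrder.LITTLE_ENDIAN;
        }
        throw new IllegalArgumentException("not a PCAP magic number: " + magic);
    }

    // Unparse direction: the bytes to write for a given byte order.
    static byte[] magicFor(ByteOrder order) {
        if (order == ByteOrder.BIG_ENDIAN) {
            return new byte[] {(byte) 0xA1, (byte) 0xB2, (byte) 0xC3, (byte) 0xD4};
        }
        return new byte[] {(byte) 0xD4, (byte) 0xC3, (byte) 0xB2, (byte) 0xA1};
    }

    public static void main(String[] args) {
        System.out.println(byteOrderOf(magicFor(ByteOrder.LITTLE_ENDIAN)));
    }
}
```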
  29. 29. Example - PCAP <xs:element name="PacketHeader"> <xs:complexType> <xs:sequence> <xs:element name="Seconds" type="pcap:uint32"/> <xs:element name="USeconds" type="pcap:uint32"/> <xs:element name="InclLen" type="pcap:uint32" ... /> <xs:element name="OrigLen" type="pcap:uint32" ... /> </xs:sequence> </xs:complexType> </xs:element> ... <xs:element ref="pcap:LinkLayer" dfdl:lengthUnits="bytes" dfdl:lengthKind="explicit" dfdl:length="{ ../PacketHeader/InclLen }"/>
  31. 31. Example - PCAP <xs:element name="PacketHeader"> <xs:complexType> <xs:sequence> <xs:element name="Seconds" type="pcap:uint32"/> <xs:element name="USeconds" type="pcap:uint32"/> <xs:element name="InclLen" type="pcap:uint32" dfdl:outputValueCalc="{ if (dfdl:valueLength( ../../pcap:LinkLayer/pcap:Ethernet, 'bytes') le 60) then 60 else dfdl:valueLength( ../../pcap:LinkLayer/pcap:Ethernet, 'bytes') }" /> <xs:element name="OrigLen" type="pcap:uint32" dfdl:outputValueCalc="{ ../InclLen }"/> </xs:sequence> </xs:complexType> </xs:element> ... <xs:element ref="pcap:LinkLayer" dfdl:lengthUnits="bytes" dfdl:lengthKind="explicit" dfdl:length="{ ../PacketHeader/InclLen }"/>
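Two ideas in this schema fragment can be sketched in plain Java: the packet data length is taken from the InclLen header field when parsing, and InclLen is computed from the data (with a floor of 60 bytes) when unparsing. The class and method names below are hypothetical, and the sketch stands in for what a DFDL processor would do; it is not Daffodil code.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Illustration only: explicit-length parsing driven by a header field.
class PcapPacketSketch {
    // Reads one packet: four uint32 header fields, then InclLen bytes of data.
    static byte[] readPacketData(ByteBuffer buf, ByteOrder order) {
        buf.order(order);                              // from the magic number
        long seconds  = buf.getInt() & 0xFFFFFFFFL;    // Seconds
        long uSeconds = buf.getInt() & 0xFFFFFFFFL;    // USeconds
        long inclLen  = buf.getInt() & 0xFFFFFFFFL;    // InclLen
        long origLen  = buf.getInt() & 0xFFFFFFFFL;    // OrigLen
        if (inclLen > buf.remaining()) {
            throw new IllegalArgumentException(
                "InclLen " + inclLen + " exceeds remaining " + buf.remaining()
                + " bytes (packet at " + seconds + "." + uSeconds
                + ", OrigLen " + origLen + ")");
        }
        byte[] data = new byte[(int) inclLen];         // dfdl:length="{ ../PacketHeader/InclLen }"
        buf.get(data);
        return data;
    }

    // Unparse direction: same result as the outputValueCalc expression,
    // which yields 60 when the Ethernet length is 60 or less.
    static long inclLenFor(int ethernetBytes) {
        return Math.max(60, ethernetBytes);
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocate(19).order(ByteOrder.LITTLE_ENDIAN);
        buf.putInt(1).putInt(2).putInt(3).putInt(3).put(new byte[] {10, 20, 30});
        buf.flip();
        System.out.println(readPacketData(buf, ByteOrder.LITTLE_ENDIAN).length);
    }
}
```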
  32. 32. Example - PCAP PCAP Data is read by the DFDL Implementation, using the PCAP DFDL Schema, to produce the DFDL Infoset: pcap, containing global header (field, field, field) and packet (header, data)
  33. 33. EXAMPLE: MIL-STD-2045 A dense bit-packed binary data format similar to many MIL-STD formats, but publicly available. Uses Least-Significant-Bit-First bit order – typical of MIL formats, different from most non-MIL formats.
  34. 34. Example - MIL-STD-2045 ▪ Densely packed binary with unique bit order dfdl:representation="binary" dfdl:alignment="1" dfdl:alignmentUnits="bits" dfdl:lengthUnits="bits" dfdl:lengthKind="explicit" dfdl:length dfdl:byteOrder="littleEndian" dfdl:bitOrder="leastSignificantBitFirst"
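As a rough illustration of least-significant-bit-first order with 1-bit alignment, the Java sketch below reads an n-bit unsigned field starting at an arbitrary bit position. Bit position 0 is taken to be the least significant bit of the first byte, and the first bit read becomes the least significant bit of the value. The names `LsbfBitsSketch` and `readBits` are hypothetical.

```java
// Illustration only: bit-granular field access in LSBF / littleEndian order.
class LsbfBitsSketch {
    // Reads nBits (1..63) starting at bitPos; bit 0 is the LSB of data[0].
    static long readBits(byte[] data, int bitPos, int nBits) {
        if (nBits < 1 || nBits > 63) {
            throw new IllegalArgumentException("nBits must be 1..63");
        }
        if (bitPos < 0 || bitPos + nBits > data.length * 8) {
            throw new IllegalArgumentException("field runs past end of data");
        }
        long value = 0;
        for (int i = 0; i < nBits; i++) {
            int k = bitPos + i;
            int bit = (data[k / 8] >> (k % 8)) & 1;   // k-th bit, LSB first
            value |= ((long) bit) << i;               // earlier bits are less significant
        }
        return value;
    }

    public static void main(String[] args) {
        byte[] data = {(byte) 0xB5, 0x01};
        // Three adjacent fields of 3, 4, and 2 bits; the last one crosses a byte boundary.
        System.out.println(readBits(data, 0, 3));
        System.out.println(readBits(data, 3, 4));
        System.out.println(readBits(data, 7, 2));
    }
}
```

Fields simply follow one another at the bit level, with no padding to byte boundaries, which is what `dfdl:alignment="1"` with `dfdl:alignmentUnits="bits"` expresses.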
  35. 35. Example - MIL-STD-2045 ▪ 7-bit strings embedded in binary • 0x7F (DEL) terminated, unless max-length met dfdl:encoding="X-DFDL-US-ASCII-7BIT-PACKED" <xs:element name="str50" type="xs:string" dfdl:lengthPattern="[^\x7F]{0,49}(?=\x7F)|.{50}" /> <xs:sequence dfdl:terminator="{ if (fn:string-length(./str50) eq 50) then '%ES;' else '%DEL;' }" />
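The string rule on this slide (7-bit characters, terminated by DEL unless the 50-character maximum is reached) can be sketched in plain Java. The sketch assumes the characters are packed least-significant-bit-first with no padding, as on the previous slide; `SevenBitStringSketch` and `readStr50` are hypothetical names.

```java
// Illustration only: decode a DEL-terminated, 7-bit packed string of at most 50 chars.
class SevenBitStringSketch {
    static final int MAX_CHARS = 50;
    static final int DEL = 0x7F;

    static String readStr50(byte[] data, int bitPos) {
        StringBuilder sb = new StringBuilder();
        int pos = bitPos;
        while (sb.length() < MAX_CHARS) {              // no terminator once 50 chars are read
            if (pos + 7 > data.length * 8) {
                throw new IllegalArgumentException("data ended before DEL terminator");
            }
            int code = 0;
            for (int i = 0; i < 7; i++) {              // one 7-bit character, LSB first
                int k = pos + i;
                code |= ((data[k / 8] >> (k % 8)) & 1) << i;
            }
            pos += 7;
            if (code == DEL) {
                break;                                 // terminator is consumed, not kept
            }
            sb.append((char) code);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // 'H' (0x48), 'i' (0x69), DEL (0x7F) packed into 21 bits.
        byte[] packed = {(byte) 0xC8, (byte) 0xF4, 0x1F};
        System.out.println(readStr50(packed, 0));
    }
}
```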
  36. 36. Example - MIL-STD-2045 ▪ Presence indicators <dfdl:discriminator test="{ . eq 1 }" /> Layout: Presence Indicator, then Optional Field
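A presence indicator is a flag that says whether the optional field after it exists in the data. The Java sketch below assumes, purely for illustration, a 1-bit indicator followed by a 7-bit field in least-significant-bit-first order; the slide does not give field widths, and `PresenceSketch` and `readOptional7` are hypothetical names.

```java
import java.util.OptionalInt;

// Illustration only: widths (1-bit flag, 7-bit field) are assumed, not from the standard.
class PresenceSketch {
    private static int bit(byte[] data, int k) {
        return (data[k / 8] >> (k % 8)) & 1;           // k-th bit, LSB first
    }

    // Returns the field value if the indicator is 1, otherwise empty.
    static OptionalInt readOptional7(byte[] data, int bitPos) {
        if (bit(data, bitPos) != 1) {                  // discriminator: { . eq 1 }
            return OptionalInt.empty();
        }
        int value = 0;
        for (int i = 0; i < 7; i++) {
            value |= bit(data, bitPos + 1 + i) << i;
        }
        return OptionalInt.of(value);
    }

    public static void main(String[] args) {
        System.out.println(readOptional7(new byte[] {0x0B}, 0));
        System.out.println(readOptional7(new byte[] {0x0A}, 0));
    }
}
```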
  37. 37. Example - MIL-STD-2045 ▪ Repeat indicators dfdl:occursCountKind="implicit" minOccurs="1" maxOccurs="unbounded" <dfdl:discriminator test="{ if (dfdl:occursIndex() eq 1) then fn:true() else ../fields[dfdl:occursIndex() - 1]/repeat_indicator eq 1 }" /> <xs:element name="GRI" type="tns:repeatIndicator" dfdl:outputValueCalc="{ if (dfdl:occursIndex() lt fn:count(..)) then 1 else 0 }" /> Layout: Repeat Indicator = 1, Data (another one follows); Repeat Indicator = 1, Data (another one follows); Repeat Indicator = 0, Data (last one)
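The repeat-indicator rule reads as: keep parsing groups while the previous group's indicator was 1, and when writing, set the indicator to 1 on every group except the last. A plain-Java sketch follows. For simplicity it assumes one group per byte, with the indicator in bit 0 and a 7-bit value in bits 1 to 7; that layout and the names `RepeatSketch`, `parse`, and `unparse` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: one group per byte (bit 0 = repeat indicator, bits 1..7 = value).
class RepeatSketch {
    // Parse direction: the first group is always present; continue while indicator == 1.
    static List<Integer> parse(byte[] data) {
        List<Integer> values = new ArrayList<>();
        for (int i = 0; i < data.length; i++) {
            int b = data[i] & 0xFF;
            values.add(b >> 1);
            if ((b & 1) == 0) {
                return values;                         // indicator 0: this was the last group
            }
        }
        throw new IllegalArgumentException("data ended while another group was expected");
    }

    // Unparse direction: indicator is 1 unless this is the last group.
    static byte[] unparse(List<Integer> values) {
        if (values.isEmpty()) {
            throw new IllegalArgumentException("at least one group is required (minOccurs=1)");
        }
        byte[] out = new byte[values.size()];
        for (int i = 0; i < values.size(); i++) {
            int indicator = (i < values.size() - 1) ? 1 : 0;
            out[i] = (byte) (((values.get(i) & 0x7F) << 1) | indicator);
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] bytes = unparse(java.util.Arrays.asList(3, 4, 5));
        System.out.println(parse(bytes));
    }
}
```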
  38. 38. More DFDL Properties dfdl:assert dfdl:discriminator dfdl:newVariableInstance dfdl:setVariable dfdl:defineEscapeScheme dfdl:defineVariable dfdl:ignoreCase dfdl:encodingErrorPolicy dfdl:alignment dfdl:alignmentUnits dfdl:fillByte dfdl:leadingSkip dfdl:trailingSkip dfdl:emptyValueDelimiterPolicy dfdl:outputNewLine dfdl:prefixIncludesPrefixLength dfdl:prefixLengthType dfdl:lengthPattern dfdl:textPadKind dfdl:textTrimKind dfdl:textOutputMinLength dfdl:escapeSchemeRef dfdl:escapeKind dfdl:escapeCharacter dfdl:escapeBlockStart dfdl:escapeBlockEnd dfdl:escapeEscapeCharacter dfdl:extraEscapedCharacters dfdl:generateEscapeBlock dfdl:textBidi dfdl:textStringJustification dfdl:textStringPadCharacter dfdl:textNumberJustification dfdl:textStandardNaNRep dfdl:textStandardZeroRep dfdl:textStandardBase dfdl:binaryNumberRep dfdl:binaryPackedSignCodes dfdl:binaryFloatRep dfdl:calendarPattern dfdl:binaryCalendarRep dfdl:nilValueDelimiterPolicy dfdl:initiatedContent dfdl:separatorPosition dfdl:separatorSuppressionPolicy dfdl:floating dfdl:hiddenGroupRef dfdl:occursCountKind dfdl:occursStopValue dfdl:inputValueCalc dfdl:outputValueCalc
  39. 39. EXAMPLE: JAVA API Yes, it is 'Hello world!' Input data: Hello world! Infoset: <helloWorld> <word>Hello</word> <word>world!</word> </helloWorld>
  40. 40. Daffodil Java API – Hello World public class HelloWorld { public static void main(String[] args) throws IOException { ... // let's fill this in } } <!-- helloWorld.dfdl.xsd --> … <dfdl:format ref="daffodilTest1" representation="text" encoding="utf-8" lengthKind="delimited"/> … <xs:element name="helloWorld"> <xs:complexType><xs:sequence> <xs:element name="word" type="xs:string" maxOccurs="unbounded" dfdl:lengthKind="delimited" dfdl:terminator="%WSP+;" dfdl:occursCountKind="parsed"/> </xs:sequence></xs:complexType> </xs:element>
  41. 41. Daffodil Java API – Hello World // // First compile the DFDL Schema // Creates a Daffodil ProcessorFactory object // Compiler c = Daffodil.compiler(); c.setValidateDFDLSchemas(true); File schemaFile = new File("helloWorld.dfdl.xsd"); ProcessorFactory pf = c.compileFile(schemaFile); // list any compile-time diagnostics - warnings and errors List<Diagnostic> diags = pf.getDiagnostics(); for (Diagnostic d : diags) { System.err.println(d.getSomeMessage()); } if (pf.isError()) { System.exit(1); }
  42. 42. Daffodil Java API – Hello World // // Use ProcessorFactory to create data processors // As many as you want. They can be run (to parse/unparse) on // different threads. // DataProcessor dp = pf.onPath("/"); ReadableByteChannel rbc = Channels.newChannel(new FileInputStream("helloWorld.dat")); // // Setup JDOM outputter // JDOMInfosetOutputter outputter = new JDOMInfosetOutputter(); ParseResult res = dp.parse(rbc, outputter); boolean err = res.isError(); if (err) { … list diagnostics and exit(2) … } org.jdom2.Document doc = outputter.getResult(); // JDOM2 XML Infoset
  43. 43. Daffodil Java API – Hello World // // Pretty print the XML infoset. // XMLOutputter xo = new XMLOutputter(Format.getPrettyFormat()); xo.output(doc, System.out); // or transform/inspect/etc.
  44. 44. Daffodil Java API – Hello World // // unparse // WritableByteChannel wbc = Channels.newChannel(new FileOutputStream(…)); JDOMInfosetInputter inputter = new JDOMInfosetInputter(doc); UnparseResult res2 = dp.unparse(inputter, wbc); if (res2.isError()) { … same as before … } // File contains unparsed data
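For intuition about what the parse and unparse calls in the Hello World slides produce, the sketch below reproduces the helloWorld format by hand in plain Java, without using Daffodil. It assumes each word ends at a run of whitespace (the `%WSP+;` terminator) and writes a single space after each word when unparsing. The names `HelloWorldSketch`, `parseWords`, `toXml`, and `unparse` are hypothetical and are not part of the Daffodil API.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: hand-written stand-in for the helloWorld schema, not Daffodil code.
class HelloWorldSketch {
    // Parse: every maximal run of non-whitespace characters is one <word>.
    static List<String> parseWords(String data) {
        List<String> words = new ArrayList<>();
        for (String w : data.trim().split("\\s+")) {
            if (!w.isEmpty()) {
                words.add(w);
            }
        }
        return words;
    }

    // Render the infoset as XML text, escaping the characters XML reserves in content.
    static String toXml(List<String> words) {
        StringBuilder sb = new StringBuilder("<helloWorld>");
        for (String w : words) {
            String escaped = w.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
            sb.append("<word>").append(escaped).append("</word>");
        }
        return sb.append("</helloWorld>").toString();
    }

    // Unparse: write each word followed by one space as its terminator.
    static String unparse(List<String> words) {
        StringBuilder sb = new StringBuilder();
        for (String w : words) {
            sb.append(w).append(' ');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> words = parseWords("Hello world!\n");
        System.out.println(toXml(words));
        System.out.println(unparse(words));
    }
}
```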
  45. 45. DFDL Schemas (* = in development, ** = not yet published) ▪ Public (github): MIL-STD-2045 PCAP NITF PNG JPEG NACHA VCard QuasiXML EDIFACT iCalendar** IBM4690-TLOG IMF** ISO8583 BMP GIF Praat TextGrid ARINC429* JPEG2000** planned: Exif, EP, DNG, WMF, EMF, ... planned: Asterisk, IPFIX ▪ FOUO (DI2E.net): VMF (MIL-STD-6017) USMTF ATO (MIL-STD-6040) LINK16 (NATO STANAG 5516/MIL-STD-6016) A-GNOSC REMEDY ARMY DRRS USCG UCOP CEF-R1965 GMTIF (STANAG 4607) ▪ Commercial License $$$: SWIFT-MT (IBM) HIPAA-5010 (IBM) HL7-2.7 (IBM)
  46. 46. Link16 - DFDL Schema ▪ Not public standard - FOUO/NATO UNCLASSIFIED • MIL-STD-6016 • NATO STANAG 5516 ▪ Created in collaboration with NATO NCI Agency IOS Team • Link16 center of excellence ▪ Tested .... • over 100,000 tracks of data • all choices/branches of all messages exercised ▪ DFDL Schema serves for... • parse/unparse of data to/from XML/JSON • a machine-readable version of the Link16 specification ▪ generate doc ▪ generate various implementations ▪ generate transforms/XSLT or other ▪ Available on DI2E.net DFDL/Daffodil project
  47. 47. DFDL IMPLEMENTATIONS • Daffodil – Open Source • IBM DFDL • DFDL4S – from ESA
  48. 48. Daffodil ▪ Open Source • Apache Daffodil (incubating) ▪ A Permissive/Commercial Friendly License – embed how you wish, into anything. ▪ Runs on Java 8 Virtual Machine ▪ Started as research project by UofI ~2009 ▪ Primarily developed by Tresys Technology ~2012 • Funded by the USG ▪ Subset of full DFDL features – oriented toward our customers’ needs • Packed decimal – no. Bit order – yes. Computed elements – yes. ▪ Active development ▪ https://daffodil.apache.org
  49. 49. Daffodil ▪ Command Line Interface ▪ Interactive CLI Debugger and Trace ▪ Java & Scala API with documentation ▪ Calabash Extension for XProc pipeline ▪ Apache NiFi Integration
  50. 50. Daffodil Code Base (data as of 2017-08-07) Lines of Code = 304K, by category in chart order: source (scala + java) 79015; unit tests (scala) 30237; test suite (tdml, xsd, scala) 115888; DFDLSchemas 42784; fouo-schemas 33388; Integrations 2541; sbt 978
  51. 51. Daffodil Performance ▪ Varies with complexity of data format ▪ 1-thread timings, as of 2017-09-01, Daffodil 2.0.0 release. Format: parse rate / unparse rate (MB/sec) • CSV - simple delimited text: 1.2 / 2.8 • Link16 J3.5 - dense binary: 0.8 * / 0.2 • PCAP - Binary IP Packets: 20 / 2.2 • ATO - complex delimited text: 0.3 / 0.9 * = Approximately 20K Link16 NACT messages/sec with 3 link16 words each. (~60 bytes/msg)
  52. 52. Daffodil Integrations ✓ XProc - Calabash ✓ Apache Spark ✓ Apache NiFi ✓ Apache Storm • demonstrated by Quark Security. Published ??? ✓ Tresys CDS products/services • details are FOUO ▪ Potential • Apache Tika • Other data-handling frameworks - Flink, Hadoop,....
  53. 53. Daffodil with Apache NiFi Provides graphical user-interface similar to commercial data integration products.
  54. 54. Daffodil Roadmap ✓ 2.1.0 - Current version as of May 2018 Roadmap is maintained on project wiki https://cwiki.apache.org/confluence/display/DAFFODIL/Apache+Daffodil • 2.2.0 - Three new important features • BLOB support (DAFFODIL-1735) - enables large image file formats ✓ pre/post processor support (DAFFODIL-1734) - enables IETF format needs like line folding, and base64 encoding • message streaming API (DAFFODIL-1065, DAFFODIL-1526, DAFFODIL-1799) enables unbounded stream of input messages for parsing, output for unparsing • 2.3.0 - Interoperability • Data types and features to enable Daffodil to run IBM-published DFDL schemas • Cross-validation test system • 2.4.0 - Usability, Community Growth • Debug/Trace improvements • Schema compilation speed/space improvements • Improved developer documentation of Daffodil's internals. • Improved streaming behavior for runtime (parsing and unparsing) • Tutorials on authoring DFDL schemas • ... and your needed features here!
  55. 55. Daffodil Future Development ▪ DFDL Schemas for important formats • Link16 • AIS • Asterisk ▪ Cross-validation/test with IBM DFDL • Packed/zoned decimal and other IBM-legacy formats ▪ Tutorials on writing/debugging DFDL schemas ▪ Improved Trace and Debug ▪ Streaming behavior for parsing, unparsing ▪ Faster parsing & unparsing ▪ Faster schema compilation for large schemas ▪ Integrate into more frameworks (Storm, Hadoop, Flink, etc.) ▪ Extensions: recursion, layering, extensibility, document formats, BLOB/CLOB
  56. 56. IBM© DFDL ▪ Designed to be embeddable or standalone ▪ Includes parser and unparser ▪ Available in C and Java ▪ Emits SAX-style events/Accepts SAX-style events ▪ Includes Eclipse based tooling • Graphical DFDL schema editor/test environment ▪ Subset of full DFDL features – oriented toward IBM’s customers’ needs • Packed decimal – yes. Bit order – no. Computed elements – no. ▪ IBM WebSphere Product Family • www.ibm.com/developerworks/library/se-dfdl/index.html ▪ IBM z/TPF Product Family • https://www.ibm.com/support/knowledgecenter/en/SSB23S_1.1.0.12/gtps6/ztpfdfdlsap iupport.html
  57. 57. DFDL4S from ESA ▪ DFDL4S = DFDL for Space ▪ Binary-data-only subset of DFDL ▪ Created by European Space Agency (ESA) for satellite data descriptions ▪ In use now. Java and C++ versions. ▪ Evolving to be a fully compatible DFDL subset. • DFDL standard specifically allows for specialized subsets like this to be accepted as conforming. ▪ More info at • http://eop-cfi.esa.int/index.php/applications/dfdl4s
  58. 58. Tresys DFDL Products and Services We would like to know how to make DFDL and Daffodil solve your data challenges! Example data challenges ▪ Data Intake to analysis systems ▪ Data Publishing – e.g., USG Open Data requirements ▪ Data Security – CDS/ rip, inspect/transform, rebuild to specification Examples of services Tresys provides: ▪ Create (or help you create) & Test DFDL Schemas for • Your specific in-house data formats • Industry/Military standard data formats ▪ Commercial support for Apache Daffodil (incubating) software • Fix, Enhance Daffodil to meet your needs • Customize, add features, APIs, …. ▪ Support your team embedding Daffodil into products
  59. 59. QUESTIONS? DFDL Specification: http://ogf.org/dfdl DFDL Schemas: https://github.com/DFDLSchemas Daffodil Open Source: https://daffodil.apache.org
  60. 60. END
