MuleSoft Surat Virtual Meetup#30 - Flat File Schemas Transformation With MuleSoft
1. MuleSoft Surat Meetup Group
Flat File Schemas Transformation
With MuleSoft
Date – 21st Dec 2021
Time – 21:00 IST (GMT+05:30)
2. Safe Harbour Statement
● Both the speaker and the host are organizing this meetup in an individual capacity only. We are
not representing our companies here.
● This presentation is strictly for learning purposes. The organizer and presenter take no
responsibility for whether the same solution will work for your business requirements.
● This presentation is not meant for any promotional activities.
3. Housekeeping
● A recording of this meetup will be uploaded to the events page within 24 hours.
● Questions can be submitted at any time in the Chat/Questions & Answers tab. Make it more interactive!
● Give us feedback! Rate this meetup session by filling in the feedback form at the end of the day.
● We love feedback! It is the bread and butter of a meetup.
5. Speakers
Jitendra Bafna
Senior Solution Architect III
EPAM Systems
➢ Overall 14 years of experience in API and Integration Technologies.
➢ TOGAF 9.2 Certified.
➢ MuleSoft Ambassador and Surat/Nashik MuleSoft Meetup Leader
➢ Published overall 300+ YouTube Videos and 150+ Articles on MuleSoft
and Anypoint Platform.
➢ Expertise in setting up the MuleSoft platform, including Hybrid
Implementation, CloudHub (Anypoint VPC, VPN and DLB) and Customer
Hosted Mule Runtime (Clustering and Server Group).
➢ Expertise in Application Integration using API Led Connectivity and
Event Driven Architecture.
➢ Expertise in Integration with various systems like Salesforce, NetSuite
ERP, Snowflake, Databases and SAP.
➢ Defining integration and migration strategy and roadmap, including
migrating from on-premises to CloudHub and migrating to a higher version of
Mule Runtime.
➢ Expertise in AWS and OCI.
7. CSV Format
CSV stands for Comma-Separated Values.
MIME type: application/csv
The DataWeave reader for CSV input supports the following parsing strategies:
• Indexed
• In-Memory
• Streaming
By default, the CSV reader stores input data from an entire file in memory if the
file is 1.5 MB or less. If the file is larger than 1.5 MB, the process writes the data to disk.
For very large files, you can improve the performance of the reader by setting the
streaming property to true.
8. CSV Reader Properties
Property | Description | Data Type | Default Value
bodyStartLineNumber | Line number on which the body starts. | Number | 0
escape | Character to use for escaping special characters, such as separators or quotes. | String | \
header | Indicates whether a CSV header is present. | Boolean | TRUE
headerLineNumber | Line number on which the CSV header is located. | Number | 0
ignoreEmptyLine | Indicates whether to ignore an empty line. | Boolean | TRUE
quote | Character to use for quotes. | String | "
separator | Character that separates one field from another field. | String | ,
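To make the effect of these reader properties concrete, here is a minimal Python sketch using the standard `csv` module. It is an analogue, not DataWeave: the function `read_csv` and its parameter names are hypothetical and only mirror `separator`, `quote`, `header` and `ignoreEmptyLine`.

```python
import csv
import io

# Hypothetical sample: ';' separator, '"' quote, header on the first line,
# and one empty line in the body.
raw = 'id;name\n1;"Doe; Jane"\n\n2;Smith\n'

def read_csv(text, separator=",", quote='"', header=True, ignore_empty_line=True):
    """Parse CSV text into a list of dicts (or lists when header is False)."""
    reader = csv.reader(io.StringIO(text), delimiter=separator, quotechar=quote)
    rows = list(reader)
    if ignore_empty_line:
        rows = [row for row in rows if row]  # csv.reader yields [] for a blank line
    if not header or not rows:
        return rows
    names, body = rows[0], rows[1:]
    return [dict(zip(names, row)) for row in body]

records = read_csv(raw, separator=";")
print(records)
```

The quoted value keeps its embedded separator, and the blank line is dropped before the header row is applied.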
9. CSV Writer Properties
Property | Description | Data Type | Default Value
bodyStartLineNumber | Line number on which the body starts. | Number | 0
bufferSize | Size of the buffer writer. | Number | 8192
deferred | Generates the output as a data stream when set to true, and defers the script's execution until consumed. | Boolean | FALSE
encoding | The character set to use for the output, such as UTF-8. | String | null
escape | Character to use for escaping special characters, such as separators or quotes. | String | \
header | Indicates whether a CSV header is present. | Boolean | TRUE
headerLineNumber | Line number on which the CSV header is located. | Number | 0
ignoreEmptyLine | Indicates whether to ignore an empty line. | Boolean | TRUE
lineSeparator | Line separator to use when writing CSV, for example, "\r\n". By default, DataWeave uses the system line separator. | String | New Line
quote | Character to use for quotes. | String | "
quoteHeader | Quotes header values when set to true. | Boolean | FALSE
quoteValues | Quotes every value when set to true, including values that contain special characters. | Boolean | FALSE
separator | Character that separates one field from another field. | String | ,
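The writer properties can be illustrated the same way. The sketch below is a Python analogue built on `csv.writer`, not the DataWeave writer itself; `write_csv` and its parameters are hypothetical names that loosely mirror `separator`, `header`, `quoteValues` and `lineSeparator`. One difference is assumed and called out in the code: Python's `QUOTE_ALL` also quotes the header row, whereas the table above lists a separate `quoteHeader` property.

```python
import csv
import io

def write_csv(rows, separator=",", quote_values=False, header=True, line_separator="\n"):
    """Serialize a list of dicts to CSV text."""
    if not rows:
        return ""
    out = io.StringIO()
    writer = csv.writer(
        out,
        delimiter=separator,
        # QUOTE_MINIMAL quotes only fields that need it; QUOTE_ALL quotes every
        # field, including the header row (unlike a separate quoteHeader switch).
        quoting=csv.QUOTE_ALL if quote_values else csv.QUOTE_MINIMAL,
        lineterminator=line_separator,
    )
    if header:
        writer.writerow(rows[0].keys())
    for row in rows:
        writer.writerow(row.values())
    return out.getvalue()

rows = [{"id": 1, "name": "Doe, Jane"}, {"id": 2, "name": "Smith"}]
minimal = write_csv(rows)
quoted = write_csv(rows, quote_values=True, line_separator="\r\n")
```

With the defaults, only the value that contains the separator is quoted; with `quote_values=True`, every field is quoted and each line ends in `\r\n`.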
10. DataWeave Readers
Read Strategy | Description | Supported Formats
In-Memory | This strategy parses the entire document and loads it into memory, enabling random access to data. | DataWeave can read all supported formats using this strategy.
Indexed | This strategy parses the entire document and uses disk space to avoid out-of-memory issues on large files, enabling random access to data. When using this strategy, a DataWeave script can access any part of the resulting value at any time. When processing a String with a size larger than 1.5 MB, DataWeave automatically splits the value in chunks to avoid out-of-memory issues. This feature works only with JSON and XML input data. | CSV, JSON, XML
Streaming | This strategy partitions the input document into smaller items and accesses the data sequentially, storing the current item in memory. A DataWeave selector can access only the portion of the file that is getting read. | CSV, JSON, Excel (XLSX), XML
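The practical difference between random access and sequential access can be sketched in a few lines of Python. This is an illustration of the two access patterns only, not of how DataWeave implements them; `make_source` and `stream_total` are hypothetical names, and the tiny in-memory string stands in for a large file.

```python
import csv
import io

def make_source():
    # Hypothetical input; stands in for a large CSV file.
    return io.StringIO("id,amount\n1,10\n2,20\n3,30\n")

# In-memory: the whole document is parsed first, so any row can be
# revisited in any order (random access).
all_rows = list(csv.DictReader(make_source()))
last_then_first = (all_rows[-1]["id"], all_rows[0]["id"])

# Streaming: rows are consumed one at a time, in order, and only the
# current row is held; earlier rows cannot be revisited.
def stream_total(source):
    total = 0
    for row in csv.DictReader(source):
        total += int(row["amount"])
    return total

total = stream_total(make_source())
```

A running total works under streaming because it only ever needs the current row, whereas reading the last row before the first requires the whole document to be available.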
11. Streaming in DataWeave
DataWeave supports end-to-end streaming through a flow in a Mule application.
Streaming speeds the processing of large documents without overloading
memory.
DataWeave processes streamed data as its bytes arrive instead of scanning the
entire document to index it. When in deferred mode, DataWeave can also pass
streamed output data directly to a message processor without saving it to the
disk. This behavior enables DataWeave and Mule to process data faster and
consume fewer resources than the default processes for reading and writing data.
Streaming is not enabled by default. You can use two configuration properties to
stream data in a supported data format:
• streaming property, for reading source data as a stream
• deferred writer property, for passing an output stream directly to the next
message processor in a flow
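The combination of a streaming reader and a deferred writer resembles a chain of lazy generators. The following Python sketch shows that idea under stated assumptions: it is not Mule or DataWeave code, and `read_stream` and `transform_deferred` are hypothetical names. Nothing is read from the source until the consumer starts iterating over the output.

```python
import csv
import io

def read_stream(source):
    """Yield one parsed row at a time (the streaming reader side)."""
    yield from csv.DictReader(source)

def transform_deferred(rows):
    """Yield output lines lazily; nothing runs until a consumer iterates."""
    yield "id,total\n"
    for row in rows:
        yield f"{row['id']},{int(row['qty']) * int(row['price'])}\n"

source = io.StringIO("id,qty,price\n1,2,5\n2,3,4\n")
output = transform_deferred(read_stream(source))  # pipeline built, no work done yet
position_before = source.tell()                   # source has not been read

# The next processor consumes the stream; only now are rows read and transformed.
result = "".join(output)
```

Because each stage hands rows to the next as they are produced, the full input and the full output never need to exist in memory at the same time.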
12. Demo 1: Represent CSV Data
Demo 2: Stream CSV Data
Demo 3: Streaming with Deferred Mode
13. Flat File Definition
DataWeave uses a YAML format called FFD (for Flat File Definition) to represent
flat file schemas. The FFD format is flexible enough to support a range of use
cases, but is based around the concepts of elements, composites, segments,
groups, and structures.
Schemas must be written in Flat File Schema Language, and by convention use a
.ffd extension.
14. Components in Schema
• Element - An element is a basic data item, which has an associated type and
fixed width, along with formatting options for how the data value is read and
written.
• Composite - (Optional) A group of elements. It can also include other child
composites.
• Segment - A line of data, or record, made up of any number of elements
and/or composites that might be repeated.
• Group - (Optional) Several segments grouped together. It can also include
other child groups.
• Structure - A hierarchical organization of segments, which requires that the
segments have unique identifier codes as part of their data.
15. Single Segment
If you are only working with one type of record, you only need to have a segment
definition for that record type in your FFD.
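To show what a single segment of fixed-width elements means in practice, here is a small Python sketch. It does not use the FFD format or any MuleSoft API; the `SEGMENT` layout, the element names and the sample lines are all hypothetical.

```python
# Hypothetical segment layout: (element name, fixed width, type) for one record type.
SEGMENT = [("id", 4, int), ("name", 10, str), ("amount", 6, int)]

def parse_segment(line, layout=SEGMENT):
    """Slice one fixed-width line into typed elements, left to right."""
    record, pos = {}, 0
    for name, width, cast in layout:
        raw = line[pos:pos + width].strip()  # drop padding spaces
        record[name] = cast(raw)
        pos += width
    return record

# Every line has the same shape, so one segment definition is enough.
lines = ["0001Jane Doe  000150", "0002Smith     000075"]
records = [parse_segment(line) for line in lines]
```

Each element is located purely by its position and width, which is why the layout must list the elements in the order they appear in the line.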
17. Multiple Segment
If you are working with multiple types of records in the same transformation, you
need to use a structure definition that controls how these different records are
combined.
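When several record types share one file, each line needs an identifier that selects the right segment layout. The Python sketch below illustrates that dispatch step; it is a conceptual analogue rather than FFD syntax, and the `HDR`/`DTL`/`TRL` tags, the layouts and the sample data are hypothetical.

```python
# Hypothetical layouts keyed by a 3-character record identifier at the start of each line.
LAYOUTS = {
    "HDR": [("batch", 6, str)],
    "DTL": [("item", 5, str), ("qty", 3, int)],
    "TRL": [("count", 3, int)],
}
TAG_WIDTH = 3

def parse_line(line):
    """Pick the segment layout from the line's identifier, then slice by width."""
    tag = line[:TAG_WIDTH]
    layout = LAYOUTS[tag]  # raises KeyError for an unknown record type
    record, pos = {}, TAG_WIDTH
    for name, width, cast in layout:
        record[name] = cast(line[pos:pos + width].strip())
        pos += width
    return tag, record

def parse_structure(lines):
    """Combine the segments into one header, many details and one trailer."""
    result = {"header": None, "details": [], "trailer": None}
    for line in lines:
        tag, record = parse_line(line)
        if tag == "HDR":
            result["header"] = record
        elif tag == "DTL":
            result["details"].append(record)
        else:
            result["trailer"] = record
    return result

data = ["HDRB00042", "DTLPEN  012", "DTLINK  003", "TRL002"]
parsed = parse_structure(data)
```

The identifier plays the role that unique segment codes play in a structure definition: without it, the parser could not tell which layout applies to a given line.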