BizTalk Server: Teach me something new about Flat Files (or not) – Delimited Flat File Schema Optimizations

Posted: October 23, 2015 in BizTalk
Tags: , , , , , , , , ,

We ended last post asking: Do you think the schema is entirely correct? Does it handles with all files accordingly to the expected results? Or do we need to tune some of the properties of your schema…

When we start working with Flat File schemas for BizTalk, we may think that just using the Flat File wizard is going to be enough. Once the schema is created, we just need to create a custom pipeline and add the schema to the Document Schemas of a Flat File disassembler and it will be good… well let me tell you that most probably is not enough and the reality may be very different.

A common mistake that we sometimes do is to rely on a single sample message to generate/validate our Flat File Schemas. And we ended up getting possibly several different errors in production due to unforeseen cases that end up not being supported in our current schema version.

So it is real important to get all the information, as possible, from the different teams, for example:

  • Are there any optional elements?
  • If so, will they be delete from the message? Or will they be present without any content (blank/null)
  • Does the message will always end with a blank line? i.e., does the last line contain a Carriage Return/Line Feed (CR+LF)
  • What are the Data type of the elements?

And many, many more….

The following walkthrough shows you some of the common scenarios that you can expect and how you can tune some of the properties of our Flat File Schemas to deal with them.

Cosmetic operations

While generating the Flat File Schema with the Flat File Schema Wizard you should always provide/set:

  • The correct or proper namespace for the schema
  • Rename the parent node, each record, element or attribute
  • And set the correct data type of each element or attribute

However if you forgot to do that or if you make a mistake defining one of them, you can always easily rectify it in the Schema Editor.

I call it cosmetic operations because they don’t infer “any changes” in the translation rules generated by the wizard, they only changed the way the elements are presented in the message, if you change the name of the element or attributes. Or they will have effect in the validation of the message, if we change the data types.

You can accomplished that by:

  • Open the schema in BizTalk Editor.
  • Change the Schema Target Namespace
    • In the BizTalk Editor dialog, right-click the <Schema> node in the tree view and click Properties. The Properties dialog appears.
    • In the Properties dialog, the Target Namespace displays under the General category for the <Schema> node.
    • In the Target Namespace field, type the name of your new target namespace and click OK.
  • Change the Field Attribute or Field Element Name
    • Select the Field Attribute or Field Element node and then right-click and select the “Rename” option
    • Type the desired Field Attribute or Field Element name and press “Enter”
  • Change the Field Attribute or Field Element Data Type
    • Select the Field Attribute or Field Element node and then right-click and select the “Properties” option
    • In Properties window, set the “Data Type” property to the desired type

01-BizTalk-Flat-File-Schema-Cosmetic-Operations

Validating Flat Files Schemas through the Visual Studio with different scenarios

Let us forget the runtime for now, and focus firstly on the development phase where we can test and validate our schemas directly from Visual Studio as explained, once again, in the previous post.

Dealing with Files ending with a newline or that doesn’t ends with a newline

Initially we had the following example:

Sandro;Pereira;1978;Crestuma;4415{CR}{LF}
Vânia;Braziela;1981;Vila Nova de Gaia;4400{CR}{LF}
José;Silva;1972;Crestuma;4415{CR}{LF}
Rui;Barbosa;1975;Lever;4415{CR}{LF}

Which was the base to generate our Flat File schema: “FF_DSYMBOL_PEOPLE.xsd”.

If you paid attention, while we were generating our Flat File Schema, that in the “Select Document Data” window of the BizTalk Flat File Schema Wizard, you can notice that the last line contains a Carriage Return/Line Feed (CR+LF), which means that if you open the file for example in Notepad, you will notice that the file ends with a blank line.

01-BizTalk-Flat-File-Schema-Wizard-Select-Document-Data-Page-CR-LF

Note: You can check the file “FF_DSYMBOL_PEOPLE.txt” that is available with the source code.

So let us try with a simple change by removing the Carriage Return/Line Feed so that the files does not end with a newline:

Sandro;Pereira;1978;Crestuma;4415{CR}{LF}
Vânia;Braziela;1981;Vila Nova de Gaia;4400{CR}{LF}
José;Silva;1972;Crestuma;4415{CR}{LF}
Rui;Barbosa;1975;Lever;4415

Check the file: “FF_DSYMBOL_PEOPLE WithoutEnterLastLine.txt

If we now try this file instance against our Flat File Schema by:

  • In Solution Explorer, right-click the schema that you want to validate, and then click Properties.
  • In Properties window for the schema, for Input Instance Filename enter the name and path for the “FF_DSYMBOL_PEOPLE
    • Check more about Schema File Properties here.
  • WithoutEnterLastLine.txt” message instance, or browse to the file, select it, and then click OK.
  • Right-click the schema that you want to validate, and then click Validate Schema

You will notice that we will get the following error:

Invoking component…
<path>\DemoFile\FF_DSYMBOL_PEOPLE WithoutEnterLastLine.txt: error BEC2004: Unexpected data found while looking for:
‘\r\n’
The current definition being parsed is Person. The stream offset where the error occured is 133. The line number where the error occured is 4. The column where the error occured is 23.
<path>\FF_DSYMBOL_PEOPLE.xsd: error BEC2004: Validate Instance failed for schema FF_DSYMBOL_PEOPLE.xsd, file: <file:///path\FF_DSYMBOL_PEOPLE WithoutEnterLastLine.txt>.
Component invocation succeeded.

02-Visual-Studio-Flat-File-Instance-Validation-Failed

If, on the other hand, we had originally generated the Flat File Schema using the message instance without the Carriage Return/Line Feed at the end.

03-BizTalk-Flat-File-Schema-Wizard-Select-Document-Data-Page-option-3

And if we know try it with an instance containing a blank line at the end of the message we would receive a similar but different error:

Invoking component…
<path>\FF_DSYMBOL_PEOPLE.txt: error BEC2004: Unexpected end of stream while looking for:
‘;’
The current definition being parsed is People. The stream offset where the error occured is 139. The line number where the error occured is 5. The column where the error occured is 0.
<path>\FF_DSYMBOL_PEOPLE_Advance.xsd: error BEC2004: Validate Instance failed for schema FF_DSYMBOL_PEOPLE_Advance.xsd, file: <file:///path\FF_DSYMBOL_PEOPLE.txt>.
Component invocation succeeded.

04-Visual-Studio-Flat-File-Instance-Validation-Failed

I’ve seen situations/solutions where there were created pipeline components to address and solve these type of problems, which isn’t the best and the proper solution!

In both cases the solution for solving these problems is the same: ensure that the “Allow Message Breakup At Infix Root” property on the Schemas node is set to “true” by:

  • Double-click the schema to open the item in the Schema Editor;
  • In BizTalk Editor, right-click in the Schema node and select the Properties option
  • In Properties window for the schema, set the “Allow Message Breakup At Infix Root” property to “true”

05-Visual-Studio-Flat-File-Instance-Validation-Failed-solved

After making this change, if we try to validate the two previous messages instances, we will notice that now everything will work perfectly.

So if you notice in the properties of our Schema:

  • The “Child Order” property of the People Root Node is set as: “Postfix”
  • And the “Child Order” property of the “Person” record is set as: “Infix”

What does it means? Child Order must always have a value. The possible values are:

  • Prefix: Specifies that the delimiter appears before the data that it is delimiting.
  • Postfix: Specifies that the delimiter appears after the data that it is delimiting.
  • Infix: Specifies that the delimiter appears between different items of data, resulting in one less delimiter than when Prefix or Postfix is chosen.
  • Default Child Order: Sets the child_order attribute to “default”, specifying that the child order specified using the Default Child Order property should be used.
  • And Conditional Default: Removes the corresponding annotation in the XSD representation of the schema, if any, resulting in the following behavior:
    • If the Record node has a tag identifier, then the child order will be prefix.
    • Otherwise, the child order will be infix.

There are two scenarios related to delimited flat files for which special considerations apply when setting the Child Order property.

  • The first such scenario concerns situations in which the flat file document has a header, a body, and optionally, a trailer.
  • The second scenario is that this property must be set according to what the runtime components expect for the nodes

In our instance message, the data/delimiter sequence in the conceptual loop would merely be one or more person records. Of course, only in the case where there are more than one person records would there be a delimiter to separate them. In that case, the number of delimiters is one less than the sets of things being delimited, and the delimiters are located between the delimited items in a structure known as Infix.

  • Number delimiter = Number Records – 1

Which is not the case in all our messages instances. Our first instance have the same number of “valid records” and delimiters.

By setting “Allow Message Breakup At Infix Root” property to “true”, this tells the parser whether to allow message breakup, and will ensure that the Infix stream is processed properly by the FF Disassembler.

Dealing with Optional Elements

Using Suppress Empty Nodes property

So let’s imagine a different scenario, this time we will deal with optional elements. In our first scenario while discussing with our time we realize that the element “Birthyear” was appearing empty in several messages, and it should be considered as an optional element:

Sandro;Pereira;1978;Crestuma;4415
Vânia;Braziela;1981;Vila Nova de Gaia;4300
José;Silva;;Crestuma;4415
Rui;Barbosa;1975;Lever;4415

By default, we need to have a complete instance of the message, for us developers be able to create all the transformation rules, which means that by default, when we finish creating the Flat File Schema through Flat File Schema Wizard tool, none of the records, elements or attributes are optional.

All of these changes/optimizations/configurations, they have to be made after the Flat File Schema been created.

You can always modify an existing Flat File schema that was created by the wizard (name of the elements, data type and so on). So to accomplish that we need to:

  • Double-click the schema to open the item in the Schema Editor;
  • In BizTalk Editor, expand People root node, expand Person record and right-click in the Birthyear element and select the Properties option
  • In Properties window, set the “Min Occurs” property to “0”

If we now try to validate the previous message instance (“FF_DSYMBOL_PEOPLE WithOptional.txt”), we ended up receiving another error:

Invoking component…
Validation generated XML output <file:///<path>\FF_DSYMBOL_PEOPLE_Advance_output.xml>.
<path>\FF_DSYMBOL_PEOPLE_Advance_output.xml: error BEC2004: The ‘Birthyear’ element is invalid – The value ” is invalid according to its datatype ‘http://www.w3.org/2001/XMLSchema:int&#8217; – The string ” is not a valid Int32 value.
<path>\FF_DSYMBOL_PEOPLE_Advance.xsd: error BEC2004: Validate Instance failed for schema FF_DSYMBOL_PEOPLE_Advance.xsd, file: file:///path\FF_DSYMBOL_PEOPLE WithOptional.txt.
Component invocation succeeded.

06-Visual-Studio-Flat-File-Instance-Validation-Failed-Element-String-Not-valid

And of course the reason is obvious, an empty string is not a valid integer.

Making this element nillable doesn’t make any difference, because in both situations the parser will continue to add an empty element:

<Person xmlns="">
    <Name>José</Name>
    <Surname>Silva</Surname>
    <Birthyear></Birthyear>
    <Address>Crestuma</Address>
    <ZipCode>4415</ZipCode>
  </Person>

To solve that situation there are a few additional flat file node properties that we can use. You can see the full list here:

And in our case we are looking for the “Suppress Empty Nodes” property (or suppress_empty_nodes). By setting this property to “True” it will remove all the empty nodes from the XML stream.

  • This can be used to eliminate fields that are empty after being parsed, but the XSD type doesn’t allow empty values.

To accomplish that we need to:

  • In BizTalk Editor, right-click in the Schema node and select the Properties option
  • In Properties window for the schema, set the Suppress Empty Nodes property to “true”

07-Visual-Studio-Flat-File-Schema-Suppress-Empty-Nodes

After making this change, if we try to validate the previous message instance, we will notice that now everything will work perfectly.

Note: When using Delimited files, the optional element in the beginning or in the middle of the each line (record node) need always to be present in the text with empty values (;;) otherwise the parsing rules will fail or will perform an incorrect translation of the message.

If the “Birthyear” is set as string and we receive this type of messages

Sandro;Pereira;1978;Crestuma;4415
Vânia;Braziela;1981;Vila Nova de Gaia;4300
José;Silva;Crestuma;4415
Rui;Barbosa;1975;Lever;4415

The message will be translated incorrectly and there is nothing you can do about it, the end result would look like this:

  <Person xmlns="">
    <Name>José</Name>
    <Surname>Silva</Surname>
    <Birthyear>Crestuma</Birthyear>
    <Address>
      4415
      Rui
    </Address>
    <ZipCode>Barbosa;1975;Lever;4415</ZipCode>
  </Person>
Dealing with Optional Elements (last element with the data type string)

Continue in the same scenario/message type, after a further series of tests, we noticed another unforeseen situation. Once again several messages were being executed incorrected because some of the element were not being parsed the element “ZipCode” – note that this element is a string

Sandro;Pereira;1978;Crestuma;4415
Vânia;Braziela;1981;Vila Nova de Gaia;4300
José;Silva;1972;Crestuma
Rui;Barbosa;1975;Lever;4415

Note: Contrary to what happens with elements at the beginning and end of the message, if the last element is an optional element, then it is not required to appear in the text (see line 3 – it has one element less)

We don’t receive any error message in BizTalk Server runtime but we notice that we messages are being integrated with incorrect data. After further analysis, let’s take the sample above, we notice that the message are being translated incorrectly and the last node is being added to the “ZipCode” element and not in a different Person node as supposed

  <Person xmlns="">
    <Name>José</Name>
    <Surname>Silva</Surname>
    <Birthyear>1972</Birthyear>
    <Address>
      Crestuma
      Rui
    </Address>
    <ZipCode>Barbosa;1975;Lever;4415</ZipCode>
  </Person>

To solve this problem we once again need set the element, in this case “ZipCode” as optional:

  • Double-click the schema to open the item in the Schema Editor;
  • In BizTalk Editor, expand People root node, expand Person record and right-click in the ZipCode element and select the Properties option
  • In Properties window, set the “Min Occurs” property to “0”
<People xmlns="http://DelimitedBySymbols.FF_DSYMBOL_PEOPLE_Advance">
  <Person xmlns="">
    <Name>Sandro</Name>
    <Surname>Pereira</Surname>
    <Birthyear>1978</Birthyear>
    <Address>Crestuma</Address>
    <ZipCode>4415</ZipCode>
  </Person>
  <Person xmlns="">
    <Name>Vânia</Name>
    <Surname>Braziela</Surname>
    <Birthyear>1981</Birthyear>
    <Address>Vila Nova de Gaia</Address>
    <ZipCode>4300</ZipCode>
  </Person>
  <Person xmlns="">
    <Name>José</Name>
    <Surname>Silva</Surname>
    <Birthyear>1972</Birthyear>
    <Address>Crestuma</Address>
  </Person>
  <Person xmlns="">
    <Name>Rui</Name>
    <Surname>Barbosa</Surname>
    <Birthyear>1975</Birthyear>
    <Address>Lever</Address>
    <ZipCode>4415</ZipCode>
  </Person>
</People>

There is a Flat File Schema property call “allow_early_termination” (Early Terminate Optional Fields) that is a Boolean property. The use of this property is this situation is incorrect and does produce any changes in outcome. We will address and explain this property at another time.

Some references:

Related posts:

You can download all the code used here from:

BizTalk Server: Dealing with Files Delimited by Symbols (96 KB)
Microsoft | Code Gallery

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s