Friday, April 5, 2013

XML Parser Transformations


XML Parser is an Active and Connected transformation.

XML Parser transformation is use to extract XML inside a pipeline. The XML Parser transformation lets you extract XML data from messaging systems, such as TIBCO or MQ Series, and from other sources, such as files or databases. The XML Parser transformation functionality is similar to the XML source functionality, except it parses the XML in the pipeline.

For example, you might want to extract XML data from a Flat File (data will be in XML Format) and pass the data to relational targets.
The XML Parser transformation reads XML data from a single input port and writes data to one or more output ports.

When the Integration Service processes an XML Parser transformation, it reads a row of XML data, parses the XML, and returns data through output groups. The XML Parser transformation returns non-XML data in pass-through ports. You can parse XML messages from sources such as JMS or IBM WebSphere MQ.

The XML Parser transformation has one input group and one or more output groups. The input group has one input port, DataInput, which accepts an XML document in a string.

When we create an XML Parser transformation, use the XML Wizard to import an XML, DTD, or XML schema file. For example, you can import the following Employee DTD file:

<!ELEMENT EMPLOYEES (EMPLOYEE+)>
<!ELEMENT EMPID CDATA #REQUIRED>
<!ELEMENT ENAME (#PCDATA)>
<!ELEMENT JOB (#PCDATA)>
<!ELEMENT MGR (#PCDATA)>
<!ELEMENT HIREDATE (#PCDATA)>
<!ELEMENT SAL #CDATA #REQUIRED>
<!ELEMENT COMM (#PCDATA)>
<!ELEMENT DEPTNO CDATA #REQUIRED>

The following figure shows the XML Parser transformation that the Designer creates if you choose to create entity relationships:

The Designer creates a root view, X_1. X_1 is the parent of X_0.

Each view in the XML Parser transformation has at least one key to establish its relationship with another view. If we do not designate the keys in the XML Editor, the Designer creates the primary and foreign keys for each view. The keys are of datatype bigint. The keys are called generated keys because the Integration Service creates the key values each time it returns a row from the XML Parser transformation.

When the Designer creates a primary or foreign key column, it assigns a column name with a prefix. In an XML definition, the prefix is XPK_ for a generated primary key column and XFK_ for a generated foreign key column. A foreign key always refers to a primary key in another group. A generated foreign key column always refers to a generated primary key column.

For example, the group X_0 has the XPK_COLUMN primary key. Each group has the foreign key column XFK_0.

The repository stores the key values. You cannot change the values in the repository, but you can choose to reset or restart the sequence numbers after a session.

XML Parser Input Validation
We can configure the XML Parser transformation to validate XML before parsing it. The XML Parser transformation validates the XML against a schema. If the XML is not valid for the schema, a row error occurs. The XML Parser transformation returns the XML and associated error messages to a separate output group. You can pass the invalid XML and error message to a target.

The session runs with a source-based commit. A message in the commit transaction has an invalid XML payload. To prevent the commit from failing, you can configure the XML Parser transformation to return the invalid XML to a separate output group from the valid data. The XML Parser transformation processes the valid XML messages and completes the transaction.

The session log contains a message that indicates when Route Invalid Payload Through Data Flow is enabled. When you set the session tracing level to Normal, the Integration Service writes a message to the session log that indicates whether the validation is successful. The log message contains the location of the schema the XML Parser accessed to validate the XML. When XML streaming is enabled and the XML is invalid, the Integration Service truncates the XML and passes it to the Invalid_Payload port. The Integration Service logs the invalid XML in the session log.

To configure the XML Parser transformation to validate the XML, enable the Route Invalid Payload Through Data Flow option on the Midstream XML Parser tab. The Designer adds the following ports to the XML Parser transformation: 
  •  Invalid_Payload. Returns invalid XML messages to the pipeline. If the XML payload is valid, the Invalid_Payload port contains a null value. This port has the same precision as the DataInput port.
  • Error_Status. Contains the error string or status returned from the XML validation. If the XML is valid for the current row, Error_Status contains a null value. This port has the same precision as the DataInput port.
The following mapping shows an XML Parser transformation that routes invalid XML messages to an Errors target table:
a) XML Parser transformation. Receives the XML message data in the DataInput port. When the XML is valid, the XML Parser transformation returns the employee (X_0) data and passes it to a target. When the XML is not valid, the XML Parser transformation returns the XML in the Invalid_Payload port. It returns an error message in the Error_Status port.
b) XML_Errors target definition. Receives invalid XML and error messages.

Sample Mapping:

1. Connect and Open the folder if not already opened.
2. Select Tools --> Mapping Designer
3. Select Mappings --> Create

It will pop-up "Mapping Name". Enter the mapping name of your choice" and Click on 'OK'. Example: m_employee_parsar
4. Drag the Source and Target definitions into workspace if they are already exist. If not click here to know how to create or import Table definitions.
Note:
a) In General Source definition will have only one port/column, datatype string with length max 64000.
b) Here we have 2 target file, EMPLOYEE target for output records and XML_Parser_Error for Parser Error records.

5. Select 'Transformation' from Menu --> Create
a) That will appear you 'Select the transformation type to create:'
 
b) Select 'XML Parser' from drop down and 'Enter a new name for this transformation:' as "EMP_PARSER"
c) Click 'Create' then it will pop-up 'Import XML Definition'
 
d) Give any rootname in Optional XML Rootname: 'EMPLOYEE' as highlighted in red above
e) Select 'Non-XML Source' as highlighted in red above and Select Scott.EMP and click on >> as highlighted in Green
 
f) Select Scott.EMP from right side window 'Selected Source' and click on 'Open'
 
g) Click on No
h) XML Wizard (step 1) will pop-up then click 'Next'
i) Select 'Entity Relationships', Click 'Finish' and Click 'Done'


or

a) Click on XML Generator Transformation icon marked below in below snapshot.
b) Click in the workspace in Mapping Designer then it will pop-up 'Import XML Definition'
c) Give any rootname in Optional XML Rootname: 'EMPLOYEE' as highlighted in red above
d) Select 'Non-XML Source' as highlighted in red above and Select Scott.EMP and click on >> as highlighted in Green
e) Select Scott.EMP from right side window 'Selected Source' and click on 'Open'
 
f) Click on No
g) XML Wizard (step 1) will pop-up then click 'Next'
h) Select 'Entity Relationships' and Click 'Finish'
i) Select ' EMP' XML Parser transformation in workspace and Right Click --> Edit.
j) In Transformation tab --> Click on 'Rename' highlighted above which will pop-up 'Rename Transformation'. Enter the Transformation Name: "EMP_PARSER"
k) Click on 'OK'
l) Click on 'Apply' and 'OK'.
6) Select 'EMP_PARSER' xml parser transformation and click on edit --> Midstream XML Parser tab. From Properties option select 'Restart and Route invalid payload through data flow' options as below.
 Click on 'Apply' and 'OK'
7) Drag 'Field1' from ' SQ_x_employee' Source Qualifier to ' EMP_PARSER' XML Parser Transformation.
8) Drag required port from 1st group of from XML Parser transformation to Target Employee (file) and 'ErrorOutputGroup' to XML_Parser_Error target.
 
9) Select 'Mapping' from Menu --> Validate.
10) Select 'Repository' from Menu --> Save.

11 comments:

  1. very good thank you.

    ReplyDelete
  2. Hi,
    May i knw how to add new field(input and output fields apart from default) to xml parser trans.?
    my requirment is xml data is coming from Oracle table. here there is one field xml_text having xml content and app_id having application id related to xml content. now i want to validate the xml content trhu parsert trans then if it is invalid i want to separate those xml content along with app_id value.
    in the normal procedure i can track only xml content but how could i also link app_id field from xml parser to next trans/target.

    Much appriciate on your quick help.

    ReplyDelete
    Replies
    1. Did you get ur answer already?

      Delete
    2. Hi, I also have the same issue Like Abhi. Can someone suggest the soluntion.??

      Delete
  3. Can some one help me in get the solution for abhi's Query

    ReplyDelete
  4. Hey Swaroop,

    Can you explain you requirement clearly with some example, I can help you.

    Thanks,
    Gowtham

    ReplyDelete
  5. Very very clear explanation Gowtham

    ReplyDelete
  6. Thank you for a clear explanation

    ReplyDelete
  7. I have the .xml file as source then sam ecan be done through XML Source qaualifier right?

    ReplyDelete
  8. Thank you.Well it was nice post and very helpful information on Informatica Online Training Hyderbad

    ReplyDelete