Friday, April 5, 2013

Rank Transformation

Rank transformation is an Active and Connected transformation.

We can select only the top or bottom rank of data with Rank transformation. Use a Rank transformation to return the largest or smallest numeric value in a port or group. We can also use a Rank transformation to return the strings at the top or the bottom of a session sort order. During the session, the Integration Service caches input data until it can perform the rank calculations.

Ranking String Values
When the Integration Service runs in the ASCII data movement mode, it sorts session data using a binary sort order.

When the Integration Service runs in Unicode data movement mode, the Integration Service uses the sort order configured for the session. We select the session sort order in the session properties. The session properties lists all available sort orders based on the code page used by the Integration Service.

Rank Caches
During a session, the Integration Service compares an input row with rows in the data cache. If the input row out-ranks a cached row, the Integration Service replaces the cached row with the input row. If you configure the Rank transformation to rank across multiple groups, the Integration Service ranks incrementally for each group it finds.

The Integration Service stores group information in an index cache and row data in a data cache. If you create multiple partitions in a pipeline, the Integration Service creates separate caches for each partition.

Rank Transformation Properties
When you create a Rank transformation, you can configure the following properties:
  • Enter a cache directory.
  • Select the top or bottom rank.
  • Select the input/output port that contains values used to determine the rank. We can select only one port to define a rank.
  • Select the number of rows falling within a rank.
  • Define groups for ranks, such as the 10 least expensive products for each manufacturer.
Ports in a Rank Transformation
The Rank transformation includes input or input/output ports connected to another transformation in the mapping. It also includes variable ports and a rank port. Use the rank port to specify the column you want to rank.

The following table describes the ports in a Rank transformation:
Ports    
Number Required
Description
I
Minimum of one
Input port. Create an input port to receive data from another transformation.
O
Minimum of one
Output port. Create an output port for each port you want to link to another transformation. You can designate input ports as output ports.
V
Not Required
Variable port. Can use to store values or calculations to use in an expression. Variable ports cannot be input or output ports. They pass data within the transformation only.
R
One only
Rank port. Use to designate the column for which you want to rank values. You can designate only one Rank port in a Rank transformation. The Rank port is an input/output port. You must link the Rank port to another transformation.

Rank Index
The Designer creates a RANKINDEX port for each Rank transformation. The Integration Service uses the Rank Index port to store the ranking position for each row in a group.

For example, if you create a Rank transformation that ranks the top five salaries, the rank index numbers the salespeople from 1 to 3:
  • The RANKINDEX is an output port only.
  • We can pass the rank index to another transformation in the mapping or directly to a target.
  • We cannot delete or edit it. 
The RANKINDEX is an output port only. You can pass the rank index to another transformation in the mapping or directly to a target.

Defining Groups
The Rank transformation lets you group information.

For example, if you want to select the 3 salaried employees of each Department, you would first define a group for each Department. When you configure the Rank transformation, you can set one of its input/output ports as a group by port. For each unique value in the group port, the transformation creates a group of rows falling within the rank definition (top or bottom, and a particular number in each rank).

Note: If two rank values match, they receive the same value in the rank index and the transformation skips the next value.

Sample Mapping:

Get the top 3 salaries employee’s details from EMP table.

1. Connect and Open the folder if not already opened.

2. Select Tools --> Mapping Designer
3. Select Mappings --> Create

It will pop-up "Mapping Name". Enter the mapping name of your choice" and Click on 'OK'. Example: m_employee_top_3_salaries
4. Drag the Source and Target definitions into workspace if they are already exist. If not click hereto know how to create or import Table definitions.
5. Select 'Transformation' from Menu --> Create
a) That will appear you 'Select the transformation type to create:'
b) Select 'Rank' from drop down and 'Enter a new name for this transformation:' as "rnk_top_3_salaries"
c) Click 'Create' and 'Done'

or

a) Click on Expression Transformation icon marked below in below snapshot.
b) Click in the workspace in Mapping Designer.
c) Select RNKTRANS in workspace and Right Click --> Edit.
d) In Transformation tab --> Click on 'Rename' highlighted above which will pop-up 'Rename Transformation'. Enter the Transformation Name: "rnk_top_3_salaries"
e) Click on 'OK'
f) Click on 'Apply' and 'OK'.
6). Drag required ports from 'SQ_EMP' Source Qualifier to 'rnk_top_3_salaries' Rank Transformation.
 
7) Select 'rnk_top_3_salaries' rank transformation and Right click -->Edit.

a) Go to Port tab:
b) Select port (SAL) to which you want to define the Ranks.
 
Note: If you want you can defined 'Group by' also based on Department to get top 3 salaries from each department wise.

c) Go to 'Properties tab' to defined the ranks: 3.
Top/Bottom: Select top or bottom ranks
Number of Ranks: Number of records in each rank
Case Sensitive String Comparison: Case sensitive comparison

d) Click on 'Apply' and Click on 'OK'.
 
8). Drag required port from 'rnk_top_3_salaries' ranker to 'EMP1' Target.
 
9) Select 'Mapping' from Menu --> Validate.
10) Select 'Repository' from Menu --> Save.
 



5 comments:

  1. You forgot to include the RANK port in the Target table.

    ReplyDelete
    Replies
    1. What you include the rank port in target table. If you want to displace the rank the it required. Here I am explaining about the transformation not a scenario

      Delete
  2. Excellent reference!

    I have a naive question, though: How does the rank transformation behaves with respect to empty strings (or more generally, with respect to NULLs)?
    When sorting with Oracle, there is a NULLS LAST/NULLS FIRST option which can be used in the query.

    Does the Rank transformation offers something similar? It seems like, by default, it systematically ignores the records having NULL in the RANK port ...

    ReplyDelete
  3. awesome post presented by you..your writing style is fabulous and keep update with your blogs Informatica Online Course Bangalore

    ReplyDelete