Monday, March 25, 2013

Mapping Optimization Techniques

1. Reduce the number of transformations. There is always overhead involved in moving data between transformations.

2. Consider more shared memory for mappings with a large number of transformations. Session shared memory between 12 MB and 40 MB should suffice.

3. Calculate once, use many times.
  • Avoid calculating or testing the same value over and over.
  • Calculate it once in an expression, and set a True/False flag.
  • Within an expression, use variable ports to calculate a value that can be used multiple times within that transformation (see the sketch below).
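
To make the idea concrete outside of PowerCenter, here is a minimal Python sketch (the field names and the 7% rate are invented for illustration): the flag and the derived value are computed once per row, the way a variable port would hold them, and then reused in every downstream expression.

    # Sketch only: "calculate once, use many times" with hypothetical fields.
    def enrich(rows):
        for row in rows:
            is_domestic = row["country"] == "US"        # True/False flag set once
            net = row["amount"] - row["discount"]       # derived value, computed once
            row["tax"] = net * (0.07 if is_domestic else 0.0)    # reuse net
            row["total"] = net + row["tax"]                       # reuse net again
            row["domestic_flag"] = is_domestic                    # reuse the flag
            yield row

    sample = [{"country": "US", "amount": 100.0, "discount": 5.0}]
    print(list(enrich(sample)))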

4. Only connect what is used.
  • Delete unnecessary links between transformations to minimize the amount of data moved, particularly in the Source Qualifier.
  • This is also helpful for maintenance. If a transformation needs to be reconnected, it is easier when only the necessary ports are set as input and output.
  • In lookup transformations, change unused ports to be neither input nor output. This makes the transformations cleaner looking. It also keeps the generated SQL override as small as possible, which cuts down on the amount of cache necessary and thereby improves performance.

5. Watch the data types.
  • The engine automatically converts compatible types.
  • Sometimes data conversion is excessive. Data types are automatically converted when types are different between connected ports. Minimize data type changes between transformations by planning data flow prior to developing the mapping.

6. Facilitate reuse.
  • Plan for reusable transformations upfront.
  • Use variables. Use both mapping variables and variable ports. Variable ports are especially beneficial when they can be used to calculate a complex expression or perform a disconnected lookup call only once instead of multiple times.
  • Use mapplets to encapsulate multiple reusable transformations.
  • Use mapplets to leverage the work of critical developers and minimize mistakes when performing similar functions.

7. Only manipulate data that needs to be moved and transformed.
  • Reduce the number of non-essential records that are passed through the entire mapping.
  • Use active transformations that reduce the number of records as early in the mapping as possible (e.g., place Filter and Aggregator transformations as close to the source as possible; see the sketch below).
  • Select the appropriate driving/master table when using joins. The table with the smaller number of rows should be the driving/master table for a faster join.
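
As a rough illustration of pushing the row-reducing step toward the source (plain Python with made-up data, not PowerCenter syntax), the filter runs before the expensive aggregation, so the costly step only ever sees the rows that matter:

    from collections import defaultdict

    # Sketch only: filter first, then aggregate, so the expensive step sees fewer rows.
    def load_sales():                       # stand-in for the source
        yield {"region": "EU", "status": "CANCELLED", "amount": 10.0}
        yield {"region": "EU", "status": "SHIPPED",   "amount": 25.0}
        yield {"region": "US", "status": "SHIPPED",   "amount": 40.0}

    def aggregate(rows):                    # stand-in for an Aggregator
        totals = defaultdict(float)
        for row in rows:
            totals[row["region"]] += row["amount"]
        return dict(totals)

    # Filter as close to the source as possible ...
    shipped = (r for r in load_sales() if r["status"] == "SHIPPED")
    # ... so downstream transformations only process the needed records.
    print(aggregate(shipped))               # {'EU': 25.0, 'US': 40.0}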

8. Utilize single-pass reads.
  • Redesign mappings to use one Source Qualifier to populate multiple targets; that way the server reads the source only once (see the sketch below). If you have different Source Qualifiers for the same source (e.g., one for delete and one for update/insert), the server reads the source once for each Source Qualifier.
  • Remove or reduce field-level stored procedures.
  • If you use field-level stored procedures, the PowerCenter server has to make a call to that stored procedure for every row, slowing performance.
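
A rough analogy in plain Python (the source rows and targets are invented): the source is scanned exactly once and each row is routed to the target that needs it, instead of being re-read once per target the way separate Source Qualifiers would force.

    # Sketch only: one pass over the source feeds multiple targets.
    SOURCE = [{"id": 1, "active": True}, {"id": 2, "active": False}, {"id": 3, "active": True}]
    scans = 0

    def read_source():
        global scans
        scans += 1                          # count how many times the source is read
        return iter(SOURCE)

    upserts, deletes = [], []               # stand-ins for two targets
    for row in read_source():               # single read ...
        (upserts if row["active"] else deletes).append(row)   # ... multiple targets

    print(scans, len(upserts), len(deletes))    # 1 2 1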

9. Sort the input data before passing it to Joiner and Aggregator transformations (see the sketch below).
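
One way to see why sorted input helps (plain Python, invented rows): when records arrive already grouped on the key, an aggregation can be emitted group by group in a single streaming pass instead of caching every group at once, which is the benefit sorted input gives an Aggregator or a sort-merge style Joiner.

    from itertools import groupby
    from operator import itemgetter

    # Sketch only: sort on the group key first, then aggregate in one streaming pass.
    rows = [
        {"dept": "HR",    "salary": 40000},
        {"dept": "IT",    "salary": 60000},
        {"dept": "HR",    "salary": 45000},
        {"dept": "SALES", "salary": 50000},
    ]
    rows.sort(key=itemgetter("dept"))                 # sort before the "Aggregator"

    for dept, group in groupby(rows, key=itemgetter("dept")):
        print(dept, sum(r["salary"] for r in group))  # one group in memory at a time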

10. In a Lookup transformation, use a customized SQL override instead of the default query where it helps. (Append '--' after your own ORDER BY clause to comment out the ORDER BY that the default lookup query generates.)

11. Avoid using unnecessary columns/ports in the SQL query.

12. Filter unnecessary data as close to the Source Qualifier as possible. (For relational sources, include the filter condition in the SQL query itself.)

13. In a Joiner transformation, make the source with the smaller volume of data the master table (see the sketch below).
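
A rough sketch of why the smaller input should be the master (plain Python, invented tables): the master side is the one held in memory, so building the lookup structure from the smaller table keeps the cache small while the larger detail table simply streams past it.

    # Sketch only: hash join that caches the smaller ("master") table in memory
    # and streams the larger ("detail") table through row by row.
    countries = [                          # small master table -> cheap to cache
        {"code": "US", "name": "United States"},
        {"code": "IN", "name": "India"},
    ]
    orders = [                             # large detail table -> streamed, not cached
        {"order_id": 1, "code": "US"},
        {"order_id": 2, "code": "IN"},
        {"order_id": 3, "code": "US"},
    ]

    master_cache = {c["code"]: c["name"] for c in countries}   # size ~ smaller table

    for o in orders:                                           # single pass over detail
        print(o["order_id"], master_cache.get(o["code"], "UNKNOWN"))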

14. When using mapping partitions, place the Aggregator transformation before the partition point.

15. Use a Router transformation instead of multiple Filter transformations (see the sketch below).
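
A small illustration of the difference (plain Python, made-up amount bands): separate filters each test every row against their own condition, while a Router-style single pass evaluates each row once and sends it to the first matching group.

    # Sketch only: one router-style pass versus several separate filter passes.
    rows = [{"id": i, "amount": a} for i, a in enumerate((5, 50, 500, 5000))]

    # Multiple Filters: each one re-reads and re-tests the whole row set.
    small  = [r for r in rows if r["amount"] < 100]
    medium = [r for r in rows if 100 <= r["amount"] < 1000]
    large  = [r for r in rows if r["amount"] >= 1000]

    # Router: one pass, each row evaluated once and sent to exactly one group.
    groups = {"small": [], "medium": [], "large": []}
    for r in rows:
        if r["amount"] < 100:
            groups["small"].append(r)
        elif r["amount"] < 1000:
            groups["medium"].append(r)
        else:
            groups["large"].append(r)

    print({k: len(v) for k, v in groups.items()})   # {'small': 2, 'medium': 1, 'large': 1}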


1 comment:

  1. Good stuff technically.
    Please help us with memory allocation for sample data (10k/1M records): the respective allocations of index cache and data cache for the Joiner/Rank, Aggregator, and Sorter transformations, or DTM buffer size / buffer block size recommendations.
