I recently discovered a powerful feature in Pentaho Kettle: the Mapping step. It lets you call a transformation from inside another transformation.
It is not meant to call a transformation the way a Job does. I see it as a way to create a new, reusable step. If you have a developer background, think of it as a function. If you have a sequence of steps you repeat often, create a mapping with that sequence! It’s a really nice shortcut.
Let’s take a closer look at how it works.
Suppose you have a transformation that retrieves the total and free RAM of the server running it, calculates the amount of used RAM (total – free = used) and writes it to the log. The transformation should look like this:
To create the mapping, you have to create a new transformation with 2 specific steps: the Mapping Input Specification and the Mapping Output Specification. These steps allow the parent transformation to pass values to the sub-transformation (the mapping) and get the results back as output fields. In the example, I will create a mapping that gets the total and free RAM as input, calculates the used memory, converts it to MB and sends it back to the parent transformation. The transformation should look like this:
As you can see, the exact same 2 steps from the first transformation have been duplicated in the sub-transformation. The Mapping Input Specification declares the 2 fields that will receive the values from the parent transformation. The field names don’t have to match the ones used in the parent transformation.
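If you like the function analogy, the sub-transformation can be sketched in Python. This is only an analogy, not Kettle code: the function name, the parameter names and the assumption that the memory values arrive in bytes are mine. The parameters play the role of the Mapping Input Specification and the return value plays the role of the Mapping Output Specification.

```python
# Analogy only: a mapping behaves like a function called by the parent.
# Assumption: memory values are expressed in bytes.
BYTES_PER_MB = 1024 * 1024


def used_ram_mb(total_bytes: int, free_bytes: int) -> float:
    """Return the used memory in MB (total - free = used)."""
    used_bytes = total_bytes - free_bytes
    return used_bytes / BYTES_PER_MB


if __name__ == "__main__":
    # Hypothetical server with 8 GB of RAM, 2 GB of it free.
    print(used_ram_mb(8 * 1024**3, 2 * 1024**3))
```

Just like a function, the mapping only sees the values it is given and only hands back the fields it declares as output.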
In the first transformation, the 2 steps can be replaced by the mapping step.
In the Input tab of the Mapping step, the fields from the parent transformation have to be mapped to the fields declared in the sub-transformation.
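To picture what the Input tab does, here is a small Python sketch of that field mapping. Again, this is an analogy under my own assumptions: the field names, the `call_mapping` helper and the idea that the output row keeps the parent's fields and gains the result are all made up for illustration and are not part of Kettle.

```python
BYTES_PER_MB = 1024 * 1024


def used_ram_mb(total_bytes, free_bytes):
    """Stand-in for the sub-transformation (assumes values in bytes)."""
    return (total_bytes - free_bytes) / BYTES_PER_MB


# Parent field name -> sub-transformation field name (hypothetical names).
INPUT_MAPPING = {
    "server_total_ram": "total_bytes",
    "server_free_ram": "free_bytes",
}


def call_mapping(parent_row, input_mapping, mapping):
    """Rename the parent's fields, run the mapping, add the result to the row."""
    arguments = {sub: parent_row[parent] for parent, sub in input_mapping.items()}
    output_row = dict(parent_row)  # copy, so the parent row is left untouched
    output_row["used_ram_mb"] = mapping(**arguments)
    return output_row


if __name__ == "__main__":
    row = {"server_total_ram": 4 * 1024**3, "server_free_ram": 1 * 1024**3}
    print(call_mapping(row, INPUT_MAPPING, used_ram_mb))
```

The point is that the parent and the sub-transformation can each keep their own field names; only the mapping table has to know about both.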
There we go! It’s that easy!
You can download the transformation used in the screenshot
Don’t try to fool the transformation execution.
All the rules regarding the execution sequence still apply!
- It is not possible to set a variable in the sub-transformation and get it in the parent transformation.
- All the input steps start at the same time, regardless of whether the input is in the parent transformation or in the sub-transformation.
Since I discovered the mapping, I’m using it a lot to make my ETL processes way more modular.