Tuning tips every PDI user using MySQL should know: increase table output speed by 7000%

I found a thread on Pentaho Forum about Table Output Performance.  It gives a really easy way to speed up your table output.. A LOT! Some people have been able to reach 84k rows/sec using the method. It is not a joke. It works.  It really does.

All you need to do is to give the MySQL JDBC driver 3 parameters:

It allows to insert many rows with a single statement instead of one row per statement. Also, useCompression reduce the traffic between the server and pentaho.

I tested it myself.

I used a 869mb CSV file (5248714 rows) from my Laptop and transferred it to an out-of-the-box MySQL installation on a cheap VPS server (1 CPU, 1gb ram) with a poor network in Europe (yep.. across the ocean). The expected performance is a very slow transfer.

I set the number of copies to start of my Table Output to 4 and commit size to 1000.

Here is the result without the options: I reached 52 rows/seconds.

NoOptimisation4TableOutput

The same test, but with the 3 connection parameters: 3674 rows/second ! Wow ! 7000% faster !

Optimisation4TableOutput

Impressive isn’t it !?

 

References: