I found a thread on the Pentaho Forum about Table Output performance. It gives a really easy way to speed up your Table Output... a LOT! Some people have reached 84k rows/sec using this method. It is not a joke; it really works.
All you need to do is give the MySQL JDBC driver three parameters:
```
useServerPrepStmts=false
rewriteBatchedStatements=true
useCompression=true
```
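If you are connecting with a plain JDBC URL rather than through PDI's connection-options dialog, the same parameters can be appended to the URL as a query string. The host, port, and database name below are placeholders:

```
jdbc:mysql://localhost:3306/mydb?useServerPrepStmts=false&rewriteBatchedStatements=true&useCompression=true
```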
With rewriteBatchedStatements=true, the driver inserts many rows with a single statement instead of one statement per row. useCompression additionally reduces the network traffic between the MySQL server and Pentaho.
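Conceptually, the rewrite turns a batch of single-row inserts into one multi-row statement. This is an illustrative sketch with a made-up table, not the driver's literal wire output:

```sql
-- Without rewriteBatchedStatements (one statement, one round trip, per row):
INSERT INTO t (a, b) VALUES (1, 'x');
INSERT INTO t (a, b) VALUES (2, 'y');
INSERT INTO t (a, b) VALUES (3, 'z');

-- With rewriteBatchedStatements=true (the whole batch collapses into one statement):
INSERT INTO t (a, b) VALUES (1, 'x'), (2, 'y'), (3, 'z');
```

This is why the commit size matters: the bigger the batch, the more rows travel per round trip.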
I tested it myself.
I took an 869 MB CSV file (5,248,714 rows) on my laptop and transferred it to an out-of-the-box MySQL installation on a cheap VPS (1 CPU, 1 GB RAM) with a poor network connection in Europe (yep... across the ocean). With that setup, you would expect a very slow transfer.
In my Table Output step, I set the number of copies to start to 4 and the commit size to 1000.
Here is the result without the options: 52 rows/second.
The same test with the three connection parameters: 3674 rows/second! Wow! That's roughly a 70x speedup, i.e. about 7000% faster!
Impressive, isn't it!?
References:
Tuning tips every PDI user using MySQL should know: increase table output speed by 7000% – http://t.co/n4Gcnq8Lig