Pentaho Kettle 5 comes with improved database repository performance

Two weeks ago I wrote about Kettle 4.4 and its database repository, and how working with it is truly no fun because of the excessive latency when loading and saving jobs and transformations. Logging the queries sent to the MySQL database made the reason obvious: a tornado of thousands of database commands and replies sent back and forth.

Did Kettle 5 learn something new?

Kettle 5.0.1 (the community edition of PDI 5) was finally released last week. So I gave it the benefit of the doubt and, lo and behold, the number of database commands has been reduced by roughly 30%. There are far fewer transactions and more simple statements instead. So I guess it is safe to say that the speed improvement in a network and database setup comparable to the one I was suffering from can be expected to be at least 50% (which, on the other hand, might still be too slow).
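For anyone who wants to repeat the comparison on their own setup, here is a minimal sketch of how the statements in two MySQL general query logs could be counted. It is not the exact method used for the numbers above. The line layout (optional timestamp, connection id, command, argument) is an assumption based on the MySQL 5.x general log format, and the function names `count_statements` and `reduction_percent` are made up for illustration.

```python
import re
from collections import Counter

# Assumed layout of one general query log entry (MySQL 5.x style):
#   [YYMMDD HH:MM:SS]  <connection id> <command>  <argument>
ENTRY = re.compile(
    r"^(?:\d{6}\s+\d{1,2}:\d{2}:\d{2})?\s*\d+\s+"
    r"(Connect|Init DB|Query|Prepare|Execute|Close stmt|Field List|Quit)\b\s*(.*)$"
)


def count_statements(lines):
    """Count log entries, keyed by SQL verb for queries, else by command name."""
    counts = Counter()
    for line in lines:
        match = ENTRY.match(line)
        if not match:
            continue  # header line or continuation of a multi-line statement
        command, argument = match.groups()
        if command == "Query" and argument:
            # First word of the statement, e.g. SELECT, INSERT, COMMIT
            counts[argument.split(None, 1)[0].upper()] += 1
        else:
            counts[command] += 1
    return counts


def reduction_percent(before, after):
    """Relative drop in the total number of entries, in percent."""
    total_before = sum(before.values())
    total_after = sum(after.values())
    if total_before == 0:
        raise ValueError("the 'before' log contains no recognised entries")
    return 100.0 * (total_before - total_after) / total_before
```

Feeding it the open file handles of two logs (one recorded with Kettle 4.4, one with 5.0.1, while doing the same repository operations) gives a per-verb breakdown plus an overall percentage. Keep in mind that fewer statements does not translate one-to-one into less waiting time, since that depends on the round-trip latency of each statement.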

File-based repo for easier deployment and versioning

Note the intentional past tense in the last sentence: to avoid the brain-melting slowness of Kettle 4.4's database repo and to ease versioning and deployment, we (my boss and I) decided to go with a file-based repository instead. Given those two additional reasons, I would still recommend choosing a file-based repo. Git does a great job.


You can download the logs here.


Honestly, I wouldn’t bother using the database repo anyway, but it is great to see that the team behind Kettle and other Pentaho products is making an effort to continuously improve the software, even where it is not immediately obvious.