Support for ONNX models

After a long period of development and testing, the QsarDB repository now supports the Open Neural Network Exchange (ONNX) format. Please note that Predictive Model Markup Language (PMML) remains the preferred, core model representation format for QsarDB. PMML works well for representing QSAR/QSPR models based on classical machine learning. It is also highly transparent: anyone can open a PMML file in a text editor and inspect the equations, the descriptors involved, the parameters, and the data transformation steps. However, no single format can represent every use case, particularly for more complex machine learning and artificial intelligence architectures. We have therefore added ONNX support for cases where PMML reaches its structural limits.

So, when should you use ONNX? We recommend it for deep neural network architectures trained with PyTorch, TensorFlow, or Keras. Another case is complex ensemble models, such as random forests or gradient-boosted trees. PMML often works well for such models, but very large ensembles may produce PMML files that take up hundreds of megabytes or even gigabytes of storage. The same applies to PMML files for k-nearest neighbour models built on large data sets. For such edge cases, ONNX is the format of choice.

Both the PMML and ONNX formats provide a solid foundation for long-term model storage. Over time, software libraries evolve and change their implementations, which makes it increasingly difficult to reuse older models saved in "raw" formats tied to specific software versions. For example, when a model is saved as a Python pickle file, it may be challenging, many years later, to recreate the environment needed to run it. PMML and ONNX are better alternatives because they describe the model independently of the library code that produced it. Making a prediction then requires only a compatible PMML or ONNX runtime, not the original training library, so there is a much higher chance that a model uploaded today will still execute, and give the same results, years from now.
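The difference between a "raw" format and a declarative one can be illustrated with a small standard-library sketch. It is an analogy only and does not use PMML or ONNX: the module name toy_qsar_lib, the LinearModel class, and the JSON layout are all hypothetical stand-ins invented for this illustration. A pickle stores a reference to the class that created the object, so it stops loading once that library is unavailable, whereas a description that holds only a named operation and its parameters can be evaluated by any small runtime.

```python
import json
import pickle
import sys
import types

# Hypothetical stand-in for a modelling library (not a real package).
toy_lib = types.ModuleType("toy_qsar_lib")
exec(
    "class LinearModel:\n"
    "    def __init__(self, slope, intercept):\n"
    "        self.slope, self.intercept = slope, intercept\n"
    "    def predict(self, x):\n"
    "        return self.slope * x + self.intercept\n",
    toy_lib.__dict__,
)
sys.modules["toy_qsar_lib"] = toy_lib

model = toy_lib.LinearModel(slope=2.0, intercept=1.0)

# "Raw" format: the pickle records *where to find* LinearModel, not its code.
raw_bytes = pickle.dumps(model)

# Declarative format (an analogy for PMML/ONNX, using an invented JSON layout):
# only a named operation and its parameters are stored.
declarative = json.dumps(
    {"op": "linear", "slope": model.slope, "intercept": model.intercept}
)

# Simulate "years later": the original library can no longer be imported.
del sys.modules["toy_qsar_lib"]


def load_pickle(data):
    """Try to restore the pickled model; return the exception on failure."""
    try:
        return pickle.loads(data)
    except (ImportError, AttributeError) as exc:
        return exc


def run_declarative(text, x):
    """A tiny 'runtime' that evaluates the declarative description."""
    spec = json.loads(text)
    if spec["op"] != "linear":
        raise ValueError("unsupported operation: " + spec["op"])
    return spec["slope"] * x + spec["intercept"]


pickle_result = load_pickle(raw_bytes)            # an ImportError instance
declarative_result = run_declarative(declarative, 3.0)
print(type(pickle_result).__name__, declarative_result)
```

In this sketch the pickle fails to load because the defining module is missing, while the declarative description still yields a prediction. Real PMML and ONNX files follow the same principle with far richer, standardised vocabularies of operations.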

Examples of archives containing ONNX representations of models: https://doi.org/10.15152/QDB.264