How to Save a Machine Learning Model
While using the scikit-learn library for machine learning, we often need to save and restore models so that they can be reused, compared with other models, or tested against new data. Saving an object is referred to as serialization, while restoring it is referred to as deserialization. Training time also depends on the size of the data: small datasets train quickly, but large ones (more than 1 GB) can take a long time to train, even on a local computer with a GPU. To avoid repeating that work, save the trained model so it can be used in future projects.
Two Ways to Save a Model from scikit-learn:
1. Pickle string: The pickle module implements binary protocols for serializing and deserializing Python object structures.
The pickle module offers the following functions:
- dump: To serialize an object hierarchy, use dump() to write to a file or dumps() to return a bytes object.
- load: To deserialize a data stream, use load() to read from a file or loads() to read from a bytes object.
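As a small illustration of the functions listed above, here is a minimal sketch that round-trips a plain Python dictionary. The `record` dictionary is a made-up example, and `io.BytesIO` stands in for a file opened in binary mode.

```python
import io
import pickle

# Hypothetical object to serialize; any picklable Python object works.
record = {"name": "iris", "classes": ["setosa", "versicolor", "virginica"]}

# dumps()/loads() work with an in-memory bytes object.
payload = pickle.dumps(record)
restored_from_bytes = pickle.loads(payload)

# dump()/load() work with a binary file-like object.
buffer = io.BytesIO()
pickle.dump(record, buffer)
buffer.seek(0)  # rewind before reading back
restored_from_file = pickle.load(buffer)

print(restored_from_bytes == record, restored_from_file == record)
```

Only unpickle data from sources you trust, since loading a pickle can execute arbitrary code.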
Example: Let’s apply K Nearest Neighbors to the iris dataset, then save the model.
Code:
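A minimal sketch of this step, assuming a 70/30 train/test split, `n_neighbors=3`, and a fixed `random_state`. These settings are illustrative choices, not values taken from the article.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Load the iris features and labels.
X, y = load_iris(return_X_y=True)

# Hold out 30% of the samples for testing (assumed split).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Fit a K Nearest Neighbors classifier (n_neighbors=3 is an assumption).
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)

# Predict on the held-out samples and report the accuracy.
predictions = knn.predict(X_test)
print("Test accuracy:", knn.score(X_test, y_test))
```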
Output:
Now, we will save the above model to a string (a bytes object in Python 3) using pickle:
Code:
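A minimal sketch of saving to a string, with the model refitted here so the snippet runs on its own. The variable names `saved_model` and `knn_from_pickle` are illustrative, and `n_neighbors=3` is an assumption.

```python
import pickle

import numpy as np
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

# Refit a small model so this snippet is self-contained.
X, y = load_iris(return_X_y=True)
knn = KNeighborsClassifier(n_neighbors=3).fit(X, y)

# Serialize the fitted model to a bytes object.
saved_model = pickle.dumps(knn)

# Deserialize it back into a new estimator object.
knn_from_pickle = pickle.loads(saved_model)

# The restored model should give the same predictions as the original.
same = np.array_equal(knn.predict(X), knn_from_pickle.predict(X))
print("Predictions match:", same)
```

A pickled model is best loaded with the same scikit-learn version that produced it, since compatibility across versions is not guaranteed.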
Output:
2. Pickled Model as File using joblib: Joblib is an alternative to pickle that is more efficient on objects that carry large NumPy arrays. Its functions accept a file-like object as well as a filename.
Joblib offers the following functions:
- dump: This is used for serializing an object hierarchy.
- load: This is used for deserializing a data stream.
Use joblib to save the model to a pickled file:
Example:
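A minimal sketch, assuming joblib is installed as a standalone package (it is a dependency of scikit-learn). The file name `knn_iris.joblib` is hypothetical, and a temporary directory is used so nothing is left on disk. The model is refitted so the snippet runs on its own.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

# Refit a small model so this snippet is self-contained.
X, y = load_iris(return_X_y=True)
knn = KNeighborsClassifier(n_neighbors=3).fit(X, y)

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical file name; any writable path works.
    model_path = os.path.join(tmp_dir, "knn_iris.joblib")

    # Save the fitted model to disk.
    joblib.dump(knn, model_path)

    # Load it back from the same path.
    knn_from_joblib = joblib.load(model_path)

# The restored model should give the same predictions as the original.
same = np.array_equal(knn.predict(X), knn_from_joblib.predict(X))
print("Predictions match:", same)
```

In a real project, we would write to a permanent path instead of a temporary directory and load the file in a later session.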
Output: