finalize mlflow tutorial

Amjad Saadeh 2023-03-25 13:18:56 +01:00
parent 5e2eb5194d
commit 3625337a28

@ -250,8 +250,10 @@ $ source ai_playground/bin/activate
```
We are going to use `scikit-learn` to train [random forest classifiers](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html) of different sizes (i.e. numbers of trees) on the [iris dataset](https://scikit-learn.org/stable/datasets/toy_dataset.html#iris-dataset) (a toy dataset for classifying flowers) and compare their accuracy.
We create an experiment called `RF Tree Nunbers`, iterate over sizes from 1 to 20 trees, and log the number of trees and the accuracy of each random forest in a so-called run (a substructure of an MLFlow experiment).
Finally, the trained random forest is saved along with the parameter and the accuracy, so you can download the model afterwards.
We are going to use the following script to do our first experiement (it is well commented):
We are going to use the following script to do our first experiment (it is well commented):
```python
from sklearn.datasets import load_iris
@ -289,6 +291,10 @@ for n_est in range(1, 21): # Check forests from 1 to 20 trees
mlflow.end_run()
```
A detailed explanation of how random forests work is out of scope for this tutorial. In short: a random forest is a collection of decision trees.
Each tree decides on its own which class a sample belongs to.
In the end there is a vote, and the class with the most votes is chosen as the result. More details can be found [here](https://en.m.wikipedia.org/wiki/Random_forest).
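To make the voting idea concrete, here is a minimal sketch in plain Python. It is not part of the tutorial script: the function name `majority_vote` and the list of per-tree predictions are made up for illustration. Note also that scikit-learn's `RandomForestClassifier` combines its trees by averaging their predicted class probabilities rather than counting hard votes, so this only illustrates the simplified picture described above.

```python
from collections import Counter


def majority_vote(tree_predictions):
    """Return the class label that most trees predicted (hypothetical helper)."""
    counts = Counter(tree_predictions)
    # most_common(1) yields a one-element list holding the (label, count)
    # pair of the most frequent label
    label, _ = counts.most_common(1)[0]
    return label


# Made-up predictions of five trees for a single iris sample
votes = ["setosa", "versicolor", "setosa", "setosa", "virginica"]
winner = majority_vote(votes)
print(winner)  # setosa (3 of 5 votes)
```

With more trees, a single tree's mistake matters less for the final vote, which is why the experiment compares forests of different sizes.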
I saved it under `example.py` and executed it by calling `python example.py` within my local virtual environment.
After the script has finished, the web UI looks a bit different:
@ -299,9 +305,10 @@ Each run of an experiment is listed there und can be explored by clicking on the
Additionally, you can also download the trained models and do comparisons.
# Conclusion
SSH proxy can now be used. It is recommended to perform tests before productive use.
You set up a central MLFlow instance to track your AI experiments for sustainable and reproducible data science.
Additionally, you tracked your first machine learning experiments.
![images/The SOCKS proxy can handle multiple ports simultaneously](community-tutorials/setup-and-use-sshproxy/images/socks.png)
Next steps would be to secure access via TLS and, in the long run, to improve scalability.
# Licence