finalize mlflow tutorial

2024-09-16 11:28:12 +00:00 · 2023-03-25 13:18:56 +01:00 · 2023-03-25 13:18:56 +01:00 · 3625337a28
commit 3625337a28
parent 5e2eb5194d
1 changed files with 10 additions and 3 deletions
--- a/community-tutorials/install-mlflow/01-en.md
+++ b/community-tutorials/install-mlflow/01-en.md
@@ -250,8 +250,10 @@ $ source ai_playground/bin/activate
 ```

 We are going to use `scikit-learn` to train [random forest classifiers](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html) of different sizes (i.e. numbers of trees) on the [iris dataset](https://scikit-learn.org/stable/datasets/toy_dataset.html#iris-dataset) (a toy example for classifying flowers) and compare their accuracy.
+We create an experiment called `RF Tree Nunbers`, iterate over sizes from 1 to 20 trees, and log the number of trees and the accuracy of each random forest in a so-called run (a substructure of an MLflow experiment).
+Finally, the trained random forest is saved along with its parameter and accuracy, so you can download the model afterwards.

-We are going to use the following script to do our first experiement (it is well commented):
+We are going to use the following script to do our first experiment (it is well commented):

 ```python
 from sklearn.datasets import load_iris
@@ -289,6 +291,10 @@ for n_est in range(1, 21): # Check forests from 1 to 20 trees
    mlflow.end_run()
 ```

+A detailed explanation of how random forests work is out of scope for this tutorial. In short: a random forest is a collection of decision trees.
+Each tree decides on its own which class a sample belongs to.
+In the end there is a vote, and the class with the most votes is chosen as the result. More details can be found [here](https://en.m.wikipedia.org/wiki/Random_forest).
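
The voting step described above can be sketched in a few lines of plain Python. This is only an illustration of the idea, not part of the tutorial script: the helper name `majority_vote` and the list of per-tree votes are hypothetical, and no trees are actually trained here.

```python
from collections import Counter


def majority_vote(tree_predictions):
    """Return the class that most trees voted for.

    Ties go to the class that appears first in the list of votes.
    """
    if not tree_predictions:
        raise ValueError("need at least one tree prediction")
    counts = Counter(tree_predictions)
    # most_common(1) yields a list with one (class, count) pair
    return counts.most_common(1)[0][0]


# Hypothetical votes of five trees for a single iris sample
votes = ["setosa", "versicolor", "setosa", "virginica", "setosa"]
winner = majority_vote(votes)
print(winner)
```

Note that scikit-learn's `RandomForestClassifier` combines its trees by averaging their class probability estimates rather than counting hard votes, so this sketch shows the general idea and not the library's exact procedure.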
+
 I saved it under `example.py` and executed it by calling `python example.py` within my local virtual environment.

 After the script has finished its work, the web UI looks a bit different:
@@ -299,9 +305,10 @@ Each run of an experiment is listed there and can be explored by clicking on the
 You can also download the trained models and compare runs.

 # Conclusion
-SSH proxy can now be used. It is recommended to perform tests before productive use.
+You have set up a central MLflow instance to track your AI experiments for sustainable and reproducible data science.
+Additionally, you tracked your first machine learning experiments.

-![images/The SOCKS proxy can handle multiple ports simultaneously](community-tutorials/setup-and-use-sshproxy/images/socks.png)
+Next steps would be to secure access via TLS and to improve scalability in the long run.

 # Licence