Guides

This page provides in-depth tutorials for AME. If you are looking to quickly try out AME, see getting started.

From zero to live model

This guide focuses on using AME. If you are looking for a deployment guide, go here.

This guide will walk through going from zero to having a model served through the V2 inference protocol. It is split into multiple sub-steps, each of which can be read in isolation if you are only looking for a smaller guide on that specific step.

Almost any Python project should be usable, but if you want to follow along with the exact same project as the guide, clone this repo.

Set up the CLI

Before we can initialise an AME project, we need to install the ame CLI and connect it to your AME instance.

TODO describe installation

Initialising AME in your project

The first step will be creating an ame.yaml file in the project directory.

This is easiest to do with the ame CLI by running ame project init. The CLI will ask for a project name and then produce a file that looks like this:

name: sklearn_logistic_regression

The first training

Next we want to set up our model to be run by AME. The most important thing here is the Task that will train the model, so let's start with that.

Here we need to consider a few things: what command is used to train the model, how dependencies are managed in our project, which Python version we need, and how many resources model training requires.

If you are using the repo for this guide, you will want a task configured as below.

name: sklearn_logistic_regression
tasks:
  - name: training
    executor:
      !poetry
      pythonVersion: 3.11
      command: python train.py
    resources:
      memory: 10G 
      cpu: 4 
      storage: 30G 
      nvidia.com/gpu: 1 

To try out our task we can run ame task run training --logs and see the task get deployed and executed.

For model-specific features such as validation and deployment, we want to declare a model in our AME file.

We add a simple model called logreg and specify the training task. This will allow AME to automatically train new model versions when appropriate.

name: sklearn_logistic_regression
models:
  - name: logreg
    training:
      task:
        taskRef: training
tasks:
  - name: training
    executor:
      !poetry
      pythonVersion: 3.11
      command: python train.py
    resources:
      memory: 10G 
      cpu: 4 
      storage: 30G 
      nvidia.com/gpu: 1 

training.task can contain a complete task; we simply use the taskRef field to keep our file readable. We could instead have placed the entire task inline in the model configuration, as shown below.
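
For illustration, the inlined variant might look roughly like this (the exact nesting of an inline task is an assumption based on the task fields shown above):

name: sklearn_logistic_regression
models:
  - name: logreg
    training:
      task:
        name: training
        executor:
          !poetry
          pythonVersion: 3.11
          command: python train.py
        resources:
          memory: 10G
          cpu: 4
          storage: 30G
          nvidia.com/gpu: 1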

Now we can run ame model train logreg --logs to train the model. Under the hood this does essentially the same thing as running the task directly.

Now let's look at deploying our model.

name: sklearn_logistic_regression
models:
  - name: logreg
    training:
      task:
        taskRef: training
    deployment:
      autoDeploy: true

tasks:
  - name: training
    executor:
      !poetry
      pythonVersion: 3.11
      command: python train.py
    resources:
      memory: 10G 
      cpu: 4 
      storage: 30G 
      nvidia.com/gpu: 1 

Setting autoDeploy to true tells AME to always deploy the latest version of a model. Deployment in this context means spinning up an inference server.

Because AME needs some persistent context when managing a model deployment, we have to synchronize our project to the AME instance. This can be done via a Git repo, which is highly recommended for production use cases. For experimentation and educational purposes, we can instead manually place our local version of the project in the cluster. Run ame project sync and AME will train a version of the model if needed and deploy an inference server for it.

Validating models before deployment

To ensure that new model versions perform well before they are exposed, AME supports model validation. This is done by providing AME with a Task that succeeds if the model passes validation and fails if it does not.
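
In practice this contract comes down to the task's exit status. A minimal, self-contained sketch of the idea (the metric value and threshold here are placeholders, not AME APIs):

import sys

score = 0.95     # placeholder: load the real validation metric here
THRESHOLD = 0.9  # placeholder: pick a threshold that makes sense for your model

if score < THRESHOLD:
    print(f"score {score} is below {THRESHOLD}; failing validation")
    sys.exit(1)  # a non-zero exit fails the task, so the model version is rejected
print("validation passed")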

Example from ame-demo:

projectid: sklearn_logistic_regression
models:
  - name: logreg
    type: mlflow
    validationTask: # the validation task is set here.
      taskRef: mlflow_validation 
    training: 
      task:
        taskRef: training
    deployment:
      auto_train: true
      deploy: true
      enable_tls: false
tasks:
  - name: training
    projectid: sklearn_logistic_regression
    templateRef: shared-templates.logistic_reg_template
    taskType: Mlflow
  - name: mlflow_validation
    projectid: sklearn_logistic_regression
    runcommand: python validate.py

This approach allows a lot of flexibility in how models are validated, at the cost of writing the validation yourself. In the future AME will also provide built-in options for common validation configurations; see the roadmap.

Using MLflow metrics

Here we will walk through how to validate a model based on metrics recorded in MLflow, using the ame-demo repository as an example. The model is a simple logistic regression; the training code looks like this:

import numpy as np
from sklearn.linear_model import LogisticRegression
import mlflow
import mlflow.sklearn

if __name__ == "__main__":
    X = np.array([-2, -1, 0, 1, 2, 1]).reshape(-1, 1)
    y = np.array([0, 0, 1, 1, 1, 0])
    lr = LogisticRegression()
    lr.fit(X, y)
    score = lr.score(X, y)
    print("Score: %s" % score)
    mlflow.log_metric("score", score)  # this metric is what validation will check
    mlflow.sklearn.log_model(lr, "model", registered_model_name="logreg")
    print("Model saved in run %s" % mlflow.active_run().info.run_uuid)

Notice how the score is logged as a metric. We can use that in our validation.

AME exposes the necessary environment variables to running tasks, so we can access the MLflow instance during validation just by using the MLflow library.

TODO
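
Until this section is written, here is a minimal sketch of what such a validate.py could look like. The metric name score and registered model name logreg come from the training code above; the threshold and the use of MlflowClient to look the metric up are assumptions for illustration, not an API prescribed by AME:

import sys

from mlflow.tracking import MlflowClient

# AME injects the MLflow connection details (such as MLFLOW_TRACKING_URI) into
# the task's environment, so the client picks them up without extra configuration.
client = MlflowClient()

# Fetch the newest registered version of the model trained by train.py.
latest = client.get_latest_versions("logreg")[0]
run = client.get_run(latest.run_id)
score = run.data.metrics["score"]  # the metric logged in train.py

THRESHOLD = 0.6  # placeholder threshold for this toy model
print(f"score: {score}, threshold: {THRESHOLD}")

if score < THRESHOLD:
    sys.exit(1)  # non-zero exit fails the task, so validation fails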