Introduction
This page will introduce you to the core concepts in AME.
Core concepts
AME provides a number of simple building blocks which can be used in isolation for simple workflows, or combined for more complex requirements.
Tasks
Note: YAML configuration files are used here as an easy way to show different configurations. The CLI and dashboard will help you generate, edit and validate these files, so you don't have to write mountains of error-prone YAML by hand :).
A Task is the fundamental unit of work in AME. It can be as simple as running a single command such as `python train.py`:
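For example, a minimal single-command Task could be declared in `ame.yaml` roughly like this (a sketch modelled on the pipeline example below; the project name and command are placeholders):

```yaml
# ame.yaml
project: logreg
tasks:
  - name: train
    executor:
      !poetry
      command: python train.py
```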
or more complex, such as orchestrating a pipeline or DAG with many sub-tasks:
```yaml
project: logreg
tasks:
  - name: train
    pipeline:
      - name: prepare-data
        executor:
          !poetry
          command: python prepare_data.py
      - name: train
        executor:
          !poetry
          command: python train.py
        resources:
          cpu: 4
          nvidia.com/gpu: 2
          memory: 20Gi
          storage: 100Gi
      - name: save
        executor:
          !poetry
          command: python save_model.py
```
Tasks also have a notion of dependencies: a Task can depend on a DataSet. Indicating dependencies to AME allows for more efficient scheduling of Tasks and for caching to avoid repeating the same work. Tasks are designed to execute most Python projects out of the box.
If you find yourself building custom Docker images in day-to-day usage, we consider that a failure on our part; please submit an issue.
We can't cover every possible case, so if any of the defaults are unsuitable there are escape hatches for things such as custom container images, custom setup commands and patching the underlying Kubernetes resources. These should be a last resort; if you find yourself using them, feel free to submit an issue and we will probably expand AME to cover your use case properly :)
Projects
A Project corresponds to a specific directory, and often a repository. It provides the context within which Tasks are executed. The AME file `ame.yaml` serves as a declarative way of defining any configuration related to a specific project. This includes Tasks, TaskTemplates, DataSets, Models and project-wide defaults.
Project Source
A project source tells AME where to look for projects. Currently only Git repositories are supported as project sources.
A typical project source object looks like this:
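The following is a rough, hypothetical sketch of such an object; the field names here are assumptions and not the confirmed AME schema, so check the reference documentation for the exact format:

```yaml
# Hypothetical project source; field names are illustrative assumptions.
projectSources:
  - git:
      repository: https://github.com/myorg/logreg
```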
AME will then watch every branch of this repository for project files and pull them into the AME cluster.
DataSets
A DataSet is essentially a Task with extra semantics: the artifacts generated by the underlying Task are treated as a dataset to be consumed by other Tasks. Currently this doesn't add much; in the future it will allow AME to perform better scheduling. The main advantage right now is that a Task can depend on a DataSet, and once a DataSet is cached the work will not be repeated. Therefore many Tasks can depend on the same DataSet and the dataset will only be generated once.
Example dataset
```yaml
# ame.yaml
...
dataSets:
  - name: mnist
    path: ./data # Specifies where the task stores data.
    # task:
    taskRef: fetch_mnist # References a task which produces the data.
```
Models
A Model defines how to train, validate and deploy a machine learning model. All you have to do is tell AME how to train and validate your model in the form of Tasks, and the lifecycle of the model can then be automated. AME currently does not have its own model registry but instead integrates with MLflow.
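As a hypothetical sketch of how a Model might tie training and validation Tasks together (the field names below are assumptions for illustration, not the confirmed schema):

```yaml
# ame.yaml — hypothetical Model definition; field names are assumptions.
models:
  - name: logreg
    training:
      taskRef: train      # Task that trains the model.
    validation:
      taskRef: validate   # Task that validates the trained model.
```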