Visual Classifier
Train an image classification model using a pipeline template
Input: Images and videos
Output: Concepts
A visual classifier is a deep fine-tuned model that categorizes images and video frames into a set of predefined concepts. It answers questions such as "What is in this image?" or "Who is in this image?"
For example, it can be used to categorize images into concepts such as "cat", "dog", or "vehicle".
A pipeline template is a pre-configured workflow that defines how a model is trained, evaluated, and deployed.
It is built on top of Clarifai Pipelines, which are the underlying system that orchestrates a sequence of steps (nodes) such as data processing, training, and evaluation. The template simply provides a ready-made, opinionated setup of these pipelines for a specific use case.
Instead of building everything from scratch, a pipeline template gives you a ready-made structure with:
- Predefined steps (e.g., data loading, preprocessing, training, evaluation)
- Default configurations (such as model architecture and training logic)
- Tunable parameters (hyperparameters you can adjust to fit your use case)
In practical terms, it acts as a blueprint for your training process. For example, when you select the classifier-pipeline-resnet template, you're choosing:
- A pipeline designed for image classification
- A ResNet-based model architecture
- A sequence of steps already wired together using Clarifai Pipelines to train on labeled image data
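To make the idea concrete, here is an illustrative sketch of what a template bundles together: ordered steps plus tunable defaults that you can override per run. This is a conceptual model only, not Clarifai's actual template schema.

```python
# Illustrative only: a pipeline template pairs ordered steps with tunable
# defaults. This dict mirrors that idea; it is NOT Clarifai's actual schema.
template = {
    "id": "classifier-pipeline-resnet",
    "steps": ["load_data", "preprocess", "train", "evaluate"],
    "defaults": {
        "architecture": "ResNet-50",
        "num_epochs": 25,
        "batch_size": 64,
    },
}

def with_overrides(template, **params):
    """Return a run config: template defaults overridden by user parameters."""
    config = dict(template["defaults"])
    config.update(params)
    return config

# Keep the template's wiring, but shorten training for a quick experiment
run_config = with_overrides(template, num_epochs=10)
```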
You may choose a visual classifier model type in cases where:
- Accuracy takes priority — you need a carefully targeted solution rather than a fast, general-purpose one.
- Your data is unique — existing Clarifai models don't recognize the features in your dataset, and you need to deep fine-tune a custom model integrated into your workflows.
- You have the right ingredients — a custom dataset, accurate labels, and the time and expertise to fine-tune.
Visual classifiers are optimized for classification tasks. If you need to locate where objects appear in an image, consider a Visual Detector instead.
Via the UI
Let’s walk through how to create and train a visual classifier model using the UI.
Step 1: Create an App
Create an application to store and manage your model and its associated resources (such as datasets, pipelines, and deployments). You can follow this guide to set one up.
Note: When creating the application, select the default Image/Video option as the primary input type.
Step 2: Prepare Training Data
Preparing your data is a critical step in training a model. High-quality, well-structured data helps your model learn effectively, generalize to new inputs, and produce reliable predictions.
Make sure your dataset is:
- Clean and accurate — free from labeling errors
- Diverse — covers different variations of your target classes
- Sufficient in size — enough examples for the model to learn meaningful patterns
Tip: You can organize your dataset using any spreadsheet tool. Download a CSV template to get started.
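If you prefer to script this instead of using a spreadsheet, the same organization can be produced with a few lines of Python. The column names below are illustrative, not Clarifai's exact CSV template.

```python
import csv
import io

# A minimal sketch of organizing image paths and labels as CSV rows.
# Column names here are illustrative, not Clarifai's exact CSV template.
rows = [
    {"input": "images/leaf_001.jpg", "concepts": "Angular Leaf Spot"},
    {"input": "images/leaf_002.jpg", "concepts": "Bean Rust"},
    {"input": "images/leaf_003.jpg", "concepts": "Healthy"},
]

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["input", "concepts"])
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()  # write this to a .csv file for your dataset
```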
For this example, we’ll use the Beans Dataset from Hugging Face, which contains images of healthy and diseased bean leaves.
Based on the selected dataset, we will train a model to classify leaf images into three categories: Angular Leaf Spot, Bean Rust, or Healthy.
Step 3: Add and Annotate Inputs
To add inputs to your app, open the collapsible left sidebar and select the Inputs option.
Click the Upload Inputs button in the upper-right corner, then use the uploader pop-up to select and upload your data.
As you upload, assign each input to a dataset and label it with the appropriate concept. Ensure that all labeled inputs are added to the same dataset.

Once done, click the Upload Inputs button to add the annotated images to your app.
Note: For this tutorial, upload the three image categories to the same dataset, labeling each with its corresponding concept: Angular Leaf Spot, Bean Rust, or Healthy.
After uploading all the inputs, refresh your dataset and create a new version to reflect the changes.
Step 4: Create a Cluster and Nodepool
To run and train your model, you’ll need to set up a cluster and nodepool with the appropriate compute resources.
Start by creating a cluster that supports GPU-enabled workloads, as GPUs are required for efficient training and inference of vision models.
Next, create a nodepool within the cluster and select a GPU-backed instance that matches your performance and budget needs.
Note: GPU support is essential for this tutorial. Ensure that the selected nodepool is configured with a compatible GPU instance to avoid performance issues or failed training runs.
Step 5: Choose a Training Template
Select the Models option in your app’s collapsible left sidebar. On the models listing page, click the Add a Model button.

In the window that pops up, select the Train a Model option.

You’ll be redirected to a page listing available pipeline training templates. These templates provide pre-configured workflows to help you quickly get started with different types of models.

Select the classifier-pipeline-resnet template. This is a ResNet-based image classification pipeline designed for training models on labeled image datasets.
Step 6: Configure Training Settings
The ensuing page allows you to review the model training configuration and begin the training process.
Select Training Template
The training template you selected previously is displayed. To switch to a different training pipeline, click the Change button.

Select Nodepool Instance
Choose the nodepool that will be used to train your model.
Select the Choose an instance option to open a selection window, where you can pick from existing or recommended nodepools based on your training requirements.
Choose your preferred nodepool, then click Save Changes to apply your selection.

The selected nodepool will then be displayed.

Learn more about selecting a nodepool instance here.
Set Training Settings
Configure the training settings for your model:

- Model ID — Set a unique ID for the model that will be created after it is trained.
- Dataset — Select the dataset from which inputs will be used for this pipeline. For this tutorial, let's select the dataset we previously created containing the bean leaf images.
- Dataset Version — Select which version of the dataset to use for training. You must select a dataset first before this option becomes available.
- Training Concepts — Select the concepts you want the model to predict from the existing concepts labeled on your inputs. For this tutorial, let's pick these concepts: Angular Leaf Spot, Bean Rust, and Healthy.
- Training Epochs — Set how many times the model will see the entire dataset. More epochs can lead to better accuracy but take longer. The default value is 25.
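As a rough sense of how epochs translate into training time, each epoch processes the whole dataset once, batch by batch. A small back-of-the-envelope calculation (with a hypothetical dataset size) looks like this:

```python
import math

# Rough training-length arithmetic (illustrative): with N labeled images,
# each epoch runs ceil(N / batch_size) optimization steps.
num_images = 1034  # hypothetical dataset size
batch_size = 64    # the template default shown below
num_epochs = 25    # the default number of epochs

steps_per_epoch = math.ceil(num_images / batch_size)
total_steps = steps_per_epoch * num_epochs
```

Doubling the epochs doubles the total steps (and roughly the training time), so increase this value deliberately.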
Configure Template
Each training template includes a set of configurable hyperparameters that control how the model is trained.
You can adjust these settings based on your dataset and performance goals. However, for this tutorial, we’ll use the default values provided by the classifier-pipeline-resnet template.

These are the settings you can configure:
- Batch Size — Number of samples processed per training step. Default: 64.
- Image Size — Size (in pixels) to which input images are resized (square). Default: 224.
- Per Item Lrate — Learning rate applied per training sample. Default: 0.00001953125.
- Weight Decay — Regularization factor to prevent overfitting. Default: 0.01.
- Per Item Min Lrate — Minimum learning rate per sample during training. Default: 1.5625e-8.
- Warmup Iters — Number of initial iterations used to gradually increase the learning rate. Default: 5.
- Warmup Ratio — Starting ratio of the learning rate during warmup. Default: 0.0001.
- Flip Probability — Chance of randomly flipping images during training (data augmentation). Default: 0.5.
- Flip Direction — Direction used when flipping images. Default: horizontal.
- Concepts Mutually Exclusive — Whether each input can belong to only one concept (mutually exclusive) or to multiple concepts at the same time. Default: disabled.
- Pretrained Weights — Source of initial model weights for transfer learning. Default: ImageNet-1k.
- Seed — Random seed used to initialize training (set -1 for random behavior). Default: -1.
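A note on the "per item" learning rates: if the effective learning rate scales with batch size (a common convention for per-sample rates; this scaling behavior is an assumption, not something the template documents), the default per-item value corresponds to a familiar overall rate at the default batch size:

```python
# Hypothetical check: assuming the effective learning rate is
# per_item_lrate * batch_size (a common per-sample convention, not
# confirmed by the template docs), the defaults combine as follows.
per_item_lrate = 0.00001953125
batch_size = 64

effective_lrate = per_item_lrate * batch_size  # 0.00125
```

Under that assumption, halving the batch size also halves the effective learning rate unless you adjust Per Item Lrate to compensate.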
Step 7: Train the Model
After configuring the training settings, click the Train Model button to start training your model using the selected pipeline.
You’ll be redirected to the Pipeline Version Runs page, where you can monitor the training job in real time and track how the pipeline executes.

On this page, you can:
- Monitor run status — Track the current state of the pipeline:
  - RUNNING: The training job is in progress. While the job is running, you can pause or stop it.
  - COMPLETED: The training finished successfully.
  - FAILED: The training did not complete successfully (check the logs for details).
- View run details — See key information such as the start time and total run duration.
- Inspect infrastructure — View where the job is running, including the cloud provider, region, compute instance type, and allocated resources.
- Follow pipeline execution — The training runs as an Argo Workflow, which breaks the process into steps. You can track the step-by-step execution of the pipeline in real time.
- Explore logs and nodes — The logs panel displays detailed, JSON-like output, including a list of nodes (pipeline steps such as data loading, training, and evaluation). Each node includes metadata such as its ID, type (e.g., Steps, Pod), and current status.
- Reload logs — Click the Reload button to refresh and view the latest logs.
- Run a new job — Click Run Pipeline Version to launch another training run. You’ll be prompted to select a cluster and nodepool before starting.
Step 8: Use the Model
Once your model has been trained successfully, you can start using it for predictions.
To access it, go to the Models section from the left sidebar. This opens the models listing page.

Click the listed model to open its individual page.

Next, click the Deploy Model button to create a deployment. This sets up the compute resources needed to run inference.
After deployment, click the Try Model button in the upper-right corner to open the Playground, where you can submit inputs and get predictions.

For this tutorial, uploading an image of a bean leaf will return classifications such as Angular Leaf Spot, Bean Rust, or Healthy, along with their prediction probabilities.
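The exact response shape depends on the Clarifai SDK or API version you use, but conceptually a classification result is a list of concepts with probabilities, and the predicted class is the one with the highest value. A minimal sketch, using a hypothetical payload:

```python
# A hypothetical prediction payload: the real response shape depends on the
# Clarifai SDK/API. This just shows picking the top concept by probability.
predictions = [
    {"concept": "Bean Rust", "value": 0.91},
    {"concept": "Angular Leaf Spot", "value": 0.07},
    {"concept": "Healthy", "value": 0.02},
]

top = max(predictions, key=lambda p: p["value"])
label, confidence = top["concept"], top["value"]
```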
That’s it!
Via the CLI
Quick Start
The classifier-pipeline-resnet-quick-start template lets you train a test visual classification model with minimal setup. It uses a ResNet-based image classifier pre-configured with a public dataset, so you can run an end-to-end training pipeline immediately — no data preparation required.
Step 1: Perform Prerequisites
Before getting started, make sure you’ve completed the following setup:
- Install the Clarifai package:
- CLI
pip install --upgrade clarifai
- Authenticate your connection by setting your Personal Access Token (PAT):
- CLI
clarifai login
- Select an instance type for running your pipeline, such as g6e.xlarge.
Step 2: Initialize a Pipeline from a Template
Initialize a new pipeline using the quick-start template, then navigate into the generated directory:
- CLI
clarifai pipeline init --template=classifier-pipeline-resnet-quick-start
- CLI
cd classifier-pipeline-resnet-quick-start
Step 3: Upload and Run the Pipeline
Upload the pipeline configuration and execute the training job:
- CLI
clarifai pipeline upload
Note: This will automatically create an app called pipeline-app and upload the pipeline to it.
- CLI
clarifai pipeline run --instance=g6e.xlarge
Step 4: Monitor Your Pipeline
Once the pipeline runs, it automatically loads the dataset, trains a ResNet-based image classifier, and produces a test model ready for use.
To access your pipeline, open your app's sidebar and select Pipelines; to view your trained model, select Models.
Custom Training
Let’s walk through how to use the Clarifai CLI to build and train a visual classification model using your own custom dataset.
Step 1: Install Clarifai and Authenticate
Start by installing the latest version of the clarifai Python package. This also includes the Clarifai CLI, which we’ll use to run and manage the training pipeline.
- CLI
pip install --upgrade clarifai
Then, authenticate your connection to Clarifai:
- CLI
clarifai login
The CLI will prompt you for your Personal Access Token (PAT). It will auto-detect your user ID and save everything locally.
Note: You can obtain a PAT by opening Settings in the platform’s collapsible left sidebar, selecting Secrets, and then creating a new token or copying an existing one.
Step 2: Create an App
Create an app to store and manage your model and its associated resources (such as datasets, pipelines, and deployments).
- CLI
clarifai app create your-app-id
Example Output
[INFO] 14:40:29.196099 App with ID 'my-vis-classifier' is created:
code: SUCCESS
description: "Ok"
req_id: "sdk-python-12.4.0-1b1aa61784ee447e80653686a5da546f"
| thread=8485281984
App 'my-vis-classifier' created successfully.
Step 3: Prepare Training Data
As mentioned previously, high-quality, well-structured data is critical for training an accurate and reliable model.
For this example, we’ll use a public dataset of food images available here. This dataset contains labeled images across multiple food categories, making it ideal for a classification task.
Using this dataset, we’ll train a model to classify images into the following categories: beignets, hamburger, prime_rib, and ramen.
You can clone the repository containing the dataset, then use the Clarifai Python SDK to upload the dataset to your app.
- Python SDK
import os
from clarifai.datasets.upload.utils import load_module_dataloader
from clarifai.client.app import App
# Construct the path to the dataset folder
module_path = os.path.join(os.getcwd().split('/models/model_train')[0], 'examples/datasets/upload/image_classification/food-101' )
# Load the dataloader module using the provided function from your module
food101_dataloader = load_module_dataloader(module_path)
# Initialize Clarifai App
app = App(app_id="YOUR_APP_ID_HERE", user_id="YOUR_USER_ID_HERE")
# Create a Clarifai dataset with the specified dataset_id
dataset = app.create_dataset(dataset_id="image_dataset")
# Upload the dataset using the provided dataloader and get the upload status
dataset.upload_dataset(dataloader=food101_dataloader, get_upload_status=True)
Example Output
[INFO] 15:47:57.287614
Dataset created
code: SUCCESS
description: "Ok"
req_id: "sdk-python-12.4.0-c8985fc62a4840399cbe3b5af908ed8a"
| thread=8485281984
[INFO] 15:47:57.294167 Getting dataset upload status... | thread=8485281984
[INFO] 15:48:01.211582
Dataset Version created
code: SUCCESS
description: "Ok"
req_id: "sdk-python-12.4.0-fd7e961f305e49b7939065578f3f127b"
| thread=8485281984
Uploading Dataset: 100%|██████████████████████████████████████████████████████| 1/1 [03:16<00:00, 196.83s/it]
[INFO] 15:51:18.373783 Getting dataset upload status... | thread=8485281984
[INFO] 15:51:18.633118
Dataset Version created
code: SUCCESS
description: "Ok"
req_id: "sdk-python-12.4.0-ff64897b59814a0aaf6e578fb06b7e8a"
| thread=8485281984
[INFO] 15:51:18.923250 Crunching the dataset metrics. Please wait... | thread=8485281984
╭─────────────────────────────────────────────────── Dataset Upload Summary ────────────────────────────────────────────────────╮
│ Inputs Progress: ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% │
│ Annotations Progress: ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% │
│ ┏━━━━━━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┓ │
│ ┃ dataset_id ┃ user_id ┃ app_id ┃ │
│ ┡━━━━━━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━┩ │
│ │ image_dataset │ alfrick │ my-vis-classifier │ │
│ └───────────────┴─────────┴───────────────────┘ │
╰────────────────────────────────────────────────────────── ──────────────────────────────────────────────────────────────────────╯
╭───────────────────────────────────────────────── Dataset Metrics Comparison ──────────────────────────────────────────────────╮
│ Local Dataset Uploaded Dataset │
│ ┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓ │
│ ┃ Inputs Count ┃ Annotations Count ┃┃ Inputs Count ┃ Annotations Count ┃ │
│ ┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ ━┩ │
│ │ 20 │ {'concepts': 20, 'bboxes': 0, 'polygons': 0} ││ 20 │ {'concepts': 20, 'bboxes': 0, 'polygons': 0} │ │
│ └──────────────┴──────────────────────────────────────────────┘└──────────────┴──────────────────────────────────────────────┘ │
│ │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
Note: Once your dataset is successfully uploaded, navigate to the platform UI and record the dataset_id and dataset_version_id. You’ll need these values when running the training pipeline.
Step 4: Set Up Compute
You can run your pipeline using either on-demand instance compute or a managed cluster and nodepool.
Option A: Select an Instance Type
You can run your pipeline directly on on-demand compute by specifying an instance with the --instance flag (see example below). This removes the need to create and manage a cluster and nodepool.
With this approach, compute is automatically provisioned — or reused if available — so you can focus on running your pipeline rather than managing infrastructure.
See the available instance types to choose one that best matches your workload and performance requirements.
Option B: Create a Cluster and Nodepool
To train your model via the CLI with managed infrastructure, you’ll need to provision compute resources by creating a cluster and a nodepool.
Start by defining a YAML configuration file for your compute cluster. Ensure the configuration supports GPU workloads, as GPUs are required for efficient training and inference of vision models.
Here is an example cluster config file:
- YAML
compute_cluster:
id: "visual-compute-cluster"
description: "My AWS compute cluster"
cloud_provider:
id: "aws"
region: "us-east-1"
managed_by: "clarifai"
cluster_type: "dedicated"
visibility:
gettable: 10
Then run the following command, pointing to your config file:
- CLI
clarifai computecluster create \
your_compute_cluster_id \
--config your_compute_cluster_config_filepath
Example Output
clarifai computecluster create \
visual-compute-cluster \
--config compute.yaml
[INFO] 16:04:56.120016 Compute Cluster with ID 'visual-compute-cluster' is created:
code: SUCCESS
description: "Ok"
req_id: "sdk-python-12.4.0-8233ec6fcf154467857f61d19d175023"
| thread=8485281984
Next, define a nodepool within your cluster. This is where you specify the actual compute instances used for training. Be sure to choose a GPU-enabled instance that aligns with your performance and cost requirements.
Note: GPU support is essential for this tutorial. Without a compatible GPU instance, training may be significantly slower or fail altogether.
Here is an example nodepool config file:
- YAML
nodepool:
id: "visual-nodepool"
compute_cluster:
id: "visual-compute-cluster"
description: "GPU nodepool for training workloads"
instance_types:
- id: "g5.2xlarge"
compute_info:
cpu_limit: "8"
cpu_memory: "28Gi"
accelerator_type:
- "a10"
num_accelerators: 1
accelerator_memory: "40Gi"
node_capacity_type:
capacity_types:
- 1
min_instances: 0
max_instances: 1
Then run the following command, pointing to your config file:
- CLI
clarifai nodepool create \
your_compute_cluster_id \
your_nodepool_id \
--config your_nodepool_config_filepath
Example Output
clarifai nodepool create \
visual-compute-cluster \
visual-nodepool \
--config nodepool.yaml
[INFO] 16:12:23.712709 Nodepool with ID 'visual-nodepool' is created:
code: SUCCESS
description: "Ok"
req_id: "sdk-python-12.4.0-db3e87045f87426395a777c157daf5a2"
| thread=8485281984
Step 5: Initialize a Pipeline from a Template
The classifier-pipeline-resnet template lets you quickly set up a visual classification pipeline using a preconfigured ResNet-based image classifier — so you can focus on training rather than setup.
To view all the available predefined templates, run:
- CLI
clarifai pipelinetemplate list
Example Output
clarifai pipelinetemplate list
NAME TYPE
classifier-pipeline-resnet classifier
classifier-pipeline-resnet-quick-start classifier
detector-pipeline-dfine detector
detector-pipeline-eval-yolof-quick-start detector
detector-pipeline-yolof detector
detector-pipeline-yolof-quick-start detector
lora-pipeline-unsloth-quick-start lora
Found 7 template(s) total
Available types: classifier, detector, lora
Run the following command to initialize a pipeline from the template:
- CLI
clarifai pipeline init \
--app_id your_app_id \
--user_id your_user_id \
--template classifier-pipeline-resnet \
--set dataset_id=image_dataset \
  --set dataset_version_id=your_dataset_version_id \
--set concepts='["beignets","hamburger","prime_rib","ramen"]'
Where:
| Parameter | Description |
|---|---|
| --app_id | The ID of the app where the pipeline will be created |
| --user_id | Your Clarifai user ID |
| --template | The pipeline template to use. Here, we use classifier-pipeline-resnet |
| --set dataset_id | The ID of the dataset to use for training |
| --set dataset_version_id | The specific dataset version to use for training |
| --set concepts | A JSON array of the concept labels the model will be trained to classify |
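Because the concepts value must be a valid JSON array of strings, a quoting mistake on the command line is easy to make. You can sanity-check the exact string you plan to pass before running the command:

```python
import json

# Sanity-check the value you plan to pass to --set concepts: it must parse
# as a JSON array of strings, exactly as quoted on the command line.
concepts_arg = '["beignets","hamburger","prime_rib","ramen"]'

concepts = json.loads(concepts_arg)
assert isinstance(concepts, list)
assert all(isinstance(c, str) for c in concepts)
```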
Example Output
clarifai pipeline init \
--app_id my-vis-classifier \
--user_id alfrick \
--template classifier-pipeline-resnet \
--set dataset_id=image_dataset \
--set dataset_version_id=a3003c4c946b482fb7a24c4d536367f1 \
--set concepts='["beignets","hamburger","prime_rib","ramen"]'
Using template: classifier-pipeline-resnet
Template Type: classifier
Steps:
Parameters: 19 required
Creating pipeline 'classifier-pipeline-resnet' from template 'classifier-pipeline-resnet'...
Template Parameters (default values):
user_id : <YOUR_USER_ID>
app_id : <YOUR_APP_ID>
model_id : test_model
dataset_id : <YOUR_DATASET_ID>
dataset_version_id :
concepts : ["beignets","hamburger","prime_rib","ramen"]
num_epochs : 200
batch_size : 64
image_size : 224
per_item_lrate : 1.953125e-05
weight_decay : 0.01
per_item_min_lrate : 1.5625e-08
warmup_iters : 5
warmup_ratio : 0.0001
flip_probability : 0.5
flip_direction : horizontal
concepts_mutually_exclusive : False
pretrained_weights : ImageNet-1k
seed : -1
Pipeline initialization complete in /Users/macbookpro/Desktop/test2/trainvis/classifier-pipeline-resnet
Next steps:
1. Review and customize the generated pipeline steps
2. Add any additional dependencies to requirements.txt files
3. Run 'clarifai pipeline upload /Users/macbookpro/Desktop/test2/trainvis/classifier-pipeline-resnet/config.yaml' to upload your pipeline
4. Use 'clarifai pipeline run --config /Users/macbookpro/Desktop/test2/trainvis/classifier-pipeline-resnet/config.yaml [--set key=value]' to execute your pipeline
Once executed, the command creates a new project directory named after the template, preloaded with all necessary configuration files.
Before running any subsequent clarifai pipeline ... commands, navigate into the generated directory — these commands rely on the local config.yaml and config-lock.yaml files:
- CLI
cd classifier-pipeline-resnet
Note: You can optionally review the generated pipeline steps and tailor them to your use case. If needed, you can also adjust the default parameters and add any additional dependencies to the requirements.txt files to support your pipeline.
You can optionally customize the pipeline during setup — for example, by specifying a different user/app, assigning a custom pipeline ID, or adjusting model parameters:
- CLI
clarifai pipeline init --template=classifier-pipeline-resnet \
--user_id your_custom_user_id \
--app_id your_custom_app_id \
--set id=your_custom_pipeline_id \
--set num_epochs=20
Step 6: Upload Your Pipeline
Once your pipeline is initialized and configured, the next step is to upload it and trigger the training job.
Make sure you’re inside the generated pipeline directory, then run:
- CLI
clarifai pipeline upload
Example Output
cd classifier-pipeline-resnet
((venv) ) (base) macbookpro@macs-MacBook-Pro classifier-pipeline-resnet % clarifai pipeline upload
[INFO] 16:32:35.454197 Starting pipeline upload from config: ./config.yaml | thread=8485281984
[INFO] 16:32:37.865166 Using existing app 'my-vis-classifier' | thread=8485281984
[INFO] 16:32:37.873317 No step_directories specified, but all templateRefs have versions. Skipping pipeline step upload (reusing existing step versions). | thread=8485281984
[INFO] 16:32:37.873399 No pipeline step versions for lockfile | thread=8485281984
[WARNING] 16:32:37.877711 Could not find version for step: classifier-pipeline-resnet-ps | thread=8485281984
[INFO] 16:32:37.880190 Creating pipeline classifier-pipeline-resnet... | thread=8485281984
[WARNING] 16:32:37.884322 Could not find version for step: classifier-pipeline-resnet-ps | thread=8485281984
[INFO] 16:32:41.966172 Successfully created pipeline classifier-pipeline-resnet | thread=8485281984
[INFO] 16:32:41.966564 Pipeline ID: classifier-pipeline-resnet | thread=8485281984
[INFO] 16:32:41.966635 Pipeline version ID: c382c6f2f1a14167971a14ad6cd06aaf | thread=8485281984
[INFO] 16:32:41.987219 Generated lockfile: /Users/macbookpro/Desktop/test2/trainvis/classifier-pipeline-resnet/config-lock.yaml | thread=8485281984
[INFO] 16:32:41.987555 Pipeline upload completed successfully with lockfile! | thread=8485281984
The above command will register the pipeline in your app, upload all associated configuration files, and prepare the pipeline for execution.
Step 7: Run the Pipeline
You can run your pipeline using either on-demand instance compute or a preconfigured cluster and nodepool.
Option A: Run on On-Demand Instance Compute
Instead of relying on an existing nodepool and compute cluster, you can automatically provision or reuse compute at runtime by specifying an instance type:
- CLI
clarifai pipeline run --instance=g6e.xlarge
This approach removes the need to manage infrastructure, making it ideal for quick experiments or simplified workflows.
To modify pipeline parameters at run time, pass one or more --set key=value flags:
- CLI
clarifai pipeline run \
--instance=g6e.xlarge \
--set num_epochs=20 \
--set batch_size=32
Option B: Run on Cluster and Nodepool
If you’ve already set up a compute cluster and nodepool, you can run the pipeline by explicitly targeting those resources:
- CLI
clarifai pipeline run \
--nodepool_id=your_nodepool_id \
--compute_cluster_id=your_compute_cluster_id
Example Output
clarifai pipeline run \
--nodepool_id=visual-nodepool \
--compute_cluster_id=visual-compute-cluster
Found config-lock.yaml, using it as default config source
[INFO] 16:38:26.555009 Starting pipeline run for pipeline classifier-pipeline-resnet | thread=8485281984
[INFO] 16:38:29.834776 Pipeline version run created with ID: 2c8ec3b1fa7d438bbba96ca3b46c47ba | thread=8485281984
[INFO] 16:42:49.830451 Pipeline run status: 64001 (JOB_RUNNING) | thread=8485281984
[INFO] 16:42:49.830541 Pipeline run in progress: 64001 (JOB_RUNNING) | thread=8485281984
[INFO] 16:43:00.628789 Pipeline run monitoring... (elapsed 270.8s) | thread=8485281984
[INFO] 16:43:00.629355 Pipeline run status: 64001 (JOB_RUNNING) | thread=8485281984
[INFO] 16:43:00.629498 Pipeline run in progress: 64001 (JOB_RUNNING) | thread=8485281984
[INFO] 16:43:11.350856 [LOG] time="2026-04-23T13:40:25.268Z" level=info msg="Starting Workflow Executor" version=v3.7.9
time="2026-04-23T13:40:25.272Z" level=info msg="Using executor retry strategy" Duration=1s Factor=1.6 Jitter=0.5 Steps=5
time="2026-04-23T13:40:25.272Z" level=info msg="Executor initialized" deadline="0001-01-01 00:00:00 +0000 UTC" includeScriptOutput=false namespace=prod-alfrick podName=cl-****baf1fecd63b7-cl-****7cc9dd354d7d-913573978 templateName=cl-****7cc9dd354d7d version="&Version{Version:v3.7.9,BuildDate:2026-01-28T14:52:11Z,GitCommit:********,GitTag:v3.7.9,GitTreeState:clean,GoVersion:go1.25.5,Compiler:gc,Platform:linux/amd64,}"
time="2026-04-23T13:40:25.285Z" level=info msg="Starting deadline monitor"
/opt/conda/lib/python3.11/site-packages/torch/cuda/__init__.py:61: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you.
import pynvml # type: ignore[import] | thread=8485281984
[INFO] 16:43:11.351441 Pipeline run monitoring... (elapsed 281.5s) | thread=8485281984
[INFO] 16:43:11.351634 Pipeline run status: 64001 (JOB_RUNNING) | thread=8485281984
[INFO] 16:43:11.351907 Pipeline run in progress: 64001 (JOB_RUNNING) | thread=8485281984
[INFO] 16:44:05.154017 [LOG] /opt/conda/lib/python3.11/site-packages/mmengine/optim/optimizer/zero_optimizer.py:11: DeprecationWarning: `TorchScript` support for functional optimizers is deprecated and will be removed in a future PyTorch release. Consider using the `torch.compile` optimizer instead.
from torch.distributed.optim import \
INFO:root:Starting MMClassification ResNet-50 training pipeline
{"msg": "Downloading to /tmp/pretrain_checkpoints/resnet50_8xb256-rsb-a1-600e_in1k_20211228-20e21305.pth", "@timestamp": "2026-04-23T13:43:02.414665Z", "filename": "artifact_version.py", "stack_info": null, "lineno": 503, "level": "info"}
INFO:clarifai:Downloading to /tmp/pretrain_checkpoints/resnet50_8xb256-rsb-a1-600e_in1k_20211228-20e21305.pth
Downloading: 100%|██████████| 103M/103M [00:02<00:00, 42.1MB/s]
{"msg": "Download completed: /tmp/pretrain_checkpoints/resnet50_8xb256-rsb-a1-600e_in1k_20211228-20e21305.pth", "@timestamp": "2026-04-23T13:43:05.182058Z", "filename": "artifact_version.py", "stack_info": null, "lineno": 559, "level": "info"}
INFO:clarifai:Download completed: /tmp/pretrain_checkpoints/resnet50_8xb256-rsb-a1-600e_in1k_20211228-20e21305.pth
INFO:root:Downloaded checkpoint to /tmp/pretrain_checkpoints/resnet50_8xb256-rsb-a1-600e_in1k_20211228-20e21305.pth
INFO:root:
INFO:root:================================================================================
INFO:root:STEP 1: Downloading Dataset from Clarifai API
INFO:root:================================================================================
INFO:model.1.dataset_helpers:================================================================================
INFO:model.1.dataset_helpers:STEP 1: Downloading Dataset from Clarifai API (Export Method)
INFO:model.1.dataset_helpers:Downloaded 20/20 images
INFO:model.1.dataset_helpers:Successfully downloaded 20 images
INFO:model.1.dataset_helpers:Cached dataset to: /tmp/mmpretrain_work_dir/dataset_image_dataset/downloaded_data.pkl
INFO:root:Dataset name: dataset_image_dataset
INFO:root:
INFO:root:================================================================================
INFO:root:STEP 2: Converting Dataset to ImageNet Format
INFO:root:================================================================================
INFO:model.1.dataset_helpers:================================================================================
INFO:model.1.dataset_helpers:STEP 2: Converting Dataset to ImageNet Format
INFO:model.1.dataset_helpers:================================================================================
INFO:model.1.dataset_helpers:Loaded 20 cached images
INFO:model.1.dataset_helpers:Concept mapping: {'beignets': 0, 'hamburger': 1, 'prime_rib': 2, 'ramen': 3}
INFO:model.1.dataset_helpers:Conversion complete! Images saved: 20
INFO:root:Images directory: /tmp/mmpretrain_work_dir/dataset_image_dataset/train
INFO:root:Annotations file: /tmp/mmpretrain_work_dir/dataset_image_dataset/train/train_annotations.txt
INFO:model.1.dataset_helpers:Created classes.txt with 4 classes
INFO:root:Classes file: /tmp/mmpretrain_work_dir/dataset_image_dataset/train/classes.txt
INFO:root:Using 4 classes from dataset: ['beignets', 'hamburger', 'prime_rib', 'ramen']
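STEP 2 above writes the dataset into the flat ImageNet-style layout that mmpretrain's `CustomDataset` expects: `classes.txt` holds one class name per line (the line number is the class index), and `train_annotations.txt` maps each relative image path to a class index. A minimal sketch of reading those two files (the file names come from the log above; the parsing helper itself is illustrative):

```python
from pathlib import Path

def load_imagenet_annotations(ann_file: str, classes_file: str):
    """Parse mmpretrain CustomDataset-style annotation files.

    classes.txt: one class name per line (line number = class index).
    train_annotations.txt: "<relative/image/path> <class_index>" per line.
    """
    classes = Path(classes_file).read_text().splitlines()
    samples = []
    for line in Path(ann_file).read_text().splitlines():
        if not line.strip():
            continue
        # Split off the trailing class index; the rest is the image path.
        path, idx = line.rsplit(" ", 1)
        samples.append((path, classes[int(idx)]))
    return classes, samples
```

With the dataset from this run, `classes.txt` would contain the four lines `beignets`, `hamburger`, `prime_rib`, `ramen`, matching the concept mapping logged in STEP 2.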
INFO:root:
INFO:root:================================================================================
INFO:root:STEP 3-6: Training Model
INFO:root:================================================================================
INFO:root:STEP 3: Generating self-contained config...
INFO:root:Config generated at /tmp/mmpretrain_work_dir/config.py
INFO:root:STEP 4: Configuring config for dataset...
INFO:root:Loaded config from file: /tmp/mmpretrain_work_dir/config.py
INFO:root:Set ann_file to /tmp/mmpretrain_work_dir/dataset_image_dataset/train/train_annotations.txt
INFO:root:Set data_prefix to /tmp/mmpretrain_work_dir/dataset_image_dataset/train/train
INFO:root:Set classes to /tmp/mmpretrain_work_dir/dataset_image_dataset/train/classes.txt
INFO:root:Set dataset_type to CustomDataset
INFO:root:Dumped updated config to file: /tmp/mmpretrain_work_dir/configured_config.py
INFO:root:Config configured at /tmp/mmpretrain_work_dir/configured_config.py
INFO:root:STEP 5: Training model using mmpretrain/mmengine API...
INFO:root:Building runner and starting training...
INFO:root:✓ CUDA enabled: NVIDIA A10G
04/23 13:43:11 - mmengine - INFO -
------------------------------------------------------------
System environment:
sys.platform: linux
Python: 3.11.10 | packaged by conda-forge | (main, Oct 16 2024, 01:27:36) [GCC 13.3.0]
CUDA available: True
MUSA available: False
numpy_random_seed: 780123474
GPU 0: NVIDIA A10G
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 12.4, V12.4.131
GCC: gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
PyTorch: 2.5.1+cu124
PyTorch compiling details: PyTorch built with:
- GCC 9.3
- C++ Version: 201703
- Intel(R) oneAPI Math Kernel Library Version 2024.2-Product Build 20240605 for Intel(R) 64 architecture applications
- Intel(R) MKL-DNN v3.5.3 (Git Hash ********)
- OpenMP 201511 (a.k.a. OpenMP 4.5)
- LAPACK is enabled (usually provided by MKL)
- NNPACK is enabled
- CPU capability usage: AVX2
- CUDA Runtime 12.4
- NVCC architecture flags: -gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_90,code=sm_90
- CuDNN 90.1
- Magma 2.6.1
- Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=12.4, CUDNN_VERSION=9.1.0, CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, CXX_FLAGS= -D_GLIBCXX_USE_CXX11_ABI=0 -fabi-version=11 -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOROCTRACER -DLIBKINETO_NOXPUPTI=ON -DUSE_FBGEMM -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-strict-overflow -Wno-strict-aliasing -Wno-stringop-overflow -Wsuggest-override -Wno-psabi -Wno-error=old-style-cast -Wno-missing-braces -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, TORCH_VERSION=2.5.1, USE_CUDA=ON, USE_CUDNN=ON, USE_CUSPARSELT=1, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_GLOO=ON, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=1, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF, USE_ROCM_KERNEL_ASSERT=OFF,
TorchVision: 0.20.1+cu124
OpenCV: 4.13.0
MMEngine: 0.10.7
Runtime environment:
cudnn_benchmark: False
dist_cfg: {'backend': 'nccl'}
mp_cfg: {'mp_start_method': 'fork', 'opencv_num_threads': 0}
seed: 780123474
Distributed launcher: none
Distributed training: False
GPU number: 1
------------------------------------------------------------
04/23 13:43:12 - mmengine - INFO - Config:
data_preprocessor = dict(
mean=[
123.675,
116.28,
103.53,
],
num_classes=4,
std=[
58.395,
57.12,
57.375,
],
to_rgb=True)
dataset_type = 'CustomDataset'
default_hooks = dict(
checkpoint=dict(interval=1, max_keep_ckpts=1, type='CheckpointHook'),
logger=dict(interval=10, type='LoggerHook'),
param_scheduler=dict(type='ParamSchedulerHook'),
sampler_seed=dict(type='DistSamplerSeedHook'),
timer=dict(type='IterTimerHook'))
default_scope = 'mmpretrain'
env_cfg = dict(
cudnn_benchmark=False,
dist_cfg=dict(backend='nccl'),
mp_cfg=dict(mp_start_method='fork', opencv_num_threads=0))
launcher = 'none'
load_from = '/tmp/pretrain_checkpoints/resnet50_8xb256-rsb-a1-600e_in1k_20211228-20e21305.pth'
model = dict(
backbone=dict(
depth=50,
num_stages=4,
out_indices=(3, ),
style='pytorch',
type='ResNet'),
data_preprocessor=dict(
mean=[
123.675,
116.28,
103.53,
],
num_classes=4,
std=[
58.395,
57.12,
57.375,
],
to_rgb=True),
head=dict(
in_channels=2048,
loss=dict(loss_weight=1.0, type='CrossEntropyLoss'),
num_classes=4,
topk=(
1,
4,
),
type='LinearClsHead'),
neck=dict(type='****'),
type='ImageClassifier')
optim_wrapper = dict(
optimizer=dict(lr=0.00125, type='Lamb', weight_decay=0.01))
param_scheduler = [
dict(
begin=0,
by_epoch=True,
convert_to_iter_based=True,
end=5,
start_factor=0.0001,
type='LinearLR'),
dict(
T_max=195,
begin=5,
by_epoch=True,
end=200,
eta_min=1e-06,
type='CosineAnnealingLR'),
]
test_cfg = dict()
test_dataloader = dict(
batch_size=64,
dataset=dict(
ann_file='',
data_prefix='',
pipeline=[
dict(type='LoadImageFromFile'),
dict(
backend='pillow',
edge='short',
interpolation='bicubic',
scale=256,
type='ResizeEdge'),
dict(crop_size=224, type='CenterCrop'),
dict(type='PackInputs'),
],
type='CustomDataset'),
num_workers=4,
persistent_workers=True,
sampler=dict(shuffle=False, type='DefaultSampler'))
test_evaluator = dict(topk=(1, ), type='Accuracy')
train_cfg = dict(by_epoch=True, max_epochs=200, val_interval=1)
train_dataloader = dict(
batch_size=64,
dataset=dict(
ann_file=
'/tmp/mmpretrain_work_dir/dataset_image_dataset/train/train_annotations.txt',
classes=
'/tmp/mmpretrain_work_dir/dataset_image_dataset/train/classes.txt',
data_prefix=
'/tmp/mmpretrain_work_dir/dataset_image_dataset/train/train',
pipeline=[
dict(type='LoadImageFromFile'),
dict(scale=224, type='RandomResizedCrop'),
dict(direction='horizontal', prob=0.5, type='RandomFlip'),
dict(
hparams=dict(
interpolation='bicubic', pad_val=[
104,
116,
124,
]),
magnitude_level=7,
magnitude_std=0.5,
num_policies=2,
policies=[
dict(type='AutoContrast'),
dict(type='Equalize'),
dict(type='Invert'),
dict(
magnitude_key='angle',
magnitude_range=(
0,
30,
),
type='Rotate'),
dict(
magnitude_key='bits',
magnitude_range=(
4,
0,
),
type='Posterize'),
dict(
magnitude_key='thr',
magnitude_range=(
256,
0,
),
type='Solarize'),
dict(
magnitude_key='magnitude',
magnitude_range=(
0,
110,
),
type='SolarizeAdd'),
dict(
magnitude_key='magnitude',
magnitude_range=(
0,
0.9,
),
type='ColorTransform'),
dict(
magnitude_key='magnitude',
magnitude_range=(
0,
0.9,
),
type='Contrast'),
dict(
magnitude_key='magnitude',
magnitude_range=(
0,
0.9,
),
type='Brightness'),
dict(
magnitude_key='magnitude',
magnitude_range=(
0,
0.9,
),
type='Sharpness'),
dict(
direction='horizontal',
magnitude_key='magnitude',
magnitude_range=(
0,
0.3,
),
type='Shear'),
dict(
direction='vertical',
magnitude_key='magnitude',
magnitude_range=(
0,
0.3,
),
type='Shear'),
dict(
direction='horizontal',
magnitude_key='magnitude',
magnitude_range=(
0,
0.45,
),
type='Translate'),
dict(
direction='vertical',
magnitude_key='magnitude',
magnitude_range=(
0,
0.45,
),
type='Translate'),
],
total_level=10,
type='RandAugment'),
dict(type='PackInputs'),
],
type='CustomDataset'),
num_workers=4,
persistent_workers=True,
sampler=dict(shuffle=True, type='DefaultSampler'))
train_pipeline = [
dict(type='LoadImageFromFile'),
dict(scale=224, type='RandomResizedCrop'),
dict(direction='horizontal', prob=0.5, type='RandomFlip'),
dict(
hparams=dict(interpolation='bicubic', pad_val=[
104,
116,
124,
]),
magnitude_level=7,
magnitude_std=0.5,
num_policies=2,
policies=[
dict(type='AutoContrast'),
dict(type='Equalize'),
dict(type='Invert'),
dict(
magnitude_key='angle',
magnitude_range=(
0,
30,
),
type='Rotate'),
dict(
magnitude_key='bits',
magnitude_range=(
4,
0,
),
type='Posterize'),
dict(
magnitude_key='thr',
magnitude_range=(
256,
0,
),
type='Solarize'),
dict(
magnitude_key='magnitude',
magnitude_range=(
0,
110,
),
type='SolarizeAdd'),
dict(
magnitude_key='magnitude',
magnitude_range=(
0,
0.9,
),
type='ColorTransform'),
dict(
magnitude_key='magnitude',
magnitude_range=(
0,
0.9,
),
type='Contrast'),
dict(
magnitude_key='magnitude',
magnitude_range=(
0,
0.9,
),
type='Brightness'),
dict(
magnitude_key='magnitude',
magnitude_range=(
0,
0.9,
),
type='Sharpness'),
dict(
direction='horizontal',
magnitude_key='magnitude',
magnitude_range=(
0,
0.3,
),
type='Shear'),
dict(
direction='vertical',
magnitude_key='magnitude',
magnitude_range=(
0,
0.3,
),
type='Shear'),
dict(
direction='horizontal',
magnitude_key='magnitude',
magnitude_range=(
0,
0.45,
),
type='Translate'),
dict(
direction='vertical',
magnitude_key='magnitude',
magnitude_range=(
0,
0.45,
),
type='Translate'),
],
total_level=10,
type='RandAugment'),
dict(type='PackInputs'),
]
val_cfg = dict()
val_dataloader = dict(
batch_size=64,
dataset=dict(
ann_file=
'/tmp/mmpretrain_work_dir/dataset_image_dataset/train/train_annotations.txt',
classes=
'/tmp/mmpretrain_work_dir/dataset_image_dataset/train/classes.txt',
data_prefix=
'/tmp/mmpretrain_work_dir/dataset_image_dataset/train/train',
pipeline=[
dict(type='LoadImageFromFile'),
dict(
backend='pillow',
edge='short',
interpolation='bicubic',
scale=256,
type='ResizeEdge'),
dict(crop_size=224, type='CenterCrop'),
dict(type='PackInputs'),
],
type='CustomDataset'),
num_workers=4,
persistent_workers=True,
sampler=dict(shuffle=False, type='DefaultSampler'))
val_evaluator = dict(topk=(1, ), type='Accuracy')
val_pipeline = [
dict(type='LoadImageFromFile'),
dict(
backend='pillow',
edge='short',
interpolation='bicubic',
scale=256,
type='ResizeEdge'),
dict(crop_size=224, type='CenterCrop'),
dict(type='PackInputs'),
]
work_dir = '/tmp/mmpretrain_work_dir' | thread=8485281984
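The `param_scheduler` entries in the config above chain a linear warmup (epochs 0-5, starting at `start_factor * lr`) into cosine annealing over the remaining 195 epochs down to `eta_min`. A rough closed-form sketch of the resulting per-epoch learning rate, built only from the config values (mmengine applies the schedule step-by-step, and the warmup is converted to iteration-based, so exact values can differ slightly):

```python
import math

# Values taken from optim_wrapper and param_scheduler in the config above.
BASE_LR, START_FACTOR, ETA_MIN = 0.00125, 0.0001, 1e-06
WARMUP_END, MAX_EPOCHS = 5, 200

def lr_at_epoch(epoch: int) -> float:
    """Approximate learning rate at the start of a (0-indexed) epoch."""
    if epoch < WARMUP_END:
        # LinearLR: ramp from START_FACTOR * BASE_LR up to BASE_LR.
        frac = epoch / WARMUP_END
        return BASE_LR * (START_FACTOR + (1 - START_FACTOR) * frac)
    # CosineAnnealingLR: decay from BASE_LR to ETA_MIN over T_max = 195 epochs.
    t = epoch - WARMUP_END
    return ETA_MIN + (BASE_LR - ETA_MIN) * (1 + math.cos(math.pi * t / (MAX_EPOCHS - WARMUP_END))) / 2
```

As a sanity check against the training log below: `lr_at_epoch(0)` gives `1.25e-07` (the `1.2500e-07` logged for train epoch 1), and `lr_at_epoch(97)` comes out near the `6.8076e-04` logged for train epoch 98.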
[INFO] 16:44:05.156567 Pipeline run monitoring... (elapsed 335.3s) | thread=8485281984
[INFO] 16:44:05.156705 Pipeline run status: 64001 (JOB_RUNNING) | thread=8485281984
[INFO] 16:44:05.156789 Pipeline run in progress: 64001 (JOB_RUNNING) | thread=8485281984

INFO:matplotlib.font_manager:generated new fontManager
04/23 13:43:13 - mmengine - INFO - Distributed training is not used, all SyncBatchNorm (SyncBN) layers in the model will be automatically reverted to BatchNormXd layers if they are used.
04/23 13:43:13 - mmengine - INFO - Hooks will be executed in the following order:
before_run:
(VERY_HIGH ) RuntimeInfoHook
(BELOW_NORMAL) LoggerHook
--------------------
before_train:
(VERY_HIGH ) RuntimeInfoHook
(NORMAL ) IterTimerHook
(VERY_LOW ) CheckpointHook
--------------------
before_train_epoch:
(VERY_HIGH ) RuntimeInfoHook
(NORMAL ) IterTimerHook
(NORMAL ) DistSamplerSeedHook
--------------------
before_train_iter:
(VERY_HIGH ) RuntimeInfoHook
(NORMAL ) IterTimerHook
--------------------
after_train_iter:
(VERY_HIGH ) RuntimeInfoHook
(NORMAL ) IterTimerHook
(BELOW_NORMAL) LoggerHook
(LOW ) ParamSchedulerHook
(VERY_LOW ) CheckpointHook
--------------------
after_train_epoch:
(NORMAL ) IterTimerHook
(LOW ) ParamSchedulerHook
(VERY_LOW ) CheckpointHook
--------------------
before_val:
(VERY_HIGH ) RuntimeInfoHook
--------------------
before_val_epoch:
(NORMAL ) IterTimerHook
--------------------
before_val_iter:
(NORMAL ) IterTimerHook
--------------------
after_val_iter:
(NORMAL ) IterTimerHook
(BELOW_NORMAL) LoggerHook
--------------------
after_val_epoch:
(VERY_HIGH ) RuntimeInfoHook
(NORMAL ) IterTimerHook
(BELOW_NORMAL) LoggerHook
(LOW ) ParamSchedulerHook
(VERY_LOW ) CheckpointHook
--------------------
after_val:
(VERY_HIGH ) RuntimeInfoHook
--------------------
after_train:
(VERY_HIGH ) RuntimeInfoHook
(VERY_LOW ) CheckpointHook
--------------------
before_test:
(VERY_HIGH ) RuntimeInfoHook
--------------------
before_test_epoch:
(NORMAL ) IterTimerHook
--------------------
before_test_iter:
(NORMAL ) IterTimerHook
--------------------
after_test_iter:
(NORMAL ) IterTimerHook
(BELOW_NORMAL) LoggerHook
--------------------
after_test_epoch:
(VERY_HIGH ) RuntimeInfoHook
(NORMAL ) IterTimerHook
(BELOW_NORMAL) LoggerHook
--------------------
after_test:
(VERY_HIGH ) RuntimeInfoHook
--------------------
after_run:
(BELOW_NORMAL) LoggerHook
--------------------
INFO:root:cuda:0
/opt/conda/lib/python3.11/site-packages/mmengine/runner/checkpoint.py:347: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
checkpoint = torch.load(filename, map_location=map_location)
Loads checkpoint by local backend from path: /tmp/pretrain_checkpoints/resnet50_8xb256-rsb-a1-600e_in1k_20211228-20e21305.pth
The model and loaded state dict do not match exactly
size mismatch for head.fc.weight: copying a param with shape torch.Size([1000, 2048]) from checkpoint, the shape in current model is torch.Size([4, 2048]).
size mismatch for head.fc.bias: copying a param with shape torch.Size([1000]) from checkpoint, the shape in current model is torch.Size([4]).
04/23 13:43:15 - mmengine - INFO - Load checkpoint from /tmp/pretrain_checkpoints/resnet50_8xb256-rsb-a1-600e_in1k_20211228-20e21305.pth
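The two `size mismatch` warnings above are expected when fine-tuning: the ImageNet checkpoint carries a 1000-way classification head, while this model has a 4-class head, so the loader skips those two tensors and keeps the backbone weights. A framework-agnostic sketch of that shape-filtering logic (plain dicts of shape tuples stand in for tensors here, and the helper name is illustrative; mmengine performs the equivalent of a non-strict state-dict load):

```python
def filter_compatible(checkpoint: dict, model_shapes: dict):
    """Keep only checkpoint entries whose shape matches the target model.

    Returns (loadable, skipped) - mirroring how a non-strict state-dict
    load drops mismatched tensors such as a differently-sized classifier head.
    """
    loadable, skipped = {}, []
    for key, shape in checkpoint.items():
        if model_shapes.get(key) == shape:
            loadable[key] = shape
        else:
            skipped.append(key)
    return loadable, skipped

# The situation from the log: 1000-class ImageNet head vs. a 4-class model.
ckpt = {"backbone.conv1.weight": (64, 3, 7, 7),
        "head.fc.weight": (1000, 2048), "head.fc.bias": (1000,)}
model = {"backbone.conv1.weight": (64, 3, 7, 7),
         "head.fc.weight": (4, 2048), "head.fc.bias": (4,)}
```

Only the backbone entry survives the filter; `head.fc.weight` and `head.fc.bias` are skipped and trained from scratch, which is why validation accuracy starts near chance (15-20% over 4 classes) in the epochs below.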
04/23 13:43:15 - mmengine - WARNING - "FileClient" will be deprecated in future. Please use io functions in https://mmengine.readthedocs.io/en/latest/api/fileio.html#file-io
04/23 13:43:15 - mmengine - WARNING - "HardDiskBackend" is the alias of "LocalBackend" and the former will be deprecated in future.
04/23 13:43:15 - mmengine - INFO - Checkpoints will be saved to /tmp/mmpretrain_work_dir.
04/23 13:43:16 - mmengine - INFO - Exp name: configured_config_20260423_134311
04/23 13:43:16 - mmengine - INFO - Epoch(train) [1][1/1] lr: 1.2500e-07 eta: 0:03:30 time: 1.0578 data_time: 0.2672 memory: 1814 loss: 1.3941
04/23 13:43:16 - mmengine - INFO - Saving checkpoint at 1 epochs
04/23 13:43:17 - mmengine - INFO - Epoch(val) [1][1/1] accuracy/top1: 15.0000 data_time: 0.2537 time: 0.2902
04/23 13:43:17 - mmengine - INFO - Exp name: configured_config_20260423_134311
04/23 13:43:17 - mmengine - INFO - Epoch(train) [2][1/1] lr: 3.1259e-04 eta: 0:02:03 time: 0.6234 data_time: 0.1699 memory: 1988 loss: 1.3959
04/23 13:43:17 - mmengine - INFO - Saving checkpoint at 2 epochs
04/23 13:43:19 - mmengine - INFO - Epoch(val) [2][1/1] accuracy/top1: 20.0000 data_time: 0.1775 time: 0.2075
04/23 13:43:19 - mmengine - INFO - Exp name: configured_config_20260423_134311
04/23 13:43:19 - mmengine - INFO - Epoch(train) [3][1/1] lr: 6.2506e-04 eta: 0:01:34 time: 0.4782 data_time: 0.1393 memory: 1988 loss: 1.3905
04/23 13:43:19 - mmengine - INFO - Saving checkpoint at 3 epochs
04/23 13:43:20 - mmengine - INFO - Epoch(val) [3][1/1] accuracy/top1: 15.0000 data_time: 0.1021 time: 0.1256
04/23 13:43:21 - mmengine - INFO - Exp name: configured_config_20260423_134311
04/23 13:43:21 - mmengine - INFO - Epoch(train) [4][1/1] lr: 9.3753e-04 eta: 0:01:18 time: 0.4024 data_time: 0.1203 memory: 1988 loss: 1.3899
04/23 13:43:21 - mmengine - INFO - Saving checkpoint at 4 epochs
04/23 13:43:22 - mmengine - INFO - Epoch(val) [4][1/1] accuracy/top1: 20.0000 data_time: 0.1013 time: 0.1247
04/23 13:43:22 - mmengine - INFO - Exp name: configured_config_20260423_134311
04/23 13:43:22 - mmengine - INFO - Epoch(train) [5][1/1] lr: 1.2500e-03 eta: 0:01:09 time: 0.3577 data_time: 0.1101 memory: 1988 loss: 1.3852
04/23 13:43:22 - mmengine - INFO - Saving checkpoint at 5 epochs
04/23 13:43:23 - mmengine - INFO - Epoch(val) [5][1/1] accuracy/top1: 45.0000 data_time: 0.1013 time: 0.1248
04/23 13:43:23 - mmengine - INFO - Exp name: configured_config_20260423_134311
04/23 13:43:23 - mmengine - INFO - Epoch(train) [6][1/1] lr: 1.2500e-03 eta: 0:01:03 time: 0.3278 data_time: 0.1027 memory: 1988 loss: 1.3782
04/23 13:43:23 - mmengine - INFO - Saving checkpoint at 6 epochs
04/23 13:43:24 - mmengine - INFO - Epoch(val) [6][1/1] accuracy/top1: 60.0000 data_time: 0.1016 time: 0.1252
04/23 13:43:24 - mmengine - INFO - Exp name: configured_config_20260423_134311
04/23 13:43:24 - mmengine - INFO - Epoch(train) [7][1/1] lr: 1.2499e-03 eta: 0:00:59 time: 0.3074 data_time: 0.0976 memory: 1988 loss: 1.3742
04/23 13:43:24 - mmengine - INFO - Saving checkpoint at 7 epochs
04/23 13:43:25 - mmengine - INFO - Epoch(val) [7][1/1] accuracy/top1: 75.0000 data_time: 0.1019 time: 0.1258
04/23 13:43:25 - mmengine - INFO - Exp name: configured_config_20260423_134311
04/23 13:43:25 - mmengine - INFO - Epoch(train) [8][1/1] lr: 1.2497e-03 eta: 0:00:55 time: 0.2903 data_time: 0.0929 memory: 1988 loss: 1.3694
04/23 13:43:25 - mmengine - INFO - Saving checkpoint at 8 epochs
04/23 13:43:26 - mmengine - INFO - Epoch(val) [8][1/1] accuracy/top1: 85.0000 data_time: 0.1013 time: 0.1252
04/23 13:43:27 - mmengine - INFO - Exp name: configured_config_20260423_134311
04/23 13:43:27 - mmengine - INFO - Epoch(train) [9][1/1] lr: 1.2493e-03 eta: 0:00:53 time: 0.2782 data_time: 0.0905 memory: 1988 loss: 1.3624
04/23 13:43:27 - mmengine - INFO - Saving checkpoint at 9 epochs
04/23 13:43:28 - mmengine - INFO - Epoch(val) [9][1/1] accuracy/top1: 85.0000 data_time: 0.0996 time: 0.1232
04/23 13:43:28 - mmengine - INFO - Exp name: configured_config_20260423_134311
04/23 13:43:28 - mmengine - INFO - Epoch(train) [10][1/1] lr: 1.2487e-03 eta: 0:00:50 time: 0.2683 data_time: 0.0883 memory: 1988 loss: 1.3537
04/23 13:43:28 - mmengine - INFO - Saving checkpoint at 10 epochs
04/23 13:43:29 - mmengine - INFO - Epoch(val) [10][1/1] accuracy/top1: 95.0000 data_time: 0.1002 time: 0.1237
04/23 13:43:29 - mmengine - INFO - Exp name: configured_config_20260423_134311
04/23 13:43:29 - mmengine - INFO - Epoch(train) [11][1/1] lr: 1.2480e-03 eta: 0:00:49 time: 0.1805 data_time: 0.0684 memory: 1988 loss: 1.3416
04/23 13:43:29 - mmengine - INFO - Saving checkpoint at 11 epochs
04/23 13:43:30 - mmengine - INFO - Epoch(val) [11][1/1] accuracy/top1: 95.0000 data_time: 0.1004 time: 0.1242
04/23 13:43:30 - mmengine - INFO - Exp name: configured_config_20260423_134311
04/23 13:43:30 - mmengine - INFO - Epoch(train) [12][1/1] lr: 1.2471e-03 eta: 0:00:47 time: 0.1800 data_time: 0.0683 memory: 1988 loss: 1.3262
04/23 13:43:30 - mmengine - INFO - Saving checkpoint at 12 epochs
04/23 13:43:31 - mmengine - INFO - Epoch(val) [12][1/1] accuracy/top1: 100.0000 data_time: 0.1000 time: 0.1238
04/23 13:43:31 - mmengine - INFO - Exp name: configured_config_20260423_134311
04/23 13:43:31 - mmengine - INFO - Epoch(train) [13][1/1] lr: 1.2460e-03 eta: 0:00:46 time: 0.1789 data_time: 0.0670 memory: 1988 loss: 1.3102
04/23 13:43:31 - mmengine - INFO - Saving checkpoint at 13 epochs
04/23 13:43:32 - mmengine - INFO - Epoch(val) [13][1/1] accuracy/top1: 100.0000 data_time: 0.0997 time: 0.1233
04/23 13:43:32 - mmengine - INFO - Exp name: configured_config_20260423_134311
04/23 13:43:32 - mmengine - INFO - Epoch(train) [14][1/1] lr: 1.2448e-03 eta: 0:00:45 time: 0.1788 data_time: 0.0672 memory: 1988 loss: 1.2922
04/23 13:43:32 - mmengine - INFO - Saving checkpoint at 14 epochs
04/23 13:45:11 - mmengine - INFO - Epoch(val) [97][1/1] accuracy/top1: 100.0000 data_time: 0.0981 time: 0.1215
04/23 13:45:11 - mmengine - INFO - Exp name: configured_config_20260423_134311
04/23 13:45:11 - mmengine - INFO - Epoch(train) [98][1/1] lr: 6.8076e-04 eta: 0:00:19 time: 0.1783 data_time: 0.0678 memory: 1988 loss: 0.0436
04/23 13:45:11 - mmengine - INFO - Saving checkpoint at 98 epochs | thread=8485281984
[INFO] 16:45:27.893136 Pipeline run monitoring... (elapsed 418.1s) | thread=8485281984
[INFO] 16:45:27.893270 Pipeline run status: 64001 (JOB_RUNNING) | thread=8485281984
[INFO] 16:45:27.893357 Pipeline run in progress: 64001 (JOB_RUNNING) | thread=8485281984
04/23 13:45:20 - mmengine - INFO - Epoch(train) [105][1/1] lr: 6.1041e-04 eta: 0:00:17 time: 0.1807 data_time: 0.0698 memory: 1988 loss: 0.0565
04/23 13:45:20 - mmengine - INFO - Saving checkpoint at 105 epochs
[INFO] 16:45:39.614099 Pipeline run status: 64001 (JOB_RUNNING) | thread=8485281984
[INFO] 16:45:39.614171 Pipeline run in progress: 64001 (JOB_RUNNING) | thread=8485281984
[INFO] 16:45:51.452550 [LOG] 04/23 13:45:26 - mmengine - INFO - Epoch(val) [110][1/1] accuracy/top1: 100.0000 data_time: 0.0985 time: 0.1219
04/23 13:45:26 - mmengine - INFO - Exp name: configured_config_20260423_134311
04/23 13:45:26 - mmengine - INFO - Epoch(train) [111][1/1] lr: 5.5022e-04 eta: 0:00:16 time: 0.1745 data_time: 0.0643 memory: 1988 loss: 0.0692
04/23 13:45:26 - mmengine - INFO - Saving checkpoint at 111 epochs
04/23 13:45:27 - mmengine - INFO - Epoch(val) [111][1/1] accuracy/top1: 100.0000 data_time: 0.0981 time: 0.1217
04/23 13:45:28 - mmengine - INFO - Exp name: configured_config_20260423_134311
04/23 13:45:28 - mmengine - INFO - Epoch(train) [112][1/1] lr: 5.4025e-04 eta: 0:00:16 time: 0.1747 data_time: 0.0645 memory: 1988 loss: 0.0923
04/23 13:45:28 - mmengine - INFO - Saving checkpoint at 112 epochs
04/23 13:45:37 - mmengine - INFO - Saving checkpoint at 120 epochs
04/23 13:45:38 - mmengine - INFO - Epoch(val) [120][1/1] accuracy/top1: 100.0000 data_time: 0.0973 time: 0.1209
04/23 13:45:38 - mmengine - INFO - Exp name: configured_config_20260423_134311
04/23 13:45:38 - mmengine - INFO - Epoch(train) [121][1/1] lr: 4.5175e-04 eta: 0:00:14 time: 0.1753 data_time: 0.0643 memory: 1988 loss: 0.0491
04/23 13:45:38 - mmengine - INFO - Saving checkpoint at 121 epochs
04/23 13:45:39 - mmengine - INFO - Epoch(val) [121][1/1] accuracy/top1: 100.0000 data_time: 0.0983 time: 0.1218 | thread=8485281984
[INFO] 16:45:51.453265 Pipeline run monitoring... (elapsed 441.6s) | thread=8485281984
[INFO] 16:45:51.453378 Pipeline run status: 64001 (JOB_RUNNING) | thread=8485281984
[INFO] 16:45:51.453460 Pipeline run in progress: 64001 (JOB_RUNNING) | thread=8485281984
04/23 13:45:49 - mmengine - INFO - Exp name: configured_config_20260423_134311
04/23 13:45:49 - mmengine - INFO - Epoch(train) [130][1/1] lr: 3.6691e-04 eta: 0:00:12 time: 0.1765 data_time: 0.0654 memory: 1988 loss: 0.0350
04/23 13:45:49 - mmengine - INFO - Saving checkpoint at 130 epochs
04/23 13:45:50 - mmengine - INFO - Epoch(val) [130][1/1] accuracy/top1: 100.0000 data_time: 0.0978 time: 0.1211
04/23 13:45:50 - mmengine - INFO - Exp name: configured_config_20260423_134311
04/23 13:45:50 - mmengine - INFO - Epoch(train) [131][1/1] lr: 3.5778e-04 eta: 0:00:12 time: 0.1768 data_time: 0.0656 memory: 1988 loss: 0.0352
04/23 13:45:50 - mmengine - INFO - Saving checkpoint at 131 epochs
04/23 13:45:51 - mmengine - INFO - Epoch(val) [131][1/1] accuracy/top1: 100.0000 data_time: 0.0976 time: 0.1211 | thread=8485281984
[INFO] 16:46:03.173041 Pipeline run monitoring... (elapsed 453.3s) | thread=8485281984
[INFO] 16:46:03.173673 Pipeline run status: 64001 (JOB_RUNNING) | thread=8485281984
[INFO] 16:46:03.173812 Pipeline run in progress: 64001 (JOB_RUNNING) | thread=8485281984
04/23 13:45:51 - mmengine - INFO - Exp name: configured_config_20260423_134311
04/23 13:45:51 - mmengine - INFO - Epoch(train) [132][1/1] lr: 3.4873e-04 eta: 0:00:12 time: 0.1771 data_time: 0.0658 memory: 1988 loss: 0.0267
04/23 13:46:03 - mmengine - INFO - Exp name: configured_config_20260423_134311
04/23 13:46:03 - mmengine - INFO - Epoch(train) [142][1/1] lr: 2.6251e-04 eta: 0:00:10 time: 0.1771 data_time: 0.0656 memory: 1988 loss: 0.0305
04/23 13:46:03 - mmengine - INFO - Saving checkpoint at 142 epochs
04/23 13:46:04 - mmengine - INFO - Epoch(val) [142][1/1] accuracy/top1: 100.0000 data_time: 0.0985 time: 0.1219
04/23 13:46:11 - mmengine - INFO - Exp name: configured_config_20260423_134311
04/23 13:46:11 - mmengine - INFO - Epoch(train) [149][1/1] lr: 2.0763e-04 eta: 0:00:09 time: 0.1813 data_time: 0.0694 memory: 1988 loss: 0.0328
04/23 13:46:11 - mmengine - INFO - Saving checkpoint at 149 epochs
04/23 13:46:12 - mmengine - INFO - Epoch(val) [149][1/1] accuracy/top1: 100.0000 data_time: 0.0990 time: 0.1226
04/23 13:46:12 - mmengine - INFO - Exp name: configured_config_20260423_134311
04/23 13:46:12 - mmengine - INFO - Epoch(train) [150][1/1] lr: 2.0021e-04 eta: 0:00:09 time: 0.1797 data_time: 0.0687 memory: 1988 loss: 0.0323
04/23 13:46:12 - mmengine - INFO - Saving checkpoint at 150 epochs | thread=8485281984
[INFO] 16:46:15.002122 Pipeline run monitoring... (elapsed 465.2s) | thread=8485281984
[INFO] 16:46:15.002290 Pipeline run status: 64001 (JOB_RUNNING) | thread=8485281984
[INFO] 16:46:15.002423 Pipeline run in progress: 64001 (JOB_RUNNING) | thread=8485281984
[INFO] 16:46:28.489969 Pipeline run monitoring... (elapsed 478.7s) | thread=8485281984
[INFO] 16:46:28.490477 Pipeline run status: 64001 (JOB_RUNNING) | thread=8485281984
[INFO] 16:46:28.491283 Pipeline run in progress: 64001 (JOB_RUNNING) | thread=8485281984
[INFO] 16:46:41.210377 [LOG] 04/23 13:46:13 - mmengine - INFO - Epoch(val) [150][1/1] accuracy/top1: 100.0000 data_time: 0.0987 time: 0.1223
04/23 13:46:13 - mmengine - INFO - Exp name: configured_config_20260423_134311
04/23 13:46:27 - mmengine - INFO - Epoch(val) [162][1/1] accuracy/top1: 100.0000 data_time: 0.0973 time: 0.1207
04/23 13:46:27 - mmengine - INFO - Exp name: configured_config_20260423_134311
04/23 13:46:27 - mmengine - INFO - Epoch(train) [163][1/1] lr: 1.1442e-04 eta: 0:00:06 time: 0.1773 data_time: 0.0666 memory: 1988 loss: 0.0115
04/23 13:46:38 - mmengine - INFO - Epoch(train) [172][1/1] lr: 6.7929e-05 eta: 0:00:05 time: 0.1765 data_time: 0.0656 memory: 1988 loss: 0.0102
04/23 13:46:38 - mmengine - INFO - Saving checkpoint at 172 epochs
04/23 13:46:39 - mmengine - INFO - Epoch(val) [172][1/1] accuracy/top1: 100.0000 data_time: 0.0995 time: 0.1233
04/23 13:46:39 - mmengine - INFO - Exp name: configured_config_20260423_134311
04/23 13:46:39 - mmengine - INFO - Epoch(train) [173][1/1] lr: 6.3470e-05 eta: 0:00:04 time: 0.1775 data_time: 0.0665 memory: 1988 loss: 0.0099
04/23 13:46:39 - mmengine - INFO - Saving checkpoint at 173 epochs
04/23 13:46:40 - mmengine - INFO - Epoch(val) [173][1/1] accuracy/top1: 100.0000 data_time: 0.0988 time: 0.1225
04/23 13:46:40 - mmengine - INFO - Exp name: configured_config_20260423_134311
04/23 13:46:40 - mmengine - INFO - Epoch(train) [174][1/1] lr: 5.9157e-05 eta: 0:00:04 time: 0.1789 data_time: 0.0677 memory: 1988 loss: 0.0076
04/23 13:46:40 - mmengine - INFO - Saving checkpoint at 174 epochs | thread=8485281984
[INFO] 16:46:56.332102 Pipeline run monitoring... (elapsed 506.5s) | thread=8485281984
[INFO] 16:46:56.332230 Pipeline run status: 64001 (JOB_RUNNING) | thread=8485281984
[INFO] 16:46:56.332298 Pipeline run in progress: 64001 (JOB_RUNNING) | thread=8485281984
[INFO] 16:47:09.173639 [LOG] 04/23 13:46:41 - mmengine - INFO - Epoch(val) [174][1/1] accuracy/top1: 100.0000 data_time: 0.0978 time: 0.1213
04/23 13:46:52 - mmengine - INFO - Epoch(train) [184][1/1] lr: 2.4276e-05 eta: 0:00:02 time: 0.1779 data_time: 0.0668 memory: 1988 loss: 0.0152
04/23 13:46:52 - mmengine - INFO - Saving checkpoint at 184 epochs
04/23 13:46:53 - mmengine - INFO - Epoch(val) [184][1/1] accuracy/top1: 100.0000 data_time: 0.0985 time: 0.1220
04/23 13:46:53 - mmengine - INFO - Exp name: configured_config_20260423_134311
04/23 13:46:53 - mmengine - INFO - Epoch(train) [185][1/1] lr: 2.1633e-05 eta: 0:00:02 time: 0.1775 data_time: 0.0666 memory: 1988 loss: 0.0152
04/23 13:46:53 - mmengine - INFO - Saving checkpoint at 185 epochs | thread=8485281984
[INFO] 16:47:09.176604 Pipeline run monitoring... (elapsed 519.3s) | thread=8485281984
[INFO] 16:47:09.176793 Pipeline run status: 64001 (JOB_RUNNING) | thread=8485281984
[INFO] 16:47:09.176855 Pipeline run in progress: 64001 (JOB_RUNNING) | thread=8485281984
04/23 13:47:07 - mmengine - INFO - Exp name: configured_config_20260423_134311
04/23 13:47:07 - mmengine - INFO - Epoch(train) [197][1/1] lr: 2.2963e-06 eta: 0:00:00 time: 0.1762 data_time: 0.0662 memory: 1988 loss: 0.0076
04/23 13:47:07 - mmengine - INFO - Saving checkpoint at 197 epochs | thread=8485281984
[INFO] 16:47:26.625844 Pipeline run monitoring... (elapsed 536.8s) | thread=8485281984
[INFO] 16:47:26.625993 Pipeline run status: 64001 (JOB_RUNNING) | thread=8485281984
[INFO] 16:47:26.626169 Pipeline run in progress: 64001 (JOB_RUNNING) | thread=8485281984
[INFO] 16:47:41.867744 [LOG] 04/23 13:47:08 - mmengine - INFO - Epoch(val) [197][1/1] accuracy/top1: 100.0000 data_time: 0.0991 time: 0.1226
04/23 13:47:08 - mmengine - INFO - Exp name: configured_config_20260423_134311
04/23 13:47:08 - mmengine - INFO - Epoch(train) [198][1/1] lr: 1.7293e-06 eta: 0:00:00 time: 0.1761 data_time: 0.0661 memory: 1988 loss: 0.0073
04/23 13:47:08 - mmengine - INFO - Saving checkpoint at 198 epochs
04/23 13:47:09 - mmengine - INFO - Epoch(val) [198][1/1] accuracy/top1: 100.0000 data_time: 0.0988 time: 0.1223
04/23 13:47:09 - mmengine - INFO - Exp name: configured_config_20260423_134311
04/23 13:47:09 - mmengine - INFO - Epoch(train) [199][1/1] lr: 1.3242e-06 eta: 0:00:00 time: 0.1756 data_time: 0.0654 memory: 1988 loss: 0.0077
04/23 13:47:09 - mmengine - INFO - Saving checkpoint at 199 epochs
04/23 13:47:10 - mmengine - INFO - Epoch(val) [199][1/1] accuracy/top1: 100.0000 data_time: 0.0992 time: 0.1227
04/23 13:47:11 - mmengine - INFO - Exp name: configured_config_20260423_134311
04/23 13:47:11 - mmengine - INFO - Epoch(train) [200][1/1] lr: 1.0810e-06 eta: 0:00:00 time: 0.1769 data_time: 0.0668 memory: 1988 loss: 0.0077
04/23 13:47:11 - mmengine - INFO - Saving checkpoint at 200 epochs
04/23 13:47:12 - mmengine - INFO - Epoch(val) [200][1/1] accuracy/top1: 100.0000 data_time: 0.0993 time: 0.1227
INFO:root:Training completed
INFO:root:GPU memory reserved before cleanup: 2130.00 MB (matches nvidia-smi)
INFO:root:GPU memory reserved after cleanup: 390.00 MB (matches nvidia-smi)
INFO:root:GPU memory freed: 1740.00 MB
INFO:root:Checkpoint found: /tmp/mmpretrain_work_dir/epoch_200.pth
INFO:root:Training completed. Checkpoint: /tmp/mmpretrain_work_dir/epoch_200.pth
INFO:root:Config: /tmp/mmpretrain_work_dir/configured_config.py
INFO:root:
INFO:root:================================================================================
INFO:root:STEP 6.5: Benchmarking Model for GPU Requirements
INFO:root:================================================================================
INFO:root:Benchmarking with input shape: (3, 224, 224)
INFO:clarifai:✓ CUDA enabled for benchmark: NVIDIA A10G
INFO:clarifai:With 20% buffer: 835 MB → 1 GB
INFO:clarifai:✅ Updated ****.yaml with GPU requirement: 1Gi
INFO:root:✅ Benchmark complete and config.yaml updated
INFO:root:
INFO:root:================================================================================
INFO:root:STEP 7: Exporting and Uploading Model to Clarifai
INFO:root:================================================================================
INFO:clarifai:================================================================================
INFO:clarifai:MODEL EXPORT: Starting classifier model export and upload
INFO:clarifai:================================================================================
INFO:clarifai:✅ Found trained weights: /tmp/mmpretrain_work_dir/epoch_200.pth
INFO:clarifai:📋 Number of classes: 4
INFO:clarifai:📋 Classes: ['beignets', 'hamburger', 'prime_rib', 'ramen']
INFO:clarifai:📦 Preparing model package for upload...
INFO:clarifai:📁 Creating model package in: trained_model_temp/classifier_model
INFO:clarifai:📦 Copying weights to: trained_model_temp/classifier_model/1/model_files/checkpoint.pth
INFO:clarifai:✅ Weights copied successfully
INFO:clarifai:Uploading file: trained_model_temp/classifier_model/1/model_files/checkpoint.pth (269.6 MB)
INFO:clarifai:Artifact test_model_checkpoint exists
Loads checkpoint by local backend from path: /tmp/mmpretrain_work_dir/epoch_200.pth
Uploading:   0%|          | 0.00/283M [00:00<?, ?B/s]
INFO:clarifai:Uploading artifact content...
INFO:clarifai:Upload complete!
INFO:clarifai:Created artifact version: ****0fd8e2fbbd14 | thread=8485281984
[INFO] 16:47:41.868790 Pipeline run monitoring... (elapsed 552.0s) | thread=8485281984
[INFO] 16:47:41.868951 Pipeline run status: 64001 (JOB_RUNNING) | thread=8485281984
[INFO] 16:47:41.869080 Pipeline run in progress: 64001 (JOB_RUNNING) | thread=8485281984
Uploading: 100%|██████████| 283M/283M [00:13<00:00, 20.3MB/s]
INFO:clarifai:Upload completed successfully: ****0fd8e2fbbd14
INFO:clarifai:📦 Artifact version: ****0fd8e2fbbd14
INFO:clarifai:📦 Copying config.py to: trained_model_temp/classifier_model/1/model_files/config.py
INFO:clarifai:✅ Config.py copied successfully