
Datasets

Manage the datasets for training, testing, and evaluating your models


A dataset is a collection of data examples you can use to train, test, and evaluate your machine learning model. Clarifai datasets let you manage the data you want to use for visual search, training, and evaluation.

Datasets are stored as snapshots in dataset tables, and they play a critical role in determining the performance of your models.

The quality and quantity of the data in a dataset can significantly affect the accuracy and robustness of the resulting machine learning model. It is therefore essential to select a dataset that is relevant to your task and large enough to cover it.

You can add different types of datasets on the Clarifai portal, including:

  • Training dataset: the data you use to initially train a model. It comprises a set of annotated examples, where the annotations are the output values the model is expected to predict.
  • Validation dataset: the data you use to tune your model's hyperparameters and assess its performance during training. It comprises annotated examples that are held out from training and used only to gauge the model's performance.
  • Testing dataset: the data you use to assess the final performance of your trained model. It comprises annotated examples that are used for neither training nor validation.
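To illustrate how the three roles relate, here is a minimal sketch in plain Python (not the Clarifai SDK or portal) that shuffles a list of annotated examples and splits them into disjoint training, validation, and testing sets. The function name `split_dataset`, the 80/10/10 default fractions, and the `(input_id, label)` example format are illustrative assumptions.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_dataset(
    examples: Sequence[T],
    train_frac: float = 0.8,
    val_frac: float = 0.1,
    seed: int = 42,
) -> Tuple[List[T], List[T], List[T]]:
    """Shuffle annotated examples and split them into disjoint train/validation/test lists.

    Hypothetical helper for illustration; fractions are assumptions, not Clarifai defaults.
    """
    if train_frac < 0 or val_frac < 0 or train_frac + val_frac > 1:
        raise ValueError("fractions must be non-negative and sum to at most 1")
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # whatever remains becomes the test set
    return train, val, test


# Hypothetical annotated examples: (input_id, label) pairs.
examples = [(f"img_{i}", "cat" if i % 2 else "dog") for i in range(100)]
train, val, test = split_dataset(examples)
print(len(train), len(val), len(test))  # 80 10 10
```

Because the three lists are slices of one shuffled list, no example can appear in more than one set, which is what keeps the validation and testing scores independent of the training data.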