
YAML-based Examples

Simple examples of workflows defined in YAML


Assorted Examples

| Node Name | Input & Output | Description |
|---|---|---|
| audio-to-text | Audio -> Text | Transcribe an audio signal into a string of text. |
| barcode-operator | Image -> Text | Operator that detects and recognizes barcodes in the image, assigning a region with the barcode text for each detection. Supports EAN/UPC, Code 128, Code 39, Interleaved 2 of 5, and QR Code. |
| Centroid Tracker | Frames -> Track ID | Centroid trackers rely on the Euclidean distance between centroids of regions in different video frames to assign the same track ID to detections of the same object. |
| Clusterer | Embeddings -> Clusters | Cluster semantically similar images and video frames together in embedding space. This is the basis for good visual search within your app at scale, or for grouping your data together without the need for annotated concepts. |
| embeddings-classifier | Embeddings -> Concepts | Classify images or texts based on the embedding model that has indexed them in your app. Transfer learning leverages feature representations from a model pre-trained on massive amounts of data, so you don't have to train a new model from scratch and can learn new concepts quickly with minimal training data. |
| image-color-recognizer | Image -> Colors | Recognize standard color formats and the proportion of an image that each color covers. |
| image-to-text | Image -> Text | Takes in cropped regions containing text and returns the text it sees. |
| kalman-filter-tracker | Frames -> Track ID | Kalman filter trackers rely on the Kalman filter algorithm to estimate the next position of an object based on its position and velocity in previous frames. Detections are then matched to predictions using the Hungarian algorithm. |
| kalman-reid-tracker | Frames -> Track ID | A Kalman filter tracker that expects the Embedding proto field to be populated for detections, and reassigns track IDs based on embedding distance. |
| neural-lite-tracker | Frames -> Track ID | Uses lightweight trainable graphical models to infer the states of tracks and perform associations using a hybrid similarity of IoU and centroid distance. |
| neural-tracker | Frames -> Track ID | Uses neural probabilistic models to perform filtering and association. |
| optical-character-recognizer | Image -> Text | Detect bounding box regions in images or video frames where text is present, then output the text read along with a score. |
| tesseract-operator | Image -> Text | Operator for Optical Character Recognition using the Tesseract libraries. |
| text-classifier | Text -> Concepts | Classify text into a set of concepts. |
| text-embedder | Text -> Embeddings | Embed text into a vector representing a high-level understanding from our AI models. These embeddings enable similarity search and training on top of them. |
| text-token-classifier | Text -> Concepts | Classify tokens from a set of entity classes. |
| visual-classifier | Image -> Concepts | Classify images and video frames into a set of concepts. |
| visual-detector | Image -> Bounding Box | Detect bounding box regions in images or video frames where things are, then classify the objects, descriptive words, or topics within the boxes. |
| visual-embedder | Image -> Embeddings | Embed images and video frames into a vector representing a high-level understanding from our AI models. These embeddings enable visual search and training on top of them. |
| visual-segmenter | Image -> Concepts | Segment a per-pixel mask in images where things are, then classify the objects, descriptive words, or topics within the masks. |
| concept-thresholder | Concepts -> Concepts | Threshold input concepts according to both a threshold and an operator (>, >=, =, <=, or <). For example, if the ">" threshold type is set for the model and an input concept's value is greater than that concept's threshold, the concept is output by this model; otherwise it is not. |
| random-sample | Any -> Any | Randomly sample, allowing the input to pass to the output. This is done with the condition `keep_fraction > rand()`, where `keep_fraction` is the fraction to allow through on average (see the sketch after this table). |
| region-thresholder | Concepts -> Concepts | Threshold regions based on the concepts they contain, using a threshold per concept and an overall operator (>, >=, =, <=, or <). For example, if the ">" threshold type is set for the model and the input `regions[...].data.concepts.value` is greater than that concept's threshold, the concept is output by this model; otherwise it is not. If the entire list of concepts at `regions[...].data.concepts` is filtered out, the overall region is also removed. |
| byte-tracker | Frames -> Track ID | Uses the byte tracking algorithm for tracking objects. |
| concept-synonym-mapper | Concept -> Concept | Map input concepts to output concepts by following synonym concept relations in the knowledge graph of your app. |
| image-align | Image -> Image | Aligns images using keypoints. |
| image-crop | Image -> Image | Crop the input image according to each input region present in the input. When used in a workflow, this model can look back along the workflow graph to find the input image if the preceding model does not output an image itself, so you can easily build image -> detector -> cropper workflows. |
| image-tiling-operator | Image -> Image | Operator for tiling images into a fixed number of equal-sized images. |
| image-to-image | Image -> Image | Given an image, apply a transformation on the input and return the post-processed image as output. |
| input-filter | Any -> Any | If the input going through this model does not match what we are filtering for, it is not passed on in the workflow branch. |
| input-searcher | Concepts, Images, Text -> Hits | Triggers a visual search in another app based on the model configs if concept(s) are found in images, and returns the matched search hits as regions. |
| keyword-filter-operator | Text -> Concepts | This operator is initialized with a set of words, and then determines which of them are found in the input text. |
| language-id-operator | Text -> Concepts | Operator for language identification using the langdetect library. |
| multimodal-embedder | Any -> Embeddings | Embed text or images into a vector representing a high-level understanding from our AI models, e.g. CLIP. These embeddings enable similarity search and training on top of them. |
| multimodal-to-text | Any -> Text | Generate text from text, images, or both as input, allowing it to understand and respond to questions about those images. |
| prompter | Text -> Text | Prompt template where input text is inserted into placeholders marked with `{data.text.raw}`. |
| rag-prompter | Text -> Text | A prompt template that performs a semantic search in the app with the incoming text. The input text is inserted into placeholders marked with `{data.text.raw}`, and search results are inserted into placeholders marked with `{data.hits}`, newline-separated. |
| regex-based-classifier | Text -> Concepts | Classifies text using regex. If the regex matches, the text is classified as the provided concepts. |
| text-to-audio | Text -> Audio | Given text input, this model produces an audio file containing the spoken version of the input. |
| text-to-image | Text -> Image | Takes in a prompt and generates an image. |
| tiling-region-aggregator-operator | Frames -> Concepts, Bounding Box | Operator to be used as a follow-up to the image-tiling-operator and a visual detector. It transforms the detections on each of the tiles back to the original image and performs non-maximum suppression. Only the top class prediction for each box is considered. |
| visual-keypointer | Image -> Keypoints | This model detects keypoints in images or video frames. |
| isolation-operator | Concepts, Bounding Box -> Concepts, Bounding Box | Operator that computes the distance between detections and assigns an isolation label. |
| object-counter | Concepts -> Metadata | Count the number of regions that match this model's active concepts, frame by frame. |
| text-aggregation-operator | Text -> Text | Operator that combines text detections into a text body for the whole image. Detections are sorted left to right first, then top to bottom, using the top-left corner of the bounding box as reference. |
| tokens-to-entity-operator | Text, Concepts -> Text, Concepts | Operator that combines text tokens into entities, e.g. 'New' + 'York' -> 'New York'. |
| annotation-writer | Any -> Any | Write the input data to the database as an annotation with a specified status, as if a specific user had created the annotation. |
| aws-lambda | Any -> Any | This model sends data to an AWS Lambda function so you can implement arbitrary logic to be handled within a model predict or workflow. The request our API sends is a PostModelOutputsRequest in the 'request' field, and the response we expect is a MultiOutputResponse in the 'response' field. |
| email | Any -> Any | Email alert model that sends an email if any data fields are input to this model. |
| results-push | Any -> Any | This model pushes Clarifai prediction results in an external format. |
| sms | Any -> Any | SMS alert model that sends an SMS if any data fields are input to this model. |
| status-push | Any -> Any | This model pushes the processing status of a batch of inputs ingested through the vendor/inputs endpoint in one request. |
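
Any of these nodes can be declared in a workflow YAML file using the same structure as the full examples below. As a minimal sketch of configuring an operator with parameters, the following chains a detector into the random-sample operator. The detector `model_id` is a hypothetical placeholder for illustration; `keep_fraction` is the parameter named in the operator's description above.

```yaml
workflow:
  id: detect-then-sample
  nodes:
    - id: detector
      model:
        model_id: general-image-detection  # hypothetical model_id for illustration

    - id: sampler
      model:
        model_id: random-sample
        model_type_id: random-sample
        output_info:
          params:
            keep_fraction: 0.5  # on average, pass ~50% of inputs downstream
      node_inputs:
        - node_id: detector
```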

ASR Sentiment

Automatic Speech Recognition (ASR) sentiment analysis is the process of detecting the emotional tone or sentiment in spoken language by first transcribing speech using an ASR model and then analyzing the resulting text.

```yaml
workflow:
  id: asr-sentiment
  nodes:
    - id: audio-speech-recognition
      model:
        model_id: asr-wav2vec2-large-robust-ft-swbd-300h-english
        user_id: facebook
        app_id: asr

    - id: text-sentiment-classification
      model:
        model_id: sentiment-analysis-twitter-roberta-base
        user_id: erfan
        app_id: text-classification
      node_inputs:
        - node_id: audio-speech-recognition
```

Demographics

This is a multi-model workflow designed to detect faces, crop them, and recognize key demographic characteristics. It visually classifies attributes such as age, gender, and cultural appearance.

```yaml
workflow:
  id: Demographics
  nodes:
    - id: detect-concept
      model:
        model_id: face-detection
        model_version_id: 45fb9a671625463fa646c3523a3087d5

    - id: image-crop
      model:
        model_id: margin-110-image-crop
        model_version_id: b9987421b40a46649566826ef9325303
      node_inputs:
        - node_id: detect-concept

    - id: demographics-race
      model:
        model_id: ethnicity-demographics-recognition
        model_version_id: b2897edbda314615856039fb0c489796
      node_inputs:
        - node_id: image-crop

    - id: demographics-gender
      model:
        model_id: gender-demographics-recognition
        model_version_id: ff83d5baac004aafbe6b372ffa6f8227
      node_inputs:
        - node_id: image-crop

    - id: demographics-age
      model:
        model_id: age-demographics-recognition
        model_version_id: fb9f10339ac14e23b8e960e74984401b
      node_inputs:
        - node_id: image-crop
```

Face Search

A workflow that combines face detection, recognition, and embedding to generate facial landmarks and enable visual search based on the embeddings of detected faces.

```yaml
workflow:
  id: Face-Search
  nodes:
    - id: face-detect
      model:
        model_id: face-detection
        model_version_id: fe995da8cb73490f8556416ecf25cea3

    - id: crop
      model:
        model_id: margin-100-image-crop
        model_version_id: 0af5cd8ad40e43ef92154e4f4bc76bef
      node_inputs:
        - node_id: face-detect

    - id: face-landmarks
      model:
        model_id: face-landmarks
        model_version_id: 98ace9ca45e64339be94b06011557e2a
      node_inputs:
        - node_id: crop

    - id: face-alignment
      model:
        model_id: landmarks-align
        model_version_id: 4bc8b83a327247829ec638c78cde5f8b
      node_inputs:
        - node_id: face-landmarks

    - id: face-embed
      model:
        model_id: face-identification-transfer-learn
        model_version_id: fc3b8814fbe54533a3d80a1896dc9884
      node_inputs:
        - node_id: face-alignment

    - id: face-cluster
      model:
        model_id: face-clustering
        model_version_id: 621d74074a5443d7ad9dc1503fba9ff0
      node_inputs:
        - node_id: face-embed
```

Face Sentiment

A multi-model workflow that combines face detection with sentiment classification to recognize seven emotional expressions: anger, disgust, fear, neutral, happiness, sadness, and contempt.

```yaml
workflow:
  id: Face-Sentiment
  nodes:
    - id: face-det
      model:
        model_id: face-detection
        model_version_id: 6dc7e46bc9124c5c8824be4822abe105

    - id: margin-110
      model:
        model_id: margin-110-image-crop
        model_version_id: b9987421b40a46649566826ef9325303
      node_inputs:
        - node_id: face-det

    - id: face-sentiment
      model:
        model_id: face-sentiment-recognition
        model_version_id: a5d7776f0c064a41b48c3ce039049f65
      node_inputs:
        - node_id: margin-110
```

General

A general-purpose image detection workflow that identifies a wide range of common objects and enables visual search using embeddings generated from the detected regions.

```yaml
workflow:
  id: General
  nodes:
    - id: general-v1.5-concept
      model:
        model_id: aaa03c23b3724a16a56b629203edc62c
        model_version_id: aa7f35c01e0642fda5cf400f543e7c40

    - id: general-v1.5-embed
      model:
        model_id: bbb5f41425b8468d9b7a554ff10f8581
        model_version_id: bb186755eda04f9cbb6fe32e816be104

    - id: general-v1.5-cluster
      model:
        model_id: cccbe437d6e54e2bb911c6aa292fb072
        model_version_id: cc2074cff6dc4c02b6f4e1b8606dcb54
      node_inputs:
        - node_id: general-v1.5-embed
```

Language Aware OCR

A workflow that performs Optical Character Recognition (OCR) across multiple languages, automatically adapting to the language present in the input text.

```yaml
workflow:
  id: wf-ocr
  nodes:
    - id: ocr-workflow
      model:
        model_id: language-aware-multilingual-ocr-multiplex

    - id: text-aggregator
      model:
        model_id: text-aggregation
        model_type_id: text-aggregation-operator
        output_info:
          params:
            avg_word_width_window_factor: 2.0
            avg_word_height_window_factor: 1.0
      node_inputs:
        - node_id: ocr-workflow

    - id: language-id-operator
      model:
        model_id: language-id
        model_type_id: language-id-operator
        output_info:
          params:
            library: "fasttext"
            topk: 1
            threshold: 0.1
            lowercase: true
      node_inputs:
        - node_id: text-aggregator
```

Prompter LLM

A workflow that utilizes a prompt template to interact with a Large Language Model (LLM), enabling dynamic and context-aware text generation based on input data.

```yaml
workflow:
  id: wf-prompter-llm
  nodes:
    - id: prompter
      model:
        model_id: prompter
        model_type_id: prompter
        description: 'Prompter Model'
        output_info:
          params:
            prompt_template: 'Classify sentiment between positive and negative for the text {data.text.raw}'

    - id: llm
      model:
        user_id: mistralai
        model_id: mistral-7B-Instruct
        app_id: completion
      node_inputs:
        - node_id: prompter
```

RAG Prompter LLM

This workflow combines a Large Language Model (LLM) with a Retrieval-Augmented Generation (RAG) prompter template to generate responses informed by relevant external knowledge.

```yaml
workflow:
  id: wf-rag-prompter-llm
  nodes:
    - id: rag-prompter
      model:
        model_id: rag-prompter
        model_type_id: rag-prompter
        description: 'RAG Prompter Model'

    - id: llm
      model:
        user_id: mistralai
        model_id: mistral-7B-Instruct
        app_id: completion
      node_inputs:
        - node_id: rag-prompter
```
tip

Click here to view more YAML-based workflow examples.