
YAML-based Examples

Simple examples of workflows defined in YAML


Assorted Examples

| Node Name | Input & Output | Description |
|---|---|---|
| audio-to-text | Audio -> Text | Transcribe an audio signal into a string of text. |
| barcode-operator | Image -> Text | Operator that detects and recognizes barcodes in the image, assigning a region with the barcode text for each detection. Supports EAN/UPC, Code 128, Code 39, Interleaved 2 of 5, and QR Code. |
| Centroid Tracker | Frames -> Track ID | Centroid trackers rely on the Euclidean distance between centroids of regions in different video frames to assign the same track ID to detections of the same object. |
| Clusterer | Embeddings -> Clusters | Cluster semantically similar images and video frames together in embedding space. This is the basis for good visual search within your app at scale, or for grouping your data together without the need for annotated concepts. |
| embeddings-classifier | Embeddings -> Concepts | Classify images or texts based on the embedding model that has indexed them in your app. Transfer learning leverages feature representations from a model pre-trained on massive amounts of data, so you don't have to train a new model from scratch and can learn new concepts quickly with minimal training data. |
| image-color-recognizer | Image -> Colors | Recognize standard color formats and the proportion of an image that each color covers. |
| image-to-text | Image -> Text | Takes in cropped regions containing text and returns the text it sees. |
| kalman-filter-tracker | Frames -> Track ID | Kalman filter trackers rely on the Kalman filter algorithm to estimate the next position of an object based on its position and velocity in previous frames. Detections are then matched to predictions using the Hungarian algorithm. |
| kalman-reid-tracker | Frames -> Track ID | A Kalman filter tracker that expects the Embedding proto field to be populated for detections, and reassigns track IDs based on embedding distance. |
| neural-lite-tracker | Frames -> Track ID | Uses lightweight trainable graphical models to infer the states of tracks and perform associations using a hybrid similarity of IoU and centroid distance. |
| neural-tracker | Frames -> Track ID | Uses neural probabilistic models to perform filtering and association. |
| optical-character-recognizer | Image -> Text | Detect bounding box regions in images or video frames where text is present, then output the text read along with a score. |
| tesseract-operator | Image -> Text | Operator for Optical Character Recognition using the Tesseract libraries. |
| text-classifier | Text -> Concepts | Classify text into a set of concepts. |
| text-embedder | Text -> Embeddings | Embed text into a vector representing a high-level understanding from our AI models. These embeddings enable similarity search and training on top of them. |
| text-token-classifier | Text -> Concepts | Classify tokens from a set of entity classes. |
| visual-classifier | Image -> Concepts | Classify images and video frames into a set of concepts. |
| visual-detector | Image -> Bounding Box | Detect bounding box regions in images or video frames where things are, then classify the objects, descriptive words, or topics within the boxes. |
| visual-embedder | Image -> Embeddings | Embed images and video frames into a vector representing a high-level understanding from our AI models. These embeddings enable visual search and training on top of them. |
| visual-segmenter | Image -> Concepts | Segment a per-pixel mask in images where things are, then classify the objects, descriptive words, or topics within the masks. |
| concept-thresholder | Concepts -> Concepts | Threshold input concepts according to both a threshold and an operator (>, >=, =, <=, or <). For example, if the ">" threshold type is set for the model and an input concept's value is greater than that concept's threshold, the concept is output by this model; otherwise it is not. |
| random-sample | Any -> Any | Randomly sample, allowing the input to pass to the output. This is done with the condition `keep_fraction > rand()`, where `keep_fraction` is the fraction to allow through on average (see the sketch after this table). |
| region-thresholder | Concepts -> Concepts | Threshold regions based on the concepts they contain, using a threshold per concept and an overall operator (>, >=, =, <=, or <). For example, if the ">" threshold type is set for the model and the input `regions[...].data.concepts.value` is greater than that concept's threshold, the concept is output by this model; otherwise it is not. If the entire list of concepts at `regions[...].data.concepts` is filtered out, the overall region is also removed. |
| byte-tracker | Frames -> Track ID | Uses the byte tracking algorithm for tracking objects. |
| concept-synonym-mapper | Concept -> Concept | Map input concepts to output concepts by following synonym concept relations in the knowledge graph of your app. |
| image-align | Image -> Image | Aligns images using keypoints. |
| image-crop | Image -> Image | Crop the input image according to each input region present in the input. When used in a workflow, this model can look back along the workflow graph to find the input image if the preceding model does not output an image itself, so you can easily build image -> detector -> cropper workflows. |
| image-tiling-operator | Image -> Image | Operator for tiling images into a fixed number of equal-sized images. |
| image-to-image | Image -> Image | Given an image, apply a transformation on the input and return the post-processed image as output. |
| input-filter | Any -> Any | If the input going through this model does not match what we are filtering for, it is not passed on in the workflow branch. |
| input-searcher | Concepts, Images, Text -> Hits | Triggers a visual search in another app based on the model configs if concept(s) are found in images, and returns the matched search hits as regions. |
| keyword-filter-operator | Text -> Concepts | This operator is initialized with a set of words, and then determines which of them are found in the input text. |
| language-id-operator | Text -> Concepts | Operator for language identification using the langdetect library. |
| multimodal-embedder | Any -> Embeddings | Embed text or images into a vector representing a high-level understanding from our AI models, e.g. CLIP. These embeddings enable similarity search and training on top of them. |
| multimodal-to-text | Any -> Text | Generate text from text, images, or both as input, allowing it to understand and respond to questions about those images. |
| prompter | Text -> Text | Prompt template where input text is inserted into placeholders marked with `{data.text.raw}`. |
| rag-prompter | Text -> Text | A prompt template that performs a semantic search in the app with the incoming text. The input text is inserted into placeholders marked with `{data.text.raw}`, and search results are inserted into placeholders marked with `{data.hits}`, newline-separated. |
| regex-based-classifier | Text -> Concepts | Classifies text using regex. If the regex matches, the text is classified as the provided concepts. |
| text-to-audio | Text -> Audio | Given text input, this model produces an audio file containing the spoken version of the input. |
| text-to-image | Text -> Image | Takes in a prompt and generates an image. |
| tiling-region-aggregator-operator | Frames -> Concepts, Bounding Box | Operator to be used as a follow-up to the image-tiling-operator and a visual detector. It transforms the detections on each of the tiles back to the original image and performs non-maximum suppression. Only the top class prediction for each box is considered. |
| visual-keypointer | Image -> Keypoints | This model detects keypoints in images or video frames. |
| isolation-operator | Concepts, Bounding Box -> Concepts, Bounding Box | Operator that computes the distance between detections and assigns an isolation label. |
| object-counter | Concepts -> Metadata | Count the number of regions that match this model's active concepts, frame by frame. |
| text-aggregation-operator | Text -> Text | Operator that combines text detections into a text body for the whole image. Detections are sorted left to right first, then top to bottom, using the top-left corner of the bounding box as reference. |
| tokens-to-entity-operator | Text, Concepts -> Text, Concepts | Operator that combines text tokens into entities, e.g. 'New' + 'York' -> 'New York'. |
| annotation-writer | Any -> Any | Write the input data to the database as an annotation with a specified status, as if a specific user had created the annotation. |
| aws-lambda | Any -> Any | This model sends data to an AWS Lambda function so you can implement arbitrary logic to be handled within a model predict or workflow. The request our API sends is a PostModelOutputsRequest in the 'request' field, and the response we expect is a MultiOutputResponse in the 'response' field. |
| email | Any -> Any | Email alert model that sends an email if any data fields are input to this model. |
| results-push | Any -> Any | This model pushes Clarifai prediction results in an external format. |
| sms | Any -> Any | SMS alert model that sends an SMS if any data fields are input to this model. |
| status-push | Any -> Any | This model pushes the processing status of a batch of inputs ingested through the vendor/inputs endpoint in one request. |
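
Any of these nodes can be declared in a workflow YAML file using the same structure as the full examples below. As a minimal sketch of configuring an operator with parameters, the following chains a detector into the random-sample operator. The detector `model_id` is a hypothetical placeholder for illustration; `keep_fraction` is the parameter named in the operator's description above.

```yaml
workflow:
  id: detect-then-sample
  nodes:
    - id: detector
      model:
        model_id: general-image-detection  # hypothetical model_id for illustration

    - id: sampler
      model:
        model_id: random-sample
        model_type_id: random-sample
        output_info:
          params:
            keep_fraction: 0.5  # on average, pass ~50% of inputs downstream
      node_inputs:
        - node_id: detector
```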

ASR Sentiment

Automatic Speech Recognition (ASR) sentiment analysis is the process of detecting the emotional tone or sentiment in spoken language by first transcribing speech using an ASR model and then analyzing the resulting text.

```yaml
workflow:
  id: asr-sentiment
  nodes:
    - id: audio-speech-recognition
      model:
        model_id: asr-wav2vec2-large-robust-ft-swbd-300h-english
        user_id: facebook
        app_id: asr

    - id: text-sentiment-classification
      model:
        model_id: sentiment-analysis-twitter-roberta-base
        user_id: erfan
        app_id: text-classification
      node_inputs:
        - node_id: audio-speech-recognition
```

Demographics

This is a multi-model workflow designed to detect faces, crop them, and recognize key demographic characteristics. It visually classifies attributes such as age, gender, and cultural appearance.

```yaml
workflow:
  id: Demographics
  nodes:
    - id: detect-concept
      model:
        model_id: face-detection
        model_version_id: 45fb9a671625463fa646c3523a3087d5

    - id: image-crop
      model:
        model_id: margin-110-image-crop
        model_version_id: b9987421b40a46649566826ef9325303
      node_inputs:
        - node_id: detect-concept

    - id: demographics-race
      model:
        model_id: ethnicity-demographics-recognition
        model_version_id: b2897edbda314615856039fb0c489796
      node_inputs:
        - node_id: image-crop

    - id: demographics-gender
      model:
        model_id: gender-demographics-recognition
        model_version_id: ff83d5baac004aafbe6b372ffa6f8227
      node_inputs:
        - node_id: image-crop

    - id: demographics-age
      model:
        model_id: age-demographics-recognition
        model_version_id: fb9f10339ac14e23b8e960e74984401b
      node_inputs:
        - node_id: image-crop
```

Face Search

A workflow that combines face detection, recognition, and embedding to generate facial landmarks and enable visual search based on the embeddings of detected faces.

```yaml
workflow:
  id: Face-Search
  nodes:
    - id: face-detect
      model:
        model_id: face-detection
        model_version_id: fe995da8cb73490f8556416ecf25cea3

    - id: crop
      model:
        model_id: margin-100-image-crop
        model_version_id: 0af5cd8ad40e43ef92154e4f4bc76bef
      node_inputs:
        - node_id: face-detect

    - id: face-landmarks
      model:
        model_id: face-landmarks
        model_version_id: 98ace9ca45e64339be94b06011557e2a
      node_inputs:
        - node_id: crop

    - id: face-alignment
      model:
        model_id: landmarks-align
        model_version_id: 4bc8b83a327247829ec638c78cde5f8b
      node_inputs:
        - node_id: face-landmarks

    - id: face-embed
      model:
        model_id: face-identification-transfer-learn
        model_version_id: fc3b8814fbe54533a3d80a1896dc9884
      node_inputs:
        - node_id: face-alignment

    - id: face-cluster
      model:
        model_id: face-clustering
        model_version_id: 621d74074a5443d7ad9dc1503fba9ff0
      node_inputs:
        - node_id: face-embed
```

Face Sentiment

A multi-model workflow that combines face detection with sentiment classification to recognize seven emotional expressions: anger, disgust, fear, neutral, happiness, sadness, and contempt.

```yaml
workflow:
  id: Face-Sentiment
  nodes:
    - id: face-det
      model:
        model_id: face-detection
        model_version_id: 6dc7e46bc9124c5c8824be4822abe105

    - id: margin-110
      model:
        model_id: margin-110-image-crop
        model_version_id: b9987421b40a46649566826ef9325303
      node_inputs:
        - node_id: face-det

    - id: face-sentiment
      model:
        model_id: face-sentiment-recognition
        model_version_id: a5d7776f0c064a41b48c3ce039049f65
      node_inputs:
        - node_id: margin-110
```

General

A general-purpose image detection workflow that identifies a wide range of common objects and enables visual search using embeddings generated from the detected regions.

```yaml
workflow:
  id: General
  nodes:
    - id: general-v1.5-concept
      model:
        model_id: aaa03c23b3724a16a56b629203edc62c
        model_version_id: aa7f35c01e0642fda5cf400f543e7c40

    - id: general-v1.5-embed
      model:
        model_id: bbb5f41425b8468d9b7a554ff10f8581
        model_version_id: bb186755eda04f9cbb6fe32e816be104

    - id: general-v1.5-cluster
      model:
        model_id: cccbe437d6e54e2bb911c6aa292fb072
        model_version_id: cc2074cff6dc4c02b6f4e1b8606dcb54
      node_inputs:
        - node_id: general-v1.5-embed
```

Language Aware OCR

A workflow that performs Optical Character Recognition (OCR) across multiple languages, automatically adapting to the language present in the input text.

```yaml
workflow:
  id: wf-ocr
  nodes:
    - id: ocr-workflow
      model:
        model_id: language-aware-multilingual-ocr-multiplex

    - id: text-aggregator
      model:
        model_id: text-aggregation
        model_type_id: text-aggregation-operator
        output_info:
          params:
            avg_word_width_window_factor: 2.0
            avg_word_height_window_factor: 1.0
      node_inputs:
        - node_id: ocr-workflow

    - id: language-id-operator
      model:
        model_id: language-id
        model_type_id: language-id-operator
        output_info:
          params:
            library: "fasttext"
            topk: 1
            threshold: 0.1
            lowercase: true
      node_inputs:
        - node_id: text-aggregator
```

Prompter LLM

A workflow that utilizes a prompt template to interact with a Large Language Model (LLM), enabling dynamic and context-aware text generation based on input data.

```yaml
workflow:
  id: wf-prompter-llm
  nodes:
    - id: prompter
      model:
        model_id: prompter
        model_type_id: prompter
        description: 'Prompter Model'
        output_info:
          params:
            prompt_template: 'Classify sentiment between positive and negative for the text {data.text.raw}'

    - id: llm
      model:
        user_id: mistralai
        model_id: mistral-7B-Instruct
        app_id: completion
      node_inputs:
        - node_id: prompter
```

RAG Prompter LLM

This workflow combines a Large Language Model (LLM) with a Retrieval-Augmented Generation (RAG) prompter template to generate responses informed by relevant external knowledge.

```yaml
workflow:
  id: wf-rag-prompter-llm
  nodes:
    - id: rag-prompter
      model:
        model_id: rag-prompter
        model_type_id: rag-prompter
        description: 'RAG Prompter Model'

    - id: llm
      model:
        user_id: mistralai
        model_id: mistral-7B-Instruct
        app_id: completion
      node_inputs:
        - node_id: rag-prompter
```
tip

Click here to view more YAML-based workflow examples.