
Text Embeddings

Use Clarifai and LlamaIndex to create text embeddings


Embeddings are vector representations of textual content. Representing text as points in a vector space makes it possible to compare pieces of text numerically, which enables tasks such as semantic search, where we look for the pieces of text most similar to a query within that space.
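Similarity in that vector space is commonly measured with cosine similarity, the normalized dot product of two vectors. A minimal sketch in plain Python, using toy 3-dimensional vectors in place of real embedding outputs:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity: dot(a, b) / (|a| * |b|), ranging from -1 to 1."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Vectors pointing in the same direction score close to 1
print(cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))  # ≈ 1.0 (parallel)
# Unrelated (orthogonal) vectors score 0
print(cosine_similarity([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]))  # 0.0
```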

Let’s illustrate how you can use LlamaIndex to interact with Clarifai models and create text embeddings.

Prerequisites

  • Python development environment
  • Get a PAT (Personal Access Token) from the Clarifai portal, under the Settings/Security section
  • Get the URL of the model you want to use. Text embedding models can be found here
  • Alternatively, get the ID of the user owning the model you want to use, the ID of the app where the model is found, and the name of the model
  • Install the LlamaIndex Clarifai integration package by running pip install llama-index-embeddings-clarifai
info

You can learn how to authenticate with the Clarifai platform here.

note

Clarifai models can be referenced either through their full URL or by using the combination of user ID, app ID, and model name. If the model version is not specified, the latest version will be used by default.

Here is how you can create text embeddings.

############################################################################################################################
# In this section, we set the user authentication, model URL, and prompt text. Alternatively, you can set the user ID,
# app ID, and model name. Change these strings to run your own example.
############################################################################################################################

PAT = "YOUR_PAT_HERE"

MODEL_URL = "https://clarifai.com/cohere/embed/models/cohere-text-to-embeddings"
PROMPT = "Hello World!"

# Alternatively, you can specify user ID, app ID, and model name
# USER_ID = "cohere"
# APP_ID = "embed"
# MODEL_NAME = "cohere-text-to-embeddings"

############################################################################
# YOU DO NOT NEED TO CHANGE ANYTHING BELOW THIS LINE TO RUN THIS EXAMPLE
############################################################################

# Import the required packages
import os
from llama_index.embeddings.clarifai import ClarifaiEmbedding

# Set Clarifai PAT as environment variable
os.environ["CLARIFAI_PAT"] = PAT

# Initialize the embedding model class
embed_model = ClarifaiEmbedding(model_url=MODEL_URL)

# Alternatively, reference the model by user ID, app ID, and model name
# embed_model = ClarifaiEmbedding(
#     user_id=USER_ID,
#     app_id=APP_ID,
#     model_name=MODEL_NAME,
# )

embeddings = embed_model.get_text_embedding(PROMPT)
print(len(embeddings))
# Print the first five elements of embeddings list
print(embeddings[:5])
Output Example
1024
[-0.0266333669424057, -0.01617247611284256, 0.03460061177611351, -0.04136759787797928, -0.016348375007510185]
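Once texts are embedded, semantic search reduces to nearest-neighbor ranking in the vector space. The sketch below ranks toy 4-dimensional "document" vectors against a "query" vector by cosine similarity; in practice, each vector would come from embed_model.get_text_embedding (the names and values here are illustrative, not real model output, and real Clarifai vectors are 1024-dimensional as shown above):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

# Toy embeddings standing in for embed_model.get_text_embedding(...) results
documents = {
    "greeting": [0.9, 0.1, 0.0, 0.1],
    "weather":  [0.1, 0.8, 0.3, 0.0],
    "farewell": [0.7, 0.2, 0.1, 0.3],
}
query = [0.8, 0.1, 0.0, 0.2]  # toy embedding of the search query

# Rank document names by similarity to the query, highest first
ranked = sorted(documents, key=lambda name: cosine(query, documents[name]), reverse=True)
print(ranked[0])  # prints "greeting", the closest vector
```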
info

You can explore the LlamaIndex documentation to learn more about how to use the framework with Clarifai for text embedding tasks.