Audio as Input

Learn how to perform inference with audio as input using Clarifai SDKs


The Clarifai SDKs provide tools for processing audio inputs. Whether you're working on voice recognition, sound classification, or speech-to-text conversion, the SDKs streamline development so you can focus on building your application's functionality.

Audio to Text

Use the Predict API to convert audio files into text with an Automatic Speech Recognition (ASR) model. This lets you transcribe spoken words from audio for applications such as transcription services and voice command processing.

from clarifai.client.model import Model

# Your PAT (Personal Access Token) can be found in the Security section of your account.
# Specify the correct user_id/app_id pairing, since you're making
# inferences outside your app's scope.
# USER_ID = "facebook"
# APP_ID = "asr"

# You can set the model using a model URL or a model ID.
# Change these to whichever model you want to use.
# e.g.: MODEL_ID = "asr-wav2vec2-base-960h-english"
# You can also select a particular model version by specifying its version ID.
# e.g.: MODEL_VERSION_ID = "model_version"
# A Model object can be initialized either from its URL or from its
# user_id, app_id, and model_id.

# e.g.: model = Model(user_id=USER_ID, app_id=APP_ID, model_id=MODEL_ID)

audio_url = "https://s3.amazonaws.com/samples.clarifai.com/GoodMorning.wav"

# The Predict API can generate predictions for data provided as a URL, a file path, or bytes.

# Example of prediction from bytes:
# model_prediction = model.predict_by_bytes(audio_bytes, input_type="audio")

# Example of prediction from a file path:
# model_prediction = Model(model_url).predict_by_filepath(audio_filepath, input_type="audio")

model_url = "https://clarifai.com/facebook/asr/models/asr-wav2vec2-large-robust-ft-swbd-300h-english"

model_prediction = Model(url=model_url, pat="YOUR_PAT").predict_by_url(
    audio_url, input_type="audio"
)

# Print the output
print(model_prediction.outputs[0].data.text.raw)

Output
GOOD MORNING I THINK THIS IS GOING TO BE A GREAT PRESENTATION
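
The example above predicts from a URL. As a rough sketch of the bytes route mentioned in the comments, the snippet below reads a local audio file into memory and passes it to `predict_by_bytes`. The helper names `load_audio_bytes` and `transcribe_bytes` are hypothetical and not part of the Clarifai SDK; the sketch also assumes the prediction response has the same shape as in the URL example.

```python
from pathlib import Path


def load_audio_bytes(audio_filepath):
    """Read a local audio file and return its raw bytes.

    Hypothetical helper (not part of the Clarifai SDK).
    """
    path = Path(audio_filepath)
    if not path.is_file():
        raise FileNotFoundError(f"Audio file not found: {path}")
    return path.read_bytes()


def transcribe_bytes(audio_bytes, model_url, pat):
    """Send raw audio bytes to an ASR model and return the transcribed text.

    Hypothetical wrapper around the calls shown in the example above.
    Assumes the response exposes outputs[0].data.text.raw, as in the URL example.
    """
    # Imported here so load_audio_bytes can be used without the SDK installed.
    from clarifai.client.model import Model

    model = Model(url=model_url, pat=pat)
    model_prediction = model.predict_by_bytes(audio_bytes, input_type="audio")
    return model_prediction.outputs[0].data.text.raw
```

To use it, call `load_audio_bytes` with the path to a local audio file, then pass the result to `transcribe_bytes` together with the `model_url` from the example above and your own PAT. Reading the file yourself can be handy when the audio is already in memory or comes from an upload, whereas `predict_by_filepath` is the shorter option when the file is simply on disk.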