Audio as Input
Learn how to perform inference with audio as input using Clarifai SDKs
The Clarifai SDKs provide tools for running inference on audio inputs. Whether you are building voice recognition, sound classification, or speech-to-text features, the SDKs streamline development so that you can focus on your application's functionality.
Audio to Text
Use the Predict API with an Automatic Speech Recognition (ASR) model to convert audio files into text. This lets you transcribe spoken words from audio for applications such as transcription services and voice command processing.
- Python
- TypeScript
from clarifai.client.model import Model
# Your PAT (Personal Access Token) can be found in the Security section of your account settings
# Specify the user_id and app_id of the model's owner,
# since you are making inferences outside your own app's scope
# USER_ID = "facebook"
# APP_ID = "asr"
# You can set the model using a model URL or a model ID.
# Change these to whatever model you want to use
# e.g.: MODEL_ID = 'asr-wav2vec2-base-960h-english'
# You can also select a particular model version by specifying the version ID
# e.g.: MODEL_VERSION_ID = 'model_version'
# A Model object can be initialised from its URL, or from its user_id, app_id and model_id
# e.g.: model = Model(user_id=USER_ID, app_id=APP_ID, model_id=MODEL_ID)
audio_url = "https://s3.amazonaws.com/samples.clarifai.com/GoodMorning.wav"
# The Predict API can generate predictions from data provided as a URL, a file path, or bytes.
# Example for prediction through Bytes:
# model_prediction = model.predict_by_bytes(audio_bytes, input_type="audio")
# Example for prediction through Filepath:
# model_prediction = Model(model_url).predict_by_filepath(audio_filepath, input_type="audio")
model_url = "https://clarifai.com/facebook/asr/models/asr-wav2vec2-large-robust-ft-swbd-300h-english"
model_prediction = Model(url=model_url, pat="YOUR_PAT").predict_by_url(
    audio_url, "audio"
)
# Print the output
print(model_prediction.outputs[0].data.text.raw)
Output
GOOD MORNING I THINK THIS IS GOING TO BE A GREAT PRESENTATION
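The bytes-based call mentioned in the comments above can be sketched for a local file as follows. This is a minimal sketch, not the SDK's prescribed workflow: the helper names `load_wav_bytes` and `transcribe_bytes` are hypothetical, the sanity check uses Python's standard `wave` module (which only parses uncompressed PCM WAV, so drop the check for other formats), and it assumes `predict_by_bytes` accepts the same `input_type="audio"` argument shown in the comment above.

```python
import io
import wave


def load_wav_bytes(path: str) -> bytes:
    """Read a WAV file from disk and return its raw bytes.

    Hypothetical helper: it parses the header with the standard `wave`
    module first, so a corrupt or empty file fails locally rather than
    after an API round trip.
    """
    with open(path, "rb") as f:
        data = f.read()
    # wave.open raises wave.Error if the data is not a PCM WAV container
    with wave.open(io.BytesIO(data), "rb") as wav:
        if wav.getnframes() == 0:
            raise ValueError(f"{path} contains no audio frames")
    return data


def transcribe_bytes(audio_bytes: bytes, model_url: str, pat: str) -> str:
    """Send raw audio bytes to an ASR model and return the transcript.

    Hypothetical wrapper around the SDK call shown in the comments above.
    """
    # Imported lazily so load_wav_bytes works without the SDK installed
    from clarifai.client.model import Model

    prediction = Model(url=model_url, pat=pat).predict_by_bytes(
        audio_bytes, input_type="audio"
    )
    return prediction.outputs[0].data.text.raw


# Example usage (requires the clarifai package, a valid PAT, and a local file):
# import os
# audio_bytes = load_wav_bytes("GoodMorning.wav")
# print(transcribe_bytes(audio_bytes, model_url, os.environ["CLARIFAI_PAT"]))
```

Reading the PAT from an environment variable, as the TypeScript example does with `process.env.CLARIFAI_PAT`, keeps the token out of source code.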
import { Model } from "clarifai-nodejs";
/**
  Your PAT (Personal Access Token) can be found in the Security section of your account settings
  Specify the userId and appId of the model's owner,
  since you are making inferences outside your own app's scope
  USER_ID = "facebook"
  APP_ID = "asr"
  You can set the model using a model URL or a model ID.
  Change these to whatever model you want to use
  e.g.: MODEL_ID = 'asr-wav2vec2-base-960h-english'
  You can also select a particular model version by specifying the version ID
  e.g.: MODEL_VERSION_ID = "model_version"
  A Model object can be initialised from its URL, or from its userId, appId and modelId
  e.g.:
  const model = new Model({
    authConfig: {
      userId: USER_ID,
      appId: APP_ID,
      pat: process.env.CLARIFAI_PAT,
    },
    modelId: MODEL_ID,
  });
*/
const audioUrl =
  "https://s3.amazonaws.com/samples.clarifai.com/GoodMorning.wav";
/**
  The Predict API can generate predictions from data provided as a URL, a file path, or bytes.
  Example for prediction through bytes:
  const modelPrediction = await model.predictByBytes({
    inputBytes,
    inputType
  });
  Example for prediction through file path:
  const modelPrediction = await model.predictByFilepath({
    filepath,
    inputType
  });
*/
const modelUrl =
  "https://clarifai.com/facebook/asr/models/asr-wav2vec2-large-robust-ft-swbd-300h-english";
const model = new Model({
  url: modelUrl,
  authConfig: {
    pat: process.env.CLARIFAI_PAT,
  },
});
const modelPrediction = await model.predictByUrl({
  url: audioUrl,
  inputType: "audio",
});
// Print the output
console.log(modelPrediction?.[0]?.data?.text?.raw);