Audio

Make predictions on audio inputs


Input: Audio

Output: Text

To get predictions for a given audio input, supply the audio together with the specific model you want predictions from. You can provide the input either as a publicly accessible URL or by sending the bytes directly.

Specify your model of choice with the MODEL_ID parameter.

Each audio input must be under 5MB, which typically corresponds to about 60 seconds of 16-bit audio sampled at 48kHz. If your file exceeds this limit, split it into smaller chunks.
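If you need to split a long recording, a minimal sketch using Python's standard-library wave module is shown below. The helper name split_wav and the _chunkN.wav naming scheme are illustrative assumptions, not part of the Clarifai API, and this approach applies to uncompressed WAV files only.

```python
import wave


def split_wav(path, chunk_seconds=60):
    """Split a WAV file into consecutive chunks of at most chunk_seconds.

    Writes path_chunk0.wav, path_chunk1.wav, ... and returns their paths.
    """
    chunk_paths = []
    with wave.open(path, "rb") as src:
        params = src.getparams()
        frames_per_chunk = params.framerate * chunk_seconds
        index = 0
        while True:
            frames = src.readframes(frames_per_chunk)
            if not frames:
                break
            out_path = f"{path.rsplit('.', 1)[0]}_chunk{index}.wav"
            with wave.open(out_path, "wb") as dst:
                # The wave module corrects the frame count in the header on close
                dst.setparams(params)
                dst.writeframes(frames)
            chunk_paths.append(out_path)
            index += 1
    return chunk_paths
```

Each resulting chunk can then be sent to the API individually, using either of the prediction methods shown below.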

info

The initialization code used in the following examples is outlined in detail on the client installation page.

Predict via URL

Below is an example of how you would use the asr-wav2vec2-base-960h-english audio transcription model to convert English speech audio, sent via a URL, into English text.

#########################################################################################
# In this section, we set the user authentication, user and app ID, model ID, and
# audio URL. Change these strings to run your own example.
########################################################################################

# Your PAT (Personal Access Token) can be found in the Account's Security section
PAT = "YOUR_PAT_HERE"
# Specify the correct user_id/app_id pairings
# Since you're making inferences outside your app's scope
USER_ID = "facebook"
APP_ID = "asr"
# Change these to make your own predictions
MODEL_ID = "asr-wav2vec2-base-960h-english"
AUDIO_URL = "https://samples.clarifai.com/negative_sentence_1.wav"

##########################################################################
# YOU DO NOT NEED TO CHANGE ANYTHING BELOW THIS LINE TO RUN THIS EXAMPLE
##########################################################################

from clarifai_grpc.channel.clarifai_channel import ClarifaiChannel
from clarifai_grpc.grpc.api import resources_pb2, service_pb2, service_pb2_grpc
from clarifai_grpc.grpc.api.status import status_code_pb2

channel = ClarifaiChannel.get_grpc_channel()
stub = service_pb2_grpc.V2Stub(channel)

metadata = (("authorization", "Key " + PAT),)

userDataObject = resources_pb2.UserAppIDSet(
    user_id=USER_ID, app_id=APP_ID
)  # The userDataObject is required when using a PAT

post_model_outputs_response = stub.PostModelOutputs(
    service_pb2.PostModelOutputsRequest(
        user_app_id=userDataObject,
        model_id=MODEL_ID,
        inputs=[
            resources_pb2.Input(
                data=resources_pb2.Data(audio=resources_pb2.Audio(url=AUDIO_URL))
            )
        ],
    ),
    metadata=metadata,
)
if post_model_outputs_response.status.code != status_code_pb2.SUCCESS:
    print(post_model_outputs_response.status)
    raise Exception(
        "Post model outputs failed, status: "
        + post_model_outputs_response.status.description
    )

# Since we have one input, one output will exist here
output = post_model_outputs_response.outputs[0]

# Print the output
print(output.data.text.raw)
Text Output Example
I AM NOT FLYING TO ENGLAND

Predict via Bytes

Below is an example of how you would use the asr-wav2vec2-base-960h-english audio transcription model to convert English speech audio, sent as bytes, into English text.

#########################################################################################
# In this section, we set the user authentication, user and app ID, model ID, and
# audio file location. Change these strings to run your own example.
########################################################################################

# Your PAT (Personal Access Token) can be found in the Account's Security section
PAT = "YOUR_PAT_HERE"
# Specify the correct user_id/app_id pairings
# Since you're making inferences outside your app's scope
USER_ID = "facebook"
APP_ID = "asr"
# Change these to make your own predictions
MODEL_ID = "asr-wav2vec2-base-960h-english"
AUDIO_FILE_LOCATION = "YOUR_AUDIO_FILE_LOCATION_HERE"

##########################################################################
# YOU DO NOT NEED TO CHANGE ANYTHING BELOW THIS LINE TO RUN THIS EXAMPLE
##########################################################################

from clarifai_grpc.channel.clarifai_channel import ClarifaiChannel
from clarifai_grpc.grpc.api import resources_pb2, service_pb2, service_pb2_grpc
from clarifai_grpc.grpc.api.status import status_code_pb2

channel = ClarifaiChannel.get_grpc_channel()
stub = service_pb2_grpc.V2Stub(channel)

metadata = (("authorization", "Key " + PAT),)

userDataObject = resources_pb2.UserAppIDSet(
    user_id=USER_ID, app_id=APP_ID
)  # The userDataObject is required when using a PAT

with open(AUDIO_FILE_LOCATION, "rb") as f:
    file_bytes = f.read()

post_model_outputs_response = stub.PostModelOutputs(
    service_pb2.PostModelOutputsRequest(
        user_app_id=userDataObject,
        model_id=MODEL_ID,
        inputs=[
            resources_pb2.Input(
                data=resources_pb2.Data(audio=resources_pb2.Audio(base64=file_bytes))
            )
        ],
    ),
    metadata=metadata,
)
if post_model_outputs_response.status.code != status_code_pb2.SUCCESS:
    print(post_model_outputs_response.status)
    raise Exception(
        "Post model outputs failed, status: "
        + post_model_outputs_response.status.description
    )

# Since we have one input, one output will exist here
output = post_model_outputs_response.outputs[0]

# Print the output
print(output.data.text.raw)