Batch Predict CSV on Custom Text Model
Enjoy the convenience of working with CSV files and text.
Below is a script that you can use to run batch predictions on texts/sentences stored in a CSV file, using your custom text model.
To start, you'll need to create your own Custom Text Model, either via our Portal or using the API.
Make sure to record the model ID, the version ID that you want to use (each model gets one after it has been successfully trained), and the API key of the application in which the model exists.
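If you don't have the version ID handy, you can look it up with a GetModel call. Below is a minimal sketch of that lookup (the API key and model ID are placeholders, and it assumes the clarifai_grpc library is installed):

from clarifai_grpc.channel.clarifai_channel import ClarifaiChannel
from clarifai_grpc.grpc.api import service_pb2, service_pb2_grpc
from clarifai_grpc.grpc.api.status import status_code_pb2

# Placeholders - substitute your own application's API key and model ID.
API_KEY = 'YOUR_API_KEY'
MODEL_ID = 'YOUR_MODEL_ID'

stub = service_pb2_grpc.V2Stub(ClarifaiChannel.get_json_channel())
metadata = (('authorization', 'Key {}'.format(API_KEY)),)

# Fetch the model; the ID of its latest trained version is on model.model_version.
response = stub.GetModel(service_pb2.GetModelRequest(model_id=MODEL_ID), metadata=metadata)
if response.status.code != status_code_pb2.SUCCESS:
    raise Exception('GetModel failed: ' + str(response.status))
print('Latest model version ID:', response.model.model_version.id)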
This script assumes that you have a CSV file with a column named "text" containing the text you want to run predictions on. It will output another CSV file containing the predicted concepts for each text, together with their confidence values.
nlp_model_predicts.py
"""
A script designed for running bulk NLP model predictions on a .csv file of text entries.
It requires the library clarifai_grpc (to install it: `pip install clarifai_grpc`).

Mandatory arguments:
- a CSV file with a "text" column; any additional columns are ignored and not carried over to the output file
- a Clarifai API KEY
- the model ID of the NLP model that you wish to predict with
- the specific model version ID for the above NLP model

Optional/Default arguments:
- the "top n" number of results to be returned from the model predictions. default 3. [1-200]
- the batch size or number of inputs to send in per predict call. default 32. max 128.

Example usage:
python nlp_model_predicts.py --csv_file CSVFILE --api_key API_KEY --model_id MODEL_ID --model_version MODEL_VERSION

Example input CSV file:
text,random_column_1
"The quick brown fox something something.",perhaps_some_data
"The lazy dog is...",some_other_data

Example output CSV file:
text,predict_1_concept,predict_1_value
"The quick brown fox something something.",predicted_concept,0.873
"The lazy dog is...",predicted_concept,0.982
"""

import argparse
import csv
import os

from clarifai_grpc.channel.clarifai_channel import ClarifaiChannel
from clarifai_grpc.grpc.api import resources_pb2, service_pb2, service_pb2_grpc
from clarifai_grpc.grpc.api.status import status_code_pb2


def chunker(seq, size):
    return (seq[pos:pos + size] for pos in range(0, len(seq), size))


def get_predict(texts, stub, model_id, model_version, auth_metadata, top_n):
    """
    inputs:
    • texts: a list of texts to run predictions on
    • auth_metadata: (('authorization', 'Key YOUR_API_KEY'),)
    • top_n: integer for the desired max number of returned concepts [limit 200]

    returns:
    • the original text
    • predict_n_concept : predicted concept ID
    • predict_n_value : predicted concept value
    """

    if len(texts) > 128:
        raise Exception('Input length over maximum batch size. Please send in batches of 128 or fewer.')

    inputs = [
        resources_pb2.Input(data=resources_pb2.Data(text=resources_pb2.Text(raw=x)))
        for x in texts
    ]

    # make the model predict request
    request = service_pb2.PostModelOutputsRequest(
        model_id=model_id,
        version_id=model_version,
        inputs=inputs,
    )

    response = stub.PostModelOutputs(request, metadata=auth_metadata)

    if response.status.code != status_code_pb2.SUCCESS:
        raise Exception("A failed response: " + str(response.status) + "\n\nFull response:\n" + str(response))

    # parse results
    list_of_dicts = []
    for resp in response.outputs:
        temp_dict = {
            'text': resp.input.data.text.raw
        }

        for n in range(top_n):
            try:
                temp_dict['predict_{}_concept'.format(n + 1)] = resp.data.concepts[n].id
                temp_dict['predict_{}_value'.format(n + 1)] = "%.3f" % resp.data.concepts[n].value
            except Exception as e:
                print(e)
                break

        list_of_dicts.append(temp_dict)

    return list_of_dicts


def main():
    parser = argparse.ArgumentParser(
        description=
        'Given a CSV file with a "text" column, provide NLP model predictions.'
    )
    parser.add_argument('--api_key', required=True, help='the app\'s API key', type=str)
    parser.add_argument('--csv_file', required=True, help='the CSV file with texts', type=str)
    parser.add_argument('--model_id', required=True, help='the model ID', type=str)
    parser.add_argument(
        '--model_version', required=True, help='the specific model version ID', type=str)
    parser.add_argument(
        '--top_n', default=3, type=int, help='num results returned. default 3. max 200.')
    parser.add_argument(
        '--batch_size', default=32, type=int, help='prediction batch size. default 32. max 128')

    args = parser.parse_args()

    # set up the Clarifai API channel and stub
    channel = ClarifaiChannel.get_json_channel()
    stub = service_pb2_grpc.V2Stub(channel)
    metadata = (('authorization', 'Key {}'.format(args.api_key)),)

    texts = []
    with open(args.csv_file) as f:
        csv_reader = csv.DictReader(f)
        for row in csv_reader:
            if 'text' not in row:
                raise Exception('The CSV file must contain a column with a header named "text"')

            texts.append(row['text'])

    predicted_data = []
    # run model predictions in batches
    for i, texts_chunk in enumerate(chunker(texts, args.batch_size)):
        print("Predicting chunk #" + str(i + 1))
        predicted_data.extend(get_predict(texts_chunk, stub, args.model_id, args.model_version, metadata, args.top_n))

    output_name = os.path.splitext(args.csv_file)[0] + '_results.csv'
    print('Results saved to {}'.format(output_name))

    with open(output_name, 'w') as f:
        csv_writer = csv.DictWriter(f, fieldnames=predicted_data[0].keys())
        csv_writer.writeheader()
        csv_writer.writerows(predicted_data)


if __name__ == '__main__':
    main()

Example Usage

Let's say you have the following CSV file, and for each text in a row you want to predict whether the sentence is grammatically positive or negative. You first build a custom text model that maps text to two concepts: "positive" and "negative". See our Custom Text Model walkthrough for how to do that via our API; a rough sketch of the model creation and training calls is also shown after the example file below.
my_data.csv
number,text
1,"We have never been to Asia, nor have we visited Africa."
2,"I am never at home on Sundays."
3,"One small action would change her life, but whether it would be for better or for worse was yet to be determined."
4,"The waitress was not amused when he ordered green eggs and ham."
5,"In that instant, everything changed."
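For reference, creating and training such a two-concept text model via the API might look roughly like the sketch below. The model ID and API key here are placeholders, and your application would already need labeled text inputs before training; see the Custom Text Model walkthrough for the complete flow.

from clarifai_grpc.channel.clarifai_channel import ClarifaiChannel
from clarifai_grpc.grpc.api import resources_pb2, service_pb2, service_pb2_grpc

stub = service_pb2_grpc.V2Stub(ClarifaiChannel.get_json_channel())
metadata = (('authorization', 'Key YOUR_API_KEY'),)  # placeholder API key

# Create a text model with the two concepts it should predict.
create_response = stub.PostModels(
    service_pb2.PostModelsRequest(
        models=[
            resources_pb2.Model(
                id='my-grammar-model',  # placeholder model ID
                output_info=resources_pb2.OutputInfo(
                    data=resources_pb2.Data(concepts=[
                        resources_pb2.Concept(id='positive'),
                        resources_pb2.Concept(id='negative'),
                    ]),
                ),
            )
        ]
    ),
    metadata=metadata,
)

# Train the model; the new version's ID is what you pass as --model_version.
train_response = stub.PostModelVersions(
    service_pb2.PostModelVersionsRequest(model_id='my-grammar-model'),
    metadata=metadata,
)
print('Model version ID:', train_response.model.model_version.id)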
With that, you can run the script on the CSV file as follows, which will produce a new CSV file of results.
python nlp_model_predicts.py --api_key YOUR_API_KEY --model_id YOUR_MODEL_ID --model_version YOUR_MODEL_VERSION_ID --csv_file my_data.csv --top_n 2
my_data_results.csv
text,predict_1_concept,predict_1_value,predict_2_concept,predict_2_value
"We have never been to Asia, nor have we visited Africa.",negative,1.000,positive,0.000
I am never at home on Sundays.,negative,1.000,positive,0.000
"One small action would change her life, but whether it would be for better or for worse was yet to be determined.",positive,1.000,negative,0.000
The waitress was not amused when he ordered green eggs and ham.,negative,1.000,positive,0.000
"In that instant, everything changed.",positive,0.998,negative,0.002