Overview
The Caption Vocabulary improves the accuracy of your AI-generated live closed captions (CC) by adding terms that the AI may not easily interpret. These are usually names (places, people, job titles, and other terms pertinent to your streaming needs) and initialisms (F.B.I., A.C.M.E., etc.). It can be accessed in the Control Center here: https://controlcenter.invintusmedia.com/manage/vocab
Note: When you create, update, or remove words on the Vocabulary manager page, the changes are saved automatically, so there is no "Save All Changes" button like the one some of our other components have.
How To Use
Add a phrase as it should appear in the AI-generated captions. This is useful for speaker names, job titles, or truly unique or rare terms that wouldn't appear in a dictionary.
Examples of good custom vocabulary might be made-up or uniquely spelled words like:
sparkletini, Ginnifer
You should not add terms like e pluribus unum or orthostatic hypotension, which are uncommon but established terms that already appear in dictionaries.
Short phrases perform better than long ones, so keep phrases as short as possible.
General Rules
- Up to 6000 phrases may be submitted per transcription job for English, and up to 1000 for other languages.
- Phrases must contain at least one alphabetic character from the respective language.
- Phrases cannot be longer than 12 words.
- Individual words cannot be longer than 34 characters.
- For English, non-numeric characters in the Basic Latin set are allowed, i.e. U+0000-U+002F and U+003A-U+007F. For other languages, most non-numeric characters in the language are allowed.
- Non-alphabetic characters are ignored during speech recognition but are favored in the output. For example, if you submit "Yahoo!" as custom vocabulary, speech recognition will favor outputting "Yahoo!" when it recognizes "yahoo" in the audio.
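The English rules above can be checked before a phrase is typed into the Vocabulary manager. The following is a minimal sketch, not part of any Invintus tooling: the function name `phrase_problems` and its messages are hypothetical, and it only covers the per-phrase rules for English (word count, word length, alphabetic content, digits, and the Basic Latin range).

```python
# Hypothetical pre-check for English vocabulary phrases; not an Invintus API.
MAX_WORDS_PER_PHRASE = 12
MAX_CHARS_PER_WORD = 34


def phrase_problems(phrase: str) -> list[str]:
    """Return the rules an English phrase appears to break (empty list if none)."""
    problems = []
    words = phrase.split()

    # At least one alphabetic character is required.
    if not any(ch.isascii() and ch.isalpha() for ch in phrase):
        problems.append("no alphabetic character")
    # Phrases are limited to 12 words.
    if len(words) > MAX_WORDS_PER_PHRASE:
        problems.append("more than 12 words")
    # Individual words are limited to 34 characters.
    if any(len(word) > MAX_CHARS_PER_WORD for word in words):
        problems.append("word longer than 34 characters")
    # Digits (U+0030-U+0039) are excluded from the allowed range.
    if any("0" <= ch <= "9" for ch in phrase):
        problems.append("contains a number")
    # Everything else must fall inside Basic Latin (up to U+007F).
    if any(ord(ch) > 0x7F for ch in phrase):
        problems.append("character outside Basic Latin")
    return problems


print(phrase_problems("Yahoo!"))  # []
print(phrase_problems("401k"))    # ['contains a number']
```

An empty list means the phrase passes these checks. It does not guarantee that the phrase will be accepted or that it will improve recognition.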
Initialisms
Initialisms are abbreviations consisting of initial letters pronounced separately, like CPU. Submit your initialisms as custom vocabulary to improve their speech recognition.
- Initialisms must contain at least 3 letters.
- An initialism is recognized only when it is pronounced letter by letter.
- Ampersands (&) are supported and are treated as a letter pronounced like "and".
- Initialisms are recognized only when submitted in one of the following formats:
  - ABC
  - A.B.C.
  - a.b.c.
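The three accepted formats can be expressed as patterns. The sketch below is illustrative only: `looks_like_initialism` is a hypothetical helper, and allowing an ampersand as a letter in the dotted formats is an assumption, since the rules above do not say how ampersands combine with periods.

```python
import re

# Hypothetical patterns for the three documented formats; "&" counts as a letter.
INITIALISM_PATTERNS = [
    re.compile(r"[A-Z&]{3,}"),         # ABC
    re.compile(r"(?:[A-Z&]\.){3,}"),   # A.B.C.
    re.compile(r"(?:[a-z&]\.){3,}"),   # a.b.c. (ampersand here is an assumption)
]


def looks_like_initialism(term: str) -> bool:
    """Return True if term has at least 3 letters in one of the documented formats."""
    # Require at least one real letter so that "&&&" is rejected.
    if not any(ch.isalpha() for ch in term):
        return False
    return any(pattern.fullmatch(term) for pattern in INITIALISM_PATTERNS)


print(looks_like_initialism("CPU"))     # True
print(looks_like_initialism("F.B.I."))  # True
print(looks_like_initialism("AB"))      # False (fewer than 3 letters)
print(looks_like_initialism("abc"))     # False (lowercase needs periods)
```

Terms that fail this check can still be submitted as ordinary phrases, but they should not be expected to be recognized as letter-by-letter initialisms.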
Non-Alphabetic Characters
Take note of the following information for some specific characters:
- Numbers are not allowed. Note that some common terms like 401k are recognized by default and do not need to be added as custom vocabulary.
- Standalone ampersands in phrases are treated like the word "and".
- Dashes in words are ignored. For example, "this-and-that" is treated as a single word roughly pronounced like "this and that".
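As a rough mental model of these character rules, the sketch below approximates how a submitted phrase might be "heard". It is not the captioning engine's actual algorithm: `approximate_spoken_form` is a hypothetical helper that simply reads standalone ampersands as "and", treats dashes as word breaks for pronunciation, and drops other non-alphabetic characters. It does not handle initialisms.

```python
def approximate_spoken_form(phrase: str) -> str:
    """Approximate the words the recognizer would listen for in a phrase."""
    spoken_words = []
    for token in phrase.split():
        # A standalone ampersand is treated like the word "and".
        if token == "&":
            spoken_words.append("and")
            continue
        # Dashes are ignored for matching; the parts are pronounced in sequence.
        token = token.replace("-", " ")
        # Other non-alphabetic characters are ignored during recognition.
        kept = "".join(ch for ch in token if ch.isalpha() or ch == " ")
        spoken_words.extend(kept.lower().split())
    return " ".join(spoken_words)


print(approximate_spoken_form("this-and-that"))  # this and that
print(approximate_spoken_form("Yahoo!"))         # yahoo
print(approximate_spoken_form("Smith & Sons"))   # smith and sons
```

The written form you submit (for example "Yahoo!") is still what is favored in the caption output. The helper only illustrates which sounds are likely to trigger it.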