captions Archives - 3Play Media
https://www.3playmedia.com/blog/tag/captions/

Measuring Captioning Accuracy: Why WER and NER Analyses Differ
https://www.3playmedia.com/blog/measuring-captioning-accuracy-why-wer-and-ner-analyses-differ/
Wed, 28 Feb 2024

Captioning Best Practices for Media & Entertainment [Free eBook]


When it comes to measuring captioning accuracy, there’s no shortage of errors that need to be considered: punctuation, grammar, speaker identification, capitalization, and word errors, to name a few. But what does it mean when a captioning vendor says their captions are 99% accurate? 

It turns out that “99% accuracy” can mean very different things depending on the model a vendor uses to measure said accuracy. Different vendors use different measurement models, which can contribute to confusion when percentages are marketed to describe the accuracy of closed captions. In this blog, we’ll discuss the NER model and how it differs from two commonly used measurements, Word Error Rate (WER) and Formatted Error Rate (FER).

The NER Model

The NER model, which originated in Europe and is often used in Canada, differs from the accuracy measurement rates commonly used in the United States. In the U.S., all errors—including spelling, punctuation, grammar, speaker identifications, word substitutions, omissions, and more—are considered to obtain a percentage that measures the average accuracy of the closed captions on a piece of media. 

In contrast, NER scoring emphasizes meaning and how accurately ideas are captured in captions, making it an extremely subjective and legally risky measurement. For instance, the FCC closed captioning guidelines state, “In order to be accurate, captions must match the spoken words in the dialogue, in their original language (English or Spanish), to the fullest extent possible and include full lyrics when provided on the audio track.” More specifically, the guidelines require captions to include all words spoken in the order spoken (i.e., no paraphrasing). Considering the legal requirements of live and recorded captioning, the subjectivity of NER scoring makes it an inherently risky method.

How are NER Scores Calculated?

Vendors grade each caption error based on its severity or resulting understandability when using the NER model. In many cases, vendors decide for themselves what constitutes a critical error. This subjectivity means that a caption file could get different NER results depending on who scores the file—contributing to significant liability for customers. 

The NER Calculation
NER Score = (Words – NER Deductions) / Words * 100

One of the reasons NER scores inflate so quickly is that the denominator of the NER equation is the total number of words written. The numerator also starts at that total and is reduced only by fractional deductions, even when a whole sentence is paraphrased or several words in sequence are wrong. In addition, the denominator is the total number of words captioned, not the total number of words that should have been captioned based on verbatim dialogue.
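This inflation is easy to see by plugging numbers into the formula above. The sketch below is a minimal illustration; the word count and deduction values are hypothetical, and the minor-error deduction of 0.25 follows the NER severity scale described later in this post.

```python
def ner_score(total_words, deductions):
    """NER Score = (Words - NER Deductions) / Words * 100."""
    return (total_words - sum(deductions)) / total_words * 100

# A 1,000-word transcript in which ten whole sentences are paraphrased,
# each graded as a minor loss-of-detail error (-0.25), loses only
# 2.5 points in total, so the score stays above 99.
score = ner_score(1000, [0.25] * 10)
print(f"{score:.2f}")  # prints 99.75
```

Ten paraphrased sentences could easily touch a hundred or more of those thousand words, yet the score still reads as "99% accurate."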




Types of NER Errors

NER errors are categorized under two main types, each with corresponding deduction values of either 0.0, 0.25, 0.5, or 1.0 (a full point deduction). In this way, the NER model functions more as a score than a percentage. Caption scoring begins at 100 and is graded according to the number of errors and their assigned score deductions. 

Of note, the NER marking of "Correct Edition" indicates that paraphrased captions capture the full meaning of the spoken content. However, a caption marked as a Correct Edition can receive no NER deduction while scoring far worse under WER, because every paraphrased word counts as a word error. At 3Play Media, we see many examples of this difference, which is consequential for accessibility and for legal compliance with FCC standards and other legislation.

Edition errors represent the loss of an idea unit or piece of information. They include:

  • Critical Error (False Information): An editing or paraphrasing error provides false but plausible information (-1.0)
  • Major Error (Loss of Main Point): Inaccurate captions lose the main point of an idea (-0.5)
  • Minor Error (Loss of Detail): Inaccurate captions keep the main point but lose a detail (-0.25)
  • Correct Edition: A paraphrase captures the full meaning (0.0)

Recognition errors represent misrecognition of the spoken content. They include:

  • Critical Error (False Information): A wrong word, phrase, or punctuation error provides false but plausible information (-1.0)
  • Major Error (Nonsense Error): A wrong word, phrase, or punctuation error affects comprehension of an idea (-0.5)
  • Minor Error (Benign Error): A wrong word, phrase, or punctuation error affects readability but not comprehension (-0.25)

NER vs. WER: Different Measurements Provide Different Results

In conducting market research, we scored a Canadian government meeting transcription using the NER method and received a score of 99.00 (or "very good") because the captioner used a high degree of paraphrasing that was "mostly successful." However, when we scored the same meeting using the WER method, we received an accuracy rating of 93.2%, which is not compliant with FCC requirements because of how many captions were paraphrased rather than verbatim. We plan to conduct further research into the measurement challenges of NER vs. WER.
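A short sketch makes the divergence between the two models concrete. The WER implementation below is a standard word-level edit distance; the transcript lines are made-up examples, not real caption data.

```python
def wer(reference, hypothesis):
    """Word Error Rate: word-level edit distance divided by the
    number of words in the verbatim reference."""
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming edit distance over words (one rolling row).
    row = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        diag, row[0] = row[0], i
        for j, h in enumerate(hyp, 1):
            diag, row[j] = row[j], min(row[j] + 1,      # deletion
                                       row[j - 1] + 1,  # insertion
                                       diag + (r != h)) # substitution
    return row[-1] / len(ref)

# Hypothetical verbatim speech vs. a heavily paraphrased caption:
reference = "we are going to move on to the next item on the agenda"
caption = "moving on to the next agenda item"

# An NER grader could mark this a Correct Edition (0.0 deduction),
# leaving the NER score at 100, while WER counts every substituted,
# inserted, or deleted word against the verbatim reference.
print(f"WER: {wer(reference, caption):.0%}")
```

One paraphrase, two verdicts: a perfect NER score and a WER well above any acceptable threshold.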

Word Error Rate (WER) and Formatted Error Rate (FER)

Captioning accuracy for recorded content is more commonly measured in two pieces: Word Error Rate (WER) and Formatted Error Rate (FER). WER is the standard measure of transcription accuracy: the number of inaccurate words divided by the total number of words. FER is the percentage of word errors when formatting elements such as punctuation, grammar, speaker identification, non-speech elements, capitalization, and other notations are also taken into account.
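The relationship between the two rates can be sketched with hypothetical error tallies (the counts below are assumptions for illustration, not real review data):

```python
def accuracy(total_words, word_errors, formatting_errors=0):
    """Accuracy as a percentage. With formatting_errors=0 this
    corresponds to WER; including formatting errors (punctuation,
    capitalization, speaker IDs, etc.) corresponds to FER."""
    return (1 - (word_errors + formatting_errors) / total_words) * 100

# A 1,000-word transcript with 5 word errors and 7 formatting errors:
word_accuracy = accuracy(1000, 5)          # about 99.5, clears a 99% bar
formatted_accuracy = accuracy(1000, 5, 7)  # about 98.8, falls short of it
```

The same file can pass on word accuracy alone yet fail once formatting errors are counted, which is why the two rates are reported separately.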

For closed captioning of recorded content, the FCC requires at least 99% accuracy with all of these formatting elements accounted for. For both recorded and live content, the FCC quality standards do not permit the flexibility that the NER model allows. While live captioning lacks firm accuracy standards and instead relies on best practices, the FCC still focuses on accuracy, synchronicity, completeness, and placement, criteria that align more closely with WER and FER than with NER.

WER and FER vs. NER: Unequal Measures of Accuracy and Quality

Compared to WER and FER, NER is not an equivalent measure of accuracy or quality. While captioning with a high NER score may be useful for viewers who value overall meaning instead of absolute accuracy, higher WER and FER measurements are essential for d/Deaf and hard-of-hearing viewers and legal compliance.

Some vendors do not state which model they’re using in determining accuracy for recorded captions, which is a misleading practice and can put your content at risk for litigation. When evaluating a potential vendor, you should always inquire about the models they use to determine the accuracy of their captions.

Additionally, NER scoring is more beneficial for live content and less applicable to recorded content, so be wary when a vendor uses NER to describe accuracy for recorded captioning. There are inherent challenges in captioning live content, and recorded captions should be measured differently because the captioner has more time and can perfect the verbatim transcription. NER scoring, if used, should always be near perfect for recorded content because a recorded captioner should never need to summarize the spoken content—captions should achieve verbatim accuracy and, by doing so, retain meaning.

Ultimately, when offering closed captioning as an accommodation, the best practice is often to provide an equitable experience by presenting content as spoken, which necessitates using WER and FER to measure accuracy.




The post Measuring Captioning Accuracy: Why WER and NER Analyses Differ appeared first on 3Play Media.

Open Captions vs. Closed Captions: What's the Difference?
https://www.3playmedia.com/blog/open-captioning-use/
Wed, 22 Feb 2023

[FREE Webinar] Quick Start to Captioning


We live in an age where much of our communication, entertainment, and news happens digitally. With people watching an average of 6 hours and 48 minutes of online video per week, you've likely heard of closed captioning.

Closed captioning is everywhere you look: on streaming platforms, on educational videos, and on the TV at your local gym or bar. Open captioning, on the other hand, isn’t as familiar to many people.

Read on to learn what open captions are, how they work, and when they’re used.

Open Captions vs. Closed Captions

How are open captions different from closed captions? The short answer is that closed captions can be turned off while open captions cannot.

Closed captions are created on a separate track from the video, which means they can be toggled on or off. Open captions are burned into a video track, so they’re permanently on screen and cannot be turned off.

How Do Open Captions Work?

The process for creating closed and open captions is the same – the difference lies in how the captions are associated with your video. 

For both closed and open captions, you must first transcribe an audio file into a text transcript. At 3Play Media, we do this with a unique process that combines automatic speech recognition software and multiple rounds of human editing to ensure the highest accuracy rate.

Then, the text and media need to be synchronized so that the text appears with its corresponding audio track. Captions reflect all forms of audio information, including dialogue, sound effects, music, and more.
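For instance, in the widely used SRT caption format, each cue pairs a numbered timestamp range with its text, including non-speech information. The cue below is a made-up example:

```
1
00:00:01,000 --> 00:00:04,500
[door slams]
SPEAKER 1: Did you hear that?
```

Whether these cues end up as a toggleable track or burned into the frames is what separates closed from open captions.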

Closed captions are published by uploading a separate caption file to your video platform or player. The platform associates your caption file and video file and plays them together, allowing users to turn captions on or off with the CC toggle button.

To publish open captions, you need to burn the captions into the video file itself. Once you have captions for your video, you can add open captioning. When ordering open captioning from 3Play Media, we output a video file containing the original video with the captions burned into the file so they will always appear. When the open captioning is complete, we send an email to notify you that the files are available for download. Our standard output is an M4V video file, but our team will work with you to provide almost any output you need.
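Outside of a vendor workflow, burning in captions is commonly done with a video tool such as ffmpeg. A minimal sketch, assuming ffmpeg is installed and the captions exist as an SRT file (the filenames are placeholders):

```python
import subprocess

def burn_in_command(video_in, srt_file, video_out):
    """Build an ffmpeg command that renders an SRT file onto the
    video frames, producing open (always-visible) captions."""
    return [
        "ffmpeg",
        "-i", video_in,                  # source video
        "-vf", f"subtitles={srt_file}",  # draw captions on each frame
        "-c:a", "copy",                  # pass the audio through untouched
        video_out,
    ]

# To actually run it:
# subprocess.run(burn_in_command("talk.mp4", "talk.srt", "talk_open.mp4"), check=True)
```

Because the captions are re-encoded into the picture itself, the output plays with captions visible on any player, with no CC toggle required.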


Why Use Open Captions?

Now that you know the difference between open captions and closed captions, you might be wondering why you should use open captions.

Since open captions are part of the video itself, they provide the advantage of being supported by all video players and devices. Open captions also eliminate inconsistencies across different video players and allow publishers to control the exact size and style of the captions. Another advantage of open captions is that they make it easier to create DVDs and other physical media.

Let’s discuss some examples of where you might encounter open captions.

Social Media

Autoplaying videos on silent is a common practice on social media platforms. A study by Verizon Media found that 69% of users watch video with the sound off in public places, and 25% of users watch video without sound in private places. 

Without captions, many viewers will not be able to understand your content. Many social media content creators use open captions for consistency across platforms, ensuring their videos are understandable and accessible.

Film Screenings

Under the Americans with Disabilities Act, movie theaters are required to provide and maintain closed captioning and audio description equipment for digital films that are produced with accessibility features. However, many people who use captions have difficulty with movie theater captioning devices.

To improve the viewer experience and ensure permanent accessibility, advocates from the d/Deaf and hard of hearing communities have called for screenings at movie theaters and film festivals to include open captions.

Videos in Public Spaces

Open captions help make videos that are publicly displayed more accessible.

For example, The National Park Service provides open captions for all audiovisual programs that “enable viewers with hearing loss to participate fully when viewing video or multimedia productions without self-identifying.”

What are Subtitles for the Deaf and Hard of Hearing?

Subtitles for the deaf and hard of hearing (SDH) share similarities with open captions, especially when they're burned in. SDH are subtitles that combine the information of both captions and subtitles. Like open captions, they serve d/Deaf and hard of hearing audiences and are often permanently on-screen.

SDH subtitles assume the end user cannot hear the audio, so they include important non-dialogue information such as sound effects, music, and speaker identification. SDH and open captions are sometimes confused with one another, but there are a couple of key differences: SDH subtitles are often designed to provide a translation of the dialogue into another language, and they allow for greater customization options than open captions.

Benefits of Open Captions

Organizations in the corporate, education, and social media industries may benefit from publishing videos with open captions, finding that ensuring equal access to their content has the added perk of improved brand perception and user experience.

When videos have open captions, viewers can watch them in places where audio is unavailable. If someone is on a noisy train or a crowded street, captions convey the speech when the sound is drowned out. Captions also let viewers enjoy videos on mute in quiet environments like a library or an office.

Open captions provide a permanently inclusive viewing experience for those with hearing loss without the need to upload a separate track for captions. They help to ensure that all viewers, regardless of their hearing abilities, can fully understand and engage with your videos.


This blog was originally published on September 9, 2016 by Elisa Lewis and has since been updated for accuracy, clarity, and freshness.


The post Open Captions vs. Closed Captions: What’s the Difference? appeared first on 3Play Media.
