Making accessible videos
Videos can be exciting, informative and evocative. The nature of video requires that a viewer engage three abilities at once: sight, hearing, and cognitive processing. A lot of information can be lost if one of these is impaired. An accessible video provides alternatives for visual and auditory content and offers a way to review the content that is separate from the timing of the video. This guide will ensure videos that you deliver are accessible to all users.
Requirements in brief
Videos must have:
- Closed captions
  - Open captions are acceptable on social media platforms that don’t allow closed captions
- A transcript
- Proper color contrast
- Legible text
- Audio descriptions when needed
- No animations that can affect users with photosensitivity or vestibular disorders
- American Sign Language (ASL) interpretation when required
Captions and transcripts
Captions and transcripts are fundamental accessibility needs and should be available for every video the state delivers. Captions are synchronized to your video and differ slightly from “subtitles.” Subtitles are defined specifically as a translation of spoken language shown on the screen. That distinction becomes important when you are delivering a video in a language other than English. If your video contains a language other than English, the language of the captions and transcript is determined by the predominant language of that video.
For example: Your video is primarily in English, and targeted at an English-speaking audience, but contains passages where people are speaking Spanish. Captions should be written in English, and the Spanish language should be identified by prefacing that section of dialog with [Speaking Spanish].
If your audience is non-English speakers, you will want to provide captions and a transcript in that language, rather than just English subtitles. Any passages in the video spoken in another language should be identified.
Captions
“Closed” captions are text synchronized with and overlaid on video that can be turned on or off at any time. When a video has captions that are “burned in” to the video and cannot be turned off, those are referred to as “open” captions. Closed captions are the preferred standard for accessibility for a few reasons: screen reader software can interact with them if needed, and viewers who find them distracting can turn them off. If you are putting video on a platform that doesn’t allow closed captions, like some social media sites, then open captions should be used. When writing captions, follow these guidelines. (Some specific guidelines taken from Section 508.)
Caption guidelines
- Captions must be 99% accurate, so any auto-generated captions must be checked for accuracy
- Text should be synchronized with audio, and appear at the same time as the words are spoken
- If there are multiple speakers in the video, identify each
- If there is meaningful music or a meaningful sound, identify it
- If there is no meaningful audio, keep captions off the screen
- If words are spoken with a specific emphasis (yelling, crying), identify the emotion
- Use a sans serif font, like Helvetica or Arial, and 18pt font size
- Use white text over a black translucent background, and do not change text/background color
- Use no more than two lines of text at a time
- Use no more than 45 characters per line
- Keep captions on screen long enough to read (minimum 3 seconds for short passages)
- Break captions at logical points, usually a comma or period
- Display captions at the center of the lower third section of the video, unless these captions block important onscreen text
- Do not use motion or animation with captions
- Color contrast: Transcripts and captions should follow proper color contrast requirements
- Legible text: Transcripts and captions should follow text requirements
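Several of the guidelines above are numeric, so they can be checked automatically before a human review. The following is a minimal Python sketch, not an official tool. The `Cue` class and `check_cue` function are hypothetical names introduced for illustration, and the sketch applies the 3-second minimum to every caption for simplicity, although the guideline states it for short passages.

```python
from dataclasses import dataclass

# Limits taken from the caption guidelines above.
MAX_LINES = 2
MAX_CHARS_PER_LINE = 45
MIN_DURATION_SECONDS = 3.0


@dataclass
class Cue:
    """One caption as it appears on screen (hypothetical structure)."""
    start: float  # seconds from the beginning of the video
    end: float    # seconds from the beginning of the video
    text: str     # caption text, with lines separated by "\n"


def check_cue(cue: Cue) -> list[str]:
    """Return a list of guideline problems found in one caption."""
    problems = []
    lines = cue.text.split("\n")
    if len(lines) > MAX_LINES:
        problems.append(f"{len(lines)} lines of text (maximum {MAX_LINES})")
    for line in lines:
        if len(line) > MAX_CHARS_PER_LINE:
            problems.append(
                f"line has {len(line)} characters (maximum {MAX_CHARS_PER_LINE})"
            )
    duration = cue.end - cue.start
    if duration < MIN_DURATION_SECONDS:
        problems.append(
            f"on screen for {duration:.1f} seconds (minimum {MIN_DURATION_SECONDS})"
        )
    return problems


if __name__ == "__main__":
    cue = Cue(start=12.0, end=13.5, text="[Angela] I think\nwe should\nstart over.")
    for problem in check_cue(cue):
        print(problem)
```

A check like this only covers length and timing. Accuracy, speaker identification, and logical line breaks still need a person to review them.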
Caption examples

A bad text break: This caption ends with only part of a phrase, the two words "I think." The phrase "I think" should be moved to the next caption, so that the whole thought is completed, rather than showing this sentence broken up in an odd manner. If a break is needed due to length, look for a comma or logical break in the grammar.

Speaker identification: Since there are two people on the screen, it is important to identify who is speaking in the captions. In this case [Angela] has been added before the statement in the captions to identify the speaker.

Too much in one caption: This image has a caption with three lines of text, and each line exceeds the character limit. This is a lot to digest in one caption, and should be broken into at least two caption segments.
How to add captions
Transcripts
Transcripts can be even more important than captions based on the type of assistive technology being used to consume content. For someone who has both a hearing impairment and a visual impairment, a transcript will be the primary way to access the information. Additionally, a transcript offers a way for people to get the content of your video without being constrained to the pace of the speakers in the video, which is especially important for folks with cognitive processing disorders. Even if your video has no audio, include a transcript that describes what is happening in the video, so meaning can still be conveyed.
Transcripts can accompany a video in several ways depending on what is available to you. Some video players allow a transcript to appear alongside the video that you upload. If the player you are using does not allow that (like YouTube), add the transcript adjacent to the video on the page where the video is embedded. This might be directly on the page after the video, in a collapsible section below the video, or as a .txt or .doc file available to download.
When embedding a transcript in Mass.gov, you can copy and paste your transcript into a text box when you add your video to the page. If you are copying and pasting your captions into this box, be sure to remove all of the time stamps, as those are frustrating to have announced after every sentence with a screen reader or Braille display.
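If your captions are stored in a caption file, removing the time stamps by hand can be tedious. The following is a minimal Python sketch that assumes the captions are in SRT or WebVTT format, which is an assumption and not something this guide requires. The function name `captions_to_transcript` is hypothetical. Review the result afterward, since a transcript also needs onscreen text and any audio description text that the captions do not contain.

```python
import re

# Matches timing lines such as "00:00:01,000 --> 00:00:04,000" (SRT)
# or "00:01.000 --> 00:04.000" (WebVTT).
TIMING_LINE = re.compile(
    r"^(\d{1,2}:)?\d{2}:\d{2}[.,]\d{3}\s+-->\s+(\d{1,2}:)?\d{2}:\d{2}[.,]\d{3}"
)


def captions_to_transcript(caption_text: str) -> str:
    """Return the caption text with time stamps and cue numbers removed."""
    lines = [line.strip() for line in caption_text.splitlines()]
    kept = []
    for i, line in enumerate(lines):
        if not line:
            continue  # blank separator between captions
        if i == 0 and line.startswith("WEBVTT"):
            continue  # WebVTT file header
        if TIMING_LINE.match(line):
            continue  # time stamp line
        next_line = lines[i + 1] if i + 1 < len(lines) else ""
        if line.isdigit() and TIMING_LINE.match(next_line):
            continue  # SRT cue number directly above a time stamp
        kept.append(line)
    return " ".join(kept)


if __name__ == "__main__":
    sample = "\n".join([
        "1",
        "00:00:01,000 --> 00:00:04,000",
        "[Angela] Welcome to the training.",
        "",
        "2",
        "00:00:04,500 --> 00:00:08,000",
        "[Upbeat music]",
    ])
    print(captions_to_transcript(sample))
```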
Transcript guidelines:
- Contains all spoken words
- Contains any words that appear on screen and are not spoken, like titles or slide content
- If there are multiple speakers, identify each
- If words are spoken with a specific emphasis (yelling, crying), identify the emotion
- If there is meaningful music or a meaningful sound, identify it
- Use a sans serif font, like Helvetica or Arial, and 18pt font size
- Contains the text of any included audio description that describes important visuals and actions on the screen
- Color contrast: Transcripts and captions should follow proper color contrast requirements
- Legible text: Transcripts and captions should follow text requirements
Audio Descriptions (AD)
Audio descriptions (AD) provide a way for visually impaired users to understand what is happening in your video. Not everything needs to be described in an audio description, but actions that tell part of the story, or that are important to understand, should be included. Audio descriptions should also include onscreen text that is not spoken aloud by a narrator or person on screen. If your video has no audio, an audio description should be provided to describe the content on the screen, so meaning can be conveyed.
Audio descriptions should be done in a separate voice. This could be a second person, or generated text-to-speech if necessary, so that the listener can hear the difference between the description and the actual narration. Some players (like Vimeo) allow you to include this in a separate audio track, so the user can select an audio description track to play back if they want it. If you are delivering content somewhere that the user cannot select a different audio track (like YouTube) then you can include the audio description in the original video, or link to a copy of the video with the audio description available.
When AD isn’t necessary
Depending on the content of your video and how it is scripted, it may not need an audio description. For example, if your video is an “explainer” type video, and everything done on the screen is being described by the narrator, then an audio description is not needed. When you script a video, think about writing a script for a podcast. If you can understand everything you need to know by listening only, then you do not need to create an audio description. To ensure this, avoid nondescript or directional language, like “select this,” “the button to the right” or “look at this slide for more information.”
When some AD is needed
Some types of videos may only require a simple audio description as an introduction. For example, let’s say your video contains an excerpt from a Secretary speaking publicly, and there are no additional important visuals. The audio description would introduce the speaker at the beginning of the video, stating something like: “Massachusetts Secretary [Name] speaks at a podium to the press.” If there are multiple speakers, you would repeat this type of audio description. But, if an introduction is done by the initial speaker, such as the Secretary saying, “I would like to introduce Senator [Name], who will continue speaking about this topic,” you wouldn’t need to restate the same thing in AD. However, if there is something visually important that happens, like the Senator wearing a shirt that supports a cause, provide an audio description saying, “Senator [Name] takes the podium wearing a [Cause] t-shirt.”
When AD is needed throughout
If your video will require a fair amount of audio descriptions, consider this when editing your video. Many videos quickly cut between clips that need a description, while a narrator speaks. Prepare for this by leaving time between lines of narration or dialog when editing the video. Audio descriptions are usually spoken fairly quickly and should be brief. A few seconds between dialog is typically all that is needed. While it is uncommon, if there is a rapid visual demonstration, you may need to pause, slow down, or cut to b-roll to give yourself time to include AD if needed.
How to write an Audio Description
Audio Descriptions are similar to alternative text for images. They should be brief, one or two sentence descriptions of the action taken onscreen. Be as succinct as possible. Consider the meaning of the visuals and what is being conveyed. If the content of the visual is very complicated and too much to describe, such as a detailed graph where all of the data points are important, ensure that information is in the text transcript.
Audio Description examples

A rapid montage of Massachusetts workers is shown: Don’t describe each worker. Instead, write an overall description of the segment and what it is showing, such as “A series of clips showing Massachusetts workers engaged in their jobs.”

A fourth grade science teacher demonstrates a lesson to students: Since the meaning of the segment is to convey what students are learning, this would be a time to describe what the teacher is doing, such as “A fourth grade science teacher pours vinegar into a flask containing baking soda, while wide-eyed students watch.”

Slides are shown during a presentation: When slides are shown with text, work with the presenter before filming to make sure the slides are explained. If the slide shows important information not described by the presenter, add a brief description in AD. Include the content on the slide in the transcript.

Graphs or charts are shown: If a graph or table is shown, the description will depend on the purpose of the graph. If the purpose is to show a general trend, state the trend in AD such as "A graph showing a large decrease in unemployment." If the purpose is specific datapoints which are important, describe those in your script rather than in AD. If that is not possible, include a brief description in AD, and include the details of the information in your transcript, similar to how complex slides would be handled.
Animation
Animations, motion graphics, and transitions are a wonderful way to keep your video entertaining and engaging. However, flashing animations, which appear in many movies and TV shows, can cause seizures in photosensitive viewers. You may have seen a warning about this at the beginning of a TV show episode, or even at a concert that uses flashing lights. That warning has become more common after high-profile incidents years ago, such as when a major cartoon caused 600 children with no previous diagnosis of photosensitivity to be hospitalized. Some common animations can also affect people with vestibular disorders, causing dizziness or nausea. This happens when objects move in a pattern referred to as sinusoidal motion (explanation follows), which resembles the rocking of a boat that causes seasickness.
Flashing
While a flicker that happens during an action sequence may be common in the entertainment industry, flashing animations should never be used in a video created by the state. A warning about flashing animations is sufficient for privately produced entertainment videos, but it means that affected viewers must stop watching. As a result, any video containing such flashes cannot be viewed by those constituents.
Flashing refers to content that flashes more than 3 times a second. The speed, brightness, and color (particularly the color red) can make this worse for photosensitive viewers. Avoid all rapid flashing.
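The "more than 3 times a second" threshold can be expressed as a sliding one-second window over the times at which flashes occur. The following is a minimal Python sketch under stated assumptions: the function name `exceeds_flash_limit` is hypothetical, and `flash_times` is assumed to be a list of flash times in seconds that you have already noted, for example while reviewing an edit. The sketch does not detect flashes in video frames, and it does not measure brightness, color, or how much of the screen flashes.

```python
def exceeds_flash_limit(flash_times, max_flashes=3, window_seconds=1.0):
    """Return True if any one-second window contains more than max_flashes."""
    times = sorted(flash_times)
    window_start = 0  # index of the oldest flash still inside the window
    for index, current in enumerate(times):
        # Drop flashes that are a full window or more older than the current one.
        while current - times[window_start] >= window_seconds:
            window_start += 1
        flashes_in_window = index - window_start + 1
        if flashes_in_window > max_flashes:
            return True
    return False


if __name__ == "__main__":
    # Four flashes within three quarters of a second: too fast.
    print(exceeds_flash_limit([10.0, 10.25, 10.5, 10.75]))
    # Flashes every half second: within the limit, but still best avoided.
    print(exceeds_flash_limit([10.0, 10.5, 11.0, 11.5]))
```

Passing this check does not make flashing acceptable for state videos. The guidance above is still to avoid all rapid flashing.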
Blinking
While blinking does not typically occur rapidly enough to cause seizures, it can be a major distraction for neurodivergent people and people with other conditions. Blinking can be allowed for a short time, so long as it stops, or can be stopped by the viewer.
Sinusoidal motion
Sinusoidal motion is a term referring to movement that oscillates continuously in a wave, repeatedly slowing down and then speeding up.
It is very common in motion graphics to use a feature like “ease in/ease out,” which means your content slows down as it enters the screen, and speeds up as it leaves. This can be added to animations in Microsoft PowerPoint, Adobe After Effects, Apple Motion, and other animation applications. This type of animation can be used, but it should not be looped.
A common use case where this is seen is in a loading indicator. Often a loading indicator is a spinning ring or circle. Frequently, this spinning movement speeds up and slows down repeatedly as it rotates, and rotation is typically rapid. This should not be done.
Animated objects can use ease in and ease out transitions to enter or exit the screen, but avoid looping those transitions on the same object. This includes grow/shrink type of effects, where an object “zooms in” on the z-axis.
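To illustrate the difference between a single eased movement and a looped one, the following Python sketch models easing with a sine curve. This is one common way to model ease in/ease out and is an assumption here, since animation applications may use other curves. The function names `ease_in_out` and `looped_position` are hypothetical.

```python
import math


def ease_in_out(progress: float) -> float:
    """Position (0 to 1) of an object for progress 0 to 1, eased with a sine curve.

    The object starts slowly, moves fastest in the middle, and slows to a stop.
    Played once, this is the kind of movement that is acceptable.
    """
    return 0.5 * (1.0 - math.cos(math.pi * progress))


def looped_position(seconds: float, period_seconds: float = 2.0) -> float:
    """Position (0 to 1) of an object that repeats the eased movement forever.

    The object rocks back and forth once every period_seconds. This continuous
    oscillation is the sinusoidal motion that should be avoided.
    """
    return 0.5 * (1.0 - math.cos(2.0 * math.pi * seconds / period_seconds))


if __name__ == "__main__":
    # Single pass: the position moves from 0 to 1 and then stays there.
    print([round(ease_in_out(step / 4), 2) for step in range(5)])
    # Looped: the position keeps swinging between 0 and 1.
    print([round(looped_position(step / 2), 2) for step in range(9)])
```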
Transitions
Many video editors offer flashy blur/zoom/spinning transitions. While these effects can “look cool” for a home video project, they are often distracting, and may contain flashes or sinusoidal motion, particularly when the transition is quick. Some movement in transitions is fine, such as an image sliding onto the screen to begin a montage of photos. However, don’t repeat the movement for each photo, as repeated movement compounds the distraction. Using many types of transitions in one video is also distracting and is not a good design practice. For an accessible and professional look and feel, stick to cuts, dissolves and fades, use wipes only if you are trying to invoke everyone's favorite space opera, and avoid rapid flashy transitions.
Video: Motion Examples for Accessibility
Providing American Sign Language (ASL) for videos
Depending on the content in your video, you may need to provide an American Sign Language (ASL) interpreter on screen. If so, plan the timing of your edit when scripting your video, as you would for audio descriptions, to leave room for interpretation. Because ASL is not a one-to-one translation of English, signing a statement can take longer than speaking it aloud.
Your agency should determine the need to include an on-screen interpreter based on purpose and criticality of your video. If your video contains information that is critical for constituents, be sure to include ASL interpretation. Anything that affects the day-to-day life of constituents in an important way, their health and safety, or is a requirement that constituents must comply with, should include ASL interpretation.
Examples might include:
- Governor’s Office announcements and press releases
- Legal and policy changes that affect constituents
- COVID or other disease strategies and policies
- Local chemical releases like mosquito spraying schedules for Eastern Equine Encephalitis (EEE)
- How to apply for housing, unemployment, veterans benefits or similar services of need
- Getting a Real ID or other documentation required of constituents
- Any content specific for constituents or state employees with disabilities
This list is not exhaustive but is meant to give you an idea of the types of content where ASL interpretation is needed.
When including ASL in your video, you can add the interpreter in different ways depending on what suits the content best. Ensure that there is good contrast between the background and the clothing and hands of your interpreter. If your video player allows multiple video files to be uploaded so people can switch between video tracks, add the ASL interpreted video as a second video track.
Following are some examples of video layout for including ASL.

Show your interpreter picture-in-picture over your primary video track.

If your primary video track can be easily cropped, you can use split screen to show both interpreter and primary footage.

If you need more room to show the primary video so important visual information isn't missed, make both the primary video track and interpreter picture-in-picture on your canvas.

If your target audience is the Deaf and hard of hearing community, consider making the interpreter full screen, with main content picture-in-picture.