What is thematic analysis?
Thematic analysis helps you analyze unstructured data, such as text from feedback surveys, social media comments, or emails. Themes help capture common issues that come up in this kind of data.
For example, think of all the ways somebody might ask how to contact you in a web survey. They might write, "Why can't I find your phone number?" They might also say, "I need to email you" or "Phone?!" or "Where's your phone num?" To group all of these, you might assign them a theme such as "contact info." This makes the core idea easier to track. You can even reuse this theme to help you understand other kinds of data, such as call center calls or constituent interviews.
Tracking themes helps us:
- See which issues are most common
- Prioritize what to work on
- Avoid paying too much attention to "squeaky wheels": particularly angry comments that might not represent the most critical issues
- Monitor trends over time to confirm that our improvements are working
How is “thematic analysis” different from reading feedback or social listening?
You may already have a process where you review feedback or social media posts. Maybe you even identify action items and distribute them to stakeholders. This can be valuable and result in quick improvements. However, it doesn't help you build a reporting dataset over time. It can also encourage you to focus on feedback that stands out (a particularly angry constituent) instead of large trends.
Thematic analysis takes more time and effort but can deliver more value. Developing themes helps us identify and group constituents’ experiences. The “identifying” process helps us understand people’s challenges. The “grouping” helps us focus on the most common issues rather than overreacting to “squeaky wheel” comments. For example, imagine you look at 50 pieces of feedback. If you apply the theme "contact info" to 30 of them, you've likely identified a point of friction you need to address.
How to use this guide
Conducting thematic analysis is valuable but time-consuming. It requires planning, setup, collaboration, and routines. This step-by-step guide helps you preview what this work is like and what you can learn from doing it. You'll need:
- Some unstructured text data, like web feedback, social media posts, or constituent emails
- A teammate who can collaborate with you. If you don't have a teammate available, Mass Digital can support you for this exercise. To get help, schedule a consult.
You can try this guide out with a small dataset in a few hours total.
Step 1: Gather some data
Gather a small dataset. Mass.gov feedback data, which is available in the CMS, is an easy thing to start with.
You can try this out with as few as 20 messages, though it's better to aim for 50 or more, especially if they're short.
In addition to the messages, collect each message's source. For Mass.gov feedback, the source is the page the message was submitted on. For a social media post, it may be the social media site or the post that the constituent was replying to.
For this exercise, limit yourself to data from related sources. For example, take feedback from one or more pages about eligibility, or all the replies to a related series of social media posts.
Step 2: Put the data in a spreadsheet
We recommend using Excel or a similar tool for your first round of analysis.
- Add a column for where the message came from. You can call this "Source" or "Page"—whatever makes sense for the type of data you have.
- Add the messages to a column titled "Message"
When you’re done, you’ll have a dataset that looks like this one, which uses feedback from a page on reporting cybersecurity incidents:
| Source | Message |
|---|---|
| Report a cybersecurity incident | please add how to report a phishing scam by text (not always getting these by email) |
| Report a cybersecurity incident | I want to report a cyber incident and you don’t seem to care. We got sucked into the smishing deal because the claim was we had not paid Mass. Pike tolls |
| Report a cybersecurity incident | I’m getting a text which I think is a scam |
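If your messages start out in an export or another file rather than in Excel, a short Python sketch can write them to a CSV file with the same "Source" and "Message" columns, which Excel and similar tools open directly. The helper name `write_feedback_sheet` is hypothetical, and the sample rows are copied from the table above.

```python
import csv
import io
from typing import Iterable, TextIO


def write_feedback_sheet(rows: Iterable[tuple[str, str]], out: TextIO) -> None:
    """Write (source, message) pairs as CSV under "Source" and "Message" headers.

    Hypothetical helper for illustration. The csv module quotes any message
    that contains commas, quotes, or line breaks.
    """
    writer = csv.writer(out)
    writer.writerow(["Source", "Message"])
    for source, message in rows:
        writer.writerow([source, message])


rows = [
    ("Report a cybersecurity incident",
     "please add how to report a phishing scam by text (not always getting these by email)"),
    ("Report a cybersecurity incident",
     "I’m getting a text which I think is a scam"),
]

# Write to an in-memory buffer so the example runs without touching disk.
buffer = io.StringIO()
write_feedback_sheet(rows, buffer)
print(buffer.getvalue())
```

To save a real file, pass a file opened with `open("feedback.csv", "w", newline="", encoding="utf-8-sig")`. The csv module's documentation recommends `newline=""`, and the `utf-8-sig` encoding adds a byte-order mark that typically helps Excel display characters such as curly apostrophes correctly. The file name is a placeholder.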
Step 3: Set up the model theme dictionary
A theme dictionary is the set of themes, their definitions, and examples. You'll use the definitions to guide how you apply themes.
For this guide, we've created a theme dictionary you can start with. In the future, you can modify this to fit your organization's needs. You can also create one from scratch. This can be difficult and time-consuming, but valuable if you commit to thematic analysis.
Add the model theme dictionary to your data sheet
- Download the model theme dictionary
- (Recommended): Copy the theme dictionary into a separate tab in your data sheet
Add 3 columns to your data sheet: "Bucket," "Topic," and "Subtopic." Our data from above will now look like this:
| Source | Message | Bucket | Topic | Subtopic |
|---|---|---|---|---|
| Report a cybersecurity incident | please add how to report a phishing scam by text (not always getting these by email) |  |  |  |
| Report a cybersecurity incident | some of the referenced crimes that Inhad been violated by and assaulted are much more complex and had been elaborated as a result of compromised officials with anti-american ways of proceedures. |  |  |  |
- (Recommended): Add data validation to your new columns. (This may require you to add the theme names to a separate tab.)
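If you'd rather check assignments outside Excel, or want to double-check a sheet where data validation wasn't set up, a minimal Python sketch can flag any Bucket, Topic, or Subtopic value that isn't in the theme dictionary. Blank cells are allowed, since you don't have to assign a theme from every column. The theme lists here are an illustrative subset drawn from examples in this guide, not the full model dictionary, and the function name `invalid_cells` is hypothetical.

```python
from typing import Optional

# Illustrative subset of theme names taken from this guide's examples.
# Replace these with the full lists from the model theme dictionary.
THEMES: dict[str, set[str]] = {
    "Bucket": {"find information", "take action", "frustration"},
    "Topic": {"apply", "account admin"},
    "Subtopic": {"expected a topic or action"},
}


def invalid_cells(row: dict[str, Optional[str]]) -> list[str]:
    """Return the theme columns whose value is not a known theme.

    Blank or missing cells are allowed, because you don't have to assign
    a theme from every column. Matching ignores case and surrounding spaces.
    """
    problems = []
    for column, allowed in THEMES.items():
        value = (row.get(column) or "").strip().lower()
        if value and value not in allowed:
            problems.append(column)
    return problems


row = {
    "Source": "Report a cybersecurity incident",
    "Bucket": "Take action",
    "Topic": "Apply",
    "Subtopic": "Expected an action",  # misspelled: not in the dictionary
}
print(invalid_cells(row))  # ['Subtopic']
```

Because each theme column holds a single cell per message, this layout also keeps you to at most 1 theme per column.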
Step 4: Learn to use the theme dictionary
Themes
Each theme in a theme dictionary includes:
- A theme name
- A definition. This may include both what is and is not included in the theme.
- Examples so you can see how to apply the theme
The most important part of the theme is its definition. Definitions help you and your team members use the same themes for the same types of messages. This means 2 researchers can look at the same data and come to the same conclusion. It also means that you can make the same decision now as you would in 2 months.
For example, here is a pair of theme definitions:
| Theme name | Theme definition | Example |
|---|---|---|
| find information | Trouble locating information. Includes information that seems indirectly about action, too. For example, learning about eligibility is "find information," even if you suspect they want to eventually apply or do something. | I was looking for the fiduciary tax tables and could not find them. |
| take action | Want to do something such as apply, make an appointment, log in, download, call. Questions about how to find or submit an application are about "action." Trying to contact someone is taking action. Questions about eligibility are "information." | Need # to call to pay tolls |
The definitions help your team agree on how to analyze data. In this example, the theme definition for "find information" instructs us to use it even if we suspect that someone wants to take action at some point in the future. This helps us classify messages about things like "job listings" as "find information."
If you rely on a theme's name alone, your intuition may lead you to use it in ways that differ from its definition. This undermines consistent analysis across your team. For this reason, always return to the definition to justify your theme assignments.
The model dictionary includes 3 different types of themes
- Bucket: Very broad label for what the person is doing ("find information" or "take action"). Also includes "frustration" for data whose general intent you can't figure out.
- Topic: Loosely covers the different stages of getting a service (including information as a service): learning, apply, appeals, etc. Also includes common categories like "account admin" that could be part of any stage.
- Subtopic: More specific than "topic." Covers common places where things go wrong or get complicated, such as looking for a login, checking application status, and uploading documents.
You may assign at most 1 theme from each column to each piece of data you're analyzing. For example, never assign more than 1 theme from the "bucket" column. If nothing in a column fits, you don't have to assign a theme from it.
You don't need to memorize the themes and their definitions
At this stage, just get a sense for what themes are in the dictionary. In the future, you and your team can modify the dictionary's themes and definitions.
Review the example uses
In the model dictionary, the "example usage of themes" tab shows 5 examples of the dictionary in use. Each example shows the bucket, topic, and subtopic chosen, and next to each theme is the reason it was chosen (or left blank). This tab is a good place to start learning how to apply themes.
Step 5: Assign your first themes
Read each message. For each, try to understand what the constituent’s experience is. Remember that we’re not solving the constituent’s problem. We're identifying what they think it is, or what they are experiencing. Think of this as "following along with the constituent on their journey," even if you don't totally understand it.
Example
One row of our sample data reads:
"I want to report a cyber incident and you don’t seem to care. We got sucked into the smishing deal because the claim was we had not paid Mass. Pike tolls."
Here are relevant observations we can make about this constituent's experience:
- They want to report a cyber incident
- They were a victim of smishing (involving an unpaid Mass Pike toll)
- They feel frustrated with something, though it's hard to tell specifically what. (It could be the path to reporting an incident or the state's response to cyber scams, for example.)
Here are some observations you might have that are not relevant to our analysis:
- The state does in fact care about cyber scams and is doing things to address them
- The state does offer ways to report cyber incidents
- The constituent should have known better
- The information is on this web page. The constituent would have seen it if they looked a little harder.
It's critical that we stay focused on the constituent's experience. If we see several more messages like this, it might eventually lead us to an insight like, "The way we have structured our writing causes people to continually miss critical information."
Selecting themes
According to the definitions in the dictionary, the themes we'd assign for this example are:
- Bucket: Take action. The constituent wants to "do something" (report a cyber incident).
- Topic: Apply. Apply "includes signing up and reporting." It doesn't matter whether the service exists; reporting is what they're trying to do.
- Subtopic: Expected a topic or action. The definition fits this constituent: "Assumed they'd be able to do something but can't."
The quotations in these themes are from the theme definition. Theme definitions are the main way we select themes, not theme names.
Don't assign themes by name alone. Use the definition and examples
An easy mistake to make in thematic analysis is treating the theme name as a definition. For example, the distinction between "find information" and "take action" seems intuitive. However, imagine someone sends a message about wanting to find eligibility information. Is this "find information" because they're looking for information, or "take action" because their real goal is to apply? This theme dictionary explicitly says it is "find information," so that's what you should use if you use this dictionary.
Consistency is what's important here: you and your team need to treat messages the same way each time they come up. You'll get better at this with practice and by rereading the feedback and definitions.
Step 6: Compare your analysis with a teammate's
Once you have assigned themes to at least 25 messages, you're ready to compare notes with a teammate. This is a standard research practice called norming. It maintains consistency and increases rigor, and it's what allows teams to do this work over time. It also helps you develop a common language for understanding constituents' experiences.
To norm, have your teammate complete the same steps you just did. When they're done, meet and compare how they assigned themes to how you did. Discuss differences until you agree on which themes fit best. You may even pick a theme that neither of you picked initially.
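Before your norming meeting, it can help to list exactly where your assignments differ. This Python sketch assumes each person's themes are stored by a message ID (a hypothetical key, such as the spreadsheet row number) and that theme names are spelled identically, which data validation helps with. It reports simple percent agreement per column plus the specific disagreements to discuss. Percent agreement is only a rough signal; it doesn't replace talking through each difference.

```python
from typing import Optional

Assignment = dict[str, Optional[str]]  # column name -> theme (None if left blank)
Disagreement = tuple[int, str, Optional[str], Optional[str]]
COLUMNS = ("Bucket", "Topic", "Subtopic")


def compare(
    mine: dict[int, Assignment], theirs: dict[int, Assignment]
) -> tuple[dict[str, float], list[Disagreement]]:
    """Compare two people's theme assignments for the same messages.

    Only messages both people coded are compared. A blank (None) counts as
    a choice, so one person leaving a column blank while the other fills it
    in is a disagreement.
    """
    shared = sorted(mine.keys() & theirs.keys())
    agreement: dict[str, float] = {}
    disagreements: list[Disagreement] = []
    for column in COLUMNS:
        matches = 0
        for msg_id in shared:
            a, b = mine[msg_id].get(column), theirs[msg_id].get(column)
            if a == b:
                matches += 1
            else:
                disagreements.append((msg_id, column, a, b))
        # Share of shared messages where both chose the same theme (0.0 if none shared).
        agreement[column] = matches / len(shared) if shared else 0.0
    return agreement, disagreements


mine = {
    1: {"Bucket": "take action", "Topic": "apply", "Subtopic": None},
    2: {"Bucket": "find information", "Topic": None, "Subtopic": None},
}
theirs = {
    1: {"Bucket": "take action", "Topic": "apply", "Subtopic": "expected a topic or action"},
    2: {"Bucket": "take action", "Topic": None, "Subtopic": None},
}

agreement, to_discuss = compare(mine, theirs)
print(agreement)  # {'Bucket': 0.5, 'Topic': 1.0, 'Subtopic': 0.5}
for item in to_discuss:
    print(item)
```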
At first, norming sessions may take a long time. They'll speed up as you become more experienced.
(If you don't have anyone to do this with, we'll fill in! Schedule a consult to get support.)
Updating the theme dictionary
Good norming sessions often lead to updating the theme dictionary. You'll clarify definitions and add examples. You might even add or remove themes, though we recommend waiting to do this until you've used the theme dictionary for a while.
Step 7: Next steps
Assess what you've learned
Review your data. What themes did you assign most? Does this surprise you? Do the themes in your dictionary represent issues you were expecting?
To visualize the results, make a bar chart of how many times you assigned each theme.
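If you export a theme column to a list (for example, from a CSV), a short Python sketch can tally how often each theme appears and print a rough text bar chart with each theme's share of all messages. In the earlier example, 30 "contact info" messages out of 50 would show as 60%. The function names and sample values are assumptions for illustration; a chart made in Excel works just as well.

```python
from collections import Counter
from typing import Iterable, Optional


def theme_counts(values: Iterable[Optional[str]]) -> Counter[str]:
    """Count non-blank theme values from one column, such as "Topic"."""
    return Counter(v.strip() for v in values if v and v.strip())


def text_bar_chart(counts: Counter[str], total: int, width: int = 30) -> str:
    """Render theme counts as a text bar chart, most common theme first.

    Bars are scaled so the most common theme gets `width` characters.
    `total` is the number of messages analyzed, so each percentage is a
    share of all messages, including ones left blank in this column.
    """
    if not counts:
        return ""
    label_width = max(len(name) for name in counts)
    top = counts.most_common(1)[0][1]
    lines = []
    for name, n in counts.most_common():
        bar = "#" * round(width * n / top)
        lines.append(f"{name:<{label_width}} {bar} {n} ({n / total:.0%})")
    return "\n".join(lines)


topics = ["apply", "apply", "account admin", None, "apply", ""]
counts = theme_counts(topics)
print(text_bar_chart(counts, total=len(topics)))
```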
You can also use these themes as the basis for a scenario walkthrough. Investigate why people might be having these experiences. You may also be able to learn more by speaking with stakeholders or people who interact with constituents frequently.
Set up a monthly process
Conducting thematic analysis each month lets you track themes over time, so you can see whether your revisions reduce negative feedback about a particular theme.
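When you repeat the analysis monthly, a small table of theme counts per month shows whether a theme is rising or falling after a change. This minimal Python sketch assumes each coded message carries a month label (a hypothetical "Month" value such as "2024-05") alongside its theme; the function names and sample rows are illustrative only.

```python
from collections import Counter, defaultdict
from typing import Iterable


def monthly_counts(rows: Iterable[tuple[str, str]]) -> dict[str, Counter[str]]:
    """Group (month, theme) pairs into per-month theme counts.

    Months are "YYYY-MM" strings, so sorting them also sorts chronologically.
    """
    by_month: defaultdict[str, Counter[str]] = defaultdict(Counter)
    for month, theme in rows:
        by_month[month][theme] += 1
    return dict(sorted(by_month.items()))


def trend(by_month: dict[str, Counter[str]], theme: str) -> list[tuple[str, int]]:
    """Return (month, count) for one theme, using 0 for months it didn't appear."""
    return [(month, counts[theme]) for month, counts in by_month.items()]


rows = [
    ("2024-04", "contact info"), ("2024-04", "contact info"), ("2024-04", "apply"),
    ("2024-05", "contact info"), ("2024-05", "apply"),
    ("2024-06", "apply"),
]
print(trend(monthly_counts(rows), "contact info"))
# [('2024-04', 2), ('2024-05', 1), ('2024-06', 0)]
```

Raw counts rise and fall with overall feedback volume, so if volume changes a lot from month to month, comparing each theme's share of that month's messages may be more informative.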
A good monthly process benefits from governance. For example, you may want to define:
- How you'll get data
- Where analysis will happen
- How you'll norm and resolve differences of opinion about which theme(s) to apply
- What the process is for modifying a theme definition
- How you or another team will act on your findings
Creating your own theme dictionary
This "getting started" guide does not cover how to create a dictionary. You may decide to create one when you find the model theme dictionary too constraining. For example, you may want to split 1 theme into several, or remove themes that aren't useful in your context.
Expect creating your own dictionary to be time-consuming. You'll need to draft, revise, and test it multiple times, and it may take many iterations before it's useful.
We're happy to work with you to get your own dictionary started.