Disaggregating data and assessing inequities

Disaggregating data is important for identifying racial and ethnic health inequities, which are differences that are unjust and avoidable. Once identified, these inequities can be addressed through changes to policy, practice, and programs. This section discusses considerations for disaggregating data and assessing for inequities.

Intro summary

Many programs collect individual-level data on the participants or clients they serve. From these data, prevalence estimates and rates are often calculated and presented in aggregate. This means all data are grouped together to provide a summary measure (e.g., the prevalence of diabetes in Massachusetts). Alternatively, data can be disaggregated (or stratified), meaning they are broken down and analyzed in smaller units (e.g., race, ethnicity, or zip code), rather than presented as an overall rate. 
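The Python sketch below makes the aggregate/disaggregate distinction concrete. It computes one overall prevalence and then the same measure broken down by group, both from the same individual-level records. The records, the group labels "Group A/B/C", and the function names are hypothetical and exist only for illustration.

```python
from collections import defaultdict

# Toy individual-level records: (group, has_condition).
# "Group A/B/C" and all values are hypothetical, for illustration only.
records = [
    ("Group A", True), ("Group A", False), ("Group A", True),
    ("Group B", False), ("Group B", True),
    ("Group C", False), ("Group C", False), ("Group C", True), ("Group C", False),
]

def aggregate_prevalence(rows):
    """Pool every record into one overall summary measure."""
    cases = sum(1 for _, has_condition in rows if has_condition)
    return cases / len(rows)

def disaggregated_prevalence(rows):
    """Break the same records down (stratify) by group."""
    cases = defaultdict(int)
    totals = defaultdict(int)
    for group, has_condition in rows:
        totals[group] += 1
        if has_condition:
            cases[group] += 1
    return {group: cases[group] / totals[group] for group in totals}

print(f"Overall: {aggregate_prevalence(records):.1%}")
for group, prevalence in sorted(disaggregated_prevalence(records).items()):
    print(f"{group}: {prevalence:.1%}")
```

With these toy values, the overall figure (about 44%) hides a range from 25% to about 67% across groups.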

Disaggregation uncovers health disparities. These are differences between populations in who becomes infected, who develops or experiences disease, who dies from disease, and who experiences other adverse health conditions.¹ When disaggregating data by race and ethnicity, remember that differences often reflect the impacts of systemic privilege or oppression rather than inherent differences between groups.

Disaggregating data is also important to identify racial and ethnic health inequities – differences that are unjust and avoidable – that can then be addressed through changes to policy, practice, and programs. For example, the prevalence of diabetes among Asian women in Norfolk County (disaggregate) may be much higher than the overall prevalence of diabetes in Massachusetts (aggregate). The disaggregated data highlight this health inequity so that future policy and practice can address it.   


Considerations for disaggregating data 

Using data for racial equity begins with determining if and how people of various races and ethnicities experience health outcomes differently. There are multiple considerations that are important to explore when disaggregating data. These include engaging with community members, identifying sources of race and ethnicity data available, determining which outcomes to disaggregate, breaking down race and ethnicity into smaller categories and respecting self-identification. 

Engaging with community members: Community members can help identify which racial and ethnic subgroups are most prevalent in the geographic area and which health outcomes are most relevant. Involving community members may also provide insight into intersectional issues, such as how race and ethnicity interact with language access and immigration status.

Identifying sources of race and ethnicity data available to your program: Sources may include surveys or program intake/assessment forms. Consider how these data are collected — are measures self-reported or do they come from another data source such as the individual’s medical record?  

Determining which health-related outcome(s) to disaggregate: Health-related outcomes can include measures of disease or death, health behaviors, health-related social needs, and program-specific measures (e.g., use of services). For example, outcomes for identifying inequities in tobacco control might include smoking-related cancer mortality, tobacco use in the past 30 days, rates of successful tobacco cessation attempts, age at first tobacco use, access to tobacco retailers, referrals to tobacco cessation programs, and completed referrals to those programs.

Breaking down race and ethnicity into the finest categories the data allow: If a program can look at health outcomes by ethnicity (e.g., Chinese, Filipino, Vietnamese), the analysis will provide more detailed and specific information about a particular community than grouping all of these ethnicities together (e.g., Asian).
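The following sketch uses hypothetical counts and labels to show why finer categories matter. Summing detailed subgroups into one broad category can hide a subgroup whose rate is much higher than the combined rate. The roll-up sums cases and populations before dividing, rather than averaging the subgroup rates.

```python
# Hypothetical counts for detailed subgroups: subgroup -> (cases, population).
detailed_counts = {
    "Subgroup A1": (30, 1000),
    "Subgroup A2": (5, 1000),
    "Subgroup A3": (10, 1000),
}
# Hypothetical mapping from each detailed subgroup to a broad category.
broad_category = {
    "Subgroup A1": "Group A",
    "Subgroup A2": "Group A",
    "Subgroup A3": "Group A",
}

def rate_per_1000(cases, population):
    return 1000 * cases / population

def roll_up(counts, mapping):
    """Sum cases and population within each broad category."""
    totals = {}
    for subgroup, (cases, population) in counts.items():
        category = mapping[subgroup]
        prev_cases, prev_pop = totals.get(category, (0, 0))
        totals[category] = (prev_cases + cases, prev_pop + population)
    return totals

for subgroup, (cases, population) in detailed_counts.items():
    print(f"{subgroup}: {rate_per_1000(cases, population):.1f} per 1,000")
for category, (cases, population) in roll_up(detailed_counts, broad_category).items():
    print(f"{category} (combined): {rate_per_1000(cases, population):.1f} per 1,000")
```

Here the combined rate (15.0 per 1,000) is half the rate of Subgroup A1 (30.0 per 1,000).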

Respecting self-identification: If there are multiple sources of data on race and ethnicity, prioritize self-reported data.²
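Here is one way to put this into practice, sketched in Python under stated assumptions. When a record carries race and ethnicity values from several sources, the code takes the value from the most preferred source that has one, with self-report first. It also records which source was used. The source names, and their order after self-report, are hypothetical placeholders.

```python
from typing import Dict, Optional, Tuple

# Hypothetical source names, ordered from most to least preferred.
# Self-report comes first, following the guidance above; the remaining
# entries stand in for whatever other sources a program actually holds.
SOURCE_PRIORITY = ["self_report", "medical_record", "staff_observed"]

def choose_race_ethnicity(
    values_by_source: Dict[str, Optional[str]],
) -> Tuple[Optional[str], Optional[str]]:
    """Return (value, source) from the highest-priority source with a value."""
    for source in SOURCE_PRIORITY:
        value = values_by_source.get(source)
        if value:  # skip missing (None) or blank entries
            return value, source
    return None, None  # no usable value in any source

# The medical record disagrees with the self-report; self-report wins.
print(choose_race_ethnicity({"medical_record": "Group B", "self_report": "Group A"}))
```

Returning the source alongside the value makes it possible to report how much of the data is truly self-reported.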

Using proxy measures

If your program does not have race and ethnicity data, you can analyze indirect or proxy measures such as country of origin, language, income, education, or zip code. Any use of proxy variables requires more context and interpretation to frame the message properly, and the limitations of those data should be acknowledged (e.g., if using zip code as a proxy, frame it within the context of residential segregation). When using this approach, be clear about the possibility of confounding. Confounding is a distortion of the association between racial groups and an outcome that occurs when racial groups differ with respect to other factors that influence the outcome. This matters because racial and ethnic inequities may become evident when the data are disaggregated by other variables (e.g., income or education).
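A common way to look for confounding is to compare crude rates with rates stratified by the other variable. This minimal sketch does that with hypothetical counts, using income as the stratifying variable. It computes both sets of rates and leaves the interpretation to the analyst.

```python
from collections import defaultdict

# Hypothetical rows: (group, income_level, cases, population).
rows = [
    ("Group A", "lower income", 40, 1000),
    ("Group A", "higher income", 5, 500),
    ("Group B", "lower income", 12, 400),
    ("Group B", "higher income", 20, 2000),
]

def crude_rates(data):
    """Rate per 1,000 for each group, ignoring income."""
    cases, population = defaultdict(int), defaultdict(int)
    for group, _income, n_cases, n_pop in data:
        cases[group] += n_cases
        population[group] += n_pop
    return {group: 1000 * cases[group] / population[group] for group in population}

def stratified_rates(data):
    """Rate per 1,000 for each group within each income level."""
    return {(group, income): 1000 * n_cases / n_pop for group, income, n_cases, n_pop in data}

print(crude_rates(rows))
print(stratified_rates(rows))
```

With these toy numbers, the crude rates are 30.0 versus about 13.3 per 1,000. Within income levels, the rates are 40.0 versus 30.0 (lower income) and 10.0 versus 10.0 (higher income), so part of the crude gap reflects the groups' different income distributions. In real data that income distribution may itself be shaped by structural racism, so stratified results still need the contextual framing described above.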

Suppression rules

Data suppression is the practice of removing or hiding selected information when there is concern that small numbers might identify individuals. Consider data suppression and other methods of protecting confidentiality particularly when data are presented 1) for geographic areas smaller than the state, or 2) by more than one covariate (e.g., year, race, gender). DPH has outlined the confidentiality procedures under which individual-level or aggregate-level data can be disclosed (DPH’s Confidentiality Procedures, see Procedure 7). Non-DPH Road Map users should refer to their home agency’s official policies.
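For illustration only, a basic small-cell suppression rule can be sketched as follows. The threshold of 10 is a hypothetical placeholder, not DPH's rule. The actual threshold, and any rules about zeros, come from Procedure 7 or your agency's policy.

```python
from typing import Dict, Tuple, Union

# HYPOTHETICAL threshold: nonzero counts at or below this value are suppressed.
# The real rule comes from DPH's Confidentiality Procedures (Procedure 7)
# or your home agency's policy; do not treat 10 as the official value.
SUPPRESSION_THRESHOLD = 10

def suppress_cell(count: int, threshold: int = SUPPRESSION_THRESHOLD) -> Union[int, str]:
    """Replace a small nonzero count with '*'; leave other counts unchanged."""
    if 0 < count <= threshold:
        return "*"
    return count

# Hypothetical table keyed by (geography, group).
table: Dict[Tuple[str, str], int] = {
    ("County X", "Group A"): 3,
    ("County X", "Group B"): 57,
    ("County Y", "Group A"): 0,
}
published = {cell: suppress_cell(n) for cell, n in table.items()}
print(published)
```

This sketch does not suppress zeros. It also does not apply complementary (secondary) suppression, which some policies require so that a hidden cell cannot be recovered by subtracting published cells from a published total.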

While protecting confidentiality is a critical part of sharing data, it is important to consider how data suppression policies can contribute to the systemic erasure of certain communities and their health needs. For example, Native Americans and Tribal populations have been particularly disadvantaged by these practices. They are frequently excluded from analysis, or lumped together with other small racial groups and analyzed as “other.” The Urban Indian Health Institute defines this systemic erasure, which results from the omission and/or suppression of populations, as Data Genocide.³

Data suppression policies can make data sharing difficult because individuals’ right to confidentiality must be protected. The Urban Indian Health Institute’s Best Practices for American Indian and Alaska Native Data Collection guide⁴ uses COVID-19 surveillance data as an example and offers recommendations for managing small numbers. If numbers are too small to protect privacy, the guide recommends that researchers:

  • Consider aggregating data using a larger geographical area. For example, use several adjacent counties or present data at the state level.  
  • Take into consideration how surveillance data for other conditions with small numbers are presented and discussed.
  • Aggregate data across time to include a larger time frame for the analysis. For example, aggregate data over three or five years rather than one year.   
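The first and third recommendations amount to summing counts over a wider area or a longer period before checking them against a suppression rule. Here is a minimal sketch with hypothetical counts and county names:

```python
from collections import defaultdict

# Hypothetical counts for one subgroup, keyed by (county, year).
counts = {
    ("County X", 2019): 2, ("County X", 2020): 4, ("County X", 2021): 3,
    ("County Y", 2019): 1, ("County Y", 2020): 2, ("County Y", 2021): 5,
}

def pool_years(data, years):
    """Sum each county's counts over a span of years (e.g., three years)."""
    pooled = defaultdict(int)
    for (county, year), n in data.items():
        if year in years:
            pooled[county] += n
    return dict(pooled)

def pool_counties(data, counties, years):
    """Sum counts over a set of adjacent counties and a span of years."""
    return sum(n for (county, year), n in data.items() if county in counties and year in years)

three_years = range(2019, 2022)  # 2019, 2020, 2021
print(pool_years(counts, three_years))                              # per county, 3 years pooled
print(pool_counties(counts, {"County X", "County Y"}, three_years))  # 2 counties, 3 years pooled
```

Suppose a hypothetical threshold of 10 applied. Neither single-county three-year total (9 and 8) would clear it, but the two-county total of 17 would. If rates are reported, the denominators must be pooled over the same counties and years.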

“Even with small numbers, patterns or striking differences can stand out and should be investigated further.”

Even after implementing the recommendations above, confidentiality may still be inadequately protected, and inequities can arise in how programs share their data across different communities. These very real limitations underscore the urgent need for health departments to develop a wider array of approaches and sources for gathering health information. Health departments can expand their capacity in analytic methods such as “rounding” and “masking,” which can, in some circumstances, protect confidentiality without data suppression. Examples of these methods can be found in the Handbook on Statistical Disclosure Control, 2007. More examples can be found in the CDC Foundation’s Improving Engagement in Community Level Data Collection.
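The following minimal sketch shows what "rounding" can look like, using two common variants from the statistical disclosure control literature. The first is conventional rounding to a multiple of a base. The second is unbiased random rounding. The base of 5 is an assumption, not a DPH rule, and masking is not shown.

```python
import random
from typing import Optional

def round_to_base(count: int, base: int = 5) -> int:
    """Conventional rounding to the nearest multiple of `base`."""
    return base * ((count + base // 2) // base)

def random_round(count: int, base: int = 5, rng: Optional[random.Random] = None) -> int:
    """Unbiased random rounding to a multiple of `base`.

    A count that is already a multiple of `base` is unchanged. Otherwise it is
    rounded up with probability remainder/base and down otherwise, so the
    expected value of the published number equals the true count.
    """
    if rng is None:
        rng = random.Random()
    lower = base * (count // base)
    remainder = count - lower
    if remainder == 0:
        return count
    return lower + base if rng.random() < remainder / base else lower

rng = random.Random(2024)  # fixed seed only so the demo is repeatable
for true_count in [3, 7, 12, 20]:
    print(true_count, round_to_base(true_count), random_round(true_count, rng=rng))
```

Rounding alone may not be enough. For example, if unrounded totals are also published, individual cells can sometimes be worked out.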

Assessing for inequities 

Now that the data have been disaggregated by race and ethnicity (or a proxy variable), the next step is to assess for inequities by subgroup.   

To truly assess for inequities, rather than just the magnitude or burden of health disparities on certain subpopulations, it is critical to connect the disparities to social and structural determinants of health.

While the racial subgroup “White” is often used as a comparison group in disaggregated data, this does not indicate that such a group represents a standard or ideal to which other racial and ethnic groups should be compared. Rather, the White group represents a population that has not been subject to structural racism and the resulting inequities in social, economic, or environmental factors that contribute to racial health inequities. 

When presenting data on inequities, it’s recommended to:

  1. Use proportions (ratios in which the numerator is a subset of the denominator) or rates (frequency of events during a certain time period divided by the number of people at risk for the event during that time period) instead of counts alone to account for differences in population subgroups. This allows for valid comparisons of health events between population groups and better assessment of risk. 
  2. Compare the results across population subgroups and decide whether meaningful differences exist. A difference does not need to be statistically significant to be meaningful. There is often a push for data analysis to demonstrate statistical significance. However, relying on statistical tests to interpret data can sometimes be detrimental, because “statistically significant” is often taken to mean “real” or “valid.” When comparing small groups, the populations are often not large enough for a difference to reach statistical significance even when a meaningful difference exists. Even with small numbers, patterns or noticeable differences can stand out and should be investigated further. In some cases, small numbers may signal a concern, especially if no cases are expected. For this reason, remember that descriptive epidemiology can be as valuable as causal epidemiology.
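The two recommendations can be sketched in a few lines of Python. The code computes proportions or rates rather than raw counts. It then describes each group's rate relative to a reference group with a ratio and an absolute difference, without a significance test. Counts and group names are hypothetical. The reference group is simply a parameter; as noted above, choosing a comparison group does not make it a standard.

```python
def proportion(numerator: int, denominator: int) -> float:
    """Proportion: a ratio whose numerator is a subset of the denominator."""
    return numerator / denominator

def rate(events: int, population_at_risk: float, per: float = 100_000) -> float:
    """Events in a time period divided by the population at risk in that period, scaled."""
    return per * events / population_at_risk

def compare_to_reference(measures: dict, reference: str) -> dict:
    """Ratio and absolute difference of each group's measure vs. a chosen reference group."""
    ref = measures[reference]
    return {
        group: {"ratio": value / ref, "difference": value - ref}
        for group, value in measures.items()
    }

# Hypothetical one-year counts: group -> (events, population at risk).
counts = {"Group A": (30, 20_000), "Group B": (90, 30_000)}
rates = {group: rate(events, pop) for group, (events, pop) in counts.items()}
print(rates)  # per 100,000
print(compare_to_reference(rates, "Group A"))
print(proportion(30, 20_000))
```

In these toy numbers, the raw counts (90 versus 30) suggest a threefold difference. The rates per 100,000 (300 versus 150) show a twofold difference.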

To address inequities in public health, it is necessary to identify differences across groups and over time using quantitative data, as described above. However, in telling a community’s story it is equally important to collect data that describe the current conditions, context, and causes of community members’ pain, trauma, and lived experience. Qualitative data are critical for comprehensively understanding the inequities seen in your data, because qualitative data and analysis provide more nuanced information that helps illuminate those inequities. Sources of qualitative data include focus groups, open-ended survey questions, advisory boards, and interviews.

"Descriptive epidemiology seeks to characterize the distributions of health, disease, and harmful or beneficial exposures in a well-defined population as they exist, including any meaningful differences in distribution, and whether that distribution is changing over time. Descriptive epidemiology also seeks to embed this data in the historical and sociological context, so that we can attempt to understand the ways in which that context contributes to patterns of disease and mortality." (Fox, Matthew P., et al., "On the need to revitalize descriptive epidemiology.")

Example

This example demonstrates the importance of using rates when comparing health events. 

During 2016, there were 2,715 low birth weight (LBW, weighing <2,500 grams) infants born to White, non-Hispanic mothers in Massachusetts. During the same year, there were 801 LBW births to Black, non-Hispanic mothers. Given these two data points, you might conclude that LBW births are more of an issue for White mothers than for Black mothers. However, there were 42,448 births to White mothers and only 7,095 births to Black mothers in Massachusetts during 2016. Therefore, 6.4% of infants born to White mothers were LBW, compared with 11.3% of infants born to Black mothers. Comparing proportions instead of counts makes it clear that Black mothers have a higher likelihood of delivering an LBW infant than White mothers in Massachusetts.
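The arithmetic in this example can be checked with a short Python snippet using the counts given in the paragraph. The ratio it prints (about 1.8) is derived from those same figures and is not part of the original example.

```python
# Counts from the 2016 Massachusetts example above.
lbw_births = {"White, non-Hispanic": 2715, "Black, non-Hispanic": 801}
all_births = {"White, non-Hispanic": 42448, "Black, non-Hispanic": 7095}

# Percentage of births that were LBW in each group (a proportion x 100).
percent_lbw = {group: 100 * lbw_births[group] / all_births[group] for group in lbw_births}
for group, pct in percent_lbw.items():
    print(f"{group}: {lbw_births[group]:,} of {all_births[group]:,} births = {pct:.1f}% LBW")

# How many times higher the Black, non-Hispanic percentage is than the White, non-Hispanic one.
ratio = percent_lbw["Black, non-Hispanic"] / percent_lbw["White, non-Hispanic"]
print(f"Ratio: {ratio:.1f}")
```

The counts point one way (2,715 versus 801), but the percentages (6.4% versus 11.3%) point the other.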

Reflection

Now that you have examined your disaggregated data and determined whether and how people of different races and ethnicities experience health outcomes differently, reflect on the following with your team:

  • Are you comfortable with the completeness and quality of your data, or is additional work needed in this area?
  • Did you identify inequities among racial groups in the health outcomes you are examining?  
  • Which partners will you engage to assist in interpreting the data and planning your next steps?

Check in with your team to determine if you are ready to begin incorporating contextual data to shape the narrative in a way that considers historical and current policies and system factors that impact the health of communities. 

