Background on syndromic surveillance data
- The data source for this analysis is emergency department (ED) visits from the National Syndromic Surveillance Program (NSSP) ESSENCE platform.
- NSSP is a collaboration among CDC, federal partners, local and state health departments, and academic and private sector partners to collect, analyze, and share electronic patient encounter data received from multiple health care settings. For more information on NSSP, visit NSSP on cdc.gov.
- Currently, 100% of emergency departments in the Commonwealth are sending data to ESSENCE, allowing for a complete picture of ED visits.
- When a patient is admitted, discharged, or transferred, the hospital’s electronic medical record system triggers real-time HL7 messages, which travel through Mass HIWay to the National Syndromic Surveillance Program.
- Data can then be accessed and analyzed in ESSENCE. These records contain information about the visit, the patient, and the reason for the visit, including diagnosis codes, but do not include the patient’s name or home address. Very limited identifiable information about the patient is included.
- While ESSENCE receives updates to health records instantaneously, delays in test results, hospital coding, etc. result in often-substantial delays between the visit and the notification of the patient’s diagnosis. This is why we advise caution when interpreting data that are only one to two weeks old.
Data
- Statewide respiratory ED visits from December 1st, 2022 to December 1st, 2023 were retrieved using the CDC Broad Acute Respiratory DD v1 query. For more information, visit query definition (PDF).
- Data from the past year were assumed to follow reporting and delay patterns similar to those of current data, so this time frame was used to train the nowcasting method for the current season. The data used to make these estimates will be refreshed periodically to preserve the accuracy of the estimates.
Nowcasting Method
- Using data from the 2022-2023 season, the time between when a report was received and when its respiratory illness information was considered complete was measured. For each visit, the record initiation timestamp was subtracted from the message-receipt timestamp of the first message reporting a respiratory diagnosis code (as defined by the CDC Broad Acute Respiratory DD v1 query).
- For each date in the sampled time period (12/01/2022 to 12/01/2023), percent completeness was determined at 1-week intervals from the date of visit. It was defined as the number of visits flagged by the query at each 1-week interval since the visit date (1 week after, 2 weeks after, etc.), divided by the known total count of respiratory visits for that day.
- Completeness for Date X at N Weeks = (Count of Visits for Date X with a Respiratory Code by N Weeks from Date X) ÷ (Final Known Count of Respiratory Visits for Date X)
- Reporting completeness percentages for all dates were compiled for each interval. For each interval since visit date, the median (50th percentile) and the 2.5th and 97.5th percentile completeness measurements were determined, giving an informed point estimate and estimate range for the completeness of data at the given age (i.e., the percent completeness of the counts N weeks after the date of interest).
- These point estimates and ranges were applied to current weekly data in order to estimate, based on reported data’s age, what respiratory reporting counts would be once the data are fully reported (nowcast).
- Example: If a 1-week-old count of visits is 1,100, and 1-week-old data are estimated to have 88% completeness, the final count (accounting for reporting delays) can be estimated to be 1,100/0.88 = 1,250.
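- The 2.5th and 97.5th percentile completeness estimates are then used in the same way to create upper and lower nowcast estimates of the final count. Dividing by the lower completeness (2.5th percentile) gives the upper estimate, and dividing by the higher completeness (97.5th percentile) gives the lower estimate.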
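The steps above can be sketched in Python as follows. This is an illustrative sketch only, not the code used to produce the dashboard estimates: the function names, the linear-interpolation percentile rule, and the three sample completeness ratios are assumptions made for the example.

```python
from datetime import datetime


def reporting_delay_days(record_initiated, first_respiratory_code_received):
    """Days from record initiation to the first message with a respiratory code."""
    delta = first_respiratory_code_received - record_initiated
    return delta.total_seconds() / 86400


def completeness(count_by_n_weeks, final_count):
    """Completeness for one visit date at N weeks (a proportion between 0 and 1)."""
    return count_by_n_weeks / final_count


def percentile(values, q):
    """Percentile (q from 0 to 100) by linear interpolation; an assumed rule."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def summarize(ratios):
    """Median and 2.5th/97.5th percentile completeness for one data age."""
    return {
        "p2_5": percentile(ratios, 2.5),
        "median": percentile(ratios, 50),
        "p97_5": percentile(ratios, 97.5),
    }


def nowcast(observed_count, summary):
    """Scale an incomplete count up to an estimated final count."""
    return {
        "point": observed_count / summary["median"],
        # Higher completeness implies a smaller correction, hence the lower bound.
        "lower": observed_count / summary["p97_5"],
        "upper": observed_count / summary["p2_5"],
    }


# Hypothetical completeness ratios for 1-week-old data on three past dates.
one_week_ratios = [
    completeness(800, 1000),
    completeness(880, 1000),
    completeness(960, 1000),
]
summary = summarize(one_week_ratios)
estimate = nowcast(1100, summary)
```

With a median completeness of 88%, the point estimate for an observed count of 1,100 works out to about 1,250, matching the worked example above. In practice the ratios would be compiled separately for each data age (1 week, 2 weeks, and so on) across every date in the training period.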
Assessment
- A retrospective validation study of this method examined 20 snapshots of weekly data pulls from 2024, generating nowcast predictions for the 10 weeks prior to each snapshot (n = 20 weeks × 10 predictions = 200). The estimated counts produced from these snapshots were compared to known final counts in order to assess the method's performance. The method produced point estimates of counts that were significantly correlated with actual values (R = 1; p < 2.2e-16).
- Prediction intervals were found to contain the true final value within their range (model coverage) 83% of the time. The remaining 17% of results that were outside the interval fell only slightly outside: widening the prediction interval by 1% in each direction (99% of low estimate, 101% of high estimate) brought the coverage up to 97%.
- Residuals (= estimated value - actual value) were small, often demonstrating a slight underestimation of the final numbers, and were concentrated around late December and early January, a time period that often sees high respiratory visits, lower hospital staffing, and greater-than-normal reporting delays.
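The coverage and residual checks described above could be computed along these lines. This is a hedged sketch: the function names and the three sample predictions are hypothetical, and the widening rule simply follows the description above (low estimate multiplied by 99%, high estimate by 101%).

```python
def coverage(intervals, actuals, widen=0.0):
    """Share of actual values that fall inside their prediction interval.

    `widen` is a proportion: 0.01 lowers each low estimate to 99% of its
    value and raises each high estimate to 101% of its value.
    """
    hits = 0
    for (low, high), actual in zip(intervals, actuals):
        if low * (1 - widen) <= actual <= high * (1 + widen):
            hits += 1
    return hits / len(actuals)


def residuals(estimates, actuals):
    """Residual = estimated value - actual value (negative means underestimate)."""
    return [est - act for est, act in zip(estimates, actuals)]


# Hypothetical nowcast intervals, point estimates, and final counts.
sample_intervals = [(100, 120), (200, 220), (300, 330)]
sample_points = [110, 210, 315]
sample_actuals = [110, 221, 400]

base_coverage = coverage(sample_intervals, sample_actuals)
widened_coverage = coverage(sample_intervals, sample_actuals, widen=0.01)
sample_residuals = residuals(sample_points, sample_actuals)
```

In this toy data the second actual value falls just above its interval, so widening by 1% is enough to capture it, while the third value remains outside.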
Conclusions
- This method examines past reporting delays for respiratory visits in order to account for reporting delays of current data. The method has been found to reliably estimate the final total counts of respiratory visits from their incomplete (still updating) counts.
- Through nowcasting, we are given a better and more timely sense of current trends than with counts alone. This allows us to spot changes in rates of respiratory illnesses in a timelier fashion.
Background on the Moving Epidemic Method (MEM)
- The moving epidemic method is a set of steps for categorizing disease rates into activity levels, based on what was observed in past seasons.
- The method was first introduced by Vega et al. in 2004, developed to categorize influenza activity. A modified version of the MEM was adopted by the CDC in 2015 in order to track seasonal influenza. In recent years, several public health departments have developed versions of the MEM for additional illness categories, such as COVID-19, RSV, and respiratory illness overall.
- The MEM consists of 2 sets of calculations. Each calculation uses data from the 5 most recent waves (for non-COVID-19 syndromes, the 2020-2021 respiratory season is excluded, as no significant respiratory wave was observed):
- Baseline calculation: Determines the level of activity at which an epidemic (such as seasonal influenza or a COVID-19 wave) has begun. Once respiratory activity has reached or exceeded this level, the seasonal wave is considered to have begun.
- For non-COVID-19 syndromes, the low incidence periods (the times during which respiratory activity is outside of a peak and remains very low) recur at the same time each year and are defined as weeks 20-39 of the year. For COVID-19, the low incidence periods were classified using a wave-identification algorithm, as described by Vega et al.
- This calculation uses the 6 highest weekly rates from each low incidence period of the 5 most recent seasons. (n=30)
- Baseline = the upper limit of the one-tailed 95% CI of the arithmetic mean of those rates.
- Activity levels calculation: Categorizes levels within the epidemic, once activity has passed the baseline. The MA DPH Respiratory Illness Dashboard divides activity into Low/Baseline, Moderate, High, and Very High.
- For non-COVID-19 syndromes, the respiratory wave was defined as week 40 of one year through week 19 of the following year. For COVID-19, waves were classified using a wave-identification algorithm, as described by Vega et al.
- This calculation uses the 6 highest weekly rates from each of the 5 most recent waves. (n=30)
- The thresholds are calculated as the upper bounds of the one-sided confidence intervals of the geometric mean of those rates at:
- Moderate = 50%
- High = 90%
- Very High = 97.5%
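The baseline and threshold calculations above can be sketched as follows. Several details here are assumptions, since the text does not specify them: the interval is built from a normal quantile and the standard error of the mean, the geometric mean is handled by working on log-transformed rates, and a week is labeled by the highest threshold its rate meets or exceeds. Other MEM implementations may construct the intervals differently. The sample rates are hypothetical, and only 6 values are used per set for brevity (the method uses n=30).

```python
from math import exp, log, sqrt
from statistics import NormalDist, mean, stdev


def upper_limit(rates, level, geometric=False):
    """Upper bound of a one-sided CI for the mean of `rates`.

    Assumptions: normal quantile, standard error of the mean. With
    geometric=True the interval is built on log rates (which must be
    positive) and transformed back.
    """
    values = [log(r) for r in rates] if geometric else list(rates)
    z = NormalDist().inv_cdf(level)
    bound = mean(values) + z * stdev(values) / sqrt(len(values))
    return exp(bound) if geometric else bound


def activity_thresholds(wave_rates):
    """Moderate, High, and Very High thresholds from the highest wave rates."""
    return {
        "Moderate": upper_limit(wave_rates, 0.50, geometric=True),
        "High": upper_limit(wave_rates, 0.90, geometric=True),
        "Very High": upper_limit(wave_rates, 0.975, geometric=True),
    }


def classify(rate, thresholds):
    """Assumed rule: label with the highest threshold the rate meets or exceeds."""
    label = "Low/Baseline"
    for name in ("Moderate", "High", "Very High"):
        if rate >= thresholds[name]:
            label = name
    return label


# Hypothetical weekly rates (percent of ED visits).
low_period_rates = [1.2, 1.5, 1.1, 1.4, 1.3, 1.6]
wave_rates = [4.0, 5.5, 6.1, 7.2, 5.0, 6.6]

baseline = upper_limit(low_period_rates, 0.95)
thresholds = activity_thresholds(wave_rates)
```

Note that at the 50% level the normal quantile is zero, so under these assumptions the Moderate threshold equals the geometric mean of the wave rates.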