
Technical Blog

Building CHNAs That Don't Break When Datasets Do

PRAMS suspended. EJScreen removed. HRSA UDS SOGI variables stripped. The federal datasets your last CHNA relied on are more fragile than they were three years ago, and the IRS clock has not paused for any of it.

May 10, 2026

If you cited PRAMS, EJScreen, NOAA's Billion-Dollar Disasters database, or HRSA UDS sexual orientation and gender identity breakdowns in your last CHNA, your next cycle has a data problem. Each of those sources has been suspended, removed, reconstructed by a volunteer coalition, or stripped of variables between January 2025 and May 2026. The IRS Section 501(r)(3) clock has not paused for any of it. Tax-exempt hospitals still owe a Community Health Needs Assessment every three years with an adopted implementation strategy on schedule.

The working assumption for a CHNA scoping a 2026 or 2027 adoption deadline should be that any single federal source cited today may not exist in the same form when cited in the final report.

What has actually changed, with sources

We are going to be specific because vague claims do not help anyone scope an assessment. Every status below is sourced to the documenting organization rather than asserted in our voice.

PRAMS (Pregnancy Risk Assessment Monitoring System)

The CDC team that ran PRAMS was placed on administrative leave in April 2025 as part of the HHS reduction in force. PRAMS data collection was suspended, the Automated Research File data request portal stopped processing requests, and the federal grants that funded state-level PRAMS implementation are scheduled to expire April 30, 2026. The Commonwealth Fund and the Harvard T.H. Chan School of Public Health have both published explainers documenting the operational status. Mississippi, the state with the highest infant mortality rate in the country, suspended its own PRAMS data collection in September 2025. A Federal Register notice in November 2025 sought public comment on extending the OMB collection authority through 2029, with comments closing January 20, 2026, but the staff who would execute that collection are not the same staff who built PRAMS over the prior 38 years. For CHNA purposes: the most recent PRAMS data anyone can cleanly cite is the 2022 release. Any 2023 or 2024 figure should be sourced to a state-level data sharing agreement, not to the CDC public release.

EJScreen (EPA Environmental Justice Screening and Mapping Tool)

EPA removed EJScreen from its website on February 5, 2025, per the Environmental Data and Governance Initiative's change log. The Public Environmental Data Partners coalition published a reconstruction of EJScreen 2.3 at screening-tools.com, which mirrors the EPA tool's data and percentile-ranking logic. The reconstruction is functional but is not maintained by EPA, does not receive EPA's underlying data updates on a guaranteed cadence, and carries no federal data-quality guarantee. CHNAs that previously relied on EJScreen percentile flags now have three options: cite the PEDP reconstruction (with a methodology note on the federal removal), substitute CDC's Environmental Justice Index where it remains available, or rebuild environmental burden indicators from the underlying source data (EPA AQS for air, EPA TRI for toxic releases, EPA SDWIS for drinking water).
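If you take the third route, the percentile-flag mechanic itself is straightforward to reproduce; what you lose is EPA's specific indicator definitions, so the substitution has to be owned explicitly in the methodology note. A minimal sketch in pandas, with column names and the 80th-percentile threshold as our own illustrative assumptions rather than EPA's definitions:

```python
import pandas as pd

# Hypothetical tract-level inputs assembled from EPA AQS, TRI, and SDWIS.
# Column names are illustrative; in practice you would rank each tract
# against the full state or national distribution, not three rows.
tracts = pd.DataFrame({
    "geoid": ["48201223100", "48201223200", "48201223300"],
    "ozone_design_value_ppb": [71.0, 64.0, 58.0],
    "tri_onsite_releases_lbs": [120_000.0, 3_500.0, 0.0],
    "sdwis_violations_5yr": [4, 0, 1],
})

burden_cols = [
    "ozone_design_value_ppb",
    "tri_onsite_releases_lbs",
    "sdwis_violations_5yr",
]

# EJScreen-style percentile rank per indicator (0-100, higher = worse),
# then a simple mean across indicators feeding an 80th-percentile flag.
percentiles = tracts[burden_cols].rank(pct=True) * 100
tracts["env_burden_pctile"] = percentiles.mean(axis=1)
tracts["flag_80th_pctile"] = tracts["env_burden_pctile"] >= 80
```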

HRSA Uniform Data System (UDS) sexual orientation and gender identity variables

Per KFF's September 2025 brief on disappearing federal data, HRSA removed SOGI elements from UDS data going back to 2016, the year the variables were first added. UDS is the reporting system that HRSA-funded community health centers (FQHCs) use to report program data, and many CHNAs in catchments served by FQHCs cite UDS measures directly. The FQHC-served-population denominators in UDS are intact. The SOGI breakdowns are not. Any CHNA equity analysis that disaggregated by SOGI using UDS will need a primary data source.

NOAA Billion-Dollar Weather and Climate Disasters database

Per the National Security Archive's September 2025 review, NOAA retired the Billion-Dollar Weather and Climate Disasters database, and the National Centers for Environmental Information (NCEI) lost several publicly available datasets. CHNAs in disaster-prone catchments that drew historical disaster cost time series from this database now have to substitute FEMA National Risk Index (NRI) expected annual loss data or reconstruct the record from state-level emergency declarations.

CDC website and dataset removals broadly

The Wikipedia overview of the 2025 federal online resource removals tracks more than 8,000 webpages and approximately 3,000 datasets removed or modified, with much of the content later restored after legal challenges, sometimes in modified form. The Harvard Law School Library Innovation Lab's Data.gov Archive has preserved 311,000 datasets from the 2024-to-2025 window. The Public Environmental Data Partners coalition continues to mirror environmental data. KFF's position is that following legal challenges much of the data has been restored, but with changes to specific variables, particularly on gender identity and in some cases sexual orientation. The empirical situation is not “everything is gone.” It is “specific variables, specific datasets, and specific staff capacity have changed, in patterns that are not easy to predict in advance.”

What does not appear on this list is also informative. CDC PLACES (the small-area estimates of chronic disease prevalence) is still being released. CDC WONDER mortality data is still being released, with the usual lag. The Census Bureau's American Community Survey is still being released. CMS's National Plan and Provider Enumeration System (NPPES), the National Provider Identifier registry, is still being maintained. The CMS Medicare Geographic Variation public use file is still being released. EPA's Air Quality System monitoring data is still being released. FEMA's National Risk Index is still being maintained. HHS emPOWER, the Medicare-beneficiary-level data on electricity-dependent populations, is still being maintained. The CDC's National Wastewater Surveillance System is still being maintained. The picture is not uniform decline. It is selective disruption that hits hardest in areas where the federal data was already the only source.

What this means for CHNA methodology

A few practical implications follow from the pattern above.

The first practical change is that dataset citation needs vintage-and-snapshot specificity. A CHNA that cites “EJScreen percentile rank for ozone” is now ambiguous: which version, retrieved from where, on what date? CHNAs from this point forward should cite both the dataset version and the retrieval date, and where the underlying federal source has been removed should explicitly note the archive or reconstruction the figure was sourced from. This was always good methodology. It is now load-bearing.
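One way to make that specificity mechanical rather than aspirational is to capture it as a structured record at retrieval time instead of reconstructing it during drafting. A minimal sketch; the field names are our own illustration, not a standard citation schema:

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class SourceCitation:
    """One retrieval of one dataset, captured at pull time."""
    dataset: str           # e.g. "EJScreen 2.3 ozone percentile"
    version: str           # publisher's release or vintage label
    retrieved: date        # date the file was actually pulled
    retrieved_from: str    # URL actually used, which may be an archive
    federal_status: str    # "live", "removed", "reconstructed", ...
    provenance_note: str   # which archive or reconstruction, and why

# A figure sourced from the PEDP reconstruction after the
# February 2025 federal removal of EJScreen:
ejscreen_cite = SourceCitation(
    dataset="EJScreen 2.3 ozone percentile",
    version="2.3",
    retrieved=date(2026, 3, 2),
    retrieved_from="https://screening-tools.com",
    federal_status="removed",
    provenance_note="PEDP reconstruction; EPA removed EJScreen 2025-02-05.",
)
```

A methodology appendix generated from records like these, rather than assembled by hand at the end, is what makes the citation discipline survivable across a multi-year cycle.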

Single-source indicators are fragile in a way they were not three years ago. A CHNA priority area that rests entirely on one federal dataset is one administrative action away from being unsupported. CHNAs that triangulate the same underlying construct across multiple independent sources will be more defensible. Heat health risk, for example, can be triangulated across NOAA temperature data, CDC PLACES coronary heart disease and diabetes prevalence, HRSA cardiology supply, and DOE/ORNL EAGLE-I power-outage exposure. Each individual source could be disrupted; the construct is robust to losing any one.
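To make "robust to losing any one" concrete, here is a minimal sketch of a composite that renormalizes over whichever inputs are present and reports how much of its evidence base survives. The input names and weights are illustrative assumptions, not a published scoring model:

```python
from typing import Mapping, Optional

# Illustrative inputs and weights for a heat-health-risk construct.
# Values are assumed to be 0-100 percentile ranks for one tract.
WEIGHTS = {
    "noaa_heat_days_pct": 0.35,
    "places_chd_prevalence_pct": 0.25,
    "places_diabetes_prevalence_pct": 0.20,
    "hrsa_cardiology_shortage_pct": 0.10,
    "eaglei_outage_exposure_pct": 0.10,
}

def heat_health_risk(inputs: Mapping[str, Optional[float]]) -> tuple[float, float]:
    """Weighted composite over whichever inputs are non-missing.

    Returns (score, coverage), where coverage is the fraction of total
    weight backed by available data -- the confidence-band input.
    """
    available = {k: v for k, v in inputs.items() if v is not None}
    weight_present = sum(WEIGHTS[k] for k in available)
    if weight_present == 0:
        raise ValueError("no inputs available for this construct")
    # Renormalize surviving weights so the score stays on the 0-100 scale.
    score = sum(WEIGHTS[k] * v for k, v in available.items()) / weight_present
    return score, weight_present

# Losing the NOAA input drops coverage from 1.0 to 0.65,
# but the construct still produces a defensible score.
score, coverage = heat_health_risk({
    "noaa_heat_days_pct": None,  # source disrupted upstream
    "places_chd_prevalence_pct": 82.0,
    "places_diabetes_prevalence_pct": 76.0,
    "hrsa_cardiology_shortage_pct": 91.0,
    "eaglei_outage_exposure_pct": 60.0,
})
```

The coverage number is what the methodology section reports alongside the score whenever an input is missing.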

Equity analyses that depended on federal SOGI variables now need primary data plans. If a CHNA's equity framework historically rested on UDS SOGI breakdowns, federal BRFSS SOGI modules, or similar federal sources, the 2026-and-forward version of that analysis needs an explicit fallback: state BRFSS modules where states still ask, FQHC-direct partnerships, community-based participatory data collection, or qualitative methods clearly named as such. The IRS 501(r)(3) requirement to take into account input from persons representing the broad interests of the community, including medically underserved and minority populations, has not changed. The federal datasets that operationalized that input have.

Staff capacity is part of data quality, in a way that was easier to take for granted before. A dataset that still exists but whose maintaining team has been laid off is not the same data product it was before. PRAMS is the clearest example: even if collection resumes, the institutional knowledge that built the survey, weighted the responses, and approved data requests has been dispersed. CHNAs should plan for longer turnaround times on data requests, more state-by-state heterogeneity in what is available, and the practical reality that “we requested this and it didn't arrive in time” is now a foreseeable risk, not an edge case. Build it into the schedule.

Federal data archiving is now part of the CHNA toolkit, not a curiosity. The Harvard Law School Library Innovation Lab's Data.gov Archive, the Public Environmental Data Partners reconstructions, the CDC.gov mirrors maintained by university libraries, and the Data Rescue Project clearinghouse are all places where CHNA practitioners will increasingly need to source figures that used to come straight from a .gov URL. Cite them, and document the federal-archive provenance in the methodology section. A reviewer who later wants to verify a number should be able to.

Data resilience as an architectural property

There is a more general point under all of this. CHNA data infrastructure that depends on every source being available on every refresh, in the same shape, with the same variables, is not the right design pattern for the next decade. The right design pattern looks more like a data warehouse than a data download.

[Figure: side-by-side architecture comparison. Left, the fragile single-source pattern: EJScreen alone feeds an Environmental Risk indicator, and when EJScreen is removed in February 2025 the indicator has no defensible substitute. Right, the robust composite pattern: EPA AQS, EPA TRI, EJScreen, NOAA ACIS, and EAGLE-I outage data feed a composite Environmental Risk score, which still produces a defensible result from the remaining four inputs when EJScreen is removed, with the methodology page documenting the missing input and the resulting confidence band.]
Fragile single-source feed (left) versus robust composite signal (right). One administrative action removes the source; the composite construct degrades quantifiably rather than disappearing.

A few habits that produce assessments robust to upstream disruption, in roughly increasing order of effort:

  • Pin and version every source. When you pull CDC PLACES, record the release version and the retrieval date. When you pull ACS, record the 5-year vintage. When you pull CMS NPPES, record the file version. If a dataset is later restated or removed, you have a defensible snapshot.
  • Mirror externally when permitted. For datasets where the license allows it, pull a local copy on every refresh and keep prior vintages. Storage is cheap. Reproducibility is not.
  • Build constructs, not feeds. A “Heat Health Risk” indicator that blends NOAA temperature, CDC PLACES disease prevalence, and HRSA cardiology supply is more durable than a “NOAA temperature percentile” that does the same job. If one input drops out, the construct degrades in a quantifiable way; it does not disappear.
  • Track upstream changes structurally, not anecdotally. When a federal dataset's schema changes, your pipeline should fail loudly rather than silently produce different numbers. Schema diffing on every ingest is a 50-line script that catches nine of the ten ways a dataset can quietly break; a minimal version is sketched after this list.
  • Archive your archive sources. The Harvard archive, the PEDP reconstructions, and the CDC.gov library mirrors are themselves volunteer-maintained. If you cite them in a CHNA, snapshot the specific page or file you cited at the time of citation; a snapshot sketch also follows this list. URL rot is real, and audit trails matter.
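The schema check in the fourth habit really is that small. A minimal sketch, assuming tabular CSV downloads and a stored JSON snapshot of the prior ingest's column layout; file paths and layout are illustrative:

```python
import csv
import json
import sys
from pathlib import Path

def load_header(csv_path: Path) -> list[str]:
    """Read just the column names from a CSV download."""
    with csv_path.open(newline="", encoding="utf-8") as f:
        return next(csv.reader(f))

def check_schema(csv_path: Path, expected_path: Path) -> None:
    """Fail loudly if this ingest's columns differ from the last one.

    The first run writes the expected schema; later runs diff against it.
    """
    current = load_header(csv_path)
    if not expected_path.exists():
        expected_path.write_text(json.dumps(current, indent=2))
        return
    expected = json.loads(expected_path.read_text())
    added = [c for c in current if c not in expected]
    removed = [c for c in expected if c not in current]
    if added or removed:
        sys.exit(
            f"SCHEMA CHANGE in {csv_path.name}: "
            f"added={added} removed={removed}. Refusing to ingest."
        )

# Run on every refresh, before any transformation touches the file:
# check_schema(Path("downloads/places_county.csv"),
#              Path("schemas/places_county.expected.json"))
```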
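And the last habit is only a few lines more: store the exact bytes behind each citation with a hash and a timestamp, so a reviewer can verify that the figure in the report came from exactly this artifact. Directory layout and naming here are our own illustration:

```python
import hashlib
import json
import urllib.request
from datetime import datetime, timezone
from pathlib import Path

def snapshot(url: str, archive_dir: Path) -> Path:
    """Save the exact bytes behind a citation, with hash and timestamp."""
    archive_dir.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(url) as resp:
        payload = resp.read()
    digest = hashlib.sha256(payload).hexdigest()
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    blob = archive_dir / f"{stamp}_{digest[:12]}.bin"
    blob.write_bytes(payload)
    # Sidecar manifest: what was pulled, from where, when, and its hash.
    blob.with_suffix(".json").write_text(json.dumps({
        "url": url,
        "retrieved_utc": stamp,
        "sha256": digest,
    }, indent=2))
    return blob

# e.g. snapshot("https://screening-tools.com/<cited page>",
#               Path("archive/ejscreen"))
```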

These are not new ideas in data engineering. They are widely understood standards for any production analytical system. The CHNA world has historically not had to operate at that standard because the federal feeds were stable enough that quarterly downloads and Excel-based methodology sections worked. That assumption is no longer reliable.

How this maps to what we built

Banana Analytics was built on the premise that community environmental health intelligence belongs in a versioned data warehouse with explicit lineage, not on top of ad-hoc downloads. We score all 3,222 US counties and 74,000 census tracts across four dimensions (Environmental Risk, Disease Burden, Provider Gap, SDOH Stress) by fusing CDC PLACES, CDC WONDER, HRSA HPSA designations, CMS NPPES, Census ACS, EPA AQS, EPA EJScreen, EPA TRI, EPA UCMR5/ECHO PFAS detections, NOAA NCEI Storm Events, HHS emPOWER, DOE/ORNL EAGLE-I power-outage burden, Stanford Childs/Burke wildfire-attributable PM2.5, CMS Geographic Variation, USDA Census of Agriculture, NCI State Cancer Profiles, and other federal and academic sources into composite signals. When one input is disrupted, the dimension scores degrade in a quantifiable way rather than going to zero, and the methodology page documents which sources contribute to which scores, so practitioners know what they are relying on.

We also maintain a versioned data catalog that tracks every dataset we ingest, its source, its vintage, its grain, its row count, and its last-refreshed timestamp. CHNA practitioners using the platform can pull a county or service-area cohort summary that cites these sources and vintages directly, which resolves the citation specificity problem at the methodology-section level.

This is not the only way to build CHNA data infrastructure that holds up under disruption. It is one way, and the architectural pattern (versioned ingest, multi-source compound constructs, explicit lineage, schema diffing) is more important than any particular tool.

For practitioners

If you are running a CHNA cycle that adopts in 2026 or 2027 and you historically cited PRAMS, EJScreen, NOAA Billion-Dollar Disasters, or HRSA UDS SOGI variables, the time to update your data plan is now, not in the methodology drafting phase. The substitutions are workable but they require lead time and reviewer-defensible documentation.

If you write CHNAs as a consultant, the data resilience properties of the underlying platform you use are now a procurement question worth asking. “What happens to your scores if this federal source goes dark?” is a legitimate diligence question, and the answer should be specific.

If you run community benefits or strategy at a hospital, the assessment-cycle clock does not pause for upstream disruption. Building a year of slack into the data assembly phase, with explicit fallback sources documented for every priority indicator, is the practical move.

If you work at a public health department or FQHC, the federal data infrastructure that funded a lot of your historical analysis is genuinely under more strain than it has been in any recent period. Local data collection capacity and state-level partnerships are increasingly load-bearing. Funding for that capacity, where it exists, is worth defending.

The Banana Analytics platform is free at the county level for the basic profile, and Pro and Consultant Studio features, including cohort summaries with cited data lineage and vintage, come with a 14-day free trial so you can validate the workflow on a real cohort before committing. We also make data available without a procurement conversation when the work warrants it.

Banana Analytics is a public benefit corporation building community environmental health intelligence for health systems, public health departments, and community organizations. We're committed to 1% for the Planet. If you're doing good work and a license is out of reach, we'd rather have a conversation than lose you. Reach out.