Out of 3,222 US counties, 731 have two or more health system risk dimensions simultaneously elevated above the 70th national percentile. Tighten the filter to three of four dimensions, and the number drops to 127. These counties don't face just one problem: not just bad air quality, or high disease rates, or a shortage of doctors. They face multiple converging crises at once. And counties in the highest-risk quintile of our scoring system have mortality rates 40–105% higher than those in the lowest-risk quintile.
That statistic comes from the compound signal methodology we built at Banana Analytics. This article explains what compound signals are, how we calculate them, and, critically, whether they actually predict real-world health outcomes. We're publishing the full methodology because we think transparency should be the baseline for any platform asking health systems and public health departments to trust its data.
## The Problem: Health Data Exists in Silos
Anyone who has worked on a Community Health Needs Assessment knows the pain. You need environmental data from EPA. Disease prevalence from CDC PLACES. Mortality from CDC WONDER. Provider supply from HRSA. Demographics from the Census Bureau. Social determinants from County Health Rankings. Each source has its own portal, its own geographic identifiers, its own update schedule, and its own format.
Building a complete picture of a single county's environmental health landscape means visiting 10–15 separate federal data portals. And once you've assembled the data, the hardest question remains unanswered: what does it mean when these factors converge in the same place?
That's the question compound signals answer.
## What Is a Compound Signal?
A compound signal is the geographic co-occurrence of multiple elevated health risk dimensions in the same county. The concept is simple: any single elevated risk factor is common. High disease burden alone describes hundreds of counties. Poor provider access alone describes hundreds more. But when environmental exposure, population health burden, healthcare access deficit, and social vulnerability are all elevated simultaneously in the same geography, that's rare. And it's actionable.
We score every US county across four dimensions:
The first dimension, Environmental Risk, captures a county's exposure to environmental health hazards across air quality (PM2.5, ozone, NO2, SO2), water quality (PFAS contamination), soil and chemical exposure (radon, pesticide use), and climate stress (extreme heat days, summer temperatures). Toxic release data from EPA TRI contributes to the air quality sub-domain. Each indicator is percentile-ranked nationally and weighted within its domain.
Disease Burden measures population health across six clinical service lines: respiratory (asthma, COPD), oncology (cancer prevalence), cardiovascular (coronary heart disease, stroke), endocrine (diabetes), renal (kidney disease), and behavioral health (depression, mental distress). All prevalence data comes from CDC PLACES model-based estimates.
Provider Gap quantifies healthcare access deficit using specialty-specific provider counts from the CMS National Plan and Provider Enumeration System (NPPES). To account for cross-county access (a county with zero pulmonologists might border a major medical center), the score uses a 50/50 population-weighted blend of within-county provider density and neighboring-county provider density, using the Census Bureau's county adjacency file. The blended rate is then inverted: counties with fewer specialists per capita score higher, reflecting greater unmet need.
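The adjacency blend can be sketched in a few lines. This is a simplified illustration under assumed data shapes, not Banana Analytics' production code; the function name and dictionary structures are hypothetical.

```python
def blended_provider_rate(county, providers, population, adjacency):
    """50/50 blend of within-county and neighboring-county provider density.

    providers, population: dicts keyed by county ID (hypothetical shapes).
    adjacency: dict mapping each county ID to its list of adjacent county
    IDs, as in the Census Bureau county adjacency file.
    """
    own_rate = providers[county] / population[county]
    nbrs = adjacency[county]
    # population-weighted provider rate across all adjacent counties pooled
    nbr_rate = sum(providers[n] for n in nbrs) / sum(population[n] for n in nbrs)
    # 50/50 blend; the national percentile ranking and the inversion
    # (fewer specialists -> higher score) happen downstream
    return 0.5 * own_rate + 0.5 * nbr_rate

# a county with zero specialists still gets credit for a neighboring hub
providers = {"A": 0, "B": 200}
population = {"A": 100_000, "B": 1_000_000}
adjacency = {"A": ["B"]}
rate = blended_provider_rate("A", providers, population, adjacency)
```

The pooled-neighbor rate (rather than an average of per-neighbor rates) keeps small adjacent counties from dominating the blend.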
SDOH Stress measures social vulnerability across food insecurity, housing instability, transportation access, economic stress, education, and health insurance coverage. Sources include County Health Rankings, CDC PLACES, and Census ACS.
Each dimension produces a 0–100 score based on national percentile ranking. A county at the 85th percentile for Environmental Risk has worse environmental conditions than 85% of US counties.
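In code, that percentile ranking works roughly like this. A minimal numpy sketch with a hypothetical function name; production scoring needs care with ties, missing values, and indicators where smaller raw values are worse.

```python
import numpy as np

def percentile_score(values):
    """Convert one raw indicator (one value per county) into a 0-100
    national percentile score, where higher means worse conditions.
    Assumes the indicator is oriented so larger raw values are worse."""
    values = np.asarray(values, dtype=float)
    ranks = values.argsort().argsort()  # 0 = best county, n-1 = worst
    return 100.0 * ranks / (len(values) - 1)

# five toy counties with annual mean PM2.5 in µg/m³:
# the 12.4 county scores 100, the 5.2 county scores 0
print(percentile_score([5.2, 9.8, 7.1, 12.4, 6.0]))
```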
These four dimensions combine into an Opportunity Score weighted 25% Environmental Risk, 35% Disease Burden, 25% Provider Gap, and 15% SDOH Stress. Disease burden gets the highest weight because it's the most directly actionable dimension for health systems. It tells them which patients need which services right now. Environmental risk and provider gap provide the context for why the burden exists and whether the system is equipped to handle it. SDOH stress provides the upstream social conditions that shape both access and outcomes. When SDOH data is unavailable for a county (about 3% of cases), the score falls back to the 3-dimension formula at 30/40/30.
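The composite reduces to a one-line weighted sum. The sketch below (hypothetical function, weights from the text) includes the 30/40/30 fallback for counties missing SDOH data:

```python
def opportunity_score(env, disease, provider, sdoh=None):
    """Combine 0-100 dimension percentile scores into the composite.

    Full weighting: 25% env, 35% disease, 25% provider, 15% SDOH.
    Fallback when SDOH is unavailable (~3% of counties): 30/40/30.
    """
    if sdoh is None:
        return 0.30 * env + 0.40 * disease + 0.30 * provider
    return 0.25 * env + 0.35 * disease + 0.25 * provider + 0.15 * sdoh
```

For a county scoring 80/60/40/70 across the four dimensions this yields 61.5; drop the SDOH term and the fallback weighting gives 60.0.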
Adding SDOH as a fourth dimension wasn't just architecturally correct; it measurably improved the model's ability to predict real-world health outcomes. Correlations with county-level mortality increased by 0.08–0.10 points across every major cause of death compared to the 3-dimension specification.
When two or more of these dimensions exceed the 70th percentile simultaneously, we flag a compound signal. When three or more exceed it, that's a strong signal, and it identifies a community where environmental conditions, population health, healthcare access, and social determinants are all working against residents at the same time.
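Putting the threshold rule in code (a hypothetical sketch that treats scores at or above the threshold as elevated; the real pipeline also has to handle counties with missing dimensions):

```python
def classify_signal(dim_scores, threshold=70.0):
    """dim_scores: the four 0-100 dimension scores for one county."""
    elevated = sum(score >= threshold for score in dim_scores)
    if elevated >= 3:
        return "strong compound signal"
    if elevated >= 2:
        return "compound signal"
    return "no signal"

print(classify_signal([75, 82, 71, 40]))  # strong compound signal
print(classify_signal([75, 82, 30, 40]))  # compound signal
```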
## Why the 70th Percentile?
The threshold isn't arbitrary. We swept it from the 50th to the 90th percentile and measured what happened.
At the 50th percentile, over a third of US counties qualified as compound signal counties. That's too many; if a third of the country triggers the alert, the alert isn't useful. At the 80th percentile, the number drops to a fraction of a percent. That's too exclusive for regional health planning.
The 70th percentile identifies 731 counties with compound signals (2 or more dimensions elevated), including 127 with strong signals (3 or more dimensions). That's selective enough to be meaningful, but broad enough to inform service area planning for health systems.
## Named Signals: When the Story Gets Specific
Beyond the aggregate compound signal, we compute four named signals that target specific environmental health patterns. Each one asks a different question:
Respiratory Burden asks: where are people breathing bad air, already sick with respiratory disease, and far from a pulmonologist? This signal combines PM2.5 exposure (40%), a blended asthma/COPD prevalence score (30%), and pulmonology access deficit (30%).
Wildfire Smoke Vulnerability asks: which communities are near active fires, already have compromised lungs, and can't get to a doctor? This signal is seasonal and dynamic, incorporating active fire proximity, recent peak AQI, respiratory disease prevalence, and pulmonology access.
Heat Health Risk asks: where is it dangerously hot, the population already has conditions that heat worsens, and the healthcare system is thin? This combines extreme heat exposure with cardiovascular and diabetes comorbidity and cardiology access.
Industrial Pollution Burden asks: which counties sit near toxic industry, have contaminated water, face agricultural chemical exposure, and lack adequate healthcare? This signal combines TRI facility density, PFAS contamination, pesticide use, and provider access.
Each named signal uses a 60/40 dominant-secondary blend for multi-condition disease components. If a county has both high asthma and high COPD, the higher condition gets 60% weight and the lower gets 40%. This preserves the signal from both conditions. A county with two elevated respiratory diseases scores higher than one with a single dominant condition.
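As a concrete example, Respiratory Burden and the dominant-secondary blend look roughly like this. Function names are hypothetical; the weights come from the text, and all inputs are assumed to be 0-100 percentile scores.

```python
def dominant_secondary(score_a, score_b):
    """60/40 blend: the higher of the two condition scores gets 60% weight."""
    hi, lo = max(score_a, score_b), min(score_a, score_b)
    return 0.6 * hi + 0.4 * lo

def respiratory_burden(pm25, asthma, copd, pulmonology_gap):
    """PM2.5 exposure 40%, blended asthma/COPD prevalence 30%,
    pulmonology access deficit 30%."""
    disease = dominant_secondary(asthma, copd)
    return 0.4 * pm25 + 0.3 * disease + 0.3 * pulmonology_gap

# two elevated respiratory diseases outscore a single dominant condition
both = respiratory_burden(90, asthma=80, copd=60, pulmonology_gap=50)
single = respiratory_burden(90, asthma=80, copd=0, pulmonology_gap=50)
print(both > single)
```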
## Do the Scores Actually Predict Anything?
A scoring methodology is only as good as its relationship to real-world outcomes. We validated our compound signal scores against county-level mortality data from CDC WONDER (2018–2024) and Medicare spending data from the CMS Geographic Variation Public Use File. The results cover approximately 3,100 US counties.
The opportunity score correlates with mortality across every major cause of death. Spearman rank correlations ranged from +0.598 for all-cause mortality to +0.430 for cancer mortality, all statistically significant (p < 0.001). These are moderate-to-strong positive correlations, strong enough to demonstrate that the scores capture real health system burden, but not so high that they suggest circularity.
Compound signal counties have substantially worse outcomes. Counties flagged with strong compound signals (3+ dimensions elevated) had 36% higher all-cause mortality, 41% higher heart disease mortality, and 57% higher chronic lower respiratory disease mortality compared to non-signal counties. Every comparison was statistically significant (Kruskal-Wallis, p < 0.001).
The top quintile tells the starkest story. Counties in the highest-risk quintile of our opportunity score have mortality rates 40–105% higher than the lowest-risk quintile:
- Chronic lower respiratory disease mortality: 2.05x higher
- Heart disease mortality: 1.68x higher
- Stroke mortality: 1.56x higher
- Suicide mortality: 1.56x higher
- All-cause mortality: 1.55x higher
- Cancer mortality: 1.40x higher
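The validation computations themselves are straightforward. Below is a sketch on synthetic data (the real analysis uses CDC WONDER mortality rates); the Spearman correlation is hand-rolled via Pearson on ranks to stay dependency-free and ignores ties, which is fine for continuous inputs.

```python
import numpy as np

def spearman(x, y):
    """Spearman rank correlation via Pearson on ranks (no tie handling)."""
    rx = np.argsort(np.argsort(x)).astype(float)
    ry = np.argsort(np.argsort(y)).astype(float)
    return float(np.corrcoef(rx, ry)[0, 1])

rng = np.random.default_rng(0)
score = rng.uniform(0, 100, 3100)                          # stand-in opportunity scores
mortality = 700 + 3.0 * score + rng.normal(0, 100, 3100)   # stand-in death rates

rho = spearman(score, mortality)

# quintile ratio: mean mortality in the top-risk fifth vs the bottom fifth
q20, q80 = np.quantile(score, [0.2, 0.8])
ratio = mortality[score >= q80].mean() / mortality[score <= q20].mean()
print(f"spearman rho = {rho:.2f}, quintile ratio = {ratio:.2f}")
```

With real county data the same two statistics produce the correlations and quintile ratios reported above.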
The named signals also predict their corresponding outcomes, though with a caveat. The heat health risk signal correlates most strongly with cardiovascular mortality (+0.40), consistent with the established epidemiological link between chronic heat exposure and cardiovascular strain. The respiratory burden signal correlates with chronic lower respiratory disease mortality (+0.279), though notably this is weaker than the aggregate opportunity score's correlation with the same outcome (+0.538). The gap suggests that the multi-dimensional composite, which includes SDOH and broader provider access effects, captures respiratory mortality drivers that the exposure-focused named signal alone does not. This is actually an argument for the compound approach: single-dimension analysis misses context that the full model preserves.
## Why the Composite Outperforms Any Single Dimension
We also examined how each dimension independently correlates with cause-specific mortality (Spearman rank correlations):
| Dimension | All-cause | Heart disease | CLRD | Cancer | Suicide |
|---|---|---|---|---|---|
| Disease Burden | +0.685 | +0.627 | +0.581 | +0.635 | +0.361 |
| SDOH Stress | +0.671 | +0.588 | +0.604 | +0.536 | +0.430 |
| Provider Gap | +0.145 | +0.150 | +0.157 | -0.008 | +0.134 |
| Environmental Risk | -0.207 | -0.210 | -0.172 | -0.268 | -0.096 |
| Composite | +0.598 | +0.546 | +0.538 | +0.430 | +0.383 |
Disease Burden and SDOH Stress are each individually stronger predictors (r = 0.5–0.7) than the composite (r = 0.4–0.6). The composite is arithmetically weaker because it averages in Environmental Risk, which correlates negatively with mortality at the county level (an urban-rural confound we discuss below). But the composite is more useful because it identifies counties where multiple risk factors converge, not just counties with high disease prevalence or high social vulnerability alone. The compound signal's value is in detecting convergence, not maximizing any single correlation.
## What the Scores Don't Predict (and Why That's Important)
Drug overdose mortality shows essentially no correlation with the opportunity score (r = +0.05). That's not a failure. It's evidence that the methodology measures what it claims to measure. Overdose deaths are driven by opioid prescribing patterns, drug supply chains, and economic despair — not by PM2.5 levels or PFAS contamination. If compound signal scores predicted everything, it would suggest we'd built a generic “disadvantaged county” index rather than an environmental health intelligence tool.
We also found that Environmental Risk correlates negatively with mortality at the county level. This is a well-documented ecological confound: urban counties have worse air quality and more industrial pollution but also have better healthcare access, earlier disease detection, and lower mortality rates. Rural counties with the highest mortality often have lower environmental exposure indices. This doesn't mean environmental pollution is protective. It means county-level analysis conflates exposure with access. The named signals (Respiratory Burden, Heat Health Risk) resolve this by combining exposure with disease and access in clinically specific ways, rather than analyzing each dimension in isolation.
## What We Got Wrong (And What We're Still Working On)
No methodology is perfect. Here's what we're honest about:
County-level scores can mask dramatic within-county variation. A county with a high respiratory burden score may have asthma concentrated in neighborhoods near industrial facilities while the rest of the county is unaffected. Our scores are appropriate for regional planning and needs assessment. They are not individual-level risk predictions.
Compound signals identify convergence: places where environmental risk and disease burden co-occur. They do not prove that environmental conditions caused the disease patterns. The epidemiological literature establishes those causal pathways (PM2.5 and respiratory disease, heat and cardiovascular mortality). Our contribution is identifying where those pathways converge with healthcare access deficits at the county level.
The NPPES registry records where providers register, not necessarily where they practice. To mitigate this, we implemented a county adjacency adjustment: each county's provider density is computed as a 50/50 population-weighted blend of within-county and neighboring-county rates, using the Census Bureau's county adjacency file. This reduces false positives where a county borders a major medical center. Even with this adjustment, provider-dominant weight specifications showed the lowest rank stability in our sensitivity analysis, reflecting the inherent noise in registration-based provider data relative to CDC and EPA measured outcomes.
Every weight in the system reflects expert judgment, not empirical optimization. We tested robustness across dozens of alternative weight specifications and found that rankings are stable (r > 0.95 in most cases), but we want to be clear: these are design choices, not discovered truths. We publish every weight and every sensitivity result so users can evaluate for themselves.
## Why We Published This
Most data platforms in the community health space don't publish their methodology. You get a score, maybe a description, and a “trust us” handshake. We think that's insufficient for tools that inform health system investment, community benefit strategy, and grant applications affecting real communities.
We publish our indicator weights, our scoring formulas, our validation results, and our limitations because we want our users to understand exactly what they're looking at. We believe that transparency is the only sustainable competitive advantage in a market where everyone draws from the same 20 public data sources.
The full technical methodology, including sensitivity analysis results and validation data, is available at banana-analytics.com/methodology. You can also see exactly which federal data sources feed the platform and how current each one is.
Banana Analytics is a public benefit corporation building environmental health intelligence for communities, health systems, and public health organizations. We're committed to 1% for the Planet. If you're doing good work and can't afford a license, we'd rather have a conversation than lose you. Reach out.