Methodology
How We Score Counties.
Every weight, every threshold, every data source. We publish our methodology transparently because the organizations using this data deserve to know exactly how the numbers are produced.
Methodology v1.8.0 · Last updated April 2026
Overview
What the platform measures and why
Banana Analytics scores every US county (N ≈ 3,222) across multiple dimensions of health system need. Each dimension is scored 0–100 using national percentile ranking, where 100 represents the highest risk or greatest burden.
When two or more pillars simultaneously exceed the 70th percentile, this convergence is flagged as Multi-Pillar Convergence: a structural arithmetic observation about co-occurring environmental exposure, population health burden, healthcare access deficit, and social stress that indicates an outsized and potentially underserved health need. This is distinct from the named Compound Signals (below), which are weighted, health-relevant composites with their own evidence bases (Outage Vulnerability, Smoke Burden, etc.).
The platform produces two types of composite scores:
- Opportunity Score: a weighted composite of Environmental Risk, Disease Burden, Provider Gap, and SDOH Stress used for broad county prioritization.
- Named Compound Signals: eight health-relevant risk composites grouped into two families. Burden signals capture cumulative environmental load on a population (Respiratory Burden, Smoke Burden, Industrial Burden, Field Burden, Runoff Burden). Vulnerability signals capture population susceptibility to a specific hazard (Heat Vulnerability, Heat-Dialysis Vulnerability, Outage Vulnerability). Every signal follows the same {Exposure} × {Population} grammar.
Scoring Dimensions
Four dimensions of county-level health need
Environmental Risk Score (0–100)
Composite of air quality, water contamination, soil hazards, and climate extremes. Higher scores indicate greater environmental burden.
| Sub-domain | Weight | Indicators |
|---|---|---|
| Air Quality | 35% | PM2.5 annual mean (35%), Ozone (25%), TRI releases (20%), NO2 (10%), SO2 (10%) |
| Water Quality | 25% | PFAS severity score |
| Soil & Chemical | 20% | Radon zone classification (50%), Pesticide use in kg (50%) |
| Climate | 20% | Days above 95°F (60%), Avg summer max temperature (40%) |
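Read together, the table describes a two-level weighted average: indicators roll up into sub-domains, and sub-domains roll up into the Environmental Risk Score. The Python sketch below illustrates that arithmetic, assuming every indicator has already been converted to a 0–100 national percentile rank. The dictionary keys and the function name are illustrative and are not the platform's field names.

```python
# Sub-domain weights and within-domain indicator weights, copied from the table.
# Key names are illustrative placeholders, not the platform's schema.
ENV_WEIGHTS = {
    "air": (0.35, {"pm25": 0.35, "ozone": 0.25, "tri": 0.20, "no2": 0.10, "so2": 0.10}),
    "water": (0.25, {"pfas": 1.0}),
    "soil": (0.20, {"radon": 0.5, "pesticide": 0.5}),
    "climate": (0.20, {"days_above_95f": 0.6, "summer_max_temp": 0.4}),
}


def environmental_risk(ranks):
    """Two-level weighted average of indicator percentile ranks (0-100)."""
    total = 0.0
    for domain_weight, indicators in ENV_WEIGHTS.values():
        # First level: blend the indicators inside one sub-domain.
        domain_score = sum(w * ranks[name] for name, w in indicators.items())
        # Second level: weight the sub-domain into the overall score.
        total += domain_weight * domain_score
    return total
```

Because both levels of weights sum to 1, a county at the same percentile on every indicator receives that percentile as its score. This sketch assumes no indicator is missing.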
Disease Burden Score (0–100)
Weighted prevalence of chronic conditions organized by clinical service line. Higher scores indicate greater population health burden.
| Service Line | Weight | Conditions |
|---|---|---|
| Respiratory | 25% | Current asthma (50%), COPD (50%) |
| Oncology | 20% | Cancer prevalence |
| Cardiovascular | 20% | Coronary heart disease (50%), Stroke (50%) |
| Endocrine | 15% | Diabetes |
| Renal | 10% | Kidney disease |
| Behavioral Health | 10% | Depression (50%), Frequent mental distress (50%) |
Provider Gap Score (0–100, inverted)
Healthcare specialists per 100,000 population, percentile-ranked and inverted so that counties with fewer providers score higher (indicating worse access). The score uses a 50/50 population-weighted blend of within-county and neighboring-county provider density (using Census Bureau county adjacency files), reducing false positives where a county borders a major medical center. Named compound signals use specialty-specific provider counts: pulmonology for Respiratory Burden and Smoke Burden, cardiology for Heat Vulnerability.
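As a rough illustration of the adjacency blend and the inversion, the sketch below pools neighboring counties by population before blending 50/50 with the county's own density. Pooling neighbors this way is one assumed reading of "population-weighted"; the function names and input shapes are hypothetical.

```python
def blended_density(own_providers, own_pop, neighbors):
    """50/50 blend of own and neighboring provider density per 100,000.

    neighbors: list of (providers, population) tuples for adjacent counties.
    Pooling neighbors by population is an assumption of this sketch.
    """
    own = 100_000 * own_providers / own_pop
    neighbor_pop = sum(pop for _, pop in neighbors)
    if neighbor_pop == 0:
        return own  # no adjacent population: use the county's own density
    neighbor = 100_000 * sum(p for p, _ in neighbors) / neighbor_pop
    return 0.5 * own + 0.5 * neighbor


def provider_gap(density_percentile):
    """Invert a 0-100 density percentile so fewer providers score higher."""
    return 100.0 - density_percentile
```

For example, a county with 10 specialists per 100,000 that borders a county with 90 per 100,000 (equal populations) blends to 50 per 100,000, which is then percentile-ranked nationally and inverted.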
SDOH Composite (0–100)
Social determinants of health composite spanning seven domains: food insecurity, housing instability, transportation barriers, utility difficulties, interpersonal safety, behavioral health access, and provider access. Derived from Census ACS, County Health Rankings, and CMS Geographic Variation data.
Composite Scoring
How dimensions become scores
Opportunity Score
Disease Burden receives the highest weight (35%) because chronic disease prevalence is the most direct indicator of population health need and service demand. Environmental Risk and Provider Gap each receive 25% as upstream exposure and access modifiers. SDOH Stress (15%) captures the social conditions that shape both access and outcomes. When SDOH data is unavailable for a county, the score falls back to the 3-dimension formula (30/40/30).
Adding SDOH as a fourth dimension measurably improved the model's predictive validity: correlations with county-level mortality increased by 0.08–0.10 points across every major cause of death compared to the 3-dimension specification.
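The weighting described above can be sketched as a small function. The 4-dimension weights come directly from the text; the assignment of the 30/40/30 fallback (40% to Disease Burden, 30% each to Environmental Risk and Provider Gap) is an assumption based on Disease Burden carrying the highest weight, and the function name is illustrative.

```python
def opportunity_score(env, disease, provider_gap, sdoh=None):
    """Weighted composite of 0-100 dimension scores."""
    if sdoh is None:
        # 3-dimension fallback. Mapping 40% to Disease Burden is an
        # assumption of this sketch, not a documented ordering.
        return 0.30 * env + 0.40 * disease + 0.30 * provider_gap
    return 0.25 * env + 0.35 * disease + 0.25 * provider_gap + 0.15 * sdoh
```

With dimension scores of 80, 60, 40, and 20, the 4-dimension form gives 20 + 21 + 10 + 3 = 54; without SDOH, the fallback gives 24 + 24 + 12 = 60.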
Multi-Pillar Convergence
Multi-Pillar Convergence is the structural-arithmetic check: how many of the four scoring pillars simultaneously exceed the 70th percentile. It is distinct from the named Compound Signals (Outage Vulnerability, Smoke Burden, etc.) documented further down the page; those are weighted, health-relevant composites with their own evidence bases. A county can show multi-pillar elevation without triggering any named compound signal, and vice versa.
| Tier | Criteria | County Count |
|---|---|---|
| No elevation | 0–1 of 4 pillars elevated | 2,492 (77%) |
| Moderate | 2 of 4 pillars elevated | 604 (19%) |
| Strong | 3 of 4 pillars elevated | 126 (4%) |
| Extreme | All 4 pillars elevated | 1 (<0.1%) |
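The tier assignment is a simple count of elevated pillars. A minimal sketch, assuming exactly four pillar scores on the 0–100 scale and treating "exceed" as strictly greater than the threshold (the names are illustrative):

```python
TIERS = {0: "No elevation", 1: "No elevation", 2: "Moderate", 3: "Strong", 4: "Extreme"}


def convergence_tier(pillars, threshold=70.0):
    """pillars: mapping of the four pillar names to 0-100 percentile scores."""
    # A pillar counts as elevated only when it is strictly above the threshold.
    elevated = sum(score > threshold for score in pillars.values())
    return elevated, TIERS[elevated]
```

A county scoring 80, 75, 60, and 71 on the four pillars has three elevated pillars and lands in the Strong tier.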
Normalization
All indicator values are converted to national percentile ranks (0–100) before weighting. This normalizes disparate units (µg/m³, prevalence %, provider counts) to a common scale. Values are capped at the 1st and 99th percentiles (winsorization) before ranking to limit outlier influence.
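The two normalization steps can be sketched with the Python standard library. Percentile conventions differ between tools, so the choices here (the `inclusive` method of `statistics.quantiles` for the caps, and a mid-rank formula for ties) are assumptions of this sketch rather than the platform's documented definitions.

```python
from bisect import bisect_left, bisect_right
from statistics import quantiles


def winsorize(values, lower_pct=1, upper_pct=99):
    """Cap values at the given lower and upper percentiles."""
    # quantiles(..., n=100) returns the 99 cut points P1..P99;
    # it needs at least two data points.
    cuts = quantiles(values, n=100, method="inclusive")
    lo, hi = cuts[lower_pct - 1], cuts[upper_pct - 1]
    return [min(max(v, lo), hi) for v in values]


def percentile_ranks(values):
    """Mid-rank percentile (0-100): ties share the same rank."""
    n = len(values)
    ordered = sorted(values)
    ranks = []
    for v in values:
        below = bisect_left(ordered, v)           # count strictly below v
        equal = bisect_right(ordered, v) - below  # count equal to v
        ranks.append(100.0 * (below + 0.5 * equal) / n)
    return ranks
```

Under the mid-rank convention no value reaches exactly 0 or 100; other conventions would shift the endpoints slightly without changing the ordering.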
Missing data handling
Counties missing an indicator are scored on available data only. The remaining weights are renormalized to sum to 100%, so a missing indicator does not penalize the county. Each score carries a coverage and confidence classification:
| Confidence | Coverage | Interpretation |
|---|---|---|
| High | ≥90% of components present | Score is well-supported |
| Medium | 50–89% of components present | Score is reasonable but less certain |
| Low | <50% of components present | Interpret with caution |
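A minimal sketch of the renormalization and the confidence classification follows. It assumes coverage is the share of components present (by count) and that a missing component is represented as `None`; the function name and return shape are illustrative.

```python
def renormalized_score(components, weights):
    """Score on available components only, with coverage and confidence.

    components: name -> percentile rank (0-100), or None when missing.
    weights:    name -> nominal weight for every expected component.
    """
    present = {k: v for k, v in components.items() if v is not None}
    coverage = 100.0 * len(present) / len(weights)
    if not present:
        return None, coverage, "Low"
    # Divide by the weight actually available so the result stays on 0-100.
    weight_sum = sum(weights[k] for k in present)
    score = sum(weights[k] * v for k, v in present.items()) / weight_sum
    if coverage >= 90:
        confidence = "High"
    elif coverage >= 50:
        confidence = "Medium"
    else:
        confidence = "Low"
    return score, coverage, confidence
```

For a five-component signal weighted 30/25/20/15/10 with the last component missing, ranks of 80, 60, 40, and 20 give (24 + 15 + 8 + 3) / 0.9 ≈ 55.6 at 80% coverage, which classifies as Medium.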
Named Compound Signals
Eight clinically interpretable risk composites
Each named signal combines an environmental exposure, a disease prevalence blend, and a provider access deficit into a single score. Multi-condition disease components use a 60/40 dominant/secondary blend: the higher-percentile condition receives 60% weight and the lower receives 40%, ensuring that counties with both conditions elevated score higher than those with only one.
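The 60/40 blend is easiest to see next to the `max()` aggregation it replaced. A short sketch, with an illustrative function name:

```python
def dominant_secondary_blend(a, b, dominant_weight=0.6):
    """Blend two condition percentiles: higher gets 60%, lower gets 40%."""
    hi, lo = max(a, b), min(a, b)
    return dominant_weight * hi + (1 - dominant_weight) * lo
```

A county at the 90th percentile on both conditions scores 90, while a county at the 90th on one and the 10th on the other scores 0.6 × 90 + 0.4 × 10 = 58. Under `max()` both counties would have scored 90.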
Respiratory Burden
Air pollution exposure × Respiratory-vulnerable population
| Component | Weight | Source |
|---|---|---|
| PM2.5 annual mean | 40% | EPA AQS / EJSCREEN |
| Asthma + COPD blend | 30% | CDC PLACES |
| Pulmonology access deficit | 30% | NPPES (inverted) |
Smoke Burden
Wildfire smoke exposure × Respiratory-vulnerable population
| Component | Weight | Source |
|---|---|---|
| Active fires within 200km | 12.5% | NIFC/WFIGS |
| 30-day max AQI | 12.5% | EPA AQS daily |
| Annual mean wildfire-PM2.5 | 15% | Stanford Childs/Burke (Harvard Dataverse) |
| Smoke-days >55 µg/m³ | 15% | Stanford Childs/Burke |
| Asthma + COPD blend | 25% | CDC PLACES |
| Pulmonology access deficit | 20% | NPPES (inverted) |
Heat Vulnerability
Extreme heat exposure × Heat-vulnerable population
| Component | Weight | Source |
|---|---|---|
| Avg summer max temperature | 40% | NOAA ACIS |
| CHD + Diabetes blend | 30% | CDC PLACES |
| Cardiology access deficit | 30% | NPPES (inverted) |
Industrial Burden
Industrial emissions exposure × Surrounding population
| Component | Weight | Source |
|---|---|---|
| TRI facility count | 35% | EPA Envirofacts |
| PFAS contamination | 25% | EPA UCMR5/ECHO |
| Pesticide usage (kg) | 20% | USGS PNSP |
| Total provider access deficit | 20% | NPPES (inverted) |
Heat-Dialysis Vulnerability
Extreme heat exposure × Dialysis-dependent population
| Component | Weight | Source |
|---|---|---|
| Avg summer max temperature | 40% | NOAA ACIS |
| Dialysis-dependent Medicare beneficiaries (per 1k) | 35% | HHS emPOWER |
| Chronic kidney disease prevalence | 25% | CDC PLACES |
Outage Vulnerability
Power outage risk × Electricity-dependent medical population
| Component | Weight | Source |
|---|---|---|
| Avg summer max temperature | 30% | NOAA / PRISM |
| EAGLE-I outage burden | 25% | DOE/ORNL EAGLE-I |
| emPOWER DME per 1k Medicare | 20% | HHS emPOWER |
| CHD + COPD blend (60/40 dominant) | 15% | CDC PLACES |
| Pre-1980 housing % (AC proxy) | 10% | Census ACS B25034 |
Field Burden
Pesticide + heat exposure × Farmworker population
| Component | Weight | Source |
|---|---|---|
| Pesticide intensity (kg/sq mi rank) | TBD | USGS PNSP (EPest_HIGH) |
| Summer max-temp percentile | TBD | NOAA / PRISM |
| Farmworker exposure proxy | TBD | USDA NASS livestock + crop area |
Runoff Burden
Agricultural runoff + flood exposure × Uninsured rural population
| Component | Weight | Source |
|---|---|---|
| CAFO density rank | TBD | USDA Census of Ag + EPA AU formula |
| Flood exposure | TBD | Lynch / Parks 2025 (pending #76) |
| Rural Medicaid coverage gap | TBD | Census ACS + Census urban-rural classification |
Heat-Dialysis Vulnerability is anchored on Taiwan NHIRD findings (5.3× CKD heat-hospitalization rate, 9× ESRD heat-stroke mortality). HHS emPOWER counts between 1 and 10 are masked to the literal value 11 for beneficiary privacy; per-1,000 rates derived from the masked counts respect the same floor.
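The effect of the privacy mask on derived rates can be sketched as follows. The sketch assumes the denominator is the county's Medicare beneficiary count; the function name and the returned flag are hypothetical.

```python
MASKED_COUNT = 11  # counts of 1-10 are published as the literal value 11


def rate_per_1k(reported_count, medicare_beneficiaries):
    """Return (rate per 1,000 beneficiaries, at_privacy_floor).

    at_privacy_floor is True when the reported count equals the masked
    value, so the true count may be anywhere from 1 to 11 and the rate
    is an upper bound rather than an exact figure.
    """
    if medicare_beneficiaries <= 0:
        raise ValueError("beneficiary denominator must be positive")
    rate = 1000.0 * reported_count / medicare_beneficiaries
    return rate, reported_count == MASKED_COUNT
```

A reported count of 11 in a county with 2,000 beneficiaries yields 5.5 per 1,000 and is flagged as sitting at the privacy floor.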
Smoke Burden is a 6-component blend of acute and chronic wildfire-smoke exposure with population vulnerability and response capacity. The acute legs (active fires within 200km + 30-day max AQI) capture recent-event proximity and observed air-quality response. The chronic legs (annual mean wildfire-PM2.5 + smoke-days > 55 µg/m³) use the peer-reviewed Stanford Childs/Burke product (Childs et al, Environmental Science & Technology 2022; Burke et al, Nature 2023) which captures the smoke component the EPA classifies as “exceptional events” and excludes from official AQS readings — AQS undercounts wildfire-attributable PM2.5 by 10–30% in fire-affected counties. The population leg (asthma + COPD prevalence, 25%) and the response-capacity leg (pulmonology access deficit, 20%) close the loop on who is exposed and who can treat them. Coverage for the chronic legs is CONUS only (V1, 2006–2020); counties in AK, HI, PR, VI, GU fall back to the acute legs alone.
This signal is the merged successor to the legacy Wildfire Smoke Vulnerability and Wildfire-Attributable Burden signals (methodology v1.5.0). Those two signals shared 2 of 4 components and cannibalized each other on county pages; the merged 6-component blend is more robust than either alone, captures both acute and chronic exposure in one number, and reflects both who is exposed and who can treat them. Saved cohorts and share-view URLs that reference the legacy signals continue to resolve via a soft-redirect to the merged signal.
Outage Vulnerability is the compound that no aggregator surfaces. Anchored on McBrien & Casey, PLOS Medicine 2026 (8+ hour county outages associate with respiratory hospitalization risk ratio 1.05); Stone et al, Environmental Science & Technology 2023 (heat × blackout doubles all-cause mortality in modeling, with Phoenix worst-case ~13,000 deaths); and the realized 2021 Texas Uri event (246 official deaths versus 702–814 excess-mortality estimates). Outage burden uses DOE/ORNL EAGLE-I 15-minute data validated in Brelsford et al, Scientific Data 2024. Coverage is 3,050 of 3,222 US counties; AK and some sparsely-served rural counties may have no signal at all. The 10% AC-proxy leg is currently null pending Census ACS B25034 derivation in the env profile pipeline; the score still computes from 4 of 5 components and the confidence_level reflects which inputs were available.
Field Burden surfaces the counties where outdoor agricultural workers face simultaneous heat-illness and pesticide-exposure risk, the population that HRSA 330(g) migrant/seasonal worker centers were created to serve. Pesticide intensity uses EPest_HIGH estimates from the USGS Pesticide National Synthesis Project (PNSP). Summer max-temp percentile uses NOAA / PRISM gridded climate data. The top 5 are TX panhandle and AZ agricultural counties. Note that USGS PNSP has medium-low update reliability (the program was nearly discontinued in 2023, and there is no 2018 data and no 2020+ release as of this writing). Component weights are documented in the gold pipeline manifest; see ticket #90 for the compute() implementation.
Runoff Burden surfaces counties where concentrated animal feeding operations (CAFOs) sit in flood-exposed terrain and the local health system is least equipped to absorb the public-health spillover. The motivating pattern: Hurricane Florence inundated 91 NC swine and 36 poultry CAFOs in 2018, releasing untreated waste into waterways and floodwaters that residents waded through. CAFO density uses the same gold composite as the standalone CAFO Density tile (USDA Census + EPA 40 CFR §122.23). Flood exposure is pending Lynch/Parks 2025 ingestion (ticket #76); the v1 signal uses a placeholder formulation that will sharpen when flood data lands. The top 5 are MN/MO/IA/KY/SD pork-belt counties. See ticket #91.
Data Sources & Freshness
19 federal + peer-reviewed data foundations
All data sources are publicly available US federal datasets or peer-reviewed research products. The platform refreshes data daily via automated Airflow pipelines. The “Typical Lag” column indicates how current the source data typically is at the time of ingestion. See the full data sources page for descriptions and current vintage dates.
| Source | Indicators | Refresh | Typical Lag |
|---|---|---|---|
| EPA AQS | PM2.5, Ozone, NO2, SO2 | Monthly | 6–18 months |
| EPA EJSCREEN | Modeled PM2.5, Diesel PM, Traffic proximity, Superfund proximity | Quarterly | 1 year |
| EPA TRI | Toxic releases, facility counts | Annual | 18 months |
| EPA UCMR5/ECHO | PFAS detections | Quarterly | 3–6 months |
| USGS PNSP | Pesticide use by county | Annual | 2–3 years |
| EPA Radon Zones | Radon zone classification | Static (1993) | N/A |
| NOAA ACIS | Days above 95°F, summer max temp | Monthly | Current year |
| CDC PLACES | Disease prevalence (9 conditions) | Monthly | 1–2 years |
| CDC WONDER | Cause-specific mortality (8 causes) | Monthly | 1–2 years |
| NCI State Cancer Profiles | Cancer incidence by site | Monthly | 2–3 years |
| CHR / NVSS | Low birth weight, infant mortality | Monthly | 1–2 years |
| CMS NPPES | Provider supply by specialty | Monthly | Current month |
| Census ACS 5-Year | Demographics, SDOH indicators | Quarterly | 1 year |
| CMS Geographic Variation | Medicare spending by category | Monthly | 1–2 years |
| HHS emPOWER Map | Electricity-dependent Medicare DME (dialysis, oxygen, ventilators); counts 1–10 masked to ≤10 for privacy | Monthly | Current month |
| Stanford Childs/Burke | Wildfire-attributable PM2.5 (annual mean + smoke days). Harvard Dataverse 10.7910/DVN/DJVMTV. CONUS only. | Annual | 3–4 years |
| DOE/ORNL EAGLE-I | 15-min customer-hours-out outage burden (Brelsford et al Sci Data 2024). figshare 10.6084/m9.figshare.24237376. 3,050 of 3,222 US counties. | Annual | 1 year |
| NOAA NCEI Storm Events | Recorded weather events 2010–present (events, deaths, injuries, damages). 8 health-relevant event-type buckets. | Monthly | 2–3 months |
| USDA Census of Agriculture | County-level livestock head counts (hogs, cattle, dairy, broilers, layers, turkeys, sheep). Combined with EPA 40 CFR §122.23 to derive Animal Units + CAFO density composite + dominant species + NPDES confidence class. | Every 5 years | 1–2 years (2022 vintage current) |
Sensitivity & Robustness
How stable are the results?
Weight stability
We test alternative weight vectors across reasonable perturbation ranges for every set of weights in the system:
- Environmental domain weights: Equal (25/25/25/25), air-heavy (45/20/15/20), climate-heavy (25/20/15/40), water-heavy (25/35/20/20)
- Opportunity score weights: Equal across all dimensions, disease-dominant (50%), environment-dominant (40%), and ±10 percentage point perturbations on each dimension
- Named signal weights: ±10 percentage point perturbations on the two highest-weighted components
For each alternative, we compute Spearman rank correlation with the baseline, count of counties that change multi-pillar convergence tier, and stability of the top-25 and bottom-25 counties. The sensitivity analysis scripts are versioned alongside the scoring pipeline for reproducibility.
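Two of the stability metrics named above, Spearman rank correlation and top-25 stability, can be sketched in plain Python. This is an illustrative stand-in for the versioned sensitivity scripts, not a copy of them; the function names and the input shapes (parallel lists for the correlation, county-to-score dictionaries for the overlap) are assumptions.

```python
def average_ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Pearson correlation of the ranks (undefined if either input is constant)."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


def top_k_overlap(baseline, alternative, k=25):
    """Share of the baseline top-k counties that remain in the top-k."""
    def top(scores):
        return set(sorted(scores, key=scores.get, reverse=True)[:k])
    return len(top(baseline) & top(alternative)) / k
```

A Spearman value near 1 and a top-25 overlap near 1 for an alternative weight vector would indicate that the county ordering is insensitive to that perturbation.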
Threshold selection
The multi-pillar convergence threshold (70th percentile) was evaluated by sweeping from the 50th to the 90th percentile in 5-point increments. The sweep measures how many counties qualify and what percentage of the US population they represent at each threshold.
The 70th percentile was selected to balance signal prevalence (enough qualifying counties to be useful for planning) with actionability (few enough that the signal remains discriminating). Below the 65th percentile, multi-pillar convergence becomes too common to inform prioritization; above the 80th, it becomes too rare to support service line planning.
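The sweep itself is a short loop. The sketch below assumes each county is represented as a population and a list of pillar scores, and that a county qualifies when at least two pillars are strictly above the threshold; the function name and input shape are illustrative.

```python
def sweep_thresholds(counties, thresholds=range(50, 95, 5), min_pillars=2):
    """Count qualifying counties and their population share per threshold.

    counties: list of (population, [pillar scores on 0-100]).
    Returns a list of (threshold, county_count, population_pct) rows.
    """
    total_pop = sum(pop for pop, _ in counties)
    rows = []
    for t in thresholds:  # 50, 55, ..., 90
        flagged = [
            pop for pop, pillars in counties
            if sum(score > t for score in pillars) >= min_pillars
        ]
        rows.append((t, len(flagged), 100.0 * sum(flagged) / total_pop))
    return rows
```

Plotting county count and population share against the threshold shows where convergence becomes too common or too rare to be useful.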
Limitations of weight selection
Dimension and indicator weights reflect structured expert judgment informed by epidemiological literature on environmental health linkages and health system service line economics. They are not empirically optimized against outcome data. We publish our weights transparently and provide sensitivity analysis tooling so users can assess robustness and explore alternative specifications.
Validation
Do the signals predict real-world outcomes?
The validation framework tests whether environmental and surveillance signals correlate with real-world health system utilization. We evaluated 32 signal-outcome pairs across 7 domains (ILI surveillance, wastewater COVID, temperature, humidity, heat index, air quality, severe storms) against outcomes including NHSN hospital admissions, NSSP emergency department visits, and bed occupancy rates.
Key findings
ILI surveillance correlations
r = 0.81–0.91
ILI surveillance signals strongly predict influenza hospitalizations (r = 0.81) and flu ED visits (r = 0.91) across all 50+ states. 100% of states show statistically significant associations.
Wastewater COVID signals
r = 0.70–0.79
COVID wastewater percentiles predict COVID ED visits (r = 0.79) and hospitalizations (r = 0.70) with 100% state significance. Granger causality confirmed in 58–70% of states.
Temperature and respiratory ED
r = 0.63
Weekly max temperature correlates with combined respiratory ED visits across all 48 reporting states. Granger causality confirmed in 73% of states.
Regional consistency
10 / 10 regions
All ten NOAA climate regions show significant ILI-to-hospitalization correlations, with mean |r| ranging from 0.47 (Alaska) to 0.89 (Northwest).
Opportunity Score vs. county-level mortality
The 4-dimension Opportunity Score correlates with mortality from CDC WONDER across every major cause of death (all p < 0.001):
| Mortality Outcome | Spearman r |
|---|---|
| All-cause mortality | +0.598 |
| Heart disease mortality | +0.546 |
| Chronic lower respiratory disease | +0.538 |
| Cancer mortality | +0.430 |
Environmental signal domain performance
| Signal Domain | Pairs Tested | Avg Composite Score |
|---|---|---|
| ILI Surveillance | 4 | 0.81 |
| Wastewater COVID | 2 | 0.81 |
| Humidity | 3 | 0.51 |
| Temperature | 7 | 0.31 |
| Heat Index | 6 | 0.28 |
| Air Quality | 6 | 0.16 |
| Severe Storms | 4 | 0.13 |
Composite scores combine correlation strength, statistical significance, Granger causality, and state-level consistency into a single 0–1 metric. Scores above 0.5 indicate strong, consistent predictive relationships.
Interpretation
The strongest validation comes from ILI and wastewater surveillance, where signals predict hospital utilization with high consistency across regions. Environmental exposure signals (temperature, humidity) show moderate but significant associations with respiratory ED visits. Air quality and storm signals show weaker correlations, consistent with the ecological nature of the analysis and the diffuse exposure pathways involved.
These correlations are observed at the state level over time and do not establish individual-level causal pathways. They demonstrate that the signal domains tracked by the platform correspond to measurable patterns in health system utilization.
Limitations & Interpretation Guidance
What the scores can and cannot tell you
Ecological fallacy
All scores represent county-level aggregate patterns. Within-county variation may be substantial. A county with a high respiratory burden score may have asthma concentrated in specific neighborhoods near industrial sources, while other areas of the county are unaffected. Scores should not be interpreted as individual-level risk assessments.
Cross-sectional design
Compound signals identify geographic convergence of risk factors at a point in time. They do not establish temporal sequence or causation. A county with high environmental risk and high disease burden may reflect (a) environmental exposure contributing to disease, (b) population migration patterns that co-locate vulnerable groups with environmental hazards, or (c) shared upstream determinants affecting both.
Data lag
The platform integrates data sources with varying recency. CDC PLACES health estimates lag 1–2 years. EPA air quality data may lag 6–18 months. Provider counts from NPPES reflect registration, not active practice. Scores represent the best available composite picture, not a real-time snapshot.
NPPES limitations
Provider counts are derived from the National Plan and Provider Enumeration System, which records where providers register, not necessarily where they practice. To mitigate this, the Provider Gap score uses a 50/50 population-weighted blend of within-county and neighboring-county provider density. This adjacency adjustment reduces false positives where a county borders a major medical center, but does not fully resolve registration-vs-practice location discrepancies.
Weight subjectivity
Dimension and indicator weights reflect structured expert judgment informed by epidemiological literature and health system service line economics. They are not empirically derived from outcome data. Sensitivity analysis demonstrates that key findings are robust across reasonable weight perturbations.
Geographic resolution
The platform currently operates at county-level resolution. Sub-county patterns (ZIP code, census tract) may differ substantially from county-level scores. County-level analysis is most appropriate for regional planning and needs assessment; facility-level decisions require finer-grained analysis.
Versioning & Changelog
Methodology versions
The methodology is locked within a major version. Formula changes increment the minor version. Structural changes (new dimensions, new signals, validation-informed recalibration) increment the major version. Every scored output row is stamped with the methodology version that produced it.
- Final brand-name taxonomy (#116). Every named signal now follows the same {Exposure} × {Population} grammar with a one- or two-word brand-name + a long-form subtitle. Renames: Wildfire Burden → Smoke Burden, Heat Health Risk → Heat Vulnerability, Heat × Outage Vulnerability → Outage Vulnerability, Industrial Pollution Burden → Industrial Burden, Pesticide × Heat (Farmworker) → Field Burden, CAFO × Flood × Uninsured Rural → Runoff Burden. Two families codified: Burden (Respiratory, Smoke, Industrial, Field, Runoff) and Vulnerability (Heat, Heat-Dialysis, Outage). signal_id strings stay stable for back-compat.
- Merged Wildfire Smoke Vulnerability + Wildfire-Attributable Burden into one Smoke Burden signal (#115). The two legacy signals shared 2 of 4 components (asthma + COPD blend, pulmonology access deficit) and cannibalized each other on county pages. The merged signal blends 6 components — fire activity (12.5%) + 30-day max AQI (12.5%) + Stanford Childs/Burke annual mean wildfire-PM2.5 (15%) + smoke-days >55 µg/m³ (15%) + asthma + COPD blend (25%) + pulmonology access deficit (20%) — capturing both acute and chronic exposure plus population vulnerability and response capacity in one number.
- Total named signals now 8 (was 9). Existing share-view URLs and saved cohorts referencing the legacy signal_ids continue to resolve via a soft-redirect to wildfire_burden.
- Added Pesticide × Heat (Farmworker) named signal — USGS PNSP pesticide intensity × NOAA summer max-temp percentile × farmworker-exposure proxy. Surfaces the counties where outdoor ag workers face simultaneous heat and pesticide risk. 3,254 counties scored; top 5 are TX panhandle + AZ ag counties. Anchors HRSA 330(g) migrant/seasonal-worker-center positioning.
- Added CAFO × Flood × Uninsured Rural named signal — CAFO density × flood exposure × rural Medicaid coverage gap. Pattern after Hurricane Florence's 2018 inundation of 91 NC swine + 36 poultry CAFOs. 3,369 counties scored; top 5 are MN/MO/IA/KY/SD pork-belt counties. Flood exposure currently uses a placeholder formulation pending Lynch/Parks 2025 ingestion (ticket #76).
- Total named signals now 9 (was 7). gold/compound_signal_scores.parquet rebuilt automatically with both new signals; score distributions are healthy (medians ~50, maxes 92–96, no degenerate columns).
- Added /platform/rural-ag landing page — public, ungated buyer surface for the agricultural-exposures data depth (HRSA 330(g) centers, ag-state Medicaid, Critical Access Hospitals, state ag-health departments).
- Agricultural Exposures foundation (ticket #74) — five new pipelines landed: USDA NASS crops + livestock surveys, USGS PNSP national pesticide use, CA PUR section-grain pesticide application, and a gold CAFO composite index derived from USDA Census + EPA 40 CFR §122.23 animal-unit conversions. Spans 3,222 counties.
- New CAFO Density section on county pages and the platform Environmental Factors tab — surfaces national rank-percentile, animal units per square mile, dominant livestock species, and the NPDES federal-permit confidence class per state (high / medium / low / state-only).
- NPDES confidence-class methodology: state-only program counties (IN, ID, AR) are flagged because federal CAFO permit data is structurally absent there; the index runs entirely off USDA Census head counts and is labeled accordingly. Conservative bias on AU totals (USDA doesn't break out the EPA piglet sub-class), so actuals likely run 5–10% higher than reported.
- Two follow-up compound signals filed but not yet shipped: pesticide_heat_farmworker (#90, all inputs landed; needs compute()) and cafo_flood_uninsured_rural (#91, blocked on flood-exposure ingest from #76).
- Added Heat × Outage Vulnerability named signal — the compound that no aggregator surfaces. 30% summer heat × 25% EAGLE-I outage burden × 20% emPOWER DME × 15% CHD/COPD blend × 10% pre-1980 housing AC proxy. Anchored on McBrien & Casey PLOS Med 2026, Stone EST 2023, and the realized 2021 Texas Uri event.
- New foundation pipeline foundation_doe_eagle_i (annual) ingesting DOE/ORNL EAGLE-I 15-min outage data — customer-hours-out, peak intervals, MCC-normalized fraction. 3,050 of 3,222 US counties.
- New foundation pipeline foundation_noaa_storm_events (monthly) ingesting NOAA NCEI Storm Events Database — cumulative 2010-present + last-5y windows, 8 health-relevant event-type buckets.
- Tract bundles auto-extend to 7 named signals via REGISTRY iteration; the new heat-outage signal flows to all 3,222 county tract.json files automatically with is_county_fallback=true (county-only inputs).
- Added Heat × Dialysis Vulnerability named signal — pairs summer heat with HHS emPOWER dialysis-dependent Medicare beneficiaries and CDC PLACES CKD prevalence (40/35/25 weights)
- Added Wildfire-Attributable Burden named signal — uses Stanford Childs/Burke peer-reviewed wildfire-PM2.5 (Harvard Dataverse), distinct from the existing acute Wildfire Smoke Vulnerability signal which uses fire-perimeter proximity
- New foundation pipeline foundation_cms_empower (monthly) ingesting HHS emPOWER electricity-dependent DME counts at county resolution; counts 1–10 are masked to ≤10 for privacy
- New foundation pipeline foundation_stanford_wildfire (annual) ingesting Childs/Burke wildfire-PM2.5 (CONUS only)
- Three of four pillars (Environmental Risk, Disease Burden, SDOH Stress) now resolve at tract-native resolution; Provider Gap uses haversine-weighted adjacency adjustment with σ=20km decay (slice 2 of tract-native scoring)
- Opportunity Score expanded to 4 dimensions: added SDOH Stress (15%) alongside Environmental Risk (25%), Disease Burden (35%), Provider Gap (25%)
- Provider Gap now uses 50/50 population-weighted adjacency adjustment (within-county + neighboring-county blend)
- Compound signal detection expanded from 3 to 4 dimensions; tiers updated accordingly
- Validation correlations improved by 0.08–0.10 across all mortality outcomes with the 4-dimension model
- Cancer incidence data added (NCI State Cancer Profiles, 2,926 counties, 5 sites)
- Full EJSCREEN environmental justice proximity indicators (9 new fields)
- Demographics with race/ethnicity composition and pre-1978 housing lead risk proxy
- Historical trends for 8 CDC PLACES measures (2018–2023)
- Replaced max() with dominant/secondary blend (60/40) for multi-condition disease components in named signals
- Added specialty-specific provider gaps: pulmonology for respiratory and wildfire signals, cardiology for heat health risk
- Added components_present and coverage_pct metadata to all dimension scores
- Published sensitivity analysis across weight and threshold parameters
- Published validation against NHSN, NSSP, and CMS utilization data
- Added Limitations & Interpretation Guidance section
- Updated methodology text to use associational rather than causal language
- Initial release with 4 named compound signals
- max() aggregation for multi-condition disease components
- Total provider density (not specialty-specific)
Citation
How to cite this methodology
If you reference the methodology or its outputs in published work, community health needs assessments, or grant applications, please cite the version you used. The methodology version is displayed on every report generated by the platform.