Banana Analytics

Methodology

How We Score Counties.

Every weight, every threshold, every data source. We publish our methodology transparently because the organizations using this data deserve to know exactly how the numbers are produced.

Methodology v1.8.0 · Last updated April 2026

Overview

What the platform measures and why

Banana Analytics scores every US county (N ≈ 3,222) across multiple dimensions of health system need. Each dimension is scored 0–100 using national percentile ranking, where 100 represents the highest risk or greatest burden.

When two or more pillars simultaneously exceed the 70th percentile, this convergence is flagged as Multi-Pillar Convergence — a structural arithmetic observation about co-occurring environmental exposure, population health burden, and healthcare access deficit that indicates an outsized and potentially underserved health need. This is distinct from the named Compound Signals (below) — the latter are weighted, health-relevant composites with their own evidence bases (Outage Vulnerability, Smoke Burden, etc.).

The platform produces two types of composite scores:

  1. Opportunity Score: a weighted composite of Environmental Risk, Disease Burden, Provider Gap, and SDOH Stress used for broad county prioritization.
  2. Named Compound Signals: eight health-relevant risk composites grouped into two families. Burden signals measure cumulative environmental load on a population (Respiratory Burden, Smoke Burden, Industrial Burden, Field Burden, Runoff Burden). Vulnerability signals measure population susceptibility to a specific hazard (Heat Vulnerability, Heat-Dialysis Vulnerability, Outage Vulnerability). Every signal follows the same {Exposure} × {Population} grammar.

Scoring Dimensions

Four dimensions of county-level health need

Environmental Risk Score (0–100)

Composite of air quality, water contamination, soil hazards, and climate extremes. Higher scores indicate greater environmental burden.

Sub-domain | Weight | Indicators
Air Quality | 35% | PM2.5 annual mean (35%), Ozone (25%), TRI releases (20%), NO2 (10%), SO2 (10%)
Water Quality | 25% | PFAS severity score
Soil & Chemical | 20% | Radon zone classification (50%), Pesticide use in kg (50%)
Climate | 20% | Days above 95°F (60%), Avg summer max temperature (40%)
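
As a rough illustration of how the sub-domain and indicator weights in the table above nest, the sketch below computes an Environmental Risk value from indicator percentiles. It assumes every indicator has already been converted to a 0–100 national percentile. The key names (`pm25`, `pfas`, and so on) and the function `environmental_risk` are hypothetical, and the sketch neither re-ranks the result nor handles missing indicators.

```python
# Sub-domain weight, then indicator weights within that sub-domain,
# copied from the table above. Key names are illustrative only.
ENV_WEIGHTS = {
    "air": (0.35, {"pm25": 0.35, "ozone": 0.25, "tri": 0.20, "no2": 0.10, "so2": 0.10}),
    "water": (0.25, {"pfas": 1.0}),
    "soil": (0.20, {"radon": 0.5, "pesticide": 0.5}),
    "climate": (0.20, {"days_above_95f": 0.6, "summer_max_temp": 0.4}),
}


def environmental_risk(pct: dict[str, float]) -> float:
    """Weighted sum of sub-domain scores, each a weighted sum of indicator percentiles."""
    total = 0.0
    for domain_weight, indicators in ENV_WEIGHTS.values():
        domain_score = sum(w * pct[name] for name, w in indicators.items())
        total += domain_weight * domain_score
    return total
```

With every indicator at the 50th percentile the result is 50; raising only PM2.5 to the 80th percentile moves it to about 53.7, since PM2.5 carries 35% of a sub-domain that itself carries 35%.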

Disease Burden Score (0–100)

Weighted prevalence of chronic conditions organized by clinical service line. Higher scores indicate greater population health burden.

Service Line | Weight | Conditions
Respiratory | 25% | Current asthma (50%), COPD (50%)
Oncology | 20% | Cancer prevalence
Cardiovascular | 20% | Coronary heart disease (50%), Stroke (50%)
Endocrine | 15% | Diabetes
Renal | 10% | Kidney disease
Behavioral Health | 10% | Depression (50%), Frequent mental distress (50%)

Provider Gap Score (0–100, inverted)

Healthcare specialists per 100,000 population, percentile-ranked and inverted so that counties with fewer providers score higher (indicating worse access). The score uses a 50/50 population-weighted blend of within-county and neighboring-county provider density (using Census Bureau county adjacency files), reducing false positives where a county borders a major medical center. Named compound signals use specialty-specific provider counts: pulmonology for Respiratory Burden and Smoke Burden, cardiology for Heat Vulnerability.
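
One way to read the adjacency blend and the inversion is sketched below. It assumes the neighboring-county density is the pooled (population-weighted) density of all adjacent counties, and that the percentile rank uses a mean-rank convention for ties; both are assumptions, and the function names are hypothetical.

```python
from bisect import bisect_left, bisect_right


def neighbor_adjusted_density(own_providers, own_population, neighbors):
    """Blend own density 50/50 with the pooled density of adjacent counties.

    neighbors: list of (providers, population) tuples, one per adjacent county.
    """
    own = own_providers / own_population * 100_000
    neighbor_pop = sum(pop for _, pop in neighbors)
    if neighbor_pop == 0:
        return own  # no adjacent counties: fall back to own density
    # Pooling providers over pooled population is a population-weighted mean.
    neighbor = sum(p for p, _ in neighbors) / neighbor_pop * 100_000
    return 0.5 * own + 0.5 * neighbor


def provider_gap_scores(density_by_county):
    """Percentile-rank densities, then invert so fewer providers scores higher."""
    ordered = sorted(density_by_county.values())
    n = len(ordered)
    scores = {}
    for county, d in density_by_county.items():
        below = bisect_left(ordered, d)
        ties = bisect_right(ordered, d) - below
        pct = 100.0 * (below + 0.5 * (ties - 1)) / (n - 1) if n > 1 else 50.0
        scores[county] = 100.0 - pct
    return scores
```

In this sketch a county with 5 specialists per 100,000 that borders a county with 300 per 100,000 (equal populations) gets a blended density of 152.5, which is the mechanism that keeps it from being flagged as a severe gap.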

SDOH Composite (0–100)

Social determinants of health composite spanning seven domains: food insecurity, housing instability, transportation barriers, utility difficulties, interpersonal safety, behavioral health access, and provider access. Derived from Census ACS, County Health Rankings, and CMS Geographic Variation data.

Composite Scoring

How dimensions become scores

Opportunity Score

Opportunity = 0.25 × Environmental Risk + 0.35 × Disease Burden + 0.25 × Provider Gap + 0.15 × SDOH Stress

Disease Burden receives the highest weight (35%) because chronic disease prevalence is the most direct indicator of population health need and service demand. Environmental Risk and Provider Gap each receive 25% as upstream exposure and access modifiers. SDOH Stress (15%) captures the social conditions that shape both access and outcomes. When SDOH data is unavailable for a county, the score falls back to the 3-dimension formula (30% Environmental Risk, 40% Disease Burden, 30% Provider Gap).
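
The formula and its fallback can be sketched as follows. The function name is hypothetical, and the mapping of the 30/40/30 fallback onto Environmental Risk, Disease Burden, and Provider Gap (in that order) is an assumption based on the order of the 4-dimension formula.

```python
from typing import Optional


def opportunity_score(env: float, disease: float, gap: float,
                      sdoh: Optional[float] = None) -> float:
    """Weighted composite of the pillar scores (each 0-100)."""
    if sdoh is None:
        # 3-dimension fallback when SDOH data is unavailable.
        return 0.30 * env + 0.40 * disease + 0.30 * gap
    return 0.25 * env + 0.35 * disease + 0.25 * gap + 0.15 * sdoh
```

For pillar scores of 80, 60, 40, and 20, the 4-dimension result is 54; without SDOH the same county scores 60.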

Adding SDOH as a fourth dimension measurably improved the model's predictive validity: correlations with county-level mortality increased by 0.08–0.10 points across every major cause of death compared to the 3-dimension specification.

Multi-Pillar Convergence

Multi-Pillar Convergence is the structural-arithmetic check: how many of the four scoring pillars simultaneously exceed the 70th percentile. It is distinct from the named Compound Signals (Outage Vulnerability, Smoke Burden, etc.) documented further down the page; those are weighted, health-relevant composites with their own evidence bases. A county can show multi-pillar elevation without triggering any named compound signal, and vice versa.

Tier | Criteria | County Count
No elevation | 0–1 of 4 pillars elevated | 2,492 (77%)
Moderate | 2 of 4 pillars elevated | 604 (19%)
Strong | 3 of 4 pillars elevated | 126 (4%)
Extreme | All 4 pillars elevated | 1 (<0.1%)
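
The tier assignment in the table above reduces to counting pillars above the threshold. A minimal sketch, assuming "exceed" means strictly greater than 70 and using hypothetical pillar keys:

```python
TIER_BY_COUNT = {0: "No elevation", 1: "No elevation",
                 2: "Moderate", 3: "Strong", 4: "Extreme"}


def convergence_tier(pillars: dict[str, float], threshold: float = 70.0):
    """Return (number of elevated pillars, tier name) for one county."""
    elevated = sum(1 for score in pillars.values() if score > threshold)
    return elevated, TIER_BY_COUNT[elevated]


county = {"environmental_risk": 82.0, "disease_burden": 75.5,
          "provider_gap": 70.0, "sdoh_stress": 40.0}
print(convergence_tier(county))  # (2, 'Moderate')
```

Note that a pillar sitting exactly at 70 does not count as elevated under this reading.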

Normalization

All indicator values are converted to national percentile ranks (0–100) before weighting. This normalizes disparate units (µg/m³, prevalence %, provider counts) to a common scale. Values are capped at the 1st and 99th percentiles (winsorization) before ranking to limit outlier influence.
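
A minimal sketch of the cap-then-rank step, using only the Python standard library. The percentile convention (`method="inclusive"`) and the mean-rank handling of ties are assumptions; the function names are hypothetical.

```python
from bisect import bisect_left, bisect_right
from statistics import quantiles


def winsorize(values, lower_pct=1, upper_pct=99):
    """Cap values at the given lower and upper percentiles."""
    cuts = quantiles(values, n=100, method="inclusive")  # 99 cut points
    lo, hi = cuts[lower_pct - 1], cuts[upper_pct - 1]
    return [min(max(v, lo), hi) for v in values]


def percentile_ranks(values):
    """Map each value to a 0-100 rank; tied values share the mean rank."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 1:
        return [50.0]
    ranks = []
    for v in values:
        below = bisect_left(ordered, v)
        ties = bisect_right(ordered, v) - below
        ranks.append(100.0 * (below + 0.5 * (ties - 1)) / (n - 1))
    return ranks


raw = list(range(100)) + [10_000]      # one extreme outlier
capped = winsorize(raw)                # outlier is pulled down to 99.0
ranks = percentile_ranks(capped)
```

In this sketch, capping ties the most extreme values together at each tail, so the outlier shares a rank with the next-highest county instead of standing alone.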

Missing data handling

Counties missing an indicator are scored on available data only. Weights are renormalized so missing data does not penalize. Each score carries a coverage and confidence classification:

Confidence | Coverage | Interpretation
High | ≥90% of components present | Score is well-supported
Medium | 50–89% of components present | Score is reasonable but less certain
Low | <50% of components present | Interpret with caution
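
The renormalization and confidence rules can be sketched as below. The sketch assumes coverage is the share of components present by count (not by weight) and treats any coverage from 50% up to but not including 90% as Medium; the function name and component keys are hypothetical. The example weights are those of the Outage Vulnerability signal.

```python
def score_with_coverage(values: dict, weights: dict):
    """Return (score, coverage_pct, confidence), renormalizing over present inputs."""
    present = {k: w for k, w in weights.items() if values.get(k) is not None}
    coverage_pct = 100.0 * len(present) / len(weights)
    if not present:
        return None, coverage_pct, "Low"
    total_weight = sum(present.values())
    score = sum(values[k] * w for k, w in present.items()) / total_weight
    if coverage_pct >= 90:
        confidence = "High"
    elif coverage_pct >= 50:
        confidence = "Medium"
    else:
        confidence = "Low"
    return score, coverage_pct, confidence


weights = {"heat": 0.30, "outage": 0.25, "dme": 0.20, "chd_copd": 0.15, "ac_proxy": 0.10}
values = {"heat": 90.0, "outage": 70.0, "dme": 50.0, "chd_copd": 40.0, "ac_proxy": None}
score, coverage, confidence = score_with_coverage(values, weights)
```

With one of five components missing, the remaining weights (summing to 0.90) are rescaled to 1.0, coverage is 80%, and the score is classified Medium.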

Named Compound Signals

Eight clinically interpretable risk composites

Each named signal combines an environmental exposure with a measure of the exposed population (typically a disease prevalence blend) and, in several signals, a provider access deficit, into a single score. Multi-condition disease components use a 60/40 dominant/secondary blend: the higher-percentile condition receives 60% weight and the lower receives 40%, ensuring that counties with both conditions elevated score higher than those with only one.
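
The 60/40 dominant/secondary blend for a two-condition component can be written as a short sketch (the function name is hypothetical):

```python
def dominant_secondary_blend(pct_a: float, pct_b: float) -> float:
    """60% weight on the higher percentile, 40% on the lower."""
    dominant, secondary = max(pct_a, pct_b), min(pct_a, pct_b)
    return 0.6 * dominant + 0.4 * secondary
```

A county at the 90th percentile for asthma and the 50th for COPD blends to 74, while a county at the 90th percentile for both blends to 90. Taking the plain maximum would score both counties at 90 and hide the difference.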

Respiratory Burden

Air pollution exposure × Respiratory-vulnerable population

Burden
Component | Weight | Source
PM2.5 annual mean | 40% | EPA AQS / EJSCREEN
Asthma + COPD blend | 30% | CDC PLACES
Pulmonology access deficit | 30% | NPPES (inverted)

Smoke Burden

Wildfire smoke exposure × Respiratory-vulnerable population

Burden
Component | Weight | Source
Active fires within 200km | 12.5% | NIFC/WFIGS
30-day max AQI | 12.5% | EPA AQS daily
Annual mean wildfire-PM2.5 | 15% | Stanford Childs/Burke (Harvard Dataverse)
Smoke-days >55 µg/m³ | 15% | Stanford Childs/Burke
Asthma + COPD blend | 25% | CDC PLACES
Pulmonology access deficit | 20% | NPPES (inverted)

Heat Vulnerability

Extreme heat exposure × Heat-vulnerable population

Vulnerability
Component | Weight | Source
Avg summer max temperature | 40% | NOAA ACIS
CHD + Diabetes blend | 30% | CDC PLACES
Cardiology access deficit | 30% | NPPES (inverted)

Industrial Burden

Industrial emissions exposure × Surrounding population

Burden
Component | Weight | Source
TRI facility count | 35% | EPA Envirofacts
PFAS contamination | 25% | EPA UCMR5/ECHO
Pesticide usage (kg) | 20% | USGS PNSP
Total provider access deficit | 20% | NPPES (inverted)

Heat-Dialysis Vulnerability

Extreme heat exposure × Dialysis-dependent population

Vulnerability
Component | Weight | Source
Avg summer max temperature | 40% | NOAA ACIS
Dialysis-dependent Medicare beneficiaries (per 1k) | 35% | HHS emPOWER
Chronic kidney disease prevalence | 25% | CDC PLACES

Outage Vulnerability

Power outage risk × Electricity-dependent medical population

Vulnerability
Component | Weight | Source
Avg summer max temperature | 30% | NOAA / PRISM
EAGLE-I outage burden | 25% | DOE/ORNL EAGLE-I
emPOWER DME per 1k Medicare | 20% | HHS emPOWER
CHD + COPD blend (60/40 dominant) | 15% | CDC PLACES
Pre-1980 housing % (AC proxy) | 10% | Census ACS B25034

Field Burden

Pesticide + heat exposure × Farmworker population

Burden
Component | Weight | Source
Pesticide intensity (kg/sq mi rank) | TBD | USGS PNSP (EPest_HIGH)
Summer max-temp percentile | TBD | NOAA / PRISM
Farmworker exposure proxy | TBD | USDA NASS livestock + crop area

Runoff Burden

Agricultural runoff + flood exposure × Uninsured rural population

Burden
Component | Weight | Source
CAFO density rank | TBD | USDA Census of Ag + EPA AU formula
Flood exposure | TBD | Lynch / Parks 2025 (pending #76)
Rural Medicaid coverage gap | TBD | Census ACS + Census urban-rural classification

Heat-Dialysis Vulnerability is anchored on Taiwan NHIRD findings (5.3× CKD heat-hospitalization rate, 9× ESRD heat-stroke mortality). HHS emPOWER counts between 1 and 10 are masked to the literal value 11 for beneficiary privacy; per-1,000 rates derived from the masked counts respect the same floor.

Smoke Burden is a 6-component blend of acute and chronic wildfire-smoke exposure with population vulnerability and response capacity. The acute legs (active fires within 200km + 30-day max AQI) capture recent-event proximity and observed air-quality response. The chronic legs (annual mean wildfire-PM2.5 + smoke-days > 55 µg/m³) use the peer-reviewed Stanford Childs/Burke product (Childs et al, Environmental Science & Technology 2022; Burke et al, Nature 2023) which captures the smoke component the EPA classifies as “exceptional events” and excludes from official AQS readings — AQS undercounts wildfire-attributable PM2.5 by 10–30% in fire-affected counties. The population leg (asthma + COPD prevalence, 25%) and the response-capacity leg (pulmonology access deficit, 20%) close the loop on who is exposed and who can treat them. Coverage for the chronic legs is CONUS only (V1, 2006–2020); counties in AK, HI, PR, VI, GU fall back to the acute legs alone.
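
The six-component blend, including the fallback for counties outside CONUS, might look like the sketch below. It assumes the fallback simply drops the two chronic legs and renormalizes the remaining weights, which is one plausible reading of "fall back to the acute legs alone" and not a confirmed implementation detail. Component keys and the function name are hypothetical; inputs are assumed to be 0–100 percentiles.

```python
SMOKE_WEIGHTS = {
    "active_fires_200km": 0.125,    # acute
    "max_aqi_30d": 0.125,           # acute
    "wildfire_pm25_annual": 0.15,   # chronic (CONUS only)
    "smoke_days_over_55": 0.15,     # chronic (CONUS only)
    "asthma_copd_blend": 0.25,      # population
    "pulmonology_deficit": 0.20,    # response capacity
}
CHRONIC_LEGS = ("wildfire_pm25_annual", "smoke_days_over_55")


def smoke_burden(pct: dict[str, float], in_conus: bool) -> float:
    """Weighted blend; outside CONUS the chronic legs are dropped (assumption)."""
    weights = dict(SMOKE_WEIGHTS)
    if not in_conus:
        for leg in CHRONIC_LEGS:
            weights.pop(leg)
    total_weight = sum(weights.values())
    return sum(pct[k] * w for k, w in weights.items()) / total_weight


example = {"active_fires_200km": 40.0, "max_aqi_30d": 60.0,
           "wildfire_pm25_annual": 70.0, "smoke_days_over_55": 90.0,
           "asthma_copd_blend": 50.0, "pulmonology_deficit": 30.0}
```

For the example inputs, the full blend is 55.0. The same county scored without the chronic legs would come out near 44.3, which shows how much the chronic smoke record can move the result.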

This signal is the merged successor to the legacy Wildfire Smoke Vulnerability and Wildfire-Attributable Burden signals (methodology v1.5.0). Those two signals shared 2 of 4 components and cannibalized each other on county pages; the merged 6-component blend is more robust than either alone, captures both acute and chronic exposure in one number, and reflects both who is exposed and who can treat them. Saved cohorts and share-view URLs that reference the legacy signals continue to resolve via a soft-redirect to the merged signal.

Outage Vulnerability is the compound that no aggregator surfaces. Anchored on McBrien & Casey, PLOS Medicine 2026 (8+ hour county outages associate with respiratory hospitalization risk ratio 1.05); Stone et al, Environmental Science & Technology 2023 (heat × blackout doubles all-cause mortality in modeling, with Phoenix worst-case ~13,000 deaths); and the realized 2021 Texas Uri event (246 official deaths versus 702–814 excess-mortality estimates). Outage burden uses DOE/ORNL EAGLE-I 15-minute data validated in Brelsford et al, Scientific Data 2024. Coverage is 3,050 of 3,222 US counties; AK and some sparsely-served rural counties may have no signal at all. The 10% AC-proxy leg is currently null pending Census ACS B25034 derivation in the env profile pipeline; the score still computes from 4 of 5 components and the confidence_level reflects which inputs were available.

Field Burden surfaces the counties where outdoor agricultural workers face simultaneous heat-illness and pesticide-exposure risk — the demographic HRSA 330(g) migrant/seasonal worker centers were created to serve. Pesticide intensity uses USGS Pesticide National Synthesis Project (PNSP), EPest_HIGH estimates. Summer max-temp percentile uses NOAA / PRISM gridded climate data. Top 5 are TX panhandle and AZ agricultural counties. Note that USGS PNSP is on a medium-low update reliability footing (program nearly killed in 2023; no 2018 data and no 2020+ release as of this writing). Component weights documented in the gold pipeline manifest; see ticket #90 for the compute() implementation.

Runoff Burden surfaces counties where concentrated animal feeding operations sit in flood-exposed terrain and the local health system is least equipped to absorb the public-health spillover. Pattern: Hurricane Florence inundated 91 NC swine + 36 poultry CAFOs in 2018, releasing untreated waste into waterways and floodwaters that residents waded through. CAFO density uses the same gold composite as the standalone CAFO Density tile (USDA Census + EPA 40 CFR §122.23). Flood exposure pending Lynch/Parks 2025 ingestion (ticket #76); the v1 signal uses a placeholder-friendly formulation that will sharpen when flood data lands. Top 5 are MN/MO/IA/KY/SD pork-belt counties. See ticket #91.

Data Sources & Freshness

19 federal + peer-reviewed data foundations

All data sources are publicly available: US federal datasets plus the peer-reviewed Stanford Childs/Burke wildfire-smoke product. The platform refreshes data daily via automated Airflow pipelines. The “Typical Lag” column indicates how current the source data typically is at the time of ingestion. See the full data sources page for descriptions and current vintage dates.

Source | Indicators | Refresh | Typical Lag
EPA AQS | PM2.5, Ozone, NO2, SO2 | Monthly | 6–18 months
EPA EJSCREEN | Modeled PM2.5, Diesel PM, Traffic proximity, Superfund proximity | Quarterly | 1 year
EPA TRI | Toxic releases, facility counts | Annual | 18 months
EPA UCMR5/ECHO | PFAS detections | Quarterly | 3–6 months
USGS PNSP | Pesticide use by county | Annual | 2–3 years
EPA Radon Zones | Radon zone classification | Static (1993) | N/A
NOAA ACIS | Days above 95°F, summer max temp | Monthly | Current year
CDC PLACES | Disease prevalence (9 conditions) | Monthly | 1–2 years
CDC WONDER | Cause-specific mortality (8 causes) | Monthly | 1–2 years
NCI State Cancer Profiles | Cancer incidence by site | Monthly | 2–3 years
CHR / NVSS | Low birth weight, infant mortality | Monthly | 1–2 years
CMS NPPES | Provider supply by specialty | Monthly | Current month
Census ACS 5-Year | Demographics, SDOH indicators | Quarterly | 1 year
CMS Geographic Variation | Medicare spending by category | Monthly | 1–2 years
HHS emPOWER Map | Electricity-dependent Medicare DME (dialysis, oxygen, ventilators); counts 1–10 masked as 11 for privacy | Monthly | Current month
Stanford Childs/Burke | Wildfire-attributable PM2.5 (annual mean + smoke days). Harvard Dataverse 10.7910/DVN/DJVMTV. CONUS only. | Annual | 3–4 years
DOE/ORNL EAGLE-I | 15-min customer-hours-out outage burden (Brelsford et al Sci Data 2024). figshare 10.6084/m9.figshare.24237376. 3,050 of 3,222 US counties. | Annual | 1 year
NOAA NCEI Storm Events | Recorded weather events 2010–present (events, deaths, injuries, damages). 8 health-relevant event-type buckets. | Monthly | 2–3 months
USDA Census of Agriculture | County-level livestock head counts (hogs, cattle, dairy, broilers, layers, turkeys, sheep). Combined with EPA 40 CFR §122.23 to derive Animal Units + CAFO density composite + dominant species + NPDES confidence class. | Every 5 years | 1–2 years (2022 vintage current)

Sensitivity & Robustness

How stable are the results?

Weight stability

We test alternative weight vectors across reasonable perturbation ranges for every set of weights in the system:

  • Environmental domain weights: Equal (25/25/25/25), air-heavy (45/20/15/20), climate-heavy (25/20/15/40), water-heavy (25/35/20/20)
  • Opportunity score weights: Equal across all dimensions, disease-dominant (50%), environment-dominant (40%), and ±10 percentage point perturbations on each dimension
  • Named signal weights: ±10 percentage point perturbations on the two highest-weighted components

For each alternative, we compute Spearman rank correlation with the baseline, count of counties that change multi-pillar convergence tier, and stability of the top-25 and bottom-25 counties. The sensitivity analysis scripts are versioned alongside the scoring pipeline for reproducibility.
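
Two of the stability measures named above, Spearman rank correlation and top-k overlap, can be sketched in plain Python. This is an illustrative stand-in for the versioned sensitivity scripts, not a copy of them; the function names are hypothetical, and `spearman` is undefined if either input is constant.

```python
def average_ranks(values):
    """1-based ranks, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Pearson correlation computed on the ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


def top_k_overlap(baseline: dict, alternative: dict, k: int = 25) -> float:
    """Share of the baseline top-k counties that stay in the alternative top-k."""
    def top(scores):
        return set(sorted(scores, key=scores.get, reverse=True)[:k])
    return len(top(baseline) & top(alternative)) / k
```

A Spearman value near 1 under an alternative weight vector means county ordering is largely unchanged, while the overlap figure isolates movement at the top of the list, where prioritization decisions are made.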

Threshold selection

The multi-pillar convergence threshold (70th percentile) was evaluated by sweeping from the 50th to the 90th percentile in 5-point increments. The sweep measures how many counties qualify and what percentage of the US population they represent at each threshold.
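
The sweep can be sketched as a loop over candidate thresholds. The record layout (`population`, `pillars`) and the function name are hypothetical, and a county is assumed to qualify when at least two pillars strictly exceed the threshold.

```python
def threshold_sweep(counties, thresholds=range(50, 91, 5), min_pillars=2):
    """Return (threshold, qualifying county count, % of population) per threshold."""
    total_pop = sum(c["population"] for c in counties)
    rows = []
    for t in thresholds:
        hits = [c for c in counties
                if sum(score > t for score in c["pillars"]) >= min_pillars]
        pop = sum(c["population"] for c in hits)
        rows.append((t, len(hits), 100.0 * pop / total_pop))
    return rows


# Toy data: three counties with four pillar percentiles each.
toy = [
    {"population": 100, "pillars": [95, 92, 10, 10]},
    {"population": 300, "pillars": [72, 71, 10, 10]},
    {"population": 600, "pillars": [55, 40, 10, 10]},
]
rows = threshold_sweep(toy)
```

On the toy data, two counties (40% of population) qualify at the 70th percentile and only one (10%) at the 75th, the kind of drop-off the sweep is designed to expose.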

The 70th percentile was selected to balance signal prevalence (enough qualifying counties to be useful for planning) with actionability (few enough that the signal remains discriminating). Below the 65th percentile, multi-pillar convergence becomes too common to inform prioritization; above the 80th, it becomes too rare to support service line planning.

Limitations of weight selection

Dimension and indicator weights reflect structured expert judgment informed by epidemiological literature on environmental health linkages and health system service line economics. They are not empirically optimized against outcome data. We publish our weights transparently and provide sensitivity analysis tooling so users can assess robustness and explore alternative specifications.

Validation

Do the signals predict real-world outcomes?

The validation framework tests whether environmental and surveillance signals correlate with real-world health system utilization. We evaluated 32 signal-outcome pairs across 7 domains (ILI surveillance, wastewater COVID, temperature, humidity, heat index, air quality, severe storms) against outcomes including NHSN hospital admissions, NSSP emergency department visits, and bed occupancy rates.

Key findings

ILI surveillance correlations

r = 0.81–0.91

ILI surveillance signals strongly predict influenza hospitalizations (r = 0.81) and flu ED visits (r = 0.91) across all 50+ states. 100% of states show statistically significant associations.

Wastewater COVID signals

r = 0.70–0.79

COVID wastewater percentiles predict COVID ED visits (r = 0.79) and hospitalizations (r = 0.70) with 100% state significance. Granger causality confirmed in 58–70% of states.

Temperature and respiratory ED

r = 0.63

Weekly max temperature correlates with combined respiratory ED visits across all 48 reporting states. Granger causality confirmed in 73% of states.

Regional consistency

10 / 10 regions

All ten NOAA climate regions show significant ILI-to-hospitalization correlations, with mean |r| ranging from 0.47 (Alaska) to 0.89 (Northwest).

Opportunity Score vs. county-level mortality

The 4-dimension Opportunity Score correlates with mortality from CDC WONDER across every major cause of death (all p < 0.001):

Mortality Outcome | Spearman r
All-cause mortality | +0.598
Heart disease mortality | +0.546
Chronic lower respiratory disease | +0.538
Cancer mortality | +0.430

Environmental signal domain performance

Signal Domain | Pairs Tested | Avg Composite Score
ILI Surveillance | 4 | 0.81
Wastewater COVID | 2 | 0.81
Humidity | 3 | 0.51
Temperature | 7 | 0.31
Heat Index | 6 | 0.28
Air Quality | 6 | 0.16
Severe Storms | 4 | 0.13

Composite scores combine correlation strength, statistical significance, Granger causality, and state-level consistency into a single 0–1 metric. Scores above 0.5 indicate strong, consistent predictive relationships.

Interpretation

The strongest validation comes from ILI and wastewater surveillance, where signals predict hospital utilization with high consistency across regions. Environmental exposure signals (temperature, humidity) show moderate but significant associations with respiratory ED visits. Air quality and storm signals show weaker correlations, consistent with the ecological nature of the analysis and the diffuse exposure pathways involved.

These correlations are observed at the state level over time and do not establish individual-level causal pathways. They demonstrate that the signal domains tracked by the platform correspond to measurable patterns in health system utilization.

Limitations & Interpretation Guidance

What the scores can and cannot tell you

Ecological fallacy

All scores represent county-level aggregate patterns. Within-county variation may be substantial. A county with a high respiratory burden score may have asthma concentrated in specific neighborhoods near industrial sources, while other areas of the county are unaffected. Scores should not be interpreted as individual-level risk assessments.

Cross-sectional design

Compound signals identify geographic convergence of risk factors at a point in time. They do not establish temporal sequence or causation. A county with high environmental risk and high disease burden may reflect (a) environmental exposure contributing to disease, (b) population migration patterns that co-locate vulnerable groups with environmental hazards, or (c) shared upstream determinants affecting both.

Data lag

The platform integrates data sources with varying recency. CDC PLACES health estimates lag 1–2 years. EPA air quality data may lag 6–18 months. Provider counts from NPPES reflect registration, not active practice. Scores represent the best available composite picture, not a real-time snapshot.

NPPES limitations

Provider counts are derived from the National Plan and Provider Enumeration System, which records where providers register, not necessarily where they practice. To mitigate this, the Provider Gap score uses a 50/50 population-weighted blend of within-county and neighboring-county provider density. This adjacency adjustment reduces false positives where a county borders a major medical center, but does not fully resolve registration-vs-practice location discrepancies.

Weight subjectivity

Dimension and indicator weights reflect structured expert judgment informed by epidemiological literature and health system service line economics. They are not empirically derived from outcome data. Sensitivity analysis demonstrates that key findings are robust across reasonable weight perturbations.

Geographic resolution

The platform currently operates at county-level resolution. Sub-county patterns (ZIP code, census tract) may differ substantially from county-level scores. County-level analysis is most appropriate for regional planning and needs assessment; facility-level decisions require finer-grained analysis.

Versioning & Changelog

Methodology versions

The methodology is locked within a major version. Formula changes increment the minor version. Structural changes (new dimensions, new signals, validation-informed recalibration) increment the major version. Every scored output row is stamped with the methodology version that produced it.

v1.8.0 · April 2026
  • Final brand-name taxonomy (#116). Every named signal now follows the same {Exposure} × {Population} grammar with a one- or two-word brand-name + a long-form subtitle. Renames: Wildfire Burden → Smoke Burden, Heat Health Risk → Heat Vulnerability, Heat × Outage Vulnerability → Outage Vulnerability, Industrial Pollution Burden → Industrial Burden, Pesticide × Heat (Farmworker) → Field Burden, CAFO × Flood × Uninsured Rural → Runoff Burden. Two families codified: Burden (Respiratory, Smoke, Industrial, Field, Runoff) and Vulnerability (Heat, Heat-Dialysis, Outage). signal_id strings stay stable for back-compat.
  • Merged Wildfire Smoke Vulnerability + Wildfire-Attributable Burden into one Smoke Burden signal (#115). The two legacy signals shared 2 of 4 components (asthma + COPD blend, pulmonology access deficit) and cannibalized each other on county pages. The merged signal blends 6 components — fire activity (12.5%) + 30-day max AQI (12.5%) + Stanford Childs/Burke annual mean wildfire-PM2.5 (15%) + smoke-days >55 µg/m³ (15%) + asthma + COPD blend (25%) + pulmonology access deficit (20%) — capturing both acute and chronic exposure plus population vulnerability and response capacity in one number.
  • Total named signals now 8 (was 9). Existing share-view URLs and saved cohorts referencing the legacy signal_ids continue to resolve via a soft-redirect to wildfire_burden.
v1.7.0 · April 2026
  • Added Pesticide × Heat (Farmworker) named signal — USGS PNSP pesticide intensity × NOAA summer max-temp percentile × farmworker-exposure proxy. Surfaces the counties where outdoor ag workers face simultaneous heat and pesticide risk. 3,254 counties scored; top 5 are TX panhandle + AZ ag counties. Anchors HRSA 330(g) migrant/seasonal-worker-center positioning.
  • Added CAFO × Flood × Uninsured Rural named signal — CAFO density × flood exposure × rural Medicaid coverage gap. Pattern after Hurricane Florence's 2018 inundation of 91 NC swine + 36 poultry CAFOs. 3,369 counties scored; top 5 are MN/MO/IA/KY/SD pork-belt counties. Flood exposure currently uses a placeholder formulation pending Lynch/Parks 2025 ingestion (ticket #76).
  • Total named signals now 9 (was 7). gold/compound_signal_scores.parquet rebuilt automatically with both new signals; score distributions are healthy (medians ~50, maxes 92–96, no degenerate columns).
  • Added /platform/rural-ag landing page — public, ungated buyer surface for the agricultural-exposures data depth (HRSA 330(g) centers, ag-state Medicaid, Critical Access Hospitals, state ag-health departments).
v1.6.0 · April 2026
  • Agricultural Exposures foundation (ticket #74) — five new pipelines landed: USDA NASS crops + livestock surveys, USGS PNSP national pesticide use, CA PUR section-grain pesticide application, and a gold CAFO composite index derived from USDA Census + EPA 40 CFR §122.23 animal-unit conversions. Spans 3,222 counties.
  • New CAFO Density section on county pages and the platform Environmental Factors tab — surfaces national rank-percentile, animal units per square mile, dominant livestock species, and the NPDES federal-permit confidence class per state (high / medium / low / state-only).
  • NPDES confidence-class methodology: state-only program counties (IN, ID, AR) are flagged because federal CAFO permit data is structurally absent there; the index runs entirely off USDA Census head counts and is labeled accordingly. Conservative bias on AU totals (USDA doesn't break out the EPA piglet sub-class), so actuals likely run 5–10% higher than reported.
  • Two follow-up compound signals filed but not yet shipped: pesticide_heat_farmworker (#90, all inputs landed; needs compute()) and cafo_flood_uninsured_rural (#91, blocked on flood-exposure ingest from #76).
v1.5.0 · April 2026
  • Added Heat × Outage Vulnerability named signal — the compound that no aggregator surfaces. 30% summer heat × 25% EAGLE-I outage burden × 20% emPOWER DME × 15% CHD/COPD blend × 10% pre-1980 housing AC proxy. Anchored on McBrien & Casey PLOS Med 2026, Stone EST 2023, and the realized 2021 Texas Uri event.
  • New foundation pipeline foundation_doe_eagle_i (annual) ingesting DOE/ORNL EAGLE-I 15-min outage data — customer-hours-out, peak intervals, MCC-normalized fraction. 3,050 of 3,222 US counties.
  • New foundation pipeline foundation_noaa_storm_events (monthly) ingesting NOAA NCEI Storm Events Database — cumulative 2010-present + last-5y windows, 8 health-relevant event-type buckets.
  • Tract bundles auto-extend to 7 named signals via REGISTRY iteration; the new heat-outage signal flows to all 3,222 county tract.json files automatically with is_county_fallback=true (county-only inputs).
v1.3.0 · April 2026
  • Added Heat × Dialysis Vulnerability named signal — pairs summer heat with HHS emPOWER dialysis-dependent Medicare beneficiaries and CDC PLACES CKD prevalence (40/35/25 weights)
  • Added Wildfire-Attributable Burden named signal — uses Stanford Childs/Burke peer-reviewed wildfire-PM2.5 (Harvard Dataverse), distinct from the existing acute Wildfire Smoke Vulnerability signal which uses fire-perimeter proximity
  • New foundation pipeline foundation_cms_empower (monthly) ingesting HHS emPOWER electricity-dependent DME counts at county resolution; counts 1–10 are masked to ≤10 for privacy
  • New foundation pipeline foundation_stanford_wildfire (annual) ingesting Childs/Burke wildfire-PM2.5 (CONUS only)
  • Three of four pillars (Environmental Risk, Disease Burden, SDOH Stress) now resolve at tract-native resolution; Provider Gap uses haversine-weighted adjacency adjustment with σ=20km decay (slice 2 of tract-native scoring)
v1.2.0 · April 2026
  • Opportunity Score expanded to 4 dimensions: added SDOH Stress (15%) alongside Environmental Risk (25%), Disease Burden (35%), Provider Gap (25%)
  • Provider Gap now uses 50/50 population-weighted adjacency adjustment (within-county + neighboring-county blend)
  • Compound signal detection expanded from 3 to 4 dimensions; tiers updated accordingly
  • Validation correlations improved by 0.08–0.10 across all mortality outcomes with the 4-dimension model
  • Cancer incidence data added (NCI State Cancer Profiles, 2,926 counties, 5 sites)
  • Full EJSCREEN environmental justice proximity indicators (9 new fields)
  • Demographics with race/ethnicity composition and pre-1978 housing lead risk proxy
  • Historical trends for 8 CDC PLACES measures (2018–2023)
v1.1.0 · April 2026
  • Replaced max() with dominant/secondary blend (60/40) for multi-condition disease components in named signals
  • Added specialty-specific provider gaps: pulmonology for respiratory and wildfire signals, cardiology for heat health risk
  • Added components_present and coverage_pct metadata to all dimension scores
  • Published sensitivity analysis across weight and threshold parameters
  • Published validation against NHSN, NSSP, and CMS utilization data
  • Added Limitations & Interpretation Guidance section
  • Updated methodology text to use associational rather than causal language
v1.0.0 · March 2026
  • Initial release with 4 named compound signals
  • max() aggregation for multi-condition disease components
  • Total provider density (not specialty-specific)

Citation

How to cite this methodology

Banana Analytics. (2026). Compound Signal Scoring Methodology, v1.8.0. Banana Analytics Technical Documentation. https://banana-analytics.com/methodology

If you reference the methodology or its outputs in published work, community health needs assessments, or grant applications, please cite the version you used. The methodology version is displayed on every report generated by the platform.
