Corpus Intelligence CMS Data Sources 2026-04-26 00:39 UTC
CMS Data Sources
7 datasets registered · 5 live API endpoints · stdlib-only client
🛡️ Public data only — no PHI permitted on this instance.
CMS Sources: 7 in registry
Live API: 5 endpoints queried on demand (data still follows each dataset's CMS release cadence)
Estimated Volume: ~50M rows per year combined
Update Lag: 18 mo typical CMS release delay
CMS Open Data endpoints powering corpus analytics
CMS Open Data Endpoint Registry
Medicare Physician & Other Practitioners
Annual Medicare payments by NPI: services, beneficiary counts, allowed charges, and reimbursement. Primary source for provider-level utilization benchmarking.
Key fields: Rndrng_NPI, Tot_Srvcs, Tot_Benes, Tot_Mdcr_Pymt_Amt
ID: 9552 · Status: API LIVE · Cadence: Annual (~18-month lag after the calendar year) · Volume: ~10M rows per year
Diligence use case: market concentration (HHI/CR3), provider regime classification, opportunity ranking

Medicare Part D Prescribers
Drug prescribing patterns by NPI: drug name, claim count, cost, and beneficiary demographics. Critical for specialty pharmacy and pharma services diligence.
Key fields: Prscrbr_NPI, Tot_Clms, Tot_Drug_Cst, Tot_Benes
ID: 9548 · Status: API LIVE · Cadence: Annual · Volume: ~25M rows per year
Diligence use case: specialty pharmacy benchmarking, biosimilar penetration analysis

Medicare Inpatient Hospitals
DRG-level charges and payments for inpatient stays. Comparing submitted charges with payments (the charge-to-payment ratio) shows how aggressively a hospital prices relative to Medicare reimbursement norms.
Key fields: Rndrng_Prvdr_CCN, DRG_Cd, Avg_Tot_Pymt_Amt, Avg_Mdcr_Pymt_Amt
ID: 9545 · Status: API LIVE · Cadence: Annual · Volume: ~200K rows per year
Diligence use case: hospital efficiency benchmarking, charge capture analysis, case-mix validation

Medicare Outpatient Hospitals
APC-level outpatient charges and payments. Tracks outpatient volume trends and the opportunity to migrate procedures to ambulatory surgery centers (ASCs).
Key fields: Rndrng_Prvdr_CCN, APC_Cd, Avg_Tot_Pymt_Amt
ID: 9546 · Status: API LIVE · Cadence: Annual · Volume: ~500K rows per year
Diligence use case: ASC conversion opportunity analysis, outpatient volume benchmarking

Medicare Advantage Enrollment
MA plan enrollment by contract, county, and plan type. Tracks growth in MA penetration, which is critical for value-based care (VBC) deal analysis.
Key fields: Contract_ID, Enroll_AMT, Plan_Type
ID: 9549 · Status: API LIVE · Cadence: Monthly · Volume: ~1M rows per year
Diligence use case: MA penetration trend, provider-payer alignment scoring, VBC opportunity sizing

CMS Open Payments (Sunshine Act)
Payments from drug and device manufacturers to physicians and teaching hospitals. Identifies conflict-of-interest exposure and pharma alignment.
Key fields: physician_npi, total_amount_of_payment_usdollars, name_of_manufacturer
ID: open-payments · Status: REFERENCE · Cadence: Annual · Volume: ~12M rows per year
Diligence use case: physician referral pattern analysis, conflict screening for acquisitions

Medicare Cost Reports (HCRIS)
Hospital cost report data: total costs, revenues, payer mix, and FTE counts. The gold standard for hospital financial diligence, distributed as an annual bulk download.
Key fields: Prov_Num, Total_Costs, Total_Revenues, Medicaid_Days, Medicare_Days
ID: HCRIS · Status: BULK DL · Cadence: Annual (~18-month lag) · Volume: ~6K facilities per year
Diligence use case: payer mix verification, cost structure benchmarking, margin validation
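The registry above can be modeled as plain records, which makes the summary tiles (7 sources, 5 live endpoints, ~50M rows per year) reproducible from the entries themselves. This is a minimal sketch: the `CmsSource` dataclass, its field names, and the `summarize` helper are hypothetical, and the row counts are the rounded estimates from the registry. HCRIS is left out of the row total because its volume is quoted in facilities, not rows.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class CmsSource:
    """Hypothetical registry record mirroring one dataset entry above."""
    name: str
    dataset_id: str
    status: str                   # "API LIVE", "REFERENCE", or "BULK DL"
    cadence: str
    rows_per_year: Optional[int]  # None when volume is not quoted in rows


REGISTRY: List[CmsSource] = [
    CmsSource("Medicare Physician & Other Practitioners", "9552", "API LIVE", "Annual", 10_000_000),
    CmsSource("Medicare Part D Prescribers", "9548", "API LIVE", "Annual", 25_000_000),
    CmsSource("Medicare Inpatient Hospitals", "9545", "API LIVE", "Annual", 200_000),
    CmsSource("Medicare Outpatient Hospitals", "9546", "API LIVE", "Annual", 500_000),
    CmsSource("Medicare Advantage Enrollment", "9549", "API LIVE", "Monthly", 1_000_000),
    CmsSource("CMS Open Payments (Sunshine Act)", "open-payments", "REFERENCE", "Annual", 12_000_000),
    # HCRIS volume is ~6K facilities per year, not rows, so it is excluded from the row total.
    CmsSource("Medicare Cost Reports (HCRIS)", "HCRIS", "BULK DL", "Annual", None),
]


def summarize(registry: List[CmsSource]) -> dict:
    """Recompute the header tiles from the registry records."""
    live = [s for s in registry if s.status == "API LIVE"]
    total_rows = sum(s.rows_per_year for s in registry if s.rows_per_year is not None)
    return {"sources": len(registry), "live_api": len(live), "rows_per_year": total_rows}


print(summarize(REGISTRY))
# → {'sources': 7, 'live_api': 5, 'rows_per_year': 48700000}
```

The 48.7M total is consistent with the ~50M tile once the per-dataset estimates are rounded.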
rcm_mc/data_public/cms_api_client.py
CMS API Client — Quick Reference
# Usage examples for rcm_mc/data_public/cms_api_client.py
from rcm_mc.data_public.cms_api_client import (
    fetch_pages,
    fetch_provider_utilization,
    winsorize_column,
)

# Low-level: fetch raw paginated JSON
rows = fetch_pages(
    "https://data.cms.gov/data-api/v1/dataset/9552/data",
    limit=5000,
    max_pages=3,
    retry_count=3,
)

# High-level: physician utilization with normalization
providers = fetch_provider_utilization(
    state="TX",
    specialty="Cardiology",
    year=2022,
    max_pages=5,
)
# → List[Dict] with: npi, provider_name, state, specialty,
#   total_services, total_beneficiaries, total_payment

# Winsorize the heavy-tailed payment column before comparing providers
payments = [p["total_payment"] for p in providers]
clean = winsorize_column(payments, upper_quantile=0.95)
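Because the client is stdlib-only, the pagination-with-retry pattern behind `fetch_pages` can be sketched with `urllib` alone. This is an illustrative sketch, not the module's code: it assumes the data-api endpoint accepts `size` and `offset` query parameters and returns a JSON array per page, treats a short page as the last one, and backs off exponentially between retries. Here `retry_count` is assumed to count retries after the first attempt, and `backoff_s` and `get_json` are hypothetical parameters (the latter lets the loop run without network access).

```python
import json
import time
import urllib.error
import urllib.parse
import urllib.request
from typing import Any, Callable, Dict, List


def _http_get_json(url: str, timeout: float = 30.0) -> Any:
    """Fetch one URL and decode its JSON body using only the standard library."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def fetch_pages_sketch(
    endpoint: str,
    limit: int = 5000,
    max_pages: int = 3,
    retry_count: int = 3,
    backoff_s: float = 1.0,
    get_json: Callable[[str], Any] = _http_get_json,
) -> List[Dict[str, Any]]:
    """Offset-paginated fetch with retry (assumes `size`/`offset` query params)."""
    rows: List[Dict[str, Any]] = []
    for page in range(max_pages):
        query = urllib.parse.urlencode({"size": limit, "offset": page * limit})
        url = f"{endpoint}?{query}"
        for attempt in range(retry_count + 1):
            try:
                batch = get_json(url)
                break
            except (urllib.error.URLError, TimeoutError, json.JSONDecodeError):
                if attempt == retry_count:
                    raise  # out of retries: surface the last error
                time.sleep(backoff_s * 2 ** attempt)  # exponential backoff
        rows.extend(batch)
        if len(batch) < limit:
            break  # a short page means there is no more data
    return rows
```

Pointing `fetch_pages_sketch` at `https://data.cms.gov/data-api/v1/dataset/9552/data` would issue real requests; the paging parameter names should be checked against the CMS API documentation before relying on them.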
Function · Description
fetch_pages(endpoint, ...): Raw paginated fetch with retry. Returns List[Dict].
fetch_provider_utilization(state, specialty, year, ...): Physician utilization, filtered and normalized.
fetch_geographic_variation(state, ...): Geographic variation dataset for regional benchmarking.
normalize_row(row): Map CMS column names → internal canonical names.
winsorize_column(values, upper_quantile): Clip the heavy upper tail at upper_quantile for comparability analysis.
safe_float(value, default): Null-safe numeric coercion of CMS string fields, falling back to default.
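The last three helpers in the table are small enough to sketch with the standard library. This is a minimal version under stated assumptions: `COLUMN_MAP` is a hypothetical, partial mapping covering only the physician-utilization fields named in the registry, this `normalize_row` returns only mapped keys, and the nearest-rank quantile used as the winsorization cap is one common choice, not necessarily the module's.

```python
import math
from typing import Any, Dict, List, Optional

# Hypothetical, partial mapping: CMS column name -> (canonical name, is_numeric).
COLUMN_MAP = {
    "Rndrng_NPI": ("npi", False),
    "Tot_Srvcs": ("total_services", True),
    "Tot_Benes": ("total_beneficiaries", True),
    "Tot_Mdcr_Pymt_Amt": ("total_payment", True),
}


def safe_float(value: Any, default: Optional[float] = None) -> Optional[float]:
    """Coerce CMS string fields such as "1,234.50" or "" to float; else return default."""
    if value is None:
        return default
    try:
        result = float(str(value).replace(",", "").strip())
    except ValueError:
        return default
    # NaN and infinity are treated as missing values.
    return result if math.isfinite(result) else default


def normalize_row(row: Dict[str, Any]) -> Dict[str, Any]:
    """Rename mapped CMS columns to canonical names, coercing numeric ones."""
    out: Dict[str, Any] = {}
    for cms_name, (canonical, numeric) in COLUMN_MAP.items():
        if cms_name in row:
            raw = row[cms_name]
            out[canonical] = safe_float(raw) if numeric else raw
    return out


def winsorize_column(values: List[float], upper_quantile: float = 0.95) -> List[float]:
    """Clip values above the upper quantile (nearest-rank method) to that quantile."""
    if not values:
        return []
    ordered = sorted(values)
    # Nearest-rank position of the quantile, converted to a 0-based index.
    idx = max(0, math.ceil(upper_quantile * len(ordered)) - 1)
    cap = ordered[idx]
    return [min(v, cap) for v in values]
```

Applied to the `payments` list from the quick-reference example, the winsorization step caps the top 5% of provider payments at the 95th-percentile value, so a handful of outlier NPIs cannot dominate peer comparisons.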
how CMS data feeds corpus modules
Integration with Corpus Analytics

CMS data feeds three analytical layers in the corpus:

Market Structure
HHI, CR3, and CR5 concentration for each state and specialty pair. Feeds white-space scoring and M&A opportunity ranking.
→ market_concentration.py
Provider Regime
Classify providers as durable_growth / steady / stagnant / declining using multi-year payment trends.
→ regime_classifier.py
Consensus Ranking
Blended opportunity rank across volume, payment, regime, and investability lenses for target screening.
→ cms_provider_ranking.py
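The three layers reduce to fairly standard calculations, sketched below for illustration only; this is not the code in `market_concentration.py`, `regime_classifier.py`, or `cms_provider_ranking.py`. HHI is computed on percentage shares (0 to 10,000 scale) and CRk as the combined share of the k largest providers. The regime cutoffs (payment CAGR of 10%, 2%, and -2%) are hypothetical placeholders, and the consensus rank is an equal-weight mean of per-lens ranks, which is one plausible blending rule among several.

```python
from typing import Dict, List, Sequence, Tuple


def concentration(payments_by_provider: Dict[str, float], k: int = 3) -> Tuple[float, float]:
    """Return (HHI, CRk) from payment-based market shares in one state-specialty market."""
    total = sum(payments_by_provider.values())
    if total <= 0:
        raise ValueError("total payments must be positive")
    # Shares in percent, largest first, so HHI tops out at 10,000 for a monopoly.
    shares = sorted((100.0 * p / total for p in payments_by_provider.values()), reverse=True)
    hhi = sum(s * s for s in shares)
    return hhi, sum(shares[:k])


def classify_regime(yearly_payments: Sequence[float]) -> str:
    """Label a provider from the CAGR of its yearly payments (hypothetical cutoffs)."""
    years = len(yearly_payments) - 1
    if years < 1 or yearly_payments[0] <= 0:
        raise ValueError("need two or more years and a positive first-year payment")
    ratio = max(yearly_payments[-1], 0.0) / yearly_payments[0]
    cagr = ratio ** (1.0 / years) - 1.0
    if cagr >= 0.10:
        return "durable_growth"
    if cagr >= 0.02:
        return "steady"
    if cagr > -0.02:
        return "stagnant"
    return "declining"


def consensus_rank(lens_scores: Dict[str, Dict[str, float]]) -> List[str]:
    """Order providers by mean rank across lenses (a higher lens score ranks first)."""
    lenses = sorted({lens for scores in lens_scores.values() for lens in scores})
    mean_rank = {npi: 0.0 for npi in lens_scores}
    for lens in lenses:
        # Providers missing a score on this lens are ranked last on it.
        ordered = sorted(
            lens_scores,
            key=lambda npi: lens_scores[npi].get(lens, float("-inf")),
            reverse=True,
        )
        for rank, npi in enumerate(ordered, start=1):
            mean_rank[npi] += rank / len(lenses)
    # Ties on mean rank are broken alphabetically by NPI for a stable order.
    return sorted(lens_scores, key=lambda npi: (mean_rank[npi], npi))
```

For example, three providers with payments of 50, 30, and 20 give HHI = 2,500 + 900 + 400 = 3,800 and CR3 = 100%. Because the regime lens is categorical, it would need a numeric mapping (also hypothetical, such as durable_growth = 3 down to declining = 0) before it can be blended with the volume, payment, and investability lenses.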