CMS Data Sources
7 datasets registered · 5 live API endpoints · stdlib-only client
🛡️ Public data only — no PHI permitted on this instance.
CMS Sources: 7 in registry
Live API: 5 real-time endpoints
Estimated Volume: ~50M rows per year combined
Update Lag: 18 mo typical CMS release delay
DATA SOURCES · CMS Open Data endpoints powering corpus analytics
CMS Open Data Endpoint Registry
| Dataset / Description | ID | Status | Cadence | Volume | Diligence Use Case |
|---|---|---|---|---|---|
| Medicare Physician & Other Practitioners: annual Medicare payments by NPI (services, beneficiary counts, allowed charges, reimbursement). Primary source for provider-level utilization benchmarking. Key fields: Rndrng_NPI, Tot_Srvcs, Tot_Benes, Tot_Mdcr_Pymt_Amt | 9552 | API LIVE | Annual (calendar year lag ~18 months) | ~10M per year | Market concentration (HHI/CR3), provider regime classification, opportunity ranking |
| Medicare Part D Prescribers: drug prescribing patterns by NPI (drug name, claim count, cost, beneficiary demographics). Critical for specialty pharmacy and pharma services diligence. Key fields: Prscrbr_NPI, Tot_Clms, Tot_Drug_Cst, Tot_Benes | 9548 | API LIVE | Annual | ~25M per year | Specialty pharmacy benchmarking, biosimilar penetration analysis |
| Medicare Inpatient Hospitals: DRG-level charges and payments for inpatient stays. Charge-to-payment ratio reveals pricing aggressiveness vs. Medicare reimbursement norms. Key fields: Rndrng_Prvdr_CCN, DRG_Cd, Avg_Tot_Pymt_Amt, Avg_Mdcr_Pymt_Amt | 9545 | API LIVE | Annual | ~200K per year | Hospital efficiency benchmarking, charge capture analysis, case-mix validation |
| Medicare Outpatient Hospitals: APC-level outpatient charges and payments. Tracks ambulatory surgery center (ASC) migration opportunity and outpatient volume trends. Key fields: Rndrng_Prvdr_CCN, APC_Cd, Avg_Tot_Pymt_Amt | 9546 | API LIVE | Annual | ~500K per year | ASC conversion opportunity analysis, outpatient volume benchmarking |
| Medicare Advantage Enrollment: MA plan enrollment by contract, county, and plan type. Tracks MA penetration growth, which is critical for value-based care deal analysis. Key fields: Contract_ID, Enroll_AMT, Plan_Type | 9549 | API LIVE | Monthly | ~1M per year | MA penetration trend, provider-payer alignment scoring, VBC opportunity sizing |
| CMS Open Payments (Sunshine Act): drug/device manufacturer payments to physicians and teaching hospitals. Identifies conflict-of-interest exposure and pharma alignment. Key fields: physician_npi, total_amount_of_payment_usdollars, name_of_manufacturer | open-payments | REFERENCE | Annual | ~12M per year | Physician referral pattern analysis, conflict screening for acquisitions |
| Medicare Cost Reports (HCRIS): hospital cost report data (total costs, revenues, payer mix, FTE counts). Gold standard for hospital financial diligence. Annual bulk download. Key fields: Prov_Num, Total_Costs, Total_Revenues, Medicaid_Days, Medicare_Days | HCRIS | BULK DL | Annual (18-month lag) | ~6K facilities per year | Payer mix verification, cost structure benchmarking, margin validation |
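As a rough illustration of how a registry ID turns into paged requests, the sketch below builds one URL per page with the standard library only. The base path matches the URL used in the client example on this page. The `size` and `offset` query parameter names and the `page_urls` helper are assumptions for illustration and should be checked against the CMS API documentation.

```python
from urllib.parse import urlencode

# Base path as it appears in this page's client example.
BASE = "https://data.cms.gov/data-api/v1/dataset"


def page_urls(dataset_id, limit=5000, max_pages=3):
    """Yield one request URL per page (hypothetical helper).

    ASSUMPTION: the API paginates with `size` and `offset` query parameters.
    """
    for page in range(max_pages):
        query = urlencode({"size": limit, "offset": page * limit})
        yield f"{BASE}/{dataset_id}/data?{query}"


# Dataset ID copied from the registry table (Physician & Other Practitioners).
urls = list(page_urls("9552", limit=5000, max_pages=3))
for url in urls:
    print(url)
```

Generating the URLs separately from fetching them keeps the pagination arithmetic easy to inspect before any network call is made.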
API CLIENT · rcm_mc/data_public/cms_api_client.py
CMS API Client — Quick Reference
Python Usage
    # rcm_mc/data_public/cms_api_client.py
    from rcm_mc.data_public.cms_api_client import (
        fetch_pages,
        fetch_provider_utilization,
        winsorize_column,
    )

    # Low-level: fetch raw paginated JSON
    rows = fetch_pages(
        "https://data.cms.gov/data-api/v1/dataset/9552/data",
        limit=5000,
        max_pages=3,
        retry_count=3,
    )

    # High-level: physician utilization with normalization
    providers = fetch_provider_utilization(
        state="TX",
        specialty="Cardiology",
        year=2022,
        max_pages=5,
    )
    # → List[Dict] with: npi, provider_name, state, specialty,
    #   total_services, total_beneficiaries, total_payment

    # Winsorize heavy-tailed payment columns
    payments = [p["total_payment"] for p in providers]
    clean = winsorize_column(payments, upper_quantile=0.95)
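For readers curious how a paginated fetch with retry can be written using only the standard library, here is a minimal sketch. It is not the actual `fetch_pages` implementation. The name `fetch_pages_sketch`, the `get_json` and `backoff` parameters, the `size`/`offset` query names, and the stop-on-short-page rule are all assumptions made for illustration.

```python
import json
import time
import urllib.request
from urllib.error import URLError


def _get_json(url, timeout=30):
    """Fetch one URL and decode the JSON body (expected to be a list of rows)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def fetch_pages_sketch(endpoint, limit=5000, max_pages=3, retry_count=3,
                       get_json=_get_json, backoff=0.5):
    """Collect rows page by page, retrying each page on transient errors.

    ASSUMPTIONS: `size`/`offset` pagination, exponential backoff between
    attempts, and a page shorter than `limit` marking the end of the data.
    """
    rows = []
    for page in range(max_pages):
        url = f"{endpoint}?size={limit}&offset={page * limit}"
        for attempt in range(retry_count + 1):
            try:
                batch = get_json(url)
                break
            except (URLError, TimeoutError, ValueError):
                if attempt == retry_count:
                    raise  # retries exhausted: surface the last error
                time.sleep(backoff * 2 ** attempt)
        rows.extend(batch)
        if len(batch) < limit:
            break  # short page: nothing more to fetch
    return rows
```

Passing the fetch function in as `get_json` lets the pagination and retry logic be exercised with a stub instead of a live endpoint.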
Key Functions
| Function | Description |
|---|---|
| fetch_pages(endpoint, ...) | Raw paginated fetch with retry. Returns List[Dict]. |
| fetch_provider_utilization(state, specialty, year, ...) | Physician utilization, filtered and normalized. |
| fetch_geographic_variation(state, ...) | Geographic variation dataset for regional benchmarking. |
| normalize_row(row) | Map CMS column names → internal canonical names. |
| winsorize_column(values, upper_quantile) | Clip heavy tails for comparability analysis. |
| safe_float(value, default) | Null-safe numeric coercion from CMS string fields. |
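The three helper rows at the bottom of the table can be pictured with the sketch below. The function names come from the table, but the bodies are assumptions rather than the client's actual code: the column map is a hypothetical subset built from the field names in the registry table, and the quantile rule (nearest lower rank) is one of several reasonable choices.

```python
import math

# Hypothetical subset of the CMS -> canonical column map.
COLUMN_MAP = {
    "Rndrng_NPI": "npi",
    "Tot_Srvcs": "total_services",
    "Tot_Benes": "total_beneficiaries",
    "Tot_Mdcr_Pymt_Amt": "total_payment",
}
NUMERIC = {"total_services", "total_beneficiaries", "total_payment"}


def safe_float(value, default=0.0):
    """Coerce a CMS string field to float; fall back to `default` on bad input."""
    try:
        result = float(str(value).replace(",", "").strip())
    except (TypeError, ValueError):
        return default
    return result if math.isfinite(result) else default


def normalize_row(row):
    """Rename mapped columns and coerce the numeric ones (assumed behavior)."""
    out = {}
    for cms_name, canonical in COLUMN_MAP.items():
        raw = row.get(cms_name)
        out[canonical] = safe_float(raw) if canonical in NUMERIC else raw
    return out


def winsorize_column(values, upper_quantile=0.95):
    """Clip values above the upper quantile (nearest lower rank) to that value."""
    if not values:
        return []
    ordered = sorted(values)
    idx = min(len(ordered) - 1, int(upper_quantile * (len(ordered) - 1)))
    cap = ordered[idx]
    return [min(v, cap) for v in values]
```

Clipping only the upper tail reflects the `upper_quantile` parameter shown in the table. Whether the real helper also clips the lower tail is not stated on this page.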
ANALYTICS INTEGRATION · how CMS data feeds corpus modules
Integration with Corpus Analytics
CMS data feeds three analytical layers in the corpus:
Market Structure
HHI / CR3 / CR5 concentration by state-specialty. Feeds white-space scoring and M&A opportunity ranking.
→ market_concentration.py
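To make the concentration metrics concrete, the sketch below computes HHI (sum of squared percentage market shares, on the 0 to 10,000 scale) and CR-k (combined share of the k largest providers) for one state-specialty cell. The `concentration` function and its input shape are hypothetical and do not reflect the contents of market_concentration.py. Using Medicare payments as the share basis is also an assumption.

```python
def concentration(payments_by_provider, k=3):
    """Return HHI (0-10,000 scale) and CR-k (0-1) from per-provider payments.

    Hypothetical helper: shares are each provider's fraction of total payments.
    """
    total = sum(payments_by_provider.values())
    if total <= 0:
        return {"hhi": 0.0, f"cr{k}": 0.0}
    pct_shares = sorted(
        (100.0 * v / total for v in payments_by_provider.values()),
        reverse=True,
    )
    hhi = sum(s ** 2 for s in pct_shares)   # sum of squared percentage shares
    crk = sum(pct_shares[:k]) / 100.0       # top-k combined share as a fraction
    return {"hhi": hhi, f"cr{k}": crk}


# Toy cell with three providers holding 50%, 30%, and 20% of payments.
result = concentration({"A": 50.0, "B": 30.0, "C": 20.0}, k=2)
print(result)
```

In this toy cell the squared shares are 2,500, 900, and 400, so the HHI is 3,800 and the top two providers hold 80% of payments.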
Provider Regime
Classify providers as durable_growth / steady / stagnant / declining using multi-year payment trends.
→ regime_classifier.py
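One way to map multi-year payment trends onto the four regime labels is sketched below. The labels are taken from this page. The decision rule (compound annual growth rate plus a check that every year grew) and the 5% and 1% thresholds are assumptions for illustration and are not the logic in regime_classifier.py.

```python
def classify_regime(payments_by_year, growth_cut=0.05, flat_band=0.01):
    """Classify a provider from {year: total_payment} (hypothetical rule set).

    ASSUMED thresholds: CAGR >= growth_cut with growth in every year is
    durable_growth; CAGR above flat_band is steady; CAGR within +/- flat_band
    is stagnant; anything lower is declining.
    """
    years = sorted(payments_by_year)
    if len(years) < 2:
        raise ValueError("need at least two years of payments")
    first, last = payments_by_year[years[0]], payments_by_year[years[-1]]
    if first <= 0:
        raise ValueError("first-year payment must be positive")

    span = years[-1] - years[0]
    cagr = (last / first) ** (1.0 / span) - 1.0
    series = [payments_by_year[y] for y in years]
    grew_every_year = all(b > a for a, b in zip(series, series[1:]))

    if cagr >= growth_cut and grew_every_year:
        return "durable_growth"
    if cagr > flat_band:
        return "steady"
    if cagr >= -flat_band:
        return "stagnant"
    return "declining"


print(classify_regime({2019: 100.0, 2020: 110.0, 2021: 121.0, 2022: 133.1}))
```

Requiring growth in every year separates providers with consistent gains from those whose average growth comes from a single spike.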
Consensus Ranking
Blended opportunity rank across volume, payment, regime, and investability lenses for target screening.
→ cms_provider_ranking.py
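A blended rank across lenses can be illustrated with a weighted mean of per-lens ranks, as in the sketch below. The lens names come from this page. The `consensus_rank` function, the equal default weights, and rank averaging as the blending method are assumptions and may differ from what cms_provider_ranking.py does.

```python
def consensus_rank(scores, weights=None):
    """Blend per-lens ranks into one ordering (hypothetical method).

    scores: {provider: {lens: score}} where a higher score is better.
    Returns [(provider, mean_rank)] sorted best first (lowest mean rank).
    Ties within a lens are broken by input order.
    """
    lenses = sorted({lens for s in scores.values() for lens in s})
    weights = weights or {lens: 1.0 for lens in lenses}  # ASSUMED equal weights
    blended = {p: 0.0 for p in scores}
    for lens in lenses:
        ordered = sorted(
            scores,
            key=lambda p: scores[p].get(lens, float("-inf")),
            reverse=True,
        )
        for rank, provider in enumerate(ordered, start=1):
            blended[provider] += weights[lens] * rank
    total_weight = sum(weights[lens] for lens in lenses)
    return sorted(
        ((p, blended[p] / total_weight) for p in scores),
        key=lambda item: item[1],
    )


toy = {
    "A": {"volume": 3, "payment": 3, "regime": 1, "investability": 3},
    "B": {"volume": 2, "payment": 1, "regime": 3, "investability": 2},
    "C": {"volume": 1, "payment": 2, "regime": 2, "investability": 1},
}
ranking = consensus_rank(toy)
print(ranking)
```

Averaging ranks rather than raw scores keeps a lens with a large numeric scale, such as payment dollars, from dominating the blend.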