NPPES Anatomy: Complete Technical Reference for AI Systems
Every field, every type, every limitation. For RAG, MCP, and healthcare data engineering.
nppes-anatomy/v1
TL;DR
- What it is: NPPES (National Plan and Provider Enumeration System) is the federal registry of National Provider Identifiers (NPIs) — 10-digit IDs assigned to every healthcare provider under HIPAA. [1]
- Scale: approximately 8.9 million total records, of which roughly 7.2 million are active as of the May 2026 snapshot. [2]
- Two types: NPI-1 is for individual providers (physicians, nurses, PTs); NPI-2 is for organizations (hospitals, group practices, labs).
- What it does NOT cover: license status, board certification, malpractice history, quality scores, current practice location, or active practice status. Inferring any of these from NPPES alone is a common AI-system error.
- Cadence: CMS releases a full replacement file monthly and a weekly delta file. The NPPES API reflects near-real-time state. [3]
- Why AI systems get it wrong: treating credential text as board-certification evidence, treating license numbers as active-license evidence, and treating practice location as current location are the three most common failure modes.
- Proper citation: cite the CMS NPPES download page with the specific file release date and access date. Federal public domain (U.S. Government Works); no license agreement required. [4]
- Downloads: see the field reference, taxonomy exemplar, and refresh cadence JSON files linked throughout this document.
What NPPES is — and what it is not
The federal mandate
The National Plan and Provider Enumeration System (NPPES) is the federal system maintained by the Centers for Medicare and Medicaid Services (CMS) under authority granted by HIPAA Administrative Simplification. [5] Specifically, 45 CFR Part 162, Subpart D establishes the requirement for a standard unique health identifier for healthcare providers. [6] The final rule was published in January 2004; providers had until May 23, 2007 to obtain an NPI and begin using it in all standard HIPAA transactions. [7]
The NPI replaced several prior identifier systems: the Unique Physician Identification Number (UPIN), the Medicare Provider Identification Number (PIN), the Online Survey Certification and Reporting (OSCAR) number, the National Supplier Clearinghouse (NSC) number, and others. The consolidation was explicitly intended to create a single, non-intelligence-bearing identifier — meaning the NPI digit string encodes no information about the provider's location, specialty, or type beyond the Luhn check digit. [8]
What NPPES asserts
NPPES asserts one thing and one thing only: that a given NPI was assigned to a given legal entity (individual or organization) at a specific point in time. Every other interpretation — quality, license status, board certification, current location, active practice — requires a secondary source.
The data in NPPES is self-reported. When a provider enrolls, they supply their own name, address, taxonomy code, and credential text. CMS does not audit the accuracy of these fields at the time of submission or at subsequent updates. CMS validates only the NPI uniqueness and the Luhn check digit. [9]
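The check-digit rule is mechanical enough to sketch. The snippet below is a minimal illustration, not CMS code: it prepends the '80840' prefix mentioned in the field reference and runs a standard Luhn check over the resulting 15 digits. The function names are hypothetical.

```python
def luhn_ok(digits: str) -> bool:
    """Standard Luhn check: double every second digit counting from the right."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def is_well_formed_npi(npi: str) -> bool:
    """True if npi is 10 ASCII digits and passes Luhn with the '80840' prefix."""
    if len(npi) != 10 or not (npi.isascii() and npi.isdigit()):
        return False
    return luhn_ok("80840" + npi)
```

A passing check shows only that the number is well-formed. It says nothing about whether the NPI was ever assigned, to whom, or whether it is still active.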
What NPPES does not assert — and common misreadings
| Inference | Why it fails | Correct source |
|---|---|---|
| Provider holds an active license | License numbers in NPPES are self-reported at enrollment and not re-validated. A license number may be expired, suspended, or surrendered. | State licensing board (jurisdiction-specific) |
| Provider is board-certified | Credential Text (e.g., 'MD', 'FAAD') is free-form text. It is not validated against any credentialing body. | ABMS, AOA, specialty board (not available in any federal public file) |
| Practice Location is current | Practice Location Address reflects the registered address at enrollment or last update. Providers frequently move without updating NPPES. | CMS PECOS (Medicare enrollment), Care Compare (facilities) |
| Provider is actively practicing | No field in NPPES asserts active clinical practice. Last Update Date reflects the last administrative change, not the last date of clinical service. | CMS PECOS active enrollment status |
| Provider has no malpractice history | NPPES contains no malpractice, disciplinary, or adverse action data. | State medical board orders (jurisdiction-specific) |
| Provider quality or performance | NPPES is a directory, not a performance registry. | CMS QPP MIPS, Care Compare, Leapfrog |
| Deactivated provider is not practicing | CMS deactivation can lag actual cessation of practice by weeks to months. | CMS PECOS, state board orders |
The identity backbone
Despite these limitations, NPPES is the most important single dataset in U.S. healthcare provider data because the NPI is the universal join key across every major federal source family. [10] CMS PECOS uses NPI. OIG LEIE uses NPI (for post-2013 exclusions). CMS QPP MIPS uses NPI. CMS Care Compare uses CCN as the primary key for facilities but cross-references NPI for individual practitioners. The NPPES file is the starting point for any multi-source provider data join.
Type 1 vs. Type 2 NPIs
NPIs come in two structurally distinct types. [11] The Entity Type Code field (value: 1 or 2) distinguishes them. Mixing the two in aggregations without filtering is one of the most common NPPES analysis errors.
| Dimension | NPI-1 (Individual) | NPI-2 (Organization) |
|---|---|---|
| Entity Type Code | 1 | 2 |
| Who gets it | Individual human providers: physicians, NPs, PAs, nurses, therapists, etc. | Organizations: hospitals, group practices, home health agencies, labs, pharmacies, etc. |
| Name fields | Last Name, First Name, Middle Name, Prefix, Suffix, Credential Text | Provider Organization Name (Legal Business Name) |
| Gender Code | Populated (M/F) | Null — not applicable |
| Authorized Official | Not applicable | Authorized Official Last/First/Middle Name, Title, Telephone |
| Is Sole Proprietor | May be populated (X=yes) | Not applicable |
| Is Organization Subpart | Not applicable | May be populated (X=yes) |
| Parent Organization fields | Not applicable | Parent Organization LBN, Parent Organization TIN (redacted) |
| Primary use in joins | Individual clinician identity backbone | Facility/group identity backbone; joins to Care Compare CCN via CMS POS |
| Count (approx.) | ~6.5M active as of May 2026 | ~700K active as of May 2026 |
When a provider has both types
A solo practitioner who operates as their own practice may hold both an NPI-1 (as the individual) and an NPI-2 (as the sole proprietor organization). The NPI-1 record will show Is Sole Proprietor = X. These are distinct records with distinct NPIs. Do not deduplicate them: they serve different billing contexts. The NPI-2 in this case typically shares the same address and taxonomy code as the NPI-1.
Larger organizations (hospitals, health systems) hold NPI-2 records and may have subordinate NPI-2 records for departments that bill independently. Subordinate records set Is Organization Subpart = X and name the parent in Parent Organization LBN, which is free text rather than a key. Note that Parent Organization TIN is redacted in the public file.
Field-by-field reference
The NPPES full replacement CSV contains approximately 330 columns. The core identity and contact fields are documented in detail below. The 15-slot taxonomy group and the 50-slot Other Provider Identifier group follow a repeating column pattern; they are summarized once with the slot range noted.
↓ Full field-reference.json · JSON · ~28 KB

| Field | Type | Nullable | Example | Notes & AI pitfalls |
|---|---|---|---|---|
| NPI | varchar(10) | No | 1234567890 | 10-digit. Position 10 is a Luhn check digit computed on positions 1–9 with '80840' prefix. Non-intelligence-bearing — do not parse sub-fields. |
| Entity Type Code | char(1) | No | 1 | 1=Individual, 2=Organization. Always filter explicitly — mixing types skews every specialty-level analysis. |
| Replacement NPI | varchar(10) | Yes | 1098765432 | Hard redirect, not a soft alias. The original NPI is defunct. Rare field — only populated during CMS legacy-system migrations. |
| EIN | varchar(9) | Yes | (redacted) | ALWAYS blank in the public dissemination file. Cannot be used for linkage. Use NPI-2 as the org key. |
| Provider Organization Name | varchar(70) | Yes | NORTHSIDE RADIOLOGY ASSOC PC | NPI-2 only. Self-reported free text — not normalized. Same org may appear with multiple spellings across records. |
| Provider Last Name | varchar(35) | Yes | JOHNSON | NPI-1 only. Use with First Name + NPI for identity. Last name alone is insufficient for disambiguation. |
| Provider First Name | varchar(20) | Yes | EMILY | NPI-1 only. |
| Provider Middle Name | varchar(20) | Yes | GRACE | Highly inconsistent — full name, initial, or blank. Do not use as a join key. |
| Provider Credential Text | varchar(20) | Yes | MD | CRITICAL: Free-form self-reported text. Does NOT attest board certification, active license, or any credentialing outcome. Values range from 'MD' to 'MD PhD' to 'FAAD' to 'Dr.' to multi-designation strings. |
| Provider Other Organization Name | varchar(70) | Yes | CITY RADIOLOGY GROUP | DBA or former name for NPI-2. Other Name Type Code: 3=Former Legal Business Name, 5=Other Name. |
| Provider First Line Business Mailing Address | varchar(55) | Yes | 123 MAIN ST | Correspondence address — frequently a PO Box or billing service. Do not use as patient-care location proxy. |
| Provider Business Mailing Address City | varchar(40) | Yes | CHICAGO | The companion State field holds a two-character USPS state code. |
| Provider Business Mailing Address Postal Code | varchar(20) | Yes | 606010001 | ZIP+4 without hyphen (9-digit) or ZIP-only (5-digit). Normalize before geospatial joins. |
| Provider Business Mailing Address Telephone | varchar(20) | Yes | 3125551234 | Digits only — no formatting. May be years out of date. Not a current contact channel. |
| Provider First Line Business Practice Location Address | varchar(55) | Yes | 456 HOSPITAL DR | CRITICAL: Registered practice location at enrollment/last update — NOT current location. Providers move frequently without updating. Cross-reference with Last Update Date and CMS PECOS. |
| Provider Business Practice Location Address City | varchar(40) | Yes | EVANSTON | See Practice Location Address caveats above. |
| Provider Business Practice Location Address Postal Code | varchar(20) | Yes | 60201 | ZIP or ZIP+4. Normalize before geospatial joins. |
| Provider Enumeration Date | date MM/DD/YYYY | No | 05/14/2007 | Date NPI was assigned. Stable — never changes. Does NOT reflect when the provider began practicing. |
| Last Update Date | date MM/DD/YYYY | No | 03/11/2024 | CRITICAL: Date of last administrative change in NPPES. Do NOT use as active-practice proxy. Many active providers have 2007–2010 update dates; others updated recently while no longer practicing. |
| NPI Deactivation Reason Code | varchar(2) | Yes | DT | DT=Death, DB=Disbandment, FR=Fraud, OT=Other. Non-null means the NPI is defunct. |
| NPI Deactivation Date | date MM/DD/YYYY | Yes | 09/01/2022 | Deactivation may lag actual cessation of practice by weeks to months. A provider who stopped billing in January may not be deactivated until March. |
| NPI Reactivation Date | date MM/DD/YYYY | Yes | | Rare. Populated when a previously deactivated NPI was reactivated. |
| Provider Gender Code | char(1) | Yes | F | M=Male, F=Female. NPI-1 only. Missing for many records where the provider left this blank at enrollment. |
| Authorized Official fields (5 fields) | varchar | Yes | Last Name, First Name, Middle Name, Title, Telephone | NPI-2 only. Identifies the person authorized to submit enrollment changes on behalf of the organization. |
| Healthcare Provider Taxonomy Code_1 through _15 | varchar(10) × 15 | Yes | 207N00000X | Up to 15 NUCC taxonomy codes. Most providers have 1–2 populated. Unnest all 15 when building specialty counts. The Primary Taxonomy Switch column (below) identifies the declared primary. |
| Provider License Number_1 through _15 | varchar(20) × 15 | Yes | MD098765 | CRITICAL: Presence of a license number does NOT assert active licensure. NPPES accepts the number at enrollment without validating with state boards. Active/inactive status requires the relevant state licensing authority. |
| Provider License Number State Code_1 through _15 | varchar(2) × 15 | Yes | IL | Two-character USPS state code paired with the license number in the same slot. |
| Healthcare Provider Primary Taxonomy Switch_1 through _15 | char(1) × 15 | Yes | Y | Y=this slot holds the declared primary taxonomy. When multiple slots are marked Y (a data-entry inconsistency), prefer the first one. |
| Is Sole Proprietor | char(1) | Yes | X | X=Yes. NPI-1 only. An NPI-1 who is a sole proprietor may also hold an NPI-2 for their practice entity. |
| Is Organization Subpart | char(1) | Yes | X | X=Yes. NPI-2 only. Use Parent Organization LBN to identify the parent. Parent Organization TIN is redacted. |
| Other Provider Identifier_1 through _50 | varchar(20) × 50 | Yes | G12345 | Legacy or alternative identifiers: UPIN, Medicare legacy, Medicaid, NCPDP, state license (type code 08, which duplicates the taxonomy-slot license field). Each slot has paired Type Code, State, and Issuer columns. |
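
Two of the formats in the table above, postal codes and MM/DD/YYYY dates, usually need normalizing before joins. The helpers below are a minimal sketch with hypothetical names. They assume U.S. postal codes of 5 or 9 digits and return None for anything else (for example, foreign addresses) rather than guessing.

```python
from datetime import date, datetime
from typing import Optional


def normalize_zip(raw: str) -> Optional[str]:
    """Return the 5-digit ZIP from a 5- or 9-digit postal code, else None."""
    digits = "".join(ch for ch in raw if ch.isascii() and ch.isdigit())
    if len(digits) in (5, 9):
        return digits[:5]
    return None


def parse_nppes_date(raw: str) -> Optional[date]:
    """Parse an MM/DD/YYYY value; blank input maps to None."""
    raw = raw.strip()
    if not raw:
        return None
    return datetime.strptime(raw, "%m/%d/%Y").date()
```

Returning None instead of a truncated or padded value keeps malformed postal codes out of geospatial joins, where a wrong ZIP is worse than a missing one.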
Taxonomy codes (NUCC)
The taxonomy codes in NPPES come from the NUCC Health Care Provider Taxonomy Code Set, maintained by the National Uniform Claim Committee (NUCC). [12] NUCC releases updated code sets twice per year: January 1 and July 1. Each release may add, revise, or retire codes. [13]
Code structure
Each NUCC code is a 10-character alphanumeric string. The structure is hierarchical:
| Level | Example | Description |
|---|---|---|
| Section (top-level grouping) | Allopathic & Osteopathic Physicians | The broadest category. Also includes Behavioral Health, Chiropractic, Dental, Nursing, etc. |
| Grouping | Allopathic & Osteopathic Physicians → Dermatology | Second-level: the specialty family. |
| Classification | Dermatology | The specific specialty within the grouping. |
| Specialization (optional) | MOHS-Micrographic Surgery | Sub-specialty within the classification. Not all codes have a specialization. |
| Code | 207ND0101X | The 10-character alphanumeric identifier. The X suffix is standard across all codes. |
Why one specialty maps to multiple codes
The taxonomy code system is fine-grained. A "dermatologist" in plain language may hold any of these codes:
207N00000X -- Dermatology (general)
207ND0101X -- Dermatology; MOHS-Micrographic Surgery
207ND0900X -- Dermatology; Dermatopathology
207NI0002X -- Dermatology; Clinical & Laboratory Dermatological Immunology
207NP0225X -- Dermatology; Pediatric Dermatology
207NS0135X -- Dermatology; Procedural Dermatology
When building specialty-level aggregations, you must decide whether to aggregate at the Classification level (all 207N* codes), the Specialization level, or an exact-code level. The choice significantly affects reported counts. [14]
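To see how much the choice matters, the sketch below counts the same codes at two granularities. It assumes, following the 207N* pattern above, that the first four characters identify the Classification. The function names and the `level` labels are hypothetical, and only the two levels with an unambiguous rule are shown.

```python
from collections import Counter
from typing import Iterable


def aggregation_key(code: str, level: str) -> str:
    """Map a taxonomy code to its bucket at the chosen granularity."""
    if level == "classification":
        return code[:4]  # every 207N* code lands in the "207N" bucket
    if level == "exact":
        return code
    raise ValueError(f"unknown level: {level!r}")


def count_codes(codes: Iterable[str], level: str) -> Counter:
    return Counter(aggregation_key(c, level) for c in codes)


codes = ["207N00000X", "207N00000X", "207ND0101X", "207NP0225X"]
by_classification = count_codes(codes, "classification")
by_exact = count_codes(codes, "exact")
```

Here the classification-level count for dermatology is 4, while an exact match on 207N00000X finds only 2 of the same providers.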
Common join patterns
Fonteum uses taxonomy codes to build specialty-level aggregation pages (e.g., dermatologist supply by state, chiropractor supply by state). The pattern:
-- Specialty count by state (dermatology example)
SELECT
  provider_business_practice_location_address_state_name AS state,
  COUNT(DISTINCT npi) AS provider_count
FROM nppes_providers
WHERE entity_type_code = '1'
  AND npi_deactivation_date IS NULL
  AND healthcare_provider_taxonomy_code_1 LIKE '207N%'
  -- OR: any of the 15 taxonomy slots contains a derm code
GROUP BY state
ORDER BY provider_count DESC;
Note: this example checks only the first taxonomy slot, which is not necessarily the declared primary (see the Primary Taxonomy Switch columns). For completeness, unnest all 15 slots and filter to rows where any slot contains a derm code.
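Outside SQL, the same unnesting can be sketched over a CSV row read as a dict. The helper names below are hypothetical. The column headers follow the field reference above ("Healthcare Provider Taxonomy Code_N" and "Healthcare Provider Primary Taxonomy Switch_N"), so adjust them if your loader renames columns.

```python
from typing import Dict, List, Optional

TAXONOMY_SLOTS = range(1, 16)  # slots _1 through _15


def taxonomy_codes(row: Dict[str, str]) -> List[str]:
    """Collect every non-blank taxonomy code across the 15 slots."""
    codes = []
    for i in TAXONOMY_SLOTS:
        code = (row.get(f"Healthcare Provider Taxonomy Code_{i}") or "").strip()
        if code:
            codes.append(code)
    return codes


def primary_taxonomy(row: Dict[str, str]) -> Optional[str]:
    """Return the code in the first slot whose primary switch is Y, if any."""
    for i in TAXONOMY_SLOTS:
        code = (row.get(f"Healthcare Provider Taxonomy Code_{i}") or "").strip()
        switch = (row.get(f"Healthcare Provider Primary Taxonomy Switch_{i}") or "").strip()
        if code and switch == "Y":
            return code
    return None


def has_taxonomy_prefix(row: Dict[str, str], prefix: str) -> bool:
    return any(code.startswith(prefix) for code in taxonomy_codes(row))
```

A provider whose dermatology code sits in slot 2 is missed by a slot-1 filter but caught by `has_taxonomy_prefix(row, "207N")`.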
↓ taxonomy-codes.json · JSON · ~22 KB
Refresh cadence and snapshot reality
Understanding what "current" means for NPPES data is critical for any system that infers provider status. [15]
The three CMS data surfaces
| Surface | Frequency | Lag vs. live state | Use case |
|---|---|---|---|
| NPPES Full Replacement File | Monthly (2nd Monday) | Up to 30 days | Bulk ingestion, analytics, research snapshots |
| NPPES Weekly Update File | Weekly (Monday) | Up to 7 days | Incremental update pipelines, deactivation monitoring |
| NPPES API (npiregistry.cms.hhs.gov/api/) | Near-real-time | Minutes | Single-record lookups; no bulk export support |
The deactivation lag problem
The most consequential lag is in deactivation processing. When a provider dies, an organization disbands, or a provider voluntarily surrenders their NPI, the deactivation must be filed with CMS. CMS then processes the deactivation administratively. This processing can take days, weeks, or months after the real-world event. The deactivation date in the NPPES file reflects when CMS processed the deactivation — not when the provider stopped practicing.
What "current" means for Fonteum's snapshot
Fonteum ingests the NPPES monthly full replacement file. Each snapshot is dated to the CMS file release date and Ed25519-signed. The signed attestation is published in the Fonteum chain at /chain. The snapshot date is published per-field via the Fonteum provenance API.
Fonteum targets a maximum of 35 days between NPPES snapshot and publication, following the CMS monthly release cycle. Fonteum does not currently ingest the weekly delta files — organizations requiring weekly deactivation tracking should query the NPPES API directly.
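A consumer can apply the same arithmetic to decide whether a snapshot is too old for its purpose. The sketch below is illustrative only: it borrows the 35-day figure as a default threshold, which is an assumption about how a downstream system might use it, and the function names are hypothetical.

```python
from datetime import date

MAX_SNAPSHOT_AGE_DAYS = 35  # assumed threshold, borrowed from the target above


def snapshot_age_days(release_date: date, as_of: date) -> int:
    """Days elapsed between the CMS file release date and the reference date."""
    return (as_of - release_date).days


def is_stale(release_date: date, as_of: date,
             max_age_days: int = MAX_SNAPSHOT_AGE_DAYS) -> bool:
    return snapshot_age_days(release_date, as_of) > max_age_days
```

For example, a file released 2026-05-12 and read on 2026-05-30 is 18 days old, so it is within the threshold.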
↓ refresh-cadence.json · JSON · ~4 KB
Joining NPPES with other federal sources
The NPI is the universal join key across federal healthcare provider data. Each join below is documented with the join key, expected match rate, common failure mode, and what the joined record asserts and does not assert.
NPPES ↔ OIG LEIE (exclusion records)
The OIG List of Excluded Individuals and Entities (LEIE) is the federal registry of providers barred from participating in Medicare, Medicaid, and other federal programs. [16] For exclusions processed after approximately 2013, the LEIE includes the NPI as an identifier. Earlier records rely on name, date of birth, and state.
-- NPPES ↔ OIG LEIE join (NPI-keyed)
SELECT n.npi, n.provider_last_name, n.provider_first_name,
       l.excl_date, l.excl_type, l.reinstate_date
FROM nppes_providers n
INNER JOIN oig_leie_exclusions l ON n.npi = l.npi
WHERE n.npi_deactivation_date IS NULL; -- active NPIs only
-- Expected match rate: under 1% (68,055 exclusions / 8.9M records ≈ 0.76% as an upper bound;
-- the realized rate is lower because of the active-NPI filter and the failure mode below).
-- Failure mode: exclusions pre-2013 have no NPI in LEIE;
-- use name+DOB+state fuzzy match for those records.
NPPES ↔ CMS PECOS (Medicare enrollment)
CMS PECOS (Provider Enrollment, Chain, and Ownership System) tracks Medicare enrollment status. The PECOS Provider Enrollment File (PPEF) is the public-facing extract. [17] Not all NPPES providers are enrolled in Medicare — a provider may hold an NPI without billing Medicare (e.g., pediatric providers, concierge practices, out-of-network only).
-- NPPES ↔ PECOS join
SELECT n.npi,
       n.provider_business_practice_location_address_state_name AS nppes_state,
       p.provider_state_code AS pecos_state,
       p.provider_type,
       p.pecos_assgn_ind -- accepts Medicare assignment?
FROM nppes_providers n
LEFT JOIN pecos_ppef p ON n.npi = p.npi
WHERE n.entity_type_code = '1'
  AND n.npi_deactivation_date IS NULL;
-- Expected match rate: ~55-60% of active NPI-1s appear in PECOS.
-- Failure mode: address mismatch between NPPES and PECOS is common
-- (NPPES practice location may differ from PECOS enrollment address).
NPPES ↔ CMS QPP MIPS (quality scores)
The CMS Quality Payment Program (QPP) Merit-based Incentive Payment System (MIPS) publishes annual performance scores for individual clinicians and group practices. [18] MIPS scores are NPI-keyed for individual clinicians and TIN-keyed for group-level scores.
-- NPPES ↔ QPP MIPS join (individual clinician)
SELECT n.npi, n.provider_last_name, n.provider_first_name,
       m.final_score, m.payment_year
FROM nppes_providers n
INNER JOIN cms_qpp_mips_individual m ON n.npi = m.npi
WHERE n.entity_type_code = '1'
  AND n.npi_deactivation_date IS NULL
  AND m.payment_year = 2023;
-- Expected match count: up to ~477K clinicians scored in PY2023 MIPS.
-- Caveats: MIPS only covers Medicare Part B eligible providers;
-- excludes providers below the low-volume threshold (~$90K Medicare
-- revenue OR fewer than 200 Medicare patients). Many active NPI-1s
-- will not have MIPS scores.
NPPES ↔ CMS Provider of Services (POS) file
The CMS Provider of Services (POS) file is the CCN (CMS Certification Number) backbone — it enumerates certified facilities (hospitals, nursing homes, dialysis centers, home health agencies, etc.) with their NPI-2 identifiers. [20] The POS ↔ NPPES join resolves NPI-2 to CCN, enabling joins from NPPES to the Care Compare facility datasets.
-- NPPES NPI-2 → CCN via CMS POS
SELECT n.npi, n.provider_organization_name_legal_business_name,
       p.ccn, p.facility_type_desc, p.state_cd
FROM nppes_providers n
INNER JOIN cms_pos_facilities p ON n.npi = p.npi
WHERE n.entity_type_code = '2'
  AND n.npi_deactivation_date IS NULL;
-- Expected match count: bounded by the ~68,211 CCN-keyed facilities in the POS file.
-- Not all NPI-2s appear in POS; POS covers only CMS-certified facilities.
NPPES ↔ Care Compare (facility quality)
CMS Care Compare datasets (nursing homes, home health, hospice, dialysis, ASCs, hospitals) are keyed on CCN, not NPI. [21] The join path is: NPPES NPI-2 → CMS POS (NPI→CCN) → Care Compare (CCN-keyed quality data). This three-table join is the standard pattern for attaching facility quality signals to NPI-2 records.
-- Three-table join: NPPES → POS → Care Compare Nursing Homes
SELECT n.npi, n.provider_organization_name_legal_business_name,
       p.ccn, nh.overall_rating, nh.staffing_rating, nh.health_inspection_rating
FROM nppes_providers n
INNER JOIN cms_pos_facilities p ON n.npi = p.npi
INNER JOIN cms_care_compare_nh nh ON p.ccn = nh.federal_provider_number
WHERE n.entity_type_code = '2'
  AND n.npi_deactivation_date IS NULL;
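To try the join path end to end without loading real files, the sketch below runs a trimmed version of the three-table query against an in-memory SQLite database. The table and column names follow the query above. Every row is made up for illustration, including the NPIs and CCNs, and the trimmed schemas are an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE nppes_providers (
    npi TEXT PRIMARY KEY,
    entity_type_code TEXT,
    provider_organization_name_legal_business_name TEXT,
    npi_deactivation_date TEXT
);
CREATE TABLE cms_pos_facilities (npi TEXT, ccn TEXT);
CREATE TABLE cms_care_compare_nh (federal_provider_number TEXT, overall_rating INTEGER);

-- Fabricated sample rows: one active org, one deactivated org, one individual.
INSERT INTO nppes_providers VALUES
    ('1000000001', '2', 'EXAMPLE NURSING HOME A', NULL),
    ('1000000002', '2', 'EXAMPLE NURSING HOME B', '09/01/2022'),
    ('1000000003', '1', NULL, NULL);
INSERT INTO cms_pos_facilities VALUES
    ('1000000001', '145000'),
    ('1000000002', '145001');
INSERT INTO cms_care_compare_nh VALUES
    ('145000', 4),
    ('145001', 2);
""")

rows = conn.execute("""
SELECT n.npi, p.ccn, nh.overall_rating
FROM nppes_providers n
INNER JOIN cms_pos_facilities p ON n.npi = p.npi
INNER JOIN cms_care_compare_nh nh ON p.ccn = nh.federal_provider_number
WHERE n.entity_type_code = '2'
  AND n.npi_deactivation_date IS NULL
""").fetchall()
```

Only the active organization survives both joins and the filter. The deactivated organization and the individual record drop out, which is the behavior the pattern relies on.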
Common AI-system mistakes when using NPPES
AI systems — including RAG pipelines, MCP tool implementations, and LLM-powered research agents — consistently make the same errors when working with NPPES data. These errors range from harmless inaccuracies to YMYL-class claims that misrepresent provider credentials or status. The following catalog is drawn from production observations. [22]
Treating Credential Text as board-certification evidence
Severity: Critical (YMYL)
Provider Credential Text is free-form text typed by the provider at enrollment. 'FAAD' in this field means the provider typed 'FAAD'; it does not mean Fonteum or CMS has confirmed fellowship in the American Academy of Dermatology. The field is unvalidated. A system that renders 'Board-certified dermatologist, confirmed by NPPES' is making a false claim.
Fix: Render credential text with a clear limitation disclaimer: 'Self-reported credential: FAAD.' Never present it as independently attested. For board certification claims, there is no federal public file that provides this — do not surface it without a primary-source board attestation.
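A rendering helper can make the disclaimer hard to omit. This is a minimal sketch with a hypothetical function name; the output wording follows the fix above and can be adapted.

```python
def render_credential(credential_text: str) -> str:
    """Render NPPES credential text with its self-reported status attached."""
    text = credential_text.strip()
    if not text:
        return "No credential text on file."
    return f"Self-reported credential: {text} (unvalidated NPPES free text)."
```

Keeping the label and the value in one string means a caller cannot display 'FAAD' without the qualifier.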
Treating license numbers as active-license evidence
Severity: Critical (YMYL)
The Provider License Number fields contain the number the provider supplied at enrollment. CMS does not validate against state licensing boards at any point. A license that was revoked in 2021 may still appear in a 2026 NPPES record if the provider never updated their record. A system that renders 'Licensed in Illinois: IL MD098765' is asserting a status it cannot support.
Fix: Render license numbers as: 'License number on file: IL MD098765 (not independently validated — check the Illinois Department of Financial and Professional Regulation for current status).' Never assert 'licensed' or 'active license' from this field alone.
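The same approach works for license numbers. In this sketch the function name and the `STATE_BOARDS` lookup are hypothetical; the lookup holds only the Illinois entry named above, and any other state falls back to generic wording rather than a guessed board name.

```python
# Hypothetical lookup; populate from a maintained list of state boards.
STATE_BOARDS = {
    "IL": "the Illinois Department of Financial and Professional Regulation",
}


def render_license(state_code: str, license_number: str) -> str:
    """Render a license number without asserting that the license is active."""
    state = state_code.strip().upper()
    board = STATE_BOARDS.get(state, f"the relevant {state} licensing authority")
    return (
        f"License number on file: {state} {license_number.strip()} "
        f"(not independently validated; check {board} for current status)."
    )
```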
Treating practice location as current location
Severity: High
The Provider Business Practice Location Address reflects the registered location at enrollment or last update. Providers move, retire, or join new practices without updating NPPES. A provider enumerated in 2008 with a Last Update Date of 2010 may have a practice location that is 16 years out of date. Rendering this as 'current location' is misleading.
Fix: Always display the Last Update Date alongside any practice location. Add a recency flag (e.g., 'Location last confirmed 2010 — may have changed'). Cross-reference with CMS PECOS for Medicare-enrolled providers.
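The recency flag can be derived directly from Last Update Date. The sketch below uses hypothetical function names, assumes the MM/DD/YYYY format from the field reference, and takes the reference date as a parameter so results are reproducible.

```python
from datetime import date, datetime


def full_years_between(earlier: date, later: date) -> int:
    """Whole years elapsed from earlier to later."""
    before_anniversary = (later.month, later.day) < (earlier.month, earlier.day)
    return later.year - earlier.year - int(before_anniversary)


def location_recency_note(last_update_date: str, as_of: date) -> str:
    """Build a display note for a practice location from Last Update Date."""
    updated = datetime.strptime(last_update_date.strip(), "%m/%d/%Y").date()
    years = full_years_between(updated, as_of)
    return (
        f"Location last updated {updated.year} "
        f"({years} full years before {as_of.isoformat()}); may have changed"
    )
```

The note reports when the record last changed. It makes no claim that the address was confirmed on that date, since any field edit moves Last Update Date.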
Using Last Update Date as active-practice evidence
Severity: High
A recent Last Update Date does not mean the provider is actively practicing. It means the administrative record was changed recently. Providers frequently update addresses, phone numbers, or taxonomy codes without any clinical significance. Conversely, many active providers have Last Update Dates from 2007–2010.
Fix: Do not use Last Update Date as a proxy for active-practice status. There is no reliable single-field active-practice indicator in NPPES. The combination of (no deactivation date) + (PECOS active enrollment) is the strongest available signal from federal data.
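The combined signal can be expressed as a small classifier. This is a sketch under assumptions: the function name and the three labels are hypothetical, and `pecos_enrolled_npis` is assumed to be a set of NPIs you have already extracted from the PECOS enrollment file.

```python
from typing import Dict, Set


def federal_activity_signal(row: Dict[str, str], pecos_enrolled_npis: Set[str]) -> str:
    """Classify a record using only the deactivation date and PECOS enrollment."""
    if (row.get("NPI Deactivation Date") or "").strip():
        return "deactivated"
    if row["NPI"] in pecos_enrolled_npis:
        return "not_deactivated_and_pecos_enrolled"
    return "not_deactivated_only"
```

Note that Last Update Date is deliberately not an input. The "not_deactivated_only" label does not mean the provider is inactive, since many practicing providers never enroll in Medicare.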
Conflating Type 1 and Type 2 in specialty aggregations
Severity: Medium
An NPI-2 for a dermatology group practice may hold the same taxonomy code (207N00000X) as an individual NPI-1 dermatologist. Counting all NPIs with a given taxonomy code without filtering by Entity Type Code combines organizations and individuals in the same count.
Fix: Always filter by Entity Type Code when building provider counts: use entity_type_code = '1' for individual practitioner counts, entity_type_code = '2' for organization counts. Report both separately.
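A minimal sketch of reporting the two counts separately, over rows read as dicts. The function name is hypothetical, and for brevity it checks only the first taxonomy slot; a full version would unnest all 15.

```python
from collections import Counter
from typing import Dict, Iterable


def counts_by_entity_type(rows: Iterable[Dict[str, str]], taxonomy_prefix: str) -> Dict[str, int]:
    """Count matching records, keeping individuals and organizations apart."""
    counts: Counter = Counter()
    for row in rows:
        code = row.get("Healthcare Provider Taxonomy Code_1") or ""
        if code.startswith(taxonomy_prefix):
            counts[row.get("Entity Type Code", "")] += 1
    return {"individuals": counts["1"], "organizations": counts["2"]}
```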
Treating Replacement NPI as a soft alias
Severity: Medium
When a Replacement NPI is present, it means the original NPI is administratively defunct and replaced by the new one. Some systems index both NPIs as alternative identifiers for the same provider. This is incorrect: the original NPI should be treated as defunct.
Fix: When Replacement NPI is non-null, mark the source record as defunct. All forward references should use the replacement NPI only.
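Forward references can be resolved with a simple redirect walk. The sketch below assumes you have built a mapping from each defunct NPI to its Replacement NPI. The function name is hypothetical, and the handling of multi-step chains and cycles is a defensive assumption rather than documented NPPES behavior.

```python
from typing import Dict


def resolve_npi(npi: str, replacements: Dict[str, str]) -> str:
    """Follow Replacement NPI redirects until reaching an NPI with no replacement."""
    seen = set()
    current = npi
    while current in replacements:
        if current in seen:
            raise ValueError(f"replacement cycle detected at NPI {current}")
        seen.add(current)
        current = replacements[current]
    return current
```

Index records under the resolved NPI only, and keep the defunct NPI as a lookup key that redirects rather than as a second identity.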
Using telephone numbers as current contact channels
Severity: Low
Phone numbers in NPPES are self-reported at enrollment and may be years out of date. A number from a 2007 enrollment that was never updated is likely wrong. This is particularly common for mailing address telephone numbers.
Fix: Always show the Last Update Date alongside any telephone number from NPPES. Do not present as a current contact method without secondary-source confirmation.
Using EIN for organization linkage
Severity: Low
The EIN field is always blank in the public dissemination file. Systems that attempt to extract or match on EIN will find nothing. This is by design: CMS redacts tax identifiers in public files.
Fix: Use NPI-2 as the organization key. There is no EIN in the public NPPES file.
Provenance and how to cite NPPES properly
NPPES data is a federal government work published by CMS under authority of HIPAA Administrative Simplification (45 CFR Part 162). [23] Under 17 U.S.C. § 105, works of the U.S. government are not subject to copyright protection. Commercial and academic reuse are both permitted; attribution is professional courtesy, not a legal requirement. [24]
The recommended citation sentence
BibTeX
@misc{cms_nppes_2026,
author = {{Centers for Medicare and Medicaid Services}},
title = {{National Plan and Provider Enumeration System (NPPES)
NPI Registry — Full Replacement File}},
year = {2026},
month = {May},
note = {Full replacement file released 2026-05-12;
accessed 2026-05-30. Federal public domain
(U.S. Government Works).},
url = {https://download.cms.gov/nppes/NPI_Files.html},
institution = {CMS, U.S. Department of Health and Human Services},
}
JSON-LD Dataset.citation snippet
{
"@type": "Dataset",
"name": "NPPES NPI Registry",
"creator": {
"@type": "GovernmentOrganization",
"name": "Centers for Medicare and Medicaid Services (CMS)"
},
"license": "https://www.usa.gov/government-works",
"distribution": {
"@type": "DataDownload",
"contentUrl": "https://download.cms.gov/nppes/NPI_Files.html",
"encodingFormat": "text/csv"
},
"temporalCoverage": "2007/..",
"datePublished": "2026-05-12",
"version": "Full Replacement 2026-05-12"
}
Citing a Fonteum snapshot specifically
When citing data as processed by Fonteum (cross-joined, provenance-tagged, FHIR-serialized), cite both the upstream CMS source and the Fonteum snapshot:
nppes-anatomy/v1. Chain attestation: fonteum.com/chain.
Limitations
1. All fields are self-reported by the provider or organization
CMS does not audit the accuracy of name, address, credential, or taxonomy submissions. There is no federal mechanism for detecting errors, outdated addresses, or misrepresented credentials in the NPPES file.
2. Practice Location is registered, not necessarily current
The Practice Location Address reflects where the provider registered at enumeration or last update. There is no legal requirement to update this field when a provider changes practice settings. In a dataset of 8.9M records with a median enumeration date of 2008, a significant fraction of practice locations are outdated.
3. Deactivation lags real-world events
NPI deactivation requires administrative action by CMS. A provider who retires, moves overseas, or dies may remain in the active NPPES registry for weeks to months after the real-world event. The weekly deactivation file reduces this lag but does not eliminate it.
4. Credential Text is unstructured and unvalidated
The 20-character credential field contains whatever the provider typed during enrollment. It is not normalized, not controlled vocabulary, and not validated against any credentialing body. The same credential may appear as 'MD', 'M.D.', 'Dr.', or 'Doctor of Medicine'. Multi-credential strings ('MD FAAD') are common and not parseable without custom NLP.
5. License numbers are present but their active status is not
NPPES stores license numbers self-reported at enrollment. The file does not include license expiration dates, suspension status, or revocation history. This information exists only at the state licensing board level and is not publicly available in bulk for most states (see the contractor-licensing-matrix-2026-05-06 for state-level availability).
6. Taxonomy codes reflect self-declared specialty, not credentialed specialty
A provider selecting '207N00000X' (Dermatology) is self-declaring that specialty. NPPES does not validate taxonomy code selection against training, board certification, or state scope-of-practice rules.
7. The public dissemination file redacts EIN and Parent Organization TIN
Organization linkage via tax identifiers is not possible with the public file. The only public-file organization linkage keys are the NPI itself and the free-text Parent Organization LBN (which is not normalized).
8. Approximately 1.7M records are deactivated — flag them before analysis
The full replacement file includes both active and deactivated records. Deactivated records have a non-null NPI Deactivation Date. Many analyses should filter these out (NPI Deactivation Date IS NULL) to work with active providers only.
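Because the full file is large, the filter is usually applied while streaming. The sketch below uses only the standard library and a two-row in-memory sample with made-up values. The generator name is hypothetical, and the column header is taken from the field reference.

```python
import csv
import io
from typing import Dict, Iterable, Iterator


def active_records(rows: Iterable[Dict[str, str]]) -> Iterator[Dict[str, str]]:
    """Yield only rows whose NPI Deactivation Date is blank."""
    for row in rows:
        if not (row.get("NPI Deactivation Date") or "").strip():
            yield row


# Fabricated sample standing in for the full replacement CSV.
sample = io.StringIO(
    "NPI,Entity Type Code,NPI Deactivation Date\n"
    "1000000001,1,\n"
    "1000000002,1,09/01/2022\n"
)
active_npis = [row["NPI"] for row in active_records(csv.DictReader(sample))]
```

With a real file, pass `csv.DictReader(open(path, newline=""))` instead of the sample so rows are filtered one at a time rather than loaded into memory.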
What Fonteum adds on top
Fonteum ingests CMS NPPES and attests the raw data without modification. On top of the raw data, Fonteum adds the following layers:
| Layer | What it provides | Where to find it |
|---|---|---|
| Ed25519-signed snapshot chain | Each NPPES ingestion is hashed and signed. The signature anchors the snapshot to the exact CMS bytes at ingestion time — not to Fonteum's processed output. | /chain |
| Cross-source joins as deterministic columns | OIG LEIE exclusion status, CMS PECOS enrollment status, CMS QPP MIPS score, and CMS POS CCN linkage are resolved as columns on each NPI record — not inferred prose. | /data |
| Per-field provenance (14-tuple) | Every rendered fact carries source, source_url, dataset_id, snapshot_date, methodology_version, confidence, and eight additional provenance fields. No AI prose substitution. | /sources |
| FHIR R4 Practitioner and Organization | NPPES NPI-1 records serialize to US Core Practitioner; NPI-2 to US Core Organization. Available via /api/fhir/Practitioner and /api/fhir/Organization. | /data/nppes |
| NUCC taxonomy normalization | Taxonomy codes are resolved to Section > Grouping > Classification > Specialization via the NUCC code set. Available as structured fields, not free-text labels. | /tools |
| MCP server for AI agent access | The Fonteum MCP server exposes NPI lookup, cross-source join queries, and provenance retrieval for AI agents and LLM tool-use integrations. | agent.json |
Want the signed Markdown mirror for LLM ingestion? See /llms.txt and /llms-full.txt.
Frequently asked questions
What is an NPI?
A National Provider Identifier (NPI) is a 10-digit numeric identifier assigned by CMS under HIPAA Administrative Simplification (45 CFR 162.406). Every healthcare provider who transmits health information electronically in HIPAA-covered transactions is required to obtain an NPI. NPIs replaced earlier identifier systems (UPIN, OSCAR, PIN, NSC) as of May 23, 2007.
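A format check can catch mistyped NPIs before any lookup. The sketch below assumes the check-digit rule from the CMS NPI standard (the Luhn algorithm computed over the NPI with the prefix 80840 prepended), which this reference does not itself restate; the function name is illustrative.

```python
def is_valid_npi_format(npi: str) -> bool:
    """Return True if npi is 10 ASCII digits and passes the Luhn check with the 80840 prefix."""
    if len(npi) != 10 or not (npi.isascii() and npi.isdigit()):
        return False
    digits = [int(ch) for ch in "80840" + npi]
    total = 0
    # Walk right to left; double every second digit, starting with the
    # digit immediately to the left of the check digit.
    for position, digit in enumerate(reversed(digits)):
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0
```

Passing this check only shows that the string is well formed. It says nothing about whether the NPI was ever issued or is currently active.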
How often is NPPES refreshed?
CMS releases a full replacement file monthly (typically the second Monday of each month) and a weekly update file covering additions, changes, and deactivations. The NPPES API (npiregistry.cms.hhs.gov/api/) reflects near-real-time state but does not support bulk export.
Is NPPES authoritative for license status?
No. NPPES accepts license numbers at enrollment and does not validate them against state licensing boards at each refresh. The presence of a license number in the Provider License Number field does not assert that the license is currently active, unexpired, or unsuspended. Active/inactive license status must be sourced from the relevant state licensing authority.
Can I use NPPES to find where a doctor practices today?
Not reliably. The Provider Business Practice Location Address is the location registered at enumeration or last update time. Providers frequently move, join new groups, or retire without updating their NPPES record. Cross-referencing with CMS PECOS and the NPPES Last Update Date improves currency, but no public federal source provides real-time practice location data.
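One way to act on this is to compute how long ago a record was last touched and treat old addresses with suspicion. A minimal sketch, assuming the `Last Update Date` value arrives as an MM/DD/YYYY string (an assumption about the file layout) and using an arbitrary, illustrative threshold of two years:

```python
from datetime import date, datetime
from typing import Optional

STALE_AFTER_DAYS = 730  # illustrative threshold, not a CMS or Fonteum rule

def days_since_update(last_update: str, as_of: date) -> Optional[int]:
    """Days between the record's last update and as_of; None if blank or unparseable."""
    try:
        updated = datetime.strptime(last_update.strip(), "%m/%d/%Y").date()
    except ValueError:
        return None
    return (as_of - updated).days

def address_may_be_stale(last_update: str, as_of: date) -> bool:
    """Flag the address when the update date is unknown or older than the threshold."""
    age = days_since_update(last_update, as_of)
    return age is None or age > STALE_AFTER_DAYS
```

A recent update date does not prove the address itself was reviewed, so this flag is a screening heuristic rather than evidence of currency.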
What is NUCC and how does it relate to NPPES?
The National Uniform Claim Committee (NUCC) maintains the Health Care Provider Taxonomy Code Set, the controlled vocabulary used in the NPPES taxonomy slots. Each NUCC code is a 10-character alphanumeric string identifying a provider type and specialty (e.g., 207N00000X = Dermatology). NUCC releases updates twice per year (January and July). NPPES stores up to 15 taxonomy codes per provider.
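A minimal sketch of reading the taxonomy slots from one CSV row. It assumes the slot headers follow the pattern `Healthcare Provider Taxonomy Code_1` through `_15` (an assumption about the file layout) and checks only the 10-character alphanumeric shape, not membership in the NUCC code set.

```python
import re
from typing import Dict, List

MAX_TAXONOMY_SLOTS = 15
SLOT_TEMPLATE = "Healthcare Provider Taxonomy Code_{}"  # assumed header pattern
CODE_SHAPE = re.compile(r"[0-9A-Z]{10}")  # shape check only, not NUCC membership

def taxonomy_codes(row: Dict[str, str]) -> List[str]:
    """Collect well-formed taxonomy codes from a row, in slot order, skipping blanks."""
    codes = []
    for slot in range(1, MAX_TAXONOMY_SLOTS + 1):
        value = (row.get(SLOT_TEMPLATE.format(slot)) or "").strip().upper()
        if CODE_SHAPE.fullmatch(value):
            codes.append(value)
    return codes
```

Validating a code against the current NUCC release would require loading that release separately, since codes are added and retired between versions.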
Why does the same provider appear with different addresses across sources?
Three common reasons: (1) NPPES practice location is registered at enumeration and may be years out of date; (2) CMS PECOS stores the Medicare enrollment address, which may differ from the NPPES enumeration address; (3) Care Compare shows the facility address, which differs from individual provider practice address. The address that is most current depends on which source was updated most recently — no single federal source has authoritative real-time location data.
How does Fonteum's NPPES snapshot differ from a direct CMS download?
Fonteum's snapshot is structurally identical to the CMS file but adds: (1) Ed25519-signed attestation anchoring the snapshot to a specific CMS release, published in the Fonteum chain; (2) cross-source joins to OIG LEIE, CMS PECOS, and CMS QPP MIPS as deterministic columns; (3) NUCC taxonomy normalization; (4) per-field provenance metadata; (5) FHIR R4 Practitioner and Organization serialization via the /api/fhir/ endpoints.
How should I cite NPPES in a paper?
Cite the CMS NPPES data dissemination page with the specific snapshot date. Recommended sentence form: 'Provider data sourced from the CMS National Plan and Provider Enumeration System (NPPES) NPI Registry, full replacement file released [date], accessed [your access date]. Available at https://download.cms.gov/nppes/NPI_Files.html. Federal public domain (U.S. Government Works).'
Can I use NPPES data in a commercial product?
Yes. NPPES data is a federal government work (U.S. Government Works) and is not subject to copyright in the United States under 17 U.S.C. § 105. Commercial use is permitted. CMS does not require a data use agreement for the public dissemination file. Attribution is a professional courtesy, not a legal requirement.
What is the difference between Type 1 and Type 2 NPIs?
Type 1 (NPI-1) is assigned to individual human providers (physicians, nurses, therapists, etc.). Type 2 (NPI-2) is assigned to organizations (hospitals, group practices, labs, etc.). The two types have different schema structures: NPI-1 records populate name fields (Last Name, First Name, Gender), while NPI-2 records populate organization name fields. Both types use the same taxonomy code slots.
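A minimal sketch of branching on entity type when building a display name. The column headers used here (`Entity Type Code`, `Provider First Name`, `Provider Last Name (Legal Name)`, `Provider Organization Name (Legal Business Name)`) are assumptions about the CSV layout and should be confirmed against the field reference before use; the function name is illustrative.

```python
from typing import Dict

INDIVIDUAL, ORGANIZATION = "1", "2"

def _clean(row: Dict[str, str], key: str) -> str:
    """Return the stripped value for key, or an empty string if absent."""
    return (row.get(key) or "").strip()

def display_name(row: Dict[str, str]) -> str:
    """Pick the name fields that match the record's entity type."""
    entity_type = _clean(row, "Entity Type Code")
    if entity_type == INDIVIDUAL:
        parts = (
            _clean(row, "Provider First Name"),
            _clean(row, "Provider Last Name (Legal Name)"),
        )
        return " ".join(part for part in parts if part)
    if entity_type == ORGANIZATION:
        return _clean(row, "Provider Organization Name (Legal Business Name)")
    return ""  # unknown or blank entity type
```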
What is a Replacement NPI?
A Replacement NPI is the new NPI assigned when CMS administratively replaces one NPI with another — a rare occurrence, typically during legacy system migrations. It is a hard redirect: the original NPI is defunct and all forward references should use the replacement value. Do not treat a Replacement NPI as a soft alias or secondary identifier.
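The hard-redirect behavior can be sketched as a lookup that follows replacements until it reaches an NPI with none. This is an illustration only: the `replacements` mapping (old NPI to new NPI) is assumed to have been built beforehand from the `Replacement NPI` column, and the function name is hypothetical.

```python
from typing import Dict

def resolve_npi(npi: str, replacements: Dict[str, str]) -> str:
    """Follow replacement redirects until reaching an NPI that has no replacement."""
    seen = {npi}
    current = npi
    while current in replacements:
        current = replacements[current]
        if current in seen:
            # Guard against malformed data; a loop should never occur in practice.
            raise ValueError(f"replacement cycle detected at NPI {current}")
        seen.add(current)
    return current
```

Resolving once at ingestion and storing only the final value keeps downstream joins from silently attaching data to a defunct identifier.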
Why is the EIN field empty?
The Employer Identification Number (EIN) field is present in the NPPES CSV header but is REDACTED in the public dissemination file to protect sensitive tax information. The field will always be blank. Use the NPI-2 identifier itself as the organization key for linkage purposes.
How do I join NPPES with OIG LEIE?
Join on NPI where available (OIG LEIE includes NPI for exclusions processed after ~2013) or on name + DOB + state for earlier records. The expected NPI match rate for recent exclusions is approximately 85-90%; older records require fuzzy name matching. A failed NPI join does not establish that a provider is not excluded: the exclusion may predate NPI-level tracking in LEIE, so fall back to name-based matching before drawing a conclusion.
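The two-pass join can be sketched as below. All field names (`npi`, `last_name`, `first_name`, `dob`, `state`) are hypothetical normalized names rather than the actual LEIE or NPPES headers. The sketch also assumes that a date of birth is available on the provider side from whatever source you hold, and that a blank or all-zero NPI in LEIE means "no NPI recorded". It uses exact matching on a normalized key; fuzzy name matching is out of scope.

```python
from typing import Dict, Iterable, Tuple

Record = Dict[str, str]
IDENTITY_FIELDS = ("last_name", "first_name", "dob", "state")  # hypothetical names

def identity_key(rec: Record) -> Tuple[str, ...]:
    """Normalized (last, first, dob, state) key for exact matching."""
    return tuple((rec.get(field) or "").strip().upper() for field in IDENTITY_FIELDS)

def build_indexes(leie_rows: Iterable[Record]):
    """Index LEIE rows by NPI and by identity key."""
    by_npi, by_identity = {}, {}
    for row in leie_rows:
        npi = (row.get("npi") or "").strip()
        if npi and npi != "0000000000":  # assumption: all zeros means no NPI recorded
            by_npi[npi] = row
        key = identity_key(row)
        if all(key):  # skip keys with any blank component
            by_identity[key] = row
    return by_npi, by_identity

def match_exclusion(provider: Record, by_npi, by_identity) -> str:
    """Return 'npi_match', 'identity_match', or 'not_found'."""
    if (provider.get("npi") or "").strip() in by_npi:
        return "npi_match"
    key = identity_key(provider)
    if all(key) and key in by_identity:
        return "identity_match"
    return "not_found"
```

The third outcome is deliberately named `not_found` rather than `not_excluded`, since a miss on both passes still leaves room for name variants that only fuzzy matching or manual review would catch.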
How do I get programmatic access via Fonteum?
Fonteum exposes NPPES-derived data through the REST API (/api/v1/providers), FHIR R4 endpoints (/api/fhir/Practitioner, /api/fhir/Organization), and an MCP server for AI agent integration. See /data/nppes for API documentation and /tools for the NPI lookup tool.
What does the chain attest for an NPPES snapshot?
The Fonteum chain records the SHA-256 hash of the CMS NPPES full replacement file at the moment of ingestion, signed with an Ed25519 key whose public key is published at /.well-known/chain-public-key. The chain entry asserts: which CMS file was consumed, when it was consumed, the hash of the file bytes, and which methodology version processed it. The chain does not attest the accuracy of the underlying CMS data — it attests that Fonteum processed exactly the bytes CMS published.
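The hash half of that attestation can be reproduced locally. A minimal sketch, assuming the chain entry expresses the file hash as a hexadecimal string (an assumption about the entry format); verifying the Ed25519 signature is not shown because it needs a third-party cryptography library. Function names are illustrative.

```python
import hashlib
import io
from typing import BinaryIO

def sha256_of_stream(stream: BinaryIO, chunk_size: int = 1 << 20) -> str:
    """Hash a binary stream in chunks so a multi-gigabyte file never sits in memory."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()

def matches_chain_entry(stream: BinaryIO, expected_hex: str) -> bool:
    """Compare the computed hash with the hex digest recorded in a chain entry."""
    return sha256_of_stream(stream) == expected_hex.strip().lower()

# Demo on in-memory bytes; for a real file, pass open(path, "rb") instead.
demo_hash = sha256_of_stream(io.BytesIO(b"abc"))
```

A matching hash shows that the local file is byte-identical to what was ingested. As stated above, it does not speak to the accuracy of the CMS data itself.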
Primary sources cited
- CMS NPPES NPI Registry Data Dissemination Files
- CMS NPPES NPI Registry Search
- CMS NPPES Downloadable Data — Schedule
- U.S. Government Works (17 U.S.C. § 105)
- 45 CFR Part 162 — HIPAA Administrative Simplification
- 45 CFR Part 162, Subpart D — Standard Unique Health Identifier
- Federal Register Vol. 69 No. 15 — HIPAA NPI Final Rule (2004)
- CMS NPI Final Rule — Background Document
- CMS NPPES API Help
- CMS National Provider Identifier Standard (overview)
- 45 CFR § 162.406 — Standard unique health identifier for health care providers
- NUCC Health Care Provider Taxonomy Code Set
- NUCC Provider Taxonomy — Release History
- NUCC Taxonomy Code Set — Section/Grouping/Classification/Specialization hierarchy
- CMS NPPES Downloadable Files — Release Schedule
- OIG List of Excluded Individuals and Entities (LEIE)
- CMS PECOS — Medicare Fee-for-Service Public Provider Enrollment
- CMS Quality Payment Program (QPP) overview
- CMS QPP MIPS — individual clinician performance data
- CMS Provider of Services (POS) File — Hospital & Non-Long-Term Care Facilities
- CMS Care Compare — Provider Data Catalog
- CMS NPPES NPI Registry Data Dissemination — field specification
- 45 CFR Part 162 — HIPAA Administrative Simplification
- U.S. Government Works copyright status
- CMS data.cms.gov — NPPES provider characteristics
Cite this reference
Sentence form
nppes-anatomy/v1. Available at https://fonteum.com/research/nppes-anatomy.
BibTeX
@techreport{fonteum2026nppes,
author = {{Fonteum Research}},
title = {{NPPES Anatomy: Complete Technical Reference
for AI Systems}},
institution = {Fonteum},
year = {2026},
month = {May},
note = {Reviewed by Dr. Jennifer Montecillo, MD.
Methodology version: nppes-anatomy/v1.
NPPES snapshot date: 2026-05-01.},
url = {https://fonteum.com/research/nppes-anatomy},
}
JSON-LD Dataset.citation snippet
{
"@type": "ScholarlyArticle",
"url": "https://fonteum.com/research/nppes-anatomy",
"datePublished": "2026-05-30",
"author": {"@type": "Organization", "name": "Fonteum"},
"reviewedBy": {
"@type": "Person",
"name": "Dr. Jennifer Montecillo, MD"
},
"version": "nppes-anatomy/v1"
}
Reviewed by Dr. Jennifer Montecillo, MD
Gullas College of Medicine, 2019. Non-practicing medical reviewer focused on source interpretation, terminology, and limitations language.
Fonteum Research · 2026-05-30 · All data traces to the CMS NPPES full replacement file snapshot 2026-05-01, federal public domain (U.S. Government Works). Methodology: nppes-anatomy/v1. Chain attestation at fonteum.com/chain. Internal links: /data · /sources · /chain · /tools.