Data Catalog

You can find the datasets currently ingested in the mimi_ws_1, a data lakehouse workspace. For most cases, we tried to preserve the original table formats with some exceptions: We transformed column names to use the Snake case naming scheme, e.g., Provider Name becomes provider_name. We added three variables to all tables:

  • mimi_src_file_name: This column shows the source file's name for the data.
  • mimi_src_file_date: This column shows the date mark of the source file. For example, the monthly NPPES data files have the date postamble in their file names. We extract the date mark of the file name and store the value in this column. The date mark can be either the creation date or the measurement cut-off date. Please read the table description for more details.
  • mimi_dlt_load_date: This column shows when the data is loaded to the lake house as a delta table.

The Catalog section of the workspace provides more details of these tables and columns. Please log in to the workspace and navigate to the Catalog menu to learn more about the datasets.

-- The first level maps to {schema}
-- and the next level maps to {table} fields.
SELECT * FROM mimi_ws_1.{schema}.{table} LIMIT 10;

See the list of schemas and tables below:

CDC - mimi_ws_1.cdc

Census - mimi_ws_1.census

CMS Coding & Billing - mimi_ws_1.cmscoding

CMS Fee Schedule - mimi_ws_1.cmsfeeschedule

Data.CMS.gov - mimi_ws_1.datacmsgov

Data Commons - mimi_ws_1.datacommons

  • Description: Datasets from the Data Commons project
  • Tables:

Data.Medicaid.gov - mimi_ws_1.datamedicaidgov

CMS DE-SynPUF - mimi_ws_1.desynpuf

  • Description: CMS Data Entrepreneurs' Synthetic Public Use File (DE-SynPUF)
  • Tables:
    • beneficiary_summary: Beneficiary Summary from 2008 to 2010
    • carrier_claims: Carrier Claims from 2008 to 2010
    • inpatient_claims: Inpatient Claims from 2008 to 2010
    • outpatient_claims: Outpatient Claims from 2008 to 2010
    • prescription_drug_events: Prescription Drug Events from 2008 to 2010

FDA - mimi_ws_1.fda

  • Description: Datasets from the U.S. Food & Drug Administration (FDA)
  • Tables:
    • adverse_event_base: Drug Adverse Event - Base Table - multiquarter
    • adverse_event_drug: Drug Adverse Event - Drug Table, a part of the Drug Adverse Event Base table
    • adverse_event_reaction: Drug Adverse Event - Reaction Table, a part of the Drug Adverse Event Base table
    • enforcement: Drug Recall Enforcement - Base Table
    • enforcement_ndc_detail: Drug Recall Enforcement - Package NDC Table, a part of the Drug Recall Enforcement Base table
    • ndc_directory: NDC Directory - multiweek
    • ndc_label: Drug Package Labels - full text data
    • ndc_to_active_ingredients: NDC to Active Ingredients Mapping - a part of the NDC Directory
    • ndc_to_pharm_class: NDC to Pharmacologic Class Mapping - a part of the NDC Directory
    • ndc_to_rxcui: NDC to RxCUI Mapping - a part of the NDC Directory
    • orangebook_exclusivity: Approved Drug Products with Therapeutic Equivalence Evaluations, aka Orange Book - exclusivity info, multiweek
    • orangebook_patent: Approved Drug Products with Therapeutic Equivalence Evaluations, aka Orange Book - patent info, multiweek
    • orangebook_products: Approved Drug Products with Therapeutic Equivalence Evaluations, aka Orange Book - products, multiweek
    • purplebook: All FDA-licensed (approved) biological products regulated by the Center for Drug Evaluation and Research (CDER), aka Purple Book - multimonth

Graham Center - mimi_ws_1.grahamcenter

HealthIT - mimi_ws_1.healthit

HHS-OIG - mimi_ws_1.hhsoig

HRSA - mimi_ws_1.hrsa

HUDUser - mimi_ws_1.huduser

  • Description: Datasets from huduser.gov - a part of the Office of Policy Development and Research, PD&R
  • Tables:
    • cbsa_to_zip: Core-Based Statistical Area (CBSA) to ZIP Code Crosswalk (raw data) - multiyear
    • cbsa_to_zip_otm: CBSA to ZIP crosswalk, one-to-many (otm) mapping based on the residential size - derived, latest
    • county_to_zip: County to ZIP Code Crosswalk (raw data) - multiyear
    • county_to_zip_otm: County to ZIP crosswalk, one-to-many (otm) mapping based on the residential size - derived, latest
    • tract_to_zip: Census Tract to ZIP Code Crosswalk (raw data) - multiyear
    • tract_to_zip_mto: Census Tract to ZIP crosswalk, many-to-one (mto) mapping based on the residential size - derived, latest
    • zip_to_cbsa: ZIP Code to Core-Based Statistical Area (CBSA) (raw data) - multiyear
    • zip_to_cbsa_mto: ZIP to CBSA crosswalk, many-to-one (mto) mapping based on the residential size - derived, latest
    • zip_to_county: ZIP Code to County Crosswalk (raw data) - multiyear
    • zip_to_county_mto: ZIP to County crosswalk, many-to-one (mto) mapping based on the residential size - derived, latest
    • zip_to_tract: ZIP Code to Census Tract Crosswalk (raw data) - multiyear
    • zip_to_tract_otm: ZIP to Census Tract crosswalk, one-to-many (otm) mapping based on the residential size - derived, latest

MedlinePlus - mimi_ws_1.medlineplus

  • Description: Datasets from MedlinePlus - knowledge base, XML
  • Tables:

NBER - mimi_ws_1.nber

Neighborhood Atlas - mimi_ws_1.neighborhoodatlas

  • Description: Datasets from the Neighborhood Atlas - Area Deprivation Index
  • Tables:
    • adi_censusblock: Area Deprivation Index (ADI) Original (Census Block Group Level)
    • adi_censustract: Area Deprivation Index (ADI) Aggregated (Census Tract Level, USE WITH CAUTION)
    • adi_county: Area Deprivation Index (ADI) Aggregated (Census Block Group Level, USE WITH CAUTION)

NPPES - mimi_ws_1.nppes

  • Description: Datasets from the National Plan & Provider Enumeration System (NPPES).
  • Tables:
    • address_census: The address_census originates from the address_key table.
    • address_h3: The table uses the h3geo project https://h3geo.org/ to give a unique ID to a specific address.
    • address_key: This table is derived from the main npidata and pl tables.
    • address_placekey: This table is currently under development. This is to test out the Placekey services, which provides contextual information about the location. Not for production use. Please consult with the administrator
    • address_radar: This table is currently under development. This is to test out the Radar Geocoding services (paid service). Not for production use. Please consult with the administrator
    • deactivated: The deactivated NPIs in the file show when the provider was deactivated, "NPI Deactivation Date". To get more information on "why," one needs to link this information with the OIG file. https://oig.hhs.gov/exclusions/exclusions_list.asp
    • endpoint: This table contains all Endpoints, such as HIE, Direct message endpoints (email addresses), associated with Type 1 and Type 2 NPIs.
    • endpoint_se: The endpoint_se table tracks the start and end dates of each HIE/Direct Message endpoint.
    • license_se: The license_se table tracks each license entry's start and end dates in the npidata table.
    • mongodb_export: The mongodb_export table powers the https://npi-db.org web-app, a demonstration project.
    • npidata: NPIDATA
    • openpayments: This table is derived from the openpayments schema, where the original data comes from the https://openpaymentsdata.cms.gov/ site.
    • otherid_ccn_se: The otherid_ccn_se table tracks the start and end dates of each CCN entry.
    • otherid_se: The otherid_se table tracks the start and end dates of each other ID entry in the npidata table.
    • othername: This file contains additional Other Names associated with Type 2 NPIs, such as Doing-Business-As or previous names.
    • othername_se: The othername_se table tracks the start and end dates of each other name per npi per type.
    • pl: This table lists all of the non-primary Practice Locations associated with Type 1 and Type 2 NPIs. When a business expands to new sites, those addresses appear in this table.
    • pl_se: The pl_se table tracks each address' start and end dates.
    • taxonomy_se: The taxonomy_se table tracks the start and end dates of each taxonomy entry in the npidata table.

Open Payments - mimi_ws_1.openpayments

Part C/D - mimi_ws_1.partcd

Payer MRF - mimi_ws_1.payermrf

Prescription Drug Plan - mimi_ws_1.prescriptiondrugplan

  • Description: Datasets from the Part-D Formularies and Networks section - a subsection of the data.cms.gov site
  • Tables:
    • basic_drugs_formulary: Basic Drugs Formulary File - multiquarter
    • beneficiary_cost: Beneficiary Cost File - multiquarter
    • excluded_drugs_formulary: Excluded Drugs Formulary File - multiquarter
    • geographic_locator: Geographic Locator File - multiquarter
    • indication_based_coverage_formulary: Indication Based Coverage (IBC) Formulary File - multiquarter
    • insulin_beneficiary_cost: Insulin Beneficiary Cost File - multiquarter
    • partial_gap_coverage: Partial Gap File - multiquarter
    • pharmacy_networks: Pharmacy Networks File - multiquarter
    • plan_information: Plan Information File - multiquarter
    • pricing: Pricing File - multiquarter

Provider Data Catalog - mimi_ws_1.provdatacatalog

Surgo Ventures - mimi_ws_1.surgoventures

Synthea - mimi_ws_1.synthea

  • Description: Datasets from the MITRE Synthea project - 1.1M synthetic patients
  • Tables:
    • allergies: Patient allergy data.
    • careplans: Patient care plan data, including goals.
    • conditions: Patient conditions or diagnoses.
    • devices: Patient-affixed permanent and semi-permanent devices.
    • encounters: Patient encounter data.
    • imaging_studies: Patient imaging metadata.
    • immunizations: Patient immunization data.
    • medications: Patient medication data.
    • observations: Patient observations including vital signs and lab reports.
    • organizations: Provider organizations including hospitals.
    • patients: Patient demographic data.
    • payer_transitions: Payer Transition data (i.e. changes in health insurance).
    • payers: Payer organization data.
    • procedures: Patient procedure data including surgeries.
    • providers: Clinicians that provide patient care.
    • supplies: Supplies used in the provision of care.

Zillow - mimi_ws_1.zillow

  • Description: Datasets from Zillow (a real-estate marketplace company)
  • Tables:
    • homevalue_zip: Home Values by Zillow - time-series
    • rent_zip: Rentals by Zillow - time-series