Data Catalog

Welcome to the Data Catalog, where you'll find an overview of all datasets currently ingested in the mimi_ws_1 workspace, which operates as our data lakehouse.

In most cases, we have preserved the original table formats from the source systems. However, we’ve made a few adjustments to ensure consistency and clarity. For example, all column names have been converted to the snake_case naming convention: spaces and special characters are replaced with underscores, and all letters are lowercase. For instance, a column named Provider Name is standardized as provider_name. Please see the Data Engineering section for more information.
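
The renaming rule described above can be sketched in a few lines of Python. This is only an illustration, not the actual ingestion code: the function name to_snake_case is hypothetical, and collapsing a run of several special characters into a single underscore (and trimming underscores at the ends) is an assumption that the real pipeline may or may not share.

```python
import re


def to_snake_case(name: str) -> str:
    """Illustrative column-name normalizer (hypothetical helper, not the pipeline's code)."""
    # Assumption: each run of spaces/special characters becomes ONE underscore.
    cleaned = re.sub(r"[^0-9a-zA-Z]+", "_", name.strip())
    # Assumption: leading/trailing underscores are dropped; then lowercase everything.
    return cleaned.strip("_").lower()


print(to_snake_case("Provider Name"))  # provider_name
```

When matching a source-system column to its catalog counterpart, applying a function like this to the original header is usually a reasonable first guess, but the ingested table schema remains the authority.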

Data Sources and Tables

Below is a comprehensive list of the data sources and tables available within the mimi_ws_1 workspace:

AHRQ - mimi_ws_1.ahrq

CDC - mimi_ws_1.cdc

  • Description: Datasets from the Centers for Disease Control and Prevention (CDC)
  • Tables:
    • nhanes_demo_demographic_variables_sample_weights: CDC NHANES DEMO Demographic Variables & Sample Weights
    • nhanes_exam_blood_pressure: CDC NHANES EXAM Blood Pressure
    • nhanes_exam_body_measures: CDC NHANES EXAM Body Measures
    • nhanes_lab_albumin_creatinine_urine: CDC NHANES LAB Albumin & Creatinine - Urine
    • nhanes_lab_alpha1acid_glycoprotein_serum_surplus: CDC NHANES LAB Alpha-1-Acid Glycoprotein - Serum (Surplus)
    • nhanes_lab_cholesterol_hdl: CDC NHANES LAB Cholesterol - HDL
    • nhanes_lab_cholesterol_ldl_triglycerides: CDC NHANES LAB Cholesterol - LDL & Triglycerides
    • nhanes_lab_cholesterol_total: CDC NHANES LAB Cholesterol - Total
    • nhanes_lab_fasting_questionnaire: CDC NHANES LAB Fasting Questionnaire
    • nhanes_lab_glycohemoglobin: CDC NHANES LAB Glycohemoglobin
    • nhanes_lab_glyphosate_glyp_urine: CDC NHANES LAB Glyphosate (GLYP) - Urine
    • nhanes_lab_highsensitivity_creactive_protein: CDC NHANES LAB High-Sensitivity C-Reactive Protein
    • nhanes_lab_insulin: CDC NHANES LAB Insulin
    • nhanes_lab_oral_glucose_tolerance_test: CDC NHANES LAB Oral Glucose Tolerance Test
    • nhanes_lab_plasma_fasting_glucose: CDC NHANES LAB Plasma Fasting Glucose
    • nhanes_lab_standard_biochemistry_profile: CDC NHANES LAB Standard Biochemistry Profile
    • nhanes_metadata: National Health and Nutrition Examination Survey (NHANES) Metadata
    • nhanes_mortality: CDC NHANES Mortality Dataset
    • nhanes_qre_acculturation: CDC NHANES QRE Acculturation
    • nhanes_qre_air_quality: CDC NHANES QRE Air Quality
    • nhanes_qre_alcohol_use: CDC NHANES QRE Alcohol Use
    • nhanes_qre_blood_pressure_cholesterol: CDC NHANES QRE Blood Pressure & Cholesterol
    • nhanes_qre_bowel_health: CDC NHANES QRE Bowel Health
    • nhanes_qre_cardiovascular_health: CDC NHANES QRE Cardiovascular Health
    • nhanes_qre_diabetes: CDC NHANES QRE Diabetes
    • nhanes_qre_hospital_utilization_access_to_care: CDC NHANES QRE Hospital Utilization & Access to Care
    • nhanes_qre_income: CDC NHANES QRE Income
    • nhanes_qre_kidney_conditions: CDC NHANES QRE Kidney Conditions
    • nhanes_qre_medical_conditions: CDC NHANES QRE Medical Conditions
    • nhanes_qre_preventive_aspirin_use: CDC NHANES QRE Preventive Aspirin Use
    • nhanes_qre_smoking_adult_recent_tobacco_use_youth_cigarettetobacco_use: CDC NHANES QRE Smoking - Adult Recent Tobacco Use & Youth Cigarette/Tobacco Use
    • nhanes_qre_smoking_cigarette_use: CDC NHANES QRE Smoking - Cigarette Use
    • nhanes_qre_smoking_household_smokers: CDC NHANES QRE Smoking - Household Smokers
    • nhanes_qre_smoking_recent_tobacco_use: CDC NHANES QRE Smoking - Recent Tobacco Use
    • nndss: National Notifiable Diseases Surveillance System (NNDSS)
    • nwss_covid: National Wastewater Surveillance System (NWSS) for Covid
    • nwss_mpox: National Wastewater Surveillance System (NWSS) for Mpox
    • places_censustract: PLACES: Local Data for Better Health - censustract-level, multiyear
    • places_county: PLACES: Local Data for Better Health - county-level, multiyear
    • places_zcta: PLACES: Local Data for Better Health - zcta-level, multiyear
    • svi_censustract_multiyears: Social Vulnerability Index at Census Tract-Level - multiyear
    • svi_censustract_y2000: Social Vulnerability Index at Census Tract-Level - Year 2000
    • svi_censustract_y2010: Social Vulnerability Index at Census Tract-Level - Year 2010
    • svi_censustract_y2014: Social Vulnerability Index at Census Tract-Level - Year 2014
    • svi_censustract_y2016: Social Vulnerability Index at Census Tract-Level - Year 2016
    • svi_censustract_y2018: Social Vulnerability Index at Census Tract-Level - Year 2018
    • svi_censustract_y2020: Social Vulnerability Index at Census Tract-Level - Year 2020
    • svi_censustract_y2022: Social Vulnerability Index at Census Tract-Level - Year 2022
    • svi_county_multiyears: Social Vulnerability Index at County-Level - multiyear
    • svi_county_y2000: Social Vulnerability Index at County-Level - Year 2000
    • svi_county_y2010: Social Vulnerability Index at County-Level - Year 2010
    • svi_county_y2014: Social Vulnerability Index at County-Level - Year 2014
    • svi_county_y2016: Social Vulnerability Index at County-Level - Year 2016
    • svi_county_y2018: Social Vulnerability Index at County-Level - Year 2018
    • svi_county_y2020: Social Vulnerability Index at County-Level - Year 2020
    • svi_county_y2022: Social Vulnerability Index at County-Level - Year 2022
    • urbanrural_classification: NCHS Urban-Rural Classification Scheme for Counties
    • vsrr_drugoverdose: Vital Statistics Rapid Reporting (VSRR) for Drug Overdose

Census - mimi_ws_1.census

CMS Coding & Billing Section - mimi_ws_1.cmscoding

CMS Data & Research Section - mimi_ws_1.cmsdataresearch

CMS Payment Section - mimi_ws_1.cmspayment

Data.CMS.gov - mimi_ws_1.datacmsgov

Data Commons - mimi_ws_1.datacommons

  • Description: Datasets from the Data Commons project
  • Tables:

Data.Healthcare.gov - mimi_ws_1.datahealthcaregov

  • Description: Datasets from the data.healthcare.gov site
  • Tables:
    • formulary_details: Plan Formulary Details, e.g., Drug Name, RxNorm ID - multimonth
    • mrf_xlsx: Machine Readable File (MRF) URLs - multimonth
    • plan: Plan Information, e.g., Plan Name, ID - multimonth
    • plan_formulary_base: Plan Formulary Base Information - multimonth
    • provider_addresses: Provider Addresses from the Provider Directories - multimonth
    • provider_base: Provider Directory (base/master) - multimonth
    • provider_plans: Provider to Contracted Plans from the Provider Directories - multimonth

Data.Medicaid.gov - mimi_ws_1.datamedicaidgov

CMS DE-SynPUF - mimi_ws_1.desynpuf

  • Description: CMS Data Entrepreneurs' Synthetic Public Use File (DE-SynPUF)
  • Tables:
    • beneficiary_summary: Beneficiary Summary from 2008 to 2010
    • carrier_claims: Carrier Claims from 2008 to 2010
    • inpatient_claims: Inpatient Claims from 2008 to 2010
    • outpatient_claims: Outpatient Claims from 2008 to 2010
    • prescription_drug_events: Prescription Drug Events from 2008 to 2010

Environmental Protection Agency - mimi_ws_1.epa

FDA - mimi_ws_1.fda

  • Description: Datasets from the U.S. Food & Drug Administration (FDA)
  • Tables:
    • adverse_event_base: Drug Adverse Event - Base Table - multiquarter
    • adverse_event_drug: Drug Adverse Event - Drug Table, a part of the Drug Adverse Event Base table
    • adverse_event_reaction: Drug Adverse Event - Reaction Table, a part of the Drug Adverse Event Base table
    • enforcement: Drug Recall Enforcement - Base Table
    • enforcement_ndc_detail: Drug Recall Enforcement - Package NDC Table, a part of the Drug Recall Enforcement Base table
    • ndc_directory: NDC Directory - multiweek
    • ndc_label: Drug Package Labels - full text data
    • ndc_to_active_ingredients: NDC to Active Ingredients Mapping - a part of the NDC Directory
    • ndc_to_pharm_class: NDC to Pharmacologic Class Mapping - a part of the NDC Directory
    • ndc_to_rxcui: NDC to RxCUI Mapping - a part of the NDC Directory
    • orangebook_exclusivity: Approved Drug Products with Therapeutic Equivalence Evaluations, aka Orange Book - exclusivity info, multiweek
    • orangebook_patent: Approved Drug Products with Therapeutic Equivalence Evaluations, aka Orange Book - patent info, multiweek
    • orangebook_products: Approved Drug Products with Therapeutic Equivalence Evaluations, aka Orange Book - products, multiweek
    • purplebook: All FDA-licensed (approved) biological products regulated by the Center for Drug Evaluation and Research (CDER), aka Purple Book - multimonth

Graham Center - mimi_ws_1.grahamcenter

HealthIT - mimi_ws_1.healthit

HHS-OIG - mimi_ws_1.hhsoig

HRSA - mimi_ws_1.hrsa

HUDUser - mimi_ws_1.huduser

  • Description: Datasets from huduser.gov - a part of the Office of Policy Development and Research, PD&R
  • Tables:
    • cbsa_to_zip: Core-Based Statistical Area (CBSA) to ZIP Code Crosswalk (raw data) - multiyear
    • cbsa_to_zip_otm: CBSA to ZIP crosswalk, one-to-many (otm) mapping based on the residential size - derived, latest
    • county_to_zip: County to ZIP Code Crosswalk (raw data) - multiyear
    • county_to_zip_otm: County to ZIP crosswalk, one-to-many (otm) mapping based on the residential size - derived, latest
    • tract_to_zip: Census Tract to ZIP Code Crosswalk (raw data) - multiyear
    • tract_to_zip_mto: Census Tract to ZIP crosswalk, many-to-one (mto) mapping based on the residential size - derived, latest
    • zip_to_cbsa: ZIP Code to Core-Based Statistical Area (CBSA) (raw data) - multiyear
    • zip_to_cbsa_mto: ZIP to CBSA crosswalk, many-to-one (mto) mapping based on the residential size - derived, latest
    • zip_to_county: ZIP Code to County Crosswalk (raw data) - multiyear
    • zip_to_county_mto: ZIP to County crosswalk, many-to-one (mto) mapping based on the residential size - derived, latest
    • zip_to_tract: ZIP Code to Census Tract Crosswalk (raw data) - multiyear
    • zip_to_tract_otm: ZIP to Census Tract crosswalk, one-to-many (otm) mapping based on the residential size - derived, latest
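
As an illustration of what a many-to-one (mto) mapping "based on the residential size" could look like, the sketch below keeps, for each ZIP code, the single target area holding the largest share of that ZIP's residential addresses. This is one plausible reading of the table descriptions, not the confirmed derivation: the function name many_to_one, the res_ratio field, and the sample rows are all hypothetical.

```python
from typing import Dict, Iterable, Tuple


def many_to_one(rows: Iterable[Tuple[str, str, float]]) -> Dict[str, str]:
    """Keep, for each ZIP, the target area with the largest residential ratio.

    Each row is (zip_code, target_area, res_ratio). The field names and the
    selection rule are assumptions made for this sketch.
    """
    best: Dict[str, Tuple[str, float]] = {}
    for zip_code, target, res_ratio in rows:
        current = best.get(zip_code)
        # Replace the stored target only when this row has a strictly larger share.
        if current is None or res_ratio > current[1]:
            best[zip_code] = (target, res_ratio)
    return {zip_code: target for zip_code, (target, _) in best.items()}


# Made-up sample rows: ZIP 00001 is split across two counties.
sample = [
    ("00001", "County A", 0.7),
    ("00001", "County B", 0.3),
    ("00002", "County B", 1.0),
]
zip_to_county = many_to_one(sample)
print(zip_to_county)
```

Ties are resolved in favor of the first row seen, which is an arbitrary choice in this sketch. The raw multiyear tables remain the better starting point when a ZIP code needs to be split proportionally across areas rather than assigned to one.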

MedlinePlus - mimi_ws_1.medlineplus

  • Description: Datasets from MedlinePlus - knowledge base, XML
  • Tables:
    • also_called: Alternative names or terms for health topics
    • health_topic: Main information for each individual health topic
    • information_category: Categories of information for each site
    • language_mapped_topic: Information about topic translations or language variants
    • mesh_heading: Medical Subject Headings (MeSH) descriptors for topics
    • mesh_qualifier: MeSH qualifiers associated with MeSH descriptors
    • organization: Organizations associated with each site
    • other_language: Information about the topic in other languages
    • primary_institute: Information about the primary institute associated with a topic
    • related_topic: Information about topics related to the main topic
    • see_reference: Cross-references or "see also" type information
    • site: Information about external sites related to the topic
    • standard_description: Standard descriptions for each site
    • topic_group: Group information associated with health topics

mimilabs - mimi_ws_1.mimilabs

  • Description: Datasets from mimilabs.
  • Tables:
    • cmsemails: CMS Email Subscriptions

NBER - mimi_ws_1.nber

Neighborhood Atlas - mimi_ws_1.neighborhoodatlas

  • Description: Datasets from the Neighborhood Atlas - Area Deprivation Index
  • Tables:
    • adi_censusblock: Area Deprivation Index (ADI) Original (Census Block Group Level)
    • adi_censustract: Area Deprivation Index (ADI) Aggregated (Census Tract Level, USE WITH CAUTION)
    • adi_county: Area Deprivation Index (ADI) Aggregated (County Level, USE WITH CAUTION)

National Library of Medicine - mimi_ws_1.nlm

NPPES - mimi_ws_1.nppes

  • Description: Datasets from NPPES (National Plan and Provider Enumeration System)
  • Tables:
    • address_census_geocoder_dedup: De-duplicated Geocoding Results for the address_key table
    • address_census_geocoder_raw: Raw Geocoding Results from the US Census Geocoder, derived from the address_key table
    • address_key: Address Key, A collection of Unique Address Strings for all providers and times
    • deactivated: Deactivated Provider List
    • endpoint: Health Information Exchange (HIE) Endpoints, i.e., provider HIE contact address - multiyear
    • endpoint_se: HIE Endpoints formatted with Start and End dates
    • license_se: Provider License Data with Start and End dates, derived from npidata
    • mongodb_export: Data Extract for npi-db.org (a demo project)
    • npi_to_address: NPI to Geocoded Address (both practice and mail address), only for the latest npidata batch
    • npidata: NPIDATA - the base NPI directory, multiyear
    • openpayments: Open Payment Summaries for npi-db.org, derived from the openpayments schema
    • otherid_ccn_se: CCNs (CMS Certification Number, often used for facilities) with Start and End dates
    • otherid_se: Other Provider IDs with Start and End dates, derived from npidata
    • othername: Other Business Names such as DBA - multiyear
    • othername_se: Other Business Names formatted with Start and End dates
    • pl: Other Practice Locations - multiyear
    • pl_se: Other Practice Locations formatted with Start and End dates
    • taxonomy_se: Provider Taxonomies with Start and End dates

Open Payments - mimi_ws_1.openpayments

Palmetto GBA - mimi_ws_1.palmettogba

Part C/D - mimi_ws_1.partcd

Payer MRF - mimi_ws_1.payermrf

Prescription Drug Plan - mimi_ws_1.prescriptiondrugplan

  • Description: Datasets from the Part-D Formularies and Networks section - a subsection of the data.cms.gov site
  • Tables:
    • basic_drugs_formulary: Basic Drugs Formulary File - multiquarter
    • beneficiary_cost: Beneficiary Cost File - multiquarter
    • excluded_drugs_formulary: Excluded Drugs Formulary File - multiquarter
    • geographic_locator: Geographic Locator File - multiquarter
    • indication_based_coverage_formulary: Indication Based Coverage (IBC) Formulary File - multiquarter
    • insulin_beneficiary_cost: Insulin Beneficiary Cost File - multiquarter
    • partial_gap_coverage: Partial Gap File - multiquarter
    • pharmacy_networks: Pharmacy Networks File - multiquarter
    • plan_information: Plan Information File - multiquarter
    • pricing: Pricing File - multiquarter

Provider Data Catalog - mimi_ws_1.provdatacatalog

State Government Databases - mimi_ws_1.stategov

Surgo Ventures - mimi_ws_1.surgoventures

CMS Synthetic Medicare PUF - mimi_ws_1.synmedpuf

  • Description: CMS Synthetic Medicare PUF
  • Tables:
    • beneficiary: Beneficiary Summary
    • carrier: Carrier Claims
    • dme: Durable Medical Equipment Claims
    • hha: Home Health Agency Claims
    • hospice: Hospice Claims
    • inpatient: Inpatient Claims
    • outpatient: Outpatient Claims
    • pde: Prescription Drug Events from 2008 to 2010
    • snf: Skilled Nursing Facility Claims

Synthea - mimi_ws_1.synthea

  • Description: Datasets from the MITRE Synthea project - 1.1M synthetic patients
  • Tables:
    • allergies: Patient allergy data.
    • careplans: Patient care plan data, including goals.
    • conditions: Patient conditions or diagnoses.
    • devices: Patient-affixed permanent and semi-permanent devices.
    • encounters: Patient encounter data.
    • imaging_studies: Patient imaging metadata.
    • immunizations: Patient immunization data.
    • medications: Patient medication data.
    • observations: Patient observations including vital signs and lab reports.
    • organizations: Provider organizations including hospitals.
    • patients: Patient demographic data.
    • payer_transitions: Payer Transition data (i.e. changes in health insurance).
    • payers: Payer organization data.
    • procedures: Patient procedure data including surgeries.
    • providers: Clinicians that provide patient care.
    • supplies: Supplies used in the provision of care.

Zillow - mimi_ws_1.zillow

  • Description: Datasets from Zillow (a real-estate marketplace company)
  • Tables:
    • homevalue_zip: Home Values by Zillow - time-series
    • rent_zip: Rentals by Zillow - time-series