Projects

Please contact us with any small, beautiful project ideas that need our attention. We are here to help.

We believe that, with advances in AI and modular building blocks, such small projects can have beautiful impacts.

Demo Projects

  • https://npi-db.org/: A simple NPI database search engine that demonstrates the publicly available datasets in our workspace.

Active Projects

active

Creating a comprehensive provider registry using public data sources

Accurate provider information is difficult to come by, even though many different sources exist in the public data space. So far, these data sources have been treated separately. We think that combining them will yield interesting insights about providers and surface more accurate data.
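As a rough illustration of what "combining" could mean, here is a minimal sketch that merges records from several sources into one registry. It assumes every source can be keyed by NPI; the source names, field names, and the function `merge_provider_records` are hypothetical and not part of any existing project code.

```python
from collections import defaultdict


def merge_provider_records(sources):
    """Combine per-source provider records into one registry keyed by NPI.

    `sources` maps a (hypothetical) source name to a list of dicts, each
    carrying an "npi" key. For every field, each distinct non-empty value
    is kept together with the sources that reported it, so agreements and
    conflicts between sources stay visible.
    """
    registry = defaultdict(lambda: defaultdict(dict))
    for source_name, records in sources.items():
        for record in records:
            npi = record["npi"]
            for field, value in record.items():
                if field == "npi" or value in (None, ""):
                    continue  # skip the key itself and empty values
                registry[npi][field].setdefault(value, []).append(source_name)
    # Convert nested defaultdicts to plain dicts for easier inspection.
    return {
        npi: {field: dict(values) for field, values in fields.items()}
        for npi, fields in registry.items()
    }


# Made-up example data; real sources would need cleaning and normalization first.
sources = {
    "source_a": [{"npi": "1234567890", "phone": "555-0100", "city": "Austin"}],
    "source_b": [{"npi": "1234567890", "phone": "555-0199", "city": "Austin"}],
}
merged = merge_provider_records(sources)
```

In this sketch, a field with a single value reported by several sources suggests agreement, while a field with multiple values (the phone number above) flags a conflict worth reviewing.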

active

DRGPy - MSDRG in Python

This is a Python implementation of the MS-DRG algorithm from the Medicare Prospective Payment System, which was originally available in Java. The project mimics the behavior of the Java implementation (imperfectly, unfortunately) and is open-sourced at https://github.com/yubin-park/drgpy

active

hccpy - HCC in Python

I implemented the HCC algorithm (initially released in SAS) in Python. The project has been widely adopted by various VC-backed healthcare start-ups and even big enterprises. It is open-sourced at https://github.com/yubin-park/hccpy

active

Autoscalable PDF generator

The healthcare industry still needs a lot of PDF-formatted documents; however, generating such documents from raw data relies on outdated technologies. Using Typst (https://typst.app/), we can create a fast and lightweight PDF generator API service.

Project Ideas (not implemented yet)

idea

Make a Python version of the OIG risk audit toolkit

The risk of over-coding is getting higher. The Office of Inspector General (OIG) has been increasing its audit intensity over the years. On 12/14/2023, the OIG publicly released its audit methodology. Although the methodology is well documented in the PDF file, the audit algorithm is not easy to apply to real data. We plan to implement the algorithm in Python, as we did for the HCC algorithm, so that healthcare data engineers can try the logic and prevent overdocumentation in the future.

idea

Public alert system using Twitter, health blogs, and weather data

Twitter and weather data are real-time and provide valuable insights for avoiding catastrophic events. For example, Mayo Clinic tweets patient education messages many times a day. As extreme weather events become more prevalent, we want to create an alert system that draws on these patient education materials and real-time weather forecast data.

idea

hccpy with FHIR - hccfier

We want to revamp the hccpy project (https://github.com/yubin-park/hccpy) to work with FHIR resources. We also want to make this package available in JavaScript.
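One piece such a revamp would likely need is a step that pulls diagnosis codes out of FHIR data before risk scoring. The following is a minimal sketch under stated assumptions, not the planned design: it assumes the input is a FHIR Bundle already parsed from JSON, that diagnoses arrive as Condition resources coded with the ICD-10-CM system URI, and that the downstream scorer wants codes without the decimal point. The function name `extract_icd10_codes` is hypothetical.

```python
# Standard FHIR code system URI for ICD-10-CM.
ICD10CM_SYSTEM = "http://hl7.org/fhir/sid/icd-10-cm"


def extract_icd10_codes(bundle):
    """Collect unique ICD-10-CM codes from Condition resources in a FHIR Bundle.

    `bundle` is a dict parsed from FHIR JSON. Codes are returned in
    first-seen order, upper-cased, with the decimal point removed
    (an assumption about what the downstream scorer expects).
    """
    codes = []
    for entry in bundle.get("entry", []):
        resource = entry.get("resource", {})
        if resource.get("resourceType") != "Condition":
            continue  # ignore Patient, Encounter, and other resource types
        for coding in resource.get("code", {}).get("coding", []):
            if coding.get("system") != ICD10CM_SYSTEM or not coding.get("code"):
                continue  # skip SNOMED CT and other code systems
            code = coding["code"].replace(".", "").upper()
            if code not in codes:
                codes.append(code)
    return codes


# Made-up example bundle for illustration only.
bundle = {
    "resourceType": "Bundle",
    "entry": [
        {"resource": {"resourceType": "Patient", "id": "p1"}},
        {"resource": {"resourceType": "Condition", "code": {"coding": [
            {"system": "http://snomed.info/sct", "code": "44054006"},
            {"system": ICD10CM_SYSTEM, "code": "E11.69"},
        ]}}},
        {"resource": {"resourceType": "Condition", "code": {"coding": [
            {"system": ICD10CM_SYSTEM, "code": "I50.9"},
        ]}}},
    ],
}
dx_codes = extract_icd10_codes(bundle)
```

The resulting list could then be handed to whatever scoring routine the revamped package exposes; age, sex, and eligibility would have to come from other resources such as Patient and Coverage.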

idea

Indexing CMS PDF documents

CMS/CMMI publish a lot of PDF-formatted documents, which makes it difficult to search them and find relevant content. With the help of LLMs, we want to index these documents and build a machine-readable database of them. We would also need to build a web crawler that constantly checks and parses the CMS websites.

idea

Ingesting and cleaning up public MAO datasets

CMS publishes many public datasets on MAO Star Rating performance, enrollment, and other KPIs. We want to organize the data so that other organizations do not need to repeat this work. We want to provide the cleaned-up data via Databricks Delta Sharing.
