mimilabs

Thu Mar 07 2024 | idea

Indexing CMS PDF documents

CMS and CMMI publish many of their documents as PDFs, which are hard to search and make relevant content difficult to find. With help from LLMs, we want to index these documents and build a machine-readable database of them. We would also need to build a web crawler that continuously checks and parses the CMS websites.
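As a rough illustration of the crawler side of this idea, the sketch below shows two pieces such a crawler might need: collecting PDF links from a fetched page, and deciding whether a downloaded document is new or has changed since the last visit. It is only a minimal sketch under stated assumptions. The names `PdfLinkParser`, `find_pdf_links`, and `needs_reindex` are hypothetical, the change check is a plain SHA-256 content hash, and fetching pages, extracting text from PDFs, and the LLM indexing step are all left out.

```python
import hashlib
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class PdfLinkParser(HTMLParser):
    """Collect absolute URLs of <a> links whose path ends in .pdf."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.pdf_urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        # Resolve relative links against the page URL.
        url = urljoin(self.base_url, href)
        is_pdf = urlparse(url).path.lower().endswith(".pdf")
        if is_pdf and url not in self.pdf_urls:
            self.pdf_urls.append(url)


def find_pdf_links(html, base_url):
    """Return the unique PDF URLs found in an HTML page, in page order."""
    parser = PdfLinkParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.pdf_urls


def needs_reindex(url, content, seen_hashes):
    """Return True if the bytes at `url` are new or changed.

    `seen_hashes` maps URL -> SHA-256 hex digest from the previous crawl
    (an assumption of this sketch) and is updated in place.
    """
    digest = hashlib.sha256(content).hexdigest()
    if seen_hashes.get(url) == digest:
        return False
    seen_hashes[url] = digest
    return True
```

In a real crawler the page HTML and PDF bytes would come from an HTTP client, and `seen_hashes` would be persisted between runs (for example in a database table) so that only new or changed documents are sent on for parsing and indexing.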

Last updated: 8:18:48 PM, Tue Jul 16 2024