Indexing CMS PDF documents

CMS/CMMI publish many documents as PDFs, which makes them difficult to search for relevant content. With the help of LLMs, we want to index CMS documents and build a machine-readable database of them. We will also need to build a web crawler that regularly checks and parses the CMS websites.
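As a rough illustration of the crawler's first step, the sketch below pulls PDF links out of a fetched HTML page and records which ones have not been seen before. It is only a minimal sketch under stated assumptions: the names `PdfLinkParser`, `find_pdf_links`, `record_new_pdfs`, and the `pdf_index` table are hypothetical, the example.org URLs are placeholders rather than real CMS pages, and fetching pages, extracting PDF text, and any LLM-based indexing are left out.

```python
import sqlite3
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class PdfLinkParser(HTMLParser):
    """Collects absolute URLs of <a> links whose path ends in .pdf."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.pdf_urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        # Resolve relative links against the page URL.
        url = urljoin(self.base_url, href)
        is_pdf = urlparse(url).path.lower().endswith(".pdf")
        if is_pdf and url not in self.pdf_urls:
            self.pdf_urls.append(url)


def find_pdf_links(html, base_url):
    """Return the de-duplicated PDF URLs found in one HTML page."""
    parser = PdfLinkParser(base_url)
    parser.feed(html)
    return parser.pdf_urls


def record_new_pdfs(conn, urls):
    """Insert unseen URLs into a (hypothetical) pdf_index table.

    Returns only the URLs that were not already recorded, so a repeated
    crawl can hand just the new documents to the parsing step.
    """
    conn.execute("CREATE TABLE IF NOT EXISTS pdf_index (url TEXT PRIMARY KEY)")
    new_urls = []
    for url in urls:
        cur = conn.execute(
            "INSERT OR IGNORE INTO pdf_index (url) VALUES (?)", (url,)
        )
        if cur.rowcount == 1:  # 0 means the URL was already present
            new_urls.append(url)
    conn.commit()
    return new_urls
```

A scheduled job could call `find_pdf_links` on each page it fetches and pass the result to `record_new_pdfs`; only the returned URLs would then need to be downloaded and indexed. SQLite is used here purely to keep the sketch self-contained, not as a recommendation for the final database.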

