Epstein E Mail Drop, Part IV

1 Comment on Epstein E Mail Drop, Part IV

  1. Where is the index?

    In the legal world, including an index at the back of a lawsuit filed with the local court has been default behavior for any law firm for at least thirty years; the software they use to draft the filing generates it automatically.

    The government and the Congress are riddled with lawyers and so it is pretty likely there is an index to these documents. More than one, because the people creating these indices don’t trust one another and will want to produce their own indices to be sure nothing has been left out (or to ensure that certain things are not included).

    Feed these three million documents (actually six million; half of them are still being concealed from the public) to some open source OCR software, extract all the words embedded in the document scans and other images, sort the words alphabetically, and link each word to the documents that contain it. This is easy enough that a college student could probably do it over a weekend using just a desktop computer.
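    To make the word-to-document step concrete, here is a minimal sketch in Python. It assumes the OCR pass has already been run and that its output is available as plain text keyed by a document id. The ids, the sample text, and the names `build_index` and `sample_texts` are made up for illustration and are not part of any real release.

```python
import re
from collections import defaultdict

# Treat runs of letters, digits and apostrophes as words (an assumption;
# real OCR output may need smarter tokenizing and cleanup).
WORD_RE = re.compile(r"[a-z0-9']+")


def build_index(texts):
    """Map each word to the sorted list of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in texts.items():
        for word in WORD_RE.findall(text.lower()):
            index[word].add(doc_id)
    # Sort the words alphabetically and the document ids within each word.
    return {word: sorted(ids) for word, ids in sorted(index.items())}


# Hypothetical OCR output: document id -> extracted text.
sample_texts = {
    "doc-0001": "Flight log for the island, dated March.",
    "doc-0002": "The flight was delayed; see attached log.",
}

sample_index = build_index(sample_texts)
for word, doc_ids in sample_index.items():
    print(word, doc_ids)
```

    For millions of documents the same loop would probably write to an on-disk store instead of holding everything in memory, but the logic would not change.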

    It would be a small step from this to producing an HTML-encoded index, where each word links to a list of the documents that contain it, perhaps with a short bit of the text surrounding each reference so that readers can assess relevance before clicking through to read further. If space allowed, the entire result could be packaged as an ISO image, published online, downloaded, and accessed when convenient, with no annoying dependencies upon the administrators of https://www.justice.gov/epstein, the so-called ‘library’, lol.
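    A rough sketch of that HTML step, again in Python and again under assumptions: the text has already been extracted, each document is reachable at a file name derived from its id (the `.pdf` naming here is hypothetical), and only the first hit per document gets a snippet. The names `build_contexts` and `render_html` are mine.

```python
import html
import re
from collections import defaultdict

WORD_RE = re.compile(r"[A-Za-z0-9']+")


def build_contexts(texts, width=30):
    """Map word -> [(doc_id, snippet)] using the first hit in each document."""
    contexts = defaultdict(list)
    for doc_id, text in sorted(texts.items()):
        seen = set()
        for match in WORD_RE.finditer(text):
            word = match.group().lower()
            if word in seen:
                continue
            seen.add(word)
            # Keep `width` characters of context on each side of the hit.
            start = max(0, match.start() - width)
            end = min(len(text), match.end() + width)
            contexts[word].append((doc_id, text[start:end].strip()))
    return contexts


def render_html(contexts, link_for=lambda doc_id: doc_id + ".pdf"):
    """Render an alphabetical index page; link_for is a hypothetical naming rule."""
    lines = ["<html><body><h1>Index</h1><dl>"]
    for word in sorted(contexts):
        lines.append(f"<dt>{html.escape(word)}</dt>")
        for doc_id, snip in contexts[word]:
            href = html.escape(link_for(doc_id), quote=True)
            lines.append(
                f'<dd><a href="{href}">{html.escape(doc_id)}</a>: '
                f"{html.escape(snip)}</dd>"
            )
    lines.append("</dl></body></html>")
    return "\n".join(lines)


# Hypothetical extracted text, including markup-like characters to escape.
texts = {
    "doc-0001": "Flight log for the island.",
    "doc-0002": "See the attached <log> file.",
}
page = render_html(build_contexts(texts))
print(page)
```

    Escaping the snippets matters because OCR output can contain stray angle brackets and ampersands that would otherwise break the page.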

    I modestly propose that interested parties file FOI requests for the missing index(es), while building their own using the available, heavily redacted documents. As less-redacted versions become available, rebuild and republish the index.

    Let me know if you need help.

    Here are some open software packages that might be useful:

    % pkg search ocr
    gocr-0.52_1 – OCR (Optical Character Recognition) program
    ocrad-0.29 – OCR program implemented as filter
    ocrs-0.10.4_4 – Rust CLI tool for OCR
    py311-easyocr-1.7.2 – End-to-end multi-lingual Optical Character Recognition (OCR) solution
    py311-ocrmypdf-16.11.1 – Adds an OCR text layer to scanned PDF files
    py311-pyocr-0.8.5_1 – Python wrapper for OCR engines (Tesseract, Cuneiform, etc)
