• 0 Posts
  • 12 Comments
Joined 10 months ago
Cake day: November 23rd, 2024




  • For the OCR process you can probably wrangle up a simple bash pipeline with ocrmypdf and just let it run in the background once until all your PDFs have a text layer.

    With that tool it should be doable with something like a simple while loop:

    find . -type f -name '*.pdf' -print0 |
        while IFS= read -r -d '' file; do
            echo "Processing $file ..."
            ocrmypdf "$file" "$file"
            # ocrmypdf "$file" "${file%.pdf}_ocr.pdf"   # if you want a new file instead of overwriting the old
        done
    

    If you need additional languages or other options, you’ll have to delve a little deeper into the ocrmypdf documentation, but this should be enough duct tape to whip up a full OCR cycle.
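    As a rough sketch of what those extra options might look like: `-l` picks the Tesseract language(s), and `--skip-text` leaves pages that already have a text layer alone, so running the loop a second time should be harmless. The `deu+eng` language pair and the `OCR_CMD` / `OCR_LANGS` / `ocr_all` names are only assumptions for illustration, and the matching Tesseract language packs need to be installed.

```shell
#!/usr/bin/env bash
# Sketch: the same find/while loop with a language setting and --skip-text.
# OCR_CMD, OCR_LANGS and ocr_all are illustrative names, not part of ocrmypdf.
OCR_CMD="${OCR_CMD:-ocrmypdf}"
OCR_LANGS="${OCR_LANGS:-deu+eng}"   # assumed example: German + English

ocr_all() {
    local dir="${1:-.}"
    find "$dir" -type f -name '*.pdf' -print0 |
        while IFS= read -r -d '' file; do
            echo "Processing $file ..."
            # Report a failed file and carry on with the next one
            "$OCR_CMD" -l "$OCR_LANGS" --skip-text "$file" "$file" ||
                echo "Failed: $file" >&2
        done
}
```

    After sourcing this, something like `ocr_all ~/Documents/scans` would walk that folder (the path is just an example).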


  • In case you are already using ripgrep (rg) instead of grep, there is also ripgrep-all (rga) which lets you search through a whole bunch of files like PDFs quickly. And it’s cached, so while the first indexing takes a moment any further search is lightning fast.

    It supports a whole truckload of file types (pdf, odt, xlsx, tar.gz, mp4, and so on), but I mostly used it to quickly search through thousands of research papers. It takes around 5 minutes to index everything for my 4000 PDFs on the first run, then it’s smooth sailing for any further searches from there.
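    To give an idea of day-to-day use, here is a minimal sketch of two helper functions. Since rga hands the usual ripgrep flags through to rg, `--ignore-case`, `--files-with-matches` and `--context` should behave as they do there. `PAPERS_DIR`, `RGA_CMD` and the function names are made up for this example.

```shell
#!/usr/bin/env bash
# Sketch: small wrappers around rga for searching a paper collection.
# PAPERS_DIR, RGA_CMD and the function names are illustrative, not rga settings.
RGA_CMD="${RGA_CMD:-rga}"
PAPERS_DIR="${PAPERS_DIR:-$HOME/papers}"

# List only the files that contain the pattern, ignoring case
papers_with() {
    "$RGA_CMD" --ignore-case --files-with-matches "$1" "$PAPERS_DIR"
}

# Show matching lines with two lines of context around each hit
papers_grep() {
    "$RGA_CMD" --ignore-case --context 2 "$1" "$PAPERS_DIR"
}
```

    For example, `papers_with 'some search term'` prints the matching file names; the first call is the slow one, later calls should hit the cache.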




  • I used the recommended migration tool and it worked okay for many containers, but iirc the docker ones needed one of the security options in their config changed manually, because the tool didn’t convert it properly (maybe the nesting option?).

    That may very well have changed in the meantime, or I may simply have made a mistake; that was during my experimentation phase.

    Ultimately, I rebuilt my instances from the ground up, since I also switched file systems and wanted to make better use of incus profiles (e.g. one with docker provisioned, one with monitoring, and so on), so I can’t give you a long-term migration review.

    For me that was (relatively) painless: I just migrated the docker volumes in place and rebuilt the stacks. Of course, ymmv.

    If you decide on migrating and stumble upon issues don’t hesitate to hit me up - I’m only an amateur but maybe I can still help!


  • After having my dinky homelab machine on proxmox for a couple years, since the start of the year I am now running basically everything under a clean Debian system using incus and docker on the individual lxc guests.

    Incus has completely replaced proxmox for me and it’s so much easier to reason about (for me at least) that I wanted to maybe point your cold hands in that direction too ;)