Can you read them? Speech to text perhaps. That can also be done locally.
If a note's a minute, 1000 notes are around 16 hours of reading. Scale time needed depending on if it takes less or more than a minute to read. Add a note reference to the start of each recording, like a zettelkasten, so the scanned file, recording and text cross-reference.
If assessing other solutions, that's at least an upper bound on the cost of any other solution.
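A quick way to scale that estimate, as a rough sketch. The function name and the sample per-note times are made up for illustration; only the 1000 notes at about a minute each comes from the comment above.

```python
def reading_hours(note_count: int, minutes_per_note: float = 1.0) -> float:
    """Total time in hours to read note_count notes aloud."""
    return note_count * minutes_per_note / 60


# Try a few assumed per-note reading times for 1000 notes.
for minutes in (0.5, 1.0, 2.0):
    print(f"{minutes} min/note -> {reading_hours(1000, minutes):.1f} h")
```

At one minute per note this comes out to roughly 16.7 hours, which is where the "around 16 hours" figure comes from.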
Both will force you to use D1 ballpoint pen cartridges, so they're no good if you must write with a favorite fountain pen, or live a Hi-Tec-C-only pen lifestyle.
If their handwriting is like mine, I think it will take more time that way. The other people will be interrupting them every second to ask “what’s that word?”.
Eventually, they’ll learn and speed will go up, but with this amount of work, work will be finished before they make up for the learning curve.
Yea, I use a cheapie voice recorder that only saves .wav files for ~10 memos per day, and Whisper transcripts are good. "Tiny" model, 4 GB RAM laptop. "Base" model runs too, but slower, and produces different inaccuracies.
But overall, if I were to suggest an ideal process: 1) transcribe notes w/ Whisper, 2) play back the media in VLC with the transcripts and correct the errors. T = 16 hours of proofing/correction + ~8 hours of headless transcription of *.wav beforehand.
I’d add that I had better luck using smaller chunks (about 20 seconds) per wav file for accuracy. Whisper seems to go berserk if you pump in lengthy audio (30+ seconds).
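For anyone who wants to try the small-chunk approach, here is a minimal sketch that cuts a .wav into roughly 20-second pieces using only Python's standard `wave` module. The function name, output naming scheme, and default chunk length are my own assumptions. It cuts at fixed offsets, so a chunk boundary can land in the middle of a word.

```python
import wave
from pathlib import Path


def split_wav(src: str, out_dir: str, chunk_seconds: float = 20.0) -> list[Path]:
    """Split src into consecutive chunks of at most chunk_seconds each."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    stem = Path(src).stem
    paths = []
    with wave.open(src, "rb") as reader:
        params = reader.getparams()
        frames_per_chunk = int(reader.getframerate() * chunk_seconds)
        index = 0
        while True:
            frames = reader.readframes(frames_per_chunk)
            if not frames:
                break  # end of the recording
            path = out / f"{stem}_{index:04d}.wav"
            with wave.open(str(path), "wb") as writer:
                # Same channels, sample width and rate as the source;
                # the frame count in the header is corrected on close.
                writer.setparams(params)
                writer.writeframes(frames)
            paths.append(path)
            index += 1
    return paths
```

Calling `split_wav("memo_0042.wav", "chunks")` (a hypothetical file name) would write `chunks/memo_0042_0000.wav`, `chunks/memo_0042_0001.wav`, and so on, which keeps the note reference in every chunk name.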
I’d be tempted to at least try breaking down the notes into one-line images (about a sentence each) and give it a go with Gemini. I haven’t tested their OCR, but even if it has errors, I bet you could just ask Gemini again to best fix the sentence.
Since OP is doing this deliberately for speech2text, they will presumably enunciate very clearly and have a good mic. S2T has very good performance under ideal conditions like that.
So what? It's 10 years of notes. Even if you double, triple, or quadruple the time editing it for inaccuracies, it's almost certainly still better than OCR.
Tesseract out of the box is terrible for anything non standard. I tried using it for the comic books. Unusable. The training for your font is doable, but it's very time intensive (while the tools are pretty good!).
I'd say any of the language models are far better than Tesseract. I did some work in this space and it was an absolute nightmare, even working with PDFs.
For OCR of handwriting, I did some comparative analysis a year back, and I found that Tesseract was... not good. However, TrOCR was okay, certainly the best of the FOSS solutions. But Textract from Amazon was the best one by far for handwriting, though your mileage will vary.
Great solution. And, if the notes don't contain confidential information, you could totally hire someone on Fiverr to read them for you. Or on Mechanical Turk, have the same notes be read more than once by different people, so you can compare and more easily find errors in transcription later.
Additionally, you could hire other people to read them, dividing the task into whatever manageable chunks or even having multiple people read the same parts for agreement.
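If you do get the same note transcribed twice, a small script can flag where the two versions disagree so you only proofread those spots. This is just a sketch using Python's standard `difflib`; the function name and the word-level comparison are my own choices, not anything a transcription service provides.

```python
import difflib


def disagreements(a: str, b: str) -> list[tuple[str, str]]:
    """Return (words_in_a, words_in_b) pairs where two transcripts differ."""
    words_a, words_b = a.split(), b.split()
    matcher = difflib.SequenceMatcher(a=words_a, b=words_b, autojunk=False)
    diffs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # An empty string on one side means words were added or dropped.
            diffs.append((" ".join(words_a[i1:i2]), " ".join(words_b[j1:j2])))
    return diffs


print(disagreements("buy milk and eggs", "buy silk and eggs"))
```

An empty list means the two transcribers agree word for word. Punctuation and capitalization count as differences here, so you may want to normalize the text first.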
In the days before good software transcription, I saved a ton of time in grad school by splitting up interviews and using Mechanical Turk or Upwork (can’t remember which one) to transcribe 1-minute snippets, and then took another pass.
I've been working on an AI app for 18 months and almost replied to tell you no, AI can't do that.
And, doh, took a minute, and realized I'm a dunce.* And now there's at least 3 I can think of off the top of my head, not to mention local.
Training a handwriting recognition AI is universally accessible. What a time to be alive.
* If you're dense like me: they're not saying "any AI" as in "any handwriting recognition machine learning model you build from that dataset". They're saying AI as in any multimodal LLM; it'll do in-context learning on what you upload.
I have a similar issue, but reading them won't work because the person who wrote them passed away. Is there another solution that could transcribe this sort of thing (maybe the original use case would have been for historical texts)?