Can you read them? Speech to text perhaps. That can also be done locally. If a n...

mvkel · on June 1, 2024

This is the best answer.

Any techie will desperately try to come up with a tech solution to this problem.

A few months of development later, you might have something that yields trustworthy output.

But 16 hours? No tech solution will be done faster than that.

Don't build a factory for a one-off.

bckr · on June 1, 2024

> But 16 hours? No tech solution will be done faster than that.

True

> Don't build a factory for a one-off.

One thing on my wishlist is that I end up with a way to instantly transcribe my notes.

nolamark · on June 1, 2024

bookmark the nuwa pen project, and check back in 6 month or so when/if experience reports are in. ( https://nuwapen.com/en-us )

If you are willing to use special paper, there is existing Neo Smartpen ( https://shop.neosmartpen.com/ )

Both will force use to us D1 ballpoint pen cartridges, so no suggestions in you must write with favorite fountain pen, or are a Hi-Tec-C only pen lifestyle.

sky2224 · on June 1, 2024

> One thing on my wishlist is that I end up with a way to instantly transcribe my notes.

Many of the implementations are clunky in my opinion, but this exists as a feature in many note taking tablet apps.

lozenge · on June 1, 2024

Err, 16 hours just to read. Then you still need to deal with the inaccuracies of speech to text.

bongodongobob · on June 2, 2024

Have 4 people do it and you're done by lunch.

Someone · on June 3, 2024

If their handwriting is like mine, i think it will take more time that way. The other people will be interrupting them every second to ask “what’s that word?”.

Eventually, they’ll learn and speed will go up, but with this amount of work, work will be finished before they make up for the learning curve.

IanCal · on June 2, 2024

Modern speech to text is, for me, extremely accurate. You also have the original audio and can rerun things as and when technology improves.

harlanji · on June 2, 2024

Yea, I use a cheapie voice recorder that only saves .wav files for ~10 memos per day, and Whisper transcripts are good. "Tiny" model, 4GB ram laptop. "Base" model runs too, but slower, and produces different inaccuracies.

But overall, if I were suggest an ideal process: 1) transcribe notes w/ Whisper, 2) play back the media in VLC with the transcripts and correct the errors. T = 16 hours of proofing/correction + ~8 hours of headless transcription of *.wav before hand.

neverokay · on June 2, 2024

I’d add that I had better luck using smaller chunks (about 20 seconds) per wav file for accuracy. Whisper seems to go berserk if you pump in lengthy audio (30+ seconds).

I’d be tempted to at least try breaking down the notes into one line long images (about a sentence) each and give it ago with Gemini. I haven’t tested their ocr, but even if it has errors, I bet you could just ask Gemini again to best fix the sentence.

IanCal · on June 6, 2024

Whisper works on 30s chunks iirc. You need to use something that's automatically splitting up your input if it's longer.

lallysingh · on June 2, 2024

Do it over 2 weekends. Or an hour a night instead of TV.

DonsDiscountGas · on June 2, 2024

Since OP is doing this deliberately for speech2text they will presumably enunciate very clearly and have a good mic. S2T has very good performance under ideal conditions like that

weaksauce · on June 2, 2024

so what? it's 10 years of notes. even if you double or triple or quadruple the time editing it for inaccuracies it's still better than an ocr almost certainly.

spaceship__sun · on June 1, 2024

Have you tried gpt4o?

tkgally · on June 2, 2024

Or Gemini 1.5 Pro. The latest multimodal models, while still far from perfect, do seem to be getting better at image recognition and OCR.

Void_ · on June 2, 2024

I recommend https://whispermemos.com

DeathArrow · on June 2, 2024

>Don't build a factory for a one-off.

Maybe other people can use the software, so it's not a one-off?

oslem · on June 2, 2024

I’d imagine 16 hours is a low estimate if OP wants to retain formatting.

PartiallyTyped · on June 1, 2024

I mean, I'd totally try Tesseract[1], a few samples, and a python script. Shouldn't take more than 5 minutes to validate this.

Adobe also has the whole scan thing, and apple can — in some cases — correctly transcribe characters from images.

https://github.com/tesseract-ocr/tesseract

mrazomor · on June 2, 2024

Tesseract out of the box is terrible for anything non standard. I tried using it for the comic books. Unusable. The training for your font is doable, but it's very time intensive (while the tools are pretty good!).

motoxpro · on June 2, 2024

I'd say any of the language models are far better than Tesseract. I did some work in this space and it was an absolute nightmare, event working with pdfs.

driscoll42 · on June 2, 2024

For OCR of handwriting, I did some comparative analysis a year back, and I found that Tesseract was... not good. However TrOCR was okay, certainly the best of the FOSS solutions. But Textract from Amazon was the best one by far far for handwriting, though your mileage will vary

123yawaworht456 · on June 2, 2024

from my experience with tesseract ~1 year ago, it was frequently fucking up even with crispy PNG screenshots

I really doubt it can handle handwriting

radiantspace · on June 2, 2024

Handwritten notes, cmon! Don't waste time on tesseract for that.

dSebastien · on June 1, 2024

You made my day. It's obviously an awesome approach!

Documented here: https://notes.dsebastien.net/30+Areas/33+Permanent+notes/33....

bambax · on June 2, 2024

Great solution. And, if the notes don't contain confidential information, you could totally hire someone on Fiverr to read them for you. Or on Mechanical Turk, have the same notes be read more than once by different people, so you can compare and more easily find errors in transcription later.

smarm52 · on June 1, 2024

Some good transcription solutions:

https://zapier.com/blog/best-text-dictation-software/#window...

https://otter.ai/

(Haven't actually tried Otter, but it gets a LOT of good reviews.)

BetterWhisper · on June 1, 2024

Reading the notes aloud is a really good solution without having to spend a ton of time on trying to OCR handwriting.

I can recommend https://www.videototextai.com/ for transcribing huge amounts of audio. (Disclaimer, I am the founder of VideoToTextAI)

ujkiolp · on June 2, 2024

Bad solution simply because of information loss!

* after STT, there is objectively less info in the storage format

* OP cannot take advantage of rapidly advancing OCR tech on the storage

* inevitably OP might end up saving the originals “just in case”- rendering this entire process useless

rahimnathwani · on June 2, 2024

Using STT today doesn't stop OP from also storing high resolution scans for the future.

bcx · on June 2, 2024

Additionally, you could hire other people to read them, dividing the task into whatever manageable chunks or even having multiple people read the same parts for agreement.

In the days before good software transcription I saved a ton of time I grad school by splitting up interviews and using mechanical Turk or up work ( can’t remember which one, to transcribe 1 minute snippets, and then took another pass)

bckr · on June 1, 2024

Great recommendation, thank you. I have considered this and it’s definitely the simplest way to achieve what I want.

jacknobody · on June 1, 2024

If you also gave all that text, with its audio, to the putative AI, it might have enough training material to learn to read your handwriting.

refulgentis · on June 1, 2024

I've been working on an AI app for 18 months and almost replied to tell you no, AI cant do that.

And, doh, took a minute, and realized I'm a dunce.* And now there's at least 3 I can think of off the top of my head, not to mention local.

Training a handwriting recognition AI is universally accessible. What a time to be alive.

* If you're dense like me: they're not saying "any AI" as "any handwriting recognition machine learning model you build from that dataset". They're saying as AI as any multimodal LLM, it'll do in context learning on what you upload.

bckr · on June 1, 2024

Agreed!

giantg2 · on June 2, 2024

I have a similar issue but reading them won't work because the person who wrote them passed away. It there another solution that could transcribe this sort of thing (maybe the original use case would have been for historical texts)?

canadaduane · on June 1, 2024

Using MacWhisper (or other similar whisper.cpp app or utility), you could do it all on-device for a free or one-time fee, too.

note: I have no relation to MacWhisper, just a happy customer.