
Can you read them? Speech to text perhaps. That can also be done locally.

If a note's a minute, 1000 notes are around 16 hours of reading. Scale the time up or down depending on whether a note takes more or less than a minute to read. Add a note reference to the start of each recording, like a zettelkasten, so the scanned file, recording and text cross-reference each other.

When assessing other solutions, that gives you an upper bound on what any of them should cost.
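
A rough sketch of that arithmetic, assuming one minute per note (1000 minutes is about 16.7 hours, which the thread rounds to 16). The function names and the `note-0042` file naming scheme are made up for illustration; any shared ID that ties the scan, recording and text together would do.

```python
def reading_hours(num_notes: int, minutes_per_note: float = 1.0) -> float:
    """Total time to read every note aloud, in hours."""
    return num_notes * minutes_per_note / 60


def note_files(note_id: int) -> dict[str, str]:
    """Hypothetical naming scheme: one shared ID for scan, audio and text."""
    stem = f"note-{note_id:04d}"
    return {"scan": f"{stem}.png", "audio": f"{stem}.wav", "text": f"{stem}.txt"}


print(f"{reading_hours(1000):.1f} hours")       # 16.7 hours at 1 minute per note
print(f"{reading_hours(1000, 1.5):.1f} hours")  # 25.0 hours at 90 seconds per note
print(note_files(42)["audio"])                  # note-0042.wav
```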



This is the best answer.

Any techie will desperately try to come up with a tech solution to this problem.

A few months of development later, you might have something that yields trustworthy output.

But 16 hours? No tech solution will be done faster than that.

Don't build a factory for a one-off.


> But 16 hours? No tech solution will be done faster than that.

True

> Don't build a factory for a one-off.

One thing on my wishlist is that I end up with a way to instantly transcribe my notes.


Bookmark the Nuwa Pen project and check back in 6 months or so, when/if experience reports are in (https://nuwapen.com/en-us).

If you are willing to use special paper, there is the existing Neo Smartpen (https://shop.neosmartpen.com/).

Both will force you to use D1 ballpoint pen cartridges, so I have no suggestions if you must write with your favorite fountain pen or live a Hi-Tec-C-only pen lifestyle.


> One thing on my wishlist is that I end up with a way to instantly transcribe my notes.

Many of the implementations are clunky in my opinion, but this exists as a feature in many note taking tablet apps.


Err, 16 hours just to read. Then you still need to deal with the inaccuracies of speech to text.


Have 4 people do it and you're done by lunch.


If their handwriting is like mine, I think it will take more time that way. The other people will be interrupting them every second to ask “what’s that word?”.

Eventually, they’ll learn and speed will go up, but with this amount of work, the job will be finished before they make up for the learning curve.


Modern speech to text is, for me, extremely accurate. You also have the original audio and can rerun things as and when technology improves.


Yea, I use a cheapie voice recorder that only saves .wav files for ~10 memos per day, and Whisper transcripts are good. "Tiny" model, 4 GB RAM laptop. "Base" model runs too, but slower, and produces different inaccuracies.

But overall, if I were to suggest an ideal process: 1) transcribe the notes with Whisper, 2) play back the media in VLC with the transcripts and correct the errors. T = 16 hours of proofing/correction + ~8 hours of headless transcription of *.wav beforehand.
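
Step 1 could look roughly like this. It is a minimal sketch, assuming the open-source `openai-whisper` Python package (and ffmpeg) is installed; the function name and the skip-if-already-done behaviour are my own choices, not the commenter's actual setup.

```python
from pathlib import Path


def transcribe_folder(folder: str, model_name: str = "tiny") -> list[Path]:
    """Transcribe every .wav in `folder`, writing a .txt next to each one."""
    import whisper  # assumption: the openai-whisper package is installed

    model = whisper.load_model(model_name)
    written = []
    for wav in sorted(Path(folder).glob("*.wav")):
        out = wav.with_suffix(".txt")
        if out.exists():  # skip memos already transcribed in an earlier run
            continue
        result = model.transcribe(str(wav))
        out.write_text(result["text"].strip() + "\n", encoding="utf-8")
        written.append(out)
    return written
```

Calling `transcribe_folder("memos")` on a hypothetical `memos` folder leaves `memo.wav` and `memo.txt` side by side, which makes the VLC proofing pass in step 2 easier to organise.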


I’d add that I had better luck using smaller chunks (about 20 seconds) per wav file for accuracy. Whisper seems to go berserk if you pump in lengthy audio (30+ seconds).

I’d be tempted to at least try breaking down the notes into one-line-long images (about a sentence each) and give it a go with Gemini. I haven’t tested their OCR, but even if it has errors, I bet you could just ask Gemini again to best fix the sentence.
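
One simple way to find those one-line strips is a row projection: any row with ink belongs to a line, and blank rows separate lines. This is only a sketch under a strong assumption (a clean, binarised scan with straight, non-overlapping lines); slanted handwriting or long descenders will break it. The function name is illustrative.

```python
def line_bands(image: list[list[int]], min_ink: int = 1) -> list[tuple[int, int]]:
    """Return (top, bottom) row ranges, bottom exclusive, that contain ink.

    `image` is a binarised page: 1 for an ink pixel, 0 for background.
    """
    bands = []
    start = None
    for y, row in enumerate(image):
        has_ink = sum(row) >= min_ink
        if has_ink and start is None:
            start = y                 # a text line begins
        elif not has_ink and start is not None:
            bands.append((start, y))  # a blank row ends the line
            start = None
    if start is not None:             # the page ends in the middle of a line
        bands.append((start, len(image)))
    return bands


page = [
    [0, 0, 0],
    [1, 0, 1],
    [0, 1, 0],
    [0, 0, 0],
    [1, 1, 0],
]
print(line_bands(page))  # [(1, 3), (4, 5)]
```

Each band can then be cropped out of the scan and sent to the OCR model as its own image.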


Whisper works on 30s chunks iirc. You need to use something that's automatically splitting up your input if it's longer.
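
If you want to do the splitting yourself, here is a minimal sketch using only Python's standard `wave` module. It assumes uncompressed PCM .wav input and cuts at fixed intervals (20 seconds by default, per the comment above), so it can slice a word in half; the function name and output naming are illustrative.

```python
import wave
from pathlib import Path


def split_wav(src: str, out_dir: str, chunk_seconds: float = 20.0) -> list[Path]:
    """Cut `src` into consecutive chunks of at most `chunk_seconds` each."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    chunks = []
    with wave.open(src, "rb") as reader:
        params = reader.getparams()
        frames_per_chunk = int(reader.getframerate() * chunk_seconds)
        index = 0
        while True:
            frames = reader.readframes(frames_per_chunk)
            if not frames:
                break
            path = out / f"{Path(src).stem}-{index:03d}.wav"
            with wave.open(str(path), "wb") as writer:
                writer.setparams(params)    # frame count in the header is fixed up on close
                writer.writeframes(frames)
            chunks.append(path)
            index += 1
    return chunks
```

Splitting on silence instead of fixed times would avoid cutting words, at the cost of more code.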


Do it over 2 weekends. Or an hour a night instead of TV.


Since OP is doing this deliberately for speech2text, they will presumably enunciate very clearly and have a good mic. S2T has very good performance under ideal conditions like that.


So what? It's 10 years of notes. Even if you double or triple or quadruple the time editing it for inaccuracies, it's still almost certainly better than OCR.


Have you tried GPT-4o?


Or Gemini 1.5 Pro. The latest multimodal models, while still far from perfect, do seem to be getting better at image recognition and OCR.



>Don't build a factory for a one-off.

Maybe other people can use the software, so it's not a one-off?


I’d imagine 16 hours is a low estimate if OP wants to retain formatting.


I mean, I'd totally try Tesseract[1], a few samples, and a Python script. Shouldn't take more than 5 minutes to validate this.

Adobe also has the whole scan thing, and Apple can, in some cases, correctly transcribe characters from images.

[1] https://github.com/tesseract-ocr/tesseract
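
A sketch of what that validation script might look like: OCR a few sample pages, type the same pages by hand, and compare with a character error rate. It assumes the `pytesseract` wrapper, Pillow and the Tesseract binary are installed; the function names are mine, and the error metric is a standard edit-distance ratio rather than anything the commenter specified.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed with two rolling rows."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def char_error_rate(ocr_text: str, truth: str) -> float:
    """Edits needed per character of the hand-typed reference."""
    return edit_distance(ocr_text, truth) / max(len(truth), 1)


def ocr_image(path: str) -> str:
    """Run Tesseract on one image (assumes pytesseract and Pillow are installed)."""
    import pytesseract
    from PIL import Image

    return pytesseract.image_to_string(Image.open(path))
```

For example, `char_error_rate(ocr_image("sample.png"), typed_text)` on a hypothetical sample page gives a single number to compare across tools; anything far above a few percent probably means more correction time than reading aloud.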


Tesseract out of the box is terrible for anything non-standard. I tried using it for comic books. Unusable. Training it on your font is doable but very time-intensive (though the tools are pretty good!).


I'd say any of the language models are far better than Tesseract. I did some work in this space and it was an absolute nightmare, even working with PDFs.


For OCR of handwriting, I did some comparative analysis a year back, and I found that Tesseract was... not good. However, TrOCR was okay, certainly the best of the FOSS solutions. But Textract from Amazon was the best one by far for handwriting, though your mileage will vary.


from my experience with tesseract ~1 year ago, it was frequently fucking up even with crispy PNG screenshots

I really doubt it can handle handwriting


Handwritten notes, c'mon! Don't waste time on Tesseract for that.


You made my day. It's obviously an awesome approach!

Documented here: https://notes.dsebastien.net/30+Areas/33+Permanent+notes/33....


Great solution. And, if the notes don't contain confidential information, you could totally hire someone on Fiverr to read them for you. Or on Mechanical Turk, have the same notes be read more than once by different people, so you can compare and more easily find errors in transcription later.
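
Comparing two readers' transcripts can be automated with a word-level diff. A minimal sketch using Python's standard `difflib`; the function name is illustrative, and it only flags where the two versions disagree, not which one is right.

```python
import difflib


def disagreements(first: str, second: str) -> list[tuple[str, str]]:
    """Return (words_from_first, words_from_second) for each span where the readers differ."""
    a, b = first.split(), second.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    return [
        (" ".join(a[i1:i2]), " ".join(b[j1:j2]))
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


print(disagreements("buy milk and eggs on Tuesday",
                    "buy silk and eggs on Tuesday"))  # [('milk', 'silk')]
```

Only the flagged spans then need a look at the original scan.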


Some good transcription solutions:

https://zapier.com/blog/best-text-dictation-software/#window...

https://otter.ai/

(Haven't actually tried Otter, but it gets a LOT of good reviews.)


Reading the notes aloud is a really good solution without having to spend a ton of time on trying to OCR handwriting.

I can recommend https://www.videototextai.com/ for transcribing huge amounts of audio. (Disclaimer, I am the founder of VideoToTextAI)


Bad solution simply because of information loss!

* after STT, there is objectively less info in the storage format

* OP cannot take advantage of rapidly advancing OCR tech on the storage

* inevitably OP might end up saving the originals “just in case”, rendering this entire process useless


Using STT today doesn't stop OP from also storing high resolution scans for the future.


Additionally, you could hire other people to read them, dividing the task into whatever manageable chunks or even having multiple people read the same parts for agreement.

In the days before good software transcription, I saved a ton of time in grad school by splitting up interviews and using Mechanical Turk or Upwork (I can’t remember which one) to transcribe 1-minute snippets, and then took another pass.


Great recommendation, thank you. I have considered this and it’s definitely the simplest way to achieve what I want.


If you also gave all that text, with its audio, to the putative AI, it might have enough training material to learn to read your handwriting.


I've been working on an AI app for 18 months and almost replied to tell you no, AI can't do that.

And, doh, took a minute, and realized I'm a dunce.* And now there's at least 3 I can think of off the top of my head, not to mention local.

Training a handwriting recognition AI is universally accessible. What a time to be alive.

* If you're dense like me: they're not saying "AI" as in "a handwriting recognition machine learning model you build from that dataset". They're saying "AI" as in any multimodal LLM; it'll do in-context learning on what you upload.


Agreed!


I have a similar issue, but reading them won't work because the person who wrote them passed away. Is there another solution that could transcribe this sort of thing (maybe the original use case would have been for historical texts)?


Using MacWhisper (or another similar whisper.cpp app or utility), you could do it all on-device for free or a one-time fee, too.

note: I have no relation to MacWhisper, just a happy customer.



