
Agreed, but I think vocal biometrics is a significantly more difficult problem to solve when even the best speech recognition still has issues like this.

Phonetics is hard, especially with ambient noise, echo and such. When I worked at a speech recognition company I had a conversation with one of the speech engineers, and the number of detailed problems to solve was impressive. It all made sense after talking about it, but these were things I would not have thought about before.

I'd imagine the next thing in this area that would really make an improvement is an "on person" microphone. Maybe it's a pen in your pocket, or some kind of vibration detection (that could pick up the wearer's voice), which would then allow some improvements in the domain of "who is talking" and in how well the voice is processed.



Vibration detection is commercially available, in the three-digit dollar range, but it's unfortunately hideous: it needs some mild force against the throat and thus requires a very high neckline or a scarf. The potential alternative, subvocal recognition, needs nerve/muscle-sensing electrodes on or around the throat, and those have poor long-term biocompatibility (think >10h/day, on average). Implants, the alternative to that, tend to grow out if a wire sticks out through the skin, at least as far as I know. One might be able to implant special electrodes the skin would not reject, but I don't know of any suitable material. The obvious benefit is that you use your normal speaking logic and just stop at the point where you would actually move your throat (afaik), without modulating with your mouth or lungs. It's silent for anyone around, so one could, theoretically, call someone while sitting in a meeting, hear both speakers, and selectively talk to the other end of the phone line.

Both of these technologies work in nightclubs and fighter jets. I assume they have a very high SNR as far as ambient noise is concerned. The subvocal one might only provide phonetic/intonation input, which would then require voice synthesis if actual voice is the goal.


And once you've managed to achieve perfect biometrics your next task is to prevent replay attacks.

An alternative, "wireless" approach to near-perfect security would be to invent-plement some kind of vocal, human-executable GPG/TLS.
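
For what it's worth, the usual defence against replay is a fresh challenge per attempt, so a recording of an earlier exchange is useless later. A minimal sketch in Python, assuming a secret shared between the listening device and something the speaker carries; the `Verifier` class and `respond` function are made-up names for illustration, not any real product's API:

```python
import hashlib
import hmac
import secrets


class Verifier:
    """Hypothetical listening device that issues one-time challenges."""

    def __init__(self, shared_key: bytes):
        self._key = shared_key
        self._pending = set()  # challenges issued but not yet answered

    def issue_challenge(self) -> bytes:
        nonce = secrets.token_bytes(16)  # unpredictable, so it can't be pre-recorded
        self._pending.add(nonce)
        return nonce

    def check(self, nonce: bytes, response: bytes) -> bool:
        if nonce not in self._pending:
            return False  # unknown or already-used challenge: treat as a replay
        self._pending.discard(nonce)  # each challenge is valid exactly once
        expected = hmac.new(self._key, nonce, hashlib.sha256).digest()
        return hmac.compare_digest(expected, response)


def respond(shared_key: bytes, nonce: bytes) -> bytes:
    """What the speaker's side (assumed to hold the key) sends back."""
    return hmac.new(shared_key, nonce, hashlib.sha256).digest()
```

A response captured for one challenge fails for any later one, which is the property a replayed recording of a voice lacks.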


You don't really need perfect biometrics for it to be useful, or even biometrics at all. Picture a "my human just spoke in this room, and here's a signature of what the fleshbag said" broadcast from a microphone: devices within a very short range that picked up a command could use it to confirm which person spoke. Couple that with pairing your speaker to your devices, so the notification from your mic carries more authority, and it would already help substantially with most of the issues people have. It wouldn't stop a determined adversary, but for most people it's not a determined adversary that is the problem.
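
Roughly what I have in mind for the broadcast, as a sketch only. It assumes pairing has left the mic and the device with a shared key, that "signature of what was said" means some short audio fingerprint, and that the clocks roughly agree; `make_attestation` and `verify_attestation` are names I made up:

```python
import hashlib
import hmac
import json


def make_attestation(pair_key: bytes, speaker_id: str,
                     fingerprint: bytes, now: float) -> dict:
    """Built on the wearable mic: 'my human just spoke, here's what I heard'."""
    payload = {"speaker": speaker_id, "t": now, "fp": fingerprint.hex()}
    body = json.dumps(payload, sort_keys=True).encode()
    tag = hmac.new(pair_key, body, hashlib.sha256).hexdigest()
    return {"payload": payload, "tag": tag}


def verify_attestation(pair_key: bytes, msg: dict, now: float,
                       max_age: float = 2.0) -> bool:
    """Run on the device that picked up a command."""
    body = json.dumps(msg["payload"], sort_keys=True).encode()
    expected = hmac.new(pair_key, body, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, msg["tag"]):
        return False  # not from a paired mic, or altered in transit
    age = now - msg["payload"]["t"]
    return 0 <= age <= max_age  # stale broadcasts are ignored
```

The two-second window is an arbitrary placeholder. As said, none of this stops a determined adversary; it only tells the device that a paired mic vouches for the utterance.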

It doesn't even need to get good audio - just enough to give a bit of an indication that what the device picked up was me talking and not random noise and ideally some way to somewhat correlate it to the audio the device picked up to give it an indication it was me it heard. It'd also give you the option of setting the devices to require confirmation for certain types of orders if they were not confirmed by an authorised device, or if they were not confirmed by a device (so you could let people present give instructions but not some random joker on voice chat in your online game for example).
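
One cheap way to do the "somewhat correlate" part is to compare loudness envelopes instead of the audio itself. A rough sketch, assuming both sides have mono samples covering the same time window and already aligned in time (which a real system would have to handle); the function names, frame size and threshold are all placeholders:

```python
import math


def envelope(samples, frame=160):
    """RMS loudness per frame; crude, but it survives poor audio quality."""
    return [
        math.sqrt(sum(x * x for x in samples[i:i + frame]) / frame)
        for i in range(0, len(samples) - frame + 1, frame)
    ]


def correlation(a, b):
    """Pearson correlation of two sequences, truncated to the shorter one."""
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    a, b = a[:n], b[:n]
    mean_a, mean_b = sum(a) / n, sum(b) / n
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    denom = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    if denom == 0:
        return 0.0  # a flat envelope carries no evidence either way
    return sum(x * y for x, y in zip(da, db)) / denom


def probably_same_speech(wearable, device, threshold=0.8):
    """True if the loud/quiet pattern matches well enough."""
    return correlation(envelope(wearable), envelope(device)) >= threshold
```

Because the correlation is normalised, the wearable being much louder or quieter than the room device doesn't matter; only the pattern of loud and quiet does.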

If we could get support for that into e.g. a watch, it'd be very useful.
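
The confirmation rules described above could be as simple as this sketch. The three confirmation levels and the example command names are my own illustration, not anything an existing assistant exposes:

```python
from enum import Enum


class Confirmation(Enum):
    AUTHORISED = "authorised"  # vouched for by a device allowed to give such orders
    PRESENT = "present"        # vouched for by some device in the room
    NONE = "none"              # nobody's device vouched (e.g. voice chat in a game)


# Hypothetical examples of orders you'd want to be careful with.
SENSITIVE_COMMANDS = {"unlock_door", "place_order"}


def decide(command: str, confirmation: Confirmation) -> str:
    """Return 'execute' or 'ask_for_confirmation' for a heard command."""
    if confirmation is Confirmation.AUTHORISED:
        return "execute"
    if confirmation is Confirmation.PRESENT and command not in SENSITIVE_COMMANDS:
        return "execute"  # guests in the room can still turn the lights on
    return "ask_for_confirmation"
```

So `decide("lights_on", Confirmation.PRESENT)` goes straight through, while the same words arriving with no device vouching for them get a "did you mean that?" prompt.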



