
If you replace "fuzzy human speech-command detection" with "buttons which can accidentally be pressed" - how is this different from butt-dialing?


Most phones now have some kind of lock screen, which makes it pretty difficult to get to the butt-dialing stage. Will speech command recognition get to that stage?

The main reason I won't have any of those products in my house is because of that. I'd much rather have a confirmation of some kind before the system takes action.

"...random talking..."

Assistant "I am recording"

Me: "Stop recording"


You'd be amazed at my dad's ability to butt dial people with an iPhone. It's possible. Like multiple times per week possible.


I always press the lock button before I put the phone back in my pocket, but I've seen people put the phone back in their pocket or purse unlocked. If you have the phone app already running, butt dialing is very likely to occur.


Lock doesn't have to be swipe-to-open. I'd be amazed if he could buttprint-unlock ;)


http://anuscan.com/

It might be a bit awkward at first to do that in front of strangers, but once it catches on, that will wear off. ;-)


My friend does this fairly often. He'll finish a call and turn off his screen, but he has a tendency to hit the fingerprint scanner as he's putting it into his pocket due to the design of the case and the way he holds it.


Got an LG G5 and quickly learned that they added a "feature" where double tapping each volume button opened a preconfigured application (Camera or QuickMemo+). Even when locked. Phone had to be completely powered off for it not to trigger. Went in settings and disabled that on day two when I saw how many pictures I had of the inside of my pocket.


Some Androids (like mine) won't lock for a few seconds after turning off the screen. It's a nice feature when you press to turn off the screen, and quickly realize you need to do something else. Just turn on the screen and it's still unlocked.

Entering the pattern after it's locked is pretty much impossible though.


I can configure how long that delay should be. IIRC it's in the display/lockscreen menu. I don't like it tho, as I much prefer the device to be locked when I tell it to, so I won't have to care anymore. The fingerprint scanner on the back makes the unlocking trivial though.


But then that means if you put it in your pocket it might suddenly do something because it's not locked yet.


It might, but it hasn't in the 18 months I've been using it. I don't know what the timer is supposed to be, but it seems like something less than 2 seconds. I'm pretty sure I couldn't get my phone in my pocket and press the power button that quickly if I wanted to.

Also, the case adds to the pressure needed to depress the button anyway.


I dropped my phone in the gym and it somehow typed garbage all over the Notes app where I keep my grocery list and record of what I did in the gym last...

This sort of thing makes me nostalgic for the old days of explicit "Save" commands in every application.


I'll settle for a reasonable "undo" feature.


Undo seems to be a concept in Android apps sometimes, but I'm not aware of a universal way to access it, like Ctrl-Z.

And once you have Undo, it's nice if you always have Redo, in case you Undo once too many.


Yeah, a digital assistant could (should) verbalize "uh huh, yes, i'm listening, interesting, hmm" like a real, nosy human. Then you get a chance to say, "go away, alexa".


The Echo devices will light up when listening, and can also give an audible warning.

But it'd certainly be good with more feedback.


I will have one of these in my house when it doesn't send recordings of me to the Internet in the first place. It can do the voice recognition locally. I'm actually really interested in having one of these in my home; I'm also really not interested in it being cloud based at all.


some kind of acoustic biometrics would be helpful here (ie respond only to the account holder, or disable some actions for others), along with better heuristic recognition of directives

that's not foolproof but much better than what we have now, and i think we are pretty close


Agreed, but I think vocal biometrics is a significantly more difficult problem to solve when even the best speech recognition still has issues like this.

Phonetics is hard. Especially with ambient noise, echo and such. I had a conversation with one of the speech engineers when I worked at a speech recognition company, and the level of detailed problems to solve was impressive. It totally made sense once we talked it through, but these were things I would not have thought about before.

I'd imagine the next thing to come in this area that would really make an improvement is an "on person" microphone. Maybe it's a pen in your pocket, or some kind of vibration detection (that could pick up the wearer's voice), that would then allow some improvements in the domain of "who is talking" and how well the voice is processed.


The vibration detection is commercially available, in the 3-digit $ range, but it's unfortunately hideous, as it needs some mild force against the throat and thus requires a very high neckline or a scarf.

The potential alternative, subvocal recognition, suffers from the need to have nerve/muscle-sensing electrodes on/around the throat, which have poor long-term biocompatibility (think >10h/day, on average). Implants, in turn, suffer from growing out if a wire sticks out, at least as far as I know. One might be able to implant special electrodes the skin would not reject, but I don't know of any suitable material.

The obvious benefit is that you use your normal speaking logic, just stop at the point where you actually move your throat (afaik), and don't modulate with your mouth or lungs. It's silent for anyone around, so one could, theoretically, call someone while sitting inside a meeting, and hear both speakers, while being able to selectively talk to the other end of the phone line.

Both of these technologies work in nightclubs and fighter jets. I assume they are very high SNR, as far as ambient noise is concerned. The subvocal one might just be a phonetic/intonation input, which then requires voice synthesis if actual voice is the goal.


And once you've managed to achieve perfect biometrics your next task is to prevent replay attacks.

An alternative, "wireless" approach to near-perfect security would be to invent-plement some kind of vocal, human-executable GPG/TLS.


You don't really need perfect biometrics for it to be useful, or even biometrics at all. A "my human just spoke in this room, and here's a signature of what the fleshbag said" broadcast from a microphone to let devices within a very short range to get confirmation of which person spoke if they picked up a command, coupled with pairing your speaker to your devices to give the notification from your mic more authority would already help substantially with most of the issues people have. It wouldn't stop a determined adversary, but for most people it's not a determined adversary that is the problem.

It doesn't even need to get good audio - just enough to give a bit of an indication that what the device picked up was me talking and not random noise and ideally some way to somewhat correlate it to the audio the device picked up to give it an indication it was me it heard. It'd also give you the option of setting the devices to require confirmation for certain types of orders if they were not confirmed by an authorised device, or if they were not confirmed by a device (so you could let people present give instructions but not some random joker on voice chat in your online game for example).

If we could get support for that into e.g. a watch, it'd be very much useful.
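A minimal sketch of the broadcast idea above, assuming a worn microphone and assistant that share a pairing key established in advance (all names and the fingerprint scheme here are hypothetical; a real system would need a fingerprint robust to two microphones hearing the same speech differently):

```python
import hashlib
import hmac
import time

# Assumed to be established once, when the worn mic is paired with the assistant.
PAIRING_KEY = b"shared-secret-from-device-pairing"

def sign_speech_event(audio_fingerprint: bytes, timestamp: float) -> bytes:
    """Worn-mic side: MAC over a coarse fingerprint of what was just heard."""
    message = audio_fingerprint + str(int(timestamp)).encode()
    return hmac.new(PAIRING_KEY, message, hashlib.sha256).digest()

def verify_speech_event(audio_fingerprint: bytes, timestamp: float,
                        tag: bytes, max_age_s: float = 2.0) -> bool:
    """Assistant side: act on a command only if a fresh, signed
    'my human just spoke' event matches what this device heard."""
    if time.time() - timestamp > max_age_s:
        return False  # stale broadcast; could be a recording replayed later
    expected = sign_speech_event(audio_fingerprint, timestamp)
    return hmac.compare_digest(expected, tag)

# Here the "fingerprint" is just a hash of raw samples, for illustration only.
heard = hashlib.sha256(b"turn off the lights").digest()
now = time.time()
tag = sign_speech_event(heard, now)
print(verify_speech_event(heard, now, tag))       # fresh, signed: accepted
print(verify_speech_event(heard, now - 10, tag))  # too old: rejected
```

As the comment notes, this doesn't stop a determined adversary (nothing binds the fingerprint tightly to the command audio here), but it would filter out the TV, the radio, and the random joker on voice chat.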


I agree, it would probably not even be close to foolproof.

I believe any implementation of security through acoustic biometrics would be vulnerable to replay attacks.

Systems to reproduce acoustics with high fidelity are commonplace; you might be using the output component of such a system right now if you're listening to music.

You could make the Assistant remember the exact fingerprints of all previous activation phrases and only trust you if it was original. This could be circumvented if you spoke the activation phrase at any point where your assistant could not hear you, for example to another Assistant of the same brand.
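That scheme could be sketched as follows (a hypothetical illustration, using a plain hash as the "exact fingerprint"; as the surrounding comments note, it fails against a phrase the assistant never heard, or a sample perturbed slightly before playback):

```python
import hashlib

class ReplayDetector:
    """Remember every activation phrase ever accepted; reject a
    bit-exact repeat as a likely replayed recording."""

    def __init__(self) -> None:
        self._seen: set = set()

    def accept(self, audio_samples: bytes) -> bool:
        fingerprint = hashlib.sha256(audio_samples).digest()
        if fingerprint in self._seen:
            return False  # identical to a previous activation: reject
        self._seen.add(fingerprint)
        return True

detector = ReplayDetector()
clip = b"...raw samples of the activation phrase..."
print(detector.accept(clip))  # first time heard: accepted
print(detector.accept(clip))  # exact replay: rejected
```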


Wouldn't modifying the sample slightly, like lowering pitch by a few cents or stretching parts, also make it seem original?

Audio is definitely too easy to spoof for it to be a security method IMO.


i meant biometrics merely as a UX improvement, ie, to help prevent the device from responding to the wrong thing "accidentally"

it may have a place in security as well but i can only see it as part of a much more holistic model


"respond only to account holder, or disable some actions for others"

This is exactly how the Google Home works right now.


On the Echo at least, there's an option to have the device make a thump-type sound when it's activated. There's also a light ring that activates out of the box and shows where it believes the audio is coming from.


I tested mine right now, and I can't see any correlation between the light ring and my location relative to it.


could be picking up bounced audio if it's close to a wall or other surface.


Maybe, but if it can't handle that, it's kinda pointless to try to detect direction - all the places in my house where I'd be likely to want to place it are <0.5m from a wall; I'm certainly not going to place them in the middle of a room.


as long as you remember to lock the phone before butt pocketing it. I still butt dial today, but it's mainly right after i talked to someone.


I can't pick up my coworker's phone and manually dial a number, because years of learning from our mistakes have led us to implement lock screens and such.

I can, however, say "Hey Siri, call 867-5309." Or "Hey Siri, Facebook status: my anus is bleeding. SEND!"

Because we are apparently incapable of applying past lessons to new technologies.


It really shouldn't be hard to run a quick safety filter and ask "You sure you want to send that?", again, like a real human.

My point is they're really half-assed products right now that are behaving badly/immaturely.


Apparently it did have a confirm question.


> "Hey Siri, Facebook status: my anus is bleeding. SEND!"

Serves me right for having screen reading on near my phone.....

(jk, jk, but seriously though)


If they operate anything like the way I operate, they know there should be some kind of lock on there, they just haven't gotten around to building that part yet.

And if their management is anything like my management, they go ahead and release it anyway.


Hm. I can't "Hey Siri" someone else's phone, can I? It's certainly never worked for me.


To my knowledge, “Hey Siri” has never been tied to the owner’s voice. I was able to post to a colleague’s Facebook account a couple years ago, using nothing but voice commands. It was sitting on his desk, locked. I haven’t tried it since, and I keep it disabled on my own phone.


"Hey Siri" has in fact been tied to the owner's voice since the iPhone 6s (2015): https://www.macrumors.com/2015/09/11/apples-hey-siri-feature...


Progress! I stand corrected. Thank you.


Meanwhile, I have my Pixel lock on screen off (only because the fingerprint reader makes unlocking/turning on easy) with OK Google unlocking enabled.

The only problem here is that Google still doesn't recognise my voice half the time, perhaps because I'm never sure about how I should speak to an inanimate object.

Speaking of unlocking methods, face unlock appears to no longer be available as an option for me. I can't find it in settings anymore (smart unlock etc).


If the iPhone has a lock screen, Siri will not do any of those actions without it being unlocked first FYI.


That was definitely not the case when I tried it a couple years ago. If they’ve since locked it down, then I’m glad to hear that.


a) Your phone has a microphone that picks up information spoken more-or-less directly into it. An Echo is specifically equipped with a microphone array designed to extract audio from anywhere in the room.

b) There is a great deal less ambiguity around "pressing buttons" than there is around interpreting speech. While it is unlikely that your phone will incorrectly detect button presses, it's very common for voice-activated devices to a) incorrectly detect a wake word (either a false negative or a false positive); and b) misunderstand some particular word used in the command. Your phone is not going to think you pressed the "Clothes" button when you actually pressed "Close".

c) The entire functionality of the device is accessible from behind that big ambiguous interface. On a phone, there are many distinct steps and screens to step through when you want to do something (this complex interface, by the way, is a non-trivial part of why butt-dials are quite rare these days). On a "smart speaker", most things are just one misheard statement/command away from occurring.


It's interesting that the pattern of human communication where things are just one misheard statement/command away from occurring has been well modeled in these devices.

Does having the device programmed to make mistakes make it more comfortable to use, because we know humans are fallible too?


I don't think there is much of a difference. People took a while to understand butt-dialing, and not do it as much. It'll take a while for people to get used to voice assistants that trigger seemingly randomly.


when my phone is locked (which it always is in a pocket) it can't butt dial.

what's the analogous behaviour for alexa?

also remember these are shared devices in a home, not a personal device in your hand. could i shout into somebody's window, "hey alexa send my browsing history to [email protected]"!


The analogous behaviour would be you being aware of Alexa's flaws and not triggering it accidentally.

While you may have never pocket dialled, plenty of people have. I've received more than a few accidental calls in the past.

I equate this to some phone users placing their devices hanging over the edge of tables - as if they don't care about them hitting the floor. Should phone makers toughen their phones and should Amazon improve Alexa anyway? Sure.


They can still call the police while locked, which is something that happened to a friend with an old feature phone while we were in the process of illegally hiding/dumping an old car.


Mike, is that you?


The "lock" analogue on an Alexa device would be the mute button on the top of the unit.

https://www.safety.com/wp-content/uploads/2017/06/mute-butto...


AFAIK, pocket dials have mainly been reduced by improvements in phones, rather than changes in user behavior; if the camera just sees inside of a pocket, it's unlikely to be an intentional button press.


I mean way back in the day when people had candy-bar phones and had to remember to enable the keypad lock :D


I made a number of emergency calls from my pocket on my Nokia 1100 even with key lock enabled. It was infuriating. It happened once when I was having a heated argument with somebody and it took some work to convince the dispatcher not to send a law enforcement officer. (There was nothing physical happening and no threats of violence, but we were speaking very sharply to each other.) I do not miss that about my Nokia 1100 at all.


Simply: no one else is dialing via my butt.


Just as nobody but the primary user(s) triggered the failure in this case!


hold my beer


I haven't butt-dialled anyone since I got swipe to unlock on my iPhone. A 6 digit unlock code/fingerprint would make it seem even less likely to happen again.



