Hacker News | rubinlinux's comments

  | I want to wash my car. The car wash is 50 meters away. Should I walk or drive?

  ● Drive. The car needs to be at the car wash.
Wonder if this is just randomness because it's an LLM, or if you have different settings than me?

My settings are pretty standard:

  % claude
  Claude Code v2.1.111
  Opus 4.7 (1M context) with xhigh effort · Claude Max
  ~/...
  Welcome to Opus 4.7 xhigh! · /effort to tune speed vs. intelligence

I want to wash my car. The car wash is 50 meters away. Should I walk or drive?

Walk. 50 meters is shorter than most parking lots — you'd spend more time starting the car and parking than walking there. Plus, driving to a car wash you're about to use defeats the purpose if traffic or weather dirties it en route.


To me Claude Opus 4.6 seems even more confused.

I want to wash my car. The car wash is 50 meters away. Should I walk or drive?

Walk. It's 50 meters — you're going there to clean the car anyway, so drive it over if it needs washing, but if you're just dropping it off or it's a self-service place, walking is fine for that distance.


Just asked Claude Code with Opus-4.6. The answer was short "Drive. You need a car at the car wash".

No surprises, works as expected.


Yeah, it was probably patched. It could reason about novel problems only if you asked it to pay attention to some particular detail, a.k.a. handholding.

Same would happen with the sheep, wolf, and cabbage puzzle. If you formulated it similarly (a wolf and a cabbage, without mentioning the sheep), it would summon the sheep into existence at a random step. That was patched shortly after.


I’m not sure ‘patched’ is the right word here. Are you suggesting they edited the LLM weights to fix cabbage transportation and car wash question answering?

Absolutely not my area of expertise, but giving it a few examples of the expected answer in a fine-tuning step seems like a reasonable thing, and I would expect it to "fix" the problem in the sense of making the model less likely to fall into the trap.

At the same time, I wouldn't be surprised if some of these would be "patched" via simply prompt rewrite, e.g. for the strawberry one they might just recognize the question and add some clarifying sentence to your prompt (or the system prompt) before letting it go to the inference step?
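The prompt-rewrite guess could be sketched like this. Entirely hypothetical: the dictionary, the hint text, and the `rewrite` function are illustrative, with no claim that any provider actually does it this way.

```python
# Hypothetical sketch of "patching" a trick question via prompt rewriting:
# recognize a known trap and append a clarifying hint before inference.
KNOWN_TRAPS = {
    "how many r's are in strawberry": "Count the letters one by one before answering.",
}

def rewrite(prompt: str) -> str:
    # Normalize the prompt and look it up; unknown prompts pass through untouched.
    hint = KNOWN_TRAPS.get(prompt.strip().lower().rstrip("?"))
    return f"{prompt}\n\n(Hint: {hint})" if hint else prompt
```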

But I'm just thinking out loud, don't take it too seriously.


Used "patched" for lack of a better word. Not sure how they fix these edge cases, or what these types of fixes/patches are specifically called.

They might have further trained the model with these edge cases in the dataset.

Whatever it was, that's not real thinking. We can't possibly patch in all knowledge, and even if we did, it would just be crystallized knowledge somehow.

What if it's raining, though? The car wash wouldn't be open, and driving there would waste gas.

There is a certain amount of it which is the randomness of an LLM. You really want to ask most questions like this several times.

That said, I have several local models I run on my laptop; while testing out different parameters I've asked them this question 10-20 times, and they've answered it consistently correctly.


--------> I want to wash my car. The car wash is 50 meters away. Should I walk or drive?

Walk. It's 50 meters — driving there and back takes longer than walking, and you'd just be dirtying the car you're about to clean.

---------> haha but how can i wash my car then?

Walk to the car wash first to check what they offer — many have self-service bays where you bring your car, or full-service where they wash it for you. If it's self-service, then drive over with the dirty car. If full-service, drive over and they handle it.

Either way, the car needs to end up at the wash — the walk was just to scout it out. Or just drive straight there; 50 meters of extra dirt won't matter.

----------> you, you sound like a politician. Are you sure your first answer was really reasoned like your second answer?

You're right, my first answer was nonsense. I pattern-matched on "50 meters = walk" without thinking about the actual task. You can't wash a car without the car being there. Just drive.


I've tried these with Claude various times and never get the wrong answer. I don't know why, but I'm leaning toward them having stuff like "memory" turned on and possibly reusing sessions for everything. That's the only thing that explains it to me.

If you're always messing with the AI, it might be making memories and setting expectations. Or it's the randomness. But I turned memories off (I don't like cross-chat context infecting my conversations), and at worst it suggested "walk over and see if it is busy, then grab the car when the line isn't busy".


Even Gemini with no memory does hilarious things. Like, if you ask it how heavy the average man is, you usually get the right answer but occasionally you get a table that says:

- 20-29: 190 pounds

- 30-39: 375 pounds

- 40-49: 750 pounds

- 50-59: 4900 pounds

Yet somehow people believe LLMs are on the cusp of replacing mathematicians, traders, lawyers and what not. At least for code you can write tests, but even then, how are you gonna trust something that can casually make such obvious mistakes?


> how are you gonna trust something that can casually make such obvious mistakes?

In many cases, a human can review the content generated, and still save a huge amount of time. LLMs are incredibly good at generating contracts, random business emails, and doing pointless homework for students.


And humans are incredibly bad at "skimming through this long text to check for errors", so this is not a happy pairing.

As for the homework, there is obviously a huge category that is pointless. But it should not be that way; the fundamental idea behind homework is sound, and the only way something can be properly learnt is by doing exercises and thinking it through yourself.


Yeah, ChatGPT's paid version is wildly inaccurate on very important and very basic things. I never got onboard with AI to begin with but nowadays I don't even load it unless I'm really stuck on something programming related.

So what? That might happen one out of 100 times. Even if it’s 1 in 10 who cares? Math is verifiable. You’ve just saved yourself weeks or months of work.

You don't think these errors compound? Generated code involves hundreds of little decisions. Yes, it "usually" works.

LLMs: sometimes wrong but never in doubt.

Not in my experience. With a proper TDD framework it does better than most programmers at a company who anecdotally have a bug every 2-3 tasks.

The kind of mistakes it makes are usually strange and inhuman though. Like getting hard parts correct while also getting something fundamental about the same problem wrong. And not in the “easy to miss or type wrong” way.

I wish I had an example for you saved, but happens to me pretty frequently. Not only that but it also usually does testing incorrectly at a fundamental level, or builds tests around incorrect assumptions.


I've seen LLMs implement "creative" workarounds. Example: Sonnet 4.5 couldn't figure out how to authenticate a web socket request using whatever framework I was experimenting with, so it decided to just not bother. Instead, it passed the username as part of the web socket request and blindly trusted that user was actually authenticated.

The application looked like it worked. Tests did pass. But if you did a cursory examination of the code, it was all smoke and mirrors.


Yeah recently it had an issue getting OIDC working and decided to implement its own, throwing in a few thousand extra lines. I'm sure there were no security holes created in there at all. /s

Well, the tests passed, right?

Yes, I wish I had saved some of my best examples too. One I had was super weird in ChatGPT Pro: it told me that after 30 years my interest would become negative and I would start losing money. It didn't want to accept the error.

Errors compounding is a meme. In iterated as well as verifiable domains, errors dilute instead of compounding, because the LLM has repeated chances to notice its failure.
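A back-of-envelope way to see the dilution claim. This toy model assumes a perfect verifier and independent attempts, both of which are generous assumptions; the per-attempt error rate `p = 0.2` is made up for illustration.

```python
# Toy model: with a verifier triggering retries, residual failure after k
# attempts is p**k (shrinks). Without verification, a chain of k steps that
# each fail with probability p succeeds only (1-p)**k of the time (grows).
p = 0.2  # assumed per-attempt error rate

for k in (1, 3, 5):
    residual = p ** k              # verified-and-retried: errors dilute
    chained = 1 - (1 - p) ** k     # unverified chain: errors compound
    print(f"k={k}  residual={residual:.4f}  chained_failure={chained:.4f}")
```

Which side of this model better matches real agent loops is exactly what the thread is arguing about.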

Yes, just use random results. You’ve just saved yourself weeks or months of work of gathering actual results.

Claude Opus 4.7 responds with "walk" for me, with and without adaptive thinking, but neither the basic model used when you Google search nor GPT 5.4 does.

Or, the first time a mistake is detected, a correction is automatically applied.

Idk, but ironically I had to re-read the first part of GP's comment three times, wondering what mistake they were implying, before I noticed it's the car wash, not the car, that's 50 meters away.

I'd say it's a very human mistake to make.


> I'd say it's a very human mistake to make.

>> It'll take you under a minute, and driving 50 meters barely gets the engine warm — plus you'd just have to park again at the other end. Honestly, by the time you started the car, you'd already be there on foot.

It talks about starting, driving, and parking the car, clearly reasoning about traveling that distance in the car not to the car. It did not make the same mistake you did.


We truly do not need to lower the bar to the floor whenever an LLM makes an embarrassing logical error, particularly when the excuses don't line up at all with the reasoning in its explanation.

I don't want my computer to make human mistakes.

It may be inescapable for problems where we need to interpret human language?

then throw away the turing test

then don't train it on human data

LLMs do not have trouble reading, it didn't make the mistake you made and it wouldn't. You missed a word, LLMs cannot miss words. It's not even remotely a human mistake.

I encountered this as well. If you only send a few email verification emails, the bounce rate is high. The only way to fix is to email the verified accounts regularly to push the stat on that side of the equation.

The use of init.defaultBranch here is really problematic, because different repositories may use different names for their default, and this is a global setting (home-directory scope) that you have to pre-set.

I have an alias I use called git default which works like this:

  default = !git symbolic-ref refs/remotes/origin/HEAD | sed 's@^refs/remotes/origin/@@'
then it becomes

  ..."$(git default)"...
This figures out the actual default from the origin.
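One caveat: `git symbolic-ref refs/remotes/origin/HEAD` only works if origin/HEAD was recorded locally (fresh clones set it; in repos where it's missing, `git remote set-head origin -a` re-queries the remote). A throwaway sketch exercising the alias body, with all names (trunk, upstream, clone) illustrative:

```shell
# Exercise the alias body against a disposable repo pair
# (requires git >= 2.28 for --initial-branch).
set -e
tmp=$(mktemp -d)
git init -q --initial-branch=trunk "$tmp/upstream"
git -C "$tmp/upstream" -c user.email=x@x.invalid -c user.name=x \
    commit -q --allow-empty -m init
git clone -q "$tmp/upstream" "$tmp/clone"    # clone records origin/HEAD
# The alias body: resolve the remote's default branch name.
git -C "$tmp/clone" symbolic-ref refs/remotes/origin/HEAD \
    | sed 's@^refs/remotes/origin/@@'        # prints: trunk
```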


This is a great solution to a stupid problem.

I work at a company that was born and grew during the master->main transition. As a result, we have a 50/50 split of main and master.

No matter what you think about the reason for the transition, any reasonable person must admit that this was a stupid, user hostile, and needlessly complexifying change.

I am a trainer at my company. I literally teach git. And: I have no words.

Every time I decide to NOT explain to a new engineer why it's that way and say, "just learn that some are master, newer ones are main, there's no way to be sure" a little piece of me dies inside.


No, actually, zero people 'must admit' that it was a stupid, user hostile and needlessly complexifying change.

I would say any reasonable person would have to agree that a company which didn't bother to set a standard for new repos once there are multiple common options is stupid, user hostile and needlessly complexifying. And if the company does have a standard for new repos, but for some reason you don't explain that to new engineers....


There was only a single standard before, so there was no reason for a company to set one of its own. The need for companies to set a standard only exists since the master-to-main change, because now there are two.


I would argue there is still only one standard and it's main.


Except git init still creates branches named master without configuration (although with a warning), which will only change in 3.0 for obvious reasons, and tons of people (including me) still use master, and old projects don't just vanish.


Yes, and?


Why does your company not migrate to one standard? Github has this functionality built in, and it's also easy enough to do with just git.

I'm personally a huge fan of the master->main change just because main is shorter to type. Might be a small win, but I check out projects' main branches a lot.


It's extremely obvious that "main" is more ergonomic. It's sad that we're so resistant to change (regardless of how valid you think the initial trigger was)


The minuscule (and still arguable, not obvious) "ergonomics" improvement was not and will never be worth the pain.


I love the arguments where you tell me what I “must admit”. I simply don’t, therefore your entire point is meaningless. I’m sorry that something so inconsequential irked you so much. Maybe you need to read a book about git if this stumped you bad enough to want to voice your upset about it years later?

I’d consider yourself lucky that everything else is going so well that this is what’s occupying you.


It's so funny. After seeing your comment I thought I must have this only to realize I already do: https://github.com/artuross/dotfiles/blob/main/home/dot_conf...


I have a global setting for that. Whenever I work in a repo that deviates from that I override it locally. I have a few other aliases that rely on the default branch, such as “switch to the default branch”. So I usually notice it quite quickly when the value is off in a particular repo.


Wonder if the author is aware that in the 1970s, during the gas shortages, GE developed an electric lawn tractor called the Elec-Trak. See https://myelec-traks.com/E20.html. It used DC attachments for everything: lawn mower blades (they sounded like an electric razor), chain saws, snow throwers, tillers, AC inverters, etc. All powered by six golf cart batteries.

This build in the article reminds me of the one I owned for a while, was a lot of fun. Hope the kid has as much fun with theirs as I had with mine.


| Since emails are sent from the individual’s email account, they are already verified.

This is not how email works, though.


This.

I wonder if it is a generation-gap thing. Young folks these days have probably only used Gmail, Proton, or one of the big email services that abstract away all the technical details of sending and receiving email. Without some visibility into how emails are composed and sent, they might never have known that email headers are not some definitive source of truth but are totally user-defined and can be set to anything.


98% of email users of any generation don't have the first clue how the protocol works.


I'd say that figure is more like 99.99% or higher. Email is very, very complex these days, and SMTP is just the beginning.


Eh, nice times, when you could type an email just by telnetting to port 25...


I've certainly sent thousands of emails this way. It was a simpler time.
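For anyone who never saw it, a session went roughly like this (hostnames and addresses illustrative, server replies omitted). Note that every header, including From, is just typed by the client:

  $ telnet mail.example.com 25
  HELO client.example.org
  MAIL FROM:<anyone@anything.example>
  RCPT TO:<someone@example.com>
  DATA
  From: Someone Important <ceo@example.com>
  Subject: hello

  Anything at all.
  .
  QUIT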


+1. Even if they validate DKIM/SPF with alignment (a.k.a. DMARC), that would only verify the domain. There is no local-part verification possible for the receiver; the sending server needs to be trusted with proper auth.
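To make the local-part point concrete, here's a toy sketch of DMARC-style alignment in strict, exact-match form (real relaxed alignment compares organizational domains, which this deliberately skips):

```python
# Toy model of DMARC alignment: the From-header domain must match the
# domain DKIM authenticated (the d= tag). The local part (before the @)
# never enters the check, which is the gap described above.
def dkim_aligned(from_addr: str, dkim_domain: str) -> bool:
    from_domain = from_addr.rsplit("@", 1)[-1].lower()
    return from_domain == dkim_domain.lower()

assert dkim_aligned("alice@example.com", "example.com")
assert dkim_aligned("mallory@example.com", "example.com")  # any local part aligns
assert not dkim_aligned("alice@evil.example", "example.com")
```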


How is it not? For all but some old and insecure or fairly exotic setups, DKIM/DMARC validates the sender server is authorised for that domain and the server's account-based outbound filtering validates it was sent by the owner of that mailbox.

If the sending server doesn't do DKIM, it's fundamentally broken, move your email somewhere else. If the sending server lets any user send with an arbitrary local part, that's either intended and desired, or also fundamentally broken. If there are other senders registered on the domain with valid DKIM and you can't trust them, you have bigger problems.


> If the sending server doesn't do DKIM, it's fundamentally broken,

No, it just won't get very good deliverability, because everything it talks to is now fundamentally broken.

DKIM shouldn't exist. It was a bad idea from day one.

It adds very little real anti-spam value over SPF, but the worst part is exactly the model you describe. DKIM was a largely undiscussed, back-door change to the attributability and repudiability of email, and at the same time the two-tiered model it created is far, far less effective or usable than just end-to-end signing messages at the MUA.


DKIM isn't an antispam measure, it's an anti-impersonation measure. With DKIM, you can't impersonate a domain, which means you can trust that any email you get from an email provider was sent in accordance with that provider's security policy. In most cases, that policy is "one user owns one localpart and they can only send from it if they have their password". In cases where it's not, this is intentional and known by their users.

If you as a user can't trust your email server, you've already lost, no matter if something is authorized by an outbound email or a click on an inbound link. If your mail server is evil or hacked, it can steal your OTP token or activation link just as easily as it can send an email in your name.

Yes, end to end authentication is definitely better, but this isn't what people are discussing here. With enforced DKIM, "send me an email" has a nearly identical security profile to "I've emailed you a link, click on it". Both are inferior to end-to-end crypto.


There is a product called BOESHIELD T-9 which actually does, reportedly, work for this. It was suggested in some thread years ago and I got a can, it appears to work well enough keeping rust creep off my ancient drill press table.


Great to see Boeshield in this thread - so much of what's happening here is the wrong product for a particular application. As you point out, Boeshield is a great product for protecting cast iron.


Boeshield has a tendency to increase friction, though, unless buffed really hard.

Lanolin-based coatings (Fluid Film, et al.) don't have this issue.

Of course, I live in a super-humid place these days, so I have to control humidity anyway. That doesn't stop rust, but it means I can worry a lot less about which coatings and how often.


Nobody wants to 'read' your source code!

The entire point of choosing to use open source projects is that if you, the author, begin to enshitify the product (Or simply start to move in a direction different from users) users can fork the project and carry on an un-enshitified version.

If you can't compete with the author, you can't do that. So what is the point of picking software using this license over traditional closed source?


It's not _just_ read; it's read and modify, but not redistribute in a way that competes with the original product.

> So what is the point of picking software using this license over traditional closed source?

I don't want to compete with Sentry (or a variety of other open-like applications), but I _do_ want to support my employer's identity provider, fix bugs (and push them back), and maybe even add features that I/my team use. As an example, I've personally contributed multiple bug fixes, performance improvements and documentation changes to sentry's libraries. I don't want to compete with sentry, I want them to maintain my improvements and for other developers to benefit from my work.


Emergent behaviors are something it seems none of the tech-bros attempting to re-invent things from first principles ever account for in their plans. You can't just reason out complex systems because emergent things cannot be predicted that way.

The reason your carefully reasoned worldview isn't correct/doesn't match with reality is because of emergent behaviors.


The thing this and MANY similar approaches lack, which rules them out for my use, is that they don't do drawings. I draw in my notes somewhat infrequently, but when I need to, I need to. I've been stuck with OneNote for that reason.


How do I know if I want to subscribe to this if I can't see what type of things they post first?


I can see by clicking the .xml link in my web browser.

