Hacker News

It's considered an LLM tell because it's a term rarely used by the median modern casual writer. It's not at all uncommon even in current literature [0], and it's even more common in older literature, so complaining about it in a model designed to reproduce a particular style of 19th century print literature is silly in the extreme.

[0] Many "LLM tells" fit this pattern of just being common features of professionally-published works that are less often seen in casual writing.



FWIW, the 19th century style dataset this is trained on doesn't seem to have any examples of delve [1], with the exception of:

> It was into that bank that the creature had worked its way, and on listening I could hear it delving and scraping at a great rate, about a yard from the back of the wall.

I bring that up to point out that this isn't necessarily (more) common in 19th century style print literature, so the observation might not be silly. The model creating the modern synthetic version injected 'delve' 9 times, which suggests it is either more frequently used in modern literature or just something models tend to inject. Though, I could be missing something (either in how I searched the dataset, or in how this works).

[1] https://huggingface.co/datasets/dleemiller/irish_penny_journ...
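The frequency check described above can be sketched roughly as follows. This is a minimal illustration, not the commenter's actual method: the helper name and sample strings are made up, and a real check would first load the dataset's text column (e.g. via the Hugging Face `datasets` library):

```python
import re

# Matches whole-word forms of "delve" (delve, delves, delved, delving),
# case-insensitively, so "delving" in the quoted passage is counted too.
DELVE_RE = re.compile(r"\bdelv(?:e|es|ed|ing)\b", re.IGNORECASE)

def count_delve(texts):
    """Count whole-word 'delve' variants across a list of strings."""
    return sum(len(DELVE_RE.findall(t)) for t in texts)

# Illustrative sample only; a real run would iterate over the dataset rows.
sample = [
    "I could hear it delving and scraping at a great rate.",
    "Let us delve into the archives.",
    "Development is a different word entirely.",
]
print(count_delve(sample))  # 2
```

The word-boundary anchors matter: without them, words like "development" would false-match the prefix and inflate the count.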


It's very common in fantasy novels - dwarves and wizards do a lot of delving into caves, dungeons, and towers. It's also a solid academic term, so scientists delve into a lot of subjects, and brain people do a lot of delving into psyches.

LLMs are raising the bar by expanding the vocabulary people are exposed to, so words like delve will stick out. I think it's preferred by writers because it's a nice-sounding alternative to words like explore, venture, analyze, think about, confront, etc. It's a useful, versatile word, and one of the metrics by which writers measure quality is the minimization of syllables.

LLMs are mostly indistinguishable from humans at this point; a one-shot output from any of the major models can be recognized in the same way you might recognize a writer. With multiple style passes, you're not going to be able to tell the difference between ChatGPT, Ronald Reagan, Bill Clinton, Hunter S. Thompson, Einstein, or any other sufficiently modeled figure. Throw in a few tens of thousands of words written by yourself and most of the models will do a nearly flawless job of copying your stylometric profile.


Delving also has the implication of going deep, while exploring has the implication of going wide. I wonder if human authors really do a good job of picking the right word there. Either way I wonder if “misused delve” could be an interesting signal.


It's being looked at as an AI signal, which is causing labs to artificially suppress it, so it may end up being a "human-authored" signal in the end. Give it a year or two, though, and we're looking at AI with superhuman word choice. There will be dozens of layers of introspection underlying the selection of each word, in a broad context, articulating exactly whatever the user wants. The philosophical and psychological implications of superhuman text generation, beyond the p(doom) discussions, get crazy. Superhuman persuasion is one facet, but unintended manipulation through reinforcement of secondary and peripheral notions in the context creates all sorts of weirdness.

Language communicates ideas, and we've made machines that produce intricate, sophisticated ideas that land in our brains. The consequences are going to be fascinating.


Indeed. It's because of this that I have serious concerns about what "AI" is doing to our writing. My son has been flagged for using "AI" in writing his papers because it sounded "too good." But, I know he wrote it. I've run into the same issue where people have suggested something I wrote was AI because of certain vocabulary words that I've used. The only "defense" against this is to write a little bit shitty, because then there is no suspicion. If everybody starts doing that, especially in academic settings, that's a road to serious sadness IMHO.


We’re in a transitional stage at the moment, so I’m not too worried about maladaptive shitty writing becoming the norm (although, it sucks that your kid is being punished for our failure to adapt).

It would be totally nuts to take points off for using spell check. An LLM should be able to provide a style check without causing any concerns; it will become the norm, and then "too good" prose won't cause any flags to be thrown.


I delve into facts and case law frequently.



