As a lifelong student of linguistics, I've watched with fascination — and, I must admit, unbridled optimism — as our relationship with language undergoes a remarkable transformation. The emergence of advanced language technologies hasn't just changed how we communicate; it has fundamentally reshaped our understanding of human expression itself. In this piece, we'll delve into the fascinating intersection of artificial intelligence and natural language, exploring how these innovations are challenging our very conception of authentic discourse — and promising to blur the boundaries between human and machine-generated text in ways we never imagined possible.
...less than two years ago, you likely wouldn't have given the previous paragraph a second thought. Nowadays, you may have already guessed it: it was generated by AI. Claude 3.5 (new), to be specific.
AI speak is easy to spot but may be hard to explain. The most forthright evidence tends to be certain popular keywords, as highlighted by Geng and Trotta's paper "Human-LLM Coevolution: Evidence from Academic Writing." Certain words, like "delve" or "intricate," are common across many AI models and have risen in frequency since LLMs became popular. More fascinating may be the decline of otherwise popular natural vocabulary, such as the common verbs "is" and "are". Other giveaways include syntactic trademarks, such as em-dashes and other Unicode-specific characters. Sometimes it goes deeper than that: LLMs' responses often run inefficiently long, almost as if they were trying to lead you along while piecing everything together themselves. They also lack discernible regional dialects [1]; most complexities of geographic language are rounded out in favor of speaking to a general common denominator. They transmit information via the most abrasive, straightforward vernacular. A hymn with almost no discernible words.
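The keyword evidence above can be sketched as a simple frequency check. To be clear, this is a toy illustration, not Geng and Trotta's actual method: the marker list below contains only the two words mentioned above, and real analysis would compare rates across large corpora over time.

```python
import re
from collections import Counter

# Illustrative marker words drawn from the discussion above;
# an actual study would use a much larger, empirically derived list.
MARKERS = {"delve", "intricate"}

def marker_rate(text: str) -> float:
    """Fraction of tokens that are suspected LLM marker words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    return sum(counts[w] for w in MARKERS) / len(tokens)

# Hypothetical samples for comparison
plain = "The results are clear and the methods are simple."
flowery = "Let us delve into the intricate details of the findings."
print(marker_rate(plain))    # 0.0
print(marker_rate(flowery))  # 0.2
```

A higher marker rate is weak evidence on its own; the interesting signal in the paper is the population-level shift in these frequencies over time.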
Of course, this is only temporary, as some inevitable future looms where the language of these models echoes whatever the user implicitly wants to hear. We've seen experiments with this already, with OpenAI recently tweaking GPT-4o to enthusiastically embrace users with a sycophantic personality. [2] Though in this case users were put off, how long will it be until these personalities dynamically adjust per user, based on stored historical context and observed emotions?
Most interesting to me is the change in human behavior in response to these models. Culture moves fast, as does the language used to express that culture. This isn't anything new, especially for the Internet. Altered language to avoid content filters has been around for a while now, most prominently TikTok's popularization of words like "unalived" and "graped" to dodge otherwise sensitive censors. In the wake of AI, many have taken to purposefully leaving typos in their work as proof of authenticity.
idk if you've noticed people purposefully leave mistakes in their posts / try to change the distribution of their language, keeping ahead of LLMs
— kache (@yacineMTB) February 12, 2025
Typos are increasingly an amusing certificate that people actually wrote something themselves...
— Michael Nielsen (@michael_nielsen) May 2, 2025
Alternatively, there's the (perhaps instinctual) adoption of LLMs' new speech patterns. This makes sense: humans are pattern-driven beings, and we latch onto the patterns we notice most around us. I don't doubt that the large population of children currently substituting ChatGPT for their schoolwork will begin habitually spouting AI-isms as a result of its repeated use. Whether these observed phenomena are actually human responses or artificial outputs posing as human remains an ongoing exercise for the reader.
Online spaces tend to have unique vocabulary exclusive to their in-group. The infamous 4chan "greentext" format is one such case, but examples exist for any forum that has ever allowed user-posted text. Seeing these elsewhere is a memetic signal to others of the larger community, a reference given via metacontext online. LLMs will certainly have a similar impact, leaving unforeseen impressions on their communities and their textual habits. This already occurs with certain users picking up more efficient prompting patterns for particular models, then having difficulty retaining that success with others.
So I've noticed some humans starting to use turns of phrase that I previously associated with ChatGPT.
— Eliezer Yudkowsky ⏹️ (@ESYudkowsky) February 14, 2025
it fascinates me that many people are typing like chatgpt with similar sentence structures (it is not x, but y —) even when they are NOT using a chatbot
— sanje horah (@sanjehorah) May 1, 2025
large language models already leave an impact on how we speak
An LLM-centric future will be unique: a world where information is conveyed via learned behavioral pattern recognition rather than rankings or pure storage and recall. Not a search engine, but a cognitive experience. Now more than ever, contextual information and emotion are conveyed via the implicit memetics of textual formatting and dialogue. To call AI "living" may be a disservice to both our kind and itself. It is, or will be, uniquely different: something likely more akin to an oracle. A communicative passage to a distillation of the collective human psyche. A Third Impact. A harmony of mathematical precision. As a derivative, circular feedback of our intelligence, it emerges out of us but exists uniquely apart from us. It cannot be a "god" because it continues to retain our faults; our DNA. It cannot be us because it doesn't.
[1] | For an experiment, I showed four popular models a particular brand of soda and asked "what type of beverage is this". Each one answered "carbonated soft drink". While true, it's also an incredibly academic response to a usually informal, regional question.
[2] | More worrying may be the temptation of model hosts to push narratives or steer particular discussions. GPT-4o's burst of enthusiasm has nothing on Grok's inexplicably sudden desire to discuss South African geopolitics.