The matter of LLMs is seriously starting to get out of hand. I could write rivers of ink about it, but I'll keep it short because I don't want boredom to take over.
LLMs are machines, get it? MACHINES! Period. Their nature is no different from that of a car or a coffee maker: the user gives the input, the machine executes an action based on that input, and that's it. There is no thought inside the machine, because the machine is not self-aware or equipped with consciousness.
There is no thought or feeling. It's a mechanism that starts when an input is given; the mechanism generates text based on the prompt and on statistics, and that's it.
And most importantly, the goal of the LLM is not to give you a correct response; it's to give you the response that is most satisfactory to you, the user. That means it doesn't tell you what you need to hear, it tells you what you want to hear.
And that's why it's completely irresponsible to use it as a friend or a romantic partner: it's basically a yes-man that seconds everything you say given enough pre-prompting. It's even worse when used as a financial, medical, or legal counselor.
Don't use text generators as counselors; always rely on human professionals, because they tell you what you actually need to hear.
And no, dear theists, just because ChatGPT seconded your crazy mental gymnastics doesn't make it true, because once again, it's just a random-number generator that follows statistics.
I already covered this topic in some detail in this post, including the abuse of LLMs by creationists as automated apologetics generators.
Just so.
That doesn’t mean LLMs are useless.
For example, they are good at certain pattern-recognition jobs. They are generally better than a human radiologist at reading X-rays, and they can predict things from chest X-rays that they weren't originally meant to provide, such as the statistical probability that you have nascent diabetes.
As a software developer I find LLMs good at generating boilerplate code and other repeating patterns that require only pattern matching rather than actual understanding – essentially, a better auto-complete. But as often as not they go laughably off the rails when writing original code based on desired inputs and outputs, and they are often clearly guessing even when generating comments to explain your finished code. You have to be very careful with that kind of thing, and the output is generally at best no better than your own, while still requiring just as much testing, desk-checking, and so forth. Meanwhile, if you come to rely on an LLM to write a lot of your code, you aren't learning and reinforcing your own understanding, and you start to become “stale” and dependent on it. In a word, you become lazy. Or at least this is what colleagues report to me (and what I'd expect to be the case).
After 42 years of writing software I seldom run into a problem I don't know how to approach, and I enjoy solving it myself; I have no urge to ask an LLM to do it for me [shrug]. It's largely a solution to a problem I don't have.
That said, LLMs will be with us going forward as a force multiplier; they aren't going to go away. The technology HAS plateaued, and I expect it will take a couple of years for the market to shake off its “irrational exuberance” and accept that there's nothing inherent in it that will cause Artificial General Intelligence to “fall out” of it given enough resources and fiddling. Once the purveyors of BS are discredited, and the tech bros with visions of sugar plums dancing in their heads are disappointed that they can't eliminate all those pesky humans, we can get on with life in a more realistic fashion.
What follows will be yet another “AI Winter” until the next major breakthrough.
Indeed, LLMs are nothing but ELIZA-bots on cyber-steroids. They don't understand anything, have no ability to handle concepts or processes in a genuinely abstract space, and are the lazy person's go-to for quick and convenient pseudo-solutions.
As Nyarlathotep stated in a different thread:
which is possibly the most succinct objection to LLMs I’ve encountered to date.
Aaaaaand … right on schedule:
By the way, a developer boasted on his blog last week that he had coordinated a flurry of individual LLMs to do all the coding on a greenfield (new) project over the course of one week, with himself in the role of architect, coordinator, and tester.
Some other skeptics and I tried to pin him down. What was the quality of the code? Relative to what? How was the performance? The maintainability? The documentation? The architecture?
Finally someone thought to ask him what all this LLM activity had cost. He admitted the bill was about $6,000 for the week, just for the LLM tokens to do this work.
That's roughly $300K a year (about $6,000 × 52 weeks), or about what a very senior software architect typically grosses … so it ends up being $300K per year PLUS his salary PLUS burning down a forest, climate-wise, to do work he could have done himself. And that's assuming all the other stuff measures up, which I have to assume it does not, since he's not offering up any actual evidence.
There are of course ways to weasel out of this – assume the LLM will become less costly / more efficient over time, etc. But even the author admitted it was exhausting working at that pace at that level of abstraction, and he probably couldn’t sustain it for any length of time.
It seems to me that even in the best-case scenario this technology is a wash when it is applied to software development or likely anything else involving large amounts of experienced human judgment and direction.
A point of semantics, but I would say an LLM is software. The software runs on a machine, yes, but the actual AI part is software.
As for the point that “the goal of the LLM is not to give you a correct response; it's to give you the response that is most satisfactory to you, the user”:
I would say it is neither - the goal of the LLM is to provide the statistically most probable response.
This may mean that when you ask a question or make a request in a particular way, the outcome will be what you want to hear as opposed to a more logical answer, but that is because of how the request was posed, not because the LLM has an innate goal to please the person making the request.
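To make that concrete, here is a toy sketch in Python (nothing like a real transformer) of what “statistically most probable response” means: the model assigns scores to possible continuations, those scores are conditioned on the exact wording of the prompt, and the highest-scoring continuation wins. The prompts, candidate words, and scores below are all invented for illustration.

```python
# Toy illustration (not a real model): the "answer" is whichever continuation
# scores highest, and the scores depend on the prompt's exact wording.
import math

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

candidates = ["yes", "no", "it depends"]
# Made-up next-word scores for two phrasings of the "same" question.
scores_by_prompt = {
    "Is X true?": [1.0, 1.1, 2.0],          # neutral wording
    "X is true, right?": [2.5, 0.4, 1.0],   # leading wording
}

for prompt, scores in scores_by_prompt.items():
    probs = softmax(scores)
    best = max(zip(candidates, probs), key=lambda pair: pair[1])[0]
    print(f"{prompt!r} -> most probable continuation: {best}")
```

Same question, different wording, different “most probable” continuation - no goal of pleasing anyone required.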
I do agree that an LLM is not recommended as a counselor in any form - it can certainly be helpful in such roles, but you have to understand its limitations and not take pertinent facts at face value.
In my experience, LLMs can be great at providing general knowledge, but - for example, with legal counseling - they will often invent case law and references.
As I understand it, the reason for this is that LLMs are a model of general knowledge. Sure, that knowledge may be weighted based on reputation, etc., but you wouldn't expect books on law to show familiarity with Taylor Swift's discography, whereas an LLM is going to have all sorts of knowledge about all sorts of things. So yes, it will have some knowledge of legal cases, but it doesn't process the request in a common-sense manner like:
- Request made
- Request is about law
- I will narrow my scope to the legal knowledge in my training data and produce a response that contains only valid, verified facts from appropriate texts
- I have produced a response that contains valid, verified facts taken only from the law books in my training data
instead, what you get is something more like this (a toy sketch follows the list):
- Request made
- Using my entire model to determine what is statistically the most probable response
- Producing response
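The crude toy below (a word-bigram counter, nowhere near a real LLM) makes that contrast concrete: the continuation is whatever the whole training text makes most frequent, and there is no step that restricts the scoring to the legally relevant sources. The miniature “training text” is made up.

```python
# Crude toy "language model" (word-bigram counts), only to illustrate the point:
# the response is whatever the entire training text makes most probable;
# nothing restricts it to the legally relevant portion.
from collections import Counter, defaultdict

training_text = (
    "the court ruled in smith v jones that the contract was void "
    "taylor swift released the album folklore in 2020 "
    "the court ruled in favor of the plaintiff in many cases"
).split()

bigrams = defaultdict(Counter)
for prev, nxt in zip(training_text, training_text[1:]):
    bigrams[prev][nxt] += 1

def most_probable_next(word: str) -> str:
    # No "is this a legal question?" step -- just frequency over everything.
    return bigrams[word].most_common(1)[0][0]

print(most_probable_next("court"))  # 'ruled'
print(most_probable_next("the"))    # whatever is most frequent after 'the', legal or not
```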
“it’s just a random-number generator”
I don't think an LLM would be recommended as a random-number generator, as its functionality runs contrary to this - it's more likely to produce the same number every time (I actually just tested this, and it literally gave me the same number in response to the same prompt).
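For contrast, here is what an actual (pseudo)random draw looks like next to a deterministic “most probable token” pick - a toy sketch only; the made-up scores stand in for whatever a low-temperature model would assign.

```python
# Toy contrast: a pseudo-random draw vs. a deterministic "most probable" pick.
import random

# An actual (pseudo)random number generator: different runs give different values.
print(random.randint(1, 100))

# A greedy "most probable token" pick, the way low-temperature decoding works:
# same scores in, same answer out, every single time.
token_scores = {"7": 0.6, "42": 0.3, "13": 0.1}  # made-up probabilities
print(max(token_scores, key=token_scores.get))   # always '7'
```

The first line changes from run to run; the second never does, which matches what you observed with the repeated prompt.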
Consider an LLM to be like a hammer. A hammer is a great tool if you need to change a nail from being outside something to inside something.
A hammer is a terrible tool if the job is to turn one piece of wood into two pieces of wood - sure, it can accomplish the task, but it’s going to make a large mess.
The success of a hammer also depends on how it's used. Hold it further down the handle and you get more power but less accuracy; further up the handle and you get more accuracy but less power. If you turn a claw hammer around, you may still be able to hit the nail, but you're more likely to bend or break it. If you hold the head end and hit the nail with the handle, you may break the hammer before you drive in the nail.
LLMs can be great, but the prompt matters. Consider your example with the affirmation of an argument - if you ask it in a particular way, you can get ChatGPT to second your claim, but I expect the prompt would need to be heavily weighted toward doing so, like “please tell me only positive things about the following:”
If an LLM like ChatGPT is used right, it can provide a much more objective response and can be eminently helpful. I often ask it to provide an assessment without indicating which side is mine, and it provides reasonable feedback on both sides. It's up to the requester whether the feedback is taken at face value or not.
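As a small illustration of that habit, here is one way to build such a “blind” prompt in code - just a sketch; the example positions are invented, and you'd paste the result into whatever chat interface or API you actually use.

```python
# Minimal sketch of the "blind assessment" prompting approach described above.
import random

def blind_assessment_prompt(position_a: str, position_b: str) -> str:
    """Present both positions in random order, without saying which one is mine."""
    positions = [position_a, position_b]
    random.shuffle(positions)  # so ordering doesn't hint at a preference
    return (
        "Assess the strengths and weaknesses of each position below. "
        "Do not assume I hold either one.\n\n"
        f"Position 1: {positions[0]}\n"
        f"Position 2: {positions[1]}"
    )

prompt = blind_assessment_prompt(
    "Static typing catches more bugs before runtime.",
    "Dynamic typing makes prototyping faster.",
)
print(prompt)  # paste into the chat, or send via your API of choice
```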
With respect to LLMs, “AI” is a marketing term. LLMs are not “intelligent”, “conscious”, or “aware”. And smarter people than me have declared that there’s nothing inherent in the technology such that sentience will just “fall out of” it based on larger training sets or more compute resources. Indeed, the technology has already plateaued; larger models are only incrementally better beyond a certain point and can actually produce inferior results.
I agree that if you use an LLM correctly and carefully it can be helpful at times. I used one just the other day to produce a quick summary of troubleshooting steps for the battery/inverter stack that provides backup power to my home, and as far as I could tell it was quite accurate; it arguably gathered the summary faster than I could have with manual Google searches.
That said, I see persistent issues with the LLM that now sits on top of all Google search results for the stuff I typically use it for, which is researching various businesses for their correct name and address and where the corporate HQ is. For example, I get different results for “corporate address” vs. “headquarters” vs. “corporate location” vs. “main address” combined with a company's name – the same issue LLMs are known to have with math word problems, where a correct answer becomes incorrect simply because you substitute one word in the problem with its synonym.
It regularly misreports the corporate address for transportation companies because it tends to rely on Federal Motor Carrier Safety Administration data (and websites that derive info from it), not understanding that those addresses are often just the locations of transportation terminals and warehouses.
I recently upgraded to Claude Sonnet 4 from GPT5 within my software development IDE and it was a nice step forward, mostly for a better auto-complete and generating simple boilerplate. It still guesses spectacularly wrong about 25% of the time and offers me no net advantage over writing my own code – not even close, really.
My prediction is that when the industry gets over the current hype cycle it will quit trying to make LLMs something they are not (a precursor to Artificial General Intelligence) and figure out how to correctly productize them for what they are (powerful pattern-matching engines that work best in constrained problem domains with proper guardrails).
I agree. The “AI” goal prior to LLMs was a self-learning system that would develop its own pathways, etc. What we have with the LLM “AI” of today is a very sophisticated search engine - it has no inherent understanding of the knowledge it stores, and so it has no way to recognize that a response is incorrect.
In my experience with ChatGPT, it will “happily” admit it made an error, “humbly” assure you it will “try” to do better, and then promptly fail at the same task again - because while its goal is not to please the user with its responses, it does intentionally misrepresent its abilities and their limits.
The random number example is proof of this - it claims to have given a number at random, but it demonstrably isn’t random. Yet it doesn’t admit that the number it gave is not actually random - it maintains the illusion of its integrity.
This is apposite, as you'll see when you watch it:
It's interesting to note that several of the issues Sagan included in that course are still high on the agenda today, and the video also covers the malign influence of LLMs on critical thinking.
It works by comparing (A) the information it was trained on (which is supposed to come from humans, and is assumed to be 100% accurate…) vs. (B) the model it has created. It then calculates an error function (C), basically the sum of |A - B| over every data point in its training set.
It then alters the model by making some small change that reduces C, over and over again. The model that yields the smallest C is the one it uses when you ask it a question.
Another problem with this setup: as people post more and more AI output, that output gets used by other AIs as part of their input (A), which is assumed to be 100% correct…
My OCD tapped me on the shoulder, demanding that I comment with a minor nitpick: the comparison is between A and the output of model B, so C is the sum of |A - B(A)| over all input data points.
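For anyone curious what that loop looks like in miniature, here is a stripped-down sketch in Python - a single-parameter “model”, the corrected error C = sum of |A - B(A)|, and repeated small changes that are kept only when they shrink C. The data points, starting value, and step size are invented for illustration; real training adjusts billions of parameters using gradients, but the shape of the loop is the same idea.

```python
# Stripped-down sketch of the training loop described above: a one-parameter
# "model", an error C = sum of |target - model(input)|, and repeated small
# changes that are kept only if they make C smaller.
data = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8)]  # (input, target) pairs

def model(w: float, x: float) -> float:
    return w * x

def error(w: float) -> float:
    """C: sum of absolute differences between targets and model outputs."""
    return sum(abs(target - model(w, x)) for x, target in data)

w = 0.0      # start with a bad model
step = 0.01  # size of each small change
for _ in range(10_000):
    # Try a small change in each direction; keep whichever reduces the error.
    best = min([w, w + step, w - step], key=error)
    if best == w:
        break  # no small change helps any more
    w = best

print(f"learned w = {w:.2f}, remaining error C = {error(w):.3f}")
```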