"Recognition Pattern"
GPTs Can Learn to Recognize Individual Instances, but Not Variants or Patterns of Classes; How Useful Can Such Systems Be?
Welcome back to the Lamb_OS Substack! As always, I am thankful to have my subscribers and readers stop by. If you are a regular reader of @Lamb_OS, then you know that you, the readers, are the only reason I do this. So as always, thank you for visiting!
Let me begin by introducing myself as Dr. William A. Lambos. I call myself a computational neuroscientist, and I’ve been involved with the AI field (one way or another) since about 1970. As for credentials, see the footnote below if you are interested.1 I write a lot about AI, but not exclusively. When addressing a topic, I take pains to ensure my perspective is highly informed. I draw from many areas of study, and my conclusions, beliefs, and predictions often fall outside current or mainstream thinking. But these same beliefs are grounded in 50 years of study and rigorous cross-training in multiple fields. See my previous screeds on this Substack to learn more, and judge for yourself.
Please subscribe today if you have not already done so. This Substack is free! So please subscribe for whatever reason might appeal to you, though I hope you do so for the value it offers.
As any of my regular readers know, I think generative pre-trained transformer models (GPTs) are, well, ridiculous. Each and every one of my previous posts on this Substack offers at least one good reason for my continuing disappointment, so I’m not going to reiterate them. This footnote will direct you further if you wish.2 Instead, I’m going to bring up some new limitations! These are the problems of Pattern Recognition and Salience Detection. I will unequivocally claim that neither of these features has, nor can have, a complete (or even meaningful) computational basis. These two abilities are each foundational to the agency-driven and adaptive behavior we recognize as “intelligence.”
Often taken for granted, both pattern recognition and salience detection enable even primitive species such as the goldfish to achieve seemingly impossible feats of prowess while navigating the world. As I will discuss below (and in the next post), the ability to classify various aspects of the environment(s) into categories, and to attribute conditional importance to those categories, appears impossibly complex. You see, the ability to instantly and seamlessly recognize things — especially as belonging to classes with varying importance (salience) — is a major part of intelligence. But to do it in real time, while the world changes, and while the body is moving and maintaining balance, is astonishing. But wait, there’s more! Of course, just as important is recognizing when things have reverted to how they were before they changed — sometimes change is temporary. To be able to revert to a former context without having to re-learn it is yet more awe-inspiring.
Most importantly, the adaptive fluency made possible by pattern recognition (and by salience detection) is routinely seen in every species of vertebrate animal, from goldfish to human beings. But it appears neither ability can be achieved or instantiated in computational models — which means that until this is overcome, there can be no generalized AI.
In this post, we will focus on pattern recognition. Next time we will learn about and analyze the foundational nature of salience. Lastly, we will finish the discussion by showing how these two remarkably primitive but powerful features of brains work together to provide a framework for creating and updating an intelligent system’s World Model.
Although it was not always so, today the term Pattern Recognition is familiar to nearly everyone. The eponymous novel by William Gibson, long a favorite among tech types, probably added to the — sorry — salience the phrase currently enjoys. And yet, the term “pattern” is so abstract as to be challenging to define. Let’s break the construct down so as to define it as it applies across multiple domains:
“Pattern,” as a concept, has been of interest since at least the ancient Greeks. The English word comes from Old French, where it derived from “patron,” meaning “something serving as a model” (in fact, artists’ subjects continue to be so named in French). But beginning with the rise of information theory in the 1940s, “pattern” has been used somewhat indiscriminately, and the field itself recognized the need for a definition that was not another tautology. The definition I find most appropriate is that offered by the theoretical physicist Satosi Watanabe in 1985: “A pattern is the opposite of chaos; it is an entity, vaguely defined, that could be given a name.” In other words, a pattern can be any entity of interest that one needs to recognize and identify. The central characteristics of a pattern are that it is stable (persistent or recurring, and therefore non-chaotic), and that it could be important enough that one would like to name it, to give it relevance.
“Recognition” is the classification of patterns based on their match, versus mismatch, with the shared properties that (at least vaguely) define them. When used informally, recognition is largely binary: “Do you recognize that person, yes or no?” From the point of view of a recognition system (including computational systems and all vertebrates), an entity is either familiar or it is novel. “Familiar” means the entity has previously been detected and is therefore “known” to the system. If it is novel, it is classified as never having been previously encountered, or unknown.
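To make that binary sense concrete, here is a toy sketch in Python; the feature vectors and the distance threshold are invented for illustration and stand in for whatever a real system might store:

```python
# A toy "familiar vs. novel" recognizer. The feature vectors and the distance
# threshold are invented for illustration; this is not a model of any real system.
import numpy as np

class BinaryRecognizer:
    def __init__(self, threshold: float = 0.5):
        self.known = []              # feature vectors of previously detected entities
        self.threshold = threshold

    def observe(self, features: np.ndarray) -> str:
        # "Familiar" means close enough to something already stored.
        if any(np.linalg.norm(features - k) < self.threshold for k in self.known):
            return "familiar"
        self.known.append(features)  # remember it for next time
        return "novel"

recognizer = BinaryRecognizer()
face = np.array([0.9, 0.1, 0.4])
print(recognizer.observe(face))                       # novel: first encounter
print(recognizer.observe(face + 0.01))                # familiar: near-identical re-encounter
print(recognizer.observe(np.array([0.1, 0.8, 0.2])))  # novel: a different entity
```

Note what this toy cannot do: a known entity that shows up transformed (rotated, rescaled, older, half covered by a bedsheet) simply falls outside the threshold and registers as “novel” again. That limitation is the subject of the rest of this post.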
Binary pattern recognition, however, is insufficient for useful classifiers. What we generally want from data classification schemes (whether algorithms, models, or biological systems) is not binary. Nor is it a softmax, in which outputs are a limited set of independent bins (like the handwritten digits 0 to 9). Rather, adaptive pattern matching is the ability to identify an individual item as being an exemplar of, or belonging to, some “container class” of other like items among an infinite number of possible container classes.
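To make the contrast concrete, here is a minimal sketch of the softmax-style output just described; the ten bins and the logit values are invented for illustration:

```python
# A minimal softmax classifier head over a fixed set of ten bins (the digits 0-9).
# The logits are made-up numbers; the point is that the output distribution is
# confined to the pre-declared classes and always sums to 1.
import numpy as np

def softmax(logits: np.ndarray) -> np.ndarray:
    exp = np.exp(logits - logits.max())   # subtract the max for numerical stability
    return exp / exp.sum()

classes = list(range(10))                 # the only answers this head can ever give
logits = np.array([0.2, 1.1, -0.3, 4.0, 0.0, 0.5, -1.2, 0.1, 2.2, 0.3])

probs = softmax(logits)
print("predicted bin:", classes[int(probs.argmax())])   # 3
print("probabilities sum to:", probs.sum())             # 1.0: no room for "none of these"
```

However confident or uncertain the model is, every bit of probability mass is spread over those ten predeclared bins; there is no way to answer “something I have never seen before,” let alone to invent a new container class on the fly.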
Let’s put this all together. Say you have a pet cat named ‘Katrina’. You know that Katrina belongs to the container class ‘house cats’. You recognize Katrina as being both a house cat and your unique cat whenever — and wherever — you might see her. Moreover, the very first time you saw Katrina as a kitten, you knew without thinking that she was a cat, even though you had never seen her before. Finally, you can identify (“recognize”) Katrina when she is close or far, whether in snow or sitting on the multicolored pool tiles. You know it’s her when she is upside down, running at full speed, or half covered by the bedsheet. And as Katrina grows into a lovely and warm adult, you continue to know it’s her — by sight, sound, or any other sensory modality.
We are now ready to address the problem of pattern recognition in computational systems. Think about your ability to recognize Katrina in so many different contexts, distances, body positions, rotations, sizes, ages, and dozens of other variants in presentation and behavior. If a pattern is supposed to have an identity which distinguishes it from other patterns, then how is it recognized by the classifier?
This is a foundational question for the subject matter of intelligence. The mechanisms that allow for so-called “recognition invariance” continue to confound neuroscientists and engineers alike; the truth is they remain elusive. This issue, along with many others related to the origins and nature of cognition and emotions, is elegantly addressed in the book A Brief History of Intelligence, by the serial AI company founder Max Bennett. I strongly recommend it to anyone interested in the evolution of adaptive behavior and/or AI system architectures and algorithms.
Yet the fact remains that to this day, we do not know how a goldfish (or any other vertebrate) can instantly recognize wide variants and transformations of visual patterns years after originally viewing them (including patterns never seen in nature and completely contrived). This functionality is also necessary in any nontrivial recognition system, including AI systems. Unfortunately, computational automata such as GPTs have no inherent pattern recognition abilities beyond those afforded by convolutional neural networks. And CNNs, beyond the limited translation tolerance built into their architecture, cannot deal with pattern invariance.
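To see the invariance problem in miniature, here is a sketch assuming PyTorch; the convolutional layer is untrained and the “image” is random noise, so the numbers mean nothing beyond the comparison itself:

```python
# A sketch (assuming PyTorch) of why a plain convolutional layer is not
# rotation-invariant. The layer is untrained and the "image" is random noise,
# so the numbers are illustrative only.
import torch
import torch.nn as nn

torch.manual_seed(0)

conv = nn.Conv2d(in_channels=1, out_channels=8, kernel_size=3, padding=1)
pool = nn.AdaptiveAvgPool2d(1)                   # global average pool to an 8-d feature vector

image = torch.rand(1, 1, 32, 32)                 # stand-in for a picture of "Katrina"
rotated = torch.rot90(image, k=1, dims=(2, 3))   # the same picture, rotated 90 degrees

with torch.no_grad():
    feat_original = pool(conv(image)).flatten()
    feat_rotated = pool(conv(rotated)).flatten()

# If the representation were rotation-invariant, these two vectors would match.
print("max feature difference:", (feat_original - feat_rotated).abs().max().item())
```

Nothing in the convolution maps a rotated view of Katrina to the same internal representation as the original view. In practice this gap is papered over with data augmentation, which is exactly the brute-force route discussed below.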
In “AI World”, there are actually no classes and no patterns. There are only exemplars, defined in terms of the elements making up the modality. So a picture of Katrina in the input stream would be converted to some matrix of pixel values. That matrix is about all the system can learn about Katrina. If Katrina needs to be identified or recognized in a different position, then another picture is needed.
Therefore, individual items in the input streams of AI systems, including GPTs and convolutional neural networks, cannot be identified unless every possible instance of the pattern was included in the training set. So for a GPT to recognize Katrina in all the instances described above, we would need 1,000,000 or more pictures of cats like her in the training set. And the same for every other class of, well, everything.
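Where does a figure like that come from? Here is the back-of-the-envelope arithmetic; the variation factors and their counts are my own illustrative assumptions, not measurements:

```python
# Back-of-the-envelope combinatorics: how many exemplars would brute-force
# coverage of one cat's presentations require? The factor names and counts
# below are illustrative assumptions, not measurements.
from math import prod

variation_factors = {
    "viewpoint / rotation": 24,
    "distance / scale": 10,
    "pose (sitting, running, upside down, ...)": 15,
    "lighting": 8,
    "occlusion (bedsheet, furniture, ...)": 6,
    "background (snow, pool tiles, ...)": 12,
    "age / size": 5,
}

total = prod(variation_factors.values())
print(f"Distinct exemplars needed for one cat: {total:,}")   # 10,368,000
```

Even with deliberately modest counts per factor, the total lands north of ten million exemplars for one animal, before we consider every other cat, let alone every other class of object.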
Does anyone need to wonder whether this approach could lead to any form of generalized intelligence? No, they do not, for it cannot.
As I write this, I am watching the Wimbledon Gentlemen’s Semi-final matches on my other monitor. I am reminded that when I was 10 years old (the mid-1960s), tennis balls were white. Show me a computer vision system based on neural networks that can recognize a white tennis ball. You can’t.
That’s it for this post. As discussed above, next time I plan on addressing the second part of the pattern recognition dilemma: pattern importance, or salience. If, after reading this post, you can see how pattern invariance poses a very big problem for AI architectures such as GPTs, just wait. I hope to demonstrate that the salience problem, in my (sometimes not so) humble opinion, leads inexorably to giving up on all machine learning approaches to AGI.
And just in case you are new to my Substack, I’ve been saying the same thing for over a year now.
I hold a postdoctoral certification in clinical neuropsychology and a license to practice in California and Florida. I’ve been coding since mainframes were the only accessible computers and LISP was the ‘lingua franca’ of AI (ca. 1970–81), but when the Zilog microprocessors appeared (anyone remember the Z-80?), I learned to code in machine language (‘assembler code’). Finally, I hold master’s degrees — one quite recent — in computation and data science.
For convenience’s sake, I will describe some of the most salient problems with GPTs. These problems begin with the architecture of neural networks and transformers that gives rise to the Stochastic Parrot problem, wherein it has been proved that LLMs do not generate language by understanding the meaning, or even the full syntax, of human language; this cannot be fixed in either practice or principle. And that is only scratching the surface of the issues: GPTs of every modality routinely generate egregious output errors of fact, logic, math, and inference. Abilities we take for granted — most of which we are unaware of — such as associative and temporal learning, pattern recognition, higher-level learning, executive functioning, and creativity appear to require both DNA and a cellular substrate. The GenAI business model is already showing the hallmarks of “irrational exuberance,” and every reputable financial or investing news source is saying the same, now, during the summer of 2024. I believe it is safe to say the profit model is irretrievably broken and likely unfixable at this point. Of course, GPTs require absurd amounts of power not just to train but to query (“prompt”). There are dozens of other reasons to scrap GPTs and GenAI; the list in this footnote merely scratches the surface. See my many other posts on the subject for a comprehensive treatment (including references from scholarly and reputable sources).