Discussion about this post

Jacob Everist:

> So transformer models are efficient because... But not because they are “attending” to anything. They are not, because this would be impossible.

Why is "attending" impossible? You simply state this, but don't explain why. What do you mean by "attend"?

Also, as far as I know, transformer models use the word "attention" because they learn a contextual filter that selects only the relevant features in the sequence and uses those to build the context of the current token. In that sense the word "attention" fits: the model learns a computational definition of saliency for the current ML task.
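
To make that concrete, here is a minimal sketch of scaled dot-product attention in NumPy, which is roughly the "learned contextual filter" I mean. The function name, shapes, and the toy data are just illustrative, not anyone's actual implementation:

```python
import numpy as np

def scaled_dot_product_attention(queries, keys, values):
    """Sketch of the 'attention' mechanism: each token's query scores every
    position in the sequence, and the softmax weights act as a learned
    saliency filter over the value vectors.

    queries: (seq_len, d_k), keys: (seq_len, d_k), values: (seq_len, d_v)
    """
    d_k = queries.shape[-1]
    # Similarity of each token's query to every key in the sequence.
    scores = queries @ keys.T / np.sqrt(d_k)
    # Softmax turns scores into weights: high weight = "relevant" feature.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # The context for each token is a weighted mix of the value vectors.
    return weights @ values

# Toy usage: 4 tokens, 8-dimensional projections.
rng = np.random.default_rng(0)
q = rng.standard_normal((4, 8))
k = rng.standard_normal((4, 8))
v = rng.standard_normal((4, 8))
context = scaled_dot_product_attention(q, k, v)  # shape (4, 8)
```

The "attention" is just those softmax weights: a per-token distribution over the sequence that the model learns so that the weighted mix is useful for its training objective, which is the sense of "saliency" I'm pointing at.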

> Biological creatures, of course, do attend selectively, by making use of focus: ignoring inputs with lower salience in order to devote additional resources to the most relevant aspects of the input stream.

I'm with you so far.

> The terms ‘salience’, ‘relevant’, and even ‘resources’ imply intentionality and agency.

Now you lost me. Is there a psychological definition of these terms that leads to these implications? Perhaps it would be helpful if you defined all five of these terms in the way you are using them. There is a significant clash with their use in the computer science literature.

> None of these have been shown to be derivable via computation.

I know of many such computational derivations, though I may be interpreting your words through my AI background. Again, it would be helpful if you defined these terms in the way you are using them.

I do a lot of cross-disciplinary reading, and it's always frustrating when words carry overloaded meanings across disciplines, especially when those meanings are similar but have subtly different connotations that only become clear with deep study of each field.
