Monkey See, Monkey Do

Posted on February 11, 2023
Tags: AI, ChatGPT, machine learning

Tim Lavoie
[Image: Zach Galifianakis math meme]

Limitations in Technology, and Where We See Magic

tl;dr: Cool and thought-provoking, but don’t look behind the mirror. Or maybe do.

A family member asked me about ChatGPT today, and I have to say, it was an interesting discussion. I think it highlights how many people see this miraculous-looking technology, and where misunderstandings arise about what it is actually doing.

He had been trying some things out with it, and wondered if it could be used to effectively generate a multiplayer game. That is, could one ask ChatGPT a series of questions, and have it produce usable source code that could then be compiled? The conversation wandered into the territory of which language you might use, which compiler, and so on. At the core of it, though, I think it makes sense to get back to the initial question.

That these well-publicized systems have shown interesting capabilities cannot be denied. For example, the image generators take text prompts provided by people, and seem to “get” the style of painting they are asked for.

ChatGPT is more text-based, so you might ask it how to do something, or to tell a story in a given author’s style. Yes, it will even produce program source code snippets on request.

To evaluate the abilities of these systems, it is important to have some idea of what they do, and what is left of their training data at the end.

For a very simple example, picture taking the collected works of Shakespeare, and then:

  • Read each sentence, and break it up into a sequence of words.
  • For each sequence, note the combinations of words that appear, say as triplets. For any consecutive pair of words, you can determine some sort of probability for which word may follow.
  • Take an initial starting word or perhaps pair of words, and then just generate the next word on the fly. Repeat as desired.

Some of what comes out will look reasonable-ish, if you squint, but it will be fairly clear that the generated results do not quite make sense. The more that comes out, the more you will notice repeated fragments of the original. This probabilistic data set will not even include the full sentences that went into it, never mind full plays, their plot lines and distinct characters. In other words, information has been lost, and what we have left is a much-reduced subset of what went into it. Still though, it can be a fun parlour trick. (A rough sketch of what such a generator might look like follows.)
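
Here is a minimal sketch of that idea in Python, assuming you have a plain-text corpus on hand (the filename shakespeare.txt is just a stand-in for whatever you feed it):

    import random
    from collections import defaultdict

    # Build a table mapping each consecutive pair of words to the words
    # that have been observed following that pair.
    def build_model(text):
        words = text.split()
        model = defaultdict(list)
        for w1, w2, w3 in zip(words, words[1:], words[2:]):
            model[(w1, w2)].append(w3)
        return model

    # From a starting pair, repeatedly pick a plausible next word.
    # random.choice over a list that keeps repeats is frequency-weighted.
    def generate(model, seed, length=50):
        w1, w2 = seed
        out = [w1, w2]
        for _ in range(length):
            followers = model.get((w1, w2))
            if not followers:
                break  # this pair only ever appeared at the end of the text
            w3 = random.choice(followers)
            out.append(w3)
            w1, w2 = w2, w3
        return " ".join(out)

    # shakespeare.txt is a hypothetical corpus file.
    with open("shakespeare.txt") as f:
        model = build_model(f.read())
    print(generate(model, ("to", "be")))

Run it a few times and you will see exactly the behaviour described above: locally plausible, globally nonsensical, with familiar phrases echoing through.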

More modern language models will perform more sophisticated processing of the inputs, such as tagging words by what type of word they are, and recognizing certain sentence structures. Already, you can see how this might let you perform more interesting tasks than, “generate a random sentence.” One example is called “sentiment analysis”, where the idea is to analyze a piece of text to get an idea of whether the writer (or speaker, if transcribed from speech) is angry, happy, hostile, and so on.
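
As a toy illustration only (real sentiment analysis is learned from labelled data, not a hand-made word list), a crude lexicon-based scorer might look like this:

    # Toy lexicon-based sentiment scorer. The word lists are invented
    # for illustration; real systems learn these associations from data.
    POSITIVE = {"happy", "great", "love", "excellent", "wonderful"}
    NEGATIVE = {"angry", "terrible", "hate", "awful", "hostile"}

    def sentiment(text):
        words = [w.strip(".,!?") for w in text.lower().split()]
        score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
        return "positive" if score > 0 else "negative" if score < 0 else "neutral"

    print(sentiment("I love this, it is wonderful!"))  # positive
    print(sentiment("What an awful, hostile reply."))  # negative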

With the explosion of content on the internet, there is now far more input for training the models. This also means that even unsophisticated processing can look like more than it is, because repeats in the text are less obvious. More complex models will have more functionality, but as before, the model is still a tiny fraction of the input that was used to create it.

With the rise of sites such as GitHub and Stack Overflow, there is now a wealth of content around programming, including a huge variety of source code. GitHub themselves have a feature called GitHub Copilot, or as they call it, “Your AI pair programmer”. This is a feature where you can start sketching out your program, and it starts to fill in the blanks for you. Pretty neat, hey?

You might ask, where does this come from? Well, dear reader, it comes from other users of the GitHub source code repositories. All sorts of people put code up there, either to participate in open source projects, or to show something they’ve done. Copilot essentially draws on this wealth of source data, filling in blanks based on patterns recognized in other people’s code.

Does Copilot use all of the GitHub content on the fly? No, of course not, that is a massive amount of data. Again, it will be some sort of condensed form, recognizing patterns in what you are trying to do, and filling in from similar patterns in its model. Is it useful? Maybe! Should you use it as-is? Assuming the model works well, perhaps a better question is, how good is the input data?

People write a lot of code. Some is brilliant, and some is clearly terrible. Perhaps it is more clearly good or bad to an expert in the particular language, who knows its style preferences, best practices, and problematic usages to avoid. Does the Copilot model have all that? Probably not, as that too would require extensive, explicit training. From the sounds of things, it would appear that one should not use the provided suggestions as-is, without a decent understanding of what the generated code does. As people have said of computers since forever, “Garbage In, Garbage Out.”

Models such as ChatGPT have huge source inputs as training data, and as mentioned, more sophisticated processing of those inputs than I personally understand. In the end, I think it is still fair to categorize them as echoing pieces of their training data, based on pattern recognition in the prompts provided by eager people. I like to think of this as, “monkey see, monkey do.” It does not mean that the monkey understands the concepts, but it can do well when the prompt matches something it has seen in training.

Another aspect to consider is that we primates are also very good at seeing patterns. Perhaps too good, finding patterns wherever we expect them. That is, our belief in the responses of models such as ChatGPT likely comes from their match with the pattern-recognition machinery of our own brains, and we are eager to accept that which matches what we asked for. This is a problem, in that we may attribute more capability than is objectively there. Perhaps this should be called, “monkey sees monkey doing, because it expects to.”

A fun, recent example of ChatGPT came up a couple of days ago on Reddit. Someone decided to pair up Stockfish, a well-known chess engine, with ChatGPT in a chess game. The animated image of the game makes it look like ChatGPT is doing pretty well, until you notice that it doesn’t follow the rules.
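
The gap is easy to demonstrate in code: a chess engine only ever produces legal moves by construction, while a language model just emits plausible-looking notation. A few lines with the python-chess library (a real package; the moves below are made up for illustration) show the difference:

    import chess  # pip install python-chess

    board = chess.Board()
    board.push_san("e4")  # legal moves go through fine
    board.push_san("e5")

    # A language model might confidently emit "Ke4" here, marching the
    # king into the middle of the board. A rules engine rejects it.
    try:
        board.push_san("Ke4")
    except ValueError as err:
        print(f"Illegal move: {err}")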

Now, imagine this engine creating your source code for you, and hope it really does what you desire. To be fair, it isn’t designed to play chess, it’s likely just had something in its training data that allows it to fake it, and with confidence. Scammers have done well with little more.

I think the takeaway from all this is that tools such as these may take well-defined prompts, and return something that we can bash into proper meaning. Perhaps these responses should be treated as prompts for our own thinking: suggestions, examples to explore, but not something to take verbatim. After all, the full training data is no longer there in the model, either.

A useful approach may be just to learn more about how this sort of thing works, so that we are better able to judge how well it does, and why.

One book that I very much enjoyed was Blondie24: Playing at the Edge of AI. In it, David Fogel describes the process of creating a model trained on checkers game boards, to recognize the relative value of moves according to how the game turned out in the end. Trained against many, many games, his model’s board evaluator was able to play quite well indeed. It is a most readable book, and should not require programming expertise to appreciate.
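
For a flavour of the approach (though this is not Fogel’s actual method, which evolved neural networks through self-play), here is a toy evolutionary loop that tunes a linear evaluator on made-up positions:

    import random

    # Toy stand-in for the setup: each "position" is a vector of made-up
    # features, labelled +1 or -1 by a hidden rule, standing in for
    # "the game was eventually won or lost".
    HIDDEN = [0.5, -0.3, 0.8, 0.1]

    def make_position():
        feats = [random.uniform(-1, 1) for _ in range(4)]
        label = 1 if sum(h * f for h, f in zip(HIDDEN, feats)) > 0 else -1
        return feats, label

    POSITIONS = [make_position() for _ in range(200)]

    def fitness(weights):
        # How often does this evaluator's judgement agree with the outcome?
        def score(feats):
            return sum(w * f for w, f in zip(weights, feats))
        return sum((score(f) > 0) == (label > 0) for f, label in POSITIONS)

    # Crude evolutionary loop: mutate the evaluator, keep it if no worse.
    best = [random.uniform(-1, 1) for _ in range(4)]
    for _ in range(200):
        child = [w + random.gauss(0, 0.1) for w in best]
        if fitness(child) >= fitness(best):
            best = child

    print(f"Agreement with outcomes: {fitness(best)}/{len(POSITIONS)}")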

One lighter-weight way to understand, at a high level, what goes into machine learning models is to read about and experiment with bite-sized pieces of functionality, using existing libraries. The blog at PyImageSearch is quite approachable, using Python, a very accessible programming language. The examples in the posts show how to use Python to drive a framework called OpenCV, where the CV stands for computer vision.
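
For instance, a typical first exercise from that sort of tutorial, assuming the opencv-python package is installed and you have some image.jpg on hand, is only a few lines:

    import cv2  # the opencv-python package

    # Load an image, convert it to grayscale, and detect edges.
    image = cv2.imread("image.jpg")
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, 100, 200)  # lower/upper hysteresis thresholds
    cv2.imwrite("edges.jpg", edges)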

Also, Mark Watson has a series of books on LeanPub, where you can either buy a copy to keep, or read for free on-line. Much of his work has centred on natural language processing, using languages such as Common Lisp, Haskell and Clojure. You can find his books listed on LeanPub.