The Magic Behind ChatGPT
ChatGPT has opened the world to the possibilities of generative AI, but how does the technology work? Ikhlaq Sidhu explains how large language models work and details other use cases for this transformative technology.
© IE Insights.
Transcription
What is ChatGPT? Well, by now, a lot of people in the world have already experienced or used it, at least through the website. The key word in ChatGPT is actually the transformer part. It’s a technology that was developed at Google five or six years ago. And what OpenAI has effectively done is take the transformer, this key technology, and train it with all the text that comes from CNN, the text that’s in Wikipedia, the text that’s in different reputable places on the Internet.
You can give it a few words; it will take those three or four words and predict the fifth word, and it will write that on the screen for you. But it will also take that word, make it part of the input, and come up with the sixth word. And the next word, the seventh, will be based on all of the previous six words.
So it’s basically feeding the words back in, one at a time. And that is changing a little bit of the math, and effectively the memory, that this transformer has. So words that popped up at the beginning of that conversation are still there in the math and in the memory of the transformer while it’s predicting the next word, and the next.
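A minimal sketch of that feedback loop, assuming the open GPT-2 model from the Hugging Face transformers library as a stand-in for ChatGPT’s own, much larger model, which is not publicly available:

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

# Start with a few words...
tokens = tokenizer.encode("The key idea behind the transformer is", return_tensors="pt")

for _ in range(20):                          # ...and predict 20 more tokens, one at a time
    with torch.no_grad():
        logits = model(tokens).logits        # a score for every word in the vocabulary
    next_token = logits[0, -1].argmax()      # greedily pick the most likely next token
    # Feed the prediction back in, so it shapes every later prediction,
    # alongside the words that started the conversation.
    tokens = torch.cat([tokens, next_token.view(1, 1)], dim=1)

print(tokenizer.decode(tokens[0]))
```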
Part of the magic is in how it’s trained. It has to be trained in a way that it’s being told: if this is a press release, then what should that structure look like? So it will be shown many press releases. It’s told that this is a good version of an output and this is a not-so-good version of the output, and it learns from that.
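A rough sketch of the supervised part of that training, again assuming the open GPT-2 model and Hugging Face transformers tooling (the press-release text here is made up): the model reads real text, and its weights are nudged whenever its guess for the next word is wrong. The “good versus not-so-good output” comparison described above is a separate preference-tuning step that is not shown.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

# One training example: a (made-up) press release the model should learn to imitate.
batch = tokenizer("FOR IMMEDIATE RELEASE: Acme Corp today announced a new product line.",
                  return_tensors="pt")

# With labels=input_ids, the library scores every next-word prediction against
# the word that actually came next in the text.
outputs = model(**batch, labels=batch["input_ids"])

optimizer.zero_grad()
outputs.loss.backward()   # how far off were the predictions, and in which direction?
optimizer.step()          # adjust the weights so similar text is predicted better next time
```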
We haven’t yet seen the full potential of what the transformer can do, and the transformer doesn’t have to be trained only to give you back a presentation of language. It could also be trained to translate between languages, for example. You could put in a sequence of chemicals and train it to guess what other chemical might go into a similar process, or it could be trained to predict what gene sequence is likely to occur next to another gene sequence.
ChatGPT, transformers, and a lot of the things in this family of large language models are also built on another important technology, and in fact it’s a pretty amazing technology. Google developed it, you use it all the time, and you don’t realize it, because it’s built into the Google search engine and into all kinds of things that you’re doing.
That’s an algorithm called word2vec, short for word to vector. And the reason this is important is that algorithms like ChatGPT don’t actually take a word as input in its spelled-out form. They take numbers as input and they produce numbers as output. But the problem in that process is that you basically lose the meaning when you turn a word into a single number.
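A tiny illustration of that point, assuming the GPT-2 tokenizer from Hugging Face transformers (ChatGPT’s own tokenizer is not public, but the idea is the same): each word simply becomes an index into a vocabulary, and the index by itself says nothing about meaning.

```python
from transformers import GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")

# Each word is mapped to one or more arbitrary-looking integer IDs.
# Nothing about these numbers tells you that "boy" and "girl" are related in meaning.
print(tokenizer.encode("boy"))
print(tokenizer.encode("girl"))
```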
But what word2vec figured out how to do is turn the word not into one number but into a sequence of numbers, a vector. And that sequence of numbers actually retains meaning, in the sense that words that are similar in meaning have vectors that are relatively similar, or close to each other. If you take boy versus girl, how close or how different are those two vectors? It turns out that they have the same relationship as king versus queen, or man versus woman.
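A minimal sketch of that idea, assuming the gensim library and its downloadable copy of Google’s pretrained word2vec vectors (roughly 1.6 GB, so the download is the main cost):

```python
import gensim.downloader as api

# Pretrained word2vec vectors trained on Google News text (300 numbers per word).
vectors = api.load("word2vec-google-news-300")

# Words with similar meanings have vectors that point in similar directions.
print(vectors.similarity("boy", "girl"))     # high similarity
print(vectors.similarity("boy", "carrot"))   # much lower

# The offset from "man" to "woman" mirrors the offset from "king" to "queen":
# king - man + woman lands closest to queen.
print(vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=1))
```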
You type a search phrase into Google, and the reason those page hits are so accurate is that those numbers do such a good job of capturing the meaning that they make the matches just about perfect. Just about anyone who uses ChatGPT will be amazed by the quality of what comes out, and you can easily be fooled into thinking that this thing knows everything and can do everything. But that’s not fully true.
In fact, it’s far from true. It doesn’t actually apply logic to the things it is writing, and it doesn’t necessarily fact-check what it produces. The logic capabilities are not actually built into it; it’s much more about pattern recognition and the presentation layer. One way or the other, though, it’s working its way into the competitive structure, the competitive strategy, of every company.