When GenAI creates new text (or images, audio, or software code), it may seem like you're talking to a smart conversation partner. In reality, it is a statistical model that predicts which word (or image, sound, or piece of code) is most likely to follow the ones before it. Often this works well, sometimes it doesn't — more on that in the following chapters.
GenAI’s prediction process happens in four steps:
Pre-training: first, the model is fed a vast amount of data. From billions of sentences (or other data), it learns to recognize patterns. Over time, it picks up grammar, style, and structure.
Tokenization and vectorization: the model breaks the text into components (tokens), which are converted into numbers (vectors). This allows the model to compute with language.
Prediction via neural networks: the model uses a so-called transformer architecture to determine which parts of the input matter most for the next word (a mechanism known as attention). It repeatedly predicts the most likely next token, one at a time.
Fine-tuning with human feedback: after pre-training, the model is refined with the help of human trainers. They provide feedback on what is desirable, polite, or correct, and the model is adjusted accordingly.
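The steps above can be sketched with a deliberately tiny example. This is not how a real model works internally: here a simple whitespace split stands in for tokenization, integer ids stand in for vectors, and bigram counts stand in for the transformer's next-token prediction. All names and the mini-corpus are illustrative.

```python
from collections import Counter, defaultdict

# A toy "training corpus"; a real model is pre-trained on billions of sentences.
corpus = "the cat sat on the mat the cat ate the fish"

# Tokenization: split the text into tokens.
tokens = corpus.split()

# Vectorization (simplified): map each token to a number so the
# model can compute with language. Real models use long vectors,
# not single integers.
vocab = {word: i for i, word in enumerate(dict.fromkeys(tokens))}

# Prediction (greatly simplified): count which token follows which,
# then pick the most frequent follower. A real model uses a
# transformer neural network instead of raw counts.
next_counts = defaultdict(Counter)
for current, following in zip(tokens, tokens[1:]):
    next_counts[current][following] += 1

def predict_next(word):
    return next_counts[word].most_common(1)[0][0]

print(predict_next("the"))  # "cat" — it follows "the" most often in the corpus
```

Even this toy version shows the core idea: the model never "knows" what a cat is; it only learns which token is statistically most likely to come next.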
Would you like these AI concepts explained in a different way? Or is something still unclear? Then watch the video below (up to 4:00 min):