
How AI Actually Works, Term by Term

Thirteen words decode how AI really works, from token to scaling laws. No math degree required.

An exploded-view illustration of a glowing machine, its panels pulled apart to reveal labeled internal parts. Floating tags name pieces like token, embedding, attention, and inference, with thin connection lines tracing how text flows from one part to the next.
Pull the cover off and the black box turns into named parts you can actually reason about.

Start here

AI isn't a black box. It's a machine with named parts.

You've heard "artificial intelligence" in headlines, in meetings, and at dinner. It gets talked about like a single thing that just happens, which is exactly the wrong way to picture it. AI isn't magic. It's a stack of engineering choices, math, and data pipelines, and you can follow every one of them if you look at the gears underneath.

Knowing the vocabulary does more than make you sound sharp in a meeting. It hands you a working map of how these systems behave, where they shine, and where they fall apart. By the end of this you'll see how raw text becomes a response, how a model learns, and why the same system can be both powerful and frustratingly limited. Here are thirteen terms, grouped from the building blocks up to the frontier.

What goes in

The building blocks.

Before a model can think about anything, it has to turn your words into something it can work with. These four cover that first stretch, from raw text to a job it can act on.

Token
The smallest unit of text a model reads, processes, or generates. Think of a token as a Lego brick that comes in different sizes, where some are whole words and others are just the start or end of a longer one. The model can't read continuous letters, so it breaks your sentences into these chunks first. It matters because every token you send and every token it writes counts toward your cost, your speed, and how much the system can hold in memory at once.
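To make the Lego-brick picture concrete, here is a minimal sketch of a tokenizer that splits text by grabbing the longest piece it recognizes. The tiny `VOCAB` and the `tokenize` helper are made up for illustration. Real tokenizers learn vocabularies of tens of thousands of pieces and use more careful rules.

```python
# Toy vocabulary (hypothetical); real tokenizers learn theirs from data.
VOCAB = {"un", "break", "able", "the", "cat", " "}


def tokenize(text: str, vocab: set[str]) -> list[str]:
    """Greedy longest-match split; unknown characters become one-character tokens."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest possible piece first, then shrink.
        for j in range(len(text), i, -1):
            if text[i:j] in vocab:
                tokens.append(text[i:j])
                i = j
                break
        else:
            # Nothing in the vocabulary matched, so fall back to one character.
            tokens.append(text[i])
            i += 1
    return tokens


tokens = tokenize("the unbreakable cat", VOCAB)
print(tokens)       # ['the', ' ', 'un', 'break', 'able', ' ', 'cat']
print(len(tokens))  # 7
```

Three words turn into seven tokens here, which is why token counts, not word counts, are what drive cost and memory.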
Embedding
A numerical map that places words with similar meanings close together in a high-dimensional space. Picture a giant city map where neighborhoods stand for concepts, and the distance between two points tells you how related the ideas are. Because modern models shift each word's position based on the words around it, the bank in "river bank" lands near water and shorelines, while the bank in "bank account" lands near money. Embeddings give the model a mathematical grip on nuance and context without ever reading a dictionary.
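The "distance on a map" idea can be sketched with a few lines of arithmetic. The three-number vectors below are hand-picked for illustration, and they use one fixed vector per word, ignoring context. Real embeddings are learned and typically have hundreds or thousands of dimensions. Cosine similarity is one common way to measure closeness.

```python
import math

# Hand-picked vectors (hypothetical); real embeddings are learned from data.
EMBEDDINGS = {
    "river": [0.9, 0.1, 0.0],
    "stream": [0.8, 0.2, 0.1],
    "money": [0.0, 0.9, 0.3],
}


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Return 1.0 for vectors pointing the same way, near 0.0 for unrelated ones."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


near = cosine_similarity(EMBEDDINGS["river"], EMBEDDINGS["stream"])
far = cosine_similarity(EMBEDDINGS["river"], EMBEDDINGS["money"])
print(near > far)  # True
```

In this toy map, river and stream sit in the same neighborhood, while money lives across town.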
Context Window
The maximum number of tokens a model can take in at once, counting both the conversation so far and the reply it is writing. Imagine a whiteboard where a teacher can only fit so many lines before she has to erase the top to make room. The window sets the boundary for what the model can see and reference while you talk to it. Push past it and something has to give: many chat apps quietly drop or summarize the earliest details, which is where confused or contradictory answers come from.
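Here is a minimal sketch of the whiteboard being erased from the top. The `fit_to_window` helper is hypothetical, and dropping the oldest tokens is only one possible strategy. Real applications may summarize old messages or refuse the request instead.

```python
def fit_to_window(tokens: list[str], max_tokens: int) -> list[str]:
    """Keep only the most recent max_tokens tokens, dropping the oldest first."""
    if max_tokens <= 0:
        return []
    return tokens[-max_tokens:]


conversation = ["my", "name", "is", "Ada", ".", "what", "is", "my", "name", "?"]
visible = fit_to_window(conversation, max_tokens=6)
print(visible)           # ['.', 'what', 'is', 'my', 'name', '?']
print("Ada" in visible)  # False
```

The question survives but the answer to it has already been erased, so the model has nothing to go on.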
Prompt
The exact text you type to tell the model what to do with everything it knows. Think of it as a recipe card that tells a chef which ingredients to use, how to mix them, and what the finished dish should look like. The model reads it, matches it against its internal maps, and starts generating a response that fits. The clearer you frame the task, the context, and the constraints, the better the answer you get back.

Inside the model

How models learn.

Once the input is structured, the real work happens inside. These terms explain how a model is built, how it focuses, and how it picks up everything it knows.

Parameter
A trainable number that decides how strongly one piece of information pulls on another inside the network. Picture a massive mixing board with millions of sliders, or billions in the largest models, each one controlling how much a given word or pattern should affect the next step. The model nudges these sliders during learning until the balance is right. Parameters are where the model's knowledge actually lives, and the count of them helps set the ceiling on how complex a problem it can handle.
Attention
The mechanism that weighs how relevant each token is compared to all the others in the sequence. Imagine a librarian reading a stack of documents with a highlighter, marking the passages that answer your question and skipping the rest. Attention is what lets a model link a pronoun back to the right noun or catch a contradiction buried in a long paragraph. Without it, every word would look equally important and the structure would fall apart.
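For the curious, the highlighter can be sketched as scaled dot-product attention for a single token. The two-number vectors standing in for "the", "dog", and "it" are invented for illustration. In a real model they come from learned parameters, and this step runs for every token at once.

```python
import math


def softmax(scores: list[float]) -> list[float]:
    """Turn raw scores into positive weights that sum to 1."""
    largest = max(scores)
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Scaled dot-product attention for one query vector."""
    d = len(query)
    # Score each key by how well it lines up with the query.
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    # Blend the value vectors using those weights.
    width = len(values[0])
    output = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(width)]
    return weights, output


# Hypothetical vectors for the tokens "the", "dog", "it".
keys = [[0.1, 0.0], [1.0, 1.0], [0.2, 0.1]]
values = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
query = [1.0, 1.0]  # hypothetical query for the pronoun "it"

weights, output = attend(query, keys, values)
print(weights.index(max(weights)))  # 1, the position of "dog"
```

The biggest weight lands on "dog", which is the toy version of linking a pronoun back to the right noun.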
Training
The first phase, where the model adjusts its parameters by studying patterns across billions of documents. Think of a student cramming for one enormous exam, reading thousands of books and practice tests until grammar, facts, and reasoning sink in. The model keeps correcting its sliders based on how wrong each guess was until the predictions get sharp. Training builds the foundation for everything else, and the quality of the data decides what the model knows and how it thinks.
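The "correct the sliders based on how wrong each guess was" loop can be sketched with a single slider. This is an analogy, not how a language model is actually trained: the data, the one-parameter model, and the learning rate are all made up, and a real model adjusts a vast number of parameters while predicting text.

```python
# Toy data following y = 3 * x (hypothetical); the "model" must discover the 3.
data = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0)]

w = 0.0               # a single parameter, starting from a bad guess
learning_rate = 0.05  # how big each nudge is

for step in range(200):
    # Gradient of the mean squared error with respect to w:
    # it says which way, and how hard, to move the slider.
    grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
    w -= learning_rate * grad

print(round(w, 3))  # 3.0
```

Nobody told the loop the answer was 3. It got there by guessing, measuring the error, and nudging, which is the same rhythm real training follows at enormous scale.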
Fine-tuning
Training a model that's already learned the basics on a smaller, targeted dataset to specialize it. Picture a general practitioner doing a focused residency to become a surgeon, keeping the broad medical knowledge but drilling specific procedures. Fine-tuning shifts the parameters so the model follows instructions better, adopts a tone, or masters a niche without throwing away its original education. It's how you turn a broad knowledge base into a tool that fits your exact job.

AI isn't magic. It's a machine, and once you can name the parts, you stop guessing why it fails and start knowing how to fix it.

At runtime

How they run in the real world.

A trained model still has to meet the real world, where memory, cost, and stale knowledge all push back. These three are the tricks that make a model practical to actually run.

Inference
The phase where the trained model takes your prompt and generates a response in real time. Think of the doctor's actual consultation after years of school, applying what they know to your symptoms right now. Inference is where the ongoing cost lives, because it runs every single time anyone uses the system. The speed, accuracy, and price of your experience all come down to how well this step is handled.
Quantization
A technique that lowers the numerical precision of a model's weights to save memory and run faster. Imagine compressing a high-resolution photo into a smaller file so it loads quickly while still looking clear enough to recognize. Quantization stores each parameter in fewer bits, which also makes the math lighter, so a model can run on cheaper hardware or serve more people at once. It's what lets powerful models run on a laptop instead of only in a data center, though push it too far and the accuracy starts to blur.
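As a rough sketch of the photo-compression idea, the snippet below squeezes a few weights into whole numbers between -127 and 127 and then expands them again. The weights and helper names are invented, and this symmetric 8-bit scheme is just one simple variant. Production tools use more refined methods.

```python
def quantize_int8(weights: list[float]) -> tuple[list[int], float]:
    """Symmetric quantization: map floats to integers in [-127, 127] with one scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [round(w / scale) for w in weights]
    return quantized, scale


def dequantize(quantized: list[int], scale: float) -> list[float]:
    """Recover approximate floats from the stored integers."""
    return [q * scale for q in quantized]


weights = [0.42, -1.27, 0.054, 0.9]  # hypothetical model weights
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

print(q)  # [42, -127, 5, 90]
# The largest gap between an original weight and its restored value.
print(max(abs(a - b) for a, b in zip(weights, restored)))
```

The weight 0.054 comes back as roughly 0.05. That small loss is the "blur": tolerable at 8 bits, more noticeable as you squeeze harder.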
RAG
Short for retrieval-augmented generation, a method that pulls outside information into the context window before the model answers. Picture a researcher checking a reference book or a database before writing, so they can cite real sources instead of guessing. RAG injects fresh data into the prompt without retraining the model on it. As long as the retrieved material is good, it keeps answers grounded, cuts down on hallucinations, and lets you wire the model into your own private knowledge.
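The "check the reference book first" step can be sketched without any model at all. The documents, helper names, and prompt wording below are invented for illustration, and the retrieval here simply counts shared words. Real systems usually compare embeddings instead, then send the finished prompt to a model.

```python
# A tiny private knowledge base (hypothetical).
DOCUMENTS = [
    "The refund window is 30 days from the delivery date.",
    "Support is available on weekdays from 9am to 5pm.",
    "Shipping to Canada takes five to seven business days.",
]


def words(text: str) -> set[str]:
    """Lowercase the text, strip basic punctuation, and split into a set of words."""
    return set(text.lower().replace("?", "").replace(".", "").split())


def retrieve(question: str, documents: list[str], k: int = 1) -> list[str]:
    """Return the k documents sharing the most words with the question."""
    q = words(question)
    ranked = sorted(documents, key=lambda d: len(q & words(d)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, documents: list[str]) -> str:
    """Place the retrieved text ahead of the question, ready to send to a model."""
    context = "\n".join(retrieve(question, documents))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


prompt = build_prompt("How many days is the refund window?", DOCUMENTS)
print(prompt)
```

The model never had to memorize the refund policy. It only has to read the passage that retrieval placed in front of it.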

What's next

Where it's all heading.

The last two terms are about where the field is going. The systems aren't just answering questions anymore. They're planning, acting, and growing against hard limits.

Agent
A system that uses a model to plan multi-step tasks, make decisions, and call outside tools to get work done. Think of a project manager who breaks a big goal into smaller jobs, hands them out, checks the results, and adjusts when something breaks. Agents go past single replies by chaining reasoning, tool use, and memory together. That's what turns a chatbot into something that can actually finish a task across several apps.
Scaling Laws
The predictable relationship between model size, training data, and compute, showing how performance climbs as you spend more. Picture a road trip where every extra tank of fuel still gets you farther, but each one adds fewer miles than the last. Scaling laws tell engineers how much compute buys a given jump in capability, and when it's smarter to improve the data than to just make the model bigger. They steer where the whole industry puts its money, and they explain why the largest models cost millions to train while much smaller ones can run for pennies.
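The diminishing-returns curve can be sketched as a power law, a shape commonly used to describe these relationships. Every constant below is invented for illustration and does not come from any real model. Only the shape of the curve matters.

```python
def predicted_loss(n_params: float, a: float = 10.0, alpha: float = 0.1,
                   floor: float = 1.5) -> float:
    """Power-law curve: error falls as the model grows but never drops below the floor.

    All constants are hypothetical and chosen only to show the shape.
    """
    return floor + a * n_params ** (-alpha)


sizes = [1e6, 1e7, 1e8, 1e9]  # each model is 10x larger than the last
losses = [predicted_loss(n) for n in sizes]
# Improvement bought by each 10x jump in size.
gains = [losses[i] - losses[i + 1] for i in range(len(losses) - 1)]

for n, loss in zip(sizes, losses):
    print(f"{n:.0e} parameters -> predicted loss {loss:.3f}")
print(gains[0] > gains[1] > gains[2])  # True
```

Every tenfold jump still helps, but each one buys less than the one before. That is the trade-off engineers weigh when deciding whether to grow the model or improve the data.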

Understanding AI's terms and the tools you use gets you better output. If you can talk the talk, your AI will walk the walk.

— Mandelson Fleurival

Putting it together

Seeing the machine clearly.

There's the full wiring diagram. Tokens break your words into bricks. Embeddings turn the bricks into coordinates. The context window sets the size of the whiteboard. Your prompt hands the model a job. Parameters are the internal dials, attention highlights what matters, training builds the foundation, and fine-tuning sharpens the tool. Inference runs the actual work, quantization shrinks the footprint, RAG feeds in fresh facts, agents orchestrate the plan, and scaling laws set the price of growth.

None of these pieces work alone. They form one pipeline where data flows in, gets reshaped by mathematical weights, and comes out as a response. Once you understand how each gear turns, you stop guessing why a model failed and start knowing how to fix it. You see the machine clearly, and that clarity is the thing that lets you use it responsibly.
