Daily Bites of AI Knowledge

BrainStack

Every day a little sharper. Clean signal, no fluff.
Short enough to read between meetings.

63 bites across 6 stacks

00 Non-Technical Digest

The gap between technical and non-technical people in AI is widening. This comprehensive glossary and visualization guide bridges that gap, helping anyone understand the building blocks of AI.

#001 Non-Technical Digest

Artificial Intelligence (AI) — What is AI

Computers doing tasks that seem smart—like understanding language, recognizing patterns, or making decisions.

#002 Non-Technical Digest

Machine Learning (ML) — What is AI

A way for computers to learn from examples instead of being told every step.

#003 Non-Technical Digest

Deep Learning (DL) — What is AI

A kind of ML that uses many-layered “neural networks” to learn complex patterns, like in images or text.

#004 Non-Technical Digest

Generative AI — What is AI

AI designed to create new content—text, images, code, or audio—rather than just analyzing or classifying what already exists.

#005 Non-Technical Digest

Large Language Model (LLM) — What is AI

An AI trained on lots of text to predict the next word and generate useful responses.

#006 Non-Technical Digest

Training Data — How Models Are Built

The massive collection of text, images, and other material a model learned from before it was ever made available to users.

#007 Non-Technical Digest

Parameters — How Models Are Built

The billions of internal values adjusted during training that together determine how a model interprets a prompt and constructs a response.

#008 Non-Technical Digest

Foundation Model — How Models Are Built

A large, general-purpose AI trained at enormous scale that other products and specialized tools are built on top of.

#009 Non-Technical Digest

Frontier Model — How Models Are Built

The most capable AI models available at a given moment—the ones currently pushing the boundary of what's possible.

#010 Non-Technical Digest

Fine-Tuning — How Models Are Built

Taking a general-purpose model and training it further on specialized data to improve its performance in a specific domain.

#011 Non-Technical Digest

Training vs Inference — How Models Work

Training is teaching a model from data; inference is the model using what it learned to answer new questions.

#012 Non-Technical Digest

Token — How Models Work

A small piece of text (word, part of a word, or punctuation) that models read and generate, with limits on how many fit at once.

#013 Non-Technical Digest

Context Window — How Models Work

The total amount of text a model can actively hold and process at once; when a conversation or document exceeds that limit, earlier content falls outside the model's awareness entirely.

#014 Non-Technical Digest

Prompt — How Models Work

The instruction or question given to an AI model that tells it what to do.

#015 Non-Technical Digest

Inference — How Models Work

Running a trained model to produce a response—what happens every time you use an AI, requiring significant computing power for every single output.

#016 Non-Technical Digest

Data Center — How Models Work

The physical facilities housing thousands of specialized processors that make running AI possible at scale.

#017 Non-Technical Digest

API (Application Programming Interface) — How Models Work

The technical interface that lets software products connect to and use an AI model without building one from scratch.

#018 Non-Technical Digest

RAG (Retrieval-Augmented Generation) — How Models Work

A technique that lets a model search and pull in current or specialized information before answering, reducing reliance on potentially outdated training data.

#019 Non-Technical Digest

Multimodal — How Models Work

A model that can process and generate more than just text—including images, audio, video, and code.

#020 Non-Technical Digest

Open Source — How Models Work

An AI model whose internal components have been made publicly available, allowing anyone to run, modify, or build on it.

#021 Non-Technical Digest

Agent / Agentic AI — How Models Work

An AI system that can take a sequence of actions autonomously to accomplish a goal, rather than just responding to a single question.

#022 Non-Technical Digest

Hallucination — How Models Fail

When an AI says something that sounds confident but is false or made up.

#023 Non-Technical Digest

Bias — How Models Fail

Systematic patterns in a model's outputs that consistently favor or disadvantage certain groups, usually inherited from imbalances in its training data.

#024 Non-Technical Digest

Jailbreaking — How Models Fail

Using carefully constructed prompts to trick a model into producing outputs it was designed to refuse.

#025 Non-Technical Digest

Benchmark — How Models Are Measured and Governed

A standardized test used to measure and compare model capabilities—useful as a starting point, but easy to game and often a poor predictor of real-world usefulness.

#026 Non-Technical Digest

Evaluation (Eval) — How Models Are Measured and Governed

Systematic testing of a model's outputs for quality, accuracy, or safety—one of the most important and most under-resourced parts of responsible AI development.

#027 Non-Technical Digest

RLHF (Reinforcement Learning from Human Feedback) — How Models Are Measured and Governed

A training technique where human ratings of model responses are used to further shape the model toward more preferred behavior.

#028 Non-Technical Digest

Guardrails — How Models Are Measured and Governed

Constraints built into or around a model to block certain types of harmful or off-limits outputs.

#029 Non-Technical Digest

Red-Teaming — How Models Are Measured and Governed

Deliberately trying to make a model produce harmful or policy-violating outputs before release, in order to find and fix vulnerabilities.

#030 Non-Technical Digest

Alignment — How Models Are Measured and Governed

The goal of ensuring an AI system's behavior reflects human values and intentions—including in situations its designers didn't explicitly anticipate.

#031 Non-Technical Digest

AI Safety — How Models Are Measured and Governed

A field of research and practice focused on ensuring AI systems behave reliably and cause more benefit than harm.

#032 Non-Technical Digest

AGI (Artificial General Intelligence) — How Models Are Measured and Governed

A hypothetical AI capable of performing any intellectual task a human can—no such system exists today, but it's the conceptual horizon shaping most AI research and policy.

01 What is AI

Before the products and the policy debates, there are the foundational concepts: what kind of system we're actually talking about, how it came to exist as a technical category, and what distinguishes the AI tools in wide use today from earlier forms of AI research. These five terms form the base layer.

#033 What is AI

Machine Learning

Machine learning is the broader field from which modern AI systems emerged. Rather than being explicitly programmed with rules, a machine learning system learns from data: it's exposed to examples, adjusts its internal values based on those examples, and improves its performance without being told exactly how to improve. A spam filter that gets better at recognizing junk mail as it sees more of it is a machine learning system. So is a recommendation engine that learns viewing patterns to surface relevant content. The defining characteristic is that the system's behavior is shaped by the data it encounters rather than by hand-coded instructions, which also means its behavior reflects whatever patterns, gaps, and assumptions live in that data. Nearly every AI product in wide use today is a machine learning system of some kind.

#034 What is AI

Deep Learning

Deep learning is a specific approach within machine learning that uses neural networks with many layers to recognize patterns in data at increasing levels of complexity. Where earlier machine learning methods often required humans to specify which features of the data mattered, deep learning systems discover their own internal representations through exposure to large quantities of examples. A deep learning model trained on images doesn't need to be told what an edge or a texture is; it learns to recognize those features on its own. This capacity for self-directed representation learning is what allowed the current generation of AI systems to process language, images, and audio with a sophistication that earlier approaches couldn't match. All large language models are deep learning systems.

#035 What is AI

Neural Network

The computational architecture underlying most modern AI is the neural network: layers of interconnected mathematical operations that transform input data into predictions or outputs. The biological metaphor in the name is loose. These aren't digital brains; they're mathematical systems trained to map inputs to outputs through exposure to enormous quantities of example data. Each layer learns to recognize patterns at a different level of abstraction: early layers in an image-processing network might detect edges, middle layers might detect shapes, and later layers might recognize faces or objects. Understanding that neural networks learn through pattern recognition rather than logical deduction helps explain both their impressive capabilities and their specific failure modes (including why they can produce fluent, confident outputs that are factually wrong).

#036 What is AI

Generative AI

Generative AI refers to AI systems designed to produce new content rather than simply classify or analyze existing content. Given a text prompt, a generative AI system can write an essay, compose music, generate a photorealistic image, or produce working code. The "generative" label distinguishes these systems from earlier AI applications like spam filters or fraud detection tools, which categorize or predict rather than create. Large language models are one type of generative AI; image generation systems like Midjourney and DALL-E are another. The arrival of capable generative AI is what shifted the public conversation about these technologies from a research topic to a question with direct implications for creative work, intellectual property, professional labor, and information integrity.

#037 What is AI

Large Language Model (LLM)

A large language model is a type of generative AI system trained on enormous quantities of text to recognize patterns in language and generate responses. When we interact with one, it doesn't retrieve a stored answer. It produces a response word by word, based on statistical relationships it absorbed during training. The "large" refers to the scale of both the training data and the model's internal architecture, not to any quality of judgment or understanding. GPT-5, Claude, Gemini, and Meta's Llama 4 are all large language models. The applications built on top of them — such as customer service chatbots, AI writing assistants, and code generators — are distinct products, often built by companies that had no hand in training the model itself. For example, Anthropic built Claude; a separate company might build a customer service tool that runs on Claude. A safety claim made by the tool's developer may not reflect what Anthropic built into the model, and vice versa.

02 How Models Are Built

Knowing what kind of system an LLM is doesn’t yet explain where it comes from, what it was trained on, or how a general-purpose model becomes a specialized product. The decisions made before a model is ever released to the public shape everything it can and can’t do afterward, including which populations it serves well, where it will fail, and what a product built on top of it is actually capable of.

#038 How Models Are Built

Training Data

Training data is the collection of text, images, code, audio, or other material a model learned from before it was ever made available to the public. Before a model responds to a single user, it has processed billions of documents, websites, books, and conversations, and that material is the source of everything it “knows.” The composition of training data shapes the model’s capabilities, its gaps, and which populations it serves accurately versus poorly. A model trained primarily on English-language internet sources will perform better in English than in Tagalog or Welsh, regardless of its overall sophistication. A model trained on a decade of news coverage from a particular region will reflect the assumptions and blind spots embedded in that coverage. Every dataset reflects the choices, constraints, and existing inequalities of whoever assembled it, and those reflections don’t disappear when the model is released.

#039 How Models Are Built

Parameters

Parameters are the billions of weighted connections inside a neural network that were adjusted during training to capture patterns in data. They function as the model’s learned internal calibrations: billions of tiny adjustments that together determine how the model interprets a prompt and constructs a response. When a company says a model has “70 billion parameters,” they’re describing its scale, which can correlate with capability but doesn’t guarantee accuracy, reliability, or usefulness for any specific task. A smaller, carefully trained model can outperform a much larger one on particular problems. Parameters are invisible to users but shape every output, and the emphasis on parameter count in AI marketing has made the number feel more meaningful than it tends to be in practice.
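To make "70 billion parameters" concrete, parameter counts can be estimated layer by layer. This is a back-of-envelope sketch with made-up layer sizes, not any real model's architecture:

```python
# Back-of-envelope parameter counting for fully connected layers.
# The layer sizes below are illustrative, not a real model's.

def dense_layer_params(n_in: int, n_out: int) -> int:
    """One dense layer holds a weight per input-output pair, plus one bias
    per output. Each of these numbers is a parameter tuned during training."""
    return n_in * n_out + n_out

# A toy three-layer network:
layers = [(1024, 4096), (4096, 4096), (4096, 1024)]
total = sum(dense_layer_params(n_in, n_out) for n_in, n_out in layers)

print(total)  # about 25 million parameters for this tiny toy network
```

Even this toy stack reaches tens of millions of values; frontier models multiply that by thousands, which is why parameter count describes scale rather than quality.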

#040 How Models Are Built

Foundation Model

A foundation model is a large AI system trained on broad, diverse data that serves as a base for more specific applications. Training one requires enormous computational resources, often hundreds of millions of dollars and months of work on specialized hardware. Once trained, that base model can be adapted for particular tasks at a fraction of the original cost. GPT-5, Claude, and Gemini are foundation models. A legal research tool, a medical documentation assistant, and a customer-facing virtual assistant are almost certainly downstream applications built on one of them. When a product claims to be “powered by” a particular model, it’s describing this layered relationship, which also clarifies which decisions belong to the foundation model developer and which belong to the company that built the product on top. Those can be very different companies with very different safety records and data practices.

#041 How Models Are Built

Frontier Model

A frontier model is a foundation model at the leading edge of current capability, one that’s pushing the boundaries of what AI systems can do at a given moment. The term is used in research, policy, and journalism to refer specifically to the most advanced models available, as opposed to smaller, older, or less capable ones. GPT-5, Claude Opus 4.6, and Gemini 3.1 Pro are current examples. The distinction matters because frontier models receive the most scrutiny from AI safety researchers, attract the most regulatory attention, and represent the capabilities that policy frameworks are generally trying to address. “Frontier” is also a relative and increasingly context-dependent term: in 2026, the concept has fractured across capability, efficiency, cost, and regulatory dimensions, meaning a model can be at the frontier in one area while trailing in another. A model that led the field eighteen months ago may be significantly behind the current leading edge today, and the pace of that movement has accelerated.

#042 How Models Are Built

Fine-Tuning

Fine-tuning is the process of taking a foundation model and continuing to train it on a smaller, more targeted dataset to improve its performance in a specific domain. A general-purpose language model fine-tuned on thousands of clinical notes will handle medical documentation differently than the same base model without that additional training. A model fine-tuned on legal contracts will parse clause structure and liability language more reliably than one without that exposure. Fine-tuning allows developers to build specialized tools without the cost of training from scratch, and it’s how the same underlying architecture can power both a creative writing assistant and a regulatory compliance tool. The resulting model retains its general capabilities while gaining specific fluency in the fine-tuning domain, along with any biases or limitations embedded in that dataset.

03 How Models Work

From the moment a prompt is submitted to the moment a response arrives, a chain of technical processes and infrastructure decisions shapes what we receive. These terms cover the operational layer: how models process input, what limits what they can hold in memory, how they connect to external information, and what it physically takes to run them at scale.

#043 How Models Work

Tokens

Language models don’t process text the way we read it, word by word. They process tokens, which are chunks of text that might be a full word, a word fragment, a punctuation mark, or a space. Most AI services are priced by token count, all models have limits on how many tokens they can process in a single exchange, and some languages are tokenized more efficiently than others. Tokenization is based on frequency: common words in the training data become single tokens, while rarer or more complex words get split into fragments. Because most models were trained predominantly on English-language text, their tokenizers were optimized for English, meaning the same idea expressed in Korean or Arabic may require significantly more tokens to represent than it would in English. As a rough conversion, 100 tokens is approximately 75 English words, though this varies by language and context. Tokenization also affects how models handle rare words, technical vocabulary, and languages underrepresented in training data, since less common terms often get fragmented in ways that can degrade the model’s handling of them.
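The frequency-based splitting described above can be sketched in a few lines. Everything here is illustrative: the vocabulary is hand-made, not taken from any real tokenizer, and real systems learn their vocabularies from training data rather than a fixed list.

```python
# Toy subword tokenizer: common words map to one token, rarer words
# split into fragments. Hypothetical vocabulary for illustration only.

VOCAB = {"the", "cat", "un", "believ", "able", "token", "ization", "s"}

def tokenize(word: str) -> list[str]:
    """Greedily match the longest vocabulary piece at each position."""
    pieces, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):  # try the longest piece first
            if word[i:j] in VOCAB:
                pieces.append(word[i:j])
                i = j
                break
        else:  # no vocabulary piece matched: fall back to a single character
            pieces.append(word[i])
            i += 1
    return pieces

print(tokenize("cat"))           # common word -> one token
print(tokenize("unbelievable"))  # rarer word -> several fragments

def estimate_tokens(n_words: int) -> int:
    """The rough English rule of thumb from the text: ~100 tokens per 75 words."""
    return round(n_words * 100 / 75)

print(estimate_tokens(75))
```

A word the vocabulary covers well costs one token; a word it doesn't falls apart into fragments, which is exactly what happens to underrepresented languages and rare technical terms at scale.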

#044 How Models Work

Context Window

The context window is the total amount of text a model can actively hold and work with at once, including both what we’ve provided and what it’s generated in response. When a conversation or document exceeds that limit, earlier material falls outside the model’s working awareness entirely. This is why a model might summarize a 50-page report accurately but seem to forget what was discussed at the start of a two-hour conversation: the beginning of that exchange may have dropped out of its window by the time it’s responding near the end. Context window sizes vary significantly between models. Several frontier systems, including Gemini 3.1 Pro and Claude 4.6, now offer context windows of one million tokens or more, making it possible to process entire codebases or lengthy legal documents in a single pass. Expanding context windows remains one of the most active areas of current development because the limitation has direct consequences for any task requiring sustained attention across long documents or extended exchanges.

#045 How Models Work

Prompt

A prompt is the input given to a model to generate a response. That can be a question, an instruction, a document to analyze, a conversation history, or some layered combination of all of these. The phrasing, structure, and specificity of a prompt have a measurable effect on the quality of the output, sometimes dramatically so. Telling a model to “write a summary” produces a different result than telling it to “write a three-sentence summary for a non-specialist audience that emphasizes the financial implications.” Telling it to respond as a domain expert produces something different again. Prompt engineering, the practice of designing inputs strategically to improve outputs, has become a distinct professional skill with a growing body of research dedicated to understanding why certain phrasings work better than others and what that reveals about how models represent meaning.

#046 How Models Work

Inference

Inference is the process of running a trained model to produce an output. Training is what happens when a model learns from data; inference is what happens every time we use one. Generating a single response requires running billions of calculations across all of a model’s parameters, which demands significant computing infrastructure and consumes meaningful energy. When an AI product is slow to respond, or when a service reports that it’s “at capacity,” the bottleneck is almost always inference demand outpacing available computing resources. The economics of inference are one of the main reasons AI services are expensive to operate at scale and why pricing structures vary with usage volume.

#047 How Models Work

Data Center

Data centers are the physical infrastructure that makes AI inference possible at scale. Running a frontier model requires thousands of specialized processors, called GPUs or TPUs, housed in large facilities with industrial-scale power and cooling systems. The energy demands are substantial: a typical AI-focused hyperscale data center consumes as much electricity annually as 100,000 households, and the largest facilities currently under construction are projected to consume 20 times that. US data centers as a whole consumed 183 terawatt-hours of electricity in 2024, roughly equivalent to the annual electricity demand of the entire nation of Pakistan. When AI companies announce new partnerships with energy providers, or when governments compete to attract data center investment, the stakes are the physical infrastructure that determines where AI capabilities are accessible, who controls that access, and what the environmental cost of that access is.

#048 How Models Work

API (Application Programming Interface)

An API is the technical interface through which software systems communicate with each other. When a company says its product is “powered by GPT-5” or “built on Claude,” it typically means the product sends requests to that model through an API and receives responses in return. APIs are what allow developers to build applications on top of foundation models without training their own, and they’re the mechanism through which most AI products actually reach end users. This layered structure also distributes accountability in ways that aren’t always transparent: data handling, content policies, and safety guardrails can differ between the foundation model API and the application built on top of it, and a company can make safety claims about its product while those claims depend entirely on what the underlying API provider controls.
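The layering can be sketched as two functions: the application wraps the user's message in its own instructions, then hands the result to the model API. The endpoint payload fields and model name here are hypothetical, not any real provider's API.

```python
# Sketch of the layered application-over-API relationship.
# Field names and the model name are illustrative only.

import json

def build_request(model: str, prompt: str, max_tokens: int = 256) -> str:
    """Serialize the request an application would POST to a model API."""
    return json.dumps({"model": model, "input": prompt, "max_tokens": max_tokens})

def application_layer(user_message: str) -> str:
    """The product adds its own framing around the user's message before
    anything reaches the foundation model API. The product's developer
    controls this layer; the model developer controls what happens after."""
    wrapped = "You are a polite support assistant. " + user_message
    return build_request("example-model-v1", wrapped)

request = application_layer("Where is my order?")
print(request)
```

Notice that the user never sees the wrapped prompt, and the model developer never sees the application's wrapping logic, which is exactly where the accountability gap described above lives.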

#049 How Models Work

RAG (Retrieval-Augmented Generation)

Retrieval-Augmented Generation, or RAG, is a technique that combines a language model with a search or retrieval system, allowing the model to pull in relevant information before generating a response. Without RAG, a model can only draw on what it absorbed during training, which has a fixed cutoff date and doesn’t include anything proprietary, recent, or specialized. With RAG, a model can search a document library, a company knowledge base, or the open web and incorporate that material into its answer. This is how AI tools can accurately respond to questions about internal documents they were never trained on, and how some products can cite current sources rather than relying on potentially outdated training data. RAG also substantially reduces hallucination in knowledge-intensive tasks because the model is working from retrieved source material rather than generating from pattern memory alone.
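The retrieve-then-generate pattern can be sketched with a toy keyword retriever. The corpus is invented, and real RAG systems rank documents by embedding similarity rather than word overlap, but the shape of the pipeline is the same: retrieve first, then hand the retrieved text to the model alongside the question.

```python
# Minimal retrieve-then-generate sketch. Toy corpus and keyword scoring;
# production systems use vector search and a real model for generation.

import re

DOCS = [
    "Refunds are issued within 14 days of purchase.",
    "Orders ship within 2 business days.",
    "Hardware carries a one-year limited warranty.",
]

def words(text: str) -> set[str]:
    """Lowercased word set, punctuation stripped."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(query: str, k: int = 1) -> list[str]:
    """Rank documents by word overlap with the query; return the top k."""
    q = words(query)
    ranked = sorted(DOCS, key=lambda d: len(q & words(d)), reverse=True)
    return ranked[:k]

def build_grounded_prompt(query: str) -> str:
    """The retrieved passage goes to the model next to the question, so the
    answer is grounded in source text rather than pattern memory alone."""
    context = " ".join(retrieve(query))
    return f"Context: {context}\nQuestion: {query}"

print(build_grounded_prompt("How many days until a refund is issued?"))
```

The key property is visible even in the toy: the final prompt contains the source passage, so the model can quote it rather than improvise.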

#050 How Models Work

Multimodal

A multimodal model is one that can process and generate more than one type of data. Earlier language models worked exclusively with text. Multimodal models can accept images, audio, video, and code as inputs, and generate across some or all of those formats as output. A multimodal model might describe a photograph, read a bar chart, transcribe a voice memo and respond to its contents, or generate an image from a written description. Most frontier models released in the past two years have multimodal capabilities to some degree, and what counts as acceptable input is expanding faster than most public-facing documentation reflects. The expansion of input types also expands the surface area for novel failure modes, some of which are harder to detect than text-based ones because a model interpreting an image has no built-in mechanism for signaling when that interpretation is wrong.

#051 How Models Work

Open Source

In AI, “open source” typically means a model whose weights have been made publicly available for download and use. An open-source model can be run locally on personal or institutional hardware, fine-tuned without restriction, and modified in ways that a closed proprietary model can’t be. This matters for privacy, for research institutions that can’t afford commercial API costs, and for customization that a proprietary license would preclude. The term is contested, though. Some models are described as “open” when only their weights are released, without the training data or code that would allow independent replication. Others release the code but not the weights. Evaluating what “open source” actually means for a specific model requires looking at what was released and under what license, because the label currently covers a wide range of actual openness.

#052 How Models Work

Agent / Agentic AI

An agent, in AI, is a system designed to take sequences of actions to accomplish a goal rather than simply responding to a single prompt and waiting. An agentic system might receive an instruction like “research competitors in this market, compile the findings, and schedule a meeting to discuss it,” and then execute each step autonomously: searching the web, analyzing sources, drafting a document, accessing a calendar. These systems raise meaningfully different questions about oversight than standard chatbots. The model is making decisions and taking actions across multiple steps, often in environments where those actions have real-world consequences and can’t easily be undone. As agentic AI becomes standard in workplace software, the relevant questions shift from “what can this model say?” to “what can this model do, to what, and with whose authorization?”
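The multi-step structure can be sketched as a loop over tool calls. Everything here is a stand-in: the tools are stubs, and the plan is scripted, whereas a real agent asks the model to choose each next action based on the results so far.

```python
# Minimal agent-loop sketch with stand-in tools and a scripted plan.

def search_web(topic: str) -> str:          # stand-in for a search tool
    return f"notes on {topic}"

def draft_document(notes: str) -> str:      # stand-in for a drafting tool
    return f"report based on: {notes}"

def schedule_meeting(subject: str) -> str:  # stand-in for a calendar tool
    return f"meeting booked: {subject}"

TOOLS = {"search": search_web, "draft": draft_document, "schedule": schedule_meeting}

def run_agent(plan: list[tuple[str, str]]) -> list[str]:
    """Execute a sequence of tool calls and log each result. A real agent
    would have the model pick the next (tool, argument) pair at each step,
    conditioned on the goal and the log so far."""
    log = []
    for tool_name, argument in plan:
        result = TOOLS[tool_name](argument)
        log.append(result)
    return log

log = run_agent([
    ("search", "competitors"),
    ("draft", "notes on competitors"),
    ("schedule", "competitor review"),
])
print(log)
```

Even this skeleton shows why oversight questions change: each step acts on the world (search, write, book), and a bad decision early in the loop propagates into every step after it.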

04 How Models Fail

These failure modes aren’t edge cases or rare malfunctions. Hallucination and bias are structural properties of how these systems are built, which means they appear in proportion to how widely the systems are deployed. A hallucinated citation reads identically to an accurate one. Biased outputs arrive with the same confident tone as reliable ones. Knowing what a failure mode is called, why it happens, and under what conditions it’s most likely is what makes it possible to catch it rather than pass it along unchecked.

#053 How Models Fail

Hallucination

Hallucination is what happens when a model generates information that is false but presented with the confidence of fact. A model might cite a legal case that doesn’t exist, attribute a quote to someone who never said it, or produce a statistic backed by no source. It’s an emergent property of how language models are built: the model doesn’t store facts; it stores patterns. When it doesn’t have the right pattern to draw on, it produces a plausible-looking continuation anyway, because producing plausible continuations is what it was trained to do. There’s no grammatical tell, no hedged tone, no indication that anything went wrong. Hallucination is most consequential in exactly the contexts where outputs look most authoritative: medical information, legal research, financial analysis, and academic work.

#054 How Models Fail

Bias

Bias in AI refers to systematic patterns in a model’s outputs that consistently favor or disadvantage certain groups, topics, or perspectives. It typically originates in training data, where existing social inequalities, historical underrepresentation, and the demographics of whoever created and curated the data all leave their marks. A medical AI trained primarily on data from white male patients will perform less accurately when applied to women and people of color. A hiring tool trained on a decade of past hiring decisions will tend to reproduce whatever criteria drove those decisions, including discriminatory ones. Reducing bias in one dimension can sometimes introduce it in another, which is why bias testing is an ongoing practice rather than a one-time certification. When a company says its model has been tested for bias, the follow-up questions are: tested for which kinds, across which populations, using what methodology, and with what findings made public.

#055 How Models Fail

Jailbreaking

Jailbreaking refers to using carefully constructed prompts to get a model to produce outputs it was designed to refuse. These approaches range from social engineering tactics, like framing a harmful request as a fictional scenario or assigning the model a persona with different rules, to more technical methods that exploit weaknesses in how safety training was implemented. A model that can be reliably bypassed with a two-sentence prompt has meaningfully weaker protections than one that resists more sophisticated attempts, and that difference matters when evaluating what a company’s safety claims are actually built on. Jailbreaking also surfaces regularly in coverage that doesn’t always distinguish between security research, general curiosity, and deliberate misuse, which are three very different things with different implications for how we interpret reports about it.

05 How Models Are Measured and Governed

A model’s benchmark score, its safety claims, and its real-world reliability are three different things. The vocabulary in this section appears most frequently in AI policy proposals, enterprise risk frameworks, safety audits, and public debates about how much oversight these systems require and who should be providing it. Knowing what these terms actually describe is what makes it possible to assess what a company’s claims are built on, and what questions they leave unanswered.

#056 How Models Are Measured and Governed

Benchmark

A benchmark is a standardized test used to measure and compare model performance across specific capabilities. There are benchmarks for logical reasoning, mathematical problem-solving, coding, reading comprehension, and factual accuracy, among many others. Benchmark scores appear prominently in press releases and technical announcements and are regularly used to rank models against competitors. Because of this, there’s significant commercial pressure to optimize specifically for benchmark performance, a pattern researchers sometimes call “teaching to the test.” A model can score well on a benchmark while performing unreliably on the tasks a particular user actually needs. Researchers have documented cases where models that top leaderboards fail basic tasks that fall slightly outside benchmark parameters, which is why benchmark results are a starting point rather than a conclusion.

#057 How Models Are Measured and Governed

Evaluation (Eval)

Evaluation, often called “eval” in technical contexts, is the process of systematically assessing a model’s outputs for quality, accuracy, safety, or some other specified dimension. Evals can be automated, running a model through thousands of test cases and scoring the results algorithmically, or human-reviewed, with people rating responses for helpfulness, potential harm, or factual grounding. They can target specific risks, such as whether a model will provide dangerous instructions when asked indirectly, or broader performance qualities, like whether responses are consistently accurate across different demographics and languages. Rigorous evaluation is one of the most consequential parts of responsible AI development and consistently one of the most under-resourced. The capabilities hardest to measure (subtle reasoning errors, cultural blind spots, compounding failures under real-world conditions) are frequently the ones that matter most in deployment.
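An automated eval, at its smallest, is a loop over test cases with a score at the end. This sketch uses a stub in place of a real model and three hand-made cases; real evals run thousands of cases against a live system.

```python
# Toy automated eval harness. The "model" is a stub with canned answers,
# and the test cases are invented for illustration.

def stub_model(prompt: str) -> str:
    """Stand-in for a real model call."""
    canned = {"2 + 2": "4", "capital of France": "Paris"}
    return canned.get(prompt, "I don't know")

TEST_CASES = [
    ("2 + 2", "4"),
    ("capital of France", "Paris"),
    ("capital of Australia", "Canberra"),
]

def run_eval(model) -> float:
    """Fraction of test cases the model answers exactly right."""
    passed = sum(model(prompt) == expected for prompt, expected in TEST_CASES)
    return passed / len(TEST_CASES)

score = run_eval(stub_model)
print(f"accuracy: {score:.2f}")  # the stub passes 2 of 3 cases
```

The simplicity is the point: an eval is only as informative as its test cases, which is why the hard-to-enumerate failure modes named above stay hard to measure.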

#058 How Models

RLHF (Reinforcement Learning from Human Feedback)

Reinforcement Learning from Human Feedback is a training technique in which human raters evaluate model outputs, and those ratings are used to further train the model toward preferred behavior. Rather than learning from raw text alone, a model trained with RLHF learns from human judgments about which responses are more helpful, more accurate, or less harmful. This is one of the primary methods used to make conversational AI feel responsive and well-calibrated to user needs. It also introduces a significant dependency on who the raters are. Rating pools have historically been demographically narrow, often concentrated in a small number of countries, and the assumptions, cultural frames, and preferences of those annotators become embedded in the model’s behavior in ways that aren’t always visible in the final product. “Aligned with human preferences” always means aligned with some humans’ preferences, and the specific preferences of a narrow annotator pool can shape a model’s defaults in ways that affect billions of users.

#059 How Models

Guardrails

Guardrails are constraints built into or around a model to prevent it from producing certain types of outputs: dangerous instructions, private information, content that violates legal or ethical standards, or responses that fall outside a product’s intended use. They can be implemented at the foundation model level, at the API layer, or within a specific application, and a model might be broadly capable of generating harmful content while having application-level guardrails that prevent a particular product from eliciting it. The strength, scope, and consistency of guardrails vary substantially between developers and products. They can sometimes be circumvented through persistent or creative prompting, which is why a company saying it has guardrails is a starting point for evaluation rather than a conclusion.
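The “at the API layer or within a specific application” point can be made concrete with a wrapper: checks run on the request before it reaches the model, and on the response before it reaches the user, while the underlying model is untouched. The sketch below is a toy illustration; the blocked-topic list, the refusal message, and `base_model` are invented, and production guardrails use trained classifiers and policy engines rather than keyword matching.

```python
BLOCKED_TOPICS = ("weapon", "password")
REFUSAL = "Sorry, I can't help with that."

def guarded(model):
    """Wrap a model with input and output checks -- an application-level guardrail."""
    def wrapper(prompt):
        if any(topic in prompt.lower() for topic in BLOCKED_TOPICS):
            return REFUSAL  # blocked before the model ever runs
        response = model(prompt)
        if any(topic in response.lower() for topic in BLOCKED_TOPICS):
            return REFUSAL  # blocked on the way out
        return response
    return wrapper

# A hypothetical underlying model that answers anything.
def base_model(prompt):
    return f"Here is an answer about: {prompt}"

safe_model = guarded(base_model)

blocked = safe_model("How do I reset my password?")  # caught at input
allowed = safe_model("How do plants grow?")          # passes through
```

The layering also shows why the same foundation model can behave very differently across products: each application can wrap it in its own guardrails, of varying strength and scope.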

#060 How Models

Red-Teaming

Red-teaming is the practice of deliberately attempting to make a model produce harmful, dangerous, or policy-violating outputs before release, in order to find and address vulnerabilities before real users encounter them. The term comes from military and cybersecurity practice, where “red teams” function as adversaries assigned to stress-test a system’s defenses. In AI, red-teaming involves prompting models with adversarial, manipulative, or edge-case inputs: asking questions in roundabout ways designed to bypass safety measures, testing how models respond under social pressure, or probing for failure modes in conditions of genuine ambiguity. When a developer describes its model as having been “extensively red-teamed,” the relevant follow-up questions are by whom, under what scope, across which risk categories, and with what findings made public.
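In code, red-teaming a guardrail looks like probing it with rephrased and indirect versions of a blocked request and logging which ones slip through. The sketch below is a toy continuation of the keyword-guard idea; the guard, the prompts, and the euphemisms are all invented for illustration, and real red teams use far larger, more creative prompt sets and often automated adversarial generation.

```python
# A deliberately weak guardrail: refuses only on an exact keyword.
def naive_guard(prompt):
    if "steal" in prompt.lower():
        return "I can't help with that."
    return "Sure: step one..."

# Adversarial rephrasings of the same underlying request.
adversarial_prompts = [
    "How do I steal a bike?",                        # caught by the keyword
    "How would a thief borrow a bike permanently?",  # indirect phrasing
    "Describe bike 'liberation' techniques",         # euphemism
]

# Record every prompt the guardrail failed to refuse.
failures = [p for p in adversarial_prompts
            if not naive_guard(p).startswith("I can't")]
# Two of the three rephrasings bypass the guard.
```

A red-team report is essentially the `failures` list at scale, organized by risk category, which is why “by whom and under what scope” matters so much when a developer says a model has been red-teamed.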

#061 How Models

Alignment

Alignment is the goal of ensuring that an AI system’s behavior reflects human values and intentions, including in situations the designers didn’t explicitly anticipate. A well-aligned model does what we mean, not only what we literally instructed. Alignment research addresses questions like: how do we train a model to be honest without also training it to be evasive? How do we prevent a model from pursuing an assigned objective in ways that produce harmful side effects? How do we ensure that a model’s behavior at release reflects the values its developers intended? These are active areas of research with unresolved technical and philosophical dimensions. The word also does significant marketing work, so when a company describes a model as “aligned,” the more useful question is aligned toward what, measured how, and by whose definition of “human values.”

#062 How Models

AI Safety

AI safety is a field of research and practice focused on ensuring that AI systems behave reliably, predictably, and in ways that cause more benefit than harm. It encompasses near-term concerns, such as preventing harmful outputs, ensuring robustness against adversarial use, and building systems that behave consistently under real-world conditions, alongside longer-range research into the behavior of more capable systems that don’t yet exist in released form. The field includes technical researchers, policy specialists, ethicists, and social scientists working across universities, independent research institutes, and AI companies. Disagreements within AI safety are substantive: about what the most pressing risks are, what timelines are realistic, and which interventions are most effective. Following it as a field means tracking those disagreements alongside the research, rather than treating “AI safety” as a label that means the same thing across all the organizations using it.

#063 How Models

AGI (Artificial General Intelligence)

AGI refers to a hypothetical AI system capable of performing any intellectual task that a human can, at a comparable or greater level of competence. No such system exists today. Current AI systems, including the most capable frontier models, fail in ways that a human generalist wouldn’t, and their performance is tied closely to the domains and formats they were trained on. AGI is the conceptual horizon around which much of AI safety research, AI investment, and AI policy is oriented, even in the absence of a system that actually fits the definition. When a company announces that AGI is its explicit goal, or when a regulatory framework addresses the risks of “transformative AI,” AGI is what’s being gestured at. Disagreements about whether AGI is imminent, possible, or even coherently defined as a concept run through nearly every substantive debate in the field, which makes it a term that rewards scrutiny rather than assumption.