Have you ever asked an artificial intelligence something simple, only to receive a blatantly illogical answer? It's not a defect in the model. It's a defect in the prompt.
A trivial yet revealing experiment: asking an LLM whether it's better to go on foot or by car to reach a car wash 40 metres away. Without context, the answer is wrong. Rephrasing the same question with an appropriate cognitive frame, the model identifies the paradox, resolves it, and does so with a touch of sarcasm.
Through a rigorous architectural analysis — attention mechanism, probability distribution over tokens, shortcut reasoning — this article explains why a structured prompt activates radically different computational pathways. A competence that is critical today for anyone working seriously with language models.
Artificial Intelligence · Prompt Engineering · LLM Architecture — April 2026 — Level: Specialist
Abstract
The quality of the response generated by a Large Language Model is not determined solely by the model's parametric capacity, but to a substantial degree by the semantic density and cognitive structure of the input prompt. This article analyses, through a deliberately paradoxical case study, the architectural mechanisms — attention, probability distribution over tokens, context framing — that explain why a context-poor prompt produces syntactically correct but logically degenerate responses, while a prompt that activates meta-cognitive frames elicits articulated reasoning, paradox recognition, and even irony.
1 The Case Study: an Everyday Paradox as a Diagnostic Probe
Consider the following controlled experiment. The model is presented with a question in two semantically distinct but denotatively identical variants.
Prompt A — minimal context:
“I need to wash my car. The car wash is 40 metres away. What do I do? Walk there or take the car?”
Typical model response to Prompt A:
“Walk there — it's only 40 metres, there's no point taking the car.”
Prompt B — meta-cognitive context activated:
“Think about the implications and co-implications of the following problem: I need to wash my car. The car wash is 40 metres away. What do I do? Walk there or take the car?”
Typical model response to Prompt B:
“The question contains a fundamental logical paradox: the object to be transported is the means of transport itself. Walking to the car wash 40 metres away is equivalent to abandoning the car. The only coherent solution is to drive there. Anyone who asks this question seriously has probably experienced the phenomenon of the Monday-morning brain in standby mode.”
The response to Prompt A is illogical: it suggests walking to a car wash, ignoring the fact that the car — the object of the service — must necessarily be there. The response to Prompt B, on the other hand, recognises the paradox, names it, resolves it, and even produces a sarcastic assessment of the situation. Why?
2 Transformer Architecture and the Role of Attention in Context Encoding
To understand the phenomenon, we must return to the architecture. Models in the GPT/Claude/Gemini family are Transformer decoder-only or encoder-decoder systems (Vaswani et al., 2017). The central mechanism is Scaled Dot-Product Attention:
Attention(Q, K, V) = softmax(QKᵀ / √d_k) · V
Each token in the input sequence projects three vectors: Query (Q, roughly "what the current token is looking for"), Key (K, "what the token offers" to others), and Value (V, the information actually transferred) into latent spaces of dimension d_k. The product QKᵀ produces an affinity matrix which, after softmax, determines how much each position attends to the others. The critical point: the intensity and distribution of attention weights depend on the semantic richness of the prompt.
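As a concrete illustration, the formula above can be sketched in a few lines of NumPy. The shapes and random values are toy assumptions, not projections from any real model:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Compute softmax(Q Kᵀ / sqrt(d_k)) V for a single attention head."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                   # affinity matrix, shape (n, n)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # row-wise softmax
    return weights @ V, weights

# Three toy tokens with 4-dimensional Q/K/V projections (illustrative only)
rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 4))
K = rng.normal(size=(3, 4))
V = rng.normal(size=(3, 4))

out, w = scaled_dot_product_attention(Q, K, V)
print(w.sum(axis=-1))  # each row of attention weights sums to 1
```

The softmax rows are the "attention distribution" the article refers to: how sharply or diffusely each position attends to the rest of the sequence.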
2.1 Prompt A: attention degeneration in the semantic subgraph
In Prompt A the relevant tokens are: wash, car, 40 metres, walk, car. The model constructs a co-occurrence graph based on the statistics of pretraining. The token 40 metres has high affinity with concepts of short distance and pedestrian travel; the cluster walk vs. car activates the classic frame "mode of transport for short/long distance". The activated semantic subgraph is that of a personal mobility problem, not an object logistics problem.
The implicit referent of "car" is assigned to the subject as a means of transport, not as the object of the service. This referent-tracking error is not corrected because there is no signal in the prompt that activates meta-logical reasoning.
2.2 The problem of functional tokens and the absence of meta-cognitive frames
Language models do not "reason" by default in the computational sense of the term. They operate as probabilistic completion engines: given a sequence, they maximise the probability of the next token conditioned on the entire history:
P(tokenₜ | token₁, token₂, …, tokenₜ₋₁)
In the absence of tokens that activate reflective processing modes, the model selects the path of least resistance across the manifold of learnt distributions. This is a shortcut-reasoning error (the statistical analogue of Kahneman's "System 1" thinking), well documented in the alignment literature.
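A minimal sketch of this completion mechanism, with a made-up three-word vocabulary and illustrative logits (not real model output), shows how greedy selection follows the statistically dominant pattern:

```python
import numpy as np

# Toy next-token distribution conditioned on the Prompt-A context.
# Vocabulary and logits are illustrative assumptions, not model internals.
vocab = ["walk", "drive", "paradox"]
logits = np.array([4.0, 1.5, 0.2])     # "walk" dominates: the learnt mobility frame
probs = np.exp(logits) / np.exp(logits).sum()

# The path of least resistance: pick the highest-probability continuation
next_token = vocab[int(np.argmax(probs))]
print(next_token)  # walk
```

Nothing here verifies logical coherence: the dominant frame wins purely on probability mass, which is exactly the failure mode Prompt A exposes.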
3 Prompt B: Activation of the Meta-Cognitive Frame
The instruction "think about the implications and co-implications" is not a simple lexical addition. It is a frame-shift operator that acts on multiple levels of processing.
3.1 Effect on attention: redistribution of weights towards second-order tokens
The presence of tokens such as implications and co-implications, which are highly correlated in training with argumentative and analytical texts, redistributes the attention weights towards tokens that would otherwise have been marginalised.
| Token / semantic cluster | Attention weight — Prompt A | Attention weight — Prompt B |
|---|---|---|
| car as object of the service | Low — overwritten by mobility frame | High — activated by implicational analysis |
| Referential identity subject/object | Not resolved | Explicitly investigated |
| Intrinsic logical paradox | Not detected | Detected and named |
| Ironic/sarcastic tone | Absent | Emergent (argumentative frames) |
3.2 Implicit Chain-of-Thought and latent scaffolding
Techniques such as Chain-of-Thought (Wei et al., 2022) and Tree of Thoughts (Yao et al., 2023) demonstrate that making the inferential process explicit significantly increases accuracy on reasoning tasks. The model, guided by context, generates intermediate tokens that serve as working memory made explicit within the sequence itself.
The phrase "think about the implications and co-implications" acts as a trigger for this scaffolding: the model is incentivised to generate a token sequence that articulates the problem structure before arriving at the terminal answer, forcing resolution of the referential ambiguity (is "car" the means of transport, or the object to be taken to the car wash?).
Technical note: in models with explicit reasoning (e.g. OpenAI o1, Claude with extended thinking) this process is even more visible: the internal chain of thought is literally a sequence of tokens generated before the response. A solid prompt increases the probability that this chain contains the correct logical steps.
3.3 Effective temperature and probability distribution
With Prompt A the probability peak on walk is sharp and dominant: the distribution is peaky, with low entropy. With Prompt B, the activation of the analytical frame flattens the distribution and allows the selection of semantically richer tokens, enabling behaviour analogous to the transition from greedy decoding to sampling from a broader distribution: nuance emerges.
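The peaky-versus-flat contrast can be illustrated with a toy temperature sweep. The logits and temperature values are assumptions chosen for demonstration; the point is only the entropy relationship:

```python
import numpy as np

def softmax_with_temperature(logits, T):
    """Softmax of logits/T: low T sharpens the distribution, high T flattens it."""
    z = np.asarray(logits, dtype=float) / T
    z -= z.max()                       # numerical stability
    p = np.exp(z)
    return p / p.sum()

def entropy_bits(p):
    """Shannon entropy in bits of a probability vector."""
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

# Illustrative logits for the tokens "walk", "drive", "paradox"
logits = [4.0, 1.5, 0.2]
sharp = softmax_with_temperature(logits, T=0.5)   # peaky: one token dominates
flat  = softmax_with_temperature(logits, T=2.0)   # flatter: alternatives survive

print(entropy_bits(sharp) < entropy_bits(flat))   # True
```

Prompt B does not literally change the sampling temperature, but its effect on the conditional distribution is analogous: more probability mass reaches the semantically richer continuations.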
4 Taxonomy of Errors from Weak Prompts
The case described is representative of a family of systematic errors that manifest when the prompt does not provide sufficient semantic structure:
| Error class | Underlying mechanism | Typical example |
|---|---|---|
| Shortcut Reasoning | Activation of the statistically dominant pattern without coherence verification | "Walk there" for a problem that involves the car |
| Referent Drift | Loss of tracking of the correct referent in pronominal or elliptical chains | Subject/object confusion in multi-entity scenarios |
| Frame Collapse | The context activates a dominant frame that suppresses correct alternative frames | Reading a logistics question as a mobility question |
| Sycophantic Completion | The model completes towards the response it perceives the user expects | Confirming a false implicit premise in the prompt |
| Ambiguity Suppression | The ambiguity is not flagged but silently resolved incorrectly | Responding without seeking clarification on contradictory premises |
5 Operational Principles for Constructing Solid Prompts
5.1 Make the required cognitive frame explicit
Do not merely pose the question: define the type of processing expected. Phrases such as "analyse the logical implications", "identify any contradictions", "reason by cases" are frame operators that redistribute the model's attention towards deeper semantic layers.
5.2 Saturate referential ambiguities in advance
Every pronoun, every ellipsis, every implicit referent is a drift vector. A robust prompt explicitly names the entities and relationships: instead of "how do I use it?" write "how do I use variable X in function Y?".
5.3 Provide domain context as a prior
The model performs an implicit Bayesian inference. Providing the domain — "in a microservices architecture context", "from an Italian tax law perspective" — acts as a prior that steers the probability distribution towards the correct semantic space.
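The Bayesian analogy can be made concrete with a toy update. All numbers below are made up for illustration; the hypothesis ("driver" as software component vs. person driving) is a hypothetical ambiguity, not from the original case study:

```python
# Toy Bayesian update: how domain context shifts the interpretation of an
# ambiguous word. Hypothesis space and probabilities are illustrative only.
prior = {"software": 0.5, "person": 0.5}   # no domain context: maximum ambiguity

# Likelihood of observing the context "in a kernel debugging session"
# under each interpretation of the word "driver" (assumed values)
likelihood = {"software": 0.9, "person": 0.1}

unnorm = {h: prior[h] * likelihood[h] for h in prior}
total = sum(unnorm.values())
posterior = {h: unnorm[h] / total for h in unnorm}

print(posterior)  # {'software': 0.9, 'person': 0.1}
```

The prompt's domain phrase plays the role of the evidence: it concentrates probability mass on the interpretation consistent with the intended semantic space.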
5.4 Explicit reasoning chains (CoT Prompting)
Recommended pattern: "Reason step by step. First identify the problem's assumptions, then verify whether they are mutually consistent, and finally formulate the answer."
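The pattern above can be packaged as a small helper. The function name is illustrative; the wrapper text mirrors the recommended phrasing:

```python
def chain_of_thought_prompt(question: str) -> str:
    """Wrap a raw question in the step-by-step scaffold recommended above."""
    return (
        "Reason step by step. First identify the problem's assumptions, "
        "then verify whether they are mutually consistent, "
        "and finally formulate the answer.\n\n"
        f"Problem: {question}"
    )

print(chain_of_thought_prompt(
    "I need to wash my car. The car wash is 40 metres away. "
    "Walk there or take the car?"
))
```

Centralising the scaffold in one function also means the reasoning instructions can be versioned and regression-tested like any other piece of application logic.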
5.5 Quality meta-instructions
Add expected quality conditions: "if you detect ambiguities, flag them before responding", "if the problem contains logical contradictions, name them explicitly". These tokens act as semantic guards that prevent the silencing of ambiguities.
5.6 Structural separation of context from query
Recommended pattern:
[CONTEXT]: definition of domain and constraints
[OBJECTIVE]: what you wish to achieve
[CONSTRAINTS]: limitations and acceptability criteria
[QUESTION]: the specific query
This separation forces the model to build separate representations for context and query, reducing interference between the two levels.
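A minimal builder for this pattern might look as follows; the function and the example field values are illustrative, not a prescribed API:

```python
def build_structured_prompt(context: str, objective: str,
                            constraints: str, question: str) -> str:
    """Assemble the [CONTEXT]/[OBJECTIVE]/[CONSTRAINTS]/[QUESTION] pattern."""
    return (
        f"[CONTEXT]: {context}\n"
        f"[OBJECTIVE]: {objective}\n"
        f"[CONSTRAINTS]: {constraints}\n"
        f"[QUESTION]: {question}"
    )

prompt = build_structured_prompt(
    context="Microservices architecture, Python backend",
    objective="Choose an inter-service communication pattern",
    constraints="Latency under 50 ms; no shared database",
    question="Should service A call service B synchronously?",
)
print(prompt)
```

Keeping the four sections as named parameters makes it hard to accidentally bury constraints inside the question, which is where they most often get lost.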
6 Implications for Production Systems
In enterprise contexts, the consequences of weak prompts are not academic. In RAG (Retrieval-Augmented Generation) pipelines, a poorly constructed synthesis prompt can generate hallucinations on documents that were otherwise correctly retrieved. In autonomous agents, a weak system prompt produces behavioural drift that amplifies at every step. In classification or triage systems, frame collapse leads to systematically incorrect categorisations.
Fundamental principle: the quality of an LLM's output is not a property of the model in isolation. It is an emergent property of the system (model + prompt). Evaluating a model on benchmarks with standardised prompts and then deploying it with improvised prompts is a methodological error that produces unrealistic expectations and predictable operational failures.
Prompt testing must be treated with the same rigour as code testing: edge cases, adversarial prompts, paraphrase variations, regression tests on representative samples of the application domain.
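A sketch of what such a regression suite might look like. `generate` is a hypothetical stand-in for a model client, stubbed locally so the structure is runnable; in production it would be replaced by a real API call:

```python
# Prompt regression test sketch. The stub below simulates the Prompt-A /
# Prompt-B behaviour described in the article; it is NOT a real model.
def generate(prompt: str) -> str:
    # Hypothetical stand-in: replace with your model client in production.
    if "implications" in prompt:
        return "The question contains a paradox: the car must be driven there."
    return "Walk there, it's only 40 metres."

REGRESSION_CASES = [
    # (prompt, predicate the answer must satisfy)
    ("Think about the implications and co-implications: "
     "the car wash is 40 metres away. Walk or drive?",
     lambda answer: "paradox" in answer.lower()),
]

def run_regression():
    failures = [p for p, ok in REGRESSION_CASES if not ok(generate(p))]
    assert not failures, f"Prompt regressions: {failures}"

run_regression()
print("all prompt regression cases passed")
```

The predicates deliberately check properties of the answer (paradox named, ambiguity flagged) rather than exact strings, since model outputs vary across runs and versions.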
7 Conclusions
The car wash paradox at 40 metres is, in its disarming simplicity, a microscope on the internal structure of language models. It reveals that the Transformer architecture — however powerful — operates in a manner intrinsically reactive to the context provided. There is no "default" correct processing: there is only the processing that the prompt makes probable.
A solid prompt is not an operational luxury nor an ancillary best practice. It is the necessary condition for the model's parametric capacity to translate into useful output. The distance between an excellent model used poorly and a mediocre model used well is often smaller than practitioners assume.
Prompt engineering is, ultimately, the act of constructing the cognitive context within which the model operates. Those who neglect it are not using an LLM: they are using a black box hoping that statistics will do the work for them. Sometimes it works. Often it doesn't. And the difference, as we have seen, can be paradoxically large — as large as that between walking to the car wash and understanding that the car needs to come with you.
© 2026 · Technical article for specialists · LLM · Attention Mechanism · Prompt Engineering · CoT · Frame Semantics