Theoretical Principle · Entity Engineering

The Data-Identity Principle

From findability to resolution. Because in the agent-mediated web it is not enough to be found: you must become the datum.

Paolo Galbiati {ProfPaul} — Infrastructure-First SEO Engineer

§ Contents

Outline of the principle

1. The thesis: The consumer of information changes nature
2. Definitions: Knowledge graph · resolution authority
3. The limit of self-declaration: Why the isolated neuron is not enough
4. Authority propagates: The relational model (GNN)
5. PageRank as a degenerate case: Continuity with classical theory
6. The control problem: The weights are exogenous · minimax robustness
7. The falsifiability condition: Falsifiable predictions · paired questions
8. Why it is not a gamble: Dominance argument · antifragility
9. Limits: Where the principle can break
10. The statement: The principle in citable form
★ Field proof: Verified case · three certified timestamps

01 The thesis

The consumer of information is changing its nature

For thirty years, presence on the web has been optimized against one precise objective function: maximizing the probability that a human being, faced with a list of results, chooses me. That is SEO. Its domain is human attention mediated by a visual interface — the SERP — and its currency is the click.

The observation is that the terminal consumer of information is becoming a machine. Traffic generated by autonomous software agents is growing at a rate that industry observers place an order of magnitude above that of human traffic, and in some verticals it has already overtaken human traffic in value per visit. These agents do not read a page the way a person does: they combine five retrieval modalities (vector, lexical, graph, structured and multimodal), and the critical variable is the quality of resolution, not the ranking.

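The prediction (the falsifiable part) is that the domain of the old objective function is contracting toward zero, and that a new objective function is replacing it:

$$\max\ \Pr(\text{chosen} \mid \text{human},\ \text{result list}) \quad\longrightarrow\quad \max\ \Pr(\text{resolved} \mid \text{agent},\ \text{intent})$$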

Whoever optimizes the first improves their place on a stage that is emptying out. Whoever optimizes the second prepares for a stage that is filling up.

02 Definitions

Three objects, defined precisely

Definition 1 — Knowledge graph

Let $G = (V, E, \tau)$ be a directed, labelled graph: $V$ is the set of named entities (people, organizations, works, places, concepts), $E \subseteq V \times V$ the relations, and $\tau : E \to T$ assigns each edge a type (for example sameAs, memberOf, authorOf). Each entity $e \in V$ carries an attribute vector $x_e$ encoding its published assertions: Schema.org properties, Wikidata statements, owned structured data.
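As an illustration of Definition 1, the following is a minimal sketch of a typed, directed graph in Python. The class name, method names and entity identifiers are hypothetical, and storing one edge per ordered pair of entities is a simplifying assumption rather than part of the definition.

```python
from dataclasses import dataclass, field


@dataclass
class KnowledgeGraph:
    """Directed, labelled graph: nodes are named entities, edges carry a relation type."""

    # entity id -> published assertions (the attribute vector, kept as a plain dict here)
    attributes: dict[str, dict[str, str]] = field(default_factory=dict)
    # (source, target) -> relation type, e.g. "sameAs", "memberOf", "authorOf"
    edges: dict[tuple[str, str], str] = field(default_factory=dict)

    def add_entity(self, entity: str, **assertions: str) -> None:
        self.attributes.setdefault(entity, {}).update(assertions)

    def add_edge(self, source: str, target: str, relation: str) -> None:
        # Register both endpoints as entities if they are not already present.
        self.add_entity(source)
        self.add_entity(target)
        self.edges[(source, target)] = relation

    def in_neighbours(self, entity: str) -> list[tuple[str, str]]:
        """Return (source, relation) for every edge pointing at `entity`."""
        return [(s, r) for (s, t), r in self.edges.items() if t == entity]


# Hypothetical identifiers, for illustration only.
graph = KnowledgeGraph()
graph.add_entity("person:example", name="Example Person")
graph.add_edge("person:example", "wikidata:Q_PLACEHOLDER", "sameAs")
graph.add_edge("person:example", "org:example", "memberOf")
```

Keeping the relation type on the edge, rather than on the node, is what later allows authority to be weighted by the kind of connection.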

Definition 2 — Resolution authority

For an entity $e$ and an intent class $q$, I define resolution authority as a scalar $A(e, q) \in [0, 1]$: the degree to which an agentic system identifies $e$ as the canonical referent for $q$ and grants it trust. It is not ranking. Ranking places you among the results; resolution makes you the datum.

03 The limit of self-declaration

Declaring a lot about yourself does not produce authority

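The simplest way to represent $A$ is a logistic neuron, the smooth descendant of the perceptron (Rosenblatt, 1958): you weight the attributes the entity declares about itself and squash the sum between 0 and 1.

$$A(e) = \sigma\Big(\sum_i w_i x_i + b\Big), \qquad \sigma(z) = \frac{1}{1 + e^{-z}}$$

$x_i$: self-asserted attributes · $w_i$: weight assigned by the system · $b$: prior salience (the prior).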
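The isolated-entity model can be sketched in a few lines of Python. The function name and the numeric values are illustrative assumptions; the real weights belong to the system and are not observable.

```python
import math


def self_declared_score(attributes: list[float], weights: list[float], bias: float) -> float:
    """Logistic neuron: weighted sum of self-asserted attributes, squashed into (0, 1)."""
    z = sum(w * x for w, x in zip(weights, attributes)) + bias
    return 1.0 / (1.0 + math.exp(-z))


# Two distinct entities that declare the same attributes (illustrative values).
entity_a = [1.0, 0.0, 1.0]
entity_b = [1.0, 0.0, 1.0]
weights = [0.8, -0.3, 0.5]
bias = -0.2

score_a = self_declared_score(entity_a, weights, bias)
score_b = self_declared_score(entity_b, weights, bias)
```

Whatever weights are chosen, `score_a` and `score_b` coincide: a model that sees only the node has nothing with which to tell the two entities apart.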

Proposition 1 — Structural insufficiency

The isolated-entity model can only represent decision functions that are linearly separable in the space of self-asserted attributes; identity resolution is not one of them (the argument is Minsky & Papert, 1969: XOR is not separable). Two entities may declare almost identical attributes and have to be told apart only by relational context — who cites them, what they are connected to. The discriminating information is not in the node: it is in the edges.
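The XOR argument can be checked numerically. The sketch below searches a grid of linear thresholds and finds one for AND but none for XOR. The grid is an arbitrary assumption and a search is not a proof; the proof is algebraic. Separating XOR would require $b \le 0$, $w_1 + b > 0$, $w_2 + b > 0$ and $w_1 + w_2 + b \le 0$; adding the two strict inequalities gives $w_1 + w_2 + b > -b \ge 0$, a contradiction.

```python
from itertools import product

XOR = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}
AND = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 1}


def separable_on_grid(truth_table: dict, grid: list[float]) -> bool:
    """True if some (w1, w2, b) on the grid reproduces the table with a linear threshold."""
    for w1, w2, b in product(grid, repeat=3):
        if all((w1 * x1 + w2 * x2 + b > 0) == bool(y)
               for (x1, x2), y in truth_table.items()):
            return True
    return False


# Candidate weights from -4.0 to 4.0 in steps of 0.5 (an arbitrary illustrative grid).
grid = [i / 2 for i in range(-8, 9)]

and_is_separable = separable_on_grid(AND, grid)
xor_is_separable = separable_on_grid(XOR, grid)
```

Here `and_is_separable` is true (for example with weights 1, 1 and bias -1.5) while `xor_is_separable` is false, which is the single-unit limitation the proposition relies on.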

04 The relational model

Authority is not declared: it propagates

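I replace the isolated neuron with a graph-propagation operator (Scarselli et al., 2009; Kipf & Welling, 2017). An entity’s authority is a quantity it receives from its neighbours along the edges, and re-transmits.

$$h_e^{(k+1)} = \sigma\Big( W_0\, x_e + \sum_{u \in \mathcal{N}(e)} \alpha_{ue}\, W_{\tau(u,e)}\, h_u^{(k)} \Big)$$

$W_0\, x_e$: the self-declared term · $\mathcal{N}(e)$: neighbours of $e$ · $\alpha_{ue}$: edge weight · $W_{\tau(u,e)}$: transformation depending on the relation type · $k$: message-passing steps.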

The reading is sharp: my authority is a function of the authority of what I am connected to, weighted by the type of connection. A sameAs to a Wikidata identifier, a citation from a recognized source, a verifiable institutional membership: each one injects authority into my state through the propagation term. This is the exact, now explicit, meaning of “becoming the datum”: you do not become authoritative by describing yourself, but by inhabiting the right neighbourhood of the graph and making the edges legible.

Becoming the datum: authority flows from authoritative neighbours toward the entity along typed edges.
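The propagation idea can be sketched with scalar authority states. This is a simplified illustration, not the operator of any real system: the states are scalars rather than vectors, the per-relation weights stand in for the learned transformations, and all names and numbers are assumptions.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def propagate(self_score: dict[str, float],
              edges: list[tuple[str, str, str]],
              relation_weight: dict[str, float],
              steps: int) -> dict[str, float]:
    """Scalar message passing: authority flows source -> target along typed edges."""
    state = {e: sigmoid(s) for e, s in self_score.items()}
    for _ in range(steps):
        incoming = {e: 0.0 for e in state}
        for source, target, relation in edges:
            incoming[target] += relation_weight[relation] * state[source]
        # Self-declared term plus the propagation term, squashed into (0, 1).
        state = {e: sigmoid(self_score[e] + incoming[e]) for e in state}
    return state


# "a" and "b" declare the same thing about themselves; only "a" has an
# edge from an authoritative hub (illustrative values).
self_score = {"hub": 2.0, "a": 0.0, "b": 0.0}
edges = [("hub", "a", "sameAs")]
relation_weight = {"sameAs": 1.5}

state = propagate(self_score, edges, relation_weight, steps=2)
```

With identical self-declared scores, `state["a"]` ends above `state["b"]` purely because of the incoming typed edge, which is the sense in which the discriminating information lives in the edges.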

05 Continuity with the past

PageRank is the degenerate case

The model does not repudiate the classical theory of the web: it contains it. Removing the non-linearity, reducing the attributes to a constant and allowing a single relation type with normalized adjacency matrix $\hat{A}$, the fixed point of propagation

$$h = d\,\hat{A}\,h + \frac{1 - d}{N}\,\mathbf{1}$$

…is exactly PageRank (Page et al., 1998), with $d$ the damping factor and $N$ the number of nodes: authority as the stationary distribution of a random walk on the graph.

In other words, the link-authority of the traditional web is the linear, single-relation, semantics-free version of entity propagation. I am not proposing a break: I am proposing a generalization of a structure the discipline has accepted for twenty-five years.
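The degenerate case can be made concrete with a short power iteration: linear propagation, a single relation type, a constant in place of the attributes. The damping value 0.85 is the conventional choice and an assumption here, as are the function name and the three-node graph.

```python
def linear_propagation(out_links: dict[str, list[str]],
                       damping: float = 0.85,
                       tol: float = 1e-12,
                       max_iter: int = 1000) -> dict[str, float]:
    """Iterate h <- d * A_hat * h + (1 - d) / N until the fixed point (PageRank)."""
    nodes = sorted(out_links)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Constant term: the attributes reduced to a uniform prior.
        new = {v: (1.0 - damping) / n for v in nodes}
        for u in nodes:
            targets = out_links[u]
            if targets:
                share = damping * rank[u] / len(targets)
                for v in targets:
                    new[v] += share
            else:
                # Dangling node: spread its mass uniformly over all nodes.
                for v in nodes:
                    new[v] += damping * rank[u] / n
        if sum(abs(new[v] - rank[v]) for v in nodes) < tol:
            return new
        rank = new
    return rank


# Illustrative graph: "a" and "b" both link to "c"; "c" links back to "a".
ranks = linear_propagation({"a": ["c"], "b": ["c"], "c": ["a"]})
```

In this toy graph the node with two incoming links ends with the highest score and the node with none keeps only the constant term, the familiar link-authority behaviour.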

06 The control problem

The weights are not mine

In the propagation operator there are two families of quantities. What I control: the attributes $x_e$ I publish and, in part, the local topology $\mathcal{N}(e)$, that is, which edges I declare. What I do not control: the learned parameters $\theta$ (edge weights and relation transformations), internal to the system, opaque, and changing with every model and version.

So the correct problem is not an optimization, it is a problem of robustness. I do not want to maximize authority for one system: I want to guarantee it for any plausible system.

$$\max_{x_e,\ \mathcal{N}(e)}\ \ \min_{\theta \in \Theta}\ A_\theta(e, q)$$

Here $\Theta$ is the plausible set of system weights. Goal: the invariance of recognition, so that the entity clears the threshold however the weights may fall.

Proposition 2 — Margin and stability

If the map $\theta \mapsto A_\theta(e, q)$ is $L$-Lipschitz over the plausible set $\Theta$ of radius $r$, then a margin $m > L\,r$ above the resolution threshold guarantees that the resolution of $e$ stays stable under any variation of the weights within $\Theta$. Encoding your identity redundantly across many independent and concordant sources raises $m$ the way an error-correcting code raises the minimum distance: the more coherent copies, the more correct decoding survives the noise. Coherence and identical naming everywhere are not cosmetics: they are the minimax solution.
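A toy check of the margin condition, under strong assumptions: the score is linear in the weights, so its Lipschitz constant with respect to the weights is the Euclidean norm of the attribute vector, and the plausible set is a ball around a nominal weight vector. All names and values are hypothetical.

```python
import math
import random


def margin_check(x: list[float], theta0: list[float], threshold: float,
                 radius: float, trials: int = 10_000, seed: int = 0) -> tuple[bool, bool]:
    """Return (guaranteed, observed) for a linear score theta . x.

    guaranteed: margin > L * radius, with L = ||x|| (Cauchy-Schwarz bound).
    observed:   every sampled perturbation of norm <= radius stays above the threshold.
    """
    rng = random.Random(seed)
    lipschitz = math.sqrt(sum(v * v for v in x))
    margin = sum(t * v for t, v in zip(theta0, x)) - threshold
    guaranteed = margin > lipschitz * radius

    observed = True
    for _ in range(trials):
        direction = [rng.gauss(0.0, 1.0) for _ in x]
        norm = math.sqrt(sum(v * v for v in direction))
        scale = radius * rng.random() / norm  # perturbation norm in [0, radius)
        theta = [t + scale * v for t, v in zip(theta0, direction)]
        if sum(t * v for t, v in zip(theta, x)) <= threshold:
            observed = False
    return guaranteed, observed


# Four concordant sources versus a single one (illustrative values).
redundant = margin_check([1.0] * 4, [1.0] * 4, threshold=1.0, radius=1.0)
single = margin_check([1.0], [1.0], threshold=1.0, radius=1.0)
```

In this toy model the margin grows with the number of concordant sources while the Lipschitz constant grows only with its square root, so the redundant entity satisfies the guarantee and the single-source one does not.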

07 The falsifiability condition

A thesis that cannot be refuted is prophecy, not science

I state the predictions in falsifiable form (Popper, 1934), each with its own explicit refuter.

P1 Primacy of coherence over volume

For the same intent, an entity with high graph coherence reaches resolution at a lower volume of self-publication than an entity betting on keyword density.
Refuter: a controlled pair in which the dense-but-incoherent entity is resolved preferentially over the coherent-but-sparse one.

P2 Authority propagation

Connecting to a high-authority neighbour, with node content held constant, measurably increases the resolution authority $A(e, q)$.
Refuter: no resolution differential after a sameAs edge toward an authoritative node, with content controlled.

The paired-question method. I submit to the system pairs of minimally different queries, built to probe the same entity from different relational angles. The asymmetries in the answers are indirect evidence of the latent structure $\theta$ (the system's learned weights): a form of probing. For this to be method and not anecdote, I fix in advance which asymmetry would count as a refutation.
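The discipline of fixing the refuting asymmetry in advance can be sketched as a small harness. Everything here is hypothetical: `resolves` stands for whatever judgement procedure decides that an answer names the entity as canonical referent (for instance manual coding of the response), and the queries and outcomes are placeholders, not observed data.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class PairedProbe:
    query_a: str
    query_b: str
    # The (a, b) outcome that would count as a refutation, fixed before running.
    refuting_outcome: tuple[bool, bool]


def run_probe(probe: PairedProbe, resolves: Callable[[str], bool]) -> str:
    """Compare the observed pair of outcomes with the pre-registered refuter."""
    outcome = (resolves(probe.query_a), resolves(probe.query_b))
    return "refuted" if outcome == probe.refuting_outcome else "not refuted"


# Placeholder judgements standing in for real, recorded system answers.
recorded = {"query via relation X": True, "query via relation Y": True}

probe = PairedProbe(
    query_a="query via relation X",
    query_b="query via relation Y",
    refuting_outcome=(False, True),
)
verdict = run_probe(probe, lambda q: recorded[q])
```

The harness deliberately reports "not refuted" rather than "confirmed": a surviving probe is corroboration, not proof.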

Epistemological note

When an experiment proves me right, I do not call it proof. I call it corroboration: the hypothesis survived an attempt at refutation, it was not shown to be true. This is the distinction that separates this work from the industry’s bragging.

08 Why it is not a gamble

A dominance argument, not a bet

Two states of the world: $S_A$ = agentic mediation becomes prevalent; $S_H$ = it does not, or only partly. Two strategies: $a_D$ (data-identity) and $a_K$ (keyword optimization). The payoff in each case:

Strategy	State $S_A$: agentic web	State $S_H$: human web
$a_D$ (data-identity)	≫ 0: it is the objective function	> 0: already today (AI citations, Knowledge Panel, verified rankings)
$a_K$ (keyword optimization)	≈ 0: no consumer for a visual list	> 0: only in the contracting state

Proposition 3 — Dominance

Data-identity ($a_D$) weakly dominates keyword optimization ($a_K$) in both states of the world, and strictly dominates it as soon as $\Pr(S_A) > 0$, where $S_A$ is the agentic state. Translated: investing in data-identity is the better choice even if the prediction about the future is wrong. This is the difference between a bet (high variance, tied to a single state) and a no-regret strategy.
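The dominance argument can be checked on a payoff matrix. The numbers below are purely illustrative: the table fixes only the signs and orders of magnitude, and the sketch additionally assumes that in the human-web state data-identity pays at least as much as keyword optimization, which weak dominance requires.

```python
STATES = ("agentic", "human")

# Illustrative payoffs consistent with the table's signs (not measured values).
payoff = {
    "data_identity": {"agentic": 10.0, "human": 1.0},
    "keyword": {"agentic": 0.0, "human": 1.0},
}


def weakly_dominates(a: str, b: str) -> bool:
    """a is at least as good as b in every state of the world."""
    return all(payoff[a][s] >= payoff[b][s] for s in STATES)


def expected(strategy: str, p_agentic: float) -> float:
    """Expected payoff when the agentic state has probability p_agentic."""
    return (p_agentic * payoff[strategy]["agentic"]
            + (1.0 - p_agentic) * payoff[strategy]["human"])


gap_at_zero = expected("data_identity", 0.0) - expected("keyword", 0.0)
gap_at_one_percent = expected("data_identity", 0.01) - expected("keyword", 0.01)
```

With these values the two strategies tie when the agentic state has probability zero, and data-identity pulls ahead for any positive probability, which is the no-regret reading of the proposition.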

A stronger property: convexity. The payoff of entity authority grows convexly with the fraction of consumption mediated by agents: the more the world shifts, the more each unit of structure is worth, because it becomes the only channel of existence. A payoff with positive second derivative with respect to the uncertainty of the transition is antifragile (Taleb, 2012): it benefits from the very volatility of the passage.

09 Limits

Where the principle can break

The dialect is not the language

Schema.org and Wikidata are one way of making the graph legible, necessary but not sufficient. The substrate is widening: vector retrieval, context architectures, interaction protocols (MCP, WebMCP, NLWeb, A2A). The robust form of the principle is substrate agnosticism: being legible at once to graph retrieval, to vector retrieval and to tool calling.
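For reference, the Schema.org dialect named above can be sketched as a JSON-LD document built in Python. `Person`, `sameAs`, `memberOf`, `name` and `url` are Schema.org vocabulary; the helper name, the URLs and the Wikidata identifier are placeholders. The sketch covers the graph substrate only; vector retrieval and tool calling need their own surfaces.

```python
import json


def person_jsonld(name: str, url: str, same_as: list[str], member_of: str) -> dict:
    """Build a Schema.org Person description with typed edges to other nodes."""
    return {
        "@context": "https://schema.org",
        "@type": "Person",
        "name": name,
        "url": url,
        # sameAs edges point at external identifiers of the same entity.
        "sameAs": list(same_as),
        # memberOf edge toward an organization node.
        "memberOf": {"@type": "Organization", "name": member_of},
    }


# Placeholder values, for illustration only.
doc = person_jsonld(
    name="Example Person",
    url="https://example.org/about",
    same_as=["https://www.wikidata.org/entity/Q_PLACEHOLDER"],
    member_of="Example Organization",
)
markup = json.dumps(doc, indent=2)
```

The same identity would then be repeated, with identical naming, wherever else the entity is described, in line with the coherence requirement.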

Authority is contested

When a tactic generalizes it stops being a moat. The only thing that remains a moat is what is expensive to falsify: third-party-verifiable results, public proof, cryptographic identity. The defensible advantage is not the markup — it is verifiability.

The weights remain a black box

The control problem tells you how to be robust with respect to the system's weights $\theta$, not how to know $\theta$. Every conclusion about $\theta$ is inferential and provisional.

10 The statement

The Data-Identity Principle

In the passage from a web mediated by human attention to a web mediated by autonomous agents, the unit of existence online is no longer the page optimized to be chosen, but the resolved entity: a node of a knowledge graph whose authority is not declared but propagated by its neighbours along typed semantic relations. Because the weights of this propagation are exogenous and mutable, the optimal strategy is not to optimize for a given system, but to make the entity’s recognition invariant with respect to the plausible set of systems, maximizing coherent redundancy and margin. In a decision-theoretic sense, this strategy dominates keyword optimization in every state of the world in which agentic mediation has non-zero probability, and enjoys a convex — antifragile — payoff with respect to the uncertainty of the transition.

— Paolo Galbiati

A field-hand before a theorist: I stopped trying to sit at the top of the list, and began to become the entry the list points to.

★ Field proof · verified case

2 hours and 1 minute: from publication to the AI’s answer

On 13 June 2026 I published an interview page on a new domain, with no history. I logged three events, each with a certified timestamp. The third is the one that counts: Google AI Mode spontaneously retrieved the page as a source for a proper-name query, “intervista mattia isella”, without any trigger of the coined phrase.

06:39

Publication

Page online · new domain, no history

07:10:16

PageSpeed 100/100/100/100

+31 min · performance, accessibility, best practices, SEO

08:40:20

Spontaneous retrieval in AI Mode

+2h 01m · neutral query, no trigger

Total window: 2h 01m from publication to spontaneous generative retrieval.
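The intervals can be recomputed from the three logged timestamps. The publication time is logged to the minute only, so the sketch assumes 06:39:00; the helper name is hypothetical.

```python
from datetime import datetime, timedelta

published = datetime(2026, 6, 13, 6, 39, 0)    # seconds not logged; :00 assumed
pagespeed = datetime(2026, 6, 13, 7, 10, 16)
retrieved = datetime(2026, 6, 13, 8, 40, 20)


def hours_minutes(delta: timedelta) -> str:
    """Format an interval as 'Hh MMm', truncating leftover seconds."""
    minutes = int(delta.total_seconds()) // 60
    return f"{minutes // 60}h {minutes % 60:02d}m"


to_pagespeed = hours_minutes(pagespeed - published)
to_retrieval = hours_minutes(retrieved - published)
```

Under that assumption `to_pagespeed` is "0h 31m" and `to_retrieval` is "2h 01m", matching the offsets shown in the timeline.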

What this case rules out

In two hours, on a domain with no history, there is no time to accrue domain authority, backlinks or age. The explanation “Google took time to trust it” is ruled out by construction. What remains is entity authority, inherited from the upstream graph, as the explanation for the retrieval — and the AI even incorporated the contract renewal published the day before, a sign of near-real-time propagation.

What remains to be shown

The page entered the sources, but the higher level has not yet been observed: that the AI on its own opens the answer using the coined phrase as a conceptual lens, on a neutral query. It is a measure of entity saturation, not a limit of nature. I state it because a datum is defended by also saying what it does not prove.

New domain, zero history, publication 06:39. At 08:40 the same day — 2h 01m later — Google AI Mode spontaneously retrieves the page as a source for a proper-name query, without any trigger of the coined phrase, even incorporating a fact published the day before. Three certified timestamps. Inherited domain authority is ruled out by construction; entity authority remains as the explanation.

— Verified case, 13 June 2026

References

Rosenblatt, F. (1958). The Perceptron: A Probabilistic Model for Information Storage and Organization in the Brain. Psychological Review.
Minsky, M.; Papert, S. (1969). Perceptrons. MIT Press.
Page, L.; Brin, S.; Motwani, R.; Winograd, T. (1998). The PageRank Citation Ranking. Stanford.
Scarselli, F. et al. (2009). The Graph Neural Network Model. IEEE TNN.
Kipf, T. N.; Welling, M. (2017). Semi-Supervised Classification with Graph Convolutional Networks. ICLR.
Popper, K. (1934/1959). The Logic of Scientific Discovery.
Taleb, N. N. (2012). Antifragile: Things That Gain from Disorder.