Every speech in the Index is split into individual sentences, and the sentence is the unit of analysis. A sentence is a tractable, recognisable unit: it has clear boundaries (punctuation), it is short enough that a single dominant frame usually applies, and it maps cleanly to the way readers and listeners actually parse rhetoric.
An alternative unit — the quasi-sentence used by parts of the political-text-analysis literature — splits sentences further whenever a single sentence carries more than one distinct argument. Quasi-sentences are useful in principle but fraught in practice: their boundaries are subjective, two coders rarely segment a long sentence the same way, and the resulting tag fragments are hard to compare across a corpus that already spans 240 years of evolving prose. The Index therefore stays at the sentence level and tags a sentence by its dominant frame.
For each dimension, every sentence is passed to a frontier language model with a fixed rubric (see below). The model returns one tag per dimension, or null when no tag fits. Classification runs in batches of 40 sentences per request to avoid drop-off on longer inputs, and a batch is retried once if the number of tags returned does not match the number of sentences sent. Tags are stored alongside the sentence text in each speech's JSON file.
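The batching and retry step described above can be sketched roughly as follows. This is a minimal illustration, not the Index's actual pipeline: `classify_in_batches` and the `classify_batch` callable (standing in for the model request) are hypothetical names, and filling a batch with null tags when the retry also fails is an assumption, since the text does not say what happens in that case.

```python
from typing import Callable, List, Optional

Tag = Optional[str]  # None stands for "no tag fits"


def classify_in_batches(
    sentences: List[str],
    classify_batch: Callable[[List[str]], List[Tag]],  # hypothetical model call
    batch_size: int = 40,
) -> List[Tag]:
    """Tag sentences in fixed-size batches, retrying a batch once on length mismatch."""
    tags: List[Tag] = []
    for start in range(0, len(sentences), batch_size):
        batch = sentences[start:start + batch_size]
        result = classify_batch(batch)
        if len(result) != len(batch):
            # Single retry: the model returned too few or too many tags.
            result = classify_batch(batch)
        if len(result) != len(batch):
            # Assumption: if the retry also fails, leave the batch untagged.
            result = [None] * len(batch)
        tags.extend(result)
    return tags
```

Checking the length of each response is what keeps tags aligned with their sentences: a response that is one tag short would otherwise shift every later tag onto the wrong sentence.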
The rubric is identical for every model call. Speakers, dates, and external metadata are deliberately withheld from the prompt — the tag must come from the sentence in front of the model, not from priors about who is speaking.
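One way to enforce this withholding is to build the prompt from the sentence text alone, so that metadata never reaches the model even if it sits in the same record. The sketch below assumes each sentence is held as a dictionary with a `text` key plus other fields; the function name, the record layout, and the prompt wording are all hypothetical.

```python
from typing import Dict, List


def build_prompt(rubric: str, records: List[Dict[str, str]]) -> str:
    """Combine the fixed rubric with sentence text only; all other fields are ignored."""
    lines = [rubric, "", "Sentences:"]
    for number, record in enumerate(records, start=1):
        # Only the 'text' field is read. Speaker, date, and any other
        # metadata in the record are deliberately left out of the prompt.
        lines.append(f"{number}. {record['text']}")
    return "\n".join(lines)
```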
Where resources allow, a sample of speeches is also tagged by human reviewers using the same rubric. Reviewers see the sentence and the rubric definitions but not the AI's tag.
AI and human tags are then compared sentence by sentence. For sentences in the reviewed sample, the dominant classification is the tag both methods agree on; disagreements are flagged for re-tagging and used to refine the rubric for the next pass. The Index is therefore a moving artefact: rubrics get tighter as the corpus grows.
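The sentence-by-sentence comparison can be illustrated with a short sketch for a single dimension. The function name and the index-based return values are hypothetical, and treating two null tags as agreement is an assumption rather than something the rubric states.

```python
from typing import Dict, List, Optional, Tuple

Tag = Optional[str]  # None stands for "no tag fits"


def compare_tags(
    ai_tags: List[Tag], human_tags: List[Tag]
) -> Tuple[Dict[int, Tag], List[int]]:
    """Return (agreed tags keyed by sentence index, indices flagged for re-tagging)."""
    if len(ai_tags) != len(human_tags):
        raise ValueError("AI and human tag lists must cover the same sentences")
    agreed: Dict[int, Tag] = {}
    flagged: List[int] = []
    for index, (ai_tag, human_tag) in enumerate(zip(ai_tags, human_tags)):
        if ai_tag == human_tag:
            # Assumption: matching nulls count as agreement on "no tag".
            agreed[index] = ai_tag
        else:
            flagged.append(index)
    return agreed, flagged
```

The share of sentences that land in `agreed` gives a simple raw agreement rate, and the `flagged` indices are the sentences to revisit when refining the rubric.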
Each sentence is tagged on six dimensions. Use the options below as the working rubric.
Time orientation reveals whether a leader is framing the moment as one of inheritance, of present condition, or of imminent change — a quick proxy for whether a speech is defending a record, asserting authority, or making a promise.
Expression separates sentences by rhetorical function: factual claims, binding commitments, or calls to act. The mix exposes how much of a leader's discourse is descriptive, performative, or mobilising.
Stance captures the power posture a speaker takes toward the audience — humble petitioner, equal collaborator, or judging authority. It is how the speaker positions themselves relative to those who can act on the message.
Agency identifies who the speech casts as the actor. Patterns here reveal whether a leader is centring their own state, building solidarity with allies, or focusing the audience on adversaries.
Reference tracks the scope a leader speaks to — domestic, bilateral, or regional/global. It exposes whether a speech is internally focused or projecting outward, and whose attention it seeks.
Capability maps each sentence onto the nine domains of the GINC National Capability Framework. It surfaces which levers of national power — hard, soft, or economic — a leader is signalling intent to invest in, defend, or wield.
Transcripts are sourced from public archives. In addition to those listed below, GINC draws on transcripts of public addresses to fill gaps in the Index where useful coverage is missing.