Norming model

Our first model addresses a fundamental question: how do we infer the “shape” of people’s prior beliefs about the degrees that gradable adjectives operate on? The norming study provides a clean test case where participants directly report degrees on scales.

We’ll start with a realistic model of the norming data—the kind of model one might design to analyze this dataset directly. We’ll build up the model block by block, explaining each line. Then we’ll turn to how we might analyze this experiment using PDS, showing which components of the model correspond to the PDS kernel model and which are extensions added by the analyst.

Understanding the experimental setup

Before diving into the Stan code, let’s consider how we’ll represent the norming data, since this is important for understanding how we design Stan code. Here’s a sample of the data:

participant  item           item_number  adjective  adjective_number  condition  condition_number  scale_type  scale_type_number  response
1            closed_mid     6            closed     2                 mid        3                 absolute    1                  0.66
1            old_mid        24           old        8                 mid        3                 relative    2                  0.51
1            expensive_mid  15           expensive  5                 mid        3                 relative    2                  0.62
1            full_high      16           full       6                 high       1                 absolute    1                  1
1            deep_low       8            deep       3                 low        2                 relative    2                  0.22

Each row represents one judgment:

  • participant: Which person made this judgment (participant 1, 2, etc.)
  • item: A unique identifier combining adjective and condition (e.g., “tall_high”)
  • item_number: Numeric ID for the item (used in Stan)
  • adjective: The gradable adjective being tested
  • condition: Whether this is a high/mid/low standard context
  • response: The participant’s slider response (0–1)

Our simplest model asks: what degree does each item have on its scale, and how do participants map these degrees to slider responses?

The structure of a Stan program

Every Stan program follows a particular architecture with blocks that appear in a specific order. Each block serves a specific purpose in defining our statistical model. Let’s build up our norming model block by block to understand how Stan works. This structure parallels the modular architecture of PDS itself—each block handles a distinct aspect of the modeling problem.

The data block

Every Stan program begins with a data block that tells Stan what information will be provided from the outside world—our experimental observations. Let’s build this up piece by piece:

data {
  int<lower=1> N_item;        // number of items
  int<lower=1> N_participant; // number of participants  
  int<lower=1> N_data;        // number of data points

These first lines declare basic counts. The syntax breaks down as:

  • int: This will be an integer (whole number)
  • <lower=1>: This integer must be at least 1 (no negative counts!)
  • N_item: The variable name (we’ll use this throughout our program)
  • // number of items: A comment explaining what this represents

Why do we need these constraints? Stan uses them to:

  1. Catch data errors early (if we accidentally pass 0 items, Stan will complain)
  2. Optimize its algorithms (knowing bounds helps the sampler work efficiently)

Next, we handle a subtle but important issue—boundary responses:

  int<lower=1> N_0;           // number of 0s
  int<lower=1> N_1;           // number of 1s

Why separate these out? Slider responses of exactly 0 or 1 are “censored”—they might represent even more extreme judgments that the scale can’t capture. We’ll handle these in a second.

For the actual response data:

  vector<lower=0, upper=1>[N_data] y; // response in (0, 1)

This declares a vector (like an array) of length N_data, where each element must be between 0 and 1. Although the declared bounds allow 0 and 1, we will only pass this vector the responses strictly between 0 and 1—the boundary responses are handled separately.

Finally, we need to map responses to items and participants:

  array[N_data] int<lower=1, upper=N_item> item;        // which item for each response
  array[N_0] int<lower=1, upper=N_item> item_0;         // which item for each 0
  array[N_1] int<lower=1, upper=N_item> item_1;         // which item for each 1
  array[N_data] int<lower=1, upper=N_participant> participant;     
  array[N_0] int<lower=1, upper=N_participant> participant_0;
  array[N_1] int<lower=1, upper=N_participant> participant_1;
}

These arrays work like lookup tables. If item[5] = 3, then the 5th response in our data was about item #3. This indexing structure connects our flat data file to the hierarchical structure of our experiment.

Looking back at our CSV data, the preprocessing step (done outside of Stan, before passing the data in) must:

  1. Count unique items → N_item (e.g., 36 if we have 12 adjectives × 3 conditions)
  2. Count unique participants → N_participant
  3. Extract the responses strictly between 0 and 1 → the y vector
  4. Build index arrays mapping each response to its item and participant, separating out the responses that are exactly 0 or 1
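In practice, this bookkeeping happens before Stan ever runs, in whatever environment we use to prepare the data. Here is a minimal preprocessing sketch in Python, assuming pandas and a CSV with the columns shown above (the file name is a placeholder):

import pandas as pd

# Placeholder file name; the columns follow the sample shown earlier.
df = pd.read_csv("norming.csv")

# Separate boundary responses (exactly 0 or 1) from interior responses.
interior = df[(df.response > 0) & (df.response < 1)]
zeros = df[df.response == 0]
ones = df[df.response == 1]

stan_data = {
    "N_item": df.item_number.nunique(),
    "N_participant": df.participant.nunique(),
    "N_data": len(interior),
    "N_0": len(zeros),
    "N_1": len(ones),
    "y": interior.response.to_list(),
    # Index arrays; Stan indexes from 1, so these columns must be 1-based.
    "item": interior.item_number.to_list(),
    "item_0": zeros.item_number.to_list(),
    "item_1": ones.item_number.to_list(),
    "participant": interior.participant.to_list(),
    "participant_0": zeros.participant.to_list(),
    "participant_1": ones.participant.to_list(),
}

The resulting stan_data dictionary supplies exactly the quantities declared in the data block.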

The parameters block: What we want to learn

After declaring our data, we declare the parameters—the unknown quantities we want to infer. This is where semantic theory meets statistical inference:

parameters {
  // Fixed effects
  vector[N_item] mu_guess;

This declares a vector of “guesses” (degrees) for each item. Why mu_guess? In statistics, μ (mu) traditionally denotes a mean or central tendency. These means can be understood as representing our best guess, as researchers, about each item’s true degree on its scale—the theoretical degrees that the semantic analysis posits. Crucially, they can also be viewed as representing subjects’ uncertainty about these degrees—what unresolved uncertainty do they maintain when they make these guesses?

But people differ! We need random effects to capture individual variation:

  // Random effects
  real<lower=0> sigma_epsilon_guess;     // how much people vary
  vector[N_participant] z_epsilon_guess; // each person's deviation

This uses a clever trick called “non-centered parameterization”:

  • sigma_epsilon_guess: The overall amount of person-to-person variation
  • z_epsilon_guess: Standardized (z-score) deviations for each person

We’ll combine these later to get each person’s actual adjustment. Why not just use vector[N_participant] epsilon_guess directly? This separation often helps Stan’s algorithms converge much faster—a practical consideration that doesn’t affect the semantic theory but matters for implementation.
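A small numpy sketch (illustrative only; the variable names mirror the Stan code) shows that the two parameterizations describe the same distribution over participant adjustments:

import numpy as np

rng = np.random.default_rng(0)
sigma_epsilon_guess = 0.2

# Centered: draw each person's adjustment directly.
centered = rng.normal(0.0, sigma_epsilon_guess, size=100_000)

# Non-centered: draw standard-normal z-scores, then scale them.
z = rng.normal(0.0, 1.0, size=100_000)
non_centered = sigma_epsilon_guess * z

print(centered.std(), non_centered.std())  # both approximately 0.2

The two distributions match, but the non-centered version decouples the z-scores from sigma_epsilon_guess in the geometry the sampler has to explore, which is what typically improves convergence.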

Next, measurement noise:

  real<lower=0,upper=1> sigma_e;  // response variability

Even if two people agree on an item’s degree, their slider responses might differ slightly. This parameter captures that noise.

Finally, those boundary responses:

  // Censored data
  array[N_0] real<upper=0> y_0;  // true values for observed 0s
  array[N_1] real<lower=1> y_1;  // true values for observed 1s
}

This is subtle but important. When someone gives a 0 response, their “true” judgment might be -0.1 or -0.5—we just can’t see below 0. These parameters let Stan infer what those true values might have been.
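A quick simulation (illustrative only, not part of the model) makes the censoring idea concrete: latent judgments can fall outside [0, 1], but the slider only records the clipped value.

import numpy as np

rng = np.random.default_rng(1)

# Latent judgments for an item whose degree sits near the top of the scale.
latent = rng.normal(0.95, 0.1, size=10)

# The slider can only record values in [0, 1].
observed = np.clip(latent, 0.0, 1.0)

print(latent.round(2))    # some values exceed 1
print(observed.round(2))  # those are recorded as exactly 1

Declaring y_0 with upper=0 and y_1 with lower=1 lets Stan infer plausible latent values behind each boundary response rather than treating 0 and 1 as exact measurements.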

The transformed parameters block: Building predictions

Now we combine our basic parameters to build what we actually need. This block serves as a bridge between abstract parameters and concrete predictions:

transformed parameters {
  vector[N_participant] epsilon_guess;
  vector[N_data] guess;
  vector[N_0] guess_0;
  vector[N_1] guess_1;

First, we convert those z-scores to actual participant adjustments:

  // Non-centered parameterization
  epsilon_guess = sigma_epsilon_guess * z_epsilon_guess;

If sigma_epsilon_guess = 0.2 and participant 3 has z_epsilon_guess[3] = 1.5, then participant 3 tends to give responses 0.3 units higher than average.

Now we can compute predicted responses:

  for (i in 1:N_data) {
    guess[i] = mu_guess[item[i]] + epsilon_guess[participant[i]];

Let’s trace through one prediction:

  • Response i is about item 5 by participant 3
  • item[i] = 5, so we look up mu_guess[5] (say it’s 0.7)
  • participant[i] = 3, so we add epsilon_guess[3] (say it’s 0.1)
  • guess[i] = 0.7 + 0.1 = 0.8

We repeat this for the boundary responses:

  for (i in 1:N_0) {
    guess_0[i] = mu_guess[item_0[i]] + epsilon_guess[participant_0[i]];
  }
  
  for (i in 1:N_1) {
    guess_1[i] = mu_guess[item_1[i]] + epsilon_guess[participant_1[i]];
  }
}

The model block

The model block is where we specify our statistical assumptions—both our prior beliefs and how the data were generated. This is where most of the action lies in terms of how PDS relates to the data.

model {
  // Priors on random effects
  sigma_epsilon_guess ~ exponential(1);
  z_epsilon_guess ~ std_normal();

These priors encode mild assumptions:

  • exponential(1): We expect person-to-person variation to be moderate (not huge)
  • std_normal(): By construction, z-scores have a standard normal distribution

Notice we don’t specify a prior for mu_guess—Stan treats this as an implicit uniform prior over the real numbers. Since our responses are bounded, the data will naturally constrain these values. (Similarly, sigma_e gets an implicit uniform prior over its declared interval [0, 1].)

Now the likelihood—how data relates to parameters:

  // Likelihood
  for (i in 1:N_data) {
    y[i] ~ normal(guess[i], sigma_e);
  }

This says: each response is drawn from a normal distribution centered at our prediction with standard deviation sigma_e. The ~ symbol means “is distributed as.”

For boundary responses, we use the latent values:

  for (i in 1:N_0) {
    y_0[i] ~ normal(guess_0[i], sigma_e);
  }
  
  for (i in 1:N_1) {
    y_1[i] ~ normal(guess_1[i], sigma_e);
  } 
}

Remember: we’re inferring y_0 and y_1 as parameters! Stan will sample plausible values that are consistent with both the model and the fact that we observed 0s and 1s.

The generated quantities block

Finally, we compute quantities that help us understand and evaluate our model:

generated quantities {
  vector[N_data] ll; // log-likelihoods
  
  for (i in 1:N_data) {
    if (y[i] >= 0 && y[i] <= 1)
      ll[i] = normal_lpdf(y[i] | guess[i], sigma_e);
    else
      ll[i] = negative_infinity();
  }
}

The log-likelihood tells us how probable each observation is under our model. We’ll use these for model comparison—models that assign higher probability to the actual data are better.
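As a sketch of how these pointwise log-likelihoods feed into model comparison, here is an illustrative Python computation of the log pointwise predictive density (lppd) from posterior draws of ll; the arrays below are made-up stand-ins for draws from two fitted models:

import numpy as np
from scipy.special import logsumexp

def lppd(ll_draws):
    # ll_draws: (posterior draws) x (observations) array of log-likelihoods.
    # Average the likelihood over draws for each observation, then sum the logs.
    n_draws = ll_draws.shape[0]
    return float(np.sum(logsumexp(ll_draws, axis=0) - np.log(n_draws)))

rng = np.random.default_rng(2)
ll_model_a = rng.normal(-0.5, 0.1, size=(4000, 500))  # hypothetical draws
ll_model_b = rng.normal(-0.8, 0.1, size=(4000, 500))  # hypothetical draws

print(lppd(ll_model_a) > lppd(ll_model_b))  # model A assigns the data higher probability

In practice one would usually pass these pointwise log-likelihoods to a cross-validation estimator such as PSIS-LOO (e.g., via the loo or arviz packages) rather than comparing raw lppd values.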

The complete model

Here’s our complete model with consistent naming:

data {
  int<lower=1> N_item;              // number of items
  int<lower=1> N_participant;       // number of participants
  int<lower=1> N_data;              // number of data points in (0, 1)
  int<lower=1> N_0;                 // number of 0s
  int<lower=1> N_1;                 // number of 1s
  vector<lower=0, upper=1>[N_data] y; // response in (0, 1)
  array[N_data] int<lower=1, upper=N_item> item;
  array[N_0] int<lower=1, upper=N_item> item_0;
  array[N_1] int<lower=1, upper=N_item> item_1;
  array[N_data] int<lower=1, upper=N_participant> participant;
  array[N_0] int<lower=1, upper=N_participant> participant_0;
  array[N_1] int<lower=1, upper=N_participant> participant_1;
}

parameters {
  vector[N_item] mu_guess;
  real<lower=0> sigma_epsilon_guess;
  vector[N_participant] z_epsilon_guess;
  real<lower=0,upper=1> sigma_e;
  array[N_0] real<upper=0> y_0;
  array[N_1] real<lower=1> y_1;
}

transformed parameters {
  vector[N_participant] epsilon_guess = sigma_epsilon_guess * z_epsilon_guess;
  vector[N_data] guess;
  vector[N_0] guess_0;
  vector[N_1] guess_1;

  for (i in 1:N_data) {
    guess[i] = mu_guess[item[i]] + epsilon_guess[participant[i]];
  }
  for (i in 1:N_0) {
    guess_0[i] = mu_guess[item_0[i]] + epsilon_guess[participant_0[i]];
  }
  for (i in 1:N_1) {
    guess_1[i] = mu_guess[item_1[i]] + epsilon_guess[participant_1[i]];
  }
}

model {
  sigma_epsilon_guess ~ exponential(1);
  z_epsilon_guess ~ std_normal();

  for (i in 1:N_data) {
    y[i] ~ normal(guess[i], sigma_e);
  }
  for (i in 1:N_0) {
    y_0[i] ~ normal(guess_0[i], sigma_e);
  }
  for (i in 1:N_1) {
    y_1[i] ~ normal(guess_1[i], sigma_e);
  } 
}

generated quantities {
  vector[N_data] ll;
  for (i in 1:N_data) {
    if (y[i] >= 0 && y[i] <= 1)
      ll[i] = normal_lpdf(y[i] | guess[i], sigma_e);
    else
      ll[i] = negative_infinity();
  }
}

This baseline model treats each item as having an inherent degree along the relevant scale, with participants providing noisy measurements of these degrees. The censoring approach handles the common issue of responses at the boundaries (0 and 1) of the slider scale.
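To make the workflow concrete, here is one way the program might be compiled and fit from Python, assuming cmdstanpy is installed, the code above is saved as norming.stan (a placeholder name), and stan_data is the dictionary from the preprocessing sketch earlier:

from cmdstanpy import CmdStanModel

model = CmdStanModel(stan_file="norming.stan")           # compile the Stan program
fit = model.sample(data=stan_data, chains=4, seed=1234)  # run 4 HMC chains

print(fit.summary().head())                     # quick sanity check of the fit
mu_guess_draws = fit.stan_variable("mu_guess")  # posterior draws, shape (draws, N_item)

The posterior draws for mu_guess are the inferred item degrees; the ll draws from generated quantities are what we would feed into model comparison.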

PDS-to-Stan

So what components of the above model are derived from PDS? To answer this, we need to define our PDS model of the norming task itself. Here it is:

s1'        = termOf $ getSemantics @Adjectives 1 ["jo", "is", "a", "soccer player"]
q1'        = termOf $ getSemantics @Adjectives 0 ["how", "tall", "jo", "is"]
discourse' = ty tau $ assert s1' >>> ask q1'
scaleNormingExample = asTyped tau (betaDeltaNormal deltaRules . adjectivesRespond scaleNormingPrior) discourse'

This code:

  1. Asserts that Jo is a soccer player (establishing context)
  2. Asks “how tall is Jo?” using the degree-argument version of the adjective
  3. Applies beta and delta-reduction rules via betaDeltaNormal
  4. Uses scaleNormingPrior to generate prior distributions
  5. Applies adjectivesRespond to specify the response function

Note the use of certain convenience functions. For example, getSemantics retrieves one of the meanings (in the λ-calculus) for the expression it is given as a list of strings, using the parser implemented in Grammar.Parser. The other functions—termOf, ty, and tau—serve as basic plumbing and can be found in the documentation.

Working through degree questions

Degree questions like “how tall is Jo?” use a special lexical entry for adjectives that exposes the degree argument. From Grammar.Lexica.SynSem.Adjectives:

instance Interpretation Adjectives SynSem where
  combineR = Convenience.combineR
  combineL = Convenience.combineL
  
  lexica = [lex]
    where lex = \case
      ...
      "tall"          -> [ SynSem {
                              syn = AP :\: Deg,
                              sem = ty tau (purePP (lam d (lam x (lam i (sCon "(≥)" @@ (sCon "height" @@ i @@ x) @@ d)))))
                              }
                           ...
                         ]
      ...
      "how"           ->  [ SynSem {
                              syn =  Qdeg :/: (S :/: AP) :/: (AP :\: Deg),
                              sem = ty tau (purePP (lam x (lam y (lam z (y @@ (x @@ z))))))
                            }
                            ...
                          ]
      ...

Delta-rules and semantic computation

PDS applies delta-rules to simplify these complex λ-terms. As discussed in the implementation section, delta-rules enable different semantic computations while preserving semantic equivalence. The formalism is strongly normalizing and confluent, so the order of rule application doesn’t affect the final result—a crucial property that ensures our semantic theory remains consistent.

Key delta-rules for adjectives include:

  • Arithmetic operations: Simplifying comparisons like \(\ct{(≥)}\) when applied to constants
  • State/index extraction: Rules for \(\ct{height}\), \(\ct{d\_tall}\), etc.
  • Beta reduction: Standard λ-calculus reduction

These rules transform the complex compositional semantics into simpler forms suitable for compilation to Stan. The transformation preserves the semantic content while making it computationally tractable.

Working through delta-reductions

Having seen the PDS code for degree questions, we now trace through how delta-rules transform these complex λ-terms into forms suitable for Stan compilation. delta-rules, as introduced in Lambda.Delta, are partial functions from terms to terms that implement semantic computations.
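For readers less familiar with Haskell, the following Python sketch (3.10+ for match) illustrates what “a partial function from terms to terms” looks like; the tiny term representation and the rule below are hypothetical stand-ins, not the Lambda.Delta implementation, and the rule mirrors the maxes rule shown later.

from dataclasses import dataclass
from typing import Optional, Union

# A toy term language: constants, variables, comparisons, lambdas, and max.
@dataclass
class Const:
    value: float

@dataclass
class Var:
    name: str

@dataclass
class GE:
    left: "Term"
    right: "Term"

@dataclass
class Lam:
    var: str
    body: "Term"

@dataclass
class Max:
    body: "Term"

Term = Union[Const, Var, GE, Lam, Max]

def maxes(term: Term) -> Optional[Term]:
    """Partial rewrite: max(λd. x ≥ d) reduces to x; otherwise no rule applies."""
    match term:
        case Max(Lam(var=d, body=GE(left=x, right=Var(name=d2)))) if d == d2:
            return x
    return None

print(maxes(Max(Lam("d", GE(Const(1.83), Var("d"))))))  # Const(value=1.83)
print(maxes(Var("d")))                                  # None: the rule is partial

Applying such rules repeatedly until no rule applies is, roughly, what betaDeltaNormal does with the full set of delta-rules (together with beta-reduction).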

For degree questions like “how tall is Jo?”, the compositional semantics produces:

\[ λd, i.\ct{height}(i)(j) ≥ d \]

When \(\abbr{respond}\) comes into the picture, some index \(i^{\prime}\) is sampled from the common ground, and the maximal answer to the question is determined to be:

\[ \ct{max}(λd.\ct{height}(i^{\prime})(j) ≥ d) \]

This term undergoes several delta-reductions. First, the indices rule extracts the height value from whatever actual index is sampled from the common ground of the current discourse state. From Lambda.Delta:

indices :: DeltaRule
indices = \case
  ...
  Height (UpdHeight p _) -> Just p
  Height (UpdSocPla _ i) -> Just (Height i)
  ...
  _                      -> Nothing

Calling this height value \(h\), this rule yields:

\[ \ct{max}(λd.h ≥ d) \]

where \(h\) represents Jo’s actual height at index \(i^{\prime}\). The \(\ct{max}\) operator then extracts this unique value using the following delta-rule:

maxes :: DeltaRule
maxes = \case
   Max (Lam y (GE x (Var y'))) | y' == y -> Just x
   _                                     -> Nothing  

This gives us \(h\).

This final form directly corresponds to the Stan parameter we need to infer—the degree on the height scale.

From lambda terms to Stan parameters

The challenge is translating abstract semantic computations into Stan’s parameter space. This translation embodies (some of) our linking hypothesis between semantic competence and performance.

  1. Degree extraction becomes parameter inference:
    • \(\ct{max}(λd.\ct{height}(i)(j) ≥ d)\) → Infer parameter height_jo
    • The unique degree satisfying the equation becomes a parameter to estimate
  2. Functions become arrays:
    • \(\ct{height} : \iota \to e \to r\) → Array height[person]
    • Function application → Array indexing
  3. Propositions become probabilities:
    • Truth values → Real numbers in [0,1]
    • Logical operations → Probabilistic operations
  4. The monad becomes Stan’s target:
    • The \(\Do\)-notation structures sequential computation:

      \(\begin{array}[t]{l} x ∼ \ct{normal}(0, 1) \\ y ∼ \ct{normal}(x, 1) \\ \pure{y} \end{array}\)

    • This determines Stan’s log probability

This translation embodies our linking hypothesis: semantic computations generate behavioral data through a noisy measurement process captured by adjectivesRespond.
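To illustrate what it means for the \(\Do\)-block above to “determine Stan’s log probability,” here is a small, purely illustrative Python sketch of how its two sampling statements accumulate a joint log density—the quantity Stan’s target tracks:

from scipy.stats import norm

def log_prob(x: float, y: float) -> float:
    """Joint log density for: x ~ normal(0, 1); y ~ normal(x, 1); return y."""
    lp = 0.0
    lp += norm.logpdf(x, loc=0.0, scale=1.0)  # x ~ normal(0, 1)
    lp += norm.logpdf(y, loc=x, scale=1.0)    # y ~ normal(x, 1)
    return float(lp)

print(log_prob(0.2, 1.1))

Each sampling statement contributes one increment to the joint log density, just as each ~ statement in a Stan model block increments target.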

PDS Compilation Details

Input PDS:

discourse' = ty tau $ assert s1' >>> ask q1'
scaleNormingExample = asTyped tau (betaDeltaNormal deltaRules . adjectivesRespond scaleNormingPrior) discourse'

delta-reductions:

  1. Parse the answer to “how tall jo is” → \(\ct{max}(λd.\ct{height}(i)(j) ≥ d)\)
  2. Apply indices rule → \(\ct{max}(λd.h ≥ d)\)
  3. Apply \(\ct{max}\) extraction → \(h\)
  4. Monadic structure maps to Stan parameter inference

Kernel output:1

model {
  // FIXED EFFECTS
  w ~ normal(0.0, 1.0);
  
  // LIKELIHOOD
  target += normal_lpdf(y | w, sigma);
}

The PDS kernel model

The PDS system outputs the following kernel model:2

model {
  // FIXED EFFECTS
  w ~ normal(0.0, 1.0);
  
  // LIKELIHOOD
  target += normal_lpdf(y | w, sigma);
}

This is the semantic core—it captures the essential degree-based semantics, where w represents the degree on the height scale. But reality is more complicated: we need random effects, a way to model censored data, and proper indexing for multiple items and participants. The gap between the kernel model and a full statistical implementation represents ongoing research: how to get from the kernel model that PDS outputs to the implementation the analyst actually fits.

The full model with analyst augmentations looks like:

model {
  // PRIORS (analyst-added)
  sigma_epsilon_guess ~ exponential(1);
  sigma_e ~ beta(2, 10);
  
  // FIXED EFFECTS (PDS kernel)
  mu_guess ~ normal(0.0, 1.0);
  
  // RANDOM EFFECTS (analyst-added)
  z_epsilon_guess ~ std_normal();
  
  // LIKELIHOOD (PDS kernel with modifications)
  for (i in 1:N_data) {
    y[i] ~ normal(mu_guess[item[i]] + epsilon_guess[participant[i]], sigma_e);
  }
}

The lines marked “PDS kernel” show the kernel model from PDS. The portions marked “analyst-added” supply statistical machinery for real data: hierarchical priors, random effects, and indexed parameters. The kernel captures the core semantic computation—degrees on scales—while the augmentations handle the realities of experimental data.

This baseline model establishes how PDS transforms degree questions into parameter inference. Next, we’ll see how this extends to modeling the vagueness inherent in gradable adjective judgments.

Footnotes

  1. Actual PDS output: model { w ~ normal(0.0, 1.0); target += normal_lpdf(y | w, sigma); }

  2. Actual PDS output: model { w ~ normal(0.0, 1.0); target += normal_lpdf(y | w, sigma); }