From PDS to Stan: implementing the theories
Building on the compilation pipeline introduced for adjectives, let’s see how PDS handles the two competing theories of factivity. Recall that PDS outputs kernel models—the semantic core corresponding directly to our theoretical commitments. We’ll focus mainly on the actual experimental items here, since the norming models for this task have a very similar character to the ones we introduced for gradable adjectives.
Discrete-factivity in PDS
For the discrete-factivity hypothesis, we can derive projection judgments using:
-- From Grammar.Parser and Grammar.Lexica.SynSem.Factivity

-- Define the discourse: "Jo knows that Bo is a linguist"
expr1 = ["jo", "knows", "that", "bo", "is", "a", "linguist"]
expr2 = ["how", "likely", "that", "bo", "is", "a", "linguist"]

s1 = getSemantics @Factivity 0 expr1
q1 = getSemantics @Factivity 0 expr2

discourse = ty tau $ assert s1 >>> ask q1

-- Compile to Stan using factivityPrior and factivityRespond
factivityExample = asTyped tau (betaDeltaNormal deltaRules . factivityRespond factivityPrior) discourse
This compilation process involves several key components. First, the lexical entry for know branches based on the discourse state:
-- From Grammar.Lexica.SynSem.Factivity
"knows" -> [ SynSem {
    syn = S :\: NP :/: S,
    sem = ty tau (lam s (purePP (lam p (lam x (lam i
            (ITE (TauKnow s)
              (And (epi i @@ x @@ p) (p @@ i))  -- factive: belief AND truth
              (epi i @@ x @@ p))))              -- non-factive: belief only
          @@ s))
  } ]
The ITE (if-then-else) creates a discrete choice: either the speaker interprets the predicate as requiring the complement to be true (factive interpretation), or only the belief component is required (non-factive interpretation). The TauKnow parameter, which can vary by context, determines which branch is taken.
To accommodate this contextual variation, the prior over states must be updated:
-- From Grammar.Lexica.SynSem.Factivity
factivityPrior =
  let' x (LogitNormal 0 1)
    (let' y (LogitNormal 0 1)
      (let' z (LogitNormal 0 1)
        (let' b (Bern x)
          (Return (UpdCG (let' c (Bern y)
                           (let' d (Bern z)
                             (Return (UpdLing (lam x c) (UpdEpi (lam x (lam p d)) _0)))))
                         (UpdTauKnow b ϵ))))))
The delta rules must also be modified to handle the contextual factivity parameter:
-- From Lambda.Delta: computes functions on states
states :: DeltaRule
states = \case
  TauKnow (UpdTauKnow b _) -> Just b
  TauKnow (UpdCG _ s)      -> Just (TauKnow s)
  TauKnow (UpdQUD _ s)     -> Just (TauKnow s)
  -- ... other cases
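For instance, applying these rules to the kind of state built by factivityPrior, TauKnow (UpdCG cg (UpdTauKnow b ϵ)) first reduces to TauKnow (UpdTauKnow b ϵ) via the UpdCG rule and then to b via the UpdTauKnow rule, so the factivity flag set in the prior is retrieved from the discourse state.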
PDS compiles this discrete-factivity theory to the following kernel model:1
model {
  // FIXED EFFECTS
  v ~ logit_normal(0.0, 1.0);  // probability of factive interpretation
  w ~ logit_normal(0.0, 1.0);  // world knowledge (from norming)

  // LIKELIHOOD
  target += log_mix(v,
    truncated_normal_lpdf(y | 1.0, sigma, 0.0, 1.0),  // factive branch
    truncated_normal_lpdf(y | w, sigma, 0.0, 1.0));   // non-factive branch
}
This kernel captures the discrete branching: with probability v, the response is near 1.0 (factive interpretation); otherwise, it depends on world knowledge w (non-factive interpretation).
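Written out, the log_mix statement corresponds to the mixture density

p(y | v, w, sigma) = v * TruncNormal(y | 1, sigma, 0, 1) + (1 - v) * TruncNormal(y | w, sigma, 0, 1)

where TruncNormal(y | mu, sigma, 0, 1) abbreviates the truncated normal density defined in the Response distributions section below.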
Understanding the Stan implementation
Before augmenting this kernel to handle real data, let’s briefly review the key additions for factivity.2
The factivity-specific components are:
- Hierarchical structure for verb-specific and context-specific effects
- Integration with norming study priors via mu_omega and sigma_omega
- Mixture likelihood for discrete factivity (the MIXTURE LIKELIHOOD block in the full model below)
Here’s how the discrete-factivity kernel is augmented for real data:
model {
  // PRIORS (analyst-added)
  verb_intercept_std ~ exponential(1);
  context_intercept_std ~ exponential(1);
  subj_verb_std ~ exponential(1);
  subj_context_std ~ exponential(1);
  sigma_e ~ beta(2, 10);  // mildly informative prior keeping sigma_e small

  // Hierarchical priors
  verb_logit_raw ~ std_normal();
  context_logit_raw ~ normal(mu_omega, sigma_omega);  // informed by norming
  to_vector(subj_verb_raw) ~ std_normal();
  to_vector(subj_context_raw) ~ std_normal();

  // DISCRETE FACTIVITY (PDS kernel structure)
  for (n in 1:N) {
    // Probability of factive interpretation for this verb/subject combo
    real verb_prob = inv_logit(verb_intercept[verb[n]] +
                               subj_intercept_verb[subj[n], verb[n]]);

    // World knowledge probability for this context/subject combo
    real context_prob = inv_logit(context_intercept[context[n]] +
                                  subj_intercept_context[subj[n], context[n]]);

    // MIXTURE LIKELIHOOD (PDS kernel structure)
    target += log_mix(verb_prob,
      truncated_normal_lpdf(y[n] | 1.0, sigma_e, 0.0, 1.0),
      truncated_normal_lpdf(y[n] | context_prob, sigma_e, 0.0, 1.0));
  }
}
The mixture-likelihood lines represent the kernel model from PDS, encoding discrete factivity as a mixture of two response distributions. The remaining portions add the statistical machinery needed for real data.
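The intercepts referenced in the loop (verb_intercept, context_intercept, and the subject-level matrices) are not defined in the model block itself; they would come from a transformed parameters block that is not shown here. A minimal sketch of one plausible version, assuming a non-centered parameterization and assumed dimensions V (verbs), C (contexts), and S (subjects), might look like:

transformed parameters {
  // Assumed shapes and parameterization; not part of the PDS output above.
  // Non-centered: each standard-normal raw parameter is scaled by its std.
  vector[V] verb_intercept = verb_intercept_std * verb_logit_raw;
  vector[C] context_intercept = context_logit_raw;  // prior already carries the norming scale
  matrix[S, V] subj_intercept_verb = subj_verb_std * subj_verb_raw;
  matrix[S, C] subj_intercept_context = subj_context_std * subj_context_raw;
}

How context_intercept_std and the norming parameters mu_omega and sigma_omega are combined in practice depends on the full model specification; the sketch above simply passes the norming-informed prior through unchanged.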
Wholly-gradient factivity in PDS
The compilation process for the wholly-gradient model involves analogous components. In this case, the lexical entry for know branches not on the discourse state but on indices of the common ground:
"knows" -> [ SynSem {
= S :\: NP :/: S,
syn = ty tau (lam s (purePP (lam p (lam x (lam i
sem ITE (TauKnow i)
(And (epi i @@ x @@ p) (p @@ i)) -- factive: belief AND truth
(@@ x @@ p))))) @@ s)) -- non-factive: belief only
(epi i } ]
As in the discrete lexical entry for know, the ITE here creates a discrete choice, but now about something different: how to use the index it receives from the common ground.
To accommodate the novel index-sensitivity of know, the prior over indices encoded in the common ground must also be updated:
factivityPrior =
  let' x (LogitNormal 0 1)
    (let' y (LogitNormal 0 1)
      (let' z (LogitNormal 0 1)
        (Return (UpdCG (let' b (Bern x)
                         (let' c (Bern y)
                           (let' d (Bern z)
                             (Return (UpdTauKnow b (UpdLing (lam x c) (UpdEpi (lam x (lam p d)) _0)))))))
                       ϵ))))
Here, the Bernoulli statement that regulates whether or not know is factive has been added to the definition of the common ground itself.
The delta rules must also be modified to handle the factivity parameter as it is regulated by indices:
indices :: DeltaRule
indices = \case
  TauKnow (UpdTauKnow b _) -> Just b
  TauKnow (UpdEpi _ i)     -> Just (TauKnow i)
  TauKnow (UpdLing _ i)    -> Just (TauKnow i)
  -- ... other cases
The wholly-gradient hypothesis treats factivity as continuously variable. The key modification to the PDS code lies in how the gradience is encoded: instead of discrete branching, the gradient model computes a weighted combination.
PDS outputs this kernel for the wholly-gradient model:3
model {
  // FIXED EFFECTS
  v ~ logit_normal(0.0, 1.0);  // degree of factivity
  w ~ logit_normal(0.0, 1.0);  // world knowledge

  // LIKELIHOOD
  target += truncated_normal_lpdf(y | v + (1.0 - v) * w, sigma, 0.0, 1.0);
}
Here, v represents the degree of factivity: it provides a “boost” to the world knowledge probability w, but never forces the response to 1.0. The response probability is computed as response = v + (1 - v) * w.
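In the same notation as the mixture density above, the gradient kernel’s likelihood is a single truncated normal whose mean is shifted by the factivity boost, rather than a mixture of two components:

p(y | v, w, sigma) = TruncNormal(y | v + (1 - v) * w, sigma, 0, 1)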
Let’s trace through this computation:
- If v = 0 (no factivity): response = 0 + 1 * w = w (pure world knowledge)
- If v = 1 (full factivity): response = 1 + 0 * w = 1 (certain)
- If v = 0.5 (partial factivity): response = 0.5 + 0.5 * w (boosted world knowledge)
The full model augments this kernel with the same hierarchical structure as before:
model {
  // PRIORS (analyst-added)
  verb_intercept_std ~ exponential(1);
  context_intercept_std ~ exponential(1);
  subj_verb_std ~ exponential(1);
  subj_context_std ~ exponential(1);
  sigma_e ~ beta(2, 10);

  // Hierarchical priors
  verb_logit_raw ~ std_normal();
  context_logit_raw ~ normal(mu_omega, sigma_omega);
  to_vector(subj_verb_raw) ~ std_normal();
  to_vector(subj_context_raw) ~ std_normal();

  // GRADIENT COMPUTATION (PDS kernel structure)
  for (n in 1:N) {
    // Degree of factivity for this verb/subject
    real verb_boost = inv_logit(verb_intercept[verb[n]] +
                                subj_intercept_verb[subj[n], verb[n]]);

    // World knowledge for this context/subject
    real context_prob = inv_logit(context_intercept[context[n]] +
                                  subj_intercept_context[subj[n], context[n]]);

    // GRADIENT LIKELIHOOD (PDS kernel computation)
    real response_prob = verb_boost + (1.0 - verb_boost) * context_prob;
    target += truncated_normal_lpdf(y[n] | response_prob, sigma_e, 0.0, 1.0);
  }
}
The GRADIENT LIKELIHOOD lines show the PDS kernel: a continuous computation rather than discrete branching. The gradient contribution of factivity is clear in the line that computes response_prob: the verb provides a boost to world knowledge rather than overriding it entirely.
Response distributions
Both models use truncated normal distributions as response functions. As discussed in the adjectives section, this handles the bounded nature of slider scales:
real truncated_normal_lpdf(real y, real mu, real sigma, real lower, real upper) {
  // Log probability of y under Normal(mu, sigma) truncated to [lower, upper]
  real lpdf = normal_lpdf(y | mu, sigma);
  real normalizer = log(normal_cdf(upper | mu, sigma) - normal_cdf(lower | mu, sigma));
  return lpdf - normalizer;
}
The truncation is crucial because many responses cluster at the scale boundaries (0 and 1), which standard distributions like Beta cannot handle directly.
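The generated quantities block in the next subsection also draws from this distribution via a truncated_normal_rng function, which is likewise not shown in the models above. A minimal sketch of such a function, assuming inverse-CDF sampling and placement alongside truncated_normal_lpdf in the functions block, is:

real truncated_normal_rng(real mu, real sigma, real lower, real upper) {
  // Sample from Normal(mu, sigma) truncated to [lower, upper] via the inverse CDF
  real p_lower = normal_cdf(lower | mu, sigma);
  real p_upper = normal_cdf(upper | mu, sigma);
  real u = uniform_rng(p_lower, p_upper);
  return mu + sigma * inv_Phi(u);
}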
Generated quantities
Both models can include a generated quantities block to compute posterior predictions:
generated quantities {
  array[N] real y_pred;  // posterior predictive samples

  for (n in 1:N) {
    real verb_prob = inv_logit(verb_intercept[verb[n]] +
                               subj_intercept_verb[subj[n], verb[n]]);
    real context_prob = inv_logit(context_intercept[context[n]] +
                                  subj_intercept_context[subj[n], context[n]]);

    if (model_type == 1) {  // model_type: 1 = discrete, 0 = gradient (Stan has no string type)
      // Discrete: first sample branch, then response
      int branch = bernoulli_rng(verb_prob);
      if (branch == 1) {
        y_pred[n] = truncated_normal_rng(1.0, sigma_e, 0.0, 1.0);
      } else {
        y_pred[n] = truncated_normal_rng(context_prob, sigma_e, 0.0, 1.0);
      }
    } else {
      // Gradient: compute blended probability
      real response_prob = verb_prob + (1.0 - verb_prob) * context_prob;
      y_pred[n] = truncated_normal_rng(response_prob, sigma_e, 0.0, 1.0);
    }
  }
}
These posterior predictions let us visualize how well each model captures the empirical patterns.
Footnotes

1. Actual PDS output:
   model { v ~ logit_normal(0.0, 1.0); w ~ logit_normal(0.0, 1.0); target += log_mix(v, truncated_normal_lpdf(y | 1.0, sigma, 0.0, 1.0), truncated_normal_lpdf(y | w, sigma, 0.0, 1.0)); } ↩︎
2. For a detailed introduction to Stan’s blocks and syntax, see the Stan introduction in the adjectives section. ↩︎
3. Actual PDS output after adding rendering hooks:
   model { v ~ logit_normal(0.0, 1.0); w ~ logit_normal(0.0, 1.0); target += truncated_normal_lpdf(y | v + (1.0 - v) * w, sigma, 0.0, 1.0); } ↩︎