Understanding gradience

One of the most striking findings from experimental semantics and pragmatics is the pervasiveness of gradience in aggregated measures.1 While semanticists have long recognized the existence of gradience in some domains (e.g., gradable adjectives), we often assume categorical distinctions in others (e.g., factivity). Yet even in domains where traditional approaches assume categorical distinctions, experimental methods often reveal continuous variation. For the reasons laid out above, understanding this gradience is crucial for developing theories that connect formal semantics to behavioral data.

Examples of Potentially Unexpected Gradience

The kinds of distributionally and inferentially defined properties we build generalizations around are not always readily apparent in large-scale datasets. One example, which we will examine in depth in the second case study of the course, is that attempts to measure veridicality/factivity yield more gradience than we might have expected. We can illustrate this using the MegaAttitude datasets.

Figure 1 shows veridicality judgments collected by White and Rawlins (2018) as part of their MegaVeridicality dataset.

Figure 1: Veridicality judgments collected by White and Rawlins (2018) as part of their MegaVeridicality dataset.

One thing White and Rawlins (2018) note is the apparent gradience in these measures. This gradience presents a challenge if we want to use these measures to evaluate generalizations about the relationship between two properties. For instance, suppose we are interested in understanding the relationship between factivity and neg(ation)-raising. A predicate is neg-raising if it gives rise to inferences from (1) to (2):

  1. Jo doesn’t think that Mo left.
  2. Jo thinks that Mo didn’t leave.

One way of deriving a factivity measure from the MegaVeridicality dataset is to take the max along both dimensions, as shown in Figure 2. The idea here is that a factive predicate will give rise to veridicality inferences under both positive and negative matrix polarity.

Figure 2: One way of deriving a factivity measure from the MegaVeridicality dataset.
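
To make this derivation concrete, here is a minimal sketch in Python. The schema is hypothetical (the column names verb, polarity, and rating, and all values, are illustrative; the released MegaVeridicality files have their own format), and the combination rule is exposed so other choices can be swapped in.

```python
import pandas as pd

# Hypothetical MegaVeridicality-style data: one row per judgment, with the
# verb, the matrix polarity of the frame, and a veridicality rating
# (higher = the complement is more strongly inferred to be true).
judgments = pd.DataFrame({
    "verb":     ["know", "know", "think", "think"],
    "polarity": ["positive", "negative", "positive", "negative"],
    "rating":   [0.95, 0.90, 0.80, 0.15],
})

# One score per verb per polarity, pivoted so that each verb gets a
# positive-polarity and a negative-polarity veridicality score.
scores = (judgments
          .groupby(["verb", "polarity"])["rating"]
          .mean()
          .unstack("polarity"))

# Combine the two dimensions into a single factivity score. Following the
# text, we take the max along both dimensions; the min (demanding
# veridicality under *both* polarities) is another natural aggregation.
scores["factivity"] = scores[["positive", "negative"]].max(axis=1)
print(scores)
```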

Now let’s suppose we’re interested in generalizations about the relationship between two measures. For instance, we may want to evaluate the relationship between factivity and neg-raising, on the suspicion that factives are not neg-raisers.

Figure 3 shows a comparison of the measure of neg(ation)-raising from the MegaNegRaising dataset collected by An and White (2020) and the derived factivity measure from the MegaVeridicality dataset collected by White and Rawlins (2018).

Figure 3: A comparison of the measure of neg(ation)-raising from the MegaNegRaising dataset collected by An and White (2020) and the derived factivity measure from the MegaVeridicality dataset collected by White and Rawlins (2018).
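
To sketch how such a comparison can be set up (with toy per-predicate values standing in for the actual datasets, whose schemas and coverage differ), we can align the two measures by predicate and compute a rank correlation:

```python
import pandas as pd
from scipy.stats import spearmanr

# Toy per-predicate measures; all values are illustrative.
negraising = pd.DataFrame({"verb": ["think", "believe", "know"],
                           "negraising": [0.85, 0.70, 0.10]})
factivity = pd.DataFrame({"verb": ["think", "believe", "know"],
                          "factivity": [0.15, 0.20, 0.90]})

# Align the two measures by predicate; a scatterplot of the merged frame
# gives a Figure 3-style view, and a rank correlation quantifies the
# monotonic relationship between the measures.
merged = negraising.merge(factivity, on="verb")
rho, p = spearmanr(merged["negraising"], merged["factivity"])
print(f"Spearman rho = {rho:.2f} (p = {p:.3f})")
```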

The challenge is that, once we move to relating continuous measures, rather than categorical distinctions, we don’t know what the relationship between measures should look like in any particular case. To illustrate, let’s consider another example. Anand and Hacquard (2014) propose that, if a predicate gives rise to inferences about both beliefs and preferences, it backgrounds the belief inferences. To evaluate this hypothesis, we might try to derive a measure of belief inferences and preference inferences and then relate them.

To this end, we can use the MegaIntensionality dataset collected by Kane, Gantt, and White (2022). Figure 4 shows a measure of belief inferences and Figure 5 shows a measure of desire inferences.

Figure 4: A measure of belief inferences from the MegaIntensionality dataset collected by Kane, Gantt, and White (2022).

Figure 5: A measure of desire inferences from the MegaIntensionality dataset collected by Kane, Gantt, and White (2022).

Figure 6 shows the relationship between the desire and belief measures.

Figure 6: A comparison of the desire and belief measures from the MegaIntensionality dataset collected by Kane, Gantt, and White (2022).

There are two main takeaways from this example. First, the generalization proposed by Anand and Hacquard (2014) is indeed supported by the data. Second, the relationship between these two measures is strikingly different from the relationship we observe between the continuous measures of factivity and neg-raising. We need some way of theorizing about these continuous relationships.

Two Fundamental Types of Uncertainty

The framework we’ll explore distinguishes two general types of uncertainty that can produce gradience: resolved (or type-level) uncertainty and unresolved (or token-level) uncertainty, both of which can arise from multiple sources.

Sources of Gradience in Inference Judgments
├── Resolved (Type-Level) Uncertainty
│   ├── Ambiguity
│   │   ├── Lexical (e.g., "run" = locomote vs. manage)
│   │   ├── Syntactic (e.g., attachment ambiguities)
│   │   └── Semantic (e.g., scope ambiguities)
│   └── Discourse Status
│       └── QUD (Question Under Discussion)
└── Unresolved (Token-Level) Uncertainty
    ├── Vagueness (e.g., height of a "tall" person)
    ├── World knowledge (e.g., likelihood that facts are true)
    └── Task effects
        ├── Response strategies
        └── Response error

Resolved Uncertainty: Multiple Discrete Possibilities

Resolved uncertainty arises when speakers must choose among discrete interpretations. Consider (3):

  3. My uncle is running the race.

The verb run is ambiguous—the uncle might be a participant (locomotion) or the organizer (management). Asked “How likely is it that my uncle has good managerial skills?”, participants who interpret run as locomotion might respond near 0.2, while those interpreting it as management might respond near 0.8. The population average might be 0.5, but this reflects a mixture of discrete interpretations, not genuine gradience.

This uncertainty is “resolved” because once speakers fix an interpretation, the inference follows determinately. The gradience emerges from averaging across different resolutions, not from uncertainty within any single interpretation.
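
A small simulation makes this concrete. All parameters below are illustrative; the point is only that a population of determinate responders can produce a gradient-looking average.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Each simulated participant first fixes an interpretation of "run"
# (locomotion vs. management), then gives a gradient response whose
# location depends on that interpretation.
is_management = rng.random(n) < 0.5              # interpretation choice
locomotion = rng.beta(2, 8, n)                   # responses with mean ~0.2
management = rng.beta(8, 2, n)                   # responses with mean ~0.8
responses = np.where(is_management, management, locomotion)

# The mean sits near 0.5, but the distribution is bimodal: the gradience
# lives in the aggregate, not within any single interpretation.
print(f"mean = {responses.mean():.2f}")
counts, _ = np.histogram(responses, bins=10, range=(0, 1))
print(counts)  # mass piles up near 0.2 and 0.8, not near 0.5
```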

A similar phenomenon is observable with anaphora. Consider (4):

  4. Whenever anyone laughed, the magician scowled and their assistant smirked. They were secretly pleased.

One is quite likely to infer from (4) that the magician’s assistant is secretly pleased, but not necessarily that the magician is pleased, even though, in principle, it may be that both are, or even that only the magician is. Ultimately, the ambiguity is resolved when we fix the referent.

Unresolved Uncertainty: Gradient Within Interpretations

Unresolved uncertainty contrasts with resolved uncertainty in that it persists even after fixing all ambiguities. Consider (5):

  5. My uncle is tall.

Even with no ambiguity about tall’s meaning, speakers remain uncertain whether the uncle exceeds any particular height threshold. This is classic vagueness—the predicate’s application conditions are inherently gradient (Fine 1975; Graff 2000; Kennedy 2007; Rooij 2011; Sorensen 2023).
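
One standard way to formalize this, sketched below with illustrative parameters, treats tall as true of a height just in case it exceeds a threshold, while the threshold itself remains uncertain:

```python
from scipy.stats import norm

# A probabilistic threshold sketch of vagueness: "tall" holds of a height h
# iff h exceeds a threshold theta, but theta is uncertain, here distributed
# Normal(mu, sigma) in centimeters (both values illustrative).
mu, sigma = 180.0, 6.0

def p_tall(height_cm: float) -> float:
    """Probability that height_cm exceeds a random threshold theta."""
    return norm.cdf(height_cm, loc=mu, scale=sigma)

# Judgments are gradient within a single, fixed interpretation of "tall":
for h in (170, 178, 182, 190):
    print(h, round(p_tall(h), 2))
```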

World knowledge creates another layer: even knowing someone runs races (locomotion sense), we remain uncertain about their speed, endurance, or likelihood of finishing. These uncertainties appear within individual trials, not just across participants.

Why This Distinction Matters

The type of uncertainty has profound implications for semantic theory:

  • Resolved uncertainty suggests discrete semantic representations with probabilistic selection
  • Unresolved uncertainty suggests gradient representations or probabilistic reasoning within fixed meanings

Different phenomena may involve different uncertainty types. As we’ll see, vagueness seems to give rise to unresolved uncertainty (the conditions of application of tall seem inherently uncertain), while factivity’s gradience is perhaps more puzzling: is it resolved uncertainty from ambiguous predicates, or unresolved uncertainty in projection itself?
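
One coarse diagnostic, sketched here on simulated data, exploits the fact that the two uncertainty types predict different trial-level response distributions: resolved uncertainty predicts a mixture of modes, while unresolved uncertainty predicts a single gradient spread. Comparing one- versus two-component mixture fits is one rough way to operationalize that contrast.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)

# Simulated responses (illustrative only): a mixture of two interpretations
# (resolved uncertainty) versus a single gradient spread (unresolved).
resolved = np.concatenate([rng.beta(2, 8, 500), rng.beta(8, 2, 500)])
unresolved = rng.beta(5, 5, 1000)

def bic_gap(x: np.ndarray) -> float:
    """BIC(1 component) - BIC(2 components); large positive => bimodal."""
    x = x.reshape(-1, 1)
    one = GaussianMixture(1, random_state=0).fit(x).bic(x)
    two = GaussianMixture(2, random_state=0).fit(x).bic(x)
    return one - two

print(f"resolved:   {bic_gap(resolved):+.1f}")   # strongly favors 2 components
print(f"unresolved: {bic_gap(unresolved):+.1f}") # little or no gain from 2
```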

References

An, Hannah, and Aaron White. 2020. “The Lexical and Grammatical Sources of Neg-Raising Inferences.” Proceedings of the Society for Computation in Linguistics 3 (1): 220–33. https://doi.org/10.7275/yts0-q989.
Anand, Pranav, and Valentine Hacquard. 2014. “Factivity, Belief and Discourse.” In The Art and Craft of Semantics: A Festschrift for Irene Heim, edited by Luka Crnič and Uli Sauerland, 1:69–90. MITWPL 70. MITWPL. https://semanticsarchive.net/Archive/jZiNmM4N/.
Bard, Ellen Gurman, Dan Robertson, and Antonella Sorace. 1996. “Magnitude Estimation of Linguistic Acceptability.” Language 72 (1): 32–68. https://doi.org/10.2307/416793.
Featherston, Sam. 2005. “Magnitude Estimation and What It Can Do for Your Syntax: Some Wh-Constraints in German.” Lingua 115 (11): 1525–50. https://doi.org/10.1016/j.lingua.2004.07.003.
———. 2007. “Data in Generative Grammar: The Stick and the Carrot.” Theoretical Linguistics 33 (3): 269–318. https://doi.org/10.1515/TL.2007.020.
Fine, Kit. 1975. “Vagueness, Truth and Logic.” Synthese 30 (3/4): 265–300. https://www.jstor.org/stable/20115033.
Gibson, Edward, and Evelina Fedorenko. 2010. “Weak Quantitative Standards in Linguistics Research.” Trends in Cognitive Sciences 14 (6): 233–34. https://doi.org/10.1016/j.tics.2010.03.005.
———. 2013. “The Need for Quantitative Methods in Syntax and Semantics Research.” Language and Cognitive Processes 28 (1-2): 88–124. https://doi.org/10.1080/01690965.2010.515080.
Graff, Delia. 2000. “Shifting Sands: An Interest-Relative Theory of Vagueness.” Philosophical Topics 28 (1): 45–81. https://www.jstor.org/stable/43154331.
Kane, Benjamin, Will Gantt, and Aaron Steven White. 2022. “Intensional Gaps: Relating Veridicality, Factivity, Doxasticity, Bouleticity, and Neg-Raising.” Semantics and Linguistic Theory 31: 570–605. https://doi.org/10.3765/salt.v31i0.5137.
Keller, Frank. 2000. “Gradience in Grammar: Experimental and Computational Aspects of Degrees of Grammaticality.” PhD thesis, Edinburgh, UK: University of Edinburgh.
Kennedy, Christopher. 2007. “Vagueness and Grammar: The Semantics of Relative and Absolute Gradable Adjectives.” Linguistics and Philosophy 30 (1): 1–45. https://doi.org/10.1007/s10988-006-9008-0.
Lau, Jey Han, Alexander Clark, and Shalom Lappin. 2017. “Grammaticality, Acceptability, and Probability: A Probabilistic View of Linguistic Knowledge.” Cognitive Science 41 (5): 1202–41. https://doi.org/10.1111/cogs.12414.
Rooij, Robert van. 2011. “Vagueness and Linguistics.” In Vagueness: A Guide, edited by Giuseppina Ronzitti, 123–70. Dordrecht: Springer Netherlands. https://doi.org/10.1007/978-94-007-0375-9_6.
Schütze, Carson T., and Jon Sprouse. 2014. “Judgment Data.” In Research Methods in Linguistics, edited by Robert J. Podesva and Devyani Sharma, 27–50. Cambridge: Cambridge University Press. https://doi.org/10.1017/CBO9781139013734.004.
Sorace, Antonella, and Frank Keller. 2005. “Gradience in Linguistic Data.” Lingua 115 (11): 1497–1524. https://doi.org/10.1016/j.lingua.2004.07.002.
Sorensen, Roy. 2023. “Vagueness.” In The Stanford Encyclopedia of Philosophy, edited by Edward N. Zalta and Uri Nodelman, Winter 2023. Metaphysics Research Lab, Stanford University. https://plato.stanford.edu/archives/win2023/entries/vagueness/.
Sprouse, Jon. 2007. “Continuous Acceptability, Categorical Grammaticality, and Experimental Syntax.” Biolinguistics 1: 123–34.
———. 2011. “A Validation of Amazon Mechanical Turk for the Collection of Acceptability Judgments in Linguistic Theory.” Behavior Research Methods 43 (1): 155–67. https://doi.org/10.3758/s13428-010-0039-7.
Sprouse, Jon, and Diogo Almeida. 2013. “The Empirical Status of Data in Syntax: A Reply to Gibson and Fedorenko.” Language and Cognitive Processes 28 (3): 222–28. https://doi.org/10.1080/01690965.2012.703782.
Sprouse, Jon, Carson T. Schütze, and Diogo Almeida. 2013. “A Comparison of Informal and Formal Acceptability Judgments Using a Random Sample from Linguistic Inquiry 2001–2010.” Lingua 134 (September): 219–48. https://doi.org/10.1016/j.lingua.2013.07.002.
Sprouse, Jon, Beracah Yankama, Sagar Indurkhya, Sandiway Fong, and Robert C. Berwick. 2018. “Colorless Green Ideas Do Sleep Furiously: Gradient Acceptability and the Nature of the Grammar.” The Linguistic Review 35 (3): 575–99. https://doi.org/10.1515/tlr-2018-0005.
White, Aaron Steven, and Kyle Rawlins. 2018. “The Role of Veridicality and Factivity in Clause Selection.” In NELS 48: Proceedings of the Forty-Eighth Annual Meeting of the North East Linguistic Society, edited by Sherry Hucklebridge and Max Nelson, 48:221–34. University of Iceland: GLSA (Graduate Linguistics Student Association), Department of Linguistics, University of Massachusetts.

Footnotes

  1. In this course, we will focus mainly on gradience in aggregated inference judgments, but there is a deep literature on gradience in acceptability judgments within the experimental syntax literature (Bard, Robertson, and Sorace 1996; Keller 2000; Sorace and Keller 2005; Sprouse 2007, 2011; Featherston 2005, 2007; Gibson and Fedorenko 2010, 2013; Sprouse and Almeida 2013; Sprouse, Schütze, and Almeida 2013; Schütze and Sprouse 2014; Lau, Clark, and Lappin 2017; Sprouse et al. 2018).↩︎