Updating a probabilistic context set

Table of Contents

1. Overview

This short set of notes shows how we may begin to use our probabilistic DSL, along with its associated interpretation scheme, to describe common grounds as probability distributions over sets of possible worlds, and sentence interpretations as updates to such sets.

2. Context sets as probability distributions

How might that be done? Let's say we have a constant

data Constant:: Type) where
  ...
  WorldKnowledge :: P I
  ...

representing a probabilistic program that returns possible worlds. Such a constant can represent a starting common ground; that is, a probability distribution over the set of possible worlds determined by our general world knowledge—knowledge of which sets of facts are probable.

For example, we might want to consider four possible worlds, where each one settles whether or not Julian sleeps, as well as whether or not Carina sleeps. Then to WorldKnowledge, we might assign the meaning

interpCon _ WorldKnowedge = worldKnowledge
  where worldKnowledge :: ProbProg [FOL.Form]
        worldKnowledge = do f1 <- categorical [0.5, 0.5] [id, FOL.Not]
                            f2 <- categorical [0.5, 0.5] [id, FOL.Not]
                            return [ f1 (sleep (FOL.N (FOL.Name 0)))
                                   , f2 (sleep (FOL.N (FOL.Name 1))) ]
        categorical :: [Double] -> [a] -> ProbProg a
        categorical ws vs = PP (\f -> sum (zipWith (*) ws (map f vs)))

Here, worldKnowledge returns any of these four possible worlds with equal probability—it thus provides a kind of uniform distribution. Note the use in its definition of a function categorical, which takes a list of numbers, along with a list of values of the same length, in order to give back a probabilistic program that returns the \(n\)th value with probability proportional to the \(n\)th number. For example,

categorical [1, 2, 3] ['a', 'b', 'c']

is a probabilistic program of type ProbProg Char which returns 'a' with probability \(1/(1 + 2 + 3) = 1/6\), 'b' with probability \(1/3\) and 'c' with probability \(1/2\). categorical is defined using zipWith, which allows a weight to be paired up with each value that might be returned.

3. Updating context sets with English sentences

What we would like to do now is to determine a way of interpreting sentences of English as true or false at a given possible world, and to use such an interpretation scheme to update a context set—which are encoding as a probability distribution over possible worlds—given a sentence. One way we can do this is by encoding an interpretation function directly in our λ-calculus; that is, a map from sentences to sets of possible worlds.

To set this up, we should introduce a new atomic type U for utterances to our λ-calculus, so that our full set of types is

data Type = E | T | I | U | R
          | Type :-> Type
          | Type :/\ Type
          | Unit
          | P Type

The type family Domain should then be extended, in order to handle utterances. These can be interpreted as Haskell values of type Expr S (i.e., we may represent an utterance using our Haskell encoding of English sentences).

type family Domain:: Type) where
  Domain E = FOL.Term
  Domain T = FOL.Form
  Domain I = [FOL.Form]
  Domain U = Expr S
  Domain R = Double
  Domain:-> ψ) = Domain φ -> Domain ψ
  Domain:/\ ψ) = (Domain φ, Domain ψ)
  Domain Unit = ()
  Domain (P φ) = ProbProg (Domain φ)

Finally, we should add a couple of new constants to our λ-calculus which will allow us to both represent and interpret English utterances.

data Constant:: Type) where
  ...
  ToUtt :: Expr S -> Constant U
  Interp :: Constant (U :-> (I :-> T))
  ...

ToUtt allows us to represent an English expression (encoded in our applicative categorial grammar fragment) in the λ-calculus, as term of type U, while Interp allows us to take an utterance onto what is effectively a set of worlds—that is, a function of type I :-> T. We may in turn interpret these constants by extending our definition of interpCon.

interpCon _ (ToUtt u) = u
interpCon _ Interp = \φ i -> if entails 11
                            (englishToFOL φ : i)
                            false
                            then false
                            else true

Recall from here that englishToFOL is defined as

englishToFOL :: Expr S -> FOL.Form
englishToFOL = interpClosedTerm . lower . interpExpr

Thus for an English sentence to be true at some possible world is for its translation into first-order logic to be logically compatible with the formulae which are true at that possible world.

We are now able to encode discourses involving English sentences (woohoo!)? Take the following one, for instance. \[\begin{array}{l}w ∼ worldKnowledge \\ observe(⟦\textit{someone sleeps}⟧^w) \\ return(w)\end{array}\] We can encode this discourse as a probabilistic program in our λ-calculus as follows:

example :: Term γ (P I)
example = Let (Con WorldKnowledge) (Let (App observe (App (App (Con Interp) (Con (ToUtt someoneSleeps))) (Var First))) (Return (Var (Next First))))

Recall that observe is defined by applying Factor to the indicator function.

observe :: Term γ (T :-> P Unit)
observe = Lam (App (Con Factor) (App (Con Indi) (Var First)))

Indeed, if we interpret this new context set and check whether or not Carina sleeps in the result,

runTest :: ProbProg Bool
runTest = do i <- interpClosedTerm example
             return (entails 11 i (sleep (FOL.N (FOL.Name 0))))

we get a program in Haskell from which we may extract a probability. To do this, we may define a function

p :: ProbProg Bool -> Double
p m = expVal m indi

which takes the expected value of the indicator function; that is, it computes a probability from a program of type ProbProg Bool as the proportion of the mass it assigns to either True or False that it assigns to True. Now we can take the probability of runTest!

>>> p runTest
0.6666666666666666

Hm, why do you think this is the result?

Author: Julian Grove

Created: 2023-12-13 Wed 18:36

Validate