Composed stochastic proof : joint distribution monad as dependent type (part 3 of 4)

composed stochastic proofs - motivation

I posted some background links and concepts on Reddit r/DepdendentTypes
Dependent Types for Giry probability monads, e.g. Kantorovich / Wasserstein (+ followup comment)

In this post, we begin tackling the kinds of distributions needed for our
Asset Allocation Example.

joint distribution monad as dependent type

We define our strong typed stochastic computing structure in these 3 stages

1) Mathematically, in informal human-readable equations and type descriptions
     (formally encoded and exposed as visible API in 3 below, using infra from 2)

2) As functional abstractions, using the kantorovich monad formulation
     (formally encoded as submerged infrastructure, used by 3 below)

3) As a formal dependent-typed API usable from suitably strong-typed environments such as

  • Lean, F*
  • Idris
  • Agda, Coq, Isabelle, HOL, MMT
  • Scala 3 (Dotty)

Restating the WHAT:  The software types defined formally in 3 are refinements of the informal types from 1, relying on an infrastructure of types and proofs for the kantorovich monad from 2.  
Restating the HOW:    Given a proof environment 3 containing sufficient elements from 2, then the API types 1 may be implemented faithfully, using dependent type engineering principles.

Our workhorse example is a robust platform-neutral distribution monad, for a finite-dimensional vector space of possible events.   

  • One outcome/sample is one vector. 
  • The distribution is presumed to be Lebesque integrable.  (This presumption is enforced by our monad types).
  • Primarily we are interested in cases with bounded support (where the coordinates of the sample vector are bounded).

Users see this distribution as a robust joint distribution over the dimensions of the event space.   
A common example uses events in a convex subspace of euclidean R^N == The ordinary N
dimensional real vector space, with euclidean norm, which is a (Banach) complete metric space. 

Then one sample X is one N-vector of real components {X_n} == (X_1, X_2, ... X_N), where each of the X_n is an in-bounds real number.  In a simple example, the bounds are separable as N pairs of constraint inequalities:
   {B_n_MIN <= X_n <=  B_n_MAX}    

The probability of an outcome X is governed by D_X, the distribution of X, which to a user is a
joint probability distribution over the components {X_n}.   

D_X is usable both analytically and numerically:

  • D_X is composable using monadic laws, to produce conditionals, marginals, expectations, moments
  • D_X may be used as a source of samples, for visualization and monte carlo methods
  • D_X may be projected into a reduced resolution sample space.
    • For example, a continuous distribution over the X_n may be projected into a coarse discrete distribution, which is still a joint distribution.
      • Imagine a histogram with M bins for each X_n :  {B_n_m}
      • Each of the bin-sets {B_n_*} is selected independently from the others
      • Total bin-level parameters required:  N * M
      • Reduced sample space is just Z_M ^ N, i.e. an N vector of integers between 1 and M.  
      • Reduced event space has M ^ N elements.   
      • Each of these possible events may be assigned a discrete probability, from either a computable function or a literal dataset (e.g. using frequentist statistics).  
      • Each discrete probability may be reliably assigned as a rational number, enabling proof that the distribution function is properly normalized ; i.e. it sums to 1.
      • This finite description of a finite distribution over R^N is computationally tractable using integer methods.  
        • Demonstrate for small euclidean event-space
        • Demonstrate for larger spaces and different event-space norms
      • However, this finite description has fewer analytic guarantees than the original continuous distribution.
      • M may be increased / decreased to trade resolution for computational workload.
    • Both the continuous and discrete distributions are composable monads.

This flexible applicability of D_X allows us to design and perform useful, sharable
experiments with minimal attention (initially) to platform considerations.   
D_X is a verifiable stochastic model, not a platform-dependent piece of implementation code.
When a particular distribution is proven useful, then it may be promoted into use in high-scale computing environments (potentially with our proof system compiled into the runtime).  

Domain types based on probability monads is the fourth and final article in this series, addressing some aspects of implementation, for cases with low event-space dimensions.

Resources for understanding the Kantorovich monad are listed and discussed in this Reddit thread:

More reference pages:


Popular posts from this blog

Streaming analytics in Scala using ZIO and Apache DataSketches (part 1)

Overview of AxLam model : Nouns, Types, Morphisms

AxioMagic under Spark on cloud (AWS, Azure, ...)