Explainable models: Asset Allocation example (part 01)
Here we build on the approach outlined in the last 3 blog posts:
- General Background in knowledge graphs and mathematical type theory
- Informal Survey of open satisfaction solvers and provers
- Infrastructure Options for typed knowledge graphs
Now we outline a practical modeling approach in an example knowledge domain:
Asset Portfolios and the Allocation Decision Problem
We develop a flexible, minimalist theory for describing collections of assets, and estimated
knowledge about the assets and collections. These descriptions may be aligned with FIBO
ontology concepts, for commercial integration purposes. Here we are concerned only with
abstract mathematical description sufficient to analyze portfolio allocation decisions,
in the context of available knowledge about the constituent assets, and given investor objectives.
Here is a first simple problem, to motivate our thinking:
Given a list of 10 available equity ticker symbols, assuming we may buy any fraction of a share
with perfect liquidity, then how should we choose to weight these assets in a given account?
A constructive answer is a list of 10 fractional numbers that sum to 1, which is an allocation.
This allocation answer represents our decision, made in light of available information.
This specific allocation decision does not itself provide an explanation of how it was made.
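As a minimal sketch of what "an allocation" means here, the check below tests that a candidate answer has one weight per asset and that the weights sum to 1. The function names, the tolerance, and the non-negativity (long-only) check are assumptions added for illustration; the text above only requires fractional numbers summing to 1.

```python
import math


def is_valid_allocation(weights, n_assets=10, tol=1e-9):
    """Return True if `weights` has one entry per asset and sums to 1.

    Assumption (not stated in the post): weights are non-negative,
    i.e. the account is long-only.
    """
    if len(weights) != n_assets:
        return False
    if any(w < 0.0 for w in weights):
        return False
    return math.isclose(sum(weights), 1.0, abs_tol=tol)


def equal_weight(n_assets=10):
    """The simplest constructive answer: the same weight for every asset."""
    return [1.0 / n_assets] * n_assets
```

An equal-weight list passes this check, yet it carries no explanation of why it was chosen, which is the gap the rest of this post addresses.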
Let us try adding some assumptions about how the decision may be made, in an attempt
to build in explainability of allocation decisions, without loss of generality in the model
assumptions.
Assume the decision is a deterministic function of certain inputs.
Assume the set of relevant assets is known and constant - this is input #0.
Assume that allocation is decided based on these 5 further input data items.
1) Horizon: A finite planning (or backtesting) time horizon,
such as 3 months or 18 months, beginning and ending on specific dates and times.
2) Scenarios: A small finite set (on the order of 10, and fewer than about 100) of distinct possible macro-outcome scenarios,
which subdivide the total outcome space. These scenarios are typically summed away during
estimation. This explicit model subdivision offers pragmatic description leverage to the
inquiry and explanation process, by injecting model sub-containers of complexity.
The identification of these scenarios (and corresponding assigned probabilities, #3)
is at the discretion of the modeler; they do not have to be good or accurate.
This set of scenarios is a sample space for our non-financial model of major world
events over the given time horizon #1. In general they should describe independent
potential world outcomes, which is a subtle topic that we will return to below.
Within these "major outcome scenarios", we expect a detailed asset return model to apply:
the joint PDs of #4 below. Which instance of the #4 joint PD is activated is determined by nature (or
a simulation process of ours) choosing one of our outcome scenarios.
Selection (by nature) of one such scenario is then equivalent to the outcome of a single
integer-valued random variable. Independence of scenarios is discussed at length later.
3) Likelihoods: A discrete probability distribution over our scenarios (#2),
whose probabilities must sum to 1. (If scenarios are not independent, further
specification is needed.)
4) Asset Outcomes as joint distribution, per scenario:
For each scenario, a joint probability distribution (PD) for a minimal tuple of random variables
summarizing asset returns and risk measures. These measures may be distributed and
correlated in distinctly non-Gaussian ways.
For example, for each given asset, in a given scenario, our outcome indicator RVs
may often include one return statistic and one risk statistic.
- Return measure:
4.a.1 - total arithmetic return at horizon (which maps easily to geometric rate of return)
- Risk measure, one of:
4.b.1 - non-Gaussian: lowest point reached relative to the initial value (drawdown from initial, not from peak).
OR, a messier bridge to familiar Gaussian statistics using Brownian assumptions:
4.b.2 - Daily uncorrelated Wiener volatility of asset value = sqrt(dailyVariance)
(best understood by multiplying by the annualization factor sqrt(252)).
We consider the 4.b.2 model useful for some informal human comparative
purposes, but less useful for automated decision making.
Then each Joint PD (for a given macro-outcome scenario)
has dimension of 2 * numAssets
((rtrnMeas01, riskMeas01), (rtrnMeas02, riskMeas02), ...)
All correlation of these outcome statistics (across all assets #0, per scenario #2, per horizon #1)
is expressed through this single multivariate joint PD 4.
5) Investor Preferences for risk vs. reward. We will describe these more specifically later.
Then our task of explaining our allocation decision may be factored using these inputs.
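The six inputs (#0 through #5) can be gathered into one record, which makes the consistency conditions stated above easy to check. This is a minimal sketch, not a prescribed schema: the class name, field names, and the choice to leave the per-scenario joint PDs and investor preferences as opaque values are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Sequence
import math


@dataclass(frozen=True)
class AllocationInputs:
    """Hypothetical container for inputs #0 through #5."""
    assets: Sequence[str]         # 0: known, constant set of ticker symbols
    horizon_start: datetime       # 1: planning horizon start
    horizon_end: datetime         # 1: planning horizon end
    scenarios: Sequence[str]      # 2: labels of macro-outcome scenarios
    likelihoods: Sequence[float]  # 3: one probability per scenario
    scenario_pds: Sequence[Any]   # 4: one joint PD per scenario (form left open)
    preferences: Any = None       # 5: investor risk/reward preferences (left open)

    @property
    def joint_pd_dimension(self) -> int:
        # One return measure plus one risk measure per asset.
        return 2 * len(self.assets)

    def validate(self) -> None:
        """Raise ValueError if the inputs contradict the stated assumptions."""
        if self.horizon_end <= self.horizon_start:
            raise ValueError("horizon must end after it begins")
        n = len(self.scenarios)
        if len(self.likelihoods) != n or len(self.scenario_pds) != n:
            raise ValueError("need one likelihood and one joint PD per scenario")
        if any(p < 0.0 for p in self.likelihoods):
            raise ValueError("likelihoods must be non-negative")
        if not math.isclose(sum(self.likelihoods), 1.0, abs_tol=1e-9):
            raise ValueError("likelihoods must sum to 1")
```

Keeping the scenarios, likelihoods, and per-scenario PDs as parallel sequences mirrors the numbering in the text, so an explanation can refer to a scenario by its index.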
We note that inputs 0, 1 & 5 are likely set directly by an end user, whereas inputs 2, 3 & 4 involve estimates computed from some models and datasets, where user involvement may be light or heavy. Inputs 2 and 3 are relatively trivial from a calculation and explanation perspective, although they inject hazards of bias and correlation. (The main issue is how we think about independence of scenarios, discussed later.)
Element 4 involves the greatest potential complexity, as these individual scenario-outcome PDs may
each (in principle) involve an arbitrarily complex model of asset return and risk correlation.
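To make the per-asset outcome measures of input 4 concrete, here is a minimal sketch that computes 4.a.1, 4.b.1 and 4.b.2 from one simulated or historical price path. The function names are hypothetical, and two choices are assumptions rather than anything fixed by the text: the drawdown is reported as a non-positive fraction of the initial value, and daily variance is estimated from simple (not log) daily returns.

```python
import math
import statistics


def total_arithmetic_return(prices):
    """4.a.1: total arithmetic return over the horizon."""
    return prices[-1] / prices[0] - 1.0


def geometric_annual_rate(total_return, years):
    """Map a total arithmetic return to a geometric annual rate of return."""
    return (1.0 + total_return) ** (1.0 / years) - 1.0


def drawdown_from_initial(prices):
    """4.b.1: lowest point relative to the initial value (0.0 if never below it)."""
    return min(0.0, min(prices) / prices[0] - 1.0)


def annualized_volatility(prices, periods_per_year=252):
    """4.b.2: sqrt(dailyVariance) scaled by the annualization factor sqrt(252).

    Needs at least three prices, so that there are two daily returns.
    """
    daily = [b / a - 1.0 for a, b in zip(prices, prices[1:])]
    return statistics.stdev(daily) * math.sqrt(periods_per_year)
```

A joint PD for one scenario would then describe how pairs of such measures, one pair per asset, vary together across possible paths.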
Fortunately, the simplicity of elements 2 and 3 allows us to take tidy likelihood-weighted sums over the per-scenario PDs of element 4, which in turn gives the combined model strong explanatory power. For example, we may derive output
statements such as:
"In 74% of modeled scenarios, Ticker A has the highest expected annualized return rate"
"In 68% of modeled scenarios, Ticker B has an expected annualized volatility of under 14%"
"In 62% of modeled scenarios, Ticker A has an expected arithmetic return rate that is more than 1.73 times the arithmetic return rate of Ticker B, and Gaussian volatility of less than 1.2 times the Gaussian volatility of Ticker B"
scenGrp.coverage = 0.62
&& ( A.arithRtrn_expct > 1.73 * B.arithRtrn_expct ) && ( A.gaussVol < 1.2 * B.gaussVol )
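A statement of this kind can be evaluated mechanically: test a predicate against each scenario's summary statistics, then add up the likelihoods of the scenarios where it holds. The sketch below assumes that "coverage" means this likelihood-weighted sum; the helper name, the dictionary layout, and every number in the example are invented for illustration.

```python
def coverage(likelihoods, scenario_stats, predicate):
    """Total likelihood of the scenarios in which `predicate` holds."""
    return sum(p for p, stats in zip(likelihoods, scenario_stats) if predicate(stats))


def a_beats_b(stats):
    """The third example statement, applied to one scenario's statistics."""
    a, b = stats["A"], stats["B"]
    return (a["arithRtrn_expct"] > 1.73 * b["arithRtrn_expct"]
            and a["gaussVol"] < 1.2 * b["gaussVol"])


# Illustrative values only: three scenarios with made-up expectations.
example_likelihoods = [0.5, 0.3, 0.2]
example_stats = [
    {"A": {"arithRtrn_expct": 0.10, "gaussVol": 0.15},
     "B": {"arithRtrn_expct": 0.05, "gaussVol": 0.14}},
    {"A": {"arithRtrn_expct": 0.04, "gaussVol": 0.20},
     "B": {"arithRtrn_expct": 0.03, "gaussVol": 0.12}},
    {"A": {"arithRtrn_expct": 0.12, "gaussVol": 0.16},
     "B": {"arithRtrn_expct": 0.06, "gaussVol": 0.15}},
]
example_coverage = coverage(example_likelihoods, example_stats, a_beats_b)
```

With these invented numbers the predicate holds in the first and third scenarios, so the coverage is about 0.7. Because the predicate is an ordinary function, the same helper serves for any statement that can be phrased over per-scenario statistics.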
These explanations may be useful even when they are not on the direct path of allocation decision.
However, note that we have said nothing yet about how to choose them, or how to search for
interesting statements. We must also hasten to point out that when using such partitioned model
forms, it is important to consider carefully the meaning of the partitioning into scenarios.
What do we know about the 38% of outcomes not included in scenGrp.coverage above, in terms of
both independence, and the scenarios required to "complete the coverage"?
Here we also face common concerns about how we have biased our calculations through
partitioning with potentially hazy assumptions about independence, which we will return to later.
We note that elements 1 through 4 may be easily condensed into a single joint PD for the asset returns over the time horizon, in the process discarding the explanatory power of our scenarios. The choice of whether and when to do this consolidation depends on how the interactive application is serving the user, so this is when we look harder at our immediate goals and priority requirements.
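One way to read this consolidation is as a mixture: the single joint PD is the likelihood-weighted sum of the per-scenario PDs. The sketch below samples from such a mixture by first drawing a scenario index and then drawing from that scenario's distribution, and it also computes the consolidated mean. The function names and the representation of each scenario PD as a sampler callable are assumptions made for illustration.

```python
import random


def sample_consolidated(likelihoods, scenario_samplers, rng):
    """Draw one outcome: pick a scenario by likelihood, then sample its PD.

    Returns (scenario_index, outcome). Dropping the index gives a draw from
    the consolidated PD, and is exactly where the scenario-level explanation
    is lost.
    """
    idx = rng.choices(range(len(likelihoods)), weights=likelihoods, k=1)[0]
    return idx, scenario_samplers[idx](rng)


def consolidated_mean(likelihoods, scenario_means):
    """Mean of the consolidated PD: likelihood-weighted sum of scenario means."""
    n = len(scenario_means[0])
    return [sum(p * m[i] for p, m in zip(likelihoods, scenario_means))
            for i in range(n)]
```

A caller would pass a seeded generator such as `random.Random(0)` and one sampler function per scenario, each returning a tuple of return and risk measures.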
Regardless, our allocation decision boils down to choosing the best (according to #5, which we have not discussed yet) asset weights for the given model of asset returns described by elements 1-4, which are in the form of either a single consolidated PD or a likelihood-weighted sum of independent
scenario PDs.
Within this broad framework, how does our representation of asset-return joint PDs affect our ability to explain the outcome of the allocation decision?
Continue on to part 02 of this series. We enumerate some useful approaches to describing these joint
probability distributions.