Have you ever dabbled with rankings? League tables suffer from some common deficiencies: scores assigned on an ad hoc basis, subjective criteria, a general sense of arbitrariness, and sometimes an outright disregard for the sources of uncertainty. But what if a Bayesian approach could eliminate, minimize, or at least quantify all that? Let’s poke around in some data to illustrate one such approach…
I will hold off on divulging too much upfront. The dataset is mine, stitched together from publicly available data. I home in on the top 15 tech companies in 2019 by revenue, as listed on Wikipedia. Filtering those further by whether they are U.S.-traded trims the list down to just 7 companies. I then layer on more variables, including:
There’s some missingness, so we curate the dataset a little before use…
Let’s take a look at what we’re dealing with.
(Easy “gg” plots.)
Low Pearson correlation coefficients (PCC). Too many things happening all at once. Let’s simplify…
So, we have that the better the ESG adjusted industry score, the better the overall JUST rank.
Investigating Associations…
Takeaway: I have far too few data points to draw conclusions, which in this case sound counterintuitive: the worse a company’s social_pillar_score and privacy_data_sec, the better its just_rank_overall. One idea, provided the dataset were somewhat bigger, would be to run some regressions to test the statistical significance of these correlations and proceed with the ones that make sense. With so few data points there is little reason to be selective, so why not use all of the ESG indicators?
Okay, based on some EDA, let’s build a model such that: just_rank_overall ~ market_cap + global_emp + ESG_industry_adj_score + env_pillar_score + env_sub + clean_tech + social_pillar_score + human_capital + privacy_data_sec + ex_adj_close
NB If we had real data, deploying ML approaches would be plausible. Given the paucity of data here, a Bayesian approach is far more sound. (And arguably in general.)
This is a lot like “the race problem”, where we determine which driver will win a given race based on a set of predictors, such as one’s finish time in a practice lap. Recall that the logic there was, inter alia, that a lower Start variable - the starting position of one’s car in the race, itself dependent on practice-lap performance - increases the probability of winning that race in a number of ways. Predicting a company’s position in a ranking is not an altogether different exercise. Indeed, the Start variable could be the company’s position in the previous year’s (or some other) ranking. It could be said to increase that company’s probability of performing well again on account of the following considerations:
That said, there are two theoretical obstacles to overcome. (i) I am playing around with a mock dataset that has a lot of missingness, in the sense that the real data would span multiple years - and hence record multiple “winners” over the years - and the ranking roster would be complete; neither is true of my mock data. The mock set is thus a redux of what would normally be the top 15 blue-chip tech companies, further constrained by whether their stock is U.S.-traded. But even if the roster were fully populated - which contains more information than simply whether some company “XYZ” ranked first or not - (ii) the likelihood for an individual ranking would be difficult to arrive at, since the ranks are not conditionally independent. Simplifying the problem, we can try a conditional logit (clogit) of who ranks first. Drawing on the wisdom from the Bayes class, we can write the (ex-ante) probability that the i-th company ranks first as follows…
\[\begin{eqnarray*} Pr(i \mid \beta)=\frac{e^{\eta_i}}{\sum_{j=1}^{N} e^{\eta_j}} \end{eqnarray*}\]
…where N is the number of companies considered and eta_i is a function of beta (the linear predictor). The log-likelihood contribution of each ranking is then simply the logarithm of the above expression, with i the index of “the best” company, and we sum these contributions over all the rankings in a year.
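In code, the softmax probability and the corresponding log-likelihood contribution look like this (the eta values below are made-up placeholders, not estimates from the data):

```r
# Hypothetical linear predictors eta_j for N = 3 companies.
eta <- c(1.2, 0.4, -0.3)

# Softmax: ex-ante probability that each company ranks first.
pr_first <- exp(eta) / sum(exp(eta))

# Log-likelihood contribution if, say, company 1 is observed to rank first.
log_lik_1 <- eta[1] - log(sum(exp(eta)))

all.equal(log(pr_first[1]), log_lik_1) # TRUE
```

The second form avoids computing the full probability vector when only the winner’s contribution is needed.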
So, what we have is many individual rankings on things like the environment and, say, corporate governance, stakeholders, and the like, in a given year, and we consider their contributions toward the overall ranking.
We shall consider a suitably logistic-flavored distribution, namely the one due to a fellow Columbian, Mr. Gumbel - i.e. the Gumbel distribution - with the scale parameter fixed at 1 and the mode at mu.
Creating a Gumbel random-number generator…
library(rstanarm) # Needed below for stan_clogit().
options(mc.cores = parallel::detectCores())
set.seed(1984)
gumbel_rng <- function(n, mu) { # n draws with location mu and scale fixed at 1.
return(mu + log(-log(runif(n, min = 0, max = 1))))
}
The difference between the top-ranked company’s position within the “league table” and any other company’s position has a logistic distribution (prior to being observed), and therefore a logit model applies to the probability of being ranked number 1. We say that “[t]he difference between realizations from two Gumbel distributions (that may have different mu but the same scale) is distributed logistic with expectation equal to the difference between the expectations of the Gumbel distributions.”
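That quoted property is easy to check by simulation. A quick sketch, redefining gumbel_rng() from above so the snippet is self-contained (the mu values 0 and 1 are arbitrary):

```r
# Gumbel generator as defined in the post (location mu, scale 1).
gumbel_rng <- function(n, mu) {
  return(mu + log(-log(runif(n, min = 0, max = 1))))
}

set.seed(2020)
n <- 1e5
g1 <- gumbel_rng(n, mu = 0)
g2 <- gumbel_rng(n, mu = 1)

# The difference g1 - g2 is logistic, so P(g1 < g2) should match the
# logistic CDF evaluated at the difference in locations:
c(empirical = mean(g1 < g2), logistic = plogis(1))
```

The two numbers should agree to about two decimal places at this sample size, which is exactly why a logit model applies to the probability of being ranked #1.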
(The benefit of this approach is, among other things, that we end up with a credible interval - quantified uncertainty about the ranking - rather than saying “XYZ” is unconditionally the best at some parameter(s) “W”. The latter does not comport with reality, despite the idea’s apparently frequent deployment from academia through to corporate metrics.)
sub.e <- e[, c("company_name", "ex_adj_close", "market_cap", "global_emp", "just_rank_overall",
"ESG_industry_adj_score", "env_pillar_score", "env_sub", "clean_tech", "social_pillar_score",
"human_capital", "privacy_data_sec")]
# sub.e <- na.omit(sub.e)
# impute.mean() is a small helper, assumed here, that replaces NAs with the column mean:
impute.mean <- function(x) replace(x, is.na(x), mean(x, na.rm = TRUE))
sub.e <- sub.e %>% group_by(company_name) %>% mutate(clean_tech = impute.mean(clean_tech), # Replace NAs with col means.
    privacy_data_sec = impute.mean(privacy_data_sec))
Now, consider a figurative ranking composed of the aggregate score of the various chosen variables. That is, ESG_industry_adj_score, env_pillar_score, env_sub, clean_tech, social_pillar_score, human_capital, privacy_data_sec each accrue a score between 1 and 10. We add those individual scores and rank the highest-scoring company #1, and so on…
scoresSum <- rowSums(sub.e[, c("ESG_industry_adj_score", "env_pillar_score", "env_sub", "clean_tech",
"social_pillar_score", "human_capital", "privacy_data_sec")])
sub.e["scoresSum"] <- scoresSum
numRank <- as.numeric(as.factor(sub.e$scoresSum)) # Rank the scores with 1 being the lowest.
predRank <- factor(numRank) # Rank based on the variables' aggregate score that I deem to be "predictive"/relevant.
levels(predRank) <- rev(levels(predRank)) # Reverse order.
sub.e$predRank <- predRank
sub.e$predRank <- as.numeric(as.character(predRank)) # Change back to numeric.
sub.e$predRank_ <- sub.e$predRank - mean(sub.e$predRank) # Centering.
Next, we simulate from the priors the predictive distribution of each company’s ranking on the “predictive” variables’ list…
ranks_ <- replicate(1000, {
alpha_ <- rnorm(1, mean = 4, sd = 2.16) # Some intercept.
beta_ <- rnorm(1, mean = 0.10, sd = 1.05) # Some coef. (NB Do these priors make sense?)
  gamma_ <- rnorm(nlevels(sub.e$company_name), mean = 0, sd = 0.10) # Company-specific effects.
  mu_ <- alpha_ + beta_ * sub.e$predRank_ + gamma_ # The linear predictor.
  rank_ <- gumbel_rng(length(mu_), mu_) # Draws from the (prior) predictive distribution.
})
# summary(sub.e$predRank)
Given (draws from) a marginal distribution of our predictive factors’ ranking, the predictive distribution of the most highly ranked company in a given year is just how often (within these draws) a company ranks the highest (i.e. has the lowest number)…
best_ <- apply(ranks_, MARGIN = 2, FUN = which.min)
mean(best_ == 1) # Thus, Microsoft - which ranked #1 in terms of the aggregate score - has about a 23% chance to come out on top again under this model.
[1] 0.226
We still have to condition on the actual (observed) data…
post <- stan_clogit(just_best ~ company_name + predRank - 1, data = sub.e, strata = year,
prior = normal(location = c(rep(0, n_distinct(sub.e$company_name)), -1),
scale = c(rep(0.1, n_distinct(sub.e$company_name)), .1),
                    autoscale = FALSE))
Warning in model.matrix.default(mt, mf, contrasts): non-list contrasts
argument ignored
We get a warning, and no wonder: we have a single stratum, with just_best observed for only one year (i.e. TRUE for the best company in 2020), whereas we would like/need multiple iterations of being “the best”. The paucity of our data is an issue here, but given it’s a mock set, we will proceed… (NB Real data would be fine here.)
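For concreteness, here is a minimal sketch of the shape the real data would need to take (hypothetical companies and years): one stratum per year, with exactly one “winner” in each, so that every stratum contributes a term like Pr(i | beta) to the likelihood.

```r
# Hypothetical multi-year roster: one TRUE (winner) per year-stratum.
multi <- data.frame(
  year         = rep(2018:2020, each = 3),
  company_name = rep(c("A", "B", "C"), times = 3),
  just_best    = c(TRUE,  FALSE, FALSE,   # 2018: A wins
                   FALSE, TRUE,  FALSE,   # 2019: B wins
                   TRUE,  FALSE, FALSE)   # 2020: A wins again
)

with(multi, tapply(just_best, year, sum)) # exactly one winner per year
```

With such data, strata = year in stan_clogit() would yield one likelihood contribution per year rather than the degenerate single stratum above.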
Inference for Stan model: bernoulli.
4 chains, each with iter=2000; warmup=1000; thin=1;
post-warmup draws per chain=1000, total post-warmup draws=4000.
mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat
predRank -1.007 0.001 0.101 -1.203 -1.075 -1.006 -0.939 -0.81 6733 1
Samples were drawn using NUTS(diag_e) at Wed Aug 12 07:15:34 2020.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at
convergence, Rhat=1).
The coefficients for individual companies would be difficult to estimate given our data, but the coefficient on the ranking that stems from the companies’ performance on the chosen variables’ aggregate score is clearly negative (about -1): a better (lower-numbered) predRank raises the probability of being ranked “best”.
(I.e. most “just”…)
companies <- as.matrix(post) # Has 4000 rows.
companies <- companies[ , -ncol(companies)] # Drops the draws for the coefficient on predRank.
most_just <- apply(companies, MARGIN = 1, FUN = function(x) levels(sub.e$company_name)[which.max(x)])
format(prop.table(sort(table(most_just), decreasing = TRUE)))
most_just
microsoft facebook apple ibm hp_inc google intel
"0.15575" "0.14575" "0.14425" "0.14025" "0.13875" "0.13850" "0.13675"
With the caveat that the relative differences do appear negligible, Microsoft has the most realizations with the highest company coefficient. That said, the probability that Microsoft is the most “just” company - based on a mock data marshalled herein for the purposes of this simulation, bear in mind - is not huge and only marginally higher than any other individual company.
One approach, having run the model once and identified the best performer, would be to remove #1 from the roster and proceed to figure out who the “next best” company is, and so on. This would yield a preliminary ranking. Suppose we had three companies A, B, and C. We would ask: given that company A is the best, which company is second best? The process is stochastic, so we might get A best, then B, then C; or A, C, B; or B, A, C… We would consider the permutations, along with some credible interval within which different results are possible. This, to me, seems preferable to having the respective ranks set in stone, as we observe with, for instance, college league tables. Real/more data and further experimentation/research are necessary to develop these concepts.
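A minimal sketch of this sequential idea (an exploded-logit / Plackett-Luce-style ranking), using entirely hypothetical posterior draws for three company coefficients - the means and sds below are made up for illustration:

```r
set.seed(1984)
# Hypothetical posterior draws: 4000 draws of 3 company coefficients (A, B, C).
draws <- matrix(rnorm(4000 * 3, mean = c(0.3, 0.1, -0.2), sd = 0.5),
                nrow = 4000, byrow = TRUE,
                dimnames = list(NULL, c("A", "B", "C")))

# For each draw, rank companies from best (highest coefficient) to worst;
# sorting a draw is equivalent to repeatedly removing the current #1 and re-ranking.
perms <- apply(draws, 1, function(x) paste(names(sort(x, decreasing = TRUE)),
                                           collapse = " > "))

round(prop.table(table(perms)), 3) # posterior probability of each full ranking
```

Rather than a single frozen league table, this gives a probability over entire orderings, from which credible statements like “A > B > C with probability p” follow directly.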