Long description
An artificial neural network dreams up new faces whilst it’s training on a well-known dataset of thousands of celebrities. Every face seen here is fictional, imagined by the neural network based on what it’s seeing and learning. But what is it learning?
The politics of this dataset (who’s in it, who isn’t, how it’s collected, how it’s used, and the consequences of all this) is itself a crucial topic, but not the subject of this work.
When I first started seeing these kinds of results, I was fascinated by how everything was so blurry and smoothed out. Indeed the dataset may not be representative of the diversity of the wider population. However, the learning algorithm itself adds a further layer of homogenisation. In the images produced by the neural network, whatever diversity was present in the dataset is mostly lost. All variety in shape, detail, texture, individuality and blemishes is erased, smoothed out, with the most common attributes dominating the results.
The network is learning an idealised sense of hyper-real beauty, a race of ‘perfect’, homogeneous specimens.
This is not a behaviour that is explicitly programmed in; it is an inherent property of the learning algorithm used to train the neural network. And while there are recent algorithms which specialise in generating more photo-realistic images (especially faces), the algorithm used here is one of the most widespread in Machine Learning and Statistical Inference: Maximum Likelihood Estimation (MLE).
That means, quite intuitively speaking: given a set of observations (i.e. data points), out of all possible hypotheses, find the hypothesis with the maximum likelihood of giving rise to those observations. Or in fewer words: given some data, find the hypothesis under which that data is most probable.
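For readers who like symbols, this can be written in one line (the notation below is my own addition, not part of the original text): if D is the observed data and θ a candidate hypothesis, then

```latex
% Maximum Likelihood Estimation: choose the hypothesis (parameters) theta
% under which the observed data D has the highest probability.
\hat{\theta}_{\mathrm{MLE}} \;=\; \operatorname*{arg\,max}_{\theta} \; p(D \mid \theta)
```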
It makes a lot of sense, right?
But it has some shortcomings.
Imagine we find a coin on the street. And we’d like to know whether it’s a fair coin, or weighted (i.e. ‘biased’) towards one side, possibly favouring heads or tails. We have no way of determining this by physically examining the coin itself. So instead, we decide to conduct an experiment. We flip the coin ten times. And let’s say we get 7 heads and 3 tails.
A maximum likelihood approach would lead one to conclude that the most likely hypothesis to give rise to these observations is that the coin is biased in favour of heads. In fact, such a ‘frequentist’ might conclude that the coin is biased 7:3 in favour of heads (with a 26.7% probability of throwing 7 heads). An extreme frequentist might even commit to that hypothesis, unable to consider the possibility that there are many other alternative hypotheses, one of which, though less likely, might actually be the correct one. E.g. it’s very possible that the coin is in fact a fair coin, and simply by chance we threw 7 heads (for a fair coin, the probability of this is 11.7%, not negligible at all). Or maybe the coin is biased only 6:4 in favour of heads (the probability of throwing 7 heads is then 21.5%, still quite likely). In fact the coin might have any kind of bias, even in favour of tails; in that case it is very unlikely that we would have thrown 7 heads, but it is not impossible.
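For anyone who wants to check the arithmetic, here is a small sketch (my own, not part of the work itself) that re-derives the percentages above from the standard binomial formula, and shows that the maximum likelihood estimate lands exactly on the 7:3 bias:

```python
from math import comb

def likelihood_of_7_heads(p, n=10, k=7):
    """Probability of exactly k heads in n throws of a coin with bias p towards heads."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

for p in (0.7, 0.6, 0.5):
    print(f"bias {p:.1f}: P(7 heads in 10 throws) = {likelihood_of_7_heads(p):.1%}")
# bias 0.7: 26.7%   bias 0.6: 21.5%   bias 0.5: 11.7%

# The maximum likelihood approach simply commits to whichever bias maximises this:
candidates = [i / 100 for i in range(101)]
mle = max(candidates, key=likelihood_of_7_heads)
print(f"maximum likelihood estimate of the bias: {mle:.2f}")  # 0.70, i.e. 7:3
```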
Some of you might be critical of this example, arguing that 10 throws is far too small a sample size from which to infer the fairness of the coin. “Throw it 100 times, or 1,000 times,” you might say. Heck, let’s throw it a million times.
If we were to get 700,000 heads out of 1 million throws, then can we be sure that the coin is biased 7:3 in favour of heads?
For practical reasons, we might be inclined to assume so. We have indeed quite radically increased our confidence in that particular hypothesis. But again, that does not eradicate the possibility that another hypothesis might actually be the correct one.
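To give a feel for what “radically increased confidence” looks like, here is another small sketch of mine (assuming the same binomial model as above, and using SciPy’s binomial distribution in log space to avoid numerical underflow): with 10 throws the fair-coin hypothesis explains the data only a couple of times worse than the 7:3 hypothesis; with a million throws it becomes astronomically worse, yet the ratio is still finite, evidence rather than proof.

```python
from scipy.stats import binom

# Log-probability of the observed number of heads under a given coin bias.
def log_lik(bias, heads, throws):
    return binom.logpmf(heads, throws, bias)

for heads, throws in [(7, 10), (700_000, 1_000_000)]:
    diff = log_lik(0.7, heads, throws) - log_lik(0.5, heads, throws)
    print(f"{throws} throws: the 7:3 hypothesis explains the data "
          f"e^{diff:.1f} times better than the fair-coin hypothesis")
```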
Evidence is not proof. It is merely evidence. I.e. a set of observations that increase or decrease the likelihood of — and our confidence in — some hypotheses relative to others.
MLE is unable to deal with uncertainty. It’s unable to deal with the possibility that a less likely, less common hypothesis might actually be valid.
MLE is binary. It has no room for alternative hypotheses. Instead, the hypothesis with the highest apparent likelihood is assumed to be unequivocally true.
MLE commits to this dominant truth, and everything else is irrelevant, incorrect and ignored. Any variations, outliers, or blemishes are erased; they’re blurred and bent to conform to this one dominant truth, this binary world view.
So are there alternatives?
At the opposite end of the spectrum, one could maintain a distribution of beliefs over all possible hypotheses, with varying levels of confidence (or uncertainty) assigned to each hypothesis. These levels of confidence might range from “I’m really very certain of this” (e.g. the sun will rise again tomorrow in the east) to “I’m really very certain this is wrong” (e.g. I will let go of this book and it will stay floating in the air), and everywhere in between. Applying this form of thinking to the coin example, we would not assume absolute truth in any single hypothesis regarding how the coin is biased. Instead, we would calculate likelihoods for every possible scenario, based on our observations. If need be, we could also incorporate a prior belief into this distribution. E.g. since most coins are not biased, we’d be inclined to assume that this coin too is not biased; but nevertheless, as we make observations and gather new evidence, we would update this distribution, and adjust our confidences and uncertainties for every possible value of coin bias.
And most critically, when making decisions or predictions, we don’t act assuming a single hypothesis to be true, but instead we consider every possible outcome for every possible hypothesis of coin bias, weighted by the likelihood of each hypothesis. (Those interested in this line of thinking can look up Bayesian logic or statistics).
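A rough sketch of what this recipe might look like for the coin (my own illustration of the general Bayesian approach, not code from the work; the Beta(20, 20) prior is an arbitrary choice standing in for “most coins are roughly fair”): keep a grid of possible biases, multiply the prior by the likelihood of the observed throws, renormalise, and then predict by averaging over every hypothesis weighted by how much we now believe it.

```python
import numpy as np
from scipy.stats import beta, binom

# Hypothesis space: every possible coin bias between 0 and 1, on a fine grid.
biases = np.linspace(0.0, 1.0, 1001)

# Prior belief: most coins are roughly fair, so most initial confidence sits near 0.5.
prior = beta.pdf(biases, 20, 20)
prior /= prior.sum()

# Evidence: 7 heads observed in 10 throws.
likelihood = binom.pmf(7, 10, biases)

# Bayes' rule: posterior belief is proportional to prior times likelihood.
posterior = prior * likelihood
posterior /= posterior.sum()

# Rather than committing to a single hypothesis, predict by averaging over all of
# them, each weighted by how strongly we now believe it.
p_next_heads = np.sum(posterior * biases)
print(f"probability that the next throw is heads: {p_next_heads:.3f}")
print(f"single most believed bias: {biases[np.argmax(posterior)]:.3f}")
```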
But, as I’ve mentioned in my other works and texts, my main interest is not necessarily these algorithms themselves. In a real-world situation, nobody would be foolish enough to settle on a maximum likelihood solution with only 10 samples (one would hope!).
My main interest is in using machines that learn as a reflection on ourselves, and how we navigate our world, how we learn and ‘understand’, and ultimately how we make decisions and take actions.
I’m concerned we’re losing the ability to safely consider and parse multiple views within our social groups, let alone within our own minds. I’m concerned we’re losing the ability to recognise and navigate the complexities of situations which may require acknowledging, or even understanding, the existence of lines of reasoning that may lead to different views than our own; maybe even radically different, opposing views. I’m concerned that ignoring these opposing lines of reasoning, and pretending that they don’t exist, might be more damaging in the long run than acknowledging their existence and trying to identify at what point(s) our lines of reasoning diverge, and how, and why, and trying to tackle those more specific points of divergence, whether they are due to what we consider to be incorrect premises, flawed logic, or differing priorities or desires.
I’m concerned we’re becoming more and more polarised and divided on so many different points on so many different topics. Views, opinions and discourse in general seem to be becoming more binary, with no room to consider multiple or opposing views. I’m concerned we want to ignore the messy complexities of our ugly, entangled world; to erase the blemishes; and commit unequivocally to what seems (to us) to be the most likely truth: the one absolute truth, in a simple, black and white world of unquestionable rights and wrongs.
Or maybe that’s just my over-simplified view of it.