# AI in Society: 2. Algorithmic Fairness

Data is to machine learning as fuel is to cars. None of the AI superpowers can be gained without abundant data. But quantity is not all that matters. Quality matters too, and it matters a lot when the task can affect human rights. If a model is trained on biased data, it can be accurate yet unfair.

# A popular criminal risk assessment AI is racially biased.

AI is already used in the criminal justice system. COMPAS 1 is among the most widely used crime risk assessment algorithms in the US. Developed by the private company Equivant, COMPAS has assigned “risk scores” to more than 1 million defendants. The algorithm uses more than 100 features of each defendant to predict how likely they are to commit crimes again and, accordingly, the type of supervision they need 2.

The very unfortunate truth is that COMPAS is racially biased. The software has been shown to be about twice as likely to falsely predict that black defendants will commit crimes again as it is for white defendants 3. This is a fact. Now the question is why this happens and how it can be prevented.

Equivant makes only the COMPAS predictions public, not the model itself, so it is hard to see why. But we can speculate about two reasons. One is that the data fed into the system was biased. The data is, of course, generated by the past decisions of human judges, who are by no means unbiased. People discriminate all the time, both consciously and subconsciously. If COMPAS is built to “mimic” human judges, it will learn to discriminate just as we humans do. The second reason, which stems from the first, is that the model did not account for this bias well. Learning a fair classifier from biased data is very hard (as we shall see later in the post). It is plausible that Equivant is aware of this unfairness but has not been able to cope with it.

# Algorithmic Fairness definitions

COMPAS motivates us to study rigorously how algorithmic fairness can be enforced. Just as we define accuracy and losses mathematically, we also need to define fairness quantitatively. Otherwise, a model would not know what to optimize for. Many fairness definitions have been proposed, and there is not yet a standard one that everyone uses. It is even debatable whether there should be one. In this section, I will survey some of the proposed fairness criteria.

# Fairness Through Unawareness

If we don’t want to discriminate based on race, can’t we just remove it from the set of features? This is the idea behind Fairness Through Unawareness (FTU). Essentially, FTU claims that a model is fair if it is not trained on sensitive attributes. FTU is a good starting point, but it is easy to see that this definition is too naïve. For example, what if one feature records the zip code where each defendant lives? Typically, zip code is highly associated with the race of the residents. Hence, even if we remove race from the set of features, zip code can “signal” the race of the defendants.
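To make the proxy problem concrete, here is a minimal sketch on synthetic data. The two groups, the two zip codes, and the 90% residential split are all hypothetical numbers chosen for illustration, not real demographics. The point is only that a rule which never sees the group label can still recover it from zip code.

```python
import random


def simulate_population(n=10_000, seed=0):
    """Synthetic residents as (group, zip_code) pairs. All numbers are made up."""
    rng = random.Random(seed)
    people = []
    for _ in range(n):
        group = rng.choice(["A", "B"])
        # Assumption: 90% of each group lives in "its own" zip code.
        home, other = ("10001", "10002") if group == "A" else ("10002", "10001")
        zip_code = home if rng.random() < 0.9 else other
        people.append((group, zip_code))
    return people


def guess_group_from_zip(zip_code):
    """A 'group-blind' rule: it only ever looks at the zip code."""
    return "A" if zip_code == "10001" else "B"


people = simulate_population()
recovery_rate = sum(guess_group_from_zip(z) == g for g, z in people) / len(people)
print(f"group recovered from zip code alone: {recovery_rate:.1%}")
```

Under these assumed numbers, the sensitive attribute is recoverable for roughly nine out of ten people, so dropping it from the feature set changes very little.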

# Group Fairness

Another widely known notion of fairness, though one recently argued to be “not enough” on its own, is group fairness. One form of group fairness is statistical parity, which can be formulated as $P(x \in S \vert \text{outcome}=o) = P(x \in S)$. For example, $S$ can be the set of all black people and the outcome can be being hired for a job. The equation then states that the proportion of black people among those who got hired equals the proportion of black people in the general population. More generally, statistical parity holds when the demographics of the selected group (e.g. people who got hired) match the demographics of the population. Similar criteria can be defined in terms of equal false positive rates, false negative rates, false discovery rates, etc.
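The statistical parity condition can be checked directly from outcome data. The sketch below is one possible way to do it; the function name `statistical_parity_gap` and the toy hiring records are assumptions made for illustration.

```python
def statistical_parity_gap(records, group):
    """Return P(x in S | selected) - P(x in S) for the given group S.

    records: list of (group_label, selected) pairs, where selected is a bool.
    A gap of 0 means statistical parity holds for this group.
    """
    selected_labels = [label for label, selected in records if selected]
    if not selected_labels:
        raise ValueError("nobody was selected, so the conditional is undefined")
    share_among_selected = sum(label == group for label in selected_labels) / len(selected_labels)
    share_in_population = sum(label == group for label, _ in records) / len(records)
    return share_among_selected - share_in_population


# Toy data: 40 people in S and 60 others.
balanced = ([("S", True)] * 10 + [("S", False)] * 30
            + [("other", True)] * 15 + [("other", False)] * 45)
skewed = ([("S", True)] * 20 + [("S", False)] * 20
          + [("other", True)] * 5 + [("other", False)] * 55)

balanced_gap = statistical_parity_gap(balanced, "S")
skewed_gap = statistical_parity_gap(skewed, "S")
print(f"balanced hiring gap: {balanced_gap:+.2f}")
print(f"skewed hiring gap:   {skewed_gap:+.2f}")
```

In the balanced case, S makes up 40% of both the hires and the population, so the gap vanishes. In the skewed case, S is over-represented among the hires.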

There are two subtle drawbacks to this approach. One is that even if statistical parity is satisfied, social welfare might not be maximized. For example, suppose a university is trying to recruit the most talented students from groups $S$ and $S^C$. If people in $S$ tend to see tech jobs as more prestigious, while those in $S^C$ tend to see finance as more prestigious, statistical parity can be satisfied even though the university picked the wrong talent from, e.g., group $S^C$ by recruiting mostly tech people from it. Another drawback is that even if statistical parity is satisfied for $S$, it need not be satisfied for a subset of $S$. 4

A recent and perhaps striking result is that no classifier can satisfy several reasonable fairness criteria at the same time. The criteria in question are equal false positive rates (FPR), false negative rates (FNR) and positive predictive values (PPV) across groups. The reason is that $FPR=\frac{p}{1-p}\frac{1-PPV}{PPV}(1-FNR)$ always holds, where $p$ is the prevalence (the base rate of the positive class) within a group. When $p$ differs between groups, an imperfect classifier cannot equalize all three quantities, so there are always tradeoffs among them. In the case of COMPAS, studies have found that PPV is well balanced across groups, but FPR and FNR are not. It is not possible for a model to be fair in every respect. 5
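The identity is easy to verify from a confusion matrix, taking $p$ to be the prevalence of the positive class within a group. The sketch below first checks the identity on made-up counts. It then shows that two groups with the same PPV and FNR but different prevalence are forced to have different FPR. All counts and rates here are illustrative assumptions.

```python
def confusion_rates(tp, fp, fn, tn):
    """Prevalence, FPR, FNR and PPV from confusion-matrix counts."""
    total = tp + fp + fn + tn
    prevalence = (tp + fn) / total
    fpr = fp / (fp + tn)
    fnr = fn / (fn + tp)
    ppv = tp / (tp + fp)
    return prevalence, fpr, fnr, ppv


def fpr_implied(prevalence, fnr, ppv):
    """FPR = p/(1-p) * (1-PPV)/PPV * (1-FNR)."""
    return prevalence / (1 - prevalence) * (1 - ppv) / ppv * (1 - fnr)


# 1) The identity holds for an arbitrary (made-up) confusion matrix.
p, fpr, fnr, ppv = confusion_rates(tp=30, fp=20, fn=10, tn=40)
identity_error = abs(fpr - fpr_implied(p, fnr, ppv))
print(f"identity error: {identity_error:.2e}")

# 2) Same PPV and FNR, different prevalence: the FPRs cannot match.
fpr_group_1 = fpr_implied(prevalence=0.5, fnr=0.3, ppv=0.6)
fpr_group_2 = fpr_implied(prevalence=0.3, fnr=0.3, ppv=0.6)
print(f"FPR group 1: {fpr_group_1:.3f}, FPR group 2: {fpr_group_2:.3f}")
```

Once PPV and FNR are pinned to the same values in both groups, the FPR is fully determined by the prevalence, which is why differing base rates force a tradeoff.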

# Individual Fairness

A more fine-grained definition of fairness is individual fairness, which essentially states that “people who are similar should be treated similarly”. Once similarity is defined by a distance metric, individual fairness can be expressed as a Lipschitz constraint, and that constraint can be enforced within a linear program 4. Under some conditions, individual fairness can be shown to imply group fairness, and thus it is a more general approach.
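A simplified way to read the Lipschitz condition is $|f(x) - f(y)| \le L \cdot d(x, y)$ for every pair of individuals, where $f$ is the model's score and $d$ is the similarity metric. The sketch below checks that condition by brute force over all pairs. It assumes scalar scores and a Euclidean metric, which is a simplification for illustration. The helper name `lipschitz_violations` and the two toy scoring rules are hypothetical.

```python
import math
from itertools import combinations


def lipschitz_violations(individuals, score, distance, lipschitz_constant=1.0):
    """Return index pairs (i, j) with |score(x_i) - score(x_j)| > L * distance(x_i, x_j)."""
    violations = []
    for (i, x), (j, y) in combinations(enumerate(individuals), 2):
        # A small tolerance avoids flagging pairs because of floating-point noise.
        if abs(score(x) - score(y)) > lipschitz_constant * distance(x, y) + 1e-12:
            violations.append((i, j))
    return violations


def hard_threshold(x):
    """Scores jump from 0 to 1 at 0.5, so near-identical people can be split."""
    return 1.0 if x[0] >= 0.5 else 0.0


def smooth_score(x):
    """A gently sloped score that changes at most half as fast as the distance."""
    return 0.5 * x[0]


# Each individual is a one-dimensional feature vector (illustrative values).
individuals = [(0.49,), (0.51,), (0.90,)]

threshold_violations = lipschitz_violations(individuals, hard_threshold, math.dist)
smooth_violations = lipschitz_violations(individuals, smooth_score, math.dist)
print("hard threshold violations:", threshold_violations)
print("smooth score violations:  ", smooth_violations)
```

The hard threshold treats the individuals at 0.49 and 0.51 very differently even though they are nearly identical, which is the kind of behavior the constraint rules out.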

# Counterfactual Fairness

The final line of work on fairness is quite distinct from the others. All of the definitions above rely on association rather than causation. Counterfactual fairness borrows tools from causal inference to reason about fairness. The classic definition of counterfactual fairness 6 is: $P(\hat{Y}_{A \leftarrow a}(U)=y \vert X=x,A=a)=P(\hat{Y}_{A \leftarrow a'}(U)=y \vert X=x,A=a)$

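Here $A$ is the sensitive attribute, $X$ denotes the other observed features and $U$ denotes the unobserved background variables. The notation $\hat{Y}_{A \leftarrow a}$ denotes the prediction under an “intervention” that sets the value of $A$ to $a$. Intuitively, a model is fair if its prediction stays the same when the sensitive attribute is changed while everything that is not causally downstream of it is held constant.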

Let’s look at an example of how counterfactual fairness captures a notion that the definitions above miss. Consider a car insurance company that prices insurance based on a predicted accident rate for each person. Suppose the following conditions have been observed:

• People who drive more aggressively tend to have red cars more often.
• Black people tend to prefer red cars more often than people of other races.
• But race does not affect the aggressiveness of drivers.

What happens in this scenario? We can use the red car feature to predict accident rate (and this will certainly be effective), but doing so can discriminate against black drivers! The counterfactual fairness definition takes this concern into account. Counterfactual fairness has seen recent improvements, so take a look at them if you are interested 7.
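The red car scenario can be simulated with a toy structural causal model. The sketch below is only an illustration under strong assumptions. The structural equation for red-car ownership, its coefficients, and both predictors are invented. It also pretends that aggressiveness can be observed directly, whereas in practice it is latent and would have to be inferred. For each simulated driver, the latent variables are held fixed, the group is flipped, and we count how often each predictor's output changes.

```python
import random


def observe(group, aggressiveness, color_noise):
    """Assumed structural equations: the group and aggressiveness both push
    toward owning a red car, while aggressiveness itself ignores the group."""
    has_red_car = 0.5 * aggressiveness + 0.3 * group + color_noise > 0.6
    return {"red_car": has_red_car, "aggressiveness": aggressiveness}


def predict_from_car(features):
    """Predicted accident rate using only the red-car proxy (made-up rates)."""
    return 0.30 if features["red_car"] else 0.10


def predict_from_aggressiveness(features):
    """Predicted accident rate using the actual cause (made-up coefficients)."""
    return 0.10 + 0.20 * features["aggressiveness"]


def counterfactual_change_rate(predictor, n=10_000, seed=0):
    """Share of drivers whose prediction changes when the group is flipped
    while the latent variables are held fixed."""
    rng = random.Random(seed)
    changed = 0
    for _ in range(n):
        group = rng.randint(0, 1)
        aggressiveness = rng.random()       # independent of group by construction
        color_noise = 0.5 * rng.random()    # idiosyncratic taste for red
        factual = predictor(observe(group, aggressiveness, color_noise))
        counterfactual = predictor(observe(1 - group, aggressiveness, color_noise))
        changed += factual != counterfactual
    return changed / n


car_change_rate = counterfactual_change_rate(predict_from_car)
aggressiveness_change_rate = counterfactual_change_rate(predict_from_aggressiveness)
print(f"red-car predictor changes for {car_change_rate:.1%} of drivers")
print(f"aggressiveness predictor changes for {aggressiveness_change_rate:.1%} of drivers")
```

In this toy world the red-car predictor changes its answer for many drivers when only their group is flipped, so it is not counterfactually fair, while the predictor built on the actual cause never changes.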

# Conclusion

There are many definitions of fairness out there, and we don’t yet agree on which one to use. Regardless, I hope these definitions and results have convinced you that algorithmic fairness is not only philosophically subtle, but also technically subtle. Unlike other fields in AI, fairness and interpretability are fields where it is very hard to make cohesive arguments or to impress everyone by beating some benchmark. This can make some researchers shy away, but I think they are at least worth knowing about.
