Methodology

We introduced two fairness measures to deal with three cases: 1) when facing one bi-valued sensitive attribute (sen-att); 2) when facing one multi-valued sen-att; and 3) when facing several sensitive attributes (sen-att-s). The proposed fairness measures are built upon the Euclidean Hausdorff distance of sets, to describe the distance between manifolds.

Note

Notations: Given a dataset $S=\{(\mathbf{x}_i,\mathbf{a}_i,y_i)\mid i=1,2,...,n\}$ composed of i.i.d. instances including sen-att-s, we denote one instance by $(\mathbf{x,a})=[x_1,...,x_{n_d},a_1,...,a_{n_a}]^\mathsf{T}$ for clarity.

Therein, $n_a\geqslant 1$ is the number of sensitive/protected attributes and $n_d$ is that of unprotected attributes in the instance. Note that $a_i=1$ means the corresponding instance is a member from the privileged group, and $i\in[n]$ is used to represent $i\in\{1,2,...,n\}$ for brevity. The instances in $S$ are i.i.d. (independent and identically distributed); the task is binary or multi-class classification.

Distance between sets

For one bi-valued sen-att $\mathbf{a}=[a_i]^\mathsf{T}$ and $a_i\in\mathcal{A}_i=\{0,1\}$,

The dataset $S$ can be divided into two subsets $S_1=\{(\mathbf{x},a_i,y)\in S\mid a_i=1\}$ and $\bar{S}_1=S\setminus S_1=\{(\mathbf{x},a_i,y)\in S\mid a_i\neq 1\}$.

Given a specific distance metric $\mathbf{d}(\cdot,\cdot)$ (e.g. the standard Euclidean metric) on the feature space, the distance between these two subsets (that is, $S_1$ and $\bar{S}_1$) is

\[\begin{split}\begin{align} \mathbf{D}_\cdot(S_1,\bar{S}_1) \triangleq \max\Big\{ & \max_{(\mathbf{x,a},y)\in S_1} \overbrace{ \min_{(\mathbf{x}',\mathbf{a}',y')\in\bar{S}_1} \mathbf{d}\big( (\mathbf{x},\ddot{y}), (\mathbf{x}',\ddot{y}') \big) }^\text{to find the nearest point in $\bar{S}_1$} ,\\ & \max_{(\mathbf{x}',\mathbf{a}',y')\in\bar{S}_1} \min_{(\mathbf{x,a},y)\in S_1} \mathbf{d}\big( (\mathbf{x},\ddot{y}), (\mathbf{x}',\ddot{y}') \big) \Big\} \end{align}\end{split}\]

Notice that it becomes $\mathbf{D}(S_1,\bar{S}_1)$ using $y$, and $\mathbf{D}_f(S_1,\bar{S}_1)$ when using $\hat{y}$ for classifiers.

For one multi-valued sen-att $a_i\in\mathcal{A}_i=\{1,2,...,n_{a_i}\}$, where $n_{a_i}\geqslant 3$

The dataset $S$ can be divided into a few disjoint sets according to the value of this attribute $a_i$, that is, $S_j=\{(\mathbf{x},a_i,y)\in S\mid a_i=j\} ,\forall j\in\mathcal{A}_i \,$.

The distance above can be extended to: i) maximal distance measure for one sen-att, and ii) average distance measure for one sen-att

\[\begin{split}\begin{align} \mathbf{D}_{\cdot,\mathbf{a}}(S,a_i) &= \max_{1\leqslant j\leqslant n_{a_i}}\Big\{ \max_{(\mathbf{x,a},y)\in S_j} \overbrace{ \min_{(\mathbf{x}',\mathbf{a}',y')\in \bar{S}_j} \mathbf{d}\big( (\mathbf{x},\ddot{y}), (\mathbf{x}',\ddot{y}') \big) }^\text{to find the nearest point in $\bar{S}_j$} \Big\} \\ \mathbf{D}_{\cdot,\mathbf{a}}^\text{avg}(S,a_i) &= \frac{1}{n} \sum_{j=1}^{n_{a_i}} \sum_{(\mathbf{x,a},y)\in S_j} \underbrace{ \min_{(\mathbf{x}',\mathbf{a}',y')\in\bar{S}_j} \mathbf{d}\big( (\mathbf{x},\ddot{y}), (\mathbf{x}',\ddot{y}') \big) }_\text{to find the nearest point in $\bar{S}_j$} \end{align}\end{split}\]

Notice that $\bar{S}_j=S\setminus S_j$, and $\mathbf{D}_{\cdot,\mathbf{a}}(S,a_i)= \mathbf{D}_\cdot(S_1,\bar{S}_1)$ when $\mathcal{A}_i=\{0,1\}$.

For more than one sen-att $\mathbf{a}=[a_1,...,a_{n_a}]^\mathsf{T}$, where $n_a\geqslant 2$ and $n_{a_i}\geqslant 2$

The generalised distance measures include: i) maximal distance measure for sen-att-s, and ii) average distance measure for sen-att-s

\[\begin{split}\begin{align} \mathbf{D}_{\cdot,\mathbf{a}}(S) &= \max_{1\leqslant i\leqslant n_a} \mathbf{D}_{\cdot,\mathbf{a}}(S,a_i) \\ \mathbf{D}_{\cdot,\mathbf{a}}^\text{avg}(S) &= \frac{1}{n_a} \sum_{i=1}^{n_a} \mathbf{D}_{\cdot,\mathbf{a}}^\text{avg}(S,a_i) \end{align}\end{split}\]

Model fairness assessment

For one bi-valued sen-att [1], we remark $\mathbf{D}(S_1,\bar{S}_1)$ reflects the biases from the data and $\mathbf{D}_f(S_1,\bar{S}_1)$ reflects the biases from the algorithm. Then the following value could be used to reflect the fairness degree of this classifier, that is,

\[\mathbf{df}_\text{prev}(f)= \frac{ \mathbf{D}_f(S_1,\bar{S}_1) }{ \mathbf{D}(S_1,\bar{S}_1) }-1\]

For multi-valued sen-att-s [2], we remark that $\mathbf{D}_{\mathbf{a}}(S), \mathbf{D}_{\mathbf{a}}^\text{avg}(S)$ reflect the biases from the data and $\mathbf{D}_{f,\mathbf{a}}(S), \mathbf{D}_{f,\mathbf{a}}^\text{avg}(S)$ reflect the biases in the learning algorithm. Then the following value could be used to reflect the fairness degree of this classifier, that is,

\[\begin{split}\begin{align} \mathbf{df}(f) &= \log\left( \frac{ \mathbf{D}_{f,\mathbf{a}}(S) }{ \mathbf{D}_{\mathbf{a}}(S) } \right) \\ \mathbf{df}^\text{avg}(f) &= \log\left( \frac{ \mathbf{D}_{f,\mathbf{a}}^\text{avg}(S) }{ \mathbf{D}_{\mathbf{a}}^\text{avg}(S) } \right) \end{align}\end{split}\]

Up to now, we got a harmonic fairness measure via manifolds (HFM), with three optional versions (that is, previous, maximal, and average HFM).

Approximation of distances

As the direct computation of the distances above is rather heavy, we’d like to estimate them quickly to facilitate the bias evaluation.

Hint

In Euclidean spaces, the distance between similar data points tends to be closer than others after projecting them onto a general one-dimensional linear subspace.

For multi-valued sen-att-s [2],

Algorithm 3. ExtendDist calling algo2, to estimate $\mathbf{D}_{\cdot,\mathbf{a}}(S)$ and $\mathbf{D}_{\cdot,\mathbf{a}}^\text{avg}(S)$
Algorithm 2. ApproxDist calling algo1, to estimate $\mathbf{D}_{\cdot,\mathbf{a}}(S,a_i)$ and $\mathbf{D}_{\cdot,\mathbf{a}}^\text{avg}(S,a_i)$
Algorithm 1. AcceleDist, to estimate $\mathbf{D}_{\cdot,\mathbf{a}}(S,a_i)$ and $n\mathbf{D}_{\cdot,\mathbf{a}}^\text{avg}(S,a_i)$

For one bi-valued sen-att [1],

Algorithm 4. Simplified [3] ApproxDist calling algo1, to estimate $\mathbf{D}_{\cdot}(S_1,\bar{S}_1)$ or $\mathbf{D}_{\cdot,\mathbf{a}}(S,a_i)$
Algorithm 1. AcceleDist, to estimate $\mathbf{D}_{\cdot}(S_1,\bar{S}_1)$ or $\mathbf{D}_{\cdot,\mathbf{a}}(S,a_i)$

In this way, we reduce the high computational complexity of the Hausdorff distance’s direct computation from $\mathcal{O}(n^2)$ to $\mathcal{O}(n\log n)$.