.. methodology.rst Methodology ================ .. This work is about how to evaluate the discrimination level of ML models from a manifold perspective, that is, *if the instances (with the same sensitive attribute values) are viewed as data points on certain manifolds, the manifold(s) representing members from the marginalized/unprivileged group(s) are supposed to be as close as possible to that representing members from the privileged group*. We introduced two fairness measures to deal with three cases: 1) when facing one bi-valued sensitive attribute (sen-att); 2) when facing one multi-valued sen-att; and 3) when facing several sensitive attributes (sen-att-s). The proposed fairness measures are built upon the Euclidean Hausdorff distance of sets, to describe the distance between manifolds. .. They, introduced in mathematics. .. https://web.stanford.edu/class/cs273/scribing/2004/class8/scribe8.pdf .. https://cgm.cs.mcgill.ca/~godfried/teaching/cg-projects/98/normand/main.html .. A dataset is denoted by :math:`S` where the .. drawn from an feature-label space :math:`\mathcal{X\times Y}` based on an unknown distribution. .. that is, finite :math:`\mathcal{Y}=\{1,2,...,n_c\} (n_c\geqslant 2)`, where :math:`n_c` is the number of labels. .. The feature/ .. , depending on the number of labels :math:`n_c`. .. note:: **Notations:** Given a dataset :math:`S=\{(\mathbf{x}_i,\mathbf{a}_i,y_i)\mid i=1,2,...,n\}` composed of i.i.d. instances including sen-att-s, we denote one instance by :math:`(\mathbf{x,a})=[x_1,...,x_{n_d},a_1,...,a_{n_a}]^\mathsf{T}` for clarity. Therein, :math:`n_a\geqslant 1` is the number of sensitive/protected attributes and :math:`n_d` is that of unprotected attributes in the instance. Note that :math:`a_i=1` means the corresponding instance is a member from the privileged group, and :math:`i\in[n]` is used to represent :math:`i\in\{1,2,...,n\}` for brevity. The instances in :math:`S` are i.i.d. (independent and identically distributed); the task is binary or multi-class classification. .. [#footnote]_ .. . [#footnote] (i.i.d.) independent and identically distributed .. Therein, .. We use :math:`a_i=1` to denote that the corresponding instance is a member from the privileged group. .. Note that :math:`i\in[n]` is used to represent :math:`i\in\{1,2,...,n\}` for brevity. Distance between sets ^^^^^^^^^^^^^^^^^^^^^ 1. For one bi-valued sen-att :math:`\mathbf{a}=[a_i]^\mathsf{T}` and :math:`a_i\in\mathcal{A}_i=\{0,1\}`, - The dataset :math:`S` can be divided into two subsets :math:`S_1=\{(\mathbf{x},a_i,y)\in S\mid a_i=1\}` and :math:`\bar{S}_1=S\setminus S_1=\{(\mathbf{x},a_i,y)\in S\mid a_i\neq 1\}`. - Given a specific distance metric :math:`\mathbf{d}(\cdot,\cdot)` (e.g. the standard Euclidean metric) on the feature space, the distance between these two subsets (that is, :math:`S_1` and :math:`\bar{S}_1`) is .. math:: \begin{align} \mathbf{D}_\cdot(S_1,\bar{S}_1) \triangleq \max\Big\{ & \max_{(\mathbf{x,a},y)\in S_1} \overbrace{ \min_{(\mathbf{x}',\mathbf{a}',y')\in\bar{S}_1} \mathbf{d}\big( (\mathbf{x},\ddot{y}), (\mathbf{x}',\ddot{y}') \big) }^\text{to find the nearest point in $\bar{S}_1$} ,\\ & \max_{(\mathbf{x}',\mathbf{a}',y')\in\bar{S}_1} \min_{(\mathbf{x,a},y)\in S_1} \mathbf{d}\big( (\mathbf{x},\ddot{y}), (\mathbf{x}',\ddot{y}') \big) \Big\} \end{align} - Notice that it becomes :math:`\mathbf{D}(S_1,\bar{S}_1)` using :math:`y`, and :math:`\mathbf{D}_f(S_1,\bar{S}_1)` when using :math:`\hat{y}` for classifiers. .. - Given a specific distance metric :math:`\mathbf{d}(\cdot,\cdot)` (e.g. the standard Euclidean metric) on the feature space, the distance between these two subsets (that is, :math:`S_1` and :math:`\bar{S}_1`) is .. .. .. math: : .. \begin{align} .. \mathbf{D}(S_1,\bar{S}_1)= \max\Big\{ .. & \max_{(\mathbf{x,a},y)\in S_1} \min_{(\mathbf{x}',\mathbf{a}',y')\in\bar{S}_1} \mathbf{d}\big( (\mathbf{x},y),(\mathbf{x}',y') \big) ,\\ .. & \max_{(\mathbf{x}',\mathbf{a}',y')\in\bar{S}_1} \min_{(\mathbf{x,a},y)\in S_1} \mathbf{d}\big( (\mathbf{x},y),(\mathbf{x}',y') \big) .. \Big\} \,. .. \end{align} .. .. - For a trained classifier :math:`f(\cdot)` where :math:`\hat{y}=f(\mathbf{x,a})`, .. .. .. math: : .. \begin{align} .. \mathbf{D}_f(S_1,\bar{S}_1)= \max\Big\{ .. & \max_{(\mathbf{x,a},y)\in S_1} \min_{(\mathbf{x}',\mathbf{a}',y')\in\bar{S}_1} \mathbf{d}\big( (\mathbf{x},\hat{y}),(\mathbf{x}',\hat{y}') \big) ,\\ .. & \max_{(\mathbf{x}',\mathbf{a}',y')\in\bar{S}_1} \min_{(\mathbf{x,a},y)\in S_1} \mathbf{d}\big( (\mathbf{x},\hat{y}),(\mathbf{x}',\hat{y}') \big) .. \Big\} .. \end{align} .. .. - By recording the true label :math:`y` and the prediction :math:`\hat{y}` as one denotation (say :math:`\ddot{y}`) for simplification, we can rewrite them as .. .. .. math: : .. \begin{align} .. \mathbf{D}_{\cdot}(S_1,\bar{S}_1)= \max\Big\{ .. & \max_{(\mathbf{x,a},y)\in S_1} \min_{(\mathbf{x}',\mathbf{a}',y')\in\bar{S}_1} \mathbf{d}\big( (\mathbf{x},\ddot{y}),(\mathbf{x}',\ddot{y}') \big) ,\\ .. & \max_{(\mathbf{x}',\mathbf{a}',y')\in\bar{S}_1} \min_{(\mathbf{x,a},y)\in S_1} \mathbf{d}\big( (\mathbf{x},\ddot{y}),(\mathbf{x}',\ddot{y}') \big) .. \Big\} .. \end{align} 2. For one multi-valued sen-att :math:`a_i\in\mathcal{A}_i=\{1,2,...,n_{a_i}\}`, where :math:`n_{a_i}\geqslant 3` - The dataset :math:`S` can be divided into a few disjoint sets according to the value of this attribute :math:`a_i`, that is, :math:`S_j=\{(\mathbf{x},a_i,y)\in S\mid a_i=j\} ,\forall j\in\mathcal{A}_i \,`. - The distance above can be extended to: i) *maximal distance measure for one sen-att*, and ii) *average distance measure for one sen-att* .. math:: \begin{align} \mathbf{D}_{\cdot,\mathbf{a}}(S,a_i) &= \max_{1\leqslant j\leqslant n_{a_i}}\Big\{ \max_{(\mathbf{x,a},y)\in S_j} \overbrace{ \min_{(\mathbf{x}',\mathbf{a}',y')\in \bar{S}_j} \mathbf{d}\big( (\mathbf{x},\ddot{y}), (\mathbf{x}',\ddot{y}') \big) }^\text{to find the nearest point in $\bar{S}_j$} \Big\} \\ \mathbf{D}_{\cdot,\mathbf{a}}^\text{avg}(S,a_i) &= \frac{1}{n} \sum_{j=1}^{n_{a_i}} \sum_{(\mathbf{x,a},y)\in S_j} \underbrace{ \min_{(\mathbf{x}',\mathbf{a}',y')\in\bar{S}_j} \mathbf{d}\big( (\mathbf{x},\ddot{y}), (\mathbf{x}',\ddot{y}') \big) }_\text{to find the nearest point in $\bar{S}_j$} \end{align} - Notice that :math:`\bar{S}_j=S\setminus S_j`, and :math:`\mathbf{D}_{\cdot,\mathbf{a}}(S,a_i)= \mathbf{D}_\cdot(S_1,\bar{S}_1)` when :math:`\mathcal{A}_i=\{0,1\}`. 3. For more than one sen-att :math:`\mathbf{a}=[a_1,...,a_{n_a}]^\mathsf{T}`, where :math:`n_a\geqslant 2` and :math:`n_{a_i}\geqslant 2` - The generalised distance measures include: i) *maximal distance measure for sen-att-s*, and ii) *average distance measure for sen-att-s* .. math:: \begin{align} \mathbf{D}_{\cdot,\mathbf{a}}(S) &= \max_{1\leqslant i\leqslant n_a} \mathbf{D}_{\cdot,\mathbf{a}}(S,a_i) \\ \mathbf{D}_{\cdot,\mathbf{a}}^\text{avg}(S) &= \frac{1}{n_a} \sum_{i=1}^{n_a} \mathbf{D}_{\cdot,\mathbf{a}}^\text{avg}(S,a_i) \end{align} Model fairness assessment ^^^^^^^^^^^^^^^^^^^^^^^^^ For one bi-valued sen-att [#P1]_, we remark :math:`\mathbf{D}(S_1,\bar{S}_1)` reflects the biases from the data and :math:`\mathbf{D}_f(S_1,\bar{S}_1)` reflects the biases from the algorithm. Then the following value could be used to reflect the fairness degree of this classifier, that is, .. math:: \mathbf{df}_\text{prev}(f)= \frac{ \mathbf{D}_f(S_1,\bar{S}_1) }{ \mathbf{D}(S_1,\bar{S}_1) }-1 For multi-valued sen-att-s [#P2]_, we remark that :math:`\mathbf{D}_{\mathbf{a}}(S), \mathbf{D}_{\mathbf{a}}^\text{avg}(S)` reflect the biases from the data and :math:`\mathbf{D}_{f,\mathbf{a}}(S), \mathbf{D}_{f,\mathbf{a}}^\text{avg}(S)` reflect the biases in the learning algorithm. Then the following value could be used to reflect the fairness degree of this classifier, that is, .. math:: \begin{align} \mathbf{df}(f) &= \log\left( \frac{ \mathbf{D}_{f,\mathbf{a}}(S) }{ \mathbf{D}_{\mathbf{a}}(S) } \right) \\ \mathbf{df}^\text{avg}(f) &= \log\left( \frac{ \mathbf{D}_{f,\mathbf{a}}^\text{avg}(S) }{ \mathbf{D}_{\mathbf{a}}^\text{avg}(S) } \right) \end{align} Up to now, we got a *harmonic fairness measure via manifolds (HFM)*, with three optional versions (that is, previous, maximal, and average HFM). .. [#P1] Does machine bring in extra bias in learning? Approximating fairness in models promptly https://arxiv.org/pdf/2405.09251 .. [#P2] Approximating discrimination within models when faced with several non-binary sensitive attributes https://arxiv.org/pdf/2408.06099 .. [#P3] Does machine bring in extra bias in learning? Approximating discrimination within models quickly. In *NeurIPS 2024 Workshop on Mathematics of Modern Machine Learning (M3L)*, Vancouver, Dec 2024. (Non-archival `poster `_, `OpenReview `_) .. Quick approximation of distances .. -------------------------------- Approximation of distances ^^^^^^^^^^^^^^^^^^^^^^^^^^ As the direct computation of the distances above is rather heavy, we'd like to estimate them quickly to facilitate the bias evaluation. .. To do that, We use .. **Approximation of distances between sets for Euclidean spaces** .. hint:: In **Euclidean spaces**, the distance between similar data points tends to be closer than others after projecting them onto a general one-dimensional linear subspace. .. To estimate the distance between data points inside :math:`\mathcal{X\times A\times Y}`, we use a random projection :math:`g: \mathcal{X\times A\times Y} \mapsto\mathbb{R}` and a non-zero random vector :math:`\mathbf{w}=[w_0,w_1,...,w_{n_d}]^\mathsf{T}`, that is, .. .. .. math: : .. g(\mathbf{x,a},\ddot{y};\mathbf{w})= [\ddot{y},x_1,...,x_{n_d}]^\mathsf{T} \mathbf{w} .. .. After sorting all the projected data points on :math:`\mathbb{R}`, it is likely that for one :math:`(\mathbf{x,a},y)` in :math:`S_j`, the desired instance :math:`\arg\min_{(\mathbf{x}',\mathbf{a}',y') \in\bar{S}_j} \mathbf{d}\big( (\mathbf{x},\ddot{y}),(\mathbf{x}',\ddot{y}') \big)` would be somewhere near it after the projection, and vice versa. Thus, searching for it could be *accelerated* by checking several adjacent instances rather than traversing the whole dataset. For multi-valued sen-att-s [#P2]_, .. - *Algorithm 3. ExtendDist* (to estimate :math:`\mathbf{D}_{\cdot,\mathbf{a}}(S)` and :math:`\mathbf{D}_{\cdot,\mathbf{a}}^\text{avg}(S)`) .. - *Algorithm 2. ApproxDist* (to estimate :math:`\mathbf{D}_{\cdot,\mathbf{a}}(S,a_i)` and :math:`\mathbf{D}_{\cdot,\mathbf{a}}^\text{avg}(S,a_i)`) .. - *Algorithm 1. AcceleDist* (to estimate :math:`\mathbf{D}_{\cdot,\mathbf{a}}(S,a_i)` and :math:`\mathbf{D}_{\cdot,\mathbf{a}}^\text{avg}(S,a_i)`) - *Algorithm 3. ExtendDist* calling algo2, to estimate :math:`\mathbf{D}_{\cdot,\mathbf{a}}(S)` and :math:`\mathbf{D}_{\cdot,\mathbf{a}}^\text{avg}(S)` - *Algorithm 2. ApproxDist* calling algo1, to estimate :math:`\mathbf{D}_{\cdot,\mathbf{a}}(S,a_i)` and :math:`\mathbf{D}_{\cdot,\mathbf{a}}^\text{avg}(S,a_i)` - *Algorithm 1. AcceleDist*, to estimate :math:`\mathbf{D}_{\cdot,\mathbf{a}}(S,a_i)` and :math:`n\mathbf{D}_{\cdot,\mathbf{a}}^\text{avg}(S,a_i)` .. return max{d_max^(j) | j=1,..,n_a}, sum(d_avg^(j))/n_a .. d_max^(j), d_avg^(j) = ApproxDist({(mathbf_x,a)}, {ddot_y}; m1,m2) .. .. code-block: : python .. :linenos: .. .. # Algorithm 3. ExtendDist .. for j from 1 to n_a .. d_max^(j), d_avg^(j) = ApproxDist({(x,a)}, {ddot_y}; m1,m2) .. d_max = [d_max^(1), ..., d_max^(n_a)] .. d_avg = [d_avg^(1), ..., d_avg^(n_a)] .. return max(d_max), sum(d_avg)/n_a .. .. # Algorithm 2. ApproxDist .. for j from 1 to m1 .. take two orthogonal vectors w0 and w1 .. for k from 0 to 1 .. get t_max^k, t_avg^k = AcceleDist({(x,a)}, {ddot_y}, wk; m2) .. d_max^j = min(t_max^0, t_max^1) .. d_avg^j = min(t_avg^0, t_avg^1) .. d_max = [d_max^1, ..., d_max^(m1)] .. d_avg = [d_avg^1, ..., d_avg^(m1)] .. # n is the number of instances in the dataset .. return min(d_max), min(d_avg)/n .. .. # Algorithm 1. AcceleDist .. project data points onto a 1-dim space and obtain {g(x,a,ddot_y;w)} .. sort original data points using g(.,.;w) in ascending order .. for i from 1 to n .. set the anchor data point .. return For one bi-valued sen-att [#P1]_, .. - *Algorithm 4. Simplified ApproxDist* (to estimate :math:`\mathbf{D}_{\cdot,\mathbf{a}}(S,a_i)` and :math:`\mathbf{D}_{\cdot,\mathbf{a}}^\text{avg}(S,a_i)`) .. - *Algorithm 1. AcceleDist* (to estimate :math:`\mathbf{D}_{\cdot,\mathbf{a}}(S,a_i)` and :math:`\mathbf{D}_{\cdot,\mathbf{a}}^\text{avg}(S,a_i)`) - *Algorithm 4. Simplified* [#P3]_ *ApproxDist* calling algo1, to estimate :math:`\mathbf{D}_{\cdot}(S_1,\bar{S}_1)` or :math:`\mathbf{D}_{\cdot,\mathbf{a}}(S,a_i)` - *Algorithm 1. AcceleDist*, to estimate :math:`\mathbf{D}_{\cdot}(S_1,\bar{S}_1)` or :math:`\mathbf{D}_{\cdot,\mathbf{a}}(S,a_i)` .. In this way, we reduce the high computational complexity :math:`\mathcal{O}(n^2)` of direct computation to :math:`\mathcal{O}(n\log n)`. In this way, we reduce the high computational complexity of the Hausdorff distance's direct computation from :math:`\mathcal{O}(n^2)` to :math:`\mathcal{O}(n\log n)`.