Getting started

We evaluate the extra discrimination introduced during the learning procedure in three cases: 1) a single bi-valued sensitive attribute (sen-att); 2) a single multi-valued sen-att; and 3) more than one sen-att. Case 1 comes from [P1], and the other two come from [P2].

[P1]

Does machine bring in extra bias in learning? Approximating fairness in models promptly https://arxiv.org/pdf/2405.09251

[P2]

Approximating discrimination within models when faced with several non-binary sensitive attributes https://arxiv.org/pdf/2408.06099

Here is a short tutorial covering all of the cases and methods above (see Examples). Please check your configuration before running the examples (see Requirements).

Requirements

We developed ApproxBias with Python 3.8 and also tested it with Python 3.11 at the time of release. Remember to choose the requirements file that matches your Python version (requirements.txt for 3.8, reqs_dev.txt for 3.11).

# Install anaconda/miniconda if you haven't already

# To create a virtual environment
conda create -n test python=3.8  # or 3.11
conda activate test

# To install packages
pip install --upgrade pip
pip install -r requirements.txt  # Python 3.8
# pip install -r reqs_dev.txt    # Python 3.11
# python -m pytest

# To delete the virtual environment
conda deactivate
conda remove -n test --all
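
If you are unsure which requirements file applies to the interpreter you are running, a small check like the one below may help. It is only a sketch: `pick_requirements` is a hypothetical helper, not part of ApproxBias, and it knows only the two Python versions named in this guide.

```python
import sys


def pick_requirements(version_info=sys.version_info):
    """Hypothetical helper: map a Python version to the requirements
    file named in this guide, or None for any other version."""
    major_minor = (version_info[0], version_info[1])
    if major_minor == (3, 8):
        return "requirements.txt"
    if major_minor == (3, 11):
        return "reqs_dev.txt"
    return None  # other versions are not covered by this guide


# Report the file matching the running interpreter (may print None)
print(pick_requirements())
```

A `None` result only means that this guide does not say which file to use for that version.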

We borrow some auxiliary functions from PyFairness. To use them, install PyFairness in one of the following two ways.

# Two ways to install (& uninstall) PyFairness
git clone git@github.com:eustomaqua/PyFairness.git

# Way 1: install it as an editable package
# pip install -r PyFairness/reqs_py311.txt
pip install -e ./PyFairness
# pip uninstall pyfair    # to uninstall

# Way 2: copy the package into the current directory, then drop the clone
cp -r ./PyFairness/pyfair ./
# rm -r pyfair            # to uninstall
rm -rf PyFairness

Examples

You may need to arrange your data in the following form.

# Load data: X, A, y, f(x)
#   X: non-sen-att, shape=(#, #non-sen-att)
#   A: sen-att, shape=(#, #sen-att)
#   y: label, shape=(#,)
#   f(x): prediction, shape=(#,), stored in `fx` below
import numpy as np

# param priv_val: the value indicating the privileged group
#   Note that it may vary across sen-att-s; in that case, build
#   `sa_val` as in the second variant below.
X_nA_y = np.concatenate([y.reshape(-1, 1).astype('float'), X], axis=1)
sa_val = [set(A[:, i].tolist()) for i in range(A.shape[1])]
sa_val = [[priv_val] + list(i - {priv_val}) for i in sa_val]
sa_idx = [[A[:, i] == k for k in j] for i, j in enumerate(sa_val)]
X_nA_fx = np.concatenate([fx.reshape(-1, 1).astype('float'), X], axis=1)

# Second variant: each sen-att has its own privileged value, that is,
# param priv_val: a list of privileged values, shape=(#sen-att,)
sa_val = [set(A[:, i].tolist()) for i in range(A.shape[1])]
sa_val = [[j] + list(i - {j}) for i, j in zip(sa_val, priv_val)]
sa_idx = [[A[:, i] == k for k in j] for i, j in enumerate(sa_val)]
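
To make the expected shapes concrete, here is a minimal sketch that runs the same preparation steps on made-up data. The arrays, the sizes, and the choice of privileged values are assumptions for illustration only. It also uses `sorted` instead of `list` so that the order of the non-privileged values is reproducible.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8  # number of instances (toy size)

# Toy arrays standing in for a real dataset; all values are made up.
X = rng.normal(size=(n, 3))                # non-sen-att, shape (n, 3)
A = np.column_stack([
    np.array([0, 1, 0, 1, 1, 0, 1, 0]),    # a bi-valued sen-att
    np.array([0, 1, 2, 0, 1, 2, 0, 1]),    # a multi-valued sen-att
])                                         # shape (n, 2)
y = rng.integers(0, 2, size=n)             # labels, shape (n,)
fx = rng.integers(0, 2, size=n)            # predictions f(x), shape (n,)

# One privileged value per sen-att (assumed for this toy data)
priv_val = [1, 0]

# Put the label (or prediction) in the first column, followed by X
X_nA_y = np.concatenate([y.reshape(-1, 1).astype('float'), X], axis=1)
X_nA_fx = np.concatenate([fx.reshape(-1, 1).astype('float'), X], axis=1)

# For each sen-att: privileged value first, then the remaining values
sa_val = [set(A[:, i].tolist()) for i in range(A.shape[1])]
sa_val = [[j] + sorted(i - {j}) for i, j in zip(sa_val, priv_val)]

# For each sen-att: one boolean mask per value, in the order of sa_val
sa_idx = [[A[:, i] == k for k in j] for i, j in enumerate(sa_val)]

print(X_nA_y.shape)  # (8, 4)
print(sa_val)        # [[1, 0], [0, 1, 2]]
```

Each entry of `sa_idx` holds one mask per group, and `sa_idx[k][0]` is always the mask of the privileged group of the k-th sen-att.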

Here are examples for each of the three cases above.

Case 1, bi-valued

# Case 1: one bi-valued sen-att, take the k-th sen-att for example

from hfm.dist_drt import DirectDist_bin
from hfm.hfm_df import bias_degree_bin
(D, _), _ = DirectDist_bin(X_nA_y, sa_idx[k][0])
(Df, _), _ = DirectDist_bin(X_nA_fx, sa_idx[k][0])
df_prev, _ = bias_degree_bin(D, Df)

# If you'd like to compute the distances quicker
from hfm.dist_est_bin import ApproxDist_bin
# param m1: designated number for repetition
# param m2: designated number for comparison
hat_D, _ = ApproxDist_bin(X_nA_y, A[:, k], sa_idx[k][0], m1, m2)
hat_Df, _ = ApproxDist_bin(X_nA_fx, A[:, k], sa_idx[k][0], m1, m2)
hat_df_prev, _ = bias_degree_bin(hat_D, hat_Df)

Case 2, multi-valued

# Case 2: one multi-valued sen-att, take the k-th sen-att for example

from hfm.dist_drt import DirectDist_nonbin
from hfm.hfm_df import bias_degree_nonbin
D, _ = DirectDist_nonbin(X_nA_y, sa_idx[k])
Df, _ = DirectDist_nonbin(X_nA_fx, sa_idx[k])
df_max, _ = bias_degree_nonbin(D[0], Df[0])
df_avg, _ = bias_degree_nonbin(D[1], Df[1])

# If you'd like to compute the distances quicker
from hfm.dist_est_nonbin import ApproxDist_nonbin
hat_D, _ = ApproxDist_nonbin(X_nA_y, A[:, k], m1, m2)
hat_Df, _ = ApproxDist_nonbin(X_nA_fx, A[:, k], m1, m2)
# hat_df_{max, avg} are computed in the same way as df_{max, avg}
hat_df_max, _ = bias_degree_nonbin(hat_D[0], hat_Df[0])
hat_df_avg, _ = bias_degree_nonbin(hat_D[1], hat_Df[1])

Case 3, more than one

# Case 3: more than one sen-att

from hfm.dist_drt import DirectDist_multiver
from hfm.hfm_df import bias_degree_nonbin
D = DirectDist_multiver(X_nA_y, sa_idx)[0][:-1]
Df = DirectDist_multiver(X_nA_fx, sa_idx)[0][:-1]
df_max, _ = bias_degree_nonbin(D[0], Df[0])
df_avg, _ = bias_degree_nonbin(D[1], Df[1])

# If you'd like to compute the distances quicker
from hfm.dist_est_nonbin import ExtendDist_multiver_mp
hat_D = ExtendDist_multiver_mp(X_nA_y, A, m1, m2)[0][:-1]
hat_Df = ExtendDist_multiver_mp(X_nA_fx, A, m1, m2)[0][:-1]
# hat_df_{max, avg} are computed in the same way as df_{max, avg}
hat_df_max, _ = bias_degree_nonbin(hat_D[0], hat_Df[0])
hat_df_avg, _ = bias_degree_nonbin(hat_D[1], hat_Df[1])
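
Since the `hat_` values approximate their direct counterparts, you may want to check how far apart the two are for your choice of m1 and m2. One generic option is the relative error, sketched below. `relative_error` is a hypothetical helper, not something ApproxBias provides, and the sketch assumes the compared quantities are scalars; the numbers are made up.

```python
def relative_error(exact, approx):
    """Return |approx - exact| / |exact|, falling back to the
    absolute error |approx| when exact is zero."""
    exact, approx = float(exact), float(approx)
    if exact == 0.0:
        return abs(approx)
    return abs(approx - exact) / abs(exact)


# Made-up numbers standing in for, e.g., df_max and hat_df_max
err = relative_error(2.0, 2.1)
print(f"relative error: {err:.3f}")
```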

You’re welcome to adjust the parameters (except priv_val, which depends on the data you use) as needed or to explore potential improvements. Please note that this version may contain typos or errors; if you find any, feel free to contact us or raise an issue.

Hint

To observe the time consumed by each operation, capture the second return value, which the examples above discard as _.

For example,

# Case 1, bi-valued
(D, _), tim_elapsed = DirectDist_bin(X_nA_y, sa_idx[k][0])
df_prev, tim_elapsed = bias_degree_bin(D, Df)
hat_D, tim_consumed = ApproxDist_bin(X_nA_y, A[:, k], sa_idx[k][0], m1, m2)

# Case 2, multi-valued
D, tim_elapsed = DirectDist_nonbin(X_nA_y, sa_idx[k])
df_max, tim_elapsed = bias_degree_nonbin(D[0], Df[0])
hat_D, tim_consumed = ApproxDist_nonbin(X_nA_y, A[:, k], m1, m2)

# Case 3, more than one
D, tim_elapsed = DirectDist_multiver(X_nA_y, sa_idx)
D = D[:-1]
df_max, tim_elapsed = bias_degree_nonbin(D[0], Df[0])
hat_D, tim_consumed = ExtendDist_multiver_mp(X_nA_y, A, m1, m2)
hat_D = hat_D[:-1]
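
Each call above returns a pair whose second element is the elapsed time. How ApproxBias produces that value is not shown in this guide, and neither is its time unit. The sketch below only illustrates the return convention, using a hypothetical decorator `timed` and a stand-in function unrelated to the hfm package.

```python
import time
from functools import wraps


def timed(func):
    """Hypothetical decorator: make func return (result, elapsed_seconds)."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        elapsed = time.perf_counter() - start
        return result, elapsed
    return wrapper


@timed
def toy_distance(values):
    # Stand-in computation, unrelated to the hfm functions
    return max(values) - min(values)


# Unpack the pair just like the calls in the listing above
dist, tim_elapsed = toy_distance([3.0, 1.0, 4.0])
print(dist)  # 3.0
```

Discarding the second element with `_`, as the earlier examples do, simply ignores the timing.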

To understand these distances and HFM in more detail, see methodology.