One point that I think is under-discussed in the AI bias area: While it is true ...

slg · on May 13, 2022

>With a human, you can’t really do an A/B test to determine if they would have prioritized a candidate if they hadn’t included some signal; it’s really easy to rationalize away discrimination at the margins.

Which is part of the reason that discrimination doesn't have to be intentional for it to be punishable. This is a concept known as "disparate impact". The Supreme Court has issued decisions[1] that a policy which negatively impacts a protected class and has no justifiable business-related reason for existing can be deemed discrimination regardless of the motivations behind that policy.

[1] - https://en.wikipedia.org/wiki/Griggs_v._Duke_Power_Co.

TimPC · on May 13, 2022

Justifiable business reason is still a strong bar. For example, with no evidence in either direction for a claim there is no justifiable business reason even if the claim is somewhat intuitive. So if you want to require high-school diplomas because you think people who have them will do the job better you better track that data for years and be prepared to demonstrate it if sued. If you want to use IQ tests because you anticipate smarter people will do the job better you better have IQ tests done on your previous employee population demonstrating the correlation before imposing the requirement.

cmeacham98 · on May 13, 2022

EDIT: my parent edited and replaced their entire comment; it originally said "you can't use IQ tests even if you prove they lead to better job performance". I leave my original comment below for posterity:

This is not true, IQ tests in the mentioned Griggs v. Duke Power Co. (and similar cases) were rejected as disparate impact specifically because the company provided no evidence they lead to better performance. To quote the majority opinion of Griggs:

> On the record before us, neither the high school completion requirement nor the general intelligence test is shown to bear a demonstrable relationship to successful performance of the jobs for which it was used. Both were adopted, as the Court of Appeals noted, without meaningful study of their relationship to job performance ability.

tgsovlerkhgsel · on May 13, 2022

Wouldn't that be trivial if you have your training data set?

darawk · on May 13, 2022

He didn't say anything about intention, though. He just talked about the counterfactual. Disparate impact is about the counterfactual scenario.

slg · on May 13, 2022

They said "it’s really easy to rationalize away discrimination at the margins." My reply was pointing out that there is little legal protection in rationalizing away discrimination at the margins because tests for disparate impact require the approach to also stand up holistically which can't easily be rationalized away.

theptip · on May 13, 2022

I think perhaps you are looking at a different part of the funnel; disparate impact seems to be around the sort of requirements you are allowed to put in a job description. Like “must have a college degree”.

However the sort of insidious discrimination at the margin I was imagining are things like “equally-good resumes (meets all requirements), but one had a female/stereotypically-black name”. Interpreting resumes is not a science and humans apply judgement to pick which ones feel good, which leaves a lot of room for hidden bias to creep in.

My point was that I think algorithmic processes are more testable for these sorts of bias; do you feel that existing disparate impact regulations are good at catching/preventing this kind of thing? (I’m aware of some large-scale research on name-bias on resumes but it seems hard to do in the context of a single company.)

slg · on May 13, 2022

>disparate impact seems to be around the sort of requirements you are allowed to put in a job description.

That is a common example, but it is much broader than what goes on a job ad. For example, I have heard occasional rumblings about how whiteboard interviews are a hiring practice that would not stand up to these laws (IANAL).

>My point was that I think algorithmic processes are more testable for these sorts of bias

Yes, this is true, but that doesn't really matter. If there is consistent discrimination happening at the margins, that will be evident holistically. If that is evident holistically and there is no justification for it, that is all we need. We don't need to run resumes through an algorithm to show that discrimination is happening at an individual level. We just need to show that a policy negatively impacts a protected group and that the policy is not related to job performance.
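A minimal sketch of the holistic comparison described above: compute each group's selection rate and the ratio between them. The group labels and counts are made up for illustration. The 0.8 threshold is the EEOC's "four-fifths" rule of thumb, which is not mentioned in this thread and is a screening heuristic, not the legal test itself.

```python
from collections import Counter

FOUR_FIFTHS = 0.8  # assumption: EEOC rule-of-thumb threshold, not from the thread


def selection_rates(outcomes):
    """Map each group to selected / total, given (group, selected) pairs."""
    totals, selected_counts = Counter(), Counter()
    for group, selected in outcomes:
        totals[group] += 1
        if selected:
            selected_counts[group] += 1
    return {g: selected_counts[g] / totals[g] for g in totals}


def impact_ratio(outcomes, group, reference):
    """Selection rate of `group` divided by the selection rate of `reference`."""
    rates = selection_rates(outcomes)
    if rates[reference] == 0:
        raise ValueError("reference group has a zero selection rate")
    return rates[group] / rates[reference]


# Hypothetical outcomes: group "a" is selected 30/100 times, group "b" 18/100.
outcomes = (
    [("a", True)] * 30 + [("a", False)] * 70
    + [("b", True)] * 18 + [("b", False)] * 82
)

ratio = impact_ratio(outcomes, "b", "a")  # 0.18 / 0.30
flagged = ratio < FOUR_FIFTHS
print(f"impact ratio: {ratio:.2f}, below four-fifths threshold: {flagged}")
```

Note that this only measures the disparity. Whether the policy is related to job performance is a separate question that no ratio answers.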

>do you feel that existing disparate impact regulations are good at catching/preventing this kind of thing?

I think the bigger problem than the regulations is that there is an inherent bias against these types of cases actually being pursued. First, it is difficult to identify this as an individual, so people don't know when it is happening. Additionally, people fear the retribution that would come from pursuing this legally. People don't want to be viewed as a pariah by future employers, so they often will simply move on even if their accusations are valid.

darawk · on May 13, 2022

Yes, but a holistic test requires a realistic counterfactual. That's the problem. There is no way to evaluate that counterfactual for a human interviewer.

It is true that extreme bias/discrimination will be evident, but smaller bias/discrimination, particularly in an environment where the pool is small (say, black women for engineering roles) is extremely hard to prove for a human interviewer. Your sample size is just going to be too small. On the other hand, if you have an ML algorithm, you can feed it arbitrary amounts of synthetic data, and get precise loadings on protected attributes.
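A rough sketch of the synthetic-data probe described above, assuming the model can be called as a plain function. `toy_score` is a hypothetical stand-in for a trained model (it deliberately penalizes one group so the probe has something to find), and the feature names are invented for illustration.

```python
import random
from statistics import mean


def toy_score(candidate):
    """Hypothetical model: a stand-in that deliberately leaks the protected attribute."""
    score = 0.1 * candidate["years_experience"] + 0.5 * candidate["has_degree"]
    if candidate["group"] == "b":
        score -= 0.2  # the built-in bias the probe should recover
    return score


def counterfactual_gap(score_fn, candidates, attribute, value_a, value_b):
    """Mean score change when only `attribute` is switched from value_a to value_b."""
    gaps = []
    for candidate in candidates:
        as_a = dict(candidate, **{attribute: value_a})
        as_b = dict(candidate, **{attribute: value_b})
        gaps.append(score_fn(as_b) - score_fn(as_a))
    return mean(gaps)


# Synthetic candidates: as many as we like, which is the advantage over a human interviewer.
rng = random.Random(0)
synthetic = [
    {
        "years_experience": rng.randint(0, 20),
        "has_degree": rng.randint(0, 1),
        "group": "a",
    }
    for _ in range(10_000)
]

gap = counterfactual_gap(toy_score, synthetic, "group", "a", "b")
print(f"mean score change when group flips a -> b: {gap:.3f}")
```

This only detects direct use of the flipped attribute. A model that picks up the same signal through a proxy feature would pass this check unchanged.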

Ferrotin · on May 13, 2022

Everything has a disparate impact, so now everything is illegal.

pc86 · on May 13, 2022

Might I suggest: https://www.merriam-webster.com/dictionary/term%20of%20art

Ferrotin · on May 13, 2022

No, you’re wrong here, it’s not a term of art.

pc86 · on May 13, 2022

It sure looks like it is.

> Disparate impact in United States labor law refers to ...

https://en.wikipedia.org/wiki/Disparate_impact

Ferrotin · on May 13, 2022

Your link is evidence for my side. It uses the plain definition. The plain meaning of the words.

kelseyfrog · on May 13, 2022

If you ever intend to study law, become involved in a situation dealing with disparate impact, or are at the receiving end of disparate impact, knowing the legal definition may be helpful too. The DoJ spells[1] out the legal definition of disparate impact as follows:

    ELEMENTS TO ESTABLISH ADVERSE DISPARATE IMPACT UNDER TITLE VI

    Identify the specific policy or practice at issue; see Section C.3.a.
    Establish adversity/harm; see Section C.3.b.
    Establish disparity; see Section C.3.c.
    Establish causation; see Section C.3.d.

1. https://www.justice.gov/crt/fcs/T6Manual7#D

pc86 · on May 13, 2022

My point is that by the plain meaning of words you're right, disparate impact means any two groups impacted differently, regardless of anything else. In law, it means that an employment, housing, etc. policy has a disproportionately adverse impact on members of a protected class compared to non-members of that same class. It's much more specific and narrowly defined.

thaumasiotes · on May 14, 2022

That is in fact not more narrow; Ferrotin's original claim that "everything has a disparate impact" is correct with "disparate impact" so defined.

MontyCarloHall · on May 13, 2022

I agree that discrimination would be a lot easier to objectively prove after the fact, but it could also occur far more easily in the first place, since many hiring managers would blindly "trust the AI" without a second thought.

fshbbdssbbgdd · on May 13, 2022

From my experience working on projects where we trained models, usually it’s obviously completely broken on the first attempt and requires a lot of iteration to get to a decent state. “Trust the AI” is not a phrase anyone involved would utter. It’s more like: trust that it is wrong for any edge case we didn’t discover yet. Can we constrain the possibility space any more?

pc86 · on May 13, 2022

Most hiring managers wouldn't make it to the end of the phrase "constrain the possibility space"

MichaelBurge · on May 13, 2022

"Trust the AI" could mean uploading a resume to a website and getting a "candidate score" from somebody else's model.

Because I'll tell you, there's millions of landlords and they blindly trust FICO when screening candidates. Maybe not as the only signal, but they do trust it without testing it for edge cases.

theptip · on May 13, 2022

Definitely could be so, particularly in these early days where frameworks and best-practices are very immature. Inasmuch as you think this is likely, I suspect you should favor regulation of algorithmic processes instead of voluntary industry best-practices.

TimPC · on May 13, 2022

There is a very real danger of models being biased in a way that doesn't show up when you apply these crude hacks to inputs. It seems to me we have to be much more deliberate, much more analytical, and much more thorough in testing models if we want to substantially reduce or even eliminate discrimination.

Yes, you can A/B test the model if you can design reasonable experiments. You still don't have the general discrimination test because you have to define what a reasonable input distribution is and what reasonable outputs are.

If an employer is looking to hire an engineer with a CS degree from a top-tier university, and they use an AI model to evaluate resumes and it returns a number of successes on black people very similar to the population distribution of graduates from those programs, is the model discriminatory?

There are still hard problems here because any natural baseline you use for a model may in fact be wrong and designing a reasonable distribution of input data is almost impossibly hard as well.

theptip · on May 13, 2022

Yes, in practice it’s actually way more complex than I gestured at. The Google bias toolkit I linked does discuss in much more detail, but I am not a data scientist and haven’t used it; I’d be interested in expert opinions. (They also have some very good non-technical articles discussing the general problems of defining “fairness” in the first place.)

TimPC · on May 13, 2022

I don’t think it’s adequate to attempt to prevent discrimination. Freedom from discrimination is core to our fundamental human rights. It’s necessary to succeed at preventing discrimination.

“We applied best practices in the field to limit discrimination” should not be an adequate legal defence if the model can be shown to discriminate.

To clarify further, just because you tried to prevent discrimination doesn’t mean you should be off the hook for the material harms of discrimination to a specific individual. Otherwise people don’t have a right to be protected against discrimination; they only have a right to people ‘trying’ to prevent discrimination. We shouldn’t want to weaken rights that much even if it means we have to be cautious in how we adopt new technologies.

Manuel_D · on May 13, 2022

> With a human, you can’t really do an A/B test to determine if they would have prioritized a candidate if they hadn’t included some signal; it’s really easy to rationalize away discrimination at the margins.

Not for individual candidates, no. But you can introduce a parallel anonymized interview process and compare the results.
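If such a parallel process were run, one hedged way to compare the two arms is a permutation test on pass rates for a given group. The sketch below uses only the standard library. The pass/fail lists are invented, and `permutation_p_value` is an illustrative helper, not an established API.

```python
import random


def pass_rate(results):
    """Fraction of 1s in a list of 0/1 interview outcomes."""
    return sum(results) / len(results)


def permutation_p_value(named, anonymized, n_permutations=2000, seed=0):
    """Two-sided permutation test on the difference in pass rates between two arms."""
    rng = random.Random(seed)
    observed = abs(pass_rate(anonymized) - pass_rate(named))
    pooled = list(named) + list(anonymized)
    split = len(named)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(pass_rate(pooled[split:]) - pass_rate(pooled[:split]))
        if diff >= observed - 1e-12:  # tolerance for float comparison
            hits += 1
    return hits / n_permutations


# Hypothetical outcomes for one group: 20% pass when named, 40% pass when anonymized.
small_named, small_anon = [1] * 2 + [0] * 8, [1] * 4 + [0] * 6
large_named, large_anon = [1] * 40 + [0] * 160, [1] * 80 + [0] * 120

p_small = permutation_p_value(small_named, small_anon)
p_large = permutation_p_value(large_named, large_anon)
print(f"n=10 per arm:  p = {p_small:.3f}")
print(f"n=200 per arm: p = {p_large:.3f}")
```

The same 20-point gap is indistinguishable from noise with ten candidates per arm and hard to dismiss with two hundred, which is the small-pool problem raised elsewhere in the thread.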

TimPC · on May 13, 2022

Actually you kind of can't. You don't have a legal basis for forcing the company to run that experiment.

indymike · on May 13, 2022

The problem with AI is that when it does make discriminatory decisions on hiring, it does so systematically and mechanically. Incidentally, systematic and discrimination are two words you never want to see consecutively on a letter from the EEOC or OFCCP.

TimPC · on May 13, 2022

The reason you never want to see those words together is that isolated discrimination may result in a single lawsuit but systemic discrimination is a basis for class action.

mjburgess · on May 13, 2022

It's under-discussed, as is any discussion of empirically studying ML systems, i.e., treating them as targets of analysis.

As soon as you do this, they're revealed to exploit only statistical coincidences and highly fragile heuristics embedded within the data provided. And likewise, they are pretty universally discriminatory when human data is involved.