A forthcoming report by the Algorithmic Justice League (AJL), a private nonprofit, recommends requiring disclosure when an AI model is used and creating a public repository of incidents where AI caused harm. The repository could help auditors spot potential problems with algorithms, and help regulators investigate or fine repeat offenders. AJL cofounder Joy Buolamwini coauthored an influential 2018 audit that found facial-recognition algorithms work best on white men and worst on women with dark skin.
The report says it’s crucial that auditors be independent and results be publicly reviewable. Without those safeguards, “there’s no accountability mechanism at all,” says AJL head of research Sasha Costanza-Chock. “If they want to, they can just bury it; if a problem is found, there’s no guarantee that it’s addressed. It’s toothless, it’s secretive, and the auditors have no leverage.”
Deb Raji is a fellow at the AJL who evaluates audits, and she participated in the 2018 audit of facial-recognition algorithms. She cautions that Big Tech companies appear to be taking a more adversarial approach to outside auditors, sometimes threatening lawsuits on privacy or anti-hacking grounds. In August, Facebook prevented NYU academics from monitoring political ad spending and thwarted efforts by a German researcher to investigate the Instagram algorithm.
Raji calls for creating an audit oversight board within a federal agency to do things like enforce standards or mediate disputes between auditors and companies. Such a board could be fashioned after the Financial Accounting Standards Board or the Food and Drug Administration’s standards for evaluating medical devices.
Standards for audits and auditors are important because growing calls to regulate AI have led to the creation of a number of auditing startups, some founded by critics of AI and others that might be more favorable to the companies they are auditing. In 2019, a coalition of AI researchers from 30 organizations recommended outside audits, along with regulation that creates a marketplace for auditors, as part of building AI that people can trust and whose results can be verified.
Cathy O’Neil started a company, O’Neil Risk Consulting & Algorithmic Auditing (Orcaa), in part to assess AI that’s invisible or inaccessible to the public. For example, Orcaa works with the attorneys general of four US states to evaluate financial or consumer product algorithms. But O’Neil says she loses potential customers because companies want to maintain plausible deniability and don’t want to know if or how their AI harms people.
Earlier this year Orcaa performed an audit of an algorithm used by HireVue to analyze people’s faces during job interviews. A press release by the company claimed the audit found no accuracy or bias issues, but the audit made no attempt to assess the system’s code, training data, or performance for different groups of people. Critics said HireVue’s characterization of the audit was misleading and disingenuous. Shortly before the release of the audit, HireVue said it would stop using the AI in video job interviews.
O’Neil thinks audits can be useful, but she says in some respects it’s too early to take the approach prescribed by the AJL, in part because there are no standards for audits and we don’t fully understand the ways in which AI harms people. Instead, O’Neil favors another approach: algorithmic impact assessments.
While an audit may evaluate the output of an AI model to see if, for example, it treats men differently than women, an impact assessment may focus more on how an algorithm was designed, who could be harmed, and who’s responsible if things go wrong. In Canada, businesses must assess the risk to individuals and communities of deploying an algorithm; in the US, assessments are being developed to decide when AI is low- or high-risk and to quantify how much people trust AI.
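To make the output-focused kind of audit concrete, here is a minimal sketch in Python of one check an auditor might run: comparing how often a model hands out a favorable decision to each group. The function names, the sample data, and the choice of a simple rate ratio are assumptions made for illustration; they are not drawn from the AJL report or from any auditor's actual method.

```python
from collections import defaultdict


def selection_rates(records):
    """Return the share of favorable outcomes for each group.

    `records` is an iterable of (group, outcome) pairs, where outcome is
    True when the model produced the favorable decision.
    """
    totals = defaultdict(int)
    positives = defaultdict(int)
    for group, outcome in records:
        totals[group] += 1
        if outcome:
            positives[group] += 1
    return {group: positives[group] / totals[group] for group in totals}


def disparity_ratio(rates):
    """Ratio of the lowest group rate to the highest (1.0 means parity)."""
    highest = max(rates.values())
    if highest == 0:
        # No group received a favorable outcome, so the rates are equal.
        return 1.0
    return min(rates.values()) / highest


# Hypothetical decisions, invented for illustration only.
sample = [
    ("men", True), ("men", True), ("men", True), ("men", False),
    ("women", True), ("women", False), ("women", False), ("women", False),
]

rates = selection_rates(sample)
ratio = disparity_ratio(rates)
print(rates, ratio)
```

A ratio well below 1.0 would flag a gap worth investigating, but it says nothing about why the gap exists or who is accountable for it, which is the territory an impact assessment is meant to cover.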
The idea of measuring impact and potential harm began in the 1970s with the National Environmental Policy Act, which led to the creation of environmental impact statements. Those reports take into account factors from pollution to the potential discovery of ancient artifacts; similarly, impact assessments for algorithms would consider a broad range of factors.