Data Is What We Make of It—For Good or Ill

Catherine D’Ignazio, J97, shows how data can expose inequity—or contribute to it

A few years ago, if you used an online program to translate the phrase “She won a Nobel Prize” from English to Hungarian, which has non-gendered pronouns, and then back to English, the result would have been “He won a Nobel Prize.” Why? Because of an automatic presumption in the program that if a pronoun were needed in English, it should be masculine.

But if you do that now with, say, Google Translate, the phrase comes back simply “won the Nobel Prize.” Granted, it’s awkward—but somebody’s learned a lesson.
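The round-trip test itself is easy to reproduce. Below is a minimal sketch in Python; `stub_translate` is a stand-in for a real translation service (the article doesn't tie the example to any specific API), written to mimic the pre-fix behavior: Hungarian's pronoun "ő" covers both "he" and "she," and the English side defaulted to masculine.

```python
# Illustrative stand-in for a machine translation service; it mimics
# the biased behavior described above, not any real API.
def stub_translate(text: str, src: str, dst: str) -> str:
    if src == "en" and dst == "hu":
        # Hungarian has a single, non-gendered third-person pronoun.
        return text.replace("She ", "Ő ").replace("He ", "Ő ")
    if src == "hu" and dst == "en":
        return text.replace("Ő ", "He ")  # the biased masculine default
    return text

def round_trip(phrase: str) -> str:
    """Translate English -> Hungarian -> English."""
    return stub_translate(stub_translate(phrase, "en", "hu"), "hu", "en")

print(round_trip("She won a Nobel Prize"))  # -> "He won a Nobel Prize"
```

The gender information is genuinely lost on the way into Hungarian, so whatever comes back on the return trip reveals the system's default assumption.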

That’s one problem with all the data flowing around us—implicit biases in how it is collected and used—but some people are trying to fix that.

One of those people is Catherine D’Ignazio, J97. She’s an assistant professor of urban science and planning at MIT, and co-author of the recent book Data Feminism, with Lauren F. Klein. It’s a deep dive into the world of data and how it intersects with feminism, bias, and justice.

D’Ignazio is not just trying to fix biases; she is actively harnessing data to advance equity. She and colleagues run a project that supports nonprofits combating femicide in the Americas, and she has led reproductive justice hackathons and designed global news recommendation systems. Her work extends to art, too—on February 11, she’s giving the plenary lecture at the Art Datathon hosted at Tufts.

Definitions come in handy here. “Data is information that is systematically collected and tabulated,” says D’Ignazio. A single fact is not data until it’s been assembled along with other observations of a similar type.

Feminism, she says, encompasses three things. “First, it’s the belief in equality for all genders. Second, it also implies a political commitment, because if you believe all genders are equal, you’re committing to taking action to realize that belief. Third, it is an intellectual heritage, meaning learning from all the amazing feminist scholarship and action over time.”

Data can be problematic in many ways. One example: facial recognition systems. Four years ago, Joy Buolamwini, an MIT colleague of D’Ignazio, used facial recognition software to see how it worked on herself, a Black woman—and the system failed even to detect her face. Then she put on a white theater mask, which was instantly recognized. “For women with darker skin in particular, there were about 35% error rates,” says D’Ignazio.

How did this happen? It turns out the data set used by the application to learn human faces was “heavily pale and male,” she says. Even before creating the algorithms that seem biased, someone made a decision about which data to collect in the first place—and which not to collect. “Even though data and algorithms are not human things, they are products of these collective human decisions and reflect our biases as a society,” she says.
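The 35% figure D’Ignazio cites comes from breaking error rates down by subgroup rather than reporting one overall number. Here is a minimal sketch of that kind of intersectional audit in Python; the records are illustrative stand-ins, not results from the actual study.

```python
# A toy intersectional error-rate audit: tally classification errors
# per (skin type, gender) subgroup instead of one aggregate rate.
from collections import defaultdict

# (skin_type, gender, classified_correctly) -- made-up examples
records = [
    ("darker", "female", False),
    ("darker", "female", True),
    ("darker", "male", True),
    ("lighter", "female", True),
    ("lighter", "male", True),
    ("lighter", "male", True),
]

totals, errors = defaultdict(int), defaultdict(int)
for skin, gender, correct in records:
    totals[(skin, gender)] += 1
    if not correct:
        errors[(skin, gender)] += 1

# Reporting per subgroup, not just overall, is what exposes the gap.
for group, n in totals.items():
    print(f"{group}: {errors[group] / n:.0%} error rate over {n} faces")
```

An overall accuracy number can look respectable even when one subgroup fails a third of the time; disaggregating is what makes the bias visible.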

Dialogue in 2,000 screenplays broken down by gender. At left, 100% of the words are spoken by males; at right, 100% by females. Image: Matt Daniels for The Pudding

In response, several companies offered a much more varied data set to train the facial recognition software. But that’s not enough, D’Ignazio says. “It’s also thinking about who these technologies serve—who are they being developed for?” She points out that Black populations are among the most heavily surveilled in the United States; it’s not clear that a better data set for facial recognition will in fact be beneficial to Black women. 

“Are these outcomes ultimately in the service of justice and rebalancing inequality, or are they disproportionately going to be used to harm?” she asks. “You can have a beautifully representative data set that then is ultimately deployed in very harmful ways to communities.”

Investigating the Algorithms

It’s this complexity that is especially important to understand, D’Ignazio says, particularly in areas like artificial intelligence. AI is permeating many aspects of our lives now, she says, from social work to medicine to law. “One of our principles is to challenge power,” she says. One way to do that is to audit algorithms that form the core of AI programs.

She applauds the journalists and computational social science researchers who are “at the forefront of this work to audit algorithms in the public interest.” They can’t get into the proprietary software’s source code, but they can carefully judge an AI program by its results and bring its flaws to light.

“If data is power, then how do we democratize that power and put that power in the hands of a more dispersed set of actors, rather than only the very large elite corporations and well-resourced governments?” —Catherine D’Ignazio

For example, a few years ago reporters from ProPublica discovered that an artificial intelligence program that many judges rely on to determine sentencing based on potential recidivism rates is deeply flawed. It is presented as free from bias, but in fact is deeply biased against Black defendants: the program recommended much more lenient terms for violent white offenders than for non-violent Black offenders, and the reporters found that the white offenders, when released, were far more likely to end up back in prison than the Black offenders.
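This is a black-box audit in the sense described above: the reporters couldn't inspect the program's code, only its outputs. A core disparity check of the kind ProPublica published compares, by group, how often defendants labeled "high risk" did not in fact reoffend. Here is a minimal sketch of that check; the records below are illustrative, not the actual data.

```python
# A toy false-positive-rate comparison across groups, in the spirit
# of a black-box audit of recidivism risk scores.
from collections import defaultdict

# (group, labeled_high_risk, reoffended) -- made-up examples
records = [
    ("Black", True, False),
    ("Black", True, True),
    ("Black", True, False),
    ("white", True, True),
    ("white", False, True),
    ("white", True, True),
]

flagged, false_pos = defaultdict(int), defaultdict(int)
for group, high_risk, reoffended in records:
    if high_risk:
        flagged[group] += 1
        if not reoffended:
            false_pos[group] += 1  # flagged high risk, but didn't reoffend

for group, n in flagged.items():
    print(f"{group}: {false_pos[group] / n:.0%} of 'high risk' labels "
          "were false positives")
```

If one group's "high risk" labels turn out wrong far more often than another's, the scoring system is making different kinds of mistakes for different people, whatever its overall accuracy.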

Investigating AI is about “pulling out the technical innards and placing them in the public square and saying, we have to have this conversation together,” D’Ignazio says. “It can’t only be big tech deciding. It also can’t only be big government deciding. We all need to get together and have a public process about this.”

Data can be powerful in highlighting inequities, too. That’s where data visualization plays a key role, helping us understand information in a deeper way than written descriptions could achieve.

In Data Feminism, D’Ignazio and Klein include a striking graphic that depicts how much of 2,000 screenplays’ dialogue is delivered by male speakers and how much by female speakers. It shows how men dominate onscreen conversations—and seeing it displayed visually is much more effective than reading about it.
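A chart in this style is simple to build. Below is a minimal Python sketch using matplotlib, with each film placed on a scale from all-male to all-female dialogue; the percentages are illustrative, not the Pudding's actual data.

```python
# A toy version of the dialogue-share chart: stacked horizontal bars
# from 0% to 100%, with a dashed line marking gender parity.
import matplotlib.pyplot as plt

films = ["Film A", "Film B", "Film C", "Film D"]
male_pct = [90, 70, 55, 40]  # share of words spoken by men (made up)
female_pct = [100 - m for m in male_pct]

fig, ax = plt.subplots()
ax.barh(films, male_pct, label="Male dialogue")
ax.barh(films, female_pct, left=male_pct, label="Female dialogue")
ax.axvline(50, color="black", linestyle="--", linewidth=1)  # parity line
ax.set_xlabel("Share of dialogue (%)")
ax.legend()
plt.tight_layout()
plt.show()
```

With a few thousand films stacked this way, the eye immediately picks up how lopsided the distribution is, which is exactly the perceptual strength D'Ignazio describes next.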

“We have this perceptual and cognitive strength, which is that we can discern a lot of different varied information with our eyes,” D’Ignazio says. She credits a psychology class at Tufts with Professor Holly Taylor as her first introduction to the idea of visualization’s importance.

A Restless Learner

D’Ignazio may be focused on data now, but she was an international relations major at Tufts. “I’ve always loved foreign languages and learning about and understanding different cultures,” she says. After graduating, she did tech work for her father’s business, and started getting into computer programming.

Soon she was working at a startup, learning Java programming and Perl scripting, and then headed out on her own as a freelance programmer. Always a restless learner, she added an M.F.A. in media arts and computationally driven art, and then began teaching in addition to her programming work and making art.

But three careers became untenable with the birth of her first child. D’Ignazio headed back to school, this time to MIT’s Media Lab for another master’s degree.

An example of effective data visualization, according to Catherine D’Ignazio—maternal and paternal leave policies by country. Image: The Women’s Atlas, Oxford, UK: Myriad Editions, 2018; used by permission of the author, Joni Seager

She melded those strands at Emerson College, where she was assistant professor of data visualization and civic media in the journalism department. Some students were intimidated by numbers and data, but D’Ignazio taught them to be more confident in their skills and to be skeptical of data served up by others.

In 2019 she joined MIT’s Department of Urban Studies & Planning, where she now teaches and also directs the Data + Feminism Lab, “which uses data and computational methods to work towards gender and racial equity, particularly as they relate to space and place,” she says.

Data Tools for Good Causes

For example, the Data Against Feminicide project works with community-based and nonprofit organizations that monitor gender-based violence in the U.S., Latin America, and Canada. The project is a collaboration with Rahul Bhargava, an assistant professor at Northeastern University.

Project members interview staff at the nonprofits about the methods they use to collect data, and then “build tools and technologies to support them and reduce the labor of collecting the data, which is often very manual, very copy-and-paste, but also very emotionally challenging,” D’Ignazio says.

The groups piloted the data tools, which are built on AI and machine learning but with simple user interfaces, “and gave us ideas, and then we do an iteration and build new tools and deploy those,” she says. “We think of it as participatory artificial intelligence, where the community informs us every step of the way on what should be the next feature or thing that we should do.”

She and Bhargava have also developed a suite of tools called DataBasic, which they use to train journalists, nonprofit organizations, librarians, community-based organizations, and artists about how to use data—focusing on basic data analysis and visualization techniques. “We also help people challenge their own belief in the numbers,” she says. Too often, non-technical people take it as a given that numbers are objective, even though how they were arrived at often isn’t.
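DataBasic itself is a suite of web-based tools, so the following is only a toy illustration of the habit it teaches: before trusting a summary number, ask what the data set leaves out. The values are made up.

```python
# A summary statistic can look authoritative while quietly omitting
# records. Count what's missing before trusting the average.
salaries = [52_000, 48_000, 51_000, None, None, 49_000]

reported = [s for s in salaries if s is not None]
missing = len(salaries) - len(reported)
average = sum(reported) / len(reported)

print(f"Average reported salary: {average:,.0f}")
print(f"Missing records: {missing} of {len(salaries)}")
print("Who wasn't counted, and why?")
```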

We all need to understand how these data come into the world, who collects them, and why, she insists. “What are the set of interests at play? They don’t have to be nefarious—it’s just more that they’re collecting data with a specific purpose in mind, which will capture some things and leave out other things,” she says.

A few years ago, data was called “the new oil” in business publications, and that’s apt, D’Ignazio says, even if not in the way it was originally intended. There are certainly similarities between oil and data: we talk of extracting data, refining data, data as a vital resource, data as a source of new power and wealth.

“I think power is at the center of it,” says D’Ignazio. “If you look at the companies with the most money, they’re the ones which have the capability of collecting, storing, maintaining, analyzing, and deploying very large data sets—Google, Facebook, Microsoft. It really is a way in which power is being concentrated very unevenly.”

That’s why she works on data literacy as well. “If data is power, then how do we democratize that power and put that power in the hands of a more dispersed set of actors, rather than only the very large elite corporations and well-resourced governments?”

Taylor McNeil can be reached at taylor.mcneil@tufts.edu.
