Graduate students participating in the Data Driven Decision Making at Tufts initiative learn to leverage analysis in solving problems and engaging communities
Data science as a discipline is growing by leaps and bounds: the U.S. Bureau of Labor Statistics expects U.S. employment in data science to grow 35% between 2022 and 2032. Thanks to a $3 million National Science Foundation-funded initiative, students at Tufts have been taking a deeper dive into the field, learning how to apply data science to a wide range of scientific and policy questions.
Launched in 2020, the Data Driven Decision Making at Tufts initiative has funded 15 research fellows from the School of Engineering and Graduate School of Arts and Sciences and educated many other students from across Tufts. Problem-focused immersion is a hallmark of the effort, with students learning by doing.
Each year, a new cohort of graduate students starts work on a complex problem or question, from contamination of the water supply in Flint, Michigan, to the relationship between political instability and food, energy, and water insecurity. The students come to the initiative from many academic disciplines, including data visualization, environmental engineering, particle physics, and the humanities.
Joe Hilleary, EG24, a master’s student in computer science, took the second year’s class, and then returned the following year as a research fellow. “These are big topics, and more importantly they’re the kind of topics that can only really be tackled from a multidisciplinary perspective,” he says. “They don’t fit nicely inside a single domain.” A recent graduate, he began work this summer as a data analyst with the MBTA in Boston.
Extracting Key Points in Huge Datasets
In the first year of the Data Driven Decision Making initiative, called D3M, the problem-focused immersion class studied lead contamination of the Flint water supply under the direction of Shafiqul Islam, professor in the Department of Civil and Environmental Engineering.
Students quickly understood the sheer breadth of the topic. Rather than their initial big idea of predicting or preventing the “next Flint,” they chose to establish models of how scientists and policymakers can responsibly engage and communicate with the communities they serve.
Ph.D. student Kevin Smith and M.S. student Peter Nadel combined their backgrounds in water resources engineering and classics, respectively, to analyze data gathered through Freedom of Information Act (FOIA) requests. The enormous datasets were rich with information, but their sheer size rendered them largely inaccessible.
Smith and Nadel identified tools and techniques to search the database and sort through the more than 300,000 images in the dataset, and made their demo publicly available online. They continued the project after the end of their class, thanks to support from a Tufts Data Intensive Studies Center seed grant. They are now working on packaging their efforts into an open-source framework for analyzing FOIA data, with the end goal of making information more freely available to policymakers and interested communities.
Overcoming Real-World Data Shortcomings
Remco Chang, professor of computer science, led the group in the initiative’s second year. The class initially focused on the use of data in communicating misinformation. “As we began to dig into data journalism, it seemed that to the extent there was misleading information, it stemmed from a disconnect between the data analysis and the narrative in which it was embedded,” says Hilleary. “We were looking less at a technical phenomenon and more of a rhetorical one.”
The third year’s group, guided by Jonathan Lamontagne, associate professor of civil and environmental engineering, studied how to choose the right tools for the right questions. They started using integrated assessment models, which seek to capture linkages between socioeconomics, human sectors like energy and food, and climate.
Using real-world data and thorny questions for the projects meant facing real-world headaches, too. In problem sets developed for classroom use, students assume that they are given clean datasets and that the model will work. That’s not the case in many research environments, and that wasn’t the case for the problem-focused immersion students, who experienced those challenges firsthand.
“Our projects rarely went according to plan, but we were allowed to play within that space and ended up learning in a very engaging way,” says Catherine Knox, a Ph.D. candidate in civil and environmental engineering who was a Data Driven Decision Making 2023-2024 research fellow.
She and her classmates looked at the Pardee-RAND Index, which measures food, energy, and water insecurity. In researching it, they found a relationship between those insecurity factors and political instability. “Thinking ahead, we could assess an ensemble of potential futures to be able to identify areas of elevated risk of food, energy, and water-related conflict, acknowledging that the actual presence of conflict is also highly dependent on context and local institutions,” Knox says.
This fall, the graduate students will be taught by Abani Patra, director of the Data Intensive Studies Center and Stern Family Professor of Engineering, whose primary appointments are in computer science and mathematics.
Guided by these Data Driven Decision Making initiative faculty members and their colleague David Hammer, co-director of Tufts’ Institute for Research on Learning and Instruction and professor of education, each successive group builds on the work done by those who came before.
Many D3M fellows and students have also chosen to keep working with subsequent student groups long after receiving a grade for their own class. “This is a community of lifelong learners interested in understanding what it takes to address complex social and environmental issues,” says Smith.