Responsible data science
Embracing a more holistic approach to digital technology
The accelerating use of digital technology over the last decade has raised thorny questions about ownership, privacy, accuracy, and bias. To address these issues, some concerned scientists have begun to call for “responsible data science”—a new approach that requires thinking critically about the increasing interactions between people and digital technology.
“The potential benefits to society of improved data-driven discovery and decision making are clear,” said Lise Getoor, UC Santa Cruz professor of computer science, director of the D3 Data Science Research Center, and a leader of this nascent call-to-action. “But we also need to take into account unintended societal effects of technological systems and seek a deeper understanding of the ethical implications.”
From social media to biological systems to the Internet of Things, there is a pressing need for principled computational methods that intentionally consider the full range of interactions in richly interconnected systems, Getoor said. Among other projects, her research group is working to develop such methods; one example is an open-source software toolkit they’ve created called probabilistic soft logic (PSL).
PSL combines statistical science and logic to process and interpret highly interconnected systems without losing contextual information. The highly scalable toolkit lets users take a more holistic approach to modeling and interpreting the connections between data in a particular context. Getoor’s team has successfully used the toolkit to understand cyberbullying, analyze online debates and online learning, study the effect of severe environmental events on human trafficking, and integrate multiple sources to probe how drugs might interact in the human body. “We model the interactions and dependencies,” Getoor said. “In this way we are looking at richer and more nuanced models than people often do within machine learning.”
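To give a rough sense of what combining logic with statistics can look like, here is a minimal sketch of the "soft logic" idea: truth values range anywhere from 0 to 1 instead of being strictly true or false, and a weighted rule is penalized by how far it is from being satisfied. This is an illustration only, not PSL's actual API or the team's code. It assumes the Łukasiewicz-style operators commonly used for soft logic, and the rule, predicate names, truth values, and weight are all hypothetical.

```python
# Illustrative sketch only: NOT the PSL toolkit's API.
# Assumes Lukasiewicz-style soft logic; all names and numbers are hypothetical.

def soft_and(a: float, b: float) -> float:
    """Soft conjunction of two truth values in [0, 1]."""
    return max(0.0, a + b - 1.0)


def distance_to_satisfaction(body: float, head: float) -> float:
    """How far the rule 'body -> head' is from being satisfied.

    Zero when the head is at least as true as the body.
    """
    return max(0.0, body - head)


def weighted_penalty(weight: float, body: float, head: float) -> float:
    """Hinge-style penalty: the rule's weight times its distance to satisfaction."""
    return weight * distance_to_satisfaction(body, head)


# Hypothetical rule from an online-debate setting:
#   Friends(A, B) AND Supports(A, topic) -> Supports(B, topic)
friends_ab = 0.9   # assumed strength of the connection between A and B
supports_a = 0.8   # assumed degree to which A supports the topic
supports_b = 0.4   # candidate value for B's support

body = soft_and(friends_ab, supports_a)
penalty = weighted_penalty(2.0, body, supports_b)
print(f"body truth = {body:.2f}, penalty = {penalty:.2f}")
```

In a sketch like this, a model would look for truth values that keep the total penalty across many such rules low, which is one way the dependencies between connected people or records can be taken into account rather than treating each data point in isolation.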
Responsible data science is especially warranted for projects intended to promote social good, Getoor said. Such work might include, for example, applications in education, the environment, health, and homelessness, as well as the use of data to inform criminal justice issues. “Complex societal problems require more holistic approaches that account for all of the stakeholders, their values, and the interactions between them,” she said.
In addition to new tools like PSL, embracing responsible data science will require collaboration among scientists across disciplines, Getoor said. “Computational scientists need to work with domain experts who can provide additional context about interpretations,” she said. “And then both need to work with ethicists to ask, what are the benefits? What are the potential harms?”
“Lise Getoor is leading the way in trying to bridge the gaps between these communities,” said Bill Howe, associate professor in the Information School at the University of Washington and head of Urbanalytics, an interdisciplinary group that applies responsible data science to advance urban projects. “She’s a data science expert who is able to put her work in the context of the social sciences.”
Such interdisciplinary collaborations will mean raising awareness among computer scientists about the potential impacts of the algorithms they develop. But, equally importantly, Getoor said, everyone should receive basic literacy training in computational and data-science methods, to help them critically interpret the output of the data-driven and algorithmic systems that will only become more commonplace in the future. “People need to understand the ways in which data-driven systems are useful but also how they can go wrong,” said Getoor.