Huygens Institute tackles historical bias in datasets
The Huygens Institute has been awarded an NWO grant to tackle bias and subjectivity in historical datasets. The Combatting Bias project aims to change the way researchers and AI systems work with historical data. The project is embedded in an international network.
NWO is funding the Huygens Institute’s Combatting Bias project through a TDCC-SSH grant. NWO sees ‘bias’ as a major problem in social science and humanities data and recognises the need for an ethical framework. Combatting Bias will establish guidelines for critical reflection when compiling and structuring datasets, with the aim of identifying and reducing bias in those datasets.
‘In this way, society will get more nuanced and diverse representations of the past: representations that include the voices and experiences of those who historically went unheard or were heard less,’ promises Lodewijk Petram, project leader. ‘This also has positive effects on AI systems, such as language models, in which historical bias often surfaces unintentionally.’
Biased narratives
Especially in the social sciences and humanities, datasets often reflect the perspectives of historical power brokers, perpetuating unequal relationships.
‘It is the job of dataset creators to notice skewed representations of the past in datasets, straighten them out as much as possible, and inform users about this,’ believes Manjusha Kuruppath, coordinator of the project. ‘But the creators of datasets could use help with this.’
Over the next year, the Combatting Bias team will join four projects collecting data from colonial archives, an eminently biased data source. Using principles from data ethics, it will advise on the use of terminology and categorisations that do justice to the past.
Different perspectives
Collaboration is central to the project. ‘By working with a diverse group of advisers, we can draw on a wealth of expertise, experiences and perspectives to develop comprehensive guidelines that effectively address bias in social science and humanities datasets,’ says Mrinalini Luthra, data steward at the initiative.
Advisers and partners contribute regional perspectives from the Netherlands, Palestine, the US and South Africa, and expertise ranging from the study of music across cultures to social science research, political activism and FAIR data practices. FAIR stands for Findable, Accessible, Interoperable and Reusable. It means that both humans and computers should be able to find, understand and (re)use the data.
The project will be embedded within the Huygens Institute and the International Institute of Social History (IISH). Data stewards from both institutes’ data management departments will be assigned to the project. These stewards will collaborate with partner projects to develop guidelines for reducing bias in dataset and knowledge creation. Furthermore, they will ensure that insights gained from the Combatting Bias project are integrated into broader institutional contexts and practices.
Individual or commodity
GLOBALISE, a partner project providing online access to the vast archives of the VOC (Dutch East India Company), regularly faces the challenges that Combatting Bias will address. While working on a dataset of goods mentioned in the VOC archives, the team faced the question of whether enslaved people should be included in the dataset. The project did not want to conceal the violent history of slavery, but neither did it want the dataset to lend any legitimacy to the VOC’s treatment of enslaved people. GLOBALISE took an intersectional approach, including enslaved people in the dataset as commodities, but also as individuals with rich personal histories, relationships and social positions. This helps researchers write histories that do justice to the humanity of enslaved people.