An excerpt from “A Sea of Data: Apophenia and Pattern (Mis-)Recognition” by Hito Steyerl:
“The NSA’s SKYNET program was trained to find terrorists in Pakistan by sifting through cell phone customer metadata. But experts criticize the NSA’s methodologies. “There are very few ‘known terrorists’ to use to train and test the model,” explained Patrick Ball, a data scientist and director of the Human Rights Data Analysis Group, to Ars Technica. “If they are using the same records to train the model as they are using to test the model, their assessment of the fit is completely bullshit.”18
Human Rights Data Analysis Group estimates that around 99,000 Pakistanis might have ended up wrongly classified as terrorists by SKYNET, a statistical margin of error that might have had deadly consequences given the fact that the US is waging a drone war on suspected militants in the country and between 2500 and four thousand people are estimated to have been killed since 2004: “In the years that have followed, thousands of innocent people in Pakistan may have been mislabelled as terrorists by that ‘scientifically unsound’ algorithm, possibly resulting in their untimely demise.””
This is a reading we had for DataArt, where we talked about separating “signal” from “noise” in datasets. The point is pretty straightforward: how can you train a model (important as it is) on broken data? Or test it with the same data it was trained on?
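Ball’s objection can be shown in a few lines. Here is a minimal, hypothetical sketch (toy data, stdlib only): a 1-nearest-neighbour classifier trained on pure noise, where the labels carry no signal at all. Scored on its own training records it looks perfect; scored on held-out records it collapses to coin-flipping. This is exactly why “same records to train and to test” makes the assessment of fit meaningless.

```python
import random

random.seed(0)

# Hypothetical toy data: 5-dimensional noise vectors with random 0/1 labels.
# There is NO real signal here, so an honest test score should hover near 50%.
def make_data(n):
    return [([random.random() for _ in range(5)], random.randint(0, 1))
            for _ in range(n)]

train = make_data(200)
test = make_data(200)

def predict(x, data):
    # 1-nearest-neighbour: return the label of the closest training point.
    def dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return min(data, key=lambda d: dist(d[0], x))[1]

def accuracy(eval_set, train_set):
    hits = sum(predict(x, train_set) == y for x, y in eval_set)
    return hits / len(eval_set)

# Scoring on the training records: each point is its own nearest neighbour,
# so the model "rediscovers" every label -> a perfect, meaningless score.
print(accuracy(train, train))   # 1.0

# Scoring on held-out records reveals there was never any signal at all.
print(accuracy(test, train))    # roughly 0.5
```

The gap between the two printed numbers is the whole argument: a flawless in-sample score proves nothing about whether the model found terrorists or just memorized its handful of examples.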
There is A LOT*10^84 of machine-learning work happening in intelligence and security all over the world. For me, this field is all messed up, precisely because it kinda works. I assume that the inherent biases in a model that is supposed to detect insurgents among refugees and asylum seekers are crucial to its output. But on the other hand, innocent people get hurt.
So my question is: what price are we willing to pay? That is, how biased are we willing to be in order to train these models better, prevent mistakes, and save lives? How many faulty drone attacks on unrelated houses need to happen before the next family is spared?