Facial recognition’s ‘dirty little secret’
Millions of online photos scraped without consent
Facial recognition can log you into your iPhone, track criminals through crowds and identify loyal customers in stores.
The technology — which is imperfect but improving rapidly — is based on algorithms that learn how to recognize human faces and the hundreds of ways in which each one is unique.
To do this well, the algorithms must be fed hundreds of thousands of images of a diverse array of faces. Increasingly, those photos are coming from the internet, where they’re swept up by the millions without the knowledge of the people who posted them, categorized by age, gender, skin tone and dozens of other metrics, and shared with researchers at universities and companies.
As the algorithms get more advanced — meaning they are better able to identify women and people of color, a task they have historically struggled with — legal experts and civil rights advocates are sounding the alarm on researchers’ use of photos of ordinary people. These people’s faces are being used without their consent, in order to power technology that could eventually be used to surveil them.
That’s a particular concern for minorities who could be profiled and targeted, the experts and advocates say.
“This is the dirty little secret of AI training sets. Researchers often just grab whatever images are available in the wild,” said NYU School of Law professor Jason Schultz.
The latest company to enter this territory was IBM, which in January released a collection of nearly a million photos that were scraped from the photo hosting site Flickr and coded to describe the subjects’ appearance. IBM promoted the collection to researchers as a progressive step toward reducing bias in facial recognition.
But some of the photographers whose images were included in IBM’s dataset were surprised and disconcerted when NBC News told them that their photographs had been annotated with details including facial geometry and skin tone and may be used to develop facial recognition algorithms. (NBC News obtained IBM’s dataset from a source after the company declined to share it, saying it could be used only by academic or corporate research groups.)
“None of the people I photographed had any idea their images were being used in this way,” said Greg Peverill-Conti, a Boston-based public relations executive who has more than 700 photos in IBM’s collection, known as a “training dataset.”
“It seems a little sketchy that IBM can use these pictures without saying anything to anybody,” he said.
John Smith, who oversees AI research at IBM, said that the company was committed to “protecting the privacy of individuals” and “will work with anyone who requests a URL to be removed from the dataset.”
Despite IBM’s assurances that Flickr users can opt out of the database, NBC News discovered that it’s almost impossible to get photos removed. IBM requires photographers to email links to photos they want removed, but the company has not publicly shared the list of Flickr users and photos included in the dataset, so there is no easy way of finding out whose photos are included. IBM did not respond to questions about this process.
NBC News created a tool, based on the IBM dataset, that lets Flickr users enter their username to see whether their photos are part of the collection.
IBM says that its dataset is designed to help academic researchers make facial recognition technology fairer. The company is not alone in using publicly available photos on the internet in this way. Dozens of other research organizations have collected photos for training facial recognition systems, and many of the larger, more recent collections have been scraped from the web.
Some experts and activists argue that this is not just an infringement on the privacy of the millions of people whose images have been swept up — it also raises broader concerns about the improvement of facial recognition technology, and the fear that it will be used by law enforcement agencies to disproportionately target minorities.
“People gave their consent to sharing their photos in a different internet ecosystem,” said Meredith Whittaker, co-director of the AI Now Institute, which studies the social implications of artificial intelligence. “Now they are being unwillingly or unknowingly cast in the training of systems that could potentially be used in oppressive ways against their communities.”