The Operation and Purpose of the PhotoDNA Algorithm

The internet has made many things easier, from keeping in touch with friends and family to getting a job and even working remotely. The benefits of this connected system of computers are immense, but there’s a downside too.

The internet is a global network that, unlike a nation-state, no single government or authority can control. Consequently, illegal material, including images of child abuse, ends up online, and it’s incredibly hard to protect the children involved and catch those responsible.

However, a technology co-developed by Microsoft called PhotoDNA is a step towards creating a safer online space for kids and adults alike.

What Is PhotoDNA?

PhotoDNA is an image-identification tool, first developed in 2009. Although primarily a Microsoft-backed service, it was co-developed by Professor Hany Farid of Dartmouth College, an expert in digital photo analysis. The purpose of PhotoDNA is to identify illegal images, including Child Sexual Abuse Material, commonly known as CSAM.

As smartphones, digital cameras, and high-speed internet have become more commonplace, the amount of CSAM found online has grown. In an attempt to identify and remove these images, alongside other illegal material, the PhotoDNA database contains millions of entries for known images of abuse.

Microsoft operates the system, and the database is maintained by the US-based National Center for Missing & Exploited Children (NCMEC), an organization dedicated to preventing child abuse. Images make their way to the database once they’ve been reported to NCMEC.

Although not the only service to search for known CSAM, PhotoDNA is one of the most common methods and is used by many digital services, including Reddit, Twitter, and most Google-owned products.

In the early days, PhotoDNA had to be physically set up on-premises, but Microsoft now operates the cloud-based PhotoDNA Cloud service. This allows smaller organizations without a vast infrastructure to undertake CSAM detection.

How Does PhotoDNA Work?

When internet users or law enforcement agencies come across abuse images, they report them to NCMEC via the CyberTipline. The images are cataloged, and the information is shared with law enforcement if it hasn’t been already. The images are then uploaded to PhotoDNA, which creates a hash, or digital signature, for each individual image.

To get to this unique value, the software converts the photo to black and white, divides it into squares, and analyses the resulting shading. The unique hash is added to PhotoDNA’s database, which is shared between physical installations and the PhotoDNA Cloud.
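The steps just described (grayscale conversion, division into squares, analysis of shading) can be illustrated with a toy example. This is only a sketch of the general idea and not PhotoDNA's actual algorithm, which is not detailed here. The function names, the 4x4 grid, the luminance weights, and the rule "one bit per square, set when the square is brighter than average" are all assumptions made for illustration. The image is assumed to be a list of rows of (red, green, blue) pixels at least as large as the grid.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # (red, green, blue), each 0-255


def to_grayscale(image: List[List[Pixel]]) -> List[List[float]]:
    """Convert RGB pixels to a single brightness value (assumed BT.601 weights)."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row] for row in image]


def block_means(gray: List[List[float]], grid: int) -> List[float]:
    """Split the image into grid x grid squares and average the shading in each."""
    height, width = len(gray), len(gray[0])
    means = []
    for by in range(grid):
        for bx in range(grid):
            y0, y1 = by * height // grid, (by + 1) * height // grid
            x0, x1 = bx * width // grid, (bx + 1) * width // grid
            cells = [gray[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            means.append(sum(cells) / len(cells))
    return means


def toy_hash(image: List[List[Pixel]], grid: int = 4) -> int:
    """Hypothetical signature: one bit per square, 1 if brighter than average."""
    means = block_means(to_grayscale(image), grid)
    overall = sum(means) / len(means)
    bits = 0
    for mean in means:
        bits = (bits << 1) | (1 if mean > overall else 0)
    return bits


# An 8x8 test image: left half white, right half black.
half = [[(255, 255, 255)] * 4 + [(0, 0, 0)] * 4 for _ in range(8)]
print(format(toy_hash(half), "016b"))  # prints 1100110011001100
```

Because the signature depends on relative shading rather than exact pixel values, a slightly dimmed copy of the same picture produces the same bits in this sketch, and the original picture cannot be rebuilt from those 16 bits.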

Software providers, law enforcement agencies, and other trusted organizations can implement PhotoDNA scanning in their products, cloud software, or other storage mediums. The system scans each image, converts it into a hash value, and compares it against the CSAM database hashes.
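The comparison step can be sketched in the same spirit. The example below assumes signatures are plain integers and treats two signatures as a match when they differ in only a few bits. The Hamming-distance measure, the `max_distance` threshold, and the function names are illustrative assumptions, not a description of how PhotoDNA itself compares hashes.

```python
from typing import Iterable, Optional


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions in which two signatures differ."""
    return bin(a ^ b).count("1")


def find_match(candidate: int, known_hashes: Iterable[int],
               max_distance: int = 2) -> Optional[int]:
    """Return the closest known signature within max_distance bits, else None."""
    best = None
    best_distance = max_distance + 1
    for known in known_hashes:
        distance = hamming_distance(candidate, known)
        if distance < best_distance:
            best, best_distance = known, distance
    return best


# Hypothetical database of previously identified signatures.
known = [0xCCCC, 0x0F0F]
print(find_match(0xCCCD, known))  # one bit away from 0xCCCC, so it matches
print(find_match(0x1234, known))  # far from every entry, so no match
```

The second lookup shows the limitation of any hash-matching approach: a picture whose signature is not already in the database produces no match, however harmful its content.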

If a match is found, the responsible organization is alerted, and the details are passed on to law enforcement for prosecution. The images are removed from the service, and the user’s account is terminated.

Importantly, no information on your photos is stored, the service is fully automated with no human involvement, and you can’t recreate an image from a hash value.

Since 2015, organizations have been able to use PhotoDNA to analyze videos, too.

In August 2021, Apple broke ranks with most other Big Tech firms and announced it would use its own service to scan users’ iPhones for CSAM.

Understandably, these plans received considerable backlash for appearing to violate the company’s privacy-friendly stance, and many people worried that the scanning would gradually include non-CSAM, eventually leading to a backdoor for law enforcement.

Does PhotoDNA Use Facial Recognition?

These days, we’re familiar enough with algorithms. These coded instructions show us relevant, interesting posts on our social media feeds, support facial recognition systems, and even decide whether we get offered a job interview or get into college.

You might think that algorithms of this kind, which make their own judgments about what a picture shows, would be at the core of PhotoDNA, but automating image detection in this way would be highly problematic. It’d be incredibly invasive and would violate our privacy, and that’s not to mention that such algorithms aren’t always right.

Google, for example, has had well-documented issues with its facial recognition software. When Google Photos first launched, it offensively miscategorized black people as gorillas. In March 2017, a House oversight committee heard that some facial recognition algorithms were wrong 15 percent of the time and more likely to misidentify black people.

These types of machine learning algorithms are increasingly commonplace but can be challenging to monitor appropriately. Effectively, the software makes its own decisions, and you have to reverse engineer how it arrived at a specific outcome.

Understandably, given the type of content PhotoDNA looks for, the effect of misidentification could be catastrophic. Fortunately, the system doesn’t rely on facial recognition and can only find pre-identified images with a known hash.

Does Facebook Use PhotoDNA?

As the owner and operator of the world’s largest and most popular social networks, Facebook deals with a lot of user-generated content each day. Although it’s hard to find reliable, current estimates, analysis in 2013 suggested that some 350 million images are uploaded to Facebook each day.

This will likely be a lot higher now as more people have joined the service, the company operates multiple networks (including Instagram and WhatsApp), and we have easier access to smartphone cameras and reliable internet. Given its role in society, Facebook must reduce and remove CSAM and other illegal material.

Fortunately, the company addressed this early on, opting into Microsoft’s PhotoDNA service in 2011. Since the announcement over a decade ago, there’s been little data about how effective this has been. However, 91 percent of all reports of CSAM in 2018 were from Facebook and Facebook Messenger.

Does PhotoDNA Make the Internet Safer?

The Microsoft-developed service is undoubtedly an essential tool. PhotoDNA plays a crucial role in preventing these images from spreading and may even help at-risk children.

However, the main flaw in the system is that it can only look for pre-identified pictures. If PhotoDNA doesn’t have a hash stored, then it can’t identify abusive images.

It’s easier than ever to take high-resolution abuse images and upload them, and abusers are increasingly turning to more secure platforms like the Dark Web and encrypted messaging apps to share the illegal material. If you’ve not come across the Dark Web before, it’s worth reading about the risks associated with the hidden side of the internet.