
Deepfake and AI Generated Security Threats

We live in a period of history where the Internet has become one of the most important tools available to humankind. It has enhanced global communication by orders of magnitude, increased the knowledge available to any given individual by incomprehensible amounts, and provided on-demand entertainment that we could only dream of just a decade ago. Unfortunately, as with any tool this powerful, it has also been used for nefarious purposes. The most prevalent and well-publicized of these malicious uses in recent years has been the proliferation of fake news and propaganda via Web sites, social media, and online communications hubs. This trend has impacted humanity at every level, causing suffering to innocent individuals, and swaying the fates of entire nations by influencing the election of their top officials. However, to this point, that influence has been tempered by the ability of careful, thoughtful people to see through the lies and attempts at trickery by simply using their own common sense and intuition. Now, that safeguard may be taken away by emerging technologies capable of creating depictions of actual and fictional human beings so realistic, so convincing, that even a family member might have a hard time telling it apart from the real thing. This report details the threat technologies like deepfake and AI-generated online personas pose to personal privacy and security, the methods being used to produce these deceptive images and videos, and the countermeasures available to protect against the dystopian threat of a forthcoming information age in which no form of electronic communication can be completely trusted.

Definition and Explanation

Although this report focuses on the singular topic of how AI-based image and video manipulation could pose a global security threat, there are two very distinct forks to this possible threat. The first is the manipulation of images by AI-based software for the purpose of changing the appearance of a real person, or creating a completely fictional person that is indiscernible from one of actual flesh and blood. The second is the use of AI-based software to alter video and audio to make it appear that a person is saying something they did not say or doing something they did not do. Although both classes of software can be used to achieve similar goals, the products they generate vary greatly in the purposes they can serve and the goals they can achieve. For this reason, this section – and some subsequent sections of the report – will be split into two segments: image manipulation and video manipulation.

Image Manipulation

Image manipulation is by no means a new technology. Pictures have been getting "Photoshopped" since Adobe debuted the photo editing software in 1990, and non-electronic methods of image manipulation had been in use for many years prior. However, this report’s focus is specific to methods of manipulating and creating images that employ artificial intelligence to accomplish their goals. While a skilled artist could use Photoshop to make nearly anything seem real, people with that skillset are rare, often requiring years of training to reach that level of aptitude. But, thanks to AI, the same ends can now be achieved by a novice with no experience whatsoever. On the one hand, this may seem relatively benign, encompassing things like cute Instagram and Snapchat filters designed to add crowns to an image or make it look like a person is sticking out a giant dog’s tongue. However, very similar technology can be used to make individuals appear old or weak, or even to make it appear as if they are wearing blackface. Given the constant flood of news the public is inundated with about which politician is apologizing for which controversial blunder, it is easy to see how such images could be employed by purveyors of fake news to sow discord and misinformation.

In a similar vein, those same propagandists now also have tools in their arsenal to create AI-generated images of photorealistic human beings to use as the face for their influence campaigns. Need a young woman to prove that female voters support your controversial candidate? Just cook one up using AI without any of the pesky legal risks of using an image of an actual person who may or may not agree with your point. Need a man to pose as a member of your opposition while writing vile social media posts in order to damage their credibility? Simply compile one using AI to serve as the straw man your plan requires. The ability to create imaginary people on a whim is not, in and of itself, frightening, but the uses to which this option can be put (covered in more detail later in the report) are truly terrifying.

Video and Audio Manipulation

While the aforementioned act of "Photoshopping" an image is common enough to have entered the public vernacular, most people have been fairly confident, to this point in history, that similarly convincing manipulation of video cannot be accomplished by anything short of a big budget Hollywood film. Even then, special effects often fail to completely fool the audience thanks to tell-tale signs of manipulation, computer generated imagery, or camera trickery. However, this is now changing thanks to the ability of newly developed AI-based software to manipulate video in ways that are far more convincing than even the best Hollywood has to offer. Perhaps most frightening of all is the fact that this can be accomplished with relatively few tools, for relatively little cost, using consumer-level hardware many readers likely already have in their homes. If manipulating an image is enough to convince some fake news readers that a person has participated in an illicit act, imagine the impact of a similarly doctored video of that person appearing to do or say something completely unforgivable.

Although deepfake has become something of the face of this type of technology, it is by no means the only tool to accomplish the task, nor is it a monolithic piece of software produced by a single developer. Indeed, deepfake, or some slight variation of the word, is already creeping towards filling the same linguistic purpose as "Photoshop," a catch-all verb designed to reference any AI-based manipulation of a video.

Alongside video manipulation, AI-based audio manipulation has also progressed to the point where it is literally possible to put words into a person’s mouth. Using technology similar to that powering our AI-based digital assistants, bad actors can now take a database of voice clips of a public figure, process that data, and create a tool that allows them to script anything they wish. The result will be a convincing replica of the person in question actually reading that script. Although no examples yet exist of this technology verifiably being used in criminal or nefarious activity, there are already instances of it being used in the entertainment industry to literally put words into the mouths of individuals who have passed on, including a documentary on the late chef Anthony Bourdain that includes quotes never actually spoken by the man, but generated via AI following his death.1 While this use is relatively harmless, if arguably morbid, it could be combined with the aforementioned video manipulation tools to create a recipe for putting any person you choose into any number of compromising, embarrassing, or illegal situations.

Current AI-Based Security Threats

While the previous section laid the groundwork by describing the nature and capabilities of the threats posed by AI-based image and video manipulation, this section dives deeper into the specific ways those threats can be employed, using examples that demonstrate possible vectors of attack, along with a few early instances of malicious parties attempting to use the technology to fool the public.

Image Manipulation

As stated above, image manipulation has posed a danger to the trust we place in photographs for several decades. However, with AI-powered manipulation added to the mix, images can now be changed more quickly, and often more convincingly, than almost any human can match. In fact, one of AI's key deceptive capabilities is its ability to manipulate images in near real time. Imagine a much more advanced version of the Instagram or Snapchat filters mentioned above, one that alters the subject's appearance not into something cartoonish and whimsical but into something realistic and believable. While this may seem like an advanced capability that should require either lengthy processing time or significant computing power, it has already reached the point where it can be accomplished by a simple, freely available smartphone app. This reality has lawmakers greatly concerned.

The best available example of this technology so far is likely a mobile app called FaceApp. The app originally launched in 2017 but took more than a year to come into the spotlight, thanks to a viral surge of celebrities using an age-changing filter to show what they could look like as older or younger versions of themselves. The surprisingly convincing results of the app’s aging or de-aging efforts led many users to try it out for themselves, giving the software and its developers access to detailed photos and facial scans of millions of users, including some very important, very high-profile individuals. It is this information-gathering capability that began to concern lawmakers in mid-2019. The first official backlash came from the US Democratic National Committee (DNC), which advised its members, particularly those participating in active campaigns, to avoid using the software entirely.2 The primary reason for this cautionary statement seemed to be the origin of the app, which was developed by a team located primarily in Russia. The DNC’s well-known history of being hacked by Russian citizens makes the precaution understandable, even without the nearly unfettered access the app requests to the user’s uploaded photos.

Figure 1. FaceApp’s Age Progression Demo Image

While the app is used to alter photos for the user’s own enjoyment, those same images could be manipulated to a politician’s detriment. One need only look at the history of FaceApp itself for a prime example. The app originally launched with "Black, White and Asian" filters in its repertoire.3 Even prior to recent controversies surrounding public figures and their use of or stance on "blackface," the backlash this caused was monumental. The filters were eventually removed from the app and the developer later apologized. Imagine a scenario in which the DNC’s fears prove true and the developers of FaceApp provide access to its image database, knowingly or unknowingly, to Russia-based hackers. A cache of celebrity and politician images could be mined for anyone that the malicious actors want to target for character assassination. The image could then have this "digital blackface" filter applied and be posted somewhere it could be seen by millions of people. The subject of the image could protest its existence, try to prove that it was altered without their knowledge or against their will. But, recent history shows quite well how impossible it is for even the most outlandish fake news stories to be pushed completely out of the public consciousness, even after evidence disproving their claims has been found and verified.

It is the danger posed by this level of access to images of public figures that led Senate Minority Leader Chuck Schumer to call on the FBI and Federal Trade Commission (FTC) to investigate the app.4 Schumer wrote in his letter to the FBI that "FaceApp’s location in Russia raises questions regarding how and when the company provides access to the data of U.S. citizens to third parties, including foreign governments."5 He went on to ask the FBI to "assess whether the personal data uploaded by millions of Americans onto FaceApp may be finding its way into the hands of the Russian government, or entities with ties to the Russian government." Although FaceApp’s developers vehemently deny the idea that they share users' data with any unauthorized third parties, the accusations remain troubling and could very well lead to foreign actors attempting to exploit similar repositories of these types of images in the future, even if they have not already attempted it.

Image Generation

Manipulating the appearance of real-world individuals can undoubtedly be used to damage their public image. But, what if no usable images are available? Or, what if you, as a purveyor of libelous propaganda, wish to create a false narrative that cannot be supported by any amount of image manipulation? How could these new AI-based security threats benefit you? One extremely useful way is by creating a digital straw man through which your narrative can be spread via social media channels. Of course, this avenue of attack generally requires an accompanying online persona to back these opinions. One could simply pilfer a real person’s image, but that brings the risk of being accused of identity theft and having your operation shut down. What if, instead of using a real person, you could create one from whole cloth, tailored specifically to the needs of your political or social stance? This is already quite possible.

Figure 2. An Artificial Headshot Generated by ThisPersonDoesNotExist.com

The availability of this type of technology came into the public eye in 2019 when a Web site titled ThisPersonDoesNotExist.com was launched.6 The Web site employs a technology known as "generative adversarial networks," or GANs – in this case, an implementation developed by graphics hardware maker Nvidia – to create photo-realistic images of human beings that, as the name would suggest, do not actually exist.7 Although the person in the image may slightly resemble any of the actual people from which the various portions of their face were drawn, the final result is a perfect digital Frankenstein’s Monster, almost completely indiscernible from a real image of a real human being. One article on the site’s launch called the images it generates "disturbingly convincing, a warning against trusting images and a whisper of just how gnostically paranoid everything is going to get."8 This description is an apt one that gets to the heart of what this report is truly about: that technology is quickly advancing to the point where it can alter the forms of media we rely on for the truth so thoroughly that any scenario can be made to seem real. While the concept of a complete loss of trust in photographic images may seem unrealistically dystopian, it is, nonetheless, coming to pass, and the criminals and malefactors who could benefit from it are salivating at the ways in which it could help their cause.
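To make the GAN concept concrete, the sketch below shows, in broad strokes, how the adversarial training loop works: a generator network learns to produce images from random noise while a discriminator network learns to tell those fakes from real photographs, each improving in response to the other. This is a minimal, illustrative PyTorch sketch with assumed toy dimensions; it is not Nvidia's StyleGAN or any production face generator.

```python
# Minimal sketch of a generative adversarial network (GAN) training loop.
# Illustrative only -- not StyleGAN or any production face generator.
import torch
import torch.nn as nn

latent_dim, image_dim = 64, 28 * 28  # assumed toy sizes, far below real face resolution

generator = nn.Sequential(                 # maps random noise to a fake "image"
    nn.Linear(latent_dim, 256), nn.ReLU(),
    nn.Linear(256, image_dim), nn.Tanh())

discriminator = nn.Sequential(             # scores how "real" an image looks
    nn.Linear(image_dim, 256), nn.LeakyReLU(0.2),
    nn.Linear(256, 1), nn.Sigmoid())

opt_g = torch.optim.Adam(generator.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=2e-4)
loss_fn = nn.BCELoss()

def train_step(real_images: torch.Tensor) -> None:
    """One adversarial round: the discriminator learns to separate real from
    fake images, then the generator learns to fool the updated discriminator."""
    batch = real_images.size(0)
    fake_images = generator(torch.randn(batch, latent_dim))

    # Discriminator update: push real toward 1, fake toward 0.
    opt_d.zero_grad()
    d_loss = (loss_fn(discriminator(real_images), torch.ones(batch, 1)) +
              loss_fn(discriminator(fake_images.detach()), torch.zeros(batch, 1)))
    d_loss.backward()
    opt_d.step()

    # Generator update: try to make the discriminator output 1 for fakes.
    opt_g.zero_grad()
    g_loss = loss_fn(discriminator(fake_images), torch.ones(batch, 1))
    g_loss.backward()
    opt_g.step()
```

After many thousands of such rounds on a large photo collection, the generator's output becomes difficult for the discriminator, and eventually for people, to distinguish from genuine photographs.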

Video Manipulation

Video of an event is often seen as the ultimate arbiter of its veracity. Eyewitness accounts of something having happened can be misremembered or an outright lie, and images can be manipulated, as noted in detail above. But video was thought to be immune to such deception due to the difficulty that even movie studios have in making mundane, but untrue, events seem completely real. Take, for example, the now-infamous case of Superman’s mustache. Actor Henry Cavill grew a mustache for his role in the sixth film of the Mission: Impossible franchise. Due to a contractual obligation requiring him to maintain his appearance for that role, Cavill was forced to refrain from shaving the mustache during the period in which he was filming his scenes as Superman in the Justice League movie. It was decided that the mustache would have to be digitally removed to keep the very un-Superman facial hair from appearing in Justice League. Despite the seeming simplicity of this special effect and the massive budget available to the film’s makers, the resulting digital alterations to Cavill’s face went down in film history as one of the most embarrassing CGI (computer-generated imagery) failures ever to besmirch a blockbuster offering.9 While this rather lengthy aside may seem like a complete non sequitur, it is included to illustrate the extreme difficulty that even highly skilled professionals have when attempting to convincingly alter the human face while it is in motion in a video. The results, despite the filmmakers’ nearly limitless resources and best efforts, almost always seem slightly off, usually straying into what’s known as the Uncanny Valley, a term referring to the tendency for imperfect artificial replications of human beings to create unease or disgust in the people viewing them.

Ironically, the artificial intelligence that we find so hard to convincingly humanize has surpassed us in the goal of convincingly replicating human motion to the point where video can be produced of a real, recognizable human being doing or saying something that he or she never did or said. As with image manipulation, this new wave of technological trickery can be best shown off by a single example of AI-based video manipulation: deepfake. The term deepfake was created in 2017 when it was demonstrated that the same generative adversarial networks that power FaceApp could be applied to video.10 While the technology has obvious benign or even beneficial applications in the entertainment industry, it has generally made the news not for the special effects it can produce but for the danger it poses. The most obvious, and arguably greatest, of these is the ability to put words into the mouth of a powerful politician or world leader. This was demonstrated most famously by actor and filmmaker Jordan Peele when he commissioned and helped produce a video of former President Barack Obama presenting a public service announcement on the dangers of deepfake.11 In the video, Obama appears to lay out the possibilities deepfake offers, references one of Peele’s own movies, and insults his successor, all before it is revealed that the audio portion of the clip is actually being spoken by Peele himself, with deepfake technology handling the task of nearly perfectly syncing the video’s lips to his words. While Peele used the video to illustrate how easily someone could be fooled by the first few moments of it, he did eventually reveal it to be fake. If, instead, someone produced a similar video of Barack Obama swearing, insulting the sitting president, or saying anything they chose for him to say, the backlash and outrage would be monumental, with Obama’s political enemies likely latching onto the video as the proof they have always wanted that he is a truly terrible human being.

Some might believe that even visually convincing videos such as the Barack Obama example above would never be trotted out by political adversaries for fear of discovery that the content was produced by fraudulent means. Those people would be verifiably wrong. In fact, a simple video editing trick that could be accomplished by even the barest novice video editor has already fooled many high-powered individuals, up to and including the sitting President of the United States, and was subsequently used in an attempt to politically weaponize its content against a political rival. The incident in question, which did not even need to rely on deepfake technology, involved a pair of videos of Speaker of the House Nancy Pelosi. In one of these doctored clips, the footage was slowed down significantly to make it appear that the Speaker was slurring her words during a news conference.12 This was followed by a second clip, aired on the Fox News program Lou Dobbs Tonight.13 In that segment, similar footage was edited in the same fashion while also stringing together several verbal stumbles in order to make it appear that the Democratic Representative was having severe difficulty speaking. The clip was portrayed by Dobbs as evidence that Pelosi was in cognitive decline. Making matters worse is the fact that former President Trump himself tweeted out that exact segment with text saying "PELOSI STAMMERS THROUGH NEWS CONFERENCE."14

Whether Trump was actually fooled by the video or not is largely irrelevant in this case. The true point of this example is simply to show the impact something as simple as a misleading video edit can have on the news cycle. Imagine, then, something as sophisticated as deepfake technology being applied to the same video. Rather than simply appearing to be drunk or suffering from cognitive difficulties, Pelosi could have been made to say whatever a political rival wished. Indeed, the congresswoman could have been made to extol the virtues of Satan or her love for Adolf Hitler, and the video would have been nearly indiscernible from the truth to some human eyes.

Readers may be wondering at this point about the accompanying audio required for these media nightmare scenarios to play out. After all, if Pelosi had never said the things mentioned in the theoretical example above, then there would be no appropriate voice recording to provide the phrases needed to back the visual fakery with audio. Unfortunately for fans of the truth, audio processing technology has also advanced to a point where it is able to produce artificial speech that is nearly indistinguishable from an actual human voice. Two standouts in this area are WaveNet, a Google-backed audio processing technology able to "generate speech which mimics any human voice and which sounds more natural than the best existing Text-to-Speech systems," and Adobe’s Project VoCo, a "Photoshop for voices" that can take as little as twenty minutes of recorded speech from a subject and use it to artificially reproduce that person uttering phrases they have never said in their entire life.15,16 Either of these software solutions, as well as several others currently in development, could be used to handle the speech portion of a deepfake video. Or, one could take a less technical approach and simply find a talented individual to mimic the person’s voice, an option which the aforementioned example of Jordan Peele’s Barack Obama video showed can be eerily effective.

Countermeasures

One may feel that the world is doomed to descend into an apocalyptic confusion in which no recorded image or video can ever be trusted again, and ask, "If this type of technology already exists, what will the news look like in ten years, or twenty? Will it all be AI-generated lies designed to advance the agenda of some corporation or politician?" While that outcome, unfortunately, cannot be entirely ruled out, it most definitely can be avoided. This is because technology is ultimately a tool. In nearly any area of technological development where some humans use that tool for nefarious purposes, others use it to combat those malignant goals. These "good guys" are essentially fighting fire with fire, employing the same technological advancements being put to criminal uses in order to maintain law and order and, particularly in this case, to protect the truth. "White hat" hackers, cybersecurity experts, cryptographers, and many other professionals have all trod this road before, using fraudsters’ own tools against them. This section will cover some of the most important tools currently available for combating image and video manipulation, while also illustrating ways in which the average person can employ their own common sense to fight the influx of AI-powered fake news.

Tech Tools for Fighting Image and Video Manipulation

There are two ways in which fake images and videos can be fought: by stopping them before they are created or by detecting them after they are made. The first is perhaps the more cutting-edge. In a paper titled "Hiding Faces in Plain Sight: Disrupting AI Face Synthesis with Adversarial Perturbations," a team of researchers suggests disrupting the generative adversarial networks (GANs) used by FaceApp, deepfake, and other AI-based image and video manipulation tools by purposely injecting noise into published images.17 Essentially, they suggest introducing randomized digital artifacts into images as a way to reduce their usability by synthesis tools. These artifacts would corrupt the information provided by source images, reducing the quality of the final product to the point that the video or image would have obvious visual distortions or artifacts of its own. While this option sounds promising in theory, the application of such a technique could prove problematic. It would essentially require every publicly available image of a given individual to be digitally altered in order for this noise to be added. This might be feasible for all images published via first-party or official channels, but it becomes less likely that all news outlets capturing images of the individual would comply, and essentially impossible that all private citizens would, or could, participate. Just imagine the number of high-quality selfies that Barack Obama must have taken in his political career. Surely these would provide more than enough information for image or video manipulation, and would never have had the necessary noise added to prevent their use.
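As a rough illustration of the publishing-side workflow the researchers envision, the sketch below simply adds low-amplitude random noise to an image before it is posted. The actual paper computes targeted adversarial perturbations against specific face detection and synthesis models, which requires access to those models; the function name and noise strength here are illustrative assumptions, not a faithful reproduction of that method.

```python
# A greatly simplified stand-in for the "noise injection" idea described above.
# The cited research computes adversarial perturbations tailored to particular
# face detection/synthesis models; this sketch merely adds barely visible random
# noise before publication to illustrate the general workflow.
import numpy as np
from PIL import Image

def perturb_before_publishing(path_in: str, path_out: str, strength: float = 4.0) -> None:
    """Add low-amplitude random noise to an image prior to posting it online."""
    image = np.asarray(Image.open(path_in).convert("RGB"), dtype=np.float32)
    noise = np.random.normal(loc=0.0, scale=strength, size=image.shape)
    perturbed = np.clip(image + noise, 0, 255).astype(np.uint8)
    Image.fromarray(perturbed).save(path_out)

# Hypothetical usage:
# perturb_before_publishing("official_portrait.png", "official_portrait_protected.png")
```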

While preventative measures such as the one above do show promise, and they may eventually reach the point where they can adequately protect a person’s image from being used as a source in the first place, it is with the detection of image and video manipulation that the most immediately promising technology now lies. One of the current leaders in this area is a professor at Dartmouth College named Hany Farid. The professor and one of his graduate students have published literature on a new method of detecting deepfakes that they call a "softbiometric signature."18 This technique uses a process similar to the technology it is trying to counteract by processing hours upon hours of video of a given individual. Rather than using this data to create a fraudulent representation of the person, it detects movements and expressions particular to them. This profile can then be compared to a given video to detect any discrepancies with that person’s typical behavior. The professor, once again using Barack Obama as an example, explained to CNN that the former President has a tendency to frown and tilt his head downward when delivering negative facts, while he tilts it upward when delivering happy news. These tendencies, and others like them, can be compiled into a comprehensive reference against which new videos of an individual can be checked. Farid claims he is also working on other systems that use GANs themselves to detect deepfakes, rather than create them. However, the professor told CNN he is reluctant to go into too much detail on how such detection systems work, because that information could be employed by fraudsters to improve their own technique. A colleague of the professor, Siwei Lyu (director of the computer vision learning lab at the University at Albany, SUNY), illustrated such dangers in the same CNN article, referencing an instance in which he publicly mentioned that deepfakes could be detected by examining the unusual patterns of blinking by the individual in the video. Lyu noted that less than one month after he made this statement someone generated a deepfake video with realistic blinking.19
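A loose sketch of what the comparison step of such a soft-biometric check might look like appears below. It assumes per-frame behavioral features (head tilt, brow movement, smile intensity, and so on) have already been extracted by some facial-analysis tool, and it is not a reproduction of Farid's actual method; the feature set, distance measure, and threshold are all placeholders.

```python
# A loose sketch of the comparison step behind a "soft biometric signature."
# Assumes per-frame behavioral features have already been extracted elsewhere;
# the real research pipeline and feature set are not reproduced here.
import numpy as np

def signature(features: np.ndarray) -> np.ndarray:
    """Build a profile from the pairwise correlations between behavioral
    features across a clip (features has shape [num_frames, num_features])."""
    corr = np.corrcoef(features, rowvar=False)      # feature-by-feature correlations
    return corr[np.triu_indices_from(corr, k=1)]    # flatten the upper triangle

def likely_fake(reference: np.ndarray, suspect: np.ndarray, threshold: float = 0.25) -> bool:
    """Flag the suspect clip if its behavioral profile strays too far from the
    profile built from hours of authentic footage of the same person."""
    distance = np.linalg.norm(signature(reference) - signature(suspect))
    return distance > threshold   # threshold is an arbitrary illustrative value
```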

If this battle between fraud and fraud detection seems like something of an arms race, that is because it very much is, and it is not even a particularly new one. Going back to the earlier mention of Adobe’s Photoshop as one of the first tools available for digital image manipulation, that company has now begun employing AI to detect when someone has used its software to alter an image.20 It does this by looking for edits made with Photoshop’s popular Liquify tool, which allows image manipulators to turn a frown into a smile, change the position of a limb, or alter various other physical characteristics. The company fed an AI program numerous pairs of images, one of which had been altered using the tool. The software eventually learned to detect telltale signs of digital trickery, allowing it to correctly spot alterations in 99 percent of subsequent test images. For comparison, human test subjects were only able to detect the same alterations with an accuracy of 53 percent.

Figure 3. An Example of Adobe’s Detection AI at Work
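The general supervised setup behind such a detector can be sketched as a small convolutional classifier trained on pairs of face crops, one untouched and one warped, learning to predict which is which. Adobe's actual architecture and training data are not reproduced here; the layer sizes and input resolution below are assumptions for illustration only.

```python
# A toy stand-in for the kind of manipulation detector described above: a small
# binary classifier trained on (pristine, warped) pairs of face crops.
# Not Adobe's model; sizes and resolution are assumed for illustration.
import torch
import torch.nn as nn

detector = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(32 * 32 * 32, 1))   # assumes 128x128 RGB input crops

optimizer = torch.optim.Adam(detector.parameters(), lr=1e-4)
loss_fn = nn.BCEWithLogitsLoss()

def train_on_pair(original: torch.Tensor, warped: torch.Tensor) -> None:
    """One training step on a batch of (pristine, manipulated) face crops,
    each shaped [batch, 3, 128, 128]; labels are 0 = untouched, 1 = warped."""
    images = torch.cat([original, warped])
    labels = torch.cat([torch.zeros(original.size(0), 1),
                        torch.ones(warped.size(0), 1)])
    optimizer.zero_grad()
    loss = loss_fn(detector(images), labels)
    loss.backward()
    optimizer.step()
```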

While this tool is not specifically designed to detect the types of AI-generated images referenced throughout most of this report, the technology powering it could certainly be applied to that application. In fact, it is with this AI vs AI scenario that the most promising research into combating deepfakes and AI-manipulated imagery lies. Artificial intelligences can process millions upon millions of images to detect differences so minute that they would be unrecognizable to the human eye. This ability will prove absolutely paramount in the race against fakers and fraudsters. However, now that the genie is out of the bottle, it will almost definitely be a nearly endless battle between those building ever-better fakes and those trying to detect and stop them.

What You Can Do to Detect Image and Video Manipulation

Unfortunately, as mentioned above, humans are not the best at detecting when they are being duped. Deepfakes and AI-generated images have already well outstripped the capabilities of most human beings to detect them. That said, very few manipulations are completely perfect. While some are very, very close, nearly all of them include some minor flaw that, while easily written off as a photographic artifact, could instead be used to out the image or video as a fake. Below is a list of telltale signs to look for in images and videos that may betray the fact that they have been manipulated.

Images

Figure 4. An Example of AI-Generated Imagery Creating Distortion via Incorrect Processing
Source: ThisPersonDoesNotExist.com

Distortion – While AI-based manipulation of images does not employ Photoshop or any human-interface software for that matter, it does employ similar techniques. One of these is distorting an image to make it appear closer to the desired end result. The same techniques used to make models appear impossibly perfect can also be employed to overlay the face of an unwilling victim onto an image of someone else entirely. However, the process of making that source image of the victim’s face fit into the final image requires it to be distorted. If done well, this will take into account everything from perspective to lighting sources and even barely perceptible facets like the way in which light passes through the slightly translucent surface of human skin. However, it is rare that such perfection can be accomplished by humans or machines. Slight variations in the line of a silhouette, an isolated discrepancy in the resolution of a certain area of an image, or a facial or body feature that seems to be bent unnaturally are all examples of the type of distortion that can be a dead giveaway. In the above image, this type of flaw can be seen in the right lens of the woman’s glasses. Rather than correctly rendering the background to match the rest of the image, the AI produced a distorted, blurred spot that includes colors that shouldn’t be present and bent, wavy representations of what should be a fairly clear image of what is behind that lens.

Figure 5. An Example of an AI Attempting to Blur Portions of an Image to Match Features
Source: ThisPersonDoesNotExist.com

Blurring – Both humans and AI can often use subtle blurring to mask portions of an image that have been altered. In the above image, this can be seen in the man’s mouth. While the majority of his face is in sharp focus, the image becomes inexplicably blurry around the left side of his lips and teeth. This is likely due to the mouth portion of the image having been sourced from a photo taken from a slightly different perspective. The automated process of reshaping that mouth to fit the finished product resulted in the pixels contained within that part of the mouth becoming compressed or stretched to the point where the blur was introduced. A similar effect can be seen at the top of the subject’s visible ear. Although this may appear fairly obvious when pointed out, it is still a relatively subtle effect, and one that could very easily be missed when looking at the often low-resolution images used on social media.
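For readers inclined to check images programmatically, one simple heuristic for this kind of localized, unexplained blur is to compare the sharpness of small patches across the image, for example via the variance of the Laplacian; a patch that is far blurrier than the rest of the picture for no optical reason merits a closer look. The sketch below uses OpenCV, and the patch size and ratio are assumed values, not a validated forensic test.

```python
# A simple heuristic for hunting the kind of local, "inexplicable" blur
# described above: measure sharpness patch by patch via Laplacian variance
# and flag patches far blurrier than the image's median. Illustrative only.
import cv2
import numpy as np

def suspicious_blur_patches(path: str, patch: int = 64, ratio: float = 0.15) -> list:
    """Return (x, y) offsets of patches whose sharpness falls well below the
    image's median sharpness, a possible sign of locally masked edits."""
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    scores, offsets = [], []
    for y in range(0, gray.shape[0] - patch + 1, patch):
        for x in range(0, gray.shape[1] - patch + 1, patch):
            region = gray[y:y + patch, x:x + patch]
            scores.append(cv2.Laplacian(region, cv2.CV_64F).var())
            offsets.append((x, y))
    median = np.median(scores)
    return [off for off, s in zip(offsets, scores) if s < ratio * median]
```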

Figure 6. An Example of a Somewhat Obvious Artifact within an AI-Generated Image
Source: ThisPersonDoesNotExist.com

Artifacts – Artifacts in digital images can take many forms, such as strange blocks of pixels, inexplicable rainbow colors where none should be, or simple black spots within an image. In the above example, several artifacts can be seen. The most obvious are the ones present in the woman's hat. The AI that generated this image was unable to correctly replicate the texture of the knitted material. It misconstrued or inadequately distorted its source imagery in several spots, producing white voids surrounded by inexplicable bright spots. This may have been due to the AI interpreting something within its source image as a reflection, or to a simple inability to knit (no pun intended) the gaps in the source images together correctly. Another slightly more subtle example can be seen in the subject’s smile. While all of the other visible teeth appear normal, the last visible incisor on the left shows a strange bubble-like formation that almost appears to overlap the woman’s lip, despite the tooth clearly being behind the lip. This might be another example of reflectivity throwing off the AI, suggesting that it has some difficulty with processing images using strong light sources.

Putting it all together – These are just a few examples of how AI can essentially "mess up" when attempting to create a fake image. Signs like these can appear in human manipulations as well as AI-based manipulations of existing images and those created from whole cloth. Unfortunately, not all fakes will be as imperfect as the three example images provided. This can be seen in the fact that the same AI that produced these also produced the much closer to flawless example in Figure 2 of this report. However, if there are flaws within a given image, humans can be surprisingly good at spotting them. Yes, training and education can play a big part in the ability to detect fakes. However, instinct also has its role. The earlier reference to the "Uncanny Valley" effect is an important one. Nearly everyone has looked at an image or a video and thought "something about this seems … off." That gut feeling is typically the brain detecting some discrepancy between what it is seeing and what it knows a human should look like. It could be something as simple as a lack of reflection in the subject’s eye or the strange musculature visible when they move their mouth. In any case, it is often wise to trust feelings like these. Our brains are better than we give them credit for at spotting manipulation, even if it is often at a subconscious level.

Video

Unfortunately, it is much more difficult to detect a well-faked video. Although there are many reasons for this, the primary one is that videos simply tend to move too fast. This is not to say that the subject of the video is actually moving quickly. Rather, it refers to the fact that each frame of the video is only visible for a fraction of a second, giving the viewer almost no time at all to detect alterations that may have been made to it. Imagine if one of the three example images above was shown for only 1/24th of a second, a common frame rate used for many videos. Would the viewer still be able to detect the same, seemingly obvious discrepancies under those circumstances? Assuming the flaws were gone in the next frame or were replaced by a different set of flaws in each subsequent frame, almost certainly not. The aspect of the human brain that allows us to see rapidly switched static images as motion rather than as a slideshow is the same aspect that could facilitate such trickery. The brain processes a smoothed-over average of the images it is perceiving, often ignoring jagged changes in position between frames and mentally smoothing that motion into a much more natural-seeming one. This same tendency is what could hide the flaws that might be present in only a small portion of the frames within a deepfake. With all of this said, there are still a few signs that can often give away a faked video.

Odd mannerisms – This is a variation on the research of Professor Farid mentioned above. Human beings tend to have a unique set of mannerisms. Certain facial expressions, talking with one's hands, emotional tells, and more can all identify a person even if their appearance is somehow obscured. Actors study these physical traits for years in an attempt to hone their craft, often still failing to hide their own tendencies within the artificial mannerisms of a given role. This method of detection works best when the viewer is already familiar with the mannerisms of the subject of a potentially fake video. This could be because the subject is a friend or family member, or simply because the subject is a famous celebrity or politician. This is the rare case when being well-known to the public consciousness may provide protection against fakery.

Skin tone discrepancies – A person’s skin tone is a highly unique characteristic of their appearance. Some AIs are capable of altering the tone and color of their source imagery to create a convincing series of frames within a video, even if those frames were drawn from source images with subjects of varying skin tones. However, an adequate selection of source imagery is not always available. This can lead to a subject’s face, typically drawn from the actual target of the fake, not matching the skin tone of the body, which is often drawn from third-party sources that may not involve the target of the fake at all. In a similar vein, skin tones change based on the available lighting. Even if the AI is able to correctly match factors like facial and body skin tones, it might not also match those tones to the apparent light source in a video. Imagine, for example, a video in which the subject’s left hand appears lighter than their right, even though the light source is situated closer to the subject’s right side. This odd bit of shadow could be a dead giveaway.

Muscle movement – This aspect of faked videos may be the easiest to spot. Realistic muscle movement from computer-generated characters has been something of a holy grail in Hollywood for decades. Unfortunately for makers of sci-fi and action films, it has rarely been achieved. While some AIs are already better at it than some big-budget filmmakers, it is still very, very difficult to precisely match the minuscule muscle movements of the human face when manipulating a video. This is because of the numerous micro-expressions any person will tend to go through even when having a relatively emotionless exchange. Minor tics in the subject’s brows, widening of the eyes, tiny smirks, nostril flares, a tightening of the jaw … these are all things that need to be synced to the words a subject is supposed to be speaking. While researchers like Professor Farid are designing AI programs capable of spotting such oddities, humans can, with varying degrees of success, look for them as well.
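Because individual frames flash by far too quickly to scrutinize in real time, one practical way to apply the image-level checks described earlier to a suspect video is simply to pull the clip apart into still frames and examine them at leisure. The minimal sketch below uses OpenCV for the extraction; the output naming and sampling interval are arbitrary choices.

```python
# Pull a suspect video apart into still frames so each one can be inspected
# for the image-level flaws described earlier (distortion, blurring, artifacts).
import cv2

def dump_frames(video_path: str, out_dir: str, every_nth: int = 1) -> int:
    """Write every Nth frame of the video to out_dir as a PNG and return the
    number of frames saved."""
    capture = cv2.VideoCapture(video_path)
    saved = index = 0
    while True:
        ok, frame = capture.read()
        if not ok:
            break
        if index % every_nth == 0:
            cv2.imwrite(f"{out_dir}/frame_{index:06d}.png", frame)
            saved += 1
        index += 1
    capture.release()
    return saved
```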

Summary

Ultimately, we are only at the very beginning of an era in which AI-generated imagery and deepfake-like video manipulation begin suffusing the world’s media production. It may be that, in just a few years, we look back at this time and laugh at how naive we were about the possibilities, or wish for the days when AI-based manipulation was as crude as it is now. Some people might look to lawmakers and regulators to ask, "Why aren’t you doing something to stop this?" But how could they? Should they outlaw special effects companies that are too good at their jobs? Stifle and curtail tools that make fantasies come to life on the big screen just so some foreign propagandist would be unable to replicate a public figure’s image to fulfill their nefarious goals? Even if they were to attempt this, some of the most impressive advancements in fakery mentioned in this report are accomplished by private individuals and complete unknowns tinkering with increasingly sophisticated AI frameworks available to nearly everyone. Once again, the genie is already out of its bottle and there’s no stuffing it back in.

How, then, can we protect ourselves from a nightmare scenario in which no form of media can ever be trusted again? Of course, prevention and detection will play a major role in any counter-offensive against the likes of deepfake and its ilk. Researchers and security experts are only now getting their feet wet in this arena and will have to become veteran soldiers in an ongoing war against AIs that aim to deceive, honing their tools and techniques while malicious parties continue to refine their own art.

However, none of this will matter if we allow ourselves to stop paying attention. Most attempts at media fraud, AI-based or otherwise, succeed because the victim was simply not paying close enough attention to whatever was duping them. A perfect example is the proliferation of fake news spread by social media users who like to "just read the headlines." Nearly everyone has said something along the lines of "I read on the Internet that…" based solely on having seen a headline without ever looking into the article it topped. Although this is typically a benign form of laziness, it can have dire consequences when the content of that article would have revealed the headline as complete hogwash. It is in this way that lies are spread: by people who choose not to take the extra second to think about what their eyes are seeing, to ask themselves whether it really makes sense, and to apply the facts they already know to every new claim they encounter in order to judge its likelihood of being true.

Imagine if the oft-referenced video of Barack Obama did not include the reveal of Jordan Peele as its producer. Some would be enraged at the insult Obama issued to the sitting president. But, in the entirety of his public life, did Barack Obama ever use such language? Would it make any sense for a former President who has been extremely private since leaving office to suddenly make a video in which he made such an inflammatory statement? Maybe the viewer would think, "It might be a good idea to examine this video a bit more closely." Maybe that examination would lead them to say, "Hey, come to think of it … his voice sounds kind of strange…." The point of this extended "what if" scenario is to illustrate the importance of always remaining vigilant. We are entering a period in human history where an ever-growing number of malefactors stand to benefit from lying to us, duping us, and making us complicit in their efforts to manipulate and deceive. It is, therefore, up to us to do everything we can to guard against being unwilling tools in the hands of these shadowy forces by always keeping a keen eye and a sharp mind bent towards looking for the truth in everything we see, and never just taking anything for granted.

References

1 Bonifacic, I. "The New Anthony Bourdain Documentary 'Roadrunner' Leans Partly on Deepfaked Audio." Engadget. July 2021.

2 O’Sullivan, Donie. "DNC Warns 2020 Campaigns Not to Use FaceApp ‘Developed by Russians.’" CNN. July 2019.

3 McGoogan, Cara. "FaceApp Deletes New Black, White, and Asian Filters after Racism Storm." The Telegraph. August 2017.

4 O’Sullivan, Donie. "Schumer Calls for Feds to Investigate FaceApp." CNN. July 2019.

5 Ibid.

6 "This Person Does Not Exist." ThisPersonDoesNotExist.com. Retrieved July 2019.

7 Paez, Danny. "This Person Does Not Exist Is the Best One-Off Website of 2019." Inverse. February 2019.

8 "This Person Does Not Exist." BoingBoing. February 2019.

9 Koerber, Brian. "Henry Cavill’s Mustache Was Digitally Removed in ‘Justice League’ and It’s Laughably Bad." Mashable. November 2017.

10 Schwartz, Oscar. "You Thought Fake News Was Bad? Deep Fakes Are Where Truth Goes to Die." The Guardian. November 2018.

11 Romano, Aja. "Jordan Peele’s Simulated Obama PSA Is a Double-Edged Warning Against Fake News." Vox. April 2018.

12 Harwell, Drew. "Faked Pelosi Videos, Slowed to Make Her Appear Drunk, Spread Across Social Media." The Washington Post. May 2019.

13 Novak, Matt. "Bull**** Viral Videos of Nancy Pelosi Show Fake Content Doesn’t Have to Be a Deepfake." Gizmodo. May 2019.

14 Ibid.

15 "WaveNet: A Generative Model for Raw Audio." Deepmind.com. Retrieved July 2019.

16 "Adobe Voco ‘Photoshop-for-Voice’ Causes Concern." BBC News. November 2016.

17 Li, Yuezun, et al. "Hiding Faces in Plain Sight: Disrupting AI Face Synthesis with Adversarial Perturbations." arXiv.org. June 2019.

18 Metz, Rachel. "The Fight to Stay Ahead of Deepfake Videos Before the 2020 US Election." CNN Business. June 2019.

19 Ibid.

20 Vincent, James. "Adobe’s Prototype AI Tool Automatically Spots Photoshopped Faces." The Verge. June 2019.
