In our recent Theatre Journal article we explored the question of why bias exists in AI, focusing in part on a critique of a performance of “digital whiteface.” Researchers Abeba Birhane and Olivia Guest demonstrated how Duke University’s PULSE face-hallucination software package would restore a blurred image of Birhane’s face into what appeared to be that of a white man or woman. In their words: “[w]hen confronted with a Black woman’s face, it ‘corrects’ her Blackness and femininity,” and they refer to the result as “imposing digital whiteface” (Birhane and Guest, see Figure 2 on page 66).
Very briefly, face-hallucination is the process of creating a realistic, artificial face from a blurred image. PULSE (Menon et al.) relies on NVIDIA's StyleGAN engine (Karras et al.), a more general machine-learning framework that, in this case, learns facial features in order to create plausibly realistic artificial faces. StyleGAN in turn was trained using the Flickr-Faces-HQ (ffhq) dataset, a collection of 70,000 high-quality images of faces collected from Flickr (Karras et al.).
The first impulse in a code critique might be to find a mechanistic explanation within the text of the code used in the performance. We determined quickly, however, that no such simple mechanism existed. The original Flickr images were not encoded by race, and while the distribution of subjects was skewed as compared to the US population (young women being overrepresented and Black people being underrepresented; Salminen et al.), StyleGAN appeared to be able to generate realistic faces that reflect Western perceptions of different races. The StyleGAN code extracts features from any set of images at multiple scales, with the same code used to generate artificial images of bedrooms, cars, cats and people. The PULSE code simply constrained the facial generation process of StyleGAN to conform to a blurred image given as input. (This video shows StyleGAN in action.)
Therefore, rather than basing our critique in text, we turned to performance theory. Below one can see StyleGAN performing.
Source: Karras et al.
The above are images of bedrooms that do not exist. StyleGAN took as input 50k images of bedrooms, extracted features at scales ranging from fine-grained textures to larger shapes, and then generated the images above based on those learned styles. The images are the result of the interaction of StyleGAN’s two neural networks. One generated images, and the other sought to distinguish between generated and actual images. As both networks improved, the quality of the resulting artificial images also improved (thus the name of the technique: Generative Adversarial Networks).
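The opposition between the two networks can be sketched as a pair of loss functions. What follows is a minimal illustration using the standard binary cross-entropy formulation from the general GAN literature; it is not taken from the StyleGAN codebase, whose actual training losses may differ, and the scores are hypothetical numbers chosen by hand rather than the output of any trained network.

```python
import math

def sigmoid(score):
    """Squash a raw score into a probability that an image is 'real'."""
    return 1.0 / (1.0 + math.exp(-score))

def discriminator_loss(real_scores, fake_scores):
    """Low when real images score high and generated images score low."""
    real_term = sum(-math.log(sigmoid(s)) for s in real_scores) / len(real_scores)
    fake_term = sum(-math.log(1.0 - sigmoid(s)) for s in fake_scores) / len(fake_scores)
    return real_term + fake_term

def generator_loss(fake_scores):
    """Low when the discriminator mistakes generated images for real ones."""
    return sum(-math.log(sigmoid(s)) for s in fake_scores) / len(fake_scores)

# Hypothetical discriminator scores (positive means "looks real").
real_scores = [2.5, 3.0]
early_fakes = [-3.0, -2.0]  # early in training: generated images are easily caught
later_fakes = [1.5, 2.0]    # later in training: generated images fool the discriminator

d_early = discriminator_loss(real_scores, early_fakes)
d_later = discriminator_loss(real_scores, later_fakes)
g_early = generator_loss(early_fakes)
g_later = generator_loss(later_fakes)
```

In this sketch the generator's loss falls as its images become more convincing, while the discriminator's loss rises: each network's gain is the other's setback, which is the sense in which the training is adversarial.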
Below are faces generated by StyleGAN using the ffhq dataset. Like the bedrooms above, these people never existed.
Source: Karras et al.
The top row and leftmost column are source images. The remaining faces are constructed based on combining styles from the two sources. It works—we can see the similarities in the source images reflected in the new images.
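The combination of styles described above can be sketched as follows. We assume, for illustration only, that each face is described by a list of 18 per-layer styles, with early layers governing coarse features and later layers governing fine detail. The function name `mix_styles` and the string labels are hypothetical stand-ins, not part of the StyleGAN code.

```python
NUM_LAYERS = 18  # assumption: one style per synthesis layer

def mix_styles(styles_a, styles_b, crossover):
    """Take layers [0, crossover) from source A and the remaining layers from source B."""
    if len(styles_a) != len(styles_b):
        raise ValueError("both sources need the same number of layer styles")
    if not 0 <= crossover <= len(styles_a):
        raise ValueError("crossover must fall within the layer range")
    return styles_a[:crossover] + styles_b[crossover:]

# Labels stand in for the numeric style vectors a real network would use.
styles_a = [f"A{i}" for i in range(NUM_LAYERS)]
styles_b = [f"B{i}" for i in range(NUM_LAYERS)]

# Coarse layers from source A, everything finer from source B.
mixed = mix_styles(styles_a, styles_b, crossover=4)
```

Moving the crossover point changes how much of each source survives in the constructed face: a small value keeps only the broadest features of the first source, while a large value keeps nearly all of it.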
PULSE is a more narrowly focused project that leverages StyleGAN to solve a specific problem: Given a blurred image as input, create a plausible constructed image that can be downscaled back to the original blurred image. Again, all of these are constructed images. Note that the purpose is not to retrieve the original unblurred image from the blurred image, but rather to create a novel image that is consistent with the blurred image.
Source: Menon et al.
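The constraint PULSE imposes can be illustrated with a toy example. The sketch below is not the PULSE implementation: `toy_generator` is a hypothetical stand-in for StyleGAN, the "image" is a row of four pixels, and the search is plain gradient descent with numerically estimated gradients. It shows only the underlying idea: search the generator's latent space for an output that downscales to the given low-resolution input.

```python
def downscale(image):
    """Average adjacent pixel pairs: a 2x downscaling of a one-dimensional 'image'."""
    return [(image[i] + image[i + 1]) / 2.0 for i in range(0, len(image), 2)]

def toy_generator(z):
    """Hypothetical generator: even latents set local brightness, odd latents set local detail."""
    image = []
    for k in range(0, len(z), 2):
        image += [z[k] + z[k + 1], z[k] - z[k + 1]]
    return image

def loss(z, low_res):
    """Squared distance between the downscaled generated image and the low-res input."""
    return sum((a - b) ** 2 for a, b in zip(downscale(toy_generator(z)), low_res))

def search_latent(low_res, z_init, steps=200, rate=0.25, eps=1e-5):
    """Gradient descent on the latent vector, using central finite differences."""
    z = list(z_init)
    for _ in range(steps):
        grad = []
        for i in range(len(z)):
            z_hi = z[:i] + [z[i] + eps] + z[i + 1:]
            z_lo = z[:i] + [z[i] - eps] + z[i + 1:]
            grad.append((loss(z_hi, low_res) - loss(z_lo, low_res)) / (2 * eps))
        z = [zi - rate * gi for zi, gi in zip(z, grad)]
    return z

original = [0.9, 0.1, 0.2, 0.6]   # the "real" high-resolution image
low_res = downscale(original)      # what the search is actually given
z_found = search_latent(low_res, z_init=[0.0, 0.05, 0.0, -0.05])
constructed = toy_generator(z_found)
```

Here `constructed` downscales to the same low-resolution input as `original`, yet its fine detail comes entirely from the generator's starting point rather than from the original image. Downscaling discards information, so many different high-resolution images are consistent with one blurred input, and the search has no means of knowing which of them was the real one.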
Birhane’s performance uses PULSE slightly differently (again, refer to Figure 2 on page 66). She begins by presenting three non-blurred images of her own face, then blurs each of those images, and finally asks PULSE/StyleGAN to construct an artificial face based on each blurred image. The audience would expect the constructed faces to match, perhaps imperfectly, Birhane’s initial images. They do not, although we can see echoes of the styles chosen by StyleGAN.
The focus of Birhane and Guest’s article is suggested by their title, “Towards Decolonising Computational Sciences,” and PULSE is offered as only one brief example. Birhane and Guest argue that the problem is not only the lack of diversity in software teams and training data, though they do identify those as problems. For Birhane and Guest, lack of diversity “is a symptom of the subtle white and male supremacy under which the computational fields operate, which assume and promote whiteness and maleness as the ideal standards” (Birhane and Guest).
We add the following observations to their analysis. StyleGAN, for all its capability in identifying classes of features, did not (and could not) derive the feature of race. As long as the audience understands the images in play are artificial, the morphing of images across race (and sex, and age) is simply an interesting trick. When a real face enters the mix, the audience expects that race would be preserved across generated images. In Birhane’s performance, PULSE/StyleGAN fails to conserve race, and we are left with a profound feeling of disquiet that was not present for the manipulation of artificial images. Our Theatre Journal paper is entitled “The Nonmaterial Mirror,” and in Birhane’s performance, she figuratively looks into the PULSE/StyleGAN mirror and asks, “How do you see me?”
In our original TDR article, we defined four tenets of Nonmaterial Performance, which we relate here to PULSE:
We return to our essential question: Why does bias exist in AI? In particular, why does bias exist in this AI when the codebase contains no reference to race and the training set has at least some degree of diversity?
StyleGAN could only create abstractions based on the pixels it was given. When humans look at images, particularly faces, our perception is informed by far more than the pixels. When real faces are introduced and manipulated, particularly when restoring a real face, StyleGAN’s limitations become jarring.
From a Critical Code Studies perspective, the rhizomatic performance of code becomes untethered from its text. The code is inscrutable without its performance.
Condee, William, and Barry Rountree. “Nonmaterial Performance.” TDR: The Drama Review, 64 (2020): 147-57.
Menon, Sachit, Alexandru Damian, Shijia Hu, Nikhil Ravi, and Cynthia Rudin, “PULSE: Self-Supervised Photo Upsampling via Latent Space Exploration of Generative Models,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2020), 2437–45. (arXiv, homepage)
Karras, Tero, Samuli Laine, and Timo Aila, “A Style-Based Generator Architecture for Generative Adversarial Networks,” Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019, pp. 4401-4410. (github, arXiv, ffhq-github, video)
Rountree, Barry, and William Condee. “Imagining the Nonmaterial.” Imagined Theatres, 03 (2019): 14.
Rountree, Barry, and William Condee. “The Nonmaterial Mirror: Performing Vibrant Abstractions in AI Networks.” Theatre Journal, 73 (2021): 299–318.
Salminen, Joni, Soon-gyo Jung, Shammur Chowdhury, and Bernard J. Jansen, “Analyzing Demographic Bias in Artificially Generated Facial Pictures,” in Extended Abstracts of the 2020 CHI Conference on Human Factors in Computing Systems (CHI EA ’20) 2020, Association for Computing Machinery, New York City (2020), 1–8.