The data science community is well aware of the risk that dataset imbalances can lead to biased models that negatively impact downstream predictions [1-3] and create unfair or undesirable outcomes for underrepresented groups. Racial bias is one of the most discussed contributors to poor model performance on underrepresented populations [4-9]; however, analyzing racial biases in image and video datasets can be challenging without a labor-intensive manual review effort to quantify demographic diversity.
To measure racial biases in image and video datasets and understand how dataset demographics relate to machine learning model performance, we developed a unique approach to quantify skin tones from facial images.
Fundamentally, our method combines computer vision with machine learning models to categorize skin tone into one of the six Fitzpatrick skin types [10].
We first process video frames with a face detection algorithm to locate faces within individual frames. Then, we apply a face landmark prediction model and a convex hull masking approach to extract only the areas of the face that contain skin, eliminating color interference from regions such as the eyes, nostrils, and mouth. To transform the resulting skin mask into meaningful skin tone colors, we use unsupervised machine learning to cluster all skin pixels into a summary palette of 10 colors. A final classification model then maps the 10-color summary to one of six Fitzpatrick skin types, which correspond to the amount of melanin in a person’s skin.
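As a rough illustration, the pipeline can be sketched as follows. The snippet uses the open-source face_recognition, OpenCV, and scikit-learn libraries as stand-ins for the detection, landmarking, and clustering components we actually use, so it should be read as a minimal sketch rather than our production code.

```python
import cv2
import numpy as np
import face_recognition
from sklearn.cluster import KMeans

def summarize_skin_colors(frame_bgr, n_colors=10):
    """Return an (n_colors, 3) array of RGB centroids summarizing the skin
    pixels of the first face found in a frame, or None if no face is found."""
    rgb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB)

    # 1. Face detection: locate faces within the frame.
    face_boxes = face_recognition.face_locations(rgb)
    if not face_boxes:
        return None

    # 2. Landmark prediction for the first detected face.
    landmark_sets = face_recognition.face_landmarks(rgb, face_locations=face_boxes)
    if not landmark_sets:
        return None
    landmarks = landmark_sets[0]

    # 3. Convex hull mask over the face outline, then zero out the eyes,
    #    mouth, and nose tip so only skin-colored pixels remain.
    mask = np.zeros(rgb.shape[:2], dtype=np.uint8)
    outline = np.array(
        landmarks["chin"] + landmarks["left_eyebrow"] + landmarks["right_eyebrow"],
        dtype=np.int32,
    )
    cv2.fillConvexPoly(mask, cv2.convexHull(outline), 255)
    for feature in ("left_eye", "right_eye", "top_lip", "bottom_lip", "nose_tip"):
        points = np.array(landmarks[feature], dtype=np.int32)
        cv2.fillConvexPoly(mask, cv2.convexHull(points), 0)

    # 4. Cluster the masked skin pixels into a 10-color summary palette;
    #    these centroids feed the final Fitzpatrick classifier (not shown).
    skin_pixels = rgb[mask == 255]
    kmeans = KMeans(n_clusters=n_colors, n_init=10, random_state=0).fit(skin_pixels)
    return kmeans.cluster_centers_
```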
Ensuring racial bias is not baked into machine learning models through imbalanced training data is essential, but the time required to manually estimate the demographic makeup of faces in a large dataset is prohibitive. Our model runs on state-of-the-art cloud computing infrastructure, enabling quantification of demographic diversity in video and image datasets orders of magnitude faster than manual review. Monitoring diversity during a company’s dataset and model development phases is a key mechanism for keeping racial bias out of model performance.
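Because each frame can be processed independently, this work parallelizes naturally. The following is a hypothetical sketch of that fan-out, not a description of our cloud deployment; classify_fitzpatrick is a placeholder name for the final Fitzpatrick mapping model, which is not shown here.

```python
from concurrent.futures import ProcessPoolExecutor

def estimate_dataset_diversity(frames, n_workers=8):
    """Run the skin tone pipeline over many frames in parallel and return
    the fraction of faces assigned to each Fitzpatrick type."""
    with ProcessPoolExecutor(max_workers=n_workers) as pool:
        palettes = list(pool.map(summarize_skin_colors, frames))
    # classify_fitzpatrick is a placeholder name for the final mapping model.
    types = [classify_fitzpatrick(p) for p in palettes if p is not None]
    return {t: types.count(t) / len(types) for t in sorted(set(types))}
```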
Furthermore, we used our model to compile over 10,000 faces into a novel face dataset that is unique for its approximately uniform distribution across the six Fitzpatrick skin types. This broad dataset enabled us to study differences in the performance of common facial recognition models across demographic groups and to work toward improving performance across the board. It also led us to conduct experiments on how emotion perception differs across demographic groups, further informing our model design.
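Our curation process is not detailed here, but conceptually the balancing step amounts to stratified sampling per skin type. In the sketch below, the per-type cap is an assumption derived from the dataset’s overall size rather than our actual figure.

```python
import random
from collections import defaultdict

def balance_by_fitzpatrick(labeled_faces, per_type=1700, seed=0):
    """labeled_faces: iterable of (image_path, fitzpatrick_type) pairs.
    Returns a subset capped at per_type faces for each of types 1-6."""
    by_type = defaultdict(list)
    for path, ftype in labeled_faces:
        by_type[ftype].append(path)
    rng = random.Random(seed)
    balanced = []
    for ftype in range(1, 7):
        candidates = by_type.get(ftype, [])
        rng.shuffle(candidates)
        balanced.extend((path, ftype) for path in candidates[:per_type])
    return balanced
```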
Our continued work focuses on improving the robustness of our model as well as examining biases in facial recognition and face emotion detection algorithms. Extremely high or low illumination in a video can interfere with accurate skin tone predictions, so we are currently incorporating a robust illumination normalization technique to combat outlier video lighting and coloration. Lastly, skin tone diversity is rich, and the Fitzpatrick scale is therefore a simplified and suboptimal representation of all skin complexions and undertones. Our team is currently testing alternatives such as the individual typology angle (ITA) for skin tone categorization [11].
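To make these two directions concrete, the sketch below pairs a simple gray-world white balance (one common illumination normalization; the robust technique we are incorporating may differ) with the standard ITA formula, ITA = arctan((L* − 50) / b*) expressed in degrees. The category thresholds shown are the commonly cited ones, not values we have validated.

```python
import numpy as np

def gray_world_normalize(frame_bgr):
    """Simple gray-world white balance: scale each channel so its mean
    matches the overall mean, damping global color casts from lighting."""
    img = frame_bgr.astype(float)
    channel_means = img.reshape(-1, 3).mean(axis=0)
    img *= channel_means.mean() / channel_means
    return np.clip(img, 0, 255).astype(np.uint8)

def individual_typology_angle(lab_pixels):
    """lab_pixels: (N, 3) array of CIELAB (L*, a*, b*) skin pixel values.
    Returns the ITA in degrees, computed on the mean skin color."""
    L, _, b = np.asarray(lab_pixels, dtype=float).mean(axis=0)
    return np.degrees(np.arctan2(L - 50.0, b))

def ita_category(ita_degrees):
    """Map an ITA value to one of six commonly cited skin tone categories."""
    thresholds = [(55, "very light"), (41, "light"), (28, "intermediate"),
                  (10, "tan"), (-30, "brown")]
    for lower, name in thresholds:
        if ita_degrees > lower:
            return name
    return "dark"
```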
We believe it’s important to apply our expertise in computer vision and machine learning towards building tools for fair AI. Many of our clients are interested in leveraging video and image data to develop AI solutions in the technology and healthcare sectors. With insight from our racial bias quantification tool, we can inform clients on where additional data needs to be collected and apply nuanced modeling to achieve more accurate predictions for all clients and use cases.