Facial landmark detection is still easy with MediaPipe (2023 update)

In 2023, MediaPipe received a major overhaul and now provides a range of new features on top of a more versatile API. While the code from my older post still works (as of writing, November 2023, with mediapipe==0.10.7), I want to take a brief look at the new API and recreate the rotating face with it.

MediaPipe preview
Note that as of November 2023, MediaPipe is still in preview and the API could change again in future versions.

Recap: What is MediaPipe?

MediaPipe [1] is an open-source library providing ready-to-use AI solutions for popular tasks in computer vision, audio processing and more. With only a few lines of code, you can create powerful applications that leverage pre-trained and optimized deep learning models. MediaPipe supports multiple platforms - even mobile - and offers APIs in C++, JavaScript and Python.

The pre-trained models are provided through the MediaPipe Tasks API. You can see the full list of solutions here. For example, there are models for image classification and segmentation, face detection and audio classification. In addition, some of the provided models can even be further customized. Using the Model Maker, you can leverage transfer learning to fine-tune a model for your specific application - more on that in this post.
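As a quick taste of what that looks like, here is a hedged sketch of fine-tuning an image classifier with the Model Maker, loosely following the official customization guide. The mediapipe-model-maker package and the exact option names should be treated as assumptions here, and the dataset path is a placeholder.

# sketch: fine-tuning an image classifier with MediaPipe Model Maker
# (requires `pip install mediapipe-model-maker`; the dataset folder is
# assumed to contain one subfolder per class label)
from mediapipe_model_maker import image_classifier

data = image_classifier.Dataset.from_folder("path/to/images")
train_data, validation_data = data.split(0.9)

options = image_classifier.ImageClassifierOptions(
    supported_model=image_classifier.SupportedModels.MOBILENET_V2,
    hparams=image_classifier.HParams(export_dir="exported_model"),
)
model = image_classifier.ImageClassifier.create(
    train_data=train_data,
    validation_data=validation_data,
    options=options,
)
model.export_model()  # writes a .tflite file into export_dir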

Installing MediaPipe for Python

MediaPipe for Python is installed with:

pip install mediapipe

Additionally, you will have to download the pre-trained model for each task you want to perform. For this post, we only need the FaceLandmarker from here. Download the appropriate .task or .tflite file containing the pre-trained model you need.
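For convenience, the model bundle can also be fetched from a script. The URL below is the face landmarker bundle linked in the MediaPipe documentation at the time of writing; treat it as an assumption, since the hosted paths may change.

# download the pre-trained FaceLandmarker bundle (URL as of November 2023)
import urllib.request

url = (
    "https://storage.googleapis.com/mediapipe-models/face_landmarker/"
    "face_landmarker/float16/1/face_landmarker.task"
)
urllib.request.urlretrieve(url, "face_landmarker.task")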

Running MediaPipe tasks in Python

The new programming interface is similar across all available tasks. In order to run a model, you first need to perform the following steps:

  1. import MediaPipe
    
    import mediapipe as mp
    
  2. define basic and model-specific options
    
    base_options = mp.tasks.BaseOptions(
        model_asset_path="path/to/model.task",
    )
    options = mp.tasks.vision.FaceLandmarkerOptions(
        base_options=base_options,
        running_mode=mp.tasks.vision.RunningMode.IMAGE,
    )
    
  3. initialize the model
    
    with mp.tasks.vision.FaceLandmarker.create_from_options(options) as landmarker:
        ...
    
  4. prepare inputs (mp.Image for vision, AudioData for audio tasks).
    
    import cv2  # OpenCV, installed e.g. via `pip install opencv-python`

    # e.g.: image loaded with OpenCV (convert BGR to RGB for MediaPipe)
    cv_image = cv2.cvtColor(cv2.imread("filename.png"), cv2.COLOR_BGR2RGB)
    mp_image = mp.Image(image_format=mp.ImageFormat.SRGB, data=cv_image)
    

Finally, you can run the model as shown below.

with mp.tasks.vision.FaceLandmarker.create_from_options(options) as landmarker:
    results = landmarker.detect(mp_image)

The code above uses the FaceLandmarker task, but other tasks follow the same pattern. For example, to run an image classifier you would swap the FaceLandmarker(Options) classes for ImageClassifier(Options) and call .classify() instead of .detect(), as sketched below.
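For illustration, here is a minimal sketch of the image classifier variant. It assumes you have downloaded a classifier model from the model page; the file paths below are placeholders.

import mediapipe as mp

options = mp.tasks.vision.ImageClassifierOptions(
    base_options=mp.tasks.BaseOptions(model_asset_path="path/to/classifier.tflite"),
    running_mode=mp.tasks.vision.RunningMode.IMAGE,
    max_results=3,
)
with mp.tasks.vision.ImageClassifier.create_from_options(options) as classifier:
    mp_image = mp.Image.create_from_file("path/to/image.jpg")
    result = classifier.classify(mp_image)

# the result holds a ranked list of categories with confidence scores
for category in result.classifications[0].categories:
    print(f"{category.category_name}: {category.score:.2f}")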

Detecting face landmarks in Python

Here is a complete example of how to use MediaPipe’s FaceLandmarker solution to detect 478 facial landmarks in an image. The position of each landmark index is detailed in the documentation here.

import mediapipe as mp

base_options = mp.tasks.BaseOptions(model_asset_path="./face_landmarker.task")
options = mp.tasks.vision.FaceLandmarkerOptions(
    base_options=base_options, running_mode=mp.tasks.vision.RunningMode.IMAGE
)
with mp.tasks.vision.FaceLandmarker.create_from_options(options) as landmarker:
    mp_image = mp.Image.create_from_file("path/to/image.jpg")
    results = landmarker.detect(mp_image)
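
The example above uses the IMAGE running mode. For video, the API additionally expects a frame timestamp in milliseconds. Here is a hedged sketch using OpenCV to decode the frames; the video path is a placeholder.

import cv2
import mediapipe as mp

options = mp.tasks.vision.FaceLandmarkerOptions(
    base_options=mp.tasks.BaseOptions(model_asset_path="./face_landmarker.task"),
    running_mode=mp.tasks.vision.RunningMode.VIDEO,
)
capture = cv2.VideoCapture("path/to/video.mp4")
with mp.tasks.vision.FaceLandmarker.create_from_options(options) as landmarker:
    while capture.isOpened():
        ok, frame = capture.read()
        if not ok:
            break
        mp_image = mp.Image(
            image_format=mp.ImageFormat.SRGB,
            data=cv2.cvtColor(frame, cv2.COLOR_BGR2RGB),
        )
        timestamp_ms = int(capture.get(cv2.CAP_PROP_POS_MSEC))
        results = landmarker.detect_for_video(mp_image, timestamp_ms)
capture.release()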

Retrieving landmark coordinates as a NumPy array

The primary output of the face landmarker is a list of landmark positions for each detected face, given in normalized image coordinates. Each item in the results.face_landmarks list is itself a list of NormalizedLandmark containers. We can convert these landmarks to a NumPy array of pixel coordinates with the following function.

# clunky import only needed for a more concise type annotation
# optionally, use mp.tasks.components.containers.NormalizedLandmark
from mediapipe.tasks.python.components.containers import NormalizedLandmark
import numpy as np

def get_landmark_coords(
    landmarks: list[NormalizedLandmark], width: int, height: int
) -> np.ndarray:
    """Extract normalized landmark coordinates to array of pixel coordinates."""
    xyz = [(lm.x, lm.y, lm.z) for lm in landmarks]
    return np.multiply(xyz, [width, height, width]).astype(int)

# example call
image = mp_image.numpy_view()  # the image as a NumPy array (height, width, channels)
coords = get_landmark_coords(results.face_landmarks[0], image.shape[1], image.shape[0])

Combining the above tools and matplotlib, we can create a nice animation of the 3D landmarks. See the script below for more details.

[Animation: the detected 3D face landmarks rotating (rotating-face.gif)]

import matplotlib.pyplot as plt
import mediapipe as mp
import numpy as np
from matplotlib.animation import FuncAnimation, PillowWriter
from mediapipe.tasks.python.components.containers import NormalizedLandmark

def get_landmark_coords(
    landmarks: list[NormalizedLandmark], width: int, height: int
) -> np.ndarray:
    """Extract normalized landmark coordinates to array of pixel coordinates."""
    xyz = [(lm.x, lm.y, lm.z) for lm in landmarks]
    return np.multiply(xyz, [width, height, width]).astype(int)

base_options = mp.tasks.BaseOptions(model_asset_path="./face_landmarker.task")
options = mp.tasks.vision.FaceLandmarkerOptions(
    base_options=base_options, running_mode=mp.tasks.vision.RunningMode.IMAGE
)
with mp.tasks.vision.FaceLandmarker.create_from_options(options) as landmarker:
    mp_image = mp.Image.create_from_file("path/to/image.jpg")
    results = landmarker.detect(mp_image)

if len(results.face_landmarks) > 0:
    height, width = mp_image.numpy_view().shape[:2]
    coords = get_landmark_coords(results.face_landmarks[0], width, height)

    fig = plt.figure(figsize=[4, 4])
    ax = fig.add_axes([0, 0, 1, 1], projection="3d")
    ax.view_init(elev=180, azim=0, vertical_axis="y")
    ax.axis("off")

    # thanks to https://stackoverflow.com/a/63625222
    # and https://stackoverflow.com/a/76115126
    ax.set_box_aspect([1, 1, 1], zoom=1.5)

    x, y, z = coords[:, 0], coords[:, 1], coords[:, 2]
    ax.scatter(x, y, z, c=z, cmap="PuBuGn_r", clip_on=False, vmax=2 * z.max())

    def loop_update(frame, fps=15, rot_per_sec=1 / 12):
        dt = frame / fps
        ax.azim = 360 * dt * rot_per_sec
        ax.elev = 180 + 15 * np.sin(2 * np.pi * dt * rot_per_sec * 2)

    animation = FuncAnimation(
        fig,
        # 150 frames at 15 fps = 10 s; one full rotation makes the gif loop seamlessly
        lambda f: loop_update(f, rot_per_sec=15 / 150),
        frames=150,
        interval=1,  # set very low to speed up writing
    )

    writer = PillowWriter(fps=15)
    animation.save("rotating-face.gif", writer=writer, dpi=72)

Conclusion

With the new MediaPipe Tasks API, applying computer vision models to common problems is still incredibly easy. A few lines of code open up wide-ranging possibilities. This post showed how to retrieve facial landmarks, but MediaPipe offers several more solutions, such as image/audio classification, selfie segmentation or pose detection (see below).

The discussed code only shows how to get the landmark coordinates. If you want to see how this could be used in action, check out my previous post about extracting heartbeat signals from webcam video.

Finally, keep in mind that MediaPipe is still in active development. The API could still see changes and some solutions may be altered significantly or even removed in a later version.

[Image: example outputs of other MediaPipe solutions (mediapipe-examples.png)]

References

  1. C. Lugaresi et al., “MediaPipe: A Framework for Building Perception Pipelines,” 2019, arXiv:1906.08172v1.