Understanding the Linear Camera Model in Camera Calibration (Part 2/5)

Calibrate the camera to determine the intrinsic and extrinsic camera parameters.

Deeraj Manjaray
8 min read · Mar 27, 2024
Photo by Yin Weiquan on Unsplash

Introduction:
In the first part of this blog series, we introduced the concept of camera calibration in 3D computer vision.

In this second part, we will delve deeper into the linear camera model, which is crucial for camera calibration. We will discuss the forward imaging model, which is a 3D to 2D transformation, and learn how to develop a comprehensive linear model for the camera.

The Forward Imaging Model

Forward Imaging Model: 3D to 2D

Consider a world coordinate frame W, and focus on a single point P in the scene. The camera has its own coordinate frame C, whose z-axis is aligned with the optical axis of the camera, and an effective focal length f.

Given the relative position and orientation of the camera coordinate frame with respect to the world coordinate frame, the goal is to map the point P in the world coordinate frame to its projection x_i on the image plane.

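The forward imaging model is a complete mapping that takes a point in the world coordinate frame and projects it onto the image plane. This model consists of two main steps:

1. Transformation from the world coordinate frame to the camera coordinate frame.

World Coordinates:

$$\mathbf{x}_w = \begin{bmatrix} x_w & y_w & z_w \end{bmatrix}^\top$$

Camera Coordinates:

$$\mathbf{x}_c = \begin{bmatrix} x_c & y_c & z_c \end{bmatrix}^\top$$

2. Perspective projection from the camera coordinate frame to the image plane.

Image Coordinates:

$$\mathbf{x}_i = \begin{bmatrix} x_i & y_i \end{bmatrix}^\top$$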

We will discuss these steps in detail and explore the various components involved in the process.

Transformation from World to Camera Coordinate Frame

The transformation from the world coordinate frame to the camera coordinate frame is a 3D to 3D transformation. This is achieved using the extrinsic parameters of the camera, which are the rotation matrix (R) and the translation vector (T). The rotation matrix describes the orientation of the camera, while the translation vector encodes its position relative to the world coordinate frame. The transformation is developed step by step in the sections below, and the combined result is shown at the end.

Homogeneous Coordinates

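To linearize the camera model, we introduce homogeneous coordinates. Homogeneous coordinates allow us to represent a 2D or 3D point using one additional, fictitious coordinate. This enables us to express the perspective projection equation as a matrix multiplication, which is a linear operation.

2D Point:

The homogeneous representation of a 2D point $\mathbf{u} = (u, v)$ is a 3D point $\tilde{\mathbf{u}} = (\tilde{u}, \tilde{v}, \tilde{w})$, where the third coordinate $\tilde{w} \neq 0$ is fictitious such that:

$$u = \frac{\tilde{u}}{\tilde{w}}, \qquad v = \frac{\tilde{v}}{\tilde{w}}$$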

Source: Wikipedia — Rational Bézier curve — polynomial curve defined in homogeneous coordinates (blue) and its projection on plane — rational curve (red)

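3D Point:

The homogeneous representation of a 3D point $\mathbf{x} = (x, y, z)$ is a 4D point $\tilde{\mathbf{x}} = (\tilde{x}, \tilde{y}, \tilde{z}, \tilde{w})$.

The fourth coordinate $\tilde{w} \neq 0$ is fictitious such that:

$$x = \frac{\tilde{x}}{\tilde{w}}, \qquad y = \frac{\tilde{y}}{\tilde{w}}, \qquad z = \frac{\tilde{z}}{\tilde{w}}$$

That is:

$$\mathbf{x} \equiv \begin{bmatrix} x \\ y \\ z \\ 1 \end{bmatrix} \equiv \begin{bmatrix} \tilde{w} x \\ \tilde{w} y \\ \tilde{w} z \\ \tilde{w} \end{bmatrix} \equiv \begin{bmatrix} \tilde{x} \\ \tilde{y} \\ \tilde{z} \\ \tilde{w} \end{bmatrix} = \tilde{\mathbf{x}}$$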
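To make the idea concrete, here is a minimal sketch of converting points to and from homogeneous coordinates. The helper names `to_homogeneous` and `from_homogeneous` are my own, chosen for illustration; they are not taken from the course or from any library.

```python
def to_homogeneous(point, w=1.0):
    """Scale a 2D or 3D point by w and append w as the fictitious coordinate."""
    if w == 0:
        raise ValueError("w must be non-zero")
    return tuple(w * c for c in point) + (w,)


def from_homogeneous(h_point):
    """Divide by the last (fictitious) coordinate and drop it."""
    *coords, w = h_point
    if w == 0:
        raise ValueError("the last coordinate must be non-zero")
    return tuple(c / w for c in coords)


u = (3.0, 4.0)
u_tilde = to_homogeneous(u, w=2.0)    # (6.0, 8.0, 2.0)
u_back = from_homogeneous(u_tilde)    # (3.0, 4.0)
```

Any non-zero choice of w describes the same point, which is why the round trip returns the original coordinates.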

Perspective Projection

Image Plane to Image Sensor Mapping

The image plane is measured in physical units (mm), while the image sensor is indexed in pixels, and the pixels may be rectangular rather than square.

If m_x and m_y are the pixel densities (pixels/mm) in the x and y directions, respectively, then the pixel coordinates are:

$$u = m_x x_i = m_x f \frac{x_c}{z_c}, \qquad v = m_y y_i = m_y f \frac{y_c}{z_c}$$

Put differently, perspective projection transforms a 3D point in the camera coordinate frame into a 2D point on the image plane. It is governed by the intrinsic parameters of the camera: the focal lengths in pixels (fx, fy), which combine the effective focal length f with the pixel densities (mx, my), and the principal point (ox, oy).

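We usually take the top-left corner of the image sensor as its origin, which makes indexing easier. If pixel (o_x, o_y) is the principal point, where the optical axis pierces the sensor, then:

$$u = m_x f \frac{x_c}{z_c} + o_x, \qquad v = m_y f \frac{y_c}{z_c} + o_y$$

Writing $f_x = m_x f$ and $f_y = m_y f$ for the focal lengths in pixels:

$$u = f_x \frac{x_c}{z_c} + o_x, \qquad v = f_y \frac{y_c}{z_c} + o_y$$

Using homogeneous coordinates of (u, v), the perspective projection equation can be written as:

$$\begin{bmatrix} u \\ v \\ 1 \end{bmatrix} \equiv \begin{bmatrix} \tilde{u} \\ \tilde{v} \\ \tilde{w} \end{bmatrix} = \begin{bmatrix} f_x x_c + o_x z_c \\ f_y y_c + o_y z_c \\ z_c \end{bmatrix} = \begin{bmatrix} f_x & 0 & o_x & 0 \\ 0 & f_y & o_y & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} x_c \\ y_c \\ z_c \\ 1 \end{bmatrix}$$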

This single matrix contains all the internal parameters of the camera, and it multiplies the homogeneous coordinates of the three-dimensional point defined in the camera coordinate frame.

Remember: the camera's internal geometry, (fx, fy, ox, oy), makes up the intrinsic parameters of the camera.

Here (xc, yc, zc) is the 3D point in the camera coordinate frame, and (u, v) is the corresponding 2D point in the image, in pixel coordinates.
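As a quick numerical check of the pixel mapping, the sketch below projects a camera-frame point to pixel coordinates using plain arithmetic. The function name `camera_to_pixel` and all numbers (a 4 mm lens, 200 pixels/mm, a principal point at (320, 240)) are assumptions made up for illustration, not values from the course.

```python
def camera_to_pixel(xc, yc, zc, f, mx, my, ox, oy):
    """Map a camera-frame point to pixel coordinates.

    f is the effective focal length (mm), mx and my are pixel densities
    (pixels/mm), and (ox, oy) is the principal point (pixels).
    """
    if zc <= 0:
        raise ValueError("the point must lie in front of the camera (zc > 0)")
    fx = mx * f  # focal length in pixels along x
    fy = my * f  # focal length in pixels along y
    u = fx * xc / zc + ox
    v = fy * yc / zc + oy
    return u, v


# Hypothetical camera: f = 4 mm, 200 pixels/mm, principal point (320, 240).
u, v = camera_to_pixel(xc=0.5, yc=-0.25, zc=2.0,
                       f=4.0, mx=200.0, my=200.0, ox=320.0, oy=240.0)
# u = 520.0, v = 140.0
```

Note the division by zc: this is the non-linear step that homogeneous coordinates let us postpone until after the matrix multiplication.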

This gives us the intrinsic matrix.

Intrinsic Matrix:

$$M_{int} = \begin{bmatrix} K & \mathbf{0} \end{bmatrix} = \begin{bmatrix} f_x & 0 & o_x & 0 \\ 0 & f_y & o_y & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}$$

where the calibration matrix K (an upper-right triangular matrix) is:

$$K = \begin{bmatrix} f_x & 0 & o_x \\ 0 & f_y & o_y \\ 0 & 0 & 1 \end{bmatrix}$$

The calibration matrix (K), often itself called the intrinsic matrix, is a 3x3 upper triangular matrix that contains the intrinsic parameters of the camera. Appending a column of zeros to K gives the full 3x4 intrinsic matrix M_int.

As noted above, the intrinsic matrix transforms a 3D point in the camera coordinate frame into its corresponding 2D point in the image, using homogeneous coordinates.

So the mapping from the homogeneous representation of a 3D point in the camera coordinate frame to its pixel coordinates in the image is:

$$\tilde{\mathbf{u}} = M_{int} \, \tilde{\mathbf{x}}_c$$
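The same camera-to-pixel mapping can be sketched in matrix form with NumPy. The helper `intrinsic_matrices` and the parameter values (fx = fy = 800, principal point (320, 240)) are hypothetical, chosen only to illustrate the structure of K and of the 3x4 intrinsic matrix.

```python
import numpy as np


def intrinsic_matrices(fx, fy, ox, oy):
    """Return the 3x3 calibration matrix K and the 3x4 matrix [K | 0]."""
    K = np.array([[fx, 0.0, ox],
                  [0.0, fy, oy],
                  [0.0, 0.0, 1.0]])
    M_int = np.hstack([K, np.zeros((3, 1))])
    return K, M_int


K, M_int = intrinsic_matrices(fx=800.0, fy=800.0, ox=320.0, oy=240.0)

# Homogeneous camera-frame point (xc, yc, zc, 1).
x_c_tilde = np.array([0.5, -0.25, 2.0, 1.0])

# Linear step: one matrix multiplication.
u_tilde = M_int @ x_c_tilde          # [1040., 280., 2.]

# Non-linear step: divide by the fictitious coordinate.
u, v = u_tilde[:2] / u_tilde[2]      # 520.0, 140.0
```

The third component of `u_tilde` equals zc, so dividing by it reproduces the perspective division.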

Extrinsic Parameters

World-to-Camera Transformation

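The position $\mathbf{c}_w$ and orientation $R$ of the camera in the world coordinate frame W are the camera's extrinsic parameters.

$$R = \begin{bmatrix} r_{11} & r_{12} & r_{13} \\ r_{21} & r_{22} & r_{23} \\ r_{31} & r_{32} & r_{33} \end{bmatrix}$$

Here, rows 1, 2, and 3 of R give the directions of the camera axes $\hat{\mathbf{x}}_c$, $\hat{\mathbf{y}}_c$, and $\hat{\mathbf{z}}_c$, respectively, in the world coordinate frame.

That is our interpretation of the rotation matrix. It is orthonormal: its rows are mutually perpendicular unit vectors, so $R^{-1} = R^\top$.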
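The orthonormality of the rotation matrix, and the reading of its rows as camera axes, can be checked numerically. The sketch below uses a rotation about the z-axis as an example; the helper name `rotation_about_z` and the 30 degree angle are arbitrary choices for illustration.

```python
import numpy as np


def rotation_about_z(theta):
    """Rotation matrix for an angle theta (radians) about the z-axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0],
                     [s, c, 0.0],
                     [0.0, 0.0, 1.0]])


R = rotation_about_z(np.deg2rad(30.0))

# Orthonormal: the transpose is the inverse.
is_orthonormal = np.allclose(R @ R.T, np.eye(3))

# A proper rotation has determinant +1.
det_R = np.linalg.det(R)

# Row 1 is the camera x-axis expressed in world coordinates, so rotating
# that world direction into the camera frame gives (1, 0, 0).
x_axis_in_camera = R @ R[0]
```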

The extrinsic matrix is a 4x4 matrix that contains the extrinsic parameters of the camera, which are the rotation matrix (R) and the translation vector (T). It is used to transform a 3D point in the world coordinate frame to its corresponding 3D point in the camera coordinate frame using homogeneous coordinates.

Coming back to world-to-camera transformation:

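Given the extrinsic parameters of the camera, the camera-centric location of the point P, whose world coordinates are $\mathbf{x}_w$, is:

$$\mathbf{x}_c = R(\mathbf{x}_w - \mathbf{c}_w) = R\mathbf{x}_w - R\mathbf{c}_w = R\mathbf{x}_w + \mathbf{T}$$

where:

$$\mathbf{T} = -R\mathbf{c}_w$$

is called the translation vector. We can now write this out in matrix-vector form as follows:

$$\begin{bmatrix} x_c \\ y_c \\ z_c \end{bmatrix} = \begin{bmatrix} r_{11} & r_{12} & r_{13} \\ r_{21} & r_{22} & r_{23} \\ r_{31} & r_{32} & r_{33} \end{bmatrix} \begin{bmatrix} x_w \\ y_w \\ z_w \end{bmatrix} + \begin{bmatrix} t_x \\ t_y \\ t_z \end{bmatrix}$$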

Extrinsic Matrix

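Rewriting using homogeneous coordinates:

$$\tilde{\mathbf{x}}_c = \begin{bmatrix} x_c \\ y_c \\ z_c \\ 1 \end{bmatrix} = \begin{bmatrix} r_{11} & r_{12} & r_{13} & t_x \\ r_{21} & r_{22} & r_{23} & t_y \\ r_{31} & r_{32} & r_{33} & t_z \\ 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x_w \\ y_w \\ z_w \\ 1 \end{bmatrix}$$

Here the upper-left 3x3 block is the rotation matrix and the last column holds the translation vector. The full 4x4 matrix is called the extrinsic matrix:

$$M_{ext} = \begin{bmatrix} R_{3 \times 3} & \mathbf{T} \\ \mathbf{0}_{1 \times 3} & 1 \end{bmatrix}, \qquad \tilde{\mathbf{x}}_c = M_{ext} \, \tilde{\mathbf{x}}_w$$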

We now have linear models for both steps: the perspective projection and the 3D to 3D coordinate transformation. Consequently, we can determine the mapping of a point in the world coordinate frame to pixels in the image.
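The world-to-camera step can be sketched as follows. The function name `extrinsic_matrix` is my own, and the camera pose (axes aligned with the world, positioned at (1, 2, -5)) is a made-up example. The translation is computed as minus R times the camera position, as described above.

```python
import numpy as np


def extrinsic_matrix(R, c_w):
    """Build the 4x4 extrinsic matrix from orientation R and position c_w."""
    R = np.asarray(R, dtype=float)
    c_w = np.asarray(c_w, dtype=float)
    t = -R @ c_w                 # translation vector
    M_ext = np.eye(4)
    M_ext[:3, :3] = R            # upper-left 3x3 block: rotation
    M_ext[:3, 3] = t             # last column: translation
    return M_ext


# Hypothetical pose: camera axes aligned with the world axes.
R = np.eye(3)
c_w = np.array([1.0, 2.0, -5.0])
M_ext = extrinsic_matrix(R, c_w)

# Homogeneous world point (xw, yw, zw, 1).
x_w_tilde = np.array([1.5, 1.75, -3.0, 1.0])
x_c_tilde = M_ext @ x_w_tilde    # [0.5, -0.25, 2.0, 1.0]
```

The camera's own position maps to the origin of the camera frame, which is a handy sanity check for any extrinsic matrix.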

Projection Matrix:

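Camera to Pixel:

$$\tilde{\mathbf{u}} = M_{int} \, \tilde{\mathbf{x}}_c$$

World to Camera:

$$\tilde{\mathbf{x}}_c = M_{ext} \, \tilde{\mathbf{x}}_w$$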

The projection matrix (P) is a 3x4 matrix that combines the intrinsic matrix and the extrinsic matrix. It is used to transform a 3D point in the world coordinate frame directly to its corresponding 2D point on the image plane.

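Combining the above two equations, we get the full projection matrix P:

$$\tilde{\mathbf{u}} = M_{int} \, M_{ext} \, \tilde{\mathbf{x}}_w = P \, \tilde{\mathbf{x}}_w$$

$$\begin{bmatrix} \tilde{u} \\ \tilde{v} \\ \tilde{w} \end{bmatrix} = \begin{bmatrix} p_{11} & p_{12} & p_{13} & p_{14} \\ p_{21} & p_{22} & p_{23} & p_{24} \\ p_{31} & p_{32} & p_{33} & p_{34} \end{bmatrix} \begin{bmatrix} x_w \\ y_w \\ z_w \\ 1 \end{bmatrix}$$

So, to calibrate a camera, we need to determine the entries of this 3x4 matrix.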

Finally, the projection matrix (P) is the key component in camera calibration, as it encapsulates all the information needed to map 3D points to their 2D projections.
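Putting the two steps together, the sketch below builds a projection matrix from assumed intrinsic and extrinsic parameters and projects a world point to pixel coordinates. The function names `projection_matrix` and `project`, and every numerical value, are hypothetical and only illustrate the structure of the model.

```python
import numpy as np


def projection_matrix(K, R, c_w):
    """Compose the 3x4 projection matrix P = M_int @ M_ext."""
    K = np.asarray(K, dtype=float)
    R = np.asarray(R, dtype=float)
    c_w = np.asarray(c_w, dtype=float)
    t = -R @ c_w
    M_int = np.hstack([K, np.zeros((3, 1))])             # 3x4
    M_ext = np.vstack([np.hstack([R, t.reshape(3, 1)]),
                       np.array([[0.0, 0.0, 0.0, 1.0]])])  # 4x4
    return M_int @ M_ext


def project(P, x_w):
    """Project a 3D world point to pixel coordinates (u, v)."""
    x_w_tilde = np.append(np.asarray(x_w, dtype=float), 1.0)
    u_tilde = P @ x_w_tilde
    return u_tilde[:2] / u_tilde[2]


# Hypothetical camera parameters.
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)
c_w = np.array([1.0, 2.0, -5.0])

P = projection_matrix(K, R, c_w)
pixel = project(P, [1.5, 1.75, -3.0])         # [520., 140.]

# Scaling P by any non-zero factor leaves the projected pixel unchanged.
pixel_scaled = project(2.0 * P, [1.5, 1.75, -3.0])
```

Because the final division cancels any common factor, P is only defined up to scale.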

Conclusion

In this blog post, we discussed the linear camera model and its importance in camera calibration. We explored the forward imaging model, which consists of the transformation from the world coordinate frame to the camera coordinate frame and perspective projection. We also learned about homogeneous coordinates, intrinsic and extrinsic matrices, and the projection matrix. Understanding these concepts is crucial for camera calibration and various computer vision applications.

In the next part of this series, we will discuss how to perform camera calibration using these linear camera models.

I would like to thank Dr. Shree Nayar, a faculty member in the Computer Science Department at the School of Engineering and Applied Sciences, Columbia University. This article is based on what I learned from his First Principles of Computer Vision Specialization.


A Note on This Article

The insights and perspectives shared in this article are drawn from my personal experiences. As with any subjective matter, there may be differing viewpoints or approaches.

If you have any questions, concerns, or alternative perspectives to offer, I’d be glad to hear from you. An open dialogue allows us all to gain a deeper understanding of the topic at hand.

Feel free to share your thoughts or feedback in the comments below or reach out to me directly. I’m always eager to learn and grow through respectful discourse.

