We present a novel multi-modal extrinsic calibration framework designed to simultaneously estimate the relative poses between RGB cameras, LiDARs, and event cameras. At the core of our approach is a novel 3D calibration target, specifically designed and constructed to be concurrently perceived by all three sensing modalities. The target encodes features in planes, ChArUco boards, and active LED patterns, each tailored to the unique characteristics of LiDARs, RGB cameras, and event cameras, respectively. This design enables a one-shot, joint extrinsic calibration process, in contrast to existing approaches that typically rely on separate, pairwise calibrations. Our calibration pipeline is suited for complex vision systems, and we demonstrate its effectiveness in the context of autonomous driving, where precise multi-sensor alignment is critical. We validate the benefit of our approach through an extensive experimental evaluation on a custom-built dataset, recorded with an advanced autonomous driving sensor setup, confirming the accuracy and robustness of our method.
The experimental setup consists of a fully functional autonomous vehicle: we evaluated our calibration approach on a Maserati GranCabrio Folgore, equipped with a flexible platform that enables testing and swapping sensor configurations to explore novel approaches. As outlined in the paper, the configured scene is static (as shown in the rendering below), which offers certain advantages but also introduces specific challenges, discussed in detail in the manuscript. From a practical perspective, a scene such as the one proposed in our work is highly convenient in a production environment: the calibration target can be permanently installed, allowing vehicles to be parked in front of it for sensor calibration.
Figure 1 provides an overview of the main sensors tested. The sensing suite includes:

- two Ouster 360° Mid-Range LiDAR units, which deliver dense 3D representations of the surroundings with 128 scan lines and a maximum range of 200 m;
- a Seyond Falcon K Ultra-Long Range LiDAR, offering a 120° horizontal field of view and capable of measuring distances up to 500 m over 152 scan lines;
- up to three Lucid Vision IP68 RGB cameras;
- a Prophesee EVK4 event-based camera equipped with the IMX636 (HD) sensor.

All sensor data are transmitted to the processing unit via the ROS 2 middleware (ROS2 DOCS), ensuring synchronized acquisition and seamless integration with the perception and calibration modules.
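As a rough illustration of how such synchronized acquisition can be consumed downstream, the minimal sketch below (not the repository's actual node; topic names such as `/ouster/points` and `/camera/image_raw` are assumptions) uses `rclpy` and `message_filters` to approximately time-align a LiDAR point cloud and an RGB image before handing them to a calibration callback.

```python
# Minimal sketch of time-synchronized acquisition with ROS 2 (rclpy).
# Topic names and message types are assumptions, not the repository's configuration.
import rclpy
from rclpy.node import Node
from sensor_msgs.msg import Image, PointCloud2
from message_filters import Subscriber, ApproximateTimeSynchronizer


class CalibrationSync(Node):
    def __init__(self):
        super().__init__('calibration_sync')
        # One subscriber per modality; the event-camera stream would be added analogously.
        image_sub = Subscriber(self, Image, '/camera/image_raw')
        cloud_sub = Subscriber(self, PointCloud2, '/ouster/points')
        # Approximate synchronization: messages whose stamps differ by less than
        # `slop` seconds are grouped and delivered together.
        self.sync = ApproximateTimeSynchronizer([image_sub, cloud_sub],
                                                queue_size=10, slop=0.05)
        self.sync.registerCallback(self.on_synced)

    def on_synced(self, image_msg, cloud_msg):
        # Here the calibration pipeline would extract ChArUco corners from the image
        # and planes/corners from the point cloud.
        self.get_logger().info(f'Synced pair at stamp {image_msg.header.stamp.sec}')


def main():
    rclpy.init()
    rclpy.spin(CalibrationSync())


if __name__ == '__main__':
    main()
```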
The calibration target is a cube with three visible adjacent faces, each equipped with a 3×3 ChArUco board used to detect 2D features {a0, ..., a59} in RGB images. Their 3D counterparts {A0, ..., A59} and the cube's corners {E0, ..., E6} are extracted from the point cloud. Seven LEDs placed on the cube's corners blink at unique, non-uniform frequencies, enabling their identification in the event stream as {e0, ..., e6}. These multi-modal correspondences are used to estimate each sensor's pose via PnP.
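To make the last step concrete, the sketch below shows how 2D-3D correspondences of this kind are typically fed to a PnP solver, using OpenCV's `cv2.solvePnP` as a stand-in. The intrinsics, the ground-truth pose, and the synthetic correspondences are illustrative assumptions, not the paper's actual implementation or values.

```python
# Hedged sketch: estimating a camera pose from 2D-3D correspondences via PnP.
import numpy as np
import cv2

# Eight corners of a 0.60 m cube in the target frame (the paper uses the seven
# corners visible to the sensors; eight are listed here for simplicity).
s = 0.60
object_points = np.array([[x, y, z] for x in (0, s) for y in (0, s) for z in (0, s)],
                         dtype=np.float32)

# Assumed pinhole intrinsics (placeholder values, not the paper's cameras).
K = np.array([[1000.0, 0.0, 640.0],
              [0.0, 1000.0, 360.0],
              [0.0, 0.0, 1.0]])
dist = np.zeros(5)

# Synthesize detections by projecting the corners with a known ground-truth pose,
# roughly 2 m in front of the camera as in the recording setup.
rvec_gt = np.array([0.1, -0.2, 0.05])
tvec_gt = np.array([0.1, 0.0, 2.0])
image_points, _ = cv2.projectPoints(object_points, rvec_gt, tvec_gt, K, dist)

# Recover the target-to-camera pose from the 2D-3D correspondences with PnP.
ok, rvec, tvec = cv2.solvePnP(object_points, image_points, K, dist)
print(ok, rvec.ravel(), tvec.ravel())  # should match rvec_gt / tvec_gt
```

In the actual pipeline the image points come from the ChArUco detections {a_i} (RGB) and the blinking-LED detections {e_i} (event stream), matched against their 3D counterparts on the target.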
Below we report the measurements needed to build your own target (see also the sketch after the table). Note that the target should be entirely captured by the RGB cameras, LiDARs, and event cameras, and it needs to be close enough to the sensors for the features to be distinguishable. As a guideline, a distance of 1 to 3 meters from the sensors works well, and the closer the better (in our case it was ~2 m).
| ID | Component | Dimension | Unit |
|---|---|---|---|
| 1 | Cube face side | 0.60 | [m] |
| 2 | ArUco marker side | 0.118 | [m] |
| 3 | ChArUco square side | 0.148 | [m] |
| 4 | Origin offset on x-axis | 0.075 | [m] |
| 5 | Origin offset on y-axis | 0.075 | [m] |
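As a rough aid for reproducing the target faces, the sketch below builds one 3×3 ChArUco board with the square and marker sides listed above, using OpenCV's aruco module. The dictionary choice, print resolution, and the OpenCV ≥ 4.7 object-oriented API are assumptions; the exact marker IDs used on each face of the paper's target are not reproduced here.

```python
# Sketch: generating and detecting one face of the target as a 3x3 ChArUco board.
# Square/marker sides come from the table above; the dictionary is an assumption.
import cv2

SQUARE_SIDE = 0.148   # [m] ChArUco (chessboard) square side
MARKER_SIDE = 0.118   # [m] ArUco marker side

dictionary = cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_4X4_50)
board = cv2.aruco.CharucoBoard((3, 3), SQUARE_SIDE, MARKER_SIDE, dictionary)

# Render a printable image of the board (pixel size only controls print resolution).
img = board.generateImage((1200, 1200), marginSize=40)
cv2.imwrite('charuco_face.png', img)

# Detection side: extract the 2D ChArUco corners {a_i} from an RGB frame.
detector = cv2.aruco.CharucoDetector(board)
frame = cv2.imread('charuco_face.png')
charuco_corners, charuco_ids, marker_corners, marker_ids = detector.detectBoard(frame)
print(None if charuco_ids is None else len(charuco_ids), 'ChArUco corners detected')
```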
Here you can find a comprehensive thesis document that provides valuable insights into this work. While the thesis covers a broader range of topics, it treats them at a more foundational and less elaborate level than the analysis presented in this paper.
Download resource