A survey: which features are required for dynamic visual simultaneous localization and mapping?

Visual Computing for Industry, Biomedicine, and Art

Table 2 Summary of recent robust SLAM systems

References	System properties			Implementation details		Practical consideration
References	Backbone	CT	Env	MS	HE	P/S	BI	OH	LC	HL
Low-level-feature-based SLAM (Robust SLAM section)
Point-based or pixel-patch-based SLAM
Yang et al. [20]	ORB-SLAM2 [21]	D	I	RE	–	–	–	–	–	–
Du et al. [22]	ORB-SLAM2	D	I	E + RE	–	√	–	–	√	–
Zhang et al. [23]	–	D	I	OF + DI	–	√	√	–	–	–
Tan et al. [24]	PTAM [6]	M	I	RE	–	–	–	√	–	–
Point-line-based SLAM
Zhang et al. [25]	–	D	I	3DE	–	√	–	–	√	√
Using high-level features as semantic priors in low-level-feature-based SLAM (Using high-level features as semantic priors for low-level-feature-based SLAM section)
Point-based SLAM
Bescos et al. [26]	ORB-SLAM2	M, S, D	I, O	SI + DI	S [27]	–	√	√	√	–
Yu et al. [28]	ORB-SLAM2	D	I	SI + E	S [29]	–	–	–	–	–
Cui and Ma [30]	ORB-SLAM2	D	I	SI + E	S [29]	–	–	–	–	–
Han and Xi [31]	ORB-SLAM2	D	I	SI + OF	S [32]	–	–	–	–	–
Long et al. [33]	ORB-SLAM2	D	I, O	SI + DI	S [32]	–	√	–	–	–
Ai et al. [34]	ORB-SLAM2	S, D	I, O	SI	O [35]	√	–	–	√	–
Xiao et al. [36]	ORB-SLAM2	M	I, O	SI + RE	O [37]	√	–	–	√	–
Brasch et al. [38]	ORB-SLAM [39]	M	O	SI + T	S [40]	√	–	–	√	–
Point-line-based SLAM
Zhang et al. [41]	–	D	I	SI + DI + E*	O [42]	–	–	–	–	√
Using high-level features in object SLAM (Using high-level features in object SLAM section)
Yang and Scherer [14]	–	M	I, O	E	O [43]	–	–	–	–	√

System properties: the backbone of the system (Backbone); camera type (CT): RGB-D (D), monocular (M), stereo (S); environment (Env): indoor (I), outdoor (O). Implementation details: method of motion segmentation (MS): reprojection error (RE), epipolar constraint (E), distance between matched and predicted 3D landmarks (3DE), semantic information (SI), depth information (DI), optical flow (OF), triangulation (T); high-level feature extractor (HE): semantic segmentation network (S), object detection network (O). Practical consideration: use of a probabilistic model or dynamic score (weight) to judge dynamic features (P/S); long-term consistency (LC); handling of man-made scenes with low texture or few static point features (HL). *The epipolar constraint is used only on point features.