
Table 2 Summary of recent robust SLAM systems

From: A survey: which features are required for dynamic visual simultaneous localization and mapping?

Column groups: System properties (Backbone, CT, Env); Implementation details (MS, HE); Practical consideration (P/S, BI, OH, LC, HL).

| References | Backbone | CT | Env | MS | HE | P/S | BI | OH | LC | HL |
|---|---|---|---|---|---|---|---|---|---|---|
| **Low-level-feature-based SLAM (Robust SLAM section)** | | | | | | | | | | |
| *Point-based or pixel-patch-based SLAM* | | | | | | | | | | |
| Yang et al. [20] | ORB-SLAM2 [21] | D | I | RE | – | – | – | – | – | – |
| Du et al. [22] | ORB-SLAM2 | D | I | E + RE | – | √ | – | – | √ | – |
| Zhang et al. [23] | – | D | I | OF + DI | – | √ | √ | – | – | – |
| Tan et al. [24] | PTAM [6] | M | I | RE | – | – | – | √ | – | – |
| *Point-line-based SLAM* | | | | | | | | | | |
| Zhang et al. [25] | – | D | I | 3DE | – | √ | – | – | √ | √ |
| **Using high-level features as semantic priors in low-level-feature-based SLAM (section of the same name)** | | | | | | | | | | |
| *Point-based SLAM* | | | | | | | | | | |
| Bescos et al. [26] | ORB-SLAM2 | M, S, D | I, O | SI + DI | S [27] | – | √ | √ | √ | – |
| Yu et al. [28] | ORB-SLAM2 | D | I | SI + E | S [29] | – | – | – | – | – |
| Cui and Ma [30] | ORB-SLAM2 | D | I | SI + E | S [29] | – | – | – | – | – |
| Han and Xi [31] | ORB-SLAM2 | D | I | SI + OF | S [32] | – | – | – | – | – |
| Long et al. [33] | ORB-SLAM2 | D | I, O | SI + DI | S [32] | – | √ | – | – | – |
| Ai et al. [34] | ORB-SLAM2 | S, D | I, O | SI | O [35] | √ | – | – | √ | – |
| Xiao et al. [36] | ORB-SLAM2 | M | I, O | SI + RE | O [37] | √ | – | – | √ | – |
| Brasch et al. [38] | ORB-SLAM [39] | M | O | SI + T | S [40] | √ | – | – | √ | – |
| *Point-line-based SLAM* | | | | | | | | | | |
| Zhang et al. [41] | – | D | I | SI + DI + E* | O [42] | – | – | – | – | √ |
| **Using high-level features in object SLAM (section of the same name)** | | | | | | | | | | |
| Yang and Scherer [14] | – | M | I, O | E | O [43] | – | – | – | – | √ |

  1. System properties: the backbone of the system (Backbone); camera type (CT): RGB-D (D), monocular (M), stereo (S); environment (Env): indoor (I), outdoor (O). Implementation details: method of motion segmentation (MS): reprojection error (RE), epipolar constraint (E), distance between matched and predicted 3D landmarks (3DE), semantic information (SI), depth information (DI), optical flow (OF), triangulation (T); high-level feature extractor (HE): semantic segmentation network (S), object detection network (O). Practical consideration: uses a probabilistic model or dynamic score (weight) to judge dynamic features (P/S); long-term consistency (LC); handles low-texture man-made scenes with few static point features (HL). *The epipolar constraint is applied only to point features.
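Two of the geometric motion-segmentation cues in the MS column, the epipolar constraint (E) and reprojection error (RE), plus a simple per-feature dynamic score in the spirit of the P/S column, can be illustrated with a short sketch. It is a minimal example under stated assumptions, not the method of any surveyed system: it assumes a zero-skew pinhole camera, a known fundamental matrix F with the convention x2ᵀ F x1 = 0 (for E), a known world-to-camera pose (R, t) and landmark X (for RE), and user-chosen pixel thresholds. The function names, the exponential-moving-average score, and its `alpha` and `cutoff` parameters are hypothetical.

```python
import math
from typing import Sequence, Tuple

Vec2 = Tuple[float, float]
Vec3 = Tuple[float, float, float]
Mat3 = Sequence[Sequence[float]]


def epipolar_distance(F: Mat3, p1: Vec2, p2: Vec2) -> float:
    """Distance from p2 (image 2) to the epipolar line l2 = F @ [p1, 1] (E cue)."""
    x1 = (p1[0], p1[1], 1.0)
    a, b, c = [sum(F[r][k] * x1[k] for k in range(3)) for r in range(3)]
    norm = math.hypot(a, b)
    if norm == 0.0:
        # p1 lies at the epipole: the constraint carries no information here.
        return 0.0
    return abs(a * p2[0] + b * p2[1] + c) / norm


def reprojection_error(K: Mat3, R: Mat3, t: Vec3, X: Vec3, observed: Vec2) -> float:
    """Pixel distance between an observed keypoint and landmark X projected with (R, t) (RE cue)."""
    # World-to-camera transform: Xc = R X + t.
    Xc = [sum(R[r][k] * X[k] for k in range(3)) + t[r] for r in range(3)]
    if Xc[2] <= 0:
        raise ValueError("landmark is behind the camera")
    # Zero-skew pinhole projection (assumption).
    u = K[0][0] * Xc[0] / Xc[2] + K[0][2]
    v = K[1][1] * Xc[1] / Xc[2] + K[1][2]
    return math.hypot(u - observed[0], v - observed[1])


def update_dynamic_score(prev: float, residual: float, threshold: float,
                         alpha: float = 0.7) -> float:
    """Hypothetical dynamic score: moving average of per-frame 'residual too large' votes."""
    vote = 1.0 if residual > threshold else 0.0
    return alpha * prev + (1.0 - alpha) * vote


def is_dynamic(score: float, cutoff: float = 0.5) -> bool:
    """Label a feature dynamic once its accumulated score passes the cutoff."""
    return score > cutoff


if __name__ == "__main__":
    # Pure translation t = (1, 0, 0) with K = I, so F = E = [t]_x.
    F = [[0, 0, 0], [0, 0, -1], [0, 1, 0]]
    print(epipolar_distance(F, (0.0, 0.0), (0.2, 0.0)))  # static match: prints 0.0
    print(epipolar_distance(F, (0.0, 0.0), (0.2, 0.1)))  # off the epipolar line: prints 0.1
```

The epipolar test needs only 2D matches and F, whereas the reprojection test needs a pose and a 3D landmark. A point that moves along its own epipolar line passes the epipolar test, which is one reason why several systems in the table combine cues (for example E + RE or SI + E). Accumulating votes over frames, rather than thresholding a single residual, reduces the effect of one noisy match.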