Molecular surface definitions and meshing tools
Definitions for molecular surface
The molecular surface can be defined in several ways. The most widely used definitions include the van der Waals surface (VDWs) [29], the solvent accessible surface (SAS) [30], the solvent excluded surface (SES) [31], the minimal molecular surface (MMS) [32], the molecular skin surface [33] and the Gaussian surface [34].
The VDWs is defined as the boundary of the union of spheres centered at the atoms of the molecule, each with the atom's VDW radius. The VDWs makes it convenient to calculate the molecular surface area and the surface normal, but it contains many small pores that are too narrow to admit water molecules.
The solvent accessible surface [30], also known as the Lee-Richards molecular surface, is the trace of the centers of probe spheres rolling over the VDWs. The probe radius is typically 1.4 Å, approximately the radius of a water molecule. Geometrically, the SAS is therefore equivalent to the VDWs obtained by increasing each VDW radius by the probe radius. The SES (also known as the “molecular surface” or “Connolly surface”) is the surface traced by the inward-facing surface of the probe sphere [31].
Compared with the VDWs, the SES contains fewer cracks and surface dimples. In short, the SAS and SES are the traces of the center and of the inward-facing boundary, respectively, of a probe rolling on the VDWs. Figure 2 shows an illustration of the VDWs, SAS and SES.
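The SAS definition lends itself to a simple numerical check. The sketch below estimates the SAS area with the Shrake-Rupley point-sampling technique, a standard method that is not one of the meshing tools discussed in this review: each atom sphere is inflated by the probe radius, sampled with n points, and only points not buried inside another inflated sphere count as exposed. The function names, the default probe radius of 1.4 Å and the sample count are illustrative assumptions.

```python
import math


def sphere_points(n):
    """Return n nearly uniform unit vectors (golden-spiral lattice)."""
    golden = math.pi * (3.0 - math.sqrt(5.0))  # golden angle in radians
    pts = []
    for k in range(n):
        z = 1.0 - 2.0 * (k + 0.5) / n   # evenly spaced heights in (-1, 1)
        rho = math.sqrt(1.0 - z * z)    # radius of the horizontal circle at height z
        pts.append((rho * math.cos(golden * k), rho * math.sin(golden * k), z))
    return pts


def sas_area(centers, radii, probe=1.4, n=960):
    """Shrake-Rupley estimate of the SAS area: sample each probe-inflated sphere
    and count the sample points that are not inside any other inflated sphere."""
    unit = sphere_points(n)
    inflated = [r + probe for r in radii]  # SAS = VDWs with radii increased by the probe
    total = 0.0
    for i, (ci, Ri) in enumerate(zip(centers, inflated)):
        exposed = 0
        for u in unit:
            p = tuple(c + Ri * uk for c, uk in zip(ci, u))
            buried = any(
                j != i and math.dist(p, cj) < Rj
                for j, (cj, Rj) in enumerate(zip(centers, inflated))
            )
            if not buried:
                exposed += 1
        total += 4.0 * math.pi * Ri * Ri * exposed / n  # exposed fraction of sphere i
    return total
```

For an isolated atom every sample point is exposed, so the estimate equals the exact sphere area 4π(r + 1.4)²; for overlapping atoms the accuracy improves as n grows.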
The skin surface is defined by a set of weighted points representing the atoms and a scalar, called the shrink factor, that controls the hyperboloidal connections between neighboring spheres. The skin surface is smoother than the VDWs and is tangent-continuous [33]. For the MMS, Wei et al. [32] constructed a surface-based energy function and used minimization and iso-surface extraction to obtain the so-called minimal molecular surface.
The MMS [32] is defined as the surface of minimal area among all surfaces that enclose the VDW spheres of the atoms of a protein. Based on differential geometry, Wei et al. proposed that the MMS can be determined by solving the mean curvature flow equation with constraints:
$$ \frac{\partial S}{\partial t}=\left\Vert \nabla S\right\Vert \nabla \cdot \left(\frac{\nabla S}{\left\Vert \nabla S\right\Vert}\right) $$
(1)
where S(x, y, z, t) is the hypersurface function.
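As a worked step (a standard vector-calculus identity, not taken from [32]), the right-hand side of Eq. (1) can be expanded with the product rule. Writing \( {H}_S \) for the Hessian of S and using \( \nabla \left\Vert \nabla S\right\Vert ={H}_S\nabla S/\left\Vert \nabla S\right\Vert \):

$$ \nabla \cdot \left(\frac{\nabla S}{\left\Vert \nabla S\right\Vert}\right)=\frac{\Delta S}{\left\Vert \nabla S\right\Vert}-\frac{\nabla S\cdot \nabla \left\Vert \nabla S\right\Vert }{{\left\Vert \nabla S\right\Vert}^2}=\frac{\Delta S}{\left\Vert \nabla S\right\Vert}-\frac{{\left(\nabla S\right)}^T{H}_S\nabla S}{{\left\Vert \nabla S\right\Vert}^3} $$

$$ \left\Vert \nabla S\right\Vert \nabla \cdot \left(\frac{\nabla S}{\left\Vert \nabla S\right\Vert}\right)=\Delta S-\frac{{\left(\nabla S\right)}^T{H}_S\nabla S}{{\left\Vert \nabla S\right\Vert}^2} $$

The result is the Laplacian of S minus its second derivative along the normal direction \( \nabla S/\left\Vert \nabla S\right\Vert \), so Eq. (1) diffuses S only tangentially to its level sets, which is the mechanism by which the flow reduces surface area.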
Unlike the definitions above, the Gaussian surface [34] is defined as a level set of a sum of Gaussian kernel functions:
$$ \left\{\overrightarrow{x}\in {\mathbb{R}}^3:\ \phi \left(\overrightarrow{x}\right)=c\right\} $$
(2)
where
$$ \phi \left(\overrightarrow{x}\right)=\sum \limits_{i=1}^N{e}^{-d\left({\left\Vert \overrightarrow{x}-{\overrightarrow{x}}_i\right\Vert}^2-{r}_i^2\right)} $$
(3)
The parameter d is positive and controls the decay speed of the kernel functions; \( {\overrightarrow{x}}_i \) and \( {r}_i \) are the location and radius of atom i, respectively; c and d are usually set to 1 and 0.5, respectively. The parameters c and d can be chosen such that the Gaussian surface approximates the SES, SAS and VDWs well [35, 36]. Compared with the other definitions, the Gaussian surface has the following advantages.
1. The Gaussian surface is smoother: since φ in Eq. (3) is infinitely differentiable, the surface is smooth wherever ∇φ ≠ 0.
2. The Gaussian surface provides a more realistic representation of the electron density of a molecule than other molecular surface definitions [35].
3. The Gaussian surface is well established [37,38,39,40] and has a wide range of applications in computational biology, such as docking [41], molecular shape comparison [42], calculation of SAS areas [43] and generalized Born models [44].
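To make Eqs. (2) and (3) concrete, the following minimal sketch evaluates φ and classifies a point against the level set φ = c, assuming the default values c = 1 and d = 0.5 quoted above; the helper names are hypothetical. For a single atom, φ equals 1 exactly at distance r_i from the center, so with c = 1 the Gaussian surface of an isolated atom coincides with its VDW sphere.

```python
import math


def gaussian_phi(x, centers, radii, d=0.5):
    """Evaluate phi(x) from Eq. (3): sum over atoms of exp(-d(|x - x_i|^2 - r_i^2))."""
    total = 0.0
    for xi, ri in zip(centers, radii):
        dist2 = sum((a - b) ** 2 for a, b in zip(x, xi))
        total += math.exp(-d * (dist2 - ri * ri))
    return total


def classify(x, centers, radii, c=1.0, d=0.5):
    """Return 'inside', 'outside' or 'surface' relative to the level set phi = c."""
    v = gaussian_phi(x, centers, radii, d)
    if math.isclose(v, c, rel_tol=1e-12):
        return "surface"
    return "inside" if v > c else "outside"
```

For two touching atoms, the midpoint between them lies on both VDW spheres, yet there φ = 2 > c, so it is classified as inside: the summed kernels make the Gaussian surface bulge across the gap, which is the smoothing effect described above.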
Advances in biomolecular surface mesh generation tools
With the various definitions of molecular surface proposed, numerous works have been devoted to computing molecular surfaces. A biomolecular surface mesh is a discrete representation of a biomolecular surface and has a wide range of applications in visualization, geometric computation and the numerical solution of implicit solvent models. The development of mathematical modeling and numerical simulation of biomolecular systems, especially the solution of implicit solvent models, imposes new requirements on biomolecular surface meshes, such as high quality, efficiency and stability.
As a variety of molecular surface definitions have been put forward, many algorithms for computing molecular surface meshes have emerged, as described below. In 1983, Connolly [45, 46] proposed an algorithm for calculating the SAS and SES analytically. He divided the molecular surface into three kinds of patches: convex spherical patches, saddle-shaped toroidal patches and concave spherical triangles. These patches are distinguished by the number of atoms the probe touches simultaneously (one, two and three, respectively). In 1995, Nicholls presented GRASP [23], a popular software for visualizing molecular surfaces. In 1996, Sanner et al. [47] proposed MSMS, an algorithm for generating triangular meshes based on a “reduced surface”. MSMS consists of four steps. First, it computes the reduced surface of the atoms. Second, it constructs an analytical representation of the SES from the reduced surface. Third, it handles the singularities created in the second step. Finally, it triangulates the SES. Because of its high efficiency, MSMS is one of the most widely used programs for molecular surface triangulation. In the following year, Vorobjev et al. [48] introduced SIMS, a new method for calculating the SES that eliminates self-intersecting parts and smooths singular regions of the SES. Ryu et al. [49] proposed a method based on beta-shapes, which are a generalization of alpha-shapes [50]. In 2006, Can et al. [51] developed LSMS to generate the SES on grid points using level-set algorithms; the software uses the fast marching method to reach the molecular surface by propagating an initial seed surface. Yu et al. [39] designed GAMer, a tool for mesh generation and quality improvement on the Gaussian surface. In 2008, Bajaj et al. [52] implemented Molsurf to generate meshes on various types of molecular surfaces using high-order level-set methods. In 2009, Zhang et al. [53] presented EDTsurf, a tool for mesh generation of the VDWs, SAS and SES based on the LSMS algorithm. Chavent et al. [54] applied a ray-casting algorithm to visualize the molecular skin surface. In 2011, Chen et al. proposed TMSmesh [37, 55], a Gaussian surface meshing software for arbitrary macromolecules.
In 2013, Decherchi et al. [56] presented NanoShaper, a ray-casting-based software that can generate surface meshes for the SES, the molecular skin surface and the Gaussian surface. NanoShaper consists of five main parts [56]: 1) surface build-up, where the shape of the surface is computed, analytically if possible; 2) ray casting, where grid-consistent rays are cast, their intersections with the surface are collected, and the enclosed volume is estimated; 3) cavity detection, where the identified cavities are optionally removed according to their volume or shape; 4) Marching Cubes, where the surface is triangulated consistently with the preceding cavity detection/removal and the surface area is calculated; and 5) projection, where a subset of the grid points is projected onto the surface. Steps 1), 2) and 4) are used for surfacing, and steps 2) and 4) for triangular mesh generation. The surface is assumed to be a manifold. In the same year, Liao et al. [38] proposed a new mesh generation algorithm with multi-core CPU and GPU acceleration.
In 2015, Liu et al. [34] compared and discussed MSMS, Molsurf, NanoShaper, EDTsurf, TMSmesh and GAMer. These tools are typically successful in computing the surfaces of small and medium-sized biomolecules, but most of them are not applicable to large molecules. In addition, in structural biology and structural bioinformatics, most methods, such as GRASP, MSMS and LSMS, are primarily used for visualizing molecules, and the quality of the meshes they generate does not meet the standard required for numerical simulation. NanoShaper can quickly generate various surface meshes for biomolecules, but for molecules with complex shapes the resulting meshes are not guaranteed to be manifold. TMSmesh [37] can produce high-quality meshes for biological macromolecules, but its efficiency needed further improvement. To this end, Liu et al. proposed an improved version, TMSmesh 2.0 [55]. Their results show that TMSmesh 2.0 is robust and efficient, and more than 30 times faster than the previous version.
TMSmesh 2.0 [57] is at least 30 times faster than the original TMSmesh for the following reasons. First, a new adaptive partition process for locating the surface reduces the number of surface-intersecting cubes: instead of the same-sized cubes used in the previous method, cubes of different sizes are used according to the approximation accuracy of the piecewise trilinear surface, so fewer cubes are needed to locate the surface precisely.
Second, a more efficient and much tighter bound estimator for the sum of Gaussian kernels over a cube is adopted, as shown in Fig. 3.
Third, trilinear polynomials are used to approximate the surface, which reduces the computational cost significantly. For a trilinear surface (see Fig. 4), the surface points and fold curves can be computed explicitly, and the fold curves are straight lines, which makes the tracing process easier.
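The role of a bound estimator in the cube-based search can be illustrated with a simple interval bound. The sketch below is not the sharper estimator used in TMSmesh 2.0 (see [57] and Fig. 3); it is an assumed, deliberately crude bound: because each kernel in Eq. (3) decreases with distance when d > 0, evaluating it at the nearest and farthest points of an axis-aligned cube brackets its values in that cube, and a cube can be discarded whenever c lies outside the summed bracket. The function names are hypothetical.

```python
import math


def kernel_bounds_in_cube(lo, hi, center, r, d=0.5):
    """Bounds of exp(-d(|x - center|^2 - r^2)) over the box [lo, hi] (requires d > 0).
    The kernel decreases with distance, so the nearest box point gives the upper
    bound and the farthest corner gives the lower bound."""
    near2 = sum((min(max(c, l), h) - c) ** 2 for c, l, h in zip(center, lo, hi))
    far2 = sum(max((c - l) ** 2, (h - c) ** 2) for c, l, h in zip(center, lo, hi))
    return math.exp(-d * (far2 - r * r)), math.exp(-d * (near2 - r * r))


def cube_may_intersect(lo, hi, centers, radii, c=1.0, d=0.5):
    """Conservative test: returns False only if phi = c cannot occur inside the cube."""
    lower = upper = 0.0
    for xi, ri in zip(centers, radii):
        kmin, kmax = kernel_bounds_in_cube(lo, hi, xi, ri, d)
        lower += kmin
        upper += kmax
    # phi is continuous, so if c lies outside [lower, upper] the level set misses the cube
    return lower <= c <= upper
```

Tighter bounds discard more cubes early, which is one reason a sharper estimator speeds up surface location.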
Mesh generation and surface remeshing
Mesh generation
A mesh is a discretization of a geometric domain into small, simple units or elements [58]; triangles and quadrilaterals are the most commonly used elements. This study is primarily concerned with triangular meshing. Quad meshing [59,60,61,62] and hexahedral meshing [63] are also important and involve further challenges due to their computational complexity. For example, a wave-based method [64] remeshes a given surface into anisotropically sized quads; this method preserves symmetric structures and supports anisotropic/isotropic size control. Geometric objects are typically converted to meshes for efficient rendering and the numerical solution of partial differential equations. Therefore, mesh generation is an essential step in most geometry processing applications [65].
Challenges in surface remeshing
Surface remeshing approaches pursue different goals. However, the following challenges are commonly considered when analyzing the applicability and validity of a specific approach [66, 67].
Geometric fidelity
The output mesh is typically required to approximate the geometry of the input mesh as closely as possible. The approximation error is computed numerically to assess geometric fidelity, typically using the Hausdorff distance between the original input mesh and the improved one. A number of studies [68,69,70] have focused on computing the Hausdorff distance. The Hausdorff distance can be one-sided, from the input mesh Mi to the output mesh Mf, as in Eq. (4), or two-sided, as in Eq. (6) [66].
$$ {d}_h\left({M}_i,{M}_f\right)=\underset{p\in {M}_i}{\max }\ d\left(p,{M}_f\right) $$
(4)
where d(p, q) is the Euclidean distance between two points p and q in 3D space, and d(p, M) is the distance from point p to surface M, i.e., the distance from p to the nearest point of M, as given in Eq. (5)
$$ d\left(p,M\right)=\underset{q\in M}{\min }d\left(p,q\right) $$
(5)
Note that the one-sided Hausdorff distance is not symmetric: in general, \( {d}_h\left({M}_i,{M}_f\right)\ne {d}_h\left({M}_f,{M}_i\right) \). The two-sided Hausdorff distance is given in Eq. (6) [66].
$$ {d}_H\left({M}_i,{M}_f\right)=\max \left\{{d}_h\left({M}_i,{M}_f\right),{d}_h\left({M}_f,{M}_i\right)\right\} $$
(6)
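A brute-force sketch of Eqs. (4) to (6), assuming each mesh is represented only by a finite set of sample points (e.g., its vertices); the function names are hypothetical. This is an approximation: exact mesh-to-mesh Hausdorff distances also need point-to-triangle distances, and practical tools use spatial acceleration structures instead of the O(|A||B|) loops used here.

```python
import math


def one_sided_hausdorff(A, B):
    """d_h(A, B), Eqs. (4)-(5): max over p in A of the distance to the nearest q in B."""
    return max(min(math.dist(p, q) for q in B) for p in A)


def hausdorff(A, B):
    """Two-sided Hausdorff distance d_H(A, B), Eq. (6)."""
    return max(one_sided_hausdorff(A, B), one_sided_hausdorff(B, A))
```

With A = {(0,0,0), (1,0,0)} and B = A ∪ {(5,0,0)}, d_h(A, B) = 0 but d_h(B, A) = 4, illustrating the asymmetry of the one-sided distance.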
Element quality
Improving the quality of mesh elements (edges, vertices, triangles) is an important goal of surface remeshing. Regular vertices are typically preferred, and short edges and triangles with very small or very large angles are avoided to improve the efficiency of numerical simulations [71]. Similarly, triangles whose aspect ratios deviate strongly from that of an equilateral triangle are avoided [67, 72].
Validity and complexity
The output mesh should be valid, i.e., closed and a simple manifold [67]. The mesh complexity is typically measured as the number of elements. This number should be as small as possible, yet large enough to ensure mesh quality and geometric fidelity [66].
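Part of the validity requirement can be checked combinatorially. The sketch below, with a hypothetical function name, assumes the mesh is given as vertex-index triples and tests two conditions: every edge is shared by exactly two triangles (closed, no non-manifold edges), and the triangles around each vertex form a single fan (no pinched vertices). It does not detect geometric self-intersections or inconsistent orientation, which require separate tests.

```python
from collections import Counter, defaultdict


def is_closed_manifold(triangles):
    """Combinatorial check for a closed 2-manifold triangle mesh."""
    # Edge condition: each undirected edge must belong to exactly two triangles.
    edge_count = Counter()
    for a, b, c in triangles:
        for u, v in ((a, b), (b, c), (c, a)):
            edge_count[frozenset((u, v))] += 1
    if any(n != 2 for n in edge_count.values()):
        return False

    # Vertex condition: the edges opposite each vertex (its link) must be connected.
    link = defaultdict(list)
    for a, b, c in triangles:
        link[a].append((b, c))
        link[b].append((c, a))
        link[c].append((a, b))
    for edges in link.values():
        adj = defaultdict(set)
        for u, w in edges:
            adj[u].add(w)
            adj[w].add(u)
        start = next(iter(adj))
        seen, stack = {start}, [start]
        while stack:  # depth-first traversal of the link graph
            x = stack.pop()
            for y in adj[x]:
                if y not in seen:
                    seen.add(y)
                    stack.append(y)
        if len(seen) != len(adj):  # several fans meet at this vertex
            return False
    return True
```

A tetrahedron passes; removing one face leaves boundary edges, and two tetrahedra glued at a single vertex pass the edge test but fail the vertex test.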
Input mesh: uncertainties and defects
Prior to surface remeshing, the input mesh is generated with a method suited to the given application. To ensure a good approximation, the input mesh often has high complexity, e.g., when it is produced by 3D scanners [67]. Similarly, meshes of molecular surfaces generated with TMSmesh [37, 55] contain irregular elements, redundant vertices and complex geometries owing to the irregular shapes of molecular surfaces. Surface remeshing is therefore even more challenging when the input mesh has many defects and complex structures.
Meshing quality measurements
The main objective of quality meshing is to improve various quality measures. The measures used in previous papers [55, 65, 66, 73] are described below. The quality of a triangle t, used for mesh quality analysis, is given as
$$ Q(t)=\frac{6}{\sqrt{3}}\frac{A_t}{p_t{h}_t} $$
where \( {A}_t \) is the area of triangle t, \( {p}_t \) is its half-perimeter, and \( {h}_t \) is the length of its longest edge [74]; Q(t) equals 1 for an equilateral triangle and tends to 0 as the triangle degenerates. Typically, \( {Q}_{\mathrm{min}} \) (minimum quality) and \( {Q}_{\mathrm{avg}} \) (average quality) are used to analyze meshing results. Similarly, \( {\theta}_{\mathrm{min}} \) (minimal angle) and \( {\theta}_{\mathrm{max}} \) (maximal angle) are used for comparison. In addition, \( {\overline{\theta}}_{\mathrm{min}} \), the average of the minimum angles over all triangles, and the percentages of triangles with small and large angles are used. Area and volume preservation are also used in some applications [73, 75]. For feature preservation, the Hausdorff distance is also used in the analysis of results [66, 76]. The improvement in mesh regularity, i.e., the ratio of regular vertices, is also considered. An interior vertex is regular if its valence is six, and a boundary vertex is regular if its valence is four. Furthermore, the minimum and maximum aspect ratios are computed using Eq. (7).
$$ AR=\frac{abc}{8\left(S-a\right)\left(S-b\right)\left(S-c\right)} $$
(7)
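where a, b and c are the lengths of the triangle's edges and S = (a + b + c)/2. AR equals the ratio of the circumradius to twice the inradius; it is 1 for an equilateral triangle and grows without bound as the triangle degenerates.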
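The per-triangle measures above can be computed from the three edge lengths. The following minimal sketch (hypothetical function name) returns Q(t), the minimal and maximal angles and the AR of Eq. (7); it uses Heron's formula for the area and the law of cosines for the angles, and assumes the three vertices are pairwise distinct.

```python
import math


def triangle_measures(p, q, r):
    """Return (Q, min_angle_deg, max_angle_deg, AR) for the triangle p, q, r."""
    a, b, c = math.dist(q, r), math.dist(r, p), math.dist(p, q)  # a is opposite p, etc.
    s = (a + b + c) / 2.0                                 # half-perimeter (p_t, and S in Eq. (7))
    area = math.sqrt(max(s * (s - a) * (s - b) * (s - c), 0.0))  # Heron's formula
    h = max(a, b, c)                                      # longest edge h_t
    quality = 6.0 / math.sqrt(3.0) * area / (s * h)      # Q(t)

    def angle(opp, x, y):
        # Law of cosines, clamped against round-off before acos.
        cos_t = (x * x + y * y - opp * opp) / (2.0 * x * y)
        return math.degrees(math.acos(min(1.0, max(-1.0, cos_t))))

    angles = [angle(a, b, c), angle(b, c, a), angle(c, a, b)]
    denom = 8.0 * (s - a) * (s - b) * (s - c)
    ar = a * b * c / denom if denom > 0 else math.inf    # Eq. (7); infinite if degenerate
    return quality, min(angles), max(angles), ar
```

Mesh-level values such as Qmin, Qavg, θmin and θmax then follow by taking the minimum, mean or maximum over all triangles.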
Molecular surface remeshing
Molecular surface mesh generation pipeline
A benchmark set of molecular structures in PDB (Protein Data Bank) and PQR formats, used in the previous TMSmesh tests, is available at http://lsec.cc.ac.cn/~lubz/Meshing.html. In the PQR format, the occupancy and temperature factor columns of a PDB file are replaced with the charge Q and the radius R, respectively. These files are compatible with several popular computational biology tools [77]. TMSmesh [37, 57] uses the PQR files for surface mesh generation. The surface mesh generated by TMSmesh 2.0 typically contains a number of zero-degree angles and redundant vertices, which require further refinement. For example, SMOPT, ISO2mesh or the Taubin method [78] can be used for mesh improvement at this stage.
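Since only the atom coordinates and radii from a PQR file enter Eq. (3), reading them is straightforward. The sketch below assumes whitespace-delimited records whose last five fields are x, y, z, charge and radius, as in typical PDB2PQR output; fixed-column PDB-style files in which adjacent fields run together would need column-based parsing instead. The function name is hypothetical and is not the reader used by TMSmesh.

```python
def read_pqr(lines):
    """Parse ATOM/HETATM records of a whitespace-delimited PQR file.
    Returns a list of (x, y, z, charge, radius) tuples. Fields are read from the
    end of each record, so an absent chain ID does not shift the values."""
    atoms = []
    for line in lines:
        fields = line.split()
        if not fields or fields[0] not in ("ATOM", "HETATM"):
            continue  # skip REMARK, TER, END and blank lines
        x, y, z, charge, radius = map(float, fields[-5:])
        atoms.append((x, y, z, charge, radius))
    return atoms
```

In practice the lines would come from an open file handle, e.g. `read_pqr(open(path))`.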
State-of-the-art methods in molecular surface remeshing
In computer graphics, researchers have presented many surface remeshing methods, including mesh-simplification-based methods [79, 80], Delaunay insertion methods [81], advancing-front-based methods [82], field-based approaches [83, 84], and local-operator-based mesh optimization [85, 86]. Global optimization methods are also widely used, including parameterization-based methods [87, 88], discrete clustering [89], and direct 3D optimization [90,91,92,93]. Furthermore, segmentation-based meshing partitions the input mesh into segments prior to remeshing, which facilitates sharp feature preservation [94, 95]. For implicit feature preservation, several efficient feature functions have been proposed [66, 89]. Laplacian smoothing [96] is the simplest method: it moves each vertex to the centroid of its neighbors. Equation (8) computes the new position vf of a free vertex vi as the mean of the positions of the n vertices q1, q2, q3, ⋯, qn in its one-ring neighborhood.
$$ {v}_f=\frac{1}{n}\sum \limits_{j=1}^n{q}_j $$
(8)
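A minimal sketch of Eq. (8), assuming the mesh is given as a list of vertex positions and a list of one-ring neighbor indices; vertices listed in `fixed` (e.g., boundary vertices) are not moved, matching the notion of a free vertex. All positions are updated simultaneously from the previous iterate. The names and the data layout are illustrative assumptions.

```python
def laplacian_smooth(positions, neighbors, fixed=frozenset(), iterations=1):
    """Eq. (8): move every free vertex to the centroid of its one-ring neighbors.
    positions: list of (x, y, z); neighbors: list of neighbor-index lists;
    fixed: indices of vertices that must not move."""
    pos = [tuple(p) for p in positions]
    for _ in range(iterations):
        new = list(pos)  # simultaneous update: every vertex reads the old positions
        for i, ring in enumerate(neighbors):
            if i in fixed or not ring:
                continue
            n = len(ring)
            new[i] = tuple(sum(pos[j][k] for j in ring) / n for k in range(3))
        pos = new
    return pos
```

Repeated application of Eq. (8) shrinks a closed surface toward its centroid, which is the shrinkage problem that Taubin's filter addresses.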
Taubin [78] presented a low-pass filter method that combines two Laplacian-like filters, one with a positive parameter and the other with a negative parameter. The method computes the new position pf from the old position pi using Eq. (9), where the weights ω are positive and sum to one over the neighborhood (e.g., ω = 1/n). In the first step the weighting factor is λ > 0; in the second step it is replaced by μ = −(λ + ε) with a small value ε = 0.02, so that μ is just smaller than −λ. The two factors λ and μ are applied alternately: the λ step smooths (and shrinks) the mesh, and the μ step translates the vertices backward to counteract the shrinkage.
$$ {p}_f={p}_i+\lambda \sum \limits_{j=1}^n\omega \left({q}_j-{p}_i\right) $$
(9)
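The λ|μ scheme can be sketched as follows, assuming uniform weights ω = 1/n (one common choice) and illustrative defaults λ = 0.5 and ε = 0.02; the function name and data layout (vertex positions plus one-ring neighbor lists) are hypothetical.

```python
def taubin_smooth(positions, neighbors, lam=0.5, eps=0.02, iterations=10):
    """Taubin lambda|mu smoothing based on Eq. (9) with uniform weights omega = 1/n.
    Each iteration applies a shrinking step with lam > 0 followed by an
    inflating step with mu = -(lam + eps)."""
    mu = -(lam + eps)

    def step(pos, factor):
        new = list(pos)
        for i, ring in enumerate(neighbors):
            if not ring:
                continue
            w = 1.0 / len(ring)  # uniform weights summing to one
            new[i] = tuple(
                pos[i][k] + factor * sum(w * (pos[j][k] - pos[i][k]) for j in ring)
                for k in range(3)
            )
        return new

    pos = [tuple(p) for p in positions]
    for _ in range(iterations):
        pos = step(pos, lam)  # low-pass step: moves vertices toward the neighbor centroid
        pos = step(pos, mu)   # backward step: pushes them out again, counteracting shrinkage
    return pos
```

On a unit octahedron, one λ step followed by one μ step scales the shape by (1 − λ)(1 − μ) = 0.5 × 1.52 = 0.76, whereas two plain λ steps would scale it by 0.25, so the backward step substantially reduces shrinkage.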
In the field of molecular modeling, Decherchi and Rocchia [97] used the ray-casting method to triangulate complex manifold surfaces in nano-bioscience. They also provided an overview of molecular surfaces in implicit solvent modeling and simulations using the boundary element method (BEM) and the finite element method (FEM). TMSmesh [37, 55] (described in the “Background” section) is a software for generating surface meshes of arbitrary molecules. Its improved version, TMSmesh 2.0, can efficiently generate manifold surface meshes, with shape and feature preservation, for biomolecules with more than one million atoms [34]. A new tool, the Molecular Finite Element Solver (mFES) [98], uses tetrahedral finite elements to calculate the electrostatic potentials of large molecular systems. ISO2mesh [99] is a free MATLAB/Octave-based toolbox that is widely applied for mesh generation and processing. ISO2mesh is generally used to create tetrahedral meshes from surface meshes and from 3D binary and gray-scale volumetric images, such as segmented MRI/CT scans. It is also used for molecular mesh smoothing; however, it cannot handle self-intersecting triangle pairs and small-angle triangles. Liu et al. [75] proposed SMOPT, an algorithm for molecular surface remeshing. It applies local modifications to the mesh to improve mesh quality, eliminate redundant vertices, avoid non-manifold errors and remove intersecting triangles. For mesh smoothing, SMOPT uses an improved Laplacian smoothing, given in Eq. (10).
$$ {p}_i=\left(1-\beta \right){q}_i+\frac{\beta }{N_i}\sum \limits_{j=1}^{N_i}{q}_j $$
(10)
where qi and pi are the old and new positions of the ith vertex, β ∈ (0, 1) controls the rate of smoothing, Ni is the number of vertices in the one-ring of the ith vertex, and qj is the jth adjacent vertex in that one-ring. SMOPT significantly improves mesh quality. However, some very small angles remain and degrade triangle quality. In our recent paper [73], we used a cut-and-fill strategy to remove invalid regions and refill the resulting holes. In addition, we used local operators to refine the neighborhoods of small triangles. This method significantly improved the minimal/maximal angles, aspect ratios and other meshing quality measures. However, further improvements, such as non-obtuse remeshing and better adaptive density, are possible.
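A sketch of a single pass of Eq. (10), assuming the mesh is given as a list of vertex positions and a list of one-ring neighbor indices; the function name is hypothetical. It reproduces only the smoothing formula, not SMOPT's local modifications such as redundant-vertex removal or intersection repair. Setting β = 1 recovers plain Laplacian smoothing, Eq. (8), so the function also accepts that value for comparison.

```python
def damped_laplacian_step(positions, neighbors, beta=0.5):
    """One pass of Eq. (10): p_i = (1 - beta) q_i + (beta / N_i) * sum_j q_j.
    beta controls the smoothing rate; beta = 1 reduces to Eq. (8)."""
    if not 0.0 < beta <= 1.0:
        raise ValueError("beta should lie in (0, 1]")
    new = []
    for i, ring in enumerate(neighbors):
        q_i = positions[i]
        if not ring:
            new.append(tuple(q_i))  # isolated vertex: leave unchanged
            continue
        n_i = len(ring)
        centroid = [sum(positions[j][k] for j in ring) / n_i for k in range(3)]
        new.append(tuple((1.0 - beta) * q_i[k] + beta * centroid[k] for k in range(3)))
    return new
```

Smaller β moves each vertex only part of the way toward its one-ring centroid, which makes the smoothing gentler than Eq. (8).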