Machine learning approach for the prediction of macrosomia

Fetal macrosomia is associated with maternal and newborn complications due to incorrect fetal weight estimation or an inappropriate choice of delivery mode. The early screening and evaluation of macrosomia in the third trimester can improve delivery outcomes and reduce complications. However, traditional clinical and ultrasound examinations face difficulties in obtaining accurate fetal measurements during the third trimester of pregnancy. This study aims to develop a comprehensive predictive model for detecting macrosomia using machine learning (ML) algorithms. The accuracy of macrosomia prediction using logistic regression, k-nearest neighbors, support vector machine, random forest (RF), XGBoost, and LightGBM algorithms was explored. Each approach was trained and validated using data from 3244 pregnant women at a hospital in southern China. The information gain method was employed to identify the features most strongly associated with the occurrence of macrosomia. The performance of the six ML algorithms was compared using the recall and area under the curve evaluation metrics. To develop an efficient prediction model, two sets of experiments based on ultrasound examination records within 1-7 days and 8-14 days prior to delivery were conducted. The ensemble model, comprising the RF, XGBoost, and LightGBM algorithms, showed encouraging results. For each experimental group, the proposed ensemble model outperformed other ML approaches and the traditional Hadlock formula. The experimental results indicate that, with the most risk-relevant features, the ML algorithms presented in this study can predict macrosomia and assist obstetricians in selecting more appropriate delivery modes.


Introduction
The term macrosomia describes a newborn with a birth weight of 4000 g or more [1]. Fetal macrosomia can cause multiple maternal and fetal complications. For instance, it can increase the risk of cesarean section for mothers, leading to prolonged labor, labor block, postpartum bleeding, chorioamnionitis, and a higher likelihood of soft birth canal laceration. It also increases the risk of shoulder dystocia, brachial plexus injury, and clavicle fracture in fetuses and newborns. During vaginal delivery, the baseline incidence of shoulder dystocia is 0.2%-3.0%, but when the birth weight reaches 4500 g, the risk of shoulder dystocia dramatically increases to 9%-14% [2]. Given the serious complications involved, timely diagnosis of macrosomia and selection of a more appropriate mode of delivery for pregnant women are clinically significant.
The size and shape of the pelvis are crucial in guiding the choice of delivery mode. Owing to factors such as race, nutrition, genetics, and endocrine influences, the morphological structure of the pelvis varies considerably between Chinese and Western women [3]. The pelvises of Western women, such as those from the United States, are often anthropoid. The anteroposterior diameter of an anthropoid pelvis is larger than its lateral diameter, which is a physiological structure that facilitates spontaneous delivery. Approximately half of the pelvises of Chinese women are gynecoid, characterized by a shallow pelvic cavity [4]. If the fetus is overweight or has a large head, the risk of cesarean section significantly increases [5]. Therefore, screening and evaluation of macrosomia in the third trimester are particularly important for Chinese women, as they can improve delivery outcomes and reduce maternal and fetal complications [6,7]. Among the reported approaches for predicting macrosomia, the two broad categories of screening methods are clinical examination and ultrasound assessment [8,9].
Obstetricians have developed simple formulas to predict fetal weight, which are combined with clinical features, such as maternal abdominal or fundal height [10]. However, factors such as the degree of obesity in pregnant women, abdominal wall thickness, uterine tension, fetal posture, and amniotic fluid depth can lead to significant errors in these formulas when predicting fetal weight, making them insufficiently accurate. Previous studies have indicated that clinical examinations often result in large prediction errors that do not meet clinical requirements [11,12].
With ongoing advancements in ultrasound equipment and technology, the prediction of fetal weight based on ultrasound measurements of various fetal biological features has become widely utilized. Reports suggest that ultrasound examination demonstrates higher accuracy than clinical examination methods [13]. Fetal biparietal diameter (BPD), head circumference (HC), abdominal circumference (AC), and femur length (FL) are the biological parameters most commonly used to obtain the estimated fetal weight (EFW). Siemer et al. [14] compared the accuracy of 11 widely used EFW formulas and found that the estimation of fetal weight based on Hadlock formulas [15] tends to be more accurate, and multiparameter estimation methods can enhance the precision of EFW. However, these formulas, established by Western scholars, may not account for variations among individuals from different ethnic groups [16]. When applying these methods in China, particularly to large or low-weight fetuses, individual differences among populations can result in significant errors. For instance, the birth weights of only 33%-44% of cases with ultrasound-estimated fetal weights over 4500 g can be accurately predicted [9,17-21]. Additionally, maternal parameters and measurement techniques directly affect the accuracy of EFW. Obtaining precise fetal measurements during the third trimester is often challenging [22], and the absolute error tends to increase with higher estimated fetal weights [19,21]. Consequently, there is still room for improvement in the current ultrasound examinations used for fetal weight estimation.
Furthermore, machine learning (ML) technologies have previously shown benefits in numerous application domains, including speech recognition, image processing, facial recognition, and automatic diagnosis [23-26]. Additionally, they have been validated for their precision and value in predicting disease outcomes [27]. Consequently, ML techniques have the potential to enhance the efficiency and rationality of decision-making in the prognosis of macrosomia, ultimately aiming to minimize birth defects.
Recently, ML technologies have been recognized and utilized as tools for predicting birth weight. Akhtar et al. [28] conducted a comprehensive study on predicting large for gestational age (LGA) using ML techniques and proposed a support vector machine (SVM) model with a subset of 30 features as the most effective classification model, achieving a precision score of 85% and area under the curve (AUC) of 72%. Their dataset encompassed 220 pilot counties across all 31 provinces of China from 2010 to 2013 [29]. Ye et al. [30] evaluated and compared the accuracies of nonlinear and quadratic mixed-effects models combined with 26 different empirical formulas for EFW. They suggested that ensemble learning could enhance the prediction of LGA. Their datasets were compiled in Norway and Sweden between 1986 and 1988. Lu et al. [31] introduced an ensemble model comprising random forest (RF), XGBoost, and LightGBM algorithms, achieving 64.3% accuracy and 7% mean relative error in predicting fetal weight. Although numerous studies have aimed to predict fetal weight with a certain level of accuracy, there is a paucity of research applying ML techniques with a limited number of features to comprehensively detect macrosomia in pregnant women from southern China.
In this study, a dataset of pregnant women in southern China was established. In addition to the features derived from ultrasound examinations, the clinical characteristics of the pregnant women, including pre-pregnancy body mass index (BMI), gestational weight gain (GWG), fasting blood glucose (FBG), and 2-h postprandial blood glucose (2hPG), were also considered. Subsequently, information gain (IG), a standard univariate filtering method, was used to select the top-ranked features from a pool of 12 [32]. For comparative analysis, six different ML classifiers and the most effective Hadlock formula method [15] were utilized to assess the macrosomia classification performance. However, applying classifiers directly to an imbalanced dataset can significantly affect the experimental outcomes. Therefore, class balancing procedures are essential. An ensemble model was used to refine the results, drawing inspiration from a previous study [31].
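The text states that class balancing is essential but does not name the procedure used. As an illustration only, the following minimal sketch shows random oversampling of the minority class, which is one common option and is an assumption here rather than the method of this study. The function name `random_oversample` and the toy data are hypothetical; only the 46 versus 1044 class ratio is taken from the first experimental group.

```python
import random

def random_oversample(rows, labels, seed=0):
    """Duplicate minority-class rows at random until both classes have equal size.

    Assumption: this is one generic balancing technique, not necessarily
    the procedure used in the study.
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    # Draw extra minority indices with replacement to close the gap.
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    idx = pos + neg + extra
    rng.shuffle(idx)
    return [rows[i] for i in idx], [labels[i] for i in idx]

# Toy data with the class ratio of the first experiment (46 vs. 1044 cases).
labels = [1] * 46 + [0] * 1044
rows = [[float(i)] for i in range(len(labels))]  # placeholder feature vectors
bal_rows, bal_labels = random_oversample(rows, labels)
print(sum(bal_labels), len(bal_labels) - sum(bal_labels))
```

If such a step is used together with cross-validation, it would normally be applied to the training folds only, so that duplicated cases do not leak into the held-out fold.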

Data preprocessing
Data from 3244 pregnant women who delivered between September 2017 and August 2019 at the University of Hong Kong-Shenzhen Hospital were collected and analyzed retrospectively. The dataset was based on electronic health records, which included maternal, fetal, and neonatal clinical features.
Before conducting formal experiments, basic preprocessing steps were implemented. Special cases, such as twins, premature births before 37 weeks, and infants with birth weights below 2500 g, were excluded. Additionally, pregnant women with incomplete records or apparent errors in the clinical data of mothers, fetal parameters, and neonatal outcomes were also excluded. The actual weight data for these fetuses were accurate, with no missing or apparent errors. This study was approved by the Medical Ethics Committee of The University of Hong Kong-Shenzhen Hospital.
To diagnose macrosomia accurately, weighing newborns after birth is essential. In early pregnancy, additional ultrasound examinations do not improve accuracy. A single ultrasound examination during the third trimester is currently the simplest and most effective method for predicting macrosomia [33]. Most birth weight prediction formulas rely primarily on prenatal ultrasound measurements obtained within one week prior to delivery [14,31,34]. This study used ultrasound measurements taken within 1-7 days and 8-14 days prior to delivery as input data to establish an efficient prediction model that ensures the accuracy of macrosomia screening before birth. The former aimed to predict macrosomia as comprehensively as possible, while the latter was intended to validate the model.
In this study, we conducted two groups of experiments. The first group utilized six different ML algorithms to classify macrosomia for pregnant women with ultrasound examination records within 1-7 days prior to delivery. Subsequently, the most effective models were combined to form a new model expected to achieve optimal prediction performance. The second experimental setup was designed to validate the model for pregnant women with ultrasound examination records within 8-14 days before delivery.
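The exclusion criteria and the two interval-based groups described above can be sketched as a simple record filter. The record layout and the field names (`twins`, `ga_weeks`, `birth_weight_g`, `interval_days`) are hypothetical and chosen only for illustration; the thresholds are those stated in the text.

```python
def is_eligible(rec):
    """Return True if a record passes the exclusion criteria stated in the text."""
    required = ("twins", "ga_weeks", "birth_weight_g", "interval_days")
    if any(rec.get(key) is None for key in required):
        return False  # incomplete record
    if rec["twins"]:
        return False  # twin pregnancy
    if rec["ga_weeks"] < 37:
        return False  # premature birth
    if rec["birth_weight_g"] < 2500:
        return False  # birth weight below 2500 g
    return True

def interval_group(rec):
    """Assign a record to the 1-7 day or 8-14 day experimental group."""
    days = rec["interval_days"]
    if 1 <= days <= 7:
        return "1-7"
    if 8 <= days <= 14:
        return "8-14"
    return None  # outside both experimental groups

# Hypothetical records, for illustration only.
records = [
    {"twins": False, "ga_weeks": 39, "birth_weight_g": 4100, "interval_days": 3},
    {"twins": False, "ga_weeks": 40, "birth_weight_g": 3300, "interval_days": 10},
    {"twins": True, "ga_weeks": 38, "birth_weight_g": 2900, "interval_days": 5},
    {"twins": False, "ga_weeks": 36, "birth_weight_g": 2700, "interval_days": 2},
    {"twins": False, "ga_weeks": 39, "birth_weight_g": None, "interval_days": 4},
]
groups = [interval_group(r) for r in records if is_eligible(r)]
print(groups)
```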
In this study, macrosomia refers to infants with a birth weight of 4000 g or more. In the first experiment, 46 infants were classified as macrosomia and 1044 as non-macrosomia, totaling 1090 samples with ultrasound examination records within 1-7 days prior to delivery. The second group comprised 936 samples, including 37 cases of macrosomia and 899 cases of non-macrosomia, with records of ultrasound examinations within 8-14 days prior to delivery. We encoded both the actual weight of the newborn and the fetal weight estimated using the equation derived by Hadlock et al. [15]. The label was assigned a value of 1 if the actual or estimated weight was 4000 g or more and 0 otherwise.
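The label encoding can be illustrated with a short sketch. The text does not state which Hadlock variant was used; the code below assumes the widely cited four-parameter formula (BPD, HC, AC, and FL in centimeters, weight in grams), so the coefficients should be read as an assumption rather than as the exact equation of this study. The function names are hypothetical.

```python
def hadlock_efw_grams(bpd_mm, hc_mm, ac_mm, fl_mm):
    """Estimated fetal weight (g) from the four-parameter Hadlock formula.

    Assumption: the common four-parameter variant is used; inputs are in mm
    (as in the dataset) and are converted to cm for the formula.
    """
    bpd, hc, ac, fl = (v / 10.0 for v in (bpd_mm, hc_mm, ac_mm, fl_mm))
    log10_efw = (1.3596 + 0.0064 * hc + 0.0424 * ac + 0.174 * fl
                 + 0.00061 * bpd * ac - 0.00386 * ac * fl)
    return 10 ** log10_efw

def macrosomia_label(weight_g, threshold_g=4000):
    """Return 1 if the (actual or estimated) weight is 4000 g or more, else 0."""
    return 1 if weight_g >= threshold_g else 0

# Hypothetical third-trimester measurements, for illustration only.
efw = hadlock_efw_grams(bpd_mm=95, hc_mm=340, ac_mm=350, fl_mm=75)
print(round(efw), macrosomia_label(efw), macrosomia_label(4000))
```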
The dataset included the following 12 features: pre-pregnancy BMI, GWG (kg), gestational week (GA), gestational diabetes mellitus (GDM), amniotic fluid index (AFI), time interval between the last ultrasound examination and delivery (Interval), FBG (mmol/L), 2hPG (mmol/L), fetal AC (mm), fetal HC (mm), fetal FL (mm), and fetal BPD (mm). Table 1 presents the definitions of each feature.

Feature standardization
Since different features can have varying units and orders of magnitude, normalizing the data is essential to minimize their impact on the prediction outcomes and ensure that each feature is on a comparable scale. The normalization is shown in Eq. (1):

$$ y = \frac{x - x_{\min}}{x_{\max} - x_{\min}} \tag{1} $$

where $x$ represents the current feature value; $x_{\min}$ and $x_{\max}$ represent the minimum and maximum values of the current feature, respectively; and $y$ is the normalized feature value [35]. The normalized data lie in the range [0, 1].
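A minimal sketch of this min-max normalization for a single feature is given below. The function name and the sample abdominal circumference values are hypothetical, and the handling of a constant feature (mapping every value to 0) is an assumption, since the formula is undefined in that case.

```python
def min_max_normalize(values):
    """Scale a list of feature values to [0, 1] using min-max normalization."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant feature carries no information; map it to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical fetal AC values in mm, for illustration only.
ac_mm = [310.0, 335.0, 360.0, 385.0]
ac_scaled = min_max_normalize(ac_mm)
print(ac_scaled)
```

In a cross-validation setting, the minimum and maximum would typically be computed on the training portion and reused for the held-out portion.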

Feature selection
Feature selection is a widely utilized technique for identifying features that exhibit a strong correlation with the target class while remaining uncorrelated with other classes. The primary goal of applying feature selection in this study was to develop a classification model that offers enhanced performance and reduced computational overhead. Recently, IG has been employed in various medical domains to screen the top features, yielding positive outcomes [36,37]. Based on these findings, this study adopted IG as a feature selection method to enhance model performance.
Generally, IG is the difference between the information entropy of the macrosomia dataset $A$ with and without feature $t$. There are $L$ class labels in dataset $A$, and the information entropy of a class [38] in dataset $A$ is denoted by $H(A)$, which is defined as

$$ H(A) = -\sum_{i=1}^{L} P_i \log_2 P_i \tag{2} $$

where $P_i$ is the probability of a labeled class in the macrosomia dataset $A$.
The macrosomia dataset $A$ is further divided into $K$ groups by feature $t$ with $K$ different values, namely, $A_k$ $(k = 1, 2, \ldots, K)$. The entropy of each group is calculated as

$$ H(A_k) = -\sum_{i=1}^{L} P_{ki} \log_2 P_{ki} \tag{3} $$

where $P_{ki}$ defines the probability of a labeled class in subset data $A_k$ of the basic data $A$. As each group of subset data $A_k$ contains $W_k$ samples $(k = 1, 2, \ldots, K)$, the weight of each group is set to $W_k / W$, where $W$ is the total number of samples in $A$. The IG [39] of each feature $t$ can be written as

$$ IG(t) = H(A) - \sum_{k=1}^{K} \frac{W_k}{W} H(A_k) \tag{4} $$

Subsequently, the scores generated by IG are sorted in descending order, and the top-ranked features are selected as the best variable set for classification.
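The IG computation can be sketched in a few lines: the entropy of the labels, the weighted entropy of the groups induced by a feature, and their difference, followed by a descending ranking. The base-2 logarithm and the treatment of each distinct feature value as one group are assumptions of this sketch; continuous features such as AC would first need to be discretized, which the text does not describe. All names and the toy data are hypothetical.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(feature_values, labels):
    """IG of one discrete feature: H(A) minus the weighted entropy of its groups."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    weighted = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - weighted

# Toy example: feature_a separates the classes perfectly, feature_b not at all.
labels = [1, 1, 0, 0]
features = {
    "feature_a": ["high", "high", "low", "low"],
    "feature_b": ["x", "y", "x", "y"],
}
scores = {name: information_gain(vals, labels) for name, vals in features.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
print(scores, ranking)
```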

ML algorithms
This study aims to address the binary classification problem [40], enabling doctors to detect and diagnose macrosomia as early as possible and provide guidance for delivery methods. ML algorithms offer advantages such as self-training, generalization, self-organization, and learning capabilities. The objective of this study is to develop an effective ML prediction model capable of classifying and predicting macrosomia and non-macrosomia. The performance of the logistic regression (LR), k-nearest neighbors (KNN), SVM, RF, XGBoost, and LightGBM algorithms was evaluated using the scikit-learn Python toolkit with default parameters.
The LR [41] algorithm is a statistical method used for binary classification problems. It estimates the probability of a target variable belonging to a particular class using a logistic function that transforms linear combinations of features into probabilities. The KNN [42] is a simple and effective classification method that assigns a new data point to the class of the majority of its k nearest neighbors. The SVM classifier [43] is a binary classification method that uses hyperplanes to separate the data points of different classes. It aims to maximize the distance between the hyperplanes and the closest data points of each class, resulting in a robust and accurate classifier. For RF, many regression decision trees are incorporated to improve the accuracy of classification and regression tasks by constructing multiple decision trees and combining their predictive results [44]. It reduces overfitting by randomly sampling data and features and has good generalization ability [45]. XGBoost [46] is an efficient and scalable ML algorithm that uses gradient boosting to build strong predictive models that provide accurate and robust solutions for various classification, regression, and ranking tasks. LightGBM [47] is a gradient-boosting framework that uses efficient parallel training to achieve high performance and low memory consumption. It offers better accuracy and faster training and supports large-scale datasets, making it a versatile tool for ML tasks. LightGBM uses the many-vs-many segmentation method to divide the category features into two subsets to achieve optimal segmentation of the category features. Ensemble methods of creating multiple models in ML are effective prediction methods because they can improve the prediction performance and generalization ability by combining multiple base learners, thereby reducing overfitting and enhancing the classification accuracy [48].

Table 1
Definitions of the features

| Feature | Definition |
| --- | --- |
| x_2hPG | 2-h postprandial blood glucose (mmol/L) |
| x_BMI | Pre-pregnancy body mass index |
| x_AC | Fetal abdominal circumference (mm) |
| x_FBG | Fasting blood glucose (mmol/L) |
| x_HC | Fetal head circumference (mm) |
| x_FL | Fetal femur length (mm) |
| x_GWG | Gestational weight gain (kg) |
| x_BPD | Fetal biparietal diameter (mm) |
| x_GA | Gestational week (week) |
| x_Interval | Time interval between the last ultrasound examination and delivery (day) |
| x_GDM | Gestational diabetes mellitus |
| x_AFI | Amniotic fluid index |

Performance evaluation indices
The model performance was assessed and compared using two key indicators to determine the optimal prediction model for macrosomia. The first indicator is Recall, which quantifies the proportion of actual macrosomia cases that the model correctly identifies. Clinically, a high sensitivity in predicting macrosomia is crucial. The second indicator is the AUC, which provides stable results even with imbalanced datasets. A higher AUC indicates superior performance.
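Both indicators can be computed directly from their definitions, as the following minimal sketch shows. Recall is the fraction of true macrosomia cases that are predicted positive, and the AUC is computed here in its rank-based form, as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one, with ties counted as one half. The function names, toy scores, and the 0.5 decision threshold are hypothetical.

```python
def recall(y_true, y_pred):
    """True positives divided by all actual positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if (tp + fn) else 0.0

def auc(y_true, scores):
    """Rank-based AUC: share of positive/negative pairs ordered correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy scores, for illustration only; 1 denotes macrosomia.
y_true = [1, 1, 0, 0, 0]
scores = [0.9, 0.4, 0.5, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]
recall_value = recall(y_true, y_pred)
auc_value = auc(y_true, scores)
print(recall_value, round(auc_value, 4))
```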

Comparison of ML prediction models
As described above, we executed the experiment using tenfold cross-validation on 1090 cases, each containing 12 features, with ultrasound examination records within 1-7 days prior to delivery. Table 2 lists the Recall and AUC values of all six ML classifiers and of the ultrasound estimation based on the Hadlock formula.
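With only 46 positive cases among 1090, the way the ten folds are formed matters. The text does not state whether the folds were stratified; the sketch below assumes stratification, so that every fold contains some macrosomia cases, and is meant only to illustrate the fold construction. The function name is hypothetical.

```python
import random

def stratified_folds(labels, k=10, seed=0):
    """Split sample indices into k folds while preserving the class ratio.

    Assumption: stratification is used; the source does not specify this.
    """
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        # Deal the shuffled indices of this class round-robin across folds.
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds

# Class counts of the first experimental group (46 macrosomia, 1044 not).
labels = [1] * 46 + [0] * 1044
folds = stratified_folds(labels, k=10)
positives_per_fold = [sum(labels[i] for i in fold) for fold in folds]
print(positives_per_fold)
```

Each fold serves once as the validation set while the remaining nine are used for training.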
The results show that LR, SVM, and KNN performed poorly, producing notably low Recall and AUC values; RF performed best in terms of Recall (0.8300), and XGBoost performed best in terms of AUC (0.8193), whereas LightGBM performed well in both Recall and AUC. All six ML classifiers performed better than the ultrasound estimation based on the Hadlock formula. These results demonstrate that ML algorithms may improve the accuracy of macrosomia screening relative to the Hadlock formula.

Analysis of feature selection method
We chose IG as our feature selection method to achieve better performance for each classifier. It ranks the features in descending order of their IG scores. Applying this process, three features with significantly lower IG scores, namely, x_Interval, x_GDM, and x_AFI, were excluded from the 12 features. The selection results for these features are shown in Fig. 1.
Subsequently, all six ML classifiers were trained using tenfold cross-validation on a dataset that included 1090 cases with nine features for each case. The classification results are listed in Table 3. Comparing Tables 2 and 3 shows that all six ML classifiers demonstrated better prediction performance on the dataset containing nine features than on the dataset containing 12 features. This suggests that using the IG method to select features helps improve the prediction performance to some extent.

Analysis of the ensemble model
As previously discussed, we utilized an ensemble model that integrates the top-performing models from Table 3 (RF, XGBoost, and LightGBM) to enhance classification performance. This ensemble model was constructed using a voting mechanism. Subsequently, the three individual models and the ensemble model underwent tenfold cross-validation training on a dataset comprising 1090 cases, each with nine features, to evaluate their effectiveness in screening for macrosomia. The results of these experiments are presented in Table 4.
Table 4 indicates that the ensemble model marginally improved the Recall and AUC values compared to the individual ML algorithms and Hadlock formula. Specifically, Recall showed an improvement of 0.5, while the AUC increased by an estimated 0.17.

Effectiveness verification of ensemble model
Traditionally, fetal weight estimation has often relied on the Hadlock formula, which is based on multiple ultrasound measurements. However, some fetuses grow rapidly after reaching term, leading to varying ultrasound examination records across different GAs. Consequently, the use of the Hadlock formula may result in significant deviations in the predicted fetal and birth weights owing to the extended time interval between ultrasound examination and delivery [33].
To further validate the ensemble model for predicting fetal weight, it was applied to a dataset comprising ultrasound examination records from 8-14 days before delivery. This dataset includes 936 samples, each with nine key features. The classification results are listed in Table 5.
As shown in Table 5, the ensemble model described in this study demonstrated a notable improvement in Recall and AUC values when compared to the Hadlock formula and the other three ML classifiers. Specifically, Recall and AUC were improved by 0.75 and 0.24, respectively.

Discussion
There remains scope for advancement in the detection of macrosomia, particularly among pregnant women in southern China. This study focused on pregnant women in southern China who delivered singleton infants at term. We utilized ML algorithms, feeding in maternal and infant features as inputs and EFW labels as outputs. The IG method, grounded in information entropy theory, was employed to evaluate the features that are most predictive of macrosomia. Following feature selection, we conducted two sets of evaluations to assess the prediction performance of macrosomia using ML algorithms. Our approach began with six ML classifiers; the results indicated that RF, XGBoost, and LightGBM performed the best (Table 3). To optimize prediction performance, an ensemble model that integrates the top-performing classifiers (RF, XGBoost, and LightGBM) to predict macrosomia was developed.

Effective prediction of macrosomia within 1-7 days prior to delivery
The first experimental group, which included ultrasound examination records from 1-7 days prior to delivery, was used to validate our hypothesis. Table 4 shows that the ensemble model introduced in this study is highly effective for screening macrosomia when the interval between the final ultrasound and delivery is 1-7 days [22]. This ensemble model outperformed the traditional Hadlock formula in terms of predictive accuracy.

Predicting macrosomia within 8-14 days prior to delivery remains valuable
To verify whether this model was also effective in predicting macrosomia in pregnant women who had ultrasound examination records within 8-14 days before delivery, we conducted a second set of experiments. As shown in Table 5, the ensemble model remains effective in detecting macrosomia, whereas the Hadlock formula has limitations [16] within this timeframe. In summary, the ensemble model may be more suitable for screening macrosomia based on ultrasound data obtained 1-2 weeks before delivery. The accurate prediction of macrosomia in the third trimester can encourage pregnant women to be more mindful of their diet and nutritional status, which are crucial for healthy birth outcomes. Additionally, precise macrosomia assessment enables obstetricians to offer tailored counseling and advice to women at risk of delivering a macrosomic infant, thereby guiding them on appropriate delivery options. Careful consideration of delivery modes in clinical practice can reduce the incidence of abnormal deliveries and prevent adverse outcomes for both mothers and infants.

Ensemble model construction and external validation driven by "1-14 days prior to delivery" data
In clinical practice, a full-term pregnancy is defined as a gestation of at least 37 weeks. Ultrasound examinations are recommended every two weeks during the late stages of pregnancy to ensure the safety of both mother and baby, with particular attention paid to data from the last two weeks before delivery, which are crucial for predicting fetal weight. While previous research has primarily utilized data from 1-7 days before delivery, our inclusion of data from 8-14 days aimed to maximize the capability of the model to detect macrosomia at an earlier stage. Accordingly, we categorized the data into two groups: one for the period of 1-7 days before delivery and another for 8-14 days before delivery.
Based on the experimental results, the ensemble model showed significant accuracy in predicting macrosomia. Thus, we applied this model to assess its predictive performance using data from 1-14 days before delivery, with the goal of creating a broadly applicable model for late pregnancy. We aimed to enhance the accuracy of fetal growth assessment and provide robust decision support for clinical practice.
This dataset comprises 2026 cases with ultrasound examination records within 1-14 days prior to delivery, each with nine key features. The classification results are listed in Table 6.
Table 6 indicates that the ensemble model described in this study improves the Recall and AUC values to some extent when applied to the dataset with ultrasound examination records within 1-14 days prior to delivery compared to the Hadlock formula and three other ML classifiers. The improvements in Recall and AUC were 0.60 and 0.20, respectively.
Additionally, to provide a more intuitive illustration of the classification effectiveness of the ensemble model, a confusion matrix was constructed. This matrix shows the predicted classes along the horizontal axis and the actual classes along the vertical axis. The displayed confusion matrix has been normalized so that each row sums to 1, meaning that its entries represent conditional probabilities. The diagonal values of this matrix represent the probability of correct classification for each category, also known as the recall or true positive rate. As shown in Fig. 2, the model correctly identifies macrosomia with a probability of 0.88, which underscores its robust predictive capability.
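The normalization described above can be sketched as follows. Rows are taken to be the actual classes and columns the predicted classes, and each row is divided by its total, so the diagonal entries are the per-class recall. Row-wise normalization is an assumption consistent with the statement that the diagonal equals the recall; the toy labels are hypothetical and are not the study results.

```python
def normalized_confusion(y_true, y_pred):
    """2x2 confusion matrix with rows = actual class, columns = predicted class,
    normalized so that each row sums to 1."""
    counts = [[0, 0], [0, 0]]
    for t, p in zip(y_true, y_pred):
        counts[t][p] += 1
    return [[c / sum(row) if sum(row) else 0.0 for c in row] for row in counts]

# Toy example: 8 actual positives (7 detected) and 12 actual negatives (9 correct).
y_true = [1] * 8 + [0] * 12
y_pred = [1] * 7 + [0] * 1 + [0] * 9 + [1] * 3
matrix = normalized_confusion(y_true, y_pred)
print(matrix)
```

Here `matrix[1][1]` is the recall for macrosomia and `matrix[0][0]` is the recall for non-macrosomia (the specificity).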
To comprehensively evaluate the generalization capability of the ensemble model, we additionally collected 44 samples from pregnant women who underwent ultrasound examinations at The Second People's Hospital of Shenzhen within 1-14 days before delivery, which served as an external validation test set. There were five cases of macrosomia in the sample set. External validation data were reviewed and approved by the Ethics Committee of The Second People's Hospital of Shenzhen.
The classification results are listed in Table 7. The confusion matrix is shown in Fig. 2. The external validation suggests that the ensemble model generalizes reasonably well and holds certain clinical significance. The Recall, accuracy, and AUC values were 0.6, 0.77, and 0.75, respectively. The relatively low Recall may be attributed to the small sample size.

Predicting low-birth-weight infants is still applicable
It is evident that our model is sensitive to changes in fetal weight; hence, we endeavored to predict another weight extreme: low birth weight (LBW) infants (birth weight under 2500 g [33,49]). LBW infants exhibit significantly higher morbidity and mortality rates than infants with adequate birth weight [50]. According to [...] term disabilities. Accurate prenatal fetal weight estimation is crucial for preventing complications related to LBW. Although the Hadlock formula is a widely used clinical tool for fetal weight estimation from ultrasound data, its accuracy is insufficient for LBW infants. Consequently, an ensemble model combining RF, XGBoost, and LightGBM was employed to predict LBW infants, aiming to substantiate the efficacy of the model.
Initially, cases of twins and macrosomia (birth weight ≥ 4000 g), as well as records with incomplete data or apparent errors, were excluded; the analysis then focused on 1128 pregnant women with ultrasound records 1-7 days before delivery, each with nine key features. Table 8 shows that our ensemble model outperformed the traditional Hadlock formula in predicting LBW infants. Early detection of pregnancies at risk of LBW allows obstetricians to identify fetal growth restrictions and promptly enhance monitoring. This method enables more precise predictions of fetal weight at both extremes, aiding in determining the most suitable delivery method and timing to maximize maternal and fetal safety.

Limitations and future work
Our study has certain limitations that should be considered. Cases with incomplete records or evident errors were excluded from the analysis. This exclusion may have introduced a selection bias because the removed data represented approximately 17% of the entire dataset. In the future, we plan to expand the model by incorporating data from diverse populations to ensure its applicability across various geographical regions, thereby validating the safety and predictability of our prediction model.

Conclusions
In this study, an ensemble model utilizing data from pregnant women in southern China for the precise screening of macrosomia was introduced. Our findings identified the key determinants of pregnancy that can assist obstetricians in prioritizing and enhancing monitoring. This ensemble model, which integrates RF, XGBoost, and LightGBM, exhibited a high level of reliability in detecting macrosomia in the third trimester.
The future application of this model in clinical prenatal care could significantly benefit pre-pregnancy counseling, prenatal evaluation, intrapartum care, postnatal management, and long-term reproductive health.

Table 2
Comparison of Recall and AUC values for six ML methods on 1090 samples with ultrasound examination records 1-7 days before delivery and containing 12 features

Fig. 1
Feature selection using IG

Table 3
Comparison of Recall and AUC values for six ML methods on 1090 samples with ultrasound examination records 1-7 days before delivery and containing 9 features

Table 4
Prediction performance of macrosomia by the ensemble model on 1090 samples with ultrasound examination records 1-7 days before delivery and containing 9 features

Table 5
Prediction performance of macrosomia by the ensemble model on 936 samples with ultrasound examination records 8-14 days before delivery and containing 9 features

Table 6
Prediction performance of macrosomia by the ensemble model on 2026 samples with ultrasound examination records 1-14 days before delivery and containing 9 features

Table 7
Prediction performance of the ensemble model based on external data

Table 8
Prediction performance of LBW infants by the ensemble model on 1128 samples with ultrasound examination records 1-7 days before delivery and containing 9 features