Table 4  A comprehensive study of the deep learning methods done by some researchers

From: A comprehensive review of machine learning techniques on diabetes detection

Algorithm

Method used/innovation

Application

Results and limitations (if specified)

References

Deep belief neural network

The dataset used was the PIMA Indian diabetes dataset with 768 instances and 8 features. The hidden-layer activation function was ReLU (rectified linear unit) across three hidden layers, with sigmoid as the input activation function. The batch size was set to 100 and the number of epochs to 5.

Since the network outperformed conventional machine learning classifiers in the author's comparison, they believe it can be tweaked as needed and used for the detection of other diseases as well.

The proposed network and the conventional methods were compared on three parameters: recall, precision, and F1 measure. The proposed network obtained high values on all three, indicating a good model.

[46]
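The row above describes a network with ReLU hidden layers feeding a sigmoid unit for the binary diabetes label. A minimal pure-Python sketch of that forward pass; the layer sizes and weight values here are illustrative, not the authors' trained parameters:

```python
import math

def relu(v):
    return [max(0.0, x) for x in v]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dense(inputs, weights, biases):
    # weights: one row of input weights per output neuron
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

def forward(features, hidden_layers, out_w, out_b):
    """ReLU hidden layers followed by a single sigmoid output unit."""
    a = features
    for w, b in hidden_layers:
        a = relu(dense(a, w, b))
    return sigmoid(dense(a, out_w, out_b)[0])
```

For the PIMA data, `features` would be the 8 attributes of one patient, and the sigmoid output is read as the probability of diabetes.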

Long short-term memory (LSTM) neural networks – RNNs

The author used the DirecNet Inpatient Accuracy Study dataset, which contains approximately 110 instances. The neural network model proposed here has one LSTM layer and predicts blood sugar levels. The grid search covered the number of LSTM units, dense units, and sequence lengths. Model performance was evaluated with RMSE.

The author elaborated on real-life applications by proposing to deploy LSTM models on mobile platforms, apps, and cloud servers to make them available to the masses.

It was concluded that the use of LSTMs for blood glucose level prediction is promising. The RMSE values obtained ranged from a minimum of 4.67 to a maximum of 29.12. Missing data is an issue for the model: if the patient removes the CGM device, the model should be trained in a way that lets it handle missing data automatically.

[47]

Deep prediction model

The data came from six individuals aged 22–29 years. The deep prediction model is a multi-layer model of data-driven predictors fed with glucose measurements as a time series. The first stage consists of autoregressive models with external inputs (ARX) or ANNs, followed by an extreme learning machine (ELM) that outputs the glucose-level predictions. ELMs are very fast to train and easy to implement.

ELM models are easy to apply to any classification task and can approximate continuous functions; they are widely used in this field of research for implementing efficient models.

The model was evaluated on three parameters, RMSE, correlation coefficient (CC), and time lag (TL), at different glucose concentrations for prediction horizons (PH) of 15, 30, and 45 min, with three types of input combinations involving glucose and sugar concentrations. For all PHs, the linear models together with the ELM outperform the ANN-ELM, achieving reduced TL, reduced RMSE, and increased CC.

[48]
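The ELM idea in this row (random, untrained hidden weights; output weights obtained in closed form) can be sketched in pure Python for a scalar input. The hidden size, ridge term, and sigmoid features below are illustrative assumptions, not the paper's configuration:

```python
import math
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def elm_train(xs, ys, hidden=10, ridge=1e-8, seed=0):
    """Random sigmoid hidden layer; output weights from ridge least squares."""
    rng = random.Random(seed)
    w = [(rng.uniform(-2, 2), rng.uniform(-2, 2)) for _ in range(hidden)]
    H = [[1 / (1 + math.exp(-(a * x + b))) for a, b in w] for x in xs]
    # beta = (H^T H + ridge*I)^-1 H^T y
    HtH = [[sum(H[r][i] * H[r][j] for r in range(len(xs)))
            + (ridge if i == j else 0.0) for j in range(hidden)]
           for i in range(hidden)]
    Hty = [sum(H[r][i] * ys[r] for r in range(len(xs))) for i in range(hidden)]
    return w, solve(HtH, Hty)

def elm_predict(model, x):
    w, beta = model
    return sum(b * (1 / (1 + math.exp(-(a * x + c)))) for (a, c), b in zip(w, beta))
```

Because only the linear output layer is solved for, training reduces to a single least-squares problem, which is where the fast learning speed of ELMs comes from.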

Empirical mode decomposition (EMD) and LSTM

The dataset was obtained from a Shanghai hospital and contains 174 instances. The data is used for training two models: one is a plain LSTM and the other is LSTM+EMD. LSTMs are an improved version of RNNs, and EMD is an adaptive signal decomposition method for non-linear and non-stationary signals. The performance evaluation measures are MAE and RMSE.

The author proposed making the model more accurate by training it on real-time or personalized data. Deploying the model to mobile clients was also suggested.

The parameters considered for evaluating performance were MAE and RMSE: the MAE measures the average prediction error, and the RMSE gives the deviation between the predicted value and the ground truth. The results are reported over prediction intervals of 30, 60, 90, and 120 min.

[49]
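The two error measures this row relies on are one-liners; a small pure-Python sketch:

```python
import math

def mae(y_true, y_pred):
    # mean absolute error: average magnitude of the prediction error
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    # root mean square error: deviation between observed and true values
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))
```

Both are in the units of the glucose signal; RMSE penalizes large errors more heavily than MAE.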

Temporal convolutional network with vanilla LSTM

The raw dataset included two male and four female subjects. Feature selection was performed on the dataset, followed by hyperparameter tuning. The LSTM network used 3 hidden layers of size 50. The TCN block has 10 layers, each with a dilation factor of 2 and a kernel size of 4.

The author stated that this study of DL algorithms for time-series prediction would help practitioners and researchers select appropriate models with pragmatic parameters.

The author used three parameters for evaluation, namely RMSE, temporal gain, and normalized energy of second-order differences. The values of all three were much higher for the vanilla LSTM than for the TCN.

A limitation of the model is that it does not collect data carrying deeper meaning for personalized output.

[50]
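A TCN's receptive field follows directly from its kernel size and dilation schedule: RF = 1 + (k − 1) · Σ dilations. The row's "dilation factor of 2" is ambiguous, so this illustrative calculation (not from the paper) evaluates both the usual doubling schedule (1, 2, 4, …, 512) and a literal constant dilation of 2:

```python
def tcn_receptive_field(kernel_size, dilations):
    """Receptive field of a stack of dilated causal convolutions."""
    return 1 + (kernel_size - 1) * sum(dilations)
```

With kernel size 4 and 10 layers, the doubling schedule covers 3070 past samples, while a constant dilation of 2 covers only 61.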

ANN

The medical dataset was taken from Noakhali Medical College and contains data on 9483 patients with 14 features. The dataset was split 80% for training and 20% for testing. For the ANN, the author chose the softmax activation function with six hidden layers. Training was done using the ReLU activation function for 25 epochs.

Because ANNs achieve high accuracy, the author recommends using them for disease prediction and detection as a better alternative to other ML models.

To increase model performance, an extra hidden layer was added and the number of epochs was increased to 100. The accuracy came out to 95.14% on the testing data and 96.42% on the training data.

The author attributes the low accuracy of other models to smaller datasets and the models' inability to adapt to various datasets.

[30]
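The 80/20 split used in this row is a standard shuffled hold-out; a minimal sketch (the seed and fraction here are illustrative choices):

```python
import random

def train_test_split(rows, test_frac=0.2, seed=42):
    """Shuffle record indices and split into train/test portions."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    cut = int(len(rows) * (1 - test_frac))
    train = [rows[i] for i in idx[:cut]]
    test = [rows[i] for i in idx[cut:]]
    return train, test
```

Applied to 9483 patient records, this yields 7586 training and 1897 testing records.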

LSTM and Bi-LSTM (RNN)

The author collected the dataset from real patients by monitoring their health. The model was trained and tested on 26 datasets containing real-time CGM data. The model consists of one LSTM layer and one Bi-LSTM layer each with 4 units, and three fully connected layers with 8, 64 and 8 units, respectively. The epochs ranged from 100 to 2000. The parameters used for evaluation were RMSE, CC, and TL.

The author proposed using the model with oral drugs, insulin pens, and CSII pumps, all of which incorporate CGM measurements.

The results were calculated at PH = 15, 30, 45, and 60 min. Running the models for epoch counts from 100 to 2000 in steps of 100, epoch numbers 900, 1300, 1500, and 1700 showed good accuracy and were hence chosen as the pre-training epoch numbers for the respective PH levels.

[51]
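The LSTM and Bi-LSTM layers in this row are built from gated cells. A single-unit, scalar-state sketch of one LSTM time step with the standard gate equations (in practice the weights are learned; the values here are illustrative):

```python
import math

def _sig(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_cell_step(x, h, c, p):
    """One LSTM step for scalar input/state. p maps each gate name to
    its (input weight, recurrent weight, bias) triple."""
    i = _sig(p["i"][0] * x + p["i"][1] * h + p["i"][2])       # input gate
    f = _sig(p["f"][0] * x + p["f"][1] * h + p["f"][2])       # forget gate
    o = _sig(p["o"][0] * x + p["o"][1] * h + p["o"][2])       # output gate
    g = math.tanh(p["g"][0] * x + p["g"][1] * h + p["g"][2])  # candidate
    c_new = f * c + i * g          # update cell state
    h_new = o * math.tanh(c_new)   # new hidden state
    return h_new, c_new
```

A Bi-LSTM simply runs one such pass forward and another backward over the sequence and concatenates the resulting hidden states.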

CNN

A CNN-based risk prediction model can accurately identify disease risk. The study was based on a group of around 5900 steel workers. A survey was conducted to gather real-time data on features such as gender, age, disease history, lifestyle, and physical examinations. The model was evaluated using ROC curves and the area under the curve (AUC).

The model provides a basis for self-health management of steel workers, facilitates the rational allocation of medical and health resources and the development of health services, and provides a basis for decision-making by government departments.

The prediction accuracy of the model on the three datasets was 94.5%, 91.0%, and 89.0%, respectively, and the AUCs were 0.950 (95% CI: 0.938–0.962), 0.916 (95% CI: 0.888–0.945), and 0.899 (95% CI: 0.859–0.939). This shows that the established model can accurately predict the risk of type-2 diabetes in steel workers.

[52]
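The AUC reported in this row can be computed directly as the probability that a randomly chosen positive case is scored above a randomly chosen negative one; a small sketch with illustrative labels and scores:

```python
def auc(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly
    (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, so values near 0.95 indicate strong discrimination.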

AlexNet and GoogLeNet

In this study, the author developed a model called IGRNet to detect and diagnose prediabetes effectively using 5-second, 12-lead electrocardiograms from 2251 cases. The neural networks used were compared with traditional ML algorithms such as SVM, RF, and KNN.

The author suggested using this hybrid model for future predictions due to its efficient performance.

The results of the networks were compared across different activation functions. The IGRNet model achieved an accuracy of 85.6%, higher than any other model used for comparison.

[53]

DNN

The data was retrieved from the UCI Machine Learning Repository (PIMA Indian diabetes dataset). The neural network has 4 hidden layers with 12, 16, 16, and 14 neurons, respectively.

The proposed system will be supportive for medical staff as well as the general public: with five-fold cross-validation, its accuracy was higher than that of any other model, and in comparison with other authors' established models it came out the highest.

The data samples were evaluated with 5-fold and 10-fold cross-validation. The accuracy of this model was 98.04% for five-fold and 97.27% for ten-fold. The results were also analyzed through ROC curves and the F1 score.

[54]
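The five-fold and ten-fold evaluations in this row partition the data into near-equal folds, training on k − 1 folds and validating on the remaining one. A minimal index-level sketch (shuffling is omitted for brevity):

```python
def kfold_indices(n, k):
    """Split indices 0..n-1 into k contiguous, near-equal folds and
    yield (train, validation) index lists for each fold."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for s in sizes:
        folds.append(list(range(start, start + s)))
        start += s
    for i in range(k):
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, folds[i]
```

Each sample serves as validation exactly once, so the averaged accuracy uses the whole dataset.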

LSTM and GRU

The dataset included records of over 14,000 patients from 2010 to 2015, with the measurements represented as episodes. Each sequence had 30 features. The dataset was used for training the LSTM and GRU models, and the results were compared with MLP models.

The models achieved a high accuracy of 97% even with sequences of length 3.

The LSTM and GRU models achieved higher accuracy than the MLP. LSTM outperformed on longer dependencies, while GRU performed better on shorter sequences. Both achieved an accuracy of over 97%.

Due to the lack of datasets focused on type-2 diabetes, replicating this work with different datasets may be difficult.

[55]

Multilayer perceptron network

The author used the PIMA Indian diabetes dataset. The ANN developed is a 4-layer MLP network with 8-12-8-1 nodes. Three more networks were formed with 8-32-32-1, 8-64-64-1, and 8-128-128-1 nodes. The ReLU activation function is used in the input and hidden layers, and sigmoid in the output layer.

The author proposed extending this work to further increase accuracy and aid early prediction of diabetes.

The perceptron results were calculated over 10 runs of 150 epochs each. The highest average accuracy was achieved by the 8-128-128-1 network, followed by 8-64-64-1 and then 8-32-32-1.

[56]
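The relative capacity of the four topologies in this row can be made concrete by counting trainable parameters (weights plus biases) per fully connected layer; a small sketch:

```python
def mlp_param_count(layers):
    """Total weights + biases of a fully connected network,
    e.g. layers = [8, 12, 8, 1] for the 8-12-8-1 topology."""
    return sum(a * b + b for a, b in zip(layers, layers[1:]))
```

The 8-128-128-1 network has roughly 80 times the parameters of the 8-12-8-1 baseline, consistent with it achieving the highest average accuracy.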

ELM algorithm

The data was acquired from multiple sources, including medical laboratories, hospitals, and public datasets. The dataset was pre-processed to include 12 important attributes affecting diabetes. ELMs are faster to train. Three hundred and twenty samples were used for training and 480 for testing.

The goal of this research was to study diabetic treatment in healthcare using big data and machine learning. It presents a big data processing system employing an ELM algorithm.

The proposed approach proved efficient. The goal of reducing false positives (FP) and false negatives (FN) while boosting precision and recall was achieved by the author.

[57]
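The FP/FN trade-off this row targets maps directly onto precision and recall; a small sketch (the counts in any use would come from a confusion matrix; values here are illustrative):

```python
def precision_recall(tp, fp, fn):
    """Precision = TP/(TP+FP); recall = TP/(TP+FN).
    Reducing FP raises precision; reducing FN raises recall."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```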

Neural network

The author obtained the dataset from hospital physical examinations in Luzhou. An independent test set of 13,700 samples was held out. The data contained 14 attributes. Another dataset was the PIMA Indian diabetes dataset. The method is a two-layer network with sigmoid hidden neurons and softmax output neurons.

The author hopes to predict the type of diabetes from such a dataset, which is a lead for improving accuracy.

The neural network achieved an accuracy of 74.14% on the Luzhou dataset and 74.75% on the PIMA Indian dataset with the use of principal component analysis.

[37]

CNN

The dataset used by the author was taken from the National Institute of Diabetes and consists of nine parameters. Fuzzification was applied to the dataset, populating it with additional data for the CNN. The α values of the fuzzified networks were 2 and 5. These networks were compared with a plain CNN.

Fuzzification proves useful since it provides diverse data to train on.

The neural network with α = 2 performed better than the one with α = 5. The CNN performed better than both.

[58]