Table 3 Comparison of strengths and weaknesses of machine learning algorithms in electronic nose studies

From: Diagnosis of ventilator-associated pneumonia using electronic nose sensor array signals: solutions to improve the application of machine learning in respiratory research

| Algorithm | Strengths | Weaknesses |
|---|---|---|
| k-nearest neighbors | • Makes no assumptions about the underlying data distribution | • Does not produce a model, limiting the ability to understand how the features are related to the class • If one class has more samples than the other, the dominant class controls the classification and causes misclassification |
| Naive Bayes | • Requires relatively few examples for training | • Relies on an often-faulty assumption of equally important and independent features • Not ideal for datasets with many numeric features |
| Decision tree | • Can be used on small datasets • Model is easy to interpret | • Easy to overfit or underfit the model • Small changes in the training data can result in large changes to the decision logic |
| Neural network | • Conceptually similar to human neural function • Capable of modeling more complex patterns | • Very prone to overfitting the training data • Susceptible to multicollinearity |
| Support vector machines | • High accuracy; not overly influenced by noisy data and not very prone to overfitting • Easier for users due to the existence of several well-supported SVM algorithms • Most commonly used | • Finding the best model requires testing various combinations of kernels and model parameters |
| Random forest | • Can handle noisy or missing data • Suitable for class imbalance problems | • The model is not easily interpretable |
Summarized from [27, 53, 54]
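
The k-nearest neighbors weakness listed in the table, where the dominant class controls the classification, can be illustrated with a minimal sketch. This is not taken from the cited studies: the one-dimensional readings, the class labels, and the helper `knn_predict` are all hypothetical and chosen only to make the effect visible.

```python
from collections import Counter


def knn_predict(train, query, k):
    """Classify `query` by majority vote among its k nearest training points.

    `train` is a list of (feature_value, label) pairs; distance is the
    absolute difference between feature values (hypothetical 1-D data).
    """
    neighbors = sorted(train, key=lambda item: abs(item[0] - query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Hypothetical, imbalanced training set: 10 "control" samples, 3 "VAP" samples.
train = [(x / 10, "control") for x in range(10)]  # readings 0.0 .. 0.9
train += [(2.0, "VAP"), (2.1, "VAP"), (2.2, "VAP")]

query = 2.05  # lies inside the minority ("VAP") cluster

# With k=3 the vote uses only the three nearby minority samples.
small_k = knn_predict(train, query, k=3)

# With k=9 the vote includes 3 minority and 6 majority samples,
# so the majority class wins even though the query sits in the minority cluster.
large_k = knn_predict(train, query, k=9)

print(small_k, large_k)
```

In this toy setting the prediction flips from the minority class to the majority class as `k` grows past the number of minority samples. That is one way to read the table's warning about imbalanced classes.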