What considerations are relevant for choosing the right training algorithm to start with?

by Tina Lykke Kristensen / Wednesday, 13 May 2026 / Published in Artificial Intelligence, EITC/AI/GCML Google Cloud Machine Learning, First steps in Machine Learning, The 7 steps of machine learning

Selecting an appropriate training algorithm is a foundational decision in the initial phases of any machine learning project. The choice affects model performance, interpretability, efficiency, and the amount of effort required for subsequent development. When applying machine learning on modern cloud platforms such as Google Cloud, practitioners must weigh a range of considerations grounded in both theoretical understanding and practical constraints. The sections below go through these considerations, each with a short illustrative example.

1. Nature and Structure of the Data

The characteristics of the data at hand heavily influence the selection of a training algorithm. Key aspects include:

a) Data Type:
– Structured Data: Tabular datasets with clear features and labels (e.g., sales records) often suit algorithms like logistic regression, decision trees, or gradient-boosted trees.
– Unstructured Data: For text, images, audio, or video, specialized algorithms such as convolutional neural networks (CNNs) for images or recurrent neural networks (RNNs)/transformers for text are more appropriate.

b) Dimensionality and Sample Size:
– High-dimensional data (many features) may benefit from algorithms capable of handling feature selection or reduction, such as regularized linear models or tree ensembles.
– For small datasets, simpler models (e.g., linear regression, decision trees with depth limits) are less likely to overfit than deep neural networks, which typically need large datasets to generalize well.

Example:
A medical dataset with 300 patient records and 20 features is likely to yield better results with logistic regression or random forests than with deep learning methods, which would probably overfit when trained from scratch on so few examples.
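
One rough way to see why the 300-record example favors simpler models is to compare the number of trainable parameters with the number of samples. The sketch below is illustrative only: the hidden-layer sizes (64, 64) describe an assumed, hypothetical network, and the samples-per-parameter ratio is a heuristic rather than a formal test for overfitting.

```python
from typing import Sequence


def linear_param_count(n_features: int) -> int:
    """Weights plus one bias term for a binary logistic regression."""
    return n_features + 1


def mlp_param_count(n_features: int, hidden_sizes: Sequence[int], n_outputs: int = 1) -> int:
    """Weights and biases of a fully connected network."""
    sizes = [n_features, *hidden_sizes, n_outputs]
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))


n_samples, n_features = 300, 20  # figures from the medical example

logreg_params = linear_param_count(n_features)
mlp_params = mlp_param_count(n_features, hidden_sizes=[64, 64])  # assumed architecture

print(f"logistic regression: {logreg_params} parameters, "
      f"{n_samples / logreg_params:.1f} samples per parameter")
print(f"small MLP:           {mlp_params} parameters, "
      f"{n_samples / mlp_params:.3f} samples per parameter")
```

Even this small network has far more parameters than there are patient records, whereas the linear model has roughly fourteen records per parameter.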

2. Problem Type

The algorithm must align with the target task. Main categories are:

a) Supervised Learning:
– Classification: Predicting discrete labels (e.g., spam detection). Algorithms: logistic regression, support vector machines (SVM), decision trees, random forests, neural networks.
– Regression: Predicting continuous values (e.g., house prices). Algorithms: linear regression, ridge regression, random forests, gradient boosting.

b) Unsupervised Learning:
– Clustering: Grouping similar items (e.g., customer segmentation). Algorithms: k-means, hierarchical clustering, DBSCAN.
– Dimensionality Reduction: Reducing the number of features while retaining information (e.g., visualization of high-dimensional data). Algorithms: PCA, t-SNE.

c) Other Tasks:
– Time Series Forecasting: ARIMA, LSTM networks.
– Recommendation: Matrix factorization, collaborative filtering.

Example:
When building a customer churn prediction model (binary classification), logistic regression, decision trees, or gradient-boosted machines are reasonable starting points.
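
The task-to-algorithm mapping in this section can be kept as a small lookup table, which is sometimes handy for documenting the first shortlist in a project. The sketch below only transcribes the lists given above; the dictionary keys and the function name are hypothetical choices made for illustration.

```python
# Candidate starting algorithms per task, transcribed from the lists in this section.
STARTING_POINTS = {
    "classification": ["logistic regression", "SVM", "decision tree",
                       "random forest", "neural network"],
    "regression": ["linear regression", "ridge regression",
                   "random forest", "gradient boosting"],
    "clustering": ["k-means", "hierarchical clustering", "DBSCAN"],
    "dimensionality_reduction": ["PCA", "t-SNE"],
    "time_series": ["ARIMA", "LSTM"],
    "recommendation": ["matrix factorization", "collaborative filtering"],
}


def candidates_for(task: str) -> list:
    """Return the shortlist for a task, or raise if the task is not covered."""
    try:
        return STARTING_POINTS[task]
    except KeyError:
        raise ValueError(
            f"unknown task {task!r}; expected one of {sorted(STARTING_POINTS)}"
        ) from None


# Customer churn is a binary classification problem.
print(candidates_for("classification"))
```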

3. Interpretability Requirements

The need for understanding and explaining the model's decisions is a critical factor:

– High Interpretability: Sectors such as healthcare and finance may require models whose predictions can be explained to regulators or stakeholders. Linear models, decision trees, and rule-based systems are preferable.
– Lower Interpretability: In applications where predictive performance matters more than transparency (e.g., image recognition), complex models like deep neural networks or ensemble methods can be considered.

Example:
A bank predicting loan defaults may prefer logistic regression or decision trees due to their transparency, allowing clear explanations for each prediction.
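
Part of what makes logistic regression transparent is that each coefficient is a change in log-odds per unit of its feature, so exponentiating it gives an odds ratio that can be stated in plain language. The sketch below assumes two hypothetical coefficients for a loan-default model; the feature names and values are invented for illustration and do not come from any real model.

```python
import math

# Hypothetical fitted coefficients (log-odds per unit of each feature).
coefficients = {
    "debt_to_income_ratio": 0.7,
    "years_at_current_job": -0.2,
}

# exp(coefficient) is the factor by which the odds change per one-unit increase.
odds_ratios = {name: math.exp(coef) for name, coef in coefficients.items()}

for name, ratio in odds_ratios.items():
    print(f"+1 unit of {name} multiplies the odds of default by {ratio:.2f}")
```

A ratio above 1 means the feature pushes predictions toward default, and a ratio below 1 means it pushes them away, which is the kind of statement a regulator or loan officer can check.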

4. Scalability and Computational Constraints

The computational resources available and the expected scale of the data influence algorithm selection:

– Efficiency: Linear models (e.g., ordinary least squares, logistic regression) are computationally efficient and scale well to large datasets.
– Resource Intensity: Deep learning algorithms require significant computational power (often GPUs), particularly for large-scale or unstructured data. Gradient-boosted trees (e.g., XGBoost) build their trees sequentially and usually need more tuning than random forests, whose trees can be trained in parallel, but they often reach higher accuracy on structured tasks.

Example:
For a dataset with millions of rows and hundreds of features, logistic regression and distributed implementations of decision trees are often feasible on standard hardware, while deep neural networks may necessitate specialized infrastructure.
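
A quick back-of-the-envelope memory estimate helps decide whether a dataset of this scale fits on one machine at all. The sketch below assumes a dense numeric matrix and uses 5 million rows and 300 features as stand-in values for "millions of rows and hundreds of features"; both numbers are assumptions.

```python
def dense_matrix_gib(n_rows: int, n_features: int, bytes_per_value: int = 8) -> float:
    """Memory needed for a dense numeric matrix, in GiB (8 bytes = float64)."""
    return n_rows * n_features * bytes_per_value / 2**30


# Assumed sizes standing in for "millions of rows and hundreds of features".
n_rows, n_features = 5_000_000, 300

print(f"float64: {dense_matrix_gib(n_rows, n_features):.1f} GiB")
print(f"float32: {dense_matrix_gib(n_rows, n_features, bytes_per_value=4):.1f} GiB")
```

The estimate covers the raw feature matrix only; training algorithms typically need additional working memory on top of it.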

5. Availability of Labeled Data

The volume and quality of labeled data are important:

– Abundant Labeled Data: Deep learning algorithms excel when large, labeled datasets are available (e.g., millions of annotated images).
– Limited Labeled Data: Simpler models or semi-supervised/transfer learning approaches are preferable when data is scarce.

Example:
For text classification with only a few thousand labeled documents, SVMs or logistic regression may outperform deep neural networks.

6. Handling of Missing Data and Outliers

Different algorithms vary in their robustness to incomplete or noisy data:

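– Robust Algorithms: Tree-based methods (random forests, gradient boosting) are comparatively insensitive to outliers and feature scaling. Several gradient-boosting implementations, including XGBoost and LightGBM, also handle missing values natively, although not every tree-based implementation does.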
– Sensitive Algorithms: Linear models and neural networks may require preprocessing steps such as imputation or normalization.

Example:
If a dataset contains many missing feature values, gradient-boosting implementations with native missing-value support, such as XGBoost, are more accommodating than SVMs, which typically require complete data.
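
For algorithms that need complete data, the usual first steps are to measure how much is missing and then impute. The sketch below uses only the Python standard library on a tiny hypothetical table (the column names and values are invented); median imputation is one simple strategy among several, and in a real project the median would be computed on the training split only.

```python
from statistics import median

# Tiny hypothetical table; None marks a missing entry.
rows = [
    {"age": 34, "monthly_charge": 70.5},
    {"age": None, "monthly_charge": 55.0},
    {"age": 51, "monthly_charge": None},
    {"age": 29, "monthly_charge": 89.9},
]


def missing_fraction(rows, column):
    """Share of rows in which the column is missing."""
    return sum(r[column] is None for r in rows) / len(rows)


def impute_median(rows, column):
    """Return copies of the rows with missing values replaced by the column median."""
    observed = [r[column] for r in rows if r[column] is not None]
    fill = median(observed)
    return [{**r, column: fill if r[column] is None else r[column]} for r in rows]


for col in ("age", "monthly_charge"):
    print(f"{col}: {missing_fraction(rows, col):.0%} missing")

imputed = impute_median(impute_median(rows, "age"), "monthly_charge")
print(imputed)
```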

7. Training Time and Ease of Use

Early experimentation benefits from algorithms that are quick to train and easy to tune:

– Quick Prototyping: Linear and logistic regression, small decision trees, and k-means clustering provide fast feedback, allowing rapid iteration.
– Long Training Times: Neural networks and large ensemble methods can require significant time and tuning.

Example:
A marketing analyst exploring customer segmentation can rapidly iterate with k-means clustering, compared to the complexity of training autoencoders for representation learning.

8. Support and Integration with Cloud Platforms

The practical aspect of tooling and integration should not be overlooked. It matters whether the chosen algorithm is available and supported in Vertex AI (the successor to Google Cloud Machine Learning Engine and AI Platform) or in other cloud services:

– Managed Services: Google Cloud AutoML, BigQuery ML, and Vertex AI support a variety of algorithms, often providing automated hyperparameter tuning and scalability.
– Custom Models: For advanced use cases, frameworks like TensorFlow or PyTorch can be used with custom training code on Vertex AI (formerly AI Platform).

Example:
A data scientist using BigQuery ML can quickly build and deploy logistic regression or boosted tree models directly within BigQuery, accelerating the workflow.
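
In BigQuery ML, a model is trained with a `CREATE MODEL` SQL statement. The sketch below only builds such a statement as a string and does not run it; the dataset, table, and label column names are hypothetical, and the helper function itself is an illustration rather than part of any Google Cloud library.

```python
def create_model_sql(model: str, source_table: str, label: str,
                     model_type: str = "LOGISTIC_REG") -> str:
    """Build a BigQuery ML CREATE MODEL statement as a string (nothing is executed)."""
    return (
        f"CREATE OR REPLACE MODEL `{model}`\n"
        f"OPTIONS (model_type = '{model_type}', input_label_cols = ['{label}']) AS\n"
        f"SELECT * FROM `{source_table}`"
    )


# Hypothetical dataset, table, and column names.
sql = create_model_sql(
    model="my_dataset.churn_baseline",
    source_table="my_dataset.customers",
    label="churned",
)
print(sql)
```

Passing `model_type="BOOSTED_TREE_CLASSIFIER"` would request a boosted tree model instead. The statement can then be run in the BigQuery console or through a client library; because it is assembled by string formatting, the helper should only be given trusted identifiers.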

9. Hyperparameter Sensitivity

Some algorithms require careful tuning of hyperparameters, while others work well with default settings:

– Low Sensitivity: Logistic regression, random forests, k-nearest neighbors, and simple decision trees often perform reasonably with minimal tuning.
– High Sensitivity: Deep neural networks, SVMs with RBF kernels, and gradient-boosted trees often need grid or random search for optimal performance.

Example:
For initial baseline models, selecting random forests or logistic regression reduces the need for extensive hyperparameter optimization.
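
The cost of tuning a sensitive algorithm grows quickly with the number of hyperparameters, which is why random search over a fixed budget is a common alternative to a full grid. The sketch below counts the configurations in a hypothetical search space for a gradient-boosted tree model; the parameter names and value lists are assumptions chosen for illustration.

```python
import itertools
import random

# Hypothetical search space for a gradient-boosted tree model.
search_space = {
    "learning_rate": [0.01, 0.03, 0.1, 0.3],
    "max_depth": [3, 5, 7, 9],
    "n_estimators": [100, 300, 1000],
    "subsample": [0.6, 0.8, 1.0],
}

# Grid search evaluates every combination: 4 * 4 * 3 * 3 configurations.
grid = list(itertools.product(*search_space.values()))
print(f"full grid: {len(grid)} configurations")

# Random search evaluates a fixed budget of configurations drawn from the grid.
rng = random.Random(0)  # seeded for reproducibility
budget = 20
sampled = [dict(zip(search_space, combo)) for combo in rng.sample(grid, budget)]
print(f"random search: {len(sampled)} configurations, first one: {sampled[0]}")
```

Each configuration means one full training run (or several, with cross-validation), so a model that works well with defaults saves this entire loop during early experimentation.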

10. Model Performance Benchmarks

Benchmark studies and prior literature provide useful guidance:

– Competitive Baselines: Random forests and gradient-boosted machines often perform strongly on structured data benchmarks.
– Specialized Tasks: CNNs are well-established as top performers for image-related tasks, while transformers are state-of-the-art for text processing.

Example:
In Kaggle competitions involving tabular data, gradient-boosted trees like XGBoost or LightGBM are frequently used as starting points due to their robust out-of-the-box performance.

11. Regulatory and Ethical Considerations

The regulatory landscape can enforce algorithmic constraints:

– Fairness and Bias: Some algorithms can inadvertently amplify biases present in data. Simpler, interpretable models facilitate auditing.
– Auditability: Regulatory compliance may require the ability to audit and explain individual predictions, favoring algorithms where feature importance and decision paths are clear.

Example:
A healthcare provider seeking FDA approval for a diagnostic tool may face stricter requirements on explainability, making linear or tree-based models preferable over black-box deep learning architectures.

12. Future Maintenance and Model Lifecycle

The ease of maintaining and updating a model in production is a practical concern:

– Simplicity: Models with fewer parameters and simple architectures are easier to retrain, monitor, and debug.
– Complexity: Deep learning models may require regular retraining and more involved monitoring for concept drift and performance degradation.

Example:
A recommendation system updated monthly with new data can be efficiently maintained if based on matrix factorization rather than a complex neural collaborative filtering model.

13. Transferability and Extensibility

The potential need for extending the model to new tasks or domains may influence initial algorithm selection:

– Transfer Learning: Pretrained deep learning models (e.g., BERT for text, ResNet for images) can be fine-tuned for specific tasks.
– Modular Frameworks: Algorithms implemented in modular frameworks like TensorFlow or scikit-learn facilitate adaptation to new problem statements.

Example:
A vision application intended for multiple object categories may benefit from starting with a pretrained CNN that can be extended to new classes over time.

14. Community and Documentation Support

A well-supported algorithm backed by a strong community and comprehensive documentation ensures easier troubleshooting and continuous improvement:

– Mature Libraries: Algorithms available in scikit-learn, TensorFlow, and XGBoost are supported by extensive documentation and community forums.
– Open Source: Open-source implementations foster transparency and rapid innovation.

Example:
A practitioner new to time series forecasting may prefer ARIMA or Prophet, as both have broad community support and thorough documentation.

Illustrative Workflow Example

Step 1: Problem Definition
Suppose the objective is to predict customer churn for a telecommunications provider.

Step 2: Data Exploration
The dataset consists of 10,000 rows and 50 structured features, with some missing entries and moderate class imbalance.

Step 3: Algorithm Selection
– Given the structured, tabular nature of the data, tree-based models (random forest, XGBoost), logistic regression, and possibly support vector machines come into consideration.
– The moderate dataset size makes both linear and tree-based algorithms feasible.
– Missing data and outliers suggest tree-based models for their robustness.
– If interpretability is important for business stakeholders, logistic regression or shallow decision trees can be prioritized.

Step 4: Rapid Prototyping
Rapidly train logistic regression and random forest models using default settings to establish baseline performance.

Step 5: Iterative Refinement
Based on validation results, proceed to hyperparameter tuning or consider more sophisticated models if warranted.

This example illustrates how practical considerations (data structure, interpretability, missing data, performance, and stakeholder needs) all converge in the decision process.
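
The Step 3 reasoning can be written down as a small rule-based shortlist. The sketch below is a simplified illustration, not a definitive selection procedure: the `DatasetProfile` fields, the 100,000-row cutoff for SVMs, and the priority scores are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class DatasetProfile:
    n_rows: int
    structured: bool
    has_missing_values: bool
    needs_interpretability: bool


def shortlist(profile: DatasetProfile) -> list:
    """Order candidate algorithms using the Step 3 reasoning; thresholds are illustrative."""
    if not profile.structured:
        return ["pretrained neural network (transfer learning)"]

    candidates = ["logistic regression", "random forest", "gradient-boosted trees"]
    if profile.n_rows <= 100_000:  # assumed cutoff: kernel SVMs scale poorly with row count
        candidates.append("support vector machine")

    def priority(name: str) -> int:
        score = 0
        if profile.has_missing_values and name in ("random forest", "gradient-boosted trees"):
            score -= 1  # tree-based models are favored for robustness
        if profile.needs_interpretability and name == "logistic regression":
            score -= 2  # transparency outranks robustness when explanations are required
        return score

    return sorted(candidates, key=priority)  # sorted() is stable, so ties keep list order


churn = DatasetProfile(n_rows=10_000, structured=True,
                       has_missing_values=True, needs_interpretability=False)
print(shortlist(churn))
```

For the churn dataset described above, the tree-based models come first; setting `needs_interpretability=True` moves logistic regression to the front.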

Recommendations for Initial Algorithm Selection

For practitioners beginning a machine learning project, the following pragmatic guidelines are frequently adopted:

– Start Simple: Begin with interpretable, easy-to-train models to obtain a performance baseline.
– Consider Robustness: If data quality is uncertain, favor algorithms tolerant to missing values and outliers.
– Align to Task: Choose algorithms with a proven track record for similar problem types and data modalities.
– Iterate Quickly: Select models that allow for rapid experimentation, enabling early feedback and adjustment.

As the project advances, more complex algorithms can be introduced selectively, always weighing resource constraints, interpretability, and the evolving requirements of the business or scientific objective.
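
The "start simple" guideline can be taken one step further: before training anything, compute the accuracy of always predicting the most frequent class. The sketch below assumes a hypothetical label distribution with 20% churners; any candidate model should clearly beat this number before its accuracy is taken seriously.

```python
from collections import Counter

# Hypothetical labels with moderate imbalance: 1 = churned, 0 = stayed.
labels = [1] * 200 + [0] * 800


def majority_baseline_accuracy(y):
    """Accuracy obtained by always predicting the most frequent class."""
    _, count = Counter(y).most_common(1)[0]
    return count / len(y)


baseline = majority_baseline_accuracy(labels)
print(f"majority-class accuracy: {baseline:.2f}")
```

This trivial baseline identifies no churners at all, which is why precision and recall on the minority class are usually reported alongside accuracy for imbalanced problems.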

