Support Vector Machines (SVMs) are a widely used machine learning algorithm for classification and regression. They work by finding an optimal hyperplane that separates the classes (or, for regression, predicts continuous values). Training an SVM means finding the parameters that define this hyperplane, and for large datasets this process can become computationally expensive for several reasons.
One reason for the computational expense is the need to compute the kernel function for each pair of data points. The kernel function measures the similarity between two data points in a higher-dimensional feature space; common choices include the linear, polynomial, radial basis function (RBF), and sigmoid kernels. Because every pair of points must be compared, the number of kernel evaluations grows quadratically with the dataset size: for n data points, the full kernel matrix has n × n entries, which quickly becomes expensive in both time and memory.
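The quadratic growth of the kernel matrix can be made concrete with a short sketch. The snippet below computes a full RBF kernel matrix in NumPy; the gamma value and dataset are illustrative placeholders.

```python
import numpy as np

def rbf_kernel_matrix(X, gamma=0.5):
    """Compute the full n x n RBF kernel matrix K[i, j] = exp(-gamma * ||x_i - x_j||^2)."""
    # Squared Euclidean distances between all pairs of rows: n^2 entries.
    sq_norms = np.sum(X ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    # Clamp tiny negative values caused by floating-point round-off.
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))

X = np.random.default_rng(0).normal(size=(1000, 10))
K = rbf_kernel_matrix(X)
print(K.shape)  # the matrix is 1000 x 1000: storage and time grow as n^2
```

Doubling the number of training points quadruples the size of this matrix, which is why kernel SVMs scale poorly to very large datasets.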
Another factor that contributes to the computational expense is the optimization process used to find the optimal hyperplane. SVMs aim to maximize the margin between the decision boundary and the closest data points of different classes. This optimization is typically solved as a quadratic program: minimizing a quadratic objective function subject to linear constraints. In the standard dual formulation there is one optimization variable per training point, so as the number of data points increases, the number of variables and constraints grows with it, leading to longer computation times.
Furthermore, SVMs are sensitive to the choice of hyperparameters, such as the regularization parameter (C) and the kernel parameters. To find the best values for these hyperparameters, a common approach is to perform a grid search or use more advanced optimization techniques like Bayesian optimization. However, exploring a large hyperparameter space can significantly increase the computational cost, especially for large datasets.
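As a concrete sketch of how hyperparameter search multiplies the training cost, the example below uses scikit-learn's GridSearchCV with a small, illustrative grid over C and the RBF gamma; the grid values and dataset are arbitrary choices for demonstration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Small synthetic dataset for illustration.
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# A 3 x 3 grid with 5-fold cross-validation already means 45 separate SVM fits.
param_grid = {"C": [0.1, 1, 10], "gamma": [0.01, 0.1, 1]}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5, n_jobs=-1)
search.fit(X, y)
print(search.best_params_)
```

Each additional hyperparameter value multiplies the number of full training runs, so on a large dataset even a modest grid can dominate the total computational cost.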
To mitigate the computational expense, several techniques can be applied. One approach is to train on a subset of the data, or to process the data incrementally in small "mini-batches", instead of solving over the entire dataset at once. This reduces the number of pairwise kernel computations and the size of the optimization problem, at the cost of potentially sacrificing some accuracy. Another technique is to employ parallel computing, distributing the computations across multiple processors or machines, which can significantly speed up training on large-scale datasets.
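One common way to realize the mini-batch idea in practice is to optimize a linear SVM objective with stochastic gradient descent. The sketch below uses scikit-learn's SGDClassifier with hinge loss (the linear SVM loss) and its partial_fit method to process the data one batch at a time; the batch size and dataset are illustrative.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier

# Synthetic dataset standing in for a large training set.
X, y = make_classification(n_samples=5000, n_features=20, random_state=0)

# Hinge loss makes SGDClassifier optimize a linear SVM objective;
# partial_fit updates the model incrementally, one mini-batch at a time,
# so the full kernel matrix is never formed.
clf = SGDClassifier(loss="hinge", random_state=0)
batch_size = 500
classes = np.unique(y)  # partial_fit needs the full label set up front
for start in range(0, len(X), batch_size):
    clf.partial_fit(X[start:start + batch_size],
                    y[start:start + batch_size],
                    classes=classes)
print(clf.score(X, y))
```

This trades the kernel trick for linear decision boundaries, but it scales to datasets where solving the full quadratic program would be impractical.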
In summary, SVM training becomes computationally expensive for large datasets because of the pairwise kernel computations, the quadratic programming optimization, and the exploration of the hyperparameter space. Techniques such as subset or mini-batch training and parallel computing can mitigate this cost to some extent.

