What are the two major algorithms discussed in this tutorial for testing assumptions in machine learning?

by EITCA Academy / Monday, 07 August 2023 / Published in Artificial Intelligence, EITC/AI/MLP Machine Learning with Python, Programming machine learning, Testing assumptions, Examination review

In machine learning, testing assumptions is an important step in the model development process. It helps confirm that the assumptions underlying the chosen algorithm hold, so that the model's predictions can be trusted. In this tutorial, we discuss two statistical tests commonly used for testing assumptions in machine learning: the Shapiro-Wilk test and the Kolmogorov-Smirnov test.

The Shapiro-Wilk test is a statistical test used to determine whether a given dataset is consistent with a normal distribution. It is particularly useful when the assumption of normality is required for further analysis or modeling. The test calculates a test statistic, W, which is roughly the squared correlation between the ordered data and the corresponding normal scores; values close to 1 are consistent with normality. The null hypothesis of the test is that the data are normally distributed. If the p-value associated with the test statistic is below a predetermined significance level (e.g., 0.05), we reject the null hypothesis and conclude that the data are unlikely to follow a normal distribution. A p-value above that level does not prove normality; it only means the test found no evidence against it.

Here is an example of how the Shapiro-Wilk test can be applied in Python using the scipy library:

python
from scipy.stats import shapiro

# A small fixed sample (illustrative values, not randomly generated)
data = [0.1, 0.2, 0.3, 0.4, 0.5]

# Perform the Shapiro-Wilk test
statistic, p_value = shapiro(data)

# Print the results
print("Test statistic:", statistic)
print("p-value:", p_value)
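With only five evenly spaced values, the example above says little about how the test behaves in practice. The following is a minimal sketch, not part of the original tutorial, that runs the same `shapiro` call on two larger synthetic samples. The sample sizes, the seed, the choice of an exponential distribution as the non-normal case, and the 0.05 threshold are all assumptions made for illustration.

```python
import numpy as np
from scipy.stats import shapiro

# Seeded generator so the sketch is reproducible (the seed value is arbitrary)
rng = np.random.default_rng(0)

# Assumed illustrative samples: one drawn from a normal distribution,
# one drawn from a strongly right-skewed exponential distribution
normal_sample = rng.normal(loc=0.0, scale=1.0, size=200)
skewed_sample = rng.exponential(scale=1.0, size=200)

alpha = 0.05  # assumed significance level
results = {}
for name, sample in [("normal", normal_sample), ("exponential", skewed_sample)]:
    w_stat, p_value = shapiro(sample)
    results[name] = (w_stat, p_value)
    verdict = "reject normality" if p_value < alpha else "no evidence against normality"
    print(f"{name}: W={w_stat:.3f}, p={p_value:.4f} -> {verdict}")
```

The skewed sample should be rejected decisively, while the verdict for the normal sample depends on the particular draw: at a 0.05 level, roughly one in twenty truly normal samples will still be rejected by chance.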

The Kolmogorov-Smirnov test, on the other hand, is a non-parametric test for comparing distributions. It comes in two forms: the one-sample form tests the goodness-of-fit of a sample to a theoretical reference distribution, and the two-sample form tests whether two samples are drawn from the same distribution. The test calculates a test statistic, D, which is the maximum absolute difference between the empirical distribution function of the sample and the cumulative distribution function of the reference distribution (or, in the two-sample form, between the two empirical distribution functions). The null hypothesis of the test is that the two distributions are the same. If the p-value associated with the test statistic is below a predetermined significance level, we reject the null hypothesis and conclude that the distributions are different.

Here is an example of how the Kolmogorov-Smirnov test can be applied in Python using the scipy library:

python
from scipy.stats import kstest

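# Two small fixed samples (illustrative values, not randomly generated)
data1 = [0.1, 0.2, 0.3, 0.4, 0.5]
data2 = [0.2, 0.4, 0.6, 0.8, 1.0]

# Perform the Kolmogorov-Smirnov test; because the second argument is
# an array of observations, kstest runs the two-sample form of the test
statistic, p_value = kstest(data1, data2)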

# Print the results
print("Test statistic:", statistic)
print("p-value:", p_value)
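The example above uses the two-sample form. As a complement, here is a hedged sketch of the one-sample goodness-of-fit form, together with a by-hand computation of D that follows the "maximum absolute difference" description given earlier. The synthetic sample, the seed, and the choice of the standard normal as the reference distribution are assumptions for illustration only.

```python
import numpy as np
from scipy.stats import kstest, norm

# Seeded generator so the sketch is reproducible (the seed value is arbitrary)
rng = np.random.default_rng(1)
sample = rng.normal(loc=0.0, scale=1.0, size=100)

# One-sample test against the standard normal reference distribution
result = kstest(sample, "norm")

# Recompute D by hand. The empirical distribution function is a step
# function, so the largest gap to the reference CDF occurs at a data
# point, either just after the step (d_plus) or just before it (d_minus).
x = np.sort(sample)
n = len(x)
cdf_vals = norm.cdf(x)
d_plus = np.max(np.arange(1, n + 1) / n - cdf_vals)
d_minus = np.max(cdf_vals - np.arange(0, n) / n)
d_manual = max(d_plus, d_minus)

print("D from scipy:", result.statistic)
print("D by hand:   ", d_manual)
print("p-value:     ", result.pvalue)
```

Note that the reference distribution here is fully specified in advance. If its mean and standard deviation were instead estimated from the same sample, the standard p-value would no longer be valid and would tend to be too large.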

The Shapiro-Wilk test is used to test the assumption of normality in a dataset, while the Kolmogorov-Smirnov test is used to compare the distribution of a sample to a reference distribution. By applying these tests, we can assess the validity of the assumptions underlying our machine learning models and make informed decisions about further analysis or modeling.
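To make the link to model assumptions concrete, the sketch below applies the Shapiro-Wilk test to the residuals of a simple straight-line fit. This use case is an assumption chosen for illustration and does not come from the tutorial itself; the synthetic data, the seed, and the use of `numpy.polyfit` as the fitting step are likewise hypothetical choices.

```python
import numpy as np
from scipy.stats import shapiro

# Synthetic data: a linear relationship plus normal noise (assumed setup)
rng = np.random.default_rng(42)
x = rng.uniform(0, 10, size=150)
y = 2.0 * x + 1.0 + rng.normal(0, 1.0, size=150)

# Fit a straight line by least squares; polyfit returns the
# coefficients from the highest degree down, i.e. slope first
slope, intercept = np.polyfit(x, y, deg=1)
residuals = y - (slope * x + intercept)

# Check whether the residuals look consistent with a normal distribution
w_stat, p_value = shapiro(residuals)
print(f"slope={slope:.3f}, intercept={intercept:.3f}")
print(f"Residuals: W={w_stat:.3f}, p={p_value:.4f}")
```

A small p-value here would suggest that the residuals depart from normality, which may be a reason to revisit the model or transform the data before relying on inference that assumes normal errors.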

Tagged under: Artificial Intelligence, Assumption Testing, Kolmogorov-Smirnov Test, Normality Assumption, Shapiro-Wilk Test, Statistical Tests
