The question about the hyperparameters m and b refers to a common point of confusion in introductory machine learning, particularly in linear regression as typically introduced in the Google Cloud Machine Learning context. To clarify this, it is essential to distinguish between model parameters and hyperparameters, using precise definitions and examples.
1. Understanding Parameters and Hyperparameters
Model Parameters: In machine learning, parameters are the internal variables of a model that are learned from the training data through the learning or training process. These variables directly define the model's behavior. For linear regression—often the first example presented in machine learning tutorials—the most common parameters are the slope and intercept in the equation of a straight line:
y = mx + b

– m: The slope of the line.
– b: The y-intercept of the line.

During training, the machine learning algorithm finds the optimal values of m and b so that the line best fits the data according to a loss function, commonly mean squared error.
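As a minimal sketch of what "learning the parameters" means in practice, the snippet below fits a line to synthetic data (the true slope and intercept are chosen here for illustration) and recovers m and b with an ordinary least-squares fit:

```python
# Recovering the parameters m and b from noisy data with a
# least-squares fit (np.polyfit). The data is synthetic.
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 100)
y = 2.5 * x + 1.0 + rng.normal(0, 0.1, size=x.size)  # true m = 2.5, b = 1.0

m, b = np.polyfit(x, y, deg=1)  # degree-1 fit returns (slope, intercept)
print(round(m, 1), round(b, 1))  # → 2.5 1.0
```

The key point is that m and b appear only as outputs of the fitting procedure, never as inputs the practitioner chooses.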
Hyperparameters: Hyperparameters, on the other hand, are configuration settings external to the model that are set before the learning process begins. They are not learned from the data but instead control aspects of the training process or the structure of the model itself. Examples include the learning rate, number of training epochs, batch size, regularization strength, and, in some algorithms, the number of hidden layers or the number of trees in a random forest. Determining the optimal set of hyperparameters is often accomplished through processes such as grid search, random search, or Bayesian optimization.
2. The Role of m and b in Model Training
In the linear regression context often discussed in introductory machine learning videos, m and b are not hyperparameters. They are parameters. The distinction is based on their function:
– They are *learned from the data* by the algorithm during training.
– They directly define the predictive function of the model (i.e., the line fitted through the data points).
– They change as the algorithm iteratively improves the fit to the data.
For instance, if you provide different training data to the same linear regression algorithm, the resulting values of m and b will likely change, reflecting the new data's underlying trend.
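This data-dependence can be demonstrated directly: the same fitting procedure applied to two different (here exactly linear, synthetic) datasets returns different values of m and b.

```python
# Same algorithm, different data, different learned parameters.
import numpy as np

x = np.linspace(0, 10, 50)
m1, b1 = np.polyfit(x, 3.0 * x + 2.0, deg=1)   # data with trend m=3,  b=2
m2, b2 = np.polyfit(x, -1.0 * x + 5.0, deg=1)  # data with trend m=-1, b=5
print(round(m1), round(b1))  # → 3 2
print(round(m2), round(b2))  # → -1 5
```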
3. Examples of Hyperparameters in Linear Regression and Other Algorithms
While m and b are parameters, linear regression and other models do have hyperparameters. In the case of basic, unregularized linear regression, hyperparameters might be minimal or even absent, but in practical applications or more advanced versions, common hyperparameters include:
– Learning Rate: Determines the size of steps taken in the direction of the gradient during optimization. Too high a learning rate can cause the model to overshoot the minimum; too low can result in slow convergence.
– Number of Epochs: The number of complete passes through the training dataset. Selecting too few epochs may result in underfitting, while too many may lead to overfitting.
– Batch Size: The number of training samples used to compute each update to the model parameters. A smaller batch size can lead to noisier updates but may generalize better.
– Regularization Strength (e.g., Ridge or Lasso Regression): Controls the penalty for large parameter values, helping prevent overfitting by discouraging overly complex models.
These hyperparameters must be selected or tuned by the practitioner, typically before training the model.
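To make the tuning step concrete, here is a minimal grid-search sketch over one hyperparameter, the regularization strength of a 1-D no-intercept ridge model. The data, split, and candidate values are all illustrative assumptions, not from the original text:

```python
# Grid search over a regularization strength using a held-out validation set.
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0, 1, 40)
y = 4.0 * x + rng.normal(0, 0.2, size=x.size)
x_tr, y_tr, x_va, y_va = x[:30], y[:30], x[30:], y[30:]

def ridge_slope(x, y, lam):
    # Closed-form ridge estimate of the slope for a no-intercept 1-D model.
    return (x @ y) / (x @ x + lam)

# Pick the candidate value with the lowest validation error.
best = min([0.0, 0.1, 1.0, 10.0],
           key=lambda lam: np.mean((y_va - ridge_slope(x_tr, y_tr, lam) * x_va) ** 2))
print("best lambda:", best)
```

Note the division of labor: the slope is still estimated from the training data, while the regularization strength is chosen by the practitioner's search procedure, outside of fitting.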
4. Didactic Value of the Distinction
Understanding the difference between parameters and hyperparameters is foundational in machine learning. This distinction impacts model training, experimentation, and deployment strategies. For example:
– Model Training: Only parameters are updated during training via algorithms such as gradient descent. Hyperparameters remain fixed unless the training loop is explicitly re-run with different values.
– Experimentation: Hyperparameter tuning is a separate process from training. Practitioners often set aside a validation set or use cross-validation to evaluate the effect of different hyperparameter values.
– Reproducibility: Documenting hyperparameters is important for reproducibility, while model parameters are typically saved with the trained model for inference.
5. Common Misconceptions and Clarifications
A recurring misconception is that any variable in a model is a hyperparameter. The video referenced in the question likely uses m and b to illustrate how a model "learns" from data, possibly using animation or stepwise fitting. These variables change as the model optimizes its loss function. Hyperparameters, in contrast, might be discussed in the context of setting up the learning process, such as specifying a learning rate for gradient descent:
Example:
– Linear regression with gradient descent might use a learning rate (often denoted α) as a hyperparameter. The values of m and b start at initial guesses (often random or zero) and are updated iteratively according to the computed gradients and the learning rate.
Another example from logistic regression:
– Model parameters: weights and bias (analogous to m and b in linear regression).
– Hyperparameters: learning rate, number of iterations, regularization type and strength.
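The logistic regression case maps cleanly onto a library API. In scikit-learn, for example, hyperparameters such as the regularization strength C and the iteration budget max_iter are passed to the constructor, while the learned parameters appear afterwards as the coef_ and intercept_ attributes. A tiny illustrative sketch:

```python
# Hyperparameters go into the constructor; parameters come out of fit().
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0, 0, 1, 1])

clf = LogisticRegression(C=1.0, max_iter=200)  # hyperparameters: set before training
clf.fit(X, y)

print(clf.coef_, clf.intercept_)  # parameters: learned from the data
print(clf.predict([[2.5]]))       # use the learned parameters for inference
```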
6. Broader Perspectives and Applications
The distinction is not unique to linear regression. In neural networks, for example:
– Parameters: Weights and biases of each neuron, learned during training.
– Hyperparameters: Number of layers, number of neurons per layer, activation functions, learning rate, batch size, optimizer type, and others.
In decision trees:
– Parameters: The specific splits chosen at each node, learned from data.
– Hyperparameters: Maximum depth of the tree, minimum samples per leaf, criterion for split selection.
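The same pattern holds for trees in scikit-learn: max_depth is fixed up front as a hyperparameter, while the split thresholds stored on the fitted estimator are learned from the data. A minimal sketch on four points:

```python
# max_depth is a hyperparameter; the split threshold is a learned parameter.
from sklearn.tree import DecisionTreeClassifier

X = [[0], [1], [2], [3]]
y = [0, 0, 1, 1]

tree = DecisionTreeClassifier(max_depth=1)  # hyperparameter: chosen before fitting
tree.fit(X, y)
print(tree.tree_.threshold[0])  # learned split point between the two classes
```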
7. Conclusion and Practical Tips
When approaching a new machine learning problem, a clear understanding of which variables are parameters (to be learned) and which are hyperparameters (to be set before training) enables more efficient experimentation and better model performance. Proper hyperparameter tuning can dramatically improve results, while correct parameter estimation ensures the model accurately captures the patterns in the data.
In direct response to the original question: m and b are parameters, not hyperparameters, in the context of the video and standard machine learning practice. The hyperparameters are other, external settings such as those controlling the optimization process or model complexity. Recognizing this distinction is fundamental for successful machine learning workflows.