Is it correct that the initial dataset can be split into three main subsets: the training set, the validation set (to fine-tune parameters), and the testing set (to check performance on unseen data)?
It is indeed correct that the initial dataset in machine learning can be divided into three main subsets: the training set, the validation set, and the testing set. These subsets serve specific purposes in the machine learning workflow and play an important role in developing and evaluating models. The training set is the largest subset
- Published in Artificial Intelligence, EITC/AI/GCML Google Cloud Machine Learning, First steps in Machine Learning, The 7 steps of machine learning
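As a rough illustration of the three-way split described above, here is a minimal sketch using only the Python standard library. The function name `three_way_split`, the 15% validation and test fractions, and the fixed seed are assumptions made for this example, not values taken from the answer.

```python
import random


def three_way_split(samples, val_fraction=0.15, test_fraction=0.15, seed=0):
    """Shuffle samples and split them into train, validation, and test lists.

    The fractions and the seed are illustrative defaults (assumptions).
    """
    if val_fraction < 0 or test_fraction < 0 or val_fraction + test_fraction >= 1:
        raise ValueError("fractions must be non-negative and sum to less than 1")

    shuffled = list(samples)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)

    n_val = int(len(shuffled) * val_fraction)
    n_test = int(len(shuffled) * test_fraction)

    validation = shuffled[:n_val]
    test = shuffled[n_val:n_val + n_test]
    train = shuffled[n_val + n_test:]  # everything left over is used for training
    return train, validation, test


train, validation, test = three_way_split(range(100))
print(len(train), len(validation), len(test))
```

Shuffling before slicing matters when the original data is ordered (for example, sorted by class), since otherwise one subset could end up with an unrepresentative sample.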
Is testing an ML model against data that could have been previously used in model training a proper evaluation phase in machine learning?
The evaluation phase in machine learning is a critical step that involves testing the model against data to assess its performance and effectiveness. When evaluating a model, it is generally recommended to use data that has not been seen by the model during the training phase. This helps to ensure unbiased and reliable evaluation results.
- Published in Artificial Intelligence, EITC/AI/GCML Google Cloud Machine Learning, First steps in Machine Learning, The 7 steps of machine learning
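To see why evaluating on previously seen data can mislead, consider the toy sketch below. It is a deliberately extreme, hypothetical case: the labels are random noise and the "model" is just a lookup table that memorizes its training pairs. All names and sizes here are assumptions chosen for illustration.

```python
import random


def accuracy(model, examples):
    """Fraction of (input, label) pairs the model predicts correctly."""
    correct = sum(1 for x, y in examples if model(x) == y)
    return correct / len(examples)


rng = random.Random(0)
# Toy data: labels are random, so there is no real pattern to learn.
examples = [(i, rng.choice([0, 1])) for i in range(200)]
train, held_out = examples[:150], examples[150:]

# A "model" that only memorizes training pairs and guesses 0 for unseen inputs.
memory = dict(train)


def model(x):
    return memory.get(x, 0)


print("accuracy on training data:", accuracy(model, train))
print("accuracy on held-out data:", accuracy(model, held_out))
```

The memorizing model scores perfectly on the data it was trained on, yet that score says nothing about how it behaves on new inputs; only the held-out figure reflects that.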
Why is it important to split the data into training and validation sets? How much data is typically allocated for validation?
Splitting the data into training and validation sets is an important step in training convolutional neural networks (CNNs) for deep learning tasks. This process allows us to assess the performance and generalization ability of our model, as well as to guard against overfitting. In this field, it is common practice to allocate a certain portion of the

