The original code provided to load and train the iris dataset was written for TensorFlow 1 and may not work with TensorFlow 2. This discrepancy arises from changes and updates introduced in the newer version of TensorFlow, which will, however, be covered in detail in subsequent topics relating directly to TensorFlow 2.
To address the issue of working with the iris dataset, the code needs to be updated for compatibility with TensorFlow 2. Let's consider a revised code snippet that can be used to load and train a model on the iris dataset using TensorFlow 2.
First, let’s briefly discuss the differences between TensorFlow 1 and TensorFlow 2 that affect the code.
TensorFlow 2 introduced a higher-level API called Keras, which is now the recommended way to build and train models. This API simplifies the process and provides a more intuitive interface for machine learning tasks. Additionally, TensorFlow 2 enables eager execution by default, allowing for immediate evaluation of operations.
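The effect of eager execution can be illustrated with a minimal sketch: in TensorFlow 2, an operation such as a matrix multiplication returns a concrete value immediately, with no need to build a graph and run it in a session as in TensorFlow 1.

```python
import tensorflow as tf

# In TensorFlow 2, eager execution is enabled by default: operations
# are evaluated immediately and return concrete values.
a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
b = tf.constant([[1.0, 1.0], [1.0, 1.0]])
c = tf.matmul(a, b)

print(tf.executing_eagerly())  # True by default in TensorFlow 2
print(c.numpy())               # the numeric result is available right away
```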
To load and train a model on the iris dataset using TensorFlow 2, we can utilize the following code (first, however, scikit-learn needs to be installed with the command: pip install scikit-learn):
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder
import tensorflow as tf

# Load the iris dataset
iris = datasets.load_iris()
X = iris.data
y = iris.target.reshape(-1, 1)

# One-hot encode the labels
# (sparse_output replaces the deprecated sparse argument in scikit-learn >= 1.2)
encoder = OneHotEncoder(sparse_output=False)
y_onehot = encoder.fit_transform(y)

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y_onehot, test_size=0.2, random_state=42)

# Define a simple model and train it
model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(10, activation='relu', input_shape=(4,)),
    tf.keras.layers.Dense(3, activation='softmax')
])
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
model.fit(X_train, y_train, epochs=50, validation_data=(X_test, y_test))
A similar, alternative implementation would be the following:
import tensorflow as tf
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Load the iris dataset
iris = load_iris()
features = iris.data
labels = iris.target

# Split the dataset into training and testing sets
train_features, test_features, train_labels, test_labels = train_test_split(
    features, labels, test_size=0.2)

# Standardize the features
scaler = StandardScaler()
train_features = scaler.fit_transform(train_features)
test_features = scaler.transform(test_features)

# Create TensorFlow datasets
train_dataset = tf.data.Dataset.from_tensor_slices((train_features, train_labels))
test_dataset = tf.data.Dataset.from_tensor_slices((test_features, test_labels))

# Shuffle and batch the datasets
train_dataset = train_dataset.shuffle(100).batch(32)
test_dataset = test_dataset.batch(32)

# Define the model architecture
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation='relu', input_dim=4),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(3, activation='softmax')
])

# Compile the model
model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(),
              metrics=['accuracy'])

# Train the model
model.fit(train_dataset, epochs=10)

# Evaluate the model
model.evaluate(test_dataset)
In the updated code, we first import the necessary libraries, including TensorFlow 2 and the required modules from scikit-learn. We then load the iris dataset using the `load_iris` function and split it into training and testing sets using `train_test_split`. Next, we standardize the features using `StandardScaler` from scikit-learn.
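What standardization does can be checked on a small illustrative array (the values here are made up for demonstration): StandardScaler removes each feature's mean and scales it to unit variance, using statistics computed during fit.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# A tiny made-up feature matrix: 3 samples, 2 features
X = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 30.0]])

# fit_transform computes the per-feature mean and standard deviation,
# then centers and scales each column accordingly
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

print(X_scaled.mean(axis=0))  # approximately [0, 0]
print(X_scaled.std(axis=0))   # approximately [1, 1]
```

This is why the test set is transformed with `transform` rather than `fit_transform`: it must be scaled with the statistics learned from the training set, not its own.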
To create TensorFlow datasets, we use the `from_tensor_slices` method, passing in the features and labels of the training and testing sets. We then shuffle and batch the datasets using the appropriate methods.
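The behavior of `from_tensor_slices`, `shuffle`, and `batch` can be seen on a tiny illustrative dataset (the values below are arbitrary, chosen only to mirror the 4-feature shape of iris):

```python
import numpy as np
import tensorflow as tf

# A tiny made-up dataset: 6 samples with 4 features each
features = np.arange(24, dtype=np.float32).reshape(6, 4)
labels = np.array([0, 1, 2, 0, 1, 2])

# from_tensor_slices pairs up features and labels sample by sample;
# shuffle randomizes sample order, batch groups samples into batches
dataset = tf.data.Dataset.from_tensor_slices((features, labels))
dataset = dataset.shuffle(6).batch(4)

for batch_features, batch_labels in dataset:
    print(batch_features.shape, batch_labels.shape)
# Two batches: (4, 4) then (2, 4) — the last batch holds the remainder
```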
The model architecture is defined using the Keras Sequential API. In this example, we use two dense layers with ReLU activation and a final dense layer with softmax activation for multi-class classification. We compile the model with the Adam optimizer, sparse categorical cross-entropy loss, and accuracy as the evaluation metric.
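Note that the two snippets use different but equivalent losses: the sparse variant accepts integer class labels directly, while plain categorical cross-entropy (used in the first snippet) expects one-hot vectors. A small sketch with made-up probabilities shows the two agree when the labels encode the same classes:

```python
import tensorflow as tf

# Made-up predicted class probabilities for 2 samples and 3 classes
probs = tf.constant([[0.7, 0.2, 0.1],
                     [0.1, 0.8, 0.1]])

# The same labels in both encodings
int_labels = tf.constant([0, 1])
onehot_labels = tf.one_hot(int_labels, depth=3)

sparse_loss = tf.keras.losses.SparseCategoricalCrossentropy()(int_labels, probs)
dense_loss = tf.keras.losses.CategoricalCrossentropy()(onehot_labels, probs)

print(float(sparse_loss), float(dense_loss))  # the two values match
```

Choosing the sparse loss in the second snippet is what allows it to skip the one-hot encoding step.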
Finally, we train the model using the `fit` method, passing in the training dataset and specifying the number of epochs. After training, we evaluate the model's performance on the testing dataset using the `evaluate` method.
By using this updated code, one should be able to load and train the iris dataset successfully with TensorFlow 2.
It should be added that there are sometimes significant differences in dependencies across system platforms. For example, on Windows or macOS, dependency issues are quite likely to occur (though clear errors about them should be reported). In particular, such dependency issues also affect the shuffle.py code (for example, an error related to the "resource" module, a Unix-specific service for querying resource usage that is not available on Windows, yet is still used in shuffle.py). These issues are also specific to the installed versions of TensorFlow Datasets (and of TensorFlow itself), and to how compatible those particular versions are with the Python environment. Scikit-learn, on the other hand, does not rely on the same dependencies as TensorFlow Datasets and is generally more lightweight and broadly compatible. However, using sklearn for just dataset loading requires handling data transformation and batching manually.
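As a sketch of what that manual handling might look like, assuming only NumPy and scikit-learn are available (the helper function name here is hypothetical, not part of either library):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Load and split the data with scikit-learn only — no tf.data pipeline
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42)

def iterate_minibatches(X, y, batch_size=32, shuffle=True):
    """Yield (features, labels) mini-batches, reshuffling indices each pass."""
    indices = np.arange(len(X))
    if shuffle:
        np.random.shuffle(indices)
    for start in range(0, len(X), batch_size):
        batch_idx = indices[start:start + batch_size]
        yield X[batch_idx], y[batch_idx]

# Each batch could then be fed to a training step, e.g.
# model.train_on_batch(batch_X, batch_y) in Keras
for batch_X, batch_y in iterate_minibatches(X_train, y_train):
    pass
```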
Other recent questions and answers regarding EITC/AI/GCML Google Cloud Machine Learning:
- What is text-to-speech (TTS) and how does it work with AI?
- What are the limitations in working with large datasets in machine learning?
- Can machine learning do some dialogic assistance?
- What is the TensorFlow playground?
- What does a larger dataset actually mean?
- What are some examples of an algorithm’s hyperparameters?
- What is ensemble learning?
- What if a chosen machine learning algorithm is not suitable and how can one make sure to select the right one?
- Does a machine learning model need supervision during its training?
- What are the key parameters used in neural network based algorithms?
View more questions and answers in EITC/AI/GCML Google Cloud Machine Learning