Defining a layer of an artificial neural network (ANN) with biases included in the model does not require multiplying the input data matrices by the sums of weights and biases. Instead, the process involves two distinct operations: the weighted sum of the inputs and the addition of biases. This distinction is important for understanding the mechanics of neural networks and their implementation in frameworks such as TensorFlow.
Theoretical Background
In the context of artificial neural networks, a layer is composed of multiple neurons, each of which performs a specific computation. For a given neuron, the input consists of a vector of features \(\mathbf{x}\) and a corresponding vector of weights \(\mathbf{w}\). The neuron computes a weighted sum of the inputs and then adds a bias term \(b\). Mathematically, this can be represented as:

\[ z = \mathbf{w} \cdot \mathbf{x} + b \]

Here, \(\mathbf{w} \cdot \mathbf{x}\) denotes the dot product of the weight vector and the input vector, resulting in a scalar value. The bias \(b\) is then added to this scalar to shift the activation function's input.
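As a minimal sketch of this single-neuron computation (the feature, weight, and bias values here are arbitrary, chosen only for illustration):

```python
import numpy as np

# Hypothetical feature vector, weights, and bias for a single neuron
x = np.array([1.0, 2.0, 3.0])   # input features
w = np.array([0.1, 0.3, 0.5])   # weights
b = 0.1                          # bias term

# Weighted sum of the inputs, then bias addition
z = np.dot(w, x) + b             # ≈ 2.3 (up to floating-point rounding)
```

The dot product collapses the feature vector to a single scalar, and the bias shifts that scalar before any activation function is applied.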
Implementation in TensorFlow
TensorFlow, a popular deep learning framework, provides functionalities to define and manipulate neural network layers efficiently. When defining a layer in TensorFlow, such as using `tf.keras.layers.Dense`, the framework internally handles the operations involving weights and biases. Consider the following example:
```python
import tensorflow as tf

# Define a dense layer with 10 units
dense_layer = tf.keras.layers.Dense(units=10, use_bias=True)

# Assume input data: batch size of 5, 3 input features
input_data = tf.random.normal(shape=(5, 3))

# Forward pass through the dense layer
output = dense_layer(input_data)
```
In this example, the `Dense` layer is initialized with 10 units (neurons) and is configured to use biases (`use_bias=True`). When the `input_data` passes through the `dense_layer`, TensorFlow performs the following operations internally:
1. Weighted Sum: Each neuron's output is computed by taking the dot product of the input data matrix and the weight matrix. If the input data matrix \(\mathbf{X}\) has a shape of \((m, n)\) and the weight matrix \(\mathbf{W}\) has a shape of \((n, k)\), the resulting matrix \(\mathbf{Z}\) will have a shape of \((m, k)\).

\[ \mathbf{Z} = \mathbf{X} \cdot \mathbf{W} \]
2. Bias Addition: The bias vector \(\mathbf{b}\) is added to each row of the resulting matrix \(\mathbf{Z}\). The bias vector \(\mathbf{b}\) has a shape of \((k,)\), and broadcasting is used to add it to each row of \(\mathbf{Z}\).

\[ \mathbf{Z} = \mathbf{X} \cdot \mathbf{W} + \mathbf{b} \]
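These two internal steps can be sketched directly in NumPy, which mirrors what the layer computes (the shapes and random seed here are arbitrary, chosen only for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

m, n, k = 5, 3, 10               # batch size, input features, neurons
X = rng.normal(size=(m, n))      # input data matrix
W = rng.normal(size=(n, k))      # weight matrix
b = np.zeros(k)                  # bias vector, one entry per neuron

Z = X @ W                        # step 1: weighted sum, shape (m, k)
Z_biased = Z + b                 # step 2: bias added to each row by broadcasting

print(Z_biased.shape)            # (5, 10)
```

Note that the bias is never folded into the weights before the multiplication; it is added as a separate element-wise step after the matrix product.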
Detailed Explanation
Weighted Sum
The weighted sum operation involves matrix multiplication. Given an input matrix \(\mathbf{X}\) with dimensions \((m, n)\), where \(m\) is the batch size and \(n\) is the number of input features, and a weight matrix \(\mathbf{W}\) with dimensions \((n, k)\), where \(k\) is the number of neurons in the layer, the matrix multiplication \(\mathbf{X} \cdot \mathbf{W}\) results in a matrix \(\mathbf{Z}\) with dimensions \((m, k)\).
Bias Addition
After computing the weighted sum, the bias vector \(\mathbf{b}\) is added to each row of the matrix \(\mathbf{Z}\). The bias vector \(\mathbf{b}\) has dimensions \((k,)\), corresponding to the number of neurons in the layer. This addition is performed using broadcasting, a technique that allows element-wise operations on arrays of different shapes.
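Broadcasting can be illustrated in isolation (the values here are arbitrary):

```python
import numpy as np

Z = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])      # shape (3, 2): three examples, two neurons
b = np.array([10.0, 20.0])      # shape (2,): one bias per neuron

# Broadcasting stretches b across the rows of Z, so every row
# receives the same per-neuron bias
result = Z + b
print(result)
# [[11. 22.]
#  [13. 24.]
#  [15. 26.]]
```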
Example with Numerical Values
Consider a simple example with numerical values to illustrate the process. Let the input data matrix \(\mathbf{X}\) be:

\[ \mathbf{X} = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix} \]
Assume the weight matrix \(\mathbf{W}\) is:

\[ \mathbf{W} = \begin{bmatrix} 0.1 & 0.2 \\ 0.3 & 0.4 \\ 0.5 & 0.6 \end{bmatrix} \]
And the bias vector \(\mathbf{b}\) is:

\[ \mathbf{b} = \begin{bmatrix} 0.1 & 0.2 \end{bmatrix} \]
The weighted sum \(\mathbf{Z}\) is computed as:

\[ \mathbf{Z} = \mathbf{X} \cdot \mathbf{W} = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix} \cdot \begin{bmatrix} 0.1 & 0.2 \\ 0.3 & 0.4 \\ 0.5 & 0.6 \end{bmatrix} = \begin{bmatrix} 2.2 & 2.8 \\ 4.9 & 6.4 \\ 7.6 & 10.0 \end{bmatrix} \]
Next, the bias vector \(\mathbf{b}\) is added to each row of \(\mathbf{Z}\):

\[ \mathbf{Z} = \begin{bmatrix} 2.2 & 2.8 \\ 4.9 & 6.4 \\ 7.6 & 10.0 \end{bmatrix} + \begin{bmatrix} 0.1 & 0.2 \end{bmatrix} = \begin{bmatrix} 2.3 & 3.0 \\ 5.0 & 6.6 \\ 7.7 & 10.2 \end{bmatrix} \]
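The arithmetic above can be verified in a few lines of NumPy (a sketch of the same computation, not TensorFlow-specific):

```python
import numpy as np

X = np.array([[1., 2., 3.],
              [4., 5., 6.],
              [7., 8., 9.]])
W = np.array([[0.1, 0.2],
              [0.3, 0.4],
              [0.5, 0.6]])
b = np.array([0.1, 0.2])

# Weighted sum followed by broadcast bias addition
Z = X @ W + b
# Z is (approximately) [[2.3, 3.0], [5.0, 6.6], [7.7, 10.2]]
```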
Activation Function
After computing the weighted sum and adding the bias, the resulting matrix is typically passed through an activation function, such as ReLU (Rectified Linear Unit), sigmoid, or tanh, to introduce non-linearity into the model. For instance, applying the ReLU activation function to the matrix \(\mathbf{Z}\) would yield:

\[ \text{ReLU}(\mathbf{Z}) = \begin{bmatrix} \max(0, 2.3) & \max(0, 3.0) \\ \max(0, 5.0) & \max(0, 6.6) \\ \max(0, 7.7) & \max(0, 10.2) \end{bmatrix} = \begin{bmatrix} 2.3 & 3.0 \\ 5.0 & 6.6 \\ 7.7 & 10.2 \end{bmatrix} \]
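The element-wise ReLU can be sketched as follows; since every entry of \(\mathbf{Z}\) in this example is positive, ReLU leaves the matrix unchanged:

```python
import numpy as np

Z = np.array([[2.3, 3.0],
              [5.0, 6.6],
              [7.7, 10.2]])

# ReLU applies max(0, z) element-wise
relu_Z = np.maximum(0, Z)
# All entries are positive, so relu_Z equals Z
```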
Practical Considerations
When implementing neural networks in TensorFlow, it is essential to understand that the framework abstracts many of these operations, allowing developers to focus on higher-level design aspects. TensorFlow's `Dense` layer, for instance, handles the initialization of weights and biases, the computation of weighted sums, and the addition of biases internally.
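Assuming TensorFlow is available, one can confirm that a `Dense` layer's output matches the manual matmul-plus-bias computation described above (a sketch; the shapes are arbitrary, and the layer uses its default linear activation):

```python
import numpy as np
import tensorflow as tf

layer = tf.keras.layers.Dense(units=4, use_bias=True)
x = tf.random.normal(shape=(2, 3))
y = layer(x)                     # builds the layer and runs the forward pass

W, b = layer.kernel, layer.bias  # weights of shape (3, 4), bias of shape (4,)
y_manual = tf.matmul(x, W) + b   # weighted sum, then broadcast bias addition

match = np.allclose(y.numpy(), y_manual.numpy())  # the two outputs agree
```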
Moreover, TensorFlow provides various initializers for weights and biases, such as `tf.keras.initializers.RandomNormal` for weights and `tf.keras.initializers.Zeros` for biases. These initializers can be specified when defining a layer:
```python
dense_layer = tf.keras.layers.Dense(
    units=10,
    use_bias=True,
    kernel_initializer=tf.keras.initializers.RandomNormal(mean=0., stddev=1.),
    bias_initializer=tf.keras.initializers.Zeros()
)
```
The process of defining a layer in an artificial neural network with biases included involves computing a weighted sum of the input data and weights, followed by the addition of biases. This operation does not entail multiplying the input data matrices by the sums of weights and biases. Instead, it consists of two separate steps: matrix multiplication for the weighted sum and element-wise addition for the biases. TensorFlow efficiently handles these operations, abstracting the underlying complexity and providing a high-level interface for neural network design.