Exploring the Impact of Batch Normalization on Deep Learning Performance
When you’re working with Neural Networks, especially deep ones, the distribution of each hidden layer’s inputs keeps shifting as the parameters of the layers before it change during training. This moving target is known as internal covariate shift, and it can severely slow down or destabilize the training process.
The problem gets worse when your data is spread across varying ranges. Some features dominate the training dynamics, while others get ignored. A natural fix is to make your data more uniform — ideally, Gaussian-distributed with a mean of 0 and unit variance — which helps your model train more efficiently.
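To make that fix concrete, here is a minimal NumPy sketch of feature-wise standardization; the array X and its values are purely illustrative.

```python
import numpy as np

# Two features on very different scales
X = np.array([[1.0, 200.0],
              [2.0, 400.0],
              [3.0, 600.0]])

# Standardize each feature to zero mean and unit variance
X_std = (X - X.mean(axis=0)) / X.std(axis=0)
print(X_std.mean(axis=0), X_std.std(axis=0))  # ~0 and ~1 per feature
```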
At its core, Batch Normalization inserts a normalization step right before the non-linearity in each layer. Here’s what happens step-by-step during training:
1. Compute the mean and variance of the mini-batch:
μ_B = (1/m) Σᵢ xᵢ,  σ²_B = (1/m) Σᵢ (xᵢ − μ_B)²
2. Normalize the inputs:
x̂ᵢ = (xᵢ − μ_B) / √(σ²_B + ε)
3. Scale and shift (the learnable part):
yᵢ = γ·x̂ᵢ + β
Where:
γ (gain) is a learnable scaling parameter
β (bias) is a learnable shifting parameter
ε is a small constant added for numerical stability
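Putting the three steps together, here is a minimal NumPy sketch of the training-time forward pass. The function name batchnorm_forward and the toy mini-batch are illustrative assumptions, not a framework API.

```python
import numpy as np

def batchnorm_forward(x, gamma, beta, eps=1e-5):
    """Training-time batch norm for a mini-batch x of shape (N, D)."""
    mu = x.mean(axis=0)                    # per-feature mini-batch mean
    var = x.var(axis=0)                    # per-feature mini-batch variance
    x_hat = (x - mu) / np.sqrt(var + eps)  # normalize to zero mean, unit variance
    return gamma * x_hat + beta            # scale and shift (learnable part)

# Toy usage: a mini-batch of 4 samples with 3 features
x = np.random.randn(4, 3) * 10 + 5
y = batchnorm_forward(x, gamma=np.ones(3), beta=np.zeros(3))
print(y.mean(axis=0), y.var(axis=0))  # ~0 and ~1 per feature
```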
⚙ Why Learnable Parameters Matter
You might wonder: if we normalize to zero mean and unit variance, wouldn’t that kill the network’s ability to represent complex functions?
That’s where the genius of γ and β comes in. These parameters allow the network to learn back any distribution it might need — essentially giving it the ability to undo normalization if that’s beneficial for the task.
This flexibility enables batch norm to standardize the activations in a stable way, while still allowing the model to retain its representational power.
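As a quick sanity check of this claim, the following sketch (with an illustrative toy batch) shows that choosing γ = √(σ²_B + ε) and β = μ_B exactly undoes the normalization.

```python
import numpy as np

# Toy activations with a non-trivial mean and spread
x = np.random.randn(4, 3) * 10 + 5
eps = 1e-5
mu, var = x.mean(axis=0), x.var(axis=0)
x_hat = (x - mu) / np.sqrt(var + eps)

# If the network learns gamma = sqrt(var + eps) and beta = mu,
# the scale-and-shift step recovers the original activations.
gamma, beta = np.sqrt(var + eps), mu
y = gamma * x_hat + beta
print(np.allclose(y, x))  # True
```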
Key Benefits
Faster Convergence: BN allows you to use higher learning rates and makes training more stable.
Reduced Dependence on Initialization: Less need to obsess over weight initialization strategies.
Regularization Effect: Acts like a regularizer (reducing the need for dropout in some cases).
Mitigates Vanishing/Exploding Gradients: Keeps the activations within a manageable range.
⚙ During Inference
During inference (i.e., testing or deployment), we no longer rely on batch statistics. Instead, the model uses exponential moving averages of the mean and variance collected during training:
μ_running ← α·μ_running + (1 − α)·μ_B
σ²_running ← α·σ²_running + (1 − α)·σ²_B
where α is the momentum of the moving average, and μ_B and σ²_B are the statistics of the current mini-batch.
These running averages are updated during training and ensure consistency across inputs during inference.
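Below is a minimal sketch of how the running statistics might be tracked and applied; the class name SimpleBatchNorm, the momentum value, and the training flag are illustrative assumptions, not a specific library’s API.

```python
import numpy as np

class SimpleBatchNorm:
    """Minimal batch norm with running statistics (illustrative only)."""

    def __init__(self, num_features, momentum=0.9, eps=1e-5):
        self.gamma = np.ones(num_features)          # learnable scale
        self.beta = np.zeros(num_features)          # learnable shift
        self.running_mean = np.zeros(num_features)
        self.running_var = np.ones(num_features)
        self.momentum = momentum
        self.eps = eps

    def forward(self, x, training=True):
        if training:
            mu, var = x.mean(axis=0), x.var(axis=0)
            # Update the exponential moving averages (training only)
            self.running_mean = self.momentum * self.running_mean + (1 - self.momentum) * mu
            self.running_var = self.momentum * self.running_var + (1 - self.momentum) * var
        else:
            # Inference: use the fixed running statistics, not batch stats
            mu, var = self.running_mean, self.running_var
        x_hat = (x - mu) / np.sqrt(var + self.eps)
        return self.gamma * x_hat + self.beta

# Toy usage: train on a few batches, then run inference on a single sample
bn = SimpleBatchNorm(3)
for _ in range(10):
    bn.forward(np.random.randn(8, 3) * 4 + 2, training=True)
print(bn.forward(np.array([[2.0, 1.0, 3.0]]), training=False))
```

Because inference uses the stored running statistics, a single input (or any batch size) produces the same deterministic output regardless of what else is in the batch.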
TL;DR
Batch Normalization is not just about normalizing data: it’s about giving every layer the best possible conditions for learning. By stabilizing the distribution of layer inputs, BN allows us to train deeper networks faster and with better generalization. And thanks to the learnable parameters γ and β, the model never loses its expressive power.