Normal Probability Plot

Pratik Randad
Apr 2, 2021

A probability plot is a graphical way of comparing a dataset with a theoretical distribution, or two datasets with each other.

Introduction

In exploratory data analysis we often need to find out how our dataset is distributed. If the data can be approximated by a theoretical distribution such as the Poisson, normal or exponential distribution, that approximation can then be used to draw important inferences from the dataset.

The normal probability plot is a way of checking whether a dataset is normally distributed. The ordered data values are plotted against the corresponding quantiles of a theoretical normal distribution, so that if the dataset is normally distributed the points fall approximately on a straight line.

The normal probability plot is a special case of the probability plot, more specifically of the Q-Q (quantile-quantile) plot. It is commonly used in industry to detect when process data deviate from a normal distribution.
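To make the Q-Q idea concrete, here is a minimal sketch (an illustration, not one of the two methods below) that pairs each sorted observation with a standard normal quantile computed by `scipy.stats.norm.ppf`. The plotting positions (i - 0.5)/n are one common convention and are an assumption here; the function name `theoretical_vs_ordered` is hypothetical, and the sample is synthetic data, not the Pregnancy data.

```python
import numpy as np
from scipy.stats import norm

def theoretical_vs_ordered(sample):
    """Return (standard normal quantiles, sorted data) for a Q-Q style plot."""
    y = np.sort(np.asarray(sample, dtype=float))   # ordered data values
    n = y.size
    p = (np.arange(1, n + 1) - 0.5) / n             # plotting positions (i - 0.5)/n
    x = norm.ppf(p)                                 # matching standard normal quantiles
    return x, y

# Hypothetical normal sample with mean 10 and standard deviation 2
rng = np.random.default_rng(42)
sample = rng.normal(loc=10.0, scale=2.0, size=2000)

x, y = theoretical_vs_ordered(sample)
slope, intercept = np.polyfit(x, y, 1)   # straight-line fit through the points
print(f"slope ~ {slope:.2f}, intercept ~ {intercept:.2f}")
```

For normal data the points hug a straight line whose slope approximates the standard deviation and whose intercept approximates the mean, which is why a straight line is read as evidence of normality.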

Below I explain two simple methods for producing a normal probability plot.

Method 1

Steps

  1. Get the sample data in a list.
  2. Generate random values from a normal distribution (any mean and standard deviation) with size equal to the length of the sample data.
  3. Sort both the sample data and the generated values.
  4. Plot the two sorted lists against each other, with the generated values on the x-axis and the sample data on the y-axis.
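The four steps can be sketched in Python as follows. This is a minimal sketch assuming NumPy and Matplotlib, not the author's uploaded code; the function names `method1_points` and `plot_points` are hypothetical, and a synthetic normal sample stands in for the real data. Step 2 uses a standard normal (mean 0, standard deviation 1), since any mean and standard deviation will do.

```python
import numpy as np

def method1_points(sample, seed=0):
    """Steps 1-3: return (sorted generated normal values, sorted sample data)."""
    sample = np.asarray(sample, dtype=float)                       # step 1: sample data
    rng = np.random.default_rng(seed)
    generated = rng.normal(loc=0.0, scale=1.0, size=sample.size)   # step 2: same size
    return np.sort(generated), np.sort(sample)                     # step 3: sort both

def plot_points(x, y):
    """Step 4: generated values on the x-axis, sample data on the y-axis."""
    import matplotlib.pyplot as plt   # imported here so method1_points needs only NumPy
    fig, ax = plt.subplots()
    ax.scatter(x, y, s=8)
    ax.set_xlabel("Generated normal values (sorted)")
    ax.set_ylabel("Sample data (sorted)")
    ax.set_title("Normal Probability Plot")
    return fig

# Hypothetical stand-in for the birth-weight column (NOT the Pregnancy data)
weights = np.random.default_rng(1).normal(loc=7.0, scale=1.2, size=1000)
x, y = method1_points(weights)
```

Calling `plot_points(x, y)` and then `matplotlib.pyplot.show()` displays the plot. Because the reference values in step 2 are random draws, the line will wobble slightly from run to run; fixing `seed` makes it reproducible.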

Example

Code for this is uploaded here.

For this example we use the Pregnancy data and check whether the babies' birth weights are normally distributed.

Output

(Figure: normal probability plot of the birth weights, Method 1)

Method 2

We can also use the ready-made probplot function in scipy.stats, which computes and draws the plot in a single call.
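A minimal sketch of this approach, assuming `scipy.stats.probplot` (the author's exact call is not shown). With `dist="norm"` it returns the theoretical quantiles, the ordered data and a least-squares line with its correlation coefficient `r`; passing a Matplotlib Axes (or `matplotlib.pyplot` itself) as `plot` also draws the figure. The wrapper name `normal_probability_plot` and the synthetic sample are hypothetical.

```python
import numpy as np
from scipy import stats

def normal_probability_plot(sample, ax=None):
    # probplot returns ((theoretical quantiles, ordered data), (slope, intercept, r));
    # if `ax` is given, it also draws the points and the fitted line on it.
    (osm, osr), (slope, intercept, r) = stats.probplot(sample, dist="norm", plot=ax)
    return osm, osr, r

# Hypothetical stand-in data, not the Pregnancy dataset
sample = np.random.default_rng(7).normal(loc=5.0, scale=1.5, size=500)
osm, osr, r = normal_probability_plot(sample)
print(f"r = {r:.3f}")   # values close to 1 indicate points close to a straight line
```

To draw the figure, create an Axes with `fig, ax = plt.subplots()` and pass it as `ax`. Unlike Method 1, `probplot` uses deterministic theoretical quantiles, so the reference axis does not change between runs.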

(Figure: normal probability plot produced with scipy.stats, Method 2)

Conclusion

The outputs above show that the babies' birth weights approximately follow a normal distribution, except for some outlying values at the upper end.

In this way, the normal probability plot can be used to check whether a dataset is normally distributed.
