Probability calculations with a normal distribution - Science without sense...double nonsense (2023)

This post works through a series of examples of probability calculations with a normal distribution and shows the advantages of standardizing the data.

We already know that the normal distribution is one of the most widely used in biomedicine, since a large number of random variables follow it. Although the density function of this probability distribution is rather unfriendly, this is made up for by the fact that the distribution can be characterized with only two parameters, its mean and its variance, with which we can perform multiple probability calculations.

We are going to carry out some examples of these calculations, using the R program and with the help of its R-Commander graphical interface. Although R has the advantages of being very powerful and totally free, its exclusive use from the command line can be a bit harsh for the uninitiated.

Some preliminaries

Of course, to perform calculations on a data set, the first thing we are going to need is that data set.

In real life we would already have it: the results of our study, which we would import into R for the statistical analysis.

On this occasion, we are going to make the data by generating a random distribution with R.

It must be said, first of all, that statistical programs do not generate random numbers, but pseudo-random numbers, performing calculations from a previous number that is usually referred to as the seed.

In practice this makes no difference: they serve the same purpose for what we want. The problem is that the seed may differ between R sessions and installations, so if you want to follow the examples in this post, the first thing is for all of us to set the same seed.

First we launch R. Second, we launch R-Commander with the library(Rcmdr) command. Third, we select the menu option Distributions -> Set the seed of the random number generator. In the pop-up window that appears we select, for example, 24814. You can see it in the first figure. This can also be done with the command set.seed(24814).

Let’s now generate the data. We go back to the Distributions menu, but this time we select Continuous Distributions-> Normal Distribution-> Sample from a normal distribution. We are going to generate a sample of 1000 cases with a mean of 120, a standard deviation of 12 and, obviously, normally distributed. To do this, we fill in the pop-up window as shown in the second figure. Notice that, in the name of the data set, we enter “pas”.
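For those who prefer the command line, the same steps can be sketched without the menus. This is only a minimal sketch: it assumes that base R's rnorm() is an acceptable stand-in for the R-Commander dialog, the column name obs is our own invention, and the values drawn need not coincide exactly with those that R-Commander produces.

```r
# Fix the seed so that the pseudo-random sequence is reproducible.
set.seed(24814)

# Draw 1000 values from a normal distribution with mean 120 and sd 12.
# "pas" is the data set name used in the post; "obs" is a hypothetical column name.
pas <- data.frame(obs = rnorm(1000, mean = 120, sd = 12))

# Quick look at the structure and the first few values.
str(pas)
head(pas$obs)
```

Running the block twice gives the same sample, because the seed is set just before the draw.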

We already have it all. Let’s get started!

Step 1. Check the normality of the data

We already have our database, called “pas”, which we are going to assume is a record of the systolic blood pressure of 1000 adolescents.

We are not going to go into how to do the basic descriptive statistical study here. We will only do a minimal numerical summary to verify that data are correct. We open the menu Statistics-> Summaries-> Numerical Summaries.

We see that our variable has a mean of 119.78 (we will round it to 120) and a standard deviation of 11.83 (we will round it to 12). The program also provides us with the median, the quartiles, the interquartile range, and the sample size.
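The same numerical summary can be approximated from the console with base R functions. This is a sketch under the assumption that the sample is regenerated with rnorm() as a plain vector; the figures it prints may differ slightly from the 119.78 and 11.83 quoted above, depending on how the sample was drawn.

```r
# Regenerate the sample so that the block is self-contained.
set.seed(24814)
pas <- rnorm(1000, mean = 120, sd = 12)

# Mean, standard deviation, interquartile range and sample size.
c(mean = mean(pas), sd = sd(pas), IQR = IQR(pas), n = length(pas))

# Minimum, quartiles (including the median) and maximum.
quantile(pas, probs = c(0, 0.25, 0.5, 0.75, 1))
```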

We are going to check that they follow a normal distribution. We open the menu Statistics-> Summaries-> Normality test… In the pop-up window we mark, for example, the Shapiro-Wilk test. When we accept, the program gives us a statistic W = 0.99 with a value of p = 0.58.

Since p > 0.05, we cannot reject the null hypothesis which, for this test, is that the data are normally distributed. But we already know that these numerical tests are not very powerful, so it is advisable to complement this result with a graphical method.
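From the console, the test can be sketched with shapiro.test(), which is part of base R's stats package. We assume again that the sample is regenerated with rnorm(); the exact W and p values may not match those quoted above.

```r
# Regenerate the sample so that the block is self-contained.
set.seed(24814)
pas <- rnorm(1000, mean = 120, sd = 12)

# Shapiro-Wilk test: the null hypothesis is that the data are normally distributed.
sw <- shapiro.test(pas)
sw$statistic   # the W statistic
sw$p.value     # if above 0.05, normality is not rejected
```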

We select Distributions-> Continuous distributions-> Normal distribution-> Graph of the normal distribution…, Graphs-> Histogram, and Graphs-> Graph of comparison of quantiles… We thus obtain the graphical representation of the distribution, its histogram and the graph of theoretical quantiles, respectively, which you can see in the third figure.

Both the graphical representation of the curve and the shape of the histogram are compatible with a normal distribution. Furthermore, in the third graph, the points follow the diagonal quite well, which means that the quantiles of the distribution resemble quite well the theoretical ones if the distribution were normal.
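A rough console equivalent of the histogram and the quantile comparison plot is sketched below. It uses base graphics (hist, curve, qqnorm, qqline) rather than the functions R-Commander calls internally, so the appearance of the plots will differ from the figure in the post.

```r
# Regenerate the sample so that the block is self-contained.
set.seed(24814)
pas <- rnorm(1000, mean = 120, sd = 12)

# Histogram on the density scale, with the theoretical N(120, 12) curve on top.
h <- hist(pas, freq = FALSE, main = "Histogram of pas",
          xlab = "Systolic blood pressure (mmHg)")
curve(dnorm(x, mean = 120, sd = 12), add = TRUE, lwd = 2)

# Normal Q-Q plot: sample quantiles against theoretical normal quantiles.
qq <- qqnorm(pas)
qqline(pas)  # reference line through the first and third quartiles
```

If the points of the Q-Q plot stay close to the reference line, the sample quantiles are similar to those expected under normality.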

In summary, we can assume that our data follow a normal distribution.

Step 2. Direct information that the normal distribution provides

Knowing that the arterial pressure of our adolescents follows a normal distribution of mean m=120 and standard deviation s=12, we can already draw a series of conclusions.

In a normal distribution, the values are centered symmetrically around the mean. Approximately 68% of the population lies within m ± 1 s, 95% within m ± 2 s, and 99.7% within m ± 3 s.

With minimal calculations, we know that about 68% of our adolescents will have a pressure between 108 and 132 mmHg, 95% between 96 and 144 mmHg and 99.7% between 84 and 156 mmHg. In addition, only about 2.5% of the population will have a pressure below 96 mmHg, and another 2.5% above 144 mmHg.
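These rule-of-thumb percentages can be checked with pnorm(), the cumulative distribution function of the normal distribution in base R. The sketch below assumes m = 120 and s = 12, as in our example.

```r
m <- 120
s <- 12
k <- 1:3

# Probability of falling within k standard deviations of the mean.
coverage <- pnorm(k) - pnorm(-k)

# Intervals in mmHg alongside their coverage.
data.frame(k = k,
           lower = m - k * s,
           upper = m + k * s,
           coverage = round(coverage, 4))

# Probability of a pressure below 96 mmHg (2 standard deviations below the mean).
pnorm(96, mean = m, sd = s)
```

The coverages come out at roughly 0.6827, 0.9545 and 0.9973, and the lower tail below 96 mmHg at about 0.023, which is why the 68-95-99.7 figures and the 2.5% tails are only approximations.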

Finally, we could estimate the mean of the population from which the sample was drawn by calculating its confidence interval.

The 95% confidence interval of a mean is calculated according to the following formula:

95% CI = m ± 1.96 × se

“se” represents the standard error of the mean, which is calculated, in turn, by dividing the standard deviation by the square root of the sample size.

Thus, we can already do the calculation:

95% CI = 120 ± 1.96 × (12 / √1000)

If we solve the above equation, we obtain that, with 95% confidence, the mean systolic blood pressure of the adolescents in the population lies between 119.26 and 120.74 mmHg.

For the purists: we are assuming that we know the population variance and that it equals that of our sample. Otherwise, we would have to use the sample standard deviation (computed with n − 1 in the denominator) or, better still, a Student’s t distribution to calculate the interval (although with such a large sample we would get essentially the same result).
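The interval can be reproduced in a few lines. This sketch uses the rounded values m = 120 and s = 12 from the post, takes the 1.96 multiplier from qnorm(0.975), and adds the Student's t version only for comparison.

```r
m <- 120
s <- 12
n <- 1000

# Standard error of the mean.
se <- s / sqrt(n)

# Interval based on the normal distribution (known variance).
z <- qnorm(0.975)                 # about 1.96
ci <- m + c(-1, 1) * z * se
round(ci, 2)

# Interval based on Student's t with n - 1 degrees of freedom, for comparison.
t_crit <- qt(0.975, df = n - 1)
t_ci <- m + c(-1, 1) * t_crit * se
round(t_ci, 2)
```

With a sample this large the two multipliers are almost identical, so both intervals agree to about two decimal places.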

Step 3. Probability calculation

Let’s imagine that we are interested in knowing the percentage of the population whose pressure falls within a certain interval, for example between 90 and 135 mmHg. In other words, what is the probability that a randomly selected individual will have a systolic blood pressure between 90 and 135 mmHg?

We are going to calculate it with R through the menu Distributions-> Continuous distributions-> Normal distribution-> Cumulative normal probabilities…:

– Less than 90 mmHg: we enter 90 in the box “value(s) of the variable”, 120 in “mean” and 12 in “standard deviation”. What tail do we select? Since we want the probability of values less than 90, we select the left tail. R tells us that the probability is 0.0062.

– Greater than 135 mmHg: we enter 135 in the box “value(s) of the variable”, 120 in “mean” and 12 in “standard deviation”. What tail do we select? Since we want the probability of values greater than 135, this time we select the right tail. R tells us that the probability is 0.1056.

Since the total probability is 1 (100%), we know that P(<90) + P(90-135) + P(>135) = 1. Solving the equation, we obtain P(90-135) = 1 − 0.0062 − 0.1056 = 0.8882. Rounding, 89% of our adolescents have a systolic blood pressure between 90 and 135 mmHg.

In other words, if we draw an individual at random, there is a 0.89 (89%) probability that their blood pressure is in the range of 90 to 135 mmHg.
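The two tail probabilities and the probability of the interval can also be obtained from the console with pnorm(). The sketch below assumes the same mean of 120 and standard deviation of 12; the variable names are our own.

```r
# Left tail: probability of a pressure below 90 mmHg.
p_below <- pnorm(90, mean = 120, sd = 12)

# Right tail: probability of a pressure above 135 mmHg.
p_above <- pnorm(135, mean = 120, sd = 12, lower.tail = FALSE)

# Probability of the interval, as the complement of the two tails.
p_between <- 1 - p_below - p_above

# The same probability as a difference of two cumulative probabilities.
p_direct <- pnorm(135, mean = 120, sd = 12) - pnorm(90, mean = 120, sd = 12)

round(c(below = p_below, above = p_above, between = p_between), 4)
```

Without intermediate rounding the interval probability is about 0.8881, a hair below the 0.8882 obtained from the rounded tails; either way it rounds to 89%.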

Step 4. Standardizing simplifies the calculations

The standard normal distribution is one that has a mean of 0 and a variance of 1, and which is usually represented as N (0,1).

Its great advantage is that it makes calculations much easier. In our example, a priori we do not know how many young people will have a blood pressure greater than 144 mmHg. However, in a standard normal distribution we know, without having to calculate, that the probability of a value greater than 2 (that is, more than 2 standard deviations above the mean) is approximately 0.025 (2.5%).

Given the above, it is easy to understand that it will be simpler to calculate the probabilities of the standardized values. To do this, we subtract the mean of the distribution from each value and divide the result by the standard deviation. This gives what we usually call the z-score, which represents the number of standard deviations that separate each value from the mean of the distribution.

Thus, for 90, z-score = -2.5; for 135, z-score = 1.25. We already know at a glance that it will be very unlikely to find someone with a z-score below -2.5 and that only a little more than 10% will be above 1.25. Thus, the proportion of those within the range of -2.5 to 1.25 will be around 90%.

Of course, standardizing is not only useful for rough estimates. We can use the same method as before to calculate the exact value of the probability. Do it and you will see that the same result comes out.
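As a quick check, the sketch below standardizes the two limits and compares the probability obtained on the z scale with the one obtained on the original scale in mmHg. The variable names are our own.

```r
m <- 120
s <- 12
x <- c(90, 135)

# z-score: number of standard deviations between each value and the mean.
z <- (x - m) / s
z

# Probability of the interval on the standard normal scale, N(0, 1).
p_z <- pnorm(z[2]) - pnorm(z[1])

# Same probability computed directly on the original scale.
p_x <- pnorm(x[2], mean = m, sd = s) - pnorm(x[1], mean = m, sd = s)

all.equal(p_z, p_x)

# Upper tail beyond 2 standard deviations in the standard normal distribution.
pnorm(2, lower.tail = FALSE)
```

The z-scores come out as -2.5 and 1.25 and both routes give the same probability, which is exactly what makes a single table of the standard normal distribution enough for any normal variable.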

The advantage, in addition to being more intuitive when the characteristics of the normal distribution are known, is that, in the case of not having a computer at hand, with a single probability table we can do the calculations for any normal distribution we desire. We just have to standardize it.

We’re leaving…

We have seen how to check that our data set follows a normal distribution and thus be able to calculate the probability of finding certain values.

But what if our data are not normally distributed? Well, we would have several options, from trying to transform them to using other probability distributions. But that is another story…

FAQs

How do you calculate probabilities using a normal distribution? ›

Probability between z-values

Express the two bounds as probabilities under the standard normal curve: P(a < Z < b) = P(Z < b) – P(Z < a) = Φ(b) – Φ(a), where Φ is the cumulative distribution function of the standard normal distribution and a < b.

How do you find the unknown value of a normal distribution? ›

In order to find the unknown mean μ, we code X by the change of variables X ↦ Z = (X − μ)/σ, where the standard deviation σ = √196 = 14. Now Z ∼ N(0, 1) follows the standard normal distribution and P(X ≤ 40) = P(Z ≤ (40 − μ)/14) = 0.0668.

How do you find the probability of two numbers in a normal distribution? ›

The probability that a standard normal random variable lies between two values is also easy to find: P(a < Z < b) = P(Z < b) - P(Z < a). For example, suppose we want to know the probability that a z-score will be greater than -1.40 and less than -1.20. This is P(Z < -1.20) - P(Z < -1.40) ≈ 0.1151 - 0.0808 = 0.0343.

How do you find probability with normal distribution and standard deviation? ›

In a normally distributed data set, you can find the probability of a particular event as long as you have the mean and standard deviation. With these, you can calculate the z-score using the formula z = (x – μ (mean)) / σ (standard deviation).

What is the basic formula for calculating probability? ›

P(A) = n(A)/n(S)

P(A) is the probability of an event “A”, n(A) is the number of favourable outcomes, and n(S) is the total number of outcomes in the sample space.

What is the formula for calculating probability with examples? ›

Multiply all probabilities together

Using the example of the rolling dice, you'd calculate your total probability by multiplying the 1/6 chances you calculated: P(A and B) = 1/6 x 1/6 = 1/36. Using these results, there's a 1/36 chance of rolling "6" on one die at the same time you roll a "6" with the other.

What are the two unknown parameters for a normal distribution? ›

The two main parameters of a (normal) distribution are the mean and standard deviation. The parameters determine the shape and probabilities of the distribution.

How do you find the probability of two events together? ›

If the two events are independent, just multiply the probability of the first event by that of the second. For example, if the probability of event A is 2/9 and the probability of event B is 3/9, then the probability of both events happening at the same time is (2/9)*(3/9) = 6/81 = 2/27.

How do you do probability with two events? ›

To determine the probability of two independent events, A and B, both occurring, we multiply the probabilities of each of the two events together: P(A) × P(B) = P(A and B). In some cases, the outcome of one event affects the outcome of a second event.

What is the probability of getting at least 2 heads? ›

Answer: If you flip a coin 3 times, the probability of getting at least 2 heads is 1/2. Let's look into the possible outcomes.

What are the three steps to finding the probability using the standard normal curve? ›

Use the standard normal distribution to find probability
1. Go down to the row with the first two digits of your z-score.
2. Go across to the column with the same third digit as your z-score.
3. Find the value at the intersection of the row and column from the previous steps.

How do you solve normal distribution problems? ›

All you have to do to solve the formula is: Subtract the mean from X. Divide by the standard deviation.

How do you find the probability when given the mean and standard deviation of a sample size? ›

1. Define your population mean (μ), standard deviation (σ), sample size, and range of possible sample means.
2. Input those values in the z-score formula zscore = (X̄ - μ)/(σ/√n).
3. Considering if your probability is left, right, or two-tailed, use the z-score value to find your probability.

What are the 3 rules of probability? ›

General Probability Rules
• Rule 1: The probability of an impossible event is zero; the probability of a certain event is one. ...
• Rule 2: For S the sample space of all possibilities, P(S) = 1. ...
• Rule 3: For any event A, P(Ac) = 1 - P(A). ...
• Rule 4 (Addition Rule): This is the probability that either one or both events occur.

What are the 4 laws of probability? ›

The four useful rules of probability are: It happens or else it doesn't. The probability of an event happening added to the probability of it not happening is always 1. Empirical probabilities will also follow these rules (for a given set of trials).

How do you find total outcomes in probability? ›

To find the total number of outcomes for two or more events, multiply the number of outcomes for each event together. This is called the product rule for counting because it involves multiplying to find a product.

What is simple probability with example? ›

A simple probability is calculated by dividing a specific outcome by all the possible outcomes. For instance, when flipping a coin, there are two outcomes: heads or tails. To find the probability of getting either heads or tails, divide one outcome (1) by the two possible outcomes (2).

How do you identify an unknown variable? ›

The unknown is called a variable. In order to solve for a variable (like x), you need to isolate the variable. You can isolate the variable by using inverse operations to manipulate the equation. Addition is the inverse of subtraction, and multiplication is the inverse of division.

Can you do Z test without standard deviation? ›

In particular, it tests whether two means are the same (the null hypothesis). A z-test can only be used if the population standard deviation is known and the sample size is 30 data points or larger.

How do you find the probability of A and B without replacement? ›

P(A and B) = P(A) * P(B|A) or, equivalently, P(A and B) = P(B) * P(A|B). Without replacement means that you will have a conditional probability. That is, the probability of an event occurring is affected by another event having already occurred.

What are 3 features of a normal distribution? ›

Characteristics of Normal Distribution

Normal distributions are symmetric, unimodal, and asymptotic, and the mean, median, and mode are all equal.

What are two properties of the normal distribution? ›

What are the properties of normal distributions? Normal distributions have key characteristics that are easy to spot in graphs: The mean, median and mode are exactly the same. The distribution is symmetric about the mean—half the values fall below the mean and half above the mean.

What are the main parameters of a probability distribution? ›

The normal distribution has two parameters: the mean and the standard deviation. The Weibull distribution and the lognormal distribution are examples of other common continuous probability distributions.

How do you find the probability of the two events if they have no common elements? ›

If two events have no elements in common (Their intersection is the empty set.), the events are called mutually exclusive. Thus, P(A∩B)=0 . This means that the probability of event A and event B happening is zero.

What is the formula for combination in probability? ›

Combinations are a way to calculate the total outcomes of an event where order of the outcomes does not matter. To calculate combinations, we will use the formula nCr = n! / (r! * (n - r)!), where n represents the total number of items, and r represents the number of items being chosen at a time.

What is the formula for two independent events? ›

Events A and B are independent if knowing whether A occurred does not change the probability of B. Mathematically, this can be stated in two equivalent ways: P(B|A) = P(B), or P(A and B) = P(B ∩ A) = P(B) × P(A).

How do you find the probability of exactly one event happening? ›

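P(exactly one of them occurs) = P(A) + P(B) - 2 × P(A and B). If A and B are mutually exclusive, P(A and B) = 0 and this reduces to P(A) + P(B).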

What is the probability of getting no more than 2 heads? ›

With three tosses of a fair coin there are 8 equally likely outcomes. Only one of them (three heads) has more than 2 heads, so the probability of getting no more than 2 heads is 7/8. By comparison, 4 of the 8 outcomes have fewer than 2 heads, so that probability is 4/8 or 1/2.

What is the probability of tossing a coin 3 times? ›

Explanation: If you flip a coin, the chances of you getting heads is 1/2. This is true every time you flip the coin so if you flip it 3 times, the chances of you getting heads every time is 1/2 * 1/2 * 1/2, or 1/8.

What is the probability of getting more than 2 heads? ›

If the coin is fair, then the probability of getting either a head or a tail is 0.5. So the probability of getting 10 heads in a row is (0.5)^10, which is 1 in 1024.

What is the main formula in the normal probability curve? ›

For a random variable x, with mean “μ” and standard deviation “σ”, the normal distribution formula is given by: f(x) = (1/√(2πσ²)) × e^(−(x − μ)² / (2σ²)).
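As an illustration, the density can be written out by hand, with the exponent −(x − μ)²/(2σ²), and compared with R's built-in dnorm(). The function name normal_pdf is hypothetical, and the mean of 120 and standard deviation of 12 are simply the values used earlier in the post.

```r
# Normal density written out explicitly (hypothetical helper function).
normal_pdf <- function(x, mu, sigma) {
  1 / sqrt(2 * pi * sigma^2) * exp(-(x - mu)^2 / (2 * sigma^2))
}

x <- c(96, 120, 144)

# Hand-written formula next to the built-in density function.
cbind(x,
      manual  = normal_pdf(x, mu = 120, sigma = 12),
      builtin = dnorm(x, mean = 120, sd = 12))
```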

How do you solve at least 3 probability problems? ›

Example 1: Free-Throw Attempts
1. P(X≥3) = 1 – P(X=0) – P(X=1) – P(X=2)
2. P(X≥3) = 1 – 0.2373 – 0.3955 – 0.2636
3. P(X≥3) = 0.1036.

What is a good example of normal distribution? ›

Characteristics that are the sum of many independent processes frequently follow normal distributions. For example, heights, blood pressure, measurement error, and IQ scores follow the normal distribution.

How do you generate data from a normal distribution? ›

How to Generate a Normal Distribution in Excel
1. Step 1: Choose a Mean & Standard Deviation. First, let's choose a mean and a standard deviation that we'd like for our normal distribution. ...
2. Step 2: Generate a Normally Distributed Random Variable. ...
3. Step 3: Choose a Sample Size for the Normal Distribution.

What is the formula for the standard deviation of a probability distribution? ›

The formula depends on the distribution. For a binomial distribution, the standard deviation is σ = √(npq). Here n is the number of trials, p is the probability of success, and q is the probability of failure.

How do you solve problems involving mean and variance of a probability distribution? ›

To calculate the mean, you multiply every element by its probability (and sum or integrate these products). Similarly, for the variance you multiply the squared difference between every element and the mean by the element's probability. For example, if Y = X² and X = {1, 2, 3}, then Y = {1, 4, 9}.

How do you find the probability given the mean and variance? ›

var(X) = ∑(x − μ)² pX(x), where the sum is taken over all values of x for which pX(x) > 0. So the variance of X is the weighted average of the squared deviations from the mean μ, where the weights are given by the probability function pX(x) of X.

How do you find the probability when given the z-score? ›

The Z-score formula is z = (x − μ) / σ.

How do you find the probability of a probability distribution? ›

The probabilities in the probability distribution of a random variable X must satisfy the following two conditions: Each probability P(x) must be between 0 and 1: 0≤P(x)≤1. The sum of all the possible probabilities is 1: ∑P(x)=1.

How do you find the probability of a random selection? ›

P (X) = n/N; where 'n' is the number of the favourable outcomes and 'N' is the number of total possible outcomes.

How do you calculate normal distribution by hand? ›

To standardize a value from a normal distribution, convert the individual value into a z-score: Subtract the mean from your individual value. Divide the difference by the standard deviation.

How do you calculate p-value by hand for z test? ›

How to find p-value from z-score?
1. Left-tailed z-test: p-value = Φ(Zscore)
2. Right-tailed z-test: p-value = 1 - Φ(Zscore)
3. Two-tailed z-test: p-value = 2 * Φ(−|Zscore|) or p-value = 2 - 2 * Φ(|Zscore|)
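In R, Φ is available as pnorm(), so the three p-values can be sketched as follows. The z value of 1.96 is only an illustrative assumption.

```r
z <- 1.96  # illustrative z statistic

# Left-tailed test: probability of a value at or below z.
p_left <- pnorm(z)

# Right-tailed test: probability of a value above z.
p_right <- pnorm(z, lower.tail = FALSE)

# Two-tailed test: twice the probability beyond |z| in one tail.
p_two <- 2 * pnorm(-abs(z))

round(c(left = p_left, right = p_right, two_tailed = p_two), 4)
```

For z = 1.96 the two-tailed p-value is very close to 0.05, which is where the familiar 1.96 multiplier of the 95% confidence interval comes from.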

What are the 3 types of probability? ›

There are three major types of probabilities: Theoretical Probability. Experimental Probability. Axiomatic Probability.

Article information

Author: Prof. Nancy Dach

Last Updated: 22/11/2023
