Photo by Chris Liverani
This is the first post in the series Statistics Terms for Data Science. In this series, we are going to look at the statistics concepts that are used in data science.
Probability:
Probability is a measure of how likely an event is to happen.
Many events cannot be predicted with complete certainty; the best we can do is state the chance of an event occurring, i.e. how likely it is. Probability ranges from 0 to 1, where 0 means the event is impossible and 1 means the event is certain.
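To make this concrete, here is a minimal Python sketch (a hypothetical helper I'm adding, not something from the original post) that estimates a probability by simulation: roll a fair die many times and count how often a six comes up. The resulting fraction always falls between 0 and 1.

```python
import random

def estimate_probability(event, trials=100_000):
    """Estimate the probability of an event by repeating the experiment many times."""
    hits = sum(1 for _ in range(trials) if event())
    return hits / trials

# Event of interest: rolling a six on a fair six-sided die.
roll_is_six = lambda: random.randint(1, 6) == 6

p = estimate_probability(roll_is_six)
print(f"Estimated P(six) = {p:.3f}  (theoretical value {1/6:.3f})")
```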
Random Variables:
A random variable, generally denoted by X, is a mapping from the outcomes of an experiment to something measurable, typically a number.
For example, in a loan-repayment analysis we might observe the following outcomes:
1. Salary < 2 lakhs: fails to repay
2. Salary > 2 lakhs: can repay
So here we have two outcomes, "fails to repay" and "can repay", and we can assign a value to each of them, i.e. we can define the random variable X as, say, X = 0 if the borrower fails to repay and X = 1 if the borrower can repay.
This quantifies the data, so we can then perform statistical analysis on it.
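As a rough sketch of the same idea in code (the 2-lakh threshold and the 0/1 coding are taken from the example above), a random variable is just a function from outcomes to numbers:

```python
# Random variable X for the loan-repayment example:
# X = 0 -> fails to repay (salary below 2 lakhs), X = 1 -> can repay.
def X(salary_in_lakhs):
    return 1 if salary_in_lakhs > 2 else 0

print(X(1.5))  # 0: fails to repay
print(X(3.0))  # 1: can repay
```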
Probability Distributions:
A probability distribution is a function that describes the likelihood of obtaining the possible values that a random variable can assume. In other words, the values of the variable vary based on the underlying probability distribution.
A probability distribution tells us the probability for all possible values of X.
E.g. for a single roll of a fair six-sided die, X takes the values 1 to 6, each with probability 1/6.
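A minimal sketch of such a distribution, using the fair-die example: every possible value of X gets a probability, and the probabilities add up to 1.

```python
# Probability mass function of a fair six-sided die: P(X = x) = 1/6 for x = 1..6.
pmf = {x: 1 / 6 for x in range(1, 7)}

for x, p in pmf.items():
    print(f"P(X = {x}) = {p:.3f}")

print("Total probability:", sum(pmf.values()))  # 1.0 (up to floating-point rounding)
```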
Expected Value:
The expected value of a variable X is the value of X we can "expect" to get, on average, after performing the experiment. It is also called the expectation, mathematical expectation, mean, average, or first moment.
Let $X$ be a random variable with a finite number of finite outcomes $x_1, x_2, \ldots, x_k$ occurring with probabilities $p_1, p_2, \ldots, p_k$, respectively. The expected value of $X$ is defined as

$$E[X] = \sum_{i=1}^{k} x_i p_i = x_1 p_1 + x_2 p_2 + \cdots + x_k p_k$$
E.g.
- Let $X$ represent the outcome of a roll of a fair six-sided die. More specifically, $X$ will be the number of pips showing on the top face of the die after the toss. The possible values for $X$ are 1, 2, 3, 4, 5, and 6, all of which are equally likely with a probability of 1/6. The expectation of $X$ is $E[X] = (1 + 2 + 3 + 4 + 5 + 6) \cdot \tfrac{1}{6} = 3.5$.
The expected value should be interpreted as the long-run average value you would get if the experiment were repeated a very large (in the limit, infinite) number of times.
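The following sketch computes E[X] for the die straight from the definition and then checks it against the long-run average of many simulated rolls (the die itself never shows 3.5; that is only the average):

```python
import random

# Expected value from the definition: E[X] = sum over x of x * P(X = x).
pmf = {x: 1 / 6 for x in range(1, 7)}
expected = sum(x * p for x, p in pmf.items())
print("E[X] from the definition:", expected)  # 3.5

# The average of many rolls approaches the expected value.
rolls = [random.randint(1, 6) for _ in range(100_000)]
print("Average of 100,000 simulated rolls:", sum(rolls) / len(rolls))
```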
Binomial Distribution:
A binomial distribution can be thought of as simply the probability of a SUCCESS or FAILURE outcome in an experiment or survey that is repeated multiple times. The binomial is a type of distribution with exactly two possible outcomes on each trial.
Binomial distributions must also meet the following three criteria:
- The number of observations or trials is fixed. In other words, you can only figure out the probability of something happening if you do it a certain number of times. This is common sense: if you toss a coin once, your probability of getting tails is 50%. If you toss a coin 20 times, your probability of getting at least one tails is very, very close to 100%.
- Each observation or trial is independent. In other words, none of your trials have an effect on the probability of the next trial.
- The probability of success (tails, heads, fail or pass) is exactly the same from one trial to another.
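Here is a small sketch of the binomial probability formula, using only the standard library (math.comb needs Python 3.8+); n = 20 tosses and p = 0.5 are just the coin example from the list above:

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(exactly k successes in n independent trials, each with success probability p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 20, 0.5  # 20 coin tosses, P(tails) = 0.5 on each toss

# Probability of at least one tails in 20 tosses: very close to 100%.
print(f"P(at least one tails) = {1 - binomial_pmf(0, n, p):.6f}")

# Probability of exactly 10 tails in 20 tosses.
print(f"P(exactly 10 tails)   = {binomial_pmf(10, n, p):.4f}")
```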
Cumulative Probability:
The cumulative probability of X, denoted by F(x), is defined as the probability of the variable taking a value less than or equal to x.
The cumulative distribution function of a real-valued random variable $X$ is the function given by

$$F_X(x) = P(X \le x) \qquad \text{(Eq. 1)}$$

where the right-hand side represents the probability that the random variable $X$ takes on a value less than or equal to $x$. The probability that $X$ lies in the semi-closed interval $(a, b]$, where $a < b$, is therefore

$$P(a < X \le b) = F_X(b) - F_X(a) \qquad \text{(Eq. 2)}$$
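A minimal sketch of the CDF, again for the fair die: F(x) is a running sum of the probabilities, and Eq. 2 gives the probability of an interval as a difference of two CDF values.

```python
# Cumulative distribution function of a fair die: F(x) = P(X <= x).
pmf = {x: 1 / 6 for x in range(1, 7)}

def F(x):
    return sum(p for value, p in pmf.items() if value <= x)

print("F(3) =", F(3))                   # P(X <= 3) = 0.5
# Eq. 2: P(a < X <= b) = F(b) - F(a)
print("P(2 < X <= 5) =", F(5) - F(2))   # 0.5
```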
The remaining topics will be discussed in the next blog in this series.

