Posts

Showing posts from May, 2023

Statistics for Data Science: Measures of Dispersion

What is dispersion? In statistics, dispersion measures how far the data points spread out from a reference point. Usually this reference point is the mean, which summarises a data set's central tendency. That makes it a natural point from which to measure the distance of each observation, so we can work on the data using a specific criterion. For example, let's assume the average of all students' scores in a class is 70%. From this overall average, we can calculate how far each student's score deviates. In other words, this stat can tell us how far behind or ahead a student is compared to the class average. In Data Science, where we deal with enormous amounts of data, measures of dispersion are a primary starting point before we start analysing data on a deeper level. The measures of dispersion commonly used are Range, Interquartile Range, Standard Deviation, and Variance. Let's look into what the
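As a rough illustration of the four measures named in this excerpt, the sketch below computes them with Python's standard `statistics` module. The score list is made up for the example (chosen so the class average is 70), and `dispersion_summary` is a hypothetical helper name, not something from the post.

```python
import statistics


def dispersion_summary(scores):
    """Return the four common measures of dispersion for a list of scores."""
    # statistics.quantiles with n=4 returns the three quartile cut points.
    q1, _median, q3 = statistics.quantiles(scores, n=4)
    return {
        "range": max(scores) - min(scores),
        "iqr": q3 - q1,
        # Population variance and standard deviation (the whole class is known).
        "variance": statistics.pvariance(scores),
        "std_dev": statistics.pstdev(scores),
    }


# Hypothetical class scores in percent; their mean is 70.
scores = [55, 60, 65, 70, 75, 80, 85]
class_mean = statistics.mean(scores)

# How far behind (negative) or ahead (positive) each student is.
deviations = [score - class_mean for score in scores]

summary = dispersion_summary(scores)
print(class_mean, deviations)
print(summary)
```

The population versions (`pvariance`, `pstdev`) are used here on the assumption that the list holds every student in the class; for a sample drawn from a larger group, `statistics.variance` and `statistics.stdev` would be the usual choice.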

Probability Distribution Functions: PDF, CDF, PMF

Introduction Statistics relies heavily on probability distributions, and in this article we will learn about the probability mass function, probability density function, and cumulative distribution function in plain English. Several functions are used to model probability distributions, and the fields of probability and statistics refer to them by a number of terms. They include: Probability density function (PDF): The PDF describes the relative likelihood of a continuous random variable near a given value. Integrating it over an interval gives the probability that the random variable falls inside that range of values. In statistics, the probability distribution of a continuous random variable is described by a PDF. A frequently encountered PDF is the Gaussian distribution: if the features or random variables are Gaussian-distributed, the PDF is Gaussian as well. The probability of a single result on a PDF graph is always 0 since a single point reflects a
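To make the distinction between density and probability concrete, here is a minimal sketch using `statistics.NormalDist` from the Python standard library. A standard normal distribution is assumed purely for illustration, and `interval_probability` is a hypothetical helper name.

```python
from statistics import NormalDist

# Assumed for illustration: a standard Gaussian (mean 0, standard deviation 1).
standard_normal = NormalDist(mu=0.0, sigma=1.0)


def interval_probability(dist, low, high):
    """Probability that the variable lies in [low, high], via the CDF."""
    return dist.cdf(high) - dist.cdf(low)


# The PDF value is the height of the curve, not a probability.
density_at_zero = standard_normal.pdf(0.0)

# Probability of falling within one standard deviation of the mean.
within_one_sigma = interval_probability(standard_normal, -1.0, 1.0)

# An interval of zero width has zero probability.
single_point = interval_probability(standard_normal, 0.5, 0.5)

print(density_at_zero, within_one_sigma, single_point)
```

The CDF does the integration here: the probability of an interval is the difference of two CDF values, which is why a single point contributes nothing.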

What is the importance of Inferential Statistics?

Introduction The mathematical and statistical principles that underpin data science and machine learning algorithms include descriptive statistics and inferential statistics. Inferential statistics lets you extrapolate from data to guide your judgements. This is what fundamentally separates it from descriptive statistics, which only summarise the measured data. In inferential statistics, we draw conclusions about the population from a sample. The primary goal is to extrapolate findings from the sample to the population. For instance, suppose we need to figure out what a data analyst makes on average in India. Two choices are available. The first is to collect pay data from data analysts all across India and calculate the average. The second is to average a sample of data analysts from India's key IT hubs and treat it as an estimate for the entire country. Importance of Inf

Statistics 101: Probability Distribution

Definition: A probability distribution describes the values a random variable can take and how likely each of them is. For events such as the roll of a die, or drawing a king from a pack of cards, a probability is calculated and assigned across all possible outcomes. A probability distribution becomes especially significant when it is used to work out the probability distribution function, which in turn is used in statistical methods such as hypothesis testing. Before we dive deep into it, let's see some common examples, uses and types of probability distributions. Types and properties of a probability distribution: In statistics, a probability distribution is plotted on the X-axis and the Y-axis. The X-axis shows the values of the random variable, while the Y-axis shows the probability of each value. A commonly used notation in statistics is X ~ N (µ, σ), where the population
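The two examples mentioned in this excerpt, a die roll and drawing a king, can be sketched as follows. A fair six-sided die and a standard 52-card deck are assumed; the variable names are illustrative only.

```python
from fractions import Fraction

# Assumed: a fair six-sided die, so every face gets probability 1/6.
die_pmf = {face: Fraction(1, 6) for face in range(1, 7)}

# Assumed: a standard 52-card deck with 13 ranks in each of 4 suits.
ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["clubs", "diamonds", "hearts", "spades"]
deck = [(rank, suit) for rank in ranks for suit in suits]

# Probability of drawing a king: favourable cards over all cards.
kings = sum(1 for rank, _suit in deck if rank == "K")
p_king = Fraction(kings, len(deck))

# The probabilities assigned across all outcomes must add up to 1.
total_probability = sum(die_pmf.values())

print(die_pmf)
print(p_king, total_probability)
```

Using `Fraction` keeps the probabilities exact, so the check that they sum to 1 does not depend on floating-point rounding.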

Statistics 101: Bayes Theorem and its real life application

Introduction Bayes' Theorem is a rule of conditional probability that assesses uncertainty in the context of a particular situation. Bayes' rule is applied in a variety of situations, such as diagnosing a rare disease: given a positive test result, it lets us calculate the likelihood that the condition the test was designed for truly exists. For example, the probability that you are friends with A given that you are friends with B can be deduced using Bayes' Theorem. We take the probability that you are friends with B given that you are already friends with A, and multiply it by the probability that you are friends with A. We then divide the result by the probability that you are friends with B, which completes the application of the theorem. Beyond such specific cases, Bayes' theorem may be used in daily situations like dating and friendships. You may improve your understanding of the
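The calculation described in this excerpt, P(A|B) = P(B|A) × P(A) / P(B), can be sketched for the rare-disease case as below. All the numbers (prevalence, sensitivity, false positive rate) are hypothetical values picked for the example, and `bayes_posterior` is an illustrative helper name.

```python
def bayes_posterior(p_b_given_a, p_a, p_b):
    """Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)."""
    if p_b <= 0:
        raise ValueError("P(B) must be positive")
    return p_b_given_a * p_a / p_b


# Hypothetical figures for a rare disease and its test.
prevalence = 0.01           # P(disease)
sensitivity = 0.99          # P(positive | disease)
false_positive_rate = 0.05  # P(positive | no disease)

# Total probability of a positive test, over both sick and healthy people.
p_positive = (sensitivity * prevalence
              + false_positive_rate * (1 - prevalence))

# Probability that the disease is truly present given a positive test.
p_disease_given_positive = bayes_posterior(sensitivity, prevalence, p_positive)

print(round(p_disease_given_positive, 4))
```

With these assumed numbers, a positive result still leaves the disease fairly unlikely (about one chance in six), because healthy people greatly outnumber sick ones and so produce most of the positive tests.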