
Random Variables

Understanding the concept of a random variable is essential for a deeper understanding of statistics. The key terminology related to random variables is covered below.

\defs{Definitions}

  • {\bf Random variable:} An outcome or observation whose value is determined by a random process, so it cannot be predicted with certainty before it is observed. Random variables are usually denoted by capital letters (e.g., $X$), and the values a random variable can take by the corresponding lowercase letter (e.g., $x$).
    1. {\bf Categorical random variable:} A random variable whose outcome is a non-numeric category, such as gender (male or female) or opinion (strongly disagree, disagree, ..., or strongly agree).
      • {\bf Dummy coding:} Converting a categorical variable with two or more categories into one or more variables that take only the values 0 and 1. Categorical variables are often dummy coded for analysis. For example, males might be assigned the value 0 and females the value 1. A variable with $k$ categories needs $k-1$ dummy variables to capture all of its information, with the remaining category serving as the reference (coded 0 on every dummy). The dummy coded data can then be treated as numerical random variables.
    2. {\bf Numerical random variable:} A random variable whose outcome is a number, such as the height, weight, age, or income of a randomly selected individual.
      1. {\bf Discrete random variable:} Takes a countable set of distinct values, typically integers, like the number of heads observed when flipping a coin four times, $x = 0, 1, 2, 3,$ or $4$. For an example, see Table~\ref{contdisc1}.
      2. {\bf Continuous random variable:} Takes any value within an interval of real numbers, like income. For an example, see Table~\ref{contdisc1}.
  • {\bf Cumulative distribution function (c.d.f.):} The probability that the random variable $X$ takes a value less than or equal to $x$, where $x$ is a real number. The c.d.f. is usually denoted by a capital $F$, i.e., $F(x)=P(X \leq x)$.
  • {\bf Probability distribution function (p.d.f.):}
    1. For a discrete random variable it is simply the probability that $X$ takes a particular value, $f(x)=P(X=x)$. (For discrete variables this function is also called the probability mass function.)
      • The probability distribution function has the following properties:
        1. $f(x_i) \geq 0, \quad \forall i.$
        2. $\sum_{\forall i} f(x_i)=1$
    2. For a continuous random variable, $P(X=x)=0$ for every $x$, so the definition must differ. The p.d.f. of a continuous random variable is a function $f(x)$ whose curve describes the distribution: the area under the curve over a given interval equals the probability that the random variable falls within that interval.
      • The probability distribution function has the following properties:
        1. $f(x) \geq 0$
        2. $\int_{-\infty}^{\infty}{f(x)dx}=1$
        3. $F(b)-F(a)=P(a\leq X\leq b) = \int_{a}^{b}{f(x)dx}$, which is the area under the curve $f(x)$ from $a$ to $b$, $a\leq b$.
      • Note: $P(X=b)=F(b)-F(b)=\int_{b}^{b}{f(x)dx}=0$; that is, the probability that a continuous random variable equals any specific constant, say $b$, is zero.
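  • {\bf Expectation} of a random variable $X$, written $E(X)$, is the mean value of $X$ over the sample space, or population, of possible outcomes: a weighted mean in which each possible value is weighted by its probability. For a discrete random variable $E(X)=\sum_{\forall i} x_i f(x_i)$; for a continuous random variable $E(X)=\int_{-\infty}^{\infty}{x f(x)dx}$. The {\em expected value} can also be interpreted as the long-run average that would be obtained from an infinite number of observations of the random variable.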
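To make the discrete definitions concrete, the sketch below takes the coin-flip example from the definitions, assuming a fair coin, and enumerates all $2^4 = 16$ equally likely outcomes of four flips. From these it computes the p.d.f. $f(x)=P(X=x)$, the c.d.f. $F(x)=P(X \leq x)$, and the expectation $E(X)$ for $X$ = number of heads. The helper names `heads_pdf`, `cdf`, and `expectation` are hypothetical, and `Fraction` is used only so the probabilities print exactly.

```python
from fractions import Fraction
from itertools import product

def heads_pdf(n_flips=4):
    """p.d.f. of X = number of heads in n_flips fair coin flips (fairness assumed), by enumeration."""
    outcomes = list(product("HT", repeat=n_flips))  # all 2**n_flips equally likely sequences
    counts = {}
    for seq in outcomes:
        x = seq.count("H")                          # value of X for this sequence
        counts[x] = counts.get(x, 0) + 1
    total = len(outcomes)
    return {x: Fraction(c, total) for x, c in sorted(counts.items())}

def cdf(pdf, x):
    """F(x) = P(X <= x) for a discrete p.d.f. stored as {value: probability}."""
    return sum((p for value, p in pdf.items() if value <= x), Fraction(0))

def expectation(pdf):
    """E(X) = sum over all values of x * f(x)."""
    return sum((value * p for value, p in pdf.items()), Fraction(0))

f = heads_pdf(4)
print({x: str(p) for x, p in f.items()})  # {0: '1/16', 1: '1/4', 2: '3/8', 3: '1/4', 4: '1/16'}
print(sum(f.values()))                    # 1, as required of any discrete p.d.f.
print(cdf(f, 2))                          # 11/16
print(expectation(f))                     # 2
```

Every $f(x_i)$ is non-negative and the values sum to 1, matching the two properties listed for a discrete p.d.f.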

\begin{table}
\centering
\begin{tabular}{|c|c|}
\hline
Discrete & Continuous \\
\hline
0 & 736.1918273 \\
1 & 759.5668806 \\
2 & 812.7593044 \\
3 & 562.2359305 \\
4 & 798.2952718 \\
\hline
\end{tabular}
\caption{Example of Discrete and Continuous Data}
\label{contdisc1}
\end{table}

\defl{Examples of Categorical, Continuous and Discrete Data.}

  • Categorical:
    1. Gender
    2. Blood Type
    3. Marital Status
    4. Eye Color
    5. Political Party
  • Discrete:
    1. Number of people using the ATM at a certain location within the past hour.
    2. Number of brothers or sisters a person has.
    3. Number of times a person won at roulette within the past 20 spins.
  • Continuous:
    1. Income
    2. Age
    3. Height
    4. Weight
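The categorical examples above, such as blood type, are typically dummy coded before numerical analysis. The sketch below, a hypothetical helper named `dummy_code`, follows the $k-1$ rule from the dummy coding definition: blood type has $k=4$ categories, so it becomes three 0/1 variables, with type O chosen (arbitrarily, as an assumption) as the reference category coded 0 on every dummy.

```python
def dummy_code(values, reference):
    """Dummy code a categorical variable into k-1 indicator (0/1) columns.

    `reference` is the category coded 0 on every dummy; choosing it is an assumption
    made for illustration. Returns the dummy column names and one 0/1 row per value.
    """
    categories = sorted(set(values))
    if reference not in categories:
        raise ValueError(f"reference category {reference!r} not found in data")
    columns = [c for c in categories if c != reference]          # k-1 dummy variables
    rows = [[1 if v == c else 0 for c in columns] for v in values]
    return columns, rows

blood_types = ["O", "A", "AB", "B", "O", "A"]
columns, rows = dummy_code(blood_types, reference="O")
print(columns)  # ['A', 'AB', 'B']
for value, row in zip(blood_types, rows):
    print(value, row)  # e.g. O -> [0, 0, 0], A -> [1, 0, 0]
```

Each row contains at most one 1, and a row of all zeros identifies the reference category, so no information is lost by dropping the $k$-th dummy.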

Binomial

\defl{Binomial Distribution has the following properties:} There is a fixed number of trials or observations, $n$, determined in advance. Each trial has one of two possible outcomes, labeled ``success'' and ``failure''. Each trial's outcome is determined independently of all the other trials. The probability of a success and that of a failure remains …
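Under the properties listed above (fixed $n$, two outcomes, independent trials, success probability $p$), the standard binomial p.d.f. is $P(X=x)=\binom{n}{x}p^{x}(1-p)^{n-x}$. The sketch below evaluates it for $n=4$ and an assumed $p=0.5$, i.e., the number of heads in four flips of a fair coin from the definitions. The function name `binomial_pmf` is hypothetical; `math.comb` requires Python 3.8 or later.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for X ~ Binomial(n, p): C(n, x) * p**x * (1 - p)**(n - x)."""
    if not 0 <= x <= n:
        return 0.0  # impossible number of successes
    return comb(n, x) * p**x * (1 - p) ** (n - x)

n, p = 4, 0.5  # assumed example: four flips of a fair coin
probs = [binomial_pmf(x, n, p) for x in range(n + 1)]
print(probs)                                     # [0.0625, 0.25, 0.375, 0.25, 0.0625]
print(sum(probs))                                # 1.0
print(sum(x * q for x, q in enumerate(probs)))   # 2.0, which equals n * p
```

The last line uses the expectation definition $E(X)=\sum x f(x)$; for a binomial variable it always equals $np$.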

Exponential

\defl{Exponential Distribution has the following properties:} It models the time (or distance) between successive occurrences or arrivals of a Poisson process with mean $\lambda > 0$. Here $\lambda$ is the average number of occurrences or arrivals per unit of time (length, space, etc.), and $\frac{1}{\lambda}$ is the average time between occurrences or arrivals. \defl{Exponential Distribution:} \[f(x) = \lambda e^{-\lambda x}, \quad x \geq 0 \] …
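The exponential p.d.f. above can be used to check the continuous p.d.f. properties from the definitions numerically. The sketch below assumes a rate of $\lambda = 2$ and uses the standard closed-form c.d.f. $F(x)=1-e^{-\lambda x}$ for $x \geq 0$. It confirms that $F(b)-F(a)$ equals the area under $f(x)$ from $a$ to $b$, and that the mean time between arrivals is close to $\frac{1}{\lambda}$. The helper names and the midpoint-rule integrator are hypothetical, and the upper limit $x=50$ stands in for $\infty$.

```python
import math

def exp_pdf(x, lam):
    """f(x) = lam * exp(-lam * x) for x >= 0, and 0 for x < 0."""
    return lam * math.exp(-lam * x) if x >= 0 else 0.0

def exp_cdf(x, lam):
    """F(x) = P(X <= x) = 1 - exp(-lam * x) for x >= 0, and 0 for x < 0."""
    return 1.0 - math.exp(-lam * x) if x >= 0 else 0.0

def integrate(f, a, b, steps=100_000):
    """Approximate the integral of f over [a, b] with the midpoint rule."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))

lam = 2.0       # assumed rate: on average 2 arrivals per unit of time
a, b = 0.5, 1.5
area = integrate(lambda x: exp_pdf(x, lam), a, b)
print(area, exp_cdf(b, lam) - exp_cdf(a, lam))  # the two agree to many decimal places

# Mean time between arrivals, E(X) = integral of x * f(x); x = 50 stands in for infinity.
mean = integrate(lambda x: x * exp_pdf(x, lam), 0.0, 50.0)
print(mean)  # close to 1 / lam = 0.5
```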

Hypergeometric

\defl{Hypergeometric distribution has the following properties:} Units are selected without replacement from a finite population that consists of successes and failures. The major difference between the Hypergeometric distribution and the Binomial distribution is that the probability of selecting a success is {\bf not constant}, and the draws are {\bf not independent} of one another. \defm{Hypergeometric …
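The difference described above can be seen by computing both distributions side by side. The sketch uses the standard hypergeometric p.d.f. $P(X=x)=\binom{K}{x}\binom{N-K}{n-x}\big/\binom{N}{n}$ for $x$ successes in $n$ draws without replacement from $N$ units, $K$ of which are successes. It compares this with a binomial using the same initial success probability $K/N$. The population sizes and function names are assumptions chosen for illustration.

```python
from math import comb

def hypergeom_pmf(x, N, K, n):
    """P(X = x) successes in n draws WITHOUT replacement from N units, K of them successes."""
    if x < 0 or x > n or x > K or n - x > N - K:
        return 0.0  # not enough successes or failures in the population
    return comb(K, x) * comb(N - K, n - x) / comb(N, n)

def binomial_pmf(x, n, p):
    """P(X = x) successes in n independent draws WITH replacement, success probability p."""
    if x < 0 or x > n:
        return 0.0
    return comb(n, x) * p**x * (1 - p) ** (n - x)

N, K, n = 20, 8, 5  # assumed example: 20 units, 8 successes, 5 draws
for x in range(n + 1):
    print(x, round(hypergeom_pmf(x, N, K, n), 4), round(binomial_pmf(x, n, K / N), 4))

# Probability the second draw is a success given the first was: it changes without replacement.
print((K - 1) / (N - 1), "vs", K / N)  # 7/19 without replacement vs 8/20 with replacement
```

Both distributions have mean $nK/N$ here, but the hypergeometric probabilities are more concentrated because each draw changes the composition of the remaining population.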

Normal

\defl{Normal Distribution has the following properties:} It is symmetric, with a bell-shaped appearance. The population mean and median are equal. It has an infinite range, $-\infty < x < \infty$. The approximate probabilities for certain ranges of $X$-values are: $P(\mu - 1\sigma < X < \mu + 1\sigma) \approx 68\%$, $P(\mu - 2\sigma < X < \mu + 2\sigma) …
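The approximate probability above can be reproduced from the normal c.d.f., which is commonly written with the error function as $F(x)=\frac{1}{2}\left[1+\operatorname{erf}\left(\frac{x-\mu}{\sigma\sqrt{2}}\right)\right]$. The sketch applies $F(b)-F(a)$ from the definitions to $a=\mu-k\sigma$ and $b=\mu+k\sigma$. The function names are hypothetical; only `math.erf` from the standard library is used.

```python
import math

def normal_cdf(x, mu=0.0, sigma=1.0):
    """F(x) = P(X <= x) for X ~ Normal(mu, sigma^2), written with the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def prob_within(k, mu=0.0, sigma=1.0):
    """P(mu - k*sigma < X < mu + k*sigma) = F(mu + k*sigma) - F(mu - k*sigma)."""
    return normal_cdf(mu + k * sigma, mu, sigma) - normal_cdf(mu - k * sigma, mu, sigma)

for k in (1, 2, 3):
    print(k, f"{100 * prob_within(k):.2f}%")
# 1 68.27%
# 2 95.45%
# 3 99.73%
```

Because the result depends only on $k$, the same percentages hold for any $\mu$ and $\sigma$, e.g. `prob_within(1, mu=170.0, sigma=10.0)`.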

Poisson

\[P(X=x)=f(x) = \frac{e^{-\lambda}\lambda^{x}}{x!} \] $X$ is Poisson distributed, and $x$ equals the number of successes in the interval, $x = 0,1,2,\ldots$ $0
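The Poisson p.d.f. above can be evaluated directly. The sketch assumes $\lambda = 3$, for instance an average of 3 people using an ATM per hour (an illustrative assumption, not data). It prints the first few probabilities and checks that they sum to 1 and that the mean equals $\lambda$, using a truncation at $x=100$ in place of the infinite sum. The function name `poisson_pmf` is hypothetical.

```python
import math

def poisson_pmf(x, lam):
    """P(X = x) = exp(-lam) * lam**x / x!  for x = 0, 1, 2, ..."""
    if x < 0:
        return 0.0
    return math.exp(-lam) * lam**x / math.factorial(x)

lam = 3.0  # assumed example: on average 3 arrivals per hour
for x in range(6):
    print(x, round(poisson_pmf(x, lam), 4))

# Sum of f(x) and of x * f(x) over x = 0..100; the tail beyond 100 is negligible here.
print(sum(poisson_pmf(x, lam) for x in range(101)))      # very close to 1
print(sum(x * poisson_pmf(x, lam) for x in range(101)))  # very close to lam = 3
```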