What A Simple Linear Regression Model Is and How It Works

A basic statistical method of finding relationships between variables

Linear regression models are used to show or predict the relationship between two variables or factors. The factor that is being predicted (the factor that the equation solves for) is called the dependent variable. The factors that are used to predict the value of the dependent variable are called the independent variables.

In simple linear regression, each observation consists of two values: one for the dependent variable and one for the independent variable. In this simple model, a straight line approximates the relationship between the dependent variable and the independent variable.

When two or more independent variables are used in regression analysis, the model is no longer a simple linear one. This is known as multiple regression.

Formula For a Simple Linear Regression Model

The two factors that are involved in simple linear regression analysis are designated x and y. The equation that describes how y is related to x is known as the regression model.

The simple linear regression model is represented by:

y = β0 + β1x + ε

The linear regression model contains an error term that is represented by ε. The error term is used to account for the variability in y that cannot be explained by the linear relationship between x and y. If ε were not present, that would mean that knowing x would provide enough information to determine the value of y.
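
To make the role of the error term concrete, the sketch below generates observations from the model in Python. The parameter values (β0 = 2.0, β1 = 0.5), the x values, and the choice of normally distributed noise are assumptions made only for illustration; the text above does not specify a distribution for ε.

```python
import random


def simulate_observations(beta0, beta1, xs, noise_sd, seed=0):
    """Return y values generated as beta0 + beta1 * x + epsilon.

    epsilon is drawn from a normal distribution with mean 0 and standard
    deviation noise_sd (an illustrative assumption, not part of the text).
    """
    rng = random.Random(seed)
    return [beta0 + beta1 * x + rng.gauss(0.0, noise_sd) for x in xs]


xs = [1.0, 2.0, 3.0, 4.0, 5.0]

# With no error term (noise_sd = 0), knowing x determines y exactly.
exact_ys = simulate_observations(2.0, 0.5, xs, noise_sd=0.0)

# With an error term, the same x values give y values scattered around the line.
noisy_ys = simulate_observations(2.0, 0.5, xs, noise_sd=1.0)

print(exact_ys)
print(noisy_ys)
```

In the first call every point falls on the line; in the second, the points deviate from it by whatever ε happens to be for each observation.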

There are also parameters that represent the population being studied. These parameters of the model are represented by β0 and β1.

The simple linear regression equation is graphed as a straight line, where:

  1. β0 is the y-intercept of the regression line.
  2. β1 is the slope.
  3. E(y) is the mean or expected value of y for a given value of x.
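
As a small illustration of how the intercept and slope define the line, the following Python sketch evaluates E(y) for hypothetical parameter values (β0 = 10 and β1 = 3, chosen only for this example).

```python
def expected_y(beta0, beta1, x):
    """Mean value of y at x under the equation E(y) = beta0 + beta1 * x."""
    return beta0 + beta1 * x


beta0, beta1 = 10.0, 3.0  # hypothetical population parameters

# At x = 0 the line crosses the y-axis, so E(y) equals the intercept.
at_zero = expected_y(beta0, beta1, 0.0)

# Increasing x by one unit changes E(y) by exactly the slope.
change_per_unit = expected_y(beta0, beta1, 5.0) - expected_y(beta0, beta1, 4.0)

print(at_zero, change_per_unit)
```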

A regression line can show a positive linear relationship, a negative linear relationship, or no relationship.

  1. No relationship: The regression line is flat (its slope is zero). Knowing the value of x tells you nothing about the expected value of y, so there is no linear relationship between the two variables.
  2. Positive relationship: The regression line slopes upward from left to right (its slope is positive). There is a positive linear relationship between the two variables: as the value of one increases, the value of the other also increases.
  3. Negative relationship: The regression line slopes downward from left to right (its slope is negative). There is a negative linear relationship between the two variables: as the value of one increases, the value of the other decreases.
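
The three cases come down to the sign of the slope, which can be sketched in Python as follows. The function name and the optional `tolerance` argument are hypothetical, added here because a slope estimated from real data is rarely exactly zero; choosing a sensible tolerance is left open.

```python
def relationship_type(slope, tolerance=0.0):
    """Classify a regression line by the sign of its slope.

    Slopes within +/- tolerance of zero are treated as flat.
    """
    if slope > tolerance:
        return "positive"
    if slope < -tolerance:
        return "negative"
    return "none"


print(relationship_type(0.5))    # line rises from left to right
print(relationship_type(-1.2))   # line falls from left to right
print(relationship_type(0.0))    # flat line
```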

The Estimated Linear Regression Equation

If the parameters of the population were known, the simple linear regression equation (shown below) could be used to compute the mean value of y for a known value of x.

E(y) = β0 + β1x

In practice, however, the parameter values are generally not known, so they must be estimated from a sample of the population. The sample statistics used as estimates are represented by b0 and b1, to distinguish them from the population parameters β0 and β1. When the sample statistics are substituted for the population parameters, the estimated regression equation is formed.

The estimated regression equation is:

ŷ = b0 + b1x

Note: ŷ is pronounced y hat.

The graph of the estimated simple regression equation is called the estimated regression line.

  1. b0 is the y-intercept of the estimated regression line.
  2. b1 is the slope.
  3. ŷ is the estimated value of y for a given value of x.
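
The text does not say how the estimates are computed from a sample. One common choice is ordinary least squares, which picks the line that minimizes the sum of squared vertical distances to the data points. The Python sketch below assumes that method; the function name, the variable names `b0` and `b1` for the estimated intercept and slope, and the sample data are all made up for illustration.

```python
def fit_simple_linear_regression(xs, ys):
    """Estimate intercept b0 and slope b1 by ordinary least squares."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) observations")

    x_mean = sum(xs) / n
    y_mean = sum(ys) / n

    # Sum of squared deviations of x, and of cross-deviations of x and y.
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")

    b1 = sxy / sxx               # estimated slope
    b0 = y_mean - b1 * x_mean    # estimated intercept
    return b0, b1


# Hypothetical sample of five observations.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = fit_simple_linear_regression(xs, ys)
y_hat_at_6 = b0 + b1 * 6.0  # estimated value of y at x = 6

print(b0, b1, y_hat_at_6)
```

For this sample the estimates come out near 0.05 for the intercept and 1.99 for the slope, so the estimated line predicts a y of roughly 12 at x = 6.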

Limits of Simple Linear Regression

Even the best data does not tell a complete story. 

Regression analysis is commonly used in research to establish that a correlation exists between variables. But correlation is not the same as causation: a relationship between two variables does not mean that one causes the other. Even a regression line that fits the data points well does not guarantee a cause-and-effect relationship.

Using a linear regression model can help you discover whether a linear relationship between variables exists at all. To understand exactly what that relationship is, and whether one variable causes another, you will need additional research and statistical analysis.
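
The gap between correlation and causation can be sketched with a made-up example in Python. In the hypothetical data below, ice cream sales and sunburn cases are both constructed from temperature, so they are strongly correlated with each other even though neither causes the other. All names and numbers are invented for illustration.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    syy = sum((y - y_mean) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical common driver.
temperature = [18, 21, 24, 27, 30, 33]

# Both series depend on temperature plus small fixed offsets, not on each other.
ice_cream_sales = [3 * t + d for t, d in zip(temperature, [2, -1, 1, -2, 0, 1])]
sunburn_cases = [0.5 * t + d for t, d in zip(temperature, [-1, 1, 0, 1, -1, 0])]

r = pearson_r(ice_cream_sales, sunburn_cases)
print(r)  # strongly positive, despite no causal link between the two series
```

A regression of sunburn cases on ice cream sales would fit this data well, yet the fit says nothing about why the two move together.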