Linear Regression Analysis: Concept And Characteristics

Let’s say we need to conduct research for a company that wants to know the relationship between its sales and its advertising expenses. What can we do?

Sometimes, in a study, we are interested in knowing whether there is a linear relationship between two random variables. That is what linear regression analysis is for.

The strength of that relationship is measured by the Pearson linear correlation coefficient r, whose value ranges from -1 to +1 (1). When the correlation coefficient is close to +1 or -1, it makes sense to treat the equation of the line that “best fits” the point cloud as an acceptable model of the association between the two variables.
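As a minimal sketch of how r can be computed, the snippet below uses a small made-up data set. The numbers and the names `x`, `y` and `pearson_r` are assumptions chosen for illustration, not figures from a real company.

```python
import math

# Hypothetical data (illustration only): x = advertising spend, y = sales.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

def pearson_r(xs, ys):
    """Pearson linear correlation coefficient of two equal-length samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squares of the deviations from the means.
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    s_xx = sum((a - mean_x) ** 2 for a in xs)
    s_yy = sum((b - mean_y) ** 2 for b in ys)
    return s_xy / math.sqrt(s_xx * s_yy)

r = pearson_r(x, y)
print(round(r, 4))  # 0.7746
```

A value of about 0.77 points to a fairly strong positive linear association in this toy sample.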

Above all, this line lets us estimate the values of Y that we would expect for different values of X. Both the data points and the fitted line can be shown in what we call a scatter diagram. The most common procedure for determining the best-fit line is least squares.
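For reference, the least-squares line for one explanatory variable can be written out as follows. The symbols a (intercept), b (slope) and n (number of observations) are notation assumed here for illustration.

```latex
\[
\hat{y} = a + b\,x, \qquad
(a, b) = \arg\min_{a,\,b} \sum_{i=1}^{n} \bigl(y_i - a - b\,x_i\bigr)^2
\]
\[
b = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
a = \bar{y} - b\,\bar{x}
\]
```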

An example of using linear regression analysis

Let’s say we need to do research for a company that wants to know the relationship between its sales and its advertising expenses. What can we do? Linear regression analysis tells us to what degree advertising expenses explain sales. Sales will therefore be the dependent variable of the model, while advertising expenses will be the explanatory or independent variable.

Using this model lets us observe the influence of advertising expenses on the company’s revenue or sales (1). For that we use the equation of the regression line. To quantify the relationship between the two variables and approximate the magnitude of the influence of advertising expenditure on company sales, we can estimate the model by ordinary least squares (OLS), which minimizes the sum of the squared residuals.

A residual is the difference between an observed value and the value the model estimates for it. What is this information for? The goal is to make the sum of the squared residuals as small as possible. Keep in mind, however, that not all points will lie on the regression line (in fact, they rarely do). If they all did, and the number of observations were large enough, there would be no estimation error: the observed value and the predicted value would coincide (1).
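The sketch below fits a line by ordinary least squares and lists the residuals. The `advertising` and `sales` figures are invented for the example, and `fit_ols` is a hypothetical helper name, so treat it as an illustration of the idea and not as an analysis of real data.

```python
# Hypothetical data (illustration only): advertising spend and sales,
# both in arbitrary units.
advertising = [1, 2, 3, 4, 5]
sales = [2, 4, 5, 4, 5]

def fit_ols(xs, ys):
    """Return (intercept, slope) of the least-squares line for one predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return intercept, slope

intercept, slope = fit_ols(advertising, sales)

# Residual = observed value minus the value estimated by the line.
residuals = [y - (intercept + slope * x) for x, y in zip(advertising, sales)]

# This is the quantity that ordinary least squares minimizes.
sse = sum(e ** 2 for e in residuals)

print(f"sales = {intercept:.2f} + {slope:.2f} * advertising")
print("residuals:", [round(e, 2) for e in residuals])
print("sum of squared residuals:", round(sse, 2))
```

None of the five points falls exactly on the fitted line, which is the usual situation described above.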

The standard error of the estimate

In real cases, the model never fits reality perfectly. That’s why there is a measure that describes how accurate the prediction of Y from X is or, conversely, how imprecise the estimate can be. This measure is called the standard error of the estimate. It is used in linear regression analysis to measure the dispersion of the observations around the regression line.
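A minimal sketch of this measure follows, assuming the usual convention for a single explanatory variable: the sum of squared residuals is divided by n - 2 before taking the square root. The data and the function name `standard_error_of_estimate` are made up for illustration.

```python
import math

# Hypothetical data (illustration only).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

def standard_error_of_estimate(xs, ys):
    """Dispersion of the observations around the least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    s_xx = sum((a - mean_x) ** 2 for a in xs)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    sse = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(xs, ys))
    # Assumed convention: n - 2 degrees of freedom, because two parameters
    # (intercept and slope) were estimated from the data.
    return math.sqrt(sse / (n - 2))

see = standard_error_of_estimate(x, y)
print(round(see, 4))  # 0.8944
```

The result is expressed in the same units as Y, so here a typical observation lies roughly 0.9 units away from the line.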

Linear Regression Model Assumptions

If our observations are a random sample from a population, we are interested in making inferences about that population. For these inferences to be “statistically reasonable”, the following conditions must be met:

  • In the population, the relationship between the X and Y variables should be approximately linear.
  • The residuals follow a normal distribution with a mean of 0.
  • The residuals are independent of each other.
  • The residuals have constant variance.

Even so, the linear regression model is quite “robust”. This means that the above conditions do not have to be met exactly (in particular the last three).

Inference in the regression model

After calculating the regression line and the goodness of fit achieved with the linear regression model, the next step is to perform a hypothesis test in which the null hypothesis corresponds to the absence of a relationship, and rejecting the null hypothesis corresponds to the presence of a significant relationship.

To do this, we test whether the correlation between the two variables is different from zero, or whether the regression model is valid, that is, whether the explanatory variable (X) really helps to explain the endogenous variable (Y).
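One common way to carry out this test is a t-test on the slope, sketched below on invented data. The critical value 3.182 is the two-sided 5% value of Student's t with 3 degrees of freedom, taken from a standard t table. The 5% significance level and the function name `slope_t_statistic` are assumptions made for the example.

```python
import math

# Hypothetical data (illustration only).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

def slope_t_statistic(xs, ys):
    """Return (t, degrees of freedom) for the null hypothesis 'slope = 0'."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    s_xx = sum((a - mean_x) ** 2 for a in xs)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    sse = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(xs, ys))
    # Standard error of the slope: residual dispersion scaled by the spread of x.
    se_slope = math.sqrt(sse / (n - 2)) / math.sqrt(s_xx)
    return slope / se_slope, n - 2

t_stat, df = slope_t_statistic(x, y)

# Assumption: 5% two-sided test; 3.182 is the tabulated critical value for df = 3.
T_CRITICAL_DF3 = 3.182
significant = abs(t_stat) > T_CRITICAL_DF3

print(f"t = {t_stat:.4f} with {df} degrees of freedom")
print("reject the null hypothesis:", significant)
```

With only five observations, the statistic (about 2.12) stays below the critical value, so this toy sample does not let us reject the absence of a relationship, even though the fitted slope is positive.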

In short, linear regression analysis applies to countless aspects of real life. It is used in both the social and natural sciences, and it is key to understanding many relationships between variables in statistics.
