The tree based linear regression model for hierarchical categorical variables

Publication: Contribution to journal › Journal article › Research › peer-reviewed

Standard

The tree based linear regression model for hierarchical categorical variables. / Carrizosa, Emilio; Mortensen, Laust Hvas; Romero Morales, Dolores; Sillero-Denamiel, M. Remedios.

In: Expert Systems with Applications, Vol. 203, 117423, 2022.

Harvard

Carrizosa, E, Mortensen, LH, Romero Morales, D & Sillero-Denamiel, MR 2022, 'The tree based linear regression model for hierarchical categorical variables', Expert Systems with Applications, vol. 203, 117423. https://doi.org/10.1016/j.eswa.2022.117423

APA

Carrizosa, E., Mortensen, L. H., Romero Morales, D., & Sillero-Denamiel, M. R. (2022). The tree based linear regression model for hierarchical categorical variables. Expert Systems with Applications, 203, Article 117423. https://doi.org/10.1016/j.eswa.2022.117423

Vancouver

Carrizosa E, Mortensen LH, Romero Morales D, Sillero-Denamiel MR. The tree based linear regression model for hierarchical categorical variables. Expert Systems with Applications. 2022;203:117423. https://doi.org/10.1016/j.eswa.2022.117423

Author

Carrizosa, Emilio ; Mortensen, Laust Hvas ; Romero Morales, Dolores ; Sillero-Denamiel, M. Remedios. / The tree based linear regression model for hierarchical categorical variables. In: Expert Systems with Applications. 2022 ; Vol. 203.

Bibtex

@article{a36c038c05bd445bb962e06522ed4692,
title = "The tree based linear regression model for hierarchical categorical variables",
abstract = "Many real-life applications consider nominal categorical predictor variables that have a hierarchical structure, e.g. economic activity data in Official Statistics. In this paper, we focus on linear regression models built in the presence of this type of nominal categorical predictor variables, and study the consolidation of their categories to have a better tradeoff between interpretability and fit of the model to the data. We propose the so-called Tree based Linear Regression (TLR) model that optimizes both the accuracy of the reduced linear regression model and its complexity, measured as a cost function of the level of granularity of the representation of the hierarchical categorical variables. We show that finding non-dominated outcomes for this problem boils down to solving Mixed Integer Convex Quadratic Problems with Linear Constraints, and small to medium size instances can be tackled using off-the-shelf solvers. We illustrate our approach in two real-world datasets, as well as a synthetic one, where our methodology finds a much less complex model with a very mild worsening of the accuracy.",
keywords = "Accuracy vs. model complexity, Hierarchical categorical variables, Linear regression models, Mixed integer convex quadratic problem with linear constraints",
author = "Carrizosa, Emilio and Mortensen, {Laust Hvas} and {Romero Morales}, Dolores and Sillero-Denamiel, {M. Remedios}",
note = "Publisher Copyright: {\textcopyright} 2022 The Author(s)",
year = "2022",
doi = "10.1016/j.eswa.2022.117423",
language = "English",
volume = "203",
pages = "117423",
journal = "Expert Systems with Applications",
issn = "0957-4174",
publisher = "Pergamon Press",

}

RIS

TY - JOUR

T1 - The tree based linear regression model for hierarchical categorical variables

AU - Carrizosa, Emilio

AU - Mortensen, Laust Hvas

AU - Romero Morales, Dolores

AU - Sillero-Denamiel, M. Remedios

N1 - Publisher Copyright: © 2022 The Author(s)

PY - 2022

Y1 - 2022

N2 - Many real-life applications consider nominal categorical predictor variables that have a hierarchical structure, e.g. economic activity data in Official Statistics. In this paper, we focus on linear regression models built in the presence of this type of nominal categorical predictor variables, and study the consolidation of their categories to have a better tradeoff between interpretability and fit of the model to the data. We propose the so-called Tree based Linear Regression (TLR) model that optimizes both the accuracy of the reduced linear regression model and its complexity, measured as a cost function of the level of granularity of the representation of the hierarchical categorical variables. We show that finding non-dominated outcomes for this problem boils down to solving Mixed Integer Convex Quadratic Problems with Linear Constraints, and small to medium size instances can be tackled using off-the-shelf solvers. We illustrate our approach in two real-world datasets, as well as a synthetic one, where our methodology finds a much less complex model with a very mild worsening of the accuracy.

AB - Many real-life applications consider nominal categorical predictor variables that have a hierarchical structure, e.g. economic activity data in Official Statistics. In this paper, we focus on linear regression models built in the presence of this type of nominal categorical predictor variables, and study the consolidation of their categories to have a better tradeoff between interpretability and fit of the model to the data. We propose the so-called Tree based Linear Regression (TLR) model that optimizes both the accuracy of the reduced linear regression model and its complexity, measured as a cost function of the level of granularity of the representation of the hierarchical categorical variables. We show that finding non-dominated outcomes for this problem boils down to solving Mixed Integer Convex Quadratic Problems with Linear Constraints, and small to medium size instances can be tackled using off-the-shelf solvers. We illustrate our approach in two real-world datasets, as well as a synthetic one, where our methodology finds a much less complex model with a very mild worsening of the accuracy.

KW - Accuracy vs. model complexity

KW - Hierarchical categorical variables

KW - Linear regression models

KW - Mixed integer convex quadratic problem with linear constraints

U2 - 10.1016/j.eswa.2022.117423

DO - 10.1016/j.eswa.2022.117423

M3 - Journal article

AN - SCOPUS:85129966271

VL - 203

JO - Expert Systems with Applications

JF - Expert Systems with Applications

SN - 0957-4174

M1 - 117423

ER -

ID: 310556709