PhD defense Peng Yu: Improving Decision Tree Learning

Wednesday, 11 December 2024, at 10:00 (Montréal time) at Polytechnique Montréal

Polytechnique Montréal, Québec, Canada

Jury

Prof. Daniel Aloise, External Reviewer, Polytechnique Montréal, Canada
Prof. Pinghui Wang, External Reviewer, Xi’an Jiaotong University, China
MdC Maroua Bahri, Internal Examiner, Télécom Paris, France
Prof. Daniel Neagu, External Examiner, University of Bradford, UK
Prof. Albert Bifet, PhD Supervisor, Télécom Paris, France
Prof. Jesse Read, Co-supervisor, Télécom Paris, France

Abstract

Decision tree models stand out in machine learning for their efficiency and transparency, particularly in handling structured data. However, they face persistent challenges, including the interpretation of complex structures and the efficient management of categorical data. This thesis proposes innovative solutions to these challenges, notably the Linear TreeShap algorithm, which enhances the interpretability of decision trees while reducing computational costs.

A second research focus addresses the handling of categorical features. We introduce the BSplitZ method, which simplifies splitting large sets of categories, and a framework allowing decision trees to directly handle categorical data without requiring numerical encoding. These approaches bridge the performance gap between binary and multi-class classification problems.

Finally, this thesis makes a significant theoretical contribution by proving the non-existence of an optimal numerical encoding for splits based on the Mean Absolute Error (MAE) criterion and presenting a novel algorithm for solving complex splitting problems. These advances strengthen both the theoretical foundations and practical applications of decision trees.