PhD defense Peng Yu: Improving Decision Tree Learning
Polytechnique Montréal, Québec, Canada
Jury
- Prof. Daniel Aloise, External Reviewer, Polytechnique Montréal, Canada
- Prof. Pinghui Wang, External Reviewer, Xi’an Jiaotong University, China
- MdC Maroua Bahri, Internal Examiner, Télécom Paris, France
- Prof. Daniel Neagu, External Examiner, University of Bradford, UK
- Prof. Albert Bifet, PhD Supervisor, Télécom Paris, France
- Prof. Jesse Read, Co-supervisor, Télécom Paris, France
Abstract
Decision tree models stand out in machine learning for their efficiency and transparency, particularly in handling structured data. However, they face persistent challenges, including the interpretation of complex structures and the efficient management of categorical data. This thesis proposes solutions to these challenges, notably the Linear TreeShap algorithm, which enhances the interpretability of decision trees while reducing computational costs.
A second research focus addresses the handling of categorical features. We introduce the BSplitZ method, which simplifies splitting large sets of categories, and a framework allowing decision trees to directly handle categorical data without requiring numerical encoding. These approaches bridge the performance gap between binary and multi-class classification problems.
Finally, this thesis makes a significant theoretical contribution by proving that no optimal numerical encoding exists for splits based on the Mean Absolute Error (MAE) criterion, and it presents a novel algorithm for solving complex splitting problems. These advances strengthen both the theoretical foundations and practical applications of decision trees.
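To make the MAE splitting problem concrete, the following is a minimal illustrative sketch and not the algorithm from the thesis. It assumes a regression target grouped by category and compares two ways of choosing a binary split: an exhaustive search over all partitions of the categories, and a search restricted to partitions that respect a numerical encoding (here, ordering categories by their median target). All function names and the toy data are hypothetical.

```python
from itertools import combinations
from statistics import median


def mae_cost(values):
    """Sum of absolute deviations from the median (the MAE-optimal constant)."""
    if not values:
        return 0.0
    m = median(values)
    return sum(abs(v - m) for v in values)


def split_cost(groups, left_cats):
    """Total MAE cost of sending `left_cats` left and all other categories right."""
    left = [v for c in groups if c in left_cats for v in groups[c]]
    right = [v for c in groups if c not in left_cats for v in groups[c]]
    return mae_cost(left) + mae_cost(right)


def best_split_exhaustive(groups):
    """Try every non-trivial partition (each is visited twice, once per side)."""
    cats = sorted(groups)
    best = None
    for r in range(1, len(cats)):
        for left in combinations(cats, r):
            cost = split_cost(groups, set(left))
            if best is None or cost < best[0]:
                best = (cost, set(left))
    return best


def best_split_ordered(groups, key=median):
    """Encode each category by `key` of its targets, then scan thresholds only."""
    cats = sorted(groups, key=lambda c: key(groups[c]))
    best = None
    for i in range(1, len(cats)):
        left = set(cats[:i])
        cost = split_cost(groups, left)
        if best is None or cost < best[0]:
            best = (cost, left)
    return best


# Hypothetical toy data: target values observed for each category.
groups = {
    "a": [1.0, 2.0, 9.0],
    "b": [4.0, 5.0, 6.0],
    "c": [0.0, 5.0, 10.0],
    "d": [7.0, 8.0, 8.5],
}

exhaustive = best_split_exhaustive(groups)
ordered = best_split_ordered(groups)
print("exhaustive:", exhaustive)
print("median-ordered:", ordered)
```

The exhaustive search can never do worse than the encoding-based scan, but its cost grows exponentially with the number of categories. The encoding-based scan is only linear in the number of categories after sorting, and it is only guaranteed to find the best split when an optimal encoding exists for the criterion.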