Discovering Patterns in Categorical Data: Manuele Leonelli Explains the Capabilities of Staged Tree Modeling

March 14, 2024

In our latest IE Research Datalab seminar, Manuele Leonelli gave his talk “Modeling categorical data with staged trees”. Professor Leonelli began by introducing the concept of Bayesian networks, which are often used to map out relationships between categorical variables. These directed graphs, which can be learned from observed data, are widely used across scientific fields. For example, Bayesian networks are applied in medicine for case-control studies and in engineering to study the probability of faults and identify the most critical components.

To demonstrate how Bayesian networks represent dependencies between variables, Professor Leonelli used a simple graph titled “is the dog barking”. He then addressed the critical question of distinguishing correlation from causation within these networks, highlighting the importance of incorporating expert knowledge to discern probable causes.

The seminar then focused on Manuele Leonelli’s area of expertise: staged trees. These models are a generalization of Bayesian networks that enable a deeper analysis of the dependencies between variables. Professor Leonelli gave two examples from his research. First, he constructed staged trees that grouped the chances of survival on the Titanic by gender, age and class. A second key application is the study of El Niño’s impact, where he found that precipitation varies with the ocean dipole.
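To make the idea of grouping survival chances concrete, here is a minimal Python sketch. It is an illustration only, not Professor Leonelli's code or his R library: the counts are invented (they are not real Titanic figures), and the stage labels `s1`, `s2`, `s3` and the function `stage_probabilities` are hypothetical names. Contexts that share a stage label are assumed to share one survival probability, which is estimated by pooling their counts.

```python
from collections import defaultdict

# Invented counts for illustration only (NOT real Titanic data):
# context (class, sex) -> (survived, died)
counts = {
    ("first", "female"): (90, 10),
    ("second", "female"): (88, 12),
    ("first", "male"): (35, 65),
    ("second", "male"): (15, 85),
}

# Hypothetical staging: contexts with the same label share one
# survival distribution. Here class matters for men but not for women.
stages = {
    ("first", "female"): "s1",
    ("second", "female"): "s1",
    ("first", "male"): "s2",
    ("second", "male"): "s3",
}

def stage_probabilities(counts, stages):
    """Estimate one survival probability per stage by pooling counts."""
    pooled = defaultdict(lambda: [0, 0])
    for context, (survived, died) in counts.items():
        pooled[stages[context]][0] += survived
        pooled[stages[context]][1] += died
    return {s: yes / (yes + no) for s, (yes, no) in pooled.items()}

probs = stage_probabilities(counts, stages)
print(probs)
```

In this toy staging, survival is independent of class for women but not for men. A Bayesian network over the same variables has to treat class as either relevant or irrelevant to survival across all contexts at once, which is why this kind of context-specific grouping is a strict generalization.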

A major challenge with staged trees is how to group probabilities and learn independencies without having to try every possible grouping. An exhaustive search can be very computationally expensive for complex problems. Leonelli pointed out the role that machine learning clustering algorithms can play in addressing these issues, sparking a debate among IE Research Datalab researchers about alternative clustering methods.
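One generic way to avoid enumerating every grouping is a greedy agglomerative search. The Python sketch below is an assumption-laden illustration, not the algorithm presented in the talk or implemented in the R library: it starts with every context in its own stage and repeatedly merges the pair of stages that most improves a BIC score, stopping when no merge helps. The counts are invented, and the names `log_lik`, `bic` and `greedy_merge` are hypothetical.

```python
import math
from itertools import combinations

def log_lik(yes, no):
    """Maximized binomial log-likelihood for pooled (yes, no) counts."""
    total = yes + no
    return sum(k * math.log(k / total) for k in (yes, no) if k > 0)

def bic(stage_counts, n_total):
    """BIC with one free parameter per stage (lower is better)."""
    ll = sum(log_lik(yes, no) for yes, no in stage_counts)
    return -2 * ll + len(stage_counts) * math.log(n_total)

def greedy_merge(counts):
    """Greedily merge contexts into stages while the BIC improves."""
    groups = [([ctx], c) for ctx, c in counts.items()]
    n_total = sum(yes + no for yes, no in counts.values())
    current = bic([c for _, c in groups], n_total)
    while len(groups) > 1:
        best = None
        for i, j in combinations(range(len(groups)), 2):
            merged = (groups[i][1][0] + groups[j][1][0],
                      groups[i][1][1] + groups[j][1][1])
            rest = [c for k, (_, c) in enumerate(groups) if k not in (i, j)]
            score = bic(rest + [merged], n_total)
            if score < current and (best is None or score < best[0]):
                best = (score, i, j, merged)
        if best is None:
            break  # no merge improves the score
        score, i, j, merged = best
        new_group = (groups[i][0] + groups[j][0], merged)
        groups = [g for k, g in enumerate(groups) if k not in (i, j)]
        groups.append(new_group)
        current = score
    return [sorted(ctxs) for ctxs, _ in groups]

# Invented counts for illustration only: context -> (survived, died)
counts = {
    ("first", "female"): (90, 10),
    ("second", "female"): (88, 12),
    ("first", "male"): (35, 65),
    ("second", "male"): (15, 85),
}
result = greedy_merge(counts)
print(result)
```

Each pass only compares pairs of current stages, so the work grows roughly quadratically per merge step, whereas the number of possible partitions of the contexts grows much faster. The trade-off is that a greedy search can stop at a local optimum, which is one reason alternative clustering methods are worth discussing.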

Concluding his presentation, Leonelli introduced his R library, a resource developed by his team to facilitate the application of staged trees in research. He acknowledged the computational demands of optimized staged tree analysis, underscoring his ongoing efforts to improve the algorithm. Researchers interested in further reading can follow this link: https://arxiv.org/abs/2004.06459