Speaker
Description
Early success of Deep Reinforcement Learning (DRL) was rooted in arcade and board games, where expert behavior could be readily captured from top players. In these settings, demonstrations were used to bootstrap learning and accelerate policy convergence. In contrast, for combinatorial optimization problems such as the Flexible Job-shop Scheduling Problem (FJSP), optimal demonstrations are costly to obtain. In this work, we build on a state-of-the-art DRL framework to investigate how the quality and diversity of demonstrations drawn from FJSP solutions affect learning dynamics and policy generalization. We argue that representativeness of the action space is more beneficial for pretraining than strict optimality. To that end, we consider an efficient Constraint Programming (CP) method and several composite heuristic dispatching rules as candidate experts. These experts are evaluated on final policy performance, generalization to unseen instances, and the time required to gather expert FJSP solutions. Preliminary results show that agents pretrained with diverse sub-optimal demonstrations converge faster to near-optimal policies than those trained solely on solver-based solutions. Moreover, combining CP and heuristic demonstrations yields superior robustness on unseen instances. These findings suggest that diversity and representativeness in expert behavior may be more critical than optimality alone.
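To make the pretraining idea concrete, the sketch below illustrates one common way such demonstrations can be used: behavior cloning over (state, action) pairs collected from both a CP solver and heuristic dispatching rules, before any reinforcement learning fine-tuning. This is a minimal, hypothetical example and not the speaker's implementation; the network, dimensions, and function names (e.g. `PolicyNet`, `pretrain_on_demonstrations`) are assumptions, and the random tensors stand in for demonstrations that would in practice be decoded from FJSP schedules.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical dimensions: each state encodes features of a partial FJSP
# schedule, each action indexes an (operation, machine) assignment.
STATE_DIM, NUM_ACTIONS = 64, 50

class PolicyNet(nn.Module):
    """Small MLP policy standing in for the DRL framework's actor."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM, 128), nn.ReLU(),
            nn.Linear(128, NUM_ACTIONS),
        )

    def forward(self, states):
        return self.net(states)  # action logits

def pretrain_on_demonstrations(policy, demo_states, demo_actions,
                               epochs=10, lr=1e-3):
    """Behavior cloning: fit the policy to expert (state, action) pairs
    gathered from CP solutions and/or heuristic dispatching rules."""
    loader = DataLoader(TensorDataset(demo_states, demo_actions),
                        batch_size=256, shuffle=True)
    opt = torch.optim.Adam(policy.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for states, actions in loader:
            opt.zero_grad()
            loss = loss_fn(policy(states), actions)
            loss.backward()
            opt.step()
    return policy

if __name__ == "__main__":
    # Placeholder demonstrations: in practice these would be extracted from
    # CP solver schedules and composite-rule rollouts on FJSP instances.
    cp_states = torch.randn(1_000, STATE_DIM)
    cp_actions = torch.randint(0, NUM_ACTIONS, (1_000,))
    heur_states = torch.randn(4_000, STATE_DIM)
    heur_actions = torch.randint(0, NUM_ACTIONS, (4_000,))

    # Mixing both expert sources gives broader action coverage than CP
    # solutions alone, which is the effect the abstract investigates.
    states = torch.cat([cp_states, heur_states])
    actions = torch.cat([cp_actions, heur_actions])
    pretrained = pretrain_on_demonstrations(PolicyNet(), states, actions)
```

Under this reading, the trade-off studied in the talk is which demonstrations to feed into such a pretraining stage: costly near-optimal CP schedules, cheap but sub-optimal heuristic rollouts, or a mixture of both.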