This Thematic Programme will foster research on learning models and algorithms for settings in which, in contrast to supervised learning, information about the correct predictions is not immediately available to the learner. The assumption of full information about a training instance is often unrealistic, and in many applications the learner must deal with limited feedback. Although some aspects of learning with limited feedback have already been thoroughly analyzed (e.g., multi-armed bandit problems), many problems are still open.
Among others, the following topics are relevant for this Thematic Programme:
- Reinforcement learning as a model of delayed feedback, where the utility of predictions/actions might be revealed only after a number of further predictions.
- Variants of the bandit problem as models of partial feedback, where only the utility of the learner's predictions is available but not the utility of possible alternative predictions.
- Models of indirect feedback, where neither the true outcome nor the utility of the prediction is observed, but only indirect feedback loosely related to the prediction.
- In general, the exploration-exploitation trade-off in learning models.
- Semi-supervised and active learning.
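
As a purely illustrative sketch of the partial-feedback setting and the exploration-exploitation trade-off listed above, the following simulates a Bernoulli multi-armed bandit played with an epsilon-greedy rule. The function name `run_epsilon_greedy`, the arm means, and the parameter values are assumptions chosen for illustration; epsilon-greedy is only one simple strategy among many and is not prescribed by the Programme.

```python
import random


def run_epsilon_greedy(arm_means, steps=5000, epsilon=0.1, seed=0):
    """Play a Bernoulli bandit; return (estimates, pull counts, total reward).

    arm_means are hypothetical success probabilities, hidden from the learner.
    """
    rng = random.Random(seed)
    n_arms = len(arm_means)
    estimates = [0.0] * n_arms  # running mean reward observed per arm
    counts = [0] * n_arms       # number of times each arm was pulled
    total_reward = 0.0

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: pick an arm uniformly at random.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the arm with the highest current estimate.
            arm = max(range(n_arms), key=lambda a: estimates[a])

        # Partial feedback: only the reward of the chosen arm is revealed;
        # what the other arms would have paid is never observed.
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0

        counts[arm] += 1
        # Incremental update of the running mean for the chosen arm.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward

    return estimates, counts, total_reward


if __name__ == "__main__":
    est, cnt, total = run_epsilon_greedy([0.2, 0.5, 0.8])
    print("estimates:", [round(e, 3) for e in est])
    print("pull counts:", cnt)
    print("total reward:", total)
```

A larger `epsilon` spends more pulls on gathering information about all arms, while a smaller one commits earlier to the arm that currently looks best, which is the trade-off in its simplest form.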
For more information, please visit the webpage of this Thematic Programme.