Exercises for Lecture 1
Step 1: Get some data
Choose a dataset for regression or classification; any dataset you like will do.
For suggestions, take a look at the UC Irvine ML Repository and filter for datasets with 2000+ samples that are easily importable in Python.
Split your data into three parts: a training split (where the ML models will be trained), a validation/calibration split (where conformal prediction calibration will happen), and a test split (where performance metrics will be estimated).
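A minimal sketch of the three-way split, using the California housing data from scikit-learn as an illustrative stand-in for whichever dataset you choose (the 50/25/25 proportions are likewise an assumption, not a requirement):

```python
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split

X, y = fetch_california_housing(return_X_y=True)

# First carve off the training split, then divide the remainder
# into calibration and test splits of equal size.
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.5, random_state=0)
X_cal, X_test, y_cal, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)
```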
Step 2: Modelling and Conformal Prediction
The following is a list of exercises. There is one 'basic' exercise, which we recommend completing in full, as well as additional exercises to deepen your understanding according to your interests.
Basic exercise:
- Train an ML model on the training set. Use your favorite model! Ideally one that gives an estimate of uncertainty (e.g., probabilistic classification, quantile regression).
- Choose one or more miscoverage levels $\alpha$ (e.g., $\alpha = 0.1$). Pick a conformity score and use it to do the split conformal calibration on the calibration set, finding the threshold $\hat{q}$. (An end-to-end sketch follows this list.)
- Implement a way to present the predictive sets to the user, e.g., as intervals in the case of regression.
- Calculate the coverage probability in the test set: $\frac{1}{n_{\text{test}}} \sum_{i=1}^{n_{\text{test}}} \mathbb{1}\{y_i \in \hat{C}(x_i)\}$. How far is it from the specified $1 - \alpha$?
- Calculate the average set size in the test set: $\frac{1}{n_{\text{test}}} \sum_{i=1}^{n_{\text{test}}} |\hat{C}(x_i)|$. Are your predictive sets informative enough?
- Typically there are different levels of uncertainty for different covariates $x$. How do your intervals change as $x$ changes?
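A minimal sketch of the basic exercise, under illustrative assumptions: a regression task, a random forest as the base model, the absolute-residual conformity score $s(x, y) = |y - \hat{f}(x)|$, and $\alpha = 0.1$. Variable names continue from the splitting sketch in Step 1.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

alpha = 0.1  # target miscoverage level (an illustrative choice)

# Train a model on the training split.
model = RandomForestRegressor(random_state=0).fit(X_train, y_train)

# Split conformal calibration: absolute-residual scores on the calibration
# split, then the ceil((n+1)(1-alpha))-th smallest score as the threshold.
scores = np.abs(y_cal - model.predict(X_cal))
n = len(scores)
k = int(np.ceil((n + 1) * (1 - alpha)))  # requires k <= n, i.e., enough calibration points
q_hat = np.sort(scores)[k - 1]

# Present the predictive sets as intervals [f_hat(x) - q_hat, f_hat(x) + q_hat].
preds = model.predict(X_test)
lower, upper = preds - q_hat, preds + q_hat

# Empirical coverage and average interval length on the test split.
coverage = np.mean((y_test >= lower) & (y_test <= upper))
avg_length = np.mean(upper - lower)
print(f"coverage = {coverage:.3f} (target {1 - alpha:.2f}), avg length = {avg_length:.3f}")
```

Note that with the absolute-residual score the intervals all have the same width $2\hat{q}$, so they cannot adapt to different levels of uncertainty across covariates; addressing the last bullet requires a score that varies with $x$ (e.g., a normalized residual or a quantile-regression-based score).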
Additional exercises:
- Implement the conformity score proposed by [Huang et al., 2024] for multi-class classification.
- Adapt the conformity score for conformalized quantile regression to leverage the relaxed quantile regression method of [Pouplin et al., 2024].
- The inclusion of the $+1$ in the quantile computation (i.e., using $\lceil (n+1)(1-\alpha) \rceil$ rather than $\lceil n(1-\alpha) \rceil$) is a crucial part of split conformal prediction. What happens to the marginal coverage if it is not included? Feel free to assume that the conformity scores are almost surely unique. (Suggestion: try to prove an alternate version of the quantile lemma, i.e., Lemma 1 of [Tibshirani et al., 2020]. An empirical check is sketched at the end of this sheet.)
- Hard: Suppose $Y = f^*(X)$ for some function $f^*$ and consider the conformity score $s(x, y) = |y - \hat{f}(x)|$. We know that if $\hat{f} = f^*$, then our predictive sets will be singletons, thus ensuring tightness. Can you prove something when $\hat{f} \neq f^*$? (E.g., under some assumption of closeness of $\hat{f}$ to $f^*$.)
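For the exercise on the $+1$ correction, a quick simulation can complement the proof. This sketch assumes i.i.d. standard Gaussian conformity scores and compares the marginal coverage of the corrected and uncorrected thresholds; a small calibration set ($n = 20$) makes the gap visible.

```python
import numpy as np

rng = np.random.default_rng(0)
alpha, n_cal, n_trials = 0.1, 20, 100_000

k_corrected = int(np.ceil((n_cal + 1) * (1 - alpha)))  # with the +1 correction
k_plain = int(np.ceil(n_cal * (1 - alpha)))            # naive empirical quantile

hits_corrected = hits_plain = 0
for _ in range(n_trials):
    scores = np.sort(rng.standard_normal(n_cal))  # exchangeable calibration scores
    s_test = rng.standard_normal()                # exchangeable test score
    hits_corrected += s_test <= scores[k_corrected - 1]
    hits_plain += s_test <= scores[k_plain - 1]

print(f"coverage with the +1 correction: {hits_corrected / n_trials:.4f}")
print(f"coverage without it:             {hits_plain / n_trials:.4f}  (target {1 - alpha})")
```

With these sizes the corrected threshold covers with probability $19/21 \approx 0.905 \geq 1 - \alpha$, while the uncorrected one covers with probability $18/21 \approx 0.857 < 1 - \alpha$, illustrating why the correction is needed.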