Brier Score on Self Improvement

Inspired by the book Superforecasting, I'm curious about my ability to forecast myself. Like the best curiosities, this one will be settled with an experiment.

Anything that's worth improving is worth measuring, and one way to measure the accuracy of probabilistic predictions is the Brier Score.
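For reference, the binary Brier Score is the mean squared difference between each forecast probability and what actually happened (1 if the event occurred, 0 if not). Lower is better: 0 is perfect, and always answering 50% earns 0.25. Here is a minimal sketch in Python; the function name and the sample numbers are made up for illustration and are not my actual predictions.

```python
def brier_score(forecasts, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    if len(forecasts) != len(outcomes):
        raise ValueError("forecasts and outcomes must have the same length")
    if not forecasts:
        raise ValueError("need at least one prediction")
    squared_errors = [(f - o) ** 2 for f, o in zip(forecasts, outcomes)]
    return sum(squared_errors) / len(squared_errors)


# Hypothetical example: three predictions and what actually happened.
forecasts = [0.9, 0.6, 0.2]  # probabilities written down in advance
outcomes = [1, 0, 0]         # 1 = it happened, 0 = it did not
score = brier_score(forecasts, outcomes)
print(round(score, 4))
```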

I hope to get a better understanding of my understanding of myself while exercising prototyping and just-do-the-thing skills.

Experimental Design

I spent a little over an hour making a list of things I hope, wish, or expect to do in the coming week. Each item can be evaluated objectively and is binary (either true or false).

At the end of each day-half, and at the end of the week, I'll record what actually happened.

After that, I'll calculate an aggregate Brier Score and write up my notes from the week.


The process of making this list and scoring it revealed a number of shortcomings in the approach, which can be addressed in follow-up experiments.

The quantization of predictions (1% granularity) doesn't match my feelings, which are pretty vague and fuzzy compared to the column of numbers at the bottom of this post.
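One possible fix is to snap each number to a coarser grid that better matches how fuzzy the feeling is. The sketch below rounds whole-percent predictions to the nearest multiple of a step; the helper name, the 10% step, and the sample values are all assumptions for illustration, not part of the actual spreadsheet.

```python
def coarsen_percent(percent, step=10):
    """Round a whole-number percentage to the nearest multiple of `step`.

    Uses integer arithmetic, so there are no floating point surprises.
    Ties round up when `step` is even (65 -> 70 with step=10).
    """
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return (percent + step // 2) // step * step


# Hypothetical gut-feeling numbers written at 1% granularity.
fine_grained = [67, 93, 12, 50]
coarse = [coarsen_percent(p) for p in fine_grained]
print(coarse)
```

Scoring both the fine and the coarse versions of the same predictions would show whether the extra digits carry any real information.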

The basis for the predictions is almost entirely "gut feeling", loosely informed by experience. Even with CFAR techniques, I hesitate to make these quantitative judgements without hard data.

My format was in half-days, so if an activity occurs in every day-half, I made ten predictions for it. It doesn't seem valuable to generate that many predictions, so next time I'll replace them with a single coarser-grained prediction backed by a bunch of observations (e.g. "4/5 Morning Workouts").
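To get a feel for how a per-session gut number translates into a coarser prediction, here is a rough sketch that reads "4/5 Morning Workouts" as "at least 4 of 5". It assumes every morning has the same probability and that mornings are independent of each other, which is almost certainly too generous. The function name and the 80% figure are hypothetical.

```python
from math import comb


def prob_at_least(k, n, p):
    """Probability of at least k successes in n independent tries,
    each succeeding with probability p (binomial model)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical: I feel 80% sure about any single morning workout.
p_single_morning = 0.8
p_four_of_five = prob_at_least(4, 5, p_single_morning)
print(round(p_four_of_five, 3))
```

Under those assumptions, feeling 80% sure about each morning only supports being about 74% sure of hitting at least four out of five.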

Making events binary simplifies the predictions and observations, but leaves out cases where multiple-outcome predictions make sense. I expect these kinds of predictions to fit well with resource-constrained choices. If I see "Friday Night" as a limited resource, I could assign probabilities to different mutually exclusive events.
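The Brier Score extends naturally to this case: sum the squared error over every possible outcome, treating the one that happened as 1 and the rest as 0. A minimal sketch follows, with invented options and probabilities for a hypothetical Friday night.

```python
def multi_outcome_brier(forecast, actual):
    """Brier Score for one event with several mutually exclusive outcomes.

    `forecast` maps each outcome to its predicted probability (summing to 1).
    `actual` is the outcome that happened. Ranges from 0 (perfect) to 2.
    """
    if abs(sum(forecast.values()) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    if actual not in forecast:
        raise ValueError("actual outcome was not one of the forecast options")
    return sum(
        (p - (1.0 if option == actual else 0.0)) ** 2
        for option, p in forecast.items()
    )


# Hypothetical allocation of one Friday night across exclusive options.
friday_night = {
    "dinner with friends": 0.5,
    "stay in and read": 0.3,
    "work late": 0.2,
}
friday_score = multi_outcome_brier(friday_night, "stay in and read")
print(round(friday_score, 2))
```

Note that for a two-outcome event this form gives exactly twice the usual binary score, so the two kinds of score shouldn't be averaged together without rescaling.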

The layout has 5 predictions per day-half and 10 week-long predictions. I felt a very strong desire to fill every slot, and very little desire to add more once it was full. Future designs should make it easier to add or remove predictions until the count is appropriate (but still experimentally rigorous!).

I feel like I expressed only a tiny part of what I hoped to accomplish in a week, yet the thought of doing everything on the list feels like more than a week's worth of effort. This might indicate the very miscalibration in self-modeling that this experiment hopes to address, or that the current experimental design is a bit depressing to get through.

Dependencies are poorly expressed in this -- separate predictions are assumed to be independent events. This fails particularly badly in the "Bike Commute to work" predictions -- there's very little chance of riding my bike home if I took the train to work.
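One way to express the dependency is to predict the ride home conditionally on the ride in, then combine the pieces. The sketch below uses the law of total probability; all three input numbers are invented for illustration, and the function name is hypothetical.

```python
def prob_bike_home(p_bike_to_work, p_home_if_biked, p_home_if_train):
    """Overall chance of biking home, combining both morning scenarios."""
    return (
        p_bike_to_work * p_home_if_biked
        + (1 - p_bike_to_work) * p_home_if_train
    )


# Hypothetical gut numbers.
p_bike_in = 0.6        # chance I bike to work
p_home_if_biked = 0.95  # chance I bike home, given the bike is at work
p_home_if_train = 0.02  # chance I bike home, given I took the train in

p_home = prob_bike_home(p_bike_in, p_home_if_biked, p_home_if_train)

# Chance of biking both ways: correct version vs. wrongly assuming independence.
p_both_dependent = p_bike_in * p_home_if_biked
p_both_independent = p_bike_in * p_home

print(round(p_home, 3), round(p_both_dependent, 3), round(p_both_independent, 3))
```

With these numbers, treating the two rides as independent would put the chance of biking both ways at roughly 35%, when the conditional version says 57%.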

I'll be keeping other notes throughout the week to see what works.