surprise = information content
Bootstrap
one bootstrap sample: draw n points from the original dataset of n points, uniformly at random with replacement (so some points repeat and some are left out).
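A minimal numpy sketch of drawing one bootstrap sample (the dataset X, y here is made up purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))        # toy dataset: 100 points, 5 features
y = rng.integers(0, 2, size=100)     # toy binary labels

# draw n indices uniformly at random, WITH replacement, from the n original points
idx = rng.choice(len(X), size=len(X), replace=True)
X_boot, y_boot = X[idx], y[idx]      # some points repeat, some never appear
```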
Before bootstrapping, I can only fit one decision tree (on the one original dataset). After bootstrapping, I can fit many (hopefully independent) decision trees, one per bootstrap sample.
During prediction: run the new point through every tree and take the mode of the individual predictions (for regression, take the average).
mode = value that appears most frequently in a dataset.
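A sketch of the whole bagging loop on made-up data, assuming sklearn's DecisionTreeClassifier; prediction is just the mode of the per-tree votes:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(int)      # made-up labels

T = 50
trees = []
for _ in range(T):
    idx = rng.choice(len(X), size=len(X), replace=True)          # one bootstrap sample per tree
    trees.append(DecisionTreeClassifier(random_state=0).fit(X[idx], y[idx]))

# prediction: every tree votes, the mode of the votes wins
votes = np.stack([t.predict(X) for t in trees])                  # shape (T, n_points)
y_hat = np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)
print("training accuracy:", (y_hat == y).mean())
```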
Bootstrapping doesn’t increase the BIAS, because the expected value of a bootstrap sample is the same as that of the original dataset. But it does decrease the variance.
After bootstrapping, the decision trees (hopefully) won’t be correlated, i.e. their correlation won’t be 1.
The variance of (the average of the decision trees built on the bootstrap samples) will be lower than the variance of any single tree.
The problem in real life, though, is that the decision trees built on the bootstrap samples are still somewhat correlated (the samples overlap heavily, since each one contains roughly 63% of the original points).
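A standard identity makes this precise: if every tree has variance σ² and any two trees have pairwise correlation ρ, then

$$
\mathrm{Var}\!\left(\frac{1}{T}\sum_{t=1}^{T} f_t(x)\right) \;=\; \rho\,\sigma^2 \;+\; \frac{1-\rho}{T}\,\sigma^2 .
$$

As T grows the second term vanishes, but ρσ² stays, so whatever correlation survives bootstrapping puts a floor on the variance reduction.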
There’s also something called feature bagging: at each split, only a random subset of the features is eligible, which de-correlates the trees further.
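Bagging plus feature bagging is exactly what a random forest (next section) does; in sklearn it shows up as the max_features argument. A small sketch with made-up data and m = 2 of d = 5 features per split:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# 50 bootstrap trees; each split may only consider 2 of the 5 features
forest = RandomForestClassifier(n_estimators=50, max_features=2, random_state=0).fit(X, y)
print("training accuracy:", forest.score(X, y))
```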
🌳 Random Forests
Question: Consider n training points in a feature space of d dimensions. Consider building a random forest with T binary trees, each having exactly h internal nodes. Let m be the number of features randomly selected (from among d input features) at each tree node. For this setting, compute the probability that a certain feature (say, the first feature) is never considered for splitting in any tree node in the forest.
Solution:
Prob of not considering feature i in a single node = (d − m)/d = 1 − m/d (feature i must be absent from the m features drawn without replacement out of d).
Prob of not considering feature i in the forest = (1 − m/d)^{T·h}, since there are T trees with h internal nodes each (T·h nodes in total) and the feature subset is drawn independently at every node.
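A quick Monte Carlo check of that closed form, with small made-up values of d, m, T, h so the event isn’t vanishingly rare:

```python
import numpy as np

rng = np.random.default_rng(0)
d, m, T, h = 10, 3, 3, 3          # made-up sizes
trials = 100_000

never = 0
for _ in range(trials):
    # one simulated forest: T*h nodes, each picks m of the d features without replacement
    hit = any(0 in rng.choice(d, size=m, replace=False) for _ in range(T * h))
    never += not hit

print("simulated  :", never / trials)
print("closed form:", (1 - m / d) ** (T * h))   # 0.7**9 ≈ 0.0404
```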
Stump: a decision tree with only one split (a single internal node).
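In sklearn a stump is just a depth-1 tree (sketch with made-up data):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = (X[:, 0] > 0).astype(int)

stump = DecisionTreeClassifier(max_depth=1).fit(X, y)   # exactly one split
print("depth:", stump.get_depth())                      # -> 1
```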