surprise = information content
Bootstrap
one bootstrap sample: draw n points from the original dataset of n points, uniformly at random with replacement (so some points repeat and some are left out).
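A minimal numpy sketch of drawing one bootstrap sample (the dataset X, y here is made up purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))        # toy dataset: 100 points, 5 features
y = rng.integers(0, 2, size=100)     # toy binary labels

# draw n indices uniformly at random, WITH replacement, from the n original points
idx = rng.choice(len(X), size=len(X), replace=True)
X_boot, y_boot = X[idx], y[idx]      # some points repeat, some never appear
```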
Before bootstrapping, I can only fit one decision tree (on the one original dataset). After bootstrapping, I can fit many (hopefully independent) decision trees, one per bootstrap sample.
During prediction: run the new point through every tree and take the mode of the individual predictions (for regression, take the average).
mode = value that appears most frequently in a dataset.
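A sketch of the whole bagging loop on made-up data, assuming sklearn's DecisionTreeClassifier; prediction is just the mode of the per-tree votes:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(int)      # made-up labels

T = 50
trees = []
for _ in range(T):
    idx = rng.choice(len(X), size=len(X), replace=True)          # one bootstrap sample per tree
    trees.append(DecisionTreeClassifier(random_state=0).fit(X[idx], y[idx]))

# prediction: every tree votes, the mode of the votes wins
votes = np.stack([t.predict(X) for t in trees])                  # shape (T, n_points)
y_hat = np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)
print("training accuracy:", (y_hat == y).mean())
```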
Bootstrapping doesn’t increase the BIAS, because the expected value of a bootstrap sample is the same as that of the original dataset. But it does decrease the variance.
After bootstrapping, the decision trees (hopefully) won’t be correlated, i.e. their correlation won’t be 1.
The variance of (the average of the decision trees built on the bootstrap samples) will be lower than the variance of any single tree.
The problem in real life, though, is that the decision trees built on the bootstrap samples are still somewhat correlated (the samples overlap heavily, since each one contains roughly 63% of the original points).
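A standard identity makes this precise: if every tree has variance σ² and any two trees have pairwise correlation ρ, then

$$
\mathrm{Var}\!\left(\frac{1}{T}\sum_{t=1}^{T} f_t(x)\right) \;=\; \rho\,\sigma^2 \;+\; \frac{1-\rho}{T}\,\sigma^2 .
$$

As T grows the second term vanishes, but ρσ² stays, so whatever correlation survives bootstrapping puts a floor on the variance reduction.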
There’s also something called feature bagging: at each split, only a random subset of the features is eligible, which de-correlates the trees further.
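Bagging plus feature bagging is exactly what a random forest (next section) does; in sklearn it shows up as the max_features argument. A small sketch with made-up data and m = 2 of d = 5 features per split:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# 50 bootstrap trees; each split may only consider 2 of the 5 features
forest = RandomForestClassifier(n_estimators=50, max_features=2, random_state=0).fit(X, y)
print("training accuracy:", forest.score(X, y))
```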
🌳 Random Forests
Question: Consider n training points in a feature space of d dimensions. Consider building a random forest with T binary trees, each having exactly h internal nodes. Let m be the number of features randomly selected (from among d input features) at each tree node. For this setting, compute the probability that a certain feature (say, the first feature) is never considered for splitting in any tree node in the forest.
Solution:
Prob of not considering feature i in a single node = (d − m)/d = 1 − m/d (feature i must be absent from the m features drawn without replacement out of d).
Prob of not considering feature i in the forest = (1 − m/d)^{T·h}, since there are T trees with h internal nodes each (T·h nodes in total) and the feature subset is drawn independently at every node.
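A quick Monte Carlo check of that closed form, with small made-up values of d, m, T, h so the event isn’t vanishingly rare:

```python
import numpy as np

rng = np.random.default_rng(0)
d, m, T, h = 10, 3, 3, 3          # made-up sizes
trials = 100_000

never = 0
for _ in range(trials):
    # one simulated forest: T*h nodes, each picks m of the d features without replacement
    hit = any(0 in rng.choice(d, size=m, replace=False) for _ in range(T * h))
    never += not hit

print("simulated  :", never / trials)
print("closed form:", (1 - m / d) ** (T * h))   # 0.7**9 ≈ 0.0404
```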
Stump: a decision tree with only one split (a single internal node).
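In sklearn a stump is just a depth-1 tree (sketch with made-up data):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = (X[:, 0] > 0).astype(int)

stump = DecisionTreeClassifier(max_depth=1).fit(X, y)   # exactly one split
print("depth:", stump.get_depth())                      # -> 1
```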