Boosting
What is boosting?
Boosting builds a model sequentially: fit a model, see where it performs poorly, then fit the next model to improve on those specific errors.
What is Gradient Boosting?
- Gradient Boosting: a special case of boosting in which the errors are minimized by a gradient descent algorithm
- Different from [[AdaBoost AdaBoost]]: AdaBoost reweights the training examples at each round, while GBMs fit each new tree to the gradient of the loss (for squared error, the residuals)
- GBMs build slightly larger individual trees than the stumps used in AdaBoost
- GBMs scale every tree by the same amount (the learning rate), whereas AdaBoost scales each tree according to the relative amount of influence that tree has
Step by Step example 1 (Regression)
- Start by predicting the average of the continuous target values: $\hat{y}'$
- Compute the residuals: $Residual = (Observed\space value - Predicted\space value\space in\space the\space previous\space step)$, i.e. $r = y - \hat{y}'$
- Fit a tree to the residuals and average the residuals in each leaf node to get $\hat{y}''$ (the value after fitting the model and averaging the leaf nodes)
- The combined prediction is $\hat{y} = \hat{y}' + (\lambda \cdot \hat{y}'')$, where $\lambda$ is the learning rate
- Repeat steps 2-4, taking $\hat{y}$ as the new predicted value $\hat{y}'$
- After each step, the scaled predictions obtained from fitting the residuals are added to the overall prediction
- It works with any base model, as long as the loss function is differentiable so that the algorithm can minimize it
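A short derivation of why fitting residuals counts as gradient descent. It assumes the squared-error loss with the conventional factor of $\tfrac{1}{2}$, which these notes do not state explicitly; $F$ stands for the current prediction.

$$
L(y, F) = \tfrac{1}{2}(y - F)^2
\quad\Rightarrow\quad
-\frac{\partial L}{\partial F} = y - F
$$

So the residual $y - F$ is exactly the negative gradient of the loss with respect to the current prediction, and adding a scaled tree fitted to the residuals is a gradient descent step. For another differentiable loss, the tree would be fitted to that loss's negative gradient instead (often called pseudo-residuals).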
Working example
![[Assets/gbm_1.png]]
General equation
- $F_0 = average(y)$
- Residual $R_n = (y - F_{n})$
- Fit the tree $H_n$ on $R_n$
- $F_{n+1}=F_n + \lambda*H_n$
![[notes/images/gbm_2.png]]
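The general equation can be sketched in plain Python. This is a minimal illustration, not a reference implementation: it assumes a single numeric feature with at least two distinct values, uses depth-1 trees (stumps) for $H_n$, and the names `fit_stump`, `gradient_boost` and `predict`, as well as the toy data, are made up for this note.

```python
from statistics import mean


def fit_stump(x, r):
    """Fit a depth-1 regression tree to residuals r on a 1-D feature x.

    Returns a function mapping a feature value to the mean residual of its leaf.
    Assumes x contains at least two distinct values.
    """
    best = None
    xs = sorted(set(x))
    # Candidate thresholds: midpoints between consecutive distinct values.
    for lo, hi in zip(xs, xs[1:]):
        t = (lo + hi) / 2
        left = [ri for xi, ri in zip(x, r) if xi <= t]
        right = [ri for xi, ri in zip(x, r) if xi > t]
        lm, rm = mean(left), mean(right)
        sse = sum((ri - lm) ** 2 for ri in left) + sum((ri - rm) ** 2 for ri in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    _, t, lm, rm = best
    return lambda xi: lm if xi <= t else rm


def gradient_boost(x, y, n_rounds=50, lr=0.1):
    f0 = mean(y)                                   # F_0 = average(y)
    preds = [f0] * len(y)
    trees = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, preds)]      # R_n = y - F_n
        h = fit_stump(x, residuals)                            # fit H_n on R_n
        trees.append(h)
        preds = [pi + lr * h(xi) for xi, pi in zip(x, preds)]  # F_{n+1} = F_n + lr * H_n
    return f0, trees, preds


def predict(f0, trees, lr, xi):
    """Prediction for a new point: initial value plus all scaled tree outputs."""
    return f0 + lr * sum(h(xi) for h in trees)


# Hypothetical toy data.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.2, 1.9, 3.1, 3.9, 5.2, 6.1]

f0, trees, preds = gradient_boost(x, y, n_rounds=50, lr=0.1)
mse_start = mean((yi - f0) ** 2 for yi in y)
mse_end = mean((yi - pi) ** 2 for yi, pi in zip(y, preds))
print(mse_start, mse_end)
```

The training error after boosting should be lower than the error of the constant prediction $F_0$, since every round moves the predictions a small step toward the residuals.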
What is XGBoost?
- XGBoost = eXtreme Gradient Boosting
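- XGBoost: largely a software- and hardware-optimized implementation of Gradient Boosting
- There are a few important parameters:
    - $\lambda$ is used for regularization
    - $\gamma$ is used for pruning
    - $\eta$ is the scaling parameter (learning rate) for each tree
- Cover: the denominator of the similarity score without $\lambda$. For regression it is the number of residuals in the node; for classification it is $\sum p_i(1 - p_i)$ over the previously predicted probabilities $p_i$. A minimum cover limits how far the tree can grow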
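For orientation, this is how the symbols above map onto parameter names in the XGBoost library's native interface, as far as I know. The dictionary is only a sketch: the values are what I understand to be the documented defaults, so check them against the version you use.

```python
# Sketch of a native-interface parameter dictionary (values assumed to be defaults).
params = {
    "eta": 0.3,               # learning rate: scales each tree's contribution
    "gamma": 0.0,             # minimum gain required to keep a split (pruning)
    "lambda": 1.0,            # L2 regularization term in the similarity score
    "min_child_weight": 1.0,  # minimum cover required in a child node
}
```

A dictionary like this is what would typically be passed as the parameter argument when training with the native API.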
Step by Step (Regression)
- Make an initial prediction of 0.5
- Instead of a regular regression tree, XGBoost builds individual trees that are slightly different
- Fit a tree and calculate the similarity scores of all the nodes
    - Similarity score $S = \frac{(\sum_n Residuals)^2}{n + \lambda}$, where $n$ is the number of residuals in the node
- Calculate the gain of each split as the change in similarity score from the parent node to its child nodes: $Gain = S_{left} + S_{right} - S_{parent}$
- Once the tree is grown to the desired depth, prune it from the bottom up based on the $\gamma$ value: a split is removed when $Gain - \gamma < 0$
- $\eta$ is the learning rate for scaling individual trees
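The similarity, gain and pruning steps above can be checked by hand with a few lines of Python. This is a sketch under stated assumptions: the residuals and the values of $\lambda$, $\gamma$ and $\eta$ are made up, and the leaf output formula (sum of residuals divided by $n + \lambda$) is the usual one for squared error, which these notes do not state.

```python
def similarity(residuals, lam):
    """S = (sum of residuals)^2 / (n + lambda)."""
    return sum(residuals) ** 2 / (len(residuals) + lam)


def gain(left, right, lam):
    """Gain = S_left + S_right - S_parent."""
    return similarity(left, lam) + similarity(right, lam) - similarity(left + right, lam)


def leaf_output(residuals, lam):
    """Leaf value for squared error (assumed formula): sum / (n + lambda)."""
    return sum(residuals) / (len(residuals) + lam)


# Hypothetical residuals after an initial prediction of 0.5.
left = [-10.5]
right = [6.5, 7.5, -7.5]
lam, gamma, eta = 0.0, 130.0, 0.3

g = gain(left, right, lam)
keep_split = (g - gamma) > 0          # prune the split when Gain - gamma < 0

# New prediction for a sample that lands in the right leaf (if the split is kept).
new_prediction = 0.5 + eta * leaf_output(right, lam)
print(g, keep_split, new_prediction)
```

With these numbers the gain is smaller than $\gamma$, so the split would be pruned; lowering $\gamma$ below the gain would keep it. Raising $\lambda$ shrinks the similarity scores, which makes pruning more likely.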
Special things about XGBoost
- Sparsity Awareness: learns, for each split, the default direction in which to send missing values
- Weighted Quantile Sketch: efficiently finds approximate candidate split points on large datasets
- Cross Validation: built-in CV
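The idea behind sparsity awareness can be sketched as follows: at a candidate split, try sending all rows with a missing feature value to the left child, then to the right child, and keep the direction with the higher gain. This is a simplified illustration, not the library's code; `best_default_direction`, the data, and the choice of `None` to mark missing values are all assumptions made for this note.

```python
def similarity(residuals, lam=1.0):
    """S = (sum of residuals)^2 / (n + lambda); an empty node scores 0."""
    if not residuals:
        return 0.0
    return sum(residuals) ** 2 / (len(residuals) + lam)


def best_default_direction(values, residuals, threshold, lam=1.0):
    """Pick the side for missing values (None) that maximizes the gain."""
    pairs = list(zip(values, residuals))
    left = [r for v, r in pairs if v is not None and v < threshold]
    right = [r for v, r in pairs if v is not None and v >= threshold]
    missing = [r for v, r in pairs if v is None]

    parent = similarity(left + right + missing, lam)
    gain_left = similarity(left + missing, lam) + similarity(right, lam) - parent
    gain_right = similarity(left, lam) + similarity(right + missing, lam) - parent
    if gain_left >= gain_right:
        return "left", gain_left
    return "right", gain_right


# Hypothetical feature values (None = missing) and their residuals.
values = [1.0, 2.0, None, 8.0, 9.0, None]
residuals = [-4.0, -5.0, -4.5, 6.0, 5.0, -3.5]

direction, split_gain = best_default_direction(values, residuals, threshold=5.0)
print(direction, split_gain)
```

Here the rows with missing values have residuals similar to the left group, so sending them left gives the higher gain. At prediction time, a row with a missing value would follow the stored default direction.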