2022-06-05 ~2 min read

Boosting

What is boosting?

Boosting means improving a model sequentially: fit a model, see where it falls short, and fit the next model to improve that specific part.

Gradient Boosting: Special case of boosting where errors are minimized by the gradient descent algorithm
Differences from [[AdaBoost AdaBoost]]:
- AdaBoost reweights the training examples after each round, while GBMs fit each new tree to the negative gradient of the loss with respect to the current predictions (for squared error, the residuals)
- GBMs grow slightly larger individual trees than the stumps used in AdaBoost
- GBMs scale every tree by the same amount, while AdaBoost scales each tree according to the relative amount of influence that tree has

1. Start by predicting the average of the continuous target values: $\hat{y}'$.
2. Compute the residuals, $Residual = (Observed\space value - Predicted\space value\space in\space the\space previous\space step)$, that is, $r = y - \hat{y}'$.
3. Fit a tree to the residuals and average the values in each leaf node: $\hat{y}''$ (the value after fitting the model and averaging the leaf nodes).
4. The combined prediction is $\hat{y} = \hat{y}' + (\lambda \cdot \hat{y}'')$, where $\lambda$ is the learning rate.
5. Repeat steps 2-4, taking $\hat{y}$ as the new predicted value $\hat{y}'$.
After each step, the predictions from the tree fit to the residuals are added to the overall prediction.
Gradient boosting can work with any model, as long as there is a differentiable loss function for the algorithm to minimize.
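
The procedure above can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: it assumes squared-error loss, a single numeric feature, and one-split trees (stumps) as the weak learner. The function names (`fit_stump`, `gradient_boost`, `predict`) are made up for this note.

```python
from statistics import mean

def fit_stump(xs, residuals):
    """Fit a one-split regression tree to the residuals.

    Returns (threshold, left_value, right_value); each leaf value is the
    average residual of the samples that fall into that leaf.
    """
    best = None
    for threshold in sorted(set(xs))[:-1]:  # candidate split points
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        lv, rv = mean(left), mean(right)
        sse = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, lv, rv)
    if best is None:  # all x values identical, so no split is possible
        m = mean(residuals)
        return (xs[0], m, m)
    return best[1:]

def predict_stump(stump, x):
    threshold, left_value, right_value = stump
    return left_value if x <= threshold else right_value

def gradient_boost(xs, ys, n_trees=50, learning_rate=0.1):
    base = mean(ys)                      # initial prediction: the average
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_trees):
        # residual = observed value - prediction from the previous step
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        # add the scaled tree output to the running prediction
        preds = [p + learning_rate * predict_stump(stump, x)
                 for x, p in zip(xs, preds)]
    return base, stumps

def predict(model, x, learning_rate=0.1):
    base, stumps = model
    return base + learning_rate * sum(predict_stump(s, x) for s in stumps)
```

With `xs = [1, 2, 3, 4, 5, 6]` and `ys = [1.0, 1.2, 0.9, 3.1, 3.0, 2.9]`, calling `gradient_boost(xs, ys)` and then `predict(model, x)` gives a training error below that of the plain average. Pass the same `learning_rate` to both functions.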

![[Assets/gbm_1.png]]

General equation

![[notes/images/gbm_2.png]]
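
For reference, the textbook form of the gradient boosting update is written out below. The notation is an assumption and may differ from the figure: $L$ is the loss, $F_m$ the model after $m$ trees, $h_m$ the tree fit at step $m$, and $\lambda$ the learning rate as above.

$$F_0(x) = \arg\min_{c} \sum_i L(y_i, c)$$

$$r_{im} = -\left[\frac{\partial L(y_i, F(x_i))}{\partial F(x_i)}\right]_{F = F_{m-1}}$$

$$F_m(x) = F_{m-1}(x) + \lambda \, h_m(x)$$

With squared-error loss $L = \frac{1}{2}(y - F)^2$, the pseudo-residual reduces to $r_{im} = y_i - F_{m-1}(x_i)$, which is the residual used in the steps earlier in this note, and $F_0$ is the average of the observed values.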

XGBoost = eXtreme Gradient Boosting
XGBoost: Largely software and hardware optimization of Gradient Boosting
There are a few important parameters:
- $\lambda$ is used for regularization
- $\gamma$ is used for pruning
- $\eta$ is the scaling parameter for each tree
For classification: Cover is defined as the denominator of the similarity score minus $\lambda$. Cover controls how much we can grow the tree: a leaf whose cover falls below the minimum threshold is not allowed.
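
A small sketch of how cover could be computed. It assumes the usual XGBoost definition (cover is the sum of second derivatives of the loss in a node), which gives $\sum p(1-p)$ for logistic loss and the number of residuals for squared error. `min_child_weight` is the XGBoost parameter that thresholds cover; the helper names are made up for this note.

```python
def cover_classification(prev_probs):
    """Cover for logistic loss: sum of p * (1 - p) over the
    previously predicted probabilities in the node."""
    return sum(p * (1 - p) for p in prev_probs)

def cover_regression(residuals):
    """Cover for squared-error loss: the number of residuals in the node."""
    return len(residuals)

def leaf_allowed(cover, min_child_weight=1.0):
    """A leaf is kept only if its cover reaches the minimum threshold."""
    return cover >= min_child_weight

# Three samples, each with an initial predicted probability of 0.5
c = cover_classification([0.5, 0.5, 0.5])
print(c, leaf_allowed(c))
```

With a threshold of 1, a classification leaf holding three samples at probability 0.5 has cover 0.75 and would be rejected, which is why small classification examples often need a lower threshold.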

Make an initial prediction of 0.5
Instead of a regular regression tree, XGBoost builds individual trees that are slightly different

Fit a tree and calculate similarity scores of all the nodes

Similarity score $S = \frac{(\sum_n Residuals)^2}{n + \lambda}$, where $n$ is the number of residuals in the node
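
A quick worked example of the similarity score for regression, assuming the denominator is the number of residuals in the node plus $\lambda$. The residual values are made up for illustration.

```python
def similarity_score(residuals, lam=0.0):
    """(sum of residuals)^2 / (number of residuals + lambda)."""
    return sum(residuals) ** 2 / (len(residuals) + lam)

node = [-10.5, 6.5, 7.5, -7.5]           # residuals sum to -4
print(similarity_score(node, lam=0.0))   # 16 / 4
print(similarity_score(node, lam=1.0))   # 16 / 5
```

Residuals with opposite signs cancel in the numerator, so a node with mixed residuals scores low, and a larger $\lambda$ shrinks the score further.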

Calculate the gain of each split from the change in similarity score between the parent node and its child nodes: $Gain = S_{left} + S_{right} - S_{parent}$
Once the tree is grown to the desired depth, prune it based on the $\gamma$ value: a split is removed when $Gain - \gamma$ is negative
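
Gain and pruning can be sketched as follows. The sketch assumes gain is the children's similarity scores minus the parent's, and that a split survives pruning only when gain minus $\gamma$ is positive. The helper names and residual values are made up for illustration.

```python
def similarity_score(residuals, lam=0.0):
    """(sum of residuals)^2 / (number of residuals + lambda)."""
    return sum(residuals) ** 2 / (len(residuals) + lam)

def gain(left, right, lam=0.0):
    """Similarity of the two children minus similarity of the parent."""
    parent = left + right
    return (similarity_score(left, lam)
            + similarity_score(right, lam)
            - similarity_score(parent, lam))

def keep_split(left, right, lam=0.0, gamma=0.0):
    """Prune the split unless gain - gamma is positive."""
    return gain(left, right, lam) - gamma > 0

left, right = [-10.5], [6.5, 7.5, -7.5]
print(gain(left, right))                   # roughly 120.33
print(keep_split(left, right, gamma=100))  # split kept
print(keep_split(left, right, gamma=130))  # split pruned
```

Raising $\lambda$ lowers every similarity score and therefore the gain, so regularization and pruning interact: a split that survives at $\lambda = 0$ can be pruned at $\lambda = 1$ with the same $\gamma$.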
$\eta$ is the learning rate for scaling individual trees
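
To show how $\eta$ enters the prediction, here is a minimal sketch for the regression case. It assumes the leaf output value is the sum of residuals divided by the number of residuals plus $\lambda$, and that the new prediction is the previous one plus $\eta$ times that output. The function names are made up for this note.

```python
def leaf_output(residuals, lam=0.0):
    """Output value of a leaf: sum of residuals / (count + lambda)."""
    return sum(residuals) / (len(residuals) + lam)

def updated_prediction(previous, residuals, eta=0.3, lam=0.0):
    """Previous prediction plus the eta-scaled leaf output."""
    return previous + eta * leaf_output(residuals, lam)

# Initial prediction 0.5, leaf containing a single residual of -10.5
print(updated_prediction(0.5, [-10.5], eta=0.3))
```

A small $\eta$ means each tree moves the prediction only part of the way toward the observed value, so more trees are needed but each one has less chance to overfit.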