Boosting

What is boosting?

Boosting builds a model sequentially: fit a model, see where it falls short, then fit the next model to improve that specific part.

What is Gradient Boosting?

  • Gradient Boosting: Special case of boosting where errors are minimized by the gradient descent algorithm
  • Different from [[AdaBoost AdaBoost]]
    • AdaBoost reweights the training examples at each round, while GBMs fit each new tree to the negative gradient of the loss (the residuals, for squared error)
    • GBMs grow slightly larger individual trees than the stumps used in AdaBoost
    • GBMs scale every tree by the same amount, while AdaBoost scales each tree according to its relative influence
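
To see why fitting residuals counts as gradient descent, take the squared-error loss as an example (an assumed choice here; any differentiable loss follows the same pattern), with $F$ as the current prediction:

$$
L(y, F) = \tfrac{1}{2}(y - F)^2
\quad\Rightarrow\quad
-\frac{\partial L}{\partial F} = y - F
$$

So the residual is the negative gradient of the loss with respect to the current prediction, and each new tree is a step in that direction.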

Step by Step example 1 (Regression)

  1. Start by predicting the average of the observed continuous values, $\hat{y}'$.
  2. Formula: $Residual = (Observed\space value - Predicted\space value\space in\space the\space previous\space step)$
    1. $r = y - \hat{y}'$
  3. Fit a tree to the residuals and average the residuals in each leaf node to get $\hat{y}''$ (the tree's output).
  4. The combined prediction is $\hat{y} = \hat{y}' + (\lambda*\hat{y}'')$, where $\lambda$ is the learning rate.
  5. Repeat steps 2-4, taking $\hat{y}$ as the new previous prediction $\hat{y}'$.
  6. After each round, the scaled predictions from the tree fitted to the residuals are added to the overall prediction.
  7. This works with any model, as long as there is a differentiable loss function for the algorithm to minimize.
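
As a rough illustration of steps 1-4, here is a single boosting round in plain Python. The data, the learning rate of 0.1, and the hand-picked split at `x < 2.5` are all made up for this sketch; a real tree would search for the split.

```python
# One boosting round on made-up data; the split point is fixed by hand.
x = [1.0, 2.0, 3.0, 4.0]
y = [10.0, 12.0, 20.0, 22.0]
learning_rate = 0.1  # lambda in the notes

# Step 1: the initial prediction is the mean of y.
y_prev = sum(y) / len(y)
preds = [y_prev] * len(y)

# Step 2: residual = observed value - previous prediction.
residuals = [yi - pi for yi, pi in zip(y, preds)]

# Step 3: a depth-1 "tree" splitting at x < 2.5; each leaf predicts
# the average residual of the points that fall into it.
threshold = 2.5  # hypothetical split, picked by eye
left = [r for xi, r in zip(x, residuals) if xi < threshold]
right = [r for xi, r in zip(x, residuals) if xi >= threshold]
left_value = sum(left) / len(left)
right_value = sum(right) / len(right)
tree_out = [left_value if xi < threshold else right_value for xi in x]

# Step 4: combined prediction = previous + learning_rate * tree output.
new_preds = [p + learning_rate * t for p, t in zip(preds, tree_out)]
new_residuals = [yi - pi for yi, pi in zip(y, new_preds)]

print(residuals)      # [-6.0, -4.0, 4.0, 6.0]
print(new_preds)      # [15.5, 15.5, 16.5, 16.5]
print(new_residuals)  # [-5.5, -3.5, 3.5, 5.5]
```

The residuals shrink only slightly because the learning rate is small; repeating the round keeps moving the predictions toward the observed values.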

Working example

![[Assets/gbm_1.png]]

General equation

  • $F_0 = average(y)$
  • Residual $R_n = (y - F_{n})$
  • Fit the tree $H_n$ on $R_n$
  • $F_{n+1}=F_n + \lambda*H_n$
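
The general equation can be sketched as a loop. This is a minimal sketch, not a reference implementation: it assumes squared-error loss, a single input feature, and depth-1 trees. The helper names `fit_stump` and `fit_gbm` and the sample data are invented for illustration.

```python
def fit_stump(x, r):
    """Fit a depth-1 regression tree to residuals r; return a predict function."""
    best = None
    xs = sorted(set(x))
    for lo, hi in zip(xs, xs[1:]):
        t = (lo + hi) / 2  # candidate threshold between adjacent values
        left = [ri for xi, ri in zip(x, r) if xi < t]
        right = [ri for xi, ri in zip(x, r) if xi >= t]
        lv, rv = sum(left) / len(left), sum(right) / len(right)
        sse = sum((ri - lv) ** 2 for ri in left) + sum((ri - rv) ** 2 for ri in right)
        if best is None or sse < best[0]:
            best = (sse, t, lv, rv)
    if best is None:  # all x identical: predict the mean residual
        mean_r = sum(r) / len(r)
        return lambda xi: mean_r
    _, t, lv, rv = best
    return lambda xi: lv if xi < t else rv


def fit_gbm(x, y, n_rounds=50, learning_rate=0.1):
    f0 = sum(y) / len(y)  # F_0 = average(y)
    preds = [f0] * len(y)
    trees = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, preds)]  # R_n = y - F_n
        h = fit_stump(x, residuals)                        # fit H_n on R_n
        trees.append(h)
        # F_{n+1} = F_n + lambda * H_n
        preds = [pi + learning_rate * h(xi) for xi, pi in zip(x, preds)]

    def predict(xi):
        return f0 + learning_rate * sum(h(xi) for h in trees)

    return predict


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.2, 1.9, 3.1, 3.9, 5.2, 5.8]
model = fit_gbm(x, y)

mean_y = sum(y) / len(y)
mse_start = sum((yi - mean_y) ** 2 for yi in y) / len(y)
mse_end = sum((yi - model(xi)) ** 2 for xi, yi in zip(x, y)) / len(y)
print(mse_start, mse_end)  # training error before and after boosting
```

Comparing `mse_start` with `mse_end` shows how much the added trees improve on the initial average prediction on the training data.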

![[notes/images/gbm_2.png]]

What is XGBoost?

  • XGBoost = eXtreme Gradient Boosting
  • XGBoost: Largely software and hardware optimization of Gradient Boosting
  • There are a few important parameters:
    • $\lambda$ is used for regularization
    • $\gamma$ is used for pruning
    • $\eta$ is the scaling parameter for each tree
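  • For classification: Cover is defined as the denominator of the similarity score minus $\lambda$. Cover controls how far we can grow the tree: a leaf whose cover falls below the minimum (`min_child_weight`) is not allowed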
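
For orientation, the symbols above map onto parameter names in XGBoost's native training API roughly as follows. The values shown are the library's documented defaults as far as I know; treat them as a starting point. The commented-out training call assumes `xgboost` is installed and that a `DMatrix` named `dtrain` exists, neither of which is set up here.

```python
# Parameter names as used by XGBoost's native training API.
params = {
    "objective": "reg:squarederror",
    "eta": 0.3,               # learning rate: scales each tree's contribution
    "lambda": 1.0,            # L2 regularization in the similarity score
    "gamma": 0.0,             # minimum gain required to keep a split
    "min_child_weight": 1.0,  # minimum cover allowed in a leaf
    "max_depth": 6,           # maximum depth of each tree
}

# With xgboost installed and a DMatrix named dtrain (not defined here):
# import xgboost as xgb
# booster = xgb.train(params, dtrain, num_boost_round=100)
```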

Step by Step (Regression)

  1. Make an initial prediction of 0.5
  2. Instead of regular regression trees, XGBoost builds individual trees that are slightly different
  3. Fit a tree and calculate similarity scores of all the nodes
    1. Similarity score $S = \frac{(\sum_n Residuals)^2}{ n + \lambda}$, where $n$ is the number of residuals in the node
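  4. Calculate the gain of each split as the change in similarity score from the parent node to its child nodes: $Gain = S_{left} + S_{right} - S_{parent}$
  5. Once the tree is grown to the desired depth, prune it based on the $\gamma$ value: starting from the lowest split, remove a split when $Gain - \gamma < 0$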
  6. $\eta$ is the learning rate for scaling individual trees
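
The similarity score, gain, and pruning rule can be checked by hand with a few lines of Python. This is a sketch for regression only; the function names and the residual values are invented for illustration.

```python
def similarity(residuals, lam=0.0):
    """Similarity score: (sum of residuals)^2 / (n + lambda)."""
    return sum(residuals) ** 2 / (len(residuals) + lam)


def gain(left, right, lam=0.0):
    """Gain of a split: child similarities minus the parent's."""
    parent = left + right
    return similarity(left, lam) + similarity(right, lam) - similarity(parent, lam)


def keep_split(left, right, lam=0.0, gamma=0.0):
    """A split survives pruning when gain - gamma is positive."""
    return gain(left, right, lam) - gamma > 0


# Made-up residuals for the two children of a candidate split.
left, right = [-10.5], [6.5, 7.5, -7.5]

print(similarity(left + right))            # 4.0
print(round(gain(left, right), 2))         # 120.33
print(keep_split(left, right, gamma=130))  # False
print(keep_split(left, right, gamma=100))  # True
```

Raising `lam` shrinks every similarity score, which lowers the gain and makes splits easier to prune.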

Special things about XGBoost

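  • Sparsity Awareness: Learns the best default direction for missing values at each split
  • Weighted Quantile Sketch: Efficiently finds candidate split points that approximate the optimal split on large datasets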
  • Cross Validation: Built-in CV
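
The idea behind sparsity awareness can be sketched in plain Python: for a candidate split, try sending the rows with a missing value to the left child, then to the right child, and keep whichever direction gives the higher gain. This is a simplified illustration, not XGBoost's actual implementation; the function names and the data are made up, and missing values are represented as `None`.

```python
def similarity(residuals, lam=1.0):
    """Similarity score: (sum of residuals)^2 / (n + lambda)."""
    return sum(residuals) ** 2 / (len(residuals) + lam)


def best_default_direction(x, residuals, threshold, lam=1.0):
    """Try sending rows with missing x left, then right; keep the better gain."""
    left = [r for xi, r in zip(x, residuals) if xi is not None and xi < threshold]
    right = [r for xi, r in zip(x, residuals) if xi is not None and xi >= threshold]
    missing = [r for xi, r in zip(x, residuals) if xi is None]
    parent = similarity(residuals, lam)
    gain_left = similarity(left + missing, lam) + similarity(right, lam) - parent
    gain_right = similarity(left, lam) + similarity(right + missing, lam) - parent
    if gain_left >= gain_right:
        return "left", gain_left
    return "right", gain_right


x = [1.0, 2.0, None, 8.0, 9.0, None]
residuals = [-4.0, -5.0, -4.5, 6.0, 5.0, -3.5]
direction, g = best_default_direction(x, residuals, threshold=5.0)
print(direction)  # left
```

Here the rows with missing `x` have negative residuals, like the rows on the left side of the split, so sending them left gives the larger gain.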
