2022-06-10 ~2 min read

LightGBM

The main benefit of LightGBM (LGBM) is that it trains more efficiently on large datasets, in both time and memory.

Histogram binning

For a continuous variable, a traditional decision-tree-based model sorts the values of the variable and then evaluates a candidate split at the midpoint of every pair of consecutive values.

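In contrast, LightGBM groups the values of the variable into a limited number of discrete bins (a histogram, with at most `max_bin` bins, 255 by default) and calculates the gain only at the boundaries between these bins. Far fewer candidate splits means much faster training.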

Eg: A continuous variable has values [50, 55, 60, 65, 70, ...]. A traditional tree will consider splits at 52.5, 57.5, 62.5 and so on, calculating the gain at each of them to decide which value to select as the split. LightGBM instead groups (bins) the values, e.g. [50, 55, 60] and [65, 70], and calculates the gain at only one value, 62.5 here.
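A minimal NumPy sketch of the difference. Both helpers are hypothetical: `exact_candidates` lists the midpoints a traditional tree would try, and `histogram_best_split` bins the values into equal-width bins, accumulates gradient and hessian sums per bin, and scores only the bin boundaries. The gain is the common second-order formula G_L^2/(H_L + lambda) + G_R^2/(H_R + lambda) - G^2/(H + lambda) (up to a constant factor), used here as an assumption for illustration; LightGBM's real bin boundaries are not simply equal-width.

```python
import numpy as np

def exact_candidates(x):
    """Hypothetical helper: midpoints between consecutive distinct sorted values."""
    v = np.unique(x)  # sorted, de-duplicated
    return (v[:-1] + v[1:]) / 2.0

def histogram_best_split(x, grad, hess, n_bins=4, lam=1.0):
    """Hypothetical histogram-based split search on one feature.

    Rows with x <= threshold go left. Only bin boundaries are scored.
    """
    edges = np.linspace(x.min(), x.max(), n_bins + 1)
    # right=True puts a value equal to an interior edge in the lower bin,
    # so every bin index lies in [0, n_bins - 1].
    bins = np.digitize(x, edges[1:-1], right=True)
    g_hist = np.bincount(bins, weights=grad, minlength=n_bins)  # per-bin gradient sums
    h_hist = np.bincount(bins, weights=hess, minlength=n_bins)  # per-bin hessian sums
    G, H = g_hist.sum(), h_hist.sum()

    best_gain, best_threshold = -np.inf, None
    g_left = h_left = 0.0
    for b in range(n_bins - 1):  # candidate split after bin b
        g_left += g_hist[b]
        h_left += h_hist[b]
        g_right, h_right = G - g_left, H - h_left
        gain = (g_left**2 / (h_left + lam)
                + g_right**2 / (h_right + lam)
                - G**2 / (H + lam))
        if gain > best_gain:
            best_gain, best_threshold = gain, edges[b + 1]
    return best_threshold, best_gain

x = np.array([50, 55, 60, 65, 70])
print(exact_candidates(x))  # → [52.5 57.5 62.5 67.5]
grad = np.array([-1.0, -1.0, -1.0, 1.0, 1.0])  # made-up gradients
print(histogram_best_split(x, grad, np.ones(5), n_bins=2))
```

With `n_bins=2` on the example values, the bins are [50, 55, 60] and [65, 70], so only one threshold (`x <= 60`) is evaluated, versus four midpoints for the exact approach.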

Histogram binning also reduces memory usage (bin indices can be stored in small integer types such as uint8 instead of floats) and enables other speedups mentioned in the LightGBM optimization docs.
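A small NumPy illustration of both points, under stated assumptions: 255 quantile-based bins stand in for LightGBM's own binning (its `max_bin` parameter defaults to 255), so each bin id fits in one byte. The second half shows histogram subtraction, one of the speedups the LightGBM docs list: a child's histogram can be obtained by subtracting its sibling's histogram from the parent's.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=1_000_000)  # raw continuous feature, stored as float64

# Stand-in binning: 255 quantile bins (254 interior edges). LightGBM builds its
# own bin boundaries differently; this only illustrates the storage saving.
edges = np.quantile(x, np.linspace(0.0, 1.0, 256)[1:-1])
bin_idx = np.searchsorted(edges, x).astype(np.uint8)  # bin ids in [0, 254]
print(x.nbytes, bin_idx.nbytes)  # → 8000000 1000000

# Histogram subtraction: build the parent's gradient histogram and one child's,
# then get the sibling's by subtraction instead of another pass over its rows.
grad = rng.normal(size=x.size)
left = x < 0.0  # hypothetical split
parent_hist = np.bincount(bin_idx, weights=grad, minlength=255)
left_hist = np.bincount(bin_idx[left], weights=grad[left], minlength=255)
right_hist = parent_hist - left_hist
direct = np.bincount(bin_idx[~left], weights=grad[~left], minlength=255)
print(np.allclose(right_hist, direct))  # → True
```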

Exclusive feature bundling

If features are mutually exclusive (they rarely or never take nonzero values at the same time), LightGBM bundles them into a single feature. This reduces the number of features that histograms have to be built for.

Eg: Take two features, is_male and is_female. They are always mutually exclusive: a sample can have a 1 in is_male or in is_female, never both. LightGBM will combine them into a single feature.

[!Question] How is this possible if we have not one-hot encoded the features?
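A simplified sketch of bundling, assuming each column is already binned with 0 meaning "zero". `greedy_bundles` and `merge_bundle` are hypothetical helpers: the first greedily groups columns whose nonzero rows do not overlap (LightGBM's actual algorithm orders features via a conflict graph and can tolerate a small number of conflicts), and the second merges a bundle into one column by shifting each feature's bin values by an offset so the original features remain distinguishable.

```python
import numpy as np

def greedy_bundles(features, max_conflicts=0):
    """Hypothetical helper: greedily group features whose nonzero rows
    (almost) never overlap.

    features: list of 1-D integer arrays of bin ids, 0 meaning 'zero'.
    Returns a list of bundles, each a list of feature indices.
    """
    bundles = []  # (member indices, boolean mask of rows already used)
    for j, f in enumerate(features):
        nonzero = f != 0
        for members, used in bundles:
            if np.count_nonzero(used & nonzero) <= max_conflicts:
                members.append(j)
                used |= nonzero  # in-place update of this bundle's mask
                break
        else:  # no compatible bundle found: start a new one
            bundles.append(([j], nonzero.copy()))
    return [members for members, _ in bundles]

def merge_bundle(features, members):
    """Hypothetical helper: merge one bundle into a single column using offsets."""
    merged = np.zeros_like(features[members[0]])
    offset = 0
    for j in members:
        f = features[j]
        # Nonzero bins of feature j map to (offset + 1) .. (offset + f.max()).
        # On a conflicting row the later feature overwrites the earlier one.
        merged = np.where(f != 0, f + offset, merged)
        offset += int(f.max())
    return merged

is_male = np.array([1, 0, 1, 0])
is_female = np.array([0, 1, 0, 1])
age_bin = np.array([3, 1, 2, 3])  # made-up dense feature, conflicts with both
features = [is_male, is_female, age_bin]
print(greedy_bundles(features))        # → [[0, 1], [2]]
print(merge_bundle(features, [0, 1]))  # → [1 2 1 2]
```

The bundled column takes value 1 on male rows and 2 on female rows, so a histogram over it still separates the two original features.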

Gradient-based One-Side Sampling (GOSS)

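After each iteration, LightGBM sorts the training instances by the absolute value of their gradients (roughly, how large their errors are). It keeps the top 20% and, from the remaining 80%, randomly samples a further 10% of the dataset. (20% and 10% are hyperparameters, `top_rate` and `other_rate` in LightGBM.) When computing split gains, the sampled low-gradient instances are up-weighted by (1 - 0.2) / 0.1 = 8, so that they still stand in for the whole remaining 80% and the data distribution is not skewed toward high-gradient instances.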
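A minimal NumPy sketch of one GOSS sampling step. `goss_sample` is a hypothetical helper; its `top_rate` and `other_rate` arguments mirror the LightGBM parameters of the same names (defaults 0.2 and 0.1), both taken as fractions of the full dataset, and the (1 - top_rate) / other_rate weight follows the original GOSS formulation.

```python
import numpy as np

def goss_sample(grad, top_rate=0.2, other_rate=0.1, rng=None):
    """Hypothetical helper: one GOSS sampling step.

    Returns (row indices, row weights). Both rates are fractions of the full
    dataset, mirroring LightGBM's `top_rate` / `other_rate` parameters.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = len(grad)
    top_n = int(top_rate * n)
    rand_n = int(other_rate * n)

    order = np.argsort(-np.abs(grad))  # largest |gradient| first
    top_idx = order[:top_n]            # always kept
    rand_idx = rng.choice(order[top_n:], size=rand_n, replace=False)  # random subset of the rest

    idx = np.concatenate([top_idx, rand_idx])
    weights = np.ones(len(idx))
    # Up-weight the small-gradient sample so it stands in for the whole
    # remaining (1 - top_rate) fraction of the data.
    weights[top_n:] = (1.0 - top_rate) / other_rate
    return idx, weights

grad = np.random.default_rng(1).normal(size=1000)  # made-up gradients
idx, w = goss_sample(grad, rng=np.random.default_rng(0))
print(len(idx), w.sum())  # 300 rows kept; total weight is about 1000
```

Only 30% of the rows are used to build histograms, while the weights keep their total contribution close to that of all 1000 rows.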

The intuition here is that the model should focus more on instances with large gradients (poorly fit) and less on those whose gradients are already small (well fit).

Leaf-wise split

Trees are grown one leaf at a time, and a leaf is only split when the resulting loss reduction exceeds some threshold (LightGBM's `min_gain_to_split`). This can result in trees growing asymmetrically.

In the leaf-wise growth strategy, the leaf that will result in the maximum reduction in the loss is chosen to be split, rather than splitting all the leaves at a particular level. Because of this, leaf-wise trees may become deeper than level-wise trees with the same number of leaves. The strategy can lead to increased accuracy because it aims to reduce the loss as much as possible with each split, but it can also be more prone to overfitting if not properly tuned (e.g. via `num_leaves`, `max_depth` and `min_data_in_leaf`).

![[../assets/images/lgbm.png]]
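A toy sketch of leaf-wise growth under simplifying assumptions: a single feature, squared-error loss, and exact thresholds instead of histograms. `best_split` and `grow_leaf_wise` are hypothetical helpers, not LightGBM code. A max-heap keyed on each leaf's best gain always expands the most promising leaf, and growth stops at `num_leaves` or when no split reduces the loss.

```python
import heapq
import numpy as np

def best_split(x, y, idx):
    """Hypothetical helper: best threshold for the rows in idx on one feature,
    scored by reduction in sum of squared errors. Returns (gain, threshold or None)."""
    order = np.argsort(x[idx])
    xs, ys = x[idx][order], y[idx][order]
    total_sse = ((ys - ys.mean()) ** 2).sum()
    best = (0.0, None)
    for i in range(1, len(xs)):
        if xs[i] == xs[i - 1]:
            continue  # cannot separate equal feature values
        left, right = ys[:i], ys[i:]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        gain = total_sse - sse
        if gain > best[0]:
            best = (gain, (xs[i - 1] + xs[i]) / 2.0)
    return best

def grow_leaf_wise(x, y, num_leaves=4, min_gain=1e-12):
    """Hypothetical helper: grow one tree leaf-wise; return each leaf's row indices."""
    heap = []  # entries: (-gain, insertion id, row indices, threshold)
    next_id = 0

    def push(idx):
        nonlocal next_id
        gain, thr = best_split(x, y, idx)
        heapq.heappush(heap, (-gain, next_id, idx, thr))  # id breaks ties
        next_id += 1

    push(np.arange(len(x)))
    while len(heap) < num_leaves:
        neg_gain, _, idx, thr = heap[0]  # leaf with the largest gain
        if thr is None or -neg_gain <= min_gain:
            break  # no remaining split reduces the loss enough
        heapq.heappop(heap)
        push(idx[x[idx] <= thr])  # left child
        push(idx[x[idx] > thr])   # right child
    return sorted(idx.tolist() for _, _, idx, _ in heap)

x = np.arange(8, dtype=float)
y = np.array([0, 0, 0, 0, 0, 0, 10, 20], dtype=float)
print(grow_leaf_wise(x, y, num_leaves=4))  # → [[0, 1, 2, 3, 4, 5], [6], [7]]
```

After the root split, the next split goes to the right-hand leaf (gain 50), while the left leaf, whose targets are all equal, is never split. The result is the asymmetric shape described above; a level-wise grower would instead split every leaf at one depth before going deeper.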

Split for categorical features

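Very similar to histogram-based binning: each category of the feature gets its own bin. Instead of one-hot encoding, LightGBM sorts these bins by their accumulated sum_gradient / sum_hessian and then evaluates splits along that sorted order, so a single split can send any group of categories to one side and the rest to the other.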
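A rough sketch of that procedure for one categorical feature, assuming gradient and hessian arrays are given and every category has a positive hessian sum. `best_categorical_split` is a hypothetical helper scored with the common second-order gain G_L^2/(H_L + lambda) + G_R^2/(H_R + lambda) - G^2/(H + lambda); LightGBM adds further regularization and, for features with few categories, may use one-vs-rest splits instead (see `max_cat_to_onehot`), both omitted here.

```python
import numpy as np

def best_categorical_split(cat, grad, hess, lam=1.0):
    """Hypothetical many-vs-many split search on one categorical feature."""
    cats = np.unique(cat)  # one histogram bin per category
    g = np.array([grad[cat == c].sum() for c in cats])  # per-category gradient sums
    h = np.array([hess[cat == c].sum() for c in cats])  # per-category hessian sums
    order = np.argsort(g / h)  # sort bins by sum_gradient / sum_hessian
    G, H = g.sum(), h.sum()

    best_gain, best_left = -np.inf, None
    g_left = h_left = 0.0
    for k in range(len(order) - 1):  # left = first k+1 categories in sorted order
        g_left += g[order[k]]
        h_left += h[order[k]]
        g_right, h_right = G - g_left, H - h_left
        gain = (g_left**2 / (h_left + lam)
                + g_right**2 / (h_right + lam)
                - G**2 / (H + lam))
        if gain > best_gain:
            best_gain, best_left = gain, set(cats[order[: k + 1]].tolist())
    return best_left, best_gain

cat = np.array(["a", "b", "c", "a", "b", "c"])
grad = np.array([-2.0, 3.0, -1.0, -2.0, 3.0, -1.0])  # made-up gradients
left, gain = best_categorical_split(cat, grad, np.ones(6))
print(sorted(left))  # → ['a', 'c']
```

Categories `a` and `c` end up together even though `b` sits between them alphabetically; sorting by the gradient ratio is what makes such groupings reachable with a single linear scan.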