diff --git a/docs/source/semantic_segmentation.md b/docs/source/semantic_segmentation.md
index 4a660f737..0102a148f 100644
--- a/docs/source/semantic_segmentation.md
+++ b/docs/source/semantic_segmentation.md
@@ -253,7 +253,7 @@ The following mask formats are supported:
 
 - png
 
-### Supported Mask Dataset Format
+### Specify Mask Filepaths
 
 We support two ways of specifying the mask filepaths in relation to the image filepaths:
 
@@ -659,6 +659,20 @@ transform_args={
 }
 ```
 
+### Train with Multi-channel Images
+
+By default, images are loaded as RGB. LightlyTrain EoMT also supports 4-channel images; set `num_channels` in `transform_args` to enable them:
+
+```
+transform_args={
+    "num_channels": 4
+}
+```
+
+In this case, you may also want to customize the normalization parameters in `transform_args` to fit your dataset. Otherwise, LightlyTrain will simply repeat the mean and std values of the RGB channels for the extra channels.
+
+You can also use the `ChannelDrop` augmentation to randomly drop channels with a given probability during training. See [Channel Drop](#method-transform-args-channel-drop) for more details.
+
 ## Exporting a Checkpoint to ONNX
 
 [Open Neural Network Exchange (ONNX)](https://en.wikipedia.org/wiki/Open_Neural_Network_Exchange) is a standard format
diff --git a/docs/source/train/index.md b/docs/source/train/index.md
index 77025bf62..85257321c 100644
--- a/docs/source/train/index.md
+++ b/docs/source/train/index.md
@@ -364,6 +364,26 @@ See {ref}`method-transform-args` on how to configure image transformations.
 
+### Train with Multi-channel Images
+
+By default, images are loaded as RGB. LightlyTrain pretraining and distillation also support 4-channel images; set `num_channels` in `transform_args` to enable them:
+
+```
+transform_args={
+    "num_channels": 4
+}
+```
+
+In this case, you may also want to customize the normalization parameters in `transform_args` to fit your dataset. Otherwise, LightlyTrain will simply repeat the mean and std values of the RGB channels for the extra channels.
+
+Currently supported models:
+
+| Library | Supported Models | Docs |
+|---------|------------------|------|
+| TIMM | All models | [🔗](#models-timm) |
+| LightlyTrain | DINOv2 | |
+| LightlyTrain | DINOv3 | |
+
 (method-args)=
 
 ### Method Arguments
 
 ```{warning}
diff --git a/docs/source/train/method_transform_args.md b/docs/source/train/method_transform_args.md
index 5e168f742..6417d2cff 100644
--- a/docs/source/train/method_transform_args.md
+++ b/docs/source/train/method_transform_args.md
@@ -100,6 +100,8 @@ Interested in the default augmentation settings for each method? Check the metho
 
 The following arguments are available for all methods.
 
+(method-transform-args-channel-drop)=
+
 ### Channel Drop
 
 Randomly drops channels from the image. Can be disabled by setting to `None`.