Self-Supervised Learning (SSL) is a family of techniques for converting an unsupervised learning problem into a supervised one by creating surrogate labels from the unlabeled dataset.
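To make the surrogate-label idea concrete, here is a minimal, hypothetical sketch (not from any specific paper or library) using rotation prediction as the pretext task: the "labels" are simply the rotation applied to each unlabeled image. The `encoder` and the random tensor standing in for a data batch are illustrative placeholders.

```python
# Minimal sketch of the "surrogate label" idea via a rotation-prediction pretext task.
# `encoder` and the random batch below are placeholders, not part of any library.
import torch
import torch.nn as nn

def make_rotation_batch(images):
    """Create surrogate labels by rotating each image by 0/90/180/270 degrees."""
    rotations = torch.randint(0, 4, (images.size(0),))            # pseudo-labels in {0,1,2,3}
    rotated = torch.stack([torch.rot90(img, int(k), dims=(1, 2))
                           for img, k in zip(images, rotations)])
    return rotated, rotations

encoder = nn.Sequential(nn.Conv2d(3, 32, 3, stride=2), nn.ReLU(),
                        nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                        nn.Linear(32, 4))                          # 4-way rotation head
criterion = nn.CrossEntropyLoss()

images = torch.randn(8, 3, 32, 32)                                 # stand-in for unlabeled data
rotated, pseudo_labels = make_rotation_batch(images)
loss = criterion(encoder(rotated), pseudo_labels)                  # ordinary supervised loss on surrogate labels
```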


Core Paradigms of Self-Supervised Learning

Three primary paradigms dominate the SSL landscape: Joint Embedding, Masked Image Modeling, and a hybrid approach that combines elements of both.

Joint Embedding vs. MIM:

Joint Embedding
  Pros:
    ✓ Produces highly semantic features, great for classification.
    ✓ Architecture agnostic.
    ✓ Achieves competitive results in linear probing evaluations.
  Cons:
    ✗ May require very large batch sizes (e.g., SimCLR).
    ✗ Requires careful tuning of data augmentations.
    ✗ Requires special mechanisms to handle negative samples or avoid collapse.
    ✗ Not well-suited for low-level tasks.

Masked Image Modeling (MIM)
  Pros:
    ✓ Conceptually simple, with no need for positive/negative pairs.
    ✓ Masking reduces pre-training time.
    ✓ Achieves competitive results with fine-tuning.
    ✓ Stronger fit for low-level tasks (e.g., denoising, super-resolution).
  Cons:
    ✗ Requires a Vision Transformer (ViT) backbone.
    ✗ Weaker performance on abstract, high-level tasks like classification.

1. Joint Embedding Architectures

Central idea: Enforce invariance to data augmentations. A Siamese network architecture with shared parameters processes two differently augmented “views” of the same image, and the model is trained to produce similar or identical embeddings for both.

Challenge: model collapse, where the network learns a trivial solution by mapping all inputs to the same constant vector.
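As a minimal sketch of the Siamese setup (with `encoder` and `augment` as placeholders, not a full training recipe): the two views pass through one weight-shared encoder, and the loss simply pulls their embeddings together. This invariance term alone admits exactly the collapsed solution described above; each method family listed below adds a mechanism to rule it out.

```python
import torch.nn.functional as F

def invariance_loss(encoder, augment, images):
    """Pull the embeddings of two augmented views of the same images together."""
    z1 = F.normalize(encoder(augment(images)), dim=1)   # view 1, shared weights
    z2 = F.normalize(encoder(augment(images)), dim=1)   # view 2, same encoder
    # Minimizing this alone can collapse to a constant embedding; the families
    # below (contrastive, clustering, distillation, regularization) prevent that.
    return -(z1 * z2).sum(dim=1).mean()                 # negative cosine similarity
```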

  • Contrastive
    Core idea: Pull positive pairs (views of the same image) close in the embedding space while pushing negative pairs (views of different images) apart.
    Key characteristics: Requires negative samples, which can necessitate large batch sizes. Good for multimodal data.
    Examples: SimCLR, MoCo (a loss sketch follows this list).
  • Clustering
    Core idea: Learn embeddings by grouping similar samples into clusters without using explicit negative pairs.
    Key characteristics: Jointly learns feature representations and cluster assignments.
    Examples: SwAV, DeepCluster.
  • Distillation
    Core idea: A “student” network is trained to match the output distribution of a “teacher” network on different augmented views.
    Key characteristics: Avoids collapse via an asymmetric architecture (student vs. teacher); the teacher is often updated as an Exponential Moving Average (EMA) of the student’s weights. Does not require negative samples.
    Examples: BYOL, DINO (an EMA sketch follows this list).
  • Regularization
    Core idea: Avoids collapse by imposing regularization terms on the embeddings, such as decorrelating feature dimensions.
    Key characteristics: Maximizes the information content of the embeddings by penalizing redundancy. No negative samples required.
    Examples: Barlow Twins, VICReg.
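Two of these mechanisms are easy to sketch. Below is a hypothetical NT-Xent (InfoNCE) loss in the spirit of the contrastive family, plus an EMA teacher update in the spirit of the distillation family; both are illustrative sketches, not the official SimCLR or BYOL/DINO implementations.

```python
import torch
import torch.nn.functional as F

def nt_xent(z1, z2, temperature=0.5):
    """Contrastive (SimCLR-style) sketch: each image's two views form the positive
    pair; every other sample in the batch acts as a negative."""
    n = z1.size(0)
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)               # (2N, D)
    sim = z @ z.t() / temperature                                     # cosine similarities
    sim.masked_fill_(torch.eye(2 * n, dtype=torch.bool, device=z.device), float('-inf'))
    # the positive for view i is the other view of the same image
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(n)]).to(z.device)
    return F.cross_entropy(sim, targets)

@torch.no_grad()
def ema_update(teacher, student, momentum=0.996):
    """Distillation (BYOL/DINO-style) sketch: the teacher's weights track an
    exponential moving average of the student's weights."""
    for p_t, p_s in zip(teacher.parameters(), student.parameters()):
        p_t.data.mul_(momentum).add_(p_s.data, alpha=1 - momentum)
```

In practice, `nt_xent` would be applied to projection-head outputs of the two augmented views, and `ema_update` would be called once per optimization step.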

2. Masked Image Modeling (MIM)

Central idea: Reconstruction. The input image is split into patches, a significant portion of which (often ~75%) is masked. The model is then trained to predict the content of the masked patches from the visible ones.
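A minimal sketch of this recipe, assuming a placeholder `model` that maps patches plus a mask to predictions for every patch position (e.g., an MAE-style encoder-decoder; this is not a specific library API): patchify, mask roughly 75% of the patches, and compute the reconstruction loss only at the masked positions.

```python
import torch
import torch.nn.functional as F

def mim_step(model, images, patch_size=16, mask_ratio=0.75):
    B, C, H, W = images.shape
    # flatten non-overlapping patches into tokens: (B, N, C * patch_size**2)
    patches = images.unfold(2, patch_size, patch_size) \
                    .unfold(3, patch_size, patch_size) \
                    .reshape(B, C, -1, patch_size, patch_size) \
                    .permute(0, 2, 1, 3, 4).flatten(2)
    N = patches.size(1)
    mask = torch.rand(B, N, device=images.device) < mask_ratio   # True = masked (~75%)
    pred = model(patches, mask)                                   # placeholder: predicts all patches
    return F.mse_loss(pred[mask], patches[mask])                  # loss on masked patches only
```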

Prediction Targets: The model can be trained to predict various targets for the masked regions:

  • Pixel Reconstruction: Reconstructing the raw pixel values (e.g., MAE, SimMIM).
  • Feature Regression: Predicting abstract feature representations (e.g., MaskFeat).
  • Token Prediction: Predicting discrete visual tokens (e.g., BEiT).

3. Hybrid Architectures

Central idea: Combine the principles of masking and joint embedding, e.g., by masking part of the input but predicting latent representations of the masked regions rather than raw pixels (a conceptual sketch follows the example below).

Examples:

  • Image-based Joint Embedding Predictive Architecture (I-JEPA)
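As a conceptual sketch in the spirit of I-JEPA (all module names here are placeholders, not the official code): a context encoder sees only the visible patches, a predictor estimates the embeddings of the masked patches, and the regression targets come from a target encoder that sees the full image.

```python
import torch
import torch.nn.functional as F

def jepa_step(context_encoder, target_encoder, predictor, patches, mask):
    """Hybrid sketch: predict the *embeddings* of masked patches, not their pixels."""
    with torch.no_grad():
        targets = target_encoder(patches)              # (B, N, D) embeddings of all patches
    context = context_encoder(patches, visible=~mask)  # encode only the visible patches
    pred = predictor(context, mask)                    # predicted embeddings at masked positions
    return F.mse_loss(pred[mask], targets[mask])       # regression in embedding space

# The target encoder is typically kept as an EMA of the context encoder
# (see ema_update above), so no pixel-level decoder is needed.
```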
