🏷️ Model Name
I-JEPA - Image-based Joint-Embedding Predictive Architecture
🧠 Core Idea
“Predict what you can’t see — not in pixels, but in meaning.”

🖼️ Architecture
         +-------------------------+
         |       Input Image       |
         +-------------------------+
                      |
                      v
+-------------------------------------------+
|         Random Masking of Regions         |
+---------------------+---------------------+
|   Visible Patches   |   Masked Patches    |
+---------------------+---------------------+
          |                       |
          v                       v
+------------------+     +------------------+
| Context Encoder  |     |  Target Encoder  |
|   (f_context)    |     |    (f_target)    |
+------------------+     +------------------+
          |                       |
          v                       v
+------------------+     +------------------+
|  Predictor Head  | --> | Target Features  |
+------------------+     +------------------+
          |
          v
+----------------------------+
| Loss: MSE in feature space |
+----------------------------+
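
The data flow in the diagram above can be sketched in a few lines of NumPy. This is a toy illustration, not the reference implementation: the function names `patchify` and `feature_space_mse`, the single-layer `tanh` encoders, the mean pooling, and the positional embeddings `pos` are all assumptions made for brevity (the actual model uses Vision Transformers). The exponential-moving-average update that the paper applies to the target encoder is also left out. The sketch only shows that the loss compares predicted and target features, never pixels.

```python
import numpy as np

def patchify(image, patch):
    """Split an (H, W) image into flattened non-overlapping patches: (N, patch*patch)."""
    h, w = image.shape
    gh, gw = h // patch, w // patch
    return (image[:gh * patch, :gw * patch]
            .reshape(gh, patch, gw, patch)
            .transpose(0, 2, 1, 3)
            .reshape(gh * gw, patch * patch))

def feature_space_mse(image, mask, w_context, w_target, w_pred, pos, patch=4):
    """Toy forward pass: mask -> encoders -> predictor -> MSE between features."""
    patches = patchify(image, patch)                 # (N, P)
    visible, masked = patches[~mask], patches[mask]

    # Context encoder (f_context) sees only the visible patches.
    context = np.tanh(visible @ w_context)           # (N_visible, D)

    # Target encoder (f_target) embeds the masked patches.
    targets = np.tanh(masked @ w_target)             # (N_masked, D)

    # Predictor head: pooled context plus the position of each masked patch
    # (hypothetical simplification of a position-conditioned predictor).
    pooled = context.mean(axis=0)                    # (D,)
    predictions = (pooled + pos[mask]) @ w_pred      # (N_masked, D)

    # Loss is computed in feature space, not pixel space.
    return float(np.mean((predictions - targets) ** 2))

# Example usage with random toy weights (all values are illustrative).
rng = np.random.default_rng(0)
image = rng.normal(size=(16, 16))
n_patches, patch_dim, dim = 16, 16, 8

mask = np.zeros(n_patches, dtype=bool)
mask[rng.choice(n_patches, size=4, replace=False)] = True   # hide 4 of 16 patches

w_context = rng.normal(scale=0.1, size=(patch_dim, dim))
w_target = rng.normal(scale=0.1, size=(patch_dim, dim))
w_pred = rng.normal(scale=0.1, size=(dim, dim))
pos = rng.normal(scale=0.1, size=(n_patches, dim))

loss = feature_space_mse(image, mask, w_context, w_target, w_pred, pos)
print(f"feature-space MSE: {loss:.4f}")
```

In a real training loop only the context encoder and predictor would receive gradients from this loss; the toy version computes the forward pass only.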
💡 Strengths
⚠️ Limitations
📚 Reference
- Paper: arXiv:2301.08243
- Code: GitHub – facebookresearch/ijepa