🏷️ Model Name
I-JEPA - Image-based Joint Embedding Predictive Architecture
🧠 Core Idea
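“Predict what you can’t see — not in pixels, but in meaning.”
Concretely: from one visible context block of an image, I-JEPA predicts the embeddings of several target blocks of the same image. The target embeddings come from a separate target encoder, not from raw pixels.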
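The "meaning, not pixels" idea comes down to where the loss is computed. Below is a minimal sketch in plain Python. It assumes each patch embedding is a list of floats and that the objective is the squared L2 distance averaged over target patches. The helper `feature_space_loss` is hypothetical and not taken from facebookresearch/ijepa, and the paper's exact normalization may differ.

```python
from typing import List

Embedding = List[float]


def feature_space_loss(predicted: List[Embedding],
                       targets: List[Embedding]) -> float:
    """Squared L2 distance between predicted and target patch embeddings,
    averaged over target patches. Targets are embeddings, never pixels."""
    if not predicted or len(predicted) != len(targets):
        raise ValueError("need matching, non-empty lists of patch embeddings")
    total = 0.0
    for p, t in zip(predicted, targets):
        if len(p) != len(t):
            raise ValueError("embedding dimensions differ")
        total += sum((pi - ti) ** 2 for pi, ti in zip(p, t))
    return total / len(predicted)


# Toy example: two target patches with 3-dimensional embeddings.
pred = [[0.1, 0.2, 0.3], [1.0, 0.0, -1.0]]
tgt = [[0.0, 0.2, 0.5], [1.0, 0.5, -1.0]]
print(feature_space_loss(pred, tgt))
```

Because the target is an embedding, the predictor is never penalized for failing to reproduce unpredictable pixel-level detail. In a real training loop the target embeddings would be treated as constants, so no gradient flows into the target encoder.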

🖼️ Architecture
                    +-------------------------+
                    |     Input Image         |
                    +-------------------------+
                                |
                                v
                +------------------------------------+
                |   Multi-block masking of patches   |
                | (1 context block, N target blocks) |
                +------------------------------------+
                    |                          |
                    | context patches          | full image (all patches)
                    v                          v
          +------------------+     +------------------------+
          |  Context Encoder |     |     Target Encoder     |
          |  (f_context)     |     |  (f_target: EMA of     |
          +------------------+     |   f_context, no grad)  |
                    |              +------------------------+
                    v                          |
          +------------------+                 v
          |  Predictor Head  |     +------------------------+
          | (+ target-block  | --> | Target Features        |
          |  positions)      |     | (at target blocks)     |
          +------------------+     +------------------------+
                    |
                    v
          +----------------------------+
          | Loss: MSE in feature space |
          +----------------------------+
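The masking and target-encoder boxes can be sketched as follows. This is a simplified illustration in plain Python, not the official sampler. It assumes a 14x14 patch grid, 4 target blocks with scale in (0.15, 0.2) and aspect ratio in (0.75, 1.5), and one context block with scale in (0.85, 1.0) and unit aspect ratio, following the ranges described in the paper. The helper names (`sample_block`, `sample_multiblock_masks`, `ema_update`) and the 0.996 EMA momentum are illustrative assumptions.

```python
import math
import random
from typing import List, Set, Tuple

Block = Set[int]  # flattened patch indices (row * grid + col)


def sample_block(grid: int, scale: Tuple[float, float],
                 aspect: Tuple[float, float], rng: random.Random) -> Block:
    """Sample one rectangle of patches covering roughly `scale` of the grid."""
    s = rng.uniform(*scale)             # fraction of the image area
    a = rng.uniform(*aspect)            # height / width ratio
    area = s * grid * grid
    h = max(1, min(grid, round(math.sqrt(area * a))))
    w = max(1, min(grid, round(math.sqrt(area / a))))
    top = rng.randint(0, grid - h)
    left = rng.randint(0, grid - w)
    return {r * grid + c
            for r in range(top, top + h)
            for c in range(left, left + w)}


def sample_multiblock_masks(grid: int = 14, num_targets: int = 4,
                            seed: int = 0) -> Tuple[Block, List[Block]]:
    """Return (context block, target blocks) with targets removed from context."""
    rng = random.Random(seed)
    targets = [sample_block(grid, (0.15, 0.2), (0.75, 1.5), rng)
               for _ in range(num_targets)]
    context = sample_block(grid, (0.85, 1.0), (1.0, 1.0), rng)
    # Drop every patch that lies in a target block, so the context never
    # contains the content it is asked to predict. A production sampler
    # would also guard against a context that ends up too small.
    context -= set().union(*targets)
    return context, targets


def ema_update(target_params: List[float], context_params: List[float],
               momentum: float = 0.996) -> List[float]:
    """theta_target <- m * theta_target + (1 - m) * theta_context (no gradients)."""
    return [momentum * t + (1.0 - momentum) * c
            for t, c in zip(target_params, context_params)]


context, targets = sample_multiblock_masks()
print(len(context), [len(t) for t in targets])
print(ema_update([1.0, 0.0], [0.0, 1.0], momentum=0.9))
```

In a training step, the context encoder would see only the patches in `context`, while the target encoder would process the full image and its outputs would be indexed with each target block. `ema_update` would run after each optimizer step on the context encoder, so the target encoder is never trained by backpropagation.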
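💡 Strengths
- No hand-crafted data augmentations: the prediction targets come from other regions of the same image rather than from augmented views.
- Predicting in representation space lets the model skip unpredictable pixel-level detail, which the paper argues leads to more semantic features.
- Efficient pretraining: the paper reports training a ViT-Huge/14 on ImageNet with 16 A100 GPUs in under 72 hours.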
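⚠️ Limitations
- Sensitive to the masking strategy: the paper stresses that target blocks must be large enough and the context block informative enough for semantic features to emerge.
- Collapse is avoided through architectural asymmetry (predictor plus EMA target encoder) rather than an explicit loss term, which adds a momentum schedule to tune.
- There is no pixel decoder, so the pretrained model outputs embeddings only and cannot reconstruct masked content as an image.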
📚 Reference
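- Paper: Assran et al., "Self-Supervised Learning from Images with a Joint-Embedding Predictive Architecture", CVPR 2023 (arXiv:2301.08243)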
- Code: GitHub – facebookresearch/ijepa