I-JEPA: Image-based Joint Embedding Predictive Architecture

🏷️ Model Name

I-JEPA - Image-based Joint Embedding Predictive Architecture

🧠 Core Idea

“Predict what you can’t see — not in pixels, but in meaning.”

I-JEPA architecture

🖼️ Architecture

                    +-------------------------+
                    |     Input Image         |
                    +-------------------------+
                                |
                                v
                +------------------------------------+
                |     Random Masking of Regions      |
                +------------------------------------+
                   | Visible Patches | Masked Patches |
                   |-----------------|----------------|
                     |                 |
                     v                 v
          +------------------+     +------------------+
          |  Context Encoder |     |   Target Encoder |
          |  (f_context)     |     |   (f_target)     |
          +------------------+     +------------------+
                     |                 |
                     v                 v
          +------------------+     +------------------+
          |  Predictor Head  | --> |  Target Features |
          +------------------+     +------------------+
                     |
                     v
          +----------------------------+
          | Loss: MSE in feature space |
          +----------------------------+
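The flow in the diagram can be sketched in a few lines of plain Python. This is only a toy illustration under loud assumptions, not the actual model: single linear maps stand in for the two encoders and the predictor, mean pooling stands in for attention over the visible patches, and every name here (`split_patches`, `ijepa_loss`, `pos_embed`, the sizes) is hypothetical. Training and parameter updates are left out; the sketch only shows that the loss compares predicted and target features, never pixels.

```python
import random


def split_patches(num_patches, mask_ratio, rng):
    """Randomly split patch indices into visible and masked sets."""
    indices = list(range(num_patches))
    rng.shuffle(indices)
    num_masked = int(num_patches * mask_ratio)
    masked = sorted(indices[:num_masked])
    visible = sorted(indices[num_masked:])
    return visible, masked


def random_matrix(rows, cols, rng):
    """Small random weight matrix stored as a list of rows."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def linear(weights, vector):
    """Multiply a weight matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in weights]


def mse(predicted, target):
    """Mean squared error over two lists of equally sized feature vectors."""
    total, count = 0.0, 0
    for p, t in zip(predicted, target):
        for a, b in zip(p, t):
            total += (a - b) ** 2
            count += 1
    return total / count


def ijepa_loss(patches, visible, masked, f_context, f_target, predictor, pos_embed):
    """Feature-space loss for one image, following the diagram above."""
    # Context encoder: sees only the visible patches.
    context = [linear(f_context, patches[i]) for i in visible]
    dim = len(context[0])
    # Assumption: mean pooling summarises the context (stand-in for attention).
    summary = [sum(f[d] for f in context) / len(context) for d in range(dim)]
    # Target encoder: produces the features the predictor must match.
    target = [linear(f_target, patches[i]) for i in masked]
    # Predictor head: one prediction per masked position, conditioned on
    # the context summary plus that position's (hypothetical) embedding.
    predicted = [
        linear(predictor, [s + p for s, p in zip(summary, pos_embed[i])])
        for i in masked
    ]
    # Loss: MSE between predicted and target features, not pixels.
    return mse(predicted, target)


rng = random.Random(0)
num_patches, patch_dim, feat_dim = 16, 8, 4  # illustrative sizes only
patches = [[rng.random() for _ in range(patch_dim)] for _ in range(num_patches)]
visible, masked = split_patches(num_patches, mask_ratio=0.25, rng=rng)

f_context = random_matrix(feat_dim, patch_dim, rng)
f_target = random_matrix(feat_dim, patch_dim, rng)
predictor = random_matrix(feat_dim, feat_dim, rng)
pos_embed = [[rng.uniform(-0.1, 0.1) for _ in range(feat_dim)] for _ in range(num_patches)]

loss = ijepa_loss(patches, visible, masked, f_context, f_target, predictor, pos_embed)
print(f"{len(visible)} visible, {len(masked)} masked, feature-space loss = {loss:.4f}")
```

Note that `ijepa_loss` never touches the raw values of the masked patches directly: they enter only through `f_target`, which is the point of predicting "in meaning" rather than in pixels.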
