MedGemma-4B-IT GGUF (Multimodal)

This repository provides GGUF-formatted model files for google/medgemma-4b-it, designed for use with llama.cpp. MedGemma is a multimodal model based on Gemma-3, fine-tuned for the medical domain.

These GGUF files allow you to run the MedGemma model locally on your CPU, or offload layers to a GPU if supported by your llama.cpp build (e.g., Metal on macOS, CUDA on Linux/Windows).

For multimodal (vision) capabilities, you MUST use both a language model GGUF file AND the provided mmproj (multimodal projector) GGUF file.

Original Model: google/medgemma-4b-it

Files Provided

Below are the GGUF files available in this repository. It is recommended to use the F16 version of the mmproj file with any of the language model quantizations.

Language Model GGUFs:

  • medgemma-4b-it-F16.gguf:
    • Quantization: F16 (16-bit floating point)
    • Size: ~7.77 GB (Verify this with your actual file size)
    • Use: Highest precision, best quality, largest file size.
  • medgemma-4b-it-Q8_0.gguf:
    • Quantization: Q8_0
    • Size: ~4.13 GB (Verify this with your actual file size)
    • Use: Excellent balance between model quality and file size/performance.

Multimodal Projector GGUF (Required for Image Input):

  • mmproj-medgemma-4b-it-Q8_0.gguf:
    • Quantization: Q8_0
    • Size: ~591 MB
    • Use: This file is essential for image understanding. It should be used alongside any of the language model GGUF files listed above.
  • mmproj-medgemma-4b-it-F16.gguf:
    • Quantization: F16 (Recommended precision for projector)
    • Size: ~851 MB
    • Use: This file is essential for image understanding. It should be used alongside any of the language model GGUF files listed above.

How to use

Download a language model GGUF file and the matching mmproj file from this repository.
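One convenient option is the huggingface-cli tool. A minimal sketch, where <repo-id> is a placeholder for this repository's Hugging Face id:

```sh
# Download one language model GGUF plus the F16 mmproj file into ~/models.
# <repo-id> is a placeholder; substitute this repository's id.
huggingface-cli download <repo-id> medgemma-4b-it-F16.gguf --local-dir ~/models
huggingface-cli download <repo-id> mmproj-medgemma-4b-it-F16.gguf --local-dir ~/models
```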

Install llama.cpp (https://github.com/ggml-org/llama.cpp).
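On macOS, for example, Homebrew provides a prebuilt package; on other platforms, build from source as described in the llama.cpp README:

```sh
# One option on macOS; see the llama.cpp README for other platforms.
brew install llama.cpp
```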

Run the server via: llama-server -m ~/models/medgemma-4b-it-F16.gguf --mmproj ~/models/mmproj-medgemma-4b-it-F16.gguf -c 2048 --port 8080
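If your llama.cpp build supports GPU offload (e.g., Metal or CUDA), you can additionally move layers to the GPU with the -ngl flag. A minimal variant of the command above:

```sh
# Offload all model layers to the GPU; lower the number if VRAM is limited.
llama-server -m ~/models/medgemma-4b-it-F16.gguf \
  --mmproj ~/models/mmproj-medgemma-4b-it-F16.gguf \
  -c 2048 --port 8080 -ngl 99
```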

Then use the model. For an example visual-chat frontend, see https://github.com/kelkalot/medgemma-visual-chat.
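You can also query the server directly over HTTP. A minimal sketch using curl, assuming llama-server is exposing its OpenAI-compatible /v1/chat/completions endpoint on port 8080 (the image path and prompt below are placeholders):

```sh
# Base64-encode an image and send it with a text prompt to the server.
IMG_B64=$(base64 < /path/to/image.png | tr -d '\n')
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {
        "role": "user",
        "content": [
          {"type": "text", "text": "Describe the findings in this image."},
          {"type": "image_url", "image_url": {"url": "data:image/png;base64,'"$IMG_B64"'"}}
        ]
      }
    ]
  }'
```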
