Ophtimus: Ophthalmology-specific LLM
🤗 Models and Datasets | 📕 AAAI 2025 workshop Paper
Introduction
Ophtimus is an open-source large language model (LLM) specialized in ophthalmology, built with 8 billion parameters based on the LLaMA architecture. It was trained on carefully curated ophthalmology-specific data, including medical papers, textbooks, and research reports. Through filtering, summarization, and preprocessing, only the most relevant and high-quality information was retained.
Designed to be both lightweight and high-performing, Ophtimus is suitable for real-world applications such as clinical decision support, medical education, and patient communication. The model and its training pipeline are fully open-sourced, providing a practical reference for developing similar domain-specific LLMs in other areas of medicine.
GitHub Repository: github.com/jinkimh/Ophtimus-Ophthalmology-LLM
Dataset Details
Note: All datasets were either newly constructed or adapted for this project. Pre-training datasets were curated from open-source ophthalmology materials, while instruction-tuning and evaluation datasets were built by extracting only ophthalmology-relevant samples from broader medical corpora. All data underwent preprocessing steps including deduplication, language filtering (English only), and removal of any personally identifiable information (PII).
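The preprocessing steps above (deduplication, English-only filtering, PII removal) can be sketched roughly as below. This is an illustrative, simplified pipeline, not the project's actual code: the function names are hypothetical, the language filter is a crude ASCII heuristic, and only e-mail addresses are redacted as a PII example.

```python
import hashlib
import re

def deduplicate(docs):
    """Drop exact duplicates by hashing whitespace-normalized, lowercased text."""
    seen, unique = set(), []
    for doc in docs:
        key = hashlib.sha256(" ".join(doc.split()).lower().encode()).hexdigest()
        if key not in seen:
            seen.add(key)
            unique.append(doc)
    return unique

def looks_english(text, threshold=0.9):
    """Crude language filter: fraction of ASCII characters (real pipelines use a language-ID model)."""
    if not text:
        return False
    return sum(1 for c in text if ord(c) < 128) / len(text) >= threshold

EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

def strip_pii(text):
    """Redact e-mail addresses as one simple PII example."""
    return EMAIL_RE.sub("[REDACTED]", text)

docs = [
    "Glaucoma damages the optic nerve.",
    "Glaucoma damages the  optic nerve.",  # duplicate after normalization
    "Contact: author@example.com for the retina dataset.",
]
cleaned = [strip_pii(d) for d in deduplicate(docs) if looks_english(d)]
print(cleaned)
```

A production pipeline would typically add near-duplicate detection (e.g. MinHash) and a trained language-ID model, but the structure is the same.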
| Dataset name | Source | Size | Purpose | Key Features |
|---|---|---|---|---|
| Ophthalmology-pubmed-corpus [Link] | Ophthalmology papers | 18.4M tokens | Pre-training | • Map-reduce summarization • Broad ophthalmic keywords |
| Ophthalmology-textbook-corpus [Link] | Ophthalmology textbooks | 4M tokens | Pre-training | • Trusted medical sources • Rich in diagnostic cases |
| Ophthalmology MCQA Inst dataset [Link] | Ophthalmology docs | 51.7k QAs | Instruction-tuning | • Diverse multiple-choice formats • Reasoning included • Variety of ophthalmic topics |
| Ophthalmology EQA Inst dataset [Link] | Ophthalmology docs | 49.3k QAs | Instruction-tuning | • Variety of ophthalmic topics |
| Ophtimus-Eval-Dataset [Link] | Medical platform data | 2,153 QAs | Evaluation | • Expert-verified data • MCQA dataset |
| PubMedQA-ophthal-Dataset [Link] | PubMedQA | 297 QAs | Evaluation | • Ophthalmology domain filtered • True/False MCQA dataset |
| MedMCQA-Ophthal-Dataset [Link] | MedMCQA | 6,932 QAs | Evaluation | • Ophthalmology domain filtered • MCQA dataset |
| EQAEval-Dataset [Link] | MedQuAD, others | 1,389 QAs | Evaluation | • Diverse open-source datasets • Ophthalmology domain filtered • Essay QA |
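To make the "reasoning included" feature of the MCQA instruction data concrete, here is a hedged sketch of what one sample might look like. The field names and content below are hypothetical illustrations, not the released dataset's actual schema:

```python
# Hypothetical shape of one MCQA instruction-tuning sample;
# the actual field names in the released dataset may differ.
sample = {
    "question": "Which condition is characterized by increased intraocular pressure?",
    "options": {"A": "Cataract", "B": "Glaucoma", "C": "Uveitis", "D": "Keratoconus"},
    "answer": "B",
    "reasoning": (
        "Glaucoma is typically associated with elevated intraocular pressure "
        "leading to optic nerve damage."
    ),
}
print(sample["answer"])
```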
Model Details
Note: The "Pre-training" and "Instruction-tuning" columns in the table refer to the training performed in this project. The base models had already undergone pre-training and/or fine-tuning before this project; we applied transfer learning on top of them.
| Model name | Base model | Parameters | Pre-training | Instruction-tuning |
|---|---|---|---|---|
| Ophtimus-Base [Link] | Llama-3.1-8B | 8B | ✅ | ❌ |
| Ophtimus-Llama-1B [Link] | Llama-3.2-1B-Instruct | 1B | ❌ | ✅ |
| Ophtimus-Llama-3B [Link] | Llama-3.2-3B-Instruct | 3B | ❌ | ✅ |
| Ophtimus-Llama-8B [Link] | Llama-3.1-8B-Instruct | 8B | ❌ | ✅ |
| Ophtimus-Instruct-8B [Link] | Ophtimus-Base | 8B | ✅ | ✅ |
Performance
Note: Multi-Choice QA: Ophtimus-Eval, MedMCQA, PubMedQA | Essay QA: MedQuAD, Medical Flashcards, Medical Wikidoc
Ophtimus-Eval is a proprietary dataset collected from a medical platform. The others are established medical benchmark datasets, from which only ophthalmology-related QA pairs were extracted for evaluation.
| Model | Ophtimus Eval | MedMCQA (Ophth) | PubmedQA (Ophth) | RougeL | BLEU | METEOR | SemScore |
|---|---|---|---|---|---|---|---|
| OpenAI GPT-4o | 71.95% | 81.95% | 89.90% | 0.193 | 0.082 | 0.341 | 0.761 |
| Llama-3-8B-Instruct | 48.60% | 74.02% | 63.97% | 0.193 | 0.064 | 0.244 | 0.684 |
| Llama-3.1-8B-Instruct | 39.78% | 57.96% | 83.84% | 0.177 | 0.054 | 0.215 | 0.641 |
| Eye-Llama | 32.56% | 59.43% | 66.11% | 0.183 | 0.062 | 0.211 | 0.686 |
| PMC-Llama-13B | 48.28% | 63.45% | 72.48% | 0.223 | 0.082 | 0.288 | 0.714 |
| Ophtimus-Llama-1B | 41.45% | 45.74% | 61.95% | 0.219 | 0.076 | 0.217 | 0.711 |
| Ophtimus-Llama-3B | 52.70% | 62.10% | 69.36% | 0.224 | 0.077 | 0.225 | 0.726 |
| Ophtimus-Llama-8B | 60.78% | 68.25% | 69.70% | 0.226 | 0.083 | 0.230 | 0.733 |
| Ophtimus-Instruct-8B | 63.85% | 71.51% | 72.73% | 0.222 | 0.079 | 0.224 | 0.735 |
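For readers unfamiliar with the essay-question metrics, ROUGE-L scores the longest common subsequence of tokens shared between a model answer and the reference. A simplified, stdlib-only sketch (whitespace tokenization, no stemming, unlike the library implementations typically used for reported numbers):

```python
def lcs_length(a, b):
    """Longest common subsequence length via dynamic programming."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]

def rouge_l_f1(reference, candidate):
    """ROUGE-L F1 over whitespace tokens (simplified illustration)."""
    ref, cand = reference.split(), candidate.split()
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(cand), lcs / len(ref)
    return 2 * precision * recall / (precision + recall)

score = rouge_l_f1(
    "epiretinal membrane causes blurred and distorted central vision",
    "epiretinal membrane causes distorted central vision",
)
print(round(score, 3))  # → 0.857
```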
Quickstart
Install Dependencies
```shell
git clone https://github.com/jinkimh/Ophtimus-Ophthalmology-LLM.git
cd Ophtimus-Ophthalmology-LLM
pip install -r requirements.txt
```
Ophtimus Inference
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# model_name examples: BaekSeungJu/Ophtimus-Instruct-8B, Ophtimus-Llama-1B,
# Ophtimus-Llama-3B, or Ophtimus-Llama-8B
model_name = "BaekSeungJu/Ophtimus-Instruct-8B"

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
).to("cuda")

# Left padding so that batched generation continues directly from each prompt
tokenizer = AutoTokenizer.from_pretrained(model_name, padding_side="left")
tokenizer.pad_token = tokenizer.eos_token

system_instruction = (
    "You are an expert ophthalmologist. Please provide accurate and "
    "medically sound answers to the user's ophthalmology-related question."
)

# Enter your questions in the list
questions = [
    "Please describe the symptoms and treatment of epiretinal membrane.",
    "What's good for eyes?",
]

# Build chat-formatted prompts for each question
prompts = []
for question in questions:
    messages = [
        {"role": "system", "content": system_instruction},
        {"role": "user", "content": question},
    ]
    prompt = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, tokenize=False
    )
    prompts.append(prompt)

# Tokenize the batch; keep the attention mask so padded positions are ignored
inputs = tokenizer(prompts, padding=True, return_tensors="pt").to("cuda")

model.eval()
with torch.no_grad():
    outputs = model.generate(
        input_ids=inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        max_new_tokens=1024,
        do_sample=False,
    )

decoded = tokenizer.batch_decode(outputs, skip_special_tokens=True)
for i, text in enumerate(decoded):
    print(f"------------------------\nAnswer for question {i+1}:\n{text}")
```
For more details, visit the GitHub repository.