Medical AI Research
1. Core Vision
The overall research vision is to build an "End-to-End Clinical Reasoning Pipeline" for medical imaging: one that goes beyond simple classification to quantify disease, generate structured clinical representations, and feed them into LLM-based clinical reasoning.
This vision has the following core goals:
- Medical images → lesion segmentation → quantification → prediction models driven by quantitative metrics
- Quantitative metrics + images + text reports → a multimodal-LLM-based AI Doctor Assistant
- Strengthened safety and reliability: formal verification of models, safety-neuron analysis, and mechanistic interpretability
2. Research Theme A: Medical Image Quantification & Disease Modeling
This research line moves beyond conventional CNN-based classification to quantify disease progression and produce clinically interpretable continuous biomarkers.
A.1. Ophthalmology (quantification from ophthalmic imaging)
- Epiretinal membrane (ERM), diabetic retinopathy, macular disease, etc.
- From fundus photographs / OCT B-scans:
- Lesion segmentation (membrane, retinal layers, cyst regions)
- Quantitative metrics such as thickness maps, curvature, and reflectance profiles (see the sketch after this list)
- Disease staging and progression-prediction models built on those quantitative metrics
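To make the quantification step concrete, here is a minimal sketch of a column-wise thickness map computed from a binary segmentation volume; the function name, array layout, and the 3.9 µm axial resolution are illustrative assumptions, not values from a specific device.
```python
import numpy as np

def thickness_map(seg_volume: np.ndarray, axial_res_um: float = 3.9) -> np.ndarray:
    """Column-wise thickness from a binary segmentation volume.

    seg_volume: (n_bscans, depth, width) boolean array, True where the
    retina (or membrane) was segmented. axial_res_um is the axial pixel
    spacing (placeholder value). Returns an (n_bscans, width) en face
    thickness map in micrometers.
    """
    # Count segmented pixels along the depth axis of each A-scan column,
    # then convert the pixel count to physical thickness.
    return seg_volume.sum(axis=1) * axial_res_um

# Toy example: a 2-slice volume containing a 30-pixel-thick band.
vol = np.zeros((2, 496, 512), dtype=bool)
vol[:, 200:230, :] = True
print(thickness_map(vol).shape, thickness_map(vol).max())  # (2, 512) 117.0
```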
A.2. Gait / Orthopedics (orthopedic gait analysis)
- Markerless video (pose estimation) → biomechanical features → gait anomaly quantification (a feature-extraction sketch follows this list)
- Diagnosis and progression models driven by quantitative features rather than ordinal clinical grading
- Extensible to pediatric and elderly imbalance assessment
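As one example of turning pose estimates into biomechanical features, the sketch below derives a per-frame knee flexion angle from 2D keypoints; the keypoint names and array shapes are assumptions about the upstream pose estimator.
```python
import numpy as np

def knee_flexion_deg(hip: np.ndarray, knee: np.ndarray, ankle: np.ndarray) -> np.ndarray:
    """Per-frame knee angle (degrees) from (n_frames, 2) keypoint arrays."""
    thigh = hip - knee
    shank = ankle - knee
    cos = (thigh * shank).sum(axis=1) / (
        np.linalg.norm(thigh, axis=1) * np.linalg.norm(shank, axis=1)
    )
    # Clip guards against floating-point values slightly outside [-1, 1].
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

# Single-frame toy example; a gait feature could then be, e.g., the
# per-cycle range of motion of this angle series.
hip, knee, ankle = np.array([[0.0, 0.0]]), np.array([[0.0, 1.0]]), np.array([[0.5, 2.0]])
print(knee_flexion_deg(hip, knee, ankle))  # ~[153.4]
```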
A.3. Multi-modal structured data integration
Integration of images, quantitative features, EMR records, and laboratory values.
Final goal: building a disease-progression world model.
3. Research Theme B: Domain-Specialized Medical LLMs (Ophtimus-V2 family)
This research line centers on Ophtimus-V2-Tx, an ophthalmology-specialized LLM we developed ourselves.
B.1. Clinical reasoning models
- Fine-tuning on case reports
- Learning the "clinical knowledge pathway" that runs from symptoms to imaging to diagnosis to treatment
- LoRA and structured-LoRA experiments aimed at reducing hallucination and strengthening safety (a minimal setup sketch follows)
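A minimal sketch of such a LoRA setup with Hugging Face PEFT; the rank, target modules, and base checkpoint are illustrative choices, not the exact training recipe.
```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("BaekSeungJu/Ophtimus-Instruct-8B")
lora_cfg = LoraConfig(
    r=16,                                 # low-rank update dimension (assumed)
    lora_alpha=32,                        # adapter scaling factor
    target_modules=["q_proj", "v_proj"],  # attach adapters to attention projections
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_cfg)
model.print_trainable_parameters()  # only adapter weights remain trainable
```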
B.2. Multi-modal input extension
- Fundus / OCT (B-scan) embeddings + structured quantification + textual descriptions
- Can further be coupled with a medical world model to drive a progression simulator
B.3. Safety & Trustworthiness
- "Safety Neurons" 분석
- Mechanistic interpretability (circuit-level patterns in reasoning)
- Clinically harmful output 검출 및 unlearning
4. Research Theme C: Formal Verification + AI Safety for Medical AI
An independent research line combining formal methods with AI safety, targeting the reliability of medical AI and regulatory requirements such as medical-device approval.
C.1. Verified Environment Models
- Timed-automata models of medical processes
- Verification of safety constraints via model checking (PCTL, CTL, TCTL)
- A control shield that keeps reinforcement learning or AI inference from violating these constraints (sketched after this list)
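As a minimal illustration of the shield idea, the sketch below checks each AI-proposed action against a safety predicate and substitutes a conservative fallback on violation; the predicate, threshold, and action names are all hypothetical.
```python
from typing import Callable

def shield(action: str, state: dict,
           is_safe: Callable[[str, dict], bool],
           fallback: str = "defer_to_clinician") -> str:
    """Allow the proposed action only if the verified safety predicate
    holds in the current state; otherwise fall back conservatively."""
    return action if is_safe(action, state) else fallback

# Hypothetical predicate: never recommend an intervention when the
# quantified ERM thickness is below an (assumed) verified threshold.
def is_safe(action: str, state: dict) -> bool:
    if action == "recommend_injection":
        return state.get("erm_thickness_um", 0.0) >= 50.0
    return True

print(shield("recommend_injection", {"erm_thickness_um": 20.0}, is_safe))
# -> defer_to_clinician
```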
C.2. Verified AI Controllers
- Enforcing safety properties on the medical AI inference pipeline
- Analyses that verify when, and on which inputs, a dangerous output can occur
- Verification-aware fine-tuning or pruning
C.3. Trustworthy Data & Contamination Check
- Detecting LLM-assisted cheating in crowd annotation (peer-prediction based; see the sketch below)
- Securing the reliability of medical data labels
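A toy version of the peer-prediction signal: an annotator's agreement with a peer on aligned items, minus agreement on randomly misaligned items. This is a simplified correlated-agreement proxy, not the exact mechanism used.
```python
import numpy as np

def peer_score(a: np.ndarray, b: np.ndarray, seed: int = 0) -> float:
    """Agreement with a peer on the same items, minus chance-level
    agreement estimated by breaking the item alignment."""
    rng = np.random.default_rng(seed)
    aligned = (a == b).mean()
    misaligned = (a == rng.permutation(b)).mean()
    return float(aligned - misaligned)

# Honest annotators correlate item-by-item; pasted LLM output that ignores
# the items tends to score near zero under this statistic.
labels_a = np.array([0, 1, 1, 2, 0, 1])
labels_b = np.array([0, 1, 1, 2, 1, 1])
print(round(peer_score(labels_a, labels_b), 3))
```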
5. Research Theme D: Medical World Models & Embodied AI
A research direction directly aligned with core NeurIPS 2025 trends ("World Models", "Embodied AI for Healthcare").
D.1. Disease Progression World Model
- A generative world model of retina / ERM progression dynamics
- Temporal latent dynamics learned from longitudinal OCT B-scan sequences
- Enables counterfactual simulation such as "if the patient's state is X, what will the OCT look like in six months?" (a latent-rollout sketch follows)
D.2. Multi-modal Clinical Simulator
- Incorporates images, quantitative biomarkers, text reports, and treatment history
- Provides the LLM with structured clinical-simulation context
- Maximizes clinical decision support
D.3. Reinforcement Learning in Verified Clinical Simulation
- When training directly on real-world care is prohibited,
- safe RL can be applied on top of the verified world model
- Extensible to research on optimizing treatment-planning or screening policies
6. Research Theme E: Foundations for AI-Driven Clinical Decision Support
Integrates all of the axes above (A-D) to support the ultimate medical-AI goal of automating clinical reasoning.
E.1. Image → Biomarker → Reasoner → Recommendation
- Building a fully end-to-end pipeline (sketched below)
- Image-based quantification feeds directly into the structured input of LLM reasoning
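A skeletal version of that wiring, with stub components standing in for the real models (all names and values hypothetical):
```python
def segment(oct_volume):
    """Stub for a trained lesion-segmentation network."""
    ...

def quantify(mask) -> dict:
    """Stub for the quantification stage; returns structured biomarkers."""
    return {"erm_thickness_um": 85.0, "erm_area_mm2": 4.2}  # illustrative values

def to_prompt(biomarkers: dict, report: str) -> str:
    """Serialize biomarkers + report into a structured LLM prompt."""
    facts = "; ".join(f"{k} = {v}" for k, v in biomarkers.items())
    return (f"Quantified findings: {facts}\n"
            f"Clinical report: {report}\n"
            "Recommend next steps.")

prompt = to_prompt(quantify(segment(None)),
                   "Patient reports metamorphopsia in the right eye.")
print(prompt)  # this string would be passed to Ophtimus (see Quickstart below)
```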
E.2. Multi-lingual / Multi-institution Generalization
- Built on data collaborations with institutions in Korea, the US (UPenn), and elsewhere
- Research on robustness and distribution shift
E.3. Regulatory-readiness
- Reliability metrics (specificity, sensitivity, FN-critical tasks; computed as in the sketch below)
- Medical-AI documentation structured as a "safety case"
7. Overall Theme Summary (One-page Executive Summary)
This Medical AI research goes beyond simple image classification and focuses on building the following integrated research ecosystem.
- Disease quantification technology
  - Image-based lesion analysis, quantification, and progression modeling
- Clinically specialized LLM development (Ophtimus-V2-Tx)
  - Ophthalmology-expert reasoning models
  - Multi-modal processing (OCT/fundus + EMR + biomarkers)
- AI safety & formal verification
  - Guaranteed safety constraints for medical AI
  - Verified environment + verified inference
- World-model-based clinical simulation
  - Disease-progression simulation
  - A foundation for LLM clinical decision reasoning
- An overall medical decision-support system
  - End-to-end from data → image → quantification → LLM → decision
Key Themes
- Domain-specialized LLMs for ophthalmology (e.g., Ophtimus-V2-Tx)
- Noise-robust medical image analysis and quantification
- Reliable mapping from model outputs to clinical coding systems
- Evaluation frameworks for safety, robustness, and explainability
Selected Projects
- Ophtimus-V2-Tx: An 8B-parameter ophthalmology LLM trained on case reports and evaluated with CliBench-based coding.
- ERM Quantification: Low-cost and fast SD-OCT-based epiretinal membrane detection and thickness quantification.
Ophtimus: Ophthalmology-specific LLM
🤗 Models and Datasets | 📕 AAAI 2025 workshop Paper
Introduction
Ophtimus is an open-source large language model (LLM) specialized in ophthalmology, built with 8 billion parameters based on the LLaMA architecture. It was trained on carefully curated ophthalmology-specific data, including medical papers, textbooks, and research reports. Through filtering, summarization, and preprocessing, only the most relevant and high-quality information was retained.
Designed to be both lightweight and high-performing, Ophtimus is suitable for real-world applications such as clinical decision support, medical education, and patient communication. The model and its training pipeline are fully open-sourced, providing a practical reference for developing similar domain-specific LLMs in other areas of medicine.
GitHub Repository: github.com/jinkimh/Ophtimus-Ophthalmology-LLM
Dataset Details
Note: All datasets were either newly constructed or adapted for this project. Pre-training datasets were curated from open-source ophthalmology materials, while instruction-tuning and evaluation datasets were built by extracting only ophthalmology-relevant samples from broader medical corpora. All data underwent preprocessing steps including deduplication, language filtering (English only), and removal of any personally identifiable information (PII).
| Dataset name | Source | Size | Purpose | Key Features |
|---|---|---|---|---|
| Ophthalmology-pubmed-corpus [Link] | Ophthalmology papers | 18.4M tokens | Pre-training | Map-reduce summarization; broad ophthalmic keywords |
| Ophthalmology-textbook-corpus [Link] | Ophthalmology textbooks | 4M tokens | Pre-training | Trusted medical sources; rich in diagnostic cases |
| Ophthalmology MCQA Inst dataset [Link] | Ophthalmology docs | 51.7k QAs | Inst-tuning | Diverse multiple-choice formats; reasoning included; variety of ophthalmic topics |
| Ophthalmology EQA Inst dataset [Link] | Ophthalmology docs | 49.3k QAs | Inst-tuning | Variety of ophthalmic topics |
| Ophtimus-Eval-Dataset [Link] | Medical platform data | 2,153 QAs | Evaluation | Expert-verified data; MCQA |
| PubMedQA-ophthal-Dataset [Link] | PubMedQA | 297 QAs | Evaluation | Ophthalmology-filtered; true/false MCQA |
| MedMCQA-Ophthal-Dataset [Link] | MedMCQA | 6,932 QAs | Evaluation | Ophthalmology-filtered; MCQA |
| EQAEval-Dataset [Link] | MedQuAD, others | 1,389 QAs | Evaluation | Diverse open-source datasets; ophthalmology-filtered; essay QA |
Model Details
Note: The "pre-training" and "fine-tuning" columns in the table refer to the training performed in this project. The base models had already undergone pre-training and/or fine-tuning prior to this project, and we applied transfer learning using those models.
| Model name | Base model | Parameters | Pre-training | Instruction-tuning |
|---|---|---|---|---|
| Ophtimus-Base [Link] | Llama-3.1-8B | 8B | ✅ | ❌ |
| Ophtimus-Llama-1B [Link] | Llama-3.2-1B-Instruct | 1B | ❌ | ✅ |
| Ophtimus-Llama-3B [Link] | Llama-3.2-3B-Instruct | 3B | ❌ | ✅ |
| Ophtimus-Llama-8B [Link] | Llama-3.1-8B-Instruct | 8B | ❌ | ✅ |
| Ophtimus-Instruct-8B [Link] | Ophtimus-Base | 8B | ✅ | ✅ |
Performance
Note: Multi-Choice QA: Ophtimus-Eval, MedMCQA, PubMedQA | Essay QA: MedQuAD, Medical Flashcards, Medical Wikidoc
Ophtimus-Eval is a proprietary dataset collected from a medical platform. The others are established medical benchmark datasets, from which only ophthalmology-related QA pairs were extracted for evaluation.
The first three columns report multiple-choice accuracy; the last four are essay-QA text-similarity metrics.
| Model | Ophtimus-Eval | MedMCQA (Ophth) | PubMedQA (Ophth) | RougeL | BLEU | METEOR | SemScore |
|---|---|---|---|---|---|---|---|
| OpenAI GPT-4o | 71.95% | 81.95% | 89.90% | 0.193 | 0.082 | 0.341 | 0.761 |
| Llama-3-8B-Instruct | 48.60% | 74.02% | 63.97% | 0.193 | 0.064 | 0.244 | 0.684 |
| Llama-3.1-8B-Instruct | 39.78% | 57.96% | 83.84% | 0.177 | 0.054 | 0.215 | 0.641 |
| Eye-Llama | 32.56% | 59.43% | 66.11% | 0.183 | 0.062 | 0.211 | 0.686 |
| PMC-Llama-13B | 48.28% | 63.45% | 72.48% | 0.223 | 0.082 | 0.288 | 0.714 |
| Ophtimus-Llama-1B | 41.45% | 45.74% | 61.95% | 0.219 | 0.076 | 0.217 | 0.711 |
| Ophtimus-Llama-3B | 52.70% | 62.10% | 69.36% | 0.224 | 0.077 | 0.225 | 0.726 |
| Ophtimus-Llama-8B | 60.78% | 68.25% | 69.70% | 0.226 | 0.083 | 0.230 | 0.733 |
| Ophtimus-Instruct-8B | 63.85% | 71.51% | 72.73% | 0.222 | 0.079 | 0.224 | 0.735 |
Quickstart
Install Dependencies
```bash
cd Ophtimus-Ophthalmology-LLM
pip install -r requirements.txt
```
Ophtimus Inference
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# model_name examples: BaekSeungJu/Ophtimus-Instruct-8B, Ophtimus-Llama-1B,
# Ophtimus-Llama-3B, or Ophtimus-Llama-8B
model_name = "BaekSeungJu/Ophtimus-Instruct-8B"
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
).to("cuda")

# Left padding so generation continues from the end of each prompt.
tokenizer = AutoTokenizer.from_pretrained(model_name, padding_side="left")
tokenizer.pad_token = tokenizer.eos_token  # Llama tokenizers ship without a pad token

system_instruction = (
    "You are an expert ophthalmologist. Please provide accurate and "
    "medically sound answers to the user's ophthalmology-related question."
)

# Enter your questions in the list
questions = [
    "Please describe the symptoms and treatment of epiretinal membrane.",
    "What's good for eyes?",
]

prompts = []
for question in questions:
    messages = [
        {"role": "system", "content": system_instruction},
        {"role": "user", "content": question},
    ]
    prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)
    prompts.append(prompt)

# Keep the attention mask: with a padded batch, generate() needs it to
# distinguish real tokens from padding.
inputs = tokenizer(prompts, padding=True, return_tensors="pt").to("cuda")

model.eval()
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=1024,
        do_sample=False,  # greedy decoding for reproducible answers
    )

decoded = tokenizer.batch_decode(outputs, skip_special_tokens=True)
for i, text in enumerate(decoded):
    print(f"------------------------\nAnswer for question {i+1}:\n{text}")
```
For more details, visit the GitHub repository.
Ophtimus-V2-TX
To be Updated
SD-OCT-based Epiretinal Membrane Diagnostic Assistant System
Introduction
This project presents a low-cost and efficient method for detecting and quantifying Epiretinal Membranes (ERM) using Spectral-Domain Optical Coherence Tomography (SD-OCT). By applying deep learning techniques, specifically YOLO object detection, we generate en face "ERM Projection Images" from B-scan data, enabling intuitive visualization and accurate measurement of ERM areas. The method also introduces a novel approach to quantifying the association between ERM and retinal thickness, enhancing clinical decision-making. Our approach aims to bridge the diagnostic performance gap between SD-OCT and Swept-Source OCT (SS-OCT) while maintaining accessibility and reducing diagnostic burden.
Figure: Overall pipeline architecture for ERM detection & quantification.
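To illustrate the projection step, the sketch below rasterizes per-B-scan detection spans into a binary en face map; the detection format and pixel-spacing handling are assumptions about the pipeline, not its exact implementation.
```python
import numpy as np

def erm_projection(detections, n_bscans: int, width: int) -> np.ndarray:
    """Binary en face 'ERM Projection Image' from per-B-scan detections.

    detections[i] is a list of (x1, x2) column spans detected on B-scan i
    (assumed format). Rows index B-scans, columns index A-scan positions.
    """
    proj = np.zeros((n_bscans, width), dtype=np.uint8)
    for i, spans in enumerate(detections):
        for x1, x2 in spans:
            proj[i, int(x1):int(x2)] = 1  # mark ERM extent along this scan line
    return proj

# ERM area then follows from the en-face pixel spacing, e.g.
# area_mm2 = proj.sum() * dx_mm * dy_mm
proj = erm_projection([[(100, 300)], [(120, 280)]], n_bscans=2, width=512)
print(proj.sum())  # 360 positive en-face pixels
```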
YOLO Model Evaluation
We evaluated three YOLO-based models (v5, v8, v11) for ERM detection using SD-OCT B-scan images.
Each model was trained on two datasets (2,200 images for Full, 1,100 images for Half) and tested on 650 expert-labeled images.
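For orientation, detections from a trained checkpoint could be pulled through the Ultralytics API roughly as follows; the weights file is a hypothetical fine-tuned model, not a published checkpoint.
```python
from ultralytics import YOLO

model = YOLO("erm_yolo_finetuned.pt")    # hypothetical ERM-trained weights
results = model.predict("bscan_001.png", conf=0.25)

for r in results:
    for box in r.boxes:                  # one box per detected ERM region
        x1, y1, x2, y2 = box.xyxy[0].tolist()
        print(f"ERM span: columns {x1:.0f}-{x2:.0f}, confidence {float(box.conf):.2f}")
```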
| Model | Size | Params (M) | Precision | Recall | mAP@50 | mAP@50:95 | Training Set |
|---|---|---|---|---|---|---|---|
| YOLOv5 | S | 7.02 | 0.752 | 0.703 | 0.722 | 0.423 | Full |
| YOLOv5 | S | 7.02 | 0.694 | 0.642 | 0.664 | 0.376 | Half |
| YOLOv5 | M | 20.87 | 0.783 | 0.734 | 0.752 | 0.444 | Full |
| YOLOv5 | M | 20.87 | 0.723 | 0.685 | 0.701 | 0.396 | Half |
| YOLOv5 | L | 46.14 | 0.813 | 0.762 | 0.784 | 0.463 | Full |
| YOLOv5 | L | 46.14 | 0.745 | 0.704 | 0.726 | 0.414 | Half |
| YOLOv5 | X | 86.22 | 0.836 | 0.784 | 0.802 | 0.485 | Full |
| YOLOv5 | X | 86.22 | 0.763 | 0.725 | 0.743 | 0.437 | Half |
| YOLOv8 | S | 11.14 | 0.781 | 0.736 | 0.764 | 0.447 | Full |
| YOLOv8 | S | 11.14 | 0.723 | 0.676 | 0.701 | 0.393 | Half |
| YOLOv8 | M | 25.86 | 0.813 | 0.762 | 0.791 | 0.466 | Full |
| YOLOv8 | M | 25.86 | 0.748 | 0.705 | 0.724 | 0.412 | Half |
| YOLOv8 | L | 43.63 | 0.844 | 0.792 | 0.823 | 0.482 | Full |
| YOLOv8 | L | 43.63 | 0.774 | 0.731 | 0.754 | 0.436 | Half |
| YOLOv8 | X | 68.15 | 0.867 | 0.814 | 0.842 | 0.504 | Full |
| YOLOv8 | X | 68.15 | 0.793 | 0.752 | 0.772 | 0.454 | Half |
| YOLOv11 | S | 9.43 | 0.804 | 0.752 | 0.783 | 0.468 | Full |
| YOLOv11 | S | 9.43 | 0.746 | 0.692 | 0.714 | 0.417 | Half |
| YOLOv11 | M | 20.05 | 0.846 | 0.794 | 0.821 | 0.493 | Full |
| YOLOv11 | M | 20.05 | 0.774 | 0.736 | 0.757 | 0.443 | Half |
| YOLOv11 | L | 25.31 | 0.873 | 0.823 | 0.854 | 0.524 | Full |
| YOLOv11 | L | 25.31 | 0.807 | 0.773 | 0.793 | 0.476 | Half |
| YOLOv11 | X | 56.87 | 0.902 | 0.857 | 0.882 | 0.556 | Full |
| YOLOv11 | X | 56.87 | 0.836 | 0.803 | 0.826 | 0.507 | Half |
GitHub repository: github.com/jinkimh/SD-OCT-ERM-Quantification
Gait Anomaly Detection
To be Updated