Projects
PhotoPrism MLOps: Semantic Photo Search with Continuous Learning
Spring 2026 | MLOps Systems Project | GitHub | Deep dive →
- Built a self-hosted AI photo search engine on Chameleon Cloud Kubernetes combining CLIP ViT-B/32 retrieval, Qdrant HNSW indexing, and a Qwen2-VL-2B multimodal reranker that continuously fine-tunes from implicit user click feedback
- Designed and published Flickr30K-CFQ, a training dataset of 31,783 images with 5 query types per image (raw sentence, paraphrase, fragment, phrase, tag) addressing the caption-to-keyword distribution mismatch in standard retrieval benchmarks
- Adapted the POLAR (CVPR 2025) paradigm to the reranking stage: frozen CLIP handles first-stage ANN retrieval while LoRA adapters (r=8, ~0.2% trainable params) on Qwen2-VL-2B rescore candidates; retraining auto-triggers at 100 click events and redeploys in minutes via Docker
- Engineered an async ingest pipeline using FOR UPDATE SKIP LOCKED for parallel feature-worker scaling and a 5-stage semantic search stack with graceful fallback, keeping end-to-end latency under 400 ms with GPU reranking active
Stack: Python, PyTorch, CLIP, Qwen2-VL-2B, LoRA (PEFT), Qdrant, Kubernetes, Chameleon Cloud, MLflow, FastAPI, Prometheus
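The click-driven retraining trigger described above can be sketched as a small buffer that fires a callback once enough events accumulate. This is a minimal illustration, not the project's actual code: the `ClickBuffer` class, its `record` method, and the `(query, image_id)` event shape are hypothetical names chosen for the example; only the 100-event threshold comes from the description.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Threshold taken from the project description (retrain at 100 click events).
RETRAIN_THRESHOLD = 100

# Hypothetical event shape: (query text, clicked image id).
ClickEvent = Tuple[str, str]


@dataclass
class ClickBuffer:
    """Accumulates click events and hands a full batch to a retraining callback."""

    on_retrain: Callable[[List[ClickEvent]], None]
    threshold: int = RETRAIN_THRESHOLD
    _events: List[ClickEvent] = field(default_factory=list)

    def record(self, query: str, image_id: str) -> bool:
        """Store one click; return True if this click triggered a retrain."""
        self._events.append((query, image_id))
        if len(self._events) < self.threshold:
            return False
        # Swap in a fresh buffer before calling out, so new clicks are not lost
        # if the callback is slow.
        batch, self._events = self._events, []
        self.on_retrain(batch)
        return True
```

In a deployed system the callback would presumably enqueue a fine-tuning job rather than run it inline, so the request path stays fast.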
Low-Field to High-Field MRI Super-Resolution with Task-Adaptive Transformers
Spring 2026 | Independent Neuroinformatics Project | GitHub
- Independently designed a full medical imaging pipeline to enhance 64 mT MRI scans into 3 T-like images using a transformer-based AMIR architecture
- Built preprocessing and augmentation pipelines, including slice-to-volume reconstruction, spatial resampling, and synthetic low-field generation from external IXI datasets
- Trained a 22M-parameter transformer model on ~200 paired subjects, achieving a mean test PSNR of 18.64 dB (max 43.21 dB) and an SSIM of 0.544
Stack: Python, PyTorch, Nibabel, NumPy, SciPy, CUDA
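For readers unfamiliar with the PSNR figures above, the metric is 10·log10(range² / MSE) between a reference image and a reconstruction. The sketch below is a generic definition, not the project's evaluation script: the function name `psnr`, the flat-sequence input, and the assumption that intensities are normalized to a `data_range` of 1.0 are all illustrative. It is kept to the standard library; a real pipeline would more likely operate on NumPy arrays.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[float], estimate: Sequence[float],
         data_range: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB between two equally sized images.

    Images are passed as flat sequences of intensities; data_range is the
    assumed span between the minimum and maximum possible intensity.
    """
    if len(reference) != len(estimate) or not reference:
        raise ValueError("inputs must be non-empty and the same length")
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * math.log10(data_range ** 2 / mse)
```

Under this definition, a single image at 18.64 dB has a root-mean-square error of about 0.117 of the intensity range (10^(-18.64/20)).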
Chinchilla-Optimal Transformer Pre-training for Music
Fall 2025 | Independent ML Systems Project | GitHub
- Independently designed and trained decoder-only Transformer models (NanoGPT) on the Lakh MIDI Dataset
- Optimized training throughput on NVIDIA H100 GPUs using BFloat16 mixed precision, Flash Attention, and torch.compile
- Achieved a test perplexity of 2.20 with 100% syntactically valid output
Stack: Python, PyTorch, CUDA, Flash Attention
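As context for the perplexity figure above: perplexity is the exponential of the mean per-token negative log-likelihood, so a value of 2.20 corresponds to an average loss of roughly ln(2.20) ≈ 0.79 nats per token. The helper below is a generic sketch of that relationship; the function name `perplexity` and the list-of-losses input are assumptions for illustration, not the project's evaluation code.

```python
import math
from typing import Sequence


def perplexity(token_nlls: Sequence[float]) -> float:
    """Perplexity from per-token negative log-likelihoods measured in nats."""
    if not token_nlls:
        raise ValueError("need at least one token loss")
    mean_nll = sum(token_nlls) / len(token_nlls)
    return math.exp(mean_nll)
```

For example, a model that spreads probability uniformly over four tokens at every step has a per-token loss of ln 4 and therefore a perplexity of 4.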
Toy Load Balancer with Consistent Hashing
2024 | Systems Engineering Project | GitHub
- Built a custom load balancer implementing consistent hashing to distribute traffic across dynamic server nodes
- Containerized the entire architecture (API Gateway, Nodes, Analytics) using Docker Compose for easy deployment
- Implemented a management API to dynamically add/remove servers and visualize request rebalancing in real time
Stack: Docker, Python, Node.js, Shell
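The consistent hashing idea behind the load balancer can be sketched with a sorted ring of hashed virtual nodes. This is a minimal illustration under stated assumptions, not the project's implementation: the `HashRing` class, the SHA-256 hash, and the default of 100 virtual nodes per server are choices made for the example.

```python
import bisect
import hashlib
from typing import Iterable, List, Tuple


def _hash(key: str) -> int:
    """Map a string to a 64-bit position on the ring (SHA-256 is an assumption)."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


class HashRing:
    """Consistent hash ring with virtual nodes for smoother load spread."""

    def __init__(self, nodes: Iterable[str] = (), vnodes: int = 100) -> None:
        self.vnodes = vnodes
        self._ring: List[Tuple[int, str]] = []  # sorted (position, node) pairs
        for node in nodes:
            self.add(node)

    def add(self, node: str) -> None:
        """Place vnodes points for this server on the ring."""
        for i in range(self.vnodes):
            bisect.insort(self._ring, (_hash(f"{node}#{i}"), node))

    def remove(self, node: str) -> None:
        """Drop every point belonging to this server."""
        self._ring = [(h, n) for h, n in self._ring if n != node]

    def lookup(self, key: str) -> str:
        """Return the server owning the first point at or after the key's hash."""
        if not self._ring:
            raise LookupError("ring is empty")
        idx = bisect.bisect_left(self._ring, (_hash(key), ""))
        return self._ring[idx % len(self._ring)][1]  # wrap around past the end
```

The property that makes this useful for dynamic server pools is that adding a server only moves keys onto the new server; keys never shuffle between the existing ones.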