Long-Context LLM Token-Pruning
In this project, I’m researching how to make large language models more efficient by pruning unimportant tokens during inference. I design and test token-pruning strategies that use signals such as attention patterns and gradient-based salience to decide which parts of the context can be dropped without degrading output quality. Using PyTorch and Hugging Face Transformers, I run controlled experiments on long-context tasks such as question answering and summarization, tracking the tradeoffs among speedup, memory savings, and accuracy. This work is helping me build a deeper understanding of transformer internals, optimization, and evaluation.
Tech Stack: PyTorch, Hugging Face Transformers, scikit-learn.
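
To make one of these strategies concrete, here is a minimal sketch of attention-based pruning: score each context token by the attention it receives from the final position, keep only the top fraction, and decode the surviving tokens. The model name (gpt2 as a small stand-in), the keep ratio, and the use of eager attention to expose attention weights are illustrative assumptions, not the exact setup of my experiments.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# gpt2 is a small placeholder model for illustration; eager attention is
# requested so the forward pass can return per-layer attention weights.
model_name = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, attn_implementation="eager")
model.eval()

def prune_context(text: str, keep_ratio: float = 0.5) -> str:
    """Score each token by the attention it receives from the last position,
    keep the top `keep_ratio` fraction, and return the pruned text."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        out = model(ids, output_attentions=True)
    # out.attentions: one (batch, heads, seq, seq) tensor per layer
    attn = torch.stack(out.attentions)                # (layers, batch, heads, seq, seq)
    # Salience = attention paid to each token by the final position,
    # averaged over layers and heads.
    salience = attn[:, 0, :, -1, :].mean(dim=(0, 1))  # (seq,)
    k = max(1, int(keep_ratio * ids.shape[1]))
    keep = salience.topk(k).indices.sort().values     # preserve original token order
    return tokenizer.decode(ids[0, keep], skip_special_tokens=True)

print(prune_context("The quick brown fox jumps over the lazy dog near the riverbank.", 0.5))
```

In practice I compare scoring rules like this one against gradient-based salience and measure the downstream accuracy hit at each pruning ratio.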
AprilTag Robustness Benchmark for FTC
In this project, I’m investigating how well the AprilTag v3 detector’s decision margin (the internal quality metric it reports for each detection, which I refer to as a “confidence score”) actually reflects true detection reliability. I generate controlled test images under different degradations, such as increased viewing distance, motion blur, sensor noise, lighting changes, compression, and partial occlusion, and record both the decision margin and the ground-truth outcome for each detection (whether the tag is correctly identified and how large the pose error is). By analyzing how strongly the decision margin correlates with true correctness and pose accuracy, I aim to determine how trustworthy it is as a predictor. Finally, I explore calibration methods that map decision-margin values to empirically grounded probabilities of correctness or error bounds, so that AprilTag v3 outputs become more interpretable and useful for high-stakes robotics applications like FTC autonomous routines.
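
Here is a minimal sketch of one benchmark trial, assuming the pupil-apriltags Python binding and OpenCV; the input image path, the blur and noise levels, and the ground-truth tag ID are placeholder values rather than my actual test parameters.

```python
import cv2
import numpy as np
from pupil_apriltags import Detector

detector = Detector(families="tag36h11")
img = cv2.imread("tag_image.png", cv2.IMREAD_GRAYSCALE)  # hypothetical pre-rendered tag frame

# Apply one degradation from the benchmark: Gaussian blur plus additive sensor noise
blurred = cv2.GaussianBlur(img, (9, 9), 3.0)
noisy = blurred.astype(np.float64) + np.random.normal(0.0, 5.0, blurred.shape)
noisy = np.clip(noisy, 0, 255).astype(np.uint8)

# Log the decision margin alongside the ground-truth outcome for this frame
GROUND_TRUTH_ID = 0  # placeholder: the tag ID actually rendered in the image
for det in detector.detect(noisy):
    print({
        "tag_id": det.tag_id,
        "decision_margin": det.decision_margin,
        "correct": det.tag_id == GROUND_TRUTH_ID,
    })
```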
Tech Stack:
- Python + Jupyter + GitHub
- AprilTag v3 + OpenCV
- Data & Stats Stack
- FTC-Relevant Hardware
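
And a sketch of the calibration step: fitting an isotonic (monotonic) mapping from decision margin to an empirical probability of correct detection. It assumes the logged trials are saved as a CSV with decision_margin and correct columns (hypothetical file and column names) and that scikit-learn is part of the data and stats stack.

```python
import numpy as np
import pandas as pd
from sklearn.isotonic import IsotonicRegression

trials = pd.read_csv("benchmark_trials.csv")          # one row per detection attempt
margins = trials["decision_margin"].to_numpy()
correct = trials["correct"].astype(float).to_numpy()  # 1.0 if the tag ID matched ground truth

# Isotonic regression preserves the assumption that a higher decision margin
# should never imply a lower probability of correctness.
calibrator = IsotonicRegression(y_min=0.0, y_max=1.0, out_of_bounds="clip")
calibrator.fit(margins, correct)

# Map raw detector output to an empirically grounded confidence estimate
print(calibrator.predict(np.array([20.0, 50.0, 80.0])))
```

The monotonic fit is the design choice I care about here: it keeps the calibrated confidence consistent with the intuition teams already have about the decision margin while grounding it in measured error rates.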