Logan Lawrence

PhD Candidate @ UMass

I'm a second-year CS PhD Candidate in the Manning College of Information and Computer Sciences at the University of Massachusetts, Amherst. I am advised by Grant Van Horn.

Before UMass, I was a Senior Data Engineer at Saifr in Fidelity Labs. There, I was the project lead for GOST Crawl, a proprietary search engine that uses NLP to index derogatory web information. I was also the lead of the Science team.

I am primarily interested in improving the fine-grained classification and question-answering abilities of multimodal systems. In particular, I am motivated by making sure that VLM responses are visually grounded, finding ways to evaluate NLG responses, and steering predictions with expert knowledge via language. I am always on the lookout for collaborators on research projects, so reach out if any of this sounds interesting to you!

Publications

You May Speak Freely: Improving the Fine-Grained Visual Recognition Capabilities of Multimodal Large Language Models with Answer Extraction

Logan Lawrence, Oindrila Saha, Megan Wei, Chen Sun, Subhransu Maji, Grant Van Horn

WACV 2026 Paper Poster

Despite the renewed interest in zero-shot visual classification due to the rise of Multimodal Large Language Models (MLLMs), the problem of evaluating free-form responses of auto-regressive models remains a persistent challenge. Most existing works focus on language-only tasks or don't consider Multiple Choice Questions (MCQs) beyond 5-way options, both of which are critical capabilities to solve tasks in Fine-Grained Visual Classification (FGVC) where choice counts are in the hundreds to thousands and the choices are highly related. Furthermore, in this highly multi-way MCQ setting it is not clear how to extend LLM choice extraction to retrieval-based problems, where computing probabilities over the choice set is computationally costly. In this work we investigate nlg2choice, a simple two-stage method which first asks the MLLM an open-ended question for the task with minimal constraints, then uses text-only constrained decoding to predict the most likely choice. In retrieval settings, we compute the probability of the constrained response taking that choice with an early stopping method to significantly improve throughput. Our results show improvement over a suite of seven fine-grained visual datasets when evaluating in terms of classification and retrieval, and show that this performance holds over the various ways that users of LLMs can implement tasks in natural language.

Direct-Scoring NLG Evaluators Can Use Pairwise Comparisons Too

Logan Lawrence, Ashton Williamson, Alexander Shelton

Preprint, May 2025 Paper

As large language models have been increasingly used as automatic raters for evaluating free-form content, including document summarization, dialog, and story generation, work has been dedicated to evaluating such models by measuring their correlations with human judgment. For sample-level performance, methods which operate by using pairwise comparisons between machine-generated text perform well but often lack the ability to assign absolute scores to individual summaries, an ability crucial for use cases that require thresholding. In this work, we propose a direct-scoring method which uses synthetic summaries to act as pairwise machine rankings at test time. We show that our method performs comparably to state-of-the-art pairwise evaluators in terms of axis-averaged sample-level correlations on the SummEval (+0.03), TopicalChat (-0.03), and HANNA (+0.05) meta-evaluation benchmarks, and release the synthetic in-context summaries as data to facilitate future work.

Generate, Transduct, Adapt: Iterative Transduction with VLMs

Oindrila Saha, Logan Lawrence, Grant Van Horn, Subhransu Maji

ICCV 2025 Paper

Transductive zero-shot learning with vision-language models leverages image-image similarities within the dataset to achieve better classification accuracy compared to the inductive setting. However, there is little work that explores the structure of the language space in this context. We propose GTA-CLIP, a novel technique that incorporates supervision from language models for joint transduction in language and vision spaces. Our approach is iterative and consists of three steps: (i) incrementally exploring the attribute space by querying language models, (ii) an attribute-augmented transductive inference procedure, and (iii) fine-tuning the language and vision encoders based on inferred labels within the dataset. Through experiments with CLIP encoders, we demonstrate that GTA-CLIP yields significant performance improvements across multiple datasets in both zero-shot and few-shot settings.

Efficient Transformer Knowledge Distillation: A Performance Review

Nathan Brown, Ashton Williamson, Tahj Anderson, Logan Lawrence

EMNLP 2023 Paper

As pretrained transformer language models continue to achieve state-of-the-art performance, the Natural Language Processing community has pushed for advances in model compression and efficient attention mechanisms to address high computational requirements and limited input sequence length. Despite these separate efforts, no investigation has been done into the intersection of these two fields. In this work, we provide an evaluation of model compression via knowledge distillation on efficient attention transformers. We provide cost-performance trade-offs for the compression of state-of-the-art efficient attention architectures and the gains made in performance in comparison to their full attention counterparts. Furthermore, we introduce a new long-context Named Entity Recognition dataset, GONERD, to train and test the performance of NER models on long sequences. We find that distilled efficient attention transformers can preserve a significant amount of original model performance, preserving up to 98.6% across short-context tasks (GLUE, SQUAD, CoNLL-2003), up to 94.6% across long-context Question-and-Answering tasks (HotpotQA, TriviaQA), and up to 98.8% on long-context Named Entity Recognition (GONERD), while decreasing inference times by up to 57.8%. We find that, for most models on most tasks, performing knowledge distillation is an effective method to yield high-performing efficient attention models with low costs.

Experience

Senior Data Engineer

Saifr (Fidelity Labs)

Boston, MA

January 2024 - current

Data Engineer

Giant Oak Inc.

Arlington, VA

June 2020 - January 2024

Research Assistant, SVCL

University of California San Diego

La Jolla, CA

November 2020 - current

Research Assistant, Wearables Lab

Rice University

Houston, TX

April 2018 - May 2019

Education

Ph.D. Computer Science

University of Massachusetts, Amherst

Amherst, MA

August 2024 - current

M.S. Electrical and Computer Engineering

University of California, San Diego

La Jolla, CA

September 2019 - June 2021

B.S. Computer Science

Rice University

Houston, TX

August 2015 - May 2019

B.S. Electrical Engineering

Rice University

Houston, TX

August 2015 - May 2019