PhD Defense: Interpretability of Deep Models Across Different Architectures and Modalities

Talk
Hamid Kazemi
Time: 
04.08.2024 10:30 to 12:00
Location: 
IRB 5165

The quest to understand deep models has been a longstanding pursuit in research. Specifically, model inversion aims to uncover the inner workings of a model with respect to a target class. This process is crucial for interpreting the inner mechanisms of neural architectures, deciphering the knowledge models acquire, and clarifying their behaviors. However, prevailing model inversion techniques often depend on complex regularizers, such as total variation or feature regularization, that require meticulous calibration for each network to generate satisfactory images. We present Plug-In Inversion, a method that relies on a straightforward set of augmentations and sidesteps the need for extensive hyperparameter tuning. We demonstrate the efficacy of our approach by applying it to invert Vision Transformers (ViTs) and Multi-Layer Perceptrons (MLPs).

Applying model inversion to CLIP models produces images that are semantically aligned with the provided target prompts. These inverted images offer an opportunity to examine different facets of CLIP models, such as their capacity to fuse concepts and their incorporation of gender biases. Particularly noteworthy is the occurrence of NSFW (Not Safe For Work) images during model inversion. This phenomenon arises even for semantically innocuous prompts, such as "a beautiful landscape," as well as prompts involving celebrity names.

While feature visualizations and image reconstructions have provided valuable insights into the workings of Convolutional Neural Networks (CNNs), these methods have struggled to interpret ViT representations due to their inherent complexity. Nevertheless, we demonstrate that, when applied to the appropriate representations, feature visualizations can indeed succeed with ViTs. This understanding enables us to visually explore ViTs and the information they extract from images.

In the realm of image-based tasks, networks have been extensively studied using feature visualization, which generates interpretable images that activate individual feature maps. These visualizations aid in comprehending and interpreting what the networks perceive. In particular, they reveal the semantic meaning of features at different layers, with shallow features representing edges and deeper features denoting objects. Although this approach has proven effective for vision models, our comprehension of networks that process auditory inputs, such as automatic speech recognition (ASR) models, remains limited because their inputs are non-visual. To address this, we explore methods to sonify, rather than visualize, their feature maps.
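For readers unfamiliar with input-space optimization, the sketch below illustrates the generic idea underlying class-conditional model inversion and feature visualization: an input is optimized by gradient ascent so that a frozen network assigns it a high score for a chosen target. This is a minimal sketch only, not the Plug-In Inversion method presented in the talk; the ResNet-18 model, the target class, the learning rate, and the specific augmentations are illustrative assumptions rather than details from the thesis.

```python
# Minimal sketch of class-conditional model inversion by gradient ascent on the
# input. A frozen classifier scores a randomly initialized image, and the image
# is updated to raise the logit of a chosen target class. Simple augmentations
# are applied at each step in place of hand-tuned image priors.
# Model, target class, and hyperparameters below are illustrative assumptions.
import torch
import torchvision

# Frozen pretrained classifier (illustrative choice of architecture).
model = torchvision.models.resnet18(weights="IMAGENET1K_V1").eval()
for p in model.parameters():
    p.requires_grad_(False)

target_class = 207  # hypothetical ImageNet label chosen for illustration
image = torch.randn(1, 3, 224, 224, requires_grad=True)
optimizer = torch.optim.Adam([image], lr=0.05)

# Simple augmentations applied to the image before each forward pass.
augment = torch.nn.Sequential(
    torchvision.transforms.RandomResizedCrop(224, scale=(0.7, 1.0), antialias=True),
    torchvision.transforms.RandomHorizontalFlip(),
)

for step in range(500):
    optimizer.zero_grad()
    logits = model(augment(image))
    loss = -logits[0, target_class]  # maximize the target-class logit
    loss.backward()
    optimizer.step()
```

The same optimization loop extends to the other settings discussed in the abstract by swapping the objective: a similarity score against a text prompt's embedding for CLIP inversion, the activation of a chosen feature map for feature visualization, or an ASR model's internal features when optimizing an audio waveform for sonification.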