Paper accepted · ICML 2026

Shlok Channawar

AI Safety & Interpretability · Penn State

Junior at Penn State studying Applied Data Science. I work on interpretability and safety — trying to figure out what's actually happening inside language models.

01 About
Background

I started college as a mechanical engineering major before switching to Applied Data Science at Penn State's College of IST. That pivot led me toward AI research — specifically interpretability and safety, trying to understand what's actually going on inside language models.

I recently co-first-authored a paper on whether geometric properties of SAE decoder vectors can predict feature steerability, accepted to the ICML 2026 Mechanistic Interpretability Workshop. I'm now working on two new threads: applying interpretability methods to understand how models handle private information, and building practical mech interp tooling for finance. I'm also attending BlueDot's AI Safety program and thinking carefully about alignment and what it actually takes to make these systems safe.

Outside of research, I play poker with friends, play chess, listen to a lot of music, and just hang out. Originally from Nagpur, India. I also love astronomy and astrophotography — you can see some of my shots here.

02 Research & Projects
01

Look Before You Steer

Geometry Predicts SAE Feature Steerability

ICML 2026 · MI Workshop

Algoverse AI Research · 2025 – 2026

Can geometric properties of SAE features predict how steerable they are — before you ever run a steering experiment? We show that decoder-space geometry (neighbor density, max cosine similarity) ranks features by steering cost, replicating across Gemma-2 scales, SAE widths, and architectures.
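As a rough illustration of the two geometric quantities named above, here is a minimal sketch in plain Python. It assumes that "max cosine similarity" means a feature's highest cosine similarity to any other decoder vector, and that "neighbor density" can be approximated by counting the other decoder vectors above a similarity threshold. Both definitions, the `geometry_stats` helper, the `threshold` value, and the toy decoder are illustrative assumptions, not necessarily what the paper uses.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def geometry_stats(decoder, threshold=0.5):
    """Return one (max_cosine, neighbor_count) pair per decoder vector.

    max_cosine: highest cosine similarity to any *other* decoder vector.
    neighbor_count: number of other vectors with cosine >= threshold.
    The threshold-count notion of density is an assumption made for
    illustration only.
    """
    stats = []
    for i, u in enumerate(decoder):
        sims = [cosine(u, v) for j, v in enumerate(decoder) if j != i]
        stats.append((max(sims), sum(s >= threshold for s in sims)))
    return stats


# Toy decoder: four hypothetical feature directions in 3-D.
toy_decoder = [
    [1.0, 0.0, 0.0],
    [0.8, 0.6, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]

stats = geometry_stats(toy_decoder)
for idx, (max_cos, density) in enumerate(stats):
    print(f"feature {idx}: max cosine = {max_cos:.2f}, neighbors = {density}")
```

Both statistics need only the decoder weights, so they can be computed before any steering run. On a real SAE the pairwise loop would be replaced by a normalized matrix product in PyTorch.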

SAE · Mechanistic Interpretability · Python · PyTorch

02

Contextual Privacy

NLA Probing for Contextual Integrity

In Progress

Python · PyTorch · June 2026 – Present

Exploring how language models internally represent contextual privacy norms using Natural Language Autoencoders — and whether models “know” when information sharing violates context-appropriate norms.

Interpretability · Privacy · Python · PyTorch

03

Mechanistic Interpretability for Finance

Exploratory

Applying mechanistic interpretability tools to finance applications.

Mechanistic Interpretability · Finance

03 Reading Log

Papers I've been reading

Notes on what they do and why they matter. Click any entry to expand.

04 Get in Touch

Always happy to talk interpretability, safety, or anything in between.