About

I'm an AI systems engineer at Morgan Stanley, owning the architecture of production LLM agent systems in a regulated environment. The core constraint: a confident wrong answer is worse than no answer. Every technical decision flows from that.
Most of what I know came from things that failed instructively. I shipped a model that beat every benchmark and degraded for real users. Diagnosing that — and rebuilding the evaluation framework around it — changed how I think about what it means for a system to actually be good. That framework became the org-wide standard.
I care most about systems that outlast the original deployment — documented clearly enough that teams I've never worked with can build on them. Three of the AI systems I've built are being used that way now.
How I Build
Evaluation is an architecture problem, not a QA step.
I decide what to measure before writing a line of code. A benchmark that doesn't correlate with real user outcomes isn't a safety net — it's a false one.
Confident wrong answers are worse than no answer.
In regulated environments, routing to a human beats returning a low-confidence response. I've built confidence-gated routing as a deliberate first-class decision, calibrated against production data — not bolted on as a fallback.
Backend reliability is what makes AI systems trustworthy.
The LLM is usually the easiest part to swap. What determines whether users can actually rely on the system is everything around it — pipelines, observability, retrieval, output enforcement.
Every system should outlast the original deployment.
I document failure modes after every production build — not just the architecture, but the specific places where the design breaks. That's what makes a system reusable rather than a one-off.
Experience
Senior Software Engineer II — AI Systems & Infrastructure · Morgan Stanley · Nov 2023 — Present
Technical owner of production LLM agent systems in a regulated financial environment — orchestration architecture, evaluation infrastructure, and the patterns other teams build on.
- ▹Designed a multi-step agent with confidence-gated routing: uncertain outputs route to a human analyst rather than returning a low-confidence answer. Zero compliance incidents.
- ▹Rebuilt the org's LLM evaluation framework after shipping a model that beat every benchmark yet degraded for real users. Re-centered scoring on production outcome signals — now the standard for every model release.
- ▹The RAG architecture I built has been adopted as the baseline by three downstream teams, packaged with documented failure modes so they don't have to start from scratch.
- ▹Partner as co-architect with enterprise engineering teams when delivery breaks down; the structural fix from one such recovery was institutionalized across all enterprise accounts.
- LangGraph
- Python
- AWS Bedrock
- Kafka
- Terraform
Software Engineer I — AI Systems & Automation · Morgan Stanley · Feb 2023 — Nov 2023
Shipped the org's earliest production LLM deployments. Built the foundational infrastructure — pipelines, evaluation tooling, human feedback systems — that became the reference for teams that followed.
- ▹Led one of the org's first production LLM deployments: a RAG-enhanced pipeline for legacy code modernization with validation infrastructure built from scratch.
- ▹Built a human feedback pipeline in six weeks with no prior RLHF background — preference schema, reward modeling, PPO integration. Adopted by multiple downstream teams; wrote the onboarding doc that became the standard reference.
- ▹Cut a three-week model selection process to three days by identifying the one decision variable that mattered and designing a targeted experiment around it.
- LangChain
- Python
- AWS Lambda
- RLHF
- RAG
Regional Associate · Accelerator Intern · Hult Prize Foundation · Jul 2021 — Sep 2022
Backend infrastructure for a global competition platform — distributed systems under real surge conditions.
- ▹Engineered a fault-tolerant distributed event system for real-time load surges. The gap between load testing and what users actually create at peak is a lesson I've carried into every production system since.
- Node.js
- PostgreSQL
- Distributed Systems
AI Engineering Intern · IoTIoT.in · Jan 2021 — May 2021
Built a real-time, device-agnostic gesture recognition framework using ML-driven motion tracking.
- ▹Developed motion tracking algorithms to improve input reliability across diverse hardware.
- ▹Built parallelized training pipelines to reduce latency without sacrificing classification accuracy.
- Python
- TensorFlow
- Signal Processing
Backend & ML Intern · MediaPro Innovations · Jun 2020 — Dec 2020
Applied ML-driven content filtering and behavior analysis to improve engagement on an ed-tech platform.
- ▹Built content filtering on user behavior patterns to surface more relevant learning materials.
- ▹Improved backend efficiency through caching and indexing, supporting a growing user base.
- Python
- Machine Learning
- Backend
Projects
Citation-Grounded Knowledge Agent
A multi-step AI agent built around one constraint: every answer must trace back to source — enforced at the generation layer, not bolted on after.
- ▹Confidence-gated abstention: if retrieval isn't confident enough to support a grounded answer, the system returns nothing rather than extrapolating.
- ▹Multi-model tiering routes each query to the cheapest model that clears the quality bar — inference cost as a first-class concern from day one.
- LangGraph
- AWS Bedrock
- pgvector
- LangSmith
- Python
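The tiering logic can be sketched briefly. This is an illustrative shape only — model names, costs, the quality bar, and the idea that each tier returns a usable quality score are all assumptions, not the project's actual interface:

```python
from typing import Callable

# Hypothetical tiers ordered by inference cost, cheapest first.
TIERS = [
    ("small-model", 0.1),
    ("medium-model", 1.0),
    ("large-model", 5.0),
]

QUALITY_BAR = 0.8  # illustrative threshold


def answer_with_tiering(
    query: str,
    run_model: Callable[[str, str], tuple[str, float]],
) -> tuple[str, str]:
    """Return (model_name, answer) from the cheapest tier clearing the bar.

    `run_model(model_name, query)` is assumed to return the answer plus a
    quality score for it. If no tier clears the bar, the most capable
    tier's answer is used as the fallback.
    """
    best = ("", "")
    for model_name, _cost in TIERS:
        answer, quality = run_model(model_name, query)
        best = (model_name, answer)
        if quality >= QUALITY_BAR:
            return best
    return best
```

Escalating only on a missed quality bar is what makes inference cost a first-class concern: the expensive model is the exception path, not the default.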
LLM Evaluation Framework — Production Rebuild
A model can improve on every tracked metric, ship to users, and be worse. This is what I built after that happened — a rebuild of how evaluation actually works.
- ▹Pulled production outcome signals and ran correlation analysis against every benchmark task. Rebuilt scoring around what actually predicted user outcomes, with a mandatory human judgment gate at each release.
- ▹Adopted org-wide — not because it was mandated, but because every team building LLM features eventually hits the same benchmark-vs-production divergence. The framework solved it once.
- Python
- LLM Evaluation
- Statistical Analysis
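The core of the correlation step is straightforward. A minimal sketch using Pearson correlation — the task names, data, and the choice of correlation measure here are illustrative, not the framework's actual internals:

```python
import math


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def rank_benchmarks(
    benchmark_scores: dict[str, list[float]],
    outcome_signal: list[float],
) -> list[tuple[str, float]]:
    """Rank benchmark tasks by |correlation| with a production outcome.

    Tasks whose scores barely correlate with the outcome signal are the
    false safety nets: they can improve while users get a worse system.
    """
    ranked = [
        (task, pearson(scores, outcome_signal))
        for task, scores in benchmark_scores.items()
    ]
    return sorted(ranked, key=lambda t: abs(t[1]), reverse=True)
```

The ranking tells you which benchmarks to weight, which to drop, and where a human judgment gate is non-negotiable because no automated signal tracks the outcome well.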
Aarogya — Privacy-Preserving Mental Health Risk Detection
An NLP pipeline for early detection of depression and suicide-risk signals — privacy as an architectural constraint, not a compliance checkbox. The hardest problem was defining what a detection system that respects user dignity looks like at the architecture level.
- Python
- NLP
- Privacy-Preserving ML
- AWS
Diverting Public Complaints Based on Textual Analysis
Built a text-classification pipeline to route financial/public complaints to the right department using comparative ML experiments and evaluation-driven iteration.
- Python
- Machine Learning
- Text Classification
Text Summarization Using Sentiment Analysis
Implemented sentiment-driven summarization on customer review data, integrating web-scraped inputs with supervised ML baselines.
- Python
- NLP
- Machine Learning
Survey Masters Website
Designed and built a full-stack survey platform with authenticated workflows for creating surveys and collecting responses.
- Web
- JavaScript
- CSS
- Backend
Deep Learning for Satellite Imaging
Produced a deep-learning research report exploring satellite-imaging use cases, modeling approaches, and practical deployment constraints.
- Deep Learning
- Computer Vision
- Research
Question Paper Generator
Built a web-based application concept for generating structured question papers with configurable templates and sections.
- Web
- Automation
- Product Design
Stack & Tools
AI & Agent Systems
- LangGraph
- LangChain
- AWS Bedrock
- RAG Pipeline Design
- LLM Evaluation & Outcome-Aligned Scoring
- RLHF & Preference Data
- Confidence Routing & Abstention
- Structured Output Enforcement
- Embedding Model Evaluation
- LangSmith
- Multi-Step Agent Orchestration
Backend & Infrastructure
- Python
- TypeScript
- Node.js
- Java
- SQL
- REST APIs
- GraphQL
- AWS Lambda
- Step Functions
- API Gateway
- Kinesis
- DynamoDB
- S3
- Aurora
- Apache Kafka
- Apache Flink
- Terraform
- Docker
- GitHub Actions CI/CD
- PostgreSQL
- pgvector
- Pinecone
- Redis
- MongoDB
Certifications
- AWS Certified Machine Learning Specialty
Research
- Context-Enriched Machine Learning-Based Approach for Sentiment Analysis · Research Publication · Apr 2022
- Comprehensive Review of Text-Mining Applications in Finance · Q1 Journal · Nov 2020
- Interplay of Machine Learning and Software Engineering for Quality Estimations · Research Publication · Nov 2020
- BioUAV: Blockchain Framework for Digital Identification in Next-Gen UAVs · Research Publication · Sep 2020
- Comparative Study of Sentiment Analysis and Text Summarization for Commercial Social Networks · Research Publication · Jul 2020
Education
- University at Buffalo, SUNY · M.S., Computer Science · Buffalo, NY
- Nirma University · B.Tech, Computer Engineering · Ahmedabad, India