Aakash Nain
Archive
Controlling How Long A Reasoning Model Thinks With Reinforcement Learning
Mar 10, 2025
Matryoshka Quantization
Feb 14, 2025
Janus-Pro
Unified Multimodal Understanding and Generation with Data and Model Scaling
Jan 28, 2025
DeepSeek-R1
Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
Jan 21, 2025
Inference-Time Scaling for Diffusion Models beyond Scaling Denoising Steps
Jan 20, 2025
DeepSeek-VL2
Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding
Dec 17, 2024
Gaze-LLE
Gaze Target Estimation via Large-Scale Learned Encoders
Dec 16, 2024
NVILA
Efficient Frontier Visual Language Models
Dec 13, 2024
Rotary Position Encoding
A figure among cyphers: Part-1
Dec 10, 2024
PaliGemma 2
A Family of Versatile VLMs for Transfer
Dec 9, 2024
Star Attention
Efficient LLM Inference over Long Sequences
Dec 2, 2024
AIMv2
Multimodal Autoregressive Pre-training of Large Vision Encoders
Nov 27, 2024
JanusFlow
Harmonizing Autoregression and Rectified Flow for Unified Multimodal Understanding and Generation
Nov 25, 2024
Cut Your Losses in Large-Vocabulary Language Models
Nov 20, 2024
The Super Weight in Large Language Models
Nov 13, 2024
Depth Pro
Nov 8, 2024
A Hitchhiker’s Guide to Scaling Law Estimation
Nov 4, 2024
OmniParser for Pure Vision Based GUI Agent
Oct 28, 2024
Normalized Transformer
Oct 23, 2024
What Matters for Model Merging at Scale?
Oct 15, 2024
Agent Workflow Memory
Sep 24, 2024