Aakash Kumar Nain

Archive

L1

Controlling How Long A Reasoning Model Thinks With Reinforcement Learning
Mar 10, 2025

Matryoshka Quantization

Feb 14, 2025

Janus-Pro

Unified Multimodal Understanding and Generation with Data and Model Scaling
Jan 28, 2025

DeepSeek-R1

Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
Jan 21, 2025

Inference-Time Scaling for Diffusion Models beyond Scaling Denoising Steps

Jan 20, 2025

DeepSeek-VL2

Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding
Dec 17, 2024

Gaze-LLE

Gaze Target Estimation via Large-Scale Learned Encoders
Dec 16, 2024

NVILA

Efficient Frontier Visual Language Models
Dec 13, 2024

Rotary Position Encoding

A figure among cyphers: Part-1
Dec 10, 2024

PaliGemma 2

A Family of Versatile VLMs for Transfer
Dec 9, 2024

Star Attention

Efficient LLM Inference over Long Sequences
Dec 2, 2024

AIMv2

Multimodal Autoregressive Pre-training of Large Vision Encoders
Nov 27, 2024

JanusFlow

Harmonizing Autoregression and Rectified Flow for Unified Multimodal Understanding and Generation
Nov 25, 2024

Cut Your Losses in Large-Vocabulary Language Models

Nov 20, 2024

The Super Weight in Large Language Models

Nov 13, 2024

Depth Pro

Nov 8, 2024

A Hitchhiker’s Guide to Scaling Law Estimation

Nov 4, 2024

OmniParser for Pure Vision Based GUI Agent

Oct 28, 2024

Normalized Transformer

Oct 23, 2024

What Matters for Model Merging at Scale?

Oct 15, 2024

Agent Workflow Memory

Sep 24, 2024