Archive

Speculative Decoding - The Bits and the Bytes!

Towards Robust Mathematical Reasoning

Reverse-Engineered Reasoning for Open-Ended Generation

On the Theoretical Limitations of Embedding-Based Retrieval

Kimi K2: Open Agentic Intelligence

Scaling Laws Are Unreliable for Downstream Tasks: A Reality Check

L1

Matryoshka Quantization

Janus-Pro

DeepSeek-R1

Inference-Time Scaling for Diffusion Models beyond Scaling Denoising Steps

DeepSeek-VL2

Gaze-LLE

NVILA

Rotary Position Encoding

PaliGemma 2

Star Attention

AIMv2

JanusFlow

Cut Your Losses in Large-Vocabulary Language Models

The Super Weight in Large Language Models

Depth Pro

A Hitchhiker’s Guide to Scaling Law Estimation

OmniParser for Pure Vision Based GUI Agent

Normalized Transformer

What Matters for Model Merging at Scale?

Agent Workflow Memory