Qwen/Qwen3.6-35B-A3B on vLLM/Ray TP=2 passed ~256K prompt tokens with 5/5 recall at ~2,098 effective tok/s.

qwen36-official.

Critical launch fix: --default-chat-template-kwargs '{"enable_thinking": false}'.

nvidia/Llama-4-Scout-17B-16E-Instruct-NVFP4.

Native config advertises a roughly 10M-token context window (text_config.max_position_embeddings = 10485760). The model itself is a strong architectural candidate; the runtime backend was the limiting factor.

Qwen3 30B A3B Instruct 2507 GGUF Q4_K_M.

HauhauCS Qwen3.6 GGUF Q4_K_M path.

Llama4ForConditionalGeneration.
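
The Qwen launch fix noted above is easy to get wrong because the flag value is a JSON string that must survive shell quoting. The following is a minimal sketch of one way to assemble the command, not the exact launch line used for the run. Only the model name, TP=2, the Ray backend, and the --default-chat-template-kwargs value come from these notes; the helper name build_vllm_command and the --max-model-len value of 262144 are assumptions for illustration.

```python
import json
import shlex


def build_vllm_command(model: str, tensor_parallel_size: int, max_model_len: int) -> list[str]:
    """Assemble a `vllm serve` argv list with thinking disabled by default.

    Hypothetical helper: the function name and max_model_len are not from the notes.
    """
    # json.dumps emits lowercase `false`, which is what the chat template expects.
    chat_template_kwargs = json.dumps({"enable_thinking": False})
    return [
        "vllm", "serve", model,
        "--tensor-parallel-size", str(tensor_parallel_size),
        "--distributed-executor-backend", "ray",
        "--max-model-len", str(max_model_len),  # assumed value, adjust to the run
        "--default-chat-template-kwargs", chat_template_kwargs,
    ]


command = build_vllm_command("Qwen/Qwen3.6-35B-A3B", tensor_parallel_size=2, max_model_len=262144)

# shlex.join adds the single quotes the JSON argument needs in a POSIX shell.
print(shlex.join(command))
```

Building the argument list first and quoting it only for display keeps the JSON intact when the list is passed directly to subprocess.run, which avoids a second round of shell escaping.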