Speculative Decoding is an inference acceleration technique where a smaller, faster "draft" model generates candidate tokens that a larger "target" model then verifies in parallel. This can significantly speed up generation while maintaining the exact output distribution of the target model.
The key insight is that a transformer can score several candidate tokens in a single forward pass, so verifying a block of draft tokens costs roughly as much as generating one token with the large model. A rejection-sampling rule decides, position by position, whether each draft token is kept; on the first rejection, a corrected token is sampled and the remaining drafts are discarded.
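The draft-then-verify loop can be sketched with toy probability tables standing in for the two models. Everything here is illustrative: `draft_probs`, `target_probs`, the 8-token `VOCAB`, and the specific distributions are hypothetical placeholders, not a real model API. The accept/resample rule is the standard one: accept draft token x with probability min(1, p(x)/q(x)); on rejection, resample from the normalized residual max(0, p − q), which preserves the target distribution.

```python
import random

VOCAB = list(range(8))  # toy vocabulary (hypothetical)

def draft_probs(context):
    # cheap "draft" model: a uniform distribution (placeholder)
    return [1.0 / len(VOCAB)] * len(VOCAB)

def target_probs(context):
    # "target" model: strongly favors (last token + 1) mod |V| (placeholder)
    last = context[-1] if context else 0
    p = [0.05] * len(VOCAB)
    p[(last + 1) % len(VOCAB)] += 1.0 - 0.05 * len(VOCAB)
    return p

def sample(probs, rng):
    # draw one token index from a probability vector
    r, acc = rng.random(), 0.0
    for tok, prob in enumerate(probs):
        acc += prob
        if r <= acc:
            return tok
    return len(probs) - 1

def speculative_step(context, k, rng):
    """Draft k tokens autoregressively, then verify them against the target.

    Accept draft token x with probability min(1, p(x)/q(x)); on the first
    rejection, resample from the residual max(0, p - q) and stop. If every
    draft is accepted, sample one bonus token from the target.
    """
    draft, ctx = [], list(context)
    for _ in range(k):
        tok = sample(draft_probs(ctx), rng)
        draft.append(tok)
        ctx.append(tok)

    out = list(context)
    for tok in draft:
        p, q = target_probs(out), draft_probs(out)
        if rng.random() < min(1.0, p[tok] / q[tok]):
            out.append(tok)  # accepted: distribution matches the target's
        else:
            resid = [max(0.0, pi - qi) for pi, qi in zip(p, q)]
            z = sum(resid)
            out.append(sample([r / z for r in resid], rng))
            return out  # stop at the first rejection
    out.append(sample(target_probs(out), rng))  # all accepted: bonus token
    return out
```

Running `speculative_step` repeatedly extends the sequence by between 1 and k+1 tokens per target-model "pass", which is where the speedup comes from when the draft model agrees with the target often.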