FastLongSpeech: 30x Compression That Doesn't Murder Your Context
Speech models are having a moment, and it seems like they're here to stay. They can transcribe your rambling, understand your questions, and even tell when you're being sarcastic. But ask them to process anything longer than a TikTok video and they straight-up collapse.
The problem? Speech can