https://www.arxiv-summary.com/posts/2212.08136/
Efficient Long Sequence Modeling via State Space Augmented Transformer