https://www.arxiv-summary.com/posts/2212.10846/
From Images to Textual Prompts: Zero-shot VQA with Frozen Large Language Models