Cross-domain prompting
Prompting is not limited to text-based models such as LLMs; it is also a powerful way of interacting with vision, audio, and multi-modal models. The general prompting strategies discussed earlier in the chapter apply in these domains as well: design prompts that are clear and specific, break one large, complex task into simpler, well-defined tasks, make use of contextual information, and provide examples wherever possible.
Apart from these, prompting non-text-based models also benefits from:
- Clear specification of the output format; for instance, it helps to state whether we expect the response in Markdown, JSON, and so on.
- Attention to the recommended order of image and text for multi-modal models. For instance, models such as Gemini by Google seem to perform better if the image is placed before the textual prompt.
- Negative prompts are...
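
The first two points above (stating the output format and placing the image before the text) can be sketched in a few lines of Python. This is only an illustrative sketch: the `Part` class and the `build_multimodal_prompt` helper are hypothetical names introduced here and do not correspond to any particular SDK's request schema. A real client library would expect its own part types.

```python
from dataclasses import dataclass
from typing import List, Literal


@dataclass
class Part:
    """One piece of a multi-modal prompt (hypothetical structure, not a real SDK type)."""
    kind: Literal["image", "text"]
    content: str  # a file path for an image part, the prompt text for a text part


def build_multimodal_prompt(image_path: str, instruction: str, output_format: str) -> List[Part]:
    """Assemble a prompt with the image first and an explicit output format."""
    # State the expected response format explicitly in the textual prompt.
    text = f"{instruction.strip()}\nRespond only in {output_format}."
    # Place the image part before the text part.
    return [Part("image", image_path), Part("text", text)]


if __name__ == "__main__":
    prompt = build_multimodal_prompt(
        "chart.png",  # placeholder path, assumed for illustration
        "Summarize the trend shown in the chart.",
        "JSON",
    )
    for part in prompt:
        print(part.kind, "->", part.content)
```

Keeping the ordering and the format instruction inside one helper means every request follows the same layout, which makes it easier to compare responses when experimenting with different orderings.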