
Agent mode does not use streaming output, so the conversation experience seems slow #2280

Closed
lzk90s opened this issue Jan 30, 2024 · 3 comments
Labels: 💪 enhancement (New feature or request)

Comments

lzk90s commented Jan 30, 2024


Provide a description of the help you need

assistant_cot_runner.py

        # update prompt messages
        prompt_messages = self._originze_cot_prompt_messages(
            mode=app_orchestration_config.model_config.mode,
            prompt_messages=prompt_messages,
            tools=prompt_messages_tools,
            agent_scratchpad=agent_scratchpad,
            agent_prompt_message=app_orchestration_config.agent.prompt,
            instruction=app_orchestration_config.prompt_template.simple_prompt_template,
            input=query
        )

        # recale llm max tokens
        self.recale_llm_max_tokens(self.model_config, prompt_messages)
        # invoke model
        llm_result: LLMResult = model_instance.invoke_llm(
            prompt_messages=prompt_messages,
            model_parameters=app_orchestration_config.model_config.parameters,
            tools=[],
            stop=app_orchestration_config.model_config.stop,
            stream=False,  # <-- streaming is disabled here
            user=self.user_id,
            callbacks=[],
        )

Is there any good way to improve the experience?


dosubot bot commented Jan 30, 2024

🤖

Hey @lzk90s, good to see you again! Hope you're doing well. Let's dive into this issue you've raised.

To improve the user experience in Dify's agent mode, you could consider implementing streaming output. This would allow the application to send data in a continuous flow, making responses appear faster and more responsive to the user.

In the context of your application, you could change the invoke_llm call to enable streaming. Here's a conceptual example of how you could do this:

llm_result: LLMResult = model_instance.invoke_llm(
    prompt_messages=prompt_messages,
    model_parameters=app_orchestration_config.model_config.parameters,
    tools=[],
    stop=app_orchestration_config.model_config.stop,
    stream=True,  # Enable streaming
    user=self.user_id,
    callbacks=[],
)

In this example, setting stream=True enables streaming output. Note, however, that the calling code must then consume the result as a stream of chunks rather than a single LLMResult, and the underlying model provider must support streaming.
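
For completeness, here is a rough sketch of how the calling code might consume the streamed result. With stream=True, invoke_llm typically returns a generator of chunk objects instead of a single LLMResult; the exact chunk shape used below (chunk.delta.message.content) is an assumption based on Dify's model-runtime entities, so verify it against your version:

from collections.abc import Generator

# Sketch only: assumes invoke_llm returns a Generator of chunks when
# stream=True, and that each chunk exposes delta.message.content.
result = model_instance.invoke_llm(
    prompt_messages=prompt_messages,
    model_parameters=app_orchestration_config.model_config.parameters,
    tools=[],
    stop=app_orchestration_config.model_config.stop,
    stream=True,
    user=self.user_id,
    callbacks=[],
)

if isinstance(result, Generator):
    full_text = ""
    for chunk in result:
        piece = chunk.delta.message.content or ""
        full_text += piece  # keep the full answer for later parsing/logging
        print(piece, end="", flush=True)  # forward each piece to the client as it arrives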

Also, keep in mind that streaming output can increase the complexity of your application and may require additional resources, such as more memory or CPU. Therefore, it's important to carefully consider the trade-offs before implementing this feature.

If you need further assistance or have any other questions, feel free to ask.


dosubot bot added the 💪 enhancement (New feature or request) label on Jan 30, 2024
Yeuoly (Collaborator) commented Jan 30, 2024

When using a model without function-calling ability, the Agent falls back to ReAct mode, which has to wait for the full response before it can parse an action from it. However, in the latest version on the main branch we have already adapted almost all models for function calling, so you can check whether the model you want is supported.
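
To illustrate why ReAct mode has to buffer the whole completion before acting: the tool invocation is parsed out of the finished text. The sketch below uses the classic ReAct "Action / Action Input" convention with an illustrative regex; it is not Dify's actual parser:

import re

def parse_react_action(completion: str) -> tuple[str, str] | None:
    """Illustrative parser: extract the tool name and its input from a
    finished ReAct-style completion. Not Dify's actual implementation."""
    match = re.search(
        r"Action:\s*(?P<tool>.+?)\s*[\r\n]+Action Input:\s*(?P<tool_input>.+)",
        completion,
        re.DOTALL,
    )
    if match is None:
        return None  # no tool call found; the model gave a final answer
    return match.group("tool").strip(), match.group("tool_input").strip()

# Until the "Action Input" block is complete, the parse is ambiguous,
# which is why a ReAct agent waits for the full (non-streamed) response.
print(parse_react_action(
    'Thought: I need the weather.\nAction: weather_api\nAction Input: {"city": "Paris"}'
))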

lzk90s (Author) commented Jan 31, 2024


thanks
