
Agent mode does not use streaming output, so the conversation experience seems slow #2280

Closed
lzk90s opened this issue Jan 30, 2024 · 3 comments
Labels: 💪 enhancement (New feature or request)

Comments

lzk90s commented Jan 30, 2024


Provide a description of the help you need

assistant_cot_runner.py

        # update prompt messages
        prompt_messages = self._originze_cot_prompt_messages(
            mode=app_orchestration_config.model_config.mode,
            prompt_messages=prompt_messages,
            tools=prompt_messages_tools,
            agent_scratchpad=agent_scratchpad,
            agent_prompt_message=app_orchestration_config.agent.prompt,
            instruction=app_orchestration_config.prompt_template.simple_prompt_template,
            input=query
        )

        # recale llm max tokens
        self.recale_llm_max_tokens(self.model_config, prompt_messages)
        # invoke model
        llm_result: LLMResult = model_instance.invoke_llm(
            prompt_messages=prompt_messages,
            model_parameters=app_orchestration_config.model_config.parameters,
            tools=[],
            stop=app_orchestration_config.model_config.stop,
            stream=False,  # <-- streaming is disabled here
            user=self.user_id,
            callbacks=[],
        )

Is there any good way to improve the experience?


dosubot bot commented Jan 30, 2024

🤖

Hey @lzk90s, good to see you again! Hope you're doing well. Let's dive into this issue you've raised.

To improve the user experience in Dify's agent mode, you could consider implementing streaming output. This would allow the application to send data in a continuous flow, making responses appear faster and more responsive to the user.

In the context of your application, you could change the invoke_llm call to enable streaming. Here's a conceptual example of how you could do this:

llm_result: LLMResult = model_instance.invoke_llm(
    prompt_messages=prompt_messages,
    model_parameters=app_orchestration_config.model_config.parameters,
    tools=[],
    stop=app_orchestration_config.model_config.stop,
    stream=True,  # Enable streaming
    user=self.user_id,
    callbacks=[],
)

In this example, setting stream=True enables streaming output. Note, however, that the calling code must then consume the result as a stream of chunks rather than a single LLMResult, and the underlying model provider must support streaming.
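
For completeness, here is a rough sketch of how the calling code might consume the streamed result. With stream=True, invoke_llm typically returns a generator of chunk objects instead of a single LLMResult; the exact chunk shape used below (chunk.delta.message.content) is an assumption based on Dify's model-runtime entities, so verify it against your version:

from collections.abc import Generator

# Sketch only: assumes invoke_llm returns a Generator of chunks when
# stream=True, and that each chunk exposes delta.message.content.
result = model_instance.invoke_llm(
    prompt_messages=prompt_messages,
    model_parameters=app_orchestration_config.model_config.parameters,
    tools=[],
    stop=app_orchestration_config.model_config.stop,
    stream=True,
    user=self.user_id,
    callbacks=[],
)

if isinstance(result, Generator):
    full_text = ""
    for chunk in result:
        piece = chunk.delta.message.content or ""
        full_text += piece  # keep the full answer for later parsing/logging
        print(piece, end="", flush=True)  # forward each piece to the client as it arrives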

Also, keep in mind that streaming output can increase the complexity of your application and may require additional resources, such as more memory or CPU. Therefore, it's important to carefully consider the trade-offs before implementing this feature.

If you need further assistance or have any other questions, feel free to ask.


dosubot bot added the 💪 enhancement (New feature or request) label on Jan 30, 2024
Yeuoly (Collaborator) commented Jan 30, 2024

When using a model without function-calling ability, the Agent falls back to ReAct mode, which has to wait for the full response before it can parse an action from it. However, in the latest version on the main branch we have already adapted almost all models for function calling, so you can check whether the model you want is supported.
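
To illustrate why ReAct mode has to buffer the whole completion before acting: the tool invocation is parsed out of the finished text. The sketch below uses the classic ReAct "Action / Action Input" convention with an illustrative regex; it is not Dify's actual parser:

import re

def parse_react_action(completion: str) -> tuple[str, str] | None:
    """Illustrative parser: extract the tool name and its input from a
    finished ReAct-style completion. Not Dify's actual implementation."""
    match = re.search(
        r"Action:\s*(?P<tool>.+?)\s*[\r\n]+Action Input:\s*(?P<tool_input>.+)",
        completion,
        re.DOTALL,
    )
    if match is None:
        return None  # no tool call found; the model gave a final answer
    return match.group("tool").strip(), match.group("tool_input").strip()

# Until the "Action Input" block is complete, the parse is ambiguous,
# which is why a ReAct agent waits for the full (non-streamed) response.
print(parse_react_action(
    'Thought: I need the weather.\nAction: weather_api\nAction Input: {"city": "Paris"}'
))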

lzk90s (Author) commented Jan 31, 2024


thanks
