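Question after reading the paper: it says current multimodal LLMs may guess the function of the annotated regions on a UI screenshot inaccurately, so you added the UI hierarchy to the prompt. That works for your own app, but for an app like Facebook, how do you obtain the UI hierarchy? #7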

Open
suzhenyu006 opened this issue Jul 24, 2024 · 5 comments

Comments

@suzhenyu006

No description provided.

@TSKGHS17
Collaborator

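For any native Android app you should be able to get the XML via an adb dump. We did not modify the apps themselves; we used the publicly released versions.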
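
For readers following along, the adb dump mentioned here can be sketched roughly as below. This is a minimal illustration and not necessarily how the project does it: it assumes `adb` is on the PATH, a device or emulator is connected, and the dump is written to `/sdcard/window_dump.xml`. The helper names `build_dump_commands` and `dump_ui_hierarchy` are hypothetical.

```python
import subprocess
from typing import List, Optional

# Assumed location for the dump file on the device.
REMOTE_DUMP_PATH = "/sdcard/window_dump.xml"


def build_dump_commands(local_path: str,
                        serial: Optional[str] = None,
                        remote_path: str = REMOTE_DUMP_PATH) -> List[List[str]]:
    """Return the adb invocations that dump the current UI hierarchy
    on the device and then copy the XML file to the host."""
    adb = ["adb"] + (["-s", serial] if serial else [])
    return [
        adb + ["shell", "uiautomator", "dump", remote_path],
        adb + ["pull", remote_path, local_path],
    ]


def dump_ui_hierarchy(local_path: str = "window_dump.xml",
                      serial: Optional[str] = None) -> str:
    """Run the commands above. Requires adb and a connected device."""
    for cmd in build_dump_commands(local_path, serial):
        subprocess.run(cmd, check=True)
    return local_path
```

Calling `dump_ui_hierarchy()` while the target app is in the foreground should leave a `window_dump.xml` on the host describing the screen currently shown, without any change to the app itself.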

@suzhenyu006
Author

Oh, I misunderstood what "UI hierarchy" meant. Got it, thanks!

@kx-kexi

kx-kexi commented Sep 4, 2024

Besides the boolean attributes such as clickable and the text, did you extract any other information from the XML?

@TSKGHS17
Collaborator

TSKGHS17 commented Sep 4, 2024

> Besides the boolean attributes such as clickable and the text, did you extract any other information from the XML?

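We also use class, bounds, and so on. The former is used to write rules that help the LLM's judgment; the latter gives the coordinates of the UI element.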
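
As a rough illustration of reading these attributes, the sketch below parses a uiautomator-style dump with Python's standard library and extracts `text`, `class`, `clickable`, and `bounds` for each node. The sample XML is made up for the example, and `looks_like_text_input` is a hypothetical class-based rule shown only to make the idea concrete; it is not one of the rules used in the project.

```python
import re
import xml.etree.ElementTree as ET
from typing import Dict, List

# uiautomator writes bounds as "[x1,y1][x2,y2]".
BOUNDS_RE = re.compile(r"\[(-?\d+),(-?\d+)\]\[(-?\d+),(-?\d+)\]")

# Made-up sample in the shape of a uiautomator dump (attributes trimmed).
SAMPLE = """<hierarchy rotation="0">
  <node text="" class="android.widget.FrameLayout" clickable="false" bounds="[0,0][1080,1920]">
    <node text="Search" class="android.widget.EditText" clickable="true" bounds="[40,120][1040,220]" />
  </node>
</hierarchy>"""


def parse_nodes(xml_text: str) -> List[Dict]:
    """Collect text, class, clickable, and bounds for every <node>."""
    root = ET.fromstring(xml_text)
    nodes = []
    for node in root.iter("node"):
        match = BOUNDS_RE.fullmatch(node.get("bounds", ""))
        if match is None:
            continue  # skip nodes without well-formed bounds
        x1, y1, x2, y2 = map(int, match.groups())
        nodes.append({
            "text": node.get("text", ""),
            "class": node.get("class", ""),
            "clickable": node.get("clickable") == "true",
            "bounds": (x1, y1, x2, y2),
        })
    return nodes


def looks_like_text_input(node: Dict) -> bool:
    """Hypothetical rule: treat EditText widgets as text inputs."""
    return node["class"].endswith("EditText")


if __name__ == "__main__":
    for n in parse_nodes(SAMPLE):
        print(n, looks_like_text_input(n))
```

The resulting dictionaries could then be summarized in a prompt alongside the screenshot; how that summary is worded is left open here.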

@kx-kexi

kx-kexi commented Sep 5, 2024

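> > Besides the boolean attributes such as clickable and the text, did you extract any other information from the XML?
>
> We also use class, bounds, and so on. The former is used to write rules that help the LLM's judgment; the latter gives the coordinates of the UI element.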

Could you explain the class-based rules in more detail?
