Discover an innovative approach to mobile UI agents with a cutting-edge solution from Tsinghua University that leverages the power of Small Language Models (SLMs) to automate tasks on-device. Our method addresses the privacy and cost concerns associated with large language models (LLMs) by offering a domain-specific, compact model trained with high-quality data. This breakthrough transforms the UI task automation challenge into a code generation problem, efficiently tackled by an SLM and executed with an on-device code interpreter. Our document-centered strategy automatically constructs detailed API documentation for each app, creating diverse task samples to guide the agent in learning to generate accurate and efficient scripts for unseen tasks. Experience the future of mobile UI interactions with our solution, boasting significantly higher success rates, lower latency, and reduced token consumption compared to state-of-the-art mobile UI agents. Stay ahead with our open-source code, set to revolutionize the field.
Large language models (LLMs) have brought exciting new advances to mobile UI agents, a long-standing research field that aims to complete arbitrary natural language tasks through mobile UI interactions. However, existing UI agents usually demand the high reasoning capabilities of powerful large models that are difficult to deploy locally on end-users' devices, which raises serious concerns about user privacy and centralized serving cost. One way to reduce the required model size is to customize a smaller domain-specific model with high-quality training data, e.g. large-scale human demonstrations of diverse types of apps and tasks, but such datasets are extremely difficult to obtain. Inspired by the remarkable coding abilities of recent small language models (SLMs), we propose to convert the UI task automation problem into a code generation problem, which can be effectively solved by an on-device SLM and efficiently executed with an on-device code interpreter. Unlike normal coding tasks, for which models can be extensively pretrained on public datasets, generating UI automation code is challenging due to the diversity, complexity, and variability of target apps. Therefore, we adopt a document-centered approach that automatically builds fine-grained API documentation for each app and generates diverse task samples based on this documentation. Guided by the synthetic documents and task samples, the agent learns to generate precise and efficient scripts to complete unseen tasks. Based on detailed comparisons with state-of-the-art mobile UI agents, our approach effectively improves mobile task automation, with significantly higher success rates and lower latency and token consumption. Code will be open-sourced.
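To make the idea of treating UI automation as code generation more concrete, the following is a minimal sketch and not the authors' implementation. It assumes a hand-written, hypothetical API document (`API_DOC`), a stand-in device (`FakeDevice`) that only records actions, and a hard-coded script (`GENERATED_SCRIPT`) in place of real model output. The names `tap`, `set_text`, `build_prompt`, and `run_script` are illustrative assumptions, not an API described in the abstract.

```python
from dataclasses import dataclass, field

# Hypothetical per-app API documentation: element name -> description.
# The abstract says such documentation is built automatically; here it is
# written by hand purely for illustration.
API_DOC = {
    "search_box": "Text field on the main screen for entering a contact name.",
    "search_button": "Button that runs the search.",
    "first_result": "First entry in the search result list.",
}


@dataclass
class FakeDevice:
    """Stand-in for a real UI backend; it only records the actions it receives."""
    log: list = field(default_factory=list)

    def tap(self, element: str) -> None:
        self.log.append(("tap", element))

    def set_text(self, element: str, text: str) -> None:
        self.log.append(("set_text", element, text))


def build_prompt(task: str, api_doc: dict) -> str:
    """Combine the task with the documentation, as a model prompt might."""
    doc_lines = "\n".join(f"- {name}: {desc}" for name, desc in sorted(api_doc.items()))
    return f"Documented elements:\n{doc_lines}\n\nTask: {task}\nScript:"


def run_script(script: str, device: FakeDevice, api_doc: dict) -> list:
    """Execute a generated script, exposing only actions on documented elements."""
    def checked(action):
        def wrapper(element, *args):
            if element not in api_doc:
                raise KeyError(f"undocumented element: {element}")
            action(element, *args)
        return wrapper

    # Emptying __builtins__ limits what the script can reach, but this is
    # NOT a secure sandbox; a real interpreter would need stronger isolation.
    namespace = {
        "__builtins__": {},
        "tap": checked(device.tap),
        "set_text": checked(device.set_text),
    }
    exec(script, namespace)
    return device.log


# A script of the kind a model might emit for "open Alice's contact page".
# It is hard-coded here; no model is called in this sketch.
GENERATED_SCRIPT = '''
set_text("search_box", "Alice")
tap("search_button")
tap("first_result")
'''

device = FakeDevice()
actions = run_script(GENERATED_SCRIPT, device, API_DOC)
```

In this sketch the whole multi-step task is expressed as one script and run in a single pass. That is one plausible reading of why a code-generation formulation could reduce latency and token use compared with querying a model at every UI step.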
The first emotionally intelligent AI. Hi, I'm Pi. I'm your personal AI, designed to be supportive, smart, and there for you anytime. Ask me for advice, for answers, or let's talk about whatever's on your mind.
Pulse is an AI-powered app designed to offer a chat-like interface that acts as a supportive companion, helping users talk about their feelings, organize their thoughts, and engage in mental health exercises. PulseAI aims to help people grow and better understand themselves through reflection. Dive into intuitive conversations with our advanced AI, ready to understand and resonate with your feelings. Whether it's a moment of joy or a challenging day, PulseAI is here to chat. Set up weekly chat sessions, ensuring you have a dedicated time to check in with your emotions and well-being. It's like a coffee date, but with your mind. Engage in curated exercises designed to boost your mental clarity, resilience, and overall well-being. A mini-workout for your mind!
Woebot Health offers AI-driven mental health support through a conversational agent, providing users with tools for mood tracking, cognitive behavioral therapy techniques, and mindfulness practices.