This is a Model Context Protocol (MCP) server that enables scalable mobile automation through a platform-agnostic interface, eliminating the need for platform-specific iOS or Android knowledge. The server lets agents and LLMs interact with native iOS/Android applications and devices through structured accessibility snapshots, or through coordinate-based taps derived from screenshots.
https://github.com/user-attachments/assets/c4e89c4f-cc71-4424-8184-bdbc8c638fa1
Join us on our journey as we continuously enhance Mobile MCP! Check out our detailed roadmap to see upcoming features, improvements, and milestones. Your feedback is invaluable in shaping the future of mobile automation.
How we help scale mobile automation:
- 📲 Native app automation (iOS and Android) for testing or data-entry scenarios.
- 📝 Scripted flows and form interactions without manually controlling simulators/emulators or physical devices (iPhone, Samsung, Google Pixel, etc.)
- 🧭 Automating multi-step user journeys driven by an LLM
- 👆 General-purpose mobile application interaction for agent-based frameworks
- 🤖 Enables agent-to-agent communication for mobile automation use cases and data extraction
- 🚀 Fast and lightweight: Uses native accessibility trees for most interactions, falling back to screenshot-based coordinates where accessibility labels are not available.
- 🤖 LLM-friendly: No computer vision model required when working from accessibility snapshots.
- 🧿 Visual Sense: Evaluates and analyses what’s actually rendered on screen to decide the next action. If accessibility data or view-hierarchy coordinates are unavailable, it falls back to screenshot-based analysis.
- 📊 Deterministic tool application: Reduces ambiguity found in purely screenshot-based approaches by relying on structured data whenever possible.
- 📺 Extract structured data: Enables you to extract structured data from anything visible on screen.
Detailed guide for Claude Desktop: add the following to your MCP configuration:

```json
{
  "mcpServers": {
    "mobile-mcp": {
      "command": "npx",
      "args": ["-y", "@mobilenext/mobile-mcp@latest"]
    }
  }
}
```
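On macOS, Claude Desktop typically reads this configuration from `~/Library/Application Support/Claude/claude_desktop_config.json`; the location differs on other platforms.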
For Claude Code, register the server from the command line:

```bash
claude mcp add mobile -- npx -y @mobilenext/mobile-mcp@latest
```
What you will need to connect MCP with your agent and mobile devices:
- Xcode command line tools
- Android Platform Tools
- Node.js
- An MCP-capable agent or foundation model, such as Claude (via MCP), the OpenAI Agent SDK, or Copilot Studio
When launched, Mobile MCP can connect to:
- iOS Simulators (macOS only)
- Android Emulators on Linux/Windows/macOS
- Physical iOS or Android devices (requires proper platform tools and drivers)
Make sure you have your mobile platform SDKs (Xcode, Android SDK) installed and configured properly before running Mobile MCP.
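A quick way to confirm the toolchain is in place; this assumes `node`, `adb`, and (on macOS) the Xcode command line tools are already on your `PATH`:

```bash
# Node.js runs the server itself (via npx)
node --version

# Android platform tools: lists connected devices and emulators
adb devices

# Xcode command line tools (macOS only): lists available simulators
xcrun simctl list devices
```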
When you do not have a physical phone connected to your machine, you can run Mobile MCP with an emulator or simulator in the background.
For example, on Android:
- Start an emulator with the `avdmanager` / `emulator` commands, as sketched below.
- Run Mobile MCP with the desired flags.
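A minimal sketch of that flow; the AVD name `Pixel_8` is an illustrative placeholder, so substitute one from your own `-list-avds` output:

```bash
# See which AVDs exist on this machine
emulator -list-avds

# Boot one headless in the background (drop -no-window to see the UI)
emulator -avd Pixel_8 -no-window &

# Block until the device is ready, then confirm adb can see it
adb wait-for-device
adb devices
```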
On iOS, you'll need Xcode, and the Simulator must be running before Mobile MCP can use that simulator instance:

```bash
xcrun simctl list
xcrun simctl boot "iPhone 16"
```
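Once booted, you can verify the device and optionally bring up the Simulator window:

```bash
# Show only booted simulators
xcrun simctl list devices booted

# Open the Simulator app UI (optional)
open -a Simulator
```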
The commands and tools support both accessibility-based locators (preferred) and coordinate-based inputs, giving you flexibility for reliable and seamless automation when accessibility/automation IDs are missing. A minimal client sketch follows the tool reference below.
- Description: List all the installed apps on the device
  - Parameters:
    - `bundleId` (string): The application's unique bundle/package identifier (e.g., `com.google.android.keep` or `com.apple.mobilenotes`)
- Description: Launches the specified app on the device/emulator
  - Parameters:
    - `bundleId` (string): The application's unique bundle/package identifier (e.g., `com.google.android.keep` or `com.apple.mobilenotes`)
- Description: Terminates a running application
  - Parameters:
    - `packageName` (string): The application's bundle/package identifier; the tool calls `am force-stop` or kills the app by pid
- Description: Get the screen size of the mobile device in pixels
  - Parameters: None
- Description: Taps on specified screen coordinates
  - Parameters:
    - `x` (number): X-coordinate
    - `y` (number): Y-coordinate
- Description: List elements on screen and their coordinates, with display text or accessibility label
  - Parameters: None
- Description: Taps on a UI element identified by an accessibility locator
  - Parameters:
    - `element` (string): Human-readable element description (e.g., "Login button")
    - `ref` (string): Accessibility/automation ID or reference from a snapshot
- Description: Taps on specified screen coordinates
  - Parameters:
    - `x` (number): X-coordinate
    - `y` (number): Y-coordinate
- Description: Press a button on device (home, back, volume, enter, power)
  - Parameters: None
- Description: Open a URL in browser on device
  - Parameters:
    - `url` (string): The URL to open (e.g., "https://example.com")
- Description: Types text into a focused UI element (e.g., TextField, SearchField)
  - Parameters:
    - `text` (string): Text to type
    - `submit` (boolean): Whether to press Enter/Return after typing
- Description: Performs a swipe gesture from one UI element to another
  - Parameters:
    - `startElement` (string): Human-readable description of the start element
    - `startRef` (string): Accessibility/automation ID of the start element
    - `endElement` (string): Human-readable description of the end element
    - `endRef` (string): Accessibility/automation ID of the end element
- Description: Performs a swipe gesture between two sets of screen coordinates
  - Parameters:
    - `startX` (number): Start X-coordinate
    - `startY` (number): Start Y-coordinate
    - `endX` (number): End X-coordinate
    - `endY` (number): End Y-coordinate
- Description: Presses hardware keys or triggers special events (e.g., back button on Android)
  - Parameters:
    - `key` (string): Key identifier (e.g., HOME, BACK, VOLUME_UP)
- Description: Captures a screenshot of the current device screen
  - Parameters: None
- Description: Fetches the current device UI structure (accessibility snapshot) in XML format
  - Parameters: None
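Outside a desktop agent, the server can also be driven programmatically with the MCP TypeScript SDK over stdio. A minimal sketch, assuming `@modelcontextprotocol/sdk` is installed; the tool name and arguments in the commented-out `callTool` example are hypothetical placeholders, so use the names reported by `listTools`:

```typescript
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js";

async function main() {
  // Launch Mobile MCP as a child process and talk to it over stdio
  const transport = new StdioClientTransport({
    command: "npx",
    args: ["-y", "@mobilenext/mobile-mcp@latest"],
  });

  const client = new Client({ name: "example-mobile-client", version: "0.1.0" });
  await client.connect(transport);

  // Discover the tools listed above, with their exact names and input schemas
  const { tools } = await client.listTools();
  console.log(tools.map((t) => t.name));

  // Hypothetical call: substitute a real tool name and arguments
  // from the listTools output before running.
  // const result = await client.callTool({
  //   name: "tap_on_coordinates",
  //   arguments: { x: 100, y: 200 },
  // });

  await client.close();
}

main().catch(console.error);
```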