Artificial intelligence is moving beyond simple text generation toward 'agentic' behavior: AI that can see, understand, and interact with software interfaces much as a human does. Mininglamp (2718.HK) has signaled a significant step in this direction with the launch of Mano-P, an open-source Vision-Language-Action (VLA) model designed to run directly on edge devices. By shifting the computational burden from the cloud to the user’s local hardware, the project addresses the twin hurdles of data privacy and the prohibitive cost of persistent AI reasoning.
While industry giants like Anthropic and OpenAI have focused on cloud-based computer use, Mano-P prioritizes the edge and is optimized specifically for Apple’s M-series silicon. The model’s 72-billion-parameter version recently claimed the top spot on the OSWorld benchmark among specialized models, with a 58.2% success rate. That result outperforms significantly larger general-purpose models, suggesting that smaller, specialized architectures can be more effective at complex graphical user interface (GUI) tasks than their monolithic cloud-based counterparts.
To make the model viable on a standard laptop, Mininglamp used visual token pruning and mixed-precision quantization. The compact variant, Mano-P 4B, can run on a MacBook Pro with a peak memory footprint of just 4.3 GB while sustaining a prefill speed of 476 tokens per second. This efficiency comes from 'GSPruning,' a technique that lets the model focus only on critical UI elements such as buttons and input fields, discarding nearly 87% of redundant visual data, reportedly without sacrificing accuracy.
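The general idea behind visual token pruning can be illustrated with a short sketch. This is not GSPruning itself, whose internals are not described here; it is a generic top-k selection under stated assumptions. The function name `prune_visual_tokens`, the per-token importance `scores`, and the `keep_ratio` default of 0.13 (mirroring the roughly 87% discard rate) are all hypothetical.

```python
from typing import List, Sequence, Tuple


def prune_visual_tokens(
    tokens: Sequence[Sequence[float]],
    scores: Sequence[float],
    keep_ratio: float = 0.13,  # assumption: keep ~13%, i.e. drop ~87%
) -> Tuple[List[Sequence[float]], List[int]]:
    """Keep the highest-scoring fraction of visual tokens, in original order.

    Generic top-k pruning for illustration only; how a real model scores
    tokens (the `scores` input) is outside the scope of this sketch.
    """
    if len(tokens) != len(scores):
        raise ValueError("tokens and scores must have the same length")
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    if not tokens:
        return [], []

    # Always keep at least one token so the model still sees something.
    n_keep = max(1, round(len(tokens) * keep_ratio))

    # Rank token indices by importance, highest first.
    ranked = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)

    # Restore the original order so positional information stays meaningful.
    kept = sorted(ranked[:n_keep])
    return [tokens[i] for i in kept], kept


# Toy input: 100 one-dimensional "tokens" with made-up importance scores.
demo_tokens = [[float(i)] for i in range(100)]
demo_scores = [float(i % 10) for i in range(100)]
demo_pruned, demo_kept = prune_visual_tokens(demo_tokens, demo_scores)
print(f"kept {len(demo_pruned)} of {len(demo_tokens)} tokens")
```

In a real screenshot, high scores would presumably cluster on buttons and input fields, so the language model would process only a small fraction of the image patches during prefill.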
Beyond technical specifications, the move to on-device agents represents a fundamental shift in the economics of AI. Cloud-based assistants are often constrained by API costs: a truly proactive assistant that checks your email or calendar every few minutes would be too expensive to operate at scale. Because Mano-P runs locally, it allows effectively unlimited autonomous cycles at no extra cost to the provider, enabling the kind of 24/7 proactive digital companionship that cloud models currently find commercially infeasible.
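The cost argument comes down to simple arithmetic, sketched below. The function name and every number in the example call are illustrative placeholders, not actual prices or measured token counts from any provider.

```python
def daily_cloud_cost(
    poll_interval_minutes: float,
    tokens_per_poll: int,
    usd_per_million_tokens: float,
) -> float:
    """Estimate the per-user daily API cost of an always-on polling agent."""
    if poll_interval_minutes <= 0:
        raise ValueError("poll_interval_minutes must be positive")
    # Number of times the agent wakes up in a 24-hour day.
    polls_per_day = 24 * 60 / poll_interval_minutes
    # Total tokens processed per day, priced per million tokens.
    return polls_per_day * tokens_per_poll * usd_per_million_tokens / 1_000_000


# Placeholder inputs: poll every 5 minutes, 2,000 tokens per check,
# and a hypothetical price of $3 per million tokens.
per_user = daily_cloud_cost(5, 2_000, 3.0)
print(f"per user per day: ${per_user:.3f}")
```

Whatever the real inputs are, the cloud cost grows linearly with both polling frequency and user count, whereas an on-device model adds no per-request API charge for the provider.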
Mininglamp has adopted a strategy of radical transparency by releasing Mano-P under the Apache 2.0 license. Its three-phase open-source roadmap covers the release of the model weights, the Python SDK, and eventually the full training methodology. The approach is designed to foster a developer ecosystem that can integrate these 'AI hands' into specialized workflows in industries such as finance, law, and healthcare, where leaking screen data to external servers is an unacceptable security risk.
