On Friday, Alibaba unveiled an AI model that can comprehend images and carry out more complex conversations than its earlier models, a sign of the intensifying global competition in AI technology.
The Chinese tech giant released two open-source models, Qwen-VL and Qwen-VL-Chat. The move lets researchers, institutions, and companies worldwide build AI applications without having to train models of their own, saving both time and cost.
The Qwen-VL model can answer a variety of questions about images and generate image captions. Qwen-VL-Chat, by contrast, is tailored for more intricate interactions: Alibaba says it can compare multiple image inputs, answer multiple rounds of questions, craft stories, create images based on photos users provide, and solve mathematical problems shown in images.
For instance, when shown an image of a hospital sign written in Chinese, the AI can provide details about specific hospital sections based on its interpretation of the sign.
Historically, generative AI has been geared toward text-based responses. However, newer models, including OpenAI's ChatGPT and now Qwen-VL-Chat, can interpret images and respond in text.
Both of Alibaba's new models are built on Tongyi Qianwen, the large language model (LLM) the company launched earlier this year. LLMs are trained on vast amounts of data and serve as the foundation for chatbot applications.
Earlier this month, Hangzhou-based Alibaba open-sourced two of its AI models. Although open-sourcing forgoes licensing revenue, it is likely to attract a broader user base to the company's models. The strategy is timely, as Alibaba's cloud division seeks to revive growth ahead of its planned public listing.