The Allen Institute for Artificial Intelligence (Ai2) has released a new family of open-source multimodal language models called Molmo.
AI2 presents Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models
– Presents a new family of VLMs that are SotA in their class of openness
– Compares favorably against GPT-4o and Claude on both academic benchmarks and human evaluation
— Aran Komatsuzaki (@arankomatsuzaki) September 26, 2024
The largest Molmo model, with 72 billion parameters, outperforms OpenAI’s GPT-4 on a range of tests of image, chart, and document understanding. Even the smaller variant, with 7 billion parameters, comes close to the performance of OpenAI’s state-of-the-art models.
Ai2 attributes the models’ strong performance to its strategy of training on high-quality, curated data rather than vast, indiscriminate datasets. Ani Kembhavi, a senior director of research at Ai2, explains that the Molmo models were trained on a carefully selected dataset of only 600,000 images, which significantly reduces the noise and hallucinations often seen in models trained on larger, less curated datasets. Ali Farhadi, the CEO of Ai2, believes this shows that open-source AI development can now compete with closed, proprietary models.
The open nature of Molmo offers a considerable advantage, allowing developers to build and innovate upon the model freely.
Allen Institute for AI – @allen_ai – launches open #multimodal models
"…according to Ai2, its 72B Molmo model is on par with the @OpenAI GPT 4o and @Google Gemini 1.5 proprietary large language models (#LLMs) in terms of performance." https://t.co/AYKBEFFVAO #GenerativeAI
— Bob E. Hayes (@bobehayes) September 25, 2024
Although some parts of the most powerful Molmo model remain restricted, most of the model is available for tinkering on the Hugging Face website.
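For readers who want to try the model themselves, the sketch below shows one way to load a Molmo checkpoint and caption an image with the Hugging Face transformers library. It follows the pattern published on the Molmo model cards; the repository id allenai/Molmo-7B-D-0924, the sample image URL, and the remote-code methods processor.process and model.generate_from_batch are assumptions drawn from that card, not a verified API.

```python
# Minimal sketch: loading a Molmo checkpoint from Hugging Face.
# Repo id and remote-code methods follow the public Molmo model card;
# treat them as assumptions rather than a guaranteed interface.
import requests
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor, GenerationConfig

repo = "allenai/Molmo-7B-D-0924"  # one of the released checkpoints
processor = AutoProcessor.from_pretrained(
    repo, trust_remote_code=True, torch_dtype="auto", device_map="auto"
)
model = AutoModelForCausalLM.from_pretrained(
    repo, trust_remote_code=True, torch_dtype="auto", device_map="auto"
)

# Fetch an arbitrary test image and ask the model to describe it.
image = Image.open(
    requests.get("https://picsum.photos/id/237/536/354", stream=True).raw
)
inputs = processor.process(images=[image], text="Describe this image.")
inputs = {k: v.to(model.device).unsqueeze(0) for k, v in inputs.items()}

output = model.generate_from_batch(
    inputs,
    GenerationConfig(max_new_tokens=200, stop_strings="<|endoftext|>"),
    tokenizer=processor.tokenizer,
)
# Strip the prompt tokens and decode only the newly generated text.
generated = output[0, inputs["input_ids"].size(1):]
print(processor.tokenizer.decode(generated, skip_special_tokens=True))
```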
Ai2’s data strategy drives innovation
Molmo introduces the ability to “point” at elements within an image, offering a significant advance in image analysis capabilities. In a demonstration, the model accurately described elements within a photo of the Seattle marina near Ai2’s office, identifying and counting objects like deck chairs, although it was not perfect in all tasks. Percy Liang, director of the Stanford Center for Research on Foundation Models, notes that training on high-quality data can indeed lower computing costs.
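To make the pointing behavior concrete, here is a rough sketch of how its output might be consumed. Public demos show Molmo answering pointing prompts (for example, "Point to the deck chairs") with inline tags carrying coordinates; the exact tag format and the percentage-based coordinate scale used below are assumptions based on that demo output, and parse_points is a hypothetical helper.

```python
import re

# Hypothetical helper: extract (x, y) pixel coordinates from a Molmo
# pointing response. Demo output suggests answers contain tags such as
#   <point x="61.5" y="40.4" alt="deck chair">deck chair</point>
# where x and y appear to be percentages of image width and height;
# both the tag format and the percentage scale are assumptions here.
def parse_points(answer: str, img_width: int, img_height: int):
    pattern = r'<point\s+x="([\d.]+)"\s+y="([\d.]+)"'
    points = []
    for x_pct, y_pct in re.findall(pattern, answer):
        points.append((float(x_pct) / 100.0 * img_width,
                       float(y_pct) / 100.0 * img_height))
    return points

answer = '<point x="61.5" y="40.4" alt="deck chair">deck chair</point>'
print(parse_points(answer, img_width=1024, img_height=768))
# -> [(629.76, 310.272)]
```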
Ai2 achieved this efficiency by having human annotators describe the training images aloud in detail and then converting that speech into text annotations with AI techniques, a pipeline that sped up data collection and reduced computational demands. Farhadi and other experts, including Yacine Jernite of Hugging Face, who was not involved in the research, see the real significance of Molmo in the applications and improvements that will emerge from its open-source availability. They hope such models will drive further innovation and more efficient use of resources in the AI field.
In conclusion, Ai2’s Molmo models not only deliver strong benchmark performance but also demonstrate that efficient, impactful AI development is possible in an open-source setting. The release of Molmo matters because it democratizes access to advanced multimodal AI, putting it within reach of developers and researchers who lack the resources of large tech companies.