Yang said she and her colleagues propose a “model-over-models” approach to LLM development. The approach calls for a decentralised paradigm in which developers train smaller models across thousands of specific domains, including code generation, advanced data analysis and specialised AI agents.
These smaller models would then evolve into a large and comprehensive LLM, also known as a foundation model. Yang pointed out that this approach could reduce the computational demands at each stage of LLM development.
An evolutionary algorithm searches over these domain-specific models, progressively combining them into a comprehensive foundation model, she said. That paradigm can make LLM development more accessible to university labs and small firms, according to Yang.
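The article does not spell out how such an evolutionary step would work, but a minimal sketch can convey the general idea of evolving merge weights over a pool of domain-specific models. Everything below is an illustrative assumption rather than the team’s published method: the models are represented as simple dictionaries of parameters, and the merge, evolve and fitness names are hypothetical.

```python
import random

# Illustrative sketch only: assumes each domain model is a dict of
# parameter values with identical keys, and that fitness(model) scores
# a merged model on held-out tasks. None of this is the team's method.

def merge(models, weights):
    """Weighted average of parameters across the domain models."""
    total = sum(weights)
    merged = {}
    for name in models[0]:
        merged[name] = sum(w * m[name] for w, m in zip(weights, models)) / total
    return merged

def evolve(models, fitness, population=20, generations=50, sigma=0.1):
    """Evolutionary search over merge coefficients, one per domain model."""
    # Start from uniform merge weights, then repeatedly score, select, mutate.
    pop = [[1.0] * len(models) for _ in range(population)]
    for _ in range(generations):
        scored = sorted(pop, key=lambda w: fitness(merge(models, w)), reverse=True)
        parents = scored[: population // 4]  # keep the top quarter
        children = [
            [max(1e-6, w + random.gauss(0, sigma)) for w in random.choice(parents)]
            for _ in range(population - len(parents))
        ]
        pop = parents + children  # elitism: parents survive unchanged
    best = max(pop, key=lambda w: fitness(merge(models, w)))
    return merge(models, best)
```

Whatever the team’s actual search space, the core loop, merging candidate weights, scoring the merged model and mutating the best performers, is what “evolving” small models into a foundation model would mean in practice.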
Successfully initiating such LLM development in Hong Kong would count as a big win for the city as it seeks to establish itself as an innovation and technology hub.
According to Yang, her team has already verified that small AI models, when combined, can outperform the most advanced LLMs in specific domains.
“There is also a growing consensus in the industry that with high-quality, domain-specific data and continuous pretraining, surpassing GPT-4/4V is highly achievable,” she said. GPT-4V, the multimodal variant of OpenAI’s GPT-4, analyses image inputs provided by a user and is among the latest capabilities the company has made broadly available.
Yang said the next step is to build a more inclusive infrastructure platform to attract more talent into the AI community, with the aim of making initial releases by the end of this year or early next year.
“In the future, while a few cloud-based large models will dominate, small models across various domains will also flourish,” she said.
Yang, who received her PhD from Duke University in North Carolina, has published more than 100 papers in top-tier conferences and journals, and holds more than 50 patents in the US and mainland China. She played a key role in developing Alibaba’s 10-trillion-parameter M6 multimodal AI model.