Exclusive | PolyU’s top AI scientist Yang Hongxia seeks to revolutionise LLM development in Hong Kong

Yang said LLM development has so far mostly relied on deploying advanced and expensive graphics processing units (GPUs), from the likes of Nvidia and Advanced Micro Devices, in data centres for projects involving vast amounts of raw data. That has put deep-pocketed Big Tech companies and well-funded start-ups at a major advantage.
The entrance to the Hung Hom campus of Hong Kong Polytechnic University, where artificial intelligence scientist Yang Hongxia serves as a professor at the Department of Computing. Photo: Sun Yeung

Yang said she and her colleagues propose a “model-over-models” approach to LLM development. That calls for a decentralised paradigm in which developers train smaller models across thousands of specific domains, including code generation, advanced data analysis and specialised AI agents.

These smaller models would then evolve into a large and comprehensive LLM, also known as a foundation model. Yang pointed out that this approach could reduce the computational demands at each stage of LLM development.

Domain-specific models are typically capped at 13 billion parameters, a machine-learning term for the variables in an AI system that are adjusted during training and that determine how prompts yield the desired output. Yang said such models can deliver performance that is on a par with, or exceeds, OpenAI's latest GPT-4 models, while using far fewer GPUs, around 64 to 128 cards.
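The article does not specify the hardware assumptions behind those figures. As a rough back-of-the-envelope illustration only (assuming bf16 weights and gradients plus fp32 Adam optimizer state, and excluding activation memory; none of these figures come from Yang's team), the model state of a 13-billion-parameter model shards comfortably across that range of cards:

```python
# Rough memory estimate for training a 13B-parameter model.
# Illustrative assumptions, not from the article: bf16 weights and
# gradients (2 bytes each) plus fp32 Adam optimizer state
# (master weights + two moments, 12 bytes per parameter).
# Activation memory is excluded.
params = 13e9
bytes_per_param = 2 + 2 + 12          # weights + gradients + optimizer state
total_gb = params * bytes_per_param / 1e9
print(f"model state: ~{total_gb:.0f} GB")   # ~208 GB

# Spread across 64-128 cards (the range cited in the article), a fully
# sharded model state is a small fraction of a typical 80 GB accelerator.
for n_gpus in (64, 128):
    print(f"{n_gpus} GPUs: ~{total_gb / n_gpus:.2f} GB of model state each")
```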

That paradigm can make LLM development more accessible to university labs and small firms, according to Yang. An evolutionary algorithm is then applied across these domain-specific models to eventually build a comprehensive foundation model, she said.
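The article does not detail the algorithm, but evolutionary model merging is commonly implemented as a search over weight-space combination coefficients. Below is a minimal sketch of that general technique, not Yang's method: it assumes the domain models share one architecture, and `evaluate` is a hypothetical stand-in for scoring a candidate merge on held-out domain benchmarks.

```python
import copy
import random
import torch

def merge(models: list[torch.nn.Module], weights: list[float]) -> dict:
    """Weighted average of the state dicts of architecture-identical models."""
    states = [m.state_dict() for m in models]
    merged = copy.deepcopy(states[0])
    for key in merged:
        merged[key] = sum(w * s[key] for w, s in zip(weights, states))
    return merged

def evolve(models, evaluate, generations=20, pop_size=16, mutation=0.05):
    """Evolutionary search over merge coefficients.

    `evaluate(state_dict) -> float` is a placeholder for a real validation
    benchmark (higher is better); in practice this is the expensive step.
    """
    n = len(models)
    population = [[random.random() for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        # Normalise so each candidate is a convex combination of the models.
        population = [[w / (sum(c) or 1.0) for w in c] for c in population]
        scored = sorted(population,
                        key=lambda c: evaluate(merge(models, c)), reverse=True)
        parents = scored[: pop_size // 2]            # keep the fittest half
        children = [
            [max(w + random.gauss(0, mutation), 0.0)  # mutate a parent
             for w in random.choice(parents)]
            for _ in range(pop_size - len(parents))
        ]
        population = parents + children
    best = max(population, key=lambda c: evaluate(merge(models, c)))
    return merge(models, best)
```

The design choice worth noting is that the search happens in coefficient space, which is tiny, so only the candidate evaluations, not any gradient training, consume compute at the merging stage.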

Successfully initiating such LLM development in Hong Kong would count as a big win for the city, as it looks to turn into an innovation and technology hub.

Yang Hongxia, a leading artificial intelligence scientist, previously worked on AI models at TikTok-owner ByteDance in the United States and Alibaba Group Holding’s research arm Damo Academy. Photo: PolyU
Hong Kong’s dynamic atmosphere, as well as its access to AI talent and resources, makes the city an ideal place to conduct research into this new development paradigm, Yang said. She added that PolyU president Teng Jin-guang shares this vision.

According to Yang, her team has already verified that small AI models, once put together, can outperform the most advanced LLMs in specific domains.

“There is also a growing consensus in the industry that with high-quality, domain-specific data and continuous pretraining, surpassing GPT-4/4V is highly achievable,” she said. The multimodal GPT-4V analyses image inputs provided by users and is the latest capability that OpenAI has made broadly available.
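As an illustration of that capability, a vision-capable GPT-4 model can be queried through OpenAI's Python SDK with mixed text-and-image content. The model name and image URL below are placeholders; availability and naming have changed over time.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Mixed text-and-image request to a vision-capable GPT-4 model.
# "gpt-4-vision-preview" and the URL are illustrative placeholders.
response = client.chat.completions.create(
    model="gpt-4-vision-preview",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe what is in this image."},
                {"type": "image_url",
                 "image_url": {"url": "https://example.com/photo.jpg"}},
            ],
        }
    ],
)
print(response.choices[0].message.content)
```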

Yang said the next step is to build a more inclusive infrastructure platform to attract more talent into the AI community, so that some releases can be made by the end of this year or early next year.

“In the future, while a few cloud-based large models will dominate, small models across various domains will also flourish,” she said.

Yang, who received her PhD from Duke University in North Carolina, has published more than 100 papers in top-tier conferences and journals, and holds more than 50 patents in the US and mainland China. She played a key role in developing Alibaba’s 10-trillion-parameter M6 multimodal AI model.
