A group of engineers, researchers and a Silicon Valley-based chip company collaborated to release advanced Arabic language software that can power generative AI applications.
The new large language model, called Jais, has 13 billion parameters and was trained on a large batch of data combining Arabic and English, a portion of which is computer code. The group, which included academics and engineers, embarked on the project in part because, they said, there are few bilingual large language models.
The new language model was created with the help of supercomputers produced by the Silicon Valley-based Cerebras Systems, which designs dinner plate-sized chips that compete with Nvidia’s (NVDA.O) powerful AI hardware. Nvidia’s chips are in short supply, which has driven companies around the world to seek alternatives.
Named after the highest peak in the United Arab Emirates, Jais is a collaboration between Cerebras, Mohamed bin Zayed University of Artificial Intelligence and a subsidiary of the Abu Dhabi-based tech conglomerate G42 called Inception, which focuses on AI.
Because there is not enough Arabic data to train a model of Jais’ size, the computer code within the English language data helped train the model’s ability to reason, according to Mohamed bin Zayed University of Artificial Intelligence professor Timothy Baldwin.
“(Code) gives the model a big leg up in terms of reasoning abilities, because it spells out the (logical) steps,” Baldwin told Reuters.
Jais will be available via an open source license.
The group trained the Jais model on a Cerebras supercomputer called Condor Galaxy. This year Cerebras announced it had sold three such units to G42, with the first scheduled to arrive this year and the remaining units to be delivered in 2024.