🎙 Listen to the full conversation
小宇宙 / Apple Podcasts / Spotify → 「离线时间」
This is the third episode of Stolen Chat. Wesley had just flown from Abu Dhabi to Singapore on a business trip. A month earlier, he had lived through Iran's missile strikes on the UAE: air raid alerts every thirty minutes, and explosions from missile interceptions that felt like earthquakes. He does voice AI research at MBZUAI, the Middle East's dedicated AI university, and is incubating an Arabic-first Voice AI project. After this conversation, my understanding of Middle East AI was completely rewritten.
What it's like to work under missile fire
On February 28th, a missile alert woke Wesley from his afternoon nap. The first week was the most intense: triple-digit drones and double-digit missiles daily. His apartment was 10 kilometers from a military base in a straight line. Interceptor debris fell at random — the Burj Al Arab caught fire and made the news, but residential areas were hit too. The closest debris landed two kilometers away, at the university.
He and a few classmates drove east to the Oman border that night. In the second week, he went back to Abu Dhabi and returned to work. The school was humane: it allowed hybrid work and immediately sent out a survey to check on everyone.
What surprised him: business barely stopped. During the second week of the war, he had a meeting with BCG. He emailed asking whether to postpone. Their reply: no need, proceed as normal.
"Short-term, operations keep running. Long-term, what gets hit is confidence — and confidence affects the flow of capital and talent."
What the Middle East's AI university actually is
MBZUAI, short for Mohammed bin Zayed University of Artificial Intelligence, is named after the UAE's president. It was founded in 2020, and its first cohort's arrival was delayed to 2021 because of COVID.
The president is Eric Xing, former head of CMU's Machine Learning Department. Most early faculty came from CMU. Yann LeCun has visited; Sam Altman came last summer and received an honorary degree.
Growth has been staggering: there were 3 departments when Wesley enrolled and 7 or 8 by the time he graduated. The faculty doubled in two years. Undergraduate admissions have begun.
"AI investment in the Middle East is concentrated in the UAE and Saudi Arabia, proportional to economic power. They've shown genuine commitment to the tech transformation."
Why build Arabic Voice AI
Wesley's project, AudarAI, builds multilingual Voice AI centered on Arabic — speech-to-text, text-to-speech, Voice Agents.
Why not just use OpenAI or ElevenLabs? Because the differences between Arabic dialects aren't like British vs. American English. They're like Mandarin vs. Cantonese, or even greater, and North African dialects diverge more dramatically still.
Major model providers only handle Modern Standard Arabic (MSA), which is a written, formal register that nobody actually speaks day to day. It's like building an AI that only speaks classical Chinese. The emotional delivery might be perfect, but locals will immediately feel it's fake.
"Emotionally expressive speech can simultaneously have terrible accent and pronunciation. The two aren't mutually exclusive."
AudarAI's strategy: perfect UAE dialect first, then Saudi, then expand to GCC countries, then mainstream languages. Niche market, expanding outward step by step.
Voice is the next era's AI interface
This is the single most memorable judgment from the episode.
Wesley believes voice will become the universal interface for human-AI interaction in the next era. The logic is simple: 70 to 80 percent or more of human-to-human information transfer happens through voice, whether in online chats, conference calls, or in-person coffee chats. If AI is going to work closely with humans, the way humans communicate with AI must mirror the way humans communicate with each other.
"You can't be having a conversation with someone, then stop to transcribe everything into text and upload it for AI processing. The efficiency gap is too big. We need AI with native audio reasoning and audio generation capabilities."
The current problem: the voice modality lags text-based LLMs by one to two years, and the biggest technical bottleneck is latency. There are two approaches. End-to-end (voice to voice) has low latency but is a black box and hard to customize. A cascaded pipeline (speech → text → LM → speech) supports agents and industry-specific customization, but the LM inference step becomes the latency bottleneck.
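To make the latency trade-off concrete, here is a minimal sketch of a latency budget for the two approaches. It assumes a strictly sequential cascade, so stage latencies simply add up, and it uses the two-second conversational silence threshold Wesley cites in this episode as the budget. Every stage name and latency figure is a hypothetical placeholder chosen for illustration, not a measurement of any real system.

```python
from dataclasses import dataclass

# Budget taken from the "two seconds of silence" remark in the conversation.
TURN_BUDGET_S = 2.0


@dataclass(frozen=True)
class Stage:
    name: str
    latency_s: float  # hypothetical time this stage adds before handing off


def total_latency(stages):
    """In a strictly sequential cascade, per-stage latencies add up."""
    return sum(stage.latency_s for stage in stages)


def bottleneck(stages):
    """Return the stage that contributes the most latency."""
    return max(stages, key=lambda stage: stage.latency_s)


def within_budget(stages, budget_s=TURN_BUDGET_S):
    """True if the whole pipeline responds within the conversational budget."""
    return total_latency(stages) <= budget_s


# Hypothetical figures: the LM step dominates the cascaded pipeline.
cascaded = [
    Stage("speech_to_text", 0.3),
    Stage("lm_inference", 1.5),
    Stage("text_to_speech", 0.4),
]

# Hypothetical figure for a single end-to-end voice-to-voice model.
end_to_end = [Stage("voice_to_voice", 0.6)]

if __name__ == "__main__":
    for label, stages in (("cascaded", cascaded), ("end-to-end", end_to_end)):
        slowest = bottleneck(stages)
        print(
            f"{label}: total {total_latency(stages):.1f}s, "
            f"slowest stage {slowest.name}, "
            f"within budget: {within_budget(stages)}"
        )
```

Real pipelines often stream partial results between stages, so the simple sum here is a pessimistic upper bound rather than a prediction. The point is only that the budget is shared across every stage in a cascade.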
"You can tolerate a model thinking for minutes. But in face-to-face conversation, more than two seconds of silence already feels abnormal."
Sovereign AI is a real need in the Middle East
I asked Wesley: is sovereign AI in the Middle East a genuine need or a political slogan?
Real need. He gave a vivid example: in UAE supermarkets, locally produced products carry the label "Proudly Made in UAE."
The logic: if you use third-party models, you can't guarantee they conform to local culture, social customs, and religious factors. Chinese and American models won't adapt to Arab culture and Islam.
The deeper concern is education. Suppose the next generation is AI-native and AI plays an increasingly important role in their education. If you don't control the AI in that process, how do you ensure citizens raised under that education still identify with your nation?
"Rather than imposing all kinds of restrictions and regulations, it's better to have the capability to train your own local models."
G42: more like a Chinese state enterprise than Silicon Valley
G42 is one of the UAE's most important tech conglomerates, and it has been deeply affected by US-China tensions: it previously cooperated extensively with Huawei, and that cooperation was later restricted by the US.
Wesley's observation: G42 resembles a large Chinese state-owned enterprise more than a Silicon Valley company. It has solid subsidiaries (Inception, Presight, and others) and combines internal R&D with service contracts for government and local enterprises.
"American influence is still stronger. You need compute, you need NVIDIA GPUs, so you play by their rules."
It's essentially a B2B service contractor: regardless of whether the technology comes from America or China, it deploys locally and serves local clients.
Where the opportunities are in Middle East AI
It depends on local context, but several directions are clear. Energy is the undisputed top priority: ADNOC has its own AI division and has partnered with G42 to form AIQ, which works on oil exploration, reserve estimation, and process optimization. Fintech is another, especially in Dubai. Healthcare is a third, with G42's M42 subsidiary investing heavily.
"Different countries focus on different priorities. China focuses on smart manufacturing and embodied intelligence because manufacturing is its strongest sector. The Middle East doesn't have strong manufacturing and relies on imports and re-exports, so the focus is energy, finance, and health."
For external founders: if you have energy industry background plus AI capabilities, the Middle East might be one of the best markets globally.
This is an episode of「离线时间」Stolen Chat. If you're building in AI and thinking about going global, I'd love to hear from you.