Qalb: The World’s Largest Urdu AI Model, Built by Pakistani Youth in America

In January 2026, a young Pakistani computer science student in the United States achieved a milestone that resonated across the global Urdu-speaking community. Muhammad Taimoor Hassan, a graduate student at Auburn University, launched Qalb (قلب), recognized as the world’s largest and most advanced large language model (LLM) developed exclusively for the Urdu language.

“Qalb” means “heart” in Urdu and Arabic, a fitting name for a project that aims to place the linguistic and cultural heart of over 230 million Urdu speakers at the center of the AI revolution.

The Urdu AI Gap

For years, Urdu speakers — primarily in Pakistan, India, and diaspora communities worldwide — faced significant barriers in accessing cutting-edge AI tools. Major models like GPT-4, Claude, and Gemini excel in English but often stumble with Urdu’s complex Nastaliq script, rich morphology, poetic nuances, cultural idioms, and right-to-left writing system. Even multilingual models frequently produce grammatically incorrect, culturally tone-deaf, or contextually shallow responses in Urdu.

This digital divide limited opportunities in education, healthcare, governance, customer service, and creative industries for millions. Qalb directly addresses this gap as an Urdu-first model rather than an afterthought in a multilingual system.

Technical Achievement

Qalb builds upon Meta’s LLaMA-3.1 8B architecture. It employs a sophisticated two-stage development process:

  1. Continued Pre-training on a massive 1.97 billion token dataset, including 1.84 billion tokens of high-quality Urdu text sourced from news archives, classical and contemporary literature, government documents, social media, and educational materials. To prevent “catastrophic forgetting” of general reasoning abilities, the team incorporated 140 million tokens from English Wikipedia.
  2. Supervised Fine-Tuning on the Alif Urdu-instruct dataset, enabling strong instruction-following capabilities.

This methodical approach yielded impressive results. On comprehensive Urdu-specific benchmarks spanning seven diverse tasks — including classification, sentiment analysis, reasoning, and more — Qalb achieved a weighted average score of 90.34. This outperforms the previous state-of-the-art Alif-1.0-Instruct model (87.1) and dramatically surpasses the base LLaMA-3.1 8B-Instruct by over 44 points.

At 8 billion parameters, Qalb stands as the largest dedicated Urdu LLM to date, balancing performance with practical deployability for real-world applications.

The Visionary Behind Qalb

Muhammad Taimoor Hassan completed his BS in Computer Science from FAST University in Pakistan before pursuing graduate studies in the United States. Working from Auburn University’s Department of Computer Science and Software Engineering, he led the project with collaborators including Jawad Ahmed and Muhammad Awais.

Hassan’s motivation stemmed from personal experience and national pride. He observed how language barriers hindered Urdu speakers’ access to modern technology and sought to create a foundational model supporting local businesses, educational platforms, digital services, and voice-based AI agents.

The project demonstrates the power of the Pakistani diaspora in advancing technology that benefits the homeland. Many Pakistani students and professionals abroad channel their expertise into projects with direct impact on Pakistan and South Asia.

Broader Implications

Qalb’s launch carries profound implications:

  • Education: Urdu-medium schools and students can access AI tutors, explanation tools, and learning resources in their native language.
  • Government and Public Services: More effective chatbots, document processing, and citizen engagement platforms in Urdu.
  • Healthcare: Better medical information dissemination and patient interaction tools.
  • Content Creation: Enhanced support for writers, poets, journalists, and digital creators working in Urdu.
  • Economic Opportunity: Local startups can build Urdu-first applications without relying on suboptimal foreign models.

By prioritizing high-quality, domain-diverse Urdu data, Qalb captures nuances that generic multilingual training often misses — from classical poetry references to contemporary slang and regional variations.

Challenges and Future Outlook

Developing Qalb was not without hurdles. Curating a clean, massive Urdu corpus required significant effort given the relative scarcity of digitized high-quality Urdu text compared to English. Computational resources for training an 8B model also presented challenges, typically overcome through university access and efficient optimization techniques.

Looking ahead, the Qalb team and community can pursue several advancements:

  • Larger model variants
  • Multimodal capabilities (image + Urdu text)
  • Voice synthesis and recognition integration
  • Domain-specific fine-tunes for law, medicine, or finance
  • Open-source collaboration to accelerate Urdu AI ecosystem growth

The model is available on Hugging Face (enstazao/Qalb-1.0-8B-Instruct), encouraging developers and researchers worldwide to build upon it under an Apache 2.0 license.

A Source of National Pride

Qalb represents more than technical achievement. For many Pakistanis, it symbolizes innovation, resilience, and the ability to compete on the global AI stage despite resource constraints. A young Pakistani mind in America created something that directly empowers millions back home and in the diaspora.

As AI continues transforming society, models like Qalb ensure that technological progress becomes more inclusive. They prove that excellence in AI need not be confined to a handful of dominant languages or nations.

In the words of its creator and the excitement rippling through Pakistani tech communities, Qalb is not just an AI model — it is a heartbeat for the Urdu digital future. It reminds us that technology should serve culture and people, not force them to conform to its limitations.

The success of Qalb will likely inspire more young Pakistanis to pursue ambitious AI projects. It sets a precedent: with determination, strategic focus on local needs, and smart adaptation of open-source foundations, significant breakthroughs remain possible even from emerging tech nations.

As Urdu speakers begin interacting with more natural, culturally aware AI, Qalb may be remembered as the model that opened the door to a new era of linguistic equity in artificial intelligence.

(Word count: approximately 1020)

FAQ: Qalb Urdu AI Model

Q: What does “Qalb” mean? A: “Qalb” (قلب) means “heart” in Urdu and Arabic, symbolizing the project’s aim to center the heart of Urdu language and culture in AI development.

Q: Who developed Qalb? A: Muhammad Taimoor Hassan, a Pakistani graduate student at Auburn University in the United States, led the development. He collaborated with researchers including Jawad Ahmed and Muhammad Awais.

Q: Is Qalb really the world’s largest Urdu AI model? A: Yes. It is widely recognized as the largest and most capable LLM developed exclusively for Urdu, trained on 1.97 billion tokens — significantly more dedicated Urdu data than previous efforts.

Q: What is the model size and base? A: It is an 8-billion-parameter model based on LLaMA-3.1 8B, adapted through continued pre-training and instruction fine-tuning.

Q: How was it trained? A: Through continued pre-training on 1.84 billion Urdu tokens + 140 million English tokens, followed by fine-tuning on the Alif Urdu-instruct dataset. This approach preserved general capabilities while deeply specializing in Urdu.

Q: How does it perform? A: Qalb achieved a weighted average benchmark score of 90.34 across seven tasks, outperforming previous Urdu models and the base LLaMA-3.1 8B-Instruct by a wide margin.

Q: Where can I try or download Qalb? A: The model is available on Hugging Face: enstazao/Qalb-1.0-8B-Instruct.

Q: Will Qalb support applications in Pakistan? A: Absolutely. It is designed to power chatbots, educational tools, government services, content creation platforms, and more in Urdu.

Q: Is Qalb open source? A: Yes, released under Apache 2.0 license, encouraging broad adoption and further development by the community.

Q: How does this benefit ordinary Urdu speakers? A: It enables more accurate, natural, and culturally relevant AI interactions in their native language, reducing dependence on English-centric tools and promoting digital inclusion.

Q: What’s next for Qalb? A: Potential expansions include larger models, multimodal features, specialized domains, and stronger integration with voice technologies to further empower Urdu-speaking communities globally.

More From Forest Beat

Technology in woodworking

Tech Is Transforming Modern Woodworking

Woodworking workshops are undergoing a major transformation, with technology making them cleaner, safer, and more efficient than ever before. From advanced dust extraction systems...
Technology News
2
minutes
Modern First Aid Courses Beyond CPR

Why Modern First Aid Courses Cover More Than CPR

Introduction For many years, when people thought about first aid training, the first thing that came to mind was CPR (Cardiopulmonary Resuscitation). While CPR remains...
Technology News
8
minutes
NYMOBILE

NYMobile Announces Partnership With Priceagent to Advance Wireless Pricing Strategy

NYMobile partners with Priceagent to evaluate customer demand and pricing behavior for personalized wireless plans and vanity phone numbers. United States, June 1, 2026 — New...
Technology News
4
minutes
Google lab-grown mosquitoes Florida project

Google’s ‘Debug’ Project to Release Millions of Lab-Grown Mosquitoes in Florida...

Google is preparing to launch an ambitious biotechnology experiment in the United States that could reshape how mosquito-borne diseases are controlled. Through its experimental...
Technology News
2
minutes
spot_imgspot_img