Text · Speech · Vision · Low-resource focus · Hands-on recipes

Multilingual and Multimodal LLMs in the Wild

Build inclusive tri-modal systems for low-resource languages and dialects that can see, hear, and read. We cover BLIP-2, LLaVA, KOSMOS-1, PaLM-E, PALO, Maya, SeamlessM4T, and AudioPaLM, along with efficient PEFT/adapter/MoE techniques, culture-aware benchmarks, and speech→text→LLM pipelines.

Efficient multimodality

Adapter stacks, LoRA/QLoRA, quantization, and MoE routing for compact multilingual VLMs.
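As a minimal sketch of the adapter-plus-quantization idea, the snippet below attaches LoRA adapters to a 4-bit-quantized language backbone (QLoRA-style) using Hugging Face transformers and peft. The checkpoint name and target module names are placeholders, not a setup prescribed by the tutorial.

    import torch
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig
    from peft import LoraConfig, get_peft_model

    # Placeholder checkpoint; swap in the multilingual backbone you are adapting.
    MODEL_ID = "your-org/your-multilingual-backbone"

    # QLoRA-style setup: load the frozen backbone in 4-bit NF4 precision.
    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    )
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, quantization_config=bnb_config, device_map="auto"
    )

    # Train only small low-rank adapters on the attention projections.
    lora_config = LoraConfig(
        r=8,
        lora_alpha=16,
        lora_dropout=0.05,
        target_modules=["q_proj", "v_proj"],  # module names vary by architecture
        task_type="CAUSAL_LM",
    )
    model = get_peft_model(model, lora_config)
    model.print_trainable_parameters()  # typically well under 1% of the full model

Only the adapter weights are updated during fine-tuning, which is what keeps memory and storage costs low enough for low-resource settings.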

Data & evaluation

Low-cost data creation, OCR/ASR bootstraps, and culture-aware benchmarks like xGQA, MaRVL, and HaVQA.
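As one illustration of a low-cost OCR bootstrap, the sketch below runs Tesseract over raw document images and writes draft text records intended for later human correction. The paths, the language code, and the output schema are assumptions for illustration, not part of the tutorial's released tooling.

    import json
    from pathlib import Path

    from PIL import Image
    import pytesseract  # requires a local Tesseract install with the target language pack

    # Illustrative inputs: a folder of scanned pages and a JSONL file of draft records.
    IMAGE_DIR = Path("raw_images")
    OUT_FILE = Path("ocr_bootstrap.jsonl")
    LANG = "ben"  # Tesseract language code (here Bengali); change to your target language

    with OUT_FILE.open("w", encoding="utf-8") as out:
        for img_path in sorted(IMAGE_DIR.glob("*.png")):
            # Extract raw text; annotators verify and correct these drafts later.
            text = pytesseract.image_to_string(Image.open(img_path), lang=LANG)
            record = {"image": img_path.name, "ocr_text": text.strip(), "verified": False}
            out.write(json.dumps(record, ensure_ascii=False) + "\n")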

Speech in the loop

Practical speech→text→LLM pipelines, cascaded vs. unified speech–text models, and robustness checks.
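A minimal sketch of the cascaded speech→text→LLM route, assuming Hugging Face transformers pipelines: an ASR model produces a transcript, which is then passed to a text LLM. The specific checkpoints are illustrative placeholders; swap in models that cover your target language.

    from transformers import pipeline

    # Cascaded pipeline: ASR first, then feed the transcript to a text LLM.
    # Model choices are illustrative, not endorsements.
    asr = pipeline("automatic-speech-recognition", model="openai/whisper-small")
    llm = pipeline("text-generation", model="Qwen/Qwen2.5-0.5B-Instruct")

    def answer_from_speech(audio_path: str, instruction: str) -> str:
        # Step 1: transcribe the audio clip.
        transcript = asr(audio_path)["text"]
        # Step 2: prompt the LLM with the instruction plus the transcript.
        prompt = f"{instruction}\n\nTranscript: {transcript}\nAnswer:"
        output = llm(prompt, max_new_tokens=128, return_full_text=False)
        return output[0]["generated_text"]

    print(answer_from_speech("clip.wav", "Summarize the speaker's request in English."))

Unified speech–text models replace this two-step cascade with a single model, which avoids error propagation from the ASR stage but is harder to adapt for languages with little paired speech data.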

Hands-on resources

Slides, lab notebooks, and checklists for rapid replication across low-resource settings.

Venue

Conference venue and room details will be posted once the schedule is finalized.

Date

Date: TBA (aligned with the conference schedule)

Time: TBA

Speakers

Please see the bio for each speaker.

Citation

Please cite the tutorial as:

  • Alam, Firoj and Shammur Absar Chowdhury. Multilingual and Multimodal LLMs in the Wild: Building for Low-Resource Languages. Tutorial, 2026. https://mm-llms-in-the-wild.github.io.

Table of Contents