Cite the tutorial
If you use material from this tutorial, please cite:
```bibtex
@misc{alam_chowdhury_2026_mm_llms_wild,
  title        = {Multilingual and Multimodal LLMs in the Wild: Building for Low-Resource Languages},
  author       = {Alam, Firoj and Chowdhury, Shammur Absar},
  year         = {2026},
  howpublished = {\url{https://mm-llms-in-the-wild.github.io}},
  note         = {Tutorial materials}
}
```
Suggested reading
- Foundations and multimodal models: BLIP-2 (Li et al., 2023); LLaVA (Liu et al., 2023); KOSMOS-1 (Huang et al., 2023); PaLM-E (Driess et al., 2023); PALO and Maya (multilingual vision–language models); SeamlessM4T and AudioPaLM (speech–text LLMs).
- Adapters, PEFT, and efficient training: LoRA/QLoRA (Hu et al., 2022; Dettmers et al., 2023); adapter stacks for VLMs; mixture-of-experts for modality/language specialization (Shen et al., 2024). A minimal LoRA sketch follows this list.
- Data creation and curation: translation and back-translation pipelines (a back-translation sketch follows this list); OCR/ASR bootstrapping for low-resource multimodal data; safety and culture-aware filtering (Pfeiffer et al., 2022).
- Evaluation and robustness: culture-aware benchmarks such as xGQA, MaRVL, HaVQA; dialectal stress tests; hallucination and grounding diagnostics for VLMs and speech→text→LLM cascades.
- Applications and toolkits: open-source pipelines for multilingual VLM fine-tuning (e.g., LLaVA-Med and NExT-GPT variants), speech front ends (Whisper, Seamless; see the Whisper sketch below), and benchmarking toolkits for multilingual/multimodal tasks.
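A minimal LoRA sketch in Python, referenced from the PEFT entry above. It uses the Hugging Face `peft` and `transformers` libraries; the base checkpoint (`bigscience/bloomz-560m`), the `target_modules` list, and the hyperparameters are illustrative assumptions rather than settings prescribed by the tutorial.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, TaskType, get_peft_model

# Illustrative small multilingual base model; swap in the checkpoint you actually use.
base_id = "bigscience/bloomz-560m"
model = AutoModelForCausalLM.from_pretrained(base_id)
tokenizer = AutoTokenizer.from_pretrained(base_id)

# Attach low-rank adapters to the (frozen) base model.
lora_cfg = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,                                 # rank of the low-rank update
    lora_alpha=16,                       # scaling applied to the update
    lora_dropout=0.05,
    target_modules=["query_key_value"],  # module names depend on the architecture
)
model = get_peft_model(model, lora_cfg)
model.print_trainable_parameters()       # only adapter weights are trainable
```

The wrapped model can then be handed to a standard `transformers` `Trainer`; QLoRA follows the same pattern but loads the base model in 4-bit before attaching the adapters (Dettmers et al., 2023).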
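A hedged sketch of the back-translation idea from the data-creation entry, using the `transformers` translation pipeline with an NLLB checkpoint. The model name and the Bengali–English language codes are assumptions chosen only for illustration.

```python
from transformers import pipeline

# One multilingual translation model covers both directions.
translator = pipeline("translation", model="facebook/nllb-200-distilled-600M")

def back_translate(sentence: str) -> str:
    """Round-trip a sentence through English to obtain a paraphrase."""
    english = translator(sentence, src_lang="ben_Beng", tgt_lang="eng_Latn")[0]["translation_text"]
    return translator(english, src_lang="eng_Latn", tgt_lang="ben_Beng")[0]["translation_text"]
```

Round-tripped outputs are typically filtered (e.g., by length ratio or similarity score) before being added to a training set.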
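A minimal sketch of a speech front end in a speech→text→LLM cascade, using the `openai-whisper` package mentioned in the toolkits entry; the model size and audio path are placeholders.

```python
import whisper

# Multilingual Whisper checkpoint; "small" is a placeholder size.
asr = whisper.load_model("small")
result = asr.transcribe("clip.wav")   # language is auto-detected
transcript, language = result["text"], result["language"]

# The transcript (plus detected language) becomes the textual context
# handed to a downstream, text-only LLM.
prompt = f"[{language}] {transcript}\n\nSummarize the audio above in English."
```

Seamless-based front ends follow the same pattern: swap the ASR call and keep the downstream prompt construction unchanged.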