Sapienza NLP @ CLiC-it 2025
Two papers accepted at CLiC-it 2025!
We're excited to present two papers at CLiC-it 2025! Find us in Cagliari talking about continual pretraining and sustainable LLM evaluation.
What we Learned from Continually Training Minerva: a Case Study on Italian
by L. Moroni, T. Bonomo, L. Gioffré, L. Xu, D. Fedele, L. Colosi, A. S. Bejgu, A. Scirè, R. Navigli
Modern Large Language Models (LLMs) are commonly trained through a multi-stage pipeline encompassing pretraining and supervised finetuning. While recent studies have extensively investigated the benefits of continual pretraining on high-quality data, these efforts have focused primarily on English. In this work, we explore the effectiveness of various data mixtures in a continual pretraining setting to enhance performance on Italian-language tasks. Leveraging Minerva-7B, a fully open-source LLM pretrained on a corpus composed of 50% Italian, we define and evaluate three distinct data recipes – comprising mathematical, encyclopedic, and copyrighted content – spanning both Italian and English. We also investigate the effect of extending the model’s context window during continual pretraining on its ability to handle long-context tasks. To support our evaluation, we introduce INDAQA, a new benchmark for narrative question answering in Italian. Our results reveal that both data composition and increased context length substantially improve performance, offering valuable insights into continual pretraining strategies for less represented languages within an open scientific framework.
Sustainable Italian LLM Evaluation: Community Perspectives and Methodological Guidelines
by L. Moroni, G. Pappacoda, E. Barba, S. Conia, A. Galassi, B. Magnini, R. Navigli, P. Torroni, R. Zanoli
The evaluation of large language models for Italian faces unique challenges due to morphosyntactic complexity, dialectal variation, culture-specific knowledge, and limited availability of computational resources. This position paper presents a comprehensive framework for Italian LLM benchmarking, in which we identify key dimensions for LLM evaluation, including linguistic capabilities, knowledge domains, task types, and prompt variations, proposing high-level methodological guidelines for current and future initiatives. We advocate a community-driven, sustainable benchmarking initiative that incorporates dynamic dataset management, open model prioritization, and collaborative infrastructure utilization. Our framework aims to establish a coordinated effort within the Italian NLP community to ensure rigorous, scientifically sound evaluation practices that can adapt to the evolving landscape of Italian LLMs.