Workshop on Training and Evaluation Data for Italian Large Language Models

Posted on Dec. 7, 2023

The inaugural workshop marks the initial phase of the development of Italian Large Language Models (LLMs).

When: 18/12/2023

Where: DIAG, Sapienza Università di Roma - Aula Magna (first floor), Via Ariosto 25, Roma

LINK: https://uniroma1.zoom.us/j/86905776052?pwd=QWd4QzZhczlTSUdDeWZVdVhkNHlaZz09
Event starting at 2 p.m. CET

This inaugural workshop, focusing on the development of Large Language Models (LLMs) for the Italian language, marks the initial phase of constructing a Large Multimodal Model (LMM) within the framework of the Transversal Project "Vision, Language, and Multimodal Challenges", part of the larger project "Future Artificial Intelligence Research" (FAIR). The workshop is organized in collaboration with the CINI AIIS (Artificial Intelligence and Intelligent Systems) laboratory, which serves as the hub for the entire Italian AI community. The specific goal of this event is to present and discuss the collection and curation of training and evaluation datasets, the foundational step towards the realization of Italian LLMs and LMMs.

Organizers:

Roberto Navigli (Sapienza University of Rome)

Rita Cucchiara (University of Modena and Reggio Emilia; CNR)

Agenda

Introductory session: 14:00 - 14:20

14:00 - 14:20 | Project Introduction
Roberto Navigli, Sapienza University of Rome
Rita Cucchiara, University of Modena and Reggio Emilia; CNR

Invited Talks - 1st part: 14:20 - 16:00

14:20 - 14:40 | LLMs at Barcelona Supercomputing Center (slides)
Marta Villegas, Barcelona Supercomputing Center
14:40 - 15:00 | Data for European Large Language Models: The European Perspective (slides)
Georg Rehm, DFKI
15:00 - 15:20 | A Dataset Framework for Large Language Models (slides)
Malte Ostendorff, DFKI
15:20 - 15:40 | Assessing Reliability of Knowledge in LLMs (slides)
Barry Haddow, University of Edinburgh
15:40 - 16:00 | HPLT: Data and Models for European Languages (and more) (slides)
Sampo Pyysalo, University of Turku

Coffee break: 16:00 - 16:30

Invited Talks - 2nd part: 16:30 - 17:30

16:30 - 16:50 | Annotating Multilingual Heterogeneous Web-Based Corpora (slides)
Pedro Ortiz, DFKI
16:50 - 17:10 | GPT-SW3: the first LLM for the North-Germanic languages (slides)
Magnus Sahlgren, AI Sweden
17:10 - 17:30 | LLMs and Data Protection: General Considerations (slides)
Roberto Lattanzi, Dip. AI, Garante per la Protezione dei Dati Personali

Participant Presentations and Closing: 17:30 - 18:45

17:30 - 17:42 | Italian Benchmark Language Resources and Tools: EVALITA4ELG, UINAUIL and more
Viviana Patti, University of Torino
17:42 - 17:54 | Collecting Italian Textual Data for the Medical Domain
Bernardo Magnini, FBK
17:54 - 18:06 | Il Dato che non ti ho Dato chi te l'ha Dato? Building trust in data donors
Fabio Massimo Zanzotto, University of Rome Tor Vergata
18:06 - 18:18 | The Italian challenge to Large Acoustic Models for automatic speech recognition and synthesis
Franco Cutugno, University of Napoli Federico II
18:18 - 18:30 | The Weakest Link: Understanding How Data Influences ML Trustworthiness
Antonio Cinà, University of Genoa
18:30 - 18:45 | Closing
