DC10 Project: Enhancing NLP Capabilities through Federated Learning and GNNs


Doctoral candidate: Makhmoor Fiza Murk, MSc


Main supervisor: Carlo Nitsch (UNINA)

Auxiliary supervisors: David Camacho (UPM), Shen Yin (NTNU), Dariusz Mrozek (SUT)

R&D cooperation: ALMAWAVE


Objectives: to advance Natural Language Processing (NLP) in federated and decentralised settings by leveraging Graph Neural Networks (GNNs) to model complex linguistic structures.

As data privacy and ethical constraints increasingly limit access to large centralised corpora, this research seeks to enable privacy-preserving, distributed NLP systems capable of maintaining performance while respecting user confidentiality.

The project focuses on three main research goals:

  • Designing federated NLP models that can operate across decentralised data sources, ensuring data security and efficient learning (a minimal federated-averaging sketch follows this list);
  • Integrating GNNs into NLP tasks to capture the relational and syntactic dependencies within language (see the graph-convolution sketch further below);
  • Improving interpretability and transparency in federated NLP systems, ensuring that their decision-making processes can be understood and trusted.
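
As a minimal illustration of the first goal, the sketch below applies federated averaging (FedAvg), a standard aggregation scheme, to a toy text classifier. The model (a bag-of-words logistic regression), the synthetic client data, and all function names are illustrative assumptions rather than the project's actual design; the point is only that raw texts stay on each client while the server sees nothing but model weights.

    import numpy as np

    rng = np.random.default_rng(0)

    def local_update(w, X, y, lr=0.5, epochs=5):
        # One round of local logistic-regression training on a client's
        # private (X, y); the raw texts never leave the client.
        for _ in range(epochs):
            p = 1.0 / (1.0 + np.exp(-X @ w))      # sigmoid predictions
            w -= lr * X.T @ (p - y) / len(y)      # cross-entropy gradient step
        return w

    def fedavg_round(global_w, clients):
        # The server aggregates client updates weighted by local dataset size.
        sizes = [len(y) for _, y in clients]
        updates = [local_update(global_w.copy(), X, y) for X, y in clients]
        return np.average(updates, axis=0, weights=sizes)

    # Toy setup: three clients, each holding a private bag-of-words matrix.
    dim = 20
    clients = [(rng.normal(size=(30, dim)), rng.integers(0, 2, 30).astype(float))
               for _ in range(3)]
    w = np.zeros(dim)
    for _ in range(10):                           # communication rounds
        w = fedavg_round(w, clients)

In practice the local model would be a neural language model and the scheme would run over far more clients and rounds, but the data-flow constraint is the same.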

The doctoral candidate will develop federated architectures for NLP, train GNN-enhanced language models, and evaluate them on multilingual and domain-specific corpora in collaboration with the industrial partner ALMAWAVE S.p.A. and the academic partners UPM, SUT, and NTNU. The research will combine theory and experimentation to produce scalable, explainable, and privacy-preserving NLP solutions.
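
To illustrate how a GNN can inject syntactic structure into token representations, the sketch below performs one graph-convolution (GCN-style) message-passing step over a hand-crafted dependency graph. The sentence, arcs, dimensions, and weights are all made up for illustration; a real pipeline would obtain the arcs from a dependency parser and stack several trained layers.

    import numpy as np

    rng = np.random.default_rng(1)

    tokens = ["the", "model", "learns", "structure"]
    edges = [(1, 0), (2, 1), (2, 3)]     # toy dependency arcs (head, dependent)

    n, d = len(tokens), 8
    H = rng.normal(size=(n, d))          # initial token embeddings

    # Symmetrically normalised adjacency with self-loops (GCN-style).
    A = np.eye(n)
    for head, dep in edges:
        A[head, dep] = A[dep, head] = 1.0
    d_inv_sqrt = np.diag(1.0 / np.sqrt(A.sum(axis=1)))
    A_hat = d_inv_sqrt @ A @ d_inv_sqrt

    W = rng.normal(size=(d, d)) * 0.1        # learnable layer weights
    H_next = np.maximum(A_hat @ H @ W, 0.0)  # one message-passing step + ReLU

Each row of H_next now blends a token's embedding with those of its syntactic neighbours, which is the mechanism by which relational structure enters the model.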


Expected Results: 

A federated NLP framework integrating GNNs for distributed language understanding and generation.

Privacy-preserving training protocols for text models on heterogeneous and sensitive data (a minimal clipping-and-noise sketch follows below).

Tools and guidelines for interpretable federated NLP in real-world applications.

High-impact scientific publications in leading AI/NLP venues.
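
As one minimal, hedged illustration of such a protocol, the sketch below applies differential-privacy-style protection to federated updates: each client's update is clipped to a bounded norm and perturbed with Gaussian noise before the server averages it. The clipping threshold and noise scale are illustrative hyperparameters, not project values.

    import numpy as np

    rng = np.random.default_rng(2)

    def privatize(update, clip_norm=1.0, noise_scale=0.5):
        # Clip the update to a bounded norm, then add Gaussian noise
        # calibrated to that bound, so no single client dominates.
        norm = np.linalg.norm(update)
        clipped = update * min(1.0, clip_norm / max(norm, 1e-12))
        return clipped + rng.normal(scale=noise_scale * clip_norm,
                                    size=update.shape)

    # The server averages the privatised updates, never the raw ones.
    updates = [rng.normal(size=16) for _ in range(5)]
    noisy_mean = np.mean([privatize(u) for u in updates], axis=0)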


Applied research: The focus is on integrating Natural Language Processing (NLP) with federated learning and Graph Neural Networks (GNNs) for decentralised data processing. The project targets the development of privacy-preserving NLP models that run securely and efficiently on decentralised networks, and aims to advance the use of GNNs for modelling complex linguistic structures in such settings. The anticipated outcomes include practical federated NLP models, GNN-enhanced linguistic processing techniques, and comprehensive insights into the challenges of federated NLP, guiding future AI research towards more transparent and understandable natural language systems.


Planned secondments: UPM (4 months); SUT (4 months); NTNU (4 months)


Enrolment in Doctoral degree: UNINA