Collaborative Development and Validation of AI
In practice, developing AI up to the point of clinical use takes many iterations of a cycle comprising medical data provision, technical development, case-based technical, biological and statistical validation, and subsequent medical, biological and technical refinement. This requires close and efficient collaboration between various IT and medical working groups. In oncology in particular, extensive databases have been built and continue to grow, but they are often subject to intellectual property (IP) and data protection constraints that must be respected within this iterative development cycle.
In contrast to conventional collaboration governed purely by legal agreements, this work therefore investigates and develops a methodologically secured, decentralized, distributed, federated approach to developing and validating AI. The aim is to make collaboration with the groups in CAIMed and with external partners more efficient by enabling technically secure, scalable collaborative AI development. The focus is on digital pathology image data sets. The key is not only to distribute classical federated AI algorithms (deep learning), but also to establish semantically annotated data sets that may be distributed and, above all, remain protected for the data providers. In particular, this gives access to a considerably larger pool of clinically relevant data than purely open-data approaches.
These technical protection approaches for decentralized AI development are investigated primarily on semantically structured oncological image data (pathology, radiology) from CAIMed. For this purpose, digital patient cohorts (e.g. CancerScout) are extracted from the existing extensive oncological data sets. Automated data flows then process the oncological image data together with their semantic meaning for diagnostics and therapy, and make them available to the AI-methods collaboration partners for deep learning training and case-based validation in technically protected federated learning environments. The high-performance computing resources of the GWDG are used for machine learning. Clinical use cases such as response prediction, prognosis and biomarker analysis (e.g. immune infiltrates) serve as examples.
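The federated setting described above can be illustrated with a minimal sketch. The text does not name a specific algorithm, so the sketch assumes plain federated averaging (FedAvg): each site trains on its own data and shares only model parameters, which a coordinator combines weighted by sample count. A two-parameter linear model stands in for a deep network, the data are synthetic toy values, and all names (`local_update`, `federated_average`, `run_rounds`) are hypothetical. Technical protection measures such as secure aggregation are not shown.

```python
from typing import List, Tuple

Weights = List[float]               # [w, b] of the toy model y = w * x + b
Sample = Tuple[float, float]        # (x, y)


def local_update(weights: Weights, data: List[Sample],
                 lr: float = 0.1, epochs: int = 5) -> Weights:
    """Train on one site's private data; only the weights leave the site."""
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        # Gradients of the mean squared error over the local samples.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return [w, b]


def federated_average(updates: List[Tuple[Weights, int]]) -> Weights:
    """Combine site weights, weighting each site by its number of samples."""
    total = sum(n for _, n in updates)
    dim = len(updates[0][0])
    return [sum(w[i] * n for w, n in updates) / total for i in range(dim)]


def run_rounds(site_data: List[List[Sample]], rounds: int = 100) -> Weights:
    """Coordinator loop: broadcast global weights, collect and average updates."""
    global_weights: Weights = [0.0, 0.0]
    for _ in range(rounds):
        updates = [(local_update(global_weights, data), len(data))
                   for data in site_data]
        global_weights = federated_average(updates)
    return global_weights


# Synthetic toy data (assumption): two sites, both following y = 2x + 1.
site_a = [(-1.0, -1.0), (0.0, 1.0)]
site_b = [(0.5, 2.0), (1.0, 3.0), (-0.5, 0.0)]

if __name__ == "__main__":
    print(run_rounds([site_a, site_b]))
```

In this toy setup the averaged model should approach w = 2, b = 1 although neither site ever sees the other's samples. In a real deployment the local step would be a deep learning training pass on image data, and the exchange of parameters would itself need protection.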