Artificial Intelligence (AI) is a key technology for the future and is being deployed in more and more applications that affect our everyday and working lives. Developing such AI applications requires large amounts of training data, yet very few quality standards for such data have emerged so far. KITQAR, a new research project funded by the Policy Lab, aims to advance them.
Many AI applications depend on data, and data sets are especially important in the development phase. These training, test and validation data are used, for example in machine learning, to train an AI application and to check that it works as intended. So far, however, there are no uniform quality standards for such data, although these will be crucial if AI applications are to meet technical, legal, ethical or social requirements in the future. Faulty or biased data can impair an application's functionality, with potential consequences for security, non-discrimination and data protection.
KITQAR – higher-quality training data for Artificial Intelligence
The research project KITQAR, funded by the Policy Lab Digital, Work & Society under the aegis of the Federal Ministry of Labour and Social Affairs (BMAS), aims to fill this gap. The first step is to determine which quality standards training data for AI applications actually need to meet, and how data quality can be made measurable and verifiable. To do so, the project uses both real-world data and synthetic data. The resulting "data quality framework" is intended to cover a broad range of data quality aspects and will then be tested in a series of trial runs. A further aim is to develop a partially automated testing kit that can be used to evaluate data quality in the future.
Wide-ranging expertise from practice and business
Industry and academia are working closely together to keep the project practically focused and relevant and to ensure interdisciplinary exchange. The project is led by the VDE (Association for Electrical, Electronic & Information Technologies), and its partners include scientists from the European University Viadrina in Frankfurt an der Oder, the University of Tübingen and the Hasso Plattner Institute at the University of Potsdam. Numerous stakeholders from companies, civil society, trade unions and regulatory bodies are also involved. This group discusses scenarios in which training data are used and develops proposals for future data standardisation.