AI applications are being deployed in more and more areas. But what happens when they make mistakes? In order to answer that question, we need to distinguish between an application suggesting further items to online shoppers and one assisting a human resources department with application processes. In September and October the Network Artificial Intelligence in Labour and Social Administration, in collaboration with Tobias Krafft of TU Kaiserslautern, discussed which AI applications are critical and which are less so. They also considered what this would mean for the use of AI in public authorities.
Mr Krafft, you were involved in the development of a “criticality matrix” for assessing the potential effects of AI systems. What exactly is this?
The criticality matrix can be used either for assessing risks initially when developing an AI or while using it. In either scenario, the AI system is evaluated in the specific context of its application with a view to identifying any potential for damage – for example if it were to make an incorrect decision.
Using the criticality matrix, the key question – namely the overall potential damage – can be answered more precisely. This makes it possible to take suitable damage limitation measures in advance. The importance of this becomes clear once we compare the areas in which different applications are used. An AI system using camera images to assess screws on a conveyor belt, for example, will have a relatively low potential for damage, whereas a system employing AI to predict chances in the labour market is capable of doing far more harm. Depending on the overall damage potential, it is then possible to formulate the appropriate technical requirements regarding transparency and traceability of the AI system or to develop risk mitigation measures.
How does your model function in concrete terms?
When determining the overall potential for damage, two dimensions need to be considered. The first relates to the actual damage that would be caused by a wrong decision. Apart from any individual consequences or damage, the assessment must also take into account any potential damage to society as a whole that might be associated with using the system. One currently much-discussed example is upload filters. On an individual level, an error of judgement on the part of the underlying AI system will cause no more than a degree of uncertainty among users; on a broader societal level, however, the harm can be considerable.
The second dimension concerns whether a person is affected by a decision made by AI and, if so, to what extent. One factor influencing this is whether those affected can object to a decision or switch to an alternative. In many areas, a variety of suppliers reduces the overall risk of harm; when official bodies use AI, by contrast, the risk mostly increases. This is because people may have no choice but to allow AI systems to evaluate them, although in such cases the potential for damage can be reduced if there are options for objecting or intervening. In short, the specific context in which an AI system is applied will be the key factor determining the overall potential for damage.
Figure 1 has the heading “Criticality matrix” and it contains a diagram with axes and the following description: “The criticality matrix by Tobias Krafft and Katharina Zweig differentiates five degrees of criticality in the use of AI systems to make decisions or support them being made. The categories build on each other and are linked to increasing regulatory requirements for transparency and traceability in the decision-making process.”
Along the x axis of the graph, the damage potential ranging from low to high is indicated for AI systems making decisions or supporting them being made. On the y axis, the level of dependence ranging from low to high is indicated for these AI systems. Starting at 0 with low damage potential and a low level of dependence going up to the upper right corner with high damage potential and a high level of dependence, the five degrees of criticality are depicted with even spacing. The first degree of criticality, 0, has the title “Post hoc analysis”. Then follow:
- Degree 1 “Constant monitoring as a form of black-box analysis”
- Degree 2 “Reviews of the objectives of the AI system, the input ...”
- Degree 3 “Only traceable AI systems (significant restrictions)”
- Degree 4 “No AI systems”
With the help of diagonally arranged bars, the figure visually indicates from the bottom left to the top right how the regulatory requirements for transparency and traceability in the decision-making process are linked to AI systems making decisions or supporting them being made. The graphic shows that the regulatory requirements increase with the damage potential and degree of dependence.
© Algorithm Accountability Lab [Prof. Dr K. A. Zweig], aalab.cs.uni-kl.de/resources/.
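The two-dimensional reading of the figure can be illustrated with a small sketch. This is not the authors' method: the numeric scales from 0 to 1, the simple sum of the two scores and the five equal-width diagonal bands are assumptions made purely for illustration, and the function name `criticality_degree` is hypothetical. The labels are copied from the figure description, including the truncated label for degree 2.

```python
# Illustrative sketch only. The 0..1 scales, the summing of the two scores
# and the equal-width bands are assumptions, not part of the published
# criticality matrix.

# Labels as given in the figure description (degree 2 is truncated there).
DEGREE_LABELS = {
    0: "Post hoc analysis",
    1: "Constant monitoring as a form of black-box analysis",
    2: "Reviews of the objectives of the AI system, the input ...",
    3: "Only traceable AI systems (significant restrictions)",
    4: "No AI systems",
}


def criticality_degree(damage_potential: float, dependence: float) -> int:
    """Map two scores in [0, 1] to a degree from 0 to 4.

    Both scores are hypothetical ratings that a human assessor would have
    to supply for a specific application context.
    """
    for name, value in (
        ("damage_potential", damage_potential),
        ("dependence", dependence),
    ):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1], got {value}")

    # Diagonal bands: a higher combined score means a higher degree.
    combined = damage_potential + dependence  # ranges from 0.0 to 2.0
    band_width = 2.0 / 5  # five evenly spaced bands (assumption)
    return min(4, int(combined / band_width))


if __name__ == "__main__":
    # Hypothetical scores for a low-risk and a high-risk context.
    for scores in [(0.1, 0.1), (0.5, 0.5), (0.9, 0.9)]:
        degree = criticality_degree(*scores)
        print(scores, degree, DEGREE_LABELS[degree])
```

In practice, the hard part is the assessment that produces the two scores for a given context, which the sketch simply takes as input.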
So according to your model, the more critical the application of AI, the greater the need for transparency and traceability. How can we trace the way AI systems arrive at a certain result given they are frequently referred to as “black boxes”?
In addition to transparency about technical details, various substeps or components, from the development process of the AI all the way through to the finished system, can be prepared in a way that allows outsiders to understand and evaluate them. This makes it possible, for example, to justify or explain to an affected group how a decision was reached in the first place.
Irrespective of this, however, it is also possible to grant a domain expert access to the system. This allows the expert to carry out testing procedures expressly devised for dealing with black-box systems and thereby to examine specific issues.
This is precisely where a differentiated risk-based AI evaluation can be beneficial. Based on an initial risk assessment assisted by the criticality matrix, it is possible to determine, in individual cases, what mechanisms for transparency and traceability could be established or even demanded.
What factors should be considered when AI is deployed in labour and social administration? Is the “criticality” of a learning system affected by whether it is deployed in the private sector or in unemployment or accident insurance?
Obviously, the context in which an AI system is deployed greatly influences the potential risks.
The examples mentioned above illustrate why the second dimension is necessary when evaluating the potential for damage. In the private sector, an affected individual frequently has the option of refraining from using a service, or choosing an alternative supplier, if they have concerns. By contrast, when such systems are used in the social security system, there is little choice. People are therefore under a certain pressure to submit to evaluation by the AI system, which in turn raises the level of criticality. This is of course not in itself an argument against using AI in this field, but it does demonstrate the need for a more careful response to the risks involved here.
In which areas of administration would you say AI has the greatest potential? What AI applications would you like to see in labour and social administration?
Today, it is already possible in many areas of labour and social administration to use good AI applications that can be classified as generally harmless in terms of criticality. These include relatively generic systems for improving work productivity – AI-based digitalisation of handwritten documents, for example, but also dictation and translation software. My experience suggests that there is still considerable untapped potential in such applications.
I’d also like to see more use of AI applications that reduce the burden of bureaucracy and especially help employees perform disagreeable tasks.
Tobias Krafft is pursuing a doctorate in the Algorithm Accountability Lab, a research group at TU Kaiserslautern. One key focal point in his work is research into the social effects of algorithms and AI. He is co-director of Trusted AI GmbH, advising companies and public-sector institutions on issues relating to the development and use of AI systems. In September 2021 he was invited by the Network Artificial Intelligence in Labour and Social Administration to talk about how to distinguish between critical and less critical AI applications and what respective requirements the systems should meet.