In an interview, Dr Katharina Zweig, a professor of computer science and an AI expert, discusses the use of AI in the field of labour and social affairs, the need to develop AI skills, and which AI systems should be regulated and how.
Professor Zweig, do political decision makers in Germany and the rest of Europe understand the changes that artificial intelligence (AI) will unleash?
ZWEIG: I think this varies a great deal. The Bundestag’s study commission on artificial intelligence is composed precisely of those members of parliament who have already given some thought to this matter. Beyond this, I see the full range of variation: some local governments are interested in such issues, and some state governments are also waking up. Together with Gerald Swarat of the Fraunhofer Institute, we have just launched an initiative for AI in local government. Its prime purpose is to educate, as we are concerned that AI systems may be procured too hastily and without a full understanding of how the technology works.
Are there any typical AI misperceptions that you repeatedly encounter?
ZWEIG: There is this diffuse idea that AI is clever. Frequently, the basic mechanism is not understood – namely the fact that the methods that are attracting so much discussion at the moment are simply statistical processes that search for patterns in data. And many people do not realise how much scope there is for variation. They think that the machine always comes up with an optimal and objective solution. Yet, sometimes the questions asked are still too complex for this to work, especially when people are involved.
In what way?
ZWEIG: When it comes to complex questions, we no longer have any algorithms actually capable of finding an optimum. Instead, we have to resort to what are known as heuristics. These heuristics try to find patterns in the data that are as meaningful as possible but there is no guarantee that they have found the best patterns. This means that, depending on the data and the question, errors may also occur. Most people do not realise this, which is why too much is expected of AI in certain areas, while in others, the trust placed in it is entirely justified.
Is this excessive reliance on AI causing policymakers and businesses to be overly worried by AI systems, with the result that they want to regulate them too heavily?
ZWEIG: Both are true. If you greatly exaggerate the possibilities of artificial intelligence, you may also feel that there are opportunities for the economy that must be utilised come what may. By the same token, some people are clamouring for strong regulation – so we have to be careful in that respect as well.
Why do we have to be careful?
ZWEIG: Because there is no such thing as “one-size-fits-all AI”. AI is a set of methods that are used to extract patterns from data. My field of research addresses those AI systems that use the patterns found in order to make decisions. And these decisions are as diverse as the decision makers they are intended to replace or support. So, when it comes to regulation, it is crucial to distinguish between a scenario in which a public library recommends a book to me, one in which a medical practitioner makes a diagnosis, and one in which a judge decides how long I must go to prison. And the same distinctions apply to AI systems.
Is there a rule of thumb as to which AI systems really need to be regulated?
ZWEIG: Basically only those systems whose use impacts fields regulated by law. Broadly speaking, these are systems that make judgements on people or their property as well as AI systems that grant access to social or natural resources.
What resources do you specifically mean?
ZWEIG: Access to the labour market and the housing market, access to oil, energy, education – anything that is a resource which is not available in infinite amounts. But when it is a question of building the best paper clip and you develop an AI system that sorts out paper clips that have not been folded correctly, it is self-evident that no regulation is required.
What criteria should be used to assess whether regulation is necessary and, if so, to what extent?
ZWEIG: This is what we are currently researching – it is not easy to develop simple rules in view of the wide range of possible applications I referred to before. In the case of AI systems that make decisions or support decision-making processes, the key question is the potential that an AI system has for causing damage when it is used and the extent to which someone is dependent on this decision. If, say, we take an assessment system for job candidates and employees, it makes a difference whether I as a jobseeker write to 200 companies, each of which has its own system. This is because even if I am rejected by ten per cent of these systems, there is still hope for me that the remaining 90 per cent will apply different criteria. On the other hand, if the same system is used to assess internal candidates, there is a far greater degree of dependence as I cannot simply change companies to obtain a different assessment. This shows that one and the same system may need to be regulated to differing extents depending on the social processes in which it is integrated.
Figure 1 has the heading “Criticality matrix” and it contains a diagram with axes and the following description: “The criticality matrix by Tobias Krafft and Katharina Zweig differentiates five degrees of criticality in the use of AI systems to make decisions or support them being made. The categories build on each other and are linked to increasing regulatory requirements for transparency and traceability in the decision-making process.”
Along the x axis of the graph, the damage potential ranging from low to high is indicated for AI systems making decisions or supporting them being made. On the y axis, the level of dependence ranging from low to high is indicated for these AI systems. Starting at 0 with low damage potential and a low level of dependence going up to the upper right corner with high damage potential and a high level of dependence, the five degrees of criticality are depicted with even spacing. The first degree of criticality, 0, has the title “Post hoc analysis”. Then follow:
- Degree 1 “Constant monitoring as a form of black-box analysis”
- Degree 2 “Reviews of the objectives of the AI system, the input ...”
- Degree 3 “Only traceable AI systems (significant restrictions)”
- Degree 4 “No AI systems”
With the help of diagonally arranged bars, the figure visually indicates from the bottom left to the top right how the regulatory requirements for transparency and traceability in the decision-making process are linked to AI systems making decisions or supporting them being made. The graphic shows that the regulatory requirements increase with the damage potential and degree of dependence.
© Algorithm Accountability Lab [Prof. Dr K. A. Zweig], http://aalab.cs.uni-kl.de/resources/.
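The criticality matrix described above can be read as a simple mapping from the two axes to a degree of criticality. The following sketch is purely illustrative and not part of the Krafft/Zweig proposal: the numeric 0.0–1.0 scoring of the axes and the evenly spaced diagonal bands are assumptions drawn from the figure description.

```python
# Hypothetical sketch of the criticality matrix in figure 1. It maps a
# system's damage potential and the degree of dependence on its decisions
# (both scored 0.0-1.0 here, an illustrative assumption) to one of the
# five degrees of criticality with its regulatory requirement.

REQUIREMENTS = {
    0: "Post hoc analysis",
    1: "Constant monitoring as a form of black-box analysis",
    2: "Reviews of the objectives of the AI system, the input ...",
    3: "Only traceable AI systems (significant restrictions)",
    4: "No AI systems",
}

def criticality(damage_potential: float, dependence: float) -> int:
    """Map the two axes to a degree of criticality (0-4).

    The bands in the figure run diagonally with even spacing, so the
    average of the two axes, split into five bands, approximates them.
    """
    combined = (damage_potential + dependence) / 2  # position along the diagonal
    return min(4, int(combined * 5))  # five evenly spaced bands

# Example: a system with high damage potential and high dependence
degree = criticality(0.9, 0.85)
print(degree, "->", REQUIREMENTS[degree])  # -> degree 4, "No AI systems"
```

The key point the figure makes survives any choice of thresholds: requirements tighten monotonically as either axis grows.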
What is the specific situation with respect to labour and social affairs? Is there any interest in the changes that AI will bring about?
ZWEIG: Works councils have been looking into AI systems for a long time, definitely for more than five years now. We are doing a lot of work with them and are now on the verge of establishing a workshop system that we can use regularly for training. I have also given presentations at employment agencies, and we have already done quite a bit for consumer protection. I think you’ll find people who realise the potential offered by AI just about anywhere.
Can you give us any positive examples worthy of a closer look showing how AI is being used today in administration?
ZWEIG: Generally speaking, this is a difficult area but I don’t want to rule out the possibility of such systems being able to support government offices. However, this is contingent upon these systems being scientifically observed and assessed to determine whether they are operating properly and effectively. As well as this, the process in which the system is to be used must be prepared carefully. On this basis, there is indeed a positive example showing how a pilot project can be established when a government body decides to use AI. That is the algorithm of the labour market service AMS in Austria. Although it is, strictly speaking, only a heuristic model, let’s stick to the more customary term and call it an algorithm. The system places unemployed people in three different groups: the first group consists of people who will not have any difficulty finding a new job. A third group has been out of work for a long time and may never find employment again. And then there is a group between the other two for everyone else. This group is to receive greater support in the form of further education or advanced training. Obviously, this is a very sensitive task and therefore needs to be evaluated well.
How does this specifically work at the technical level? How does the AI system sort applicants?
ZWEIG: The heuristic model used for this purpose is what is known as a logistic regression, which is a very, very simple form of machine learning. Compared with other methods of machine learning, its capabilities are still very limited. As a human, you can understand which factors affect the result and how.
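To make the interpretability point concrete, here is a minimal, self-contained sketch of logistic regression. The toy data, feature names, and training settings are invented for illustration and have nothing to do with the actual AMS model; the point is that after training, each learned weight can be read directly, its sign and size showing how that factor pushes the predicted probability up or down.

```python
# A minimal logistic regression trained by stochastic gradient descent.
# Data and feature names are invented for illustration only.
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.5, epochs=2000):
    """Fit one weight per feature plus a bias (stored as the last entry)."""
    n_features = len(xs[0])
    w = [0.0] * (n_features + 1)
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            z = sum(wi * xi for wi, xi in zip(w, x)) + w[-1]
            err = sigmoid(z) - y  # prediction error for this example
            for i in range(n_features):
                w[i] -= lr * err * x[i]
            w[-1] -= lr * err
    return w

# Invented toy data: [years_of_experience (scaled 0-1), cared_for_relatives]
xs = [[0.9, 0], [0.8, 0], [0.7, 1], [0.2, 1], [0.3, 1], [0.1, 0]]
ys = [1, 1, 1, 0, 0, 0]  # 1 = found a new job quickly

w = fit_logistic(xs, ys)
# A positive weight raises the predicted probability, a negative one lowers it
print("experience weight:", round(w[0], 2), "| care weight:", round(w[1], 2))
```

This transparency is exactly what made the AMS findings discussed next possible: a weight that penalises a group is visible in the model itself.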
What insights have been gained from this system?
ZWEIG: It was discovered that there is a kind of penalty if you are a woman, aged over 50 or care for other people. This triggered a huge outcry in the media; the software was said to be discriminatory. But this is not quite correct as the heuristics used merely learned from the labour market.
So what really happened was that the software exposed existing discrimination rather than actually being responsible for the discrimination itself?
ZWEIG: Yes, it exposes the discrimination. Does it discriminate of its own accord? Well, the software does not have any self-awareness of any kind, so it doesn’t “do” anything in the sense of an autonomous act. However, depending on how the software is used, it may under certain circumstances preserve the discrimination that has been discovered. According to the head of AMS, the expected effect would be for people who experience discrimination in the labour market to increasingly enter the middle group so that they can receive additional support. This effectively amounts to anti-discrimination or a countermove, so to speak.
What can we learn from this Austrian example?
ZWEIG: In my opinion, there is only one way of finding out whether such systems are helpful or not: we have to analyse them from the outset using scientific methods and determine, in the light of the overall system, whether they really do improve performance or not. But this often fails because we don’t know how good people’s decisions were before the system was introduced. Frequently, there is a feeling – in HR for example – that the decisions made by people are not good enough and that something needs to be done about this.
Why is that a problem?
ZWEIG: This leads to action merely for its own sake: a system that is supposed to be good, and is often pretrained on external data, is bought. There’s a good example of this from medicine: an AI system to help with cancer diagnostics was tested in pilot projects at many German clinics and performed quite well as such. However, the system also made strange proposals. One reason for this may have been the fact that the system had been trained in the United States, where there is a financial incentive for medical practitioners to prescribe certain drugs. The computer picked this up during the learning phase, with the result that it suggested very specific treatments in some cases. What I’m trying to say is that you cannot train these systems at any old place and then simply buy them.
Do you think that it’s a good thing that it’s not possible to buy any old pretrained AI system?
ZWEIG: Yes, I find that reassuring. I’m often asked whether we are already hopelessly lagging behind in Europe given that the United States and China have progressed so much further. However, all these examples repeatedly show that if you want to have an AI system for Europe, you must train it using European data. Europe is an important market – but only if we work together. And this means that we are in a position of power: if we decide to adopt a different approach to handling our data, there will be few alternatives for those who want to develop systems for Europe that accurately reflect European behaviour. My primary demand is that it should no longer be permitted for digital data on our behaviour to be collected centrally to learn from it. Instead, distributed machine learning processes must be used. However, we don’t yet have the infrastructure for this and further research is required.
For this market power to work properly, AI systems must be monitored. But there are no central bodies for this yet. Who do you think should monitor AI systems?
ZWEIG: I think that we already have arbitration structures in most social processes. In the case of work as a resource, this function is performed by the works councils. With respect to consumers, we have the consumer protection organisations and, in the case of private-sector media, the state media watchdogs. However, these bodies would need to be given the necessary skills.
Many of the issues relating to labour and social affairs are seen from employees’ point of view – questions such as recruiting, bonuses, assessments as well as automatic selection systems of the type already tested by a large US online retailer in the sensitive area you have mentioned. Is it realistic to expect such skills to be developed on a distributed basis?
ZWEIG: Yes, it is entirely realistic. The works councils have been exploring these questions for years now. And indeed, we are running quite a lot of workshops and, to be quite honest, 45 minutes are sufficient to impart a certain degree of knowledge about AI systems. That’s why I am reasonably optimistic that we will be able to achieve, fairly quickly and effectively, a broad basic understanding of how machine learning processes work and what they can and cannot do.
So the works council will then need to look at whether AI is discriminating? And whether the data has been properly curated?
ZWEIG: No, the works council is not able to do this itself. Obviously, you need experts for this. However, they will come if the market is there and there are offers.
What do these actors need to be able to actually monitor artificial intelligence systems?
ZWEIG: For one thing, they need the experts that I just mentioned. More than anything else, however, they require access to data and interfaces, scaled to the damage potential of the system, in order to determine exactly what has happened and whether, for example, there has been any discrimination. For this reason, we have tabled a proposal for a regulation that stipulates various transparency and traceability obligations on the basis of the potential for damage and the degree of dependence on a decision (figure 1).
How could this work in practice?
ZWEIG: A colleague, Mr Wagner-Pinter in Austria, who developed the AMS algorithm, offers his software together with a set of social compatibility rules, for example. These rules describe how the system is to be used in practice.
What do these rules say?
ZWEIG: Things like: a decision on the category to which a person is assigned must always be discussed with the jobseeker, and this person may submit an objection. The person may view and modify their basic data at any time. This means that a decision can be overridden if the machine made it on the basis of incorrect data. If a decision is overridden, the reason must be documented. As well as this, the system is recalibrated each year, and only on the basis of data from the last four years. This means that jobseekers have a right to be forgotten: you should not have to suffer, when you are older, from the consequences of reduced employability during a youthful phase of defiance. We also demand further technical means of access to enable legal experts to detect systematic discrimination. Obviously, this would be done in conjunction with experts.
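The four-year recalibration window described above can be sketched as a simple data filter applied before each yearly retraining. The record structure and field names below are invented for illustration; only the rule itself, that older records drop out of the training data automatically, comes from the interview.

```python
# Hedged sketch of the yearly recalibration rule: retrain only on data
# from the last four years, so older records are "forgotten" automatically.
# Record fields ("id", "date") are invented for illustration.
from datetime import date

def training_window(records, today, years=4):
    """Keep only records dated within the last `years` years."""
    cutoff = date(today.year - years, today.month, today.day)
    return [r for r in records if r["date"] >= cutoff]

records = [
    {"id": 1, "date": date(2015, 3, 1)},  # old record: falls out of the window
    {"id": 2, "date": date(2019, 6, 1)},  # recent record: kept for retraining
]
recent = training_window(records, today=date(2020, 1, 1))
print([r["id"] for r in recent])  # only the recent record remains: [2]
```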
What are the objections to the idea of an AI inspector who would be responsible for performing the checks?
ZWEIG: It is not the system itself that needs to be checked. This is only part of the inspection. After all, we don’t have a single body that simultaneously decides whether doctors have made mistakes or whether lawyers are working correctly. Instead, there are individual institutions that you can turn to if you believe that the profession in question is making systematic errors in its decisions. In addition, we always need an approach that views the overall process, as we have seen with the AMS algorithm for example. In my opinion, what we need is an approach that guides and certifies the standardisation of the quality of this overall process rather than issuing a seal of approval for the software. This has an added advantage in that it would no longer be necessary to certify different versions of the same software. Instead, we must certify the quality assurance of the overall process. As long as this is evaluated continuously on the basis of certain criteria and in the light of the potential for damage, the company can continue working. However, in addition to this, we require an independent body for state-related AI.
Do you think that the field of labour and social affairs within administration could be a test bed for utilising the opportunities offered by AI?
ZWEIG: Labour is a difficult area. I am not sure whether current AI systems are complex enough to take account of the contextual dependency that we would particularly require in an area such as employment. It is frequently claimed today that there is no alternative to the use of AI systems. But of course there are alternatives, such as employing more and better advisors. So, yes, on the one hand, it would be an interesting field, as we could learn a lot about the ways of supporting human decisions more effectively. This is because the computer forces us to define things more precisely: How do we measure success? By the number of people we place in jobs? How do we want to benchmark ourselves afterwards? I believe that this process in itself could be very beneficial. However, whether we ultimately allow machines to assess humans in these sensitive areas is something that requires discussion on a broad front.
This interview has been taken from a publication by the Federal Ministry of Labour and Social Affairs released to mark Germany’s presidency of the Council of the European Union from July to December 2020. In scientific articles, interviews, viewpoints and infographics, the thematic publication describes the main thrusts being pursued by the ministry during the German presidency of the Council of the European Union. In this way, the ministry is seeking to strengthen dialogue within the EU and to join European labour and social-affairs ministers in identifying issues requiring EU-wide action.