Technical Article: Deep Learning

Deep Learning and how a neural network decides

bigdata insider | 16.04.2019 In the context of digital transformation and Industry 4.0, keywords such as artificial intelligence (AI) and its subfield Deep Learning (DL) come up frequently. Deep Learning provides the basis for speech recognition and voice control, for example, and is used for predictive maintenance in production.

With Deep Learning, it is hard to comprehend why a neural network makes certain decisions. But which criteria actually play a role in a decision?

Kluge Hans, or Clever Hans in English, was a horse at the beginning of the 20th century that could supposedly calculate, tapping out the results with its hooves [1]. In fact, the horse had learned to read the body language of its questioners, who unconsciously signaled the solution. The Clever Hans effect has since been used in science to describe the unconscious influencing of a behavior.

Deep Learning solutions can exhibit a similar effect. A system that is supposed to recognize boats in images may react primarily to water. In many images this goes unnoticed, because boats are usually found near water. As researchers at TU Berlin suggest, it is "quite conceivable that about half of the currently deployed AI systems implicitly or explicitly use such Clever Hans strategies." [2] How can you detect and avoid building a Clever Hans predictor, and is such an effect in your system really a problem?

Decision criteria of a neural network

A major disadvantage of Deep Learning is that it is hard to comprehend why a neural network makes certain decisions. Under the heading of Explainable AI [3], research is being conducted into methods that at least give indications as to which criteria a neural network uses to make its decisions, i.e. to assign an input to a class. There are two groups of methods. The first explains, for concrete input data, on the basis of which information a decision was made. Layer-wise Relevance Propagation, for example, produces a heatmap showing which regions of the data contributed to a decision. For images, it can reveal whether a train was detected mainly on the basis of the rails in the picture; in signal analysis, it highlights the relevant time segments and channels. Something comparable exists in neural networks for text analysis: with so-called attention mechanisms, a model weighs which words are particularly important depending on the context, and these weights can likewise be visualized.
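The input-attribution idea can be sketched with a minimal, purely illustrative example. Layer-wise Relevance Propagation itself requires a layer-by-layer relevance decomposition; as a simpler stand-in, the sketch below computes a plain gradient saliency map on a tiny, randomly initialized NumPy network. All weights, dimensions, and the network itself are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical tiny two-layer network with fixed random weights:
# 8 input features -> 4 hidden units -> 3 classes.
W1 = rng.normal(size=(8, 4))
W2 = rng.normal(size=(4, 3))

def forward(x):
    h = np.tanh(x @ W1)
    return h, h @ W2                  # hidden activations, class logits

def saliency(x, cls):
    """Gradient of one class logit w.r.t. the input features."""
    h, _ = forward(x)
    dh = W2[:, cls] * (1.0 - h ** 2)  # backprop through tanh
    return dh @ W1.T                  # chain rule back to the input

x = rng.normal(size=8)
_, logits = forward(x)
cls = int(np.argmax(logits))
attribution = np.abs(saliency(x, cls))

# Rank input features by how strongly they influenced the decision.
print("most relevant features first:", np.argsort(attribution)[::-1])
```

For real models, libraries implement LRP and related attribution methods for trained networks; the point here is only the principle of tracing a decision back to input regions.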

The other group consists of generative methods, currently used mainly in image recognition. Here the goal is to understand what a neural network has learned in general: an input image is gradually modified so that it matches a target class as well as possible. Google's Deep Dream algorithm is one of the first methods of this group. Google analyzed what information was relevant for its network to recognize a dumbbell in an image. The surprising result was that the network also takes into account whether an arm is holding the dumbbell.
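The generative principle, gradient ascent on the input toward a target class, can be sketched in the same kind of toy setting. The weights and dimensions below are made up; the real Deep Dream operates on a trained convolutional network with image regularization, which is omitted here:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical fixed random weights: 8 inputs -> 4 hidden -> 3 classes.
W1 = rng.normal(size=(8, 4))
W2 = rng.normal(size=(4, 3))

def class_score(x, cls):
    h = np.tanh(x @ W1)
    return (h @ W2)[cls], h

def grad_x(x, cls):
    _, h = class_score(x, cls)
    dh = W2[:, cls] * (1.0 - h ** 2)   # backprop through tanh
    return dh @ W1.T

# Start from a near-neutral input and climb the score for class 0.
x = rng.normal(size=8) * 0.1
start, _ = class_score(x, 0)
for _ in range(200):
    x = x + 0.05 * grad_x(x, 0)        # gradient ascent on the input
end, _ = class_score(x, 0)

print(f"class-0 score: {start:.2f} -> {end:.2f}")
```

The resulting input shows what the network "considers" maximally class-0-like; with images, inspecting such inputs is what revealed the arm in the dumbbell example.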

Sufficient training data must be available

A neural network learns exclusively from data, so training data must be available that covers the full range of inputs as well as possible. First, the problem space must be defined: which scenarios need to be considered, and what should the model output? These decisions have a major impact on the practical benefit in actual operation and on the complexity of the model. It has proven useful to restrict the problem space as far as possible at the beginning, so that it just delivers a practical benefit, and then to extend it step by step.

With complex inputs, it is easy to introduce correlations into the data unconsciously, for example by recording all training data for one class at the same time of day. The data should therefore be recorded in a way that matches later, real-world operation as closely as possible. However, a Clever Hans strategy is not always problematic; the decisive factor is how the problem space is defined. If only images of ships and images of trains are to be distinguished, it is perfectly valid for trains to be recognized by the rails and ships by the water.

It only becomes problematic if ships are also to be recognized in the vicinity of tracks. How much influence a possible Clever Hans strategy has on the model's recognition rate in real operation is revealed by evaluating the test data set. As with the training data, the test data should correspond to the actual application. If all defined scenarios are recognized with high accuracy on a well-constructed test set, accuracy in later operation is likely to be similarly good, even if a Clever Hans strategy is in use. As long as wrong decisions do not cause major damage, this behavior is perfectly acceptable.
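Reporting accuracy per defined scenario, rather than a single aggregate number, is what makes such a gap visible on the test set. A minimal sketch with hypothetical per-sample results (1 = correct, 0 = wrong):

```python
from collections import defaultdict

# Hypothetical test results, grouped by scenario from the problem space.
results = [
    ("ship_on_water", 1), ("ship_on_water", 1), ("ship_on_water", 1),
    ("ship_near_tracks", 0), ("ship_near_tracks", 1),
    ("train", 1), ("train", 1), ("train", 1),
]

hits, totals = defaultdict(int), defaultdict(int)
for scenario, correct in results:
    totals[scenario] += 1
    hits[scenario] += correct

# A Clever Hans strategy shows up as one scenario lagging far behind.
for scenario in sorted(totals):
    print(f"{scenario:18s} accuracy {hits[scenario] / totals[scenario]:.2f}")
```

In this toy data the aggregate accuracy looks respectable, while the "ship near tracks" scenario reveals that the model may be keying on rails rather than ships.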

It is quite different in medicine, for example, where human lives are at stake. Because of the high risk of a wrong decision, AI systems can only be used there under very specific conditions. Yet it is precisely here that Explainable AI approaches can help support the physician in making a diagnosis, by pointing out the relevant sections of the data.


[1] Hans Joachim Gross: Eine vergessene Revolution. Die Geschichte vom klugen Pferd Hans. In: Biologie in unserer Zeit, Vol. 44, No. 4, 2014, pp. 268–272

[2] Joint press release of Technische Universität Berlin and the Fraunhofer Heinrich Hertz Institute HHI, 11 March 2019: Wie intelligent ist Künstliche Intelligenz? In: Medieninformation Nr. 40/2019

[3] A. Holzinger. Informatik Spektrum (2018) 41:138.

* Dr. Matthias Weidler develops algorithms with a focus on image processing and machine learning at ASTRUM IT. Dr. Jan Paulus is an expert in machine learning and pattern recognition. Both work as AI and Machine Learning Engineers, Consultants, and Trainers at ASTRUM IT in Nuremberg.
