Understanding and/or grayboxing deep learning is a hot topic. It is hyped by the media and by a whole bunch of people who blame others for not having acquired the basic math skills needed to understand machine learning algorithms. I can understand that many people want to understand, e.g., their credit rating. On the other hand, a good portion of these people exclude themselves by choice, because they show limited willingness to actually learn how machine learning algorithms work. The situation seems similar to fuzzy logic in the late 1980s/early 1990s (which is still used heavily): everybody talks about it, but hardly anybody understands it, even or especially in academia. Granted, there is some serious math involved, and back then there was no Khan Academy and no YouTube. Let’s try to look at it on a more conceptual level.
On Artificial Intelligence
AI needs better marketing and better understanding. A popular way of understanding AI is to think of it in terms of human-like behavior and human-like performance. No matter whether we talk about explainability or interpretability, most approaches to understanding intelligent agents rest on the assumption that there should be a perfect explanation for every step, because many people think that this is how humans work. However, try to explain your arm movements the next time you eat soup ;). Classical control theory does allow for such a detailed understanding: controllers such as PID are fully transparent, but they have to be fine-tuned for a very narrow range of input values; outside that range the controller will “break” and can cause chaos or has to be shut down. The advantage of ML-based controllers (e.g. deep reinforcement learning) is that they provide better performance over a wider spectrum of scenarios. However, this comes at the cost of less precision and extreme difficulty in understanding the intelligent agent’s behavior.
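To make the contrast concrete, here is a minimal sketch of a discrete PID controller. Every term can be written down and inspected, which is exactly what makes it transparent; the gains and the toy plant below are made-up values for illustration, not taken from any real system.

```python
def pid_step(error, state, kp, ki, kd, dt):
    """One update of a discrete PID controller.

    state carries (integral, previous_error) between calls.
    Every contribution to the output can be inspected directly.
    """
    integral, prev_error = state
    integral += error * dt
    derivative = (error - prev_error) / dt
    output = kp * error + ki * integral + kd * derivative
    return output, (integral, error)

# Hypothetical gains, hand-tuned for a narrow operating range;
# outside that range the same gains may oscillate or diverge.
kp, ki, kd = 1.2, 0.5, 0.05
state = (0.0, 0.0)
setpoint, measurement = 1.0, 0.0
for _ in range(100):
    error = setpoint - measurement
    control, state = pid_step(error, state, kp, ki, kd, dt=0.01)
    measurement += control * 0.01  # toy first-order plant
```

A learned controller offers no such term-by-term breakdown: its behavior is distributed over thousands or millions of weights, which is where the difficulty in understanding comes from.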
Explainability:
Explainability assumes full understanding of every step (for any given example input), leaving no uncertainty about how an algorithm behaves. This rests on the assumption that all machine learning models are deterministic. Well, technically they can be: that is why we set random_state to a well-defined value. If we did not, we would end up with results that cannot be compared perfectly. One could argue that a direct comparison is then no longer possible, but we would still get an idea of an algorithm’s performance.
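As a concrete illustration, here is a minimal scikit-learn sketch of pinning random_state for reproducibility; the dataset and model are arbitrary choices for illustration:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Fixing random_state in every stochastic step makes the run
# deterministic, so two runs can be compared result by result.
X, y = make_classification(n_samples=1000, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
clf = RandomForestClassifier(random_state=42).fit(X_train, y_train)
print(clf.score(X_test, y_test))  # identical on every run
```

Remove the random_state arguments and each run samples, splits, and trains slightly differently; the scores will hover around the same value, but no two runs are directly comparable anymore.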
Interpretability:
Interpretability is what many people actually want when they perceive AI as “human-like”. It allows for some uncertainty and helps to distinguish between correlation and causality. This is relatively easy with “classical” machine learning algorithms but much more challenging with (deep) neural networks. Hence, a good approach might be to interpret DNN behavior per outcome (class) and debug predictions grouped by class rather than by single inputs. This can be a lot of work, so we should ask ourselves to what degree we actually need interpretability, and that depends on the field of application.
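As a minimal sketch of such grouped debugging, one could start from a confusion matrix and look at per-class behavior instead of single inputs. The label arrays below are made-up placeholders standing in for a model’s held-out predictions (the model could be a DNN or any other classifier):

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical arrays for illustration: y_true holds the labels,
# y_pred a model's predictions on a held-out set.
y_true = np.array([0, 0, 1, 1, 2, 2, 2, 1])
y_pred = np.array([0, 1, 1, 1, 2, 0, 2, 2])

# Debug grouped by outcome: which classes get confused with
# which, rather than staring at individual misclassified inputs.
cm = confusion_matrix(y_true, y_pred)
for cls in range(cm.shape[0]):
    recall = cm[cls, cls] / cm[cls].sum()
    off_diag = cm[cls].copy()
    off_diag[cls] = 0  # ignore correct predictions
    print(f"class {cls}: recall {recall:.2f}, "
          f"most confused with class {off_diag.argmax()}")
```

From there one can drill into the worst class pairs with model-specific tools, which keeps the effort proportional to where the model actually misbehaves.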
Christoph Molnar has written a nice book on this: Interpretable Machine Learning - A Guide for Making Black Box Models Explainable. Most chapters are complete, and it is available on GitHub under a CC BY-NC-SA 4.0 license.