What can philosophy teach machine learning?

Machine learning: reaching the goal without a mind

Apologies for the harsh start to the programme. It has to be:

(You can hear a piano piece by Johann Sebastian Bach, underlaid with white noise.)

That is white noise. It contains all audible frequencies at equal loudness and therefore sounds unpleasant to us, without structure. In the world of images, it corresponds to the color white.

But there is actually a structure hidden in this noise, namely a piano. Do you hear it? How can I get this faintly perceptible piano out of the unusable recording? A few years ago you would have said: let's try filtering! But filters get you nowhere here. I could start writing a computer program to find the notes in the noise. Only: in which line do I instruct the program to look for this or that frequency in this mush?

Self-learning machines are irritating - many find them scary

Classic computer programs have to have everything spelled out concretely: which millisecond to look at, which pitch to look for. They consist of nothing but explicit instructions, which they work through until they are done.

This type of programming went a long way for 50 years. But for more and more of today's problems it is useless. That is why software is now emerging that learns, and thereby opens up completely new perspectives. It could cope with this music too.

Such software does not exist yet; the market for it would be very small anyway. But other learning software is already in use: software that has learned to interpret speech and images intelligently, that recognizes whether a picture shows a chair or an old man. Another uses X-ray images to distinguish cancer from calcium deposits. And yet another guides self-driving cars to their destination.

Self-learning machines are irritating. They are scary to many. They put themselves on an equal footing with us, they pretend to be intelligent. If you think this through consistently, the philosophical question arises whether the machine is really just pretending. After all, with the neural networks in our human brains we feel pleasure and pain. If we recreate them in a computer, shouldn't we at least consider that such a learning machine could have similar feelings?

"A robot that is very careful not to run into a wall, because it knows that it will then receive negative feedback, may not feel anything. But we use a mechanism in its programming that would cause pain in animals.

If we continue with machine learning like this, we may cross a line without even realizing it, beyond which we will have robots that actually feel pain when they run into a closed door," says Anders Sandberg, computer scientist and futurologist at the University of Oxford.

But let's take a look at mathematics first - even if that demystifies the learning machines a little. Paul Lukowicz:

"The highly praised learning is nothing magical in any way, nothing incomprehensible, but nothing more than statistical data analysis. And such statistical data analysis is always necessary when I can no longer find deterministic rules."

For example, there is no rule for where to start looking for music in noisy audio material. Or for a stop sign in a completely shaky video recorded at high speed.

"I can approach every problem in two ways: I can go and invent a physical model where I can measure quantities and say: From this dividing line onwards, it's a stop sign, from that point onwards it's not a stop sign.

The only problem is: when you look at the complexity of the world, at the different lighting conditions under which stop signs can appear, at the graffiti children may have smeared on them, at how yellowed and worn they can be, then we as humans are no longer able to really specify such a deterministic algorithm.

So the other variant is: I simply look at enough stop signs and determine something from them statistically. That is learning."

Paul Lukowicz heads the "Embedded Intelligence" department at the German Research Center for Artificial Intelligence in Kaiserslautern.
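
To make Lukowicz's point concrete, here is a minimal sketch in Python. It is not the DFKI's actual code; the data and features are invented. It only illustrates what "looking at enough stop signs and determining something statistically" can mean: a classifier whose decision rule is estimated from labelled examples rather than written down by hand.

    # A minimal sketch (invented data, not the DFKI system): "learning" a stop-sign
    # detector purely from labelled examples instead of hand-written rules.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    # hypothetical feature vectors (say, colour histograms) for image patches
    stop_signs = rng.normal(loc=1.0, size=(200, 16))
    other_patches = rng.normal(loc=0.0, size=(200, 16))

    X = np.vstack([stop_signs, other_patches])
    y = np.array([1] * 200 + [0] * 200)              # 1 = stop sign, 0 = something else

    model = LogisticRegression(max_iter=1000).fit(X, y)       # the statistics are estimated here
    print(model.predict(rng.normal(loc=1.0, size=(1, 16))))   # should print [1]

Nothing in this program states what a stop sign looks like; the decision boundary is whatever the statistics of the examples yield.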

A great pioneer of artificial intelligence, Marvin Minsky, died at an advanced age in 2016, and with him a generation of computer pioneers, dreamers and eccentrics is passing. They all wanted computer programs to emulate the brain's neural network, that is, to build machines that could compete with human intelligence. Nobody attempts that anymore.

The US scientist Marvin Minsky conducted research in the field of artificial intelligence. (picture alliance / dpa / Ingo Wagner)

Australian computer scientist Jeremy Howard, born in 1973, says the promises of the 1980s to create human intelligence were never kept, and nothing useful ever came out of them.

Much more useful, however, comes out of the new approach of artificial intelligence, which is now more modestly called "machine learning".

Speech recognition on smartphones

Question: "What is the travel time from Bautzen to Malaga?"

Answer: "If there is little traffic, the Malaga location, Malaga province, Spain, is one hour by car from the Bautzen location in one day. This is your route."

This answer from the smartphone seems normal to many today. But it is remarkable and has only been possible for a few years. The travel time and route planning already existed in navigation systems two decades ago. But it is anything but trivial that the smartphone understands what I am talking about - and it can only be achieved with machines that are constantly learning. Jeremy Howard:

"Speech recognition works independently of the voice. Thanks to machine learning, this has succeeded fantastically and is so much better than speech recognition in the past, when you had to train software with individual speakers."

I can also inquire in English on my smartphone:

Question: "What’s the distance between London and Stockholm?"

Answer: "The distance from London, United Kingdom, to Stockholm, Sweden is 1,891.9 kilometers."

or in Franconian:

Question: "Give me amol the oath from Bauzn to Nice!"

Answer: "The Nice, France location is twelve hours, 15 minutes by car from the Bautzen location with little traffic. This is your route."

Such a system is called robust. It cannot be thrown off balance. It has never heard my Franconian before.

There is massive computing power behind this speech recognition. My little cell phone doesn't provide that, not a fraction of it. When I ask for directions, I have to be online, i.e. connected to the Internet. My question goes straight to an American data center via the microphone of the smartphone.

Millions of people speak to this data center umpteen times a day, and a computer program runs there whose exact workings nobody knows. For several years it has been listening to everything and trying to draw conclusions about the content of those petabytes of data. It is a machine that is constantly learning. In computer science, this process with very large amounts of data is called deep learning.

"I analyzed data that were recorded by the Hanover Medical School, where guinea pigs were played sounds relevant to them," says Dominika Lyzwa from the Max Planck Institute for Dynamics and Self-Organization. The neuroscientist evaluates noise processing in the mammalian brain, specifically in a small area of ​​the midbrain, the inferior colliculus, or IC for short.

"The group in Hanover measured different guinea pigs, different places in the IC, different volume levels. That means you know the input and you look at what comes out of the IC. Then you try to create a function, like the input signal is transformed into an output signal. "

The inferior colliculus converts auditory impressions into electrical impulses. How this works, which sound generates which brain signals, cannot be described with simple equations. It is so complex that Dominika Lyzwa uses machine learning:

"A lot of data. I examined more than 2000 neural groups. Ultimately, this research serves to develop implants that convert sounds into neural signals, that is, to recreate the IC."

Image recognition in vehicles

Another example is image recognition: modern cars use sensors to identify the lane in which they are driving. They warn the driver of obstacles and of overtaking vehicles in the blind spot. The next step is recognizing traffic signs, even when driving past quickly. Marc Tschentscher from the Institute for Neuroinformatics at the Ruhr University in Bochum uses machine learning methods to develop computer programs that recognize stop signs:

"We drove around NRW with our vehicles for a few months and recorded this data."

- terabytes?

"Yes, terabytes, definitely. In the end, these are only raw data, so no compressed video data, so that you can really see even the smallest traffic signs."

- And how do you deal with it then?

"Then you take different machine learning methods and split this data set into a training data set and a test data set."

- Have you tested it?

"Sure, we do it on a regular basis. Every time we train something new, we have a test set to know how good we are.

The test data set is indispensable in machine learning. You can train the machine as well as possible by telling it: the pale, shaky thing at the top left of the video is a stop sign, the reddish one at the bottom right is not. But only when you show the program pictures it has never seen before, and it then has to decide whether they contain stop signs, does the quality of the method become apparent.
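
The train/test discipline Tschentscher describes looks roughly like this in code. The sketch below uses invented data, not the Bochum traffic-sign recordings; the point is only that the classifier is trained on one part of the data and judged on images it has never seen.

    # A minimal sketch (invented data): split into training and test set,
    # train on one part, measure quality on the unseen part.
    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(2)
    X = rng.normal(size=(1000, 20))                  # hypothetical image features
    y = (X[:, 0] + X[:, 1] > 0).astype(int)          # 1 = stop sign, 0 = no stop sign

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
    clf = LogisticRegression().fit(X_train, y_train)          # train only on the training set
    print("accuracy on unseen images:", clf.score(X_test, y_test))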

The example of Computer-Go

"It is difficult for a person with a full stomach to use his head in a meaningful way. Are there no Liupo and Go players among them? Just playing these games is better than doing nothing."

The Chinese philosopher Confucius referred to two board games around 2,300 years ago, one of which is more widespread in Asia than chess and, despite its almost primitive rules, is one of the most demanding games of all: Go, called Weiqi in Chinese and Baduk in Korean.

While even experienced chess players have been losing to computer programs for 20 years, Go programs long played only at beginner level. They were no fun.

Chess software works like this: if I move the rook, the opponent then moves the pawn, I then move the bishop, and I end up 0.88 points worse than now. So the program keeps examining further possibilities until it comes across a more favorable position. This is a classic deterministic program in which every step of the calculation is clear. With 32 pieces on the board, it is already quite complex to evaluate just four moves ahead. No chess player can see that far. But today's PCs, even the chess apps on smartphones, cope with it easily.
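
The deterministic look-ahead described here can be written down in a few lines. The sketch below is not an actual chess engine; it is a generic minimax search over a tiny invented game tree whose leaves already carry an evaluation such as "0.88 points worse".

    # A minimal sketch of deterministic look-ahead (minimax) over an explicit game tree.
    # A node is either a number (the evaluation of a position) or a list of child nodes.
    def minimax(node, maximizing=True):
        if isinstance(node, (int, float)):            # leaf: position already evaluated
            return node
        scores = [minimax(child, not maximizing) for child in node]
        return max(scores) if maximizing else min(scores)

    # Invented tree: my two candidate moves, each answered by two opponent replies.
    tree = [
        [-0.88, 0.20],    # after the rook move, the opponent picks the reply worst for me
        [0.35, 0.10],     # after the bishop move
    ]
    print(minimax(tree))  # the program chooses the line with the best guaranteed evaluation: 0.1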

With Go, the complexity is incomparably greater. When the game begins, Black places a stone on one of 361 points. How should the computer decide which one? For White's first move, 360 points remain. Where should White place his first stone? The number of possibilities becomes astronomical. Nothing can be done here with deterministic programs, not even with supercomputers.
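
How quickly the possibilities explode can be checked with a few lines of arithmetic. The sketch below merely counts the distinct opening sequences on 361 points, ignoring captures and illegal positions, which is already enough to show why exhaustive search is hopeless.

    # Count the distinct opening sequences in Go (361 points, ignoring captures
    # and illegal positions) - the number becomes astronomical within a few moves.
    count = 1
    for move, points_left in enumerate(range(361, 361 - 10, -1), start=1):
        count *= points_left
        print(f"after {move:2d} moves: about {count:.2e} sequences")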

In March 2016, AlphaGo hit the Go scene like a bomb. It was programmed by British computer scientists using machine learning techniques and, after defeating the European champion, it competed against the world champion, the Korean Lee Sedol.

The Korean Go World Champion Lee Sedol in the game against AlphaGo, which is programmed with machine learning techniques (picture alliance / dpa)

The first game begins here on March 9. Lee Sedol has Black and starts. Two hours of free playing time, then one minute of thinking time per move. The match will go down in history. The 33-year-old Korean had announced that he would win at least four of the five games. Three hours and 186 moves later:

AlphaGo won. The program had been ahead for a long time.

Lee Sedol seems to be resigning.

It's hard to believe! A computer program has beaten the world champion and plays at the level of a nine-dan professional.

The machine learning program also wins games two and three, and with them the tournament. The world champion raves about the unusually good moves AlphaGo makes, moves nobody would have thought of. Beautiful moves, moves he would have to think about for years. Lee Sedol wins game four - and beams at the subsequent press conference.

"Thank you, thank you! Even though I lost the tournament, I have never earned so much applause. If I had won the first three games and lost this one, it would really hurt. But as it is, it is a special win I don't want to exchange anything for anything in the world. "

The world of Go is enchanted, and at the same time the ancient Chinese game of Go seems to be disenchanted. Not even the programmers who have let the software play millions of games against itself know what exactly it is doing. Paul Lukowicz:

"There are learning algorithms where the way the system has coded its knowledge is presented in a way that humans can hardly understand. And that includes the neural networks."

In other words, computer programs that simulate the neurons in our brain on a primitive level.

Neural networks: no longer understandable for people

Paul Lukowicz is a computer scientist who likes to disenchant myths. So also the myth of neural networks.

"Neural networks are nothing special. I always tell my students in lectures that neural networks are a special way of representing vector-matrix multiplications. They don't do more than that.

The problem is that the neural network hides the knowledge in very complex structures that you simply cannot see as a human. "
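
Lukowicz's remark can be taken quite literally. The sketch below is a toy feed-forward network with invented weights: nothing but matrix-vector multiplications and a simple nonlinearity, with the "knowledge" spread invisibly across the weight matrices.

    # A toy feed-forward network (invented weights): just matrix-vector products
    # plus a nonlinearity. The learned knowledge is hidden in W1 and W2.
    import numpy as np

    rng = np.random.default_rng(3)
    W1, b1 = rng.normal(size=(8, 4)), np.zeros(8)    # layer 1: 4 inputs -> 8 hidden units
    W2, b2 = rng.normal(size=(1, 8)), np.zeros(1)    # layer 2: 8 hidden -> 1 output

    def forward(x):
        h = np.maximum(0.0, W1 @ x + b1)             # matrix-vector product + ReLU
        return W2 @ h + b2                           # another matrix-vector product

    print(forward(np.array([0.2, -1.0, 0.5, 0.0])))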

Sebastian Houben: "Yes, that's the big problem. The dimensions of data - vectors - are so huge that it is by no means possible to understand it step by step."

The mathematician Sebastian Houben from the Ruhr University Bochum. His colleague Marc Tschentscher points to a dilemma posed by the machine learning programs: on the one hand, the neural networks are appealing in their elegance; they recognize stop signs even in blurred images in which we humans can no longer see anything. On the other hand, you cannot understand how they do it. Marc Tschentscher:

"It's always important that the people in the auto industry know exactly what exactly happens in such a process."

- Does that mean that the neural networks have no chance at all?

"So not at the moment. As long as you don't know exactly what they're really doing, or you can't understand it, it's not possible to sell something like this to industry at the moment."

- That means you have a black box that works wonderfully. But you cannot look inside, and that is why it cannot be used where very safety-relevant things happen, in other words, in a car driving at 30 or 130 km/h?

"Yes, I would say that as well, yes."

The automakers I spoke to all sympathize with deep learning and neural networks. But liability issues alone prevent the software from being built into safety-critical components. In the event of an accident, you cannot find the line in the code that mistook the child for a bush and therefore did not brake the car fully automatically. You cannot look into a neural network's cards; it does not work through strictly prescribed instructions.

But neural networks will shortly be used where they only give recommendations. Instead of actually braking the car, the electronics report:

"Careful! I suspect that a stop sign will come towards us very quickly."

- And it would be even better if the system kept learning new things, with the camera constantly looking around?

Marc Tschentscher: "That would definitely be a perspective. There would only be a lot of work to do in networking the vehicles, because they would have to be able to communicate with each other and save everything they have learned.

I think the biggest problem is that there always has to be someone who says: what you learned there is really correct. Or: the whole thing is, let's say, nonsense, and this procedure should not be adopted."

Machine learning processes take a lot of computing time when they are learning. It took AlphaGo years to understand how to play at world championship level. Tobias Glasmachers, Junior Professor for Machine Learning at the Institute for Neuroinformatics at the University of Bochum:

"It is true that a large part of the computing time is used for training the machine. And here we have to differentiate between offline and online training. For offline training, we have specified a fixed data set: We withdraw into ours Data center, train the machine and at some point come out again with the finished classifier and then deliver it to a customer, for example.With online training, on the other hand, we want to continue learning while the classifier is making predictions.

Then we have to further distinguish between systems that are real-time capable, that is, those that have certain computing time restrictions while they are running, and systems where the real runtime is not so critical. "
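
In code, Glasmachers' distinction between offline and online training can look roughly like this. The sketch uses invented data and scikit-learn's SGDClassifier; it is only meant to show the difference between fitting once on a fixed data set and continuing to learn while new data arrives.

    # A minimal sketch (invented data): offline training fits once on a fixed set;
    # online training keeps updating the classifier as new batches arrive.
    import numpy as np
    from sklearn.linear_model import SGDClassifier

    rng = np.random.default_rng(4)
    X, y = rng.normal(size=(1000, 10)), rng.integers(0, 2, size=1000)

    offline = SGDClassifier().fit(X, y)              # offline: one pass in the "data center"

    online = SGDClassifier()
    for start in range(0, 1000, 100):                # online: data arrives chunk by chunk
        online.partial_fit(X[start:start + 100], y[start:start + 100], classes=[0, 1])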

Parallelization is an elegant way of saving computing time. The jobs of learning, training and data processing are simply distributed over several computers or graphics cards. This is common with many classic computer programs. Neural networks can also be parallelized to some extent, but not the latest generation of machine learning programs. They are called "support vector machines" and come from the depths of statistical mathematics. Basically, it is about dividing up large amounts of data. Tobias Glasmachers:

"Training support vector machines for those who are familiar with it is similar to a coordinate descent, a subspace descent algorithm. It takes a lot of small, fast steps, and all of these steps are interdependent. That is, such algorithms cannot be parallelized particularly well and are therefore typically not ported to graphics cards. "

The support vector machine is very much in vogue. In the future, both methods, neural networks and support vector machines, will shape machine learning. Machine learning is often mentioned in connection with "big data", that is, very, very large amounts of data. A vague term that computer scientists like Tobias Glasmachers use with caution:

"In machine learning we always have two points of view: One is the IT point of view, where we look at the runtimes of algorithms. And from this point of view, we actually get massive problems in the big data area, at least with many standard algorithms that today. From a statistical point of view, as I said, large amounts of data are more of a blessing than a curse, and we can rely much more on our results.

Continuous learning with very large amounts of data is called deep learning. Machine learning is often confused with deep learning, just as neural networks are equated with machine learning. The generic term machine learning is the correct one. The mathematical models today are mostly neural networks or support vector machines. And if the training process never ends but just keeps going, then that is deep learning. Google and Apple learn deeply and keep on learning as they listen to new spoken questions every day and try to interpret them. They are getting better by the minute.

"There is another aspect. We have been talking all along about the interpretation of data analytical models, that is, the data that we analyze. But we should talk about something else, namely questioning the assumptions on which such models are based. Procedures are based on assumptions about the data. For example, it is assumed that the test data in the future will be based on the same assumptions as the old training data. So we should start making predictive models in the light of the assumptions instead of just calculating, "says the Russian mathematician Vladimir Cherkassky, a veteran of machine learning.

Software whose way of arriving at a result we do not exactly understand, but which usually delivers convincing results, is tempting. There is something almost human about a smartphone that I ask about the weather and that understands me immediately, even though I said "Wedder" instead of "Wetter".

The Karlsruhe computer scientist Katharina Zweig points to machine learning programs that make predictions about human behavior. And she does not mean the weather or the stock market price, where this technology is gaining a foothold, but politics:

"Imagine - in America it's already commonplace - there is software called" Predictive Policing ". It tells the police where it is particularly likely that break-ins will occur.

Of course that's great: you can optimize patrols, you can perhaps save one or two officers. But you can take this algorithm even further and then say: 'Hey you, you have four criminals in your circle of acquaintances, you drink a lot and have attracted attention many times. There is an 80 percent chance that you will commit the next break-in!'

What if there are incorrect modeling assumptions behind such an algorithm? Who protects us from that? Who can hold this algorithm accountable? Who can make it do what we think it does?"

Anders Sandberg: "Google has published this new algorithm that writes what can be seen under a photo. It is certainly based on deep learning and was trained with very, very many examples. Now it happened that a couple with dark skin took a selfie and the software wrote under it: 'Gorillas'.

The two shared it on Twitter and were quite upset. Google was in shock. Under no circumstances did they want the algorithm to be racist. The algorithm, however, cannot be racist at all, because all it has learned is to look at pictures and write words under them. It does not care whether someone is offended by the connections it makes. It is only describing a picture the way it learned to. For us humans, however, some of these connections are extremely problematic.

Actually, you should teach the machine that people are to be treated more carefully, because they are very different from chairs and perhaps also from cats. But we only see that now, after the Google algorithm made this stupid 'mistake'. Of course, once that is fixed, other errors will show up that we cannot predict. No matter what we do, machine learning will always surprise us."