Risk vs. Loss

A risk is defined as the probability that an undesirable event takes place. Since most risks are not totally random but depend on a range of influences, we try to quantify a risk function that gives the probability of the event for each set of influences. We then calculate the expected loss by multiplying the cost caused by the occurrence of the event with the risk, i.e. its probability.
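
As a minimal sketch in R (all numbers invented): suppose the risk function gives the probability of a credit default as a function of the borrower's monthly income. The expected loss is then just that probability times the cost of a default.

    # Hypothetical risk function: probability of default given monthly income
    risk <- function(income) pmin(1, 2000 / income)

    cost_of_default <- 10000   # invented cost if the default occurs

    # Expected loss = probability of the event times its cost
    expected_loss <- function(income) risk(income) * cost_of_default

    expected_loss(c(2500, 5000, 10000))
    # 8000 4000 2000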

Often, the influences can be changed by our actions – we might have a choice. So it makes sense to look for the course of action that minimizes the loss function, i.e. leads to as little expected damage as possible.

Algorithms that run in many procedures and on many devices make decisions all the time. Prominent examples are credit scoring and shop recommendation systems. In both cases it is clear that the algorithm should be designed to optimize the economic outcome of its decisions. And in both cases, two risks emerge: the risk of a false negative (wrongly granting credit to someone who cannot pay it back, or making a recommendation that does not fit the customer's preferences), and the risk of a false positive (not granting credit to a person who would have been creditworthy, or not offering something that would have been exactly what the customer was looking for).

There is, however, an asymmetry in the losses of these two risks. In the vast majority of cases, it is far easier to calculate the loss for a false negative than for a false positive. The cost of a credit default is straightforward. The cost of someone not getting the money, however, is almost certainly bigger than just the missed interest; the potential borrower might very well go away and never come back, without us ever realizing it.

Even worse: while calculating risk is (more or less) just maths and statistics, different people might not even agree on the losses. In our credit scoring example, one might say: let's just count what we know for sure, i.e. the opportunity cost of missed interest; another might insist on evaluating a broader range of damages. Where to draw the line is obviously arbitrary. So while the risk function can be made somewhat objective, the loss function is much trickier and most of the time prone to doubt and discussion.
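
A hypothetical sketch of how this plays out (all numbers invented): with the very same risk estimate, the narrow and the broad loss valuation lead to opposite decisions.

    p_default  <- 0.2       # estimated risk of default
    credit_sum <- 10000
    interest   <- 800       # profit if the credit is paid back

    # Expected loss of granting the credit
    loss_grant <- p_default * credit_sum - (1 - p_default) * interest

    # Narrow valuation of denying it: only the missed interest
    loss_deny_narrow <- (1 - p_default) * interest

    # Broad valuation: the rejected customer may also never come back
    lifetime_value  <- 3000
    loss_deny_broad <- (1 - p_default) * (interest + lifetime_value)

    c(grant = loss_grant, narrow = loss_deny_narrow, broad = loss_deny_broad)
    # grant  narrow   broad
    #  1360     640    3040   -> deny under the narrow valuation,
    #                            grant under the broad one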

Collision decision

In the IoT – the world of connected devices, of programmable objects – the problem of risks and losses becomes vital. Self-driving cars will cause accidents too, even if they are much safer than human drivers. If a collision is inevitable, how should the car react? This was the key question asked by Majken Sander in our talk on algorithm ethics at Strata+Hadoop World. If it is just me in the car, a possible manoeuvre would be to turn the car sideways. If, however, my children sit next to me, I might very well prefer a frontal crash and rather have myself injured than my passengers. Whatever I would see as the right way to act, it is clear that I want to make that decision myself. I would not want it decided remotely without my even knowing on what grounds.

Sometimes people mention that even for human casualties a monetary calculation could be done – no matter how cruel that might sound. We could, for example, value people according to their life expectancy, insurance cost, or some other financial indicator. However, this is clearly not how we usually deal with lethal risks. "No man left behind" – how could we explain Saving-Private-Ryan-ish campaigns on economic grounds? Since in the values of our society a human casualty is regarded as a total, incommensurable loss (even if a compensation can be defined), we get a singularity in our loss function. Our metric just doesn't work here. Hence there will be no just algorithm to deal with a decision of that dimension.

Calculate risks, let losses be open

We will nevertheless have to find a solution. One suggestion for the car example is that in risky situations, the car re-delegates the driving back to a human and lets them decide.
This can be generalized: since the losses might be valued differently by different people, it should always be well documented and fully transparent to the users how the losses are calculated. In many cases, the loss function could even be kept open: the algorithm could offer different sets of parameters to let users decide on the behavior of the product.
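
A minimal sketch of such an open loss function (parameter names and numbers are hypothetical): the defaults are documented, and users can pass in their own valuation, which may well flip the decision.

    # Documented default loss parameters, overridable by the user
    default_losses <- list(false_negative = 10000, false_positive = 800)

    decide <- function(p_event, losses = default_losses) {
      # Compare the expected losses of acting and of not acting
      if (p_event * losses$false_negative < (1 - p_event) * losses$false_positive)
        "act" else "do not act"
    }

    decide(0.05)                                                     # "act"
    decide(0.05, list(false_negative = 10000, false_positive = 400)) # "do not act"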

As a society, we have to demand to be in charge of defining the ethics behind the algorithms. I am convinced this is a strong case for regulation. It is not an economic task but a political one.

Further reading

Algorithm Ethics

2014 highlight (2): One of the best courses on Big Data and Data Mining

I already mentioned the Hastie & Tibshirani course on statistical learning as one of my personal highlights in data science last year. My second highlight is also an online course, also by leading experts in their field (this time: Big Data and data mining), also based on a (freely available) book, and also taught by Stanford University professors: Jure Leskovec, Anand Rajaraman and Jeff Ullman's course on "Mining Massive Datasets".

If you're interested in data science or data mining, chances are high that you have already been in touch with their book. It can safely be considered a standard work on the fascinating intersection of data mining algorithms, machine learning and Big Data. The seven-week course is the online version of the Stanford courses CS246 and (in an earlier version) CS345A.

The course is very dense and covers a lot of territory from the book, for example:

  • How does MapReduce work and why is it important? (see the toy sketch after this list)
  • How can I retrieve frequently appearing combinations from very large sets of items such as shopping baskets?
  • How to retain information about a data stream that does not fit in memory?
  • What are the most common tasks in supervised machine learning and how to implement them?
  • How do I program an intelligent system for recommending movies?
  • How to compute optimal placements of online advertisements?
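
To give a flavour of the first topic, here is a toy word count in the MapReduce style, simulated in a few lines of R – just an illustration of the map/shuffle/reduce pattern, not something you would actually run on a cluster.

    docs <- c("big data is big", "data mining mines data")

    # Map: emit one key per word occurrence
    words <- unlist(strsplit(docs, " "))

    # Shuffle: group the emitted keys
    groups <- split(words, words)

    # Reduce: aggregate (here: count) the values per key
    sapply(groups, length)
    #    big   data     is  mines mining
    #      2      3      1      1      1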

Some of the lectures are at a beginner to intermediate level, but some cover very advanced topics. What I especially liked about this course is that a lot of the material covered really is state-of-the-art in data mining. Some algorithms – e.g. BIGCLAM community detection and CUR matrix decomposition – had only been developed about a year ago.

So, take a look at the book, and if you haven’t already: enroll at the Coursera course website to make sure you won’t miss the next session of this course.

Mining Research Interests – or: What Would Google Want to Know?

I am a regular visitor of Google's research page, where they post all of their latest and upcoming scientific papers. Lately I have been wondering whether it would be possible to statistically extract some of the meta-information from the papers. Here's the result of an analysis of the papers' titles, produced with just a few lines of R code:

[Figure: Research Topics @ Google – terms clustered from the paper titles]

I clustered the data with a standard hierarchical cluster analysis to find out which terms tend to go together in the paper titles. Then I took a deeper look at the abstracts – of all the papers that had abstracts, that is. I processed the abstracts with the tm R package and drew the following heatmap, which shows how often each of the most important keywords appears in each paper:

[Figure: heatmap of keyword frequencies in the papers' abstracts]
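
For anyone who wants to reproduce something along these lines, here is a minimal sketch with the tm package – the three mini "abstracts" are of course just stand-ins for the scraped texts:

    library(tm)

    # Stand-ins for the scraped paper abstracts
    abstracts <- c("large scale distributed systems",
                   "machine learning for search ranking",
                   "distributed machine learning systems")

    corpus <- VCorpus(VectorSource(abstracts))
    corpus <- tm_map(corpus, content_transformer(tolower))
    corpus <- tm_map(corpus, removePunctuation)
    corpus <- tm_map(corpus, removeWords, stopwords("english"))

    tdm <- as.matrix(TermDocumentMatrix(corpus))

    # Hierarchical clustering of terms by their co-occurrence profiles
    plot(hclust(dist(tdm)))

    # Term frequencies per document as a heatmap
    heatmap(tdm, scale = "none")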

I drew a similar heatmap, this time normalized by the term frequency–inverse document frequency (tf-idf) measure. While the first heatmap shows the most frequently used terms, the weighted heatmap highlights terms that are important in their respective paper but rare in the corpus as a whole:

[Figure: heatmap of tf-idf-weighted keywords in the papers' abstracts]
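
The weighted variant differs only in the weighting function passed to the term-document matrix (continuing the sketch above):

    # Same corpus as above, weighted by tf-idf instead of raw counts
    tdm_tfidf <- TermDocumentMatrix(corpus,
                                    control = list(weighting = weightTfIdf))
    heatmap(as.matrix(tdm_tfidf), scale = "none")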

If you need input for playing buzzword bingo at the next Strata Conference in Santa Clara, you don’t have to look any further 😉

Algorithm Ethics

An algorithm is a structured description of how to calculate things. Some of the most prominent examples of algorithms have been around for more than 2,500 years, like Euclid's algorithm that gives you the greatest common divisor, or Eratosthenes' sieve that gives you all prime numbers up to a given maximum. These two algorithms do not contain any kind of value judgment. If I define a new method for selecting prime numbers – and many such methods have been published! – every algorithm will come to the same solution. A number is prime or not.
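
Euclid's algorithm fits in a few lines of R – and any correct alternative implementation has to return exactly the same number:

    # Euclid's algorithm: greatest common divisor by repeated remainders
    gcd <- function(a, b) {
      while (b != 0) {
        r <- a %% b
        a <- b
        b <- r
      }
      a
    }

    gcd(1071, 462)   # 21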

But there is a different kind of algorithmic process, far more common in our daily life: algorithms that have been chosen to solve some task that others would probably have solved in a different way. Obvious value judgments done by calculation, like credit scoring and rating, immediately come to mind when we think about ethics in the context of calculations. However, there is a multitude of "hidden" ethical algorithms that are far more pervasive.

One example that I encountered was given by Gary Wolf at the Quantified Self Conference in Amsterdam. Wolf told of his experiment of taking different step-counting gadgets and analyzing their differing results. His conclusion: there is no common concept of what is defined as "a step". And he is right. The developers of the different gadgets have arbitrarily chosen one method or another to map the data collected by the gadgets' gyroscopic sensors into the distinct steps to be counted.

So the first value judgment comes with choosing a method.

Many applications we use work on a fixed set of parameters – like the preselection of mobile-optimized CSS when the web server detects what it takes to be a mobile browser. Often we get the choice to switch to the "web mode", but there are still many sites that will not let us change the view unless we trick the server into believing that our browser is a "PC version" and not a mobile one. This is of course a very simple example, but the case should be clear: someone set a parameter without asking for our opinion.

The second way of having to deal with ethics is the setting of parameters.

A good example is given by Kraemer et al. in their paper. In medical imaging technologies like MRI, an image is calculated from data such as tiny electromagnetic distortions. Most doctors (I asked some explicitly) take these images as given (just as they used to take photographs, without bothering much about the underlying technology). However, there are many parameters that the developers of such an algorithmic imaging technology have predefined and that affect the outcome in important ways. Whether a blood vessel is already clogged by arteriosclerosis or can still be regarded as healthy is a typical decision where we would like to be on the safe side and thus tend to underestimate the vessel's volume, i.e. prefer a more blurry image; a surgeon planning her cut, on the other hand, might ask for a very sharp image that tends to overestimate the vessel's volume.

The third value judgment is – as this illustrates – how to deal with uncertainty and misclassification.

These are what we call alpha and beta errors. Most people (especially in a business context) concentrate on the alpha error, that is, on minimizing false positives. But when we take the cost of a misjudgement into account, the false negative is often much more expensive. Employers, for example, tend to look for "the perfect" candidate and turn down applications that raise doubts. In doing so, they obviously miss many opportunities for the best hire. The cost of firing someone who was hired under false expectations is far less than the cost of never getting the chance to learn about someone at all – someone who might have been the hidden gem.

The problem with the two types of errors is that you can't minimize both simultaneously. So we have to make a decision. That decision is always a value judgment, always ethical.
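
A sketch of this trade-off in R (scores and costs invented): moving the decision threshold lowers one error rate only at the expense of the other, and which threshold is "best" depends entirely on the costs we assign to the two errors.

    set.seed(1)
    # Hypothetical scores for unsuitable (0) and suitable (1) candidates
    score <- c(rnorm(100, mean = 0), rnorm(100, mean = 1))
    truth <- rep(c(0, 1), each = 100)

    errors <- function(threshold) {
      c(alpha = mean(score[truth == 0] > threshold),   # false positives
        beta  = mean(score[truth == 1] <= threshold))  # false negatives
    }

    errors(0)   # high alpha, lower beta
    errors(1)   # lower alpha, high beta

    # Which threshold minimizes the expected cost is a value judgment:
    # it depends entirely on the (contested) costs of the two error types
    expected_cost <- function(threshold, cost_alpha = 1, cost_beta = 5) {
      sum(errors(threshold) * c(cost_alpha, cost_beta))
    }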

With drones prepared to make autonomous kill decisions, this discussion becomes existential.

All three judgments – what method? what parameters? how to deal with misclassification? – are more often than not made implicitly. For many applications, the only way to understand these presumptions is to "open the black box" – hence, to hack.

Given all that, I would like to demand three points of action:
– to the developers: keep as many options open as possible and give others a chance to change the presets (and customers: you must insist on this when you commission applications);
– to the educational systems: teach people to hack, to become curious about seeing behind things;
– to our legislative bodies: make hacking things legal. Don't let copyright, DRM and the like be used against people who re-engineer things. Only what gets hacked gets tested. Let us have sovereignty over the things we have to deal with; let us shape our surroundings according to our ethics.

Notes

My slides on this topic:

At the last re:publica conference I gave a talk and hosted a discussion on "Algorithm Ethics" that was recorded (in German):

Algorithmic Glass Bead Games – Why predicting Twitter trends will not change the world

Over the last few hours, I've seen a lot of tweets mentioning this great new algorithm by MIT professor Devavrat Shah. UK Wired, The Verge, Gigaom, The Atlantic Wire and Forbes have all posted stories on this fantastic discovery. And that was only the weekend. Starting next week, there will be a lot more articles celebrating this breakthrough in machine learning.

At first, I was very enthusiastic as well and tweeted the MIT press release. A new algorithm – great stuff! But then, slowly, I began to think about the whole thing. This new algorithm claims to predict trending topics on Twitter. But that is very different from an algorithm predicting, say, the outcome of presidential elections or other external events. Trending topics are nothing more than the result of an algorithm themselves:

Trends are determined by an algorithm and are tailored for you based on who you follow and your location. This algorithm identifies topics that are immediately popular, rather than topics that have been popular for a while or on a daily basis, to help you discover the hottest emerging topics of discussion on Twitter that matter most to you.

So what Shah et al. developed is an algorithm that predicts the outcome of another algorithm. A lot of the coverage suggests that this new algorithm could be very useful for Twitter – because then they would not have to wait for the results of their own trend-defining algorithm, but could use the brand-new algorithm that delivers the results 1.5 hours in advance:

The algorithm could be of great interest to Twitter, which could charge a premium for ads linked to popular topics.

What's next? A Stanford professor who develops an algorithm that can predict the outcome of the Shah algorithm some 1.5 hours in advance? Or what about Google? Maybe someone will invent an algorithm predicting the PageRank of web pages? Oh wait, something like this has already been invented. Maybe you know it better under its acronym "SEO" – "Search Engine Optimization".