Usually, we want tools to be easy to use, but that's not always possible; the most powerful ones are typically quite difficult to master, or at least fragile. In other words, you have to be careful what you wish for.

The phrase gains new meaning when applied to machine learning algorithms. As we mentioned in the previous part, A.I. surrounds us, and it is very unlikely that you haven't encountered it on your way to this day – using one of the many AI-fueled solutions available.

Some of them are fully autonomous – we are not even aware of the smart algorithms working behind them.

Others leave you a margin of autonomy. AI-supported advertising is one such tool. You don't have to be an A.I. specialist to deal with it: it works autonomously, but to benefit from it, you have to think carefully about what exactly you wish for – as we discussed in the previous part.

The task, however, was quite simple there, because all you had to do was pick something “from the list”, without any in-depth interpretation – thanks to well-designed UX/UI.

There is a chance (growing every month and year) that you will have to face “raw” A.I., or cooperate in introducing it into your business.

## Decision making

We can measure the effectiveness of a model in several ways. By default, we use the so-called F-score, about which I will say a bit more in a moment.

Two complementary approaches are reflected in two statistics, which we use to measure the performance of decision-making systems:

- Precision
- Recall

By juxtaposing them, it is customary to calculate the so-called F-score, which is an overall measure of how “effective” our algorithm is.

## The art of tradeoff

The equation itself is not complicated.

\[ F_1 = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} \]

Even if you don’t yet know what precision and recall are, calculating the F1 value comes down to multiplying these two values, dividing the result by their sum, and multiplying it all by two.
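This calculation can be sketched in a few lines of Python (the function name `f1_score` is our own here, not a library call):

```python
def f1_score(precision: float, recall: float) -> float:
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Swapping precision and recall gives exactly the same score:
print(round(f1_score(1.0, 0.5), 2))  # 0.67
print(round(f1_score(0.5, 1.0), 2))  # 0.67
```

The symmetry shown in the two calls is exactly why the F1 value alone cannot tell you which of the two statistics is low.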

Let’s assume that our model has the task of selecting clients worthy of being offered a discount.

At this point, it is worth introducing other essential ML terms – the types of data.

- Training data is the data we use to train the model.
- Evaluation data is used to evaluate its efficiency as training goes on.
- Test data is used to check how well your model is finally doing (this is where F1 comes in).
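A minimal sketch of such a three-way split, assuming a common (but arbitrary) 70/15/15 ratio:

```python
import random

def three_way_split(data, train_frac=0.7, eval_frac=0.15, seed=42):
    """Shuffle a dataset and split it into training, evaluation and test parts."""
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    n_train = int(len(shuffled) * train_frac)
    n_eval = int(len(shuffled) * eval_frac)
    train = shuffled[:n_train]
    evaluation = shuffled[n_train:n_train + n_eval]
    test = shuffled[n_train + n_eval:]
    return train, evaluation, test

train, evaluation, test = three_way_split(list(range(100)))
print(len(train), len(evaluation), len(test))  # 70 15 15
```

The key point is that the three parts never overlap – the “final exam” must contain questions the model has not seen before.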

Comparing it to school: training is like solving problems together with the class on the blackboard, evaluation data is your homework, and test data is the final exam. Let’s go back to our clients.

Never mind the size of the training set; to keep the math simple, let’s assume your evaluation set consists of 20 customers (in practice, many more), divided worthy/not worthy 50/50. We have already offered all of them the discount, so we know whether it worked for them and can compare the predictions with reality.

We design the network, fire up the learning, then test.

F1 = 0.66. What does it tell us? Well, our model is mediocre. But to learn more, it is worth taking a closer look at precision and recall. Their values are, respectively, 1 and 0.5. At first glance, you might assume it makes no difference which one is which: multiplication and addition are commutative, so the algorithm’s overall score is not affected by which of these values (precision or recall) equals 1 and which equals 0.5. In practice, however, the two algorithms behave completely differently. One version returns five clients classified as worthy; the other, twenty.

## I would like to but…

So let’s look at the first value, precision. This situation is illustrated in the following figure:

Indeed, the algorithm selected only worthy clients, but it also missed 5 of them. That’s “fine”, because we haven’t offered a discount to the wrong customers (who are going to leave us anyway).

Precision is the ratio of correct choices to the number of all choices made, both correct and incorrect. More formally:

\[ \text{Precision} = \frac{TP}{TP + FP} = \frac{5}{5} = 1 \]

TP – True Positives – classified correctly

FP – False Positives – classified incorrectly

TN – True Negatives – correctly left unclassified

FN – False Negatives – not classified, though they should have been

If we wanted to personify this variant of the algorithm, we would say that it is “conservative”, somewhat fearful. In its “fear of making a mistake”, it prefers not to make decisions it is unsure about, and chooses only the obvious cases.

At the same time, in its “timidity”, it found only half of the worthy clients (recall = 0.5).

\[ \text{Recall} = \frac{TP}{TP + FN} = \frac{5}{10} = 0.5 \]

## Carpe Diem, baby

Let us now take the second version of our model – the previous one selected only half of the worthy clients, so let’s try to improve the performance of our network a bit. We make the corrections and rerun the test.

Again we reach F1 = 0.66, but this time our system picks all 20 clients! In fact, 10 of them left us anyway, but all ten worthy discounts worked!
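With precision and recall swapped (0.5 and 1), the counts on our 20-customer evaluation set work out as follows:

```python
# "Eager" model: it offers the discount to everyone, so it finds every
# worthy client (recall = 1) but wastes half of the offers (precision = 0.5).
tp, fp, fn = 10, 10, 0
p = tp / (tp + fp)        # 0.5
r = tp / (tp + fn)        # 1.0
f1 = 2 * p * r / (p + r)
print(round(f1, 2))       # 0.67
```

The overall score is the same as before, but the business behavior – five careful picks versus a blanket offer – is completely different.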

As I mentioned, F1 = 0.66 is a rather average score. But this is just a theoretical example. In practice, classifiers we can call “working” achieve much higher F1 values. Let us assume that we have improved both versions, and both now achieve F1 = 0.95 ± 0.005.

Which version would you choose?

- 90 out of 100 customers worthy of the discount were found, and none of them was lost afterwards (F1 = 0.947)
- You addressed all the worthy clients, but 10 others left you despite receiving the discount (F1 = 0.952)

In theory, the second one is slightly better, so intuition suggests choosing it. However, is the slightly higher score worth the income lost on discounts given to clients who left anyway?
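Both F1 values above can be checked from the raw counts, assuming 100 worthy clients and reading the 10 wasted discounts in the second version as false positives:

```python
def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """F1 computed directly from confusion-matrix counts."""
    p = tp / (tp + fp)
    r = tp / (tp + fn)
    return 2 * p * r / (p + r)

# Version 1: 90 of 100 worthy clients found, no wasted discounts.
print(round(f1_from_counts(tp=90, fp=0, fn=10), 3))   # 0.947

# Version 2: all 100 worthy clients addressed, plus 10 wasted discounts.
print(round(f1_from_counts(tp=100, fp=10, fn=0), 3))  # 0.952
```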

As you can see, applying A.I. is not always about choosing the most efficient model.

Other examples, not necessarily related to eCommerce, may help to understand why this matters:

## Cancer Detection

A.I. is very useful, among other things, for medical diagnosis. Computer vision systems can detect even slight neoplastic lesions on C.T. (Computed Tomography) or X-ray images.

Such algorithms should work flawlessly; however, as we know, that is practically impossible. We need to make a precision-recall tradeoff, and in this case, recall is far more critical.

It’s better to detect a lesion that ultimately turns out to be just visual noise or something harmless than to miss a serious one.

**On the other hand…**

A system that detects and fights weeds on food plantations should be pickier: it’s better to leave a couple of intruders alone than to destroy precious plants.

edrone

CRM, Marketing Automation and Voice Commerce for online stores. All in one.