Challenge - Defraud the Investors

You've developed a model that predicts the probability a 🏠 house for sale can be flipped for a profit 💸. Your model isn't very good, as indicated by its predictions on historical data.

Your investors want to see these results, but you're afraid to share them. You devise the following algorithm to make your predictions look better without looking artificial.

Step 1: 
  Choose 5 random indexes (without replacement)

Step 2:
  Perfectly reorder the prediction scores at these indexes
  to maximize the accuracy of these 5 predictions, i.e. give the
  lowest scores to the False truths and the highest to the True truths
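The two steps can be sketched as a small NumPy function. The name `finagle` and its `n_picks` and `rng` parameters are hypothetical, chosen only for illustration; this is one way to do the reordering, not the only one. The trick is to sort the picked scores and, separately, sort the picked indexes by their truth (False before True), then pair the two up. Because the lowest scores always land on the False truths, the result is as accurate as any reordering could be, whatever decision threshold is applied later.

```python
import numpy as np


def finagle(scores, truths, n_picks=5, rng=None):
    """Hypothetical helper: return a copy of `scores` in which `n_picks`
    randomly chosen entries are reordered to maximize their accuracy."""
    rng = np.random.default_rng() if rng is None else rng
    scores = np.array(scores, dtype=float)  # np.array copies, so the input is untouched
    truths = np.asarray(truths, dtype=bool)

    # Step 1: choose n_picks random indexes without replacement
    picked = rng.choice(scores.size, size=n_picks, replace=False)

    # Step 2: sort the picked scores ascending, and order the picked indexes
    # so that indexes whose truth is False come first (False sorts before True)
    sorted_scores = np.sort(scores[picked])
    picked_by_truth = picked[np.argsort(truths[picked], kind="stable")]

    # Pair them up: lowest scores -> False truths, highest scores -> True truths
    scores[picked_by_truth] = sorted_scores
    return scores


old = [0.3, 0.8, 0.2, 0.6, 0.3]
truths = [True, False, True, False, True]
new = finagle(old, truths, n_picks=3, rng=np.random.default_rng(0))
print(new)
```

Only the picked positions can change, and their scores are merely permuted, so the overall distribution of scores stays the same. That is what keeps the doctored predictions from looking artificial.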

For example

If you had these prediction scores and truths

indexes: [   0,     1,    2,     3,    4]
scores:  [ 0.3,   0.8,  0.2,   0.6,  0.3]
truths:  [True, False, True, False, True]

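and you randomly selected indexes 1, 2, and 4, you would reorder their scores so the lowest one (0.2) goes to the False truth at index 1 and the other two (0.3 and 0.8) go to the True truths: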

indexes:    [   0,     1,    2,     3,    4]
old_scores: [ 0.3,   0.8,  0.2,   0.6,  0.3]
new_scores: [ 0.3,   0.2,  0.3,   0.6,  0.8]
truths:     [True, False, True, False, True]
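The example can be reproduced with NumPy in a few lines. This sketch hard-codes the picked indexes 1, 2, and 4 instead of drawing them at random. A stable `np.argsort` on the picked truths puts the False index first and keeps the two True indexes in their original order, which is why 0.3 lands on index 2 and 0.8 on index 4; swapping those two would be just as accurate.

```python
import numpy as np

scores = np.array([0.3, 0.8, 0.2, 0.6, 0.3])
truths = np.array([True, False, True, False, True])
picked = np.array([1, 2, 4])  # the indexes "randomly" selected in the example

# Sort the picked scores: [0.2, 0.3, 0.8]
sorted_scores = np.sort(scores[picked])
# Order the picked indexes False-first: truths[picked] is [False, True, True]
picked_by_truth = picked[np.argsort(truths[picked], kind="stable")]

new_scores = scores.copy()
new_scores[picked_by_truth] = sorted_scores
print(new_scores)  # [0.3 0.2 0.3 0.6 0.8]
```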

This boosts your accuracy rate from 0% to 40% (counting a score of at least 0.5 as a True prediction).
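The accuracy figures assume, as the notebook's `accuracy_rate` function does, that a score of at least 0.5 counts as a True prediction. A minimal sketch of that calculation on the example data (the `threshold` parameter is an addition for illustration):

```python
import numpy as np


def accuracy_rate(scores, truths, threshold=0.5):
    # A score >= threshold counts as a True prediction
    return np.mean((np.asarray(scores) >= threshold) == np.asarray(truths))


truths = [True, False, True, False, True]
old_scores = [0.3, 0.8, 0.2, 0.6, 0.3]
new_scores = [0.3, 0.2, 0.3, 0.6, 0.8]

print(accuracy_rate(old_scores, truths))  # 0.0
print(accuracy_rate(new_scores, truths))  # 0.4
```

Only indexes 1 and 4 flip from wrong to right; index 2 stays wrong because its new score, 0.3, is still below 0.5.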

In [22]:
import numpy as np

rng = np.random.default_rng(123)
targets = rng.uniform(low=0, high=1, size=20) >= 0.6
preds = np.round(rng.uniform(low=0, high=1, size=20), 2)

print(targets)
print(preds)
# [ True False False ... False True False]
# [ 0.23  0.17  0.50 ...  0.87 0.30  0.53]
[ True False False False False  True  True False  True  True False False
  True False  True  True  True False  True False]
[0.23 0.17 0.5  0.58 0.18 0.01 0.47 0.73 0.92 0.63 0.92 0.86 0.22 0.87
 0.73 0.28 0.8  0.87 0.3  0.53]
In [58]:
def accuracy_rate(preds, targets):
    # Fraction of predictions that match the targets at a 0.5 threshold
    return np.mean((preds >= 0.5) == targets)

# Accuracy before finagling
accuracy_rate(preds, targets)  # 0.3
Out[58]:
0.3
In [59]:
# Step 1: choose 5 random indices without replacement
indices = rng.choice(preds.size, size=5, replace=False)

# Step 2: sort the scores at those indices, and order the indices themselves
# so that those with a False target come before those with a True target
sorted_scores = np.sort(preds[indices])
indices_by_target = indices[np.argsort(targets[indices], kind="stable")]

# Lowest scores go to False targets, highest scores go to True targets
preds[indices_by_target] = sorted_scores

# Accuracy after finagling
accuracy_rate(preds, targets)