
Learning to rank is good for your ML career — Part 2: let’s implement ListNet!

Justin
Published in THE ICONIC Tech
17 min read · Jun 6, 2020


Photo by Martin Sanchez on Unsplash

Cao, Zhe et al. “Learning to rank: from pairwise approach to listwise approach.” ICML ’07 (2007)

The setup

Let’s start at the end and break it down

We employ a new learning method for optimizing the listwise loss function based on top one probability, with Neural Network as model and Gradient Descent as optimization algorithm. We refer to the method as ListNet.

What do they mean by listwise?

What is this top one probability they speak of?

What is the listwise loss function?

What is the neural network architecture?

What’s a ‘listwise approach’ to learning to rank?

The listwise approach addresses the ranking problem in a more straightforward way. Specifically, it takes ranking lists as instances in both learning and prediction. The group structure of ranking is maintained and ranking evaluation measures can be more directly incorporated into the loss functions in learning.

Where do probabilities fit into ListNet?

We assume that there is uncertainty in the prediction of ranking lists (permutations) using the ranking function. In other words, any permutation is assumed to be possible, but different permutations may have different likelihood calculated based on the ranking function. We define the permutation probability, so that it has desirable properties for representing the likelihood of a permutation (ranking list), given the ranking function.

Warning: detail ahead!

Permutation probability
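
The paper defines the probability of a permutation of our objects, given their scores, as a product over positions: at each position we take the exponential of the score of the object placed there, divided by the sum of the exponentials of the scores of that object and every object placed after it. Here is a minimal sketch that walks through this for three objects; drawing the scores with np.random.seed(1) is an assumption on my part so that the numbers line up with the output below.

```python
import itertools
import numpy as np

objects = ('dress', 'shirt', 'pants')

# Every possible ordering of our three objects.
for permutation in itertools.permutations(sorted(objects)):
    print(permutation)

# Assumed: scores drawn from a standard normal with seed 1 to match the printed values.
np.random.seed(1)
scores = dict(zip(objects, np.random.randn(len(objects))))
print(scores)

def permutation_probability(permutation, scores):
    """P(pi) = product over positions j of exp(s_pi(j)) / sum over k >= j of exp(s_pi(k))."""
    exp_scores = [np.exp(scores[obj]) for obj in permutation]
    probability = 1.0
    for j, obj in enumerate(permutation):
        print(f"object at position {j + 1} is '{obj}'")
        probability *= exp_scores[j] / sum(exp_scores[j:])
    return probability

print(permutation_probability(('dress', 'shirt', 'pants'), scores))
```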

('dress', 'pants', 'shirt')
('dress', 'shirt', 'pants')
('pants', 'dress', 'shirt')
('pants', 'shirt', 'dress')
('shirt', 'dress', 'pants')
('shirt', 'pants', 'dress')
{'dress': 1.6243453636632417, 'shirt': -0.6117564136500754, 'pants': -0.5281717522634557}
('dress', 'shirt', 'pants')
object at position 1 is 'dress'
object at position 2 is 'shirt'
object at position 3 is 'pants'
first term is 0.8176176084739423
second term is 0.47911599189971854
probability of permutation is 0.39173367147866855

The permutation that orders the objects by score in descending order has the highest permutation probability, and the permutation that orders them by score in ascending order has the lowest.

What’s the issue with calculating permutation probability?

With n objects to rank there are n! possible permutations, so computing the probability of every single permutation quickly becomes intractable as our lists grow.

Top one probability
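
Luckily, the top one probability of an object (the probability that it ends up ranked first over all permutations) collapses to the softmax of the scores, so we never need to enumerate permutations. Here is a minimal sketch, reusing the scores from the previous section, that reproduces the value for 'shirt' shown below.

```python
import numpy as np

# Scores from the previous section, in the order (dress, shirt, pants).
scores = np.array([1.6243453636632417, -0.6117564136500754, -0.5281717522634557])

# Top one probability is just the softmax of the scores.
top_one = np.exp(scores) / np.exp(scores).sum()
print(top_one[1])  # probability that 'shirt' is ranked first
```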

0.08738232042105001

Converting scores and relevance labels into probability distributions
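
To compare our model's scores against the ground truth, both are pushed through a softmax so that each becomes a top one probability distribution over the same items. A sketch of how that might look in TensorFlow; the relevance labels [3, 1, 0] are an assumption chosen to match the second tensor below.

```python
import tensorflow as tf

scores = tf.constant([1.6243454, -0.6117564, -0.52817175])
relevance_labels = tf.constant([3.0, 1.0, 0.0])  # assumed labels for the toy example

print(tf.nn.softmax(scores))            # predicted top one probabilities
print(tf.nn.softmax(relevance_labels))  # target top one probabilities
```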

tf.Tensor([0.8176176  0.08738231 0.09500005], shape=(3,), dtype=float32)
tf.Tensor([0.8437947  0.11419519 0.04201007], shape=(3,), dtype=float32)

Our loss function — KL divergence

Future work includes exploring the performance of other objective function besides cross entropy and the performance of other ranking model instead of linear Neural Network model.

If we have two separate probability distributions P(X) and Q(X) over the same random variable X, we can measure how different these two distributions are using the Kullback-Leibler (KL) divergence.

The KL divergence is 0 if and only if P and Q are the same distribution in the case of discrete variables.
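
Here is a minimal sketch of how the KL divergence between our target and predicted distributions could be computed with tf.keras.losses.KLDivergence; it should give a value close to the one below.

```python
import tensorflow as tf

# Target distribution: softmaxed relevance labels from the previous section.
target = tf.nn.softmax(tf.constant([3.0, 1.0, 0.0]))
# Predicted distribution: softmaxed model scores.
predicted = tf.nn.softmax(tf.constant([1.6243454, -0.6117564, -0.52817175]))

# KL(target || predicted); kl(target, target) would give 0.0.
kl = tf.keras.losses.KLDivergence()
print(kl(target, predicted))
```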

<tf.Tensor: shape=(), dtype=float32, numpy=0.022873338>

The logarithm of one is zero, so it follows that the KL divergence is zero when both distributions are identical.

<tf.Tensor: shape=(), dtype=float32, numpy=0.0>

What’s our neural network architecture?

Our inputs

“dog” and “what is a dog?”

The number of words in our queries and documents can vary. It follows that the number of word embeddings that make up the queries and documents can vary.
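
One way to set this up (a sketch of the general idea, not necessarily the article's exact code) is to build a vocabulary that maps each distinct word to an integer index, create an embedding matrix with one row per word, and represent each query or document as the variable-length sequence of its words' embedding rows. The 35-word vocabulary and 2-dimensional embeddings printed below come from the article's toy data.

```python
import numpy as np

# Toy corpus: a query and a handful of documents (illustrative only).
texts = ["dog", "what is a dog", "dog wikipedia", "history of dogs in australia"]

# Build a word -> index vocabulary across all texts.
vocab = {}
for text in texts:
    for word in text.lower().split():
        vocab.setdefault(word, len(vocab))

# One embedding row per word; 2 dimensions keeps the toy example easy to inspect.
embedding_dim = 2
embedding_matrix = np.random.randn(len(vocab), embedding_dim)

# A document becomes a variable-length sequence of embedding rows.
doc_indices = [vocab[w] for w in "what is a dog".split()]
doc_embeddings = embedding_matrix[doc_indices]   # shape (4, 2)
print(doc_embeddings.shape)
```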

35
index 1 - dog
index 2 - wikipedia
index 3 - a
index 4 - australia
index 5 - history
...
...
...
index 30 - facts
index 31 - about
index 32 - dk
index 33 - find
index 34 - out
[[-1.0729686   0.86540765]
[-2.3015387 1.7448118 ]
[-0.7612069 0.3190391 ]
[-0.24937038 1.4621079 ]
[-2.0601406 -0.3224172 ]
...
...
...
[-0.29809284 0.48851815]
[-0.07557172 1.1316293 ]
[ 1.5198169 2.1855755 ]
[-1.3964963 -1.4441139 ]
[-0.5044659 0.16003707]]
[[[-2.3015387  1.7448118]]]
[[[-0.93576944 -0.26788807]
[ 0.53035545 -0.69166076]
[-0.24937038 1.4621079 ]
[-2.3015387 1.7448118 ]]]

We can aggregate our embedding vectors!
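
Since queries and documents contain varying numbers of words, one simple way to get a fixed-size representation is to average the word embeddings. A sketch with TensorFlow, assuming query_word_embeddings holds the four word embeddings for "what is a dog" with shape (1, 4, 2); averaging over the word axis should yield the single vector printed below.

```python
import tensorflow as tf

# Assumed: the four word embeddings for "what is a dog", shape (1, 4, 2).
query_word_embeddings = tf.constant([[
    [-0.93576944, -0.26788807],
    [ 0.53035545, -0.69166076],
    [-0.24937038,  1.4621079 ],
    [-2.3015387,   1.7448118 ],
]])

# Average over the word axis to get one fixed-size vector per query.
query_embedding = tf.reduce_mean(query_word_embeddings, axis=1, keepdims=True)
print(query_embedding)  # shape (1, 1, 2)
```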

[[[-0.7390808  0.5618427]]]
(2, 1, 2)
[[-2.3015387  1.7448118]
[-0.7612069 0.3190391]]

[[-0.39675352 -0.6871727 ]
[-0.24937038 1.4621079 ]
[-2.3015387 1.7448118 ]
[-0.84520566 -0.6712461 ]
[-0.0126646 -1.1173104 ]
[ 0.2344157 1.6598022 ]
[-2.0601406 -0.3224172 ]]

[[-2.3015387 1.7448118 ]
[-0.38405436 1.1337694 ]
[-1.0998913 -0.1724282 ]
[-0.8778584 0.04221375]
[ 0.58281523 -1.1006192 ]
[ 1.1447237 0.9015907 ]]

[[ 0.74204415 -0.19183555]
[-0.887629 -0.7471583 ]
[ 1.6924546 0.05080776]
[ 0.50249434 0.90085596]
[-0.6369957 0.19091548]
[ 2.1002553 0.12015896]
[-2.0601406 -0.3224172 ]
[-0.68372786 -0.12289023]]

[[-2.3015387 1.7448118 ]
[ 0.6172031 0.30017033]]
[[-2.3015387  1.7448118]
[-0.7612069 0.3190391]]

[[-2.3015387 1.7448118 ]
[-0.35224986 -1.1425182 ]
[-0.34934273 -0.20889424]
[-0.7612069 0.3190391 ]
[ 0.5866232 0.8389834 ]
[-0.68372786 -0.12289023]
[ 0.9311021 0.2855873 ]]

[[-2.3015387 1.7448118]
[ 0.8851412 -0.7543979]
[ 1.2528682 0.5129298]]

[[-2.3015387 1.7448118 ]
[-0.38405436 1.1337694 ]
[-1.0998913 -0.1724282 ]
[-0.8778584 0.04221375]
[ 0.58281523 -1.1006192 ]
[ 1.1447237 0.9015907 ]]

[[-0.93576944 -0.26788807]
[ 0.53035545 -0.69166076]
[-0.24937038 1.4621079 ]
[-2.3015387 1.7448118 ]
[-0.29809284 0.48851815]
[-0.07557172 1.1316293 ]
[ 0.50249434 0.90085596]
[ 1.5198169 2.1855755 ]
[-1.3964963 -1.4441139 ]
[-0.5044659 0.16003707]]
[[[-1.5313728   1.0319254 ]
[-0.80446535 0.29551077]
[-0.4893006 0.42488968]
[ 0.09609441 -0.01519538]
[-0.8421678 1.0224911 ]]
[[-1.5313728 1.0319254 ]
[-0.41862014 0.24487413]
[-0.0545098 0.50111455]
[-0.4893006 0.42488968]
[-0.32086387 0.56698734]]]
(2, 5, 2)

Showing documents in the context of other documents and a query
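
To score a document, the network needs to see it alongside its query. One way to do this (sketched below with assumed shapes) is to repeat each query's aggregated embedding once per document and concatenate it onto each aggregated document embedding, giving one feature vector per (query, document) pair.

```python
import tensorflow as tf

# Assumed shapes from the toy example:
# query_embeddings:    (2, 1, 2) - one aggregated embedding per query
# document_embeddings: (2, 5, 2) - five aggregated document embeddings per query
query_embeddings = tf.constant([[[-2.3015387, 1.7448118]],
                                [[-0.7390808, 0.5618427]]])
document_embeddings = tf.random.normal((2, 5, 2))

# Repeat each query embedding once per document...
tiled_queries = tf.tile(query_embeddings, [1, 5, 1])   # (2, 5, 2)

# ...then place the query next to every document it needs to rank.
query_document_pairs = tf.concat([tiled_queries, document_embeddings], axis=-1)
print(query_document_pairs.shape)  # (2, 5, 4)
```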

array([[[-2.3015387,  1.7448118],
[-2.3015387, 1.7448118],
[-2.3015387, 1.7448118],
[-2.3015387, 1.7448118],
[-2.3015387, 1.7448118]],

[[-0.7390808, 0.5618427],
[-0.7390808, 0.5618427],
[-0.7390808, 0.5618427],
[-0.7390808, 0.5618427],
[-0.7390808, 0.5618427]]], dtype=float32)
[[[-2.3015387   1.7448118  -1.5313728   1.0319254 ]
[-2.3015387 1.7448118 -0.80446535 0.29551077]
[-2.3015387 1.7448118 -0.4893006 0.42488968]
[-2.3015387 1.7448118 0.09609441 -0.01519538]
[-2.3015387 1.7448118 -0.8421678 1.0224911 ]]

[[-0.7390808 0.5618427 -1.5313728 1.0319254 ]
[-0.7390808 0.5618427 -0.41862014 0.24487413]
[-0.7390808 0.5618427 -0.0545098 0.50111455]
[-0.7390808 0.5618427 -0.4893006 0.42488968]
[-0.7390808 0.5618427 -0.32086387 0.56698734]]]

The hidden layers
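
The concatenated (query, document) features are passed through a hidden Dense layer with a ReLU activation. Because a Dense layer operates on the last axis, the same weights are applied to every document of every query. A rough sketch, assuming three hidden units to match the shape of the output below:

```python
import tensorflow as tf

# (2 queries, 5 documents, 4 features) from the concatenation step above.
query_document_pairs = tf.random.normal((2, 5, 4))

# A Dense layer acts on the last axis, so the same weights score every document.
hidden_layer = tf.keras.layers.Dense(units=3, activation="relu")
hidden_activations = hidden_layer(query_document_pairs)
print(hidden_activations.shape)  # (2, 5, 3)
```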

tf.Tensor(
[[[0.96246356 0. 2.3214347 ]
[0.5498358 0. 2.0962873 ]
[0.4715745 0. 2.1984253 ]
[0.17358822 0. 2.0852127 ]
[0.72574073 0. 2.414626 ]]

[[0.8194035 0. 0.91152126]
[0.26407483 0. 0.7183531 ]
[0.197609 0. 0.88388896]
[0.3285144 0. 0.7885119 ]
[0.30305254 0. 0.87557834]]], shape=(2, 5, 3), dtype=float32)

The output layer — our scores!
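
The output layer is a single linear unit that collapses each document's hidden representation into one unbounded relevance score. Something like:

```python
import tensorflow as tf

# Hidden activations of shape (2, 5, 3) from the previous step.
hidden_activations = tf.random.normal((2, 5, 3))

# A single linear unit collapses each document's representation into one score.
output_layer = tf.keras.layers.Dense(units=1, activation=None)
scores = output_layer(hidden_activations)
print(scores.shape)  # (2, 5, 1)
```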

tf.Tensor(
[[[-0.51760715]
[-0.18927467]
[-0.10698503]
[ 0.13695028]
[-0.29851556]]

[[-0.58782816]
[-0.13076714]
[-0.04999146]
[-0.1772059 ]
[-0.14299354]]], shape=(2, 5, 1), dtype=float32)

Calculate KL divergence in the context of our expanded batch
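
Putting the loss together for a whole batch: squeeze the scores from (2, 5, 1) down to (2, 5), turn both the scores and the relevance labels into top one probability distributions along the document axis, take the KL divergence per query, and average over queries. A sketch of one way to do this; the relevance labels here are assumptions chosen to match the target distribution printed below, while the scores are random placeholders.

```python
import tensorflow as tf

# Raw scores of shape (2, 5, 1) from the output layer, and assumed relevance labels.
scores = tf.random.normal((2, 5, 1))
relevance_labels = tf.constant([[2.0, 1.0, 1.0, 1.0, 0.0],
                                [3.0, 3.0, 1.0, 1.0, 0.0]])

# Squeeze to (2, 5) and convert both to top one probability distributions.
predicted = tf.nn.softmax(tf.squeeze(scores, axis=-1), axis=-1)
target = tf.nn.softmax(relevance_labels, axis=-1)

# Per-query KL divergence, then the mean over the two queries.
kl = tf.keras.losses.KLDivergence(reduction=tf.keras.losses.Reduction.NONE)
per_query_loss = kl(target, predicted)      # shape (2,)
batch_loss = tf.reduce_mean(per_query_loss)
print(per_query_loss)
print(batch_loss)
```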

tf.Tensor(
[[0.14152995 0.19653566 0.21339257 0.27234477 0.17619705]
[0.1358749 0.21460423 0.23265839 0.20486614 0.21199636]], shape=(2, 5), dtype=float32)
tf.Tensor(
[[0.44663328 0.1643072 0.1643072 0.1643072 0.06044524]
[0.4309495 0.4309495 0.05832267 0.05832267 0.02145571]], shape=(2, 5), dtype=float32)
tf.Tensor(0.4439875, shape=(), dtype=float32)
tf.Tensor([0.29320744 0.5947675 ], shape=(2,), dtype=float32)
tf.Tensor(0.4439875, shape=(), dtype=float32)

A toy ListNet implementation
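
To wrap up, here is a compact end-to-end sketch that ties the pieces together: a small scoring network applied to every (query, document) pair, top one probabilities via softmax, a KL divergence loss, and a plain gradient descent training loop. The feature shapes, labels and hyperparameters are assumptions for this toy example, not the article's exact code.

```python
import tensorflow as tf

tf.random.set_seed(42)

# Toy batch: 2 queries, 5 documents each, 4 features per (query, document) pair.
features = tf.random.normal((2, 5, 4))
relevance_labels = tf.constant([[2.0, 1.0, 1.0, 1.0, 0.0],
                                [3.0, 3.0, 1.0, 1.0, 0.0]])

# Scoring model: the same weights score every (query, document) pair.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(3, activation="relu"),
    tf.keras.layers.Dense(1, activation=None),
])

kl = tf.keras.losses.KLDivergence()
optimizer = tf.keras.optimizers.SGD(learning_rate=0.1)

def listnet_loss(labels, scores):
    # Convert scores and labels to top one probability distributions per query.
    predicted = tf.nn.softmax(tf.squeeze(scores, axis=-1), axis=-1)
    target = tf.nn.softmax(labels, axis=-1)
    return kl(target, predicted)

for step in range(100):
    with tf.GradientTape() as tape:
        scores = model(features)
        loss = listnet_loss(relevance_labels, scores)
    gradients = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(gradients, model.trainable_variables))
    if step % 20 == 0:
        print(f"step {step}: loss {loss.numpy():.4f}")

# After training, sorting documents by their predicted scores gives the ranking.
final_scores = tf.squeeze(model(features), axis=-1)
print(tf.argsort(final_scores, axis=-1, direction="DESCENDING"))
```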

Conclusion



Written by Justin

Hello, world! My name is Justin. I solve problems using data. Check me out at embracingtherandom.com and linkedin.com/in/justin-m-evans/