
Mini-batch setting with Semi Markov CRF #110


Closed
urchade opened this issue Oct 2, 2021 · 5 comments · Fixed by #114

Comments

@urchade
Contributor

urchade commented Oct 2, 2021

I encounter learning instability when using a batch size > 1 with the semi-Markov CRF (the loss goes to a very large negative number), even when explicitly providing "lengths". I think the bug comes from the masking.
The model trains well with batch size 1.
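
For reference, a minimal sketch of the kind of training step where the loss diverges (toy shapes; the "gold" parts here are just the model's own argmax, standing in for real labels):

import torch, torch_struct

torch.manual_seed(0)

batch, N, C, K = 4, 10, 2, 6  # batch > 1 is where the loss diverges

log_potentials = torch.randn(batch, N, K, C, C, requires_grad=True)
lengths = torch.LongTensor([N + 1, 8, 6, 4])

dist = torch_struct.SemiMarkovCRF(log_potentials, lengths=lengths)

# stand-in gold labels: the distribution's own argmax parts
gold_parts = dist.argmax.detach()

# negative log-likelihood; with batch > 1 this can come out large and negative
loss = -dist.log_prob(gold_parts).mean()
loss.backward()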

@srush
Collaborator

srush commented Oct 4, 2021

Thanks. Can you by any chance provide an example? I will take a look.

@urchade
Contributor Author

urchade commented Oct 5, 2021

The problem also occurs during inference:

import torch, torch_struct
import matplotlib.pyplot as plt

torch.manual_seed(1)

batch, N, C, K = 3, 10, 2, 6

# helper to visualize a semi-Markov chain (not called below; kept for debugging)
def show_sm(chain):
    plt.imshow(chain.detach().sum(1).sum(-1).transpose(0, 1))

log_potentials = torch.randn(batch, N, K, C, C)

# dists with and without length masking (the 0th element of the batch is not padded)
dist_1 = torch_struct.SemiMarkovCRF(log_potentials)
dist_2 = torch_struct.SemiMarkovCRF(log_potentials, lengths=torch.LongTensor([N+1, 5, 1]))
dist_3 = torch_struct.SemiMarkovCRF(log_potentials, lengths=torch.LongTensor([N+1, 5, 4]))

# the argmax for batch index 0 should be the same for every dist, since that index has no padding
assert torch.allclose(dist_1.argmax[0], dist_2.argmax[0])
assert torch.allclose(dist_1.argmax[0], dist_3.argmax[0])
assert torch.allclose(dist_2.argmax[0], dist_3.argmax[0])

@srush
Collaborator

srush commented Oct 6, 2021

Oh thanks, this is a useful test (and it sounds like a bug).

@da03 we should fix this. Any chance you could take a first look?

@da03
Contributor

da03 commented Oct 14, 2021

@urchade Thanks for pointing this out! It's fixed in PR #114. The issue was due to this line:

mask[:, :, : end - (k - 1), k - 1, k].diagonal(0, -2, -1).fill_(True)

which did not consider different ending positions for sentences of different lengths.
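
To illustrate the idea with a toy sketch (this is not the actual code in the PR): a valid-position mask has to be derived from each sequence's own length, not from one shared end position:

import torch

batch, N = 3, 10
lengths = torch.LongTensor([10, 5, 1])

positions = torch.arange(N).unsqueeze(0)   # (1, N)

# per-example mask: True only up to each sequence's own length
mask = positions < lengths.unsqueeze(1)    # (batch, N)

# shared-end mask (the buggy pattern): every row treated like the longest one
shared = (positions < lengths.max()).expand(batch, N)

assert not torch.equal(mask, shared)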

I also added back an alternative implementation, _dp_standard, for computing the log partition. It is more memory-efficient and can be used as below:

import torch, torch_struct

torch.manual_seed(1)

batch, N, C, K = 3, 10, 2, 6

log_potentials = torch.randn(batch, N, K, C, C)
lengths = torch.LongTensor([N+1, 5, 1])

# compare the memory-efficient standard DP against the default partition
struct = torch_struct.SemiMarkov()
dist = torch_struct.SemiMarkovCRF(log_potentials, lengths=lengths)

assert torch.allclose(struct._dp_standard(log_potentials, lengths=lengths)[0], dist.partition)

@srush
Collaborator

srush commented Oct 14, 2021

Oh wow, impressive @da03! This code is really complex.

Long term, let's make SemiMarkovParallel and SemiMarkovFlat their own classes and let the CRF pick which one to use.
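
Roughly something like this chooser, as a hypothetical sketch (the heuristic, threshold, and wiring are made up; it only reuses the two code paths already shown in this thread):

import torch
import torch_struct

# hypothetical sketch: dispatch between the memory-efficient standard DP
# ("flat") and the default parallel-scan partition ("parallel")
def semimarkov_partition(log_potentials, lengths=None, max_elems=10**7):
    dist = torch_struct.SemiMarkovCRF(log_potentials, lengths=lengths)
    if log_potentials.numel() > max_elems:
        # large inputs: the _dp_standard path added in PR #114
        return torch_struct.SemiMarkov()._dp_standard(log_potentials, lengths=lengths)[0]
    # small inputs: the default partition
    return dist.partition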
