Thanks for releasing this model.
Have you run any passkey retrieval tests?
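For context, by passkey retrieval I mean the usual setup of burying a random key inside long filler text and asking the model to recall it. A rough, hypothetical sketch of how such a prompt is typically built (the helper name and filler text are my own, not from any released eval):

```python
# Hypothetical sketch of a passkey retrieval prompt: bury a random key inside
# long filler text at a chosen depth, then ask the model to recall it.
import random

def build_passkey_prompt(n_filler: int = 300, depth: float = 0.5) -> tuple[str, str]:
    passkey = str(random.randint(10000, 99999))
    filler = "The grass is green. The sky is blue. The sun is yellow. "
    chunks = [filler] * n_filler
    chunks.insert(int(n_filler * depth), f"The pass key is {passkey}. Remember it. ")
    prompt = "".join(chunks) + "\nWhat is the pass key?"
    return prompt, passkey

prompt, expected = build_passkey_prompt()
# Feed `prompt` to the model and check whether `expected` appears in its output.
```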
I note the use of a sliding window for attention. Although this gives an effective attention width of roughly n_layers * window_len, work such as LM-Infinite suggests that may not be enough for good passkey retrieval. Granted, they are extending context without fine-tuning, which is a different task.
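To make the n_layers * window_len point concrete, here is a minimal sketch (my own illustration, not the actual Mistral implementation): with a causal sliding-window mask of width W, information can hop back at most W tokens per layer, so after L layers the indirect receptive field is roughly L * W tokens.

```python
# Minimal sketch of how the receptive field compounds across layers under a
# causal sliding-window attention mask (illustrative only).
import numpy as np

def sliding_window_mask(seq_len: int, window: int) -> np.ndarray:
    i = np.arange(seq_len)[:, None]
    j = np.arange(seq_len)[None, :]
    return (j <= i) & (j > i - window)   # causal + banded

seq_len, window, n_layers = 64, 8, 4
mask = sliding_window_mask(seq_len, window)

# Compose the mask across layers: reach[i, j] == True means input token j can
# still influence position i after that many layers of attention.
reach = mask.copy()
for _ in range(n_layers - 1):
    reach = (reach.astype(int) @ mask.astype(int)) > 0

print(reach[-1].sum())   # ~ n_layers * window tokens reachable from the last position
```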
The launch post says that using a sliding window does not affect quality. How did you measure that?
Also, does Mistral 7B use only the sliding window, or does it also attend to historical chunks?