https://huggingface.co/mmhamdy

So what are those "thinking tokens"?! Nothing fancy: they are just special tokens '<T>' that you insert after each word in a sentence whenever a complex problem is encountered. That's it!

👉 The main idea is to "buy" the model "some time" to think about the problem, using these additional computations before it answers. With this method, they observed a slight improvement in perplexity.

👉 Before getting excited, note that they added these tokens manually and that they used an RNN language model. From the paper:

"As a proof of concept, we have added N ’thinking tokens’ (< T >) after each observed word in a dataset. Our vision is that this basic concept can be extended to a self-adjusting model, which will be able to decide itself if and how many ’thinking tokens’ will be used for a specific problem, where N could also vary throughout the sentence. This would allow us to reduce the computational time, which would not increase N times."
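
👉 To make the quoted setup concrete, here is a minimal sketch of the preprocessing step it describes: adding N '<T>' tokens after each word. This is not the authors' code. The function name `add_thinking_tokens`, the whitespace tokenization, and the default N=1 are all assumptions made for illustration.

```python
THINKING_TOKEN = "<T>"  # the special token named in the paper


def add_thinking_tokens(sentence: str, n: int = 1, token: str = THINKING_TOKEN) -> str:
    """Insert n copies of `token` after every whitespace-separated word.

    Hypothetical helper: splitting on whitespace is an assumption,
    not necessarily how the paper tokenizes its dataset.
    """
    if n < 0:
        raise ValueError("n must be non-negative")
    out = []
    for word in sentence.split():
        out.append(word)
        out.extend([token] * n)  # the extra "thinking" steps for this word
    return " ".join(out)


print(add_thinking_tokens("two plus two equals", n=2))
# two <T> <T> plus <T> <T> two <T> <T> equals <T> <T>
```

Note how the sequence gets (N + 1) times longer, which is the computational cost the authors hope a self-adjusting model could reduce.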