While large-scale neural language models, such as GPT-2 and BART, have achieved impressive results on various text generation tasks, they tend to get stuck in undesirable sentence-level loops with maximization-based decoding algorithms (e.g., greedy search). This phenomenon is counter-intuitive since there are few consecutive sentence-level repetitions in human corpora (e.g., 0.02% in Wikitext-103). To investigate the underlying reasons for generating consecutive sentence-level repetitions, we study the relationship between the probabilities of repetitive tokens and their previous repetitions in the context. Through our quantitative experiments, we find that 1) models have a preference to repeat the previous sentence; 2) sentence-level repetitions have a self-reinforcement effect: the more times a sentence is repeated in the context, the higher the probability of continuing to generate that sentence; 3) sentences with higher initial probabilities usually have a stronger self-reinforcement effect. Motivated by our findings, we propose a simple and effective training method, DITTO (PseuDo-RepetITion PenalizaTiOn), where the model learns to penalize probabilities of sentence-level repetitions from pseudo repetitive data. Although our method is motivated by mitigating repetitions, our experiments show that DITTO not only mitigates the repetition issue without sacrificing perplexity, but also achieves better generation quality. Extensive experiments on open-ended text generation (Wikitext-103) and text summarization (CNN/DailyMail) demonstrate the generality and effectiveness of our method.
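To make the pseudo-repetition penalization idea concrete, below is a minimal PyTorch sketch based only on the description above: a sentence is tiled into a pseudo repetitive sample, and at each repetition the probability of the gold token is pushed toward a decayed fraction of its (detached) probability at the previous repetition, counteracting the self-reinforcement effect. The function names, shape conventions, loss form, and the decay factor `lam` are illustrative assumptions, not the paper's exact implementation.

```python
import torch


def build_pseudo_repetitive_sample(sentence_ids: torch.Tensor,
                                   n_repeats: int) -> torch.Tensor:
    """Tile one sentence n_repeats times to form a pseudo repetitive sample.

    sentence_ids: 1-D tensor of token ids for a single sentence.
    """
    return sentence_ids.repeat(n_repeats)


def ditto_style_loss(target_probs: torch.Tensor,
                     sent_len: int,
                     lam: float = 0.5) -> torch.Tensor:
    """Sketch of a sentence-level repetition penalization loss (assumed form).

    target_probs: probabilities the model assigns to the gold tokens of the
        pseudo repetitive sample, shape (n_repeats * sent_len,).
    lam: assumed decay factor; for each repetition l >= 2, the probability of
        every token is pulled toward lam times its probability in repetition
        l - 1, so repetition probabilities decay rather than self-reinforce.
    """
    probs = target_probs.view(-1, sent_len)   # (n_repeats, sent_len)
    prev = probs[:-1].detach()                # repetition l - 1, no gradient
    curr = probs[1:]                          # repetition l
    # Penalize the gap between P_l and lam * P_{l-1}; clamp keeps log1p finite.
    gap = (curr - lam * prev).abs().clamp(max=1.0 - 1e-6)
    return -torch.log1p(-gap).mean()          # -log(1 - |P_l - lam * P_{l-1}|)
```

In use, `target_probs` would be gathered from the softmax of the language model's logits at the gold-token positions of the tiled sample, and this loss would be mixed with the standard maximum-likelihood objective on natural data, so that perplexity on real text is preserved while repetition probabilities learn to decay.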