Talk:Model collapse

Wouldn't this be better as a section within overfitting?

I have just read the paper, and I am not sure if this might not be better as a section within the overfitting page. The process is essentially recurrent overfitting of the model, due to class imbalance in the input, leading to an exaggerated class imbalance in later stages. Unless I am missing something conceptually. Bastianhornung (talk) 09:32, 26 July 2024 (UTC)[reply]

It is about training on synthetic data, including its own output. The media and academic coverage makes it more than notable enough for its own page. Wqwt (talk) 15:15, 2 January 2025 (UTC)[reply]
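
A toy sketch of the process both comments describe (my own illustration, not from the paper or any other source): a class distribution is repeatedly re-estimated from samples drawn from the previous estimate. Zero is an absorbing state for the rare class, so the initial imbalance is eventually exaggerated into outright loss of the tail, which is the "training on its own output" failure mode.

# Hypothetical sketch: re-estimating class frequencies from the model's own samples.
# Once a generation's sample happens to contain no instance of the rare class,
# its estimated probability hits zero and can never recover, so the initial
# imbalance is eventually exaggerated into loss of the tail.
import numpy as np

rng = np.random.default_rng(0)
probs = np.array([0.90, 0.09, 0.01])   # true class frequencies, with one rare "tail" class
n = 200                                 # samples per generation (kept small on purpose)

for generation in range(30):
    samples = rng.choice(len(probs), size=n, p=probs)
    probs = np.bincount(samples, minlength=len(probs)) / n   # "retrain" on own output
    print(f"gen {generation:2d}: {np.round(probs, 3)}")

Whether this is best described as recurrent overfitting or as something distinct is exactly the framing question raised above; the sketch only shows the mechanism.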

Inbreeding

I have heard this phenomenon colloquially referred to as "inbreeding" or "inbred AI" on reddit. Might be worth putting in the article. 68.237.60.88 (talk) 17:28, 30 January 2025 (UTC)[reply]

Collapse of a single model or a succession of models?

The intro doesn't clear up my confusion on this point: Is this a matter of a single model degrading over time, or of a succession of models, with the later ones performing worse than the earlier ones? My uneducated assumption is that once a model is trained, it becomes relatively fixed (unless retrained), so its performance won't degrade. A subsequent model trained (partially or fully) on the output of the first model, though, will perform worse. Is this correct? It would be good to clarify. Sharpner (talk) 23:32, 19 February 2025 (UTC)[reply]
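
On the question itself, the usual toy demonstration matches the "succession of models" reading. A minimal sketch (my own, assuming a Gaussian toy model rather than anything specific from the article): each generation a fresh model is fitted to samples produced by the previous generation's model. Every individual model is fixed once fitted; what degrades is the chain of models.

# Minimal sketch of generation-to-generation collapse with a Gaussian "model".
# Each generation is a new, separately fitted model; none of them degrades on
# its own. Generation k is fitted only to samples from generation k-1, so the
# fitted spread performs a downward-biased random walk and, over enough
# generations, shrinks toward zero (the collapse).
import numpy as np

rng = np.random.default_rng(0)

# Generation 0 is fitted to "real" data from the true distribution N(0, 1).
data = rng.normal(loc=0.0, scale=1.0, size=100)

for generation in range(30):
    mu, sigma = data.mean(), data.std()          # fit this generation's model
    print(f"gen {generation:2d}: mean={mu:+.3f}, std={sigma:.3f}")
    # The next generation sees only this model's output, never the real data.
    data = rng.normal(loc=mu, scale=sigma, size=100)

As I understand it, that matches your assumption: a deployed model does not degrade by sitting there; the degradation shows up in later models trained (partly or fully) on earlier models' output.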

Is the core issue 'synthetic vs. human' or 'grounded vs. un-grounded' data?

The definition of model collapse often relies on the dichotomy between "synthetic data" and "human-generated data." However, this view is superficial. The fundamental distinction lies not in who generated the data, but in whether the data is "grounded" in reality. Grounded data derives from direct interactions with the world, whereas data generated by an AI model is the output of a statistical model of the world, not of the world itself (Goodfellow et al., 2016; von Helmholtz, 1860).

To strengthen this argument, we can draw parallels to human cognitive, social, and even neurological phenomena. Model collapse is the computational counterpart to what occurs in "echo chambers," where information degrades in closed loops (Arendt et al., 2021). It is even more analogous to the phenomenon of sensory deprivation. When a brain is deprived of new, real-world stimuli, it begins to generate its own perceptions—hallucinations—by recycling internal memories and patterns uncontrollably, a process underpinned by the brain's predictive nature and its reliance on internal models when external data is absent (Friston, 2010; Goldstein & Volkow, 2011).

In all these cases, the principle is the same: the degradation of information occurs in any system—biological or artificial—that is forced to learn from its own representations of the world, instead of from the world itself. Therefore, model collapse is not a problem of "synthetic vs. human data," but rather of isolated learning systems vs. open systems continuously grounded in reality.

References

Raphael2718 (talk) 12:40, 28 August 2025 (UTC)[reply]
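
To make the closing claim concrete (a hypothetical sketch of my own, not taken from any of the works cited above): the same kind of generational loop as in the Gaussian example earlier on this page, with an illustrative real_fraction knob that mixes freshly drawn "grounded" samples into each generation's training set. With real_fraction = 0.0 the system is closed and the fitted spread performs the downward-biased walk that drives collapse (noisy over any finite run); with a substantial fraction of grounded data the spread stays anchored near the true value.

# Hypothetical sketch: closed-loop retraining vs. retraining partly "grounded"
# in fresh real-world samples. real_fraction is an illustrative knob, not a
# parameter from any cited source.
import numpy as np

def final_spread(real_fraction, generations=50, n=200, seed=1):
    rng = np.random.default_rng(seed)
    mu, sigma = 0.0, 1.0                                      # generation 0 matches the true N(0, 1)
    for _ in range(generations):
        n_real = int(real_fraction * n)
        real = rng.normal(0.0, 1.0, size=n_real)              # grounded data
        synthetic = rng.normal(mu, sigma, size=n - n_real)    # previous model's output
        data = np.concatenate([real, synthetic])
        mu, sigma = data.mean(), data.std()                   # refit on the mixture
    return sigma

for frac in (0.0, 0.2, 0.5):
    print(f"real_fraction={frac:.1f}: fitted std after 50 generations ~ {final_spread(frac):.3f}")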

That is interesting, but seems to be in part original research. Do you have a reference dealing specifically with the topic, so we can verify it? TucanHolmes (talk) 08:02, 29 August 2025 (UTC)[reply]