Talk:Heaps' law

[Untitled]

The original version of this page was adapted from http://planetmath.org/?method=src&from=objects&id=3431&op=getobj owned by akrowne, with permission under the GFDL

Divergence

I think with "Where VR is the subset of the vocabulary V represented by the instance text of size n" the author meant to say "Where VR is the cardinality of the subset of the vocabulary V represented by the instance text of size n", because a subset is not a number. However, the size of that subset diverges (i.e. becomes arbitrarily large) as n goes to infinity, which would only make sense if the vocabulary itself were of infinite size. (Just as a side note: I would have expected the fraction of the vocabulary not covered by the text to decrease exponentially when looking at larger and larger documents.) Icek (talk) 19:02, 29 September 2009 (UTC)
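
For reference, the form of Heaps' law under discussion, written in the article's notation (K and β are the usual empirical parameters, with 0 < β < 1), makes the divergence explicit:

<math>V_R(n) = Kn^{\beta}, \qquad \lim_{n \to \infty} Kn^{\beta} = \infty \quad \text{for } \beta > 0,</math>

so the predicted number of distinct words grows without bound (though ever more slowly) as the text length n increases.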

Vocabulary size is infinite according to generative grammar; see e.g. Mark Aronoff, "Word Formation in Generative Grammar", MIT Press 1985, and Andras Kornai, "How many words are there?", Glottometrics 2002/4, 61-86. 88.132.28.96 (talk) 20:33, 17 March 2012 (UTC)

Types, Tokens, and Hapaxes: A New Heap's Law

There is a new paper by Victor Davis (link), deriving a stronger version of Heaps' law from first principles (rather than empirically). I think it is worth adding to the article. Compare this YouTube video for a brief summary and visual explanation. Renerpho (talk) 20:13, 29 August 2022 (UTC)

ArXiv preprints do not count as reliably published sources for Wikipedia purposes. YouTube videos even less. —David Eppstein (talk) 20:50, 29 August 2022 (UTC)
@David Eppstein: Fair enough -- How about this version? Renerpho (talk) 11:31, 30 August 2022 (UTC)
The YouTube video is not intended as a source, but as assistance for the editor who is going to work on this. Renerpho (talk) 11:35, 30 August 2022 (UTC)