Talk:Benford's law

Statistical Question regarding the section "Multiplicative Fluctuations"

It is stated in the article under "Multiplicative Fluctuations": "More technically, the central limit theorem says that multiplying more and more random variables will create a log-normal distribution with larger and larger variance, so eventually it covers many orders of magnitude almost uniformly." However, the central limit theorem in its classical form does not refer to the multiplication of random variables. It refers to the sample average of a number of random variables that are mutually independent and identically distributed. To form the sample average, these variables are added and then divided by the number n (the total number of input variables), and the distribution of this new random output variable tends to follow a normal distribution (if n is sufficiently large). If a special form of the central limit theorem is used in the article on "Benford's law" (I am unaware of a multiplicative formulation of the central limit theorem), then the source of this formulation should be clearly noted. Aurelien101 (talk) 11:15, 6 July 2023 (UTC)

You add the logarithms of the numbers. You apply the central limit theorem on the log scale. Constant314 (talk) 13:10, 6 July 2023 (UTC)
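
The reply above can be checked numerically. What follows is only a rough sketch, not anything taken from the article or its sources: the choice of factors (uniform on 0.5 to 1.5), the number of factors, the sample size, and the function name `leading_digit_frequencies` are all assumptions made for illustration. It adds the base-10 logarithms of the factors, reads the leading digit off the fractional part of that sum, and compares the observed frequencies with log10(1 + 1/d).

```python
import math
import random


def leading_digit_frequencies(num_products=20000, num_factors=50, seed=0):
    """Estimate first-digit frequencies for products of i.i.d. positive factors."""
    rng = random.Random(seed)
    counts = [0] * 10  # index 0 is unused: a leading significant digit is 1..9
    for _ in range(num_products):
        # Multiplying the factors is the same as adding their logarithms,
        # which is where the (additive) central limit theorem applies.
        log_product = sum(
            math.log10(rng.uniform(0.5, 1.5)) for _ in range(num_factors)
        )
        # The fractional part of log10 fixes the significand in [1, 10).
        fraction = log_product - math.floor(log_product)
        digit = min(9, int(10 ** fraction))  # min() guards a rounding edge case
        counts[digit] += 1
    return {d: counts[d] / num_products for d in range(1, 10)}


if __name__ == "__main__":
    observed = leading_digit_frequencies()
    for d in range(1, 10):
        # Columns: digit, simulated frequency, Benford value log10(1 + 1/d)
        print(d, round(observed[d], 4), round(math.log10(1 + 1 / d), 4))
```

Working with the sum of logarithms rather than the product itself also avoids floating-point underflow when many factors are multiplied.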

regarding the "Multiplicative Fluctuations" section

Why does the log-normal distribution prove Benford's law? NadaB04 (talk) 16:14, 26 July 2023 (UTC)

It doesn't. It explains Benford's law under the assumption that measurements of many natural processes seem to be distributed uniformly on a log scale. Constant314 (talk) 17:11, 26 July 2023 (UTC)
(First of all, thanks for explaining it to me :)
So, just to make sure: we don't really have an interest here in the log-normal distribution itself, but rather in any broad distribution that is roughly uniform on a log scale?
Also, I believe there's an error in this section regarding the increasing variance: I couldn't find any version of the central limit theorem that isn't about fixed variance. 2A10:8012:F:64D9:A9D3:6666:1CD0:DEA4 (talk) 21:50, 26 July 2023 (UTC)
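
On the variance point, a short derivation may help. This is a sketch under stated assumptions, not a quotation from the article: the factors X_i are taken to be independent, identically distributed and positive, and the symbols μ = E[ln X_i] and σ² = Var(ln X_i) (assumed finite) are introduced here only for illustration. The fixed variance in the classical theorem belongs to the standardized sum; the unstandardized sum of logarithms has a variance that grows linearly with n.

```latex
\begin{align*}
\ln\Bigl(\prod_{i=1}^{n} X_i\Bigr) &= \sum_{i=1}^{n} \ln X_i, \\
\frac{\sum_{i=1}^{n} \ln X_i - n\mu}{\sigma\sqrt{n}} &\xrightarrow{\;d\;} \mathcal{N}(0,1)
  \quad \text{as } n \to \infty, \\
\text{so for large } n:\quad \sum_{i=1}^{n} \ln X_i &\approx \mathcal{N}\bigl(n\mu,\; n\sigma^{2}\bigr).
\end{align*}
```

The last line is the informal reading: the product is approximately log-normal, and the spread of its logarithm widens as more factors are multiplied.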

Discarded zeros.

Zero is a digit and a number can start with it.

"Zero", AKA "0", is a digit, as is stated in the article, here:

"[...] in a given base with a fixed number of digits 0, 1, ..., n, ..., [...]"

and here:

"Four digits is often enough to assume a uniform distribution of 10% as "0" appears 10.0176% of the time in the fourth digit, while "9" appears 9.9824% of the time."

Numbers *can* start with the digit zero, as is also stated in the article, here:

"Numbers satisfying this include 3.14159..., 314285.7... and 0.00314465... ."

Too little too late about discarded zeros.

The role of zeros is perhaps neglected a bit by the article, to the detriment of the accessibility of the article. It's not obvious what the roles of zero are, in Benford's law. That zeros are implicitly being excluded is not always clear.

In fact, this fundamental point is not touched on in the lede, and it is touched on explicitly only twice in the body of the article, in passing, literally in parentheses each time.

It's easily missed, I think. It's not very accessible to most non-mathematicians either. I think using an ellipsis is a false economy here, making it harder to notice that there are just *nine* digits there, and that zero is not among them. It would be far better to just write all the digits out.

The first time the discarding of "zero" or "0" is touched on in the body of the article is here, but it's not explicit, and it easily goes unnoticed:

"A set of numbers is said to satisfy Benford's law if the leading digit d (d ∈ {1, ..., 9}) occurs with probability [...]"

The first explicit reference to the discarding of zeros is quite far down in the article:

"For example, the first (non-zero) digit on the aforementioned list of lengths should have the same distribution whether the unit of measurement is feet or yards."

The second explicit reference to it is:

"It is possible to extend the law to digits beyond the first. In particular, for any given number of digits, the probability of encountering a number starting with the string of digits n of that length – discarding leading zeros – is given by [...]"

Possible improvements.

The first sentence of the lede is:

"Benford's law, also known as the Newcomb–Benford law, the law of anomalous numbers, or the first-digit law, is an observation that in many real-life sets of numerical data, the leading digit is likely to be small."

Maybe "the leading digit" should instead be "the leading digit (discarding leading zeros)", "the leading nonzero digit", "the leading digit of the normalized significand", or "the leading significant digit", to make it clear that a number starting with zero, "0.998", say, does not count as a number starting with a small digit.

Also, how about some explanation of *why* leading zeros are discarded? As Dale Carnegie once said, "I keep stating the obvious, because the obvious is what people need to be told." Polar Apposite (talk) 19:53, 17 September 2023 (UTC)
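
To make the "leading significant digit" reading concrete, here is a small sketch. The function names `leading_significant_digit` and `benford_probability` are hypothetical, chosen only for this illustration, and the numeral is passed as a string so that the digits are exactly the ones written. One way to see why leading zeros are skipped: they only mark where the decimal point sits, so 0.998 and 998 have the same significant digits and differ only in scale.

```python
import math
from decimal import Decimal


def leading_significant_digit(text):
    """Return the first non-zero digit of a decimal numeral given as a string."""
    value = Decimal(text)
    if value.is_zero():
        raise ValueError("zero has no leading significant digit")
    # Leading zeros only fix the position of the decimal point, so skip them.
    return next(d for d in value.as_tuple().digits if d != 0)


def benford_probability(d):
    """Benford's first-digit probability, defined only for d in 1..9."""
    if d not in range(1, 10):
        raise ValueError("the leading significant digit is never 0")
    return math.log10(1 + 1 / d)


if __name__ == "__main__":
    for numeral in ["3.14159", "314285.7", "0.00314465", "0.998"]:
        d = leading_significant_digit(numeral)
        print(numeral, d, round(benford_probability(d), 4))
```

Under this reading "0.998" counts as a number whose leading digit is 9, not 0, which matches the article's restriction of d to the nine digits 1 through 9.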

There's no such term as a "high burglary".

The WP article contains this:

"Television crime drama NUMB3RS used Benford's law in the 2006 episode "The Running Man" to help solve a series of high burglaries.[30]"

I don't think "high burglary" is a real term (Google has never heard of it), and I have no idea what it could mean. A burglary that is a high crime? A high-altitude burglary? A burglary committed while intoxicated? A burglary of a mansion? The link does not contain the term, and the burglary referred to in the link is a fictional one in an episode of "Numb3rs", a break-in at a university laboratory that is equipped with the latest high-tech anti-burglary security equipment (the burglars are nevertheless successful in defeating the security equipment).

https://numb3rs.fandom.com/wiki/The_Running_Man contains this:

"He has a past selling high-end break-in tools. Some of the tech that the robbers would have had to get past are after his time. He suggests going to look for somebody else and for the police to stop bothering him."

So the word "high" seems to have broken off from "high-end" and somehow got attached to the front of "burglaries", for no apparent reason.

I therefore propose deleting the word "high" from the sentence. Polar Apposite (talk) 20:06, 17 September 2023 (UTC)

Absolutely - go for it! - DavidWBrooks (talk) 20:27, 17 September 2023 (UTC)
Done. Polar Apposite (talk) 20:36, 1 October 2023 (UTC)

Reverted edit

@Constant314: I'm aware; what I was saying was that it's obvious information that did not need to be included, especially as an entire sentence in the lead. Snowmanonahoe (talk · contribs · typos) 05:37, 11 April 2024 (UTC)

I missed the implication of the sarcasm. The 11% needs to be there to contrast with the 30% and 5% in the previous sentence. But the one out of nine is redundant. I will fix it. Constant314 (talk) 20:48, 11 April 2024 (UTC)