Talk:Clustering coefficient

Page contents not supported in other languages.
From Wikipedia, the free encyclopedia

Just some random feedback here... Many of the references to this article deal with examples that are somewhat complex. As someone who understands graph theory in a very visual way, the graphical example was too simple to be meaningful immediately. A counter-factual visual example would also serve to highlight the nuance of this concept. Thanks to the authors though, overall a good article. Meson537 (talk) 00:44, 8 December 2010 (UTC)[reply]


Shouldn't the degree reference (3rd paragraph in formal discussion section) point to Degree_(graph_theory) instead of Degree_(mathematics) ?


I believe the caption below the figure for the local clustering coefficient is either wrong, or misleading. First, there is no light blue node. The (regular) blue node doesn't seem to be related to the edges. It seems like the leftmost node is the one that needs to be painted light blue. —Preceding unsigned comment added by 128.59.65.170 (talk) 21:10, 21 April 2010 (UTC)[reply]


—Preceding unsigned comment added by Adonyk (talkcontribs) 15:28, 11 December 2008 (UTC)[reply]

Duno is this is the right place to discuss about this but it seems like there is an error on this page. According to other websites and research papers, the maximum of links within the neighbourhood in a directed graph is k_i(k_i-1), not 2k_i(k_i-1).

One of the sources : http://www.elvis.ac.nz/brain?ClusteringCoefficient

It is indeed the correct place to discuss it. It was indeed incorrect, I've changed it. Thank you for your correction. --stochata 16:55, 1 Mar 2005 (UTC)

p.s., do not be afraid to be bold and correct it yourself!

Added a few wikilinks to explore for us aspiring math dummies out there. ^^ Zanturaeon 23:21, August 1, 2005 (UTC)

Isn't problematic if is an edge leading to the vertex whrer it starts? (that was from 83.135.204.222 in June 2006)

Indeed. If there is an assumption that i doesn't equal j in the definition of N, it ought to be stated. Or is it not a problem? 72.248.65.171 22:44, 18 March 2007 (UTC)[reply]

I think, is wrong, too. For directed graphs, is defined as the number of vertices leading in _or_ out later on. I think it would make more sense using "and" or "or" both times. Is there any reference for these definitions? --92.224.229.115 (talk) 10:14, 24 September 2010 (UTC)[reply]

I agree with the above comment. The neighbourhood in a directed graph should be . Otherwise you are only counting nodes where there are 2-cycles between that node and node i. —Preceding unsigned comment added by 81.108.140.67 (talk) 13:56, 17 November 2010 (UTC)[reply]

Shouldn't this definition be restricted to vertices with degree greater than or equal to zero? Can you find the clustering coefficient of an isolated vertex? Possie11 20:21, 24 April 2007 (UTC)[reply]

I think you mean degree strictly greater than one. Yes, you're clearly right, because a vertex with degree zero is isolated, and has an empty neighborhood; one with degree one has a singleton neighborhood, which can have no edges (well, no self-edges, anyway). BrianTung (talk) 21:20, 7 February 2008 (UTC)[reply]

The link on the bottom of the page doesn't work anymore. Someone might want to update it. 15:29, 27 May 2007 (UTC)


Does this,

mean this?

If not then what does it mean? If it does mean the former then surely that is how it should be written? —Preceding unsigned comment added by 128.232.238.63 (talk) 23:58, 4 November 2008 (UTC)[reply]


Sorry but... I think the page is very incomplete. First of all, there is no definition of the concept of clustering coefficient given. A formula is not a concept, is just an expression of a manner to estimate a quantity. Second, Watts and Strogatz did not invent the clustering coefficient, what is a quite old concept. Third, there is no distinction between the global and the local clustering coefficients. The global clustering, C, is the overall conditional probability that any two nodes are connected provided that they share at least one common neighbour. The local clustering Ci of a node i represents the probability of its neighbours to be connected within each other. In directed networks, there is no clear and commonly accepted manner of defining C (at least to my knowledge). However, the local clustering Ci can be applied to individual nodes of a directed network, and then averaged. BUT, if you calculate C of an undirected graph, and the average Ci of all its nodes, C and <Ci> are not the same! Please, let me know what you think about my comments. I will be happy to collaborate.

Small World != Clustering Coefficient[edit]

I believe the clustering coefficent has nothing to do with the "small-world"-effect, at least not directly. Think of a "big graph", which consists of a lot of throughly connected subgraphs, but the individual subgraphs are not connected to each other. Then the clustering coefficient of the "big graph" will be 1, but there is no small world effect. Most of the networks exhibiting the small world effect will have a high clustering coefficient, but a high clustering coefficient does not mean that the graph exhibits a small world effect. I think "ntroduced the clustering coefficient[1] graph measure to determine whether or not a graph is a small-world network." and "A graph is considered small-world, if its average clustering coefficient \bar{C} is significantly higher than a random graph constructed on the same vertex set." is wrong. 88.65.98.195 14:57, 15 September 2007 (UTC)[reply]

Please read the paper[1]. It defines the "small world phenomenon" as L slightly greater than Lrnd, while C much greater than Crnd. L is the Average path length, C is the Clustering Coefficient, and the Xrnd values are calculated for a random graph with the same number of edges and vertices. So in a way, you're right, a high clustering coefficient does not suffice for a small world (the paper even states that you can't detect a small world from clustering coefficient alone). The problem is that the clustering coefficient is a local graph measure, while the small world property is a global one. just my 2E-2 Euros. --80.136.95.157 14:58, 2 October 2007 (UTC)[reply]
I already knew the paper before. I did not find the explicit definition of "small world network" in the paper like you said. I think you mean this passage: "We find that these systems can be highly clustered, like regular lattices, yet have small characteristic path lengths, like random graphs. We call them ‘small-world’ networks, by ...". I think the meaning is "We call networks with small characteristic path lengths small world networks", not "We call highly clustered networks with small characteristic path lenghts small world models". Anyway, i feel the gist of the paper is: "Is it possible that a network with small path length is highly clustered?" and not "How can we find out whether a network is a small world network?". Besides, defining "small world" through the clustering coefficient is quite illogical IMHO. 88.65.77.185 —Preceding signed but undated comment was added at 15:57, 8 October 2007 (UTC)[reply]
I agree, the small world property is as indicated by the name, a property of graphs that have a small average distance. The literature is however often interested in real-world small world graphs, they tend to have a large clustering coefficient too. The disregard for trivial small world graphs in literature has confused the author of this article.

'clustering coefficient' has little to do with 'clustering'[edit]

If you consider a network with overlapping community structure, then there will be many triangles and hence a reasonably high clustering coefficient. But I suspect many people take 'clustering' to imply a non-overlapping partitioning. A more extreme example is a ring lattice - a ring lattice has high clustering coefficient, but it doesn't have anything that can be reasonably called community structure (overlapping or otherwise).

Is it possible to concisely clear this up at the top of the article? Aaron McDaid (talk - contribs) 10:47, 16 July 2011 (UTC)[reply]

I agree that it is often overlooked that a network with a high clustering coefficient might have no clusters in it. On the other hand, a network with many clusters, even one that consists entirely of network clusters, could have a low clustering coefficient. Consider a large, undirected graph, say with hundreds of billions of nodes. Consider that every node belongs to one hundred cliques (i.e., complete subgraphs) of size eleven that are randomly selected. Thus all edges in the graph are embedded in a clique of size eleven. Yet the clustering coefficient will be pretty low. To see this, imagine the egocentric neighborhood of any given node. Each node is connected to ten cliques of ten, so (ignoring the small chance that any of those ten cliques shares members) there is the potential for 500k edges between the alters of any ego, however, only 4500 of these connections actually exist. This example also illustrates the clustering coefficients of graphs with different sizes should not be compared without some kind of suitable normalization (I'm not aware of one).--Conradl (talk) 23:14, 14 September 2011 (UTC)[reply]

Global clustering coefficient vs. Transitivity ratio[edit]

The article claims that these are two separate concepts, but the definitions seem to be identical. What am I missing?

Also: why is it called a transitivity ratio? — Preceding unsigned comment added by 84.94.180.144 (talk) 15:30, 13 March 2014 (UTC)[reply]

Let T = the number of triangles, P = the number of two-edge paths, and C = the number of connected triples of vertices. Then C = P − 2T, because a connected triple that is not a triangle contributes one to the P term while a triangle contributes 3 to the P term and −2 from the other term, so that its contribution is also one. My understanding is that the transitivity ratio is 3T/P and that the clustering coefficient is T/C. So they both give the same information (one is a function of the other) but have different values. But this is not clearly explained in the article and the formula in the transitivity ratio says it is equal to two different things. As for why: because transitivity (a friend of a friend is automatically a friend) would imply that 3T = P, and the transitivity ratio measures how close this is to being true. —David Eppstein (talk) 16:04, 13 March 2014 (UTC)[reply]
Thanks! That cleared it up. I think I'll try to incorporate all that into the article, as long as I'm here. — Preceding unsigned comment added by 84.94.180.144 (talk) 20:40, 16 March 2014 (UTC)[reply]
This has me confused. Among the sources i've found that are careful with their definitions, most define the (global) clustering coefficient as 3T/P, while the rest call this the transitivity ratio and define the (global) clustering coefficient as the mean local clustering coefficient. I have not been able to find a source that defines anything as T/C, though some sources (e.g. [2]) use phrases like "connected triples" but make it clear from examples or further exposition that the calculation they're performing is 3T/P. If anyone has a source for the T/C definition, i would love to see it! In the meantime (as written, though perhaps even if it were written more clearly) i worry that the distinction causes more confusion than clarity. 155.37.131.50 (talk) 21:17, 4 June 2014 (UTC)[reply]

Hi everyone! I think the page is a bit confusing currently. On my network software documentation site (http://toreopsahl.com/tnet/weighted-networks/clustering/), some people have reached out to clarify concepts in this article. There are multiple ways of defining the overall level of clustering in a network. The global clustering coefficient which is defined as 3x triangles over triples has been around for a long time (all the way back to Luce and Perry, 1949). In their 1998 paper, Watts and Strogatz introduced the local clustering coefficient, which is a node level metric (studying the number of ties among neighbors were not a new idea, see Granovetter, 1973, Coleman, 1988, and Burt, 1992). To get an overall coefficient, Watts and Strogatz took a simple average of the node-level scores. By using a simple average, you get a different answer than the global clustering coefficient if the nodes do not all have the same number of ties. To get the same answer, you can use a weighted average where each node is weighted by the number of triplets they are the center of, n*(n-1) (see code and references on website).

I have gone ahead and cleared up this article by - removing the paragraph starting with Watts and Strogatz under the Global clustering coefficient heading as it is already mentioned under local clustering coefficient; and their way of computing the simple average is mentioned under Network average clustering coefficient) - removing the heading Transitivity Ratio as this is the Global clustering coefficient.

Tore Opsahl 16:42, 27 July 2014 (UTC) — Preceding unsigned comment added by Tore.opsahl (talkcontribs)

Thanks, Tore. However, another source of confusion for me is the Luce–Perry citation. Can you say where in their paper they attempt to compute or describe (using their terms) the proportion of symmetric 3-chains in a matrix that are redundant? I have read through it and have not found such a comment, so i don't understand why the paper is being cited. 155.37.131.50 (talk) 19:10, 7 September 2014 (UTC)[reply]

Algorithms to generate clustered networks[edit]

I think it would be worth having a section about algorithms to generate clustered networks. If there's agreement: I'd go ahead with the standard small-worlds network algorithm and also the random clustered graphs introduced (separately) by Miller and Newman. Since I'm "Miller", I want to be careful not to be biased on this, so seeking feedback before I do any such changes. If no-one has comments on this, I may go ahead and do the changes (eventually). I'd be interested in including other algorithms as well. Please give suggestions. Joelmiller (talk) 02:39, 19 March 2015 (UTC)[reply]

Triplet contradiction[edit]

The following two sentences contradict each other:

"The global clustering coefficient is the number of closed triplets (or 3 x triangles) over the total number of triplets (both open and closed)."

"In this formula, a connected triplet is defined to be a connected subgraph consisting of three vertices and two edges."

I had tried editing the latter to read "and either two or three edges", but it was reverted. I don't see how both of these sentences can be true at the same time.

Michaelmalak (talk) 19:32, 5 October 2015 (UTC)[reply]

A complete graph on three vertices u, v, and w has three triplets, according to the current definition of a triplet: the paths u-v-w, v-w-u, and w-u-v. According to your proposed replacement, it would have four (the whole graph is a subgraph of itself and would also count as a connected triplet) but I expect that what you intended was for it to have only one (the three vertices considered as an unordered set rather than a choice of vertices and edges forming a subgraph). That changes the formula that you need to use to define the clustering coefficient. In particular, it should be a number from 0 to 1, but your proposed replacement turns it into a number from 0 to 2/3 (if you read what you wrote literally) or from 0 to 3 (if you go by what I think you intended). —David Eppstein (talk) 20:16, 5 October 2015 (UTC)[reply]
I believe my understanding is the same as yours. But just to make sure, let me run through an example. Let G have four nodes. a, b, and c are fully connected with three edges. There is a fourth edge from c to d. There are three closed triplets (a-b-c, b-c-a, and c-a-b) and two open triplets (a-c-d and b-c-d). Therefore the clustering coefficient is 3 / (2+3) = 0.6.
My concern is that the definition "In this formula, a connected triplet is defined to be a connected subgraph consisting of three vertices and two edges" suggests (at least to me) that closed triplets are to be excluded from the denominator and thus in my example, with this (in my opinion erroneously worded) definition of triplet, the clustering coefficient would be 3 / 2 = 1.5, which is outside the desired (0,1) range. Michaelmalak (talk) 22:58, 5 October 2015 (UTC)[reply]
Look at the word "subgraph", and notice that it does not say induced subgraph. —David Eppstein (talk) 23:29, 5 October 2015 (UTC)[reply]
I suppose that might be technically correct, but it is highly misleading. How about we just completely remove the sentence "In this formula, a connected triplet is defined to be a connected subgraph consisting of three vertices and two edges," since it conflicts apparently (but perhaps not absolutely) with "The global clustering coefficient is the number of closed triplets (or 3 x triangles) over the total number of triplets (both open and closed)"? Michaelmalak (talk) 14:21, 6 October 2015 (UTC)[reply]
I still don't see the contradiction. If a closed triplet is a triangle with one designated vertex, and an open triplet is an induced two-edge path, then the connected triplets (as defined by this sentence) corresponds one-for-one with the union of closed and open triplets. —David Eppstein (talk) 15:45, 6 October 2015 (UTC)[reply]
A closed triplet has three edges. "In this formula, a connected triplet is defined to be a connected subgraph consisting of three vertices and two edges" makes it sound like closed triplets are excluded from the denominator. Michaelmalak (talk) 15:54, 6 October 2015 (UTC)[reply]
Only if you continue to wilfully misunderstand what the word "subgraph" means. —David Eppstein (talk) 16:13, 6 October 2015 (UTC)[reply]
Was my attempted edit, "In this formula, a connected triplet is defined to be a connected subgraph consisting of three vertices and either two or three edges" incorrect? If so, how? If not, isn't it more clear than the current statment, "In this formula, a connected triplet is defined to be a connected subgraph consisting of three vertices and two edges"? — Preceding unsigned comment added by Michaelmalak (talkcontribs) 16:42, 6 October 2015 (UTC)[reply]