Wikipedia talk:Categorization policy

From Wikipedia, the free encyclopedia

Please add your opinion below!

Looking at the bigger picture, categories do not work in the vast majority of mirror sites. -- John Gohde 21:25, 25 Mar 2005 (UTC)

  • I like the proposal. WHEELER 14:56, 16 Mar 2005 (UTC)

Opposed

I see no need to change from the current system. Maurreen 18:44, 3 Apr 2005 (UTC)

Admins

I totally oppose the idea that only admins can make categories or rearrange them. I am an admin. There are only 400 or so of us in the English Wikipedia. We are already busy. Also, I see no evidence that as a group we are particularly more qualified than others for this task.

What we need is numerous strong WikiProject groups, each forming consensus and taking charge of categorization in a swath of subject matter. They don't need to be admins. -- Jmabel | Talk 07:54, Mar 17, 2005 (UTC)

  • Numerous strong project groups are a good idea. However, it is easy for a random user to create a bunch of categories and put articles in them, and it takes a lot of time for the groups to clean up the mess (see WP:CFD for hundreds of misspelled, miscapitalized, nonstandard or empty cats). I realize the admins are already busy, but this would not be a lot of work (simply create the few cats requested each week) and it would reduce the workload at CFD.
  • So a solution might be to get those project groups you propose, and make the head of one or two of them an admin. That way, the groups can do the work, and random people cannot accidentally break the system through well-intentioned but misguided edits. Radiant_* 09:02, Mar 17, 2005 (UTC)
  • I support the idea that only admins should create and arrange categories. The current situation favours a quick and impudent minority. If such a person creates a stupid category, or a category with an inappropriate name, and puts many articles into it, few people are tenacious enough to recategorize them. -Hapsiainen 13:14, Mar 17, 2005 (UTC)
  • I like the concept of project groups. What about this: any user can become a "category-admin" for a given category. Make the process fairly open and automated (no work for "real" admins). The "category-admins" would be able to manage things like article classification, sub-categories, etc. Feco 20:42, 10 Apr 2005 (UTC)

I oppose the restriction to admins. We have huge backlogs on admin tasks as it is. What we need are dedicated category-watchers, who need not be admins, who will jump on users who create silly categories. Categories can be deprecated, emptied and put on CfD without any admin intervention. A backlog of empty categories that still hang around but are never seen by anybody does no harm. It's not like we have a lot of category-related vandalism (touch wood...), so this really seems like an exaggerated restriction that would hurt the project (like protecting ITN etc.; this also hurts, but it was forced on us by the vandals) dab () 14:21, 17 Mar 2005 (UTC)

Actually, I think this policy might create less work for admins (actually, for everyone), not more! A lot of time and energy for both admins and regular users is spent deleting useless categories, because categories get created like wildfire. If we had some control over their creation, we'd have a lot less work to do deleting the useless ones. I mean, at this point in the project we should have most of the useful categories created anyway, so the rate of creating new ones ought to be low. Noel (talk) 14:51, 17 Mar 2005 (UTC)
Couldn't that argument be applied equally well to articles? I am quite sure that administrators currently spend more time deleting articles (and dealing with their deletion) than deleting categories. Categories are actually a fairly new feature. The system is nowhere near complete. There are massive numbers of articles that are uncategorized, and many category trees that are only partially complete.
Currently there are only a handful of administrators who work on category cleanup. Is that likely to change if this system is implemented? -Aranel ("Sarah") 00:28, 18 Mar 2005 (UTC)
We are talking about creating and arranging categories, not adding articles to them. Categories are different from articles, because a single category is far more widely present than a single article. They are navigational tools. And you need more volunteers to work on an article, because articles should contain more text than categories, so it isn't wise to restrict editing articles. But with categories you need vision and planning rather than sudden moves, so creating them shouldn't be as immediate as it is now.
If in the future creating and organizing categories became possible only for administrators, their work would just change, not increase. They would then have to keep an eye on the requests for new categories and on the categories for deletion pages. The action would move from the categories for deletion and speedy deletion pages to the requested categories page. I believe they would have less work, because there would be fewer silly or POV categories. -Hapsiainen 02:14, Mar 18, 2005 (UTC)
  • In response to Aranel... it isn't likely to change the amount of admins there, but the proposed system will allow them to work more effectively, because their cleanup work cannot be as easily undone by well-meaning but ill-advised random users. Radiant_* 09:52, Mar 18, 2005 (UTC)
Sure, but this argument applies equally to articles: will we protect them too, to save ourselves from undoing the additions of well-meaning but ill-advised random users? Really, this simply defeats the point of a wiki. It may be another thing to prevent re-creation of a deleted category. If there was a consensus not to have a certain category at some point, it is indeed pointless to allow people to re-create it (same as blank-protecting articles; I don't think the same can be done for categories). dab () 15:06, 18 Mar 2005 (UTC)
No-one wants to reserve editing articles only for administrators. People have written about how categories are different from articles, and how they need a scheme agreed on by the community rather than edits by arbitrary users. So categories need to be treated differently. But you don't comment on this; you just provide analogies, and you don't even explain why they are correct.
Still about applicability. It was a mistake to give administrators the ability to block users. The administrators are users, too, so it is logical that the other users should be able to do it as well. Reductio ad absurdum. Yes, this is sarcasm, and this is what you get with unfounded analogies. -Hapsiainen 20:52, Mar 18, 2005 (UTC)

Disagree

Disagree--A virtue of WP is its flexibility. Another is its refusal to elevate members to exalted status or restrict them to a minor role. This proposal takes WP in the direction of the old Nupedia: "Let's have real experts do it right the first time." That's been shown not to work.
This proposal does point to some real problems; some of these may be an unavoidable "cost of doing business". But I'll wager there are more than 2 Finnish botanists, perhaps even more than 2 worthy of note. We simply haven't gotten around to filling out the category. In 100 or 200 years, I wouldn't be surprised to see a dozen in there. We're just laying the groundwork today.
I think it is appropriate to understand that over the next several centuries, Wikipedia and its sister projects will probably become the central repository for human knowledge: the primary source of durable, factual information, absorbing and subsuming all others. Maybe your planning horizon does not extend quite that far. (Sony Corporation's chairman once said that Sony's long-term planning horizon was 300 years in the future.) That's fine; work on today -- it's all we have now.
None of us today can fix a plan for WP, except in the most general terms, that will endure for centuries. Let's just stay flexible and trust in the Wiki Way to produce a work of value from moment to moment -- which is all that anyone has any use for, anyway. — Xiong (talk) 15:07, 2005 Mar 17 (UTC)
  • No, not exactly. This proposal makes sure that categorization is done by community consensus, rather than by whatever user happens to feel like making a new category. Just like we require a community vote for deleting anything, we should (imho) also require a community vote for major changes such as adding categories. The current category system is degenerating into a useless mess, and it is too large to be overseen by, say, a Wikiproject Category Watchers if any user can make any change at any time. Radiant_* 15:45, Mar 17, 2005 (UTC)
Replace "category" with "article" in that post, and read it again. Just for kicks ;) -- grm_wnr Esc 16:11, 17 Mar 2005 (UTC)
  • Well, yes, but the inherent difference is that an individual category can reach over the entire 'pedia, and an individual article cannot. Nobody is saying that article creation should be restricted, that would be patently absurd and unWikiish. The slippery slope argument is a fallacy. Radiant_* 16:42, Mar 17, 2005 (UTC)

I also disagree with the proposed restrictions on category creation. I think the current chaotic state of categorization is just a transient phase; en's collection of articles is vast and there's still an ongoing wild profusion of new categories being created to hold everything. Once most articles are thoroughly categorized I believe the process will start shifting over to consolidation, with categories being renamed and shuffled about and gradually coalescing into a more standardized and rational structure. Trying to "lock" categories at this point would only slow that process down, IMO. Bryan 09:04, 19 Mar 2005 (UTC)

Categories can be a headache sometimes, but I still say NO NO NO NO NO. This proposal is way too inflexible to be implemented effectively. Worse, the 'requests for categories' would be huge, almost on the order of VfD. →Iñgōlemo← talk 16:29, 2005 Apr 10 (UTC)

Redundancy with lists

If it is going to be policy to redirect lists with the same name to categories, it should be specified in the policy listing that any user (or whoever) can edit the category page for things other than adding subcategories, including adding redlinks. And when the list is redirected, the redlinks existing should be put on the Category page. Gene Nygaard 15:14, 16 Mar 2005 (UTC)

  • Good point. Hereby added. If it is not technically possible for a user to be able to add redlinks but not add subcats, the redlinks could instead be kept on the category's talk page. Radiant_* 15:40, Mar 16, 2005 (UTC)

Totally opposed to restricting category creation

This is one of the worst policy proposals I have ever heard. I am not an admin and I don't want to be one, but I have created hundreds of categories. I created over a third of the categories in category:United Kingdom, category:London and Category:cricket. I have created hundreds and hundreds in total. This policy would be an utter disaster for the category system which could delay its full implementation by years. Wincoote 11:01, 19 Mar 2005 (UTC)

I agree, the proposal just adds needless bureaucracy. As it is there aren't enough admins to do the far more necessary Copyvio deletions work, we're hardly going to have loads of admins wanting to create lots and lots of categories they know nothing about, jguk 11:52, 19 Mar 2005 (UTC)
  • You misunderstood the proposal. The idea is not that admins should decide what categories are to be created. The idea is to create a centralized page that lists all categories, and all of them are to be created unless objections exist (i.e. if a category is misspelled, or redundant, or pointless). Radiant_* 14:59, Mar 19, 2005 (UTC)
    • Once again your assumptions about me are patronising and wrong. I did not misunderstand the proposal. I don't want to have to wait five days. Nor will other category creators. Thus this proposal will drastically slow down the implementation of the category system, which many people think is one of the best facilities in Wikipedia. Wincoote 14:05, 22 Mar 2005 (UTC)
      • Er, no, my comment was to JGuk, not to you. Radiant_* 15:01, Mar 22, 2005 (UTC)
        • See comment on my talk page. Wincoote 15:41, 22 Mar 2005 (UTC)
We already have CfD, which unfortunately has wound up filling a far broader niche than just deletion. I prefer the approach of fixing these things after-the-fact rather than making sure we "get it right" on the first edit, that seems more like the proven-successful Wiki way to me. Bryan 19:05, 19 Mar 2005 (UTC)
  • Very well. But does either of you agree that the current category system is getting cluttered up? Would you think it useful that a WikiProject is started to simply look into that and streamline its consistency? Just like there are WikiProjects on stub sorting and spell checking? For instance, one user might create a category 'London churches', while another creates a category 'churches in London'. That's redundant and potentially confusing. Radiant_* 14:59, Mar 19, 2005 (UTC)
I definitely agree with that. Some sort of large-scale WikiProject for cleaning up and making category structures consistent would be wonderful. Bryan 19:05, 19 Mar 2005 (UTC)
I think categories might need a major rethink from the software implementation point of view before a major effort is made to fix the categories. At this point, inspecting the categories that have been created brings to light many problems. For example, take that 'London churches' category. If there is an article on some London church, wouldn't one want to put it in the 'London' category, and in the 'church' category, and *LET THE SOFTWARE* figure out that it was therefore a 'London church'? At present, if there are on the order of 100,000 cities, and on the order of 100 types of buildings, there will be 10 million categories similar to 'London church'. Do we want that? The whole system by which articles are indexed and classified needs a bit of a rework. --BM 22:02, 19 Mar 2005 (UTC)
That's a strawman, I doubt there'll be anywhere near that many cities with so many notable churches in them that they'd warrant separate categories. There are only 500,000 articles in all of Wikipedia, and since "random page" gets me a city's article far fewer than 1/5 of the time most of those 100,000 cities don't even have simple articles of their own yet. And what are these hundred types of buildings that all cities will have? Bryan 03:26, 21 Mar 2005 (UTC)
BM, you should read my hierarchical proposal above. Instead, it would be "/England/London/Churches/" (take your pick on England vs. UK vs. whatever). Cburnett 03:35, 21 Mar 2005 (UTC)
Cburnett, I read your proposal, and the main merit I see in it is that hierarchies are easy to understand, and they provide an obvious way to browse and navigate the category structures. A cyclic graph of categories is general and powerful, but it is hard to devise user interfaces for navigating it. Also, when we are relying on article editors to maintain a cyclic graph of categories, it is a bit too powerful: it is a bit too easy for people to mess up the categories, and the process for undoing their mistakes is cumbersome. That said, I do think it is necessary to have multiple hierarchies. The categories should form a forest of trees, and an article should be allocated to at most one category in any tree, although it could be in multiple trees. My point above is that we should avoid creating categories that simply represent the intersection of other categories. For example, /England/London/Churches is not a logical hierarchy. A more logical hierarchy is geographical: /England/London/Knightsbridge. Another logical hierarchy is /Architecture/Building/Church. There might be a time dimension, too: for example, "14th Century". An article about a church in Knightsbridge built in the 14th Century would be put in categories from as many hierarchies as are applicable, at the lowest level of granularity applicable. For example: Knightsbridge, Church, and 14th Century. This would automatically put the church article into the London, England, Building, and Architecture categories, as these are parent and grandparent categories of Knightsbridge and Church. There would not be a need for categories like "England church", "London church", "14th Century Church", "14th Century London Church", or "England building", unless there was some special text that one wanted to associate with that particular intersection of the basic categories.
It should be possible for someone simply to list articles that are in the intersection of categories from any number of hierarchies to get whatever level of granularity is desired. --BM 15:01, 22 Mar 2005 (UTC)
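A minimal Python sketch of the model BM describes, under assumptions that are not part of the proposal itself: each hierarchy is a tree (one parent per category), an article carries only its lowest-level tags, and the category names and both article titles are hypothetical. Membership in parent and grandparent categories is derived rather than stored, and an intersection such as London + Church is computed on demand instead of existing as its own category.

```python
from typing import Dict, Optional, Set

# Hypothetical forest of hierarchies: each category maps to its single parent,
# and a root category maps to None.
PARENT: Dict[str, Optional[str]] = {
    "England": None,
    "London": "England",
    "Knightsbridge": "London",
    "Architecture": None,
    "Building": "Architecture",
    "Church": "Building",
    "14th Century": None,
}

def root_of(category: str) -> str:
    """Follow parent links up to the root of the tree containing `category`."""
    while PARENT[category] is not None:
        category = PARENT[category]
    return category

def ancestors(category: str) -> Set[str]:
    """Return the category itself plus every parent and grandparent above it."""
    result: Set[str] = set()
    current: Optional[str] = category
    while current is not None:
        result.add(current)
        current = PARENT[current]
    return result

def tag(article_tags: Set[str]) -> Set[str]:
    """Expand an article's explicit tags into every implied category.

    Enforces "at most one category in any tree" by rejecting two tags
    that share a root.
    """
    roots = [root_of(c) for c in article_tags]
    if len(roots) != len(set(roots)):
        raise ValueError("article tagged twice within one hierarchy")
    implied: Set[str] = set()
    for c in article_tags:
        implied |= ancestors(c)
    return implied

# Hypothetical articles, each tagged only at the lowest applicable level.
ARTICLES = {
    "Example church (Knightsbridge)": tag({"Knightsbridge", "Church", "14th Century"}),
    "Example house (London)": tag({"London", "Building"}),
}

def intersect(*categories: str) -> Set[str]:
    """List articles in every given category: an on-the-fly 'London church'."""
    return {name for name, cats in ARTICLES.items() if set(categories) <= cats}

print(sorted(intersect("London", "Church")))
# ['Example church (Knightsbridge)']
print(sorted(intersect("England", "Building")))
# ['Example church (Knightsbridge)', 'Example house (London)']
```

The design choice here is that only the explicit tags would need editing; everything else follows from the parent links, so no "London church" category has to be created or maintained.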
Ugh, I hate this idea. I have created many categories to split stuff off from existing ones; this just creates another level of bureaucracy. Hopefully one day we can have category redirects working as they should to handle the problems with the current system. --SPUI (talk) 11:36, 20 Mar 2005 (UTC)
Not in the slightest. It is still in the early stages of development. The idea that categories which don't contain hundreds of entries are invalid is incomprehensible to me. Wincoote 15:41, 22 Mar 2005 (UTC)
  • BM, the system you are describing is a directed acyclic graph (DAG). It is neither a tree (in the graph-theory sense) nor a cyclic graph, though in some ways it does resemble a tree hierarchy. The use of DAGs for categorisation systems seems to be accepted as a good idea (see, for instance, the concept of ontologies). --Kieran 10:14, 11 Apr 2005 (UTC)
  • In response to the above comment about the software working out that an article in London and Churches is effectively in Churches in London: This might save on some disk space by reducing the number of categories, but will probably be more difficult to use. If the entire category creation system were automated, and we had sophisticated software for category browsing, it would be fine, but we don't. --Kieran 10:14, 11 Apr 2005 (UTC)
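The DAG property Kieran mentions can be checked mechanically. A minimal Python sketch, assuming categories are stored as a mapping from each category to its list of parent categories (a hypothetical representation, not how MediaWiki stores them) and that the example category names are illustrative only: a depth-first search reports a loop if following parent links ever returns to a category already on the current path.

```python
from typing import Dict, List, Optional

def find_cycle(parents: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one cycle of categories, or None if the graph is acyclic (a DAG)."""
    UNSEEN, ON_PATH, DONE = 0, 1, 2
    state: Dict[str, int] = {}
    path: List[str] = []

    def visit(node: str) -> Optional[List[str]]:
        state[node] = ON_PATH
        path.append(node)
        for parent in parents.get(node, []):
            s = state.get(parent, UNSEEN)
            if s == ON_PATH:
                # Back edge: this parent is already on the current path.
                return path[path.index(parent):] + [parent]
            if s == UNSEEN:
                cycle = visit(parent)
                if cycle is not None:
                    return cycle
        path.pop()
        state[node] = DONE
        return None

    for category in parents:
        if state.get(category, UNSEEN) == UNSEEN:
            cycle = visit(category)
            if cycle is not None:
                return cycle
    return None

# Hypothetical graph: "German historians" has two parents, so this is not a
# tree, but it has no cycles, so it is still a DAG.
CATEGORIES = {
    "German historians": ["Historians", "German people"],
    "Historians": ["People"],
    "German people": ["People", "Germany"],
}
print(find_cycle(CATEGORIES))  # None

# Making "People" a subcategory of "German historians" closes a loop.
broken = dict(CATEGORIES, People=["German historians"])
print(find_cycle(broken))
# ['German historians', 'Historians', 'People', 'German historians']
```

A check like this would let multiple parents stay allowed while still catching the kind of circular categorisation that makes browsing confusing.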

I agree with Wincoote on this - SoM 23:16, 17 Apr 2005 (UTC)

Locking categories is evil

This is a horrible proposal. It goes against the very nature of Wikipedia. That is, the ability to edit anything and be bold. Right now, if I see something wrong with a category, I can just fix it. I don't have to appeal to the council of category gods and wait five days to see if my application is in order.

Locking categories will drastically restrict gradual improvements and lead to stagnation. Locking categories will also cause many editors to say "up yours, Wikipedia!" and find ways to organize things without categories. That is, they'll use lists and navigational templates and pseudo-categories instead. In other words, locking categories will make categories useless and obnoxious.

I'm all for an effort to create category standards and work with the local wikiprojects to share knowledge and create a grand category scheme, but this is absolutely the wrong way to do it. Make a WikiProject instead (See WikiProject Categories), and make it easier for people to find out about current guidelines and best practices. As Cburnett said, "The answer is education instead of legislation". - Pioneer-12 22:53, 31 Mar 2005 (UTC)

Yes. Thank you, Pioneer-12. Septentrionalis 21:30, 8 Jun 2005 (UTC)

Proposals

Replace categories with hierarchical filing system

(This is probably the wrong place, but what the heck.) I think this is a symptom of a larger problem; both categories and disambiguation pages hint at this problem.

The problem is that WP puts all articles in the same "directory." This is much like a filesystem design problem. Do you put every file on your computer in the same directory? No, of course not because eventually you'll want to name two things the same thing or you have to put descriptive information in the name "2004 budget.xls" instead of "/2004/Budget.xls". (Or like library information: do you think the Library of Congress would consider sorting all 120 million books on a single 520 mile shelf?). Use a hierarchy.

This would be my proposal (I won't claim it's iron clad) before supporting this new policy. Take Enterprise as an example and apply a hierarchy (not all shown):

  • /Television/Series/Science fiction/Star Trek/Ships/Federation/Enterprise/NCC-1701
  • /Television/Series/Science fiction/Star Trek/Ships/Federation/Enterprise/NCC-1701-A
  • /Television/Series/Science fiction/Star Trek/Ships/Federation/Enterprise/NCC-1701-B
  • /United States/NASA/Space Shuttles/Enterprise
  • /United States/Navy/Ships/Enterprise
  • /United States/Kansas/Enterprise
  • /United Kingdom/Navy/Ships/Enterprise
  • /Business/Terms/Free enterprise
  • /Psychology/Terms/Enterprise
  • /Canada/Northwest territories/Enterprise

Now, going to /Enterprise would yield a dynamic disambiguation page that would pull Enterprise from all subcategories; going to /United States/Enterprise would pull all Enterprise entries from all subcategories of /United States.
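A minimal Python sketch of that dynamic disambiguation page, assuming articles are addressed by slash-separated paths like those in the Enterprise list (a hypothetical scheme; Wikipedia has no such paths). A path matches when one of its components equals the requested title exactly, and it is listed only up to that component, so the three Star Trek ships collapse into one entry; "Free enterprise" is not an exact match and is left out.

```python
from typing import List

# Paths copied from the Enterprise example list.
PATHS: List[str] = [
    "/Television/Series/Science fiction/Star Trek/Ships/Federation/Enterprise/NCC-1701",
    "/Television/Series/Science fiction/Star Trek/Ships/Federation/Enterprise/NCC-1701-A",
    "/Television/Series/Science fiction/Star Trek/Ships/Federation/Enterprise/NCC-1701-B",
    "/United States/NASA/Space Shuttles/Enterprise",
    "/United States/Navy/Ships/Enterprise",
    "/United States/Kansas/Enterprise",
    "/United Kingdom/Navy/Ships/Enterprise",
    "/Business/Terms/Free enterprise",
    "/Psychology/Terms/Enterprise",
    "/Canada/Northwest territories/Enterprise",
]

def disambiguate(title: str, under: str = "/") -> List[str]:
    """List entries named `title` anywhere below the directory `under`."""
    prefix = under.rstrip("/") + "/"
    hits: List[str] = []
    for path in PATHS:
        if not path.startswith(prefix):
            continue
        parts = path.split("/")
        if title in parts:
            # Cut the path at the matching component, so a directory named
            # `title` (the Star Trek ships) is listed only once.
            hit = "/".join(parts[: parts.index(title) + 1])
            if hit not in hits:
                hits.append(hit)
    return hits

for entry in disambiguate("Enterprise", under="/United States"):
    print(entry)
# /United States/NASA/Space Shuttles/Enterprise
# /United States/Navy/Ships/Enterprise
# /United States/Kansas/Enterprise
```

Calling `disambiguate("Enterprise")` with the default root would return seven entries, one per distinct Enterprise in the list.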

Enter categories. Categories are really this hierarchy placed over a flat system. You could call Category:Star Trek ships equivalent to /Television/Series/Science fiction/Star Trek/Ships/

Consider putting an article into multiple categories like hard linking or symbolic linking. Categorization is trying to put a hierarchy on a flat file system. The real problem is that the article names on Wikipedia have no inherent order to them, thus the need to disambiguate The Alamo (1936 film) from The Alamo (1960 film) instead of:

  • /Films/1936/The Alamo
  • /Films/1960/The Alamo
  • /Films/1987/The Alamo
  • /Films/2004/The Alamo

and The Alamo could be generated from the articles and their path.

Converting WP from a flat file to a hierarchy would make categorization implicit. Prefix all the above with "/en" and you get your multilingual WP. In the end, this whole process is reinventing the wheel with what file systems, library information, URI/URL development, etc. have all gone through. So why not learn from them?

</rambling> Cburnett 21:11, 17 Mar 2005 (UTC)

Great idea, but what would it take to accomplish this, and how long would we be waiting for it? -Kbdank71 21:30, 17 Mar 2005 (UTC)
And there's the problem. I would guess that it'd pretty much require a fresh start in coding. I said it wasn't iron clad and I also never said it would be easy. :) Nonetheless, I think the categorization policy posed is solving a symptom and not the real problem.
I guess I don't see the "category crisis" (just made that up) as a crisis...at least it's being played that way. Come on, requiring admins to pretty much do all of the categorization now...that's a pretty drastic change. Cburnett 21:53, 17 Mar 2005 (UTC)
Nobody is suggesting that admins do all the categorization. The suggestion is that categorization is done by consensus of whomever is willing to help. It only requires an admin to create/rename the cats as decided (and that'd take all of twenty seconds). Then the people willing to help will fill the cat. 84.81.42.123 22:57, 17 Mar 2005 (UTC)
Touché. I apparently didn't read #4 carefully enough. Still, I don't see the need to require admin intervention in the process. See the following heading for discussion of a more realistic proposal. Cburnett 00:31, 18 Mar 2005 (UTC)
Notice, however, that OS vendors have been expending a great deal of effort to get around the limitations imposed by hierarchical organization, always attempting to offer better search facilities. The metadata being used for some of those search schemes looks an awful lot like MediaWiki's categories, and the other big piece of those search tools is to find things by name. Would fitting everything into a single hierarchy be a step forward, or backward? --iMb~Mw 21:39, 17 Mar 2005 (UTC)
Searching the contents of a file is irrelevant to the storage method (flat, hierarchical, etc.); searching by file name is irrelevant to the contents of the file. (Just to clarify, what are the limitations of the hierarchical file system that apply here?) The purpose of categorization is to link similar articles together, which can be directly equated to directories. For example, I could put all of the photos from my vacation to Fiji in one directory, but I could also create another directory for all the pictures my mother is in (which, theoretically, may overlap with my theoretical Fiji vacation pictures). I could create both directories and hard link the same photo in both, and I would get the same results as if I had just thrown the files somewhere and indexed (read: categorized) their metadata (where & who's in it) to search. Cburnett 21:53, 17 Mar 2005 (UTC)

How would the choice of directory (and of the appropriate directory structure and directory names) be any different from the current situation with categories? -Aranel ("Sarah") 00:29, 18 Mar 2005 (UTC)

Ever move a file on your computer? If the current structure is disagreeable then just move it. For example, let's say it shouldn't be "/Television" but rather "/Entertainment/Television" then move it. When you rename a directory on your computer, it doesn't require renaming of anything other than the directory. Cburnett 00:51, 18 Mar 2005 (UTC)
Disagree.-- The category structure is a more general graph than a tree. Trees are excessively rigid, in that a given node may only have one parent. WP articles may belong to more than one category, and that's a Good Thing. The last thing we need is a hierarchy.
I do agree that categories, as well as some lists, templates, and disambiguation pages all point to a missing component in the WP engine: a more advanced method of documenting and reporting the relationships between articles. The overall structure of WP (as the larger net itself) may be represented by a complex directed graph with both colored edges (A is the Talk page of B; A is the Disambiguation page of B; A is a Template used on B) and colored nodes (namespaces and categories). This is a stubborn subject for analysis, considering that not only are the number of nodes and their interconnecting edges constantly changing; even the lists of possible colors for nodes and colors for edges are dynamic.
By "stubborn", I mean that it is, I believe, much worse than NP-complete. In one sense, analysis is not only impossible, but forbidden; the rules themselves keep changing. From an engineering perspective, the task is formidable indeed. Dynamic creation of a new edge color automatically spawns a host of specific nodes linked to existing ones and to each other via rules that are also subject to edit; each change, however minor, spawns another archival version, with all its links to the nodes which existed at that moment. A user peering through the time machine at an old page must consider that the rules governing a certain color of edge may have changed in the interim.
Now, I can fix this. I'll need a team of programmers, a nice, well-lighted office, and enough chocolate-chip cookies to keep the hamsters, er, coders going for a few months. By the time we're done, you'll hardly recognize the site. When you log in, naked angels (of your preference) will swim out of the window, stroke your lapel, and purr. The navigational links will parade across your screen doing the can-can and Michigan J. Frog will tip his hat. Who pays?
The entire current structure must be replaced by a general metamodel in which all graph objects -- nodes, edges, colors, rules, users, versions -- are represented by identified nodes in a fully connected graph. Of course, this increases the size of the graph somewhat.
The current structure is messy, but fixing it will be a really messy job. — Xiong (talk) 10:03, 2005 Mar 18 (UTC)
Naked angels, eh? When can you start?  :) -Kbdank71 13:35, 18 Mar 2005 (UTC)
Sorry; not today. :) I thought it over and realized that the graph did not have to be fully connected (complete) and the engine restructuring might actually not be too bad. Migrating all content to the new engine might be nasty work. I'm working on a proposal anyway; perhaps in 150 years or so, we'll have the resources -- and we'll surely have the burning desire. — Xiong (talk) 00:00, 2005 Mar 20 (UTC)

the horror! the pov battles! (/people/rulers/usurpators/dictators/g_w_bush or /ideologies/suprematism/racism/judaism anyone??) they attempted these semantic trees since the 1600s. needless to say, with very little success. the categories are a rough gesture at such a hierarchical system, but we wouldn't want to base wiki namespace on it! believe me, we're better off with "all articles on the same shelf". dab () 15:15, 18 Mar 2005 (UTC)

strong disagree. The hierarchical system was the original proposal that was ditched in favour of categories. Categories allow the flexibility to allow an article to be in more than one category. For example I've just followed a random link to Karl Gustav Homeyer. That article is in the following categories:

  1. category:1911 Britannica
  2. category:1795 births
  3. category:1874 deaths
  4. category:German historians.

Each of these categories is a subcategory of two other categories, some of which are themselves subcategories of more than one category. Where would you put that article? Thryduulf 16:11, 18 Mar 2005 (UTC)

Answer: links. Quote:
"Consider putting an article into multiple categories like hard linking or symbolic linking."
To Xiong and Thryduulf, next time read my proposal before lashing out with the "disagrees". Notably, exactly how does one get a tree when links are allowed? That's right, it makes them a graph. Seriously, read first then respond. Not too hard, eh? Cburnett 18:22, 20 Mar 2005 (UTC)
Maybe I wasn't paying attention. Sorry. You started this thread with the words "hierarchical filing system"; your introduction was This is probably the wrong place...; and you closed your comment with </rambling>.
I simply oppose all static structures. I'm working on a dynamic model for WP content organization which might or might not be ready to look at before I die. If not, rest assured I'll upload the bits and pieces so Somebody smarter can put them together. — Xiong (talk) 02:07, 2005 Mar 21 (UTC)
A hierarchical system need not be static. I elaborated a bit on it at User:Cburnett/Hierarchy. Much like categories as they are now, you would insert tags that define where the article is (see just after the first Raegen listing). It would be very much like linking an article into a directory. You can do it as many times as you want, and each link would be completely independent of the grandparent location (tied only to the parent directory). It is truly a graph, but one specifically oriented to be presented as a tree (based on the assumption that knowledge can be arranged hierarchically).
I think it's the wrong location primarily because it's probably too radical an idea to be discussed on a page about a policy consideration. If there's a better forum, then I'd be very interested in knowing. Cburnett 03:19, 21 Mar 2005 (UTC)

Other than the ability to have two articles with the same name, I don't see a fundamental difference between this proposal and the current categorization scheme. Currently, you can consider every categorized article to be in the "directory" (usually multiple) /fundamental/.../category/article (assuming the category itself or some parent is not orphaned). Allowing two different articles to have the same name so long as they are in different "directories" doesn't really solve the disambiguation issue Wikipedia faces. For example, two different movies with the same title should BOTH appear in /...(whatever).../movies/HERE. You can require that this not happen (in this example) by enforcing a year "directory" between /movies/ and its contents, but you haven't fixed the real issue, you've just moved it from articlename to pathname/articlename. -- Rick Block 20:46, 22 Mar 2005 (UTC)

Different proposal

Instead of requiring admin intervention, perhaps the category creation page needs to be changed. For example, [1] says *nothing* about the category naming guidelines (or any naming guidelines directly).

Start simple. Change this page to include "Please read the ____ naming guidelines before saving." I have a strong suspicion that the badly named categories are the fault of ignorance rather than malice. The answer is education instead of "legislation." Cburnett 00:40, 18 Mar 2005 (UTC)

There is a similar message about copyrights on the file upload page, but some people still don't care about it. -Hapsiainen 02:14, Mar 18, 2005 (UTC)
I'm struggling to find your point. When I first joined, if there hadn't been anything on that page (as there isn't on the new article/category/template creation page), I wouldn't have known about copyright tags until someone pointed them out to me. It would appear I'm living proof that actually providing instructions can do good.
The category creation page contains nothing about naming guidelines, nor does it explain when to use a category vs. a list vs. a template.
NOTHING
So again: what's your point? The real question you should ask yourself is whether putting "Please follow these guidelines:" on the upload page has increased the number of uploads that adhere to those guidelines. Cburnett 02:28, 18 Mar 2005 (UTC)
My point is that misbehaviour would still occur even if we gave people guidelines. I admit I exaggerated it. However, I didn't say "all people". I agree that guidelines would reduce the number of unintentionally badly named categories. But there are still people who don't care. They want to have some fun in Wikipedia and create silly categories. Or they insist on doing as they wish, or at least think they must. Or they don't bother to examine how things are currently done. Or they think that their POV categories are neutral. Or it is unclear to everyone what is appropriate until they discuss and negotiate.
It is easier to create a POV category than a POV image, because categories classify and label the articles, images are just something extra. Categories are more central. -Hapsiainen 03:16, Mar 18, 2005 (UTC)
To be frank and honest, your point isn't worth addressing. Why? If the goal is to remove all misbehavior, then WP would be locked down. Only Jimbo could edit. Better yet, just disallow editing. Guaranteed no vandalism. Guaranteed abidance by guidelines. Guaranteed WP:CFD, WP:RFD, WP:IFD, etc. traffic would drop to nil.
WP will always have some misbehavior as long as it's open to anyone.
Start simple. Right now, there's NOTHING on the creation page guiding anyone. You haven't come anywhere *near* convincing me the problem is malicious editors and not ignorant users. Change that page and see how things improve. Why revamp policy when the bulk of the problem can be solved by educating users? Locking things down won't improve their understanding as much as a "Please follow these guidelines:" notice on the page. Cburnett 06:52, 18 Mar 2005 (UTC)
"If the goal is to remove all misbehavior". You are making a straw man of my arguments and kicking it. I have already explained why misbehaviour and laziness are more serious when creating and arranging categories than when editing articles, and that writing and arranging categories requires fewer hands than writing articles. Could you finally respond to my point? I haven't written all of it in this thread, though, so your mistaken idea of my opinions could be due to that. -Hapsiainen 12:18, Mar 18, 2005 (UTC)

See Meta:Instruction creep. Specifically, "The fundamental fallacy of instruction creep is thinking that people read instructions. If people read instructions, we wouldn't have the problem the new instruction is meant to solve." The problem isn't malicious users. But the solution isn't adding an extra warning to the templates (even though it wouldn't hurt either). Radiant_* 13:45, Mar 18, 2005 (UTC)

The premise of instruction creep is that there already is some instruction there. I'm curious how adding any instructions could result in fewer people following the instructions... where there are none to follow. Cburnett 18:16, 20 Mar 2005 (UTC)


Strict hierarchy is not the right approach

I disagree with the filesystem-directory-like filing system proposal. Wikipedia has abandoned subpages for a reason. Modern information management software (e-mail clients, personal information managers, etc.) is also abandoning hierarchical folders, preferring to have a big bag of data out of which users can pull out certain slices which are of interest at the moment. The world's information cannot be classified into a neat hierarchy; it really is more of a directed graph, with arbitrary connections. The current idea of arbitrary category membership of articles and subcategories does a better job of handling complexity like this than a flat hierarchy would. Non-hierarchical relationships would just end up sneaking back in with symbolic-link-like cross-references, anyway. It's also easy to establish a strict hierarchy on top of an arbitrary graph infrastructure (which is what categories sometimes do) but not the other way around. -- Beland 06:16, 21 Mar 2005 (UTC)

I'll repeat, again, for the people in the back who still aren't paying attention to or reading my proposal thoroughly:

Answer: links. Quote:
"Consider putting an article into multiple categories like hard linking or symbolic linking."
To Xiong and Thyrduulf, next time read my proposal before lashing out with the "disagrees". Notably, exactly how does one get a tree when links are allowed? That's right, it makes them a graph. Seriously, read first then respond. Not too hard, eh? Cburnett 18:22, 20 Mar 2005 (UTC)

So, do I need to reiterate this point? It is primarily a hierarchy of information, but with the ability to not be a tree. Being able to file the same article in multiple "directories" does not mean that information is not generally hierarchical.

The shift that you speak of is away from rigid hierarchies to a hierarchy of metadata. gmail still has a hierarchy based on labels and conversations. The hierarchy is still there, just less obvious. What's most interesting about the gmail example is that there still is a folder hierarchy of inbox, starred, drafts, sent, spam, & trash EXCEPT messages can be in different folders. Hmm, sounds kind of like my proposal where I include linking of articles in different "directories" to achieve the same thing.

If anything, my proposal makes the categories less transparent and more obvious. So "/Television/Series/Science Fiction/Star Trek" or something, instead of just "Star Trek".

Also, where did this strict in "strict hierarchy" come from? It appears you *really* did not read my proposal since I never said it (quite the opposite actually). Cburnett 06:39, 21 Mar 2005 (UTC)

I've just reread your proposal and at this point I really can't see what the significant differences are between it and what we already have. You yourself said "It is truly like a graph but specifically oriented to be presented as a tree." The current category system is a graph but its subcategories are almost universally arranged into a forest of cross-linked trees (and those bits that aren't are probably in violation of categorization guidelines anyway), making presenting current categories as trees quite straightforward. What does your directory structure proposal do that a directed graph can't, and why is that worth the effort of switching over to it? Bryan 07:44, 23 Mar 2005 (UTC)
Perhaps a third read is in order? :) Articles wouldn't be located at the root level (like on WP right now). Cburnett 19:21, 23 Mar 2005 (UTC)
It seems to stem from a fundamental misunderstanding of why subdirectories ever existed (they were a workaround to let many things be addressed from a small working set, with the major tradeoff being the need to traverse a tree to find anything). This has always been a suboptimal arrangement. Roget used a tree-like organization to save paper, with the cost that finding anything takes multiple steps. Britannica adopted something like that in the 1970s, making it arguably easier to tie together subjects on paper but again requiring multiple lookups to get at all of a subject.
An assumption that information can fit into a tree is exactly what stalled Wikispecies: while species can be and are classified into hierarchical structures, there are multiple, incompatible groupings in use. The current MediaWiki scheme, where multiple relationships are drawn around the entries rather than the reverse, and areas that don't fit together aren't forced into artificial relationships, is really quite sensible. --iMb~Mw 10:09, 23 Mar 2005 (UTC)
Two things:
  1. I must not keep up on my literature, but what exactly is the optimal method for storing information? I wasn't aware one had been found.
  2. I didn't say tree; I said hierarchy. A hierarchy does not necessitate a tree.
Cburnett 19:21, 23 Mar 2005 (UTC)


Lists and categories

In the long run, categories need to be a.) faster and b.) easier to edit. (It's absolutely ridiculous that I had to write a bot to facilitate category moves.) The proposal of storing a list of the members of a category in a single file, perhaps as an "enhanced" article, aligns well with both of these goals. The important thing to preserve is that changes made in one place show up in all other relevant places. That is, if you edit an article and add it to a category, that change shows up on the category page. If you edit a category page and change the membership list, that change needs to show up in all the affected articles. Lists can't do that, and that makes them potentially harder to maintain. The proposed re-engineering would also make it possible for categories to be annotated, just like lists. At that point, the only thing that lists would be good for that categories wouldn't, would be to create groups that aren't listed on the article pages themselves. This can be good, because some articles would get cluttered up by all their memberships. If a feature were added to enable categories to be "hidden", then categories would essentially obsolete lists, and then it would be a good idea to convert all lists into categories. (I assume there would be a way for curious readers and editors to see an article's "hidden" category memberships.) -- Beland 06:16, 21 Mar 2005 (UTC)


Beland is right. The current category implementation is poor. Category membership should be implemented as a single "metadata" file, not distributed among all the members of the category.

This would:

  1. Make categories MUCH easier to edit and maintain.
  2. Cause any changes to the categorization to appear in one location, instead of being distributed among all the members of the category.
  3. Allow categories to be annotated, like lists.
  4. Allow categories to be easily sorted and subsorted (and also sorted in multiple ways, if the category entries and annotations are treated like a flat-file database).
  5. Allow categories to effectively link to empty pages, as lists do. (There could be an option to turn viewing of empty pages on or off.) This setup would also allow newly created pages to auto-link to categories!

A system like this would make both lists AND current-style categories obsolete. Alright, where's the main discussion on "Categories version 2"? - Pioneer-12 21:57, 31 Mar 2005 (UTC)
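A minimal Python sketch of the single-membership-record idea, assuming a hypothetical in-memory store rather than anything MediaWiki actually provides: every membership change goes through one add/remove path that updates both the category listing and the per-article view, so an edit made in either place shows up in both. Entries can carry an annotation, need not correspond to existing pages (point 5), and the listing can be re-sorted by any key (point 4).

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List, Optional, Set, Tuple


class CategoryStore:
    """Hypothetical central membership store with two synchronized views."""

    def __init__(self) -> None:
        # category -> {member title: annotation}; titles need not exist yet
        self._members: DefaultDict[str, Dict[str, str]] = defaultdict(dict)
        # member title -> categories it belongs to
        self._cats_of: DefaultDict[str, Set[str]] = defaultdict(set)

    def add(self, category: str, title: str, note: str = "") -> None:
        self._members[category][title] = note
        self._cats_of[title].add(category)

    def remove(self, category: str, title: str) -> None:
        self._members[category].pop(title, None)
        self._cats_of[title].discard(category)

    def categories_of(self, title: str) -> List[str]:
        return sorted(self._cats_of.get(title, set()))

    def listing(self, category: str,
                key: Optional[Callable[[Tuple[str, str]], object]] = None
                ) -> List[Tuple[str, str]]:
        """(title, annotation) pairs, sorted by title unless another key is given."""
        entries = self._members.get(category, {})
        return sorted(entries.items(), key=key or (lambda kv: kv[0]))


store = CategoryStore()
store.add("Botanists", "Carl Linnaeus", "Swedish")
store.add("Swedish botanists", "Carl Linnaeus")
store.remove("Swedish botanists", "Carl Linnaeus")
print(store.categories_of("Carl Linnaeus"))  # ['Botanists']
```

Keeping two indexes written through one code path is just one way to get "change it once, see it everywhere"; a real implementation would more likely use a single database table queried in both directions.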


Proposal: "New category" warning

The most common categorization mistake, and the one we are actually currently the most backlogged in fixing, is when someone puts an article in a category that doesn't exist or isn't connected to the universal category hierarchy. These errors are actually easy to detect. Every database dump, I have Pearle do a scan and dump a list on Category:Orphaned categories. I would support adding a "warning" message when someone tries to commit an article that links to a non-existent category:

Category:Foo does not yet exist. You might want to check your spelling or see if there is an appropriate existing category. Choose 'Save anyway' to create this category. Be sure to edit the category after it's created and put it in the appropriate parent category in the existing tree.

If someone is intentionally creating a category that the enlightened majority will think is malformed, that's fine with me. It's much easier to clean up these few instances than the large number of errors and blind assignments. -- Beland 06:16, 21 Mar 2005 (UTC)
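The detection Beland describes can be sketched as a breadth-first search downward from the root category. This is a minimal Python sketch, not Pearle's actual code: it assumes a hypothetical map from each existing category page to its parent categories, plus the set of category names that articles refer to. It reports categories that are referenced but have no page, and categories that have a page but cannot be reached from the root.

```python
from collections import deque
from typing import Dict, List, Set, Tuple


def find_orphans(category_parents: Dict[str, List[str]], root: str,
                 referenced: Set[str]) -> Tuple[Set[str], Set[str]]:
    """Return (missing, disconnected) category names.

    category_parents: every *existing* category page -> its parent categories.
    referenced: every category name used by at least one article.
    """
    # Invert parent links so we can walk downward from the root.
    children: Dict[str, List[str]] = {}
    for cat, parents in category_parents.items():
        for parent in parents:
            children.setdefault(parent, []).append(cat)

    # Breadth-first search from the root marks everything connected to it.
    reachable = {root}
    queue = deque([root])
    while queue:
        for child in children.get(queue.popleft(), []):
            if child not in reachable:
                reachable.add(child)
                queue.append(child)

    missing = referenced - set(category_parents) - {root}  # no category page
    disconnected = set(category_parents) - reachable       # page exists, no path
    return missing, disconnected


existing = {"Fundamental": [], "Science": ["Fundamental"],
            "Botanists": ["Science"], "Stray": []}
print(find_orphans(existing, "Fundamental", {"Botanists", "Botanits", "Stray"}))
# ({'Botanits'}, {'Stray'})
```

The same "missing" set is what a save-time warning would check against: an article naming a category outside the set of existing pages would trigger the proposed message.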

I would support this as well. It might even be useful to include in the message a generated list of possible alternatives (particularly spelling and capitalization variants). -- Rick Block 19:24, 22 Mar 2005 (UTC)

The Distant Future

In an ideal world, the software would suggest some possible alternatives. Pearle gives a "top ten best guesses" for Category:Orphaned categories. The correct category is suggested less than 50% of the time, but I am using a very primitive algorithm. Also, many times the "right" category is only a few clicks away from one that is suggested. But doing this is very computationally intensive, and would take time to develop properly. So this is a pie in the sky idea for now. -- Beland 06:16, 21 Mar 2005 (UTC)
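For the suggestion idea (including the spelling and capitalization variants Rick Block mentions above), here is a deliberately primitive Python sketch using the standard library's difflib.get_close_matches on case-folded names. This is an assumed approach, not the algorithm Pearle uses, and it only catches near-miss spellings, not a "right" category that is a few clicks away in the tree.

```python
import difflib
from typing import Dict, List


def suggest_categories(name: str, existing: List[str], n: int = 10) -> List[str]:
    """Up to n existing categories resembling `name`, best match first.

    Comparison is case-insensitive, so capitalization variants score highly.
    """
    by_folded: Dict[str, str] = {}
    for cat in existing:
        by_folded.setdefault(cat.casefold(), cat)
    matches = difflib.get_close_matches(name.casefold(), list(by_folded),
                                        n=n, cutoff=0.6)
    return [by_folded[m] for m in matches]


print(suggest_categories("Botanits", ["Botanists", "Zoologists"]))  # ['Botanists']
```

The default n=10 mirrors the "top ten best guesses" list; comparing against every existing category on each save is the expensive part Beland alludes to.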


Solution: How to fix list vs categories shortcomings

This is cross-posted from Wikipedia talk:Merge some redundant lists to categories, to get more general discussion.

Lists and Categories generally try to solve the same problem, but have different appealing functionality. Categories are better conceptually, and save on manual edits, whereas lists allow for:

  1. Extra information per each item
  2. Missing articles ('Red links')

For #1, I propose parametrized categories (somewhat similar to template parameters). For example, in some actor's article, inserting [[Category:Actors|Doe, John|Birth=...|Death=...|Descr=...]] would create a properly formatted item on the Category:Actors page, with birth/death/description next to the name. We can have a standardized formatting element on the category page describing how category parameters should be rendered: {{Render:* ({{{Birth}}} - {{{Death}}}), {{{Descr}}}}} (here, '*' is auto-replaced with the article name).

For #2, category pages can allow special "missing" element template: ''{{thiscategory|Jane Missing|Birth="1/1/01"|Descr="..."}}''. This element would force current category page to add "Jane Missing" to the list as a red link. A bot can later clean up {{thiscategory...}} templates once articles have been written.

The feature request has been added to bugzilla: [Bug 1775] --Yurik 07:23, 30 Mar 2005 (UTC)
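A minimal Python sketch of how the proposed parametrized tags might be parsed and rendered. The [[Category:X|sortkey|Key=Value]] syntax and the render template come from Yurik's proposal, not from an existing MediaWiki feature; parse_tag and render are hypothetical helpers, and the template is passed without its {{Render:...}} wrapper.

```python
import re
from typing import Dict, Tuple

# Hypothetical syntax from the proposal: [[Category:Name|sortkey|Key=Value|...]]
TAG = re.compile(r"\[\[Category:([^|\]]+)((?:\|[^|\]]*)*)\]\]")
PLACEHOLDER = re.compile(r"\{\{\{(\w+)\}\}\}")


def parse_tag(tag: str) -> Tuple[str, str, Dict[str, str]]:
    """Split a parametrized category tag into (category, sortkey, params)."""
    match = TAG.fullmatch(tag)
    if match is None:
        raise ValueError(f"not a category tag: {tag!r}")
    parts = match.group(2).split("|")[1:]  # drop the empty text before the first '|'
    sortkey = parts[0] if parts and "=" not in parts[0] else ""
    params = dict(p.split("=", 1) for p in parts if "=" in p)
    return match.group(1), sortkey, params


def render(template: str, title: str, params: Dict[str, str]) -> str:
    """Replace the first '*' with the article title, then fill {{{Key}}} slots."""
    line = template.replace("*", title, 1)
    return PLACEHOLDER.sub(lambda m: params.get(m.group(1), ""), line)


category, key, params = parse_tag(
    "[[Category:Actors|Doe, John|Birth=1900|Death=1980|Descr=Actor]]")
print(render("* ({{{Birth}}} - {{{Death}}}), {{{Descr}}}", "John Doe", params))
# John Doe (1900 - 1980), Actor
```

Unknown parameters render as empty strings here; whether a missing Death= should instead suppress the dash is a design choice the proposal leaves open.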


The Finnish botanist question

Overall, I think this proposal is a very good one. I'm wondering though, whether it's really a problem having a well-constructed category with few members. For example, if a particular WP reader has an interest in Finnish botanists, it would be useful for that person to see a Finnish botanist category. I think the real problem is that when an editor assigns the Finnish Botanist category (or the British Botanist category for that matter) you lose some of the utility of having all those botanists appear in the more general "Botanist" category. In other words, it prevents the display of a complete alphabetical listing of all articles that are at the same level of sub-category. It means there's no way that the reader will see all the botanists without clicking, one by one, through every national botanist category. This seems like a major problem even when the subcategories are well-populated. Perhaps what is needed is a checkbox or button in the category page that will allow the reader to display all articles within the current set of subcats (or even one or two levels below). --Lee Hunter 15:42, 16 Mar 2005 (UTC)
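The "display all articles within the current set of subcats" option amounts to a depth-limited walk of the category graph. A minimal Python sketch, assuming hypothetical maps from each category to its subcategories and to its directly filed articles: it collects articles from the category and up to `depth` levels of subcategories, removes duplicates, and returns one alphabetical list.

```python
from typing import Dict, List, Set


def articles_in_tree(category: str, subcats: Dict[str, List[str]],
                     articles: Dict[str, List[str]], depth: int = 1) -> List[str]:
    """Alphabetical, de-duplicated articles filed in `category` or in any
    subcategory up to `depth` levels below it (depth=0: the category alone)."""
    visited: Set[str] = set()
    found: Set[str] = set()
    frontier = [category]
    for _ in range(depth + 1):
        next_frontier: List[str] = []
        for cat in frontier:
            if cat in visited:  # guards against cycles and diamond shapes
                continue
            visited.add(cat)
            found.update(articles.get(cat, []))
            next_frontier.extend(subcats.get(cat, []))
        frontier = next_frontier
    return sorted(found)


subcats = {"Botanists": ["Finnish botanists", "German botanists"]}
articles = {"Botanists": ["Linnaeus"], "Finnish botanists": ["Nylander"],
            "German botanists": ["Siebold"]}
print(articles_in_tree("Botanists", subcats, articles, depth=1))
# ['Linnaeus', 'Nylander', 'Siebold']
```

With depth=0 this reduces to a plain listing of direct members; larger depths trade page-building cost for completeness, which matters given the server-load concerns raised later on this page.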

Lee Hunter is right; there should be an option to display articles in subcategories. And Radiant has a very good idea; an "intersection of x and y categories" feature would be very powerful. Wikipedia's MediaWiki needs a software upgrade. Now, where do I go to discuss MediaWiki software upgrades?... --Pioneer-12 19:23, 31 Mar 2005 (UTC)
"Categories: Botanists by various nationalities - These are not useful. It breaks up the complete list of botanists at Category:Botanists, thereby making pages much harder to find when one needs to know (a) if a particular botanist has a page, and (b) what the page is titled, when adding links at e.g. a plant named by that botanist. Botanists are also highly international in their work, and their nationality is often completely irrelevant to the areas they worked in, making it hard to predict what their nationality might have been (e.g. Siebold worked mainly on Japanese plants while based at a Dutch mission, making it very hard to know that he can only be found listed at Category:German botanists). These subcategories would be best deleted, or at the very least, any botanist listed at one of these subcategories must also be on the full list at Category:Botanists". - MPF 13:05, 17 Mar 2005 (UTC)
  • One other very big advantage of large categories is that by going to the category and clicking on 'related changes', one gets a very handy quick overview of which articles have been edited recently; I now find this the most convenient way of finding changes and vandalism in the groups I'm working on a lot. Doing this for one large category is a lot easier than for several small categories.
  • One point that would be very useful in this context would be to have all the articles in a category listed on one page, rather than the current ceiling of 200 articles per page (a very annoying example of this is at Wikimedia Commons Category:Plantae_by_family, where a quarter of the families are relegated to page 2). - MPF 13:05, 17 Mar 2005 (UTC)

Lists are good for the wiki

A similar plan was proposed on the Commons, and it was pointed out that MediaWiki is put under a heavy strain when dealing with categories. Eliminating lists would be a bad idea. If either a list or category could serve in a particular situation, the list is the better choice. --iMb~Mw 15:47, 16 Mar 2005 (UTC)

  • Could you substantiate that please? It seems to me you are mistaken, because categories were implemented later as a possible substitute for lists, and you claim here that every category should be replaced by a list. Radiant_* 09:02, Mar 17, 2005 (UTC)
Oh yes, and to clarify. I did not state that every category should be replaced with a list. I stated that where either one would serve a particular situation, it would be better to go with a list. That's not always the situation. --iMb~Mw 12:01, 17 Mar 2005 (UTC)
  • Sorry for misunderstanding, but it seems to me that in any situation where a category would serve, a list would serve as well (except of course that a category has certain functionalities that a list lacks). Where would you draw the line between them? Radiant_* 13:26, Mar 17, 2005 (UTC)
Anybody remotely familiar with database design realizes that lists, which are files, are infinitely preferable to categories, which are not file-based. Categories will put Wikipedia under increasing hardware strain as they start growing geometrically. -- John Gohde 01:28, 18 Mar 2005 (UTC)
To me, categories make sense when the goal is to tie together related articles that don't fit naturally into a list. For example, the set of all IRC-related articles would be a good application for a category: it gives you, at a glance, everything about the subject from software packages, history, cultural phenomena, personalities, and so on. That's very handy if you aren't sure exactly where to look. On the other hand, putting the IRC clients into a list would work better than a category, because notes (what platforms, whether it's still being used, etc.) can be put on the list page, and ordering can be logical rather than merely sorted by name.
Ideally, a guideline or policy would spell out the relative advantages of lists and categories, so that editors can make reasoned decisions about which one to use. Right now the decisions are definitely haphazard, and better guidelines would be a big help. --iMb~Mw 13:54, 17 Mar 2005 (UTC)
Lists vs Categories? The current guidelines are here: Wikipedia:Categories, lists, and series boxes. I recently added the relative advantages sections myself. Please add what you can! - Pioneer-12 19:46, 31 Mar 2005 (UTC)
    • I strongly concur with that, and I believe we should set up a WikiProject:Categorization regardless of the outcome of this discussion.
  • Okay, thanks for pointing that out, I was unaware of the inefficiency of categories. That said, would you think the proposal is a good idea if it is delayed until the category implementation system has changed? Radiant_* 13:26, Mar 17, 2005 (UTC)
    • If in the future, the category mechanism receives a major overhaul, then yes, it would be good to revisit this. Of course, any guidelines or policies would need to reflect how categories work if and when that happens. If software changes make them easier to manage, some of the proposed restrictions could become obsolete. --iMb~Mw 14:14, 17 Mar 2005 (UTC)

Yes, please do see my comments at Commons. Sorry, I don't like disappointing people, but I do need to point out the extreme difference in page building cost. Yet I do see that people want to use categories in this way. And they are handy. I know I'm not the only technical person who knows this. From the look of it, what is needed may be a complete or very substantial rewrite of the category architecture, so that the contents of a category are somehow stored in a normal page which can be cached (or cached in some other way, but a normal page also solves the search and linking problems, so it may be the neatest way to do it). Once that's done they will cease to be so painful to generate, and we'll be able to use them completely freely without worrying so much about the impact on site load. The desire (and need: categories are already a pain for the servers) is clear. For now I suggest patience and revisiting in the MediaWiki 1.5 or 1.6 timeframe (which means 4-9 months, roughly). Hopefully by then we'll have a more efficient display method for categories. I'm guessing that 1.6 is more likely than 1.5, because 1.5 is mostly tackling some other scaling unpleasantness we have. Jamesday 19:14, 17 Mar 2005 (UTC)

  • It does occur to me that the proposal to lock the category system would serve to reduce the amount of categories, which would then be a Good Thing. Only merging lists into there would not be, for the time being. Radiant_* 09:48, Mar 18, 2005 (UTC)
List articles should be retained even if there's a similar or identical category. I like being able to see what was added to or deleted from the list, and when, and by whom, just by checking the list's edit history. I like finding all the inclusion issues discussed on the list's talk page (instead of in should-this-cat-be-added threads on several article talk pages). Some people may find it helpful to copy or print the list article, and not worry about how the categorization software is handling nonexistent articles, or to link to a particular historical version of a list that they know will stay put. It wouldn't surprise me if there are analogs to all these things with the category system, but for the benefit of those of us who are more comfortable with a straightforward list, let's keep the "old-fashioned" method, regardless of what's done about category creation. JamesMLane 13:37, 19 Mar 2005 (UTC)

Redlinks on category page

If we try to list nonexistent articles on category pages, we actually get two lists on the same page: the proper category and a list of nonexistent articles above it. This looks ugly and is an artificial distinction for the reader. -Hapsiainen 13:14, Mar 17, 2005 (UTC)

Maybe the list of nonexistent articles should be kept on the category talk page instead. And wouldn't you say that having two lists in different pages is unwieldy? Radiant_* 13:26, Mar 17, 2005 (UTC)
Talk pages may not get noticed highly over article space. -- AllyUnion (talk) 14:11, 17 Mar 2005 (UTC)
It is artificial rather than unwieldy. The division shouldn't be based on such an arbitrary criterion as whether an article exists. Such an idea has nothing to do with the meaning of the articles. Category pages are not requested-articles pages. Moving a list of nonexistent articles to a talk page would make the division even worse, so it is not a solution. -Hapsiainen 14:23, Mar 17, 2005 (UTC)
  • Problem is that the current approach (having a category and a list that partially overlaps) is similarly artificial and unwieldy. They become divergent very easily, and it's a lot of work to check if every listed article is categorized, and vice versa. Radiant_* 15:45, Mar 17, 2005 (UTC)


I like it, but...

I'm also concerned that we don't have enough admins to take care of this properly. Also, regarding category standardization (e.g. German cities vs. Cities in Germany), there are many disagreements on CFD concerning exactly that. Do we already have standards for this, or does it have to be solved on a case-by-case basis? -Kbdank71 13:51, 17 Mar 2005 (UTC)

RE: German cities vs. Cities in Germany. A proposal has been made to deal with this exact issue. See Wikipedia:Naming conventions (country-specific topics). - Pioneer-12 20:03, 31 Mar 2005 (UTC)
  • I believe we clearly need standards for this. And to avoid repetitions of arguments, we should hold them once for each group of articles, rather than individually for each article.
  • This will not be a lot of work. It needs a single admin, once per week, to make the changes and additions decided upon by the New Category Request page. As standards are decided upon, this amount of work would quickly die down (and note that by this proposal, CFD will also die down quickly, which it does not under current policy). Radiant_* 15:50, Mar 17, 2005 (UTC)
I tend to agree that the proposal to limit category creation/movement to admins is not a good idea. It is too restrictive. On the other hand, the current category system is a big mess, and its current state represents strong evidence that a good categorization can't be achieved by numerous small edits by different editors, without some planning and attention to the overall scheme. It is different with articles, where within one article, one can hope for eventualism to bring about a decent article -- eventually. With the category scheme, it just seems to get more chaotic, and there doesn't seem to be anything pulling it back towards reasonable organization. Also at present there is no way to edit the overall scheme, and if implementing a categorization scheme requires some categories to go through CFD, it becomes even harder. --BM 14:29, 17 Mar 2005 (UTC)
But that's exactly what this proposal aims to achieve! Look, the restriction to admins is not intended to say that admins make the decisions on what categories to create. Rather, the concept is to have the community come up with a coherent scheme (and review proposals for new categories, to make sure they fit within that scheme), which the janitors then implement. Unless you have some restriction on creating categories, it will remain a mess, for the exact reasons you point out.
Ideally, we'd create a class of "category janitors", who are the only people who are allowed to create categories, but unless the developers can find the time to add that feature to the software, restricting it to admins is probably the best substitute we've got. Noel (talk) 15:04, 17 Mar 2005 (UTC)
There will probably never be a way to edit the whole category scheme without going through CfD or a similar procedure first. I think that coming up with standardized conventions for naming and including categories is the first step to solving the problem suggested here. Blatantly redundant categories should be candidates for speedy deletion once merged into the right category. Wikiproject Categorization is a very good idea, and would help to consolidate discussion into one place. -Sean Curtin 00:23, Mar 19, 2005 (UTC)


General comments

Rather than trying to split it up I'm just going to post all of my comments at once.

  1. Except for the fact that we cannot easily move them, categories are just like articles. We should no more require approval to create a category than to create an article. (A better idea might be to push for a feature that would allow us to move categories as easily as articles. Or at least more easily than is currently the case.) Categories are still relatively new. Standards are evolving. It's not an overnight process.
  2. Using the stub sorting project as an example is actually fairly useful. The Project suggests that new categories be discussed first, but it doesn't require this. Often, new stub categories are created without any consultation. If they are useful, they are kept and used (see for instance Template:Norway-bio-stub). If they are not useful, they may be deleted.
  3. Categories are not lists. It's true that a list of articles is redundant with a category and should be merged. However, a list of, say, towns in Bosnia is not redundant, since there is a good chance that it will include items that do not currently have articles (but should). Lists also have the ability to be annotated, which makes them more appropriate for unusual or potentially controversial subjects.
  4. The way that templates and categories are used is evolving. Redundant templates exist. (There are also redundant articles. They are merged. The same should hold for templates and categories.) We are working towards standards. -Aranel ("Sarah") 00:23, 18 Mar 2005 (UTC)

Re #3: I try and make lists into something more than categories. For example, List of Star Trek: Enterprise episodes contains much more information than Category:Star Trek: Enterprise episodes though they do have duplicated data.

The posed List of computer viruses is a good example of a list that needs drastic improvement. For example, the list could contain the author(s), the date released, the extent of spread, etc. I see it as an example for future improvement rather than an example of the fault of lists. Cburnett 00:38, 18 Mar 2005 (UTC)

Categories are a mental disorder

Categories do not work, unlike lists to related articles, during server problems. Categories are a real time feature that obviously puts a tremendous burden on Wikipedia's limited computer resources.

Looking at the bigger picture, categories are totally botched on all mirror copies of Wikipedia. And the reason is totally obvious. Categories are not a file. The implication is clear. Anything done with Categories will not exist on mirror copies of Wikipedia, whereas commonsense systems that are based on Lists are totally functional. -- John Gohde 01:19, 18 Mar 2005 (UTC)

  • Categories are a work in progress and will continue to improve. This proposal hopes to achieve one step on that way. Radiant_* 09:53, Mar 18, 2005 (UTC)
  • They may not work on mirror sites, but that should be a secondary concern. Categories have the potential of greatly increasing the usability of Wikipedia, which I thought was the whole idea of why we exist in the first place. A repository of information is only as good as the ease with which people can retrieve that information. In my book, this fact is a big vote in favor of categories. To me, they're also easier to manage and maintain than "List of Foo". Feco 20:44, 10 Apr 2005 (UTC)

Contra categories

I don't really expect anybody to care much, and I know the decision must already have been made, but I'd like to register the fact that I really don't like the category feature that's been added to Wikipedia. The beauty of Wikipedia is that it constitutes its own category scheme. This just adds another layer of complication to the (large) category scheme that *is* the network of Wikipedia articles. If you want to know how knowledge is organized, you should be able to gather that from the links in the articles themselves. If you want to be able to navigate around articles easily, that's what the links within and below the articles are for. The whole idea of categories as implemented here strikes me as an excuse to impose conceptual hierarchies instead of letting the structure of the universe display itself beautifully, as it will, if we write and interlink articles as you have been. I've elaborated some of these thoughts, in case you're interested, here. Thanks for listening... Larry Sanger

Hey, it's Larry Sanger! Cool. That being said, I disagree. :-) I think "conceptual hierarchies" are useful and beneficial, as long as they are multidimensional and not forced on you. (That is, you don't have to pick one definitive category for an article.) - Pioneer-12 20:22, 31 Mar 2005 (UTC)
I generally find categories to be more useful as a maintenance tool for work in progress than as something readers need to see. Categorization doesn't add much to the handful of perfect articles, but it's very helpful for finding duplicates, spotting poorly-titled articles (with the widespread use of pipes, titles are very often not what they seem to be), comparing coverage between the different languages (several non-English WPs now have hundreds or even thousands of articles with no English counterparts), and so forth. I expect to see them used to aid content checking at some point ("what biographies are still missing birth/death dates?"). Stan 16:18, 8 Sep 2004 (UTC)

Categories, at best, are an administrative tool that should never be seen by the public.

I feel vindicated by Larry's position and his many other sentiments and experiences that manage to echo my own thoughts on Wikipedia quite well (see my user page for details).

Navigation boxes, article series boxes, infoboxes with hyperlinks, and hyperlinks within the text of articles to lists (which are, by definition, stored on disk) work quite well under all operating conditions, whether you are looking at the original Wikipedia or one of the scores and scores of mirror Wikipedias.

In short, categories are just plain stupid IMHO, and their geometrical growth makes the growing insanity of Wikipedia's real-time features ever more insane from the viewpoint of both practicality and common sense. -- John Gohde 11:30, 18 Mar 2005 (UTC)

  • Might I point out that the proposal under discussion here will actually reduce category usage? I don't think categories will be going away, but they would (arguably) be more useful if they don't grow too rampant. Radiant_* 13:09, Mar 18, 2005 (UTC)
I forecast geometric growth for categories. Quoting from the project page:
"The major technical problem with categories compared to lists is that categories are horrendously inefficient, requiring hundreds or thousands of times the resources per page view. This extreme cost disparity arises because the contents of category pages isn't cached. Instead, it's generated anew, using a new datbase query and page build with every page view. A normal list page is simply loaded from the Squid cache server closest to the viewer, a very inexpensive operation. At present, 6 or so Squid cache servers handle about 80% of all hits to the sites, with some 40 other machines needed to handle the rest."[2]
I rest my case. -- John Gohde 13:58, 18 Mar 2005 (UTC)
  • What exactly is your point? You say categories are a Bad Thing, and that therefore you oppose a policy proposal that intends to limit categories? Or did I misunderstand you? Radiant_* 14:29, Mar 18, 2005 (UTC)
Limit the growth of categories? It is not going to happen. (See "The root problem: anti-elitism, or lack of respect for expertise."[3]. Also, see my user page[4])
From the comments posted below, I told you so. -- John Gohde 13:49, 21 Mar 2005 (UTC)

The editors who are currently pushing categories don't care about the cost of categories. Furthermore, you speak with a forked tongue. The project page, if anything, suggests to me that categories will only grow a lot bigger when most lists, templates, infoboxes, etc., are replaced with categories. I have already discussed the mess in alternative medicine. It will only get worse. -- John Gohde 14:45, 18 Mar 2005 (UTC)

  • No, I'm not. I'm proposing several things and putting them up for discussion, and it may well turn out that some of them (e.g. locking the category system) will be considered good ideas, and others (changing lists to categories) won't be. I'm presently convinced that the latter isn't a good idea but will become one in a later MediaWiki version. That doesn't invalidate the former, and frankly you haven't provided any arguments against that. It would stop precisely those editors who are currently pushing categories with no respect for consequences. Radiant_* 16:23, Mar 18, 2005 (UTC)
  • John Gohde, I think we get it. You don't like categories and don't think they work. Seeing as it doesn't appear they're going anywhere, I'd like to hear your ideas for perhaps improving the situation. -Kbdank71 21:12, 18 Mar 2005 (UTC)
Wikipedia strives to be the sum of all human knowledge. However, for knowledge to be useful, it must also be accessible. The obvious way of making the Wikipedian articles more accessible is categorization, NOT!
I really resent how the people pushing categorization feel that they have a God-given right to roll over the past efforts of editors. In my humble opinion, the best way of making the Wikipedian articles more accessible is with small boxes (whatever you choose to call them) that utilize Lists, which are disk-based operations.
Ever since the days of DOS, dBase III, and Lotus programming, I have favored disk-based over RAM-based database programming. I am totally amazed at the number of real-time features Wikipedia employs. (I also think that keeping edit histories of all the articles that go on forever is insane.)
What you are proposing essentially is that a WikiProject be created to manage the categories, and that all the participants of this project be Admins who have the expertise to do the work. Well, from my experiences with the WikiProject on Alternative Medicine, that concept is NOT going to fly. And my previous reference points out why it won't work.
  • No, that's not what I'm proposing. What I'm proposing is that a number of Wikipedians (not necessarily admins) look into the categorization system to make it internally consistent. Radiant_* 14:54, Mar 19, 2005 (UTC)
Frankly, the recent approach of the default user skin, which hides categories from the public at the bottom of the page while flaunting the existence of talk pages, makes the entire Wikipedia operation insane.
You already know my desired approach to the situation. When many articles are absolutely choking on graphics, all this complaining about an ugly little navigation box is totally absurd. Designing better and smaller boxes that utilize disk access is the obvious way to go. Lists are altogether a better approach. And the best way to integrate these Lists is to use boxes near the top of the article, rather than endless lists at the bottom of the article. It is not going to happen. So, I wait for categories, templates, and watch lists to crash Wikipedia for the last time.
Your proposal only appears to me as an even more insidious push to eradicate Lists, infoboxes, and templates on the absurd grounds that they are redundant with categories.
  • That is also not the case. I only stated that some lists and infoboxes were made obsolete by the category system. Clearly an annotated list has usefulness that cannot be duplicated in categories, but a mere list of links (that doesn't have redlinks) could be made obsolete (at least once fixes are created for the fact that categories aren't cached and require more computing power).
In the remote possibility that you are serious, I would redesign the whole article approach. Instead of articles having just a talk page, articles would have three options or tabs: Article, Talk, Navigation. Use your imagination, as to what you could put in the Navigation page.
So, what is your point? -- John Gohde 22:16, 18 Mar 2005 (UTC)
  • My point is that the category system could stand improvement, and that I welcome discussion on how to improve it. There have been a number of suggestions to that point, including my original proposal, the concept of /article/paths, and your suggestion to remove categories altogether. That's brainstorming, no? Radiant_* 14:54, Mar 19, 2005 (UTC)
Categories are wonderful. If I had first visited Wikipedia before they were created I doubt I would have stuck around. If you don't like them ignore them, but please don't try to sabotage one of the key features of Wikipedia which is appreciated by vast numbers of people. Your proposals reveal your techie background. Most people are not techies and do not think in techie ways. Wincoote 11:07, 19 Mar 2005 (UTC)

General opposition to prior restraint

We have perhaps hundreds of thousands of articles that aren't in any categories, much less the right categories. We also have lots of categories that are starting to get too big, and need someone ambitious to come along and subcategorize everything in them. Personally, I do a fair bit of categorization of articles. Having to ask for permission and wait 5 days before creating a new category would create a significant disincentive to do this work. I would object to doing so; it severely undermines the open-edit principle of the Wiki.

If WP:CFD is overloaded deleting categories, just imagine what a bottleneck would be created by having to approve each new category! Right now, far more categories need creating than need deleting.

I don't like the idea of prior restraint of the creation of categories. Making ideas subject to the veto of others before anyone can see what the finished project would look like means that some good ideas will get vetoed for bad reasons, including by people who won't put in any effort to explain themselves or make a better suggestion. The time-consuming part of categorization is not fixing "Barish Foo" to "Foo of Bar", it's assigning articles to a category in the right genre in the first place.

The time-consuming part of WP:CFD is not so much the mechanics of moving articles and categories around; it's figuring out what to do when people disagree on what the proper names are. The backlogged decisions on WP:CFD are usually those that aren't straightforward. There was also a backlog for a while when Pearle was inactive, but that's since been cleared. I hope to prevent that sort of thing from happening in the future by publishing her category cleanup code so that others can use it. (But first it needs to stabilize a bit.)

The category hierarchy is currently in an expansion phase, given how recently (compared to the size of the project) the category feature has been added. Articles are currently being added to categories faster than articles are being created. Obviously, the rate of categorization will slow down as the number of existing articles which have been categorized approaches 100%. At that point, there will be far fewer spurious or redundant new categories created, and the "standardization crew" will have a chance to catch up. Some people, though, already spend a lot of time "fixing" the category system without the approval of WP:CFD, either alone or in discussion on a local category talk page. I think it's important that we encourage them to continue to do so. -- Beland 06:16, 21 Mar 2005 (UTC)

On starting a new WikiProject

Please keep in mind while pondering starting new WikiProjects that Wikipedia:Categorization and Wikipedia:Categorization projects (current) already exist as central collaboration points. I also recently started Category:Wikipedia_categories in need of attention as a central collaboration point for remedial category work. It could certainly use more people fixing things up and also pointing out which categories need help. -- Beland 06:16, 21 Mar 2005 (UTC)

It's pretty obvious to me that people use and think of categories in different ways. As an inside joke in the knowledge management community would have it: "all discussions eventually end up pondering the meaning of life, the meaning of art, and classification." Here are some examples that I've seen, with advance apologies if it seems a bit flip:

  • "Everything needs a category," with the corollary that "much used categories are better categories." This favors broad categories that tell the reader what bucket(s) an article belongs to. These categories are pleasing to those of us with a smattering of OCD, since they make us feel that everything is in its rightful place.
  • "Categories as a means to create finite lists," with the corollary that "exclusive categories are better categories." These categories need to be precisely qualified to be effective, lest people happily dump things in there; and membership should minimally be based on criteria that could be argued about and ideally be objective.
  • "Categories as navigation schema," with the corollary that no category-subcategory structure should be recursive. This often goes along with the "everything needs a category" school of thought, but adds the need for hierarchy as well.
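For the third school of thought, "no recursion" can be checked mechanically: treat each category as a node with edges to its subcategories and look for a loop. A minimal sketch, assuming the category tree is available as a plain Python dict; the category names and the `find_cycle` helper are hypothetical illustrations, not an existing MediaWiki tool.

```python
from typing import Dict, List, Optional

# Hypothetical category graph: each category maps to its direct subcategories.
Graph = Dict[str, List[str]]


def find_cycle(graph: Graph) -> Optional[List[str]]:
    """Return one loop of categories (first name == last name), or None."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color = {node: WHITE for node in graph}
    path: List[str] = []

    def visit(node: str) -> Optional[List[str]]:
        color[node] = GREY
        path.append(node)
        for child in graph.get(node, []):
            state = color.get(child, WHITE)
            if state == GREY:
                # The child is already on the current path: we found a loop.
                return path[path.index(child):] + [child]
            if state == WHITE:
                found = visit(child)
                if found:
                    return found
        path.pop()
        color[node] = BLACK
        return None

    for start in list(graph):
        if color[start] == WHITE:
            found = visit(start)
            if found:
                return found
    return None


print(find_cycle({"Music": ["Singers"], "Singers": ["Tenors"], "Tenors": ["Music"]}))
```

Depth-first search with a "currently on the path" marker is the standard way to detect a cycle in a directed graph; a loop such as Music > Singers > Tenors > Music is exactly the recursive structure the navigation-schema view forbids.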

WP does not allow us to use categories for structured boolean searches, i.e., "give me a list of articles that are categorized both as 'Italian people' and 'Tenors'" to find "Italian tenors." As a result, people feel the urge to create a subcategory both under "Italian people" and "Tenors" called "Italian tenors," even if that completely clutters everything up.
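What such a structured search would compute is simply the intersection of the two member sets. A minimal sketch, assuming category membership could be exported as a mapping from category name to article titles; the data and the `intersect` helper are hypothetical, and nothing here is an existing MediaWiki feature.

```python
from typing import Dict, Set

# Hypothetical membership data: category name -> set of article titles.
membership: Dict[str, Set[str]] = {
    "Italian people": {"Enrico Caruso", "Luciano Pavarotti", "Galileo Galilei"},
    "Tenors": {"Enrico Caruso", "Luciano Pavarotti", "Placido Domingo"},
}


def intersect(categories: Dict[str, Set[str]], *names: str) -> Set[str]:
    """Articles that belong to every named category (a boolean AND)."""
    if not names:
        return set()
    result = set(categories.get(names[0], set()))
    for name in names[1:]:
        result &= categories.get(name, set())  # keep only shared members
    return result


print(sorted(intersect(membership, "Italian people", "Tenors")))
```

With a query like this available at read time, "Italian tenors" would not need to exist as a stored subcategory at all.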

My bias at this point is that it's better to have too many categories than too few, simply because we don't want to take away information from WP. If somebody wants to act as a de facto librarian and point out categories that are redundant, imprecisely defined, etc., that's fine.

My recommendation is that we beef up the guidelines on categories considerably, so that those who like to create categories and/or delete them understand fully what these categories are intended to do and not to do. My guess is that most people who create categories aren't looking to create extra work for themselves just for fun - they either aren't aware of an existing category or have something new in mind. --Leifern 14:01, 2005 Apr 1 (UTC)

Contra Contra categories

Categories and navigational boxes are really the same concept with different front-end and back-end implementations. Both mechanisms provide groups with membership lists, so all the related articles are linked to all the others, and updates are automatically propagated around.

There are pluses and minuses to each approach. A page can get awfully crowded if there are too many navboxes on it, but viewing a category requires an extra click, and not everyone may realize what they are for. (This could be changed, of course.)

The idea that "see also" links make the formation of categories unnecessary is somewhat misguided, I feel. There are several reasons for this.

  • In many cases, several articles will have "see also" lists that are nearly identical, because they form a logical group. The problem comes when changes are made in one article but not the others. Maintenance is simplified when the group is given a name and changes propagate automatically.
  • Navigating around by inline and "see also" links is like moving around a maze of twisty passages, where you can only see one step ahead, and you're not sure where it will lead on the second step. Even worse, it's a many-dimensional maze, with no clear sense of up and down. Even with their many non-hierarchical cross-links, categories give more of a sense of having a bird's-eye view of things. In a mature category system, you know that the list in front of you is the complete list of articles available on the subject, and that if you don't see what you need, you have a better idea of what direction you need to head. A lot of times that means going up in abstract concept-space until you find the branch you want to be on, and then you follow that down to the specific topic you were interested in. There's a reason that libraries have card catalogs: readers should be able to just know what resources are available on a particular topic. They shouldn't have to waste time discovering them. Putting articles into neat, tidy boxes and hierarchicalizing them makes article-space a lot easier for the human brain to comprehend and to navigate. Though we do realize that our boxes are not and should not be entirely tidy, and we do not and should not have a strict hierarchy. But if we wanted to let readers explore the universe in its full and confusing complexity, we'd advise them to log out and go get lost in the woods.
  • Categories carry a unique sort of information that creates new opportunities for useful information systems. They are essentially tagging a particular article as being associated with a particular topic. Inline links can't always be considered to be doing this. Many of them link to other articles that are only tangentially related, because the article happens to mention a particular year or person or use a particularly confusing word. Human-created, machine-readable "topic tags" are very useful as input for search engines and language-processing systems. Yahoo! originally depended on a similar list which classified web pages. Google's innovation was to extract information from inline-style links. But what we have here is a web which directly tags and links semantic concepts; we are just beginning to experiment with ways to make productive use of this valuable information.
  • As others have pointed out, these properties make categories useful as a maintenance tool.

-- Beland 06:16, 21 Mar 2005 (UTC)


What problem(s) are we solving?

Is there a succinct description of the problem or problems with the current system that this proposal is meant to solve? The Description section on the proposal is fairly vague. Is the real problem the complexity? Inconsistent naming? Redundancy between categories and lists? Overly narrow categories? Is there a priority among these issues? I think without agreeing on the problem we won't have any chance of agreeing on the solution. One thing a wiki is good for is collaboration. How about if we collaborate on a clear statement of problems with categories without (at the moment) trying to solve them? This page has generated a fair amount of traffic, so how about here? I'll start. Feel free to add or change. -- Rick Block 00:27, 23 Mar 2005 (UTC)

Noted Problems or a Critique on Wikipedia's use of Categories

Categories don't come close to replacing a well-annotated list to related articles.

Anybody casually familiar with searching on Wikipedia should know this. I only know what categories are because I am an editor. New visitors to Wikipedia are most likely to be using the default user skin [5]. With the default skin, categories are listed at the very bottom of the web page. Hence, anyone who is not already an editor is unlikely ever to see any of the various categories, let alone know what they are for. Even once a new visitor finds them, categories are still extremely confusing and time-consuming to use.

I am objecting to this specific category guideline.

  • "An article should not be in both a category and its subcategory."[6], [7]

Take Category:Alternative medicine, for example. Every article on this topic can be put into some type of sub-category that would logically fall under this category. Yet, I see a whole bunch of articles listed in this major category when few if any articles should be listed per this category guideline.
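The guideline being objected to can be expressed as a mechanical check: flag any article tagged with both a category and one of that category's subcategories. A minimal sketch, assuming the category tree and article tags are available as Python dicts; the data and the `redundant_tags` helper are hypothetical, and only direct subcategories are checked here, not deeper descendants.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical data for illustration only.
# parent category -> direct subcategories
subcats: Dict[str, Set[str]] = {
    "Alternative medicine": {"Homeopathy"},
    "Homeopathy": set(),
}
# article -> categories it is tagged with
tags: Dict[str, Set[str]] = {
    "Homeopathy": {"Homeopathy", "Alternative medicine"},
    "Body work (alternative medicine)": {"Alternative medicine"},
}


def redundant_tags(
    tags: Dict[str, Set[str]], subcats: Dict[str, Set[str]]
) -> List[Tuple[str, str, str]]:
    """List (article, parent, child) where the article sits in both a
    category and one of that category's direct subcategories."""
    hits = []
    for article, cats in sorted(tags.items()):
        for parent in sorted(cats):
            for child in sorted(subcats.get(parent, set()) & cats):
                hits.append((article, parent, child))
    return hits


print(redundant_tags(tags, subcats))
```

An editor following the guideline would remove the parent tag from each flagged article, which is exactly how articles disappear from the top-level category listing described here.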

Body work (alternative medicine) is a case in point. Note how alternative medicine is actually included in the title of this article. Yet you won't find Body work (alternative medicine) in Category:Alternative medicine, thanks to the editing efforts of editors following this Wikipedian category guideline.

Homeopathy is another case in point which has its own Category:Homeopathy. There are a number of articles in this category. But, not one of them refers to category:Alternative medicine. The only place category:Alternative medicine is found is within its sub-category Category:Homeopathy. My question is this. Why would a new visitor to Wikipedia reading homeopathy interested in finding other articles on alternative medicine ever click on Category:Homeopathy when they are already in the article on homeopathy? This assumes of course that they could find the link on the very bottom of the page that they are supposed to click on.

As an editor, I am familiar with categories. There is a certain amount of logic to them. However, I am concerned only with the likely behavior of a new visitor to Wikipedia who is using the default user skin[8]. Suppose that visitor is trying to find articles on alternative medicine. That new visitor is not likely to find Category:Alternative medicine. And if they manage to find it, they won't find Body work (alternative medicine) on the list, thanks to this category guideline.

Visitors visit Wikipedia in order to obtain knowledge, but using the categories feature assumes that these visitors already have the knowledge that they are searching for.

Now imagine trying to do something really imaginative with categories, like creating a category in order to replace a well-annotated list of related articles. It would never work because of this guideline. Categories will therefore NEVER replace the value of a well-annotated list of related articles. -- John Gohde 07:33, 19 Feb 2005 (UTC)

Have you considered fixing the categorization? I suspect if you spent half the time you're spending railing against the horrors of your infoboxes being deleted on redoing the CAM categorization, the outcome would be more to your liking. Snowspinner 13:25, Feb 19, 2005 (UTC)
At least visitors will be able to find a few articles. Once it is fixed, they won't be able to find anything with categories. -- John Gohde 20:03, 19 Feb 2005 (UTC)

The problem with categories

In addition, categories do not work, unlike lists to related articles, during server problems. Categories are a real time feature that obviously puts a tremendous burden on Wikipedia's limited computer resources. -- John Gohde 05:59, 24 Feb 2005 (UTC)

Looking at the bigger picture, categories are generally totally botched on all mirror copies of Wikipedia. And the reason is totally obvious. Categories are not a file. The implication is clear. Categories do not exist on mirror copies of Wikipedia, whereas commonsense systems that are based on Lists are totally functional. -- John Gohde 01:12, 18 Mar 2005 (UTC)

To me, these are good reasons to a.) re-architect categories to make them more efficient, b.) visit the One True Wikipedia, which we should be encouraging people to do anyway, since that means that more edits will be contributed back to the main project. -- Beland 02:17, 26 Mar 2005 (UTC)

The Paradox of Sub-Categorization

So, any and all editors are supposed to assign CAM articles at their own whim to a sub-category, without any advance planning, guidance, or control from a WikiProject? The science people like to say that all of alternative medicine is quackery, yet they seem to be failing to put articles in category:Quackery rather than in category:alternative medicine.

Wikipedia:Wikiproject:Alternative Medicine/Classification Systems has documented that there are at least seven different ways to classify CAM articles.

The WikiProject's infoboxes currently classify branches of medicine four different ways. That is NOT 4 categories. That is four parallel ways of classifying, each of which requires more than one category.

To implement categorization by Classification by Standard of Knowledge and Quality of the Evidence alone requires 5 different categories: Real Science, Protoscience, Pseudoscience, Enlightenment, and the Supernatural. In addition to these 5 categories, two other categories have already been implemented: category:Quackery and category:Fraud.

Kindly, explain the difference between the category:Pseudoscience, category:Quackery and category:Fraud categories? What if someone decides to implement category:Health fraud?

The implication of this is that most branches of alternative medicine can be classified 5 different ways, and their articles could have up to 5 alternative medicine sub-categories alone.

Currently, there are already 11 subcategories in category:alternative medicine. Should we add one on CAM stubs? Medicine already has Category:Medicine stubs. As time goes by, without guidance and more guidelines to follow the number of these subcategories in category:alternative medicine will get a lot bigger.

Now, exactly how does the use of categories enable visitors to find articles on alternative medicine? How do all these sub-categories help visitors find articles?

Time spent categorizing articles is a bottomless pit. Putting infoboxes in articles takes time, but at least the number of articles is finite. The way it is now, you could categorize articles for ever. And, somebody is sure to come along at a later point in time to undo what you have spent time doing.

This does not motivate me to spend any more of my limited time categorizing. -- John Gohde 11:57, 20 Feb 2005 (UTC)

Then please don't. Simply complaining about the system isn't going to fix it. Write and edit your articles how you choose, and let someone else categorize them. That's the beauty of the system. You don't have to do everything. -Kbdank71 15:46, 23 Mar 2005 (UTC)

Categories as semi-dynamic lists?

The current way people work on lists is usually like this:

  • An item is added to a list, with or without redlink.
  • A new page is created.

Since both the new page and the list are regular articles, they are all cached, which is fine. The current way of using cats is:

  • A new page is created.
  • This new page is added to a category.

Since the category is (at least to my knowing) currently not cached this degrades server performance.

Wouldn't it be possible to implement cats as semi-dynamic lists?

When adding a page to a cat that would mean that the cat/list-hybrid page would be automatically updated. The only difference from the current situation with lists would then be that the creator of the article wouldn't have to manually add his article to the list. Then both the article and the cat could be regular pages. Thus it would (from caching perspective) behave just as good/bad as the current lists.
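The difference between this idea and the current behavior is when the expensive work happens: on every view, or once per membership change. A minimal sketch of the "update on write" variant, assuming a single in-memory object stands in for the cat/list hybrid page; the `SemiDynamicList` class and its methods are hypothetical, not part of MediaWiki.

```python
from typing import Set


class SemiDynamicList:
    """Hypothetical cat/list hybrid: the page text is rebuilt once per
    membership change and stored, so each page view is a cheap lookup,
    much like serving an ordinary (cached) list article."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.members: Set[str] = set()
        self.rebuilds = 0  # how many times the expensive step has run
        self._rebuild()

    def _rebuild(self) -> None:
        # Stands in for the database query and page build that, per the
        # discussion above, currently happen on every category view.
        self.rebuilds += 1
        lines = [f"Category: {self.name}"]
        lines += [f"* {title}" for title in sorted(self.members)]
        self._text = "\n".join(lines)

    def add(self, title: str) -> None:
        if title not in self.members:
            self.members.add(title)
            self._rebuild()  # update on write...

    def remove(self, title: str) -> None:
        if title in self.members:
            self.members.remove(title)
            self._rebuild()

    def view(self) -> str:
        return self._text  # ...so reads never rebuild


page = SemiDynamicList("Tenors")
page.add("Luciano Pavarotti")
page.add("Enrico Caruso")
for _ in range(1000):
    page.view()
print(page.rebuilds)
```

A thousand views cost nothing extra here; only the initial build and the two additions trigger a rebuild, which is the caching behavior of an ordinary list page.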

Since those things would be quite light-weight they would (perhaps) also be suitable for other things than cats.

Also some lists are embedded in other articles. That makes them very hard to find when compared to categories. This increases the risk of incomplete lists, i.e. articles that should be in some list, but aren't because the creator of the article didn't know about the list.

Just a thought. Shinobu 19:23, 30 Mar 2005 (UTC)

Problems with current category mechanism

  1. Well meaning users add articles to categories that don't exist.
  2. Well meaning users add articles to categories that are effectively the same as existing categories but differ in wording (German cities vs. Cities in Germany) or spelling (UK vs US English variants) or capitalization.
  3. Various people disagree about what constitutes a reasonable category.
  4. The relationship of categories, lists, and navigational boxes is unclear.
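Problems 1 and 2 could at least be reported automatically. A minimal sketch, assuming article tags and the set of existing category pages are available as Python data; the `SPELLING` table, `normalize`, and the other helpers are hypothetical. It only catches capitalization and listed spelling variants; word-order variants such as "German cities" vs. "Cities in Germany" still need a human.

```python
from collections import defaultdict
from typing import Dict, List, Set

# Hypothetical, hand-maintained UK/US spelling pairs (illustrative only).
SPELLING = {"colour": "color", "organisations": "organizations"}


def normalize(name: str) -> str:
    """Case-fold a category name and map listed spelling variants."""
    words = name.casefold().split()
    return " ".join(SPELLING.get(w, w) for w in words)


def missing_categories(tags: Dict[str, Set[str]], existing: Set[str]) -> Set[str]:
    """Problem 1: categories used on articles but never created."""
    used: Set[str] = set().union(*tags.values()) if tags else set()
    return used - existing


def near_duplicates(existing: Set[str]) -> List[List[str]]:
    """Problem 2: groups of existing categories that normalize to one key."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for name in existing:
        groups[normalize(name)].append(name)
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)


existing = {"Cities in Germany", "Cities in germany", "Colour", "Color", "Tenors"}
print(near_duplicates(existing))
```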
The problem with the category mechanism is the problem with the template mechanism is the problem with the search mechanism, the mechanisms for lists, signatures, VfD/CfD/TfD, Help, Talk, disambiguation, etc., etc., and so on and so forth, almost without limit.
The MediaWiki engine works nicely on an article-by-article basis. However, if you reduce each article into an identity and some content, and momentarily disregard the content, you see we have essentially One Big Bag of unrelated items. On top of the One Big Bag is overlaid a patchwork of makeshift relationship-links and -groups, for each of which a different ad-hoc mechanism operates. In small, it is excessively rigid; in large, excessively fragile; at any scale of consideration, excessively inefficient.
Does anyone here have experience with relational databases? (Sorry, but the link will not help you; the article is a mere stub -- disgraceful.) If you have never used a relational database to organize information, you will be amazed at how much power and ease it brings to you.
The problem of relating sets of data to one another, and a set of data to itself (a self-relationship) is complex, but not novel. Relational database architects work on solutions to all the problems I've noted. We know:
  • How to organize data
  • How to search for data
  • How to group data
  • How to link one item of data to another
  • How to link one item to many others
  • How to link many items to one (not the same thing at all!)
  • How to log/archive/maintain a history of data
  • How to move data
  • How to delete data (and how to restore it)
MediaWiki is very strong on content formatting, but extremely weak on database structure. The Wikipedia community is strong on philosophies and policies intended to improve content, but content organization is a black hole into which the most selfless, dedicated members toss time and energy which would be better spent creating and editing content.
The solution to all metaquestions of content organization is to upgrade the MediaWiki engine to equal that of a powerful relational database management tool, such as (my personal favorite) FileMaker Pro. For our purposes, many aspects of database design will have to be implemented in a manner consistent with the overall WP Way (open collaboration). This is a departure from traditional RDB implementation, which focuses on permissions, access, restrictions, and security.
I dislike intensely excessive emphasis, which I feel is rarely justified, but in this case I am compelled to repeat: The solution to all metaquestions of content organization is to upgrade the MediaWiki engine to equal that of a powerful relational database management tool. While we wait for that to happen, any discussion of revamping the current system is like debating sand-castle design by a rising tide.
It is perhaps not wasted effort to continue to do the drudgework of eliminating categories, merging pages, converting templates to categories, and all such manual organization tasks. All this structure, however makeshift and flawed, will be imported into the new engine and will be the starting point for dynamic reconfiguration. But there is no need to discuss minor improvements to the current system. Okay? — Xiong (talk) 02:30, 2005 Mar 23 (UTC)
I took a graduate class in database design and did some relational database programming in graduate school. Shall we normalize the articles here to their logical, logical addresses? Finally a voice of reason. But, I am afraid that you are trying to talk to a wall. -- John Gohde 13:31, 23 Mar 2005 (UTC)

Do you have a preliminary proposal on what tables and keys you would use? Cburnett 19:34, 23 Mar 2005 (UTC)
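To make the question concrete, here is one possible answer in table form: pages and categories as entities, a many-to-many link table for membership, and a self-relationship for subcategories. A minimal sketch using Python's built-in `sqlite3`; every table and column name is hypothetical and is not a description of MediaWiki's actual schema.

```python
import sqlite3

# Illustrative schema only; table and column names are hypothetical.
SCHEMA = """
CREATE TABLE page (
    page_id INTEGER PRIMARY KEY,
    title   TEXT NOT NULL UNIQUE
);
CREATE TABLE category (
    cat_id INTEGER PRIMARY KEY,
    name   TEXT NOT NULL UNIQUE COLLATE NOCASE  -- blocks case-only duplicates
);
-- Many-to-many link: one page in many categories, many pages per category.
CREATE TABLE membership (
    page_id INTEGER NOT NULL REFERENCES page(page_id),
    cat_id  INTEGER NOT NULL REFERENCES category(cat_id),
    PRIMARY KEY (page_id, cat_id)
);
-- Self-relationship: category -> subcategory.
CREATE TABLE subcategory (
    parent_id INTEGER NOT NULL REFERENCES category(cat_id),
    child_id  INTEGER NOT NULL REFERENCES category(cat_id),
    PRIMARY KEY (parent_id, child_id)
);
"""


def build(conn: sqlite3.Connection) -> None:
    conn.executescript(SCHEMA)


def pages_in_all(conn: sqlite3.Connection, *cats: str) -> list:
    """Titles of pages that are members of every named category."""
    placeholders = ",".join("?" for _ in cats)
    sql = f"""
        SELECT p.title
        FROM page p
        JOIN membership m ON m.page_id = p.page_id
        JOIN category c   ON c.cat_id = m.cat_id
        WHERE c.name IN ({placeholders})
        GROUP BY p.page_id
        HAVING COUNT(DISTINCT c.cat_id) = ?
        ORDER BY p.title
    """
    return [row[0] for row in conn.execute(sql, (*cats, len(cats)))]
```

The composite primary keys stop a page from being linked to the same category twice, and `COLLATE NOCASE` on the category name rejects "Tenors" vs. "tenors" at insert time; the intersection query answers the "Italian tenors" question without a stored subcategory.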

Forgive me if this reads like a snarky comment, but that is SQL jargon, and I don't consider SQL to be a modern RDBM tool. Instead, it's a sort of DB analogue of Windows -- something ramshackle built on top of ancient foundations, rebuilt after periodic earthquakes several times until it appears reasonable; respectable due to huge, heavily-invested user base.
I uphold FileMaker as the standard of the modern tool; it does have its limits, but I feel it is mainly a question of employing more horsepower to run a more advanced machine. FileMaker jargon is much more straightforward and in line with DB theory; we speak of fields, files, records, relationships, and matches. (None of which is to advance FileMaker as the new MW engine! It's not free.)
To answer more directly, if not more fully, we are building a manifesto around the UI -- the goal, not the implementation. Any competent programmer can code an engine, given time, to produce a desired result. Right now, I believe, we can use more design. — Xiong (talk) 03:37, 2005 Mar 24 (UTC)
"more design" is what my question was asking for. Right now, all you've said is we need RDBM and "FileMaker" is great. That doesn't come close to "more design." Cburnett 19:19, 24 Mar 2005 (UTC)

Relational databases as inspiration

I think we should use relational databases as a source of inspiration, rather than as a design template. RDBs definitely have some abilities that would be very useful for MediaWiki to have. However, the nature of relational databases is that (1) they work best when implemented with a grand design, like a cathedral (2) there is a significant learning curve to using them. Wikipedia needs a system that (1) works well in a distributed environment (2) is easy (and "obvious") to use. A pure RDB system is just not appropriate, but a system inspired by RDBs might be. - Pioneer-12 22:26, 31 Mar 2005 (UTC)

  • This is only tangentially related, but... Is there any software tool that could be used to render graphical maps of wiki categories? For example, I could tell the engine to map Category:Boardsports, and it would render a graphical map of the sub-categories and articles "below" that category. Obviously, there would have to be some software limits to prevent it from going more than X categories deep. Still, such software would be very helpful in cleaning up categorization and making wiki categories more of a useful tool. I know such mapping software exists commercially, so I'm hoping someone has adapted it to work with wiki. Feco 20:34, 10 Apr 2005 (UTC)
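The "no more than X categories deep" limit amounts to a depth-limited breadth-first walk of the category graph. A minimal sketch, assuming the category data has already been loaded into a plain dictionary mapping each category to the names listed "below" it (a hypothetical stand-in for the real database):

```python
from collections import deque


def category_map(children, root, max_depth):
    """Return (parent, child, depth) edges reachable from `root`, breadth-first.

    children:  dict mapping a category name to the subcategories and
               articles filed under it (hypothetical in-memory data).
    max_depth: how many levels to descend; 1 means direct members only.
    """
    edges = []
    seen = {root}
    queue = deque([(root, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth >= max_depth:
            continue  # depth limit reached: record this node but do not expand it
        for child in children.get(node, []):
            edges.append((node, child, depth + 1))
            if child not in seen:  # categories can form loops; expand each name once
                seen.add(child)
                queue.append((child, depth + 1))
    return edges
```

Tracking visited names matters because category graphs are not guaranteed to be trees: a subcategory that lists its own parent would otherwise be expanded until the depth limit, repeating the same subtree each time.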

Categories WikiProject instead of new policy?

Would it be better to promote categorisation work within the community instead of restricting some functionality to admins? I have been drafting a project for working on the categories: User:GregRobson/Categories/Draft Project Page. I have done some work with Category:Education in the United Kingdom (see User:GregRobson/Schools) and have made some progress. Feedback from the community has been good. There appear to be a number of people devoting time to the categories, and I assume that some of those may not be admins. I say we should divert their energies into improving the category system - the more manpower we can focus on this task the better - if that fails, then perhaps we do need to limit the control people have? Greg Robson 15:32, 24 Mar 2005 (UTC)

Keep in mind that there are some projects that have already taken a chunk of category space as their own. See the list at Wikipedia:Categorization projects (current), though many efforts there have probably been completed (and this page could thus itself use some cleanup), and these are mostly efforts to get articles categorized at all, much less in a standard way. There's also an existing category cleanup collaboration page at Category:Wikipedia categories in need of attention. That said, it would be great if there were an organized effort to start tidying up category space. -- Beland 22:57, 30 Mar 2005 (UTC)

Actually, some of those category spaces overlap. For example, "History of Country X" articles are influenced by both Wikipedia:WikiProject Countries and Wikipedia:WikiProject History. Also, many wikiprojects run into the same types of naming and categorization problems--such as "German cities vs. Cities in Germany" and when to make a subcategory. A "meta" WikiProject to compile, coordinate, and share knowledge would be very useful. - Pioneer-12 00:11, 1 Apr 2005 (UTC)

Merging Lists Into Categories

Regarding the topic of redundant lists...

Yesterday morning, I noticed that the page Gay icon included a "List of gay icons". There is also a category for gay icons. (A gay icon is a celebrity--not necessarily a gay celebrity--who is somehow important to the gay community. Check out the article for more information). The list in the article and the category were different, with some on the list and not in the category, some in the category and not on the list, and some in both. Last night, I rectified this by going to the page of every single person or group on the "List of gay icons", categorizing them as gay icons, and deleting the list from the article.

Now, the obvious question is: if the list was somehow vandalized or inaccurate (and it almost definitely was), wasn't I just doing more damage to Wikipedia by spreading it over 238(!) different articles? The answer, I believe, is no. The vandalized or inaccurate information in that list was already in Wikipedia, but it was only apparent to those who ventured to the article Gay icon. Now the information, accurate or not, has been spread around to some 238 articles. So if we assume that only one out of four Wikipedia articles has a full-time editor (i.e. someone who watches the page to make sure it isn't vandalized or otherwise made inaccurate), and further assume that the article Gay icon had five full-time editors, I have increased the number of editors overseeing this potentially incorrect information from five to 60. And these are only very conservative figures: Gay icon probably didn't have five full-time editors (for instance, LaToya Jackson was listed as "American singer/trainwreck"), and I'm sure more than 1 in 4 Wikipedia articles has at least one full-time editor.
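Spelled out, the "five to 60" estimate uses only the assumptions stated in the paragraph: one full-time editor on one in four articles, applied to the 238 articles that now carry the category, compared with the five editors assumed for Gay icon alone.

```latex
\underbrace{238}_{\text{articles}} \times \underbrace{\tfrac{1}{4}}_{\text{editors per article}}
  = 59.5 \approx 60 \quad\text{watchers, versus } 5 \text{ on the single list page}
```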

My argument is that by blindly merging information from lists into categories where appropriate, fact-checking will be more effective. Another advantage to the technique of blindly merging information is that it can be done by a script or a bot.

I am, therefore, proposing a categorization project in conjunction with the redundant list policy proposed in this policy proposal. I am proposing that all lists on Wikipedia that can be made into categories, or already have been made into categories, be merged into said categories by a script. My work on the Gay Icon Project (as I am beginning to call it) is a test case for this.
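The grunt work of such a script could look roughly like the sketch below: pull every linked title out of the list's wikitext, then append the category tag to each article that does not already carry it. Everything here is hypothetical: the article texts are assumed to be a local dictionary (no real bot framework or wiki API is used), and the tag check is simplified (it accepts an optional sort key but does not treat spaces and underscores as equivalent, as the wiki does).

```python
import re

# [[Target]] or [[Target|label]]; group 1 captures the target title.
LINK_RE = re.compile(r"\[\[([^\]|]+)(?:\|[^\]]*)?\]\]")


def linked_titles(list_wikitext):
    """Article titles linked from a list, in order, without duplicates or category tags."""
    titles = []
    for m in LINK_RE.finditer(list_wikitext):
        title = m.group(1).strip()
        if not title.lower().startswith("category:"):
            titles.append(title)
    return list(dict.fromkeys(titles))


def has_category(text, category):
    """True if `text` already contains [[Category:<category>]], with or without a sort key."""
    pattern = r"\[\[\s*Category\s*:\s*" + re.escape(category) + r"\s*(\|[^\]]*)?\]\]"
    return re.search(pattern, text) is not None


def merge_list_into_category(list_wikitext, articles, category):
    """Tag every article linked from the list with `category`.

    articles: dict mapping title -> wikitext (hypothetical local copy).
    Returns (updated_articles, changed_titles, missing_titles).
    """
    updated = dict(articles)
    changed, missing = [], []
    for title in linked_titles(list_wikitext):
        if title not in updated:
            missing.append(title)  # linked from the list but no such article
            continue
        if not has_category(updated[title], category):
            updated[title] = updated[title].rstrip("\n") + f"\n[[Category:{category}]]\n"
            changed.append(title)
    return updated, changed, missing
```

Reporting the missing titles separately leaves red links on the list for a human to review rather than silently dropping them.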

I recognize that this may not be the best place to post this, but in order for this project to happen, we need to have consensus, manpower, and someone to write a script/bot for the grunt work, and this looked like a good place to find some of those things.

Philwelch 23:21, 24 Mar 2005 (UTC)

Proposal posted at Wikipedia:Merge lists to categories Philwelch 05:55, 25 Mar 2005 (UTC)
It seems to me that List of computer viruses, cited in the proposed policy, is a good counterexample to its wisdom. Since computer viruses cause billions of dollars of damage, having a comprehensive list arguably has at least as much value as a list of Fictional chemical substances. New viruses appear so frequently that one can expect that most entries on the list will not be articles. Being able to quickly scroll through the entire list (not possible with large categories) is also useful. Finally, a list like this could (and should) eventually be annotated, say by adding the year of first appearance and operating system affected. Once it's a category, that becomes hard. Another example is legal cases, whose names tell nothing about their content. A list of cases on some area of law is useful in itself and can later be annotated. A category cannot. Compare for example List of leading legal cases in copyright law with Category:U.S. copyright case law. --agr 12:49, 4 Apr 2005 (UTC)

Graphical Category Mapping Tool

I've posted similar thoughts in a few other places, but I'll throw it up here as well: Is there any software tool that could be used to render graphical maps of wiki categories? For example, I could tell the engine to map Category:Boardsports, and it would render a graphical map of the sub-categories and articles "below" that category. Obviously, there would have to be some software limits to prevent it from going more than X categories deep. Still, such software would be very helpful in cleaning up categorization and making wiki categories more of a useful tool. I know such mapping software exists commercially, so I'm hoping someone has adapted it to work with wiki. Feco 20:48, 10 Apr 2005 (UTC)

Yes, a tool does exist! I only made it yesterday/today. I have managed to get Java to create an XML file from a local copy of the database, and created a file that works with FreeMind. See User:GregRobson/categorymap for more detail, and links.
The category namespace seems to be fairly tidy, but seems to lack organisation (perhaps it's because people do not see the "big picture" when they view a single category at a time). So I founded WikiProject Categories, to try and tie everything together. This tool is one of the things that should help. Now if I can get my degree finished I could crack on with this a lot quicker ;) Greg Robson 17:51, 17 Apr 2005 (UTC)
I've also written such a category view maker. See User:JesseW#Category_Browsing_Tools. It's not web-based, so you'll have to either download the scripts (written in Python) and the category and cur tables, and use them yourself, or just leave me a talk page message and I'll do it for you. If anyone knows of somewhere this should be publicised (maybe Wikipedia:Tools?) please do so or let me know. JesseW 20:11, 9 Jun 2005 (UTC)
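For anyone wanting to reproduce the FreeMind export without the original Java or Python tools, the sketch below writes a nested outline of a category tree. It assumes FreeMind's .mm format is a `<map>` root holding nested `<node TEXT="...">` elements, and the `version` value is likewise an assumption; neither tool's actual code is reproduced here. The category data is again a hypothetical in-memory dictionary.

```python
import xml.etree.ElementTree as ET


def to_freemind(children, root, max_depth):
    """Render the category tree under `root` as FreeMind-style XML text.

    children:  dict mapping a category name to the names filed under it
               (hypothetical in-memory stand-in for the database).
    max_depth: levels to descend below `root`.
    """
    map_el = ET.Element("map", version="0.8.0")  # version string is an assumption

    def add(parent_el, name, depth, path):
        node = ET.SubElement(parent_el, "node", TEXT=name)
        if depth < max_depth:
            for child in children.get(name, []):
                # Skip names already on the current branch so a category
                # loop cannot nest forever in an outline format.
                if child not in path:
                    add(node, child, depth + 1, path | {child})

    add(map_el, root, 0, {root})
    return ET.tostring(map_el, encoding="unicode")
```

An outline like this can only show a tree, so looped categories are cut at the point where they repeat; a general graph format keeps those back-edges.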

DOT format

If anyone can come up with this in DOT language format, WikiTeX has a Mediawiki extension which will be able to render this directly (see here). Note that this tool can handle cyclic graphs if necessary. HTH HAND --Phil | Talk 08:49, Jun 10, 2005 (UTC)
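Producing DOT from a list of parent/child category pairs is mostly a matter of quoting names correctly. A minimal sketch that emits plain Graphviz DOT; whether the WikiTeX extension expects any extra wrapping around the graph is not covered here. Because DOT describes a general directed graph, a loop such as a subcategory that lists its own parent is simply written as another edge.

```python
def to_dot(edges, graph_name="categories"):
    """Render (parent, child) pairs as a Graphviz DOT digraph string."""

    def q(s):
        # DOT quoted IDs: escape backslashes first, then double quotes.
        return '"' + s.replace("\\", "\\\\").replace('"', '\\"') + '"'

    lines = [f"digraph {q(graph_name)} {{"]
    for parent, child in edges:
        lines.append(f"  {q(parent)} -> {q(child)};")
    lines.append("}")
    return "\n".join(lines)
```

For example, `to_dot([("Boardsports", "Skateboarding"), ("Skateboarding", "Boardsports")])` yields a two-node graph with edges in both directions, which Graphviz can lay out directly.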

Close?

Can this proposal be closed as accepted or rejected? Hiding 5 July 2005 09:16 (UTC)

  • Good point, it's kind of old. I've flagged it as historical, since the discussion above shows no plain consensus either way. Radiant_>|< July 5, 2005 10:07 (UTC)