Wikipedia talk:WikiProject Mountains/List of mountains


Re-generating the list

Originally, "What links here" was used to list them and the names extracted to build this list. However, as the list approached 500 entries, the extraction method needed to be changed to search a database dump imported into a local copy of an MySQL database. At the time, "What links here" only showed up to 500 links. However, Wikimedia software changes have increased the limit to 5,000 links at a time.

Using What links here

Prior to 2023, the Pywikipedia robot framework was used to extract the links quickly into a format that could be pasted into the article. However, that framework has not been kept up to date and no longer works with Python 3.10+, so in 2023 it was replaced by an awk script.

  1. Get the list of links to {{Infobox mountain}} by clicking the following link (if you are using tabs in Firefox, you might want to use the key combination to open the link in a new tab).
    http://en.wikipedia.org/w/index.php?title=Special:Whatlinkshere/Template:Infobox_mountain&limit=5000
  2. Save the page as a local file (in Firefox, select "Save Page As..." from the File menu). Since this will be repeated several times, save using a name of the form im_links_<date>_1a.html, where <date> is in the form YYYYMMDD.
  3. Click "Next 5,000 links" and save that page to im_links_<date>_1b.html. Repeat this step until there are no more links. As of July 2023, you should end up with 6 HTML files (1a-1f). A scripted alternative using the MediaWiki API is sketched after this list.
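
As an alternative to saving the pages by hand, the same set of titles can be pulled from the MediaWiki API with a list=embeddedin query, which returns the pages that transclude a template and supports continuation. The script below is only a sketch of that alternative, not part of the documented process: it assumes curl and jq are installed, and the output file name im_titles.txt is chosen here purely for illustration.

#!/bin/sh
# Sketch only: list mainspace pages that transclude Template:Infobox mountain
# via the MediaWiki API (list=embeddedin), 500 titles per request.
# Assumes curl and jq are installed; the output file name is illustrative.
api="https://en.wikipedia.org/w/api.php"
out=im_titles.txt
cont=""
: > "$out"
while : ; do
    resp=$(curl -s -G "$api" \
        --data-urlencode "action=query" \
        --data-urlencode "format=json" \
        --data-urlencode "list=embeddedin" \
        --data-urlencode "eititle=Template:Infobox mountain" \
        --data-urlencode "einamespace=0" \
        --data-urlencode "eilimit=500" \
        ${cont:+--data-urlencode "eicontinue=$cont"})
    printf '%s\n' "$resp" | jq -r '.query.embeddedin[].title' >> "$out"
    cont=$(printf '%s\n' "$resp" | jq -r '.continue.eicontinue // empty')
    [ -n "$cont" ] || break
done
wc -l "$out"

Note that the API returns titles with spaces rather than underscores, so the output is not directly interchangeable with the [[...]] links produced by ew.awk; treat it as a starting point rather than a drop-in replacement.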

Process using shell script

  1. Save the following shell script as wlh_im.sh in your build directory. Save ew.awk in the same directory.
  2. Run it, specifying the run date. e.g. ./wlh_im.sh 20230702
  3. If you are not running Mac OS X, you may need to change the last command "open" to a suitable command for your system.
#!/bin/sh
# Shell script to extract wiki links and generate a list of these pages.
# 2023-07-02 Replaced Pywikipedia usage with awk

if [ "$#" -lt 1 ]
then
    echo "Specify a run date"
    exit 1
fi

rundate=$1
prefix=im_links
s2_file=${prefix}_${rundate}_s2.html
s3_file=${prefix}_s3.txt
s4_file=${prefix}_s4.txt
s5_file=${prefix}_s5.txt
s6_file=${prefix}_s6.txt
s7_file=${prefix}_s7.txt

echo "Concatenating link files into $s2_file"
cat im_links_${rundate}_1?.html > $s2_file

echo "Extracting wikilinks from $s2_file to $s3_file"
awk -f ew.awk $s2_file > $s3_file
if [ "$?" != "0" ]; then
    echo "[Error] Wiki links extraction failed!" 1>&2
    exit 1
fi

echo "Sorting $s3_file into $s4_file"
sort $s3_file -o $s4_file

echo "Removing duplicate lines from $s4_file into $s5_file"
uniq $s4_file $s5_file

echo "Removing non-mainspace article links"
grep -v -e "^\[\[Special\:" -e "^\[\[Wikipedia\:" -e "^\[\[Portal\:" -e "^\[\[Template\:" -e "^\[\[Template_talk\:" \
	-e "^\[\[User\:" -e "^\[\[User_talk\:" -e "^\[\[Help\:Contents\]\]" -e "^\[\[Main_Page\]\]" \
	-e "^\[\[Wikipedia_talk\:" -e "^\[\[Category\:" -e "^\[\[Help\:" -e "^\[\[Help_talk\:"  \
	-e "^\[\[Talk\:" -e "^\[\[Module\:" -e "^\[\[Module_talk\:" -e "^\[\[Draft\:" \
	-e "^\[\[Category_talk\:"  -e "^\[\[File_talk\:" -e "^\[\[Privacy_policy\]\]" \
	$s5_file >$s6_file

echo "Inserting #"
sed -e "s/^/# /" $s6_file > $s7_file
if [ "$?" != "0" ]; then
    echo "[Error] sed insertion failed!" 1>&2
    exit 1
fi

wc -l $s7_file
open $s7_file
ew.awk (the companion awk script referenced above):
# Extract wiki page links from saved "What links here" HTML pages.
# Mainspace pages are printed as [[Page_name]]; links whose page name
# contains a colon (other namespaces) are counted as ignored.
BEGIN { matches = 0; ignored = 0 }

/\/wiki\/[^"]*/  {
    matches++
    s1 = match($0, /\/wiki\/[^"]*/)
    if (s1 != 0) {
        # Skip the leading "/wiki/" (6 characters) to get the page name
        page = substr($0, RSTART+6, RLENGTH-6);
        s2 = match(page, ":")
        if (s2 == 0)
            printf("[[%s]]\n", page)
        else
            ignored++
    }
}

END {
    # Report the counts on stderr so they do not end up in the extracted list
    printf("matches = %d, ignored = %d\n", matches, ignored) > "/dev/stderr"
}
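
For illustration, each saved "What links here" page contains anchors pointing at /wiki/ paths; ew.awk keeps the mainspace pages and counts anything with a namespace prefix as ignored. A quick check against two made-up lines of that shape:

printf '%s\n' \
    '<li><a href="/wiki/Mount_Everest" title="Mount Everest">Mount Everest</a></li>' \
    '<li><a href="/wiki/Talk:Mount_Everest" title="Talk:Mount Everest">Talk:Mount Everest</a></li>' \
    | awk -f ew.awk
# Prints [[Mount_Everest]] on stdout and "matches = 2, ignored = 1" on stderr.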

Process manually

  1. Concatenate all the HTML files you saved into a single file, redirecting the output to a file (e.g. links1.html).
  2. Run the awk script (ew.awk above) on that file and redirect its output to links2.txt.
  3. Sort the file and redirect the output to another file (if you have a Unix-based system such as Mac OS X or Linux, use the "sort" command).
    sort links2.txt > links_sorted.txt
  4. "What links here" probably produced duplicate lines, so use the Unix "uniq" command to remove them.
    uniq links_sorted.txt links_unique.txt
  5. Edit the file and remove any common site links as well as any pages in the Wikipedia, Talk and User namespaces.
  6. Add "# " to the start of each line. Again, if you have a Unix-based system, you can do this in vi (:%s/^/# /) or with sed (sed -e 's/^/# /').
  7. Copy and paste the updated list into the List of mountains. (Steps 1-6 can also be run as the single pipeline sketched after this list.)
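
For reference, steps 1-6 can be collapsed into a single pipeline, which is essentially what wlh_im.sh above does. This is a sketch that assumes the saved pages follow the im_links_<date>_1?.html naming and that ew.awk is in the current directory; the namespace filter is abbreviated to the most common prefixes, and the output file name is just an example.

cat im_links_20230702_1?.html \
    | awk -f ew.awk \
    | sort \
    | uniq \
    | grep -v -e "^\[\[Wikipedia\:" -e "^\[\[Template\:" -e "^\[\[User\:" \
              -e "^\[\[Talk\:" -e "^\[\[Category\:" -e "^\[\[Main_Page\]\]" \
    | sed -e "s/^/# /" > links_final.txt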

Using a database dump

NOTE: This method has not been used in several years as the process above is simply much faster and easier. However, this approach has been saved here for posterity.

To re-generate the list using a database dump:

  1. Install MySQL version 4.x.
  2. Download the latest version of the English database dump from http://download.wikipedia.org. You need a broadband connection or you might as well forget about it.
  3. Decompress the database dump using bzip2 (already installed on Mac OS X).
  4. Create a Wikipedia database:
    mysql -u [user name]
    create database wikipedia;
  5. Import the database dump (takes about two hours):
    mysql -u [user name]
    use wikipedia;
    source 20050309_cur_table.sql;
  6. Run the following query (15-20 minutes) to extract articles that have {{mountain}} on their talk page:
    tee mountains.txt;
    select concat('#[[', cur_title, ']]') from cur where cur_namespace=1 and locate('{{Mountain}}',cur_text) > 0;
  7. Edit mountains.txt and format the file for Wikipedia use. The tee output includes MySQL's table borders, so strip the leading "| " and the trailing " |" from each result line. If you are using vi, try:
    :%s/^| //
    :%s/\]\] *|$/\]\]/
  8. Copy and paste the updated list into this article.
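
For illustration, the tee file captures the query result together with MySQL's table borders, so the rows look roughly like the made-up sample below (column widths vary). The grep/sed line is an equivalent clean-up for anyone not using vi; the output file name is just an example.

# Hypothetical excerpt of mountains.txt as captured by tee:
#   +---------------------------------+
#   | concat('#[[', cur_title, ']]')  |
#   +---------------------------------+
#   | #[[Mount_Everest]]              |
#   | #[[K2]]                         |
# Keep only the result rows, then strip the leading "| " and trailing " |":
grep '^| #\[\[' mountains.txt | sed -e 's/^| //' -e 's/\]\] *|$/\]\]/' > mountains_clean.txt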

You should have at least 10 GB of free disk space to accommodate the decompressed database dump and the database instance.