- The following discussion is an archived debate. Please do not modify it. Subsequent comments should be made in a new section. The result of the discussion was Approved.
Operator: Mikaey
Automatic or Manually Assisted: Automatic, unsupervised
Programming Language(s): C#
Function Overview: Puts a DEFAULTSORT tag into biography articles that do not have one.
Edit period(s): Continuous
Already has a bot flag (Y/N): N
Function Details:
1. Grab a list of talk pages from Category:Biography articles with listas parameter.
2. Check the corresponding article page for a DEFAULTSORT tag. If it exists, jump to step 7.
3. Grab the value of the listas parameter from the {{WPBiography}} banner on the talk page.
4. Insert {{DEFAULTSORT}} at the end of the article with the value grabbed from listas.
5. Remove sort keys from any category tags with a sort key equal to the new DEFAULTSORT value.
6. Save the page.
7. Proceed to the next article and jump to step 2.
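Steps 2 and 3 above could be sketched roughly as follows. This is a Python illustration only, not the bot's C# source; the helper names and both regular expressions are assumptions made for the example. In particular, the listas pattern matches any `| listas =` line on the talk page rather than parsing the {{WPBiography}} banner properly.

```python
import re

# Hypothetical patterns for illustration; the real bot's matching rules are not shown in this discussion.
DEFAULTSORT_RE = re.compile(r"\{\{\s*DEFAULTSORT\s*[:|]")
LISTAS_RE = re.compile(r"\|\s*listas\s*=\s*([^|}\n]*)")


def has_defaultsort(article_text):
    """Step 2: report whether the article already carries a DEFAULTSORT tag."""
    return DEFAULTSORT_RE.search(article_text) is not None


def get_listas(talk_text):
    """Step 3: return the listas value from the talk page, or None if absent or empty."""
    match = LISTAS_RE.search(talk_text)
    if match is None:
        return None
    value = match.group(1).strip()
    return value or None
```

A page for which `has_defaultsort` returns True would be skipped; otherwise the value from `get_listas` is what would be written into the new tag.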
I'll come right out with it: this is pretty much ListasBot 1/ListasBot 4 in reverse. This was requested by Carcharoth; you can see the conversation(s) here and here.
- Roughly how many articles would be edited on the initial run?
Nakon 04:12, 20 May 2009 (UTC)
- Sorry for the delay -- I didn't quite know how to answer that. So, I wrote out the bot's code, and just commented out the part where it actually commits the change back to the Wiki. I let it run for 500 articles, and it ended up "editing" 139 of them. So, that's a 27.8% edit rate, and with Category:Biography articles with listas parameter having 609,677 pages as of this writing, I'm going to have to put my estimate at (about) 170,000.
Matt (talk) 05:31, 20 May 2009 (UTC)
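The estimate above is a simple extrapolation from the 500-article dry run. A small Python check of the arithmetic, using only the figures quoted in the comment (the function name is made up for the example):

```python
def estimate_edits(sampled, would_edit, category_size):
    """Scale the dry-run edit rate up to the whole category."""
    return round(category_size * would_edit / sampled)


rate = 139 / 500
print(f"{rate:.1%}")                      # 27.8%
print(estimate_edits(500, 139, 609_677))  # 169490
```

That rounds to the "about 170,000" figure, on the assumption that the first 500 pages are representative of the category as a whole.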
As long as you're fairly sure the listas parameter is accurate most of the time, this sounds fine to me. Can you publish the source code somewhere? --
MZMcBride (talk) 05:34, 20 May 2009 (UTC)
- Sure thing. And I think "most of the time" would be pretty accurate. DEFAULTSORT/listas falls under more scrutiny than I would have thought when I started writing bots.
Matt (talk) 05:44, 20 May 2009 (UTC)
- If you want to add additional tasks: at WP:CHECKWIKI#Category_DEFAULTSORT missing for titles with special letters (partial AWB, partial BOT), there is a list of pages that need sortkeys. -- User:Docu
- Respectfully, I think I'm going to pass on that one, at least for right now. The intent here is to stick to biographical articles, or at least articles where a bot can safely pick out a DEFAULTSORT key on its own. From glancing at the list you gave, it doesn't appear that a bot would be able to safely pick out a DEFAULTSORT tag in all situations for those articles.
Matt (talk) 19:23, 20 May 2009 (UTC)
- I can understand, it was just a thought. (BTW, part of it is easy to do: (1) if it's not a bio, the sortkey should basically be the title stripped of diacritics. (2) Some articles in the group are false positives ({{lifetime}} isn't taken into account). (3) Numbers are probably better checked on a per-category basis. The remainder should be (4) bios.) -- User:Docu
- Would you be willing to convert special characters and remove apostrophes from the listas values before you put them in more places? –
Quadell (talk) 19:51, 20 May 2009 (UTC)
- Erm...the code already does that. If you look at the code, that's what the StripPunctuationAndUnicode function does.
Matt (talk) 22:34, 20 May 2009 (UTC)
- Excellent. –
Quadell (talk) 23:20, 20 May 2009 (UTC)
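For readers without the source to hand, the kind of clean-up asked about here (converting special characters and removing apostrophes) might look like the Python sketch below. It is not the bot's StripPunctuationAndUnicode function, whose C# body is not reproduced in this discussion; the function name and the exact rules are assumptions.

```python
import unicodedata


def strip_apostrophes_and_diacritics(value):
    """Hypothetical clean-up of a listas value before it is reused as a sort key."""
    # Decompose accented letters into base letter + combining mark, then drop the marks.
    decomposed = unicodedata.normalize("NFKD", value)
    without_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Remove straight and curly apostrophes.
    return without_marks.replace("'", "").replace("\u2019", "")
```

Letters such as "ø" or "ł" have no Unicode decomposition, so a production version would need an explicit mapping table for those; this sketch leaves them untouched.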
What will you do about sort keys already in place? What if they're the same? What if they're different? Will it remove duplicate sort keys? Will it remove non-duplicates? Should it? --
MZMcBride (talk) 20:25, 20 May 2009 (UTC)
- Are we talking about a situation where a DEFAULTSORT tag is already on the page? In that situation, the bot skips over the page.
Matt (talk) 22:34, 20 May 2009 (UTC)
- No, no. I'm talking about a page like this:
[[Category:Foo artists|Smith Jones, Betty]]
[[Category:2039 births]]
[[Category:Rotarians|Jones, Betty Smith]]
- And the listas parameter is "| listas = Smith Jones, Betty". How would the bot deal with such a case? Does it remove the exact duplicates? Does it remove the non-exact duplicates? Should it? --
MZMcBride (talk) 22:41, 20 May 2009 (UTC)
- I've never really thought to look at category tags before, so the answer to "how would the bot deal with it" would be that it would pull the listas value and leave the category tags alone. Best solution I can think of is to remove the sorting keys on category tags and just let DEFAULTSORT take over. However, which one should the bot pick to use for DEFAULTSORT? I don't know. My inclination would be to use listas for everything, for consistency's sake.
Matt (talk) 00:20, 21 May 2009 (UTC)
- You don't want to remove sorting keys on category tags with a bot. There are some cases where the pipe sort key should be different for an individual category. For instance, Category:Richard Nixon in the Richard Nixon article should be piped to a single space; and Category:Dukes of York in Henry VIII of England should be piped to "301". 99% of the time they should be removed, but we don't want a bot to remove them because that last 1% is important. –
Quadell (talk) 01:58, 21 May 2009 (UTC)
- Addendum: I think listas would be your best pick. –
Quadell (talk) 02:01, 21 May 2009 (UTC)
- I was thinking more about the instances where the pre-existing sort keys are identical to the new DEFAULTSORT. --
MZMcBride (talk) 02:24, 21 May 2009 (UTC)
- *scratches head*...so, [[Category:Richard Nixon| ]] in the Richard Nixon article causes the article to appear at the very top of Category:Richard Nixon, and [[Category:Dukes of York|301]] in the Henry VIII of England article causes the article to appear 301st in the list (assuming that every other article in that category was also similarly numbered)? Am I understanding that right?
Matt (talk) 03:16, 21 May 2009 (UTC)
- Yep. –
Quadell (talk) 11:04, 21 May 2009 (UTC)
- Ok, I understand what MZMcBride is trying to say. So, basically, if any category tags have a sort key equal to what we're putting in as the DEFAULTSORT, they can be taken out, since they would be redundant at that point.
Done. Note that I've changed the function details above to reflect that.
Matt (talk) 05:46, 21 May 2009 (UTC)
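The rule agreed on here (drop a category sort key only when it is identical to the new DEFAULTSORT value, and leave every other key alone) can be sketched in Python as follows. The regular expression and function name are illustrative assumptions, not the bot's C# code.

```python
import re

# Matches [[Category:Name|sort key]]; categories without a pipe are not matched and stay as they are.
CATEGORY_WITH_KEY_RE = re.compile(r"\[\[(Category:[^\]|]+)\|([^\]]*)\]\]")


def drop_redundant_sort_keys(text, default_sort):
    """Remove category sort keys that exactly equal the DEFAULTSORT value."""
    def replace(match):
        if match.group(2) == default_sort:
            return "[[" + match.group(1) + "]]"
        return match.group(0)  # different key (e.g. " " or "301"): keep it
    return CATEGORY_WITH_KEY_RE.sub(replace, text)
```

Applied to the three-category example earlier in this thread with a DEFAULTSORT of "Smith Jones, Betty", only the Foo artists key would be removed; the Rotarians key "Jones, Betty Smith" differs and is kept.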
Approved for trial (50 edits). Please provide a link to the relevant contributions and/or diffs when the trial is complete. Let's see it go. –
Quadell (talk) 11:04, 21 May 2009 (UTC)
Trial complete. Did fix a minor bug or two along the way. One thing I'm not quite sure about -- if a category sort key has a space before the name, it'll get shot up to the top of the list in that category. If it's still identical to the new DEFAULTSORT value (without the leading whitespace), should it still be removed? I can see instances where you'd want it at the top of the category (such as in the Richard Nixon example above), but that could be done with a sort key of just a single space, instead of the entire name.
Matt (talk) 19:32, 21 May 2009 (UTC)
- I just tested that, and damn, you're right. I'm gonna go out on a limb here and say that if a sortkey is whitespace plus the DEFAULTSORT value, it's always a typo, and should be treated like it was just the DEFAULTSORT value (i.e. taken out). –
Quadell (talk) 00:05, 22 May 2009 (UTC)
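The refinement proposed in this exchange is that a key made of leading whitespace plus the DEFAULTSORT value counts as redundant, while a key that is only a space (the Richard Nixon case) is left in place. A minimal Python sketch of that test, with a made-up function name and assuming the DEFAULTSORT value itself is non-empty:

```python
def is_redundant_sort_key(key, default_sort):
    """True if the key equals the DEFAULTSORT value, ignoring leading whitespace only."""
    # A key of just " " strips to "", which never equals a non-empty DEFAULTSORT value,
    # so deliberate "sort to top" keys are not flagged.
    return key.lstrip() == default_sort
```

Whether trailing whitespace should be ignored as well is not settled in the discussion, so the sketch leaves it significant.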
- Agreed. The only other (minor) issue is putting DEFAULTSORT above the category links, not below them. It screws with editors to have it below and we end up getting duplicate code by people who don't see it below the categories. --
MZMcBride (talk) 00:08, 22 May 2009 (UTC)
- I think I can swing that. Do you want another (short) trial to see if I got it right?
Matt (talk) 00:12, 22 May 2009 (UTC)
- Sure. Another 20 edits or so should be fine. Assuming the trial edits are problem-free (I'm sure they will be), I have no objection to approving the bot. --
MZMcBride (talk) 00:14, 22 May 2009 (UTC)
The bot should skip pages which already have {{Lifetime}}, just as it does for DEFAULTSORT. Also, could it put DEFAULTSORT in its traditional position immediately before the first category?
MANdARAX • XAЯAbИAM 03:41, 22 May 2009 (UTC)
- I suppose...could we have it skip pages with {{Lifetime}} only if the template has at least 3 parameters (thereby ensuring that the DEFAULTSORT parameter is filled in)? And the code has been modified to put the DEFAULTSORT before the first category tag; we're just waiting for approval of some sort before it takes effect.
Matt (talk) 03:47, 22 May 2009 (UTC)
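The proposed {{Lifetime}} check could look something like this in Python. It is a sketch under assumptions: the template is taken to be written on one level with no nested templates, the third positional parameter is taken to be the sort key (as the comment above implies), and the function name is invented. The sketch also requires that third parameter to be non-empty, which goes slightly beyond "at least 3 parameters".

```python
import re

# Matches {{Lifetime|...}} or {{lifetime|...}} with no nested braces inside.
LIFETIME_RE = re.compile(r"\{\{\s*[Ll]ifetime\s*\|([^{}]*)\}\}")


def lifetime_supplies_sort_key(text):
    """True if the page has a Lifetime template whose third parameter is filled in."""
    match = LIFETIME_RE.search(text)
    if match is None:
        return False
    params = match.group(1).split("|")
    return len(params) >= 3 and params[2].strip() != ""
```

A page for which this returns True would be skipped in the same way as a page that already has DEFAULTSORT.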
- I've never encountered a Lifetime without the sort key parameter, but it wouldn't hurt to check. The ideal action in such a case would be either to fill in the Lifetime's sort key, or expand the birth/death categories, remove the Lifetime, and add the DEFAULTSORT. But I expect that situation to be extremely rare, so it probably isn't worth the extra effort, and I could live with both DEFAULTSORT and Lifetime on the page. Incidentally, about half of the bot's edits which I examined added DEFAULTSORT to pages with Lifetime.
- And I discovered another item in need of tweaking. The bot removes parentheses (e.g. from Cabinessence (band)), but it should also remove what's inside the parentheses. According to WP:Categorization of people#Ordering names in a category, "The sort key should mirror the article's title as closely as possible, while omitting disambiguating terms."
MANdARAX • XAЯAbИAM 07:23, 22 May 2009 (UTC)
- Okey doke, Done. I'll get that change mirrored into ListasBot shortly.
Matt (talk) 07:28, 22 May 2009 (UTC)
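The change requested above, dropping the disambiguating term together with its parentheses, amounts to something like the following Python sketch. The function name is hypothetical, and only a single trailing parenthetical is handled, which is an assumption about how disambiguators appear in the input.

```python
import re


def strip_disambiguator(title):
    """Remove one trailing parenthetical, e.g. "(band)", and the space before it."""
    return re.sub(r"\s*\([^()]*\)\s*$", "", title)


print(strip_disambiguator("Cabinessence (band)"))  # Cabinessence
```

Parentheses in the middle of a name are left alone by this pattern, since they may not be disambiguators.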
Approved for trial (20 edits). Please provide a link to the relevant contributions and/or diffs when the trial is complete. Just to make MZ's comments official. –
Quadell (talk) 14:48, 22 May 2009 (UTC)
Trial complete. I think it looks pretty good. I didn't have to make any changes mid-run this time around.
Matt (talk) 20:02, 22 May 2009 (UTC)
- Thanks for implementing my suggestion, which I see worked for Da Vinci (band). I have an additional tiny formatting request. Could you have the DEFAULTSORT immediately above the categories without a blank line in between? That's how it's almost always formatted.
MANdARAX • XAЯAbИAM 21:21, 22 May 2009 (UTC)
- Sure thing. Everyone happy with the results?
Matt (talk) 04:52, 24 May 2009 (UTC)
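The placement settled on in this thread (DEFAULTSORT directly above the first category link, with no blank line between them) can be illustrated with the Python sketch below. It is not the bot's code: the function name is made up, category links are assumed to start at the beginning of a line, and the fallback of appending at the end when no category is found is an assumption based on the original function details.

```python
import re

FIRST_CATEGORY_RE = re.compile(r"^\[\[Category:", re.MULTILINE)


def insert_defaultsort(text, sort_key):
    """Place {{DEFAULTSORT:...}} on the line directly above the first category link."""
    tag = "{{DEFAULTSORT:" + sort_key + "}}\n"
    match = FIRST_CATEGORY_RE.search(text)
    if match is None:
        # No category links found: fall back to appending the tag at the end of the page.
        return text.rstrip("\n") + "\n\n" + tag
    return text[:match.start()] + tag + text[match.start():]
```

Because the tag ends with a single newline and is spliced in at the start of the first category line, no blank line appears between the two.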
Approved. Looks good. –
Quadell (talk) 14:28, 24 May 2009 (UTC)
- The above discussion is preserved as an archive of the debate. Please do not modify it. Subsequent comments should be made in a new section.