User Talk:citationcleanerbot Archive 1

This is an archive of past discussions. Do not edit the contents of this page. If you wish to start a new discussion or revive an old one, please do so on the current talk page.


Misspelling

(del/undel) 21:12, 5 September 2011 (diff | hist) m The New England Journal of Medicine ‎ (Various citation cleanup. Report suggestoins and problems at User talk:CitationCleanerBot using AWB)
"suggestions" is misspelled. :P Reaper Eternal ( talk) 19:26, 6 September 2011 (UTC)

Good spot. Headbomb { talk / contribs / physics / books} 19:38, 6 September 2011 (UTC)

Question

It removed the access date here, along with replacing the URL with |jstor=. Assuming this is intentional, what's the reasoning for this? Cúchullain ^t/ c 02:11, 11 September 2011 (UTC)

Because it's what should be done? |url= should usually be used for a freely accessible version if possible. It's not wrong to link to a source behind a paywall, but in this case we have |jstor= which will give the link to jstor and free |url= in case a free version is found. It also explicitly tells the reader where they will land (aka, you're going to the JSTOR website when clicking on the JSTOR link). It also makes the jobs of bots easier, and cleans the appearance of citations in printed form. E.g. instead of seeing something like

Portanova, Mary Spaulding (May 1975). "Music Is Beauty (http://www.jstor.org/stable/1214290)". The Black Perspective in Music 3 (2): 196–198

They will see

Portanova, Mary Spaulding (May 1975). "Music Is Beauty". The Black Perspective in Music 3 (2): 196–198. JSTOR 1214290.

The accessdate is removed, since there's no url, and thus would not be displayed anyway. (This also prevents a bad accessdate from displaying, in case someone adds a url to a free version later on, but forgets to update the accessdate).

Hope this answers the "why". Headbomb { talk / contribs / physics / books} 02:23, 11 September 2011 (UTC)

Oh, okay, I didn't realize the access date doesn't show if |jstor= is used. Seems like it ought to, but no big deal. Cúchullain ^t/ c 03:25, 11 September 2011 (UTC)

Well JSTOR is a fixed resource. Makes no difference if you accessed it today or 25 years from now, JSTOR 1214290 will always refer to "Music is Beauty" as it was written in 1975. So there's no reason to display the accessdate in that case. Headbomb { talk / contribs / physics / books} 03:28, 11 September 2011 (UTC)
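The url-to-|jstor= swap described in this thread can be sketched roughly as below. This is only an illustration, not the bot's actual AWB rule: the URL shape (www.jstor.org/stable/ followed by digits) and the names JSTOR_URL and url_to_jstor are assumptions, and removing the access date is not shown.

```python
import re

# Assumption: the URL has the form http(s)://www.jstor.org/stable/<digits>.
# The lookahead leaves the following "|" or "}}" in place.
JSTOR_URL = re.compile(
    r"\|\s*url\s*=\s*https?://www\.jstor\.org/stable/(\d+)(?=\s*(?:\||\}\}))"
)


def url_to_jstor(citation: str) -> str:
    """Rewrite a JSTOR |url= as |jstor=<id>; leave anything else untouched."""
    return JSTOR_URL.sub(r"|jstor=\1", citation)


before = (
    "{{cite journal |last=Portanova |first=Mary Spaulding |date=May 1975 "
    "|title=Music Is Beauty |url=http://www.jstor.org/stable/1214290 "
    "|journal=The Black Perspective in Music |volume=3 |issue=2 |pages=196–198}}"
)
print(url_to_jstor(before))
```

A |url= that points anywhere other than a JSTOR stable link is left as it is, so a free version stays linked from the title.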

Changes to 23 article

The changes made to the 23 article on 2011-09-13 broke at least the first link. I did not check them all, but the changes appeared to be similar. Makyen ( talk) 23:04, 13 September 2011 (UTC)

It's working fine here... Compare [1] with [2] Headbomb { talk / contribs / physics / books} 23:05, 13 September 2011 (UTC)

Originally, I merely clicked on the link in the reference. It did not work for the changed link, producing a Google generated access violation report from the modified link (it appeared to attempt to get the right page from the correct book, but access was not permitted -- the frame for the pdf was generated, but an access violation was shown instead of the pdf). The original link, which I tried second, worked fine. At the time, I only tried the first changed link on both the changed and unchanged (via the page history) articles. The changed links are now working for me. It could have been an intermittent issue with Google. Alternately, a cookie is set upon access to the original link which permits all subsequent accesses. That would produce the results I have experienced, but I have not experimented (removed all cookies) to determine if that is the case.

Scratch that. I duplicated the problem trying the modified link in a different browser (IE) in which I had not visited the original links. Hmmm... However, refreshing the link in the second browser does work.

After a bit more testing: The first access to either the modified, or unmodified, link results in an error reported by Google. Refreshing, or access to another similar link results in correct access to the desired page. In other words, the same results are obtained with both the modified and unmodified links, but the first attempt at either results in an error. Deleting internet history (i.e. cookies) results in the problem reoccurring. Makyen ( talk) 00:35, 14 September 2011 (UTC)

Alright. I'll resume. Headbomb { talk / contribs / physics / books} 04:43, 14 September 2011 (UTC)

removing spacing

The bot is removing spacing which improves the internal readability of citations. For example, on this edit to D.B. Cooper, the vast majority of text changes are to remove spaces between <ref> and {{ as in

before: <ref> {{cite whatever

after: <ref>{{cite whatever

Those spaces greatly enhance human maintainability:

They help guide the eye to identify the elements. Since the <ref> must be butted against preceding text for the resulting html to render correctly, the lack of space introduces a very long block of symbols: much beyond any readability guideline.
A space helps line-length wrapping break at an appropriate place.
In text editors, the "skip to next word" cursor movement is intuitive with a space, but cumbersome without.

If anything, citation cleaning should be going exactly the opposite direction, changing citations to multi-line indented elements:

some fact<ref name="CrimeLibrary2"> {{cite web
  | url = http://www.crimelibrary.com/criminal_mind/scams/DB_Cooper/2.html
  | title = The D.B. Cooper Story: The Crime
  | last = Krajicek
  | first = David
  | work = [[Crime Library]]
  | accessdate = January 3, 2008
 }} </ref>

It should also be arranging the elements into a standard, consistent order. I find the most effective order is a sort of top-to-bottom arrangement of URL, title, author, publisher, work, page, date, accessdate. Also note the spaces before and after the equal signs and the space after the | delimiter. These conventions make the wikitext friendlier for humans to read. In the case of consecutive citations, join them like this to easily identify each citation:

some fact<ref name="refname"> {{cite x
  ...
 }} </ref><ref> {{cite x
  ...
 }} </ref><ref> {{cite x
  ...
 }} </ref>

I notice that AWB does similar transformations as CitationCleaner, and so does Yobot, all of which I revert if there were no valuable edits. — EncMstr ( talk) 18:49, 14 September 2011 (UTC)

Those are part of WP:AWB genfixes. I'm all for greater maintainability, but a space in <ref> {{cite .... or }} </ref> adds little. Linebreaks are much better. Anyway, the bot isn't doing those edits on its own, the point of that one was to remove a few spurious accessdates. If you want to change this, I suggest going to WT:AWB. Headbomb { talk / contribs / physics / books} 18:57, 14 September 2011 (UTC)

Thanks! I noticed the similarity, but didn't realize that AWB was the basis for this. That's all the better: it should be possible to fix all the trouble in one place. Thanks for the pointer. — EncMstr ( talk) 19:00, 14 September 2011 (UTC)
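The multi-line layout and parameter order proposed earlier in this thread could be produced by something like the sketch below. It is not something AWB or the bot does; the function name format_citation, the PREFERRED_ORDER list and the choice to put unlisted parameters last are assumptions made for illustration.

```python
from typing import Dict

# Order proposed in the thread; parameters not listed here (an assumption)
# are appended afterwards in the order they were given.
PREFERRED_ORDER = [
    "url", "title", "author", "publisher", "work", "page", "date", "accessdate",
]


def format_citation(template: str, params: Dict[str, str]) -> str:
    """Render one citation as a multi-line, indented template call."""
    ordered = [key for key in PREFERRED_ORDER if key in params]
    ordered += [key for key in params if key not in PREFERRED_ORDER]
    lines = ["{{" + template]
    # One parameter per line, with spaces around "=" and after "|".
    lines += [f"  | {key} = {params[key]}" for key in ordered]
    lines.append(" }}")
    return "\n".join(lines)


print(format_citation("cite web", {
    "title": "The D.B. Cooper Story: The Crime",
    "accessdate": "January 3, 2008",
    "url": "http://www.crimelibrary.com/criminal_mind/scams/DB_Cooper/2.html",
}))
```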

Removing title from link

this edit removed the title from a link which is not what I expect it should be doing. Keith D ( talk) 19:31, 14 September 2011 (UTC)

Good spot. I've updated the bot, so this shouldn't happen anymore. Headbomb { talk / contribs / physics / books} 19:35, 14 September 2011 (UTC)

Thanks. Keith D ( talk) 22:04, 14 September 2011 (UTC)

List of professional cyclists who died during a race

I think your Bot fixed everything in this article pretty well, but I was wondering about "accessdate". I thought that accessdate was an acceptable parameter on all citation templates, but the Bot removed them from "citation". Glad the Bot went in and cleaned up the cites, looking through its edits I have found an invalid link and am now trying to find the information in another source. Shearonink ( talk) 04:38, 15 September 2011 (UTC)

|accessdate= should only be used when there's a given url (they aren't displayed if there is no url). It makes little sense to say "I checked this magazine on 26 April 2009" since the content of the magazine will not ever change following publication. So it's better to remove them if there's no url. It prevents a bad accessdate from showing up in the future, if someone adds a url but doesn't bother updating the accessdate. See #Question above for more details on this. Headbomb { talk / contribs / physics / books} 04:42, 15 September 2011 (UTC)

FD

This message was left in a footnote in Fermi-Dirac Statistics. "A bot will complete this citation soon. Click here to jump the queue" . The message didn't appear in the diff and seemed to remain by mistake. The only way I could remove it was by undoing the edit. How did you put the message in the footnote and not have it appear in the diff? -- Bob K31416 ( talk) 15:25, 15 September 2011 (UTC)

The message is built in {{ cite arxiv}} and appears if the title or the author is missing. When that happens, you can click the given link, or wait a bit and User:Citation bot will complete the citation. Headbomb { talk / contribs / physics / books} 17:04, 15 September 2011 (UTC)

Please stop your bot from removing whitespace

It is against policy, bot policy, and it is annoying. I like my blank lines in my citations, thank you. - ʄɭoʏɗiaɲ ^τ _¢ 05:34, 15 September 2011 (UTC)

Hum? Diff? And it's perfectly fine to remove whitespace, as long as it's not the only thing being done. Headbomb { talk / contribs / physics / books} 05:41, 15 September 2011 (UTC)

[3], it was the only thing it was doing, and yes it is regarded as a bad move to change the stylistic setup of an article to suit your own preferences. Bot policy forbids making purely cosmetic changes (something I don't necessarily agree with, but something that is the case). - ʄɭoʏɗiaɲ ^τ _¢ 05:52, 15 September 2011 (UTC)

It certainly wasn't the only thing it was doing. It cleaned up a url. Headbomb { talk / contribs / physics / books} 05:53, 15 September 2011 (UTC)

That produced no visual change on the article nor effect on the link, thus making it purely cosmetic. Are you going to sit and argue semantics or should I just revert it as I see it edit articles on my watchlist? Stop removing blank lines and changing spacing to suit your personal preferences with a bot. - ʄɭoʏɗiaɲ ^τ _¢ 05:58, 15 September 2011 (UTC)

Except it does produce a visual difference, because these URLs show in print version. Headbomb { talk / contribs / physics / books} 05:59, 15 September 2011 (UTC)

How about these two? [4] [5] Anyways, I don't want to argue semantics with you. I've adjusted my watchlist so bot edits show up, and I will be keeping my eye out. Please stop removing blank lines and changing spacing to suit your personal preferences with a bot. - ʄɭoʏɗiaɲ ^τ _¢ 06:13, 15 September 2011 (UTC)

Part of a small minority of (at the moment) unavoidable edits due to limitations of AWB. Headbomb { talk / contribs / physics / books} 06:15, 15 September 2011 (UTC)

Actually those two were avoidable, I've tweaked the bot accordingly. Headbomb { talk / contribs / physics / books} 18:47, 15 September 2011 (UTC)

Accessdates

Are we to no longer use the access date? Bettymnz4 ( talk) 13:00, 8 September 2011 (UTC)

What do you mean exactly? The bot only removes superfluous accessdates (aka on citations without URLs). If there is no URL, the accessdate isn't displayed, and will cause problems if someone adds a URL to the citation in the future.

Suppose Jim adds this citation on 23 March 2010

{{cite journal |author=J. Smith |year=2000 |title=Important article |url=http://www.jstor.com/pss/12346789 |journal=Important journal |volume=1 |issue=2 |pages=3-4 |accessdate=2010-03-23}}

Later someone comes in and cleans the citation to use |jstor= instead of a hard url to the jstor database, but forgets to remove the accessdate.

{{cite journal |author=J. Smith |year=2000 |title=Important article |journal=Important journal |volume=1 |issue=2 |pages=3-4 |jstor=12346789 |accessdate=2010-03-23}}

Now on 2 June 2011, Bob says "hey, I know of a free copy of this article", and adds the url without paying a lot of attention.

{{cite journal |author=J. Smith |year=2000 |title=Important article |url=http://www.example.com |journal=Important journal |volume=1 |issue=2 |pages=3-4 |jstor=12346789 |accessdate=2010-03-23}}

Now you're misled into thinking that the link that's been added by Bob on 2 June 2011 was accessed on 23 March 2010. Headbomb { talk / contribs / physics / books} 14:21, 8 September 2011 (UTC)

If, however, the bot removes an accessdate to a citation with a url, then it's malfunctioning. Headbomb { talk / contribs / physics / books} 14:23, 8 September 2011 (UTC)

Thanks. I was just coming to ask about this bot removing accessdates from references without urls, but this discussion cleared up everything. Still, I always thought it was useful, and something we were supposed to do, to add the accessdate even to a reference without a url. Flyer22 ( talk) 14:47, 17 September 2011 (UTC)

I have to come back to state, though: About people not correcting accessdates, that happens even with references with urls. People update urls all the time without correcting the accessdates. So what makes it so different in the case of references without urls? Flyer22 ( talk) 14:52, 17 September 2011 (UTC)

Yeah when urls are updated accessdates should also be updated, but that's hardly something my bot can prevent or fix. Headbomb { talk / contribs / physics / books} 15:00, 17 September 2011 (UTC)

But is it really a good idea to not have the accessdate to tell us when a citation without a url was added, simply because an editor might make a mistake or show laziness in updating that accessdate? Flyer22 ( talk) 15:16, 17 September 2011 (UTC)

That said, I've noticed that the accessdates don't show up when they don't have a url to back them up anyway, at least for the reference templates I have added them to. So your bot is removing information that people won't see unless they open the article anyway. Flyer22 ( talk) 15:22, 17 September 2011 (UTC)

The reason for accessdate is to account for the changing nature of online resources. If you accessed HyperPhysics in 2004, its content was different then than in 2011. So it's important to tell the reader "BTW, my version might not have been the same as what you are seeing now". For an offline resource, there's no real point in having an accessdate (this is why they aren't even displayed if there is no url). Say you pick up the December 1994 issue of Newsweek on 20 July 2008. You add the citation "J. Smith (December 1994). "Important article" Newsweek pp. 45–74." or something. What exactly would the accessdate add (even assuming it would be displayed)? The content of that December issue of Newsweek won't change.

Many people (myself included) even argue against having accessdates to online citations with publication dates, for the same reason (the content might be taken down or something, but by saying something like "Yves Schutz (26 August 2011) "Heavy ions in Annecy" CERN Courier. CERN.", it's already clear you are citing the 26 August 2011 version, so the accessdate doesn't contain any actually useful information). However other people disagree on this, and think accessdates should be present whenever there is a url, so the bot doesn't touch those. Headbomb { talk / contribs / physics / books} 15:41, 17 September 2011 (UTC)

Thanks for explaining. I understand. Flyer22 ( talk) 16:03, 17 September 2011 (UTC)
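The rule discussed in this thread (an accessdate is only kept when the citation has a url) can be sketched as below. This is a minimal illustration rather than the bot's real logic: the regular expressions and the name drop_orphan_accessdate are assumptions, and urls produced by sub-templates or hidden inside HTML comments are not handled.

```python
import re

# Assumption: a citation "has a url" when |url= holds an http, https, ftp
# or protocol-relative address.
HAS_URL = re.compile(r"\|\s*url\s*=\s*(?:https?:|ftp:)?//\S", re.IGNORECASE)

# Matches |accessdate= or |access-date= and its value, but stops before the
# next "|" or the closing "}}" so neighbouring parameters are left alone.
ACCESSDATE = re.compile(
    r"\s*\|\s*access-?date\s*=[^|}]*?(?=\s*(?:\||\}\}))", re.IGNORECASE
)


def drop_orphan_accessdate(citation: str) -> str:
    """Remove the access date only when the citation carries no url."""
    if HAS_URL.search(citation):
        return citation
    return ACCESSDATE.sub("", citation)


no_url = (
    "{{cite journal |author=J. Smith |year=2000 |title=Important article "
    "|journal=Important journal |volume=1 |issue=2 |pages=3-4 "
    "|jstor=12346789 |accessdate=2010-03-23}}"
)
print(drop_orphan_accessdate(no_url))
```

Applied to the second example citation in this thread (the one with |jstor= but no |url=), the sketch drops |accessdate=2010-03-23 and leaves every other parameter in place.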

Changes To Janet Ross

The bot changed the heading "Early Life" to "Early life." The headings are no longer parallel in terms of capitalization. Henry Heater 16:53, 19 September 2011 (UTC) -- Preceding unsigned comment added by Henryheater ( talk • contribs)

See WP:MOSCAPS. Headbomb { talk / contribs / physics / books} 17:00, 19 September 2011 (UTC)

kill this bot please ... or fix it to not break things under the guise of fixing what wasn't broken

Your CitationCleanerBot bot needs to stop retitling references to conform to what WP would publish. WP is not publishing the references and therefore should not be deciding/declaring what the title of said reference should be. Date spans that are proper titles should not be reformatted to what would be used in the text of an article as titles of references are not subject to the WP Manual Of Style but that of the publisher of the reference. It is effectively faking/breaking the reference. Same with putting in a dash when the proper title has a hyphen. It would be like saying that CitationCleanerBot is misnamed and forcing this account to be renamed into three words rather than one. Editing of titles for references not from an html source is essentially a guess-work on the part of your bot and as such is fundamentally unreliable. delirious & lost ☯ ^~hugs~ 01:13, 17 January 2012 (UTC)

URL garbage in PMC after migrating URL → PMC

See e.g. this edit; URL arguments are being left in the PMC parameter. Should be trivial to fix. {{ Nihiltres | talk | edits}} 16:04, 11 July 2016 (UTC)

Yup, I cleaned the remainder of those instances (roughly 10). Thanks for pointing it out. Headbomb { talk / contribs / physics / books} 16:32, 11 July 2016 (UTC)
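One way to avoid leftover URL arguments of the kind reported here is to capture only the digits of the PMC identifier and discard whatever follows. The sketch below assumes a www.ncbi.nlm.nih.gov/pmc/articles/PMC<digits> URL shape; the name pmc_id_from_url and the query string in the example are illustrative, not taken from the edit in question.

```python
import re
from typing import Optional

# Assumption: PMC article URLs look like
# http(s)://www.ncbi.nlm.nih.gov/pmc/articles/PMC<digits>
# optionally followed by a path, query string or fragment.
PMC_URL = re.compile(
    r"^https?://www\.ncbi\.nlm\.nih\.gov/pmc/articles/PMC(\d+)(?:[/?#].*)?$"
)


def pmc_id_from_url(url: str) -> Optional[str]:
    """Return only the numeric PMC id, or None if the URL is not recognised."""
    match = PMC_URL.match(url.strip())
    return match.group(1) if match else None


# The "?report=classic" part is an illustrative query string.
print(pmc_id_from_url(
    "https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4028994/?report=classic"
))
```

Returning None for anything unexpected lets the caller leave the citation untouched instead of writing a partial value into |pmc=.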

PubMed URL removed when no PMID/PMC parameter?

See here, not clear why pubmed URL would be removed when cite doesn't have PMID or PMC parameters set? (If it were converted to |pmid= that would make sense.) Thanks Rjwilmsi 21:43, 7 November 2016 (UTC)

That is really strange. I will investigate, because it should have replaced the url with a pmid in those cases. Headbomb { talk / contribs / physics / books} 21:47, 7 November 2016 (UTC)

@ Rjwilmsi: See Wikipedia:Administrators'_noticeboard#Mass_rollback_needed. Basically I goofed up. Headbomb { talk / contribs / physics / books} 22:07, 7 November 2016 (UTC)

Bot bug?

The bot introduced an error here. If you were going to go through and check them afterwards, sorry for jumping in early.

Also another error here, producing a duplicate parameter and also a PMID error. (A two-fer! Congrats.)

I suspect that most of the edits were fine, but you might want to run supervised until you get these sorts of kinks worked out. – Jonesey95 ( talk) 00:58, 8 November 2016 (UTC)

The first one was a bug, that's been fixed. The second one is some AWB issue I can't quite work out. In both cases, yes I'm cleaning up after the bot (although SparkBot seems to catch the duplicates before I do most of the time). Headbomb { talk / contribs / physics / books} 01:08, 8 November 2016 (UTC)

Glad to see the url removed to ncbi when pmid= is present Doc James ( talk · contribs · email) 08:51, 8 November 2016 (UTC)

Bot bug with comment tags and a question.

I've seen ~10 articles today in which the bot removed a closing comment tag. An example is at Jan Harold Brunvand. Towards the bottom, you will see |page=166| changed to |page=166<!--| This causes part of the reference to go blank.

Can the bot work on external links such as [https://www.ncbi.nlm.nih.gov/pubmed/18276894 18276894]? This is a very useful bot. Thank you. Bgwhite ( talk) 06:54, 11 November 2016 (UTC)

I'll tweak the behaviour once I get home Sunday/early next week. In theory, it could deal with that, but it's not been approved to do so. The main question would be how exactly those bareurls are used in references, and what the surrounding markup is. I suspect an easier fix is to use AWB to convert those to {{cite journal|pmc=12345}} and then run Citation bot ( talk · contribs) to complete the references. Headbomb { talk / contribs / physics / books} 12:43, 11 November 2016 (UTC)

Reordering citations

Please don't do things like this. Citation tags should be presented in the order in which the attributions are given in the text. This is decided based on the quality of the citations, not on mechanical considerations like alphabetical or numerical order. I've reverted that one. -- Stfg ( talk) 09:03, 14 November 2016 (UTC)

( talk page stalker) This is a feature of AWB, not of the bot. There is an active discussion about whether it is a good idea. – Jonesey95 ( talk) 16:23, 14 November 2016 (UTC)

Thanks. I found what I needed there. -- Stfg ( talk) 17:05, 14 November 2016 (UTC)

Invalid JSTOR value and duplicate parameter

Sorry I didn't catch this earlier, but the bot converted a URL to an invalid JSTOR value, and the |jstor= parameter was already present. Two different bugs in one edit. – Jonesey95 ( talk) 16:32, 18 November 2016 (UTC)

The jstor value was valid btw. Headbomb { talk / contribs / physics / books} 16:48, 18 November 2016 (UTC)

Yes, I noticed in looking at a similar edit that the JSTOR value actually works, even though it's really a DOI. Those JSTOR folks are taking care of us sloppy humans. – Jonesey95 ( talk) 17:33, 18 November 2016 (UTC)

Removal of access date for IUCN refs

Hi, the bot has removed the access date from the reference ( link) because it doesn't recognize the internet address in the template as a URL. This change is unwanted as the access date is required for the reference. There are over 10,000 species of birds alone, each with its own page, and that does not include all other taxa (flora and fauna) that use the same template. Please curb your bot until this is fixed. Cheers, Loopy30 ( talk) 01:17, 17 February 2017 (UTC)

I'll add the template to the bot's logic. Thanks for reporting. Headbomb { talk / contribs / physics / books} 01:44, 17 February 2017 (UTC)

Removing access date

Here the bot has removed an access date where a valid url was given: [6]. TheMagikCow ( talk) 08:11, 17 February 2017 (UTC)

Yeah I caught those earlier (ftp://...), and the bot has been updated to cover for them. Headbomb { talk / contribs / physics / books} 09:54, 17 February 2017 (UTC)

Awesome. Thanks for all the great work the bot is doing! TheMagikCow ( talk) 10:12, 17 February 2017 (UTC)

Bad replacement of double doi

In this edit, the bot made a hash of a citation which previously had two dois listed, replacing a validly formatted |id= field with an invalidly formatted |doi= field. It should only replace |id= parameters where it can parse the whole parameter, rather than (as in this case) pattern matching something at the start of the |id= and hoping the rest fits the pattern. — David Eppstein ( talk) 02:11, 4 March 2017 (UTC)

Strange, I had tested for that case. Turns out I had a regex mistake. Should be fixed now. Headbomb { talk / contribs / physics / books}
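The difference between matching the start of an |id= value and parsing the whole value can be shown with a small sketch. It assumes, purely for illustration, that the DOI sits in a {{doi|...}} template inside |id=; the actual field in the edit may have looked different, and the names SINGLE_DOI and doi_from_id are made up.

```python
import re
from typing import Optional

# Assumption: a convertible |id= value is exactly one {{doi|10.xxxx/yyyy}}.
SINGLE_DOI = re.compile(r"\{\{\s*doi\s*\|\s*(10\.[^{}|\s]+)\s*\}\}", re.IGNORECASE)


def doi_from_id(value: str) -> Optional[str]:
    """Return the DOI only if the whole |id= value is a single DOI template."""
    match = SINGLE_DOI.fullmatch(value.strip())
    return match.group(1) if match else None


single = "{{doi|10.1000/first}}"
double = "{{doi|10.1000/first}}, {{doi|10.1000/second}}"

print(doi_from_id(single))               # whole value parsed, safe to convert
print(doi_from_id(double))               # not a single DOI, leave |id= alone
print(SINGLE_DOI.match(double).group(1)) # prefix match still "succeeds"
```

With fullmatch the double-DOI value is rejected and |id= stays as it was, whereas a plain prefix match would accept the first DOI and mishandle the rest.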

ISBN

This tool can format many ISBNs, but it is JavaScript and you have to manually do it. Could you add ISBN formatting (adding dashes) as a task? https://de.wikipedia.org/?title=Benutzer:TMg/autoFormatter.js AManWithNoPlan ( talk) 04:28, 4 March 2017 (UTC)

This is not possible to do with AWB (at least easily/within my coding skills), but I'd certainly love this as a bot. I suggest making a WP:BOTREQ, but this would likely need to be proposed at the WP:VP first. Headbomb { talk / contribs / physics / books} 12:45, 4 March 2017 (UTC)

Broken cite book with multiple OCLC numbers

This edit by CitationCleanerBot caused a Check |oclc= value error:

https://en.wikipedia.org/?title=Dhaka&type=revision&diff=768478948&oldid=768219320

The problem appears to relate to the use of multiple OCLC numbers with {{ cite book}}. I have reverted the edit. Verbcatcher ( talk) 07:21, 4 March 2017 (UTC)

Same bug as #Bad replacement of double doi above. Should be fixed now. Headbomb { talk / contribs / physics / books} 12:45, 4 March 2017 (UTC)

Is this really where you want {{ reflist}} and the references heading to go?

— Trappist the monk ( talk) 15:32, 4 March 2017 (UTC)

Some oddities on a few highway articles

In this edit, the bot completely removed an access date that was commented out. The bot also inserted a redundant references section, even though the article already contains one.
In this edit, the bot once again inserted a redundant references section, and it also removed the closing tag for an HTML comment, breaking a citation.
In [7], the bot again removed a commented out access date. (In this case, the commented out URL has gone missing, and I should restore that here shortly.)
In this edit, the bot removed the manual format parameter, yet the link generated by a sub template needs that.

All of these oddities have been reverted now. Imzadi 1979 → 01:38, 5 March 2017 (UTC)

Yeah, I saw the first one [8] and I fixed the problem after. The issue with comments caused the reflist being inserted (#1/#2/#3).

#4 was caused by the templated url, so I've updated the bot accordingly to leave |format= alone when there is no url, including templated ones. Headbomb { talk / contribs / physics / books} 02:09, 5 March 2017 (UTC)

Marion Boyd

The bot removed a link to a journal abstract for Marion Boyd. I checked the link and it is still OK. What is the issue here? EncyclopediaUpdaticus ( talk) 04:27, 19 March 2017 (UTC)

The link is still there, but now via the dedicated SSRN parameter of the template. Headbomb { talk / contribs / physics / books} 10:57, 19 March 2017 (UTC)

Bot thinks pipes are delicious when migrating URLs to equivalent PMCs

Hi, I just made a series of edits to fix cases like this where CitationCleanerBot migrated a URL parameter to an equivalent PMC parameter but "ate" the pipe leading to the next parameter. This is an obvious bug; please fix it. :) {{ Nihiltres | talk | edits}} 21:07, 16 February 2017 (UTC)

Good catch, will fix. Not really sure what caused this because it's made zillions of those edits before. Headbomb { talk / contribs / physics / books} 21:08, 16 February 2017 (UTC)

All bugs are rare, rare bugs are common :) This looks similar to the bug reported below. Solution is constant checking of the data to make sure it looks right before moving to next step. In semi-structured data there's a million ways for things to go wrong, impossible to know them. -- Green C 23:16, 21 March 2017 (UTC)
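The "eaten pipe" failure and the check-before-saving advice in this thread can be illustrated together. Both patterns below are hypothetical reconstructions, not the bot's real rules: PIPE_EATING consumes the "|" that starts the next parameter, SAFE only looks ahead at it, and migrate refuses any result whose pipe count changed.

```python
import re

# Assumed URL shape for a PMC link inside |url=.
BASE = r"\|\s*url\s*=\s*https?://www\.ncbi\.nlm\.nih\.gov/pmc/articles/PMC(\d+)/?"

# Hypothetical buggy rule: the trailing "\|" swallows the next parameter's pipe.
PIPE_EATING = re.compile(BASE + r"\s*\|")
# Safer rule: a lookahead checks for "|" or "}}" without consuming it.
SAFE = re.compile(BASE + r"(?=\s*(?:\||\}\}))")


def migrate(citation: str, pattern: re.Pattern) -> str:
    """Rename |url= to |pmc=, rejecting results that lost or gained a pipe."""
    migrated = pattern.sub(r"|pmc=\1", citation)
    # Sanity check: renaming one parameter must keep the pipe count unchanged.
    if migrated.count("|") != citation.count("|"):
        return citation
    return migrated


cite = (
    "{{cite journal |title=T "
    "|url=https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4028994/ |volume=5}}"
)
print(PIPE_EATING.sub(r"|pmc=\1", cite))  # pipe before "volume" is lost
print(migrate(cite, PIPE_EATING))         # check rejects it, text unchanged
print(migrate(cite, SAFE))                # url replaced, pipe preserved
```

The pipe count is a cheap invariant for this particular rewrite; other transformations would need their own checks.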

Page blankings

[9] [10] Any idea what caused this? Sro23 ( talk) 01:18, 19 March 2017 (UTC)

None whatsoever. That's a rather catastrophic bug too. I'll investigate. Headbomb { talk / contribs / physics / books} 01:22, 19 March 2017 (UTC)

Well, I can't reproduce it. So my guess is some database/server weirdness. Headbomb { talk / contribs / physics / books} 01:24, 19 March 2017 (UTC)

Is the bot using AWB by any chance as the posting agent? AWB has this problem intermittently but there is a regex that will catch and prevent it. -- Green C 23:06, 21 March 2017 (UTC)

In AWB -> Skip menu -> Check "Doesn't contain" value [^ ]. Also check "Regex" and "Check after". This should prevent page blanks. It means if the article after processing doesn't contain any text (ie. blank) then skip processing. -- Green C 23:26, 21 March 2017 (UTC)
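The effect of that skip rule can be mimicked outside AWB with a few lines of Python. This is only an analogue of the setting described above, with a made-up function name; it is not AWB code.

```python
import re

# Same pattern as in the skip rule: any character that is not a space.
NON_SPACE = re.compile(r"[^ ]")


def should_skip_save(processed_text: str) -> bool:
    """Skip the save when the processed page has no non-space character."""
    return NON_SPACE.search(processed_text) is None


print(should_skip_save(""))              # True: blank page, do not save
print(should_skip_save("   "))           # True: only spaces
print(should_skip_save("{{cite web}}"))  # False: real content, save normally
```

Note that [^ ] also matches a line break, so a page consisting only of newlines would not be skipped by this pattern; using \S instead would catch that case, though that variant is a suggestion and not something proposed in the thread.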

Bug at Google

diff (last change). -- Green C 23:09, 21 March 2017 (UTC)

I don't see an issue here. What was the bug? Headbomb { talk / contribs / physics / books} 23:22, 21 March 2017 (UTC)

Ah sorry I didn't realize there is a CS1|2 argument for |ssrn= , it looked like a deleted URL in the diff. All is good. -- Green C 23:31, 21 March 2017 (UTC)

SSRN

This edit replaced a url with an ssrn. That's fine, but the citation now triggers two error messages: "Missing or empty |url=" and "|access-date= requires |url=".

If it's OK for cite web to have an ssrn but no url, please coordinate with the CS1 error checkers so this doesn't trigger the "Missing or empty |url=" error.
From my reading of the posts above, it sounds like the access-date should have been removed when the url was. Would you check into why it wasn't?

Thanks. -- Worldbruce ( talk) 04:42, 22 March 2017 (UTC)

citation/core

https://en.wikipedia.org/?title=Special:Search&limit=5000&offset=0&profile=default&search=insource%3A%22citation%2Fcore%22&searchToken=9wzievwkg9gsvgm6g8u8jizgm

These articles directly use {{ citation/core}}. It seems to me that this would be best to convert to {{ cite}}. Do you agree, and are you or someone else the right bot? AManWithNoPlan ( talk) 21:16, 30 March 2017 (UTC)

May not be a bot task. These have been on my radar for a while but I haven't yet got round to doing anything about them. It would be best if they were converted to the appropriate cs1|2 template according to the style that exists in the rest of the page; not simply bulk converted to {{citation}}. Interestingly, the first few in that list all seem to have something to do with Germany so perhaps these originated at de.wiki.

— Trappist the monk ( talk) 21:54, 30 March 2017 (UTC)

( edit conflict)There may be formatting differences, in which case WP:CITEVAR may be a concern. In any event, starting with a TFD discussion is probably the right first step.

It may make sense to convert Template:Cite wikisource first, since uses of that template appear to be more than half of the uses of citation/core. – Jonesey95 ( talk) 21:57, 30 March 2017 (UTC)

I'm not sure that I understand the point you're making. This conversation is about articles that use {{citation/core}} directly like this:

{{citation/core|Title=Stammtafeln des mediatisierten Hauses Stolberg |Year=1887|Date= 1887|language=German}}

so why should {{cite wikisource}} be at issue here?

— Trappist the monk ( talk) 22:09, 30 March 2017 (UTC)

It looked like the goal of the OP was to eliminate use of citation/core, and converting cite wikisource seemed like a way to pick some low-hanging fruit. – Jonesey95 ( talk) 22:29, 30 March 2017 (UTC)

Kelly–Hopkinsville encounter

The bot removed an article link url from this citation [11] as redundant with the PMC. I suggest the article link url link be retained, since a vast majority of readers (such as myself) do not realize that PMC is a Pub Med article link. Thanks. - LuckyLouie ( talk) 13:38, 19 June 2017 (UTC)

@ LuckyLouie: If a link is redundant with the PMID identifier, and you don't know that PMID is PubMed, giving an explicit url to pubmed won't make it any clearer to the reader that it links to pubmed. The main issue is that this redundancy serves no purpose, and takes the place of a link to a free version of that article, and discourages editors from seeking one since they'll think "oh there's already a link, don't need to search for one".

However, the diff above is about removing an explicit PMC (not PMID) link to PubMed Central. Citation templates automatically link the title to PubMed Central articles, so the explicit link is not needed. E.g.

{{cite journal |last1=Schmaltz |first1=Rodney |last2=Lilienfeld |first2=Scott O. |title=Hauntings, homeopathy, and the Hopkinsville Goblins: using pseudoscience to teach scientific thinking |journal=Frontiers in Psychology |date=17 April 2014 |volume=5 |doi=10.3389/fpsyg.2014.00336 |pmc=4028994}}

gives

Schmaltz, Rodney; Lilienfeld, Scott O. (17 April 2014). "Hauntings, homeopathy, and the Hopkinsville Goblins: using pseudoscience to teach scientific thinking". Frontiers in Psychology. 5. doi: 10.3389/fpsyg.2014.00336. PMC 4028994.

Headbomb { t · c · p · b} 13:55, 19 June 2017 (UTC)

OK thanks, I tried the "cite journal" format. - LuckyLouie ( talk) 14:08, 19 June 2017 (UTC)

BOT problem

Hello, there appears to be a problem with the bot removing a parameter name |website= and leaving part of the value of the parameter in another field, for example in this case the fragment was left in a |date= field causing an invalid date error. Keith D ( talk) 08:36, 23 May 2018 (UTC)

Updated. The bot should only have touched cite journals, but I think I loaded the wrong setting files for that run. Headbomb { t · c · p · b} 13:28, 23 May 2018 (UTC)

Failure

this is bad, clearly bibcode = bibcode is not a valid bibcode! Plastikspork _―Œ^(talk) 00:33, 27 May 2018 (UTC)

Yup. I'll fix that. Headbomb { t · c · p · b} 00:34, 27 May 2018 (UTC)

arxivify

This edit.

— Trappist the monk ( talk) 15:52, 27 May 2018 (UTC)

Cite_arxiv does not support publisher, access-date, nor accessdate: Occurrences of {cite_web} should not be changed to {cite_arxiv} without prior consensus on each article talk-page. Today more than 70 science pages were broken by untested use of {cite_arxiv}, while other actual cite errors were buried in the same cluttered category. - Wikid77 ( talk) 17:05, 27 May 2018 (UTC)

Going to the talk page of each article to convert an improper cite web to a cite arxiv is silly. As for the errors, I'm cleaning them up. Headbomb { t · c · p · b} 17:21, 27 May 2018 (UTC)

I'm finding an unusually high proportion of the new citation cleaner bot / Headbomb changes that needed human fixup later, for citations that claimed to be in journals but were not in journals, or where only the arxiv version was cited but a more definitive version should have been cited instead (with an arxiv courtesy link). Citation bot may not be intelligent enough to catch these but Headbomb should know better. See e.g. 3SUM (CORR is not a journal, published elsewhere), Sexagesimal (published elsewhere), Sperner's lemma (Contemporary Mathematics is not a journal), Vi Hart (Bridges is not a journal). — David Eppstein ( talk) 18:25, 27 May 2018 (UTC)

What exactly is the issue with any of [12] / [13] / [14] or [15] ? Headbomb { t · c · p · b} 18:28, 27 May 2018 (UTC)

My parenthetical remarks were not clear enough? The edits themselves are not bad, but they gloss over badness in the underlying citation. The 3SUM citation, for instance, uses {{ cite journal}} to cite a journal called "CORR". There is no such journal. It is merely a synonym for the cs part of arxiv. The sexagesimal citation properly cites an arxiv version of a paper by Folkerts et al but the paper was since published in a peer-reviewed journal, Historia Math.; it is the peer-reviewed version, not the preprint, that should be cited. The Sperner's lemma citation cites a paper in an edited volume using {{ cite journal}} as if the name of the series of volumes, Contemporary Mathematics, were a journal. And the Vi Hart citation cites a conference proceedings paper in Bridges using {{ cite journal}}. — David Eppstein ( talk) 18:33, 27 May 2018 (UTC)

Well, there's not much a bot can do about that. However, the arxivification from the bot does make it immensely easier to invoke User:Citation Bot to convert a preprint citation ({{ cite arxiv}}) to a version of record ({{ cite journal}} / {{ cite conference}}), and makes it abundantly clear that a preprint is being cited (or moves the link to the identifier section, when citing a version of record). Headbomb { t · c · p · b} 18:38, 27 May 2018 (UTC)

bibcodify failure

This edit.

— Trappist the monk ( talk) 21:17, 26 May 2018 (UTC)

What's the problem with that? Headbomb { t · c · p · b} 21:38, 26 May 2018 (UTC)

Don't just look at the wikisource, look at the results. The bot changed this:

{{cite web |url=http://adsabs.harvard.edu/abs/2011agufmdi13a2141s |last=Schlitzer |first=W. |last2=Harpp |first2=K.S. |last3=Mittelstaedt |first3=E.L. |last4=Kurz |first4=M.D. |last5=Geist |first5=D. |title=The Effect of Lithospheric Discontinuities on the Composition of Lavas From the Northern Galápagos Platform Extension |year= 2011 |accessdate= 24 November 2013}}

Schlitzer, W.; Harpp, K.S.; Mittelstaedt, E.L.; Kurz, M.D.; Geist, D. (2011). "The Effect of Lithospheric Discontinuities on the Composition of Lavas From the Northern Galápagos Platform Extension". Retrieved 24 November 2013.

to this:

{{cite web |bibcode=2011agufmdi13a2141s |last=Schlitzer |first=W. |last2=Harpp |first2=K.S. |last3=Mittelstaedt |first3=E.L. |last4=Kurz |first4=M.D. |last5=Geist |first5=D. |title=The Effect of Lithospheric Discontinuities on the Composition of Lavas From the Northern Galápagos Platform Extension |year= 2011 |accessdate= 24 November 2013}}

Schlitzer, W.; Harpp, K.S.; Mittelstaedt, E.L.; Kurz, M.D.; Geist, D. (2011). "The Effect of Lithospheric Discontinuities on the Composition of Lavas From the Northern Galápagos Platform Extension". Bibcode: 2011agufmdi13a2141s.

{{ cite web}}: |access-date= requires |url= ( help); Missing or empty |url= ( help)

Turning on all of the cs1|2 error messages is required to see the errors here.

— Trappist the monk ( talk) 21:49, 26 May 2018 (UTC)

So basically, there's no error save internal template stuff? Headbomb { t · c · p · b} 21:50, 26 May 2018 (UTC)

That is your reply? Really? The post-edit result is more 'erroneous' than the 'error' that prompted the edit in the first place. If it is necessary to 'correct' a direct bibcode url, then it is necessary to do the correction properly.

— Trappist the monk ( talk) 15:55, 27 May 2018 (UTC)

Yes, that is my reply. The citation is no more wrong now than it was before, and the "error" is now flagged and can now be resolved. This is an improvement, if anything. Headbomb { t · c · p · b} 15:58, 27 May 2018 (UTC)

If you remove a URL entirely, you need to remove the access-date parameter while you are doing it in order to avoid error messages. You'll also need to change {{ cite web}} to another template if you remove |url=, since {{ cite web}} requires |url=. Thanks in advance. – Jonesey95 ( talk) 06:49, 22 June 2018 (UTC)

Error: eprint parameter removed, value preserved

Please see this edit, in which |eprint= was removed but its value was kept, resulting in multiple errors. Also half a dozen similar edits that I have reverted.

Also, please improve the bot's edit summary. "Cleanup" is not helpful. Please link to a description of the approved task that the bot is performing. Removing |eprint= is not mentioned at Wikipedia:Bots/Requests for approval/CitationCleanerBot.

Please publish the bot's source code. The user page says that you will do so if asked. Thanks. – Jonesey95 ( talk) 06:33, 22 June 2018 (UTC)

@ Jonesey95: My bad, this was meant to be a pre-parse run to find such errors and fix them manually, but I had 'auto-save' enabled when it wasn't supposed to be. This bug is caused by AWB and will be fixed whenever @ Reedy: releases a new version of AWB. I'll review the most recent edits. Headbomb { t · c · p · b} 14:22, 22 June 2018 (UTC)

While you're at it, please take a look at this recent edit, which appears to be purely cosmetic and therefore not permitted under the bot policy. It also didn't clean up any citations, as far as I could tell. This one, this one and this one appear to have the same problem. – Jonesey95 ( talk) 16:16, 22 June 2018 (UTC)

Like I said, this was part of a pre-parsing run with auto-save enabled by accident. None of those edits were meant to be done. Headbomb { t · c · p · b} 17:07, 22 June 2018 (UTC)

url-access parameter after url to jstor replacement

In this edit on 9 May 2018 the bot changed {{cite journal|last=Hamilton|first=Henry|year=1928|title=The Founding of Carron Ironworks|journal=Scottish Historical Review|publisher=Edinburgh University Press|volume=25|issue=99|url=https://www.jstor.org/stable/25525835|url-access=subscription|via=JSTOR|pages=187-190}} to {{cite journal|last=Hamilton|first=Henry|year=1928|title=The Founding of Carron Ironworks|journal=Scottish Historical Review|publisher=Edinburgh University Press|volume=25|issue=99|jstor=25525835|url-access=subscription|pages=187–190}} which triggered a red "|url-access= requires |url=" warning on the article. I raised this at Help_talk:Citation_Style_1#URL_and_JSTOR_access_parameters, suggesting that |jstor-access=subscription should be permitted. A response pointed out that Help:Citation_Style_1#Access_level_of_identifiers expects JSTOR etc. to be unfree by default, so only exceptions expressed as |jstor-access=free are supported. It could therefore be worth adjusting the Bot so that when it replaces |url= with |jstor=, etc., it also erases any |url-access=subscription? (That said, I am now adjusting my new references to use |jstor= rather than |url= so the Bot shouldn't need to adjust my future edits.) AllyD ( talk) 16:28, 31 May 2018 (UTC)

@ AllyD and Jonesey95: This has been fixed now. Headbomb { t · c · p · b} 13:27, 23 June 2018 (UTC)
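For illustration only, the behaviour asked for in this thread could be sketched as follows. This is a minimal Python sketch under stated assumptions, not the bot's actual AWB settings (which are not shown on this page): the function name `url_to_jstor` and the `URL_DEPENDENT` list are hypothetical, citation parameters are modelled as a plain dict, and only purely numeric JSTOR stable IDs are recognised.

```python
import re

# Parameters that only make sense alongside |url= (an assumed list, based on
# the error messages quoted in this archive, not an official one).
URL_DEPENDENT = {
    "url-access", "access-date", "accessdate",
    "archive-url", "archiveurl", "archive-date", "archivedate",
    "deadurl", "dead-url",
}

# Matches plain stable links such as https://www.jstor.org/stable/25525835
JSTOR_URL = re.compile(r"^https?://(?:www\.)?jstor\.org/stable/(\d+)/?$")


def url_to_jstor(params):
    """Return a copy of params with a JSTOR |url= replaced by |jstor=.

    Citations whose |url= is not a plain JSTOR stable link, or that already
    carry a different |jstor= value, are returned unchanged.
    """
    match = JSTOR_URL.match(params.get("url", "").strip())
    if match is None:
        return dict(params)
    existing = params.get("jstor")
    if existing is not None and existing.strip() != match.group(1):
        return dict(params)  # conflicting identifier: leave it for a human
    # Drop |url= together with everything that depends on it.
    cleaned = {k: v for k, v in params.items()
               if k != "url" and k not in URL_DEPENDENT}
    cleaned["jstor"] = match.group(1)
    return cleaned
```

Dropping the dependent parameters in the same step as |url= is what avoids the "requires |url=" messages described above; whether |via= belongs in that list is a separate question and is deliberately left out here.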

A simple request, on CitationCleanerBot's user page please put a description of the bots actions. Not all of us are psychics.-- TomStonehunter ( talk) 13:38, 8 June 2018 (UTC)

Blanking articles

Today, this bot has accidentally blanked a few articles. Opinions may vary but that seems like a rather extreme way of cleaning up citations. ElKevbo ( talk) 12:14, 27 June 2018 (UTC)

Yeah, that's a timeout issue. There's a way to bypass such errors normally, but it was disabled for this run (I had other skip conditions). If it happens, just revert the bot. Headbomb { t · c · p · b} 13:39, 27 June 2018 (UTC)

Leaving date error

Hello, in this edit the bot removed part of a |via= field, leaving the remaining part in the adjacent date field. I have moved it out to |quote= for now. This is the second occurrence of this today. Keith D ( talk)

I've updated the bot. Fixed. Headbomb { t · c · p · b} 20:08, 27 June 2018 (UTC)

Many thanks for a quick response. Keith D ( talk)

Via

Why is the bot removing |via=, as here? Nikkimaria ( talk) 11:50, 28 June 2018 (UTC)

There's no URL, so VIA is rather pointless here. It will especially mislead the reader if a URL of a free version is later added. While it's true that the DOI points to Project Muse, that's a rather irrelevant fact. We don't add VIA to point to Wiley Online Library, Science Direct, or whatever else (in this article, scholar.lib.vt.edu for doi: 10.21061/alan.v23i3.a.5, and cambridge.org for doi: 10.1017/CCOL9780521429597.021, or many others); this is undue advertisement of such services, and discourages the reader from making use of services which may be available to them. Headbomb { t · c · p · b} 14:19, 28 June 2018 (UTC)

@ Headbomb:, I think you're misunderstanding why the parameter was used. It wasn't used to advertise the service; it was used to acknowledge the access provided by Project Muse to certain Wikipedia users. When we received that access, acknowledging it was a requirement; your bot (and now you, since you've reverted me) are preventing me from making such an acknowledgement. Vanamonde ( talk) 14:47, 28 June 2018 (UTC)

This is textbook WP:PROMO/ WP:SPAM then! Headbomb { t · c · p · b} 14:49, 28 June 2018 (UTC)

Maybe, but in which case, you ought to be taking it up with TWL, rather than the users who use their resources. Vanamonde ( talk) 15:02, 28 June 2018 (UTC)

You can use TWL resources if you want, what you can't do is put spam in our articles. If spamming is a condition of access, then TWL needs to be shut down. Headbomb { t · c · p · b} 15:09, 28 June 2018 (UTC)

Attribution is not spam. CC-BY on Commons for example. -- Green C 15:29, 28 June 2018 (UTC)

This is not attribution. Headbomb { t · c · p · b} 15:35, 28 June 2018 (UTC)

See This RFC. Headbomb { t · c · p · b} 15:33, 28 June 2018 (UTC)

For the benefit of anyone else finding their way to this discussion. Vanamonde93 is mistaken (quite possibly because they have been provided with inaccurate information somewhere; no criticism of them intended!). No acknowledgment of access is required or suggested (or, as far as I know, even requested) by any TWL partner resource. That would be promo/spam/etc. The intent of the information in |via= in the context of TWL has only ever been in accordance with WP:SAYWHEREYOUGOTIT (that I am aware of). How specifically that should look in various cases is a good and worthwhile discussion to have somewhere (an RFC on VPP would maybe not be my first suggestion, but…).

Oh, and before someone goes thermonuclear over that too… TWL provides userboxes for editors who have access to a particular partner resource too. These are intended to help other editors locate someone with access to a source they lack, and not to advertise, promote, or acknowledge any access from that partner. Apart from the general purpose (or lack thereof) for userboxes, these are intended as an extension or supplement to the resource exchange. There are lots of friendly people at WP:TWL that will be more than happy to clarify or answer any questions you may have. -- Xover ( talk) 17:15, 28 June 2018 (UTC)

For the record; the reason I've been using the |via= parameter is the instructions here, which were more explicit back when I received Project muse access. Vanamonde ( talk) 04:39, 29 June 2018 (UTC)

A question

In this edit, the bot changes the reference quite significantly. I get changing https://www.jstor.org/stable/823202 to jstor=823202 , but what it seems to have done is changed url=https://www.jstor.org/stable/823202|deadurl=no|archiveurl=https://web.archive.org/web/20180226083106/http://www.jstor.org/stable/823202|archivedate=February 26, 2018|df= to jstor=823202 instead, removing the link to the archive URL, and the archive date parameter. Is that intended? Fish+ Karate 09:50, 29 June 2018 (UTC)

It is intended yes. If you don't have a base URL, you don't need an archiveurl, and it will throw an error. See

Ween, Lori (2003). "This Is Your Book: Marketing America to Itself". PMLA. 118 (1): 90–102. JSTOR 823202. {{ cite journal}}: |archive-url= requires |url= ( help); Unknown parameter |deadurl= ignored (|url-status= suggested) ( help)

The point of the jstor identifier is that it's a stable link to a fixed resource, so doesn't need to be archived as it is both stable and unchanging. Headbomb { t · c · p · b} 13:11, 29 June 2018 (UTC)

That’s great. Appreciate the explanation. Fish+ Karate 08:23, 30 June 2018 (UTC)

July 2018

Hello and welcome to Wikipedia. Constructive contributions to Wikipedia are appreciated, but a recent edit of yours has an edit summary that appears to be inaccurate or inappropriate. The summaries are helpful to people browsing an article's history, so it is important that you use edit summaries that accurately tell other editors what you did. Feel free to use the sandbox to make test edits. Removing a citation's accessdate is NOT "cleanup", contrary to what you asserted. Snuggums ( talk / edits) 02:05, 1 July 2018 (UTC)

Yes, as I wrote above, "... please improve the bot's edit summary. 'Cleanup' is not helpful. Please link to a description of the approved task that the bot is performing." Thanks in advance for this simple change. – Jonesey95 ( talk) 05:30, 1 July 2018 (UTC)

Citations with accessdates and no URLs produce error messages, and without a diff, I can't be more specific than to say that several bots are approved to remove access-dates without URLs, including this one, per its BRFA. See also User_talk:CitationCleanerBot/Archive_1#Accessdates. Headbomb { t · c · p · b} 10:52, 1 July 2018 (UTC)

I provided a diff above: here it is again. Please link to the BRFA for the bot task you are performing. This is standard for bots these days. Thanks. – Jonesey95 ( talk) 16:10, 1 July 2018 (UTC)

It's on the bot's user page in the bot template, as it usually is. And I already gave a reply about that edit. Headbomb { t · c · p · b} 16:22, 1 July 2018 (UTC)

Pipe detection error

[16] (not watching, please {{ ping}}) czar 13:35, 4 July 2018 (UTC)

I'll look into that. Headbomb { t · c · p · b} 13:59, 4 July 2018 (UTC)

Removing accessdates

When the bot is removing redundant URLs, like it did at Marble Bar, Western Australia, it should probably remove the accessdates as well. Graham 87 05:05, 27 May 2018 (UTC)

It tries to when it can. However, it needs a new version of AWB to be able to do that reliably, and that hasn't been released in over a year. Headbomb { t · c · p · b} 08:58, 27 May 2018 (UTC)

A recent instance: [17] (not watching, please {{ ping}}) czar 13:37, 4 July 2018 (UTC)

@ Czar: AWB still hasn't released a new version, so that still cannot be fixed. Headbomb { t · c · p · b} 14:01, 4 July 2018 (UTC)

Uhh. . .is your bot a predator or something?

I've never seen this tag before: [18] not sure what a "predatory open access journal" is, but it doesn't sound good. — Mr. Guye ( talk) ( contribs) 00:34, 12 July 2018 (UTC)

A predatory open access journal is basically a place where you pay to publish. They make it open-access since no one would pay for a subscription. It is predatory since it pretends to be a peer-reviewed journal when it is really a pay-to-publish vanity press. AManWithNoPlan ( talk) 01:35, 12 July 2018 (UTC)

@ Mr. Guye:, this should probably have been reported at User talk:MinusBot, rather than here, but the answer is simply that there's a predatory open access journal cited on the page somewhere. Since the bot touched one of those citations, it triggered the edit filter. However, the bot only cleans up existing information, it does not add those citations. I don't know the exact details, but Wikipedia:Edit_filter/Requested/Archive_11#Predatory_open_access_journals has some information. Headbomb { t · c · p · b} 02:01, 12 July 2018 (UTC)

Changing issue numbers for no apparent reason

In this diff the bot changed three citations. The 2nd and 3rd changes make sense but not the first one, where it erroneously replaced an issue number adding a page number which then overrode the actual page number parameter. Kerry ( talk) 13:14, 18 July 2018 (UTC)

Strange, I thought I had that one disabled (I only leave it on during manual runs because of that issue). Either way, the bot is currently set back, as I've had a hard drive failure and I'll need to recode a lot of it. Headbomb { t · c · p · b} 14:08, 18 July 2018 (UTC)

Mostly harmless

Mistaking it for a deprecated magic link, CitationCleanerBot is inserting {{ PMID}} within quoted ref tags (see for example diff). This is mostly harmless, but nevertheless, an unnecessary edit. It would be better to replace "PMID xxxxxx" → "PMIDxxxxxx" if found within a ref tag. Boghog ( talk) 06:45, 4 August 2018 (UTC)

Weird, I told the bot to specifically skip those pages so it wouldn't cause those edits. I'll investigate. Headbomb { t · c · p · b} 16:23, 4 August 2018 (UTC)

Fixed, I had a stray + in a regex check for page skipping. Headbomb { t · c · p · b} 16:24, 4 August 2018 (UTC)

Thanks! Boghog ( talk) 16:48, 5 August 2018 (UTC)
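As a rough illustration of the replacement Boghog suggests, the following Python sketch rewrites "PMID xxxxxx" to "PMIDxxxxxx" only inside the quoted name attribute of a ref tag, leaving the rest of the wikitext alone. It is an assumption-laden sketch rather than the bot's real rule: the function name `defuse_pmid_in_ref_names` and both regular expressions are hypothetical, and only double-quoted `name="..."` attributes placed directly after `<ref` are handled.

```python
import re

# A <ref name="..."> opening tag with a double-quoted name (assumed form).
REF_NAME = re.compile(r'(<ref\s+name\s*=\s*")([^"]*)(")', re.IGNORECASE)
# The magic-link pattern: "PMID", whitespace, then digits.
MAGIC_PMID = re.compile(r"\bPMID\s+(\d+)")


def defuse_pmid_in_ref_names(wikitext):
    """Remove the space in 'PMID 12345' when it occurs inside a ref name."""
    def fix(match):
        name = MAGIC_PMID.sub(r"PMID\1", match.group(2))
        return match.group(1) + name + match.group(3)
    return REF_NAME.sub(fix, wikitext)
```

Because the substitution is confined to the captured attribute value, a "PMID 12345" in the body of the reference is not touched and can still be handled by whatever rule converts magic links elsewhere.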

CiteSeerX access-dates

Hi CCBot / Headbomb, I came across a ref on Nano-threads where a |citeseerx= means the |access-date= shows up as an error. Is that something the bot could help clear up in general - for this and any similar ID parameters like it? It might already be working on these, but the error's been there since 2016 (well, 2012 but the format was id={{citeseerx}} then), so I thought it was worth mentioning. I've left it unfixed for now in case it's useful as a test case. Discussion ( here) › Mortee _talk 12:23, 25 August 2018 (UTC)

@ Mortee: It can, but really you're much better off simply triggering User:Citation bot via WP:Citation expander or this page. Headbomb { t · c · p · b} 16:01, 25 August 2018 (UTC)

I tried Citation expander, but it doesn't see a problem on that page. (Also if a fully automated bot could clear some of these without needing to identify them manually, that'd be even better). › Mortee _talk 16:06, 25 August 2018 (UTC)

[19] worked for me. Headbomb { t · c · p · b} 16:11, 25 August 2018 (UTC)

Well huh. That's very strange. › Mortee _talk 16:26, 25 August 2018 (UTC)

There have been a few bot updates / citation expander updates recently, so just try it again in the future if you see an issue. If it doesn't work, report it to User talk:Citation bot. Headbomb { t · c · p · b} 16:37, 25 August 2018 (UTC)

Will do. There's 42,492 articles in the category I'm chipping away at, so lots of opportunity for testing! Thanks for your help. › Mortee _talk 18:00, 25 August 2018 (UTC)

Creating duplicated jstor id

Here, on 27 June, the bot changed a jstor url into a jstor id, but there was already a JSTOR id in the citation. DferDaisy ( talk) 22:32, 20 October 2018 (UTC)
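A check of the kind that would catch this could look roughly like the Python sketch below. It is only an illustration under stated assumptions: `duplicate_params` is a hypothetical helper, not part of AWB or the bot, and the naive split on "|" ignores nested templates and piped wikilinks, so it suits simple citations only.

```python
from collections import Counter


def duplicate_params(template):
    """List parameter names that occur more than once in a template call.

    Names are compared case-insensitively. Nested templates and piped
    links are not handled by this simplified parser.
    """
    body = template.strip()
    if body.startswith("{{") and body.endswith("}}"):
        body = body[2:-2]
    names = []
    # The first segment is the template name; the rest are parameters.
    for part in body.split("|")[1:]:
        name, sep, _value = part.partition("=")
        if sep:  # skip unnamed (positional) parameters
            names.append(name.strip().lower())
    counts = Counter(names)
    return sorted(name for name, count in counts.items() if count > 1)
```

Running such a check on the citation after the URL has been converted, and skipping the edit when the result is non-empty, would leave cases like this one for manual review.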