The following discussion is an archived debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA.
The result of the discussion was Withdrawn by operator.
I know we have (or have had in the past) bots that do this. It might be useful to see how they handle this task for comparison. Some random thoughts:
How exactly is it determined whether an image is non-free?
Should users be notified that images they've added have been removed?
Should the images be straight-out removed as in the examples you linked, should they be commented out (like <!-- Non-free image outside of article space removed by Hazard-Bot: [[File:Non-free-image.jpg]] -->), or possibly a third option like colon-escaping them (that is, [[File:Non-free-image.jpg]] → [[:File:Non-free-image.jpg]])? The last two wouldn't work for infoboxes, but for straight inclusions, they may be more useful.
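For comparison, here is a minimal Python sketch of the last two options (commenting out and colon-escaping). It is only an illustration, not how any existing bot does it: the function names are hypothetical, the bot name is taken from the example comment above, and the pattern assumes the image link contains no nested links in its caption and that the filename is written exactly as given.

```python
import re

def comment_out(wikitext, filename, bot_name="Hazard-Bot"):
    """Wrap each [[File:...]] inclusion of `filename` in an HTML comment.

    Assumption: the image options/caption contain no nested [[links]].
    """
    pattern = re.compile(
        r"\[\[(?:File|Image):" + re.escape(filename) + r"(?:\|[^\[\]]*)?\]\]"
    )
    note = "Non-free image outside of article space removed by " + bot_name + ": "
    return pattern.sub(lambda m: "<!-- " + note + m.group(0) + " -->", wikitext)

def colon_escape(wikitext, filename):
    """Turn [[File:name...]] into [[:File:name...]] so it becomes a plain link."""
    pattern = re.compile(r"\[\[((?:File|Image):" + re.escape(filename) + r")")
    return pattern.sub(lambda m: "[[:" + m.group(1), wikitext)
```

Neither function touches infobox parameters such as `image = Non-free-image.jpg`, which is the limitation noted above.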
There is a tool on Toolserver with the file list, maintained by checking for the existence of {{Non-free media}} on the file description page. I could comment them out, but I'm not sure the additional text within the comment would actually work. About the colon-escaping, I'm not sure I could handle that with AWB. As for your second question, I have no way of notifying them. It would be considerably difficult to parse the history of the page to check that. Hazard-SJ ✈ 23:28, 4 June 2012 (UTC)
Sorry, but (as with your last BRFA), users really do need to be notified when you are messing with stuff they've been working on. If you don't notify users that you've removed their image, all you'll end up doing is confusing new users and annoying experienced users. -- Chris 09:10, 8 June 2012 (UTC)
Realistically, there are two strategies for determining who added the image; I don't see it as a "considerably difficult" process. Both of these are used by WikiBlame, the de facto "revision search tool". You've got your linear search, and your binary search. While binary search is faster for general cases (O(log n) vs. O(n)), we'd probably want to go with a simple linear search from most recent to earliest, since the addition is likely to be rather recent and binary search can't guarantee that the diff it finds is the most recent addition of the image. Unfortunately, this might identify the wrong user if someone else vandalizes the page, removing the image in the process, followed by a drive-by reversion by a random user, but I don't think there's anything we can do about that. All you have to do for linear search is find the most recent revision where the filename is not present in the wikitext, and the editor of the revision following that is the one you notify.
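The linear search described above could be sketched roughly as follows in Python. This is an illustration under stated assumptions, not WikiBlame's actual code: the function name `find_adder` is hypothetical, and the page history is assumed to have been fetched already as a list of (editor, wikitext) pairs ordered newest first.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical representation of one revision: (editor name, full wikitext).
Revision = Tuple[str, str]

def find_adder(revisions: Sequence[Revision], filename: str) -> Optional[str]:
    """Return the editor of the most recent revision that (re)added `filename`.

    `revisions` must be ordered from most recent to earliest.
    Returns None if the newest revision does not contain the filename.
    """
    newer_editor = None  # editor of the revision just after the current one
    for editor, wikitext in revisions:
        if filename not in wikitext:
            # Most recent revision without the file: the next (newer)
            # revision is the one that added it.
            return newer_editor
        newer_editor = editor
    # The file is present in every revision, so the page creator added it.
    return newer_editor
```

With a history like Alice (no image), Bob (adds image), Carol (unrelated edit), this returns Bob. In the vandalism case mentioned above it returns the reverting user, which is the known weakness of the approach.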
As for alternatives to straight-out removing the image, you can also try replacing it with File:NonFreeImageRemoved.svg (see right). This might be an easier operation, demonstrated via regex, where File:Foo.png is the offending image: first try replacing \[\[(File|Image):Foo\.png(\||\]\]) with [[File:NonFreeImageRemoved.svg\2, then (File|Image):Foo\.png with File:NonFreeImageRemoved.svg if that isn't found, and finally Foo.png with NonFreeImageRemoved.svg if that isn't found. This way, you only remove what is absolutely necessary and do your best to avoid, e.g., removing [[:File:Foo.png]], a regular link to Foo. But regardless, I'm okay with removing the image completely (using whatever AWB function removes images) as long as users are notified with some message containing, at least, a link to the page, a link to the image, and the diff where they added the image.
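To make the three-step fallback concrete, here is a rough Python sketch. The function name `replace_image` is hypothetical, and the sketch assumes the filename appears in the wikitext exactly as given (it does not handle MediaWiki's equivalence of spaces and underscores or of first-letter case).

```python
import re

PLACEHOLDER = "NonFreeImageRemoved.svg"

def replace_image(wikitext: str, filename: str) -> str:
    """Swap `filename` for the placeholder, trying the narrowest match first."""
    name = re.escape(filename)

    # Step 1: direct inclusions, [[File:Foo.png| or [[File:Foo.png]].
    # A colon-escaped link [[:File:Foo.png]] does not match this pattern.
    step1 = re.compile(r"\[\[(File|Image):" + name + r"(\||\]\])")
    result, count = step1.subn(
        lambda m: "[[File:" + PLACEHOLDER + m.group(2), wikitext
    )
    if count:
        return result

    # Step 2: prefixed names outside [[...]], e.g. infobox or gallery entries.
    step2 = re.compile(r"(File|Image):" + name)
    result, count = step2.subn(lambda m: "File:" + PLACEHOLDER, wikitext)
    if count:
        return result

    # Step 3: bare filename, e.g. "| image = Foo.png".
    return wikitext.replace(filename, PLACEHOLDER)
```

Because each step runs only when the previous one found nothing, a page containing both [[File:Foo.png|thumb]] and [[:File:Foo.png]] keeps the plain link intact.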
I'm not aware of AWB being able to do any of the searches you mentioned. As for confusing new users, the edit summary could explain, as well as a message in the bot's userspace. As for annoying experienced editors, depending on their experience, they might know about the policy. Otherwise, the edit summary etc. will inform them. Hazard-SJ ✈ 05:12, 10 June 2012 (UTC)
No, I don't think AWB can do this kind of searching. You would need to actually write the bot yourself, as far as I know (I've never worked with AWB directly, so don't take this as fact). Also, I disagree that experienced editors should know better; it's possible for an image previously considered free to turn out to be incorrectly licensed and be converted to fair use. An explanatory edit summary is good (and required, too), but my concern is mainly that new users may not think to check edit history (because they are new and might assume "oh, it's probably just a server error, I'll re-add it"); experienced users benefit less from a talk page explanation, but if they are not watching the page, they will never know the image was removed until manually checking it. Granted, I can't think of a situation where an unwatched page losing an image really matters, so perhaps this is only useful for newbies. -- TheEarwig (talk) 18:24, 13 June 2012 (UTC)
I think I might have to drop out of this since I am unable to provide what is wanted (unless I'm allowed to do it without notification). I'll withdraw soon if there are no further developments. Hazard-SJ ✈ 23:57, 19 June 2012 (UTC)
The above discussion is preserved as an archive of the debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA.