Operator: TheFearow
Automatic or Manually Assisted: Automatic (Supervised)
Programming Language(s): Java
Function Summary: Patrols Special:newpages and checks for nuisance articles, then flags them with the IdentifiedSpam template.
Edit period(s) (e.g. Continuous, daily, one time run): For a start, when I am active. Once the bot has been running properly for a week or two, I will make it fully automatic on a dedicated server.
Edit rate requested: 10-12 edits per minute (MAXIMUM)
Already has a bot flag (Y/N): N
Function Details: Scans Special:newpages and checks each article for nuisance content using a scoring system, based on several factors and badword/goodword lists.
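The request does not include the scoring code itself, so as a minimal sketch of the idea described above: suspicious factors add points, "good" factors subtract them, and a higher total means the page is more likely a nuisance article. Every factor name, weight, and word list below is an invented placeholder, not the bot's actual values.

```java
import java.util.Set;

// Hypothetical sketch of the scoring idea: bad-word hits and suspicious
// factors raise the score; templates, categories, and images lower it.
// All weights and lists here are illustrative assumptions.
public class NuisanceScorer {
    static final Set<String> BAD_WORDS = Set.of("buy now", "click here");
    static final Set<String> GOOD_MARKERS = Set.of("[[Category:", "{{", "[[Image:");

    public static int score(String wikitext) {
        int p = 0;
        String lower = wikitext.toLowerCase();
        for (String bad : BAD_WORDS) {
            if (lower.contains(bad)) p += 5;     // badword-list hit
        }
        for (String good : GOOD_MARKERS) {
            if (wikitext.contains(good)) p -= 3; // templates/categories/images lower the score
        }
        if (wikitext.length() < 200) p += 4;     // very short new page
        return p;
    }
}
```

A page reading "Buy now!!!" would score 5 (bad word) + 4 (short) = 9, while a long page with a category and a template would go negative.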
I have updated the request, so please make comments here. Thanks! Matt - TheFearow 01:38, 18 May 2007 (UTC) reply
OK. I'm deciding to wait until I get a better bot task, like some of my suggested stuff at WP:DEAD. Could a BAG member mark this as Withdrawn? Thanks! Matt - TheFearow 22:00, 19 May 2007 (UTC) reply
The discussion below was for the old functions. Please add comments on the updated request in the section above. Thanks!
Note: You do not have a bot flag. --ST47 (Talk) 23:20, 14 May 2007 (UTC) reply
On the point of a trial - I would like to see this go through an extended bot trial for about 2 weeks, to allow objections to its use to surface. Notices will also need to be posted on Wikipedia:AN and Wikipedia:VP. (Note that the bot is not approved for trial yet) Now for some questions about the bot:
A number of the factors and some entries on the wordlist seem suspect, but again I would need to hear your definition of "spam" first (see also: Wikipedia:CSD and Wikipedia:SPAM). Martinp23 18:03, 16 May 2007 (UTC) reply
Regarding the suspect entries: none of the factors can individually make an article spam. It needs at least 5 of the minor factors to be considered possible spam, at least 8 to be considered PROD-able, and at least 12 to be DB-able. Each factor gives it a different score, so I cannot say exactly what the effects are. A lot of things also lower the score, so it needs more if there are good factors like templates, images, links, categories, etc. On the wordlists, some words were added that I didn't think needed to be in there, but that I found in at least 5 different spam articles while I was testing (e.g. Lobster). About the source code, I may publish just the main evaluation function, to save space (as there are a lot of other functions that would waste space). Most of the functions I won't post are the ones that get the appropriate warning level etc., and the functions that flag pages. The only one that might be required is the warning-level one, but that just counts the number of occurrences of uw- templates on the page. For the main wiki-access code, see jwbf.sourceforge.net, which is the framework I am using; I only modified it slightly to add the reading of newpages and history. I have read Wikipedia:SPAM, and it is not the same as my definition of spam; I mean spam in the general sense of unwanted messages/articles, rather than the definition of Wikipedia:SPAM. Thanks, and if you have any more questions, or you wish to be privately sent the entire code, please reply. If you just want the evaluation functions (which decide if an article is spam or not), say so, and I'll make them public. TheFearow 21:21, 16 May 2007 (UTC) reply
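The thresholds described in this reply (at least 5 points for possible spam, at least 8 for PROD, at least 12 for DB) could be mapped to actions roughly as follows; the class and enum names are illustrative, not from the bot's source.

```java
// Illustrative mapping of a factor score to the actions described above:
// >= 5 points -> flag as possible spam, >= 8 -> PROD, >= 12 -> speedy (db).
public class SpamThresholds {
    public enum Action { NONE, FLAG_POSSIBLE, PROD, SPEEDY }

    public static Action classify(int score) {
        if (score >= 12) return Action.SPEEDY;
        if (score >= 8)  return Action.PROD;
        if (score >= 5)  return Action.FLAG_POSSIBLE;
        return Action.NONE;
    }
}
```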
I'll post the full source of the evaluation function on the factors page, as that gives it in more detail than I can describe here. PROD was being used as a mid-way between suspected articles and definite articles, as AFD would otherwise get clogged. Maybe I should just make it use speedy if it's sure, rather than only if it's completely sure?
As for the warnings, I use the uw-create and PRODWarning templates, with a little note afterwards saying a bot left the note.
As for your example, the bot doesn't differentiate between criteria (that would be incredibly complex); it flags using the db|REASON tag (a shortcut of db-reason). Therefore the admin can delete under the appropriate category. This is no different from current manual systems, as patrollers (mostly) use the db-nonsense and db-bio tags, whichever is closest, not what the article actually should be tagged under.
Regarding hangons, it doesn't delete, and the people who flag articles don't care anyway; it's the admins who have to look at hangons.
Also, NP work is in the same category as RC work: you can't get it to enforce. Even AntiVandalBot will not try again if you revert back (even on obvious vandalism).
The idea of this is not to REPLACE editors, but to ease the load a significant amount. You put the required score to flag high enough so there are minimal false positives, while still removing a large percentage of nuisance articles.
Also, I am changing spam articles to nuisance articles in the request (the only change is on this page, code uses standardized page creation user warnings so no need to change).
If I missed anything, please point it out to me. Once I get the evaluation code posted, I will ask for opinions on WP:VP and WP:AN.
Thanks! TheFearow 00:11, 17 May 2007 (UTC) reply
I'm responding to the posting at Wikipedia:AN. This looks like a reasonable proposal. Yechiel Man 02:22, 17 May 2007 (UTC) reply
Looks like a good idea; needs testing, but all in all OK. I would suggest making sure it does NOT check pages that are newer than, say, 10 minutes, or pages that are new but have had edits in the last 10 minutes. This would avoid flagging bad first versions that the editor is still working on. I know that when I make a new page I would probably not have many of the things you count as good things in the first version, but give me one more hour and a few updates and it would have these things and not be flagged. Stefan 03:06, 17 May 2007 (UTC) reply
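Stefan's suggested grace period could be implemented with a simple age check on the page's last edit; the 10-minute figure comes from the comment above, while the class and method names are assumptions.

```java
import java.time.Duration;
import java.time.Instant;

// Sketch of the "don't judge pages still being written" idea:
// skip any page created or last edited within the grace period.
public class FreshPageFilter {
    static final Duration GRACE = Duration.ofMinutes(10);

    public static boolean shouldSkip(Instant lastEdit, Instant now) {
        return Duration.between(lastEdit, now).compareTo(GRACE) < 0;
    }
}
```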
I have some experience with my bot User:AlexNewArtBot that I used to generate bad lists. I had a few observations:
I am sceptical about the possibility of no false PRODding and DBing of articles. I would suggest generating lists of potentially bad articles and then PRODding and DBing them in a semi-automatic regime. If there are no false positives over a week's time, then the bot is safe for automatic prodding. You might consider re-writing the User:AlexNewArtBot/Bad rules; then the ANAB would provide most of the functionality required. Alex Bakharev 06:22, 17 May 2007 (UTC) reply
I have a few hesitations regarding this bot. For one, I reviewed the scoring algorithm that you provided, and I have to say, without testing it on a large sampling of pages, that the values used to determine the scores and the evaluation of what these scores indicate appear entirely arbitrary. I see absolutely no justification for the weights placed upon certain items, nor a clear parallel to the judgments of what the scores indicate. Of particular concern to me are arbitrary judgments such as if(numlinks < 3){p += 10;} and if(numimages == 0){p += 2;}
along with others. By the judgments you provide, a well-written article that provides fewer than 3 wikilinks is tagged as spam--this seems highly illogical to me. Similarly, a simple formatting mistake, such as typing '''Bold Text'''The text I meant to make bold'''Bold Text''', especially when coupled with not using wikilinks, not adding images, or, for some odd reason, using one or more exclamation marks, could lead to such a page being tagged as spam, prodded, or tagged for speedy. The likelihood of producing false positives by this method is way, way too high, and I would certainly not support approving this bot unless the algorithm can be dramatically improved and tested on a wide sampling of both manually-deemed acceptable and unacceptable articles.
Coupled with the tendency of the algorithm to produce false positives, I must echo concerns mentioned above about biting newcomers. As you may be aware, many new users create pages in short spurts--they write a page like "== Headline Text == '''Bold Text'''Dick Cheney'''Bold Text''' is the vice president of the united states." and over the course of many, many edits in a short period of time convert the article to a well-written and well-formatted article. By immediately tagging these articles for deletion and then posting threatening messages on the article creators' talk pages, you serve only to discourage newcomers from contributing anything at all. I do still remember my first article creation, one that may well have been deemed by your bot to be spam, and had I received a threatening notice from some bot immediately after making that edit, I likely would have refrained from contributing, assuming that Wikipedia did not welcome me. What I would suggest is that, instead of tagging these articles for deletion immediately following their creation, you notify users that they may have made formatting or style mistakes and point them to Help:Editing and Wikipedia:MOS; then you might also leave a link on some page to be checked at a much later time by administrators to see if any progress in the way of improving the article has been made, or if it was simply spam.
Additionally, and this is more of a semantic concern, the bot's name really needs to be changed. Along with above concerns about biting, I must say that a FearBot sounds to me like a bot to be used to intimidate evil spammers; even if you do not regard the bot as such, I assure you that many users will regard a message from a "FearBot" as quite intimidating. AmiDaniel ( talk) 08:15, 17 May 2007 (UTC) reply
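For reference, the two fragments quoted in this comment, assembled into a runnable form (the surrounding class is invented), show the effect being criticized: an article with two wikilinks and no image already accumulates 12 points before any other factor is considered.

```java
// The two quoted fragments, assembled to illustrate the false-positive
// concern: a well-written article with 2 wikilinks and no image starts
// 12 points in the hole under these weights.
public class QuotedFragments {
    public static int score(int numlinks, int numimages) {
        int p = 0;
        if (numlinks < 3)   { p += 10; } // penalizes articles with fewer than 3 wikilinks
        if (numimages == 0) { p += 2;  } // penalizes articles without images
        return p;
    }
}
```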
New articles that are spam are usually speedied within a minute by newpage patrollers; waiting any longer than that makes the bot unneeded. Generic album pages seem to always come up as spam, so I am considering excluding them. The numlinks smaller than 3 was a mistake; it should NOT be that high. That was a big error: it was supposed to be == 0 or <= 1, not smaller than three. Numimages I am going to remove, as it seems unnecessary, since a lot of articles don't have images. I always try to add images to my new articles; however, I understand it is often hard to do, and newcomers often don't. Regarding using the bold tags wrong: after patrolling newpages for a while, as well as watching newpages closely over the last week, there seem to be no problems with people accidentally using bold text etc. incorrectly. There was one time, but it was obvious they accidentally clicked, and it was a spam article anyway. Regarding threatening messages: I am using the standardized uw-create and prodwarning templates; as far as I know these are not threatening, and the only reason it uses more than level 1 is if the user has already been warned. I could change the bot to put everything in the IdentifiedSpam template, which puts them in a category, but that wouldn't be as fast, and it would mean another category admins have to watch. Regarding FearBot: would FearowBot be better, or do I need to eliminate the word fear altogether? One policy is that the name should somehow reflect the owner/operator's name; that's why I'm using Fear or Fearow. FearBot was also the name of an old MSN bot of mine, and an old IRC bot I made for a friend. If needed, I will be happy to change it. Lastly, for the first week of the trial I would just have it report, not DB or PROD anything. After that, I would have it do so, but I would be watching closely. Once it's public, I would only run it when I'm on, for at least the first month, and I will have it set to message me when anything happens.
If there is anything I missed, please don't hesitate to point it out. Thanks! TheFearow 08:55, 17 May 2007 (UTC) reply
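The correction described in the reply above (penalize only pages with == 0 or <= 1 wikilinks, and drop the image factor entirely) might look like this; the class name is illustrative.

```java
// Sketch of the corrected link factor from the reply: only pages with at
// most one wikilink are penalized, and the image factor is removed.
public class CorrectedLinkFactor {
    public static int score(int numlinks) {
        return (numlinks <= 1) ? 10 : 0;
    }
}
```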
I am against using this bot. Let's indeed see User:AlexNewArtBot/COISearchResult for examples. Some articles are just fine; there are no reasons for deletion. Look at Farah Abushwesha, which is just one of many examples. If these articles are marked for "non-controversial deletion" (and people rarely look through these lists), they will be deleted. Such a bot can be used for only one purpose: to identify articles that need an editor's attention and mark them as such. No marking for deletion by bots, please. Biophys 16:31, 17 May 2007 (UTC). reply
OK, to answer AKMask's question, it only monitors the main namespace. Also, I am considering switching this just to IdentifiedSpam, which can be watched by human editors. Can I get an opinion on this, and should I resubmit a new bot request or edit this one, as it has a majorly different purpose? And yes, all nuisance articles are picked up already; however, this was designed to reduce the load on human editors so they could work on more important tasks, and it would still fill that role just identifying spam. Can I get comments on switching to IdentifiedSpam rather than SPEEDY or PROD? Thanks! TheFearow 21:01, 17 May 2007 (UTC) reply
Should I edit this request or create a new one? If I should create a new one, mark this one as Withdrawn By Author and I will create a new one with new purpose. Thanks! TheFearow 22:49, 17 May 2007 (UTC) reply
Here are my thoughts... Newpages patrol isn't very taxing; pages aren't created at the rate that AntiVandalBot and other anti-vandalism bots have to deal with on RCP. If anything, I say we need more admins on Newpages patrol (a Jeffrey O. Gusatafon Shazaam! moment), as I've had more problems with new users removing speedy tags than with tagging articles in the first place. hbdragon88 01:08, 18 May 2007 (UTC) reply
OK. I'm deciding to wait until I get a better bot task, like some of my suggested stuff at WP:DEAD. Could a BAG member mark this as Withdrawn? Thanks! Matt - TheFearow 21:59, 19 May 2007 (UTC) reply
Withdrawn by operator. Martinp23 22:03, 19 May 2007 (UTC) reply