From Wikipedia, the free encyclopedia
This is a summary of enwiki's various copyright violation detector bots and tools.
Detection via Google searches
- https://copyvios.toolforge.org/
- maintainers: The Earwig, Chlod
- source code: https://github.com/earwig/copyvios
- last commit: 3 years ago
- tech: Python
- uses the Google Search API and the WMF eranbot Turnitin API
- Google Search API
- WMF pays for credits
- no discount (NPerry (WMF) previously worked on Wikimedia's partnership with Google; this may be worth raising with them)
- hard limit of 10,000 queries per day (the maximum for any user of this API)
- costs US$50 per day
- makes up to 8 queries per page
- roughly 2,000 checks per day (not every check uses all 8 queries)
- as of August 2024, the quota is being exhausted around hour 12 of the 24-hour day
- AI scraping bots may be responsible for this higher-than-normal usage
- to counter this, there are plans to require login by implementing OAuth
- Google has the broadest search coverage
- Bing might be a reasonable backup, though its coverage is not as good
- the tool used Yahoo until Yahoo ended its free service
- Yandex has been considered, but its English-language coverage is weak
- adding The Wikipedia Library / EBSCO as another search backend was proposed, but discussions with EBSCO stalled
- has issues with concurrent queries
- uptime report: https://stats.uptimerobot.com/BN16RUOP5/784331770
- false-positive handling via a community-maintained exclusion list at User:EarwigBot/Copyvios/Exclusions
- previous WMF contacts: Kaldari, Runab WMF, DTankersley (WMF)
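The Google Search API figures above can be sanity-checked with some back-of-the-envelope arithmetic. The sketch below uses only the numbers quoted on this page. The constant and function names are hypothetical and are not taken from the copyvios source code.

```python
# Back-of-the-envelope model of the copyvios Google Search API budget.
# Figures come from the summary above; all names are hypothetical.
DAILY_QUERY_LIMIT = 10_000   # hard daily cap on API queries
DAILY_COST_USD = 50.0        # reported cost when the full daily quota is used
MAX_QUERIES_PER_CHECK = 8    # upper bound on queries per page checked


def cost_per_query() -> float:
    """Average cost in US$ of one query when the full daily quota is used."""
    return DAILY_COST_USD / DAILY_QUERY_LIMIT


def min_checks_per_day() -> int:
    """Checks per day in the worst case, where every check uses all 8 queries."""
    return DAILY_QUERY_LIMIT // MAX_QUERIES_PER_CHECK


def avg_queries_per_check(checks_per_day: int) -> float:
    """Average queries per check implied by a given number of daily checks."""
    return DAILY_QUERY_LIMIT / checks_per_day


def hourly_rate_to_exhaust(hours: float) -> float:
    """Queries per hour needed to use up the daily quota after `hours` hours."""
    return DAILY_QUERY_LIMIT / hours


if __name__ == "__main__":
    print(f"cost per query:            US${cost_per_query():.3f}")
    print(f"worst-case checks per day: {min_checks_per_day()}")
    print(f"avg queries/check @ 2,000: {avg_queries_per_check(2000):.1f}")
    print(f"queries/hour to hit cap at hour 12: {hourly_rate_to_exhaust(12):.0f}")
```

The results work out as follows:

- 10,000 queries for US$50 comes to US$0.005 per query, or US$5 per 1,000.
- If every check used all 8 queries, the tool could run only 1,250 checks per day.
- Roughly 2,000 checks against the full quota implies about 5 queries per check on average.
- Exhausting the quota at hour 12 implies about 833 queries per hour. That is twice the roughly 417 per hour that would spread the quota over a full day.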
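The exclusion list lets editors suppress known false positives, such as Wikipedia mirrors. This page does not describe how copyvios actually parses or applies User:EarwigBot/Copyvios/Exclusions, so the sketch below assumes a hypothetical format:

- Each entry is either a bare domain, which also covers its subdomains, or a domain plus a path prefix.
- Any candidate source URL that matches an entry is dropped before comparison.

The function names are illustrative only.

```python
# Hypothetical exclusion-list matcher. The entry format (bare domain or
# domain + path prefix) is an assumption, NOT copyvios' documented format.
from urllib.parse import urlparse


def normalize(url: str) -> str:
    """Reduce a URL or bare entry to 'host/path', dropping scheme and 'www.'."""
    parsed = urlparse(url if "://" in url else "http://" + url)
    host = parsed.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return host + parsed.path


def is_excluded(url: str, exclusions: list[str]) -> bool:
    """True if `url` falls under any exclusion entry."""
    target = normalize(url)
    for entry in exclusions:
        pattern = normalize(entry).rstrip("/")
        # Exact match, or a page below an excluded host/path prefix.
        if target == pattern or target.startswith(pattern + "/"):
            return True
        # A bare-domain entry also covers its subdomains.
        if "/" not in pattern:
            target_host = target.split("/", 1)[0]
            if target_host.endswith("." + pattern):
                return True
    return False


def filter_sources(urls: list[str], exclusions: list[str]) -> list[str]:
    """Drop candidate source URLs that the exclusion list marks as false positives."""
    return [u for u in urls if not is_excluded(u, exclusions)]


if __name__ == "__main__":
    excl = ["wikipedia.org", "example.com/mirror"]
    candidates = [
        "https://en.wikipedia.org/wiki/Foo",
        "http://www.example.com/mirror/page",
        "https://news.example.net/story",
    ]
    print(filter_sources(candidates, excl))
```

Prefix matching on a normalized "host/path" string keeps the rules simple. Requiring a "/" boundary after the pattern stops an entry like `example.com/mirror` from also excluding `example.com/mirrors`.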
Wikipedia:Turnitin
CopyPatrol (original; undeployed)
This discussion has been closed. Please do not modify it.