From Wikipedia, the free encyclopedia

This is a summary of enwiki's various copyright violation detector bots and tools.

Detection via Google searches

Earwig copyvio detector

  • https://copyvios.toolforge.org/
  • maintainer: The Earwig, Chlod
  • source code: https://github.com/earwig/copyvios
  • last commit: 3 years ago
  • tech: Python
  • uses the Google Search API and the WMF EranBot Turnitin API
    • Google Search API
      • WMF pays for credits
      • no discount (NPerry (WMF) used to work on Wikimedia's partnership with Google, so this may be worth raising with them)
      • hard limit of 10,000 queries per day (the maximum for any user of this API)
      • costs US$50 per day
      • makes up to 8 queries per page
      • roughly 2,000 checks per day (not every check uses all 8 queries)
      • as of August 2024, the quota is exhausted around hour 12 of each 24-hour day
        • AI scraping bots may be to blame for this higher than normal usage
        • to counter this, there are plans to require login / implement OAuth
      • Google has the best breadth of search coverage
        • Bing might be a reasonable backup, but its coverage is not as good
        • the tool previously used Yahoo, until Yahoo ended its free service
        • Yandex has been looked into, but its English coverage isn't great
    • someone had the idea of adding The Wikipedia Library / EBSCO as another search backend, but discussions with EBSCO stalled
  • has issues with concurrent queries
  • uptime report: https://stats.uptimerobot.com/BN16RUOP5/784331770
  • false positive handling via a community-maintained exclusion list at User:EarwigBot/Copyvios/Exclusions
  • previous WMF contacts: Kaldari, Runab WMF, DTankersley (WMF)
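
The Google Search API figures above can be sanity-checked with a little arithmetic. The sketch below is illustrative only and is not part of the tool's code. The function names are hypothetical, and it makes two assumptions the notes do not state: that the US$50 per day corresponds to a fully used 10,000-query quota, and that the roughly 2,000 daily checks are what use up that quota.

```python
# Figures copied from the notes above. Everything derived from them below
# rests on the assumptions named in the lead-in, not on documented facts.
DAILY_QUERY_LIMIT = 10_000      # hard limit, queries per day
DAILY_COST_USD = 50             # assumed to be the cost of a full quota
MAX_QUERIES_PER_PAGE = 8        # worst case for a single check
APPROX_CHECKS_PER_DAY = 2_000   # rough figure from the notes


def cost_per_query(daily_cost: float, daily_limit: int) -> float:
    """Implied price of one query if the whole quota is paid for."""
    return daily_cost / daily_limit


def worst_case_checks(daily_limit: int, max_queries_per_page: int) -> int:
    """Checks that fit in the quota if every check used the maximum."""
    return daily_limit // max_queries_per_page


def average_queries_per_check(daily_limit: int, checks_per_day: int) -> float:
    """Average queries per check if the checks exactly exhaust the quota."""
    return daily_limit / checks_per_day


if __name__ == "__main__":
    print(cost_per_query(DAILY_COST_USD, DAILY_QUERY_LIMIT))
    print(worst_case_checks(DAILY_QUERY_LIMIT, MAX_QUERIES_PER_PAGE))
    print(average_queries_per_check(DAILY_QUERY_LIMIT, APPROX_CHECKS_PER_DAY))
```

Under those assumptions, the quota covers 1,250 checks if every check uses all 8 queries, and about 2,000 checks implies an average of about 5 queries per check, which is consistent with the note that not every check uses all 8.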

Google API Proxy

Detection via Turnitin

Wikipedia:Turnitin

CopyPatrol (rewrite)

Frontend

Backend

CopyPatrol (original; undeployed)

This discussion has been closed. Please do not modify it.

Frontend (wikimedia-slimapp)

Backend (EranBot)

See also