This page is not only subject to change, but likely to change. We're working on designing a rigorous trial. If you have any suggestions, please join in the discussion.
The Turnitin trial, should it be approved by the community, will seek to answer several questions:
Does Turnitin's system effectively screen out false positives created by Wikipedia mirrors or by sites that legitimately reuse our content under a compatible license? (A mirror-filtering sketch appears after this list.)
Can Turnitin's system work on old articles as well as new ones? Perhaps its web crawler should be excluded so that matching focuses only on its content databases?
What 'percent-match' in a Turnitin report would optimize copyvio detection while minimizing false positives? (A threshold-tuning sketch appears after this list.)
Does Turnitin's system improve upon our current investigation tools, namely Coren's Bot and Madman's Bot? (Note that those two bots only work on new articles.)
Does Turnitin catch known copyvio issues?
Does Turnitin have blind spots, sources whose content it cannot detect (e.g. New York Times content...)?
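As a rough illustration of the mirror-filtering question above, here is a minimal sketch in Python. It assumes the trial tooling can extract the matched-source URLs from a Turnitin report; the domain list and function names are hypothetical, and a real list would presumably be maintained on-wiki (cf. Wikipedia:Mirrors and forks).

```python
from urllib.parse import urlparse

# Hypothetical list of domains known to mirror or legitimately reuse
# Wikipedia content under a compatible license.
KNOWN_REUSE_DOMAINS = {
    "en.wikipedia.org",
    "wikiwand.com",
    "dbpedia.org",
}

def is_known_reuser(match_url: str) -> bool:
    """Return True if a matched source is a known mirror or licensed reuser."""
    host = urlparse(match_url).netloc.lower()
    # Match the domain itself or any subdomain of it.
    return any(host == d or host.endswith("." + d) for d in KNOWN_REUSE_DOMAINS)

def filter_matches(matches: list[dict]) -> list[dict]:
    """Drop report matches that point at known mirrors, keeping only
    sources that could represent genuine copyvio."""
    return [m for m in matches if not is_known_reuser(m["url"])]
```

Comparing how many articles get flagged before and after such filtering would directly quantify how many false positives mirrors generate.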
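For the percent-match question, one natural approach is to hand-label a sample of Turnitin reports and then measure precision (fraction of flagged articles that are real copyvios) and recall (fraction of real copyvios that get flagged) at each candidate threshold. A minimal sketch with made-up labels; the function name and threshold grid are illustrative:

```python
def evaluate_thresholds(reports, thresholds=range(10, 95, 5)):
    """reports: list of (percent_match, is_copyvio) pairs labeled by
    human reviewers. Returns (threshold, precision, recall) tuples."""
    results = []
    for t in thresholds:
        flagged = [(pm, cv) for pm, cv in reports if pm >= t]
        tp = sum(1 for _, cv in flagged if cv)          # true positives
        fp = len(flagged) - tp                          # false positives
        fn = sum(1 for pm, cv in reports if cv and pm < t)  # missed copyvios
        precision = tp / (tp + fp) if flagged else 1.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        results.append((t, precision, recall))
    return results

# Hypothetical labeled sample: the trial could pick the lowest
# threshold whose precision stays above an agreed floor, e.g. 0.9.
sample = [(88, True), (72, True), (65, False), (40, False), (91, True)]
for t, p, r in evaluate_thresholds(sample):
    print(f"threshold {t}%: precision {p:.2f}, recall {r:.2f}")
```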
Types of trials
We could run a trial looking at how Turnitin works with new edits before they have been mirrored around the web.
We could also run a trial looking at how Turnitin handles long-established pages, whose text is more likely to have already been copied and mirrored elsewhere.
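Either trial needs a sample of articles to feed to Turnitin. A minimal sketch of assembling both samples via the public MediaWiki API; the sample sizes are illustrative, and since list=random returns pages of any age, the second sample would still need a creation-date check:

```python
import requests

API = "https://en.wikipedia.org/w/api.php"
# Wikimedia asks API clients to send a descriptive User-Agent.
HEADERS = {"User-Agent": "turnitin-trial-sketch/0.1"}

def sample_new_pages(limit=50):
    """Recently created mainspace pages: candidates for the
    'new edits, not yet mirrored' trial."""
    params = {
        "action": "query", "list": "recentchanges", "rctype": "new",
        "rcnamespace": 0, "rclimit": limit, "format": "json",
    }
    data = requests.get(API, params=params, headers=HEADERS).json()
    return [rc["title"] for rc in data["query"]["recentchanges"]]

def sample_established_pages(limit=50):
    """Random mainspace pages: candidates for the
    'long-established pages' trial."""
    params = {
        "action": "query", "list": "random",
        "rnnamespace": 0, "rnlimit": limit, "format": "json",
    }
    data = requests.get(API, params=params, headers=HEADERS).json()
    return [p["title"] for p in data["query"]["random"]]
```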