From Wikipedia, the free encyclopedia
Status and updates for Task 17
- Doing
- Possible
- CNDID
- WT.ec_id
- cid
- sp_mid
- sp_rid
- Necessary to keep
Bugs to fix/patches to make
- Parameter order matters? Found a few instances where &a=___?b=___ worked but not &b=___?a=____
- Avoid removing --> if stuck to the end of the URL
because these things are boring
|
Original
\??(?:&?utm_[^=]*?=[^&\s\]\|]*)+(?=]|\s|\|)|(?<=\?)(?:&?utm_[^=]*?=[^&\s\]\|]*)+&
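To make the behaviour of this first pattern concrete, here is a minimal Python sketch that applies it to a few made-up external links. The helper name `strip_utm_original` and the example.com URLs are illustrative assumptions, not part of the task itself.

```python
import re

# The original pattern, copied verbatim and split into its two alternatives.
ORIGINAL = re.compile(
    r"\??(?:&?utm_[^=]*?=[^&\s\]\|]*)+(?=]|\s|\|)"  # utm_ run that ends the URL
    r"|(?<=\?)(?:&?utm_[^=]*?=[^&\s\]\|]*)+&"       # utm_ run right after the "?"
)


def strip_utm_original(wikitext: str) -> str:
    """Hypothetical helper: delete every match of the original pattern."""
    return ORIGINAL.sub("", wikitext)


# Only utm_ parameters: the whole query string goes, including the "?".
print(strip_utm_original("[https://example.com/a?utm_source=x&utm_medium=y Title]"))
# -> [https://example.com/a Title]

# Leading utm_ parameter followed by a real one: only "utm_source=x&" goes.
print(strip_utm_original("[https://example.com/a?utm_source=x&id=5 Title]"))
# -> [https://example.com/a?id=5 Title]

# utm_ parameter between two real ones: this version leaves it alone.
print(strip_utm_original("[https://example.com/a?id=5&utm_source=x&ref=1 Title]"))
# -> [https://example.com/a?id=5&utm_source=x&ref=1 Title]
```

The last case is the gap that the next revision addresses.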
27 May (BRFA trial) - add code (originally highlighted in green) to catch utm_ params in the middle, and catch more end-of-URL possibilities
\??(?:&?utm_[^=\s]*?=[^&\s\]\|]*?)+(?=}|]|\s|\|)|(?<=\?)(?:&?utm_[^=\s]*?=[^&\s\]\|]*)+&|(?<=&)(?:&?utm_[^=\s]*?=[^&\s\]\|]*)+&
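As a rough illustration of what the 27 May revision adds, the sketch below runs it on a utm_ parameter sitting between two other parameters and on a URL that ends at a template's closing braces. The helper name `strip_utm_may27` and the sample wikitext are assumptions made for the example.

```python
import re

# The 27 May pattern, copied verbatim and split into its three alternatives.
MAY_27 = re.compile(
    r"\??(?:&?utm_[^=\s]*?=[^&\s\]\|]*?)+(?=}|]|\s|\|)"  # utm_ run that ends the URL
    r"|(?<=\?)(?:&?utm_[^=\s]*?=[^&\s\]\|]*)+&"          # utm_ run right after "?"
    r"|(?<=&)(?:&?utm_[^=\s]*?=[^&\s\]\|]*)+&"           # utm_ run in the middle
)


def strip_utm_may27(wikitext: str) -> str:
    """Hypothetical helper: delete every match of the 27 May pattern."""
    return MAY_27.sub("", wikitext)


# Middle parameter: the third alternative removes "utm_source=x&".
print(strip_utm_may27("[https://example.com/a?id=5&utm_source=x&ref=1 Title]"))
# -> [https://example.com/a?id=5&ref=1 Title]

# URL ending at "}}": the added "}" in the look-ahead lets the match stop there.
print(strip_utm_may27("{{cite web|url=https://example.com/a?utm_source=x}}"))
# -> {{cite web|url=https://example.com/a}}
```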
7 June (catch ref tags) - add < to end-of-check exceptions
\??(?:&?utm_[^=\s]*?=[^&\s\]\|]*?)+(?=<|}|]|\s|\|)|(?<=\?)(?:&?utm_[^=\s]*?=[^&\s\]\|]*)+&|(?<=&)(?:&?utm_[^=\s]*?=[^&\s\]\|]*)+&
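The effect of adding `<` can be seen by running the 27 May and 7 June patterns side by side on a bare URL inside ref tags. This is only a sketch: the names `MAY_27`, `JUNE_7` and the sample text are assumptions for the example.

```python
import re

# Shared second and third alternatives (identical in both revisions).
TAIL = (
    r"|(?<=\?)(?:&?utm_[^=\s]*?=[^&\s\]\|]*)+&"
    r"|(?<=&)(?:&?utm_[^=\s]*?=[^&\s\]\|]*)+&"
)
# The two revisions differ only in the end-of-URL look-ahead.
MAY_27 = re.compile(r"\??(?:&?utm_[^=\s]*?=[^&\s\]\|]*?)+(?=}|]|\s|\|)" + TAIL)
JUNE_7 = re.compile(r"\??(?:&?utm_[^=\s]*?=[^&\s\]\|]*?)+(?=<|}|]|\s|\|)" + TAIL)

text = "<ref>https://example.com/a?utm_source=x</ref> more"

# Without "<" in the look-ahead, the value runs on through "</ref>"
# until the next space, so the closing tag is deleted as well.
print(MAY_27.sub("", text))
# -> <ref>https://example.com/a more

# With "<" in the look-ahead, the match stops before the closing tag.
print(JUNE_7.sub("", text))
# -> <ref>https://example.com/a</ref> more
```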
8 June (catch malformed utm_ params) - utm_ must be followed by text and an =
\??(?:&?utm_[^=\s\|<}\]]*?=[^&\s\]\|]*?)+(?=<|}|]|\s|\|)|(?<=\?)(?:&?utm_[^=\s\|<}\]]*?=[^&\s\]\|]*)+&|(?<=&)(?:&?utm_[^=\s\|<}\]]*?=[^&\s\]\|]*)+&
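One reading of this change is that the parameter name may no longer run across a delimiter such as `|` to find its `=`. The sketch below compares the 7 June and 8 June patterns on a made-up malformed URL inside a citation template; the names `JUNE_7`, `JUNE_8` and the sample text are assumptions for the example.

```python
import re

JUNE_7 = re.compile(
    r"\??(?:&?utm_[^=\s]*?=[^&\s\]\|]*?)+(?=<|}|]|\s|\|)"
    r"|(?<=\?)(?:&?utm_[^=\s]*?=[^&\s\]\|]*)+&"
    r"|(?<=&)(?:&?utm_[^=\s]*?=[^&\s\]\|]*)+&"
)
JUNE_8 = re.compile(
    r"\??(?:&?utm_[^=\s\|<}\]]*?=[^&\s\]\|]*?)+(?=<|}|]|\s|\|)"
    r"|(?<=\?)(?:&?utm_[^=\s\|<}\]]*?=[^&\s\]\|]*)+&"
    r"|(?<=&)(?:&?utm_[^=\s\|<}\]]*?=[^&\s\]\|]*)+&"
)

# A utm_ parameter with no "=" of its own, followed by a template parameter.
text = "{{cite web|url=https://example.com/a?utm_source|title=Foo}}"

# 7 June: the name is allowed to cross "|", so "|title=Foo" is swallowed too.
print(JUNE_7.sub("", text))
# -> {{cite web|url=https://example.com/a}}

# 8 June: the name stops at "|", no "=" follows, so nothing is removed.
print(JUNE_8.sub("", text))
# -> {{cite web|url=https://example.com/a?utm_source|title=Foo}}
```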
10 June (avoid web archive links)
(?<!https://web.archive.org[\S]+)(\??(?:&?utm_[^=\s\|<}\]]*?=[^&\s\]\|]*?)+(?=<|}|]|\s|\|)|(?<=\?)(?:&?utm_[^=\s\|<}\]]*?=[^&\s\]\|]*)+&|(?<=&)(?:&?utm_[^=\s\|<}\]]*?=[^&\s\]\|]*)+&)
1 July (avoid utm_ strings just hanging out in text)
(?<!https://web.archive.org[\S]+|\||\s)(\??(?:&?utm_[^=\s\|<}\]]*?=[^&\s\]\|]*?)+(?=<|}|]|\s|\|)|(?<=\?)(?:&?utm_[^=\s\|<}\]]*?=[^&\s\]\|]*)+&|(?<=&)(?:&?utm_[^=\s\|<}\]]*?=[^&\s\]\|]*)+&)
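The variable-width look-behind that opens this pattern is rejected by Python's standard `re` module, which requires fixed-width look-behinds. The sketch below therefore approximates the 1 July behaviour by pairing the 8 June pattern with a replacement callback that checks the text before each match. The names `BLOCKED_BEFORE` and `strip_utm` are assumptions, and edge cases may differ from an engine that evaluates the look-behind natively.

```python
import re

# 8 June pattern (fixed-width look-behinds only, so `re` accepts it).
UTM = re.compile(
    r"\??(?:&?utm_[^=\s\|<}\]]*?=[^&\s\]\|]*?)+(?=<|}|]|\s|\|)"
    r"|(?<=\?)(?:&?utm_[^=\s\|<}\]]*?=[^&\s\]\|]*)+&"
    r"|(?<=&)(?:&?utm_[^=\s\|<}\]]*?=[^&\s\]\|]*)+&"
)

# Body of the negative look-behind, anchored to the end of the preceding text.
BLOCKED_BEFORE = re.compile(r"(?:https://web.archive.org[\S]+|\||\s)\Z")


def strip_utm(wikitext: str) -> str:
    """Hypothetical helper approximating the 1 July pattern."""

    def replace(match: re.Match) -> str:
        before = wikitext[: match.start()]
        # Keep the match if it follows an archive URL, a "|" or whitespace.
        if BLOCKED_BEFORE.search(before):
            return match.group(0)
        return ""

    return UTM.sub(replace, wikitext)


# Ordinary link: the utm_ parameter is stripped.
print(strip_utm("[https://example.com/a?utm_source=x Title]"))
# -> [https://example.com/a Title]

# Web archive link: left untouched.
archived = "[https://web.archive.org/web/2020/https://example.com/a?utm_source=x Title]"
print(strip_utm(archived) == archived)
# -> True

# utm_ text in running prose (preceded by a space): left untouched.
print(strip_utm("the utm_source=x parameter"))
# -> the utm_source=x parameter
```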
|