This page is an archive. Do not edit the contents of this page. Please direct any additional comments to the current main page.
There were two references to this website. I have removed one. The archived url has the content. Should this citation be preserved or removed?
My edit and existing citation -- DaxServer (talk) 07:50, 9 April 2021 (UTC)
|url-status=usurped. The talk page instance is removed because the "External links modified" section can be removed; it is an old system that is no longer used. I'll need to update the InternetArchiveBot database to indicate this domain should be blacklisted, but the service is currently down for maintenance: https://iabot.toolforge.org/ -- Green C 17:10, 9 April 2021 (UTC)
|url-status=usurped (new edit). -- DaxServer (talk) 20:33, 9 April 2021 (UTC)
Links to https://www.nytimes.com/movies/person/* are dead and report as soft-404s, so they are not picked up by archive bots. There are about 1300 articles with links in https and about 150 in http. The URLs are to The New York Times, but the content is licensed to All Movie Guide, so in a CS1|2 citation it would convert to |work=All Movie Guide and |via=The New York Times. In addition, add an archive URL if available; otherwise mark the link dead. Extra credit: it could try to determine the date and author by scraping the archive page. Example. -- Green C 18:00, 6 April 2021 (UTC)
Results
|work=)
|url-status=live --> |url-status=dead
{{dead link}}
-- Green C 00:25, 15 April 2021 (UTC)
There are several thousand "http" links on WP to many different pages of my site (whose homepage is http://penelope.uchicago.edu/Thayer/E/home.html) which really should be httpS. The site is secure with valid certificates, etc. Is this something a bot can take care of quickly?
24.136.4.218 (talk) 19:20, 11 February 2021 (UTC)
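For illustration only, the kind of rewrite being requested could be sketched as below. This is not the bot's actual code: the function name `upgrade_to_https` is hypothetical, and the sketch assumes that every path on penelope.uchicago.edu is served over https, which would need checking. It deliberately skips links embedded inside another URL (such as an archive URL), since those should not be changed blindly.

```python
import re

# Plain-http links to the requested host. The negative lookbehind skips
# occurrences that are embedded in another URL, e.g. after ".../web/2020/".
HTTP_LINK = re.compile(r"(?<!/)http://(penelope\.uchicago\.edu/)", re.IGNORECASE)


def upgrade_to_https(wikitext: str) -> str:
    """Rewrite http:// links to the one target host as https:// (hypothetical helper)."""
    return HTTP_LINK.sub(r"https://\1", wikitext)
```

Links that are already https, and links to other hosts, are left untouched.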
Several years ago all the content on this subdomain was moved to timesofindia.indiatimes.com. However, the links are not the same and don't have any redirects and also cannot be re-constructed or guessed using any algorithms. One has to search in Google with the title of the link with former domain and update the link with the new domain.
Old URL - http://articles.timesofindia.indiatimes.com/2001-06-28/pune/27238747_1_lagaan-gadar-ticket-sales (archived)
Is it possible to have a WP:SEMIAUTOMATED bot that takes the new URL as input from the user and updates WP? Is there an existing bot? If not, I created a small semi-automated script (here) to assist me with the same functionality. Do I need to get approval for this bot, if this is even considered a bot? -- Srihari Thalla (talk) 19:20, 8 April 2021 (UTC)
|archive-url=, |archive-date= and |url-status= since can't change |url= and not |archive-url=, which if changed has to be verified working. There is {{webarchive}} that sometimes follows bare and square links and might need to be removed or changed. The |url-status= should be updated from dead to live. There are {{dead link}} that might need to be added or removed. Should verify the new URL is working, not assume it does; and if there are redirects in the headers, capture those and change the URL to reflect them. Those are the basics for this kind of work; it is not easy. Keep in mind there are 3 basic types of cites: those within a cite template, those in a square link, and those bare. Of those three types, the square and bare may have a trailing {{webarchive}}. All types may have a trailing {{dead link}}.
{{dead link}}) manually search for those to start. I could generate a list of those URLs with {{dead link}} while making sure everything else is archived. -- Green C 20:24, 8 April 2021 (UTC)
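To make the three cite types concrete, here is a rough sketch of how a script might tell them apart and spot a trailing {{webarchive}} or {{dead link}}. It is a simplification, not the bot's actual logic: the names `classify_cite` and `trailing_templates` are hypothetical, the URLs in any usage are placeholders, and real wikitext has many edge cases (nested templates, |archive-url= values, refs) that this does not handle.

```python
import re
from typing import List

# A {{webarchive}} or {{dead link}} template with no nested braces.
TRAILING = re.compile(
    r"\s*\{\{\s*(webarchive|dead link)\s*(?:\|[^{}]*)?\}\}", re.IGNORECASE
)


def classify_cite(wikitext: str, url: str) -> str:
    """Return 'template', 'square', or 'bare' for the first occurrence of url."""
    pos = wikitext.find(url)
    if pos < 0:
        raise ValueError("url not found in wikitext")
    before = wikitext[:pos]
    # Cite template: the URL is the value of a |url= parameter.
    if re.search(r"\|\s*url\s*=\s*$", before):
        return "template"
    # Square link: the URL directly follows a single "[".
    if before.endswith("[") and not before.endswith("[["):
        return "square"
    return "bare"


def trailing_templates(rest: str) -> List[str]:
    """Names of the webarchive / dead link templates that immediately follow a cite."""
    names: List[str] = []
    pos = 0
    while True:
        m = TRAILING.match(rest, pos)
        if m is None:
            return names
        names.append(m.group(1).lower())
        pos = m.end()
```

Here `rest` is the text that follows the end of the cite (after the closing `}}` or `]`, or after the bare URL itself); finding that end point is left out of the sketch.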
User:GreenC/software/urlchanger-skeleton-easy.nim is a generic skeleton source file, to give a sense of what is involved. It only needs some variables at the top modified to define the old and new domains. There is a "hard" skeleton for more custom needs, where mods are done throughout the file when the easy version is not enough. The file is part of the main bot, isolating domain-specific changes to this file. I'll start on the above; it will probably take a few days, depending on how many URLs are found. -- Green C 01:42, 11 April 2021 (UTC)
@DaxServer: The bot finished. Cites with {{dead link}} are recorded at Wikipedia:Link rot/cases/Times of India (raw), about 150. -- Green C 20:57, 16 April 2021 (UTC)
Results
|url-status=live to |url-status=dead
{{dead link}} added about 100
|work=, removed "Times of India" from |title=)
Old URLs from sometime before 2010 have a different URL structure. The content has moved to a new URL, but a direct redirect is not available. The old URL is redirected to a list page, which is categorized by the date the article was published. One has to search for the title of the article and follow the link. Surprisingly, some archived URLs I tested were redirected to the new archived URL. My guess is that the redirection worked in the past, but was broken at some point.
Old URL - http://hindu.com/2001/09/06/stories/0406201n.htm (archived in 2020 - automatically redirected to the new archived url; old archive from 2013)
Redirected to list page - https://www.thehindu.com/archive/print/2001/09/06/
Title - IT giant bowled over by Naidu
New URL from the list page - https://www.thehindu.com/todays-paper/tp-miscellaneous/tp-others/it-giant-bowled-over-by-naidu/article27975551.ece
There is no content shift between the old URL (2013 archive) and the new URL.
Example from N. Chandrababu Naidu - PS. This citation is used twice (as searched by the title), once with the old url and once with the new url. -- DaxServer (talk) 14:18, 9 April 2021 (UTC)
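As a small illustration of the lookup described above, the date in the old URL is enough to build the list page address. The sketch below assumes the `archive/print/YYYY/MM/DD/` pattern seen in the single example holds for other dates too, which is not verified here; the function name `list_page_for` is hypothetical. Finding the article on the list page by its title still has to be done separately.

```python
import re
from typing import Optional

# Simplified pattern for the old URL shape shown in the example above.
OLD_HINDU_URL = re.compile(
    r"https?://(?:www\.)?(?:the)?hindu\.com/(\d{4})/(\d{2})/(\d{2})/stories/[0-9a-z]+\.htm"
)


def list_page_for(old_url: str) -> Optional[str]:
    """Build the date-based list page URL for an old-style URL, or None if it does not match."""
    m = OLD_HINDU_URL.fullmatch(old_url)
    if m is None:
        return None
    year, month, day = m.groups()
    # Assumption: the list page path is always archive/print/<year>/<month>/<day>/.
    return f"https://www.thehindu.com/archive/print/{year}/{month}/{day}/"
```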
https?\:\/\/(www\.)?(the)?hindu\.com\/(thehindu\/(fr|yw|mp|pp|mag)\/)?\d{4}\/[01]\d\/[0-3][0-9]\/stories\/[0-9a-z]+\.htm
\d{4}\/[01]\d\/[0-3][0-9]\/
.. maybe it is a different URL variation? -- Green C 21:52, 9 April 2021 (UTC)
https?\:\/\/(www\.)?(the)?hindu\.com\/(thehindu\/(fr|yw|mp|pp|mag)\/)?\d{4}\/\d{2}\/\d{2}\/stories\/[0-9a-z]+\.htm
-- DaxServer (talk) 12:02, 10 April 2021 (UTC)
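A quick way to sanity-check a pattern like the one above is to run it over a snippet of wikitext and list what it picks up. The sketch below copies the pattern with the relaxed date component as posted; the helper name `find_old_hindu_urls` and the sample text are only illustrative.

```python
import re
from typing import List

# The pattern as posted above (date parts relaxed to \d{2}).
OLD_URL = re.compile(
    r"https?\:\/\/(www\.)?(the)?hindu\.com\/(thehindu\/(fr|yw|mp|pp|mag)\/)?"
    r"\d{4}\/\d{2}\/\d{2}\/stories\/[0-9a-z]+\.htm"
)


def find_old_hindu_urls(wikitext: str) -> List[str]:
    """Return every substring of wikitext that matches the old URL pattern."""
    return [m.group(0) for m in OLD_URL.finditer(wikitext)]


sample = (
    "[http://hindu.com/2001/09/06/stories/0406201n.htm IT giant bowled over by Naidu] and "
    "https://www.thehindu.com/todays-paper/tp-miscellaneous/tp-others/"
    "it-giant-bowled-over-by-naidu/article27975551.ece"
)
print(find_old_hindu_urls(sample))
```

Only the old-style link is returned; the new-style `.ece` URL does not match. Note that the pattern ends at `.htm`, so a link ending in `.html` would be captured without its final "l".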
insource: and some additional matches.
12,229
insource:/\/{2}(www[.])?(the)?hindu[.]com\/(thehindu\/)?((cp|edu|fr|lf|lr|mag|mp|ms|op|pp|seta|yw)\/)?[0-9]{4}\/[0-9]{2}\/[0-9]{2}\/stories\/[^.]+[.]html?/
DaxServer, the Hindu is done. Dead link list: Wikipedia:Link rot/cases/The Hindu (raw). -- Green C 13:24, 23 April 2021 (UTC)
|url-status=live to dead
{{dead link}}
|work=, removed "The Hindu" from |title=, etc.)
Any link that redirects to the home page. Example. Example. -- Green C 14:27, 17 April 2021 (UTC)
Results
|url-status=dead (Example)
Everything dead. Some redirect to a new domain homepage unrelated to the previous site. Some have 2-level-deep sub-domains. All are now set to "Blacklisted" in IABot for global wiki use; a Medic pass through on enwiki will also help. -- Green C 04:13, 25 April 2021 (UTC)
Results
|url-status=dead to existing archive URLs
{{dead link}}