This is an essay. It contains the advice or opinions of one or more Wikipedia contributors. This page is not an encyclopedia article, nor is it one of Wikipedia's policies or guidelines, as it has not been thoroughly vetted by the community. Some essays represent widespread norms; others only represent minority viewpoints.
US Census Migration
This page is for tracking the migration of external links to the US Census database website. The original content of this page was copied from WP:Village pump (technical)#Programming help - US Census links going dark, specifically, this revision.
The main site for the US Census will be taken offline March 30 (https://factfinder.census.gov/faces/nav/jsf/pages/index.xhtml). We have Census links in about 40,000 articles on Enwiki, possibly more. They will be dead soon. There are a number of technical complications that make it challenging to archive and/or move links to the new site. Most of the links are not and cannot be archived at Wayback, but they can be archived at archive.today. Seeking collaborators to help with this project. My bot WaybackMedic can add archive.today links into Enwiki if they already exist at archive.today, and/or it can migrate links to the new form, if there is a translation program. Need one or both of these:
{{dead link}} tags etc. If you would like to help, let's try to save the Census data from disappearing. Note that #2 could be done any time in the future and is not limited by the March 30 cutoff. #1 will not work after that date since the site will be dead. -- Green C 17:16, 9 February 2020 (UTC)
Template:American Factfinder Template:American Factfinder2 Template:American Factfinder2/doc Template:American Factfinder2/sandbox Template:American Factfinder3 Template:American Factfinder3/doc Template:American Factfinder/doc Template:Cite American Factfinder Template:Cite American Factfinder2 Template:Cite American Factfinder3 Template:Data United States Template:Editnotices/Page/Spring, Texas Template:Historical populations Template:Historical populations/doc Template:Historical populations/sandbox Template:Historical populations/testcases Template:Historical populations/USCensusRef Template:Infobox ethnic group/testcases Template:Largest urban areas of Oceania Template:Middle Eastern American Template:NYC Chinatowns
-- RoySmith (talk) 19:05, 9 February 2020 (UTC)
externallinks table, which doesn't know or care how a link got into the page, just that it exists. Therefore, we can use Quarry to (fairly) easily generate a list of all ~20000 links from the English Wikipedia to http{s}://factfinder.census.gov: https://quarry.wmflabs.org/query/42039. That makes part 1 a bit easier, at least on the enwiki side. As far as figuring out where the links are coming from, https://quarry.wmflabs.org/query/42040 is the list of templatespace pages that link to FactFinder. -- AntiCompositeNumber (talk) 19:29, 9 February 2020 (UTC)
(edit conflict) {{cite web}} in the template /doc page that contains a factfinder URL, but I don't see anything generated by the template code; that is probably what Fabrikator was seeing. -- Green C 19:42, 9 February 2020 (UTC)
@AntiCompositeNumber: is it possible to adjust the query (https://quarry.wmflabs.org/query/42039) to work across all wiki languages and projects (including Wikidata and Commons), or does it require specifying the database? It would be good to try and save everything if possible even if we can't right away edit those other projects, in particular since the #2 option of translation might not be feasible. -- Green C 16:24, 10 February 2020 (UTC)
show databases like '%wiki_p';
From there, I could see writing a script which iterates over each one, executes the query there, and combines the results. Beats me if there's some way to express all that in a single SQL query, however. -- RoySmith (talk) 16:45, 10 February 2020 (UTC)
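A rough sketch of the iterate-and-combine idea described above, not the script that was actually used. The query text is an assumption (it is not copied from the Quarry query, and the column names are a guess at the externallinks schema), and `collect_links` and `run_query` are hypothetical names. The database call is passed in as a function so the combining logic can be tried without a live replica connection.

```python
from typing import Callable, Iterable, List, Tuple

Row = Tuple[str, ...]

# Assumed query text; the real Quarry query may look different.
QUERY = "SELECT el_to FROM externallinks WHERE el_to LIKE '%factfinder.census.gov%'"


def collect_links(
    databases: Iterable[str],
    run_query: Callable[[str, str], Iterable[Row]],
    sql: str = QUERY,
) -> List[Row]:
    """Run the same SQL against each database and merge the rows.

    Each output row is prefixed with the database name so the
    combined list still records which wiki a link came from.
    """
    combined: List[Row] = []
    for db in databases:
        for row in run_query(db, sql):
            combined.append((db, *row))
    return combined
```

The list of database names would come from the show databases statement above; `run_query` is whatever function actually talks to the replica for one database.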
/usr/bin/mysql --defaults-file=$HOME/replica.my.cnf -h enwiki.analytics.db.svc.eqiad.wmflabs enwiki_p < $HOME/census/test2.sql > $HOME/census/allwikis.txt
which works but still needed a wrapper program to modify the .sql file for each site. I tried running your .py but got an unknown import of "toolforge"; is that something I can find on Toolforge? -- Green C 17:43, 10 February 2020 (UTC)
/data/project/botwikiawk/census that would be great. -- Green C 17:46, 10 February 2020 (UTC)
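One possible shape for the wrapper mentioned above is to vary the host and database on the command line instead of editing the .sql file, assuming the SQL itself does not name a site. The helper names `mysql_command` and `run_for` are hypothetical, and the host pattern is inferred from the enwiki example only, so it may not hold for every wiki.

```python
import subprocess
from typing import List


def mysql_command(database: str, defaults_file: str) -> List[str]:
    """Build the mysql argv for one replica database, e.g. 'enwiki_p'."""
    # Assumption: the host is the database name without its '_p' suffix,
    # followed by the same domain as in the enwiki command.
    wiki = database[:-2] if database.endswith("_p") else database
    host = f"{wiki}.analytics.db.svc.eqiad.wmflabs"
    return ["/usr/bin/mysql", f"--defaults-file={defaults_file}", "-h", host, database]


def run_for(database: str, defaults_file: str, sql_path: str, out_path: str) -> None:
    """Feed the SQL file to mysql on stdin and append its output to out_path."""
    with open(sql_path, "rb") as sql, open(out_path, "ab") as out:
        subprocess.run(mysql_command(database, defaults_file),
                       stdin=sql, stdout=out, check=True)
```

Calling `run_for` once per database name, with the same output path each time, would accumulate all results in a single file.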
/mnt/nfs/labstore-secondary-tools-home/roysmith/factfinder/venv/bin/pip
-- RoySmith (talk) 19:05, 10 February 2020 (UTC)
Now that there is a list of target URLs (thank you RoySmith and AntiCompositeNumber), the next step is to create the archives. I've contacted Wayback to see if they can determine why the pages won't save correctly. I've also contacted archive.today: pages save correctly there, but given the scale I want to get their permission and find out how they want to manage the archiving. -- Green C 18:51, 10 February 2020 (UTC)
{{cite web |url=http://factfinder.census.gov/faces/tableservices/jsf/pages/productview.xhtml?src=bkmk |accessdate=24 Dec 2014 |title=Selected Economic Characteristics |url-status=dead |archiveurl=https://web.archive.org/web/20160417040001/http://factfinder.census.gov/faces/tableservices/jsf/pages/productview.xhtml?src=bkmk |archivedate=2016-04-17 }}
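For illustration, the archiveurl and archivedate values in a citation like the one above can be derived from each other, since a Wayback URL embeds a 14-digit timestamp followed by the original URL. The sketch below is not how WaybackMedic or IABot do it; `archive_params` is a hypothetical helper, and it only handles the web.archive.org form shown here, not archive.today links.

```python
import re
from datetime import datetime
from typing import Dict, Optional

# Matches https://web.archive.org/web/<YYYYMMDDhhmmss>/<original URL>
WAYBACK_RE = re.compile(r"^https?://web\.archive\.org/web/(\d{14})/(.+)$")


def archive_params(archive_url: str) -> Optional[Dict[str, str]]:
    """Return cite-web style parameters for a Wayback URL, or None if it does not match."""
    match = WAYBACK_RE.match(archive_url)
    if match is None:
        return None
    stamp, original = match.groups()
    archived = datetime.strptime(stamp, "%Y%m%d%H%M%S").date()
    return {
        "url": original,
        "url-status": "dead",
        "archiveurl": archive_url,
        "archivedate": archived.isoformat(),
    }
```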
Dataset | CEDSCI availability | Transformation |
---|---|---|
ACS | 2010 and later summary data tables only | Yes |
AHS | No | — |
ASM | Not yet | — |
BES | No | — |
BP | Pre-2012 County Business Patterns not available | No |
CFS | Not yet | — |
COG | Not yet | — |
DEC | 2010 Congressional District 113, CD 115, and Summary File 1 only | Yes |
ECN | Yes | No |
EEO | No | — |
GEP | No | — |
NES | Pre-2012 Nonemployer data unavailable | Yes |
PEP | Not yet | — |
PP | No | — |
SBO | Only SBO Company Summary tables available | Yes |
SGF | No | — |
SLF | No | — |
SSF | No | — |
STC | No | — |
Endpoint | CEDSCI availability | Transformation |
---|---|---|
factfinder.census.gov/bkmk/table/* | Yes | Partial |
factfinder.census.gov/bkmk/cf/* | Zip codes not supported | Yes |
factfinder.census.gov/bf/* | All links from enwiki dead | N/A |
factfinder.census.gov/faces/nav/jsf/pages/* | URL contains no data | N/A |
factfinder.census.gov/faces/affhelp/jsf/pages/geography.xhtml?* | No single page can replace the "About this geography" view | No |
factfinder.census.gov/faces/tableservices/jsf/pages/productview.xhtml?pid=* | Default geographies (US, all available states, first available) assumed | Yes |
factfinder.census.gov/faces/tableservices/jsf/pages/productview.xhtml?src=* | URL contains no data | N/A |
factfinder.census.gov/help/* | Unknown | ? |
factfinder.census.gov/rest/* | No | No |
factfinder.census.gov/servlet/QTTable?* | Quick Tables no longer available | No |
factfinder.census.gov/servlet/GCTTable?* | Geographic Comparison Tables no longer available | No |
factfinder.census.gov/servlet/DTTable?* | ? Search results | Not yet |
factfinder.census.gov/servlet/MapItDrawServlet?* | Significant sample of links dead | N/A |
factfinder.census.gov/servlet/IPTable?* | All appear to be pre-2010 ACS selected population profiles | No |
factfinder.census.gov/servlet/ADPTable?* | Pre-2010 ACS | No |
factfinder.census.gov/servlet/DTGeoSearchByListServlet?* | URL contains only table name and no default geography | N/A |
factfinder.census.gov/servlet/SAFFFacts?* | Zip codes not supported | Yes |
factfinder.census.gov/servlet/SAFFPopulation?* | Zip codes not supported | Yes |
factfinder.census.gov/servlet/ACCSAFFFacts?* | Zip codes not supported | Yes |
factfinder.census.gov/servlet/ReferenceMapFramesetServlet?* | ? | Not yet |
Marker | CEDSCI availability | Transformation |
---|---|---|
Yes | All corresponding data is available in CEDSCI | All AFF links can be automatically transformed to CEDSCI links |
Partial | Some AFF data is not available in CEDSCI or assumptions must be made | Some AFF links are not currently able to be transformed, but may be later. |
Not yet | The US Census Bureau plans to add this data to CEDSCI, but has not done so | Transformation has not been attempted |
? | This endpoint/feature has not been matched to a CEDSCI feature | — |
— | — | Transformation for this dataset is not possible |
No | Data is not available in CEDSCI at all | Automatic transformation is not possible due to a lack of information or data availability |
N/A | The target dataset can not be derived from the URL | The URL can not be transformed and needs to be replaced based on other citation information |
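As a sketch of how the endpoint table above might be applied to a list of links, each URL can be reduced to host, path and query and compared against the endpoint prefixes. Only a few rows are copied in here, with their Transformation markers taken from the table; `ENDPOINTS` and `transformation_status` are hypothetical names, and the prefix match assumes the query parameter named in the pattern (pid, src) comes first in the URL.

```python
from typing import Optional
from urllib.parse import urlsplit

# (endpoint prefix without the trailing "*", Transformation marker from the table)
ENDPOINTS = [
    ("factfinder.census.gov/bkmk/table/", "Partial"),
    ("factfinder.census.gov/bkmk/cf/", "Yes"),
    ("factfinder.census.gov/faces/tableservices/jsf/pages/productview.xhtml?pid=", "Yes"),
    ("factfinder.census.gov/faces/tableservices/jsf/pages/productview.xhtml?src=", "N/A"),
    ("factfinder.census.gov/servlet/QTTable?", "No"),
    ("factfinder.census.gov/servlet/SAFFFacts?", "Yes"),
]


def transformation_status(url: str) -> Optional[str]:
    """Return the table's Transformation marker for a URL, or None if no row matches."""
    parts = urlsplit(url)
    # Drop the scheme so http and https links compare the same way.
    key = parts.netloc.lower() + parts.path
    if parts.query:
        key += "?" + parts.query
    for prefix, status in ENDPOINTS:
        if key.startswith(prefix):
            return status
    return None
```

Run on the cite web example earlier on this page (the productview.xhtml?src=bkmk link), this returns N/A, matching the "URL contains no data" row.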
I've been in contact with somebody from the Dissemination Outreach Branch, Census Enterprise Dissemination Services and Consumer Innovation (CEDSCI), U.S. Census Bureau. I'm told that starting April 1st, they are going to be working on "deep link redirects from the American Fact Finder to data.census.gov". It's unclear how this impacts our archiving work. Possibly not at all, but I'm trying to obtain more details. -- RoySmith (talk) 20:22, 11 February 2020 (UTC)
The archive.today links are loaded into IABot and it is ready to run across all wiki languages (those that use IABot, about the top 20).
Tomorrow (Saturday) I will set the domain to Blacklisted and begin the IABot queue to process Enwiki, unless there are other thoughts. -- Green C 12:34, 27 March 2020 (UTC)
As expected, issues have arisen:
-- Green C 22:19, 28 March 2020 (UTC)
After 12 days of plugging away I am done with it. This was a huge job with more twists and turns than are worth documenting. Sample diff, repeat 60k+ times with endless variations. -- Green C 00:01, 13 April 2020 (UTC)
This is a bit off subject, but rather important. In the beginning, Wikipedia generated demographic sections for all U.S. census places based upon United States Census 2000 data. This was not repeated for the United States Census 2010. The 2000 demographic sections are now obsolete and need to be removed. Will (and can) new demographic sections be generated for United States Census 2020? Yours aye, Buaidh talk contribs 04:05, 19 December 2020 (UTC)