This is an archive of past discussions. Do not edit the contents of this page. If you wish to start a new discussion or revive an old one, please do so on the
current talk page.
Old Stuff
Hi Andrew, I have been playing with your gene ontology content in the template. Hope you don't mind. I have created a
master list of the gene ontology functions rather than having a separate page for each element. This should make things simpler. I added a hyperlink labeled "link" to each function, but we could probably find something more elegant. Hope you like the changes.
David D.(Talk)18:58, 30 March 2007 (UTC)
Brilliant, thanks David. I noticed you were playing around with the templates, and I've been trying to "follow along" (frequent reloads). I saw a method to your madness (and I knew the way I implemented the GO index was ugly), and agreed, I like your new structure better. Thanks for the contribution! I like the way
ITK (gene) is coming together!
AndrewGNF19:16, 30 March 2007 (UTC)
Need to take a break here. I can't figure out how to stop the link being preceded by a double line break for the gene ontology info. I'll have another crack at it in a while. Glad you like it.
David D.(Talk)19:52, 30 March 2007 (UTC)
Yeah, weird. Nothing obvious to me why those paragraph breaks appear like that, and it looks like from your edit comments you've tried everything that I'd try. Hmm, rather than beat our heads against this too much, what do you think about going back to the form where the function name is hyperlinked (rather than having that plain text and hyperlinking a superscripted "link" tag)? Maybe we can even convert it to a bulleted list... Thoughts?
AndrewGNF20:30, 30 March 2007 (UTC)
Actually, the first thing I tried was to hyperlink the whole name, and that seemed to be incompatible. Instead of showing the GO term as a blue link, it displayed the whole long http address too. I think we need to play around with the protein taxobox template. I'll set up a temporary one in CZ, or here, so we can play around without disrupting the other protein pages. I like the bullet idea too.
David D.(Talk)20:56, 30 March 2007 (UTC)
Got it, just started playing around with it. I must have imagined that it was working at some point. Anyway, I just started playing around with the GNF_GO template. But I'm off to a meeting now for a while, so feel free to revert back to your previous working stage. Otherwise, I'll play more in an hour or so...
AndrewGNF20:59, 30 March 2007 (UTC)
Oh, how do you do temporary templates? It seems like there should be the equivalent of "preview" for templates...
AndrewGNF21:00, 30 March 2007 (UTC)
Finally found the problem. When I annotated the master list, I included two spaces before the <noinclude>, and these were then carried into the taxobox. Once I fixed the master list, everything worked as expected. Good thing I took a break rather than banging my head against the wall. Amazing how taking a step back allows you to see the real problem.
As far as templates are concerned, you can preview them, but that does not always help you know what it will look like on the page itself, especially when you are dealing with nested templates.
David D.(Talk)21:13, 30 March 2007 (UTC)
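For anyone reading this archive later, the whitespace problem described above can be illustrated with a rough sketch. It is not MediaWiki's actual parser, only a simplified model that assumes transclusion keeps everything outside the <noinclude> block verbatim. The function name `transclude` and the sample template text are hypothetical.

```python
import re


def transclude(template_source: str) -> str:
    """Simplified model of transclusion (an assumption, not MediaWiki's parser):
    drop every <noinclude>...</noinclude> block and keep the rest verbatim."""
    return re.sub(r"<noinclude>.*?</noinclude>", "", template_source, flags=re.DOTALL)


# Hypothetical master-list entry with two stray spaces before <noinclude>.
with_spaces = "protein binding  <noinclude>[[Category:GO terms]]</noinclude>"
# The same entry with the stray spaces removed.
without_spaces = "protein binding<noinclude>[[Category:GO terms]]</noinclude>"

# The stray spaces survive into whatever page includes the template.
print(repr(transclude(with_spaces)))
print(repr(transclude(without_spaces)))
```

In this model the text before the <noinclude> tag is part of what gets included, so any whitespace left there travels into the including page, which is consistent with the fix David describes.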
Not sure if Andrew is around but he mentioned on the MCB wikiproject page that he was in the process of doing a ten article test run.
See this post from Andrew, it should serve as an update. While Andrew is doing the grunt work the MCB wikiproject is fully behind this experimental bot, if that helps.
David D.(Talk)03:01, 11 May 2007 (UTC)
Hi
ST47. Thanks for the note, I didn't realize that bot requests expire. (makes sense though...) In retrospect, I proposed the bot a little too far in advance of when we'd actually be creating pages. Anyway, how do I go about delaying that expiration, or should I just repropose it when we're going to start creating pages (within the next couple weeks)? Thanks,
AndrewGNF13:29, 11 May 2007 (UTC)
Hi, I read your note on my talk page and wrote up my concerns
here. I imagine when that's addressed, you'll have your answer to the questions specific to this article. Thanks! --But|seriously|folks 17:04, 13 August 2007 (UTC)
Anything needed?
Hi there. Is there anything you need doing for the bot approval or implementation? I'll be happy to help in any way.
Tim Vickers16:49, 15 August 2007 (UTC)
Tim, you've been a great help already. I think you and others at the BAG have come up with really great enhancements to the hard work that Jon's put in. Me, I'm just a spectator now... ;) Actually, pretty soon we will need the help of PBB "curators" who can help merge enhanced protein boxes with existing articles. (Right now the emphasis has been on new articles.) Check out my edits of
Apolipoprotein_E and
Amyloid_precursor_protein for examples... Cheers,
AndrewGNF17:01, 15 August 2007 (UTC)
Tim, sure, happy to... But do you mean our WP / Gene page project (in which I would link to
User:ProteinBoxBot under daughter projects even though it isn't an official daughter project), or do you mean to
SymAtlas, our free and publicly available gene portal for gene expression data and annotation? The latter is not a wiki, so perhaps that should go under another section for "External resources"? Added a link to OpenWetWare, which is another biological wiki I'm aware of...
AndrewGNF17:28, 16 August 2007 (UTC)
Looks good. Yes, links to ProteinBoxBot and SymAtlas please, with a short explanation of what SymAtlas offers and how it relates to MCB. I'm trying to integrate MCB more with both the wider web organizations and the academic community.
Tim Vickers18:00, 16 August 2007 (UTC)
EMBOencounters
Hi Andrew, just a note to let you know I mentioned this project to EMBO and their magazine EMBOencounters is in touch about writing an article on it.
Tim Vickers17:24, 30 August 2007 (UTC)
Cool, thanks for the heads up... I guess that means we'd better hurry it up. Any idea when it'd come out? Cheers,
AndrewGNF17:53, 30 August 2007 (UTC)
Glad you like it... the PBB (as we affectionately call it) gets its data from our gene portal at
http://symatlas.gnf.org (well, actually PBB talks to SymAtlas' successor). We plan to update the data in our gene portal's database quarterly (from NCBI, Ensembl, PDB, etc.), and after each update we will re-run PBB to refresh the data in the protein boxes.
AndrewGNF22:33, 14 September 2007 (UTC)
Sure thing, we'll get that to the top of the list. Look for it hopefully next week (we're on a bit of a hiatus while we do some code improvements). Also, I created a "requests" section on the
ProteinBoxBot user pages and took the liberty of adding your request there as the first entry...
AndrewGNF17:54, 6 October 2007 (UTC)
Yeah, I suppose I could. Somehow though it feels a little bit slimy to me (though not quite as bad as the folks that start articles on themselves... ;) ) Anyway, I'll put it on my list of things to do...
AndrewGNF00:18, 2 November 2007 (UTC)
You know this subject much better than others. I can take a look then and correct anything if needed. But make sure that the subject does satisfy
WP:Notability - you will need some external publications that refer to SymAtlas.
Biophys04:44, 2 November 2007 (UTC)
It's great to see the
ProteinBoxBot up and running! I had no idea that it was so sophisticated. If we do decide to change the categories on all those enzymes, I may have to ask you about how to write a bot to update them, unless the PBB could do it. :)
I'm writing for another reason. I've recently taken an interest in
Usher syndrome, sparked by an article in Marie Claire but also inspired by some amazing people I've found at YouTube. Anyway, it would be great if the human genes on that page had stubs instead of redlinks; does that fall within the PBB's scope? I'd be really grateful if you could do those; I think there are roughly 12 or so, some of which already have articles. Thanks muchly! :)
Willow11:47, 5 November 2007 (UTC)
Hey, I just read a few messages above that you already have a request list for the PBB. I'll add the Usher genes, so that you don't have to go looking for them. Thanks! :)
Willow11:52, 5 November 2007 (UTC)
Hi
Willow. We're in the process of resolving a couple of technical issues right now, but as soon as we're back up and running, we'll process your genes. Cheers,
AndrewGNF14:29, 5 November 2007 (UTC)
Thanks! Is it possible to share a barnstar? While I get to be the mouthpiece of PBB, our trusted master's student
JonSDSUGrad has done all the heavy lifting in the background. He deserves (at least) half the credit! Cheers,
AndrewGNF17:30, 7 November 2007 (UTC)
Overlinking
Despite the edit summary, I actually kept two links in
Tuberous sclerosis: one in the lead and one in the Genetics section. I've since added one more to the
Timeline of tuberous sclerosis: one in the lead and one where TSC2 is first cloned. The advice in
WP:OVERLINK seems reasonable: one link is usually enough but sometimes a link later may be appropriate. One link per section, on a word like TSC2 that would appear all over the article, is probably too much. Be aware that the MOS is subject to random edits like any other part of WP, and it is a constant battle to keep it self-consistent. There seem to be too many MOS pages with something to say on links.
It is a judgement call. Some people find the multicoloured text a distraction. I suppose, reading through each article, you have to guess how often and where you should remind/inform the reader that there is a linked article. There may be a case for a further link in the History section of
Tuberous sclerosis, what do you think?
Your
TSC2 doesn't mention its protein: tuberin, though it mentions the one for TSC1: hamartin. Do you plan to create a
TSC1? It is currently a redirect to something obscure but I think the gene TSC1 should replace it.
Colin°
Talk23:01, 8 November 2007 (UTC)
I agree, so I added the one more link in the History section (and also wikilinks from the infobox, which we've been doing when a gene family page exists as well as individual gene pages). Yeah, odd that the TSC2 page doesn't mention its more common name (which I now see through the OMIM link). If you haven't seen our
ProteinBoxBot effort, we're synthesizing information from many common sources to populate the infobox and stub text, and our sources aren't always complete. In any case, I've added a brief mention of "Tuberin" to
TSC2 and also created the redirect from
Tuberin. We absolutely plan on creating a page for TSC1 (at which time we'll replace the redirect with a disambig). In fact, it looks like
TSC1 is all ready to go, just waiting for
a volunteer to claim it and put it in its correct home...
AndrewGNF23:32, 8 November 2007 (UTC)
A disambig is only needed if the options are evenly weighted. To be honest, I never understood the relationship between the TSC1 redirect and what it points at; it seems tenuous at best. Perhaps this is an acronym used by a tiny number of people. A Google (Scholar) search should help settle this. There are no links to TSC1 that use the current meaning, so my vote would be to simply eliminate the connection. If you feel that a number of people would search for the other thing with "TSC1", then we can put a dab link at the top of the TSC1 article.
Colin°
Talk07:45, 9 November 2007 (UTC)
Great point. For existing redirects that point to non-biology topics, I've been making disambig pages. But this one points to another gene (for unknown or shaky reasons), and I agree that in this case, the redirect should just be replaced... Thanks...
AndrewGNF16:57, 9 November 2007 (UTC)
Your bot's images
Would it be possible for the bot to upload .png or .svg images? The .jpeg artifacts are pretty ugly. Thanks! Ρх₥α 21:34, 9 November 2007 (UTC)
I assume that you mean the protein structure images, e.g.,
Image:PBB_Protein_MCL1_image.jpg? If so, then sorry, we download those from a public domain source and that's how we get them. As an aside, I'm no image expert but I haven't noticed any particularly distracting jpg artifacts. (The ones I'm familiar with are splotchy patterns in uniform blocks of color.) Care to explain further? Cheers,
AndrewGNF22:00, 9 November 2007 (UTC)
I strongly support your bot. But you should know that if someone wants to challenge your work, he might nominate any bot-created article without an abstract (e.g.
CSNK2A2) for deletion claiming that it "does not belong to WP" (see
WP:Deletion policy) and then refer to this:
[1]. So, the first thing would be to go through all such articles and create short abstracts using UniProt, for example. Then your work will be completely safe. I will try to do some of that, but my time is very limited. Thank you for creating a great bot!
Biophys15:44, 13 November 2007 (UTC)
Thanks for the comments and support
Biophys. Yes, the UniProt summary would definitely be a good thing to have. For the time being though, I think we're going to have to deal with that on a one-off basis. We, too, are very limited on time (in particular our master's student), which is why I'm pushing so hard to get version 1.0 done. Unfortunately, UniProt is not as tightly integrated into our local database as Entrez Gene. It may not have been clear before that PBB gets all of its content from
SymAtlas (well, actually its successor), which involves a huge amount of work resolving all of these database cross-references. Querying SymAtlas allows PBB to retrieve all gene annotation data in a single XML file. The advantage is that PBB just has to go to one source. The disadvantage (evident here) is that adding new annotation data is more work than simply adding a few lines of code to PBB itself. But again, we'll fix that on round 2!
AndrewGNF16:45, 13 November 2007 (UTC)
From the WP perspective, the most important databases are those providing abstracts/annotation for individual proteins, such as Entrez Gene and UniProt. I will comment more in Box Ideas.
Biophys15:32, 14 November 2007 (UTC)
Hello! I've made a stub on
MLL1 and then discovered that it is already here, created by you under the
HRX name. I wonder what the "most official" name for it is and which article should be made a redirect.
CopperKettle09:51, 16 November 2007 (UTC)
ProteinBoxBot reruns
I'll probably start working again on the project this weekend. I noticed you've been doing reruns based on all of the previous discussion. I've already claimed
User:ProteinBoxBot/PBB_Log_Wiki_11-4-2007_B-0 but I haven't started it. I can claim one of the rerun files instead, or go ahead and finish this file. Whatever is easier for you.
Forluvoft (
talk)
23:54, 16 November 2007 (UTC)
If you want to "unclaim" it and move on to one of the newer files, that will probably be more compliant with the recent discussions. We can then ask Jon to re-run that log file so all the updated wikicode is in the log. Thanks again for all your efforts. Your pages look great!
AndrewGNF (
talk)
03:11, 17 November 2007 (UTC)
The protein info box looks great, but maybe you can compress it to the side so that it does not leave the huge blank white space in the middle of the article. I would do it myself, but I am not sure how to play with your table. Thanks!
198.137.30.179 (
talk)
Not sure which page you're referring to. Can you provide a wikilink? Also, the best way to get rid of that whitespace is to add content... ;)
AndrewGNF (
talk)
19:16, 9 December 2007 (UTC)
Speaking of adding content...
What is the point of creating stubs for genes if nothing is said about what they do? For example,
GPR155,
GPR156,
GPR157,
GPR158 all read exactly the same. Aren't you just mirroring some database(s)? Adding 10,000 stubs will increase the size of Wikipedia by 0.47%.
AnteaterZot (
talk)
08:59, 12 December 2007 (UTC)
Those GPCR genes were created by request -- see
User_talk:ProteinBoxBot. Presumably now that those stubs are created, the interested user(s) will add additional useful content. But, I'd argue that the stubs even as they are now are useful and notable (also the consensus of the BAG, MCB, etc.), even if slightly less full than some of the other gene pages.
AndrewGNF (
talk)
17:34, 12 December 2007 (UTC)
Yes, we eventually will work up to ~10k genes, as described on the bot approval page. Not sure what you're getting at here...
AndrewGNF (
talk)
00:03, 13 December 2007 (UTC)
Well, the notability issue has been discussed extensively on the
bot approval page, on the MCB/Proposals page,
PBB talk page, and at the village pump. (Sorry, if you can't find any of those pages, I'm happy to wikilink. Just feeling lazy...) Each time, the consensus of users has been to move ahead. If you still want to raise notability issues (hopefully with arguments that haven't been previously raised), I'd suggest doing it at the bot talk page.
AndrewGNF (
talk)
00:13, 13 December 2007 (UTC)
I've looked it over, and I must commend you on your efforts to digest material from the various databases into a more accessible format than Entrez. But a couple of things still worry me. One is the assertion that a gene is inherently notable; "Notability of the genes themselves, I think, is a given. These are human genes, the stuff of life!" This is simply not true. Most genes, if knocked out, have little or no effect on phenotype. You address this by requiring the gene be mentioned in more than a couple of papers, which is a good start. Two is the heavy reliance on primary sources, and I mean this in the scientific literature sense. Wikipedia requires secondary and/or tertiary sources to establish notability. I take this to mean that a gene should have a couple of mentions in review articles, and/or a mention in the popular press. Take for example,
BRCA1. It has
174 mentions in the New York Times. You might say that example is a bit unfair, so how about
C5a receptor? It appears in the title of a couple of review journals, and
here in a story about a pricy biotech startup. So it might be okay. Now let's take one your bot created,
GPR32. It has
208 unique g-hits, none of which amount to anything. I found only one citation on webofknowledge, the (
Marchese et al. 1998) one, which is a short communication. They don't really seem to know what the gene does. The gene does not appear to have been in the title or abstract of any review articles. Therefore the gene appears to be not notable. Do you disagree?
AnteaterZot (
talk)
10:56, 13 December 2007 (UTC)
Having said that, is there any way you can tune your bot to not create stubs on genes like GPR32 while keeping notable ones? Perhaps it can require the word "review" in two sources?
AnteaterZot (
talk)
10:56, 13 December 2007 (UTC)
I second Andrew's suggestion to move this discussion to the
PBB talk page. Since I was the one that requested these GPR pages, I feel that I have an obligation to respond, but on the PBB talk page, not here. Cheers
Boghog2 (
talk)
17:20, 13 December 2007 (UTC)
Good day. Do you happen to know if there is any existing article on the Insulin gene? I just started a stub on it, but there sure is an enormous amount of information about it out there.
Mikael Häggström (
talk)
15:28, 13 January 2008 (UTC)
Hi Mikael... No, I was not aware of an article on the
Insulin gene which is separate from the main
Insulin article. As you've apparently found, PBB did process that gene and normally I would have tried to integrate it into
Insulin. I didn't because that was one of the few cases where the existing infobox had substantial content that the PBB infobox did not have. So, I left it at
Talk:Insulin#User:ProteinBoxBot_content in case someone wanted to do the legwork to merge and eliminate duplicate content. If you think this is a case where a separate page for the gene and the product is warranted, I'd certainly support you on that... Cheers,
AndrewGNF (
talk)
03:24, 14 January 2008 (UTC)
Oh, and bravo on all your recent hard work on various gene pages. I think you and
User:Boghog2 are the two who perpetually show up on my watchlist. Makes me feel like a slacker sometimes...
AndrewGNF (
talk)
03:32, 14 January 2008 (UTC)