Talk:Bioinformatics

Bioinformatics is a former featured article. Please see the links under Article milestones below for its original nomination page (for older articles, check the nomination archive) and why it was removed.
This article appeared on Wikipedia's Main Page as Today's featured article on March 28, 2004.
Article milestones
Date | Process | Result
February 26, 2004 | Featured article candidate | Promoted
September 21, 2005 | Featured article review | Demoted
December 20, 2005 | Good article nominee | Listed
August 7, 2007 | Good article reassessment | Delisted
Current status: Former featured article

Wiki Education Foundation-supported course assignment

This article is or was the subject of a Wiki Education Foundation-supported course assignment. Further details are available on the course page. Student editor(s): Heatherjanee.

Above undated message substituted from Template:Dashboard.wikiedu.org assignment by PrimeBOT (talk) 15:46, 16 January 2022 (UTC)

Introduction

In the introduction, we read the following:

...the development of new algorithms (mathematical formulas) and statistics...

I am not a computer scientist, but I think an algorithm is not a mathematical formula. Furthermore, I don't think one develops statistics; one applies statistical methods, or a theoretical scientist might develop new statistical theories. -- Preceding unsigned comment added by 88.67.52.3 (talk) 12:48, 23 January 2011 (UTC)

I think you are correct. Though I have seen the assignment operator confused a number of times with the equality symbol or logical syllogism etc. Maybe it should read "... the development of new algorithms which apply mathematical formulas and statistical models to ..."
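
To make the distinction in this thread concrete, here is a minimal Python sketch. It is illustrative only and not taken from the article; the function names `gc_fraction` and `first_orf` and the toy sequences are assumptions made up for this example. The first function evaluates a formula (a single expression), while the second is an algorithm: a stepwise procedure with loops and decisions that happens to be applied to the same kind of data.

```python
from typing import Optional

# Standard stop codons of the standard genetic code.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def gc_fraction(seq: str) -> float:
    """A formula: (G + C) / N, evaluated as one expression."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def first_orf(seq: str) -> Optional[str]:
    """An algorithm: scan for a start codon, then read codons in frame
    until a stop codon is found; try the next start codon otherwise."""
    seq = seq.upper()
    start = seq.find("ATG")
    while start != -1:
        # Walk codon by codon from just after the start codon.
        for i in range(start + 3, len(seq) - 2, 3):
            if seq[i:i + 3] in STOP_CODONS:
                return seq[start:i + 3]
        # No in-frame stop codon: look for a later start codon.
        start = seq.find("ATG", start + 1)
    return None


if __name__ == "__main__":
    print(gc_fraction("ATGC"))
    print(first_orf("CCATGAAATAGGG"))
```

This naive scan only looks at the forward strand and is nowhere near a real gene finder; it is here only to show that an algorithm may use formulas but is not itself one, which is in line with the wording proposed above.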

Comedy?

You spelled Gnome incorrectly -- Preceding unsigned comment added by Henriettaminge (talk • contribs)

  • I don't know if you're trying to be serious, but the word is 'genome'. ju66l3r 06:35, 20 July 2006 (UTC)

GA Re-Review and In-line citations

Members of the Wikipedia:WikiProject Good articles are in the process of doing a re-review of current Good Article listings to ensure compliance with the standards of the Good Article Criteria. (Discussion of the changes and re-review can be found here). A significant change to the GA criteria is the mandatory use of some sort of in-line citation (in accordance with WP:CITE) in order for an article to pass the verification and reference criteria. Currently this article does not include in-line citations. It is recommended that the article's editors take a look at the inclusion of in-line citations as well as how the article stacks up against the rest of the Good Article criteria. GA reviewers will give you at least a week's time from the date of this notice to work on the in-line citations before doing a full re-review and deciding if the article still merits being considered a Good Article or would need to be de-listed. If you have any questions, please don't hesitate to contact us on the Good Article project talk page or you may contact me personally. On behalf of the Good Articles Project, I want to thank you for all the time and effort that you have put into working on this article and improving the overall quality of the Wikipedia project. LuciferMorgan 02:20, 16 December 2006 (UTC)

Whole Genome Shotgun Sequencing

I just realized that Whole Genome Shotgun Sequencing and sequence assembly are really considered an informatics solution to the sequencing problem compared to BAC for large sections of DNA. This might be an important accomplishment of bioinformatics to mention. 128.206.82.56 (talk) 19:46, 23 April 2009 (UTC) done

  • Yes. Whole Genome Shotgun (WGS) sequencing projects constitute the largest part (in terms of bp size) of GenBank. It is very much a bioinformatics approach and, as such, should be mentioned in this article. When I have some more time, I will add some information. --Thorwald (talk) 04:52, 24 April 2009 (UTC)
    • You might consider adding next generation sequencing information. Huge challenge at the moment. 193.190.172.82 (talk) 17:39, 11 May 2009 (UTC)
      • I would prefer it be called "high throughput" rather than "next generation"; "next generation" seems like a dated term. Will there be yet another generation? "High throughput sequencing" seems more meaningful. This term may be outgrown as well, but it may weather better than "next generation". -- Preceding unsigned comment added by 128.206.82.56 (talk) 21:48, 28 July 2009 (UTC)

Insight from Michael Waterman and proposed changes

Much of this discussion and page deals with trying to describe what bioinformatics is by either enumerating its parts, or defining it in contrast to something else, like 'computational biology' (or trying to insist they are the same thing). The remainder of the page tends to be a battle between people adding their favourite legitimate bioinformatics resources, tools, publications, centres, and others trying to trim them down (because of WP:NOT#DIRECTORY and spam).

I don't have an exact quote, or a reference as it was presented during a talk, but this very rough paraphrase might start us thinking about this page in a slightly different way.

Every major advancement in the field of Bioinformatics is a direct result of a new type of data being generated, from a real experiment, which is in a form or volume that we did not yet have the capability of understanding. Then there is a scramble to understand this new data. Once the new tools are available, there is only so far you can go with the existing data. Eventually, there is another new experiment done, and a new type of data emerges. -- MICHAEL WATERMAN, University of Southern California in a lecture at ISMB2006, Fortaleza
I don't know if I necessarily agree. I think the example above is shotgun sequencing, which was primarily a bioinformatic improvement on previous sequencing capabilities. Was it a bioinformatics breakthrough or a laboratory breakthrough? I also doubt it was Michael Waterman who said that, since he was trained as a physicist. 128.206.82.56 (talk) 22:01, 16 July 2009 (UTC) done

-- Jethero 05:46, 23 February 2007 (UTC)

From this, I think we might be able to find a focus or thread through some of the content, and perhaps a way of identifying what have been core 'bioinformatics' breakthroughs, versus what are areas and fields and disciplines that simply make use of computers and computational methods and expertise, or focus on lists of people, books or software that call themselves 'bioinformatics'. (To be clear, I am not saying these things are not bioinformatics, in a broad sense)

To go along with this, I would propose that we:

  • remove all current 'references' that are not used as inline citations (move them to a 'list of books' if we want). It's almost impossible to remain NPOV and exclude some books and papers and not others. -- Jethero 05:46, 23 February 2007 (UTC)
  • remove all external links, move them to the 'list of bioinformatics research groups' or a similar page and have a policy of no external links (unless they are inline citations). They invite spam, and also legitimate additions that we don't have room for but can't eliminate without violating NPOV or upsetting a fan -- Jethero 05:46, 23 February 2007 (UTC)
  • continue to remove all software references (move them to a list of bioinformatics software if we want). same arguments -- Jethero 05:46, 23 February 2007 (UTC)
  • remove all but the slightest trace of the 'computational biology vs. bioinformatics' debate from the top. The intro paragraph is much too long, compared to other 'disciplines of science' pages, and confusing for someone not familiar with either term. -- Jethero 05:46, 23 February 2007 (UTC)
  • set up sections which encourage additions of names and dates in bioinformatics (famous bioinformaticians, watersheds, large projects in the past) -- Jethero 05:46, 23 February 2007 (UTC)
  • the field develops rapidly, but we don't yet have the perspective to neutrally call paper/tool/discovery A more important than B, so we should focus on things that have been established as 'notable' in the past, say 5+ years, and get that right. This is another way to avoid the article reading like it should contain a list of tools and resources that are relevant today. (NCBI's GenBank, established in 19xx, was critical for xxxx rather than just a link) -- Jethero 05:46, 23 February 2007 (UTC)

Take a look at some of the other Natural Sciences pages for inspiration, or vote on the proposals above. -- Jethero 05:46, 23 February 2007 (UTC)

Suggestions

The LEAD is awful. The references are not in WP standard form. There is a huge internal and external link farm. There are too many redlinks. It includes too much unreadable text for the beginning reader. I suggest that an introductory article be made, called Introduction to bioinformatics, as was done at evolution, quantum mechanics, general relativity and other technical science articles. --Filll 13:19, 26 July 2007 (UTC)

Please insert references that support the following claim: "The term bioinformatics was coined by Paulien Hogeweg in 1978 for the study of informatic processes in biotic systems". I was searching for them and I did not find any. -- Preceding unsigned comment added by Daforerog (talk • contribs) 14:28, 17 February 2009 (UTC)

This seems like a very good idea. -- Preceding unsigned comment added by 204.134.43.129 (talk) 22:10, 22 April 2011 (UTC)

This article is now at Good Article Review for possible delisting of its Good Article status. Concerns are listed at the good article review page. Please remember to assume good faith and improve the article to meet the Good Article criteria. -Malkinann 10:07, 26 July 2007 (UTC)

Editors can go to Wikipedia:Good article review#Bioinformatics to see what others have written, and to add their own comments. In that review, someone has already suggested making an introductory article. I can see us needing Introduction to general relativity because of the profundity of that topic, but Introduction to bioinformatics seems like overkill. Making the article better would eliminate the need for such an introduction. EdJohnston 15:58, 26 July 2007 (UTC)
If there were a firm commitment to augment this article with much more explanatory material, this would obviate the need for an introductory article. This article is currently short enough that probably both advanced and introductory material could be accommodated in the same article. --Filll 16:00, 26 July 2007 (UTC)

This article has been delisted per consensus at WP:GA/R. The discussion, now in archive, can be seen here. Once the article is brought up to standards, it may be renominated at WP:GAC. Regards, Lara♥Love 15:19, 7 August 2007 (UTC)

Does the External Links section need cleanup?

Please give your opinion on the {{External links}} cleanup tag that was just added. From my limited personal knowledge, the items that are now listed under 'major organizations' and 'major journals' do in fact appear to be major ones for this field. At the recent GA review, no-one complained that there were too many external links in the article. Comments? EdJohnston 21:24, 21 August 2007 (UTC)

I recently submitted Bio-IT World for inclusion as a major journal in the field of bioinformatics, and hope that the editors of this post consider its value as an industry resource. Bio-ITWorld (talk) 17:56, 30 October 2008 (UTC)

No. First and foremost, this is self-promotion and is therefore ethically wrong. Second, I have been working in bioinformatics for ten years now and have never heard of your journal, so I doubt it is a major journal in the field. Looking at the website, it doesn't look like a peer-reviewed scientific journal, but more like a sort of business advertising thingy irrelevant to the Bioinformatics article. Blastwizard (talk) 09:58, 31 October 2008 (UTC)

Modeling

I think "computational docking" and "protein structure prediction" belong to "Modeling of biological systems". Should they be placed there? Biophys (talk) 01:55, 3 July 2008 (UTC)

I would agree, since people in those fields, in consuming bioinformatics, generally make no pretense of contributing to algorithmic understanding of biological data. Specifically, protein structure predictions are usually concerned only with a single molecule of significance, and likewise with protein docking interfaces; the algorithms for these tasks seldom scale to large databases. 69.29.27.17 (talk) 02:09, 5 March 2009 (UTC) OK

Metabolic Pathways

Bioinformatics is also here to shed some light on catabolic and anabolic pathways, which are usually coded in the genes. How about documenting it explicitly, folks? --84.157.227.183 (talk) 09:35, 25 July 2008 (UTC)

Recently, an editor added a link here to a List of bioinformatics companies. It seems perfectly reasonable to maintain such a list, but my fear is that it may accumulate spam, or to speak more delicately, it may attract 'less notable entries, added by company representatives.' Part of the problem is this preamble to the list:

The primary purpose of this list is to serve as a holding place for the identities of Bioinformatics companies, particularly those for which articles have not yet been created.

I have suggested over at Talk:List of bioinformatics companies that the list should *exclude* the companies that do not have articles. I'd welcome any comments on that article's talk page, either pro or con. EdJohnston (talk) 15:35, 14 August 2008 (UTC)

Out of place material

I have removed the following:

Gene finding typically refers to the area of computational biology that is concerned with algorithmically identifying stretches of sequence, usually genomic DNA, that are biologically functional. This especially includes protein-coding genes, but may also include other functional elements such as RNA genes and regulatory regions. Gene finding is one of the first and most important steps in understanding the genome of a species once it has been sequenced.

This was placed under "Analysis of gene expression" and does not match that topic. Please see the section on "Genome annotation" and improve that if you think it needs additional information.

Also:

Bioinformatics is the rapidly growing and developing field in computational science era. The major databases which are useful for life science research are NCBI, DDBJ, EMBL, TIGR, PDB, SWISS-PROT and TrEMBL. These databases are public databases, conduct research in computational biology, and develop software tools for analyzing genome and rpoteome data. With the rapidly emergence and vast development of this field, it has the bright perspectives in upcoming decades.[11]

This appears to be an "introduction" - it reads like the introduction to a student essay. The article already has a lead, you might consider making an addition there but I don't see what this contributes. I don't think adding a list of databases is appropriate without a relevant reference since it is biased - it's just what one writer thinks is useful for his own research.

-- Preceding unsigned comment added by Madeleine Price Ball (talk • contribs) 06:11, 23 September 2009

I'm not sure, but does the description of Genome annotation appear once at the end of the sequence analysis section, and then in its own section? Seems like a repeat. --92.2.233.95 (talk) 17:10, 4 February 2012 (UTC)

Definition

"Bioinformatics is the application of information technology to the field of molecular biology."

We currently are applying information technology to the field of (biological) taxonomy, and will be including profile data. Perhaps this definition might be made broader?

I am not so sure.
What kind of profile data do you have? Bobthefish2 (talk) 05:25, 20 February 2011 (UTC)
Oh my bad.. it seems this post was made ages ago. Bobthefish2 (talk) 05:27, 20 February 2011 (UTC)

Vague reference

Reference [1] "Bioinformatics Journal" seems a bit vague, as the page it directs to has none of the information cited in the Introduction. Anybody have any thoughts on this? gzur (talk) 13:28, 22 September 2011 (UTC)

Deleted it and the sentence it tagged, which was so broad as to be meaningless. I guess this talk page section serves as a record that Bioinformatics Journal might be a resource for expanding the article. --Danger (talk) 13:53, 22 September 2011 (UTC)

Start of Article

The start of the article has THREE lines of subject areas! I don't think this is necessary, and it can be (and is) summed up before it with "computer science." -- Preceding unsigned comment added by 68.9.159.72 (talk) 05:30, 29 October 2011 (UTC)

Propose to add ISCB to external links

I propose to add the International Society for Computational Biology as an external link (http://www.iscb.org/). I think that this is the major society for the field that organises one of the largest conferences. Please let me know if you think this is inappropriate. Alexbateman (talk) 16:58, 29 February 2012 (UTC)

I consider the ISCB appropriate, but it would be better to add it in the See Also section, as there is an article about it at International Society for Computational Biology. I have just added it, let's see what others think. --Mark viking (talk) 06:03, 3 February 2013 (UTC)

Disorganized Introduction section

The Introduction section seems long, rambling and somewhat repetitive. It could probably be improved by splitting into subsections, such as "History", "Approaches to modeling", etc. --Mark viking (talk) 06:05, 3 February 2013 (UTC)

Re-organization of this article

I suggest reorganizing this article by replacing the Major Research areas heading. Instead, I would group the existing sections into the following groups: (1) sequence analysis, (2) Gene and protein expression, (3) Regulation and networks, (4) Structural bioinformatics, i.e. using molecular structures, docking etc., (5) Text mining, (6) Image analysis, (7) Others (not sure if this is needed at this point but it may be later). All existing sections can be sorted into one of these groups. This will make navigation easier. If no one objects I will go ahead and reorganize things by mid February or so. Peteruetz (talk) 20:09, 30 January 2014 (UTC)

Yeah I think it'd be great to improve this article by cleaning it up a bit. Regarding your suggested headings, do you think text mining needs its own section? The current section doesn't really make a case for its specific significance in the field. Also, headings like "sequence analysis" could potentially be too broad, as it could cover things like genome annotation, alignment, assembly, variant calling, phylogenies… Maybe subheadings will be required. Anyway, yes I think a simplified structure such as the one you suggest would significantly improve the article. benmoore 21:12, 30 January 2014 (UTC)
Yes, "sequence analysis" is certainly very broad and would need a bunch of subheadings. In fact, some of the sections currently outside of "sequence analysis" should be moved to this subheading, e.g. Genome annotation or Comparative genomics (although one could argue that the latter also may cover structural bioinformatics, gene expression etc.). I will try to make those changes judiciously ;-) Peteruetz (talk) 19:42, 13 February 2014 (UTC)
Text mining: I would argue that a separate section makes sense. Text mining is a bit independent of other areas while it can be applied to pretty much all of these areas. For instance, you could mine texts for protein-protein interactions, for regulatory interactions, for phenotypes, disease associations, functions, expression patterns, etc. So, it's more like a general methodology rather than a subject-specific area. Peteruetz (talk) 19:48, 13 February 2014 (UTC)
Text mining is an important part of bioinformatics in some areas, such as determining the significance of a gene found in a genetic association study. Gene ontologies and pathway analysis can be based in part on text mining. Some secondary sources for text mining in bioinformatics are [1] and [2]. We also have an article Biomedical text mining which has some overlap with this subfield. --Mark viking (talk) 20:07, 13 February 2014 (UTC)
Ok great, I can see now why text mining is deserving of its own section in this article. benmoore 17:15, 20 February 2014 (UTC)

A cursory assessment

A few things I noticed (this may turn into a longer list):

  1. The annotation section is confused about whether it wants to link to Gene prediction or Genome project#Genome annotation. The latter refers to "attaching biological information" and BLAST, which implies that assigning putative function to identified fragments is part of annotation; however, the way the section in this article currently reads, you could come away thinking that annotation is only identifying where genes are, not what they might do (even in a heuristic way).
  2. There is a section "genetics of disease", which links to Genome-wide association studies. This is a myopic perspective - GWAS are also used for non-disease traits, most importantly in livestock breeding but also similarly in plants.
  3. Then we suddenly dive into oncogenomics, which is really a sub-topic of SNP identification. However, SNP identification was not mentioned in the article until I introduced it in the lede.
  4. Overall, there is a pattern of complete lack of structure - for instance, there is a section entitled "Others". Oh deary me?

I think one thing that is badly needed is determining, obviously based on WP:RS, a clear and restrictive definition of what is and isn't part of bioinformatics, otherwise this article will continue to devolve into a series of "me too" sections. Samsara (FA  FP) 09:31, 27 August 2014 (UTC)

I don't think such a bright line exists in terms of a definition of bioinformatics (though it would be great if it did). I think the closest thing the field had to a core text was Mount's Bioinformatics, but looking at it now it's very dated (2nd ed. 2004) and focuses almost exclusively on what we might now call "classical bioinformatics". The scope of the textbook might be a good starting point though, unless anyone knows of a more modern replacement (?). (As an aside, I have the first edition at home which might be a great source for expanding the history section, potentially to the point of spinning off an article.) benmoore 13:53, 27 August 2014 (UTC)
The subtitle of that work, "sequence and genome analysis", could be used as part of the evidence for saying that bioinformatics now is a narrower term than it was originally intended to be. In Hogeweg's recent paper, there is a section that acknowledges exactly that:
But was our definition of bioinformatics as the study of informatic processes in biotic systems at multiple levels just an historical quirk, to be superseded by the common meaning of the term as denoting the development and use of computational methods for comparative analysis of genome data?
I think this is an important distinction to present in the article. Thoughts? Samsara (FA  FP) 11:10, 29 August 2014 (UTC)

No mention of metagenomics and analysis of genome structure data like Hi-C

I think these deserve to be mentioned somewhere. They were featured prominently at the last RECOMB conference. -- Preceding unsigned comment added by Djh901 (talk • contribs) 20:07, 24 July 2016 (UTC)

Agree. I think Hi-C could come under "analysis of regulation" - this particular part of the article could do with being made more specific to differentiate it from gene expression, and I've made a start on that. Hi-C could also go in with "Structural bioinformatics" (which mostly contains protein folding currently), though it would need to be clear that Hi-C is a lot more macroscopic. Jmc200 (talk) 11:17, 8 August 2016 (UTC)
I started a small paragraph about Hi-C - I'm not sure where it should go, so it's in its own section for the time being. Jmc200 (talk) 14:06, 8 August 2016 (UTC)

External links modified

Hello fellow Wikipedians,

I have just modified 2 external links on Bioinformatics. Please take a moment to review my edit. If you have any questions, or need the bot to ignore the links, or the page altogether, please visit this simple FaQ for additional information. I made the following changes:

When you have finished reviewing my changes, please set the checked parameter below to true or failed to let others know (documentation at {{Sourcecheck}}).

This message was posted before February 2018. After February 2018, "External links modified" talk page sections are no longer generated or monitored by InternetArchiveBot. No special action is required regarding these talk page notices, other than regular verification using the archive tool instructions below. Editors have permission to delete these "External links modified" talk page sections if they want to de-clutter talk pages, but see the RfC before doing mass systematic removals. This message is updated dynamically through the template {{source check}} (last update: 18 January 2022).

  • If you have discovered URLs which were erroneously considered dead by the bot, you can report them with this tool.
  • If you found an error with any archives or the URLs themselves, you can fix them with this tool.

Cheers. -- InternetArchiveBot (Report bug) 22:54, 2 November 2016 (UTC)

Requested edit

Here I am writing to propose a change to this page that I believe to be appropriate and factual, but I am disclosing my COI, namely that I lead the development of BioCyc, and am paid to work on BioCyc by the institution that employs me (they have not directed me specifically to modify this page). Many peer-reviewed articles have been published about BioCyc, so I have no problem with peer-review of my Wikipedia contribution. The most recent article is: https://www.ncbi.nlm.nih.gov/pubmed/26527732. I also note that a Google search of "Metabolic Pathways" includes both KEGG and MetaCyc (one of the BioCyc databases) right after each other on the first page. You will note I have also made some other changes to the page to suggest additional databases that are among the most highly used databases in the bioinformatics field and not listing them is a great omission.

I'm proposing to add BioCyc as well as KEGG in the listing of metabolic pathway databases, as so. BioCyc and KEGG are both highly used databases in the field, as anyone in the field knows.

Used in Network Analysis: Metabolic Pathway Databases (KEGG, BioCyc), Interaction Analysis Databases, Functional Networks

Pkarp11 (talk) 20:44, 17 April 2017 (UTC)

Done, since BioCyc has its own Wikipedia article. Altamel (talk) 06:55, 20 May 2017 (UTC)