This article is rated Start-class on Wikipedia's content assessment scale.
This article was the subject of a Wiki Education Foundation-supported course assignment, between 26 August 2019 and 11 December 2019. Further details are available on the course page. Student editor(s): Porterwi.
Above undated message substituted from Template:Dashboard.wikiedu.org assignment by PrimeBOT ( talk) 00:31, 17 January 2022 (UTC)
Data extraction is the proper industry term, as it goes well with data mining, data farms, etc. Information Extraction is a bit vague. Removing the merge suggestion. —Preceding unsigned comment added by Khazakistyle ( talk • contribs) 18:12, 12 August 2009 (UTC)
Should the link really be there? 66.93.3.210 05:41, 11 May 2007 (UTC)
I don't think that information extraction is a type of information retrieval, is it? Aren't they two separate concepts? In text mining, one would retrieve unstructured text using search filters or querying databases (this is IR), and then the information would be structured (this bit being the IE)... 86.146.112.199 10:53, 4 January 2007 (UTC)
For instance, the web is semistructured. The web can be a source for IE, can't it? [2] [3] Vsatayamas 09:07, 7 January 2007 (UTC)
As an answer to the previous question, semistructured text can indeed be the source. IE should be connected to "Web", "wrappers" and eventually to "machine learning" and "wrapper induction" concepts. I made an initial contribution. Please discuss/contribute. —Preceding unsigned comment added by George1975 ( talk • contribs) 14:20, 12 February 2009 (UTC)
How are these different? Should these topics be merged? ---- CharlesGillingham ( talk) 22:40, 10 December 2007 (UTC)
I recently tried to clean this page up a bit as part of a broader effort to improve the quality of Wikipedia entries related to information retrieval (see my contributions for more examples of my individual efforts towards this end). Specifically, I hope we can collectively make an effort to keep this accurate and spam-free. Please feel free to contact me directly at dtunkelang at gmail dot com if you'd like to be part of this effort, which I've been rallying via my blog, The Noisy Channel.
Dtunkelang ( talk) 17:05, 26 October 2008 (UTC)
Can whoever keeps linking to ECHELON here stop? This is an entry about information extraction. While ECHELON may be applying information extraction, so are thousands of other projects. This is off topic, and seems motivated by activism, however well intended. Dtunkelang ( talk) 02:54, 13 January 2009 (UTC)
I believe this section should remain in the article and not be removed. There are free tools or services for IE besides GATE, such as Mallet, OpenCalais, or CRF++, that should be mentioned here. This is not far off topic. Of course, commercial tools or services should be removed. —Preceding unsigned comment added by George1975 ( talk • contribs) 14:59, 4 October 2009 (UTC)
DBpedia is a good example. I will not add it myself because of WP:COI. SebastianHellmann ( talk) 13:19, 10 January 2010 (UTC)
A section regarding the use of machine learning for information extraction should be added soon. I'll try to make some contributions. Could also be a separate Wiki article. George1975 ( talk) 10:29, 2 December 2009 (UTC)
This is really disappointing. I have been trying to contribute to this article for a long time now and I believe I am an expert in this field, but most of my contributions have been discarded by a single editor. I do not know this editor's expertise, but his arguments (off-topic, promotional, spam, etc.) are far from real. The PASCAL challenge is an important point of reference in recent publications for information extraction. It is neither off-topic nor promotional, and its source is reliable. CRF++ is a well-known tool (not yet "notable", that's why I tried to add it as an external link) used in many information extraction projects. Where did you see the off-topic, promotional, spam, etc.? I am not sure whether it is in the best interest of Wikipedia to discourage authors from contributing. The current article for IE is really not well written and needs significant improvements. George1975 ( talk) 17:35, 2 December 2009 (UTC)
Whether a software tool, project or whatever else, is important for a topic, is not only a matter of whether its link meets some subjective requirements. I have been trying -as an "insider" to the topic- to update the article with information that I really find useful, without intending to promote anybody or to attract spam. I accepted your argument about notability. However, your arguments about off-topic, spam, etc. in the external links I added, can be considered as subjective. I am not willing to fight a lot more on this. If Wikipedia does not need my contribution, then I probably won't come back. George1975 ( talk) 18:06, 2 December 2009 (UTC)
The wiki article about Conditional Random Fields contains a list of external links, including CRF++ and Mallet (before the latter becomes a wiki entry). So it seems that the same external links are appropriate for one article (conditional random fields), while inappropriate for another (information extraction). This definitely creates an inconsistency. George1975 ( talk) 13:22, 20 January 2010 (UTC)
I took the liberty of making a fairly aggressive clean-up. Boggles my mind that this has been proposed for merging with data extraction, so I removed that. I doubt the article on fractionating petroleum includes discarding the vast quantity of water and sand pumped up with the crude oil in most oil fields. Data extraction is about discarding chaff. IE is about cracking for value.
I was specifically trying to make the lead more accessible to a non-specialist. I have a fair amount of background with NLP, almost none with IE, so I had to content myself with gluing existing material together in a different order.
Since I don't know the IE literature as such, I wasn't able to supply the references which this article badly needs, other than where I drew attention to IE as somewhat of a stop-gap measure in the era of the distinctly non-semantic web.
There's much left to be done. — MaxEnt 09:49, 27 March 2010 (UTC)
The document Peggy M. Andersen et al. "Automatic Extraction of Facts from Press Releases to Generate News Stories" is not found at http://acl.ldc.upenn.edu/A/A92/A92-1024.pdf. It may be found at http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.14.7943&rep=rep1&type=pdf or http://citeseer.ist.psu.edu/viewdoc/summary?doi=10.1.1.14.7943 Ronbarak ( talk) 08:40, 25 October 2011 (UTC)
A word is missing in the following sentence:
Ronbarak ( talk) 12:24, 23 January 2012 (UTC)
Hello fellow Wikipedians,
I have just modified one external link on Information extraction. Please take a moment to review my edit. If you have any questions, or need the bot to ignore the links, or the page altogether, please visit this simple FaQ for additional information. I made the following changes:
When you have finished reviewing my changes, you may follow the instructions on the template below to fix any issues with the URLs.
This message was posted before February 2018. After February 2018, "External links modified" talk page sections are no longer generated or monitored by InternetArchiveBot. No special action is required regarding these talk page notices, other than regular verification using the archive tool instructions below. Editors have permission to delete these "External links modified" talk page sections if they want to de-clutter talk pages, but see the RfC before doing mass systematic removals. This message is updated dynamically through the template {{source check}} (last update: 5 June 2024).
Cheers.— InternetArchiveBot ( Report bug) 21:40, 13 November 2017 (UTC)
What is the relationship between Information Extraction and Document Layout Analysis? -- MartinThoma ( talk) 13:00, 27 August 2020 (UTC)