<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss' xmlns:gd='http://schemas.google.com/g/2005' xmlns:thr='http://purl.org/syndication/thread/1.0'><id>tag:blogger.com,1999:blog-7618678606233996595</id><updated>2011-09-12T12:02:58.097-07:00</updated><category term='guidelines'/><category term='I-Know'/><category term='Recurrent Networks'/><category term='sequence labeling'/><category term='Relational Learning'/><category term='sentiment detection'/><category term='text mining'/><category term='Graz'/><category term='must-read'/><category term='research paper'/><category term='Near duplicate detection'/><category term='information retrieval'/><category term='kdd'/><category term='Artifical Neuronal Network'/><category term='Presentation'/><category term='data set'/><category term='TEAM_IAPP_mgrani'/><category term='Author'/><category term='Members'/><category term='machine learning'/><category term='statistical significance testing'/><category term='linguistic'/><category term='Disambiguation'/><category term='Event'/><category term='sentence similarity'/><category term='metadata extraction'/><category term='Conditional Random Fields'/><category term='Recruitment'/><category term='Secondment'/><title type='text'>Readings on Knowledge Relationship Discovery</title><subtitle type='html'></subtitle><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://readingsonkrd.blogspot.com/feeds/posts/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7618678606233996595/posts/default?max-results=100'/><link rel='alternate' type='text/html' href='http://readingsonkrd.blogspot.com/'/><link rel='hub' 
href='http://pubsubhubbub.appspot.com/'/><author><name>grani</name><uri>http://www.blogger.com/profile/01859478032669302450</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><generator version='7.00' uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>19</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>100</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-7618678606233996595.post-6388371116177978509</id><published>2011-09-12T11:58:00.000-07:00</published><updated>2011-09-12T12:02:58.104-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Members'/><category scheme='http://www.blogger.com/atom/ns#' term='TEAM_IAPP_mgrani'/><category scheme='http://www.blogger.com/atom/ns#' term='Recruitment'/><title type='text'>Two new TEAM members</title><content type='html'>Mendeley and ELIKO have recruited two new members for the TEAM project. Christian Prokopp starts at Mendeley, working on topic models and their impact. Honghan Wu has started at ELIKO and will focus on Linked Open Data and its use in recommender systems. 
Their details can be found on the &lt;a href="http://team-project.tugraz.at/the-project/team/"&gt;TEAM Research Page.&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;We are happy to have both on board for bringing our research project forward!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7618678606233996595-6388371116177978509?l=readingsonkrd.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://readingsonkrd.blogspot.com/feeds/6388371116177978509/comments/default' title='Kommentare zum Post'/><link rel='replies' type='text/html' href='http://readingsonkrd.blogspot.com/2011/09/two-new-team-members.html#comment-form' title='0 Kommentare'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7618678606233996595/posts/default/6388371116177978509'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7618678606233996595/posts/default/6388371116177978509'/><link rel='alternate' type='text/html' href='http://readingsonkrd.blogspot.com/2011/09/two-new-team-members.html' title='Two new TEAM members'/><author><name>grani</name><uri>http://www.blogger.com/profile/01859478032669302450</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7618678606233996595.post-6935301463858442865</id><published>2011-09-03T08:25:00.000-07:00</published><updated>2011-09-03T08:31:32.440-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='I-Know'/><category scheme='http://www.blogger.com/atom/ns#' term='Event'/><category scheme='http://www.blogger.com/atom/ns#' term='TEAM_IAPP_mgrani'/><category scheme='http://www.blogger.com/atom/ns#' term='Graz'/><title type='text'>Kris Jack gives opening Keynote at the I-Know'11 
Special Track on Recommendation, Data Sharing, and Research Practices in Science 2.0</title><content type='html'>Next week the I-Know'11 conference starts, where the TEAM team is organising the special track on &lt;a href="http://i-know.tugraz.at/i-science/rdsrp"&gt;Recommendation, Data Sharing, and Research Practices in Science 2.0&lt;/a&gt;. Kris Jack from Mendeley will give the opening keynote of the special track on &lt;a href="http://i-know.tugraz.at/wp-content/uploads/2011/08/program_i-know_i-semantics_praxisforum_2011_final.pdf"&gt;"Mendeley: Crowd-Sourcing and Recommending Research on the Large Scale"&lt;/a&gt;. &lt;br /&gt;&lt;br /&gt;I am looking forward to hearing his talk. &lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7618678606233996595-6935301463858442865?l=readingsonkrd.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://readingsonkrd.blogspot.com/feeds/6935301463858442865/comments/default' title='Kommentare zum Post'/><link rel='replies' type='text/html' href='http://readingsonkrd.blogspot.com/2011/09/kris-jack-gives-opening-keynote-at-i.html#comment-form' title='0 Kommentare'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7618678606233996595/posts/default/6935301463858442865'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7618678606233996595/posts/default/6935301463858442865'/><link rel='alternate' type='text/html' href='http://readingsonkrd.blogspot.com/2011/09/kris-jack-gives-opening-keynote-at-i.html' title='Kris Jack gives opening Keynote at the I-Know&apos;11 Special Track on Recommendation, Data Sharing, and Research Practices in Science 2.0'/><author><name>grani</name><uri>http://www.blogger.com/profile/01859478032669302450</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7618678606233996595.post-6025608901519232446</id><published>2011-09-03T08:09:00.000-07:00</published><updated>2011-09-03T08:15:25.099-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='TEAM_IAPP_mgrani'/><category scheme='http://www.blogger.com/atom/ns#' term='Disambiguation'/><category scheme='http://www.blogger.com/atom/ns#' term='Author'/><category scheme='http://www.blogger.com/atom/ns#' term='Presentation'/><title type='text'>Author Disambiguation Presentation @ TIR Workshop</title><content type='html'>This week I presented our work on author disambiguation at the &lt;a href="http://www.uni-weimar.de/medien/webis/research/events/tir-11/program.htm"&gt;TIR'11 workshop&lt;/a&gt; in Toulouse. The &lt;a href="http://www.uni-weimar.de/medien/webis/research/events/tir-11/talks/kern11-talk-model-selection-strategies-for-author-disambiguation.pdf"&gt;presentation&lt;/a&gt; outlined our findings on the need for very clean features and better model selection methods for disambiguating author names in the wild. The discussion raised the idea of using overlapping blocking methods and applying outlier detection afterwards. That may be a nicer approach to solving the model selection problem.
&lt;br /&gt;&lt;br /&gt;Of course the presentation is also uploaded to the Mendeley based TEAM folder for sharing our documents in the project&lt;br /&gt;&lt;br /&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7618678606233996595-6025608901519232446?l=readingsonkrd.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://readingsonkrd.blogspot.com/feeds/6025608901519232446/comments/default' title='Kommentare zum Post'/><link rel='replies' type='text/html' href='http://readingsonkrd.blogspot.com/2011/09/author-disambiguation-presentation-tir.html#comment-form' title='0 Kommentare'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7618678606233996595/posts/default/6025608901519232446'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7618678606233996595/posts/default/6025608901519232446'/><link rel='alternate' type='text/html' href='http://readingsonkrd.blogspot.com/2011/09/author-disambiguation-presentation-tir.html' title='Author Disambiguation Presentation @ TIR Workshop'/><author><name>grani</name><uri>http://www.blogger.com/profile/01859478032669302450</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7618678606233996595.post-7538615708048810439</id><published>2011-08-30T09:38:00.000-07:00</published><updated>2011-08-30T09:40:27.623-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='TEAM_IAPP_mgrani'/><category scheme='http://www.blogger.com/atom/ns#' term='metadata extraction'/><category scheme='http://www.blogger.com/atom/ns#' term='Secondment'/><category scheme='http://www.blogger.com/atom/ns#' term='Near duplicate detection'/><title type='text'>All 
good things come to an end</title><content type='html'>My secondment has finally come to an end. Overall it was a great experience. From my point of view, knowledge transfer worked perfectly in both directions. The results are quite nice: we could show that title extraction with support vector machines works reliably and can be used in subsequent de-duplication methods. Detailed results will follow soon in a publication and a deliverable. For de-duplication, a comparison between fingerprinting and inverted-index-based methods has been started and will now be continued by James Hammerton in Graz. Ago Luberg from Eliko also addresses similar topics, but in the different scenario of knowledge acquisition from the web. So my work on these topics will fortunately continue.&lt;br /&gt;&lt;br /&gt;Overall I see the TEAM project becoming more and more successful in establishing knowledge transfer among all participants.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7618678606233996595-7538615708048810439?l=readingsonkrd.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://readingsonkrd.blogspot.com/feeds/7538615708048810439/comments/default' title='Kommentare zum Post'/><link rel='replies' type='text/html' href='http://readingsonkrd.blogspot.com/2011/08/all-good-things-come-to-end.html#comment-form' title='0 Kommentare'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7618678606233996595/posts/default/7538615708048810439'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7618678606233996595/posts/default/7538615708048810439'/><link rel='alternate' type='text/html' href='http://readingsonkrd.blogspot.com/2011/08/all-good-things-come-to-end.html' title='All good things come to an 
end'/><author><name>grani</name><uri>http://www.blogger.com/profile/01859478032669302450</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7618678606233996595.post-1016638280198979513</id><published>2011-06-23T08:36:00.000-07:00</published><updated>2011-06-23T08:53:13.475-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='TEAM_IAPP_mgrani'/><category scheme='http://www.blogger.com/atom/ns#' term='metadata extraction'/><category scheme='http://www.blogger.com/atom/ns#' term='Near duplicate detection'/><title type='text'>Near-Duplicate-Detection, Metadata Extraction, Visualisation and other things</title><content type='html'>Since there is so much interesting work to do at the secondment, I tend to forget to write blog posts about what I did. So this one sums up the last 1.5 months, in which I investigated topics like near duplicate detection and metadata extraction and gave talks on the visualisation of bibliographic data and on metadata extraction.&lt;br /&gt;&lt;br /&gt;Near duplicate detection is essential for the quality of a data set with many committers and sources, as is the case in our use case. Fingerprinting is the traditional approach to near-duplicate detection, but it does not have the flexibility of an inverted index based approach. So I tried NDD using inverted indices, in particular Lucene. Lucene is really lightning fast: it achieves title lookups on a set of 40 million metadata entries in less than 200ms while still being able to add new titles in real time. Results in terms of accuracy are also promising, although a comparison with fingerprinting is still open. One particular question in the context of research paper management is how to recover from metadata extraction errors. 
Given a PDF and automatically extracted titles and authors, how well can we recover from errors? What accuracy can be achieved by metadata extraction using state-of-the-art methods like Conditional Random Fields?&lt;br /&gt;&lt;br /&gt;Using Conditional Random Fields, it was possible to rely only on layout features like font size, position etc. to extract titles with a recall of approx. 0.8 and a precision of 0.7. Compared to state-of-the-art tools like ParsCit, which achieve similar accuracy but use more domain knowledge, that is quite good. Experiments also showed that metadata extraction, and hence near duplicate search, depends on the domain and the type of journal. Without re-training, ParsCit achieves great performance on IEEE Computer Science papers, but fails on medical papers from the BMJ. With the approach developed using layout information, we can automatically adapt to different journal types and fields, which allows us to improve accuracy and recall. I guess that is worth a publication. 
Working that out will take the rest of my secondment.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7618678606233996595-1016638280198979513?l=readingsonkrd.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://readingsonkrd.blogspot.com/feeds/1016638280198979513/comments/default' title='Kommentare zum Post'/><link rel='replies' type='text/html' href='http://readingsonkrd.blogspot.com/2011/06/near-duplicate-detection-metadata.html#comment-form' title='0 Kommentare'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7618678606233996595/posts/default/1016638280198979513'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7618678606233996595/posts/default/1016638280198979513'/><link rel='alternate' type='text/html' href='http://readingsonkrd.blogspot.com/2011/06/near-duplicate-detection-metadata.html' title='Near-Duplicate-Detection, Metadata Extraction, Visualisation and other things'/><author><name>grani</name><uri>http://www.blogger.com/profile/01859478032669302450</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7618678606233996595.post-7369553974328269858</id><published>2011-05-05T04:08:00.000-07:00</published><updated>2011-05-05T04:12:35.376-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='TEAM_IAPP_mgrani'/><title type='text'>Near Duplicate Search on Sparse Metadata &amp; Learn2Rank</title><content type='html'>In the first month we focused on applying learning to rank to sparse metadata and to investigate the applicability of inverted index based metrics and lookups for near duplicate search. Results in both directions seem to be promising. 
Above all, inverted indices using simple word grams or bi-word grams provide efficient near duplicate search facilities.  &lt;br /&gt;&lt;br /&gt;Next, learning to rank will find its way into tag recommendation in particular and recommendation in general. Basically, will learning to rank outperform traditional recommendation approaches? Let's see.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7618678606233996595-7369553974328269858?l=readingsonkrd.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://readingsonkrd.blogspot.com/feeds/7369553974328269858/comments/default' title='Kommentare zum Post'/><link rel='replies' type='text/html' href='http://readingsonkrd.blogspot.com/2011/05/near-duplicate-search-on-sparse.html#comment-form' title='0 Kommentare'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7618678606233996595/posts/default/7369553974328269858'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7618678606233996595/posts/default/7369553974328269858'/><link rel='alternate' type='text/html' href='http://readingsonkrd.blogspot.com/2011/05/near-duplicate-search-on-sparse.html' title='Near Duplicate Search on Sparse Metadata &amp; Learn2Rank'/><author><name>grani</name><uri>http://www.blogger.com/profile/01859478032669302450</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7618678606233996595.post-6490113335685575709</id><published>2011-04-07T11:33:00.000-07:00</published><updated>2011-04-07T11:51:11.817-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='TEAM_IAPP_mgrani'/><title type='text'>Starting my research stay at Mendeley</title><content type='html'>Finally 
I started my research stay at Mendeley. We have settled in and I got first impressions of how everything works here: the city, the people and, of course, the colleagues at Mendeley.&lt;br /&gt;&lt;br /&gt;The talks I had with people here helped to narrow down the research topics towards Learning-to-Rank and stochastic machine learning and to identify possible application areas. Next week I will give a talk on that particular topic and maybe I will have first results. I am looking forward to presenting the possible research ideas and their application, as well as the research topics my group in Graz is currently addressing.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7618678606233996595-6490113335685575709?l=readingsonkrd.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://readingsonkrd.blogspot.com/feeds/6490113335685575709/comments/default' title='Kommentare zum Post'/><link rel='replies' type='text/html' href='http://readingsonkrd.blogspot.com/2011/04/starting-my-research-stay-at-mendeley.html#comment-form' title='0 Kommentare'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7618678606233996595/posts/default/6490113335685575709'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7618678606233996595/posts/default/6490113335685575709'/><link rel='alternate' type='text/html' href='http://readingsonkrd.blogspot.com/2011/04/starting-my-research-stay-at-mendeley.html' title='Starting my research stay at Mendeley'/><author><name>grani</name><uri>http://www.blogger.com/profile/01859478032669302450</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7618678606233996595.post-7477397187962062244</id><published>2009-05-05T23:35:00.001-07:00</published><updated>2009-05-05T23:35:45.091-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='text mining'/><category scheme='http://www.blogger.com/atom/ns#' term='research paper'/><category scheme='http://www.blogger.com/atom/ns#' term='statistical significance testing'/><category scheme='http://www.blogger.com/atom/ns#' term='machine learning'/><title type='text'>[Topic] Centroid based Classification</title><content type='html'>&lt;div xmlns='http://www.w3.org/1999/xhtml'&gt;[1] claims a very high increase in accuracy due to the use of a different centroid weighting scheme. The weighting scheme extracts the "discriminative" features. The increase is around 0.7-0.10 F1 measure.&lt;br/&gt;&lt;br/&gt;[2] analyzes centroid based learning approaches in detail. k-NN, C4.5 and centroid based approaches are compared. The success of centroid based approaches is explained by comparing the inter class similarity distribution (= length of the centroid) with the average similarity of a new item to all documents (interpretation of centroid based cosine similarity). While the average similarity of a new item does not take term dependencies into account and suffers from drawbacks similar to those of naive Bayes algorithms (overestimation of positive term co-occurrences, underestimation of negative term co-occurrences), the second term (= centroid length) addresses the co-occurrence aspect. 
(see Section 5 of the paper for details)&lt;br/&gt;&lt;br/&gt;Further the paper provides: statistical testing of classification results (resampled t-test and sign test) &lt;br/&gt;&lt;br/&gt;&lt;br/&gt;&lt;br/&gt;&lt;br/&gt;[1] &lt;a href='http://www2009.eprints.org/view/author/Guan=3AHu=3A=3A.html'&gt;&lt;span class='person_name'&gt;Guan, Hu&lt;/span&gt;&lt;/a&gt; and &lt;a href='http://www2009.eprints.org/view/author/Zhou=3AJingyu=3A=3A.html'&gt;&lt;span class='person_name'&gt;Zhou, Jingyu&lt;/span&gt;&lt;/a&gt; and &lt;a href='http://www2009.eprints.org/view/author/Guo=3AMinyi=3A=3A.html'&gt;&lt;span class='person_name'&gt;Guo, Minyi&lt;/span&gt;&lt;/a&gt; (2009) &lt;span style='padding: 0.25em 0em; display: block; font-size: 130%; font-weight: bold;'&gt;&lt;a href='http://www2009.eprints.org/21/'&gt;A Class-Feature-Centroid Classifier for Text Categorization.&lt;/a&gt;&lt;/span&gt; In: 18th International World Wide Web Conference, April 20th-24th, 2009, Madrid, Spain.&lt;br/&gt;&lt;br/&gt;&lt;a href='http://www2009.eprints.org/21/'&gt;http://www2009.eprints.org/21/&lt;/a&gt;&lt;br/&gt;&lt;br/&gt;[2] Han, E. and Karypis, G. 2000. Centroid-Based Document Classification: Analysis and Experimental Results. In &lt;i&gt;Proceedings of the 4th European Conference on Principles of Data Mining and Knowledge Discovery&lt;/i&gt; (September 13 - 16, 2000). D. A. Zighed, H. J. Komorowski, and J. M. Zytkow, Eds. Lecture Notes In Computer Science, vol. 1910. Springer-Verlag, London, 424-431. 
&lt;br/&gt;&lt;br/&gt;&lt;a href='http://portal.acm.org/citation.cfm?id=669671#'&gt;http://portal.acm.org/citation.cfm?id=669671&lt;/a&gt;&lt;br/&gt;&lt;br/&gt;&lt;div class='zemanta-pixie'&gt;&lt;img src='http://img.zemanta.com/pixy.gif?x-id=f1576886-8334-8c0a-8e15-ec188835b69b' class='zemanta-pixie-img'/&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7618678606233996595-7477397187962062244?l=readingsonkrd.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://readingsonkrd.blogspot.com/feeds/7477397187962062244/comments/default' title='Kommentare zum Post'/><link rel='replies' type='text/html' href='http://readingsonkrd.blogspot.com/2009/05/topic-centroid-based-classification.html#comment-form' title='0 Kommentare'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7618678606233996595/posts/default/7477397187962062244'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7618678606233996595/posts/default/7477397187962062244'/><link rel='alternate' type='text/html' href='http://readingsonkrd.blogspot.com/2009/05/topic-centroid-based-classification.html' title='[Topic] Centroid based Classification'/><author><name>grani</name><uri>http://www.blogger.com/profile/01859478032669302450</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7618678606233996595.post-482448940822331080</id><published>2009-05-01T02:57:00.001-07:00</published><updated>2009-05-01T02:57:36.990-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='linguistic'/><category scheme='http://www.blogger.com/atom/ns#' term='data set'/><category scheme='http://www.blogger.com/atom/ns#' 
term='sentence similarity'/><title type='text'>[Paper] A comparative Study of Two Short Text Semantic Similarity Measures</title><content type='html'>&lt;div xmlns='http://www.w3.org/1999/xhtml'&gt;    Book Series - &lt;br/&gt;    Book Title  - Agent and Multi-Agent Systems: Technologies and Applications&lt;br/&gt;    Chapter Title  - A Comparative Study of Two Short Text Semantic Similarity Measures&lt;br/&gt;    First Page  - 172&lt;br/&gt;    Last Page  - 181&lt;br/&gt;    Copyright  - 2008&lt;br/&gt;    Author  - James O’Shea&lt;br/&gt;    Author  - Zuhair Bandar&lt;br/&gt;    Author  - Keeley Crockett&lt;br/&gt;    Author  - David McLean&lt;br/&gt;    DOI  - 10.1007/978-3-540-78582-8_18&lt;br/&gt;    Link  - http://www.springerlink.com/content/v0867641u342pm2&lt;br/&gt;&lt;br/&gt;&lt;p class='AuthorGroup'&gt;James O’Shea&lt;sup&gt;1 &lt;a href='http://www.springerlink.com/content/v0867641u342pm28/#ContactOfAuthor1'&gt;&lt;img border='0' src='http://www.springerlink.com/images/contact.gif' alt='Contact Information'/&gt;&lt;/a&gt;&lt;/sup&gt;, Zuhair Bandar&lt;sup&gt;1 &lt;a href='http://www.springerlink.com/content/v0867641u342pm28/#ContactOfAuthor2'&gt;&lt;img border='0' src='http://www.springerlink.com/images/contact.gif' alt='Contact Information'/&gt;&lt;/a&gt;&lt;/sup&gt;, Keeley Crockett&lt;sup&gt;1 &lt;a href='http://www.springerlink.com/content/v0867641u342pm28/#ContactOfAuthor3'&gt;&lt;img border='0' src='http://www.springerlink.com/images/contact.gif' alt='Contact Information'/&gt;&lt;/a&gt;&lt;/sup&gt; and David McLean&lt;sup&gt;1 &lt;a href='http://www.springerlink.com/content/v0867641u342pm28/#ContactOfAuthor4'&gt;&lt;img border='0' src='http://www.springerlink.com/images/contact.gif' alt='Contact Information'/&gt;&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;      &lt;table&gt;         &lt;tbody&gt;            &lt;tr valign='top'&gt;               &lt;td&gt;&lt;span class='Affiliation'&gt;&lt;a name='Aff1'/&gt;(1) &lt;/span&gt;&lt;/td&gt;               
&lt;td&gt;&lt;span class='Affiliation'&gt;Department of Computing and Mathematics, Manchester Metropolitan University, Chester St., Manchester, M1 5GD, United Kingdom&lt;/span&gt;&lt;/td&gt;            &lt;/tr&gt;         &lt;/tbody&gt;      &lt;/table&gt;&lt;a name='Abs1'/&gt;&lt;div class='Heading3'&gt;Abstract&lt;/div&gt;      &lt;div class='Abstract'&gt;This paper describes a comparative study of STASIS and LSA. These measures of semantic similarity can be applied to short texts for use in Conversational Agents (CAs). CAs are computer programs that interact with humans through natural language dialogue. Business organizations have spent large sums of money in recent years developing them for online customer self-service, but achievements have been limited to simple FAQ systems. We believe this is due to the labour-intensive process of scripting, which could be reduced radically by the use of short-text semantic similarity measures. “Short texts” are typically 10-20 words long but are not required to be grammatically correct sentences, for example spoken utterances and text messages. We also present a benchmark data set of 65 sentence pairs with human-derived similarity ratings. This data set is the first of its kind, specifically developed to evaluate such measures and we believe it will be valuable to future researchers. &lt;/div&gt;      &lt;p class='Keyword'&gt;&lt;span class='KeywordHeading'&gt;Keywords  &lt;/span&gt;Natural Language - Semantic Similarity - Dialogue Management - User Modeling - Benchmark - Sentence &lt;br/&gt;&lt;/p&gt;&lt;p class='Keyword'&gt;&lt;br/&gt;&lt;/p&gt;&lt;p class='Keyword'&gt;&lt;b&gt;Important Points&lt;/b&gt;&lt;/p&gt;&lt;ul&gt;&lt;li&gt;Discussion and summary of different kinds of similarities (taxonomic, related, goal derived and radial)&lt;br/&gt;&lt;/li&gt;&lt;li&gt;Introduction of a (small) test corpus and how the corpus was created. 
This includes some discussion on how humans rate.&lt;/li&gt;&lt;li&gt;Statement that co-occurrence measures yield also high similarity values for antonyms&lt;br/&gt;&lt;/li&gt;&lt;/ul&gt;&lt;br/&gt;&lt;br/&gt;&lt;div class='zemanta-pixie'&gt;&lt;img src='http://img.zemanta.com/pixy.gif?x-id=70af8193-faa7-8ff7-a368-daeeef4e0b64' class='zemanta-pixie-img'/&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7618678606233996595-482448940822331080?l=readingsonkrd.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://readingsonkrd.blogspot.com/feeds/482448940822331080/comments/default' title='Kommentare zum Post'/><link rel='replies' type='text/html' href='http://readingsonkrd.blogspot.com/2009/05/paper-comparative-study-of-two-short.html#comment-form' title='0 Kommentare'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7618678606233996595/posts/default/482448940822331080'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7618678606233996595/posts/default/482448940822331080'/><link rel='alternate' type='text/html' href='http://readingsonkrd.blogspot.com/2009/05/paper-comparative-study-of-two-short.html' title='[Paper] A comparative Study of Two Short Text Semantic Similarity Measures'/><author><name>grani</name><uri>http://www.blogger.com/profile/01859478032669302450</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7618678606233996595.post-1359207890166611313</id><published>2009-04-21T01:51:00.001-07:00</published><updated>2009-04-21T01:51:48.847-07:00</updated><title type='text'>[Paper] Sentence Similarity Based on Semantic Nets and Corpus 
Statistics</title><content type='html'>&lt;div xmlns='http://www.w3.org/1999/xhtml'&gt;&lt;b&gt;&lt;span class='heading'&gt;&lt;a name='abstract'&gt;&lt;br/&gt;&lt;/a&gt;&lt;/span&gt;&lt;/b&gt;&lt;a name='abstract'&gt;Li, Y., McLean, D., Bandar, Z. A., O'Shea, J. D., and Crockett, K. 2006. Sentence Similarity Based on Semantic Nets and Corpus Statistics. &lt;i&gt;IEEE Trans. on Knowl. and Data Eng.&lt;/i&gt; 18, 8 (Aug. 2006), 1138-1150. DOI= http://dx.doi.org/10.1109/TKDE.2006.130 &lt;br/&gt;&lt;/a&gt;&lt;b&gt;&lt;span class='heading'&gt;&lt;a name='abstract'&gt;&lt;br/&gt;ABSTRACT&lt;/a&gt;&lt;/span&gt;&lt;/b&gt;&lt;br/&gt;			&lt;br/&gt;				&lt;p class='abstract'&gt;&lt;br/&gt;Sentence similarity measures play an increasingly important role in&lt;br/&gt;text-related research and applications in areas such as text mining,&lt;br/&gt;Web page retrieval, and dialogue systems. Existing methods for&lt;br/&gt;computing sentence similarity have been adopted from approaches used&lt;br/&gt;for long text documents. These methods process sentences in a very&lt;br/&gt;high-dimensional space and are consequently inefficient, require human&lt;br/&gt;input, and are not adaptable to some application domains. This paper&lt;br/&gt;focuses directly on computing the similarity between very short texts&lt;br/&gt;of sentence length. It presents an algorithm that takes account of&lt;br/&gt;semantic information and word order information implied in the&lt;br/&gt;sentences. The semantic similarity of two sentences is calculated using&lt;br/&gt;information from a structured lexical database and from corpus&lt;br/&gt;statistics. The use of a lexical database enables our method to model&lt;br/&gt;human common sense knowledge and the incorporation of corpus statistics&lt;br/&gt;allows our method to be adaptable to different domains. The proposed&lt;br/&gt;method can be used in a variety of applications that involve text&lt;br/&gt;knowledge representation and discovery. 
Experiments on two sets of&lt;br/&gt;selected sentence pairs demonstrate that the proposed method provides a&lt;br/&gt;similarity measure that shows a significant correlation to human&lt;br/&gt;intuition. &lt;br/&gt;&lt;/p&gt;&lt;p class='abstract'&gt;&lt;br/&gt;&lt;/p&gt;&lt;p class='abstract'&gt;&lt;b&gt;Points made:&lt;/b&gt;&lt;/p&gt;&lt;ul&gt;&lt;li&gt;Sentence similarity based on path length and depth in the WordNet hierarchy. &lt;br/&gt;&lt;/li&gt;&lt;li&gt;Word order similarity. The metric seems to work rather well.&lt;/li&gt;&lt;li&gt;Data Sets: Brown Corpus (Content&amp;amp;Statistical Information), WordNet (Semantics)&lt;/li&gt;&lt;li&gt;The paper points to a dataset created for estimating sentence similarity &lt;/li&gt;&lt;/ul&gt;&lt;br/&gt;Overall, good to read. Provides good resources on sentence similarity.&lt;br/&gt;&lt;br/&gt;&lt;br/&gt;&lt;div class='zemanta-pixie'&gt;&lt;img src='http://img.zemanta.com/pixy.gif?x-id=c8bb5e05-99fa-852f-b14e-338f6c6b92d5' class='zemanta-pixie-img'/&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7618678606233996595-1359207890166611313?l=readingsonkrd.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://readingsonkrd.blogspot.com/feeds/1359207890166611313/comments/default' title='Kommentare zum Post'/><link rel='replies' type='text/html' href='http://readingsonkrd.blogspot.com/2009/04/paper-sentence-similarity-based-on.html#comment-form' title='0 Kommentare'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7618678606233996595/posts/default/1359207890166611313'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7618678606233996595/posts/default/1359207890166611313'/><link rel='alternate' type='text/html' href='http://readingsonkrd.blogspot.com/2009/04/paper-sentence-similarity-based-on.html' title='[Paper] Sentence 
Similarity Based on Semantic Nets and Corpus Statistics'/><author><name>grani</name><uri>http://www.blogger.com/profile/01859478032669302450</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7618678606233996595.post-2172852187210802503</id><published>2009-04-21T01:26:00.001-07:00</published><updated>2009-04-21T01:26:45.389-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='text mining'/><category scheme='http://www.blogger.com/atom/ns#' term='sequence labeling'/><category scheme='http://www.blogger.com/atom/ns#' term='sentiment detection'/><title type='text'>Sentiment Detection and Opinion Mining</title><content type='html'>&lt;div xmlns='http://www.w3.org/1999/xhtml'&gt;Sentiment detection and opinion mining, i.e. extracting subjective opinions and assertions from web portals, is currently a hot topic. A good survey is provided by &lt;a href='http://www.cs.cornell.edu/home/llee/opinion-mining-sentiment-analysis-survey.html'&gt;Pang and Lee&lt;/a&gt;. The survey addresses several aspects of this field. Most important for knowledge discovery and text mining is the question of how algorithms for analysing unstructured texts written by web users differ from standard text mining tasks such as text classification and named entity recognition. From the survey I have taken the following points:&lt;br/&gt;&lt;ul&gt;&lt;li&gt;A smaller number of classes compared to text classification (e.g. positive, ambivalent, negative vs. topic hierarchies)&lt;/li&gt;&lt;li&gt;Higher dependency on subjective writing style (e.g. 
sarcasm)&lt;br/&gt;&lt;/li&gt;&lt;li&gt;Higher dependency on common-sense knowledge: sentiments can be expressed using non-sentiment words and by comparison with very good/very bad situations (e.g. "it feels like driving a car at 360 km/h")&lt;br/&gt;&lt;/li&gt;&lt;li&gt;High degree of subjectivity: given the above sentence, some people may like it, some may not&lt;/li&gt;&lt;li&gt;Order effects might override frequency effects&lt;br/&gt;&lt;/li&gt;&lt;/ul&gt;&lt;b&gt;Sentiment Tasks&lt;br/&gt;&lt;/b&gt;&lt;br/&gt;&lt;ul&gt;&lt;li&gt;Polarity Opinion Classification: Determine whether a piece of text is positive or negative&lt;/li&gt;&lt;li&gt;Rating inference/ordinal regression: Determine the degree of goodness/badness&lt;/li&gt;&lt;li&gt;Subjectivity Detection: Detect whether a piece of text contains subjective or objective material&lt;/li&gt;&lt;li&gt;Joint Topic/Sentiment analysis&lt;/li&gt;&lt;/ul&gt;&lt;br/&gt;&lt;b&gt;Facts&lt;/b&gt;&lt;br/&gt;&lt;ul&gt;&lt;li&gt;Machine learning using unigram models can achieve over 80% accuracy (Pang et al., Thumbs up? Sentiment Classification using Machine Learning Techniques)&lt;/li&gt;&lt;li&gt;Templates are more stable across domains (compared to IE)&lt;/li&gt;&lt;li&gt;Finding the correct keywords expressing sentiments seems to be hard ("Go read the book" in movie vs. book domain)&lt;/li&gt;&lt;li&gt;Unclear whether bigrams help or not&lt;/li&gt;&lt;li&gt;POS tagging can be considered a crude form of word sense disambiguation (WSD)&lt;/li&gt;&lt;li&gt;Syntax has been found to be useful (e.g. dependency trees)&lt;/li&gt;&lt;li&gt;Negations count (as a second feature, by transforming words e.g. 
NOT, deeper modelling)&lt;/li&gt;&lt;/ul&gt;&lt;br/&gt;&lt;br/&gt;&lt;br/&gt;&lt;br/&gt;&lt;br/&gt;&lt;div class='zemanta-pixie'&gt;&lt;img src='http://img.zemanta.com/pixy.gif?x-id=58ec4cb3-cd48-8f06-94c3-9da943f9b8d0' class='zemanta-pixie-img'/&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7618678606233996595-2172852187210802503?l=readingsonkrd.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://readingsonkrd.blogspot.com/feeds/2172852187210802503/comments/default' title='Kommentare zum Post'/><link rel='replies' type='text/html' href='http://readingsonkrd.blogspot.com/2009/04/sentiment-detection-and-opinion-mining.html#comment-form' title='0 Kommentare'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7618678606233996595/posts/default/2172852187210802503'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7618678606233996595/posts/default/2172852187210802503'/><link rel='alternate' type='text/html' href='http://readingsonkrd.blogspot.com/2009/04/sentiment-detection-and-opinion-mining.html' title='Sentiment Detection and Opinion Mining'/><author><name>grani</name><uri>http://www.blogger.com/profile/01859478032669302450</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7618678606233996595.post-2943498527077035801</id><published>2009-04-21T01:25:00.001-07:00</published><updated>2009-04-21T01:25:10.511-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='text mining'/><category scheme='http://www.blogger.com/atom/ns#' term='linguistic'/><category scheme='http://www.blogger.com/atom/ns#' term='sentence similarity'/><title 
type='text'>[Paper] Sentence Similarity Based on Semantic Nets and Corpus Statistics</title><content type='html'>&lt;div xmlns='http://www.w3.org/1999/xhtml'&gt;&lt;b&gt;&lt;span class='heading'&gt;&lt;a name='abstract'&gt;ABSTRACT&lt;/a&gt;&lt;/span&gt;&lt;/b&gt; &lt;p class='abstract'&gt; Sentence similarity measures play an increasingly important role in text-related research and applications in areas such as text mining, Web page retrieval, and dialogue systems. Existing methods for computing sentence similarity have been adopted from approaches used for long text documents. These methods process sentences in a very high-dimensional space and are consequently inefficient, require human input, and are not adaptable to some application domains. This paper focuses directly on computing the similarity between very short texts of sentence length. It presents an algorithm that takes account of semantic information and word order information implied in the sentences. The semantic similarity of two sentences is calculated using information from a structured lexical database and from corpus statistics. The use of a lexical database enables our method to model human common sense knowledge and the incorporation of corpus statistics allows our method to be adaptable to different domains. The proposed method can be used in a variety of applications that involve text knowledge representation and discovery. Experiments on two sets of selected sentence pairs demonstrate that the proposed method provides a similarity measure that shows a significant correlation to human intuition. &lt;br/&gt;&lt;/p&gt;&lt;p class='abstract'&gt;&lt;br/&gt;&lt;/p&gt;&lt;p class='abstract'&gt;&lt;b&gt;Points made:&lt;/b&gt;&lt;/p&gt;&lt;ul&gt;&lt;li&gt;Sentence similarity based on path length and depth in the WordNet hierarchy. &lt;br/&gt;&lt;/li&gt;&lt;li&gt;Word order similarity. 
The metric seems to work rather well.&lt;/li&gt;&lt;li&gt;Data Sets: Brown Corpus (Content&amp;amp;Statistical Information), WordNet (Semantics)&lt;/li&gt;&lt;li&gt;The paper points to a dataset created for estimating sentence similarity &lt;/li&gt;&lt;/ul&gt;&lt;br/&gt;Overall, good to read and very detailed. Provides good resources on sentence similarity.&lt;br/&gt;&lt;br/&gt;&lt;br/&gt;&lt;div class='zemanta-pixie'&gt;&lt;img src='http://img.zemanta.com/pixy.gif?x-id=0c6f6da4-ee59-82ee-9ac1-168df1de3277' class='zemanta-pixie-img'/&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7618678606233996595-2943498527077035801?l=readingsonkrd.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://readingsonkrd.blogspot.com/feeds/2943498527077035801/comments/default' title='Kommentare zum Post'/><link rel='replies' type='text/html' href='http://readingsonkrd.blogspot.com/2009/04/paper-sentence-similartiy-based-on.html#comment-form' title='0 Kommentare'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7618678606233996595/posts/default/2943498527077035801'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7618678606233996595/posts/default/2943498527077035801'/><link rel='alternate' type='text/html' href='http://readingsonkrd.blogspot.com/2009/04/paper-sentence-similartiy-based-on.html' title='[Paper] Sentence Similarity Based on Semantic Nets and Corpus Statistics'/><author><name>grani</name><uri>http://www.blogger.com/profile/01859478032669302450</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7618678606233996595.post-7685561126304778600</id><published>2009-02-26T03:45:00.001-08:00</published><updated>2009-02-26T03:45:45.021-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='text mining'/><category scheme='http://www.blogger.com/atom/ns#' term='Artifical Neuronal Network'/><category scheme='http://www.blogger.com/atom/ns#' term='machine learning'/><category scheme='http://www.blogger.com/atom/ns#' term='Recurrent Networks'/><title type='text'>Recurrent Neural Networks for Robust Real-World Text Classification</title><content type='html'>&lt;div xmlns='http://www.w3.org/1999/xhtml'&gt;Garen Arevian&lt;br/&gt;2007 IEEE/WIC/ACM International Conference on Web Intelligence&lt;br/&gt;&lt;br/&gt;&lt;b&gt;&lt;span class='heading'&gt;&lt;a name='abstract'&gt;ABSTRACT&lt;/a&gt;&lt;/span&gt;&lt;/b&gt;&lt;br/&gt; &lt;br/&gt; &lt;br/&gt; &lt;br/&gt; &lt;br/&gt; &lt;br/&gt; &lt;p class='abstract'&gt;&lt;br/&gt;This paper explores the application of recurrent neural networks for&lt;br/&gt;the task of robust text classification of a real-world benchmarking&lt;br/&gt;corpus. There are many well-established approaches which are used for&lt;br/&gt;text classification, but they fail to address the challenge from a more&lt;br/&gt;multi-disciplinary viewpoint such as natural language processing and&lt;br/&gt;artificial intelligence. The results demonstrate that these recurrent&lt;br/&gt;neural networks can be a viable addition to the many techniques used in&lt;br/&gt;web intelligence for tasks such as context sensitive email&lt;br/&gt;classification and web site indexing. 
&lt;b&gt;&lt;br/&gt;&lt;/b&gt;&lt;/p&gt;&lt;p class='abstract'&gt;&lt;b&gt;Noteworthy&lt;br/&gt;&lt;/b&gt;&lt;/p&gt;&lt;ul&gt;&lt;li&gt;Use of recurrent neural networks (Elman networks) with a context layer, which can take word order into account&lt;br/&gt;&lt;/li&gt;&lt;li&gt;Further references for NNs in text mining&lt;/li&gt;&lt;li&gt;Title-based semantic representation (at least pointers to prior literature on the topic)&lt;/li&gt;&lt;li&gt;Word order was not important&lt;br/&gt;&lt;/li&gt;&lt;li&gt;The claim that NNs can outperform other classifiers is very strong and does not hold in general&lt;br/&gt;&lt;/li&gt;&lt;/ul&gt;&lt;p class='abstract'&gt;&lt;br/&gt;&lt;/p&gt;&lt;p class='abstract'&gt;&lt;br/&gt;&lt;/p&gt;&lt;br/&gt;&lt;br/&gt;&lt;br/&gt;&lt;br/&gt;&lt;br/&gt;&lt;div class='zemanta-pixie'&gt;&lt;img src='http://img.zemanta.com/pixy.gif?x-id=eccb7c2f-ae87-4ab0-86c7-9b6c7ff457cf' class='zemanta-pixie-img'/&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7618678606233996595-7685561126304778600?l=readingsonkrd.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://readingsonkrd.blogspot.com/feeds/7685561126304778600/comments/default' title='Kommentare zum Post'/><link rel='replies' type='text/html' href='http://readingsonkrd.blogspot.com/2009/02/recurrent-neural-networks-for-robust.html#comment-form' title='0 Kommentare'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7618678606233996595/posts/default/7685561126304778600'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7618678606233996595/posts/default/7685561126304778600'/><link rel='alternate' type='text/html' href='http://readingsonkrd.blogspot.com/2009/02/recurrent-neural-networks-for-robust.html' title='Recurrent Neural Networks for Robust Real-World Text 
Classification'/><author><name>grani</name><uri>http://www.blogger.com/profile/01859478032669302450</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7618678606233996595.post-3760194963166924515</id><published>2009-02-16T09:06:00.001-08:00</published><updated>2009-02-16T09:06:19.174-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='information retrieval'/><category scheme='http://www.blogger.com/atom/ns#' term='statistical significance testing'/><category scheme='http://www.blogger.com/atom/ns#' term='must-read'/><title type='text'>Information Retrieval System Evaluation: Effort, Sensitivity, and Reliability</title><content type='html'>&lt;div xmlns='http://www.w3.org/1999/xhtml'&gt;Information Retrieval System Evaluation: Effort, Sensitivity, and Reliability&lt;br/&gt;Mark Sanderson, Justin Zobel&lt;br/&gt;&lt;br/&gt;The paper is an excellent treatment of how to compare IR systems and of the differences between MAP and other measures. A must-read for evaluation.&lt;br/&gt;&lt;br/&gt;&lt;br/&gt;&lt;b&gt;Abstract&lt;/b&gt;: The effectiveness of information retrieval systems is measured by comparing performance on a common set of queries and documents. Significance tests are often used to evaluate the reliability of such comparisons. Previous work has examined such tests, but produced results with limited application. Other work established an alternative benchmark for significance, but the resulting test was too stringent. In this paper, we revisit the question of how such tests should be used. We find that the t-test is highly reliable (more so than the sign or Wilcoxon test), and is far more reliable than simply showing a large percentage difference in effectiveness measures between IR systems. 
Our results show that past empirical work on significance tests over-estimated the error of such tests. We also re-consider comparisons between the reliability of precision at rank 10 and mean average precision, arguing that past comparisons did not consider the assessor effort required to compute such measures. This investigation shows that assessor effort would be better spent building test collections with more topics, each assessed in less detail.&lt;br/&gt;&lt;br/&gt;&lt;b&gt;Important Aspects Covered:&lt;/b&gt;&lt;br/&gt;&lt;br/&gt;&lt;ul&gt;&lt;li&gt;Brief introduction to statistical significance testing in IR (how and why)&lt;/li&gt;&lt;li&gt;Summary of results found by Zobel and Voorhees/Buckley: 8-9% MAP difference on 25 topics (conf = 95%), 5-6% MAP difference on 50 topics (conf = 95%)&lt;/li&gt;&lt;li&gt;Impact of significance testing on projecting MAP accuracy. &lt;br/&gt;&lt;/li&gt;&lt;li&gt;A large difference in MAP does not necessarily imply a statistically significant difference, especially for small topic sets (e.g. 25). In the worst case, the comparison must be significant and the MAP difference must be higher than 10%.&lt;/li&gt;&lt;li&gt;The t-test produces lower error rates than the sign and Wilcoxon tests&lt;br/&gt;&lt;/li&gt;&lt;li&gt;MAP is more reliable than P@10, but building a reliable P@10-only collection should be cheaper (from an assessor's point of view). 
However, the stability of shallow pool sizes is unclear and has not yet been tested.&lt;br/&gt;&lt;/li&gt;&lt;/ul&gt;&lt;br/&gt;&lt;br/&gt;&lt;div class='zemanta-pixie'&gt;&lt;img src='http://img.zemanta.com/pixy.gif?x-id=0943e715-857f-4341-ae45-05663ecc4c50' class='zemanta-pixie-img'/&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7618678606233996595-3760194963166924515?l=readingsonkrd.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://readingsonkrd.blogspot.com/feeds/3760194963166924515/comments/default' title='Kommentare zum Post'/><link rel='replies' type='text/html' href='http://readingsonkrd.blogspot.com/2009/02/information-retrieval-system-evalution.html#comment-form' title='0 Kommentare'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7618678606233996595/posts/default/3760194963166924515'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7618678606233996595/posts/default/3760194963166924515'/><link rel='alternate' type='text/html' href='http://readingsonkrd.blogspot.com/2009/02/information-retrieval-system-evalution.html' title='Information Retrieval System Evaluation: Effort, Sensitivity, and Reliability'/><author><name>grani</name><uri>http://www.blogger.com/profile/01859478032669302450</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7618678606233996595.post-2417514098774450313</id><published>2009-01-29T08:45:00.001-08:00</published><updated>2009-01-29T08:45:19.032-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='research paper'/><category scheme='http://www.blogger.com/atom/ns#' term='guidelines'/><category 
scheme='http://www.blogger.com/atom/ns#' term='kdd'/><title type='text'>Mini How-to write a KRD/KDD Research Paper</title><content type='html'>&lt;div xmlns='http://www.w3.org/1999/xhtml'&gt;I recently stumbled over the ACM SIG KDD 09 Call for Papers, which contains an excellent and comprehensive guide on writing a good research paper...at least for data-intensive domains ;)&lt;br/&gt;&lt;br/&gt;You can find the link &lt;a href='http://www.sigkdd.org/kdd2009/papers.html#researchcfp'&gt;here&lt;/a&gt;. The important part is also cited below:&lt;br/&gt;&lt;br/&gt;&lt;p&gt;" In writing your paper, we suggest you try to address the following questions,  credited to George Heilmeier:&lt;/p&gt;&lt;br /&gt;                &lt;ul&gt;&lt;li&gt;What are you trying to do? Articulate your objectives using absolutely no jargon.&lt;/li&gt;&lt;li&gt;How is it done today, and what are the limits of current practice?&lt;/li&gt;&lt;li&gt;What's new in your approach and why do you think it will be successful?&lt;/li&gt;&lt;li&gt;Who cares?&lt;/li&gt;&lt;li&gt;If you're successful, what difference will it make?&lt;/li&gt;&lt;li&gt;What are the risks and the payoffs? (in other words, what are the limitations and strengths of your work)&lt;/li&gt;&lt;li&gt;What&lt;br /&gt;are the midterm and final "exams" to check for success? (in other&lt;br /&gt;words, what are the measures of evaluation and evidence of success)&lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;                &lt;p&gt;In light of the above principles, we &lt;em&gt;suggest&lt;/em&gt;&lt;br /&gt;the following guidelines for the paper content. 
Note that the headings&lt;br /&gt;and the structure below are meant to be general categories; please&lt;br /&gt;exercise your discretion and creativity to make the paper as&lt;br /&gt;comprehensible as possible to the readers and reviewers.&lt;/p&gt;&lt;br /&gt;                &lt;h4&gt;Abstract&lt;/h4&gt;&lt;br /&gt;                  Try to include the following:&lt;br /&gt;                  &lt;ul&gt;&lt;li&gt;Motivation: one or two sentences on the problem and its significance;&lt;/li&gt;&lt;li&gt;Results: a short paragraph on approach and results;&lt;/li&gt;&lt;li&gt;Availability: a link to code, data, and supplementary materials,&lt;br /&gt;                  or a statement why this is not possible.&lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;                &lt;h4&gt;Motivation &amp;amp; Significance&lt;/h4&gt;&lt;br /&gt;                  &lt;p&gt;What is the problem and why is it important or significant?&lt;/p&gt;&lt;br /&gt;                &lt;h4&gt;Problem Statement&lt;/h4&gt;&lt;br /&gt;                  &lt;p&gt;Formal definition of the problem with any preliminary concepts.&lt;/p&gt;&lt;br /&gt;                &lt;h4&gt;Prior Work &amp;amp; Limitations&lt;/h4&gt;&lt;br /&gt;                  &lt;p&gt;What are the existing approaches, and their limitations?&lt;/p&gt;&lt;br /&gt;                &lt;h4&gt;Theory/Algorithm&lt;/h4&gt;&lt;br /&gt;                  &lt;ul&gt;&lt;li&gt;Discuss the main theoretical or algorithmic ideas of the paper;&lt;/li&gt;&lt;li&gt;Mention the main theorems (if any), the intuition behind those, and their&lt;br /&gt;                  practical application. Move the proofs to the appendix, unless the&lt;br /&gt;                  proof itself is the main contribution;&lt;/li&gt;&lt;li&gt;Discuss your algorithmic solution (if any) at the conceptual level with&lt;br /&gt;                  pseudo-code, to convey the main ideas. 
Move minute (but&lt;br /&gt;                  practically important) implementation details to the appendix;&lt;/li&gt;&lt;li&gt;Discuss why you chose certain paths, and discuss unfruitful&lt;br /&gt;                  paths that you discarded. In other words, give both the&lt;br /&gt;                  theoretical and/or algorithmic "insights" into  your work.&lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;                &lt;h4&gt;Experiments or other Evidence of Success&lt;/h4&gt;&lt;br /&gt;                  &lt;ul&gt;&lt;li&gt;Complete parameter settings and data descriptions should be&lt;br /&gt;                  provided (including any links to public resources);&lt;/li&gt;&lt;li&gt;Clearly specify the experimental procedure, including evaluation&lt;br /&gt;                  measures;&lt;/li&gt;&lt;li&gt;Compare to prior solutions, or at least to "strawman" solutions;&lt;/li&gt;&lt;li&gt;Clearly discuss the results and what they mean;&lt;/li&gt;&lt;li&gt;Only include the most relevant experiments here, using the&lt;br /&gt;                  appendix to provide any additional results (say on minor parameter&lt;br /&gt;                  tuning of your method, etc).&lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;                &lt;h4&gt;Discussion and Future Work&lt;/h4&gt;&lt;br /&gt;                  &lt;p&gt;Describe insights you gained, the limitations and applicability of&lt;br /&gt;                  your work, and directions for future research. Every solution has&lt;br /&gt;                  limitations, which should be explicitly mentioned.&lt;/p&gt;&lt;br /&gt;                &lt;h4&gt;References&lt;/h4&gt;&lt;br /&gt;                  &lt;p&gt;Include the most relevant works, making sure all citations are complete&lt;br /&gt;                  (including editors, publishers, page numbers, etc.).&lt;/p&gt;&lt;br /&gt;                &lt;h4&gt;APPENDIX&lt;/h4&gt;&lt;br /&gt;                  &lt;p&gt;You should use the appendix for supporting details. 
For example, you&lt;br /&gt;                  may use it to convey detailed technical/practical aspects of your&lt;br /&gt;                  implementation. You may use the appendix for theorem proofs, or for&lt;br /&gt;                  additional experimental results. Include pointers in the&lt;br /&gt;                  main paper to relevant sections in the appendix.&lt;/p&gt;&lt;br /&gt;                &lt;p&gt; The appendix &lt;em&gt;is&lt;/em&gt; an integral part of the paper, since it will provide&lt;br /&gt;                  details that are important for a proper appreciation of your work&lt;br /&gt;                  (e.g., for replicating or extending it, or for comparison).&lt;br /&gt;                  However, it should be possible on a first read-through to get a good&lt;br /&gt;                  understanding of the paper's contribution from the main part alone.&lt;br /&gt;                  Structuring the paper in this way provides a service to the reader,&lt;br /&gt;                  by separating main ideas from technical details."&lt;/p&gt;&lt;br/&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7618678606233996595-2417514098774450313?l=readingsonkrd.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://readingsonkrd.blogspot.com/feeds/2417514098774450313/comments/default' title='Kommentare zum Post'/><link rel='replies' type='text/html' href='http://readingsonkrd.blogspot.com/2009/01/mini-how-to-write-krdkdd-research-paper.html#comment-form' title='0 Kommentare'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7618678606233996595/posts/default/2417514098774450313'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7618678606233996595/posts/default/2417514098774450313'/><link rel='alternate' type='text/html' 
href='http://readingsonkrd.blogspot.com/2009/01/mini-how-to-write-krdkdd-research-paper.html' title='Mini How-to write a KRD/KDD Research Paper'/><author><name>grani</name><uri>http://www.blogger.com/profile/01859478032669302450</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7618678606233996595.post-2387848370370912748</id><published>2009-01-25T09:19:00.000-08:00</published><updated>2009-01-25T09:20:43.131-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='text mining'/><category scheme='http://www.blogger.com/atom/ns#' term='data set'/><category scheme='http://www.blogger.com/atom/ns#' term='machine learning'/><title type='text'>Text classification datasets with splits</title><content type='html'>&lt;div xmlns='http://www.w3.org/1999/xhtml'&gt;...can be found &lt;a href='http://www.cs.uiuc.edu/homes/dengcai2/Data/TextData.html'&gt;here.&lt;/a&gt; It is the 20 Newsgroup data set and the TDT2.&lt;br/&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7618678606233996595-2387848370370912748?l=readingsonkrd.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://readingsonkrd.blogspot.com/feeds/2387848370370912748/comments/default' title='Kommentare zum Post'/><link rel='replies' type='text/html' href='http://readingsonkrd.blogspot.com/2009/01/text-classification-datasets-with.html#comment-form' title='0 Kommentare'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7618678606233996595/posts/default/2387848370370912748'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7618678606233996595/posts/default/2387848370370912748'/><link rel='alternate' 
type='text/html' href='http://readingsonkrd.blogspot.com/2009/01/text-classification-datasets-with.html' title='Text classification datasets with splits'/><author><name>grani</name><uri>http://www.blogger.com/profile/01859478032669302450</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7618678606233996595.post-6617179639241810494</id><published>2009-01-23T00:15:00.001-08:00</published><updated>2009-01-23T00:15:30.029-08:00</updated><title type='text'>Statistical Machine Translation</title><content type='html'>&lt;div xmlns='http://www.w3.org/1999/xhtml'&gt;I recently stumbled over a reasonably good survey on Statistical Machine Translation by &lt;a href='http://portal.acm.org/citation.cfm?id=1380586&amp;amp;jmp=cit&amp;amp;coll=portal&amp;amp;dl=ACM#' target='_blank'&gt;Lopez [1]  &lt;/a&gt;. Starting with IBM Models 3 and 4, it explains the critical steps of machine translation:&lt;br/&gt;1. Selection of the translation model (e.g. transducers, synchronous context-free grammars) &lt;br/&gt;2. Parametrization of the model, i.e. which parameters can be learned (e.g. fertility of words, word alignment etc.)&lt;br/&gt;3. Parameter estimation, i.e. how to estimate the parameter values (e.g. using generative or discriminative statistical models)&lt;br/&gt;4. Decoding, i.e. translating new text based on the selected and parametrized model&lt;br/&gt;&lt;br/&gt;Overall, it contains some interesting, detailed insights into problems such as how to deal with sequences and the difference between discriminative and generative statistical models (see also &lt;a href='http://readingsonkrd.blogspot.com/2008/12/conditional-random-fields-and-graphical.html' target='_blank'&gt;CRF Introduction&lt;/a&gt;). 
Worth reading.&lt;br/&gt;&lt;br/&gt;Open Source Resources:&lt;br/&gt;[Moses] http://www.statmt.org/moses/&lt;br/&gt;[Overview] http://opentranslation.aspirationtech.org/index.php/Open_Source_Translation_Tools&lt;br/&gt;&lt;br/&gt;[1] Lopez, A. 2008. Statistical machine translation. ACM Comput. Surv. 40, 3 (Aug. 2008), 1-49. DOI: http://doi.acm.org/10.1145/1380584.1380586&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://readingsonkrd.blogspot.com/feeds/6617179639241810494/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://readingsonkrd.blogspot.com/2009/01/statistical-machine-translation.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7618678606233996595/posts/default/6617179639241810494'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7618678606233996595/posts/default/6617179639241810494'/><link rel='alternate' type='text/html' href='http://readingsonkrd.blogspot.com/2009/01/statistical-machine-translation.html' title='Statistical Machine Translation'/><author><name>grani</name><uri>http://www.blogger.com/profile/01859478032669302450</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7618678606233996595.post-8737232384188308697</id><published>2008-12-25T05:17:00.001-08:00</published><updated>2008-12-25T05:17:43.590-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Conditional Random Fields'/><category scheme='http://www.blogger.com/atom/ns#' 
term='Relational Learning'/><category scheme='http://www.blogger.com/atom/ns#' term='must-read'/><title type='text'>Conditional Random Fields and Graphical Models</title><content type='html'>&lt;div xmlns='http://www.w3.org/1999/xhtml'&gt;I am currently reading up on Conditional Random Fields and Hidden Markov Models in particular and Graphical Models in general. An excellent &lt;a href='http://www.bibsonomy.org/bibtex/22d49b7d1b3c2f6c4584b60390727405c/mgrani'&gt;tutorial&lt;/a&gt; on these topics is provided by Sutton and McCallum [1]. It is well worth reading and gives&lt;br/&gt;&lt;ul&gt;&lt;li&gt;a basic introduction to relational learning for graphical models&lt;/li&gt;&lt;li&gt;an overview of CRFs and HMMs as well as their applications&lt;br/&gt;&lt;/li&gt;&lt;li&gt;a comparison of generative vs. discriminative models (e.g. Naive Bayes vs. logistic regression)&lt;/li&gt;&lt;li&gt;parameter estimation, the forward-backward algorithm, and the application of general and linear-chain CRFs&lt;/li&gt;&lt;li&gt;an application of general CRFs (e.g. 
skip-chain CRF) for information extraction&lt;/li&gt;&lt;/ul&gt;A must-read for everybody interested in the state of the art in information extraction and relational learning.&lt;br/&gt;&lt;br/&gt;[1] &lt;a href='http://www.bibsonomy.org/bibtex/22d49b7d1b3c2f6c4584b60390727405c/mgrani'&gt;Introduction to Conditional Random Fields for Relational Learning&lt;/a&gt;&lt;div class='bmdesc'&gt;&lt;span style='color: rgb(85, 85, 85);'&gt;Charles &lt;a href='http://www.bibsonomy.org/author/Sutton'&gt;Sutton&lt;/a&gt; and Andrew &lt;a href='http://www.bibsonomy.org/author/Mccallum'&gt;McCallum&lt;/a&gt; &lt;/span&gt;&lt;em&gt;MIT Press, &lt;/em&gt;(&lt;em&gt;2006&lt;/em&gt;) &lt;/div&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://readingsonkrd.blogspot.com/feeds/8737232384188308697/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://readingsonkrd.blogspot.com/2008/12/conditional-random-fields-and-graphical.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7618678606233996595/posts/default/8737232384188308697'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7618678606233996595/posts/default/8737232384188308697'/><link rel='alternate' type='text/html' href='http://readingsonkrd.blogspot.com/2008/12/conditional-random-fields-and-graphical.html' title='Conditional Random Fields and Graphical Models'/><author><name>grani</name><uri>http://www.blogger.com/profile/01859478032669302450</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7618678606233996595.post-458650296265355272</id><published>2008-12-21T03:10:00.000-08:00</published><updated>2008-12-21T03:14:00.541-08:00</updated><title type='text'>What is it for?</title><content type='html'>This blog gathers material I read on knowledge relationship discovery or, in more detail, on machine learning, natural language processing, semantic technologies and related topics. &lt;br /&gt;Entries serve mostly as a personal record of things I have come across, so they may not be self-contained and may require some background in the above fields. However, if you can take something away, you are welcome.</content><link rel='replies' type='application/atom+xml' href='http://readingsonkrd.blogspot.com/feeds/458650296265355272/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://readingsonkrd.blogspot.com/2008/12/what-is-it-for.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7618678606233996595/posts/default/458650296265355272'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7618678606233996595/posts/default/458650296265355272'/><link rel='alternate' type='text/html' href='http://readingsonkrd.blogspot.com/2008/12/what-is-it-for.html' title='What is it for?'/><author><name>grani</name><uri>http://www.blogger.com/profile/01859478032669302450</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry></feed>
