Readings on Knowledge Relationship Discovery: January 2009

I recently stumbled over the ACM SIGKDD 09 Call for Papers, which contains an excellent and comprehensive guide to writing a good research paper... at least for data-intensive domains ;)

You can find the link here. The important part is also quoted below:

"In writing your paper, we suggest you try to address the following questions, credited to George Heilmeier:

What are you trying to do? Articulate your objectives using absolutely no jargon.
How is it done today, and what are the limits of current practice?
What's new in your approach and why do you think it will be successful?
Who cares?
If you're successful, what difference will it make?
What are the risks and the payoffs? (in other words, what are the limitations and strengths of your work)
What are the midterm and final "exams" to check for success? (in other words, what are the measures of evaluation and evidence of success)

In light of the above principles, we suggest
the following guidelines for the paper content. Note that the headings
and the structure below are meant to be general categories; please
exercise your discretion and creativity to make the paper as
comprehensible as possible to the readers and reviewers.

Abstract

Try to include the following:

Motivation: one or two sentences on the problem and its significance;
Results: a short paragraph on approach and results;
Availability: a link to code, data, and supplementary materials,
or a statement why this is not possible.

Motivation & Significance

What is the problem and why is it important or significant?

Problem Statement

Formal definition of the problem with any preliminary concepts.

Prior Work & Limitations

What are the existing approaches, and their limitations?

Theory/Algorithm

Discuss the main theoretical or algorithmic ideas of the paper;
Mention the main theorems (if any), the intuition behind those, and their
practical application. Move the proofs to the appendix, unless the
proof itself is the main contribution;
Discuss your algorithmic solution (if any) at the conceptual level with
pseudo-code, to convey the main ideas. Move minute (but
practically important) implementation details to the appendix;
Discuss why you chose certain paths, and discuss unfruitful
paths that you discarded. In other words, give both the
theoretical and/or algorithmic "insights" into your work.

Experiments or other Evidence of Success

Complete parameter settings and data descriptions should be
provided (including any links to public resources);
Clearly specify the experimental procedure, including evaluation
measures;
Compare to prior solutions, or at least to "strawman" solutions;
Clearly discuss the results and what they mean;
Only include the most relevant experiments here, using the
appendix to provide any additional results (say on minor parameter
tuning of your method, etc).

Discussion and Future Work

Describe insights you gained, the limitations and applicability of
your work, and directions for future research. Every solution has
limitations, which should be explicitly mentioned.

References

Include the most relevant works, making sure all citations are complete
(including editors, publishers, page numbers, etc.).

APPENDIX

You should use the appendix for supporting details. For example, you
may use it to convey detailed technical/practical aspects of your
implementation. You may use the appendix for theorem proofs, or for
additional experimental results. Include pointers in the
main paper to relevant sections in the appendix.

The appendix is an integral part of the paper, since it will provide
details that are important for a proper appreciation of your work
(e.g., for replicating or extending it, or for comparison).
However, it should be possible on a first read-through to get a good
understanding of the paper's contribution from the main part alone.
Structuring the paper in this way provides a service to the reader,
by separating main ideas from technical details."

I recently stumbled over a reasonably good survey on statistical machine translation by Lopez [1]. Starting with the IBM Model 3 and 4, it explains the critical steps of machine translation:
1. Selection of the translational model (e.g. finite-state transducers, synchronous context-free grammars)
2. Parametrization of the model, i.e. which parameters can be learned (e.g. fertility of words, word alignment etc.)
3. Parameter estimation, i.e. how to estimate the values of those parameters (e.g. using generative or discriminative statistical models)
4. Decoding, i.e. translating new text with the selected and parametrized model
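
To make the four steps a bit more concrete, here is a minimal sketch in Python. It is my own illustration, not code from the survey, and it rests on several assumptions: it uses IBM Model 1, which is much simpler than the Models 3 and 4 mentioned above, it omits the NULL word, the three-sentence corpus is a toy example, and the function names `train_ibm_model1` and `decode` are hypothetical. The only parameters are the word translation probabilities t(f | e), they are estimated with EM, and "decoding" is a crude word-by-word lookup without a language model.

```python
from collections import defaultdict


def train_ibm_model1(bitext, iterations=20):
    """Estimate t(f | e) with EM (simplified IBM Model 1, no NULL word)."""
    source_vocab = {f for fs, _ in bitext for f in fs}
    uniform = 1.0 / len(source_vocab)
    # Step 2 (parametrization): the parameters are t(f | e), initialised uniformly.
    t = defaultdict(lambda: uniform)

    # Step 3 (parameter estimation): expectation maximisation.
    for _ in range(iterations):
        count = defaultdict(float)  # expected count of f being aligned to e
        total = defaultdict(float)  # expected count of e being aligned at all
        # E-step: distribute each source word over the target words of its pair.
        for fs, es in bitext:
            for f in fs:
                norm = sum(t[(f, e)] for e in es)
                for e in es:
                    posterior = t[(f, e)] / norm
                    count[(f, e)] += posterior
                    total[e] += posterior
        # M-step: renormalise the expected counts per target word e.
        t = defaultdict(float)
        for (f, e), c in count.items():
            t[(f, e)] = c / total[e]
    return dict(t)


def decode(source_tokens, t):
    """Step 4 (decoding), heavily simplified: pick argmax_e t(f | e) per word."""
    target_vocab = sorted({e for _, e in t})
    output = []
    for f in source_tokens:
        best = max(target_vocab, key=lambda e: t.get((f, e), 0.0))
        # Unknown source words are passed through unchanged.
        output.append(best if t.get((f, best), 0.0) > 0.0 else f)
    return output


# Toy parallel corpus (German source, English target), invented for illustration.
bitext = [
    (["das", "Haus"], ["the", "house"]),
    (["das", "Buch"], ["the", "book"]),
    (["ein", "Buch"], ["a", "book"]),
]
t = train_ibm_model1(bitext)
print(decode(["ein", "Haus"], t))
```

Note that the sentence "ein Haus" never occurs in the toy corpus; the word pairings are recovered only from co-occurrence statistics. A real decoder would also score word order and fluency with a language model, which this sketch leaves out.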

Overall, it offers some interesting, detailed insights into problems such as how to deal with sequences and the difference between discriminative and generative statistical models (see also CRF Introduction). Worth reading.
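
As a rough reminder of that distinction, written in my own notation rather than quoted from the survey: a generative model describes the joint distribution of source sentence f and translation e and decodes via Bayes' rule, while a discriminative model describes the conditional distribution directly. The feature functions h_k and weights λ_k below are generic placeholders.

```latex
% Generative (noisy-channel) view: model p(e, f) = p(e) p(f | e)
\hat{e} = \arg\max_{e} p(e \mid f) = \arg\max_{e} p(e)\, p(f \mid e)

% Discriminative (log-linear) view: model p(e | f) directly
p(e \mid f) = \frac{\exp\left(\sum_{k} \lambda_k h_k(e, f)\right)}
                   {\sum_{e'} \exp\left(\sum_{k} \lambda_k h_k(e', f)\right)}
```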

Open Source Resources:
[Moses] http://www.statmt.org/moses/
[Overview] http://opentranslation.aspirationtech.org/index.php/Open_Source_Translation_Tools

[1] Lopez, A. 2008. Statistical machine translation. ACM Comput. Surv. 40, 3 (Aug. 2008), 1-49. DOI: http://doi.acm.org/10.1145/1380584.1380586

Thursday, 29 January 2009

Mini How-to write a KRD/KDD Research Paper

Sunday, 25 January 2009

Text classification datasets with splits

Friday, 23 January 2009

Statistical Machine Translation
