Topic:  Evaluation metrics for Word Sense Disambiguation, and a new disambiguation method.

Abstract:
  Word Sense Disambiguation (WSD) is the process of resolving the meaning of a
word unambiguously in a given natural language context.  This is a major problem in 
natural language processing.  
  Training computers to catalogue and read the written word is a difficult task.  One major 
obstacle is the inherent ambiguity of English (and other languages).  Simple words such as 
"bat" or "bank" have multiple meanings and can serve as more than one part of speech.  A bat in one 
context is an instrument used to play a game, while in another it is a flying mammal.  Bank 
similarly corresponds to a financial institution, a building, or the side of a river.  Humans 
read and easily choose the meaning of a word based on the context of its usage and on 
past reading and experience.  Computers, on the other hand, do not have the same type of memory 
as we do, and cannot reason well enough to know which sense of the word is intended.
  A few WSD programs and algorithms have been created to try to deal with this problem.
I will discuss a few of the methods in use, grade them using a set of information retrieval metrics,
and discuss another approach to solving the problem.
  These methods are currently used in search applications to determine more 
specifically what the user wants, and are increasingly being used in other applications to make them
more user friendly, to predict what the user wants, and to provide an easier, natural 
language based user interface.







Basic Metrics for Information Retrieval and Classification
  The same metrics that are used for WSD are used for related information retrieval tasks such as 
content categorization (placing an article into a group based on the content of the article),
author determination (identifying the author of an unattributed article), machine translation (translating
words and text into other languages correctly), and fact or information extraction.


The metrics used in information extraction are borrowed from information retrieval: 
recall and precision. These metrics focus on how well the system performs at identifying 
the relevant information. Recall is the percentage of relevant information that is 
correctly reported by the system. Precision is the percentage of the information 
reported as relevant by the system that is correct. 
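
As a rough illustration of the two definitions above, the sketch below treats the
relevant items and the reported items as plain sets. The function name
recall_precision and the document ids are made up for the example; they do not
come from any of the systems discussed here.

```python
def recall_precision(relevant, reported):
    """Return (recall, precision) for a collection of reported items.

    relevant: items that really are relevant (the gold standard)
    reported: items the system reported as relevant
    """
    relevant = set(relevant)
    reported = set(reported)
    correct = relevant & reported  # relevant items the system actually found

    # Guard against empty sets so the ratios are always defined.
    recall = len(correct) / len(relevant) if relevant else 0.0
    precision = len(correct) / len(reported) if reported else 0.0
    return recall, precision


# Hypothetical data: four relevant documents, three reported by the system.
relevant = {"doc1", "doc2", "doc3", "doc4"}
reported = {"doc2", "doc3", "doc5"}

recall, precision = recall_precision(relevant, reported)
print(f"recall = {recall:.2f}, precision = {precision:.2f}")
# prints: recall = 0.50, precision = 0.67
```

Here the system finds two of the four relevant documents (recall 2/4) and two of
its three reports are correct (precision 2/3).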



The measures that we chose for the evaluation of our methods are those typically used in the language
engineering and machine learning literature: recall, precision and accuracy. The recall measure counts 
the number of words that are assigned the correct sense, out of the total number of words to be 
assigned a sense. This corresponds to the ratio of true positive examples to the total number of 
positives in the test data. Precision, on the other hand, counts the number of words assigned the
correct sense, out of the number of word-senses considered positive by the decision tree, i.e., 
the ratio of true positive to true and false positive examples. In addition to these two measures, 
the percentage of correct classifications (accuracy), which is a standard measure for machine learning 
methods, is used. In summary, the three ratios are:
  recall = TP/P,
  precision = TP/(TP+FP),
  accuracy = (TP+TN)/(P+N),
where TP/FP and TN/FN stand for True/False Positive and
True/False Negative, and P/N for the total number of Positive/Negative
examples in the test data (P = TP+FN, N = TN+FP).
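
The three ratios can be sketched in a few lines of Python. This is only a minimal
illustration, assuming each occurrence of an ambiguous word has one gold sense and
one predicted sense, and that the ratios are computed for a single target sense at
a time. The function name sense_metrics, the sense labels and the data are all
invented for the example.

```python
def sense_metrics(gold, predicted, target_sense):
    """Compute recall, precision and accuracy for one target sense.

    gold, predicted: equal-length lists of sense labels, one per word occurrence.
    An occurrence is a positive example if its gold label is target_sense.
    """
    tp = fp = tn = fn = 0
    for g, p in zip(gold, predicted):
        if p == target_sense and g == target_sense:
            tp += 1  # predicted the target sense, and it was correct
        elif p == target_sense:
            fp += 1  # predicted the target sense, but the gold sense differs
        elif g == target_sense:
            fn += 1  # missed an occurrence of the target sense
        else:
            tn += 1  # correctly did not predict the target sense

    positives = tp + fn  # P
    negatives = tn + fp  # N
    total = positives + negatives
    return {
        "recall": tp / positives if positives else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "accuracy": (tp + tn) / total if total else 0.0,
    }


# Hypothetical sense labels for eight occurrences of "bank".
gold = ["finance", "finance", "finance", "finance", "finance",
        "river", "river", "building"]
predicted = ["finance", "finance", "finance", "river", "building",
             "finance", "river", "building"]

metrics = sense_metrics(gold, predicted, "finance")
print(metrics)
# TP=3, FP=1, TN=2, FN=2, so recall = 3/5, precision = 3/4, accuracy = 5/8
```

Note that accuracy counts true negatives as successes, so for a rare sense it can
look high even when recall is poor.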



Senseval - Evaluation exercise for WSD systems, with a recurring workshop
  http://www.cs.unt.edu/~rada/senseval


Mini-Bibliography

Learning Rules for Large Vocabulary Word Sense Disambiguation
  http://iit.demokritos.gr/~paliourg/papers/IJCAI99.pdf
  (With many references to see)

Performance Metrics for Word Sense Disambiguation
  http://www.alta.asn.au/events/altss_w2003_proc/altw/papers/cohn-final.pdf

A Statistical Method for Word Sense Disambiguation - Thesis paper 283 pages
  http://crl.nmsu.edu/Research/Pubs/MCCS/pdf/mccs-95-283.pdf

Evaluating the results of a memory-based word-expert approach to
unrestricted word sense disambiguation.
  http://ixa.si.ehu.es/Ixa/local/meaning-workshop/papers/walter.pdf

Combining Contextual Features for Word Sense Disambiguation
  http://acl.ldc.upenn.edu/W/W02/W02-0813.pdf

Maximizing Semantic Relatedness to Perform Word Sense Disambiguation
  http://www.d.umn.edu/~tpederse/Pubs/max-sem-relate.pdf

A Perspective on Word Sense Disambiguation Methods and Their Evaluation
  http://www.georgetown.edu/faculty/ard8/Ling361/Resnik-Yarowsky.pdf

Word Sense Disambiguation: The State of the Art
  http://www.up.univ-mrs.fr/~veronis/pdf/1998wsd.pdf

Using Wordnet Lexical Database and Internet to Disambiguate Word Senses.
  http://glotta.ece.ntua.gr/nlp/publications/UsingInternettoDisambiguateWordSenses.pdf
  
Word-for-Word Glossing with Contextually Similar Words (Machine Translation)
  http://www.isi.edu/~pantel/Download/Papers/naacl00.pdf

http://www.informatics.sussex.ac.uk/users/juliewe/senseval3.pdf
http://www.stanford.edu/~lukeb/wsd.pdf


Senseval and WordNet are major resources; OMCS also has a disambiguation project (maybe Open Mind Word Expert, OMWE).

Also here:
http://engr.smu.edu/~rada/wnb/

Going to study the work of Rada Mihalcea.

Data Sets for OMWE http://teach-computers.org/downloads.html

Google search to find these
http://www.google.com/search?hl=en&lr=&q=metrics+recall+precision+accuracy+PDF+Word+Sense+Disambiguation