
This queue is for tickets about the WordNet-Similarity CPAN distribution.

Report information
The Basics
Id: 86444
Status: open
Priority: 0/
Queue: WordNet-Similarity

People
Owner: Nobody in particular
Requestors: TPEDERSE [...] cpan.org
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: (no value)



Subject: possible bug in hso, strong matching of compounds
This was reported by Hideki Shima of CMU.

-----------------------------------------------------
(4) HSO: strong match with compound words
-----------------------------------------------------

According to the definition from the paper by Hirst and St-Onge, "any link between two synsets if one word is a compound word or phrase that includes the other word" is a "strong relation" (score of 16).

For example, two synsets 01124794 (n) and 01125562 (n) have a hypernym/hyponym link between them, and words associated with these synsets are compound (government <--> misgovernment). So following the definition, I think there is a "strong relation" between the two synsets.

Now, using word-pos-sensenumber notation, the synset 01124794 (n) can be represented as government#n#2 etc, and the other synset 01125562 (n) can be represented in two ways: "misgovernment#n#1" and "misrule#n#1" (using WordNet 3.0).

WordNet::Similarity gives different results for different wps of same synset:

The relatedness of government#n#2 and misgovernment#n#1 using hso is 16.
The relatedness of government#n#2 and misrule#n#1 using hso is 4.

I was wondering if the line 329 in hso.pm:

    if($word1 =~ /$word2/ || $word2 =~ /$word1/) {

should ideally be a comparison between all words associated with the synsets, rather than the words from wps notation.

Below are some more examples.

protocol#n#1    tcp/ip#n#1(=transmission_control_protocol/internet_protocol#n#1)
company#n#1     ltd.#n#1(=limited_company#n#1)
cell_phone#v#1  call#v#3(=phone#v#1)

This phenomenon is also very rare and has not been observed in 10k randomly generated noun-noun pairs of synsets.
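
To make the suggestion concrete, here is a minimal sketch in Python (not a patch for hso.pm, which is Perl) of the difference between testing only the two words taken from the wps strings and testing every word of both synsets. The function names are hypothetical, and the synset word lists are typed in by hand from the examples in this report rather than read from WordNet; only the words named above are included. It uses a plain substring test, on the assumption that this is what the quoted regex comparison is meant to do.

```python
from itertools import product


def words_overlap(word1: str, word2: str) -> bool:
    """Substring test in either direction, like the check quoted from hso.pm."""
    return word1 in word2 or word2 in word1


def wps_level_match(wps1: str, wps2: str) -> bool:
    """Behaviour described in the report: compare only the words in the wps strings."""
    word1 = wps1.split("#")[0]
    word2 = wps2.split("#")[0]
    return words_overlap(word1, word2)


def synset_level_match(words1, words2) -> bool:
    """Suggested behaviour: compare every word of one synset with every word of the other."""
    return any(words_overlap(a, b) for a, b in product(words1, words2))


# Hand-entered word lists (assumption: only the words mentioned in this ticket).
government_synset = ["government"]
misgovernment_synset = ["misgovernment", "misrule"]

# The wps-level check depends on which member of the synset was named.
print(wps_level_match("government#n#2", "misgovernment#n#1"))  # True
print(wps_level_match("government#n#2", "misrule#n#1"))        # False

# The synset-level check gives the same answer for either wps.
print(synset_level_match(government_synset, misgovernment_synset))  # True
```

With a synset-level comparison the result no longer depends on which wps the caller happened to pass in, which is the inconsistency described above. Whether the extra lookups are acceptable inside hso.pm is a separate question that this sketch does not address.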
The problem has been documented in the TODO list of WordNet-Similarity 2.07. Patches are welcome :)