In these cases, search users see the terms as almost 100% overlapping in meaning. It’s important to understand how strict this is: tagging two terms as true alternate terms should be done with a great deal of care. This isn’t really a ‘synonym’ as you would find in a thesaurus. You’ll see what we mean in the ‘Synonyms’ section below.

Expected Search Engine Scoring of Alternate Terms

In the case of alternate terms, we want the terms to be scored exactly the same. A document containing ‘color’ is just as much about the search term ‘colour’. As you might guess, alternate terms are the easiest use case. The default Elasticsearch/Solr synonym functionality (SynonymQuery) works pretty well for it.

Synonyms (in the linguistic sense…)

Webster’s defines a synonym as a word with the same or nearly the same meaning. Sounds the same as an alternate term! In practice, though, if you look at a thesaurus, the closeness can be messy and highly contextual. Look at the synonyms for ‘castle’ and you see this problem:

château, estate, hacienda, hall, manor, manor house, manse, mansion, palace, villa

To most, ‘palace’ has a different connotation than ‘castle’. Though we humans see them as having ‘nearly the same meaning’, how ‘near’ depends on the search corpus, domain, user, and use cases. A historian sees a castle as a defensive structure, synonymous with a fortress. To the historian, this is quite different from a palace, which is a fancy home for nobility. A tourist, however, sees them as nearly the same kind of place to visit: maybe not exactly the same, but pretty close to interchangeable.

Most people’s Solr and Elasticsearch synonyms files fit into this category: a messy mish-mash of loosely related terms. You might see, for example:

blue jeans,levis,jeans,denim jeans,denim

Many search teams throw these terms in without thinking too deeply about how the search engine should treat them. Then they’re surprised that the search engine doesn’t prioritize them the way they think it should. And sadly, every human being and corpus has a subtly different sense of how and where these terms should be treated as more or less similar.

Expected Search Engine Scoring of Synonyms
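As a baseline, it helps to see what "score these terms exactly the same" means in practice. Lucene’s SynonymQuery is documented as scoring its terms as if they had been indexed as one term, summing their term frequencies. The Python below is a minimal toy model of that idea using a simplified BM25. The function names, the sample corpus, and the choice to count document frequency as "documents containing any term in the group" are all assumptions for illustration. This is not how Lucene, Solr, or Elasticsearch actually compute their statistics.

```python
import math
from collections import Counter


def parse_equivalence_line(line: str) -> set[str]:
    """Parse a comma-separated synonyms line (no '=>') into a group of
    terms to be treated as equivalent, e.g. 'color, colour'."""
    return {term.strip().lower() for term in line.split(",") if term.strip()}


def blended_bm25(group: set[str], doc: list[str], corpus: list[list[str]],
                 k1: float = 1.2, b: float = 0.75) -> float:
    """Toy BM25 that scores every term in `group` as if it were one term."""
    n_docs = len(corpus)
    avg_len = sum(len(d) for d in corpus) / n_docs
    # Assumption: document frequency = documents containing ANY group term.
    # Lucene's SynonymQuery combines its statistics its own way; this is a sketch.
    df = sum(1 for d in corpus if group & set(d))
    # Term frequency: sum the counts of every term in the group.
    counts = Counter(doc)
    tf = sum(counts[t] for t in group)
    if tf == 0:
        return 0.0
    idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
    length_norm = k1 * (1 - b + b * len(doc) / avg_len)
    return idf * tf * (k1 + 1) / (tf + length_norm)


# Hypothetical pre-tokenized corpus for illustration.
CORPUS = [
    ["the", "castle", "walls", "held"],
    ["the", "palace", "walls", "shone"],
    ["color", "theory", "basics"],
    ["colour", "theory", "basics"],
]

if __name__ == "__main__":
    for line in ["color, colour", "castle,palace"]:
        group = parse_equivalence_line(line)
        scores = [round(blended_bm25(group, d, CORPUS), 3) for d in CORPUS]
        print(line, "->", scores)
```

With the line color, colour the two ‘theory’ documents tie, which is exactly what we want for alternate terms. Feed the same blending castle,palace and the castle and palace documents tie too. That is the behavior a historian would not want.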