How To Compare Two Strings By Meaning?
Solution 1:
Comparing the meaning of two strings is still an active research area. If you really want to solve the problem (or to get really good performance from your language model), you should consider getting a PhD.
For an out-of-the-box solution at the time of writing, I found this GitHub repo that builds on Google's BERT model and uses it to compute embeddings of two sentences. In theory, two sentences share the same meaning if their embeddings are similar.
https://github.com/UKPLab/sentence-transformers
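# the following is simplified from their README.md
from sentence_transformers import SentenceTransformer
import scipy.spatial.distance

embedder = SentenceTransformer('bert-base-nli-mean-tokens')
# Two example sentences to compare
S1 = ['A man is eating a food.']
S2 = ['A man is eating pasta.']
# encode() takes a list of sentences and returns one embedding per sentence
s1_embedding = embedder.encode(S1)
s2_embedding = embedder.encode(S2)
# cosine distance = 1 - cosine similarity; smaller means closer in meaning
dist = scipy.spatial.distance.cdist(s1_embedding, s2_embedding, "cosine")[0]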
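For reference, the "cosine" metric in `cdist` is the cosine distance, i.e. 1 minus the cosine similarity of the two vectors. A minimal pure-Python sketch of both quantities; the function names are hypothetical helpers, and the short vectors stand in for real BERT embeddings, which have hundreds of dimensions:

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between u and v: 1.0 = same direction, 0.0 = orthogonal."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def cosine_distance(u, v):
    """Same definition as SciPy's "cosine" metric: 1 - cosine similarity."""
    return 1.0 - cosine_similarity(u, v)

# Hypothetical toy vectors standing in for sentence embeddings
print(cosine_distance([1.0, 0.0], [2.0, 0.0]))  # 0.0 (same direction, different length)
print(cosine_distance([1.0, 0.0], [0.0, 3.0]))  # 1.0 (orthogonal)
```

Because only the angle matters, scaling an embedding does not change the distance, which is why cosine is a common choice for comparing embeddings.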
Example output (copied from their README.md; each score is 1 minus the cosine distance, so higher means more similar):
Query: A man is eating pasta.
Top 5 most similar sentences in corpus:
A man is eating a piece of bread. (Score: 0.8518)
A man is eating a food. (Score: 0.8020)
A monkey is playing drums. (Score: 0.4167)
A man is riding a horse. (Score: 0.2621)
A man is riding a white horse on an enclosed ground. (Score: 0.2379)
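A ranked list like the one above comes from scoring every corpus sentence against the query and sorting by score. A minimal sketch of that ranking step, assuming the embeddings are already computed; the `top_k` helper and the three-dimensional vectors are hypothetical stand-ins, not output of the real model:

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between u and v."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def top_k(query_vec, corpus, k):
    """Return the k (sentence, score) pairs most similar to query_vec, best first."""
    scored = [(sentence, cosine_similarity(query_vec, vec)) for sentence, vec in corpus.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Hypothetical toy embeddings; a real model would produce these with encode()
query = [1.0, 0.2, 0.0]
corpus = {
    "A man is eating a piece of bread.": [0.9, 0.3, 0.1],
    "A monkey is playing drums.": [0.3, 0.9, 0.2],
    "A man is riding a horse.": [0.1, 0.2, 0.9],
}

for sentence, score in top_k(query, corpus, k=2):
    print(f"{sentence} (Score: {score:.4f})")
```

For large corpora you would compute all scores in one vectorized call (as `cdist` does) rather than in a Python loop, but the ordering logic is the same.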
Solution 2:
To compare two strings by meaning, the strings first need to be converted to tensors (embeddings); then the distance or similarity between those tensors can be evaluated. Many algorithms can convert strings to tensors, and the right one depends on the domain of interest. The Universal Sentence Encoder, however, is a general-purpose sentence encoder that maps any word or sentence into the same 512-dimensional vector space. Cosine similarity can then be used to measure how close two words are in meaning.
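When embeddings have unit length, cosine similarity reduces to a plain dot product, so the cosine distance is simply 1 minus the dot product. TF.js's `tf.losses.cosineDistance` computes 1 minus the sum of the element-wise product along the given axis, so it equals the true cosine distance only for unit-length inputs; the Universal Sentence Encoder's outputs are approximately normalized, which is what makes that shortcut reasonable. A minimal sketch of the difference, with hypothetical helpers `normalize`, `dot` and `cosine_similarity` and made-up vectors:

```python
import math

def normalize(v):
    """Scale v to unit length (L2 norm of 1)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cosine_similarity(u, v):
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))

# Hypothetical raw (unnormalized) vectors
u = [3.0, 4.0]
v = [4.0, 3.0]

# "1 - dot" on raw vectors is not a cosine distance at all
print(1 - dot(u, v))  # -23.0
# After normalizing, "1 - dot" agrees with 1 - cosine similarity
print(1 - dot(normalize(u), normalize(v)))
print(1 - cosine_similarity(u, v))
```

If you are unsure whether an encoder's outputs are normalized, normalizing them yourself before a dot-product-based distance is cheap insurance.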
Example
Though king and kind are close in Hamming distance (they differ by only one character), they are very different in meaning. Conversely, queen and king look unrelated as strings (no character matches at the same position), yet they are close in meaning. Therefore the distance in meaning between king and queen should be smaller than between king and kind, as demonstrated in the following snippet.
<script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs"></script>
<script src="https://cdn.jsdelivr.net/npm/@tensorflow-models/universal-sentence-encoder"></script>
<script>
(async () => {
  const model = await use.load();
  // embed() returns a 2-D tensor (one 512-dim row per input); unstack() splits it into one tensor per word
  const embeddings = (await model.embed(['queen', 'king', 'kind'])).unstack();
  tf.losses.cosineDistance(embeddings[0], embeddings[1], 0).print(); // queen vs king: 0.39812755584716797
  tf.losses.cosineDistance(embeddings[1], embeddings[2], 0).print(); // king vs kind: 0.5585797429084778
})();
</script>
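For contrast, a character-level metric ranks the same pairs the other way round. Hamming distance is only defined for equal-length strings, so this sketch also uses Levenshtein (edit) distance to handle queen/king; both functions are illustrative helpers, not part of TF.js or any library mentioned above:

```python
def hamming(a, b):
    """Number of positions where two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs strings of equal length")
    return sum(x != y for x, y in zip(a, b))

def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions, or substitutions."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if characters match)
            ))
        prev = curr
    return prev[-1]

print(hamming("king", "kind"))       # 1
print(levenshtein("king", "kind"))   # 1
print(levenshtein("queen", "king"))  # 5
```

By these string measures king is much closer to kind than to queen, the opposite of the embedding distances printed by the TF.js snippet.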