What is Sentence Transformers?Sentence embeddings? Semant...

What is Sentence Transformers?
Sentence embeddings? Semantic search? Cosine similarity?!?! 😱 Just a few short weeks ago, these terms were so confusing to me that they made my head spin. I’d heard that Sentence Transformers was a powerful and versatile library for working with language and image data and I was eager to play around with it, but I was worried that I would be out of my depth. As it turns out, I couldn’t have been more wrong!

Sentence Transformers is among the libraries that Hugging Face integrates with, where it’s described with the following:

Compute dense vector representations for sentences, paragraphs, and images

In a nutshell, Sentence Transformers answers one question: What if we could treat sentences as points in a multi-dimensional vector space? This means that ST lets you give it an arbitrary string of text (e.g., “I’m so glad I learned to code with Python!”), and it’ll transform it into a vector, such as [0.2, 0.5, 1.3, 0.9]. Another sentence, such as “Python is a great programming language.”, would be transformed into a different vector. These vectors are called “embeddings,” and they play an essential role in Machine Learning. If these two sentences were embedded with the same model, then both would coexist in the same vector space, allowing for many interesting possibilities.

What makes ST particularly useful is that, once you’ve generated some embeddings, you can use the built-in utility functions to compare how similar one sentence is to another, including synonyms! 🤯 One way to do this is with the “Cosine Similarity” function. With ST, you can skip all the pesky math and call the very handy util.cos_sim function to get a score from -1 to 1 that signifies how “similar” the embedded sentences are in the vector space they share – the bigger the score is, the more similar the sentences are!