Token Classification

Token classification is a natural language understanding task in which a label is assigned to some tokens in a text. Some popular token classification subtasks are Named Entity Recognition (NER) and Part-of-Speech (PoS) tagging. NER models could be trained to identify specific entities in a text, such as dates, individuals and places; and PoS tagging would identify, for example, which words in a text are verbs, nouns, and punctuation marks.
https://huggingface.co/tasks/token-classification
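A minimal sketch with the 🤗 Transformers token-classification pipeline (the example sentence is made up; with no checkpoint specified, a default NER model is downloaded):

```python
from transformers import pipeline

# Token classification pipeline; aggregation groups sub-word tokens into whole entities.
ner = pipeline("token-classification", aggregation_strategy="simple")
ner("My name is Clara and I live in Berkeley, California.")
# Returns a list of entities, e.g. a PER span for "Clara" and a LOC span for "Berkeley, California".
```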
Text Generation
Text generation is the task of producing new text given another text as input. These models can, for example, complete incomplete text or paraphrase it.
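A minimal sketch with the text-generation pipeline (the gpt2 checkpoint and the prompt are only illustrative choices):

```python
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")
# Continue an incomplete prompt with up to 30 newly generated tokens.
generator("In a distant future, humanity has", max_new_tokens=30)
```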
Text Classification is the task of assigning a label or class to a given text. Some use cases are sentiment analysis, natural language inference, and assessing grammatical correctness.

About Text Classification
https://youtu.be/leNG9fN9FQU

Use Cases
Sentiment Analysis on Customer Reviews
You can track the sentiment of your customers from product reviews using sentiment analysis models. Grouping reviews by sentiment helps you understand churn and retention, analyze the underlying text, and make strategic decisions based on that knowledge.
https://huggingface.co/tasks/text-classification
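For instance, a minimal sketch with the text-classification pipeline (the review is made up; the default sentiment checkpoint is downloaded automatically):

```python
from transformers import pipeline

classifier = pipeline("text-classification")
# Each result contains a label (e.g. POSITIVE or NEGATIVE) and a confidence score.
classifier("Battery life is great, but the screen scratches far too easily.")
```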
Table Question Answering (Table QA) is the task of answering a question about information in a given table.
About Table Question Answering
Use Cases
SQL execution
You can use Table Question Answering models to simulate SQL execution by providing a table as input.

Table Question Answering
Table Question Answering models are capable of answering questions based on a table.

Task Variants
This section can be filled with variants of this task if there are any.

Inference
You can run inference with Table QA models using the 🤗 Transformers library, as in the sketch below.
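(A hedged sketch: the table contents are made up; the default TAPAS checkpoint is downloaded automatically and may need extra dependencies such as pandas.)

```python
from transformers import pipeline

table_qa = pipeline("table-question-answering")
# Tables are passed as a dict of columns (cell values as strings) or a pandas DataFrame.
table = {
    "Repository": ["Transformers", "Datasets", "Tokenizers"],
    "Stars": ["36542", "4512", "3934"],
}
table_qa(table=table, query="How many stars does the Transformers repository have?")
```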

https://huggingface.co/tasks/table-question-answering
Summarization is the task of producing a shorter version of a document while preserving its important information. Some models can extract text from the original input, while other models can generate entirely new text.

About Summarization
https://youtu.be/yHnr5Dk2zCI

Use Cases
Research Paper Summarization 🧐
Research papers can be summarized to allow researchers to spend less time selecting which articles to read. There are several approaches you can take for a task like this:

Use an existing extractive summarization model on the Hub to do inference.
Pick an existing language model trained for academic papers. This model can then be trained in a process called fine-tuning so it can solve the summarization task.
Use a sequence-to-sequence model like T5 for abstractive text summarization, as in the sketch after this list.
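As a sketch of that last approach (the t5-small checkpoint and the placeholder text are only illustrative):

```python
from transformers import pipeline

summarizer = pipeline("summarization", model="t5-small")
# Abstractive summarization: the output can contain wording not found in the input.
text = "Replace this with the body of the paper you want to condense."
summarizer(text, max_length=60, min_length=10)
```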
https://huggingface.co/tasks/summarization
Sentence Similarity is the task of determining how similar two texts are. Sentence similarity models convert input texts into vectors (embeddings) that capture semantic information and calculate how close (similar) those vectors are to each other. This task is particularly useful for information retrieval and clustering/grouping.

About Sentence Similarity

https://youtu.be/VCZq5AkbNEU

Use Cases 🔍
Information Retrieval
You can extract information from documents using Sentence Similarity models. The first step is to rank documents using Passage Ranking models. You can then take the top-ranked document and search it with Sentence Similarity models by selecting the sentence that is most similar to the input query.
https://huggingface.co/tasks/sentence-similarity
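A minimal sketch with the sentence-transformers library (the all-MiniLM-L6-v2 checkpoint and the example sentences are illustrative):

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")
sentences = [
    "How do I reset my password?",                    # query
    "Follow these steps to change your password.",    # candidate 1
    "Our office is closed on public holidays.",       # candidate 2
]
embeddings = model.encode(sentences)
# Cosine similarity between the query and each candidate; higher means more similar.
util.cos_sim(embeddings[0], embeddings[1:])
```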
Question Answering models can retrieve the answer to a question from a given text, which is useful for searching for an answer in a document. Some question answering models can generate answers without context!

About Question Answering
https://youtu.be/ajPx5LwJD-I

Use Cases
Frequently Asked Questions
You can use Question Answering (QA) models to automate the response to frequently asked questions by using a knowledge base (documents) as context. Answers to customer questions can be drawn from those documents.

⚡️⚡️ If you’d like to save inference time, you can first use passage ranking models to see which document might contain the answer to the question and iterate over that document with the QA model instead.
https://huggingface.co/tasks/question-answering
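A minimal sketch with the question-answering pipeline (the question and context are made up; the default extractive QA checkpoint is downloaded automatically):

```python
from transformers import pipeline

qa = pipeline("question-answering")
qa(
    question="How long do I have to return an item?",
    context="Items can be returned to our warehouse within 30 days of purchase, "
            "provided they are unused and in the original packaging.",
)
# Returns the answer span, a score, and its character offsets in the context.
```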
Fill-Mask
Masked language modeling is the task of masking some of the words in a sentence and predicting which words should replace those masks. These models are useful when we want to get a statistical understanding of the language the model was trained on.

About Fill-Mask
https://youtu.be/mqElG5QJWUg

https://huggingface.co/tasks/fill-mask
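A minimal sketch with the fill-mask pipeline (bert-base-uncased is just one example checkpoint; its mask token is [MASK]):

```python
from transformers import pipeline

unmasker = pipeline("fill-mask", model="bert-base-uncased")
# Predicts the most likely tokens for the masked position, with scores.
unmasker("Paris is the [MASK] of France.")
```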
Feature extraction is the task of extracting the features a model has learned from a given input.

About Feature Extraction
Use Cases
Transfer Learning
Models trained on a specific dataset can learn features about the data. For instance, a model trained on an English poetry dataset learns English grammar at a very high level. This information can be transferred to a new model that is going to be trained on tweets. This process of extracting features and transferring them to another model is called transfer learning. One can pass their dataset through a feature extraction pipeline and feed the result to a classifier.
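A minimal sketch of that last step with the feature-extraction pipeline (bert-base-uncased is only an illustrative checkpoint; the downstream classifier is left out):

```python
from transformers import pipeline

extractor = pipeline("feature-extraction", model="bert-base-uncased")
# Nested list shaped roughly [batch, tokens, hidden_size]; these vectors can feed a classifier.
features = extractor("This tweet is about transfer learning.")
```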
https://huggingface.co/tasks/feature-extraction
Reinforcement learning is the computational approach of learning from action by interacting with an environment through trial and error and receiving rewards (negative or positive) as feedback.

About Reinforcement Learning
https://www.youtube.com/watch?v=q0BiUn5LiBc
Gaming
Reinforcement learning is known for its application to video games. Games provide a safe environment for training an agent: they are perfectly defined and controllable, which makes them ideal candidates for experimentation and helps a lot in learning about the capabilities and limitations of various RL algorithms.

At the start of training the agent's actions are largely random, but over iterations, the agent gets better and better with each episode of training. Work in this area investigates the performance of RL in popular games such as Minecraft or Dota 2.
https://huggingface.co/tasks/reinforcement-learning
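The interaction loop itself is easy to sketch. Below is a hedged example using the Gymnasium library, with a random policy standing in for a learned agent (the environment name and step count are arbitrary):

```python
import gymnasium as gym

env = gym.make("CartPole-v1")
observation, info = env.reset(seed=0)
total_reward = 0.0
for _ in range(200):
    # A trained agent would pick actions from a learned policy; here we sample at random.
    action = env.action_space.sample()
    observation, reward, terminated, truncated, info = env.step(action)
    total_reward += reward
    if terminated or truncated:
        observation, info = env.reset()
env.close()
```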
Visual Question Answering is the task of answering open-ended questions based on an image. Visual Question Answering models take an image and a natural language question as input and output a natural language answer.
https://huggingface.co/tasks/visual-question-answering
About Visual Question Answering
Use Cases
Aiding Visually Impaired Persons
VQA models can be used to reduce visual barriers for visually impaired individuals by allowing them to get information about images from the web and the real world.

Education
VQA models can be used to improve experiences at museums by allowing visitors to directly ask questions they are interested in.

Improved Image Retrieval
Visual question answering models can be used to retrieve images with specific characteristics. For example, the user can ask "Is there a dog?" to find all images with dogs from a set of images.

Video Search
Specific snippets/timestamps of a video can be retrieved based on search queries. For example, the user can ask "At which part of the video does the guitar appear?" and get a specific timestamp range from the whole video.
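A minimal sketch with the visual-question-answering pipeline (the ViLT checkpoint and the local image path are illustrative):

```python
from transformers import pipeline

vqa = pipeline("visual-question-answering", model="dandelin/vilt-b32-finetuned-vqa")
# Accepts a local path, URL, or PIL image together with a natural language question.
vqa(image="photo.jpg", question="Is there a dog?")
```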
Video-Text-to-Text
No summary available for this task. Contribute by adding a nice description!

Placeholder page
This is a placeholder page for the Video-Text-to-Text task. It is incomplete and is open to community contributions.

How can I contribute?
You can contribute to the various sections of this page, including datasets, metrics and models related to this task by opening a pull request to this repository.
https://huggingface.co/tasks/video-text-to-text
Image-text-to-text models take in an image and text prompt and output text. These models are also called vision-language models, or VLMs. The difference from image-to-text models is that these models take an additional text input, not restricting the model to certain use cases like image captioning, and may also be trained to accept a conversation as input.

About Image-Text-to-Text
https://youtu.be/IoGaGfU1CIg
Different Types of Vision Language Models
Vision language models come in three types:

Base: Pre-trained models that can be fine-tuned. A good example of a base model is the PaliGemma family of models by Google.
Instruction: Base models fine-tuned on instruction datasets. A good example of instruction fine-tuned models is idefics2-8b.
Chatty/Conversational: Base models fine-tuned on conversation datasets. A good example of chatty models is deepseek-vl-7b-chat.
https://huggingface.co/tasks/image-text-to-text
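A hedged sketch, assuming a recent 🤗 Transformers release that ships the image-text-to-text pipeline (the LLaVA checkpoint and the image URL are illustrative):

```python
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="llava-hf/llava-interleave-qwen-0.5b-hf")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": "https://example.com/cat.jpg"},
            {"type": "text", "text": "What do you see in this image?"},
        ],
    }
]
# The conversation format lets the same model handle captioning, VQA, and chat-style prompts.
pipe(text=messages, max_new_tokens=40)
```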
Document Question Answering (also known as Document Visual Question Answering) is the task of answering questions on document images. Document question answering models take a (document, question) pair as input and return an answer in natural language. Models usually rely on multi-modal features, combining text, position of words (bounding-boxes) and image.

About Document Question Answering
Use Cases
Document Question Answering models can be used to answer natural language questions about documents. Typically, document QA models consider textual, layout and potentially visual information. This is useful when the question requires some understanding of the visual aspects of the document. Nevertheless, certain document QA models can work without document images. Hence the task is not limited to visually-rich documents and allows users to ask questions based on spreadsheets, text PDFs, etc!
https://huggingface.co/tasks/document-question-answering
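A hedged sketch with the document-question-answering pipeline (impira/layoutlm-document-qa is one common checkpoint; the image path is a placeholder, and this model needs OCR via pytesseract):

```python
from transformers import pipeline

doc_qa = pipeline("document-question-answering", model="impira/layoutlm-document-qa")
# The document is passed as an image (scan or photo); the answer comes back as text.
doc_qa(image="invoice.png", question="What is the invoice number?")
```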
Tabular regression is the task of predicting a numerical value given a set of attributes.
About Tabular Regression
About the Task
Tabular regression is the task of predicting a numerical value given a set of attributes/features. Tabular means that data is stored in a table (like an Excel sheet), and each sample is contained in its own row. The features used to predict our target can be both numerical and categorical. However, including categorical features often requires additional preprocessing/feature engineering (a few models, like CatBoost, do accept categorical features directly). An example of tabular regression would be predicting the weight of a fish given its species and length.
https://huggingface.co/tasks/tabular-regression
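A minimal sketch of the fish-weight example with scikit-learn (all numbers are made up; species is encoded as an integer id for simplicity):

```python
from sklearn.ensemble import RandomForestRegressor

# Features: [length_cm, species_id]; target: weight in grams (illustrative values only).
X = [[23.2, 0], [25.4, 0], [30.0, 1], [38.5, 1], [41.0, 2]]
y = [242.0, 290.0, 340.0, 430.0, 500.0]

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X, y)
model.predict([[28.0, 0]])  # predicted weight for a new fish
```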
Text-to-Speech
Text-to-Speech (TTS) is the task of generating natural sounding speech given text input. TTS models can be extended to have a single model that generates speech for multiple speakers and multiple languages.

About Text-to-Speech
https://youtu.be/NW62DpzJ274

Use Cases
Text-to-Speech (TTS) models can be used in any speech-enabled application that requires converting text to speech that imitates the human voice.
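A minimal sketch with the text-to-speech pipeline (suno/bark-small is just one example checkpoint):

```python
from transformers import pipeline

tts = pipeline("text-to-speech", model="suno/bark-small")
# Returns a dict with a raw "audio" waveform and its "sampling_rate".
speech = tts("Hello, and welcome to this demo!")
```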
Automatic Speech Recognition
Automatic Speech Recognition (ASR), also known as Speech to Text (STT), is the task of transcribing a given audio to text. It has many applications, such as voice user interfaces.

About Automatic Speech Recognition
https://youtu.be/TksaY_FDgnk
Use Cases
Virtual Speech Assistants
Many edge devices have an embedded virtual assistant to interact with end users better. These assistants rely on ASR models to recognize different voice commands and perform various tasks. For instance, you can ask your phone to dial a phone number, ask a general question, or schedule a meeting.
https://huggingface.co/tasks/automatic-speech-recognition
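A minimal sketch with the automatic-speech-recognition pipeline (the Whisper checkpoint and the audio file name are illustrative):

```python
from transformers import pipeline

asr = pipeline("automatic-speech-recognition", model="openai/whisper-small")
# Accepts a local file path, URL, or raw waveform and returns the transcribed text.
asr("meeting_recording.wav")
```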
Audio-to-Audio
Audio-to-Audio is a family of tasks in which the input is audio and the output is one or more generated audios. Some example tasks are speech enhancement and source separation.
Use Cases
Speech Enhancement (Noise removal)
Speech Enhancement is a bit self-explanatory. It improves (or enhances) the quality of audio by removing noise. There are multiple libraries for solving this task, such as Speechbrain, Asteroid and ESPNet. Here is a simple example using Speechbrain:
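(A hedged sketch: the checkpoint and the noisy input file are illustrative, and newer SpeechBrain releases expose the same class under speechbrain.inference.)

```python
from speechbrain.pretrained import SpectralMaskEnhancement

enhancer = SpectralMaskEnhancement.from_hparams(
    source="speechbrain/mtl-mimic-voicebank",
    savedir="pretrained_models/mtl-mimic-voicebank",
)
# Removes noise from the recording and returns the enhanced waveform.
enhanced = enhancer.enhance_file("noisy_speech.wav")
```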
https://huggingface.co/tasks/audio-to-audio
Audio Classification
Audio classification is the task of assigning a label or class to a given audio. It can be used for recognizing which command a user is giving or the emotion of a statement, as well as identifying a speaker.
About Audio Classification

Use Cases
Command Recognition
Command recognition or keyword spotting classifies utterances into a predefined set of commands. This is often done on-device for fast response time.
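A minimal sketch with the audio-classification pipeline (the keyword-spotting checkpoint and the audio file name are illustrative):

```python
from transformers import pipeline

classifier = pipeline("audio-classification", model="superb/hubert-base-superb-ks")
# Scores the clip against a fixed set of spoken commands (e.g. "yes", "no", "stop", "go").
classifier("spoken_command.wav")
```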