The Community Computer Vision Course
Dear learner,

Welcome to the community-driven course on computer vision. Computer vision is revolutionizing our world in many ways, from unlocking phones with facial recognition to analyzing medical images for disease detection, monitoring wildlife, and creating new images. Together, we’ll dive into the fascinating world of computer vision!

Throughout this course, we’ll cover everything from the basics to the latest advancements in computer vision. It’s structured to include various foundational topics, making it friendly and accessible for everyone. We’re delighted to have you join us for this exciting journey!

On this page, you can find how to join the learners community, make a submission and get a certificate, and more details about the course!
https://huggingface.co/learn/computer-vision-course/unit0/welcome/welcome
The 🤗 Deep Reinforcement Learning Course

Welcome to the most fascinating topic in Artificial Intelligence: Deep Reinforcement Learning.

This course will teach you about Deep Reinforcement Learning from beginner to expert. It’s completely free and open-source!

In this introduction unit you’ll:

Learn more about the course content.
Define the path you’re going to take (either self-audit or certification process).
Learn more about the AI vs. AI challenges you’re going to participate in.
Learn more about us.
Create your Hugging Face account (it’s free).
Sign-up to our Discord server, the place where you can chat with your classmates and us (the Hugging Face team).
Let’s get started!
https://huggingface.co/learn/deep-rl-course/unit0/introduction
NLP Course

This course will teach you about natural language processing (NLP) using libraries from the Hugging Face ecosystem — 🤗 Transformers, 🤗 Datasets, 🤗 Tokenizers, and 🤗 Accelerate — as well as the Hugging Face Hub. It’s completely free and without ads.
https://huggingface.co/learn/nlp-course/chapter1/1

Natural Language Processing
Before jumping into Transformer models, let’s do a quick overview of what natural language processing is and why we care about it.
https://huggingface.co/learn/nlp-course/chapter1/2?fw=pt

Transformers, what can they do?
In this section, we will look at what Transformer models can do and use our first tool from the 🤗 Transformers library: the pipeline() function.
https://huggingface.co/learn/nlp-course/chapter1/3?fw=pt
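To make the idea concrete, here is a minimal toy sketch of what a pipeline does: chain preprocessing, a model, and postprocessing behind one callable. The word lists and scoring rule below are made up for illustration; the real `pipeline()` in 🤗 Transformers downloads a pretrained model and tokenizer instead.

```python
# Toy sketch of the pipeline idea: preprocess -> forward -> postprocess.
# The keyword-based "model" below is a made-up stand-in, not how a real
# Transformer works.

POSITIVE_WORDS = {"love", "great", "wonderful", "amazing"}
NEGATIVE_WORDS = {"hate", "terrible", "awful", "boring"}

class ToySentimentPipeline:
    def preprocess(self, text):
        # Lowercase and split into tokens (real pipelines use a tokenizer).
        return text.lower().split()

    def forward(self, tokens):
        # Count sentiment cues (real pipelines run a neural network).
        pos = sum(t.strip(".,!?") in POSITIVE_WORDS for t in tokens)
        neg = sum(t.strip(".,!?") in NEGATIVE_WORDS for t in tokens)
        return pos - neg

    def postprocess(self, score):
        label = "POSITIVE" if score >= 0 else "NEGATIVE"
        return {"label": label, "score": score}

    def __call__(self, text):
        return self.postprocess(self.forward(self.preprocess(text)))

classifier = ToySentimentPipeline()
print(classifier("I love this wonderful course!"))  # {'label': 'POSITIVE', 'score': 2}
```

With 🤗 Transformers installed, the real equivalent is `from transformers import pipeline` followed by `classifier = pipeline("sentiment-analysis")`, which returns a similar label/score dictionary.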
Image-to-3D models take in image input and produce 3D output.

About Image-to-3D
Use Cases
Image-to-3D models can be used in a wide variety of applications that require 3D, such as games, animation, design, architecture, engineering, marketing, and more.

https://huggingface.co/tasks/image-to-3d
Text-to-3D models take in text input and produce 3D output.
This task is similar to the image-to-3d task, but takes text input instead of image input. In practice, this is often equivalent to a combination of text-to-image and image-to-3d. That is, the text is first converted to an image, then the image is converted to 3D.
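The two-stage recipe described above can be sketched as a simple function composition. Both stage functions here are hypothetical placeholders standing in for real text-to-image and image-to-3D models.

```python
# Sketch of text-to-3D as a composition of two stages: text -> image -> 3D.
# Both stage functions are placeholders for real generative models.

def text_to_image(prompt):
    # Placeholder: a real model would return pixel data for the prompt.
    return {"kind": "image", "source_prompt": prompt}

def image_to_3d(image):
    # Placeholder: a real model would return a mesh or gaussian splat.
    return {"kind": "mesh", "prompt": image["source_prompt"]}

def text_to_3d(prompt):
    # The combination described above.
    return image_to_3d(text_to_image(prompt))

asset = text_to_3d("a small wooden chair")
print(asset["kind"])  # mesh
```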

Generating Meshes
Meshes are the standard representation of 3D in industry.

Generating Gaussian Splats
Gaussian Splatting is a rendering technique that represents scenes as fuzzy points.
https://huggingface.co/tasks/text-to-3d
Zero-shot image classification is the task of classifying images into classes that were not seen during the model's training.
About the Task
Zero-shot image classification is a computer vision task in which images are classified into one of several classes without the model having been explicitly trained on those classes.

Zero-shot image classification works by transferring knowledge learned during the training of one model to classify novel classes that were not present in the training data, making it a variation of transfer learning. For instance, a model trained to differentiate cars from airplanes can be used to classify images of ships.

The data in this learning paradigm consists of:

Seen data - images and their corresponding labels
Unseen data - only labels and no images
Auxiliary information - additional information given to the model during training that connects the unseen and seen data. This can be in the form of textual descriptions or word embeddings.
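One common realization of this idea (as in CLIP-style models) is to embed the image and each candidate label's text in a shared space, then pick the label whose embedding is most similar to the image embedding. The embedding vectors below are made up for illustration; real encoders would produce them.

```python
import math

# Sketch of embedding-similarity zero-shot classification: compare an
# image embedding against text embeddings of candidate labels and pick
# the closest one. All vectors here are invented toy values.

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Pretend outputs of an image encoder and a text encoder.
image_embedding = [0.9, 0.1, 0.2]
label_embeddings = {
    "ship":     [0.8, 0.2, 0.1],
    "car":      [0.1, 0.9, 0.3],
    "airplane": [0.2, 0.1, 0.9],
}

scores = {label: cosine(image_embedding, emb) for label, emb in label_embeddings.items()}
prediction = max(scores, key=scores.get)
print(prediction)  # ship
```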
Unconditional image generation is the task of generating images without any conditioning input (such as a text prompt or another image). Once trained, the model creates images that resemble its training data distribution.
About Unconditional Image Generation
Unconditional image generation is the task of generating new images without any specific input. The main goal of this is to create novel, original images that are not based on existing images. This can be used for a variety of applications, such as creating new artistic images, improving image recognition algorithms, or generating photorealistic images for virtual reality environments.

Unconditional image generation models usually start from a seed that is used to generate a random noise vector. The model then uses this vector to create an output image that resembles the images it was trained on.

https://huggingface.co/tasks/unconditional-image-generation
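The seed-to-noise step described above can be sketched in a few lines: the seed initializes a random number generator, which produces the noise vector a generative model would then transform into an image. No actual model is involved here.

```python
import random

# Sketch of the sampling step: a seed deterministically produces the
# noise vector that a generative model would turn into an image.

def noise_vector(seed, dim=8):
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]

z1 = noise_vector(seed=42)
z2 = noise_vector(seed=42)
z3 = noise_vector(seed=7)

print(z1 == z2)  # True: the same seed yields the same noise vector
print(z1 == z3)  # False: a different seed yields a different one
```

Because the seed fully determines the noise vector, reusing a seed reproduces the same generated image, which is why generation tools expose it as a parameter.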
Text-to-video models can be used in any application that requires generating a consistent sequence of images from text.

Use Cases
Script-based Video Generation
Text-to-video models can be used to create short-form video content from a provided text script. These models can be used to create engaging and informative marketing videos. For example, a company could use a text-to-video model to create a video that explains how their product works.

Content format conversion
Text-to-video models can be used to generate videos from long-form text, including blog posts, articles, and text files. Text-to-video models can be used to create educational videos that are more engaging and interactive. An example of this is creating a video that explains a complex concept from an article.

Voice-overs and Speech
Text-to-video models can be used to create an AI newscaster to deliver daily news, or for a film-maker to create a short film or a music video.
https://huggingface.co/tasks/text-to-video
Text-to-Image
Generates images from input text. These models can be used to generate and modify images based on text prompts.
Use Cases
Data Generation
Businesses can generate data for their use cases by inputting text and getting image outputs.

Immersive Conversational Chatbots
Chatbots can be made more immersive if they provide contextual images based on the input provided by the user.

Creative Ideas for Fashion Industry
Different patterns can be generated to obtain unique pieces of fashion. Text-to-image models make it easier for designers to conceptualize their designs before actually implementing them.
https://huggingface.co/tasks/text-to-image
Video classification is the task of assigning a label or class to an entire video. Each video is expected to have only one class. Video classification models take a video as input and return a prediction about which class the video belongs to.

https://huggingface.co/tasks/video-classification
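One common strategy for this task is to score each frame independently and average the per-class scores over the whole clip. The per-frame scores below are made up for illustration; a real model would produce them from pixels.

```python
# Sketch of frame-level video classification: average per-class scores
# across frames, then pick the highest-scoring class. The frame scores
# are invented toy values.

def classify_video(frame_scores):
    # frame_scores: list of {class_name: score} dicts, one per frame.
    totals = {}
    for scores in frame_scores:
        for cls, s in scores.items():
            totals[cls] = totals.get(cls, 0.0) + s
    n = len(frame_scores)
    averages = {cls: total / n for cls, total in totals.items()}
    return max(averages, key=averages.get)

frames = [
    {"playing_guitar": 0.7, "cooking": 0.3},
    {"playing_guitar": 0.6, "cooking": 0.4},
    {"playing_guitar": 0.8, "cooking": 0.2},
]
print(classify_video(frames))  # playing_guitar
```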
Mask generation is the task of generating masks that identify a specific object or region of interest in a given image. Masks are often used in segmentation tasks, where they provide a precise way to isolate the object of interest for further processing or analysis.

About Mask Generation
Use Cases
Filtering an Image
When filtering for an image, the generated masks might serve as an initial filter to eliminate irrelevant information. For instance, when monitoring vegetation in satellite imaging, mask generation models identify green spots, highlighting the relevant region of the image.
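The filtering idea above can be sketched with a binary mask over a tiny pixel grid: the mask keeps only the pixels flagged as the region of interest (here, standing in for vegetation) and zeroes out everything else.

```python
# Sketch of applying a binary mask to isolate a region of interest.
# The "image" is a toy grid of pixel values; a real mask generation
# model would predict the mask from the image itself.

image = [
    [10, 200, 30],
    [220, 40, 210],
    [50, 60, 230],
]
mask = [
    [0, 1, 0],
    [1, 0, 1],
    [0, 0, 1],
]

def apply_mask(image, mask, background=0):
    # Keep pixel values where mask == 1, replace everything else.
    return [
        [px if m else background for px, m in zip(img_row, mask_row)]
        for img_row, mask_row in zip(image, mask)
    ]

filtered = apply_mask(image, mask)
print(filtered)  # [[0, 200, 0], [220, 0, 210], [0, 0, 230]]
```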
Image-to-text models generate text from a given image. Image captioning and optical character recognition (OCR) are the most common applications of image-to-text.

About Image-to-Text
Use Cases
Image Captioning
Image captioning is the process of generating a textual description of an image. This can help visually impaired people understand what's happening in their surroundings.

Optical Character Recognition (OCR)
OCR models convert the text present in an image, e.g. a scanned document, to text.
https://huggingface.co/tasks/image-to-text
Image-to-image is the task of transforming a source image to match the characteristics of a target image or a target image domain. Many kinds of image manipulation and enhancement are possible with image-to-image models.
About Image-to-Image
Use Cases
Style transfer
One of the most popular use cases of image-to-image is style transfer. Style transfer models can convert an ordinary photograph into a painting in the style of a famous painter.
https://huggingface.co/tasks/image-to-image
Image feature extraction is the task of extracting features learned by a computer vision model.
About Image Feature Extraction
Use Cases
Transfer Learning
Models trained on a specific dataset learn features of that data. For instance, a model trained on a car classification dataset learns to recognize edges and curves at a low level and car-specific features at a higher level. This information can be transferred to a new model that will be trained to classify trucks. This process of extracting features and transferring them to another model is called transfer learning.
https://huggingface.co/tasks/image-feature-extraction
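The transfer-learning recipe above can be sketched as a frozen feature extractor feeding a new, task-specific head. The extractor here is a toy function computing simple statistics; a real pretrained backbone would output learned feature maps instead.

```python
# Sketch of transfer learning: reuse a frozen feature extractor trained
# on one task (cars) as the input to a new classifier head for another
# task (trucks). Both functions are toy stand-ins.

def frozen_feature_extractor(image_pixels):
    # Stand-in for a pretrained backbone: summarize the image as simple
    # statistics instead of learned features.
    mean = sum(image_pixels) / len(image_pixels)
    spread = max(image_pixels) - min(image_pixels)
    return [mean, spread]

def truck_head(features, threshold=100.0):
    # New task-specific head trained on top of the frozen features.
    return "truck" if features[1] > threshold else "not_truck"

def classify(image_pixels):
    return truck_head(frozen_feature_extractor(image_pixels))

print(classify([10, 240, 30, 220]))   # truck (high-contrast toy image)
print(classify([120, 125, 130, 128])) # not_truck
```

In practice only the head is trained on the new dataset, which is why transfer learning works well when labeled data for the new task is scarce.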
Image classification is the task of assigning a label or class to an entire image. Each image is expected to have only one class. Image classification models take an image as input and return a prediction about which class the image belongs to.
Use Cases
Image classification models can be used when we are not interested in specific instances of objects with location information or their shape.

Keyword Classification
Image classification models are used widely in stock photography to assign each image a keyword.

Image Search
Models trained on image classification can improve user experience by organizing and categorizing photo galleries on the phone or in the cloud based on multiple keywords or tags.
About Image Classification
https://youtu.be/tjAIM7BOYhw
https://huggingface.co/tasks/image-classification
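The final step of an image classifier can be sketched as follows: the model outputs a score (logit) per class, softmax turns those scores into probabilities, and the top class is the prediction. The logits below are made up for illustration.

```python
import math

# Sketch of the classification head: softmax over per-class logits,
# then argmax. The logits are invented toy values.

def softmax(logits):
    m = max(logits.values())  # subtract the max for numerical stability
    exps = {cls: math.exp(v - m) for cls, v in logits.items()}
    total = sum(exps.values())
    return {cls: e / total for cls, e in exps.items()}

logits = {"cat": 2.0, "dog": 0.5, "bird": -1.0}
probs = softmax(logits)
prediction = max(probs, key=probs.get)
print(prediction)  # cat
```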
Depth estimation is the task of predicting the depth of the objects present in an image.

About Depth Estimation
Use Cases
Depth estimation models can be used to estimate the depth of different objects present in an image.

Estimation of Volumetric Information
Depth estimation models are widely used to study the volumetric information of objects present in an image. This is an important use case in the domain of computer graphics.

3D Representation
Depth estimation models can also be used to develop a 3D representation from a 2D image.
https://huggingface.co/tasks/depth-estimation
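The 3D-representation use case can be sketched with the standard pinhole camera model: given a predicted depth per pixel and known camera intrinsics, each pixel can be back-projected to a 3D point. The intrinsic values below are made up for illustration.

```python
# Sketch of back-projecting a pixel to 3D using predicted depth and the
# pinhole camera model. fx, fy are focal lengths in pixels; (cx, cy) is
# the principal point. All numeric values here are invented.

def backproject(u, v, depth, fx, fy, cx, cy):
    # Pinhole model: X = (u - cx) * Z / fx, Y = (v - cy) * Z / fy, Z = depth.
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)

# A pixel at image coordinates (320, 240) with predicted depth 2.0 m,
# for a camera with 500 px focal lengths and principal point (320, 240).
point = backproject(320, 240, 2.0, fx=500.0, fy=500.0, cx=320.0, cy=240.0)
print(point)  # (0.0, 0.0, 2.0)
```

Applying this to every pixel of a depth map yields a point cloud, which is one way to build a 3D representation from a single 2D image.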
Zero-shot text classification is a task in natural language processing where a model is trained on a set of labeled examples but is then able to classify new examples from previously unseen classes.
About Zero-Shot Classification
About the Task
Zero-shot classification is the task of predicting a class that wasn't seen by the model during training. This method, which leverages a pre-trained language model, can be thought of as an instance of transfer learning, which generally refers to using a model trained for one task in a different application than the one it was originally trained for. This is particularly useful for situations where the amount of labeled data is small.

In zero-shot classification, we provide the model with a prompt and a sequence of text that describes what we want our model to do, in natural language. Zero-shot classification excludes any examples of the desired task being completed. This differs from single- or few-shot classification, as those tasks include a single or a few examples of the selected task.

Zero-, single-, and few-shot classification seem to be an emergent feature of large language models, appearing at model sizes of roughly 100M+ parameters. The effectiveness of a model at zero-, single-, or few-shot tasks seems to scale with model size, meaning that larger models (models with more trainable parameters or layers) generally do better at this task.
https://huggingface.co/tasks/zero-shot-classification
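A common implementation of zero-shot text classification turns each candidate label into a hypothesis sentence (e.g. "this text is about {label}") and scores how well the input text supports it. The keyword-overlap scorer below is a toy stand-in for a real entailment model, and the template wording is just an example.

```python
# Sketch of hypothesis-template zero-shot classification. The overlap
# scorer is a toy substitute for a real natural language inference model.

def toy_entailment_score(premise, hypothesis):
    # Fraction of hypothesis words that also appear in the premise.
    p = set(premise.lower().split())
    h = set(hypothesis.lower().split())
    return len(p & h) / len(h)

def zero_shot_classify(text, candidate_labels, template="this text is about {}"):
    scores = {
        label: toy_entailment_score(text, template.format(label))
        for label in candidate_labels
    }
    return max(scores, key=scores.get)

text = "the team won the sports match in extra time"
print(zero_shot_classify(text, ["sports", "politics", "cooking"]))  # sports
```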
Translation is the task of converting text from one language to another.

About Translation

https://youtu.be/1JvfrvZgi6c

Use Cases
You can find over a thousand translation models on the Hub, but sometimes you might not find a model for the language pair you are interested in. When this happens, you can use a pretrained multilingual translation model like mBART and further train it on your own data in a process called fine-tuning.
https://huggingface.co/tasks/translation