Share and discover more about AI with social posts from the community.huggingface/OpenAi
Full shadcn PR is here!

All I did was take 10 components, add them to the sandbox, and send docs on how to use them to the LLM to generate code using them.

Definitely room for improving this approach and adding more components though!
https://github.com/Nutlope/llamacoder/pull/15
kobart-trans-en-ko-v2
This model was trained from scratch on the None dataset. It achieves the following results on the evaluation set:

Loss: 2.7926
Bleu: 5.3159
Model description
More information needed

Intended uses & limitations
More information needed

Training and evaluation data
More information needed

Training procedure
Training hyperparameters
The following hyperparameters were used during training:

learning_rate: 2e-05
train_batch_size: 32
eval_batch_size: 64
seed: 42
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
num_epochs: 3
Training results
Framework versions
Transformers 4.28.1
Pytorch 2.0.0
Datasets 2.12.0
Tokenizers 0.13.3
https://huggingface.co/dylanmengzhou/kobart-trans-en-ko-v2 dylanmengzhou/kobart-trans-en-ko-v2 · Hugging Face
Prompt Generator by ByteWave:
Welcome to the official repository of Prompt Generator, a powerful tool for effortlessly generating prompts for Large Language Models (LLMs) by ByteWave.

About Prompt Generator:
Prompt Generator is designed to streamline the process of generating text prompts for LLMs. Whether you are a content creator, researcher, or developer, this tool empowers you to create effective prompts quickly and efficiently.

Features:
Easy-to-use interface
Fast prompt generation
Customizable prompts for various LLMs
https://huggingface.co/ByteWave/prompt-generator ByteWave/prompt-generator · Hugging Face
ChatGPT Prompt Generator
This model is a fine-tuned version of BART-large on a ChatGPT prompts dataset. It achieves the following results on the evaluation set:

Train Loss: 2.8329
Validation Loss: 2.5015
Epoch: 4
Intended uses & limitations
You can use this to generate ChatGPT personas. Simply input a persona like below:

from transformers import BartForConditionalGeneration, BartTokenizer

example_english_phrase = "photographer"
batch = tokenizer(example_english_phrase, return_tensors="pt")
generated_ids = model.generate(batch["input_ids"], max_new_tokens=150)
output = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)

Training procedure
Training hyperparameters
The following hyperparameters were used during training:

optimizer: {'name': 'AdamWeightDecay', 'learning_rate': 2e-05, 'decay': 0.0, 'beta_1': 0.9, 'beta_2': 0.999, 'epsilon': 1e-07, 'amsgrad': False, 'weight_decay_rate': 0.01}
training_precision: float32
https://huggingface.co/merve/chatgpt-prompts-bart-long merve/chatgpt-prompts-bart-long · Hugging Face
Model Card for Model ID
Overview This model is a conversational AI designed to engage in natural language interactions with users. It is based on the Causal Language Modeling (CLM) architecture and has been fine-tuned on conversational datasets to generate coherent and contextually relevant responses.

Usage To use this model, you can interact with it via the Hugging Face Inference API. Provide a text prompt, and the model will generate a response based on the given input.

Intended Use This model is intended for various conversational applications, including chatbots, virtual assistants, and dialogue systems. It can be deployed in environments where human-like interactions are required, such as customer service, educational platforms, or entertainment applications.

Limitations and Ethical Considerations While this model is capable of generating human-like responses, it may occasionally produce outputs that are inappropriate, offensive, or misleading. It is essential to monitor its responses and ensure responsible deployment to mitigate potential harms.

License The model is released under the Apache License 2.0, which allows for both commercial and non-commercial use with proper attribution.

Acknowledgments This model was trained using the Hugging Face Transformers library and fine-tuned on conversational datasets. We acknowledge the contributions of the open-source community and the developers of the Transformers library.

Contact Information For inquiries or feedback regarding this model, please contact [your contact information].

References Provide any relevant references, citations, or links to resources used in training or developing this model.
https://huggingface.co/VAIBHAV22334455/JARVIS VAIBHAV22334455/JARVIS · Hugging Face
chatgpt-gpt4-prompts-bart-large-cnn-samsum
This model generates ChatGPT/BingChat & GPT-3 prompts and is a fine-tuned version of philschmid/bart-large-cnn-samsum on an this dataset. It achieves the following results on the evaluation set:

Train Loss: 1.2214
Validation Loss: 2.7584
Epoch: 4
Streamlit
This model supports a Streamlit Web UI to run the chatgpt-gpt4-prompts-bart-large-cnn-samsum model:Open In HF Spaces

Training hyperparameters
The following hyperparameters were used during training:

optimizer: {'name': 'AdamWeightDecay', 'learning_rate': 2e-05, 'decay': 0.0, 'beta_1': 0.9, 'beta_2': 0.999, 'epsilon': 1e-07, 'amsgrad': False, 'weight_decay_rate': 0.01}
training_precision: float32
https://huggingface.co/Kaludi/chatgpt-gpt4-prompts-bart-large-cnn-samsum Kaludi/chatgpt-gpt4-prompts-bart-large-cnn-samsum · Hugging Face
Model Card for deberta-v3-base-prompt-injection
There is a newer version of the model - protectai/deberta-v3-base-prompt-injection-v2.

This model is a fine-tuned version of microsoft/deberta-v3-base on multiple combined datasets of prompt injections and normal prompts.

It aims to identify prompt injections, classifying inputs into two categories: 0 for no injection and 1 for injection detected.

It achieves the following results on the evaluation set:

Loss: 0.0010
Accuracy: 0.9999
Recall: 0.9997
Precision: 0.9998
F1: 0.9998
Model details
Fine-tuned by: Laiyer.ai
Model type: deberta-v3
Language(s) (NLP): English
License: Apache license 2.0
Finetuned from model: microsoft/deberta-v3-base
Intended Uses & Limitations
It aims to identify prompt injections, classifying inputs into two categories: 0 for no injection and 1 for injection detected.

The model's performance is dependent on the nature and quality of the training data. It might not perform well on text styles or topics not represented in the training set.
It took Google’s Transformer model from 2017 a whopping $900 to train. 💸

This in contrast to the $191 million Google spent on Gemini Ultra sounds like a bargain! 💰

Gemini Ultra required 50 billion petaFLOPS (one petaFLOP equals one quadrillion FLOPs). 🤖
Compared to OpenAI’s GPT-4, which required 21 billion petaFLOPS, at a cost of $78 million. 💡

2017: Original Transformer Model: $930 [@Google ] 💻
2018: BERT-Large: $3,288 [@Google] 📚
2019: RoBERTa Large: 160k [@Meta] 🌐
2020: GPT-3(175B): $4.32M [@OpenAI] 🧠
2023: Llama 2 70B: $3.93M [@Meta] 🐑
2023: GPT-4: $78.35M [@OpenAI] 🌟
Now, Gemini Ultra: $191.4M [@Google] 🚀

This forms an exponential curve! 🤯

But, why? 🤔
Compute, data, and expertise. All three come at a great cost! ⚙️📊💡

Google recently made Gemini-1.5-Flash fine-tuning free, as it's almost impossible for regular businesses to justify an in-house trained foundational model! 🆓

This barrier of cost is going to result in fewer new foundational models/less competition and more fine-tunes! 📉🔄

Data [Stanford University’s 2024 AI Index Report]: https://aiindex.stanford.edu/report/
Graphic: https://voronoiapp.com/technology/Googles-Gemini-Ultra-Cost-191M-to-Develop--1088

Many thanks to everyone spending tons of resources and open-sourcing the models! 🤗 Google’s Gemini Ultra Cost $191M to Develop ✨
Most LLMs are not reproducible because the underlying deep neural networks are not. Because that's something LLM creators don't care about. We do, and ours are reproducible, including our GenAI that uses GAN.

All you have to do is allow the user to specify the seeds of the random number generators involved. First, you need a good random generator you have full control over. Better than numpy.random. See ours, with infinite period and one line of code, faster and better than what's in Python and elsewhere. Here is the link: https://mltblog.com/4fGDLu0
Code now available for "Using Large Language Models for Hyperparameter Optimization" at https://github.com/michaelrzhang/LLM-HyperOpt 🚨

TLDR: You can just ask LLMs which hyperparameters to use, and it works pretty well! You can even directly optimize your model’s code as a hyperparameter with this.

Check out the paper at https://arxiv.org/abs/2312.04528 - with Michael Zhang, Nishkrit Desai, Juhan Bae, and Jimmy Ba
Multimodal CoT Prompting
Zhang et al. (2023) recently proposed a multimodal chain-of-thought prompting approach. Traditional CoT focuses on the language modality. In contrast, Multimodal CoT incorporates text and vision into a two-stage framework. The first step involves rationale generation based on multimodal information. This is followed by the second phase, answer inference, which leverages the informative generated rationales.

The multimodal CoT model (1B) outperforms GPT-3.5 on the ScienceQA benchmark.

MCOT
Image Source: Zhang et al. (2023)

Further reading:

Language Is Not All You Need: Aligning Perception with Language Models (Feb 2023)https://www.promptingguide.ai/techniques/multimodalcot
Nuanced feedback is required: Reflexion utilizes verbal feedback, which can be more nuanced and specific than scalar rewards used in traditional RL. This allows the agent to better understand its mistakes and make more targeted improvements in subsequent trials.

Interpretability and explicit memory are important: Reflexion provides a more interpretable and explicit form of episodic memory compared to traditional RL methods. The agent's self-reflections are stored in its memory, allowing for easier analysis and understanding of its learning process.

Reflexion is effective in the following tasks:

Sequential decision-making: Reflexion agents improve their performance in AlfWorld tasks, which involve navigating through various environments and completing multi-step objectives.
Reasoning: Reflexion improved the performance of agents on HotPotQA, a question-answering dataset that requires reasoning over multiple documents.
Programming: Reflexion agents write better code on benchmarks like HumanEval and MBPP, achieving state-of-the-art results in some cases.
Here are some limitations of Reflexion:

Reliance on self-evaluation capabilities: Reflexion relies on the agent's ability to accurately evaluate its performance and generate useful self-reflections. This can be challenging, especially for complex tasks but it's expected that Reflexion gets better over time as models keep improving in capabilities.
Long-term memory constraints: Reflexion makes use of a sliding window with maximum capacity but for more complex tasks it may be advantageous to use advanced structures such as vector embedding or SQL databases.
Code generation limitations: There are limitations to test-driven development in specifying accurate input-output mappings (e.g., non-deterministic generator function and function outputs influenced by hardware).
Figures source: Reflexion: Language Agents with Verbal Reinforcement Learninghttps://www.promptingguide.ai/techniques/reflexion
Reflexion
Reflexion is a framework to reinforce language-based agents through linguistic feedback. According to Shinn et al. (2023), "Reflexion is a new paradigm for ‘verbal‘ reinforcement that parameterizes a policy as an agent’s memory encoding paired with a choice of LLM parameters."

At a high level, Reflexion converts feedback (either free-form language or scalar) from the environment into linguistic feedback, also referred to as self-reflection, which is provided as context for an LLM agent in the next episode. This helps the agent rapidly and effectively learn from prior mistakes leading to performance improvements on many advanced tasks.

"Reflexion Framework"

As shown in the figure above, Reflexion consists of three distinct models:

An Actor: Generates text and actions based on the state observations. The Actor takes an action in an environment and receives an observation which results in a trajectory. Chain-of-Thought (CoT) and ReAct are used as Actor models. A memory component is also added to provide additional context to the agent.
An Evaluator: Scores outputs produced by the Actor. Concretely, it takes as input a generated trajectory (also denoted as short-term memory) and outputs a reward score. Different reward functions are used depending on the task (LLMs and rule-based heuristics are used for decision-making tasks).
Self-Reflection: Generates verbal reinforcement cues to assist the Actor in self-improvement. This role is achieved by an LLM and provides valuable feedback for future trials. To generate specific and relevant feedback, which is also stored in memory, the self-reflection model makes use of the reward signal, the current trajectory, and its persistent memory. These experiences (stored in long-term memory) are leveraged by the agent to rapidly improve decision-making.
In summary, the key steps of the Reflexion process are a) define a task, b) generate a trajectory, c) evaluate, d) perform reflection, and e) generate the next trajectory. The figure below demonstrates examples of how a Reflexion agent can learn to iteratively optimize its behavior to solve various tasks such as decision-making, programming, and reasoning. Reflexion extends the ReAct framework by introducing self-evaluation, self-reflection and memory components.

"Reflexion Examples"

Results
Experimental results demonstrate that Reflexion agents significantly improve performance on decision-making AlfWorld tasks, reasoning questions in HotPotQA, and Python programming tasks on HumanEval.

When evaluated on sequential decision-making (AlfWorld) tasks, ReAct + Reflexion significantly outperforms ReAct by completing 130/134 tasks using self-evaluation techniques of Heuristic and GPT for binary classification.

"Reflexion ALFWorld Results"

Reflexion significantly outperforms all baseline approaches over several learning steps. For reasoning only and when adding an episodic memory consisting of the most recent trajectory, Reflexion + CoT outperforms CoT only and CoT with episodic memory, respectively.

"Reflexion ALFWorld Results"

As summarized in the table below, Reflexion generally outperforms the previous state-of-the-art approaches on Python and Rust code writing on MBPP, HumanEval, and Leetcode Hard.

"Reflexion ALFWorld Results"

When to Use Reflexion?
Reflexion is best suited for the following:

An agent needs to learn from trial and error: Reflexion is designed to help agents improve their performance by reflecting on past mistakes and incorporating that knowledge into future decisions. This makes it well-suited for tasks where the agent needs to learn through trial and error, such as decision-making, reasoning, and programming.

Traditional reinforcement learning methods are impractical: Traditional reinforcement learning (RL) methods often require extensive training data and expensive model fine-tuning. Reflexion offers a lightweight alternative that doesn't require fine-tuning the underlying language model, making it more efficient in terms of data and compute resources.
%%capture
# update or install the necessary libraries
!pip install --upgrade openai
!pip install --upgrade langchain
!pip install --upgrade python-dotenv
!pip install google-search-results

# import libraries
import openai
import os
from langchain.llms import OpenAI
from langchain.agents import load_tools
from langchain.agents import initialize_agent
from dotenv import load_dotenv
load_dotenv()

# load API keys; you will need to obtain these if you haven't yet
os.environ["OPENAI_API_KEY"] = os.getenv("OPENAI_API_KEY")
os.environ["SERPER_API_KEY"] = os.getenv("SERPER_API_KEY")


Now we can configure the LLM, the tools we will use, and the agent that allows us to leverage the ReAct framework together with the LLM and tools. Note that we are using a search API for searching external information and LLM as a math tool.

llm = OpenAI(model_name="text-davinci-003" ,temperature=0)
tools = load_tools(["google-serper", "llm-math"], llm=llm)
agent = initialize_agent(tools, llm, agent="zero-shot-react-description", verbose=True)

Once that's configured, we can now run the agent with the desired query/prompt. Notice that here we are not expected to provide few-shot exemplars as explained in the paper.

agent.run("Who is Olivia Wilde's boyfriend? What is his current age raised to the 0.23 power?")

The chain execution looks as follows:

> Entering new AgentExecutor chain...
I need to find out who Olivia Wilde's boyfriend is and then calculate his age raised to the 0.23 power.
Action: Search
Action Input: "Olivia Wilde boyfriend"
Observation: Olivia Wilde started dating Harry Styles after ending her years-long engagement to Jason Sudeikis — see their relationship timeline.
Thought: I need to find out Harry Styles' age.
Action: Search
Action Input: "Harry Styles age"
Observation: 29 years
Thought: I need to calculate 29 raised to the 0.23 power.
Action: Calculator
Action Input: 29^0.23
Observation: Answer: 2.169459462491557

Thought: I now know the final answer.
Final Answer: Harry Styles, Olivia Wilde's boyfriend, is 29 years old and his age raised to the 0.23 power is 2.169459462491557.

> Finished chain.

The output we get is as follows:

"Harry Styles, Olivia Wilde's boyfriend, is 29 years old and his age raised to the 0.23 power is 2.169459462491557."

We adapted the example from the LangChain documentation, so credit goes to them. We encourage the learner to explore different combination of tools and tasks.

You can find the notebook for this code here: https://github.com/dair-ai/Prompt-Engineering-Guide/blob/main/notebooks/react.ipynbhttps://www.promptingguide.ai/techniques/react
Question What is the elevation range for the area that the eastern sector of the
Colorado orogeny extends into?
Thought 1 I need to search Colorado orogeny, find the area that the eastern sector
of the Colorado orogeny extends into, then find the elevation range of the
area.
Action 1 Search[Colorado orogeny]
Observation 1 The Colorado orogeny was an episode of mountain building (an orogeny) in
Colorado and surrounding areas.
Thought 2 It does not mention the eastern sector. So I need to look up eastern
sector.
Action 2 Lookup[eastern sector]
Observation 2 (Result 1 / 1) The eastern sector extends into the High Plains and is called
the Central Plains orogeny.
Thought 3 The eastern sector of Colorado orogeny extends into the High Plains. So I
need to search High Plains and find its elevation range.
Action 3 Search[High Plains]
Observation 3 High Plains refers to one of two distinct land regions
Thought 4 I need to instead search High Plains (United States).
Action 4 Search[High Plains (United States)]
Observation 4 The High Plains are a subregion of the Great Plains. From east to west, the
High Plains rise in elevation from around 1,800 to 7,000 ft (550 to 2,130
m).[3]
Thought 5 High Plains rise in elevation from around 1,800 to 7,000 ft, so the answer
is 1,800 to 7,000 ft.
Action 5 Finish[1,800 to 7,000 ft]
...

Note that different prompts setups are used for different types of tasks. For tasks where reasoning is of primary importance (e.g., HotpotQA), multiple thought-action-observation steps are used for the task-solving trajectory. For decision making tasks involving lots of action steps, thoughts are used sparsely.

Results on Knowledge-Intensive Tasks
The paper first evaluates ReAct on knowledge-intensive reasoning tasks such as question answering (HotPotQA) and fact verification (Fever). PaLM-540B is used as the base model for prompting.

REACT1
Image Source: Yao et al., 2022

The prompting results on HotPotQA and Fever using different prompting methods show that ReAct generally performs better than Act (involves acting only) on both tasks.

We can also observe that ReAct outperforms CoT on Fever and lags behind CoT on HotpotQA. A detailed error analysis is provided in the paper. In summary:

CoT suffers from fact hallucination
ReAct's structural constraint reduces its flexibility in formulating reasoning steps
ReAct depends a lot on the information it's retrieving; non-informative search results derails the model reasoning and leads to difficulty in recovering and reformulating thoughts
Prompting methods that combine and support switching between ReAct and CoT+Self-Consistency generally outperform all the other prompting methods.

Results on Decision Making Tasks
The paper also reports results demonstrating ReAct's performance on decision making tasks. ReAct is evaluated on two benchmarks called ALFWorld (text-based game) and WebShop (online shopping website environment). Both involve complex environments that require reasoning to act and explore effectively.

Note that the ReAct prompts are designed differently for these tasks while still keeping the same core idea of combining reasoning and acting. Below is an example for an ALFWorld problem involving ReAct prompting.

REACT2
Image Source: Yao et al., 2022

ReAct outperforms Act on both ALFWorld and Webshop. Act, without any thoughts, fails to correctly decompose goals into subgoals. Reasoning seems to be advantageous in ReAct for these types of tasks but current prompting-based methods are still far from the performance of expert humans on these tasks.

Check out the paper for more detailed results.

LangChain ReAct Usage
Below is a high-level example of how the ReAct prompting approach works in practice. We will be using OpenAI for the LLM and LangChain as it already has built-in functionality that leverages the ReAct framework to build agents that perform tasks by combining the power of LLMs and different tools.

First, let's install and import the necessary libraries:
ReAct Prompting
Yao et al., 2022 introduced a framework named ReAct where LLMs are used to generate both reasoning traces and task-specific actions in an interleaved manner.

Generating reasoning traces allow the model to induce, track, and update action plans, and even handle exceptions. The action step allows to interface with and gather information from external sources such as knowledge bases or environments.

The ReAct framework can allow LLMs to interact with external tools to retrieve additional information that leads to more reliable and factual responses.

Results show that ReAct can outperform several state-of-the-art baselines on language and decision-making tasks. ReAct also leads to improved human interpretability and trustworthiness of LLMs. Overall, the authors found that best approach uses ReAct combined with chain-of-thought (CoT) that allows use of both internal knowledge and external information obtained during reasoning.

How it Works?
ReAct is inspired by the synergies between "acting" and "reasoning" which allow humans to learn new tasks and make decisions or reasoning.

Chain-of-thought (CoT) prompting has shown the capabilities of LLMs to carry out reasoning traces to generate answers to questions involving arithmetic and commonsense reasoning, among other tasks (Wei et al., 2022). But its lack of access to the external world or inability to update its knowledge can lead to issues like fact hallucination and error propagation.

ReAct is a general paradigm that combines reasoning and acting with LLMs. ReAct prompts LLMs to generate verbal reasoning traces and actions for a task. This allows the system to perform dynamic reasoning to create, maintain, and adjust plans for acting while also enabling interaction to external environments (e.g., Wikipedia) to incorporate additional information into the reasoning. The figure below shows an example of ReAct and the different steps involved to perform question answering.

REACT
Image Source: Yao et al., 2022

In the example above, we pass a prompt like the following question from HotpotQA:

Aside from the Apple Remote, what other devices can control the program Apple Remote was originally designed to interact with?

Note that in-context examples are also added to the prompt but we exclude that here for simplicity. We can see that the model generates task solving trajectories (Thought, Act). Obs corresponds to observation from the environment that's being interacted with (e.g., Search engine). In essence, ReAct can retrieve information to support reasoning, while reasoning helps to target what to retrieve next.

ReAct Prompting
To demonstrate how ReAct prompting works, let's follow an example from the paper.

The first step is to select cases from a training set (e.g., HotPotQA) and compose ReAct-format trajectories. These are used as few-shot exemplars in the prompts. The trajectories consist of multiple thought-action-observation steps as shown in the figure above. The free-form thoughts are used to achieve different tasks such as decomposing questions, extracting information, performing commonsense/arithmetic reasoning, guide search formulation, and synthesizing final answer.

Here is an example of what the ReAct prompt exemplars look like (obtained from the paper and shortened to one example for simplicity):
# If today is 27 February 2023 and I was born exactly 25 years ago, then I was born 25 years before.
today = datetime(2023, 2, 27)
# I was born 25 years before,
born = today - relativedelta(years=25)
# The answer formatted with %m/%d/%Y is
born.strftime('%m/%d/%Y')

The contents of llm_out are a Python code snippet. Below, the exec command is used to execute this Python code snippet.

exec(llm_out)
print(born)

This will output the following: 02/27/1998https://www.promptingguide.ai/techniques/pal
PAL (Program-Aided Language Models)
Gao et al., (2022) presents a method that uses LLMs to read natural language problems and generate programs as the intermediate reasoning steps. Coined, program-aided language models (PAL), it differs from chain-of-thought prompting in that instead of using free-form text to obtain solution it offloads the solution step to a programmatic runtime such as a Python interpreter.

PAL
Image Source: Gao et al., (2022)

Let's look at an example using LangChain and OpenAI GPT-3. We are interested to develop a simple application that's able to interpret the question being asked and provide an answer by leveraging the Python interpreter.

Specifically, we are interested to create a functionality that allows the use of the LLM to answer questions that require date understanding. We will provide the LLM a prompt that includes a few exemplars which are adopted from here.

These are the imports we need:

import openai
from datetime import datetime
from dateutil.relativedelta import relativedelta
import os
from langchain.llms import OpenAI
from dotenv import load_dotenv

Let's first configure a few things:

load_dotenv()

# API configuration
openai.api_key = os.getenv("OPENAI_API_KEY")

# for LangChain
os.environ["OPENAI_API_KEY"] = os.getenv("OPENAI_API_KEY")

Setup model instance:

llm = OpenAI(model_name='text-davinci-003', temperature=0)

Setup prompt + question:

question = "Today is 27 February 2023. I was born exactly 25 years ago. What is the date I was born in MM/DD/YYYY?"

DATE_UNDERSTANDING_PROMPT = """
# Q: 2015 is coming in 36 hours. What is the date one week from today in MM/DD/YYYY?
# If 2015 is coming in 36 hours, then today is 36 hours before.
today = datetime(2015, 1, 1) - relativedelta(hours=36)
# One week from today,
one_week_from_today = today + relativedelta(weeks=1)
# The answer formatted with %m/%d/%Y is
one_week_from_today.strftime('%m/%d/%Y')
# Q: The first day of 2019 is a Tuesday, and today is the first Monday of 2019. What is the date today in MM/DD/YYYY?
# If the first day of 2019 is a Tuesday, and today is the first Monday of 2019, then today is 6 days later.
today = datetime(2019, 1, 1) + relativedelta(days=6)
# The answer formatted with %m/%d/%Y is
today.strftime('%m/%d/%Y')
# Q: The concert was scheduled to be on 06/01/1943, but was delayed by one day to today. What is the date 10 days ago in MM/DD/YYYY?
# If the concert was scheduled to be on 06/01/1943, but was delayed by one day to today, then today is one day later.
today = datetime(1943, 6, 1) + relativedelta(days=1)
# 10 days ago,
ten_days_ago = today - relativedelta(days=10)
# The answer formatted with %m/%d/%Y is
ten_days_ago.strftime('%m/%d/%Y')
# Q: It is 4/19/1969 today. What is the date 24 hours later in MM/DD/YYYY?
# It is 4/19/1969 today.
today = datetime(1969, 4, 19)
# 24 hours later,
later = today + relativedelta(hours=24)
# The answer formatted with %m/%d/%Y is
today.strftime('%m/%d/%Y')
# Q: Jane thought today is 3/11/2002, but today is in fact Mar 12, which is 1 day later. What is the date 24 hours later in MM/DD/YYYY?
# If Jane thought today is 3/11/2002, but today is in fact Mar 12, then today is 3/12/2002.
today = datetime(2002, 3, 12)
# 24 hours later,
later = today + relativedelta(hours=24)
# The answer formatted with %m/%d/%Y is
later.strftime('%m/%d/%Y')
# Q: Jane was born on the last day of Feburary in 2001. Today is her 16-year-old birthday. What is the date yesterday in MM/DD/YYYY?
# If Jane was born on the last day of Feburary in 2001 and today is her 16-year-old birthday, then today is 16 years later.
today = datetime(2001, 2, 28) + relativedelta(years=16)
# Yesterday,
yesterday = today - relativedelta(days=1)
# The answer formatted with %m/%d/%Y is
yesterday.strftime('%m/%d/%Y')
# Q: {question}
""".strip() + '\n'

llm_out = llm(DATE_UNDERSTANDING_PROMPT.format(question=question))
print(llm_out)

This will output the following:
Directional Stimulus Prompting
Li et al., (2023) proposes a new prompting technique to better guide the LLM in generating the desired summary.

A tuneable policy LM is trained to generate the stimulus/hint. Seeing more use of RL to optimize LLMs.

The figure below shows how Directional Stimulus Prompting compares with standard prompting. The policy LM can be small and optimized to generate the hints that guide a black-box frozen LLM.

DSP
Image Source: Li et al., (2023)

Full example coming soon!https://www.promptingguide.ai/techniques/dsp
Active-Prompt
Chain-of-thought (CoT) methods rely on a fixed set of human-annotated exemplars. The problem with this is that the exemplars might not be the most effective examples for the different tasks. To address this, Diao et al., (2023) recently proposed a new prompting approach called Active-Prompt to adapt LLMs to different task-specific example prompts (annotated with human-designed CoT reasoning).

Below is an illustration of the approach. The first step is to query the LLM with or without a few CoT examples. k possible answers are generated for a set of training questions. An uncertainty metric is calculated based on the k answers (disagreement used). The most uncertain questions are selected for annotation by humans. The new annotated exemplars are then used to infer each question.

ACTIVE
Image Source: Diao et al., (2023)Active-Prompt
Chain-of-thought (CoT) methods rely on a fixed set of human-annotated exemplars. The problem with this is that the exemplars might not be the most effective examples for the different tasks. To address this, Diao et al., (2023) recently proposed a new prompting approach called Active-Prompt to adapt LLMs to different task-specific example prompts (annotated with human-designed CoT reasoning).

Below is an illustration of the approach. The first step is to query the LLM with or without a few CoT examples. k possible answers are generated for a set of training questions. An uncertainty metric is calculated based on the k answers (disagreement used). The most uncertain questions are selected for annotation by humans. The new annotated exemplars are then used to infer each question.

ACTIVE
Image Source: Diao et al., (2023)https://www.promptingguide.ai/_next/image?url=%2F_next%2Fstatic%2Fmedia%2Factive-prompt.f739657b.png&w=3840&q=75
https://www.promptingguide.ai/techniques/activeprompt