Share and discover more about AI with social posts from the community.
🚨 Human feedback for AI training: Not the golden goose we thought?
I've just read a great paper in which Cohere researchers raise significant questions about using human feedback to evaluate AI language models.
Human feedback is often regarded as the gold standard for judging AI performance, but it turns out it might be more like fool's gold: the study reveals that our human judgments are easily swayed by factors that have nothing to do with actual AI performance.
Key insights:
🧪 Tested several models: Llama-2, Falcon-40B, Cohere Command 6B and 52B.
🙅‍♂️ Refusing to answer tanks AI ratings more than getting facts wrong. We apparently prefer a wrong answer to no answer!
💪 Confidence is key (even when it shouldn't be): more assertive AI responses are seen as more factual, even when they're not. Training setups like RLHF could thus be pushing AI development in the wrong direction.
🎭 The assertiveness trap: as AI responses get more confident-sounding, non-expert annotators become less likely to notice when they're wrong or inconsistent.
And a consequence of the above:
📉 RLHF might backfire: using human feedback to train AI (Reinforcement Learning from Human Feedback) could accidentally make AI more overconfident and less accurate.
This paper shows we need to think carefully about how we evaluate and train AI systems, to make sure we reward correctness rather than appearances of it, like confident talk.
⚔️ Chatbot Arena's Elo leaderboard, based on crowdsourced answers from average joes like you and me, might become irrelevant as models get smarter and smarter.
Read the paper 👇
Human Feedback is not Gold Standard (2309.16349): https://huggingface.co/papers/2309.16349
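For context on the arena-style leaderboard mentioned above, here is a minimal sketch of a standard Elo update applied after each crowdsourced head-to-head vote. The K-factor of 32 is a common default, not necessarily Chatbot Arena's exact setting:

```python
def elo_update(r_a, r_b, score_a, k=32):
    """One Elo update after a head-to-head comparison.

    score_a is 1.0 if model A wins, 0.0 if it loses, 0.5 for a tie.
    """
    # Expected score of A given the rating gap (logistic curve, base 10)
    expected_a = 1 / (1 + 10 ** ((r_b - r_a) / 400))
    delta = k * (score_a - expected_a)
    return r_a + delta, r_b - delta

# Two equally rated models; A wins the vote and gains half of K
print(elo_update(1000, 1000, 1.0))  # -> (1016.0, 984.0)
```

Because updates depend only on which answer voters prefer, any bias in those preferences (like rewarding assertiveness) flows straight into the ratings.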
Hyperfast Contextual Custom LLM with Agents, Multitokens, Explainable AI, and Distillation https://mltblog.com/4dNPSnB
New additions to this ground-breaking system include multi-token distillation when processing prompts, agents to meet user intent, more NLP, and a command prompt menu accepting both standard prompts and various actions.
I also added several illustrations, featuring xLLM in action with a full session and sample commands to fine-tune in real time. All the code, input sources (an anonymized corporate corpus from a Fortune 100 company), and contextual backend tables, including embeddings, are on GitHub. My system has zero weights, no transformer, and no neural network. It relies on explainable AI, does not require training, is fully reproducible, and fits in memory. Yet your prompts can retrieve relevant full-text entities from the corpus with no latency, including URLs, categories, titles, email addresses, and so on, thanks to a well-designed architecture.
Read more, get the code, paper and everything for free, at https://mltblog.com/4dNPSnB
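The post doesn't reproduce xLLM's actual backend tables, but the idea of retrieval via plain lookup tables rather than a neural network can be sketched roughly like this. The corpus entries and field names below are invented for illustration:

```python
# Toy sketch: an inverted index mapping tokens to corpus entities,
# so a prompt retrieves full-text entities by direct table lookup.
corpus = [
    {"title": "Cloud Architecture Overview",
     "url": "https://example.com/arch", "category": "infrastructure"},
    {"title": "Quarterly Security Review",
     "url": "https://example.com/sec", "category": "security"},
]

# Build the lookup table once: token -> set of entity URLs
index = {}
for entity in corpus:
    for token in entity["title"].lower().split():
        index.setdefault(token, set()).add(entity["url"])

def retrieve(prompt):
    """Return corpus entities whose titles share a token with the prompt."""
    hits = [index.get(tok, set()) for tok in prompt.lower().split()]
    urls = set.union(*hits) if hits else set()
    return [e for e in corpus if e["url"] in urls]

print([e["title"] for e in retrieve("security review")])
```

Lookups are dictionary accesses, which is why this style of backend can answer with essentially no latency and no training step.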
Zero-shot VQA evaluation of Docmatix using an LLM - do we need to fine-tune?
While developing Docmatix, we found that fine-tuning Florence-2 on it performed well on the DocVQA task, but still scored low on the benchmark. To improve the benchmark score, we had to further fine-tune the model on the DocVQA dataset so it would learn the grammatical style of the benchmark. Interestingly, human evaluators felt that the additionally fine-tuned model performed worse than the one fine-tuned on Docmatix alone, so we decided to use the additionally fine-tuned model only for ablation experiments and to publicly release the model fine-tuned on Docmatix alone. Although the answers generated by the model are semantically consistent with the reference answers (as shown in Figure 1), the benchmark scores are low. This raises the question: should we fine-tune models to improve performance on existing metrics, or should we develop new metrics that better match human perception?
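Part of the gap between semantic correctness and benchmark score comes from string-based scoring: DocVQA uses ANLS (Average Normalized Levenshtein Similarity), which gives zero credit once the edit distance to the reference is too large. A minimal sketch of the per-answer score:

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def anls(prediction, references, tau=0.5):
    """Best 1 - normalized-Levenshtein over references; 0 past threshold tau."""
    best = 0.0
    for ref in references:
        p, r = prediction.lower().strip(), ref.lower().strip()
        nl = levenshtein(p, r) / max(len(p), len(r), 1)
        best = max(best, 1 - nl if nl < tau else 0.0)
    return best

print(round(anls("$12.50", ["12.50"]), 3))          # close match scores high
print(anls("the answer is 12.50", ["12.50"]))       # verbose answer scores 0.0
```

A semantically correct but verbose answer like "the answer is 12.50" lands past the threshold and scores 0.0, which is exactly the mismatch with human perception described above.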
AI Event Scheduler - Streamline event creation with this AI Chrome extension, saving time and reducing manual errors.
Cokeep - Transform bookmarks into collaborative spaces with AI organization, summarization, and team sharing capabilities.
Crayon AI - Unleash creativity with an all-in-one AI image toolbox, with generation, editing, and optimization for all skill levels.
Tailwind Genie - Generate responsive UI designs with AI, streamlining web development using Tailwind CSS.
Video Ai Hug - Transform static photos into personalized hugging videos, bringing cherished moments to life.
Postin - Supercharge your LinkedIn presence with AI-crafted posts, smart management, and engagement-boosting strategies.
Metastory AI v2.2 - Enhance project management with this update, which adds Jira integration, project publishing, and an improved editor for streamlined collaboration.
Beloga - Intelligently capture and seamlessly search across Notion, GDrive, notes, the internet and more simultaneously with a digital brain that's designed to help amplify your knowledge.
Sick of feeling like a broken record, endlessly repeating instructions?
It's time to let AI do the talking. Meet Guidde - your GPT-powered ally that transforms even the most complex tasks into crystal-clear, AI-generated video documentation at lightning speed.
Seamlessly share or embed your guides anywhere, hassle-free
Say goodbye to dry documentation and hello to beautiful guides
Reclaim precious time generating documentation 11x faster with AI
Best of all, it only takes 3 steps:
Install the free guidde Chrome extension
Click โCaptureโ in the extension and โStopโ when done
Sit back and let AI handle the rest, then share your guide
AI Police Cams - Between July and August, AI cameras used in two UK counties detected over 2,000 people not wearing seat belts on three roads, including 109 children. One case involved an unrestrained toddler sitting on a woman's lap in the front passenger seat. Not only are AI-powered cameras being used for seat belts, they're also being used to catch litterers.
Qwen - New updates have been made to Qwen's AI models across multiple modalities. Qwen2-VL is a new vision-language model capable of understanding high-resolution images and 20+ minute videos; Qwen2-Audio is for processing voice inputs; and Qwen-Agent is an approach to expand 8K-context models to handle 1M tokens.
Wyze - A new AI-powered search feature from Wyze allows users to search through their camera footage using keywords and natural language queries. Instead of manually scrolling through recorded events, users can now search for specific objects, people, or activities like "truck," "delivery person," or even more detailed requests like "show me my cat in the backyard."
Celebrating Hugging Face's acquisition of huggingface.com at a high price.
sequelbox
New synthetic general chat dataset! Meet Supernova, a dataset using prompts from UltraFeedback and responses from Llama 3.1 405B Instruct:
sequelbox/Supernova
new model(s) using the Supernova dataset will follow next week, along with Other Things. (One of these will be a newly updated version of Enigma, utilizing the next version of
sequelbox/Tachibana
with approximately 2x the rows!)
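The construction described above (existing prompts, fresh responses from a stronger model) can be sketched as a simple pipeline. Here `generate` is a placeholder for an actual call to Llama 3.1 405B Instruct, and the prompts are invented examples rather than real UltraFeedback rows:

```python
def generate(prompt):
    """Stand-in for a call to the teacher model (e.g. Llama 3.1 405B Instruct)."""
    return f"<response to: {prompt}>"

# Prompts would come from an existing dataset such as UltraFeedback
prompts = ["Explain entropy simply.", "Write a haiku about autumn."]

# Pair each prompt with a newly generated response to form the new dataset
dataset = [{"prompt": p, "response": generate(p)} for p in prompts]
print(len(dataset))  # -> 2
```

The resulting list of prompt/response records is the shape a chat SFT dataset typically takes before being pushed to the Hub.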
Just published a demo for Salesforce's new function-calling model Salesforce/xLAM:
-
Tonic/Salesforce-Xlam-7b-r
-
Tonic/On-Device-Function-Calling
Just try 'em out, and it comes with an on-device version too! Cool!
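The demo itself isn't reproduced here, but the general shape of function calling can be sketched as follows: the model emits a structured payload naming a tool and its arguments, which the host parses and dispatches. The JSON schema and the `get_weather` tool below are illustrative, not xLAM's actual output format:

```python
import json

# Hypothetical registry of locally callable tools
TOOLS = {"get_weather": lambda city: f"Sunny in {city}"}

def dispatch(model_output: str):
    """Parse a JSON tool call emitted by the model and execute it."""
    call = json.loads(model_output)
    fn = TOOLS[call["name"]]          # look up the named tool
    return fn(**call["arguments"])    # call it with the model's arguments

print(dispatch('{"name": "get_weather", "arguments": {"city": "Paris"}}'))
# -> Sunny in Paris
```

The on-device variant follows the same loop; only the model producing the payload changes.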