What is RLHF (Reinforcement Learning from Human Feedback)?
A training method in which humans rate or rank model responses, and those preference labels are used to fine-tune the model — typically by first training a reward model on the human preferences, then optimizing the language model against that reward with reinforcement learning. ChatGPT's success is largely attributed to RLHF: it moved models from "can talk" to "talks the way people want." Collecting human labels is expensive, but the approach is highly effective.
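The core reward-modeling step can be sketched in a few lines. This is a toy illustration, not any production system: the features, dimensions, and data below are synthetic assumptions. A linear reward model is trained on pairwise human preferences with the Bradley–Terry loss, so the "chosen" response learns to score higher than the "rejected" one — the same objective commonly used for RLHF reward models.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 8        # toy feature size per response (assumption)
n_pairs = 200  # number of synthetic human preference pairs

# Synthetic "true" preference direction the human raters implicitly follow
true_w = rng.normal(size=dim)
chosen = rng.normal(size=(n_pairs, dim))
rejected = rng.normal(size=(n_pairs, dim))
# Swap pairs so every "chosen" really scores higher under true_w
flip = (chosen @ true_w) < (rejected @ true_w)
chosen[flip], rejected[flip] = rejected[flip].copy(), chosen[flip].copy()

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Linear reward model r(x) = w . x, trained with the Bradley-Terry
# pairwise loss: -log sigmoid(r(chosen) - r(rejected))
w = np.zeros(dim)
lr = 0.1
losses = []
for _ in range(100):
    margin = (chosen - rejected) @ w
    p = sigmoid(margin)                       # P(chosen preferred)
    losses.append(-np.mean(np.log(p + 1e-12)))
    grad = -((1 - p)[:, None] * (chosen - rejected)).mean(axis=0)
    w -= lr * grad

accuracy = np.mean(((chosen - rejected) @ w) > 0)
print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}, pair accuracy: {accuracy:.2f}")
```

In a full RLHF pipeline this learned reward then drives a reinforcement-learning step (e.g. PPO) that updates the language model itself; that part is omitted here for brevity.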