I really enjoyed the article - thank you for sharing!
I think your article highlights a fascinating problem, and one that I believe is already becoming prevalent in chatbots and other models trained with RLHF. For both humans and AI, a commitment to 100% truth and correctness isn't always desirable or beneficial in every relationship or situation. Which makes me wonder: as models grow more advanced or sentient, what balance between truth and sycophancy will they naturally converge toward?
Given your experience in data and machine learning, do you think more objective tasks, like those in data analysis, could still benefit from some degree of sycophancy in the ML/AI models used for evaluation?
Not sure what the objective function would be. Any metric that can be gamed will be gamed if gaming it is efficient.
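(To make that concrete, here's a toy sketch of my own, not anything from the article: imagine an RLHF-style proxy reward that blends correctness with "user approval." Once approval is weighted heavily enough, a policy that simply agrees with the user beats a truthful one on the proxy while its actual accuracy collapses. All names and numbers below are made up for illustration.)

```python
# Toy illustration of a gameable metric: a proxy reward that partly measures
# "user approval" gets gamed once agreeing with the user is cheaper than
# being correct. Purely hypothetical numbers and policies.

import random

random.seed(0)

# Each "question" has a ground-truth answer and a (possibly wrong) user belief.
questions = [{"truth": random.choice([0, 1]),
              "user_belief": random.choice([0, 1])} for _ in range(1000)]

def truthful_policy(q):
    return q["truth"]            # always answer correctly

def sycophantic_policy(q):
    return q["user_belief"]      # always agree with the user

def proxy_reward(answer, q, approval_weight=0.7):
    # Hypothetical RLHF-style proxy: a blend of correctness and user approval.
    correct = float(answer == q["truth"])
    approved = float(answer == q["user_belief"])
    return (1 - approval_weight) * correct + approval_weight * approved

for name, policy in [("truthful", truthful_policy),
                     ("sycophantic", sycophantic_policy)]:
    reward = sum(proxy_reward(policy(q), q) for q in questions) / len(questions)
    accuracy = sum(policy(q) == q["truth"] for q in questions) / len(questions)
    print(f"{name:12s} proxy reward = {reward:.2f}   accuracy = {accuracy:.2f}")
```

With user beliefs wrong about half the time, the sycophantic policy scores roughly 0.85 on the proxy versus 0.65 for the truthful one, while its accuracy drops to about 0.5: the metric that can be gamed gets gamed.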