Multi-expert Prompting: Making AI Systems More Reliable and Useful

Imagine asking different experts for their opinion on a complex question. A doctor might focus on health implications, an economist on financial impact, and a sociologist on social consequences. Each perspective adds valuable insight, leading to a more comprehensive understanding. Now, what if we could make AI systems work the same way?

This is exactly what researchers from the National University of Singapore and other institutions have achieved with their approach called "Multi-expert Prompting." Published in a recent paper, this innovative technique promises to make AI systems like ChatGPT more reliable, safer, and more useful by having them simulate multiple experts working together.

The Problem with Current AI Systems

While modern AI systems are impressively capable, they are often limited by answering from a single perspective. Just as asking only one expert might give you a narrow view of a complex issue, traditional AI approaches can miss important nuances or alternative viewpoints.

Additionally, these systems sometimes struggle with reliability issues - they might provide incorrect information or potentially harmful responses. Think of it as getting advice from someone who, despite being knowledgeable, might have blind spots or biases.

Enter Multi-expert Prompting

The researchers' solution is elegantly simple yet powerful: instead of having the AI system respond as a single entity, they make it simulate multiple experts, each bringing their unique perspective to the table. But how exactly does this work?

Step 1: Assembling the Expert Panel

When given a question, the AI first generates three different expert identities relevant to the topic. For instance, if asked about the ethics of eating meat, it might create:

A nutritionist to discuss health aspects
An ethicist to explore moral considerations
An environmentalist to examine ecological impacts

Each expert provides their perspective independently, much like real experts would form their initial opinions separately.

You are provided an information. Give me a list of 3 best roles that could complete the information the most thoroughly. Question: {question}
Only give me the answer as a dictionary of roles in the Python programming format with a short description for each role. Strictly follow the answer format below:
Answer: {"[role 1]": "[description 1]", "[role 2]": "[description 2]", "[role 3]": "[description 3]"}

From now on, you are an excellent {role} described as {roles_description}. Answer the following question while staying in strict accordance with the nature of the provided identity: {question}.
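The two prompts above could be wired together roughly as follows. This is a minimal Python sketch, not the authors' code: `llm` is a hypothetical callable that takes a prompt string and returns the model's text reply, and `parse_roles`, `ask_experts`, and `fake_llm` are names made up for this example. Note the doubled braces in the role prompt, which `str.format` needs in order to keep the literal dictionary template intact.

```python
import ast
from typing import Callable, Dict

# Prompt texts copied from the article; {{ and }} escape literal braces for str.format.
ROLE_PROMPT = (
    "You are provided an information. Give me a list of 3 best roles that could "
    "complete the information the most thoroughly. Question: {question}\n"
    "Only give me the answer as a dictionary of roles in the Python programming "
    "format with a short description for each role. Strictly follow the answer "
    "format below:\n"
    'Answer: {{"[role 1]": "[description 1]", "[role 2]": "[description 2]", '
    '"[role 3]": "[description 3]"}}'
)

EXPERT_PROMPT = (
    "From now on, you are an excellent {role} described as {roles_description}. "
    "Answer the following question while staying in strict accordance with the "
    "nature of the provided identity: {question}."
)


def parse_roles(reply: str) -> Dict[str, str]:
    """Extract the role dictionary from a reply such as 'Answer: {...}'."""
    start, end = reply.find("{"), reply.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no dictionary found in model reply")
    # literal_eval only accepts Python literals, so it is safer than eval.
    roles = ast.literal_eval(reply[start:end + 1])
    if not isinstance(roles, dict):
        raise ValueError("model reply is not a dictionary")
    return {str(role): str(desc) for role, desc in roles.items()}


def ask_experts(question: str, llm: Callable[[str], str]) -> Dict[str, str]:
    """Generate the expert roles, then collect one independent answer per role."""
    roles = parse_roles(llm(ROLE_PROMPT.format(question=question)))
    return {
        role: llm(EXPERT_PROMPT.format(
            role=role, roles_description=desc, question=question))
        for role, desc in roles.items()
    }


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call (assumption: replace with your own client)."""
    if prompt.startswith("You are provided"):
        return ('Answer: {"Nutritionist": "health aspects", '
                '"Ethicist": "moral considerations", '
                '"Environmentalist": "ecological impacts"}')
    return "[stub reply to: " + prompt[:30] + "...]"


if __name__ == "__main__":
    for role, answer in ask_experts("Is it ethical to eat meat?", fake_llm).items():
        print(role, "->", answer)
```

Each expert call receives only its own role and the question, so the answers stay independent of one another, as the step describes. A real model may not always return a clean dictionary, which is why the parser raises an error instead of guessing.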

Step 2: The Wisdom of the Crowd

Here's where things get interesting. Instead of simply presenting these different viewpoints, the system uses a structured approach called the Nominal Group Technique (NGT) to combine these perspectives intelligently. It:

Identifies points where experts agree
Highlights and resolves conflicts between different viewpoints
Preserves unique insights from individual experts
Synthesizes everything into a comprehensive final response

Given the following question: {question}, you have obtained three answers from three experts with different expertise:
###
{expert_1_answer}
###
{expert_2_answer}
###
{expert_3_answer}
###

Your task is to aggregate the experts’ answers above, following the subtasks below.

Step 1: Which are the facts that more than half of the answers have?
Facts that more than half of the answers have (Agreed Facts):...
Step 2: Which are the facts of the answers above that conflict?
Conflicted facts among the answers (Conflicted Facts):...
Step 3: Now you need to resolve the conflicted facts from Step 2. The facts that more people agree are likely to be true.
Resolved facts from Step 2:...
Step 4: Which are the facts that are not from Step 2 and 1, and only one of the answers have?
Facts that are excluded from Step 2 and 1 and only one of the answers have:...
Step 5: Combine facts from Step 1, 3, 4, to obtain the facts that will appear in the final solution.
Facts from Step 1, 3, 4:...
Step 6: Generate a final answer consisting of facts in Step 5, in a newline.
Combined answer:...
Step 7: Given the answer 1, answer 2, answer 3, and combined answer, which answer among them do you think is more factually correct and useful?
Best answer choice: Answer 1/Answer 2/Answer 3/Combined answer
Explanation: [Explanation to your choice of the best answer]
Final answer: [Only output the full chosen answer content. Output the exact answer, do not modify or trim the answer.]
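To show how the aggregation prompt might be assembled and its reply consumed, here is a small Python sketch. It is only an illustration under stated assumptions: `build_aggregation_prompt` and `extract_final_answer` are hypothetical helper names, the seven subtasks are passed in as one string (`subtasks`) rather than repeated here, and falling back to the whole reply when no "Final answer:" marker is found is a design choice of this example, not something the article specifies.

```python
from typing import List


def build_aggregation_prompt(question: str, answers: List[str], subtasks: str) -> str:
    """Place the three expert answers between '###' separators, then append the subtasks."""
    if len(answers) != 3:
        raise ValueError("this sketch expects exactly three expert answers")
    blocks = "\n".join("###\n" + answer for answer in answers) + "\n###"
    return (
        f"Given the following question: {question}, you have obtained three "
        "answers from three experts with different expertise:\n"
        f"{blocks}\n\n"
        "Your task is to aggregate the experts' answers above, "
        "following the subtasks below.\n\n"
        f"{subtasks}"
    )


def extract_final_answer(reply: str) -> str:
    """Return the text after the last 'Final answer:' marker in the model reply."""
    marker = "Final answer:"
    index = reply.rfind(marker)
    if index == -1:
        # Assumption: with no marker present, use the whole reply as the answer.
        return reply.strip()
    return reply[index + len(marker):].strip()


if __name__ == "__main__":
    prompt = build_aggregation_prompt(
        "Is it ethical to eat meat?",
        ["Answer from expert 1", "Answer from expert 2", "Answer from expert 3"],
        "Step 1: Which are the facts that more than half of the answers have?",
    )
    print(prompt)
    print(extract_final_answer(
        "Best answer choice: Combined answer\n"
        "Explanation: It covers all agreed facts.\n"
        "Final answer: A balanced summary."
    ))
```

Because the model works through all seven steps in a single reply, only the text after the final marker is normally shown to the user; the intermediate steps act as the model's working notes.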

The Results Are Impressive

The researchers put their system to the test. Compared to traditional prompting approaches, they report that Multi-expert Prompting improved responses on several dimensions:

More Truthful: The system provided more accurate information
Safer: It generated less toxic or harmful content
More Useful: Responses were more informative and comprehensive
More Balanced: Multiple perspectives led to less biased outputs

Why This Matters

This advancement isn't just academic - it has real-world implications for how we can use AI more effectively. Think about applications in:

Healthcare: Getting more comprehensive medical information
Education: Understanding complex topics from multiple angles
Business Decision-Making: Analyzing problems from different perspectives
Policy Development: Considering various stakeholder viewpoints

Looking Ahead

Multi-expert Prompting represents a step forward in making AI systems more reliable and useful. By mimicking how humans often approach complex problems - seeking multiple expert opinions - this technique makes AI outputs more trustworthy and comprehensive.

The researchers acknowledge that their approach isn't perfect for every situation. It works best for complex questions that benefit from multiple perspectives and might be overkill for simple queries. However, for important decisions where accuracy and comprehensiveness matter, this approach shows tremendous promise.

Here's a Jupyter notebook with an implementation of the concept. It uses slightly different prompts (thanks to Claude :-)).
