Nit: "ChatGPT4-o Deep Research" should be just "ChatGPT Deep Research". DR is powered by a special version of OpenAI's o3 model, not 4o. OpenAI is not doing anyone any favors with their naming scheme :)
Glad to see Elicit doing so well here, but we gotta get that A- into an A+!
If they’re already helpful to you when prompted like this, it might be worth pushing a bit harder on prompt engineering to squeeze out more gains.
I’m unsure how this translates across domains, but I get meaningfully better statistics/causal inference/ML results with a prompt template like this: https://x.com/buccocapital/status/1890745551995424987
I checked out your tweet and was sufficiently convinced of how good it is that I think it's a shame most people won't see it because most (understandably) won't click on links:
"People were curious, so here's how I'm using Deep Research. I'll walk through the prompting and then an example:
1. First, I used O1 Pro to build me a prompt for Deep Research to do Deep Research on Deep Research prompting. It read all the blogs and literature on best practices and gave me a thorough report.
2. Then I asked for this to be turned into a prompt template for Deep Research. I've added it below. This routinely creates 3-5 page prompts that are generating 60-100 page, very thorough reports
3. Now when I use O1 Pro to write prompts, I'll write all my thoughts out and ask it to turn it into a prompt using the best practices below:
______
Please build a prompt using the following guidelines:
Define the Objective:
- Clearly state the main research question or task.
- Specify the desired outcome (e.g., detailed analysis, comparison, recommendations).
Gather Context and Background:
- Include all relevant background information, definitions, and data.
- Specify any boundaries (e.g., scope, timeframes, geographic limits).
Use Specific and Clear Language:
- Provide precise wording and define key terms.
- Avoid vague or ambiguous language.
Provide Step-by-Step Guidance:
- Break the task into sequential steps or sub-tasks.
- Organize instructions using bullet points or numbered lists.
Specify the Desired Output Format:
- Describe how the final answer should be organized (e.g., report format, headings, bullet points, citations).
- Include any specific formatting requirements.
Balance Detail with Flexibility:
- Offer sufficient detail to guide the response while allowing room for creative elaboration.
- Avoid over-constraining the prompt to enable exploration of relevant nuances.
Incorporate Iterative Refinement:
- Build in a process to test the prompt and refine it based on initial outputs.
- Allow for follow-up instructions to adjust or expand the response as needed.
Apply Proven Techniques:
- Use methods such as chain-of-thought prompting (e.g., “think step by step”) for complex tasks.
- Encourage the AI to break down problems into intermediate reasoning steps.
Set a Role or Perspective:
- Assign a specific role (e.g., “act as a market analyst” or “assume the perspective of a historian”) to tailor the tone and depth of the analysis.
Avoid Overloading the Prompt:
- Focus on one primary objective or break multiple questions into separate parts.
- Prevent overwhelming the prompt with too many distinct questions.
Request Justification and References:
- Instruct the AI to support its claims with evidence or to reference sources where possible.
- Enhance the credibility and verifiability of the response.
Review and Edit Thoroughly:
- Ensure the final prompt is clear, logically organized, and complete.
- Remove any ambiguous or redundant instructions."
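Step 3 of the quoted workflow (write out your thoughts, then ask a model to turn them into a prompt using the guidelines) can be sketched in a few lines. This is only an illustration: the helper name `build_meta_prompt` and the shortened `GUIDELINES` table are my own assumptions, not part of the tweet, and the code only assembles text to paste into a model rather than calling any API.

```python
# Hypothetical helper, not from the quoted tweet: it wraps free-form notes
# in (a subset of) the quoted guidelines to form a prompt-writing request.
GUIDELINES = {
    "Define the Objective": [
        "Clearly state the main research question or task.",
        "Specify the desired outcome (e.g., detailed analysis, comparison, recommendations).",
    ],
    "Specify the Desired Output Format": [
        "Describe how the final answer should be organized (e.g., report format, headings, bullet points, citations).",
        "Include any specific formatting requirements.",
    ],
    "Request Justification and References": [
        "Instruct the AI to support its claims with evidence or to reference sources where possible.",
    ],
}


def build_meta_prompt(notes, guidelines=None):
    """Return the text to send to the prompt-writing model."""
    if guidelines is None:
        guidelines = GUIDELINES
    lines = ["Please build a prompt using the following guidelines:"]
    for heading, bullets in guidelines.items():
        lines.append(f"{heading}:")
        lines.extend(f"- {bullet}" for bullet in bullets)
    # Append the user's raw thoughts after the guidelines.
    lines.append("")
    lines.append("My notes:")
    lines.append(notes.strip())
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_meta_prompt("Compare deep research tools for literature reviews."))
```

Keeping the guidelines in a dictionary makes it easy to drop or add sections per task, which fits the "balance detail with flexibility" advice in the quoted list.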
My prompt generator Claude project (https://lawsen.substack.com/i/158504802/prompt-generator) came up with the prompt at the bottom of this comment based on your article.
Gemini produced this when given that exact prompt: https://g.co/gemini/share/446012241158
Claude scores the new report better across the board than the Gemini-produced report you linked to when given your post as a rubric. Hopefully it helps with the lit review, and in any case I'd be interested to know if that matched your impression upon further reading.
_________________________________________________________________________
**Goal**: Create a comprehensive research synthesis on disease prodromes across multiple medical disciplines, focusing on high-quality academic sources.
**Output format**: A structured report with categorized findings (by disease type), including timeframes, progression rates, and citation links to primary research.
**Warnings**: Prioritize peer-reviewed literature over general medical websites. Look beyond conventional terminology ("prodrome") to include preclinical, subclinical, and precursor conditions that match the conceptual definition.
**Additional Context**:
I need a comprehensive analysis of chronic or progressive diseases known to have prodromes or preclinical phases. By "prodrome," I mean any set of symptoms, biomarkers, or clinical findings that precede formal disease diagnosis by at least one year.
Please:
1. Examine diseases across multiple medical specialties (neurology, psychiatry, oncology, cardiology, rheumatology, endocrinology, etc.), not limited to conditions where the term "prodrome" is commonly used.
2. For cancers, include precancerous conditions and cellular abnormalities that increase risk of progression to malignancy.
3. For each condition identified, provide:
- Specific prodromal symptoms, biomarkers, or clinical findings
- Typical timeframe between prodromal manifestations and formal diagnosis (in years when possible)
- Quantitative data on progression rates (what percentage of people with the prodrome develop the full disease)
- Current clinical approaches to monitoring or intervention during the prodromal phase
4. Only use high-quality academic sources such as peer-reviewed journal articles, systematic reviews, or clinical guidelines from major medical organizations.
5. Prioritize sources with longitudinal data or cohort studies that track progression from prodromal to diagnosed states.
6. Include relevant molecular or pathophysiological mechanisms when available, especially those that might represent intervention targets.
This research will help identify underexplored areas where early detection and intervention might prevent or delay disease progression.
I've been using this for a little while now, and I've found it consistently produces better results than my handcrafted prompts.
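The generated prompt above follows a four-part layout (Goal, Output format, Warnings, Additional Context) followed by a numbered list of requests. A rough sketch of that layout as a reusable structure might look like the following. The class name `ResearchPrompt`, its field names, and the shortened example values are assumptions for illustration; this is not how the prompt generator project is actually implemented.

```python
from dataclasses import dataclass, field


@dataclass
class ResearchPrompt:
    """Hypothetical container mirroring the four labeled sections above."""
    goal: str
    output_format: str
    warnings: str
    context: str
    requests: list = field(default_factory=list)

    def render(self):
        parts = [
            f"**Goal**: {self.goal}",
            f"**Output format**: {self.output_format}",
            f"**Warnings**: {self.warnings}",
            "**Additional Context**:",
            self.context,
        ]
        if self.requests:
            parts.append("Please:")
            # Number the individual requests starting from 1.
            parts.extend(f"{i}. {r}" for i, r in enumerate(self.requests, start=1))
        return "\n".join(parts)


# Example values are shortened from the prompt above, not a full copy.
prompt = ResearchPrompt(
    goal="Create a comprehensive research synthesis on disease prodromes.",
    output_format="A structured report with categorized findings (by disease type).",
    warnings="Prioritize peer-reviewed literature over general medical websites.",
    context="I need a comprehensive analysis of chronic or progressive diseases "
            "known to have prodromes or preclinical phases.",
    requests=[
        "Examine diseases across multiple medical specialties.",
        "For cancers, include precancerous conditions.",
    ],
)

if __name__ == "__main__":
    print(prompt.render())
```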
Also, I find Gemini to be nearly as good as ChatGPT (possibly better at synthesis). A potential confounder might be that Gemini Deep Research was switched to Gemini 2 in mid-March (https://gemini.google.com/updates). When were the prompts issued?
Gdoc link to the report: https://docs.google.com/document/d/1YMq_-BFipU5FpP8bW7QUdZOlXJ6x7sQa9rUUcvFGBxI/edit?usp=sharing
Nice, thanks for sharing your experience here. Elicit recently had an interesting benchmark of these systems that might be of interest:
https://blog.elicit.com/elicit-reports-eval/
You might want to know about Undermind as well.
Thanks for this! My understanding of PaperQA is that you have to point it at a corpus of literature whereas with ChatGPT Deep Research you do not. Is this the case, and if so, what corpus did you point PaperQA at?
What about Grok DeepSearch?
Haven't tried it yet.
My personal vibe winner is Perplexity. It's a winner for answering “what are fun things in this city?”
I love Perplexity for quick lookups/as a Google replacement. But its "deep research" is barely deeper than its ordinary "Pro" search.
How much overlap was there in the syndromes identified? Were the larger numbers mostly supersets of the smaller numbers or were they fairly disjoint?
MS, Parkinson's, and Alzheimer's were included in all 5 reports. The rest was pretty diverse; the next most common examples were rheumatoid arthritis and schizophrenia, at 3/5 reports. The reports with more examples were *not* mostly supersets of the reports with fewer.
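For anyone who wants to run the same overlap check on their own reports, here is a small sketch of the two questions above: how many reports mention each condition, and which reports are strict supersets of others. The function names and the toy data (`r1`..`r3`, conditions `a`..`f`) are placeholders I made up for illustration; they are not the contents of the five actual reports.

```python
from collections import Counter
from itertools import permutations


def condition_counts(reports):
    """Count how many reports mention each condition."""
    return Counter(c for conditions in reports.values() for c in conditions)


def in_every_report(reports):
    """Conditions that appear in all reports, sorted for stable output."""
    counts = condition_counts(reports)
    return sorted(c for c, n in counts.items() if n == len(reports))


def superset_pairs(reports):
    """(bigger, smaller) pairs where one report strictly contains another."""
    return [
        (big, small)
        for big, small in permutations(reports, 2)
        if reports[big] > reports[small]  # '>' on sets means strict superset
    ]


# Toy placeholder data only; NOT the real report contents.
toy_reports = {
    "r1": {"a", "b", "c"},
    "r2": {"a", "b", "c", "d", "e"},
    "r3": {"a", "b", "f"},
}

if __name__ == "__main__":
    print(condition_counts(toy_reports).most_common())
    print(in_every_report(toy_reports))
    print(superset_pairs(toy_reports))
```

With real data, an empty or near-empty result from `superset_pairs` would correspond to the "not mostly supersets" observation.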
Also, for Deep Research, were you on the $20/month or the $200/month plan?
$20/month
Is the 40 limit because Google or whoever is hosting the AI only wants to give you so much compute? Or is it because the models would fail in some way if allowed to go longer?
Probably the former. I can't get them to keep searching longer.
Isn’t prodrome also used for herpes/shingles type latent viral diseases?