21 Comments
mdt's avatar

Nit: "ChatGPT4-o Deep Research" should be just "ChatGPT Deep Research". DR is powered by a special version of OpenAI's o3 model, not 4o. OpenAI is not doing anyone any favors with their naming scheme :)

VivaLaPanda's avatar

Glad to see Elicit doing so well here, but we gotta get that A- into an A+!

Andrew Timm's avatar

If they’re already helpful to you prompting like this, it might be worth exploring prompt engineering a bit harder to squeeze out more gains.

I’m unsure how this translates across domains, but I get meaningfully better statistics/causal inference/ML results with a prompt template like this: https://x.com/buccocapital/status/1890745551995424987

Mo Nastri's avatar

I checked out your tweet and was sufficiently convinced of how good it is that I think it's a shame most people won't see it because most (understandably) won't click on links:

"People were curious, so here's how I'm using Deep Research. I'll walk through the prompting and then an example:

1. First, I used O1 Pro to build me a prompt for Deep Research to do Deep Research on Deep Research prompting. It read all the blogs and literature on best practices and gave me a thorough report.

2. Then I asked for this to be turned into a prompt template for Deep Research. I've added it below. This routinely creates 3-5 page prompts that are generating 60-100 page, very thorough reports

3. Now when I use O1 Pro to write prompts, I'll write all my thoughts out and ask it to turn it into a prompt using the best practices below:

______

Please build a prompt using the following guidelines:

Define the Objective:

- Clearly state the main research question or task.

- Specify the desired outcome (e.g., detailed analysis, comparison, recommendations).

Gather Context and Background:

- Include all relevant background information, definitions, and data.

- Specify any boundaries (e.g., scope, timeframes, geographic limits).

Use Specific and Clear Language:

- Provide precise wording and define key terms.

- Avoid vague or ambiguous language.

Provide Step-by-Step Guidance:

- Break the task into sequential steps or sub-tasks.

- Organize instructions using bullet points or numbered lists.

Specify the Desired Output Format:

- Describe how the final answer should be organized (e.g., report format, headings, bullet points, citations).

- Include any specific formatting requirements.

Balance Detail with Flexibility:

- Offer sufficient detail to guide the response while allowing room for creative elaboration.

- Avoid over-constraining the prompt to enable exploration of relevant nuances.

Incorporate Iterative Refinement:

- Build in a process to test the prompt and refine it based on initial outputs.

- Allow for follow-up instructions to adjust or expand the response as needed.

Apply Proven Techniques:

- Use methods such as chain-of-thought prompting (e.g., “think step by step”) for complex tasks.

- Encourage the AI to break down problems into intermediate reasoning steps.

Set a Role or Perspective:

- Assign a specific role (e.g., “act as a market analyst” or “assume the perspective of a historian”) to tailor the tone and depth of the analysis.

Avoid Overloading the Prompt:

- Focus on one primary objective or break multiple questions into separate parts.

- Prevent overwhelming the prompt with too many distinct questions.

Request Justification and References:

- Instruct the AI to support its claims with evidence or to reference sources where possible.

- Enhance the credibility and verifiability of the response.

Review and Edit Thoroughly:

- Ensure the final prompt is clear, logically organized, and complete.

- Remove any ambiguous or redundant instructions."

Alex Lawsen's avatar

My prompt generator Claude project (https://lawsen.substack.com/i/158504802/prompt-generator) came up with the prompt at the bottom of this comment based on your article.

Gemini produced this when given that exact prompt: https://g.co/gemini/share/446012241158

Claude scores the new report better across the board than the Gemini-produced report you linked to when given your post as a rubric. Hopefully it helps with the lit review, and in any case I'd be interested to know if that matched your impression upon further reading.

_________________________________________________________________________

**Goal**: Create a comprehensive research synthesis on disease prodromes across multiple medical disciplines, focusing on high-quality academic sources.

**Output format**: A structured report with categorized findings (by disease type), including timeframes, progression rates, and citation links to primary research.

**Warnings**: Prioritize peer-reviewed literature over general medical websites. Look beyond conventional terminology ("prodrome") to include preclinical, subclinical, and precursor conditions that match the conceptual definition.

**Additional Context**:

I need a comprehensive analysis of chronic or progressive diseases known to have prodromes or preclinical phases. By "prodrome," I mean any set of symptoms, biomarkers, or clinical findings that precede formal disease diagnosis by at least one year.

Please:

1. Examine diseases across multiple medical specialties (neurology, psychiatry, oncology, cardiology, rheumatology, endocrinology, etc.), not limited to conditions where the term "prodrome" is commonly used.

2. For cancers, include precancerous conditions and cellular abnormalities that increase risk of progression to malignancy.

3. For each condition identified, provide:

- Specific prodromal symptoms, biomarkers, or clinical findings

- Typical timeframe between prodromal manifestations and formal diagnosis (in years when possible)

- Quantitative data on progression rates (what percentage of people with the prodrome develop the full disease)

- Current clinical approaches to monitoring or intervention during the prodromal phase

4. Only use high-quality academic sources such as peer-reviewed journal articles, systematic reviews, or clinical guidelines from major medical organizations.

5. Prioritize sources with longitudinal data or cohort studies that track progression from prodromal to diagnosed states.

6. Include relevant molecular or pathophysiological mechanisms when available, especially those that might represent intervention targets.

This research will help identify underexplored areas where early detection and intervention might prevent or delay disease progression.
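
For anyone who wants to reuse the four-part layout above (Goal, Output format, Warnings, Additional Context) without retyping it, here is a minimal Python sketch. The `ResearchPrompt` class and its field names are hypothetical and only mirror the headings in this prompt; they are not part of the prompt generator project or of any Deep Research API.

```python
from dataclasses import dataclass, field


@dataclass
class ResearchPrompt:
    """Hypothetical container mirroring the four headings used in the prompt above."""

    goal: str
    output_format: str
    warnings: str
    context: str
    steps: list[str] = field(default_factory=list)

    def render(self) -> str:
        """Assemble the sections into one prompt string, separated by blank lines."""
        parts = [
            f"**Goal**: {self.goal}",
            f"**Output format**: {self.output_format}",
            f"**Warnings**: {self.warnings}",
            "**Additional Context**:",
            self.context,
        ]
        if self.steps:
            # Numbered requests follow the context, as in the prompt above.
            parts.append("Please:")
            parts.extend(f"{i}. {step}" for i, step in enumerate(self.steps, start=1))
        return "\n\n".join(parts)


if __name__ == "__main__":
    prompt = ResearchPrompt(
        goal="Create a comprehensive research synthesis on disease prodromes.",
        output_format="A structured report with findings categorized by disease type.",
        warnings="Prioritize peer-reviewed literature over general medical websites.",
        context="I need an analysis of chronic diseases known to have prodromes.",
        steps=["Examine diseases across multiple medical specialties."],
    )
    print(prompt.render())
```

Keeping the sections as separate fields makes it easy to swap the goal or the numbered requests while leaving the warnings and output format fixed between runs.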

Matt Bamberger's avatar

I've been using this for a little while now, and I've found it consistently produces better results than my handcrafted prompts.

SOMEONE's avatar

Also, I find Gemini to be nearly as good as ChatGPT (possibly better at synthesis). A potential confounder might be that Gemini Deep Research was switched to Gemini 2 in mid-March (https://gemini.google.com/updates). When were the prompts issued?

Mechanics of Aesthetics's avatar

Nice, thanks for sharing your experience here. Elicit recently had an interesting benchmark of these systems that might be of interest:

https://blog.elicit.com/elicit-reports-eval/

You might want to know about Undermind as well.

Neil Ritter's avatar

Thanks for this! My understanding of PaperQA is that you have to point it at a corpus of literature whereas with ChatGPT Deep Research you do not. Is this the case, and if so, what corpus did you point PaperQA at?

Ivan Rogoz's avatar

What about Grok DeepSearch?

Sarah Constantin's avatar

haven't tried yet

Steeven's avatar

My personal vibe winner is Perplexity. It's a winner for answering “what are fun things in this city”.

Sarah Constantin's avatar

I love Perplexity for quick lookups/as a Google replacement. But its "deep research" is barely deeper than its ordinary "Pro" search.

Bram Cohen's avatar

How much overlap was there in the syndromes identified? Were the larger numbers mostly supersets of the smaller numbers or were they fairly disjoint?

Sarah Constantin's avatar

MS, Parkinson's, and Alzheimer's were included in all 5 reports. The rest was pretty diverse; the next most common examples were rheumatoid arthritis and schizophrenia, at 3/5 reports. The reports with more examples were *not* mostly supersets of the reports with fewer.
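
If anyone wants to run the same superset-vs-disjoint check on their own reports, a rough Python sketch follows. The function names are hypothetical, and the toy data at the bottom is made up purely to show the mechanics; it is not the actual content of the five reports.

```python
from collections import Counter
from itertools import combinations


def condition_counts(reports: dict[str, set[str]]) -> Counter:
    """Count how many reports mention each condition."""
    return Counter(c for conditions in reports.values() for c in conditions)


def superset_pairs(reports: dict[str, set[str]]) -> list[tuple[str, str]]:
    """Return (larger, smaller) pairs where one report strictly contains the other."""
    pairs = []
    for a, b in combinations(reports, 2):
        if reports[a] > reports[b]:  # proper superset
            pairs.append((a, b))
        elif reports[b] > reports[a]:
            pairs.append((b, a))
    return pairs


if __name__ == "__main__":
    # Toy data only: placeholder report names and placeholder extra conditions.
    toy_reports = {
        "report_1": {"MS", "Parkinson's", "Alzheimer's", "condition_x"},
        "report_2": {"MS", "Parkinson's", "Alzheimer's", "condition_y", "condition_z"},
        "report_3": {"MS", "Parkinson's", "Alzheimer's"},
    }
    print(condition_counts(toy_reports).most_common())
    print(superset_pairs(toy_reports))
```

In this toy case the two larger reports each contain the smallest one but not each other, which is the "fairly disjoint beyond a shared core" pattern.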

River's avatar

Also, for Deep Research, were you on the $20/month or the $200/month plan?

Sarah Constantin's avatar

$20/month

River's avatar

Is the 40 limit because Google or whoever is hosting the AI only wants to give you so much compute? Or is it because the models would fail in some way if allowed to go longer?

Sarah Constantin's avatar

probably the former. i can't get them to keep searching longer.

Noah Rahman's avatar

Isn’t prodrome also used for herpes/shingles type latent viral diseases?
