25 Comments
Jun 14 · Liked by Sarah Constantin

I wonder if it's more like 1994 and no one you know has a website yet. I don't think someone has made the equivalent of Geocities, which enabled me and my high school friends to make websites without having to go through all the trouble of finding a host and a domain name and whatever else. (Though we did have to learn a bit of HTML.)


"We are not now that strength which in old days

Moved earth and heaven, that which we are, we are;

One equal temper of heroic hearts,

Made weak by time and fate, but strong in will

To strive, to seek, to find, and not to yield."

That's Tennyson (word for word), not Herbert: https://www.poetryfoundation.org/poems/45392/ulysses

author

yeah I occasionally get direct quotes from random authors, not sure why that shows up here.

Jun 14 · Liked by Sarah Constantin

In addition to fine-tuned models, few-shot-prompted base models are also much better at copying style than chat models like ChatGPT. Our training of ChatGPT burns in a ton of style habits and idiosyncrasies that are hard to escape.
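(For reference, a rough sketch of what that few-shot setup can look like with an open-weights base model -- the model name and excerpts below are placeholders, not anything from this thread:)

```python
# Rough sketch of few-shot style prompting against a base (non-chat) model.
# The model name and excerpts are illustrative placeholders.
from transformers import pipeline

generator = pipeline("text-generation", model="mistralai/Mistral-7B-v0.1")

# Paste a few passages in the target voice, then let the model continue.
prompt = (
    "Passage 1:\n<excerpt in the target author's style>\n\n"
    "Passage 2:\n<another excerpt in the same style>\n\n"
    "Passage 3:\n"
)

out = generator(prompt, max_new_tokens=200, do_sample=True, temperature=0.9)
print(out[0]["generated_text"])
```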

author

I don't have much luck with gpt-3.5 or similar, do you?


Haven't played with it recently but I'd bet base 3.5 is much better than instruct 3.5 or chat 3.5. Base 3.5 isn't publicly available, though there are open-source base models of similar capability.

author

Can you recommend an open-source one?


there's also BurbeaGPT https://burbea.louis02x.com/

author

I like this! yep you have achieved Non-Vanilla Style. (I don't know Burbea's work myself)

Jun 20 · edited Jun 20 · Liked by Sarah Constantin

Some other examples include

- My bot (https://nostalgebraist-autoresponder.tumblr.com/)

- Makin's Drewbot (https://recordcrash.substack.com/p/golemizing-the-nachlass-friend-chatbot)

I was an early adopter of this stuff when it was much harder to do, and I don't do it anymore because I got tired of it. So I'm probably not well positioned to know why others aren't doing it.

That said, I have some ideas:

1. The Bene Gesserit bot makes OpenAI finetuning look good, because the model already knows how to do the task well (it's read the Dune series and all the commentary on it and its inspirations etc.), it just won't do it during normal prompting because of safety/persona tuning. But more "personalized" use cases, like "imitate my friend" or "imitate this one obscure webfic author," are much harder because you're trying to teach the model something new. You'll need much more data to get decent results, so there will be lots of early opportunities to say "huh, I guess this doesn't work" and give up.

2. I've heard a lot of negative feedback about the OpenAI finetuning API, and my own brief experiments with it (using some of my tumblr data) yielded low-quality results. I suspect that this feature is quite bad/limited, and only works well for certain sufficiently "easy" use cases. Note how even the Bene Gesserit bot, which I'm calling an "easy" use case, sometimes produced weird garbage.

3. If you don't want to use an API, you can always finetune an open weights model (see the sketch after this list), but this is much more cumbersome (even today) and -- despite all the hype -- these models are often frustratingly bad, relative to expectations set by API models.

4. Because popular API providers are so focused on chat, and their chat models are often *very* bad at creative writing, I think people are forgetting (or never learning in the first place) that there are interesting creative applications of LLMs that break the helpful assistant framing.
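(Re: point 3, a rough sketch of what an open-weights finetune looks like with the Hugging Face peft/trl stack -- the model name, file name, and hyperparameters are purely illustrative, and the exact trl API shifts between versions:)

```python
# Rough sketch: LoRA-finetune an open-weights base model on your own text.
# Assumes a JSONL file with a "text" field; all names here are illustrative.
from datasets import load_dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

dataset = load_dataset("json", data_files="my_writing.jsonl", split="train")

trainer = SFTTrainer(
    model="mistralai/Mistral-7B-v0.1",   # a base model, not a chat-tuned one
    train_dataset=dataset,
    peft_config=LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05),
    args=SFTConfig(
        output_dir="style-lora",
        num_train_epochs=3,
        per_device_train_batch_size=1,
        max_seq_length=1024,
    ),
)
trainer.train()
```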

EDIT: any explanation here will have to cover why finetuning is so much more popular with image generators than with LLMs. I think the difference is a mix of #4 (providers are not pushing some "non-creative" way of using image generators) and #1 (people don't want highly unique things out of image generators, they just want to elicit specific styles or characters more intensely and robustly).

Also I think it's simply easier to teach a new concept to an image generator (as in DreamBooth) using a small amount of data, because "images reveal everything at once" in a way that's not true of text. If I finetune an LLM on a single blog post I wrote, it's not going to know about every single opinion I have. But if I finetune stable diffusion on a single picture of my face, it can see what each of my facial features looks like.


There are thousands of finetunes on HuggingFace. Unfortunately most of them suck. Many in literature and "RP". Most of them made by "influencer" types with discord channels.

If I had to guess, the lack of variety is because the culture around LLMs is very "I'm into AI because I don't want to learn or do stuff for myself." Maybe someone should develop an LLM that finetunes LLMs for you. Then you could just ask the LLM for an LLM. Maybe one that does "yo dawg" memes.


Personally, my answer to "why haven't I made my own fine-tuned-on-cool-person chatbot yet?" (apart from not knowing that it would be easy) is that I *don't* think it would be fun to read.

Like you've mentioned with GPT-2 long ago, I still find it very hard to avoid skimming LLM generated text, and this worsens the more incoherent it is - I tried to read the Bene Gesserit samples and gave up after a few lines.


ChatGPT and similar have very bad linguistic artifacts, but if you use a base model they can imitate style *far* better. They're just worse at following instructions. Unfortunately we don't really get access to the base model. I believe gwern commented that poetry, for example, was significantly better in the gpt-4-base model.

If you use other models that have less strict tuning/RLHF/whatever, they get a decent amount of instruction following while being better at style, though they still have tics. Mistral is halfway decent, I've been recently using Euryale for some RP stuff, which should extend to general writing.


Hi Sarah, new reader here. I enjoyed your interesting suggestion on fine-tuning.

I haven't tried the API, but if it only works with dialogs, then it seems less useful for creating a writing style. This is an issue for me because I train non-English-speaking professionals to improve their English writing (biz or research) with GenAI tools. Prompting with style instructions works for some well-known styles, like Hemingway, but I have also noticed the style tends to drift back to the baseline.

That's why, for business-neutral or technical writing, I recommend Google Gemini to my students; it avoids the flowery, clause-infested style of ChatGPT. I have compared the baseline writing styles of Gemini, Claude 3, and ChatGPT, and Gemini is the most direct, concise, and readable, which suits my students' needs as well as my own.

Any thoughts on prompting tips? I have tried to create GPTs with knowledge bases of authors or styles and have had some success, but not as consistent as I'd like.

Jun 20 · edited Jun 20

Sent FB message :D


It's also quite doable and effective to just get GPT-4o to make your Q&A data. It'll even format it correctly for you.
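(A rough sketch of that workflow with the OpenAI Python client -- the prompt wording, file names, and choice of base model for the fine-tune are illustrative, not anything prescribed in this thread:)

```python
# Rough sketch: have GPT-4o draft Q&A pairs in the chat-format JSONL that the
# fine-tuning API expects, save them, then kick off a fine-tuning job.
from openai import OpenAI

client = OpenAI()

resp = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": (
            "Write 20 question/answer pairs in the voice of a Bene Gesserit "
            "Reverend Mother. Output one JSON object per line, each of the form "
            '{"messages": [{"role": "user", "content": "..."}, '
            '{"role": "assistant", "content": "..."}]}'
        ),
    }],
)

with open("training_data.jsonl", "w") as f:
    f.write(resp.choices[0].message.content)

# Upload the data and start the fine-tune.
training_file = client.files.create(file=open("training_data.jsonl", "rb"),
                                    purpose="fine-tune")
job = client.fine_tuning.jobs.create(training_file=training_file.id,
                                     model="gpt-3.5-turbo")
print(job.id)
```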


Cool bot!

She (it’s a she, isn’t it?) seems to have trouble deciding who I am: she’s called me Paul Muad'dib (with an ASCII apostrophe), Honored Mother, Muad’Dib (with a Unicode apostrophe, the one she regularly uses, and a capital dee), Jaime and Mothra.

By the way, why can she speak languages other than English?


From a usage perspective, the structure can be broken down and generated step by step without any issues. Now that Claude 3.5 has also been released, it streamlines bulk generation even further. The problem is that many people have difficulty understanding prompts and don't know how to ask questions. Generally, a prompt template consists of five elements: quantity, topic, details, method, and format.
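(A small illustration of what such a five-element template could look like -- the wording is invented, just to make the elements concrete:)

```python
# Rough sketch of a prompt template with the five elements named above:
# quantity, topic, details, method, and format. Wording is illustrative.
template = (
    "Write {quantity} on {topic}.\n"
    "Details: {details}\n"
    "Method: {method}\n"
    "Format: {format}"
)

prompt = template.format(
    quantity="five short passages",
    topic="a Bene Gesserit teacher instructing a student",
    details="each passage should touch on fear, memory, or politics",
    method="draft them one at a time, keeping each under 150 words",
    format="plain text, separated by blank lines",
)
print(prompt)
```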


There are entire web communities dedicated to exchanging LoRAs for image generation. I think the value proposition is just more obvious in that case.


You probably don't remember me, but I'm still waiting for that guest post.


This is very cool and I would love to see a detailed guide on how to do this for non-technical normies. I wanted to make something like this to play with for my existing novel (and for some of my favourite novelists), and I'd bet the results would be super interesting.

You might enjoy this piece I wrote about this topic of literary substitution here: https://www.decentralizedfiction.com/p/butterflies-for-the-machine-god-fiction
