StableVicuna is a robust open-source chatbot based on Vicuna v0 13b and trained via reinforcement learning from human feedback (RLHF); Vicuna v0 13b is itself an instruction fine-tuned version of the LLaMA 13b model. The chatbot can help with language tasks, write code, and perform simple math. The three-stage RLHF pipeline uses a combination of three datasets: the OpenAssistant Conversations Dataset, GPT4All Prompt Generations, and Alpaca. The reward model is trained on the RLHF preference datasets OASST1, HH-RLHF, and Stanford Human Preferences. RLHF training then applies Proximal Policy Optimization (PPO) to obtain StableVicuna.
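For readers who want a concrete picture of the final stage, a single PPO update in an RLHF loop looks roughly like the sketch below. It uses the Hugging Face `trl` library purely for illustration (the exact training stack is not specified here), a hypothetical checkpoint path, and a placeholder scalar where the trained reward model's score would normally go.

```python
# Minimal sketch of one PPO step in an RLHF loop, using the classic Hugging Face
# `trl` PPOTrainer API as an illustrative toolkit (an assumption, not necessarily
# the pipeline used for StableVicuna). The model path and reward are placeholders.
import torch
from transformers import AutoTokenizer
from trl import AutoModelForCausalLMWithValueHead, PPOConfig, PPOTrainer

model_path = "path/to/vicuna-13b-sft"  # hypothetical supervised fine-tuned checkpoint

config = PPOConfig(model_name=model_path, batch_size=1, mini_batch_size=1)
model = AutoModelForCausalLMWithValueHead.from_pretrained(model_path)
ref_model = AutoModelForCausalLMWithValueHead.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)
tokenizer.pad_token = tokenizer.eos_token  # LLaMA tokenizers ship without a pad token

ppo_trainer = PPOTrainer(config, model, ref_model, tokenizer)

# One prompt/response pair; in practice this loops over a prompt dataset.
query_tensor = tokenizer.encode("Explain RLHF in one sentence.", return_tensors="pt")[0]
generated = ppo_trainer.generate(query_tensor, max_new_tokens=48)
response_tensor = generated[0][len(query_tensor):]  # keep only the response tokens

# Placeholder reward; in the real pipeline this comes from the trained reward model.
reward = torch.tensor(0.5)

stats = ppo_trainer.step([query_tensor], [response_tensor], [reward])
```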
To obtain StableVicuna-13B from the HuggingFace Hub, you download the weight delta and apply it to the original LLaMA model weights, which you must already have. A chatbot interface is also nearing completion and is being previewed ahead of its release.
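Concretely, recovering the full weights amounts to adding each delta tensor to the corresponding tensor of your local LLaMA-13B checkpoint. The sketch below shows the idea; the repository id, local paths, and the assumption that both checkpoints share identical parameter names and shapes are ours, so consult the model card on the HuggingFace Hub for the exact recovery steps.

```python
# Rough sketch of applying the published weight delta to original LLaMA-13B
# weights. Paths and the delta repo id are assumptions; the model card on the
# HuggingFace Hub documents the supported recovery procedure.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

base = AutoModelForCausalLM.from_pretrained(
    "path/to/llama-13b",                  # your original LLaMA weights (assumed local path)
    torch_dtype=torch.float16,
)
delta = AutoModelForCausalLM.from_pretrained(
    "CarperAI/stable-vicuna-13b-delta",   # delta checkpoint (assumed repo id)
    torch_dtype=torch.float16,
)

# Add each delta tensor to the matching base tensor; this assumes the two
# checkpoints have identical parameter names and shapes.
base_state = base.state_dict()
with torch.no_grad():
    for name, delta_param in delta.state_dict().items():
        base_state[name].add_(delta_param)

base.save_pretrained("stable-vicuna-13b")
# Save the tokenizer shipped with the delta repo alongside the merged weights
# (assumed to carry any tokens added during fine-tuning).
AutoTokenizer.from_pretrained("CarperAI/stable-vicuna-13b-delta").save_pretrained("stable-vicuna-13b")
```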
We are proud to present StableVicuna, the first large-scale open-source chatbot trained via reinforcement learning from human feedback (RLHF). StableVicuna is a further instruction fine-tuned and RLHF-trained version of Vicuna v0 13b, which is an instruction fine-tuned LLaMA 13b model.