Why You Need a Custom Domain Chatbot: A DIY Guide Based on OpenAI’s ChatGPT
What technologies/techniques are behind it?
Large Language Models (LLMs). OpenAI built ChatGPT on Large Language Models (LLMs) as the foundation. The company was a pioneer in discovering the emergent capabilities of LLMs and improving them via prompting. However, this approach has limitations: LLMs rely on shallow statistical next-word prediction and lack a world model. As a result, they can produce factual errors and lack common sense.
Alignment is a crucial aspect of ChatGPT’s design. It refers to the process of aligning LLM output with human preferences. This is achieved through Reinforcement Learning from Human Feedback (RLHF).
How does it all work together during training?
During training, the pretrained LLM (for example, GPT-4) generates one or more completions for a prompt, and human evaluators rank the outputs; these rankings train the reward model (a separate, smaller LLM). The Proximal Policy Optimization (PPO) algorithm then uses the reward model to train the LLM to generate more human-aligned sentences.
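To make the reward-model step concrete, here is a minimal sketch of the pairwise ranking loss typically used to train a reward model from human comparisons (PyTorch; the function and tensor names are illustrative, not OpenAI’s actual code):
import torch
import torch.nn.functional as F
def reward_model_loss(reward_chosen, reward_rejected):
    # the reward model should score the human-preferred completion
    # higher than the rejected one (Bradley-Terry style pairwise loss)
    return -F.logsigmoid(reward_chosen - reward_rejected).mean()
# toy scalar scores for a batch of two human comparisons
chosen = torch.tensor([1.2, 0.3])
rejected = torch.tensor([0.4, 0.9])
print(reward_model_loss(chosen, rejected))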
Did OpenAI invent any new techniques, or just package already known ones?
OpenAI and DeepMind have made significant advances in combining LLMs with reinforcement learning (RLHF), in discovering the emergent capabilities of large-scale LLMs, and in developing the concept of alignment. The technology underlying LLMs and generative AI, such as Google’s Transformer and BERT (one of the first self-supervised LLMs) as well as OpenAI’s GPT, builds on decades of AI research, most notably the 1997 work of Hochreiter and Schmidhuber on long short-term memory (LSTM) networks. Transformers were developed as a parallelizable alternative to the LSTM, capable of running on modern massively parallel hardware such as GPUs and Google TPUs.
ChatGPT for a Custom Domain: Why DIY?
We are considering use cases similar to those pursued by Morgan Stanley with GPT-4.
Starting last year, the company began exploring how to harness its intellectual capital with GPT’s embeddings and retrieval capabilities—first GPT-3 and now GPT-4. The model will power an internal-facing chatbot that performs a comprehensive search of wealth management content and “effectively unlocks the cumulative knowledge of Morgan Stanley Wealth Management”.
The current ChatGPT context window limit is 4K tokens (8K for GPT-4).
That means it is not possible to load more than a few pages of, say, the tax code into the prompt and still get quality answers. Within that window, however, ChatGPT and similar language models can effectively handle all traditional NLP tasks, including QA, summarization, and sentiment analysis. This approach is highly productive and much easier to achieve, as it eliminates the need for domain-specific expertise in fields such as linguistics or finance.
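As a rough illustration of the limit, you can count tokens with OpenAI’s tiktoken library and see how quickly a domain document exhausts a 4K window (a sketch; the file name is a placeholder):
!pip install tiktoken
import tiktoken
enc = tiktoken.get_encoding("cl100k_base")  # encoding used by the ChatGPT-era models
with open("tax_code_excerpt.txt") as f:  # placeholder for your own domain document
    text = f.read()
n_tokens = len(enc.encode(text))
print(f"{n_tokens} tokens; fits in a 4K context window: {n_tokens <= 4096}")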
LLM is Not All You Need
Memorizing facts solely in LLM weights is suboptimal. Instead, a combination of fact retrieval and LLM training should be employed. Information retrieval can be introduced during pretraining (as in RETRO), during fine-tuning, or at inference time.
A functional custom domain QA/chatbot system will likely consist of several components: a sufficiently sized LLM (a pre-trained model such as LLaMA², or a custom pre-trained LLM), an information retrieval module (a vector database), and a fine-tuning option (classic fine-tuning on the custom domain, or a reinforcement learning approach such as RLHF¹).
For a custom domain with a substantial amount of text data that cannot be fully labeled, it is important to decide where and how to integrate it. A potential scenario could involve taking a generic foundational LLM and fine-tuning it on the labeled portion of the custom domain data, while the remaining unlabeled data is served through information retrieval combined with the LLM at inference time, as in the sketch below.
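The following sketch illustrates that scenario at inference time: embed the unlabeled domain text, retrieve the most relevant passage for a query from a vector index, and hand it to the LLM as context. The embedding model, the FAISS index, and the prompt format are assumptions for illustration, not a prescribed stack.
!pip install sentence-transformers faiss-cpu
import faiss
import numpy as np
from sentence_transformers import SentenceTransformer
from transformers import pipeline
# toy unlabeled domain passages
docs = [
    "Section 179 allows expensing of qualifying equipment purchases.",
    "The standard deduction for 2022 is $12,950 for single filers.",
]
# embed the passages and put them in a vector index
embedder = SentenceTransformer("all-MiniLM-L6-v2")
doc_vecs = embedder.encode(docs, normalize_embeddings=True)
index = faiss.IndexFlatIP(doc_vecs.shape[1])  # inner product == cosine on normalized vectors
index.add(np.asarray(doc_vecs, dtype="float32"))
# retrieve the best passage for a query and feed it to the LLM as context
query = "What is the standard deduction for a single filer?"
q_vec = np.asarray(embedder.encode([query], normalize_embeddings=True), dtype="float32")
_, hits = index.search(q_vec, 1)
context = docs[hits[0][0]]
generator = pipeline("text-generation", model="gpt2")
prompt = f"Context: {context}\nQuestion: {query}\nAnswer:"
print(generator(prompt, max_new_tokens=40)[0]["generated_text"])
In practice the generator would be the fine-tuned domain LLM rather than stock GPT-2, and the index would hold the full unlabeled corpus.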
Is It Needed and Possible (Technically and Economically) to DIY?
ChatGPT is a proprietary API. You may not want to expose your proprietary data to the OpenAI API, and perhaps implicitly to the general public, and you may want to avoid OpenAI-imposed limitations².
While it is an expensive and complex product, replicating it is considered feasible:
Gourley estimates that even a project that involved several times as much training would cost a few million dollars—affordable to a well-funded startup or large technology company. “It’s a magical breakthrough,” Gourley says of the fine-tuning that OpenAI did with ChatGPT. “But it’s not something that isn’t going to be replicated.”

There are open source efforts to create alternatives.
We highly recommend creating an OpenAI account and familiarizing yourself with the OpenAI API using generic data before starting a custom domain ChatGPT replication project; the API documentation contains a wealth of relevant information.
ChatGPT in a nutshell
GPT is an autoregressive transformer decoder. Accessing the GPT-2 model (GPT-3 is essentially a scaled-up GPT-2 with 175 billion parameters) is as easy as:
!pip install transformers
from transformers import GPT2Model, GPT2Config
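# a custom configuration for illustration (the stock GPT-2 small checkpoint uses n_embd=768, n_layer=12, n_head=12)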
config = GPT2Config(
vocab_size=50257,
n_positions=1024,
n_ctx=1024,
n_embd=4096,
n_layer=8,
n_head=32,
resid_pdrop=0.1,
embd_pdrop=0.1,
attn_pdrop=0.1,
layer_norm_epsilon=1e-5,
initializer_range=0.02
)
model = GPT2Model(config=config)
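Note that GPT2Model returns hidden states only; to actually generate text you would load the pretrained checkpoint with its language-modeling head, for example (a sketch, with an arbitrary prompt):
from transformers import GPT2LMHeadModel, GPT2Tokenizer
tok = GPT2Tokenizer.from_pretrained("gpt2")
lm = GPT2LMHeadModel.from_pretrained("gpt2")
inputs = tok("The tax code states that", return_tensors="pt")
out = lm.generate(**inputs, max_new_tokens=30, do_sample=True, top_p=0.9)
print(tok.decode(out[0], skip_special_tokens=True))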
The next step is to fine-tune the GPT-2 output using RLHF via the TRL library:
!pip install trl
import torch
from transformers import AutoTokenizer
from trl import PPOTrainer, PPOConfig, AutoModelForCausalLMWithValueHead, create_reference_model
from trl.core import respond_to_batch
# get models
model = AutoModelForCausalLMWithValueHead.from_pretrained('gpt2')
model_ref = create_reference_model(model)
tokenizer = AutoTokenizer.from_pretrained('gpt2')
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default
# initialize trainer
ppo_config = PPOConfig(
batch_size=1,
)
# encode a query
query_txt = "This morning I went to the "
query_tensor = tokenizer.encode(query_txt, return_tensors="pt")
# get model response
response_tensor = respond_to_batch(model, query_tensor)  # sample a response from the model being trained
# create a ppo trainer
ppo_trainer = PPOTrainer(ppo_config, model, model_ref, tokenizer)
# define a reward for response
# (this could be any reward such as human feedback or output from another model)
reward = [torch.tensor(1.0)]
# train model for one step with ppo
train_stats = ppo_trainer.step([query_tensor[0]], [response_tensor[0]], reward)
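In a real RLHF loop, the scalar reward would come from the trained reward model scoring each response, and the query/response/reward lists would be whole batches iterated over many PPO steps.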
¹ John Schulman (OpenAI): “RLHF is purely optimizing on human approval … we don’t always know the right answer and we’re probably wrong about lots of things … we’re just optimizing for what sounds convincing and what sounds right.”
² The OpenAI API can still come in useful at the prototyping stage when deciding on the size and type of the model. LLaMA is the first highly performant LLM that has sparked an avalanche of projects built upon its foundation. LLaMA has a non-commercial use license.