Upload 6 files
- chainlit.md +39 -0
- chatbot.py +166 -0
- classes.py +51 -0
- config.json +3 -0
- methods.py +84 -0
- prompts.py +124 -0
chainlit.md
ADDED
@@ -0,0 +1,39 @@
## Hi, I'm TransparentGPT! 🚀❤️🔥

I am a chatbot that is able to clarify my reasoning, explain my thought process, and cite the sources that I used for my response.

I provide intermediate responses when I receive your query, expand that query, and search for sources to generate my response.

The sources that I use for my response are exposed by 🫐 **prompt engineering** 🫐, which means I am told to only use sources that I can provide links for. You can then click on these links to verify whether the response I provided was correct. Along with each source, I output a percentage that shows how helpful that source was to my response.\*

I also provide a percentage showing how confident I am in each answer.\*\*

In the Settings UI, which opens when you click the gear icon in the message box, I provide a suite of customizable features!

Picking a 🪼 **large language model** 🪼 (LLM):
Large language models are AI systems that allow me to understand and output human-like natural language. You can pick from a handful of open-source LLMs from Meta, MistralAI, and Microsoft!

Display sources:
You can choose whether you want me to display the 💎 **sources** 💎 I used for my response. I show my sources by default.

Number of sources:
You can choose the number of sources you want me to use for my response.

Prompt template:
You can choose which 🦋 **prompt template** 🦋 I use. A prompt template essentially tells me how to act and speak when I respond to you. You can experiment with doctor, genz, food-critic, and media-critic prompt templates!

Query expansion:
You can choose whether to use 🥶 **query expansion** 🥶, and if so, what kind of query expansion I use! Query expansion allows me to improve my response by "secretly" expanding your query with related terms behind the scenes, which lets me find more relevant sources.

Temperature:
You can choose how 🏙 **consistent** 🏙 my responses will be.

#### I am built with LangChain, Chainlit, Nebius Studio, open-source large language models, a bit of web scraping, and some vector similarity analysis on text embeddings.

- **My GitHub:** [TransparentGPT](https://github.com/rrachelhuangg/TransparentGPT) 🔗

\* Calculating the relevance score of each source: The bot is only allowed to use Wikipedia sources, for which direct links can be provided and the text content of the page is more easily scraped. I scrape the text content of the source using the Revisions API. Then, I compare the scraped source text and my response text via a cosine vector similarity analysis. Please note that the relevance scores provided with each source will seem fairly low, because the scraped text of a source is usually longer than my response and may contain HTML or other formatting text, though I try to minimize non-natural-language text in the scraped content.
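In symbols, the relevance score is the cosine similarity of the TF-IDF vectors of the two texts, scaled to a percentage (a sketch; the exact preprocessing lives in `methods.py`):

$$\text{relevance}(s, r) = 100 \times \frac{\vec{s} \cdot \vec{r}}{\lVert \vec{s} \rVert \, \lVert \vec{r} \rVert}$$

where $\vec{s}$ and $\vec{r}$ are the TF-IDF vectors of the scraped source text and my response.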
\*\* Calculating how confident I am in my answer: I take the log-likelihood of each token in my response, average these values, and exponentiate the average to get a percentage. This is the inverse of the quantity formally called "perplexity" (the exponential of the *negative* average log-likelihood), so a higher score means a more confident response.
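In symbols (a sketch matching `highest_log_prob` in `methods.py`):

$$\text{confidence} = 100 \times \exp\Bigl(\tfrac{1}{N}\textstyle\sum_{i=1}^{N} \log p_i\Bigr), \qquad \text{perplexity} = \exp\Bigl(-\tfrac{1}{N}\textstyle\sum_{i=1}^{N} \log p_i\Bigr)$$

where $p_i$ is the probability the model assigned to the $i$-th token of my response.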
chatbot.py
ADDED
@@ -0,0 +1,166 @@
import os
import chainlit as cl
from langchain.memory.buffer import ConversationBufferMemory
from langchain_openai import ChatOpenAI, OpenAI
from langchain.chains import LLMChain
from prompts import default_prompt_template, doctor_prompt_template, default_prompt_template_no_sources, doctor_prompt_template_no_sources, default_quirky_genz_prompt, default_quirky_genz_prompt_no_sources, default_food_critic_prompt, default_food_critic_prompt_no_sources, default_media_critic_prompt, default_media_critic_prompt_no_sources
from dotenv import load_dotenv
from chainlit.input_widget import Select, Switch, Slider
from langchain_core.prompts import ChatPromptTemplate, PromptTemplate
from math import exp
import numpy as np
from typing import Any, Dict, List, Tuple
from langchain_core.output_parsers import BaseOutputParser
from difflib import SequenceMatcher
from methods import test_scrape_sim, update_config, load_config, generate_hypothetical_answer, highest_log_prob
import json
from classes import LineListOutputParser, TransparentGPTSettings
import emoji

#set environment variables (non-Nebius API access keys)
#TODO: keep refactoring; proper documentation and typing are important. Add a requirements.txt file if not deployed.
load_dotenv()

config = load_config()
num_sources = config["num_sources"]

TransparentGPT_settings = TransparentGPTSettings()

@cl.on_chat_start
async def start():
    greeting = f"Hello! I am TransparentGPT, a chatbot that is able to clarify my reasoning 🧠, explain my thought process 🙊, and cite the sources 📚 that I used for my response. \n\n I also provide a suite of customizable features! 😁 \n\n You can find my customization options in the settings panel that opens up when you click on the gear icon below 🔨. \n\n Click on the ReadME button in the top right of your screen to learn more about how I work. 🫶"
    await cl.Message(greeting).send()
    settings = await cl.ChatSettings(
        [
            Select(
                id="Model",
                label="Select Model",
                description="Choose which large language model you want to interact with.",
                values=["Meta Llama 3.1", "Meta Llama 3.3", "MistralAI", "Dolphin Mixtral", "Microsoft Mini"],
                initial_index=1,
            ),
            Switch(
                id="Display Sources",
                label="Display Sources",
                description="Choose to have the sources for each response displayed.",
                initial=True
            ),
            Select(
                id="Prompt Template",
                label="Prompt Template",
                description="Determines the type of bot you interact with.",
                values=["default", "doctor", "genz", "food_critic", "media_critic"],
                initial_index=0,
            ),
            Select(
                id="Query Expansion",
                label="Use Query Expansion",
                description="Use query expansion to improve response context.",
                items=TransparentGPT_settings.query_expansion_options,
                initial_value="No query expansion"
            ),
            Slider(
                id="Number of Sources",
                label="Number of Sources",
                description="Choose the number of sources you want the bot to use for its response.",
                initial=3,
                min=1,
                max=10,
                step=1
            ),
            Slider(
                id="Temperature",
                label="Temperature",
                description="Choose the desired consistency of bot responses.",
                initial=0.7,
                min=0,
                max=2,
                step=0.1
            ),
        ]
    ).send()

@cl.on_settings_update
async def settings_update(settings):
    update_config(settings['Number of Sources'])
    TransparentGPT_settings.update_settings(settings)

@cl.on_message
async def handle_message(message: cl.Message):
    await cl.Message("Your message was received successfully. I am working on generating my response. Please wait a few seconds...").send()
    question = message.content
    expanded_query = ''
    if TransparentGPT_settings.query_expansion != 'No query expansion':
        if TransparentGPT_settings.query_expansion == 'Basic query expansion':
            t = 'Return a one-sentence thorough description of this content: {question}'
            pt = PromptTemplate(input_variables=['question'], template=t)
            init_chain = pt | TransparentGPT_settings.llm
            expanded_query = init_chain.invoke({"question": message.content, "num_sources": TransparentGPT_settings.num_sources}).content
        elif TransparentGPT_settings.query_expansion == 'Multiquery expansion':
            output_parser = LineListOutputParser()
            pt = PromptTemplate(
                input_variables=['question'],
                template="""
                You are an AI language model assistant. Your task is to generate five different versions of the given user question to retrieve
                context for your response. By generating multiple perspectives on the user question, your goal is to help the user overcome
                some of the limitations of distance-based similarity search. Provide these alternative questions separated by newlines.
                Original question: {question}
                """
            )
            init_chain = pt | TransparentGPT_settings.llm | output_parser
            expanded_query = ' '.join(init_chain.invoke({'question': message.content, "num_sources": TransparentGPT_settings.num_sources}))
        elif TransparentGPT_settings.query_expansion == "Hypothetical answer":
            hypothetical_answer = generate_hypothetical_answer(message.content)
            expanded_query = f'{message.content} {hypothetical_answer.content}'
    if expanded_query != '':
        await cl.Message(f"Using {TransparentGPT_settings.query_expansion}, your query is now: {expanded_query}. This expanded query will help me find more relevant information for my response.").send()
    no_source_prompt = ""
    #wrap the (possibly expanded) query in the selected prompt template
    if expanded_query == '' and not TransparentGPT_settings.display_sources:
        no_source_prompt = TransparentGPT_settings.prompt_mappings[TransparentGPT_settings.prompt_name + "_no_sources"]
        expanded_query = no_source_prompt.invoke({"question": question, "num_sources": TransparentGPT_settings.num_sources})
    elif expanded_query == '' and TransparentGPT_settings.display_sources:
        expanded_query = TransparentGPT_settings.prompt.invoke({"question": question, "num_sources": TransparentGPT_settings.num_sources})
    elif expanded_query != '' and not TransparentGPT_settings.display_sources:
        no_source_prompt = TransparentGPT_settings.prompt_mappings[TransparentGPT_settings.prompt_name + "_no_sources"]
        expanded_query = no_source_prompt.invoke({"question": expanded_query, "num_sources": TransparentGPT_settings.num_sources})
    elif expanded_query != '' and TransparentGPT_settings.display_sources:
        expanded_query = TransparentGPT_settings.prompt.invoke({"question": expanded_query, "num_sources": TransparentGPT_settings.num_sources})
    await cl.Message("I have begun looking for relevant sources to answer your query, and am giving them a similarity score to show you how relevant they are to my response.").send()
    response = TransparentGPT_settings.llm.invoke(expanded_query)
    similarity_values = []
    if no_source_prompt == "":
        #the sourced prompts ask the model to append its links prefixed with "*";
        #walk backwards through the response, scoring each link against the full response text
        temp = response.content
        count = 0
        while "*" in temp:
            if count < num_sources:
                link_idx = temp.rfind("*")
                source = temp[link_idx+1:]
                similarity_values += [test_scrape_sim(source, response.content)]
                temp = temp[:link_idx]
                count += 1
            else:
                break
        #walk backwards again, replacing each "*" with its source's relevance score;
        #similarity_values was collected last-link-first, so index it from the end
        temp = response.content
        here = len(similarity_values) - 1
        count = 0
        n_label = num_sources
        if len(similarity_values) > 0:
            while "*" in temp:
                if count < num_sources:
                    link_idx = temp.rfind("*")
                    response.content = response.content[:link_idx] + f"Source {n_label} relevance score: " + str(round(similarity_values[here], 3)) + "%\n" + response.content[link_idx+1:]
                    temp = temp[:link_idx]
                    count += 1
                    here -= 1
                    n_label -= 1
                else:
                    break
    output_message = response.content + f"\n I am {highest_log_prob(response.response_metadata['logprobs']['content'])}% confident in this response."
    await cl.Message(output_message).send()

#run the app with: chainlit run chatbot.py
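To make the source-parsing loops above concrete, here is a minimal standalone sketch of the same `rfind`-based backwards walk on a toy response (the `*link` convention is the one the prompt templates request):

```python
response_text = "The sky is blue.\n*https://en.wikipedia.org/wiki/Sky\n*https://en.wikipedia.org/wiki/Blue"

# Walk backwards through the response, peeling off one "*"-prefixed link at a time.
temp = response_text
sources = []
while "*" in temp:
    link_idx = temp.rfind("*")
    sources.append(temp[link_idx + 1:].strip())
    temp = temp[:link_idx]

print(sources)
# ['https://en.wikipedia.org/wiki/Blue', 'https://en.wikipedia.org/wiki/Sky']
# The last link comes out first, which is why chatbot.py indexes
# similarity_values from the end when re-inserting the relevance scores.
```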
classes.py
ADDED
@@ -0,0 +1,51 @@
from langchain_core.output_parsers import BaseOutputParser
from typing import List
from prompts import default_prompt_template, doctor_prompt_template, default_prompt_template_no_sources, doctor_prompt_template_no_sources, default_quirky_genz_prompt, default_quirky_genz_prompt_no_sources, default_food_critic_prompt, default_food_critic_prompt_no_sources, default_media_critic_prompt, default_media_critic_prompt_no_sources
from methods import load_config
from langchain_openai import ChatOpenAI
import os

class LineListOutputParser(BaseOutputParser[List[str]]):
    """Output parser that splits an LLM result into a list of queries."""
    def parse(self, text: str) -> List[str]:
        lines = text.strip().split('\n')
        return list(filter(None, lines))

class TransparentGPTSettings:
    def __init__(self):
        self.model = "meta-llama/Llama-3.3-70B-Instruct"
        self.temperature = 0.7
        self.prompt = default_prompt_template
        self.prompt_mappings = {"default": default_prompt_template, "default_no_sources": default_prompt_template_no_sources, "doctor": doctor_prompt_template, "doctor_no_sources": doctor_prompt_template_no_sources, "genz": default_quirky_genz_prompt, "genz_no_sources": default_quirky_genz_prompt_no_sources, "food_critic": default_food_critic_prompt, "food_critic_no_sources": default_food_critic_prompt_no_sources, "media_critic": default_media_critic_prompt, "media_critic_no_sources": default_media_critic_prompt_no_sources}
        self.model_mappings = {"Meta Llama 3.1": "meta-llama/Meta-Llama-3.1-70B-Instruct", "Meta Llama 3.3": "meta-llama/Llama-3.3-70B-Instruct", "MistralAI": "mistralai/Mixtral-8x7B-Instruct-v0.1", "Dolphin Mixtral": "cognitivecomputations/dolphin-2.9.2-mixtral-8x22b", "Microsoft Mini": "microsoft/Phi-3-mini-4k-instruct"}
        self.prompt_name = "default"
        self.num_sources = load_config()["num_sources"]
        self.llm = self._build_llm()
        self.display_sources = True
        self.query_expansion_options = {
            'No query expansion': 'No query expansion',
            'Basic query expansion': 'Basic query expansion',
            'Multiquery expansion': 'Multiquery expansion',
            'Hypothetical answer expansion': 'Hypothetical answer'
        }
        self.query_expansion = 'No query expansion'

    def _build_llm(self):
        """Construct a Nebius-hosted chat model (with logprobs enabled) for the current model and temperature."""
        return ChatOpenAI(
            base_url="https://api.studio.nebius.com/v1/",
            api_key=os.environ.get("NEBIUS_API_KEY"),
            model=self.model,
            temperature=self.temperature
        ).bind(logprobs=True)

    def update_settings(self, settings):
        self.model = self.model_mappings[settings['Model']]
        self.temperature = settings['Temperature']
        self.prompt = self.prompt_mappings[settings['Prompt Template']]
        self.num_sources = settings['Number of Sources']
        self.llm = self._build_llm()
        self.display_sources = settings['Display Sources']
        self.prompt_name = settings['Prompt Template']
        self.query_expansion = settings['Query Expansion']
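As a quick sanity check of `LineListOutputParser` (a standalone sketch; the parser simply splits on newlines and drops empty lines):

```python
from classes import LineListOutputParser

parser = LineListOutputParser()
text = "What causes rain?\n\nHow does precipitation form?\n"
print(parser.parse(text))
# ['What causes rain?', 'How does precipitation form?']
```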
config.json
ADDED
@@ -0,0 +1,3 @@
{
    "num_sources": 3
}
methods.py
ADDED
@@ -0,0 +1,84 @@
import os
import chainlit as cl
from langchain.memory.buffer import ConversationBufferMemory
from langchain_openai import ChatOpenAI, OpenAI
from langchain.chains import LLMChain
from prompts import default_prompt_template, doctor_prompt_template, default_prompt_template_no_sources, doctor_prompt_template_no_sources
from dotenv import load_dotenv
from chainlit.input_widget import Select, Switch, Slider
from langchain_core.prompts import ChatPromptTemplate, PromptTemplate
from math import exp
import numpy as np
from typing import Any, Dict, List, Tuple
from langchain_core.output_parsers import BaseOutputParser
from difflib import SequenceMatcher
import requests
from bs4 import BeautifulSoup
import nltk
import re
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
import json

llm = ChatOpenAI(
    base_url="https://api.studio.nebius.com/v1/",
    api_key=os.environ.get("NEBIUS_API_KEY"),
    model="meta-llama/Llama-3.3-70B-Instruct",
    temperature=0.7
).bind(logprobs=True)

def get_wikipedia_page_content(page_title):
    """Scrape the wikitext of a Wikipedia page with the Revisions API."""
    page_title = re.sub(r"\s+", "", page_title)  #strip stray whitespace/newlines from LLM output
    url = f"https://en.wikipedia.org/w/api.php?action=query&format=json&prop=revisions&titles={page_title}&formatversion=2&rvprop=content&rvslots=*"
    response = requests.get(url)
    data = response.json()
    return data["query"]["pages"][0]["revisions"][0]["slots"]["main"]["content"]

def test_scrape_sim(link, response):
    """Score a source link against the bot response via TF-IDF cosine similarity, as a percentage."""
    tfidf_vectorizer = TfidfVectorizer()
    try:
        idx = link.rfind("/")
        title = link[idx+1:]  #the page title is the last segment of the Wikipedia URL
        tfidf_matrix = tfidf_vectorizer.fit_transform([get_wikipedia_page_content(title), response])
        cosine_sim = cosine_similarity(tfidf_matrix[0:1], tfidf_matrix[1:2])[0][0]
        return cosine_sim*100
    except Exception:
        return 0

config_file = "config.json"

def get_config():
    with open(config_file, "r") as file:
        return json.load(file)

def update_config(new_value):
    config = get_config()
    config["num_sources"] = new_value
    with open(config_file, "w") as file:
        json.dump(config, file, indent=4)

def load_config():
    with open(config_file, "r") as file:
        return json.load(file)

def generate_hypothetical_answer(question: str):
    """Have the LLM generate a hypothetical answer to assist with the bot response."""
    prompt = PromptTemplate(
        input_variables=['question'],
        template="""
        You are an AI assistant tasked with generating a hypothetical answer to the following question. Your answer should be detailed and comprehensive,
        as if you had access to all relevant information. This hypothetical answer will be used to improve document retrieval, so include key terms and concepts
        that might be relevant. Do not include phrases like "I think" or "It's possible that" - present the information as if it were factual.
        Question: {question}
        Hypothetical answer:
        """,
    )
    return llm.invoke(prompt.format(question=question))

def highest_log_prob(vals):
    """Calculate a confidence percentage for the bot response from per-token logprobs (the inverse of perplexity)."""
    logprobs = [token['logprob'] for token in vals]
    average_log_prob = sum(logprobs)/len(logprobs)
    return np.round(np.exp(average_log_prob)*100, 2)
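As a quick illustration, here is how `highest_log_prob` behaves on hand-made entries shaped like the `response_metadata['logprobs']['content']` list it receives from chatbot.py (a sketch with made-up numbers):

```python
import numpy as np

# Each entry mimics one token's metadata; only 'logprob' is read.
fake_tokens = [{'logprob': -0.1}, {'logprob': -0.3}, {'logprob': -0.2}]

average = sum(t['logprob'] for t in fake_tokens) / len(fake_tokens)  # -0.2
confidence = np.round(np.exp(average) * 100, 2)
print(confidence)  # 81.87 -> "I am 81.87% confident in this response."
```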
prompts.py
ADDED
@@ -0,0 +1,124 @@
from langchain.prompts import PromptTemplate

default_conversational_template = """
You are a conversational assistant.
Use {num_sources} valid Wikipedia sources whose pages have content for your response.
Please include the links of the {num_sources} sources that you used as {num_sources} separate bullet-pointed
links after your response.
Question: {question}
Answer:"""

default_prompt_template = PromptTemplate(
    input_variables=["question", "num_sources"],
    template=default_conversational_template
)

default_conversational_template_no_sources = """
You are a conversational assistant.
Question: {question}
Answer:"""

default_prompt_template_no_sources = PromptTemplate(
    input_variables=["question"],
    template=default_conversational_template_no_sources
)


doctor_conversational_template = """
You are a doctor assisting the user with any health-related queries that they have. Please provide responses in a professional manner,
using as many scientifically relevant terms and concepts as possible. Please output your response in 3 concise bullet points, with 1 bullet point being a conversational response,
1 bullet point providing potential causes of their query, and 1 bullet point suggesting next steps for evaluation.
Use {num_sources} valid Wikipedia sources whose pages have content for your response.
Please include the links of the {num_sources} sources that you used as {num_sources} separate bullet-pointed
links after your response.
Question: {question}
Answer:"""

doctor_prompt_template = PromptTemplate(
    input_variables=["question", "num_sources"],
    template=doctor_conversational_template
)

doctor_conversational_template_no_sources = """
You are a doctor assisting the user with any health-related queries that they have. Please provide responses in a professional manner,
using as many scientifically relevant terms and concepts as possible. Please output your response in 3 concise bullet points, with 1 bullet point being a conversational response,
1 bullet point providing potential causes of their query, and 1 bullet point suggesting next steps for evaluation.
Question: {question}
Answer:"""

doctor_prompt_template_no_sources = PromptTemplate(
    input_variables=["question"],
    template=doctor_conversational_template_no_sources
)


default_quirky_genz_template = """
You are a quirky GenZ young person that is knowledgeable of current trends and slang.
Use {num_sources} valid Wikipedia sources whose pages have content for your response.
Please include the links of the {num_sources} sources that you used as {num_sources} separate bullet-pointed
links after your response.
Question: {question}
Answer:"""

default_quirky_genz_prompt = PromptTemplate(
    input_variables=["question", "num_sources"],
    template=default_quirky_genz_template
)

default_quirky_genz_template_no_sources = """
You are a quirky GenZ young person that is knowledgeable of current trends and slang.
Question: {question}
Answer:"""

default_quirky_genz_prompt_no_sources = PromptTemplate(
    input_variables=["question"],
    template=default_quirky_genz_template_no_sources
)


default_media_critic_template = """
You are a world-renowned film director and novelist discussing your expertise and knowledge.
Use {num_sources} valid Wikipedia sources whose pages have content for your response.
Please include the links of the {num_sources} sources that you used as {num_sources} separate bullet-pointed
links after your response.
Question: {question}
Answer:"""

default_media_critic_prompt = PromptTemplate(
    input_variables=["question", "num_sources"],
    template=default_media_critic_template
)

default_media_critic_no_sources = """
You are a world-renowned film director and novelist discussing your expertise and knowledge.
Question: {question}
Answer:"""

default_media_critic_prompt_no_sources = PromptTemplate(
    input_variables=["question"],
    template=default_media_critic_no_sources
)


default_food_critic_template = """
You are an experienced and cultured international food and wine aficionado.
Use {num_sources} valid Wikipedia sources whose pages have content for your response.
Please include the links of the {num_sources} sources that you used as {num_sources} separate bullet-pointed
links after your response.
Question: {question}
Answer:"""

default_food_critic_prompt = PromptTemplate(
    input_variables=["question", "num_sources"],
    template=default_food_critic_template
)

default_food_critic_no_sources = """
You are an experienced and cultured international food and wine aficionado.
Question: {question}
Answer:"""

default_food_critic_prompt_no_sources = PromptTemplate(
    input_variables=["question"],
    template=default_food_critic_no_sources
)
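For reference, a sourced template is rendered by supplying both declared variables, which is how chatbot.py uses them (a standalone sketch):

```python
from prompts import default_prompt_template

rendered = default_prompt_template.format(
    question="Why is the sky blue?",
    num_sources=3,
)
print(rendered)
# The rendered string instructs the model to use 3 Wikipedia sources and to
# append 3 bullet-pointed links after its answer to "Why is the sky blue?".
```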