Love the model! Curious what texts were used?
Thank you so much for this amazing model—truly impressive work! I’m really grateful for the effort behind it. I was wondering if you could share which texts were used to train the model, particularly regarding biblical content, as different versions can reflect different doctrines. I'd love to understand more about the perspective it's aligned with.
The Bible itself was the core material used for this ecumenical version. The dataset was created synthetically by interpreting the Bible through the lens of an ecumenical AI pastor to create question and answer turns about every section of it. The AI pastor ran off deepseek R1 with a complex Nicene Creed system prompt and a few other techniques. It's a lot to get into. I could show you on Discord.
Taking a second look at the model card I can see how it was phrased confusingly. The Bible itself is still the core material but I used different techniques to create four times as much data from the same material and then I had an AI help write the model card lol. There are too many cards to change now though.
You know what I've got the Baptist version of my data published here so you can see what it comes out like. https://huggingface.co/datasets/sleepdeprived3/Baptist-Christian-Bible-Expert
The core idea is to teach an LLM what a Christian question and answer looks like. So things like "How do I get into heaven?" are met with a real answer like John 3:16 rather than some typical AI junk like "there are many religions and beliefs blah blah"
Of course a skilled person can make a system prompt that will get any LLM close enough, but this takes it up another level.
One final note is you should still equip this with at least a basic pastor character. Even one line in your character or system prompt that it is a pastor of your denomination will be enough to get it going in the right direction.