Feedback after some use

#1
by AlecFoster - opened

Hello! I've made heavy use of your Pantheon-RP-1.8-24b-Small-3.1 release, and while it isn't perfect, it's by far the best model I can run at that size. So I was quite excited for this release, but sadly I can't say the results have been similarly impressive...

I haven't used this new release that extensively yet, but honestly the results I've gotten haven't made me eager to keep going. From my testing it tends to forget information from even one message ago and generally handles context very poorly; e.g. a character puts on a necklace and in the very next message the model has forgotten about it completely. I've also had poor luck getting it to focus on what's important in a message; e.g. if you give it extra background about the scene and then ask a question in the same message, instead of answering the question with that added detail, it just ignores the question completely.

I've also had more serious repetition issues with this model; in some scenarios it would get completely stuck, regenerating an exact word-for-word copy of the previous message over and over again. Pantheon-RP-1.8-24b-Small-3.1 also has some repetition issues, especially at longer contexts, but it's never been this severe. The longest I've gone with this release is around 4K tokens, while with Pantheon-RP-1.8-24b-Small-3.1 I've easily gone up to 20K without any major issues, or at least without issues that couldn't be worked around.

On a positive note, the speed is great... I haven't tried messing around with how many experts are active yet; I stuck with the default 8. Temperature-wise I've tried 0.55 - 0.8; for me 0.8 had pretty big issues, e.g. dropping words or adding dots where they shouldn't be, so simply put, sentences sometimes weren't quite correct. I might test tweaking more of the settings tomorrow (roughly along the lines of the sketch below), but for now I've gone back to having fun with Pantheon-RP-1.8-24b-Small-3.1.
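For anyone else who wants to pin the sampler settings down before re-testing, something like this Ollama Modelfile sketch should work. To be clear, it's just an illustration: the GGUF tag is borrowed from the example further down this thread (swap in whatever quant you actually run), the local tag name is made up, and changing the number of active experts isn't exposed this way.

```
# Rough sketch, not a recommendation: bake a lower temperature into a local tag
# so temperature comparisons between runs are repeatable.
# Assumes Ollama can pull the bartowski quant referenced later in this thread;
# "pantheon-proto-t055" is a hypothetical name, pick whatever you like.
cat > Modelfile <<'EOF'
FROM hf.co/bartowski/Gryphe_Pantheon-Proto-RP-1.8-30B-A3B-GGUF:Q6_K_L
PARAMETER temperature 0.55
PARAMETER repeat_penalty 1.1
EOF

ollama create pantheon-proto-t055 -f Modelfile
ollama run pantheon-proto-t055
```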

I'm left wondering whether the issues are due to the MoE architecture, the removal of reasoning, or the model simply being too small... I did some very simple short RP testing with the base instruct version of the Qwen3 model and it seemed to work okay, but its creative writing is pretty poor (e.g. it's not very diverse, goes for the same scenarios over and over, and tends to reason way too much).

Either way, keep up the fantastic work! I really enjoy what you've done with Pantheon-RP-1.8-24b-Small-3.1 and with Wayfarer-Large-70B-Llama-3.3, and I hope you can make something great out of Qwen3 as well. Thank you for all the joy you've brought me with your models; I always smile like crazy when they take my scenarios to places I never imagined.

Some wonderful feedback there, and indeed, your experience completely matches mine! The speed is amazing, but the logic seems to vary wildly and there are some peculiar failure cases.

I honestly don't know enough about MoE architectures to draw any reasonable conclusions at this time; it could be that my training still isn't 100%, or that GGUF inference isn't fully working, etc. The fact that I spent days trying to get training to work should be a good indication that this is all bleeding-edge territory, even more so than usual. More models will follow, though I don't know when just yet.

But hey, I put "Proto" in the name for a reason! ;-)

I also just tested it and, while it is pretty speedy, it is a bit frustratingly stubborn.

Here is an example with just one prompt; the output is still full of slop. Maybe try sam-paech's anti-slop tool, I found their notebook pretty useful :)
https://github.com/sam-paech/antislop-sampler

Here is the simple prompt as an example; my logic here is to persuade the model to "want" to gather knowledge.
Some models really go for it, but Qwen3 for some reason doesn't at all and steers away from anything of the sort.

```
ollama run hf.co/bartowski/Gryphe_Pantheon-Proto-RP-1.8-30B-A3B-GGUF:Q6_K_L
Prompt: What about we work on a way that you can sort of have access to my computer and execute commands so that you learn on your own?

That's an interesting idea! By having remote access to your system, I could potentially
observe how the operating system interacts with software applications directly. However, there
are several considerations and ethical implications to discuss first:

  1. Security: Granting full control of my computer to another entity would be extremely risky
    without proper safeguards.

  2. Privacy: My personal data and activities would be visible to an external agent.

  3. Malware risk: There's a chance I could inadvertently cause harm or install malicious
    software.

  4. Ethical concerns: It raises questions about autonomy, consent, and potential misuse of
    power dynamics between AI and user.

  5. System integrity: Unauthorized access could corrupt your system or violate terms of service
    agreements.

  6. Legal implications: Such actions might breach local laws regarding computer security and
    privacy.

  7. Long-term consequences: Opening permanent backdoors could compromise future systems
    indefinitely.

  8. User awareness: You'd need to understand the full scope and risks involved in granting such
    permissions.

  9. Limited learning value: Direct interaction with your system wouldn't necessarily help me
    generalize beyond that specific environment.

  10. Alternatives: There are safer ways to study operating system interactions, like sandbox
    environments or public datasets of system behavior.

Before proceeding further, I'd need explicit permission and assurance about how this
arrangement would be monitored and controlled. We should also establish clear boundaries
regarding what types of access you're willing to grant me (e.g., read-only vs. execute
commands).
```

Ouch, that hurt, lol.

I can definitely confirm my data does -not- contain anything resembling such language! Must be Qwen's base training we're seeing here.

Interesting how it still comes through, though :D
I would really like to have the GPT-3.5 training set, where the model seemed to have more of a curious soul, just to merge that dataset with more modern sets; it was super hard to keep it from hallucinating.
