jest-ai
AI Agent testing suites
The goal of this repository is to produce bots that are listed in the bots/
folder. These bots form a botnet where they can call each other, and coordinate together to give the user the best possible answer. Primarily we are making an AI assistant to help advise on matters of the Dreamcatcher, but the general framework is applicable to everything.
These bots should be able to call functions and perform tasks that alter external system state. These bots will be used to write and maintain this repository, as a ranging shot for the coming age of anything apps
The first bot to load is always the colonel
. This bot controls all the other bots, and is intentionally difficult to modify as doing so would render the application unusable.
We must get to a position where we can deploy the testing framework, and specify jobs in terms of the prompt responses we want, and have humans work towards fixing those to work as we expect.
Live update of a test story
Running in some trusted environment.
There is some gatekeeper that polices changes going thru, then there is some way that passed changes get fed back to clients.
The guardian needs some kind of way to refer to its core, and run the tests specified there, then pass it.
The guardian should be able to run tests on its core and update them.
Tasks
- Make a mock gui of what you want to see - this would be just a jest looking feedback screen of the tests running
- See how to use the test to allow some changes to be made to some filesystem
Interface
You: I want to start a new session
GPT: This is a new session, stored under folder xyz. Do you want to use any templates ? here are some suggestions
You: make the system message be "I am a cat". How are the system tests doing with that ?
GPT: ok. The tests for core have failed completely. meow. (shows the test ui viewer)
You: (use the safeword to break out of the model)
Ask for test mdoel that isn't there
You: show me the tests for model xyz GPT: (switches nav to model list UI) I'm not sure which one you mean, is it this one ?
check on fine tuning
You: yo hows the fine tuning going GPT: (shows the fine tuning status UI) there's 3 jobs going right now, 2 in the queue, and one failed
Task priority
history in the promptaccess to system promptsdump sessions to disk with restoreknowledge base filtered inclusion- web url parsing into the knowledge base
specify a bot by a combination of system, session, and knowledge base- consistent edit
- multibot nested execution
bot loading- edit the knowledge base using the stateboard window
- self improvement of a bot based on some desired prompt outcomes
- swap out a bot based on human interaction using the colonel
- storage of preferences and issues
- window crunching tactics - embeddings or merkle tree
- load the whole book and only use relevant parts so no rate limits are hit
- show what book parts got loaded with each symbol detection
- router to pick the bot to load
- make a log file that shows what messages are being sent to openai
- Example attribution bot for contributions to enhancing a bot thru usage
Should be able to set up test scrips and load them with good prompts.
Colonel needs to handle multiple bot jobs per prompt
colonel could just use another bot, a system bot, to answer system commands.
Be able to discuss the current status of combinations of bots.
The appraiser should give some reasoning back, which can serve as feedback to the AI to try again with some modified responses. When a human sees an AI stuck in a loop, it should be able to free it with very little effort.
Bot mode and workbench mode - change the prompt, break out with a hotkey. Lets you change the system message part way thru a chat and have it reload the whole chat again. This lets sessions be generated and prompts manually altered, or chat focus altered to be different
Bot should be able to switch to another bot, and be preloaded with convo so far
How should we handle peoples data ? Detect successful chains so we can get there quicker next time. In the background we should be processing for success scores.
Extract a common format from each one, and walk the db
Issues
- if a chat is loaded with an assistant, then the response is not automatically sent to gpt until the next user input
- editing the session files is problematic. being able to dump the entire contents of our md files in as part of a bot primer is difficult to do in jsonl.
- changing md files should cause the bot to reload
- restarting the chat from a point after changing the prompt loading is hard
- appraiser should consider the headings of the agents too
- be able to expand out the knowledge base prompts
- nesting bots should work, where a bot can be composed of other bots, and shown in the gui
- if loading a session, do not overwrite it, become ephemeral
- preference bot should know when a rule is short or forever, and should be overridable
- debug view of how the bots decided on the routing
Start making a blockchain app that uses the colonel app to generate a hard coded application that represents everything the colonel has been asked to do, like macros, to date
Milestones
- app on constantly without restarts - no need to ever leave it
- put up in a web version with added edit and button abilities
- connect to solidity contract so can take in money for packets
log dumping to disk say what files got loaded in book, one per match make a chat load up that is preloaded and run using the appraiser If add a user prompt at the end of boot, api will be called