Nowledge Labs Team · 4 min read

Don't turn Skills into a prompt library

Skills in v0.9 start from work you actually did. They show their source, change how an agent acts, and stay off until you choose to use them.

When we built Skills, the failure mode was obvious: it could become a prompt library.

AI can write a checklist on almost anything. Give it a few transcripts and it will return something that looks complete: first do this, then do that, remember these caveats. Most of those checklists are not worth keeping. A strong model already knows the generic version, and a weaker model may follow the wrong parts with confidence.

A useful Skill captures a step you learned from real work, the part a capable agent would still get wrong without your experience.

What counts as a Skill

In v0.9, we keep three things separate: memories, rules, and Skills.

Memories record what happened, what you decided, and what you learned.

“Keep answers short” is a rule. It should shape every run. “Do not hand-edit generated API docs in this project” is also a rule. It constrains behavior.

“When cutting a release, commit the submodules before moving the parent pointer” is closer to a Skill. It changes the order an agent works in. If the agent misses it, the repo can end up in a bad state.

So we ask one simple question before treating something as a Skill:

What exactly will the agent do differently next time?

If there is no answer, it should not become a Skill.

It has to come from real work

A Skill earns trust through its source. Nice phrasing is secondary.

Mem looks for repeatable methods in the work you actually did: a debugging pass, a release sequence, a review habit, a safety check you only learned after getting it wrong. Each suggested Skill should point back to the moment that earned it, with less “you seem to prefer this” guesswork.

That is also why suggested Skills stay off. Mem can propose one, but you decide whether it becomes active. Once a Skill is on, agents really can follow it. A bad Skill can make the next run worse.

[Image: Skills in Nowledge Mem]

The example from our own work

While building v0.9, we kept using the same plain testing loop:

Build an evaluation that can fail. Write down what happens at each step. Find the one place the pipeline is stuck. Change only that. Run it again.

That sounds obvious. It is also the first thing people skip when a result looks wrong. The reflex is to tune the prompt, switch the model, move a threshold, or make the system suggest more. That feels like progress, but when several things change at once you cannot tell which change helped.
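The loop can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the harness we used: the stage names, the `run_with_trace` and `first_stuck_stage` helpers, and the toy pipeline are all hypothetical. Each stage pairs a function with a check that can fail, every step is written to a trace, and only the first failing stage is changed before the run is repeated.

```python
from typing import Any, Callable, List, Optional, Tuple

# Hypothetical shape: (stage name, stage function, check that can fail).
Stage = Tuple[str, Callable[[Any], Any], Callable[[Any], bool]]
TraceEntry = Tuple[str, Any, bool]


def run_with_trace(stages: List[Stage], value: Any) -> List[TraceEntry]:
    """Run every stage in order and record its output and check result."""
    trace: List[TraceEntry] = []
    for name, fn, check in stages:
        value = fn(value)
        trace.append((name, value, check(value)))
    return trace


def first_stuck_stage(trace: List[TraceEntry]) -> Optional[str]:
    """Return the name of the first stage whose check failed, if any."""
    for name, _output, ok in trace:
        if not ok:
            return name
    return None


# Toy pipeline. The "dedupe" stage is deliberately broken: it does nothing.
stages: List[Stage] = [
    ("tokenize", lambda text: text.split(","), lambda out: len(out) > 0),
    ("strip", lambda items: [i.strip() for i in items],
     lambda out: all(i == i.strip() for i in out)),
    ("dedupe", lambda items: items, lambda out: len(out) == len(set(out))),
]

stuck = first_stuck_stage(run_with_trace(stages, "a, b, a"))

# Change only the stuck stage, leave everything else alone, and run again.
fixed_stages = list(stages)
fixed_stages[2] = (
    "dedupe",
    lambda items: list(dict.fromkeys(items)),  # order-preserving dedupe
    stages[2][2],
)
stuck_after_fix = first_stuck_stage(run_with_trace(fixed_stages, "a, b, a"))
```

Because the trace names the stage that failed, the fix touches one stage and the rerun shows whether that single change was enough.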

We used this loop in two separate pieces of work: improving another agent project’s evaluation, and then measuring the quality of Skills suggestions themselves. Mem found the pattern in those development records and wrote it as a Skill.

It skipped the empty “use evaluations” advice and wrote the actual moves: what to measure first, why each step needs a trace, when to change only one thing, and when not to scale the run yet.

[Image: The crystallized harness loop Skill detail]

That was the useful part: a method from real work, written in a form an agent can follow later.

It also has to resist bad edits

Then we asked Mem to improve that Skill.

It derived a few test cases from the evidence, tried several revisions, and compared them against the current version. None beat the original, so none shipped.

That result is quiet. It is also correct.

If an article edit is weak, the article is just worse. If a Skill edit is weak, the agent may follow worse instructions next time. A self-improving system has to improve, but it also has to stop. No better version, no change.
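The stop rule can be written as a small selection function. This is a minimal sketch, not Mem's implementation: `choose_version`, the `score` callback, and the toy scorer are hypothetical, and a real scorer would run the test cases derived from the evidence. The point it shows is the strict comparison: a revision that only ties the current version does not replace it.

```python
from typing import Callable, Sequence


def choose_version(
    current: str,
    candidates: Sequence[str],
    score: Callable[[str], float],
) -> str:
    """Return the current text unless a candidate scores strictly higher."""
    best, best_score = current, score(current)
    for candidate in candidates:
        candidate_score = score(candidate)
        # A tie is not an improvement, so the current version survives it.
        if candidate_score > best_score:
            best, best_score = candidate, candidate_score
    return best


# Hypothetical scorer: count how many required moves a revision still mentions.
REQUIRED_MOVES = ("measure", "trace", "one change")


def toy_score(skill_text: str) -> float:
    return float(sum(move in skill_text for move in REQUIRED_MOVES))


current = "measure first, keep a trace per step, make one change at a time"
revisions = [
    "measure first and keep a trace",
    "keep a trace, make one change",
]
kept = choose_version(current, revisions, toy_score)
```

In this toy run both revisions drop one of the required moves, so `kept` is the original text and nothing ships.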

What you see in v0.9

Open Skills in v0.9 and the page should feel quiet.

[Image: The harness loop Skill in the Skills list]

The list should be short. Each suggestion needs a source and three answers:

  • What real work did this come from?
  • What will the agent do differently after it is turned on?
  • Why does Mem think this is worth keeping?

If you recognize it and trust the evidence, turn it on. Agents connected to Mem can then read it when it applies. If the Skill is refined later, the same rule holds: the new version has to prove it is better.
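One way to picture a suggestion is as a record that carries its three answers and starts switched off. The sketch below is an assumption for illustration only: `SkillSuggestion`, its field names, and `turn_on` are hypothetical and do not describe Mem's actual data model.

```python
from dataclasses import dataclass


@dataclass
class SkillSuggestion:
    """Hypothetical record for a suggested Skill."""

    name: str
    source: str            # What real work did this come from?
    behavior_change: str   # What will the agent do differently?
    rationale: str         # Why is this worth keeping?
    enabled: bool = False  # Suggestions stay off until the user opts in.

    def is_complete(self) -> bool:
        """A suggestion is reviewable only if all three answers are filled in."""
        answers = (self.source, self.behavior_change, self.rationale)
        return all(answer.strip() for answer in answers)

    def turn_on(self) -> None:
        """Explicit user action; refuses suggestions with missing answers."""
        if not self.is_complete():
            raise ValueError("suggestion is missing a source or an answer")
        self.enabled = True


suggestion = SkillSuggestion(
    name="release order",
    source="release work where the parent pointer moved too early",
    behavior_change="commit the submodules before moving the parent pointer",
    rationale="skipping this order can leave the repo in a bad state",
)
was_enabled_before = suggestion.enabled
suggestion.turn_on()
```

Making `enabled` default to `False` means nothing reaches an agent by accident; the only path to an active Skill is the explicit `turn_on` call.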

That is the point of Skills in v0.9: your own hard-won step, turned into something your agents can reuse.

© 2026 Nowledge Labs. Building the knowledge layer.