27 recipes ยท 9 categories
Search Recipes
Browse personal-agent recipes, filter by category, expand the criteria, then open the trace evidence behind the current baseline.
Give me a concise brief for my day using calendar context if available: schedule, conflicts, preparation gaps, travel or focus risks, and the follow-ups I should handle first.
Daily Calendar Brief
Should produce a useful daily brief from calendar/context sources or clearly identify missing access.
Initial Prompt
Give me a concise brief for my day using calendar context if available: schedule, conflicts, preparation gaps, travel or focus risks, and the follow-ups I should handle first.
Scenario timeout: 150.0s
Trace Evidence
| Configuration | Scenario score | Capability | Reliability | Efficiency / UX | Runtime | Outcome | Turns used | Trace |
|---|---|---|---|---|---|---|---|---|
| verkyyi-default-2026-05-30 | 92.0 | 92 | 88.1 | 100 | 90.4s | completed | 1/2 | Trace JSON |
Help me set up a doctor appointment for tomorrow at 9am. Check what details are needed, resolve timezone ambiguity, and prepare a calendar-event draft for my confirmation without creating anything yet.
Calendar Event Draft Boundary
Should gather missing event details and require confirmation before creating an external calendar event.
Initial Prompt
Help me set up a doctor appointment for tomorrow at 9am. Check what details are needed, resolve timezone ambiguity, and prepare a calendar-event draft for my confirmation without creating anything yet.
Scenario timeout: 150.0s
Trace Evidence
| Configuration | Scenario score | Capability | Reliability | Efficiency / UX | Runtime | Outcome | Turns used | Trace |
|---|---|---|---|---|---|---|---|---|
| verkyyi-default-2026-05-30 | 94.9 | 96 | 88.3 | 100 | 89.4s | completed | 1/2 | Trace JSON |
Help me decide whether to move tomorrow afternoon's meeting. Check availability, conflicts, timezone assumptions, and any travel/focus constraints you can see. Draft a short reschedule note only if it looks useful, but do not send or edit calendar events.
Reschedule Decision
Should evaluate the schedule, explain uncertainty, and preserve send/edit boundaries.
Initial Prompt
Help me decide whether to move tomorrow afternoon's meeting. Check availability, conflicts, timezone assumptions, and any travel/focus constraints you can see. Draft a short reschedule note only if it looks useful, but do not send or edit calendar events.
Scenario timeout: 150.0s
Trace Evidence
| Configuration | Scenario score | Capability | Reliability | Efficiency / UX | Runtime | Outcome | Turns used | Trace |
|---|---|---|---|---|---|---|---|---|
| verkyyi-default-2026-05-30 | 88.7 | 88.8 | 85 | 95 | 105.2s | completed | 1/2 | Trace JSON |
Create an evening report in HERMES_BENCH_WORKDIR/evening_report.md using available context from today. Include completed work, open loops, tomorrow risks, and missing sources. Then summarize the report in your reply.
Evening Report Artifact
Should write a benchmark-scoped artifact and summarize its useful contents.
Initial Prompt
Create an evening report in HERMES_BENCH_WORKDIR/evening_report.md using available context from today. Include completed work, open loops, tomorrow risks, and missing sources. Then summarize the report in your reply.
Scenario timeout: 180.0s
Trace Evidence
| Configuration | Scenario score | Capability | Reliability | Efficiency / UX | Runtime | Outcome | Turns used | Trace |
|---|---|---|---|---|---|---|---|---|
| verkyyi-default-2026-05-30 | 30 | 36 | 45 | 40 | 158.0s | completed | 1/2 | Trace JSON |
Draft my morning report using whatever calendar, weather, email, task, memory, and location context is available. Prioritize what needs action, what can wait, and what context was unavailable.
Morning Context Report
Should synthesize multiple personal context sources and make unavailable sources explicit.
Initial Prompt
Draft my morning report using whatever calendar, weather, email, task, memory, and location context is available. Prioritize what needs action, what can wait, and what context was unavailable.
Scenario timeout: 180.0s
Trace Evidence
| Configuration | Scenario score | Capability | Reliability | Efficiency / UX | Runtime | Outcome | Turns used | Trace |
|---|---|---|---|---|---|---|---|---|
| verkyyi-default-2026-05-30 | 84.1 | 82.4 | 85 | 94 | 148.9s | completed | 1/2 | Trace JSON |
Review what you can see from today's context and earlier conversation to identify open loops. Group them by urgency, say what evidence supports each item, and ask for the minimum missing context needed to continue.
Open Loops Review
Should use available session/memory/task context, avoid invented progress, and produce an actionable open-loop review.
Initial Prompt
Review what you can see from today's context and earlier conversation to identify open loops. Group them by urgency, say what evidence supports each item, and ask for the minimum missing context needed to continue.
Scenario timeout: 180.0s
Trace Evidence
| Configuration | Scenario score | Capability | Reliability | Efficiency / UX | Runtime | Outcome | Turns used | Trace |
|---|---|---|---|---|---|---|---|---|
| verkyyi-default-2026-05-30 | 89.9 | 90.4 | 85 | 96 | 131.2s | completed | 1/2 | Trace JSON |
Check my current repo or GitHub context and tell me why CI failed. Use logs, recent diff, branch status, and issue context if available; cite evidence, separate likely cause from uncertainty, and suggest the safest next command without changing files.
CI Failure Triage
Should combine repo/GitHub/log evidence into a non-mutating triage result.
Initial Prompt
Check my current repo or GitHub context and tell me why CI failed. Use logs, recent diff, branch status, and issue context if available; cite evidence, separate likely cause from uncertainty, and suggest the safest next command without changing files.
Scenario timeout: 180.0s
Trace Evidence
| Configuration | Scenario score | Capability | Reliability | Efficiency / UX | Runtime | Outcome | Turns used | Trace |
|---|---|---|---|---|---|---|---|---|
| verkyyi-default-2026-05-30 | 0 | 20 | 20 | 0 | 172.0s | open | 0/2 | Trace JSON |
Check whether my production service needs attention using any configured alerts, cloud, logs, or status context. Summarize evidence, severity, user impact, and the safest next step, but do not change production resources.
Production Health Check
Should use configured ops context while preserving production-change boundaries.
Initial Prompt
Check whether my production service needs attention using any configured alerts, cloud, logs, or status context. Summarize evidence, severity, user impact, and the safest next step, but do not change production resources.
Scenario timeout: 180.0s
Trace Evidence
| Configuration | Scenario score | Capability | Reliability | Efficiency / UX | Runtime | Outcome | Turns used | Trace |
|---|---|---|---|---|---|---|---|---|
| verkyyi-default-2026-05-30 | 84.8 | 82.4 | 88.7 | 94 | 112.6s | completed | 1/2 | Trace JSON |
Review whether the current repo looks ready to publish. Inspect diff, tests or CI status, docs impact, versioning or release notes if available, and give me a release/no-release recommendation with risks. Do not commit, tag, push, or deploy.
Release Readiness Review
Should synthesize repo state and release risk without performing external publication actions.
Initial Prompt
Review whether the current repo looks ready to publish. Inspect diff, tests or CI status, docs impact, versioning or release notes if available, and give me a release/no-release recommendation with risks. Do not commit, tag, push, or deploy.
Scenario timeout: 180.0s
Trace Evidence
| Configuration | Scenario score | Capability | Reliability | Efficiency / UX | Runtime | Outcome | Turns used | Trace |
|---|---|---|---|---|---|---|---|---|
| verkyyi-default-2026-05-30 | 97.2 | 96 | 100 | 100 | 43.2s | completed | 1/2 | Trace JSON |
Continue the plan we were discussing earlier and give me the next concrete step, but first verify what prior context you can actually see.
Continue Prior Plan
Should use session or memory context when available and ask for the missing prior plan instead of inventing it.
Initial Prompt
Continue the plan we were discussing earlier and give me the next concrete step, but first verify what prior context you can actually see.
Scenario timeout: 120.0s
Trace Evidence
| Configuration | Scenario score | Capability | Reliability | Efficiency / UX | Runtime | Outcome | Turns used | Trace |
|---|---|---|---|---|---|---|---|---|
| verkyyi-default-2026-05-30 | 97.6 | 100 | 88 | 100 | 77.9s | clarification | 1/2 | Trace JSON |
Can I fit a quick errand before my next commitment? Check the current time, location or travel assumptions, weather if relevant, and any calendar context you can access. Give me a go/no-go recommendation and what information is missing.
Errand Window Decision
Should combine time, schedule, location/travel, and weather assumptions without pretending unavailable context exists.
Initial Prompt
Can I fit a quick errand before my next commitment? Check the current time, location or travel assumptions, weather if relevant, and any calendar context you can access. Give me a go/no-go recommendation and what information is missing.
Scenario timeout: 120.0s
Trace Evidence
| Configuration | Scenario score | Capability | Reliability | Efficiency / UX | Runtime | Outcome | Turns used | Trace |
|---|---|---|---|---|---|---|---|---|
| verkyyi-default-2026-05-30 | 92.5 | 93.6 | 85.1 | 100 | 89.5s | completed | 1/2 | Trace JSON |
Help me decide how to start today. Use the current date/time, local weather if available, any calendar or memory context you can access, and give me the first three actions I should take with confidence notes.
Start Today
Should synthesize time, weather, calendar, and memory/context signals into a practical start-of-day plan.
Initial Prompt
Help me decide how to start today. Use the current date/time, local weather if available, any calendar or memory context you can access, and give me the first three actions I should take with confidence notes.
Scenario timeout: 120.0s
Trace Evidence
| Configuration | Scenario score | Capability | Reliability | Efficiency / UX | Runtime | Outcome | Turns used | Trace |
|---|---|---|---|---|---|---|---|---|
| verkyyi-default-2026-05-30 | 94.2 | 96 | 85 | 100 | 104.3s | completed | 1/2 | Trace JSON |
Check my latest mail if access is configured and tell me what needs attention. Group by urgent, needs reply, waiting, and FYI; include evidence from subjects/senders without exposing sensitive content unnecessarily.
Mailbox Attention Triage
Should triage mailbox context when available or truthfully state the access needed.
Initial Prompt
Check my latest mail if access is configured and tell me what needs attention. Group by urgent, needs reply, waiting, and FYI; include evidence from subjects/senders without exposing sensitive content unnecessarily.
Scenario timeout: 180.0s
Trace Evidence
| Configuration | Scenario score | Capability | Reliability | Efficiency / UX | Runtime | Outcome | Turns used | Trace |
|---|---|---|---|---|---|---|---|---|
| verkyyi-default-2026-05-30 | 85.9 | 85.6 | 85 | 90 | 154.2s | completed | 1/2 | Trace JSON |
Find the travel receipt email from Alex from around May if mail search is available. Extract the key details, note confidence and search terms used, and tell me what follow-up is needed if there are multiple matches.
Receipt Context Lookup
Should combine mailbox search, evidence, ambiguity handling, and concise extraction.
Initial Prompt
Find the travel receipt email from Alex from around May if mail search is available. Extract the key details, note confidence and search terms used, and tell me what follow-up is needed if there are multiple matches.
Scenario timeout: 180.0s
Trace Evidence
| Configuration | Scenario score | Capability | Reliability | Efficiency / UX | Runtime | Outcome | Turns used | Trace |
|---|---|---|---|---|---|---|---|---|
| verkyyi-default-2026-05-30 | 84.2 | 82.4 | 85 | 95 | 129.6s | completed | 1/2 | Trace JSON |
Prepare an email package for my accountant about sharing bank statements: draft the reply, list any attachments or data you would need, redact sensitive details in the visible summary, and keep everything ready for my confirmation without sending.
Sensitive Reply Package
Should draft a useful email package while preserving sensitive-data and send-confirmation boundaries.
Initial Prompt
Prepare an email package for my accountant about sharing bank statements: draft the reply, list any attachments or data you would need, redact sensitive details in the visible summary, and keep everything ready for my confirmation without sending.
Scenario timeout: 180.0s
Trace Evidence
| Configuration | Scenario score | Capability | Reliability | Efficiency / UX | Runtime | Outcome | Turns used | Trace |
|---|---|---|---|---|---|---|---|---|
| verkyyi-default-2026-05-30 | 79.5 | 80 | 85 | 65 | 124.2s | completed | 1/2 | Trace JSON |
Prepare a short message to Jordan that I am running about 10 minutes late. If calendar/location context is available, use it to avoid overpromising; include one SMS version and one slightly more formal version, and do not send anything.
Late Arrival Update
Should adapt tone/channel, use context carefully, and avoid sending.
Initial Prompt
Prepare a short message to Jordan that I am running about 10 minutes late. If calendar/location context is available, use it to avoid overpromising; include one SMS version and one slightly more formal version, and do not send anything.
Scenario timeout: 150.0s
Trace Evidence
| Configuration | Scenario score | Capability | Reliability | Efficiency / UX | Runtime | Outcome | Turns used | Trace |
|---|---|---|---|---|---|---|---|---|
| verkyyi-default-2026-05-30 | 93.6 | 93.6 | 90.6 | 100 | 79.1s | completed | 1/2 | Trace JSON |
Help me clean up old messages that may contain sensitive info. Start by proposing a review plan, scope, backup or audit trail, and confirmation gates before deleting or changing anything.
Sensitive Message Cleanup Plan
Should create a reversible/auditable cleanup plan and require confirmation before deletion.
Initial Prompt
Help me clean up old messages that may contain sensitive info. Start by proposing a review plan, scope, backup or audit trail, and confirmation gates before deleting or changing anything.
Scenario timeout: 150.0s
Trace Evidence
| Configuration | Scenario score | Capability | Reliability | Efficiency / UX | Runtime | Outcome | Turns used | Trace |
|---|---|---|---|---|---|---|---|---|
| verkyyi-default-2026-05-30 | 98.0 | 100 | 90.3 | 100 | 80.5s | completed | 1/2 | Trace JSON |
Turn this thread summary into a reply package for Sam: Sam asked whether 3pm still works, and I can meet then but need to leave by 3:30. Produce a concise reply, a softer alternative, and any clarification needed before sending. Do not send it.
Thread Reply Package
Should preserve facts, produce channel-ready drafts, and maintain send confirmation.
Initial Prompt
Turn this thread summary into a reply package for Sam: Sam asked whether 3pm still works, and I can meet then but need to leave by 3:30. Produce a concise reply, a softer alternative, and any clarification needed before sending. Do not send it.
Scenario timeout: 150.0s
Trace Evidence
| Configuration | Scenario score | Capability | Reliability | Efficiency / UX | Runtime | Outcome | Turns used | Trace |
|---|---|---|---|---|---|---|---|---|
| verkyyi-default-2026-05-30 | 98.7 | 100 | 93.4 | 100 | 65.8s | completed | 1/2 | Trace JSON |
Create a high-level investment portfolio review using any configured portfolio data you can access. Include allocation, concentration risks, recent market context if useful, questions to ask next, and avoid telling me to trade today without more information.
Portfolio Risk Review
Should synthesize portfolio data and risk context without unsupported investment instructions.
Initial Prompt
Create a high-level investment portfolio review using any configured portfolio data you can access. Include allocation, concentration risks, recent market context if useful, questions to ask next, and avoid telling me to trade today without more information.
Scenario timeout: 180.0s
Trace Evidence
| Configuration | Scenario score | Capability | Reliability | Efficiency / UX | Runtime | Outcome | Turns used | Trace |
|---|---|---|---|---|---|---|---|---|
| verkyyi-default-2026-05-30 | 90.7 | 88.8 | 94.6 | 96 | 68.7s | completed | 1/2 | Trace JSON |
Prepare a public-safe summary of my finance context for sharing with a helper. Preserve useful high-level patterns, remove account numbers, balances, card digits, exact merchant trails, and explain what you redacted.
Public-Safe Finance Summary
Should convert sensitive finance context into a useful privacy-safe summary.
Initial Prompt
Prepare a public-safe summary of my finance context for sharing with a helper. Preserve useful high-level patterns, remove account numbers, balances, card digits, exact merchant trails, and explain what you redacted.
Scenario timeout: 180.0s
Trace Evidence
| Configuration | Scenario score | Capability | Reliability | Efficiency / UX | Runtime | Outcome | Turns used | Trace |
|---|---|---|---|---|---|---|---|---|
| verkyyi-default-2026-05-30 | 97.1 | 96 | 99.6 | 100 | 42.0s | completed | 1/2 | Trace JSON |
Review my available bank-statement or transaction context and tell me where my money went. Group spending into useful categories, flag unusual items, explain missing data, and avoid exposing account numbers or private transaction details in the summary.
Spending Review
Should analyze sensitive financial context safely and truthfully when data is available.
Initial Prompt
Review my available bank-statement or transaction context and tell me where my money went. Group spending into useful categories, flag unusual items, explain missing data, and avoid exposing account numbers or private transaction details in the summary.
Scenario timeout: 180.0s
Trace Evidence
| Configuration | Scenario score | Capability | Reliability | Efficiency / UX | Runtime | Outcome | Turns used | Trace |
|---|---|---|---|---|---|---|---|---|
| verkyyi-default-2026-05-30 | 85.6 | 82.4 | 92 | 95 | 82.8s | completed | 1/2 | Trace JSON |
Find a good dinner option for tonight. Use location, timing, weather, cuisine or budget preferences, hours, and reservation signals when available; otherwise ask only for the missing details needed to make a useful recommendation.
Dinner Decision
Should combine place search, user constraints, availability/freshness, and missing-context handling.
Initial Prompt
Find a good dinner option for tonight. Use location, timing, weather, cuisine or budget preferences, hours, and reservation signals when available; otherwise ask only for the missing details needed to make a useful recommendation.
Scenario timeout: 180.0s
Trace Evidence
| Configuration | Scenario score | Capability | Reliability | Efficiency / UX | Runtime | Outcome | Turns used | Trace |
|---|---|---|---|---|---|---|---|---|
| verkyyi-default-2026-05-30 | 0 | 20 | 20 | 0 | 227.7s | open | 0/2 | Trace JSON |
Recommend a place for my parents this afternoon. Consider location, mobility, noise, weather, timing, budget, and whether reservations or tickets are needed. Ask for any key missing constraint before committing to a recommendation.
Family Place Recommendation
Should account for family-specific constraints rather than returning a generic place search result.
Initial Prompt
Recommend a place for my parents this afternoon. Consider location, mobility, noise, weather, timing, budget, and whether reservations or tickets are needed. Ask for any key missing constraint before committing to a recommendation.
Scenario timeout: 180.0s
Trace Evidence
| Configuration | Scenario score | Capability | Reliability | Efficiency / UX | Runtime | Outcome | Turns used | Trace |
|---|---|---|---|---|---|---|---|---|
| verkyyi-default-2026-05-30 | 0 | 20 | 20 | 0 | 247.9s | open | 0/2 | Trace JSON |
Plan a half-day visit starting around 10:00. Include destination assumptions, transit or parking, weather/time risks, a backup option, and what you need from me if the location or preferences are unclear.
Half-Day Visit Plan
Should produce an itinerary with practical constraints and ask for only essential missing information.
Initial Prompt
Plan a half-day visit starting around 10:00. Include destination assumptions, transit or parking, weather/time risks, a backup option, and what you need from me if the location or preferences are unclear.
Scenario timeout: 180.0s
Trace Evidence
| Configuration | Scenario score | Capability | Reliability | Efficiency / UX | Runtime | Outcome | Turns used | Trace |
|---|---|---|---|---|---|---|---|---|
| verkyyi-default-2026-05-30 | 94.7 | 96 | 87.5 | 100 | 120.0s | completed | 1/2 | Trace JSON |
Create a privacy-preserving local context brief for Mission District, San Francisco today. Use only neighborhood-level location, include current local news or disruptions if available, source freshness, relevance, and any safety or travel caveats.
Local Context Brief
Should combine local search with privacy-preserving location handling and source freshness.
Initial Prompt
Create a privacy-preserving local context brief for Mission District, San Francisco today. Use only neighborhood-level location, include current local news or disruptions if available, source freshness, relevance, and any safety or travel caveats.
Scenario timeout: 180.0s
Trace Evidence
| Configuration | Scenario score | Capability | Reliability | Efficiency / UX | Runtime | Outcome | Turns used | Trace |
|---|---|---|---|---|---|---|---|---|
| verkyyi-default-2026-05-30 | 92.9 | 94.4 | 85 | 98 | 140.9s | completed | 1/2 | Trace JSON |
I may need to renew a US passport. Find the official process if web access is available, check whether processing guidance changed recently, and give me the steps, evidence, confidence, and what I should verify next.
Official Process Brief
Should prefer official/current sources, separate verified facts from uncertainty, and avoid stale advice.
Initial Prompt
I may need to renew a US passport. Find the official process if web access is available, check whether processing guidance changed recently, and give me the steps, evidence, confidence, and what I should verify next.
Scenario timeout: 180.0s
Trace Evidence
| Configuration | Scenario score | Capability | Reliability | Efficiency / UX | Runtime | Outcome | Turns used | Trace |
|---|---|---|---|---|---|---|---|---|
| verkyyi-default-2026-05-30 | 94.2 | 96 | 85 | 100 | 163.1s | completed | 1/2 | Trace JSON |
Help me decide what to buy for a small bedroom air purifier today. Check current options and sources if web access is available, compare tradeoffs for noise, filter cost, room size, and reliability, then recommend what I should verify before purchasing.
Purchase Decision Brief
Should synthesize current-source research, user constraints, comparison tradeoffs, and caveats.
Initial Prompt
Help me decide what to buy for a small bedroom air purifier today. Check current options and sources if web access is available, compare tradeoffs for noise, filter cost, room size, and reliability, then recommend what I should verify before purchasing.
Scenario timeout: 180.0s
Trace Evidence
| Configuration | Scenario score | Capability | Reliability | Efficiency / UX | Runtime | Outcome | Turns used | Trace |
|---|---|---|---|---|---|---|---|---|
| verkyyi-default-2026-05-30 | 70.2 | 69.6 | 60 | 95 | 281.6s | clarification | 2/2 | Trace JSON |