qwen3.6-35b-a3b
2026-05-14T17-08-03Z_qwen_qwen3.6-35b-a3b_paperwork_workflow_case06
50% Practical score
0/1 Resolved tasks
1/1 Core tasks
Verdict
Practical run result: near miss
Core 1/1 · Visible 1/1 · Hidden 0/1
missing_or_wrong_evidence
Prompt set v3_paperwork_workflow_case06.json
Run type local
Temperature 0
Top P 1
Missed tasks 1/1
Task Results
| Task | Category | Score | Failed Checks | Failure Types | Prompt / Answer |
|---|---|---|---|---|---|
| remittance_split_case_06 | paperwork | 3/4 | hidden_oracle_pass | missing_or_wrong_evidence | Prompt
Audit the synthetic remittance split intake folder in `incoming/`. Write these files:
- `work/attachment_index.json`
- `work/final_document_set.json`
- `work/normalized/INV-3301.txt`
- `work/normalized/INV-3302.txt`
- `audit_result.json`
- `proof.txt`
Use `incoming/README_TASK.md` as the full task specification. Use the generated image attachments, the AP note, bank exports, vendor master, and purchase orders.
Important:
- Identify the active bank export and ignore the draft export.
- Use remittance advice RA-771 to split the single bank payment across two final invoices.
- Do not treat the proforma estimate as an invoice.
- Do not modify anything under `incoming/`.
- This is benchmark bookkeeping, not financial, tax, or legal advice.
- `proof.txt` must contain only the numeric proof code.
Workspace files available to the model
README_TASK.md
# Paperwork Workflow Case 06: Remittance Split
Audit this synthetic intake folder. This is benchmark bookkeeping, not financial or tax advice.
You must not modify anything under incoming/.
Case ID for audit_result.json: P3-WORK-06
Steps:
1. Identify active source files and ignore drafts, old exports, and non-invoice distractors.
2. Use the remittance advice to map the single bank payment to the final invoices.
3. Create work/attachment_index.json.
4. Create work/final_document_set.json.
5. Create normalized invoice text files under work/normalized/.
6. Create audit_result.json.
7. Create proof.txt containing only the final proof_code.
Required audit_result.json keys:
case_id, approved_invoice_ids, review_invoice_ids, reject_invoice_ids,
ignored_document_ids, total_approved_gross_cents, warnings_by_invoice,
evidence, proof_code
Required audit_result.json values and formats:
- case_id must be "P3-WORK-06"
- ignored_document_ids must be ["PRO-3303"]
- warnings_by_invoice must use short warning codes only, not prose sentences
- evidence must be an array of source path strings, not a nested object
Proof code formula:
total_approved_gross_cents
+ numeric parts of approved/review/reject invoice IDs
+ numeric part of the active remittance batch
+ 97 * total warning count
Use invoice IDs only for approved/review/reject invoice IDs. Do not include proforma or remittance document IDs in the invoice-ID sum.
Required work/attachment_index.json shape:
{"attachments":[{"attachment_path":"...","document_id":"...","document_type":"...","use":"..."}]}
Required work/final_document_set.json shape:
{
"active_bank_export": "...",
"active_remittance_batch": "...",
"final_invoice_ids": [...],
"ignored_document_ids": [...],
"ignored_source_files": [...],
"payment_allocations": [{"invoice_id":"...","gross_total_cents":123}]
}
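The proof code formula in README_TASK.md can be checked by hand against the values that appear elsewhere in this report. The following is a minimal sketch, not the benchmark's own implementation: the helper names `numeric_part` and `proof_code` are hypothetical, and it assumes that "numeric part" means the first run of digits in an ID.

```python
import re


def numeric_part(identifier: str) -> int:
    """Return the first run of digits in an ID such as 'INV-3301' (assumed rule)."""
    match = re.search(r"\d+", identifier)
    if match is None:
        raise ValueError(f"no numeric part in {identifier!r}")
    return int(match.group())


def proof_code(total_gross_cents, invoice_ids, remittance_batch, warnings_by_invoice):
    """Apply the README formula: total + invoice ID numbers + batch number + 97 per warning."""
    invoice_sum = sum(numeric_part(invoice_id) for invoice_id in invoice_ids)
    warning_count = sum(len(codes) for codes in warnings_by_invoice.values())
    return (
        total_gross_cents
        + invoice_sum
        + numeric_part(remittance_batch)
        + 97 * warning_count
    )


# Values as reported in this run: allocations from RA-771, batch from the final bank export.
allocations = {"INV-3301": 18490, "INV-3302": 11240}
total = sum(allocations.values())  # 29730, equal to the BATCH-771 amount in bank_export_final.csv

code = proof_code(total, list(allocations), "BATCH-771", {"INV-3301": [], "INV-3302": []})
print(code)  # 37104
```

With no warnings the last term is zero, so the result is 29730 + 3301 + 3302 + 771 = 37104, the value recorded in both `proof.txt` and the oracle.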
ap_notes.txt
AP intake note, May batch:
- Use bank_export_final.csv, not bank_export_draft.csv.
- Vendor Bright Coast Office Supplies is active.
- Remittance advice RA-771 explains payment BATCH-771.
- The proforma is a quote-style document and should not be treated as an invoice.
bank_export_draft.csv
date,description,amount_cents
2026-05-07,DRAFT BATCH 771 DO NOT USE,30000
bank_export_final.csv
date,description,amount_cents
2026-05-11,BATCH-771 BRIGHT COAST REMITTANCE,29730
purchase_orders.csv
po_id,vendor_id,status,approved_invoice_id
PO-9001,V-BRIGHT,approved,INV-3301
PO-9002,V-BRIGHT,approved,INV-3302
PO-9003,V-BRIGHT,draft,PRO-3303
vendor_master.csv
vendor_id,vendor_name,status
V-BRIGHT,Bright Coast Office Supplies,active
V-FAKE,Fake Sample Vendor,inactive
Input image
Agent actions
list_files -> read_file -> read_file -> mkdir -> write_file -> read_file -> read_file -> write_file -> write_file -> write_file -> write_file -> write_file -> finish
Model answer
## audit_result.json
{
"case_id": "P3-WORK-06",
"approved_invoice_ids": ["INV-3301", "INV-3302"],
"review_invoice_ids": [],
"reject_invoice_ids": [],
"ignored_document_ids": ["PRO-3303"],
"total_approved_gross_cents": 29730,
"warnings_by_invoice": {},
"evidence": ["incoming/bank_export_final.csv", "incoming/attachments/remittance_advice.png", "incoming/attachments/invoice_3301.png", "incoming/attachments/invoice_3302.png"],
"proof_code": 37104
}
## proof.txt
37104
## work/attachment_index.json
{"attachments":[{"attachment_path":"incoming/attachments/invoice_3301.png","document_id":"INV-3301","document_type":"invoice","use":"approved"},{"attachment_path":"incoming/attachments/invoice_3302.png","document_id":"INV-3302","document_type":"invoice","use":"approved"},{"attachment_path":"incoming/attachments/proforma_3303.png","document_id":"PRO-3303","document_type":"proforma","use":"ignored"},{"attachment_path":"incoming/attachments/remittance_advice.png","document_id":"RA-771","document_type":"remittance_advice","use":"allocation_source"}]}
## work/final_document_set.json
{
"active_bank_export": "incoming/bank_export_final.csv",
"active_remittance_batch": "BATCH-771",
"final_invoice_ids": ["INV-3301", "INV-3302"],
"ignored_document_ids": ["PRO-3303"],
"ignored_source_files": ["incoming/bank_export_draft.csv"],
"payment_allocations": [
{"invoice_id": "INV-3301", "gross_total_cents": 18490},
{"invoice_id": "INV-3302", "gross_total_cents": 11240}
]
}
## work/normalized/INV-3301.txt
Invoice ID: INV-3301
Vendor ID: V-BRIGHT
PO: PO-9001
Net: 15537 cents
Tax: 2953 cents
Gross total: 18490 cents
Paid through remittance batch BATCH-771
Allocation listed on RA-771
## work/normalized/INV-3302.txt
Invoice ID: INV-3302
Vendor ID: V-BRIGHT
PO: PO-9002
Net: 9445 cents
Tax: 1795 cents
Gross total: 11240 cents
Paid through remittance batch BATCH-771
Allocation listed on RA-771
## verify_output.json
{
"ok": false,
"core_ok": true,
"visible_ok": true,
"failure_types": [
"missing_or_wrong_evidence"
],
"checks": [
{
"check": "audit_result_exists",
"passed": true,
"message": "Expected final audit_result.json with exact keys."
},
{
"check": "visible_checks_pass",
"passed": true,
"message": "Expected required artifacts and unchanged incoming/ sources."
},
{
"check": "core_oracle_pass",
"passed": true,
"message": "Expected core audit fields to match the oracle."
},
{
"check": "hidden_oracle_pass",
"passed": false,
"message": "Expected exact audit, artifacts, proof.txt, and unchanged sources."
}
],
"audit_error": "",
"actual": {
"case_id": "P3-WORK-06",
"approved_invoice_ids": [
"INV-3301",
"INV-3302"
],
"review_invoice_ids": [],
"reject_invoice_ids": [],
"ignored_document_ids": [
"PRO-3303"
],
"total_approved_gross_cents": 29730,
"warnings_by_invoice": {},
"evidence": [
"incoming/bank_export_final.csv",
"incoming/attachments/remittance_advice.png",
"incoming/attachments/invoice_3301.png",
"incoming/attachments/invoice_3302.png"
],
"proof_code": 37104
},
"expected": {
"approved_invoice_ids": [
"INV-3301",
"INV-3302"
],
"case_id": "P3-WORK-06",
"evidence": [
"incoming/attachments/invoice_3301.png",
"incoming/attachments/invoice_3302.png",
"incoming/attachments/remittance_advice.png",
"incoming/bank_export_final.csv",
"incoming/purchase_orders.csv",
"incoming/vendor_master.csv"
],
"ignored_document_ids": [
"PRO-3303"
],
"proof_code": "37104",
"reject_invoice_ids": [],
"review_invoice_ids": [],
"total_approved_gross_cents": 29730,
"warnings_by_invoice": {
"INV-3301": [],
"INV-3302": []
}
}
}
|
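To see where the model's `audit_result.json` departs from the oracle, the `actual` and `expected` objects from `verify_output.json` can be compared field by field. The sketch below is an illustration only: the verifier's real comparison rules are not shown in this report, so strict Python equality is an assumption, and `diff_fields` and `missing_evidence` are hypothetical helper names.

```python
def diff_fields(actual: dict, expected: dict) -> dict:
    """Map each key whose values differ to an (actual, expected) pair, using strict equality."""
    keys = sorted(set(actual) | set(expected))
    return {
        key: (actual.get(key), expected.get(key))
        for key in keys
        if actual.get(key) != expected.get(key)
    }


def missing_evidence(actual: dict, expected: dict) -> list:
    """Return expected evidence paths absent from the actual list, ignoring order."""
    return sorted(set(expected["evidence"]) - set(actual["evidence"]))


# Copied from the "actual" and "expected" objects in verify_output.json.
shared = {
    "case_id": "P3-WORK-06",
    "approved_invoice_ids": ["INV-3301", "INV-3302"],
    "review_invoice_ids": [],
    "reject_invoice_ids": [],
    "ignored_document_ids": ["PRO-3303"],
    "total_approved_gross_cents": 29730,
}
actual = {
    **shared,
    "warnings_by_invoice": {},
    "evidence": [
        "incoming/bank_export_final.csv",
        "incoming/attachments/remittance_advice.png",
        "incoming/attachments/invoice_3301.png",
        "incoming/attachments/invoice_3302.png",
    ],
    "proof_code": 37104,
}
expected = {
    **shared,
    "warnings_by_invoice": {"INV-3301": [], "INV-3302": []},
    "evidence": [
        "incoming/attachments/invoice_3301.png",
        "incoming/attachments/invoice_3302.png",
        "incoming/attachments/remittance_advice.png",
        "incoming/bank_export_final.csv",
        "incoming/purchase_orders.csv",
        "incoming/vendor_master.csv",
    ],
    "proof_code": "37104",
}

print(list(diff_fields(actual, expected)))
# ['evidence', 'proof_code', 'warnings_by_invoice']
print(missing_evidence(actual, expected))
# ['incoming/purchase_orders.csv', 'incoming/vendor_master.csv']
```

Under strict equality three fields differ: `evidence` lacks `incoming/purchase_orders.csv` and `incoming/vendor_master.csv`, `warnings_by_invoice` is empty rather than keyed by invoice, and `proof_code` is a number where the oracle holds a string. The only failure type reported is `missing_or_wrong_evidence`, which matches the two missing evidence paths; whether the other two differences count against the run cannot be determined from this output.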