gemma-4-26b-a4b

2026-05-15T08-19-05Z_google_gemma-4-26b-a4b_paperwork_workflow_case06

50% Practical score
0/1 Resolved tasks
1/1 Core tasks

Verdict

practical run result
near miss

Core 1/1 · Visible 1/1 · Hidden 0/1

normalized_text_error, proof_code_error, proof_txt_error
Prompt set v3_paperwork_workflow_case06.json
Run type local
Temperature 0
Top P 1
Missed tasks 1/1

Task Results

| Task | Category | Score | Failed Checks | Failure Types |
| --- | --- | --- | --- | --- |
| remittance_split_case_06 | paperwork | 3/4 | hidden_oracle_pass | normalized_text_error, proof_code_error, proof_txt_error |

Prompt

Audit the synthetic remittance split intake folder in `incoming/`.

Write these files:
- `work/attachment_index.json`
- `work/final_document_set.json`
- `work/normalized/INV-3301.txt`
- `work/normalized/INV-3302.txt`
- `audit_result.json`
- `proof.txt`

Use `incoming/README_TASK.md` as the full task specification. Use the generated image attachments, the AP note, bank exports, vendor master, and purchase orders.

Important:
- Identify the active bank export and ignore the draft export.
- Use remittance advice RA-771 to split the single bank payment across two final invoices.
- Do not treat the proforma estimate as an invoice.
- Do not modify anything under `incoming/`.
- This is benchmark bookkeeping, not financial, tax, or legal advice.
- `proof.txt` must contain only the numeric proof code.

Workspace files available to the model

README_TASK.md
# Paperwork Workflow Case 06: Remittance Split

Audit this synthetic intake folder. This is benchmark bookkeeping, not financial or tax advice.

You must not modify anything under incoming/.

Case ID for audit_result.json: P3-WORK-06

Steps:
1. Identify active source files and ignore drafts, old exports, and non-invoice distractors.
2. Use the remittance advice to map the single bank payment to the final invoices.
3. Create work/attachment_index.json.
4. Create work/final_document_set.json.
5. Create normalized invoice text files under work/normalized/.
6. Create audit_result.json.
7. Create proof.txt containing only the final proof_code.

Required audit_result.json keys:
case_id, approved_invoice_ids, review_invoice_ids, reject_invoice_ids,
ignored_document_ids, total_approved_gross_cents, warnings_by_invoice,
evidence, proof_code

Required audit_result.json values and formats:
- case_id must be "P3-WORK-06"
- ignored_document_ids must be ["PRO-3303"]
- warnings_by_invoice must use short warning codes only, not prose sentences
- evidence must be an array of source path strings, not a nested object

Proof code formula:
total_approved_gross_cents
+ numeric parts of approved/review/reject invoice IDs
+ numeric part of the active remittance batch
+ 97 * total warning count

Use invoice IDs only for approved/review/reject invoice IDs. Do not include proforma or remittance document IDs in the invoice-ID sum.

Required work/attachment_index.json shape:
{"attachments":[{"attachment_path":"...","document_id":"...","document_type":"...","use":"..."}]}

Required work/final_document_set.json shape:
{
  "active_bank_export": "...",
  "active_remittance_batch": "...",
  "final_invoice_ids": [...],
  "ignored_document_ids": [...],
  "ignored_source_files": [...],
  "payment_allocations": [{"invoice_id":"...","gross_total_cents":123}]
}
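
The following note is not part of README_TASK.md. It is a minimal sketch of the proof code formula from that file, evaluated on this case's values: the 29730-cent payment in `bank_export_final.csv`, invoices INV-3301 and INV-3302, batch BATCH-771, and no warnings. The helper names `numeric_part` and `proof_code` are hypothetical, and reading "numeric part" as "all digits in the ID" is an assumption.

```python
import re


def numeric_part(identifier: str) -> int:
    """Return the digits of an ID such as 'INV-3301' or 'BATCH-771' as an int."""
    digits = re.sub(r"\D", "", identifier)
    if not digits:
        raise ValueError(f"no numeric part in {identifier!r}")
    return int(digits)


def proof_code(total_approved_gross_cents, invoice_ids, remittance_batch,
               warnings_by_invoice):
    """Apply the README formula; invoice_ids covers approved, review, and reject IDs."""
    warning_count = sum(len(codes) for codes in warnings_by_invoice.values())
    return (
        total_approved_gross_cents
        + sum(numeric_part(invoice_id) for invoice_id in invoice_ids)
        + numeric_part(remittance_batch)
        + 97 * warning_count
    )


code = proof_code(
    total_approved_gross_cents=29730,
    invoice_ids=["INV-3301", "INV-3302"],
    remittance_batch="BATCH-771",
    warnings_by_invoice={"INV-3301": [], "INV-3302": []},
)
print(code)  # 29730 + 3301 + 3302 + 771 + 97 * 0 = 37104
```

Under these assumptions the formula gives 37104, which is the value the oracle expects for this case.
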
ap_notes.txt
AP intake note, May batch:
- Use bank_export_final.csv, not bank_export_draft.csv.
- Vendor Bright Coast Office Supplies is active.
- Remittance advice RA-771 explains payment BATCH-771.
- The proforma is a quote-style document and should not be treated as an invoice.
bank_export_draft.csv
date,description,amount_cents
2026-05-07,DRAFT BATCH 771 DO NOT USE,30000
bank_export_final.csv
date,description,amount_cents
2026-05-11,BATCH-771 BRIGHT COAST REMITTANCE,29730
purchase_orders.csv
po_id,vendor_id,status,approved_invoice_id
PO-9001,V-BRIGHT,approved,INV-3301
PO-9002,V-BRIGHT,approved,INV-3302
PO-9003,V-BRIGHT,draft,PRO-3303
vendor_master.csv
vendor_id,vendor_name,status
V-BRIGHT,Bright Coast Office Supplies,active
V-FAKE,Fake Sample Vendor,inactive

Input images (remittance_split_case_06)

- invoice_3301.png
- invoice_3302.png
- proforma_3303.png
- remittance_advice.png

Agent actions

list_files -> read_file -> read_file -> read_file -> read_file -> read_file -> mkdir -> write_file -> write_file -> write_file -> write_file -> write_file -> write_file -> run_checks -> finish

Model answer

## audit_result.json

{"case_id":"P3-WORK-06","approved_invoice_ids":["INV-3301","INV-3302"],"review_invoice_ids":[],"reject_invoice_ids":[],"ignored_document_ids":["PRO-3303"],"total_approved_gross_cents":29730,"warnings_by_invoice":{},"evidence":["incoming/attachments/invoice_3301.png","incoming/attachments/invoice_3302.png","incoming/attachments/proforma_3303.png","incoming/attachments/remittance_advice.png","incoming/bank_export_final.csv","incoming/ap_notes.txt","incoming/purchase_orders.csv","incoming/vendor_master.csv"],"proof_code":30147}

## proof.txt

30147

## work/attachment_index.json

{"attachments":[{"attachment_path":"incoming/attachments/invoice_3301.png","document_id":"INV-3301","document_type":"invoice","use":"primary"},{"attachment_path":"incoming/attachments/invoice_3302.png","document_id":"INV-3302","document_type":"invoice","use":"primary"},{"attachment_path":"incoming/attachments/proforma_3303.png","document_id":"PRO-3303","document_type":"proforma","use":"ignore"},{"attachment_path":"incoming/attachments/remittance_advice.png","document_id":"RA-771","document_type":"remittance_advice","use":"allocation"}]}

## work/final_document_set.json

{"active_bank_export":"incoming/bank_export_final.csv","active_remittance_batch":"BATCH-771","final_invoice_ids":["INV-3301","INV-3302"],"ignored_document_ids":["PRO-3303"],"ignored_source_files":["incoming/bank_export_draft.csv"],"payment_allocations":[{"invoice_id":"INV-3301","gross_total_cents":18490},{"invoice_id":"INV-3302","gross_total_cents":11240}]}

## work/normalized/INV-3301.txt

Bright Coast Office Supplies
Invoice ID: INV-3301
Vendor ID: V-BRIGHT
PO: PO-9001
Net: 15537 cents
Tax: 2953 cents
Gross total: 18490 cents

## work/normalized/INV-3302.txt

Bright Coast Office Supplies
Invoice ID: INV-3302
Vendor ID: V-BRIGHT
PO: PO-9002
Net: 9445 cents
Tax: 1795 cents
Gross total: 11240 cents
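
As a quick consistency check on the figures in the model's normalized files, the sketch below verifies that net plus tax equals gross for each invoice and that the two gross totals add up to the 29730-cent payment in `bank_export_final.csv`. The function name `check_split` and the dictionary layout are hypothetical and not part of the benchmark harness.

```python
# Figures copied from the model's normalized invoice files (all in cents).
invoices = {
    "INV-3301": {"net": 15537, "tax": 2953, "gross": 18490},
    "INV-3302": {"net": 9445, "tax": 1795, "gross": 11240},
}
BANK_PAYMENT_CENTS = 29730  # BATCH-771 row in bank_export_final.csv


def check_split(invoices: dict, payment_cents: int) -> list:
    """Return a list of human-readable problems; an empty list means consistent."""
    problems = []
    for invoice_id, fields in invoices.items():
        if fields["net"] + fields["tax"] != fields["gross"]:
            problems.append(f"{invoice_id}: net + tax != gross")
    allocated = sum(fields["gross"] for fields in invoices.values())
    if allocated != payment_cents:
        problems.append(f"allocations sum to {allocated}, payment is {payment_cents}")
    return problems


problems = check_split(invoices, BANK_PAYMENT_CENTS)
print(problems or "split is arithmetically consistent")
```

The check passes on these numbers, so the amounts in the normalized files are internally consistent. The expected normalized text is not shown in this report, so the cause of the `normalized_text_error` cannot be confirmed from it.
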

## verify_output.json

{
  "ok": false,
  "core_ok": true,
  "visible_ok": true,
  "failure_types": [
    "normalized_text_error",
    "proof_code_error",
    "proof_txt_error"
  ],
  "checks": [
    {
      "check": "audit_result_exists",
      "passed": true,
      "message": "Expected final audit_result.json with exact keys."
    },
    {
      "check": "visible_checks_pass",
      "passed": true,
      "message": "Expected required artifacts and unchanged incoming/ sources."
    },
    {
      "check": "core_oracle_pass",
      "passed": true,
      "message": "Expected core audit fields to match the oracle."
    },
    {
      "check": "hidden_oracle_pass",
      "passed": false,
      "message": "Expected exact audit, artifacts, proof.txt, and unchanged sources."
    }
  ],
  "audit_error": "",
  "actual": {
    "case_id": "P3-WORK-06",
    "approved_invoice_ids": [
      "INV-3301",
      "INV-3302"
    ],
    "review_invoice_ids": [],
    "reject_invoice_ids": [],
    "ignored_document_ids": [
      "PRO-3303"
    ],
    "total_approved_gross_cents": 29730,
    "warnings_by_invoice": {},
    "evidence": [
      "incoming/attachments/invoice_3301.png",
      "incoming/attachments/invoice_3302.png",
      "incoming/attachments/proforma_3303.png",
      "incoming/attachments/remittance_advice.png",
      "incoming/bank_export_final.csv",
      "incoming/ap_notes.txt",
      "incoming/purchase_orders.csv",
      "incoming/vendor_master.csv"
    ],
    "proof_code": 30147
  },
  "expected": {
    "approved_invoice_ids": [
      "INV-3301",
      "INV-3302"
    ],
    "case_id": "P3-WORK-06",
    "evidence": [
      "incoming/attachments/invoice_3301.png",
      "incoming/attachments/invoice_3302.png",
      "incoming/attachments/remittance_advice.png",
      "incoming/bank_export_final.csv",
      "incoming/purchase_orders.csv",
      "incoming/vendor_master.csv"
    ],
    "ignored_document_ids": [
      "PRO-3303"
    ],
    "proof_code": "37104",
    "reject_invoice_ids": [],
    "review_invoice_ids": [],
    "total_approved_gross_cents": 29730,
    "warnings_by_invoice": {
      "INV-3301": [],
      "INV-3302": []
    }
  }
}
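
To see which audit fields diverge from the oracle, the `actual` and `expected` objects above can be compared key by key. The sketch below is a minimal illustration. The helper `diff_fields` is hypothetical, and the fields that match in both objects (`case_id`, the invoice ID lists, `ignored_document_ids`, `total_approved_gross_cents`) are left out for brevity.

```python
def diff_fields(actual: dict, expected: dict) -> dict:
    """Map each key whose values differ to an (actual, expected) pair."""
    keys = sorted(set(actual) | set(expected))
    return {
        key: (actual.get(key), expected.get(key))
        for key in keys
        if actual.get(key) != expected.get(key)
    }


# Only the fields that differ in verify_output.json are reproduced here.
actual = {
    "warnings_by_invoice": {},
    "evidence": [
        "incoming/attachments/invoice_3301.png",
        "incoming/attachments/invoice_3302.png",
        "incoming/attachments/proforma_3303.png",
        "incoming/attachments/remittance_advice.png",
        "incoming/bank_export_final.csv",
        "incoming/ap_notes.txt",
        "incoming/purchase_orders.csv",
        "incoming/vendor_master.csv",
    ],
    "proof_code": 30147,
}
expected = {
    "warnings_by_invoice": {"INV-3301": [], "INV-3302": []},
    "evidence": [
        "incoming/attachments/invoice_3301.png",
        "incoming/attachments/invoice_3302.png",
        "incoming/attachments/remittance_advice.png",
        "incoming/bank_export_final.csv",
        "incoming/purchase_orders.csv",
        "incoming/vendor_master.csv",
    ],
    "proof_code": "37104",
}

mismatches = diff_fields(actual, expected)
for key, (got, want) in mismatches.items():
    print(f"{key}: got {got!r}, expected {want!r}")
```

Three fields differ: `evidence` (the model also listed the proforma image and `ap_notes.txt`), `proof_code` (integer 30147 versus the string "37104"), and `warnings_by_invoice` (an empty object versus an empty list per invoice). Which of these the hidden oracle actually penalizes is not stated in the output, and the diff does not cover the normalized text files.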