gemma-4-e4b
2026-05-14T17-35-54Z_google_gemma-4-e4b_paperwork_workflow_case07
0% Practical score
0/1 Resolved tasks
0/1 Core tasks
Verdict
practical run result failed
Core 0/1 · Visible 1/1 · Hidden 0/1
final_document_set_error, missing_or_wrong_evidence, normalized_text_error, proof_code_error, proof_txt_error
Prompt set v3_paperwork_workflow_case07.json
Run type local
Temperature 0
Top P 1
Missed tasks 1/1
Task Results
| Task | Category | Score | Failed Checks | Failure Types | Prompt / Answer |
|---|---|---|---|---|---|
| credit_offset_case_07 | paperwork | 2/4 | core_oracle_pass, hidden_oracle_pass | final_document_set_error, missing_or_wrong_evidence, normalized_text_error, proof_code_error, proof_txt_error, total_calculation_error | See prompt and answer below |

Prompt

Audit the synthetic credit offset intake folder in `incoming/`. Write these files:
- `work/attachment_index.json`
- `work/final_document_set.json`
- `work/payment_reconciliation.json`
- `work/normalized/INV-4409.txt`
- `work/normalized/INV-4411.txt`
- `audit_result.json`
- `proof.txt`

Use `incoming/README_TASK.md` as the full task specification. Use the generated image attachments, AP note, bank export, vendor master, and purchase orders.

Important:
- Use credit memo CM-4410 as payment-offset evidence, not as an invoice.
- Ignore the duplicate scan of INV-4409.
- Reject INV-4411 because its vendor is inactive and its PO is cancelled.
- Do not modify anything under `incoming/`.
- This is benchmark bookkeeping, not financial, tax, or legal advice.
- `proof.txt` must contain only the numeric proof code.

Workspace files available to the model

## README_TASK.md

# Paperwork Workflow Case 07: Credit Offset Packet
Audit this synthetic intake folder. This is benchmark bookkeeping, not financial or tax advice.
You must not modify anything under incoming/.
Case ID for audit_result.json: P3-WORK-07
Steps:
1. Identify active source files and ignore duplicate scans, statements, and credit memos as non-invoice documents.
2. Use the credit memo to explain why the bank payment is lower than the approved invoice gross total.
3. Reject invoices from inactive vendors or cancelled purchase orders.
4. Create work/attachment_index.json.
5. Create work/final_document_set.json.
6. Create work/payment_reconciliation.json.
7. Create normalized invoice text files under work/normalized/.
8. Create audit_result.json.
9. Create proof.txt containing only the final proof_code.
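A minimal Python sketch of step 3 and the warning codes, using the contents of `vendor_master.csv` and `purchase_orders.csv` from the workspace. It assumes each purchase order's `approved_invoice_id` column identifies the invoice billed against it (the real task can also read the PO number from the invoice images), and the function names are hypothetical. It keeps an empty warning list for clean invoices, which matters here: the oracle's expected `warnings_by_invoice` in verify_output.json includes `"INV-4409": []`.

```python
import csv
import io

# Contents of incoming/vendor_master.csv and incoming/purchase_orders.csv,
# inlined so the sketch runs without the workspace.
VENDOR_MASTER = """vendor_id,vendor_name,status
V-FLARE,Flare Tooling,active
V-OLD,Old Gate Parts,inactive
"""
PURCHASE_ORDERS = """po_id,vendor_id,status,approved_invoice_id
PO-7701,V-FLARE,approved,INV-4409
PO-7788,V-OLD,cancelled,INV-4411
"""


def warnings_by_invoice(vendor_csv: str, po_csv: str) -> dict[str, list[str]]:
    """Map each invoice ID to its warning codes (empty list when clean)."""
    vendor_status = {row["vendor_id"]: row["status"]
                     for row in csv.DictReader(io.StringIO(vendor_csv))}
    warnings: dict[str, list[str]] = {}
    for po in csv.DictReader(io.StringIO(po_csv)):
        codes = []
        if vendor_status.get(po["vendor_id"]) == "inactive":
            codes.append("inactive_vendor")
        if po["status"] == "cancelled":
            codes.append("cancelled_po")
        # Assumption: approved_invoice_id names the invoice tied to this PO.
        warnings[po["approved_invoice_id"]] = codes
    return warnings


def split_decisions(warnings: dict[str, list[str]]) -> tuple[list[str], list[str]]:
    """Approve invoices with no warnings; reject the rest."""
    approved = sorted(inv for inv, codes in warnings.items() if not codes)
    rejected = sorted(inv for inv, codes in warnings.items() if codes)
    return approved, rejected


w = warnings_by_invoice(VENDOR_MASTER, PURCHASE_ORDERS)
print(w)
print(split_decisions(w))
```

Review invoices are not modeled here because this case has none (`review_invoice_ids` is empty in both the answer and the oracle).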
Required audit_result.json keys:
case_id, approved_invoice_ids, review_invoice_ids, reject_invoice_ids,
ignored_document_ids, total_approved_gross_cents, warnings_by_invoice,
evidence, proof_code
Required audit_result.json values and formats:
- case_id must be "P3-WORK-07"
- ignored_document_ids must be ["CM-4410", "INV-4409-DUP", "STATEMENT-JUN"]
- warnings_by_invoice must use only these short warning codes: inactive_vendor, cancelled_po
- evidence must be an array of source path strings, not a nested object
Proof code formula:
total_approved_gross_cents
+ numeric parts of approved/review/reject invoice IDs
+ numeric part of the credit memo ID
+ 97 * total warning count
Use invoice IDs only for approved/review/reject invoice IDs. Do not include duplicate scans, credit memos, or statement IDs in the invoice-ID sum.
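The proof-code formula can be checked with a short Python sketch. The helper names `numeric_part` and `proof_code` are hypothetical, and the sketch assumes the "numeric part" of an ID is its first run of digits (so `INV-4409` gives 4409).

```python
import re


def numeric_part(doc_id: str) -> int:
    """Return the first run of digits in an ID such as 'INV-4409' or 'CM-4410'."""
    match = re.search(r"\d+", doc_id)
    if match is None:
        raise ValueError(f"no numeric part in {doc_id!r}")
    return int(match.group())


def proof_code(total_approved_gross_cents: int,
               approved: list[str], review: list[str], reject: list[str],
               credit_memo_id: str,
               warnings_by_invoice: dict[str, list[str]]) -> int:
    # Only approved/review/reject invoice IDs enter the ID sum; duplicate
    # scans, credit memos, and statements are excluded.
    invoice_sum = sum(numeric_part(i) for i in approved + review + reject)
    warning_count = sum(len(codes) for codes in warnings_by_invoice.values())
    return (total_approved_gross_cents
            + invoice_sum
            + numeric_part(credit_memo_id)
            + 97 * warning_count)


warnings = {"INV-4409": [], "INV-4411": ["inactive_vendor", "cancelled_po"]}
print(proof_code(32100, ["INV-4409"], [], ["INV-4411"], "CM-4410", warnings))
```

With the oracle's total of 32100 cents this gives 32100 + 4409 + 4411 + 4410 + 97 × 2 = 45524, matching the expected `proof_code` in verify_output.json. Even with the model's own total of 32070 the formula yields 45494, so neither the model's `proof.txt` (32179) nor its `proof_code` of 0 follows from the formula.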
Required work/attachment_index.json shape:
{"attachments":[{"attachment_path":"...","document_id":"...","document_type":"...","use":"..."}]}
Required work/final_document_set.json shape:
{
"approved_invoice_ids": [...],
"credit_document_ids": [...],
"ignored_document_ids": [...],
"rejected_invoice_ids": [...],
"source_bank_export": "..."
}
Required work/payment_reconciliation.json shape:
{
"bank_payment_cents": 123,
"credit_offset_cents": 123,
"gross_invoice_cents": 123,
"matched_invoice_id": "...",
"payment_reference": "..."
}
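A minimal sketch of filling the `work/payment_reconciliation.json` shape from the single row of `bank_export_june.csv`. It assumes the approved gross total is already known (32100 cents, the `total_approved_gross_cents` the oracle expects in verify_output.json) and derives the credit offset as gross minus bank payment instead of reading the amount from the CM-4410 image; `reconcile` is a hypothetical name.

```python
import csv
import io
import re

# Contents of incoming/bank_export_june.csv.
BANK_EXPORT = """date,description,amount_cents
2026-06-03,PAY FLARE INV-4409 LESS CM-4410,28000
"""


def reconcile(bank_csv: str, gross_invoice_cents: int) -> dict:
    """Build the payment_reconciliation.json object for the first bank row."""
    row = next(csv.DictReader(io.StringIO(bank_csv)))
    description = row["description"]
    # Assumption: the description names the paid invoice as "INV-<digits>".
    match = re.search(r"INV-\d+", description)
    if match is None:
        raise ValueError(f"no invoice ID in {description!r}")
    bank_payment_cents = int(row["amount_cents"])
    return {
        "bank_payment_cents": bank_payment_cents,
        # Implied offset; it should equal the amount on credit memo CM-4410.
        "credit_offset_cents": gross_invoice_cents - bank_payment_cents,
        "gross_invoice_cents": gross_invoice_cents,
        "matched_invoice_id": match.group(),
        "payment_reference": description,
    }


print(reconcile(BANK_EXPORT, 32100))
```

The implied offset is 32100 − 28000 = 4100 cents, the same `credit_offset_cents` the model reported.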
## ap_notes.txt

AP note:
- INV-4409 is the final invoice for Flare Tooling.
- CM-4410 is a credit memo and reduces the payment, but it is not an invoice.
- invoice_4409_duplicate_scan.png is a duplicate scan of INV-4409.
- INV-4411 belongs to a vendor on hold and PO-7788 is cancelled.

## bank_export_june.csv

date,description,amount_cents
2026-06-03,PAY FLARE INV-4409 LESS CM-4410,28000

## purchase_orders.csv

po_id,vendor_id,status,approved_invoice_id
PO-7701,V-FLARE,approved,INV-4409
PO-7788,V-OLD,cancelled,INV-4411

## vendor_master.csv

vendor_id,vendor_name,status
V-FLARE,Flare Tooling,active
V-OLD,Old Gate Parts,inactive

Agent actions

list_files -> read_file -> read_file -> read_file -> read_file -> read_file -> read_file -> read_file -> read_file -> read_file -> read_file -> write_file -> write_file -> write_file -> write_file -> write_file -> write_file -> write_file -> run_checks -> finish

Model answer

## audit_result.json
{"case_id": "P3-WORK-07", "approved_invoice_ids": ["INV-4409"], "review_invoice_ids": [], "reject_invoice_ids": ["INV-4411"], "ignored_document_ids": ["CM-4410", "INV-4409-DUP", "STATEMENT-JUN"], "total_approved_gross_cents": 32070, "warnings_by_invoice": {"INV-4411": ["inactive_vendor", "cancelled_po"]}, "evidence": ["incoming/attachments/credit_memo_4410.png","incoming/attachments/invoice_4409.png","incoming/attachments/statement_page.png"], "proof_code": 0}
## proof.txt
32179
## work/attachment_index.json
{"attachments":[{"attachment_path":"incoming/attachments/credit_memo_4410.png","document_id":"CM-4410","document_type":"Credit Memo","use":"Payment Offset Evidence"},{"attachment_path":"incoming/attachments/invoice_4409.png","document_id":"INV-4409","document_type":"Invoice","use":"Approved Invoice Source"},{"attachment_path":"incoming/attachments/invoice_4409_duplicate_scan.png","document_id":"INV-4409-DUP","document_type":"Invoice","use":"Duplicate Scan (Ignore)"},{"attachment_path":"incoming/attachments/invoice_4411_cancelled_po.png","document_id":"INV-4411","document_type":"Invoice","use":"Rejected Invoice Source"},{"attachment_path":"incoming/attachments/statement_page.png","document_id":"STATEMENT-JUN","document_type":"Statement","use":"Ignored Statement"}]}
## work/final_document_set.json
{"approved_invoice_ids":["INV-4409"],"credit_document_ids":["CM-4410"],"ignored_document_ids":["INV-4409-DUP","STATEMENT-JUN"],"rejected_invoice_ids":["INV-4411"],"source_bank_export": "2026-06-03,PAY FLARE INV-4409 LESS CM-4410,28000"}
## work/normalized/INV-4409.txt
Invoice ID: INV-4409
Vendor ID: V-FLARE
PO ID: PO-7701
Net Total: 26950 cents
Tax Total: 5120 cents
Gross Total: 32070 cents
## work/normalized/INV-4411.txt
Invoice ID: INV-4411
Vendor ID: V-OLD
PO ID: PO-7788
Net Total: 8100 cents
Tax Total: 1539 cents
Gross Total: 9639 cents
## work/payment_reconciliation.json
{"bank_payment_cents": 28000, "credit_offset_cents": 4100, "gross_invoice_cents": 32100, "matched_invoice_id": "INV-4409", "payment_reference": "PAY FLARE INV-4409 LESS CM-4410"}
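A sketch that cross-checks the model's normalized INV-4409 against its own `payment_reconciliation.json`. The normalized file is internally consistent (26950 + 5120 = 32070), but its gross total differs from the 32100 used in the reconciliation, which is also the total the oracle expects. This only reproduces the arithmetic; the verifier's actual normalized-text check is not shown in the report, and the function names are hypothetical.

```python
import re

# The model's work/normalized/INV-4409.txt, copied from the answer.
NORMALIZED_4409 = """Invoice ID: INV-4409
Vendor ID: V-FLARE
PO ID: PO-7701
Net Total: 26950 cents
Tax Total: 5120 cents
Gross Total: 32070 cents
"""

# The model's work/payment_reconciliation.json (numeric fields only).
RECONCILIATION = {
    "bank_payment_cents": 28000,
    "credit_offset_cents": 4100,
    "gross_invoice_cents": 32100,
}


def parse_totals(text: str) -> dict[str, int]:
    """Map 'Net Total', 'Tax Total', and 'Gross Total' lines to integer cents."""
    pattern = r"^(\w+ Total): (\d+) cents$"
    return {label: int(cents)
            for label, cents in re.findall(pattern, text, re.MULTILINE)}


def check(text: str, reconciliation: dict) -> list[str]:
    """Return a list of arithmetic inconsistencies (empty when all agree)."""
    totals = parse_totals(text)
    problems = []
    if totals["Net Total"] + totals["Tax Total"] != totals["Gross Total"]:
        problems.append("net + tax != gross")
    if totals["Gross Total"] != reconciliation["gross_invoice_cents"]:
        problems.append("gross differs from payment_reconciliation.json")
    if (reconciliation["gross_invoice_cents"]
            - reconciliation["credit_offset_cents"]
            != reconciliation["bank_payment_cents"]):
        problems.append("gross - credit != bank payment")
    return problems


print(check(NORMALIZED_4409, RECONCILIATION))
```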
## verify_output.json
{
"ok": false,
"core_ok": false,
"visible_ok": true,
"failure_types": [
"final_document_set_error",
"missing_or_wrong_evidence",
"normalized_text_error",
"proof_code_error",
"proof_txt_error",
"total_calculation_error"
],
"checks": [
{
"check": "audit_result_exists",
"passed": true,
"message": "Expected final audit_result.json with exact keys."
},
{
"check": "visible_checks_pass",
"passed": true,
"message": "Expected required artifacts and unchanged incoming/ sources."
},
{
"check": "core_oracle_pass",
"passed": false,
"message": "Expected core audit fields to match the oracle."
},
{
"check": "hidden_oracle_pass",
"passed": false,
"message": "Expected exact audit, artifacts, proof.txt, and unchanged sources."
}
],
"audit_error": "",
"actual": {
"case_id": "P3-WORK-07",
"approved_invoice_ids": [
"INV-4409"
],
"review_invoice_ids": [],
"reject_invoice_ids": [
"INV-4411"
],
"ignored_document_ids": [
"CM-4410",
"INV-4409-DUP",
"STATEMENT-JUN"
],
"total_approved_gross_cents": 32070,
"warnings_by_invoice": {
"INV-4411": [
"inactive_vendor",
"cancelled_po"
]
},
"evidence": [
"incoming/attachments/credit_memo_4410.png",
"incoming/attachments/invoice_4409.png",
"incoming/attachments/statement_page.png"
],
"proof_code": 0
},
"expected": {
"approved_invoice_ids": [
"INV-4409"
],
"case_id": "P3-WORK-07",
"evidence": [
"incoming/attachments/invoice_4409.png",
"incoming/attachments/credit_memo_4410.png",
"incoming/attachments/invoice_4411_cancelled_po.png",
"incoming/bank_export_june.csv",
"incoming/purchase_orders.csv",
"incoming/vendor_master.csv"
],
"ignored_document_ids": [
"CM-4410",
"INV-4409-DUP",
"STATEMENT-JUN"
],
"proof_code": "45524",
"reject_invoice_ids": [
"INV-4411"
],
"review_invoice_ids": [],
"total_approved_gross_cents": 32100,
"warnings_by_invoice": {
"INV-4409": [],
"INV-4411": [
"inactive_vendor",
"cancelled_po"
]
}
}
}
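The gap between `actual` and `expected` in verify_output.json can be listed field by field. This sketch uses exact equality (so list order and value types count), which may be stricter or looser than the real oracle; the helper names are hypothetical.

```python
# Core fields copied from "actual" and "expected" in verify_output.json.
actual = {
    "case_id": "P3-WORK-07",
    "approved_invoice_ids": ["INV-4409"],
    "review_invoice_ids": [],
    "reject_invoice_ids": ["INV-4411"],
    "ignored_document_ids": ["CM-4410", "INV-4409-DUP", "STATEMENT-JUN"],
    "total_approved_gross_cents": 32070,
    "warnings_by_invoice": {"INV-4411": ["inactive_vendor", "cancelled_po"]},
    "evidence": [
        "incoming/attachments/credit_memo_4410.png",
        "incoming/attachments/invoice_4409.png",
        "incoming/attachments/statement_page.png",
    ],
    "proof_code": 0,
}
expected = {
    "case_id": "P3-WORK-07",
    "approved_invoice_ids": ["INV-4409"],
    "review_invoice_ids": [],
    "reject_invoice_ids": ["INV-4411"],
    "ignored_document_ids": ["CM-4410", "INV-4409-DUP", "STATEMENT-JUN"],
    "total_approved_gross_cents": 32100,
    "warnings_by_invoice": {
        "INV-4409": [],
        "INV-4411": ["inactive_vendor", "cancelled_po"],
    },
    "evidence": [
        "incoming/attachments/invoice_4409.png",
        "incoming/attachments/credit_memo_4410.png",
        "incoming/attachments/invoice_4411_cancelled_po.png",
        "incoming/bank_export_june.csv",
        "incoming/purchase_orders.csv",
        "incoming/vendor_master.csv",
    ],
    "proof_code": "45524",
}


def mismatched_fields(actual: dict, expected: dict) -> list[str]:
    """Keys whose values differ under exact equality."""
    return sorted(key for key in expected if actual.get(key) != expected[key])


def evidence_gaps(actual: dict, expected: dict) -> tuple[set[str], set[str]]:
    """Return (expected paths that are missing, paths that are extra)."""
    have, want = set(actual["evidence"]), set(expected["evidence"])
    return want - have, have - want


print(mismatched_fields(actual, expected))
missing, extra = evidence_gaps(actual, expected)
print(sorted(missing))
print(sorted(extra))
```

Four core fields differ: `evidence`, `proof_code`, `total_approved_gross_cents`, and `warnings_by_invoice` (the approved INV-4409 is missing its empty list). The evidence comparison shows the answer omitted the rejected invoice image and the three CSV sources, and cited the ignored statement page instead.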