Admin workflow
A complete RAG production pipeline, not a one-off upload
The admin interface coordinates the local evidence factory behind the chatbot: source discovery, document
download, segmentation, Markdown conversion, structured sidecar generation, vector embedding, and deployment
sync all sit in one workflow.
The workflow starts by scraping configured source websites, including Council meeting repositories, Council
web corpora, consultation and data sites, and Councillor Zamprogno's site. New and changed files are
downloaded, catalogued, and checked against prior metadata so repeat runs can skip unchanged material.
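The skip-unchanged check described above can be sketched as follows. This is a minimal illustration, not the project's actual code: it assumes the prior metadata is a mapping from source URL to a SHA-256 content hash, and the names `content_hash` and `classify_download` are hypothetical.

```python
import hashlib


def content_hash(data: bytes) -> str:
    """Return a hex SHA-256 digest of the downloaded bytes."""
    return hashlib.sha256(data).hexdigest()


def classify_download(url: str, data: bytes, catalogue: dict[str, str]) -> str:
    """Compare a download with the prior catalogue entry for the same URL.

    Returns "new", "changed", or "unchanged", and records the latest hash
    so the next run can skip material that has not changed.
    (Assumption: the catalogue maps URL -> content hash.)
    """
    digest = content_hash(data)
    previous = catalogue.get(url)
    if previous is None:
        status = "new"
    elif previous != digest:
        status = "changed"
    else:
        status = "unchanged"
    catalogue[url] = digest
    return status
```

Later stages would then only need to process files whose status is "new" or "changed". A real catalogue might also compare HTTP headers such as ETag or Last-Modified before downloading at all; that is left out here.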
Meeting papers are then split into item-level records: agendas and minutes are segmented, attachments are
distributed to their parent items, page numbers are correlated with the source documents for provenance, and PDF material is converted into
Markdown. Those Markdown files are chunked and embedded into sharded Chroma vector stores.
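A rough sketch of the chunking and shard-assignment step follows. The chunk size, overlap, and hash-based shard rule are assumptions for illustration (the description above does not state them), and `chunk_text` and `shard_for` are hypothetical names. The embedding call and the Chroma writes are deliberately omitted.

```python
import hashlib


def chunk_text(text: str, size: int = 800, overlap: int = 100) -> list[str]:
    """Split text into fixed-size character chunks that overlap slightly.

    The overlap keeps a sentence that straddles a boundary retrievable
    from at least one chunk. Sizes here are illustrative defaults.
    """
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("require size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks: list[str] = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # the last chunk already reaches the end of the text
        start += step
    return chunks


def shard_for(item_id: str, n_shards: int) -> int:
    """Map an item identifier to a shard index deterministically.

    Assumption: shards are chosen by hashing the item id, so repeat runs
    send the same item to the same shard.
    """
    digest = hashlib.sha256(item_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % n_shards
```

Each chunk would then be embedded and written, with its item id and page-number provenance as metadata, to the vector store selected by `shard_for`.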
In parallel, preprocessing compiles JSON sidecars and global indexes for factual executors: votes,
attendance, conflicts, keywords, financial tables, rates, capital works, road-network subtotals, procurement,
grants, and other recurring civic statistics.
The cloud synchronisation dashboard compares the local workstation with the Oracle deployment, then syncs
the built frontend, FastAPI code, Chroma shards, backend data files, and corpus sidecars needed for public
serving.
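The local-versus-deployment comparison can be pictured as a manifest diff. The sketch below assumes each side is summarised as a mapping from relative file path to content hash; that representation and the name `diff_manifests` are hypothetical, and the transfer itself is not shown.

```python
def diff_manifests(
    local: dict[str, str], remote: dict[str, str]
) -> dict[str, list[str]]:
    """Compare two path -> hash manifests.

    "upload" lists paths that are missing remotely or whose hash differs.
    "remote_only" lists paths present on the deployment but not locally;
    what to do with those is left to the operator.
    """
    return {
        "upload": sorted(
            path for path, digest in local.items() if remote.get(path) != digest
        ),
        "remote_only": sorted(path for path in remote if path not in local),
    }
```

For example, with a local manifest `{"a": "1", "b": "2"}` and a remote manifest `{"a": "1", "b": "x", "c": "3"}`, only `b` needs uploading and `c` is reported as remote-only.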