Schedule/automate incremental arXiv harvest for corpus freshness

The corpus only reflects arXiv as of the last harvest; new preprints don't appear until the next incremental run. Set up a recurring update_arxiv_data.py --incremental run (cron/scheduled task) to keep the index current.

See docs/LIMITATIONS.md, Data section, Temporal freshness.