Ingest real Fall 2026 BU course offerings (sections, instructors, meeting times, locations) #280

## Summary

Staging currently has a **catalog-only** academics dataset: 8,195 real BU courses (code / name / department / description), each paired with a **hollow** `course_offerings` row that carries no section data. The operational layer (**sections, instructors, meeting times, locations, syllabi, and correct term**) exists only for the 4 hand-written `seed_staging` demo classes. This issue tracks ingesting **real Fall 2026 (`fall-2026`) BU offering data** so courses behave as they would in a real registrar.

The scrape itself will be run separately; this issue covers defining the scrape contract and building the import + supporting model/API changes.

## Current state (measured against staging)

| Table | Rows | Notes |
|---|---|---|
| `courses` | 8,195 | full BU catalog; `school_id` null on ~8,192, `credits` null on all but 3 |
| `course_offerings` | 8,196 | every course has exactly 1 offering except CS101, which has 2 (the +1) |
| `terms` | 4 | `fall-2025`, `spring-2026`, `summer-2026`, **`fall-2026`** all seeded (0019) |
| `schools` | 1 | only "Sapling Demo University" (`seed-school-demo`) |
| `enrollments` | 4 | demo only |

Offering field population across all 8,196 rows:

| Field | Populated |
|---|---|
| `section` | **0** (null on every row) |
| `instructor_name` | 4 (demo only) |
| `meeting_times` | 4 (demo only) |
| `location` | 4 (demo only) |
| `syllabus_url` | **0** |

Term spread of existing offerings: `spring-2026` → 8,194, `fall-2025` → 2. **`fall-2026` has 0 offerings today.**

Relevant code:
- `backend/db/seed_staging.py` — demo seed only; never touches the real catalog.
- `backend/services/academics.py` — `resolve_offering` / `current_term` / `term_for_offering`.
- `backend/db/migrations/0020_academics_split.sql` — the catalog/offering split + `UNIQUE (course_id, term_id, section)`.
- `backend/routes/academics.py` — currently only exposes `GET /semesters`.
- All DB access goes through `backend/db/connection.py::table()` (PostgREST, no DDL).

---

## Tasks

### 1. Scraper / data spec (contract only — scrape run separately)
- [ ] Define the structured output schema the Fall 2026 scrape must produce (the contract the importer consumes). Proposed per-section record:
  - `course_code` (must match catalog format, e.g. `CAS CS 111` — space-separated, uppercase)
  - `section` (e.g. `A1`, `B2`)
  - `instructor_name`
  - `meeting_times` (e.g. `MWF 09:00`)
  - `location`
  - `course_name`, `credits`, `description` (optional; used to enrich/create the catalog course if missing)
  - `syllabus_url` (optional; written to the offering row)
- [ ] Decide format (JSON lines / CSV) and where the scraped file lives (not committed; gitignored ops input).
- [ ] Document `course_code` normalization rules so scrape output joins cleanly to `courses.course_code`.
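A minimal sketch of the proposed record and one possible normalization rule, assuming JSON Lines is the chosen format (the format is still undecided above). `SectionRecord`, `normalize_course_code`, and `parse_line` are hypothetical names, and treating every field except `course_code` and `section` as optional is an assumption, not part of the contract yet.

```python
import json
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SectionRecord:
    # Assumed required: the join key and the section label.
    course_code: str
    section: str
    # Assumed optional offering fields.
    instructor_name: Optional[str] = None
    meeting_times: Optional[str] = None
    location: Optional[str] = None
    syllabus_url: Optional[str] = None
    # Optional catalog enrichment fields.
    course_name: Optional[str] = None
    credits: Optional[float] = None
    description: Optional[str] = None


def normalize_course_code(raw: str) -> str:
    """Trim, collapse runs of whitespace to one space, and uppercase."""
    return re.sub(r"\s+", " ", raw.strip()).upper()


def parse_line(line: str) -> SectionRecord:
    """Parse one JSON Lines row into a record with a normalized course code."""
    data = json.loads(line)
    data["course_code"] = normalize_course_code(data["course_code"])
    return SectionRecord(**data)
```

This rule only fixes case and spacing; codes scraped without separators (for example `CASCS111`) would need an extra rule that this sketch does not attempt.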

### 2. Importer + section model (core)
- [ ] New idempotent ops script (e.g. `backend/db/import_offerings.py`, run like `seed_staging` under the target env) that:
  - resolves each record's `course_code` → existing `courses.id` (create the catalog course if absent, enriching name/credits/description),
  - upserts one `course_offerings` row **per section** for `term_id = 'fall-2026'`,
  - writes `section`, `instructor_name`, `meeting_times`, `location`, `syllabus_url`,
  - is idempotent (deterministic id or upsert-on-UNIQUE) so re-runs add nothing.
- [ ] All access via `db/connection.py::table()`; env-agnostic (run against staging first).
- [ ] **Fix the section dedup semantics.** `UNIQUE (course_id, term_id, section)` treats NULLs as distinct today, so rows with a null `section` never conflict and section-less offerings won't dedup on re-run. Add an append-only migration (next number, currently at 0028 → `0029`) to either make `section` non-null with a default (e.g. `''`, backfilling existing rows) or switch to `UNIQUE ... NULLS NOT DISTINCT` (PG 15+). Pick one and make the importer consistent with it.
- [ ] Tests in `backend/tests/` (mirror `test_seed_staging.py`): idempotency, code→course resolution, per-section row creation, dedup on re-run.
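The dedup problem and the `''`-default option can be illustrated with an in-memory SQLite database, used here only as a stand-in because it also treats NULLs as distinct in `UNIQUE` constraints. This is a sketch under assumptions: the table names, the trimmed column set, and the placeholder instructor values are hypothetical, and the real importer would go through `table()` rather than raw SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Current shape: nullable section. NULLs are distinct under UNIQUE,
# so two section-less rows for the same course and term both insert.
conn.execute(
    "CREATE TABLE offerings_nullable ("
    " course_id TEXT NOT NULL, term_id TEXT NOT NULL, section TEXT,"
    " UNIQUE (course_id, term_id, section))"
)
for _ in range(2):
    conn.execute("INSERT INTO offerings_nullable VALUES ('c1', 'fall-2026', NULL)")
nullable_rows = conn.execute("SELECT COUNT(*) FROM offerings_nullable").fetchone()[0]

# Option from the task: NOT NULL section defaulting to '', upsert on the key.
conn.execute(
    "CREATE TABLE offerings_fixed ("
    " course_id TEXT NOT NULL, term_id TEXT NOT NULL,"
    " section TEXT NOT NULL DEFAULT '', instructor_name TEXT,"
    " UNIQUE (course_id, term_id, section))"
)

UPSERT = (
    "INSERT INTO offerings_fixed (course_id, term_id, section, instructor_name)"
    " VALUES (?, ?, ?, ?)"
    " ON CONFLICT (course_id, term_id, section)"
    " DO UPDATE SET instructor_name = excluded.instructor_name"
)


def import_sections(rows):
    """Upsert one row per (course, section); a missing section becomes ''."""
    for course_id, section, instructor in rows:
        conn.execute(UPSERT, (course_id, "fall-2026", section or "", instructor))


# Placeholder data, not real BU sections.
batch = [("c1", "A1", "Instructor X"), ("c1", "B2", "Instructor Y"), ("c2", None, None)]
import_sections(batch)
import_sections(batch)  # re-run must add nothing
fixed_rows = conn.execute("SELECT COUNT(*) FROM offerings_fixed").fetchone()[0]

print(nullable_rows, fixed_rows)  # prints "2 3"
```

The nullable table ends up with two identical section-less rows, while the fixed table holds three rows after two runs of the same batch, which is the behavior the idempotency and dedup tests would assert.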

### 3. BU school + catalog linkage
- [ ] Add a real `schools` row for **Boston University** (proper `name`/`slug`).
- [ ] Link catalog `courses` to it (`school_id` is null on ~8,192 rows today).
- [ ] **Pre-check for duplicate `course_code`s** before linking — `UNIQUE (school_id, course_code)` is currently NULL-distinct, so dup codes may exist that would collide once `school_id` is set. Resolve/merge duplicates as part of this task.
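The pre-check could look roughly like the following, assuming the catalog rows have already been fetched as dicts with `id` and `course_code` keys (the `id` column name and the function name are assumptions). Grouping on a case- and whitespace-normalized code is a choice made here so that near-duplicates surface alongside exact ones; only exact matches would actually violate the constraint.

```python
from collections import defaultdict


def find_duplicate_codes(courses):
    """Return {normalized course_code: [course ids]} for codes seen more than once."""
    by_code = defaultdict(list)
    for course in courses:
        key = " ".join(course["course_code"].split()).upper()
        by_code[key].append(course["id"])
    return {code: ids for code, ids in by_code.items() if len(ids) > 1}


# Illustrative rows only, not real staging data.
sample = [
    {"id": "a", "course_code": "CAS CS 111"},
    {"id": "b", "course_code": "cas cs  111"},
    {"id": "c", "course_code": "CAS CS 112"},
]
duplicates = find_duplicate_codes(sample)
```

Each entry in the result is a merge candidate to resolve before `school_id` is set on those rows.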

### 4. API + frontend display
- [ ] Audit the offering read path and surface `section` / `instructor_name` / `meeting_times` / `location` / `syllabus_url` through the course/offering API (today `routes/academics.py` only returns `/semesters`).
- [ ] Update the relevant frontend components so a course with a Fall 2026 offering shows its section, instructor, meeting times, and location.
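As a starting point for the read path, the payload for one offering might be shaped as below. This is a sketch only: the function name, the `offering_id` key, and the assumption that rows arrive as dicts with an `id` column are hypothetical, and the route and web framework are left out because the issue does not name them.

```python
OFFERING_FIELDS = (
    "section",
    "instructor_name",
    "meeting_times",
    "location",
    "syllabus_url",
)


def offering_payload(row: dict) -> dict:
    """Project a course_offerings row onto the fields the frontend needs."""
    payload = {
        "offering_id": row["id"],
        "course_id": row["course_id"],
        "term_id": row["term_id"],
    }
    for field in OFFERING_FIELDS:
        # Hollow offerings have no section data; expose that as None.
        payload[field] = row.get(field)
    return payload
```

Returning explicit `None` values keeps the response shape stable for hollow offerings, so the frontend can decide what to hide.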

---

## Notes / gotchas
- `fall-2026` runs 2026-08-24 → 2027-01-03; the term row already exists, so the importer just references it by id.
- `current_term()` is date-derived — today (2026-06-26) resolves to **Summer 2026** (0 offerings). The importer should target `fall-2026` **explicitly**, not "current term".
- The knowledge graph keys on the abstract `course_id` (cumulative across terms); gradebook on `enrollment_id`; study/analytics on `offering_id`. Adding Fall 2026 offerings must not fork a course's graph identity — resolve to the existing catalog `course_id`.
- Staging and prod both currently hold only catalog data (no real user data in prod per the redesign). Run/verify on staging first.
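To make the term gotcha concrete, here is a small sketch using only the `fall-2026` dates given above. The names are hypothetical and the inclusive end date is an assumption; it does not reproduce the real `current_term()` logic.

```python
from datetime import date

TARGET_TERM_ID = "fall-2026"
FALL_2026_START = date(2026, 8, 24)
FALL_2026_END = date(2027, 1, 3)


def in_fall_2026(day: date) -> bool:
    """True when the day falls inside the fall-2026 window (end inclusive)."""
    return FALL_2026_START <= day <= FALL_2026_END


def import_term_id() -> str:
    """The importer's term is a constant, independent of today's date."""
    return TARGET_TERM_ID
```

A date-derived lookup on 2026-06-26 falls outside this window, so an importer that relied on the current term would write to the wrong term, whereas `import_term_id()` always returns `fall-2026`.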

