Summary
Staging currently has a catalog-only academics dataset: ~8,195 real BU courses (code / name / department / description), each paired with a single hollow course_offerings row. The operational layer (sections, instructors, meeting times, locations, syllabi, and correct term) exists only for the 4 hand-written seed_staging demo classes. This issue tracks ingesting real Fall 2026 (fall-2026) BU offering data so courses behave like a real registrar.
The scrape itself will be run separately; this issue covers defining the scrape contract and building the import + supporting model/API changes.
Current state (measured against staging)
| Table | Rows | Notes |
| --- | --- | --- |
| courses | 8,195 | full BU catalog; school_id null on ~8,192, credits null on all but 3 |
| course_offerings | 8,196 | every course has exactly 1 offering; CS101 has 2 (the +1) |
| terms | 4 | fall-2025, spring-2026, summer-2026, fall-2026 all seeded (0019) |
| schools | 1 | only "Sapling Demo University" (seed-school-demo) |
| enrollments | 4 | demo only |
Offering field population across all 8,196 rows:
| Field | Populated |
| --- | --- |
| section | 0 (null on every row) |
| instructor_name | 4 (demo only) |
| meeting_times | 4 (demo only) |
| location | 4 (demo only) |
| syllabus_url | 0 |
Term spread of existing offerings: spring-2026 → 8,194, fall-2025 → 2. fall-2026 has 0 offerings today.
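Relevant code:
backend/db/seed_staging.py: demo seed only; never touches the real catalog.
backend/services/academics.py: resolve_offering / current_term / term_for_offering.
backend/db/migrations/0020_academics_split.sql: the catalog/offering split + UNIQUE (course_id, term_id, section).
backend/routes/academics.py: currently only exposes GET /semesters.
backend/db/connection.py::table() (PostgREST, no DDL).
Tasks
1. Scraper / data spec (contract only; scrape run separately)
Fields per scraped section:
course_code (must match catalog format, e.g. CAS CS 111: space-separated, uppercase)
section (e.g. A1, B2)
instructor_name
meeting_times (e.g. MWF 09:00)
location
course_name, credits, description, syllabus_url (optional; used to enrich/create the catalog course if missing)
Define course_code normalization rules so scrape output joins cleanly to courses.course_code.
2. Importer + section model (core)
Add an importer (backend/db/import_offerings.py, run like seed_staging under the target env) that:
resolves course_code → existing courses.id (create the catalog course if absent, enriching name/credits/description),
creates one course_offerings row per section for term_id = 'fall-2026',
populates section, instructor_name, meeting_times, location, syllabus_url,
is idempotent (deterministic id or upsert-on-UNIQUE) so re-runs add nothing.
All access via db/connection.py::table(); env-agnostic (run against staging first).
Fix the section dedup semantics. UNIQUE (course_id, term_id, section) is NULL-distinct today, so rows whose section is NULL never conflict with each other and won't dedup on re-import. Add an append-only migration (next number, currently at 0028 → 0029) to either set a non-null default section (e.g. '') or switch to UNIQUE ... NULLS NOT DISTINCT (PG 15+). Pick one and make the importer consistent with it.
Tests in backend/tests/ (mirror test_seed_staging.py): idempotency, code→course resolution, per-section row creation, dedup on re-run.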
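The idempotency and section-dedup requirements above can be sketched in a few lines. This is an illustration only, not the project's importer: it uses Python's built-in sqlite3 as a stand-in database (SQLite, like PostgreSQL by default, treats NULLs as distinct in UNIQUE constraints), and it assumes the "non-null default section" option. The helper names normalize_code, make_db and import_rows, the trimmed table shape, the normalization rule, and the sample instructor names are all hypothetical; the real importer would go through db/connection.py::table().

```python
import sqlite3


def normalize_code(raw: str) -> str:
    """Assumed rule: uppercase and collapse runs of whitespace."""
    return " ".join(raw.upper().split())


def make_db() -> sqlite3.Connection:
    """In-memory stand-in schema; only the columns needed for the demo."""
    db = sqlite3.connect(":memory:")
    db.executescript(
        """
        CREATE TABLE courses (
            id INTEGER PRIMARY KEY,
            course_code TEXT UNIQUE
        );
        CREATE TABLE course_offerings (
            id INTEGER PRIMARY KEY,
            course_id INTEGER NOT NULL,
            term_id TEXT NOT NULL,
            section TEXT NOT NULL DEFAULT '',  -- never NULL, so UNIQUE can dedup
            instructor_name TEXT,
            UNIQUE (course_id, term_id, section)
        );
        """
    )
    return db


def import_rows(db: sqlite3.Connection, rows, term_id: str = "fall-2026") -> None:
    """Upsert one offering per (course, term, section); safe to re-run."""
    for row in rows:
        code = normalize_code(row["course_code"])
        # Resolve to the existing catalog course, creating it only if absent.
        db.execute("INSERT OR IGNORE INTO courses (course_code) VALUES (?)", (code,))
        (course_id,) = db.execute(
            "SELECT id FROM courses WHERE course_code = ?", (code,)
        ).fetchone()
        db.execute(
            """
            INSERT INTO course_offerings (course_id, term_id, section, instructor_name)
            VALUES (?, ?, ?, ?)
            ON CONFLICT (course_id, term_id, section)
            DO UPDATE SET instructor_name = excluded.instructor_name
            """,
            (course_id, term_id, row.get("section") or "", row.get("instructor_name")),
        )
    db.commit()


db = make_db()
scraped = [  # made-up sample rows
    {"course_code": "cas cs 111", "section": "A1", "instructor_name": "Instructor One"},
    {"course_code": "CAS  CS 111", "section": "B2", "instructor_name": "Instructor Two"},
    {"course_code": "CAS CS 111", "section": None, "instructor_name": None},
]
import_rows(db, scraped)
import_rows(db, scraped)  # second run must add nothing
offering_count = db.execute("SELECT COUNT(*) FROM course_offerings").fetchone()[0]
course_count = db.execute("SELECT COUNT(*) FROM courses").fetchone()[0]
print(offering_count, course_count)  # 3 1
```

With the NULLS NOT DISTINCT option instead, the importer could keep passing NULL for a missing section; the point is that the migration and the importer agree on one representation.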
3. BU school + catalog linkage
Add a real schools row for Boston University (proper name/slug).
Link catalog courses to it (school_id is null on ~8,192 rows today).
Pre-check for duplicate course_codes before linking — UNIQUE (school_id, course_code) is currently NULL-distinct, so dup codes may exist that would collide once school_id is set. Resolve/merge duplicates as part of this task.
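The duplicate pre-check can run in memory before any school_id is written. A minimal sketch, assuming the catalog rows have already been fetched as dicts with id and course_code keys; the function name find_duplicate_codes and the sample rows are hypothetical:

```python
from collections import defaultdict


def find_duplicate_codes(courses):
    """Group course ids by exact course_code; keep only codes held by more than one row."""
    ids_by_code = defaultdict(list)
    for course in courses:
        ids_by_code[course["course_code"]].append(course["id"])
    return {code: ids for code, ids in ids_by_code.items() if len(ids) > 1}


sample = [  # made-up rows
    {"id": "c1", "course_code": "CAS CS 111"},
    {"id": "c2", "course_code": "CAS CS 111"},
    {"id": "c3", "course_code": "CAS CS 112"},
]
dupes = find_duplicate_codes(sample)
print(dupes)  # {'CAS CS 111': ['c1', 'c2']}
```

Any code this returns would violate UNIQUE (school_id, course_code) as soon as both rows are given the same school_id, so those rows need to be merged first.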
4. API + frontend display
Audit the offering read path and surface section / instructor_name / meeting_times / location / syllabus_url through the course/offering API (today routes/academics.py only returns /semesters).
Update the relevant frontend components so a course with a Fall 2026 offering shows its section, instructor, meeting times, and location.
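One possible shape for the offering payload is sketched below. It is framework-agnostic and purely illustrative: the function name offering_payload, the key names offering_id / course_id / term_id in the response, and the sample row are assumptions, not the existing API.

```python
OFFERING_FIELDS = ("section", "instructor_name", "meeting_times", "location", "syllabus_url")


def offering_payload(row: dict) -> dict:
    """Copy the operational fields from a course_offerings row; missing ones become None."""
    payload = {
        "offering_id": row["id"],
        "course_id": row["course_id"],
        "term_id": row["term_id"],
    }
    for field in OFFERING_FIELDS:
        payload[field] = row.get(field)
    return payload


example = offering_payload(  # made-up row
    {"id": "o1", "course_id": "c1", "term_id": "fall-2026", "section": "A1"}
)
print(example["section"], example["instructor_name"])  # A1 None
```

Returning explicit None values lets the frontend tell "no instructor recorded" apart from "field not supported", which matters while most offerings are still hollow.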
Notes / gotchas
fall-2026 runs 2026-08-24 → 2027-01-03; the term row already exists, so the importer just references it by id.
current_term() is date-derived: today (2026-06-26) resolves to Summer 2026 (0 offerings). The importer should target fall-2026 explicitly, not "current term".
The knowledge graph keys on the abstract course_id (cumulative across terms); gradebook on enrollment_id; study/analytics on offering_id. Adding Fall 2026 offerings must not fork a course's graph identity — resolve to the existing catalog course_id.
Staging and prod both currently hold only catalog data (no real user data in prod per the redesign). Run/verify on staging first.
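The term-targeting gotcha above can be checked with the dates given in these notes. The sketch uses only the fall-2026 window stated here; the dict layout, the in_term helper, and treating the end date as inclusive are assumptions, and this is not the real current_term() implementation.

```python
from datetime import date

# Window taken from the note above; structure is illustrative.
FALL_2026 = {"id": "fall-2026", "start": date(2026, 8, 24), "end": date(2027, 1, 3)}

# The importer names its target term instead of deriving it from today's date.
TARGET_TERM_ID = "fall-2026"


def in_term(term: dict, day: date) -> bool:
    """True if day falls inside the term window (end date assumed inclusive)."""
    return term["start"] <= day <= term["end"]


today = date(2026, 6, 26)
print(in_term(FALL_2026, today))  # False
print(TARGET_TERM_ID == FALL_2026["id"])  # True
```

Because today falls outside the fall-2026 window, any date-derived lookup run now would pick a different term, which is why the importer should pass the term id directly.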