Purpose
Long-running tracker for the TechAPI dataset rebuild across brand, cpu, gpu, smartphone, soc, tablet, watch, and pda.
This issue intentionally stays open while bulk imports, validation hardening, public dump refreshes, and manual verification work continue. PRs may use Closes #1 so that GitHub's Development section links the PR back to this tracker. Repository auto-close is disabled for this workflow, so merging a linked PR should not close this issue.
Current Status
- Imported records stay `verified: false` until manual audit or TechEngine verification confirms them
Latest Data Snapshot
| Category | Total | Verified | Unverified | Missing verified flag | Verified % |
| --- | ---: | ---: | ---: | ---: | ---: |
| brand | 189 | 0 | 60 | 129 | 0.0% |
| soc | 2,079 | 58 | 2,021 | 0 | 2.8% |
| smartphone | 42,051 | 184 | 41,867 | 0 | 0.4% |
| tablet | 1,171 | 0 | 1,171 | 0 | 0.0% |
| watch | 176 | 0 | 176 | 0 | 0.0% |
| pda | 110 | 0 | 110 | 0 | 0.0% |
| gpu | 2,030 | 0 | 2,030 | 0 | 0.0% |
| cpu | 3,977 | 976 | 3,001 | 0 | 24.5% |
| all | 51,783 | 1,218 | 50,436 | 129 | 2.4% |
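The snapshot arithmetic can be cross-checked with a short script. This is a minimal sketch, not the project's tooling: `CategoryCounts`, `SNAPSHOT`, and `combine` are hypothetical names, and it assumes the "Missing verified" column counts records that carry no verified flag at all, so the three buckets should sum to the total.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CategoryCounts:
    """Counts for one category row of the snapshot table (hypothetical type)."""
    total: int
    verified: int
    unverified: int
    missing_verified: int  # assumed: records with no verified flag at all

    def is_consistent(self) -> bool:
        # Every record should fall into exactly one of the three buckets.
        return self.verified + self.unverified + self.missing_verified == self.total

    def verified_pct(self) -> float:
        # Verified share as a percentage, rounded to one decimal like the table.
        return round(100 * self.verified / self.total, 1) if self.total else 0.0


# Values copied from the snapshot table in this issue.
SNAPSHOT = {
    "brand": CategoryCounts(189, 0, 60, 129),
    "soc": CategoryCounts(2079, 58, 2021, 0),
    "smartphone": CategoryCounts(42051, 184, 41867, 0),
    "tablet": CategoryCounts(1171, 0, 1171, 0),
    "watch": CategoryCounts(176, 0, 176, 0),
    "pda": CategoryCounts(110, 0, 110, 0),
    "gpu": CategoryCounts(2030, 0, 2030, 0),
    "cpu": CategoryCounts(3977, 976, 3001, 0),
}


def combine(rows) -> CategoryCounts:
    """Sum per-category counts into a single "all" row."""
    rows = list(rows)
    return CategoryCounts(
        total=sum(r.total for r in rows),
        verified=sum(r.verified for r in rows),
        unverified=sum(r.unverified for r in rows),
        missing_verified=sum(r.missing_verified for r in rows),
    )


ALL = combine(SNAPSHOT.values())
```

Running `is_consistent()` on each row and comparing `ALL` with the "all" row is a cheap way to catch a stale snapshot before it is posted.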
Recent PR History
| PR | Status | Main change |
| --- | --- | --- |
| #36 | Open | Import Phones 2024 smartphones, tablets, watches, and refresh public dump |
| #35 | Merged | Import Global Smartphone Database variants, add SoC stubs, normalize duplicate brand slugs, and refresh public dump |
| #34 | Merged | Import GSMArena Kaggle smartphones, tablets, watches, and refresh public dump |
| #33 | Merged | Import PhoneDB/Kaggle smartphone variants, tablets, SoC stubs, and public dump refresh |
| #32 | Merged | Add tablet/watch/PDA/API/site category support and prior mobile dump refresh |
| #25 | Merged | Add 5,000 PhoneDB raw smartphone variants plus 45 Mobiles 2025 records |
| #24 | Merged | Add smartphone and SoC records, improve PR metadata and project automation |
| #23 | Merged | Import a larger smartphone batch |
| #22 | Merged | Add smartphone and SoC records from Kaggle-derived sources |
| #17 | Merged | Expand GPU imports and public data refresh |
| #16 | Merged | Expand CPU imports and public data refresh |
Sources Currently Used
Validation Policy
Every data PR should include TechEngineBot comments for:
- Changed data summary: added, modified, deleted, verified/unverified source counts, and examples
- Validation stats: category totals, verified coverage, warning callouts, and key validation output
- Checks: `python -m app.validate`, `python integrity_check.py TechAPI/data --strict`, and a site build when site files change
- Heuristic review: naming, typo-like patterns, duplicate-looking fields, and data-quality warnings
Low verified coverage is allowed for bulk import PRs, but should be called out as a follow-up warning instead of failing validation.
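The "warn, do not fail" rule for low coverage can be illustrated with a small sketch. It does not describe how `app.validate` actually works: `ValidationReport`, `check_coverage`, and the 5% threshold are all hypothetical, chosen only to show warnings and errors kept in separate buckets.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical threshold; this tracker does not define a numeric cutoff.
LOW_COVERAGE_PCT = 5.0


@dataclass
class ValidationReport:
    """Collects hard failures and follow-up warnings separately."""
    errors: List[str] = field(default_factory=list)
    warnings: List[str] = field(default_factory=list)

    @property
    def passed(self) -> bool:
        # Only errors fail validation; warnings are follow-up items.
        return not self.errors


def check_coverage(report: ValidationReport, category: str, total: int,
                   verified: int, threshold: float = LOW_COVERAGE_PCT) -> None:
    """Record low verified coverage as a warning, impossible counts as an error."""
    if total < 0 or verified < 0 or verified > total:
        report.errors.append(f"{category}: impossible counts ({verified}/{total})")
        return
    pct = 100 * verified / total if total else 0.0
    if pct < threshold:
        report.warnings.append(
            f"{category}: low verified coverage {pct:.1f}% "
            f"({verified}/{total}), follow up after import"
        )
```

With this split, a bulk import that leaves a category fully unverified still passes, and the warning text can be reused in the TechEngineBot validation comment.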
Remaining Work
- Continue large unverified imports where source coverage is useful
- Rebase `data/import-staging` before each push and keep commits split by source, brand, era, or category
- Backfill manual verification for imported smartphone, tablet, watch, PDA, GPU, CPU, and SoC records
- Add or repair brand `verified` flags so the `brand` category no longer has missing verification metadata
- Dedupe or collapse raw mobile variants where a source creates excessive regional/storage duplicates
- Improve source attribution and audit notes for records imported from broad datasets
- Keep public `v1/index.json` and category dumps refreshed after each data batch
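For the dedupe item in the list above, one possible approach is to group raw variants by a normalized brand and model key and fold the regional and storage differences into lists. This is only a sketch: the field names (`brand`, `model`, `storage_gb`, `region`) and the sample records are hypothetical and may not match the actual TechAPI schema.

```python
from collections import defaultdict


def collapse_variants(records):
    """Collapse regional/storage variants of the same device into one record."""
    groups = defaultdict(list)
    for rec in records:
        # Normalize case and surrounding whitespace so near-identical names group together.
        key = (rec["brand"].strip().lower(), rec["model"].strip().lower())
        groups[key].append(rec)

    collapsed = []
    for variants in groups.values():
        base = variants[0]  # keep the first-seen spelling of brand and model
        collapsed.append({
            "brand": base["brand"].strip(),
            "model": base["model"].strip(),
            "storage_gb": sorted({v["storage_gb"] for v in variants}),
            "regions": sorted({v["region"] for v in variants}),
            "variant_count": len(variants),
        })
    return collapsed


# Hypothetical sample input, not taken from any imported source.
RAW = [
    {"brand": "ExampleBrand", "model": "Model X", "storage_gb": 128, "region": "EU"},
    {"brand": "examplebrand", "model": "Model X ", "storage_gb": 256, "region": "EU"},
    {"brand": "ExampleBrand", "model": "Model X", "storage_gb": 128, "region": "US"},
    {"brand": "ExampleBrand", "model": "Model Y", "storage_gb": 64, "region": "US"},
]

COLLAPSED = collapse_variants(RAW)
```

Keeping `variant_count` on the collapsed record preserves how many raw rows each entry came from, which may help with the audit notes mentioned in the same list.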
Operational Notes
- Assignees: @Seungpyo1007 and @TechEngineBot
- Labels: data, enhancement
- Milestone: Massive dataset rebuild (1989-2026)
- Projects: TechEngine work and TechAPI-Project
- Priority: High for bulk data PRs
- Start date: 2026-06-20
- Target date: 2026-09-30
TechEngineBot should add or update a tracking comment on this issue whenever a linked data PR is opened or synchronized.
Latest linked PR: #36