BACKEND ISSUE 11 — [FEATURE] Implement Author Productivity vs Impact Matrix API With Project-Based Filters #27

## 🎯 Goal

Implement the `GET /analytics/matrix/productivity` API, which returns the coordinate points for the **Author Productivity vs Impact Matrix** chart.

The API shows the relationship between:

- **Productivity**: an author's publication output
- **Impact**: an author's impact, represented by `hIndex`

The FE uses this data to render a scatter/bubble chart with:

```txt
X-axis: Yearly Output
Y-axis: H-Index
```

The API takes a `project_id` and uses the scope that project tracks:

- `subject_area`
- `subject_category`
- `keywords`

to filter articles before computing each author's productivity.

---

## 🖼️ UI Mapping

UI block:

```txt
Author Productivity vs Impact Matrix
```

Chart type:

```txt
Scatter chart / Matrix chart
```

Axes:

```txt
X-axis: Yearly Output
Y-axis: H-Index
```

Each point on the chart represents **one author**.

---

## 📡 Endpoint

```http
GET /analytics/matrix/productivity
```

---

## 📥 Query Parameters

| Param | Type | Required | Description |
|---|---:|---:|---|
| `project_id` | string / number | Yes | ID of the project to fetch matrix data for |
| `subject_area` | string | No | Further narrows results to a specific subject area |
| `keywords` | string[] / comma-separated | No | Further narrows results to a list of keywords |
| `from_year` | number | No | First year of the filter range (inclusive) |
| `to_year` | number | No | Last year of the filter range (inclusive) |
| `limit` | number | No | Number of author points to return. Default: `50` |
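
Because `keywords` can arrive either as a repeated parameter (`keywords=LLM&keywords=RAG`) or as one comma-separated value, the handler has to normalize both shapes. A minimal sketch, assuming the HTTP framework hands repeated parameters over as `string[]`; `normalizeKeywords` is a hypothetical helper name, and case-insensitive de-duplication is an added assumption:

```typescript
// Hypothetical helper: turn `keywords` (string, string[], or missing) into a clean list.
// Assumption: a repeated query param arrives as string[] and a single one as string.
function normalizeKeywords(raw: string | string[] | undefined): string[] {
  if (raw === undefined) return [];
  const values = Array.isArray(raw) ? raw : [raw];

  const result: string[] = [];
  const seen = new Set<string>(); // lowercase keys already emitted
  for (const value of values) {
    // Each value may itself be comma-separated, e.g. "AI,Machine Learning,RAG".
    for (const part of value.split(",")) {
      const keyword = part.trim();
      if (keyword.length === 0) continue; // skip empty segments such as "LLM,,RAG"
      const key = keyword.toLowerCase();
      if (seen.has(key)) continue; // case-insensitive dedupe (assumption)
      seen.add(key);
      result.push(keyword); // keep the first spelling the client sent
    }
  }
  return result;
}
```

For example, `["LLM", "rag,RAG"]` normalizes to `["LLM", "rag"]`.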

---

## ✅ Example Requests

### 1. Fetch matrix points by project only

```http
GET /analytics/matrix/productivity?project_id=PROJECT_ID
```

### 2. Fetch matrix points with subject area filter

```http
GET /analytics/matrix/productivity?project_id=PROJECT_ID&subject_area=Computer%20Science
```

### 3. Fetch matrix points with keywords filter

```http
GET /analytics/matrix/productivity?project_id=PROJECT_ID&keywords=AI,Machine%20Learning,RAG
```

### 4. Fetch matrix points with year range

```http
GET /analytics/matrix/productivity?project_id=PROJECT_ID&from_year=2021&to_year=2026
```

### 5. Fetch limited matrix points

```http
GET /analytics/matrix/productivity?project_id=PROJECT_ID&limit=30
```
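
When these URLs are built programmatically (for example on the FE), values containing spaces or commas need percent-encoding. A sketch, assuming a hypothetical `buildMatrixUrl` helper and `MatrixQuery` shape; it relies only on the standard `encodeURIComponent`, which encodes spaces as `%20` and commas as `%2C`:

```typescript
// Hypothetical client-side query shape for this endpoint.
interface MatrixQuery {
  projectId: string | number;
  subjectArea?: string;
  keywords?: string[];
  fromYear?: number;
  toYear?: number;
  limit?: number;
}

function buildMatrixUrl(query: MatrixQuery): string {
  const pairs: Array<[string, string]> = [["project_id", String(query.projectId)]];
  if (query.subjectArea !== undefined) pairs.push(["subject_area", query.subjectArea]);
  if (query.keywords !== undefined && query.keywords.length > 0) {
    // Comma-separated form; assumes no single keyword contains a comma.
    pairs.push(["keywords", query.keywords.join(",")]);
  }
  if (query.fromYear !== undefined) pairs.push(["from_year", String(query.fromYear)]);
  if (query.toYear !== undefined) pairs.push(["to_year", String(query.toYear)]);
  if (query.limit !== undefined) pairs.push(["limit", String(query.limit)]);

  // encodeURIComponent escapes spaces and commas, so values survive intact.
  const search = pairs.map(([key, value]) => `${key}=${encodeURIComponent(value)}`).join("&");
  return `/analytics/matrix/productivity?${search}`;
}
```

The server decodes `%2C` back to `,` before splitting `keywords`, so the comma-separated form still works after encoding.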

---

## 🧾 Response Contract

```json
{
  "code": 200,
  "message": "Fetch matrix points successfully",
  "data": [
    {
      "authorId": "auth_01",
      "yearlyOutput": 12,
      "hIndex": 35
    }
  ]
}
```
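
The contract can be pinned down as TypeScript types shared by BE and FE. A sketch; `MatrixPoint`, `ApiResponse`, and `buildErrorResponse` are illustrative names, while `buildProductivityMatrixResponse` reuses the name suggested in the Implementation Notes:

```typescript
// One point per author on the scatter chart.
interface MatrixPoint {
  authorId: string;
  yearlyOutput: number; // X-axis
  hIndex: number; // Y-axis
}

// Envelope shared by success and error responses.
interface ApiResponse<T> {
  code: number;
  message: string;
  data: T | null;
}

// Success wrapper: an empty result is still a 200 with `data: []`.
function buildProductivityMatrixResponse(points: MatrixPoint[]): ApiResponse<MatrixPoint[]> {
  return { code: 200, message: "Fetch matrix points successfully", data: points };
}

// Error wrapper for the 400/404 edge cases; `data` is always null.
function buildErrorResponse(code: number, message: string): ApiResponse<never> {
  return { code, message, data: null };
}
```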

---

# 🧠 Business Logic

## 1. Project-based filtering

`project_id` is required.

From `project_id`, the system resolves the project's research scope:

- The project's subject area
- The subject categories belonging to that subject area
- The keywords the project tracks

This scope is then used to filter the article set before computing matrix points.

Example of a project's tracked scope:

```json
{
  "project_id": 1,
  "subject_area": "Artificial Intelligence",
  "subject_categories": ["Computer Science Applications", "Artificial Intelligence"],
  "keywords": ["LLM", "RAG", "Machine Learning"]
}
```

The API only computes matrix points for authors who have at least one article within this scope.

---

## 2. Filter priority

If the client passes only:

```http
project_id
```

The API automatically filters by the project's full subject area, subject categories, and keywords.

If the client also passes:

```http
subject_area
keywords
```

The API narrows the results further, within the project scope.

Example:

```http
GET /analytics/matrix/productivity?project_id=1&subject_area=AI&keywords=LLM,RAG
```

The result only includes authors with articles that:

- Fall within the project scope
- Belong to subject area `AI`
- Have a keyword matching `LLM` or `RAG`
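
The narrowing step can be sketched as a predicate applied after the project-scope filter: client filters are ANDed together, while multiple keywords are ORed. This assumes a hypothetical `ScopedArticle` shape whose subject areas and keyword texts are already joined in; exact, case-insensitive matching is also an assumption:

```typescript
// Hypothetical article view with joined names (not the real schema).
interface ScopedArticle {
  articleId: number;
  subjectAreas: string[]; // via Topic -> Subject_Category -> Subject_Area
  keywords: string[]; // via Keyword_Article -> Keyword
}

interface NarrowingFilters {
  subjectArea?: string;
  keywords: string[]; // empty list = no keyword narrowing
}

// Only called on articles that already passed the project-scope filter.
function matchesNarrowingFilters(article: ScopedArticle, filters: NarrowingFilters): boolean {
  const norm = (s: string) => s.trim().toLowerCase();

  if (filters.subjectArea !== undefined) {
    const wanted = norm(filters.subjectArea);
    if (!article.subjectAreas.some((area) => norm(area) === wanted)) return false;
  }

  if (filters.keywords.length > 0) {
    const wanted = new Set(filters.keywords.map(norm));
    // Exact, case-insensitive match is an assumption; fuzzy or stemmed matching would also fit.
    if (!article.keywords.some((kw) => wanted.has(norm(kw)))) return false;
  }
  return true;
}
```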

---

## 3. Author productivity calculation

`yearlyOutput` represents an author's publication output per year.

### Default rule

If neither `from_year` nor `to_year` is passed:

```txt
yearlyOutput = number of the author's articles in the most recent year with data
```

### Year range rule

If both `from_year` and `to_year` are passed:

```txt
yearlyOutput = round(totalArticlesInRange / numberOfYears)
```

Where:

```txt
numberOfYears = to_year - from_year + 1
```

Example:

```txt
from_year = 2021
to_year = 2023
totalArticlesInRange = 36
numberOfYears = 3

yearlyOutput = 36 / 3 = 12
```
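
Both rules fit in two small functions. A sketch: `calculateYearlyOutput` follows the signature suggested in the Implementation Notes, while `countInLatestYear` is a hypothetical helper that assumes "most recent year with data" is taken per author (a dataset-wide latest year would instead be supplied by the caller):

```typescript
// Range rule: average articles per year over [fromYear, toYear], rounded.
function calculateYearlyOutput(totalArticles: number, fromYear: number, toYear: number): number {
  const numberOfYears = toYear - fromYear + 1;
  if (numberOfYears <= 0) throw new RangeError("Invalid year range");
  return Math.round(totalArticles / numberOfYears);
}

// Default rule: the author's article count in their most recent publication year.
// Input: one publication year per distinct article.
function countInLatestYear(publicationYears: number[]): number {
  if (publicationYears.length === 0) return 0;
  const latest = publicationYears.reduce((a, b) => Math.max(a, b));
  return publicationYears.filter((year) => year === latest).length;
}
```

Because of the rounding, an author with a single article over a three-year range gets `yearlyOutput = 0` while still having articles in the filtered dataset; the validation rules forbid negative values, not zero.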

---

## 4. H-Index calculation

`hIndex` is read from `Author.h_index` when it is available.

If `Author.h_index` is null:

```txt
hIndex = 0
```

> Phase 1 relies on the existing `Author.h_index` field. There is no need to recompute the H-index from the citation distribution.

---

## 5. Ranking / sorting rule

To keep the matrix readable, the API should return higher-impact authors first.

Suggested sort:

```txt
ORDER BY hIndex DESC, yearlyOutput DESC
```

Then apply `limit`.

Default:

```txt
limit = 50
```

Suggested maximum:

```txt
limit = 200
```
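
A sketch of the sort-then-limit step over already computed points, using a hypothetical `rankAndLimit` helper; the constants mirror the default and suggested maximum above, and the final `authorId` tie-break is an added assumption that keeps equal points in a stable order:

```typescript
interface MatrixPoint {
  authorId: string;
  yearlyOutput: number;
  hIndex: number;
}

const DEFAULT_LIMIT = 50;
const MAX_LIMIT = 200;

// ORDER BY hIndex DESC, yearlyOutput DESC, authorId ASC, then LIMIT.
function rankAndLimit(points: MatrixPoint[], limit: number = DEFAULT_LIMIT): MatrixPoint[] {
  const effectiveLimit = Math.min(limit, MAX_LIMIT);
  return [...points] // copy so the caller's array is not reordered
    .sort((a, b) => {
      if (a.hIndex !== b.hIndex) return b.hIndex - a.hIndex;
      if (a.yearlyOutput !== b.yearlyOutput) return b.yearlyOutput - a.yearlyOutput;
      // Added assumption: deterministic tie-break on authorId.
      return a.authorId < b.authorId ? -1 : a.authorId > b.authorId ? 1 : 0;
    })
    .slice(0, effectiveLimit);
}
```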

---

## 6. Data validation rule

Every returned item must satisfy:

- `authorId` is not null
- `yearlyOutput` is a number
- `hIndex` is a number
- No `NaN` values
- No negative values
- No authors without articles in the filtered dataset
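
A sketch of a final normalization pass enforcing these rules, reusing the suggested name `normalizeAuthorMatrixPoints` and assuming the aggregation step yields hypothetical `RawAuthorMetric` rows. Clamping bad numbers to `0` rather than dropping the author is a design assumption; it also covers the `Author.h_index = null` to `hIndex = 0` rule:

```typescript
// Hypothetical aggregate row before normalization.
interface RawAuthorMetric {
  authorId: string | null;
  articleCount: number; // distinct articles in the filtered dataset
  yearlyOutput: number | null;
  hIndex: number | null; // Author.h_index, may be null
}

interface MatrixPoint {
  authorId: string;
  yearlyOutput: number;
  hIndex: number;
}

// null, NaN, Infinity, and negative values all become 0 (assumption: clamp, don't drop).
const toSafeNumber = (value: number | null): number =>
  value !== null && Number.isFinite(value) && value > 0 ? value : 0;

function normalizeAuthorMatrixPoints(items: RawAuthorMetric[]): MatrixPoint[] {
  const points: MatrixPoint[] = [];
  for (const item of items) {
    // Rows without an author or without articles violate the contract: drop them.
    if (item.authorId === null || item.articleCount <= 0) continue;
    points.push({
      authorId: item.authorId,
      yearlyOutput: toSafeNumber(item.yearlyOutput),
      hIndex: toSafeNumber(item.hIndex),
    });
  }
  return points;
}
```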

---

# 📦 Data Source

The API can read data from the existing tables:

```txt
Author
Author_Article
Article
Keyword
Keyword_Article
Topic
Sub_Topic
Subject_Category
Subject_Area
Project
Project_Keyword
```

Relevant schema flow:

```txt
Project.subject_area -> Subject_Area.subject_area_id
Subject_Category.subject_area_id -> Subject_Area.subject_area_id
Article.primary_topic -> Topic.topic_id
Topic.subject_category_id -> Subject_Category.subject_category_id
Sub_Topic.article_id -> Article.article_id
Sub_Topic.topic_id -> Topic.topic_id
Author_Article.author_id -> Author.author_id
Author_Article.article_id -> Article.article_id
Keyword_Article.article_id -> Article.article_id
Keyword_Article.keyword_id -> Keyword.keyword_id
Project_Keyword.keyword_id -> Keyword.keyword_id
```

---

# 🧮 Suggested SQL Logic

## 1. Resolve project scope

```txt
project_id
-> subject_area
-> subject_category_ids
-> keyword_ids
```

## 2. Filter related articles

An article matches if it satisfies at least one of the following conditions:

```txt
Article.primary_topic -> Topic.subject_category_id IN subject_category_ids
OR Article has a Sub_Topic whose Topic.subject_category_id IN subject_category_ids
OR Article has a Keyword_Article row whose keyword_id IN keyword_ids
```
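
The OR condition can be expressed as an in-memory predicate, which is handy as a reference when testing the SQL version. A sketch, assuming a hypothetical `ArticleRow` shape in which the Topic joins have already been resolved to subject category IDs:

```typescript
// Hypothetical view of one article with its joined IDs.
interface ArticleRow {
  articleId: number;
  primaryTopicCategoryId: number | null; // Article.primary_topic -> Topic.subject_category_id
  subTopicCategoryIds: number[]; // Sub_Topic -> Topic.subject_category_id
  keywordIds: number[]; // Keyword_Article.keyword_id
}

interface ProjectScope {
  subjectCategoryIds: number[];
  keywordIds: number[];
}

// An article is in scope if ANY of the three conditions holds.
function matchesProjectScope(article: ArticleRow, scope: ProjectScope): boolean {
  // Sets built per call for clarity; hoist them when checking many articles.
  const categories = new Set(scope.subjectCategoryIds);
  const keywords = new Set(scope.keywordIds);

  return (
    (article.primaryTopicCategoryId !== null && categories.has(article.primaryTopicCategoryId)) ||
    article.subTopicCategoryIds.some((id) => categories.has(id)) ||
    article.keywordIds.some((id) => keywords.has(id))
  );
}
```

In SQL, the same check is typically an `IN` on the primary topic's category plus two `EXISTS` subqueries over `Sub_Topic` and `Keyword_Article`.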

## 3. Aggregate author metrics

Group by author:

```txt
authorId = Author.author_id
yearlyOutput = count(distinct Article.article_id), following the year rule
hIndex = COALESCE(Author.h_index, 0)
```
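
A sketch of the grouping step in memory, counting distinct articles per author. It assumes hypothetical `AuthorArticleRow` rows already filtered by scope and year range; joins through `Keyword_Article` or `Sub_Topic` can repeat the same (author, article) pair, which is why the count must be distinct:

```typescript
// One row per joined (author, article) pair; duplicates are possible.
interface AuthorArticleRow {
  authorId: string;
  articleId: number;
  publicationYear: number;
}

interface AuthorArticleCount {
  authorId: string;
  totalArticles: number; // count(distinct article_id)
  publicationYears: number[]; // one entry per distinct article
}

function aggregateAuthorMetrics(rows: AuthorArticleRow[]): AuthorArticleCount[] {
  // authorId -> (articleId -> publicationYear); the inner Map dedupes articles.
  const byAuthor = new Map<string, Map<number, number>>();
  for (const row of rows) {
    let articles = byAuthor.get(row.authorId);
    if (articles === undefined) {
      articles = new Map<number, number>();
      byAuthor.set(row.authorId, articles);
    }
    articles.set(row.articleId, row.publicationYear);
  }

  const result: AuthorArticleCount[] = [];
  byAuthor.forEach((articles, authorId) => {
    result.push({
      authorId,
      totalArticles: articles.size,
      publicationYears: Array.from(articles.values()),
    });
  });
  return result;
}
```

`totalArticles` feeds the range rule, and `publicationYears` feeds the default (latest-year) rule.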

---

# ⚠️ Edge Cases

## 1. Project not found

If `project_id` does not exist:

```json
{
  "code": 404,
  "message": "Project not found",
  "data": null
}
```

---

## 2. Project has no scope

If the project has neither a subject area nor keywords yet:

```json
{
  "code": 200,
  "message": "Fetch matrix points successfully",
  "data": []
}
```

---

## 3. Empty filtered dataset

If no articles remain after filtering:

```json
{
  "code": 200,
  "message": "Fetch matrix points successfully",
  "data": []
}
```

---

## 4. Author has no h_index

If the author has no `h_index`:

```json
{
  "authorId": "auth_01",
  "yearlyOutput": 12,
  "hIndex": 0
}
```

---

## 5. Invalid year range

If `from_year > to_year`:

```json
{
  "code": 400,
  "message": "Invalid year range",
  "data": null
}
```

---

## 6. Invalid limit

If `limit <= 0` or is not a number:

```json
{
  "code": 400,
  "message": "Invalid limit",
  "data": null
}
```

If `limit > 200`, the API should either cap it at `200` or return an error, depending on BE conventions.

Suggested for phase 1:

```txt
max limit = 200
```
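
A sketch of validating `from_year`, `to_year`, and `limit` from raw query strings, following the phase 1 suggestion of capping `limit` at `200`. The helper names are hypothetical; treating non-integer input (such as `limit=1.5` or `from_year=abc`) as invalid is an assumption, and so is reusing the `Invalid year range` message for a malformed year:

```typescript
interface RawYearLimitParams {
  from_year?: string;
  to_year?: string;
  limit?: string;
}

type ValidationResult =
  | { ok: true; fromYear?: number; toYear?: number; limit: number }
  | { ok: false; code: 400; message: string };

const DEFAULT_LIMIT = 50;
const MAX_LIMIT = 200;

// Accept only plain base-10 integers; "12abc", "1.5", and "" are rejected.
function parseInteger(value: string): number | null {
  return /^-?\d+$/.test(value.trim()) ? Number(value) : null;
}

function validateYearAndLimit(raw: RawYearLimitParams): ValidationResult {
  const fromYear = raw.from_year === undefined ? undefined : parseInteger(raw.from_year);
  const toYear = raw.to_year === undefined ? undefined : parseInteger(raw.to_year);
  if (fromYear === null || toYear === null) {
    return { ok: false, code: 400, message: "Invalid year range" };
  }
  if (fromYear !== undefined && toYear !== undefined && fromYear > toYear) {
    return { ok: false, code: 400, message: "Invalid year range" };
  }

  let limit = DEFAULT_LIMIT;
  if (raw.limit !== undefined) {
    const parsed = parseInteger(raw.limit);
    if (parsed === null || parsed <= 0) {
      return { ok: false, code: 400, message: "Invalid limit" };
    }
    limit = Math.min(parsed, MAX_LIMIT); // phase 1: cap instead of rejecting
  }
  return { ok: true, fromYear, toYear, limit };
}
```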

---

# 🧪 Acceptance Criteria

- [ ] API accepts `project_id`
- [ ] API automatically resolves the project's subject area / subject categories / keywords
- [ ] API supports additional filtering by `subject_area`
- [ ] API supports additional filtering by `keywords`
- [ ] API supports filtering by `from_year` and `to_year`
- [ ] API supports `limit`
- [ ] Response returns a `data` array
- [ ] Every item has `authorId`, `yearlyOutput`, and `hIndex`
- [ ] `authorId` is never null
- [ ] `yearlyOutput` is always a number
- [ ] `hIndex` is always a number
- [ ] No `NaN` in the response
- [ ] No negative values in the response
- [ ] An empty dataset returns an empty array without the API failing
- [ ] FE can render the scatter chart directly, without further transformation
- [ ] Response time under `100ms` on small datasets

---

# 🧪 Test Cases

## TC-01: Fetch matrix points by project_id only

### Request

```http
GET /analytics/matrix/productivity?project_id=1
```

### Expected

- API resolves the project scope
- Articles are filtered by the project scope
- Results are grouped by author
- Returns `authorId`, `yearlyOutput`, `hIndex`
- Data contains no null or NaN values

---

## TC-02: Fetch matrix points with subject_area filter

### Request

```http
GET /analytics/matrix/productivity?project_id=1&subject_area=Artificial%20Intelligence
```

### Expected

- Only returns data in the requested subject area
- Returns no data outside the project scope
- Matrix points still follow the contract

---

## TC-03: Fetch matrix points with keywords filter

### Request

```http
GET /analytics/matrix/productivity?project_id=1&keywords=LLM,RAG
```

### Expected

- Only returns data whose keywords match `LLM` or `RAG`
- No null items
- No null or NaN `yearlyOutput`/`hIndex`

---

## TC-04: Fetch matrix points with year range

### Request

```http
GET /analytics/matrix/productivity?project_id=1&from_year=2021&to_year=2023
```

### Expected

If an author has 36 articles in 2021-2023:

```json
{
  "authorId": "auth_01",
  "yearlyOutput": 12,
  "hIndex": 35
}
```

---

## TC-05: Author has no h_index

### Input

The author has matching articles but `h_index = null`.

### Expected

```json
{
  "authorId": "auth_01",
  "yearlyOutput": 12,
  "hIndex": 0
}
```

---

## TC-06: Project not found

### Request

```http
GET /analytics/matrix/productivity?project_id=999
```

### Expected

```json
{
  "code": 404,
  "message": "Project not found",
  "data": null
}
```

---

## TC-07: Empty filtered dataset

### Request

```http
GET /analytics/matrix/productivity?project_id=1&keywords=unknown-keyword
```

### Expected

```json
{
  "code": 200,
  "message": "Fetch matrix points successfully",
  "data": []
}
```

---

## TC-08: Invalid year range

### Request

```http
GET /analytics/matrix/productivity?project_id=1&from_year=2026&to_year=2021
```

### Expected

```json
{
  "code": 400,
  "message": "Invalid year range",
  "data": null
}
```

---

## TC-09: Invalid limit

### Request

```http
GET /analytics/matrix/productivity?project_id=1&limit=-1
```

### Expected

```json
{
  "code": 400,
  "message": "Invalid limit",
  "data": null
}
```

---

# 📌 Implementation Notes

Split the logic into separate layers:

```txt
Controller
  -> validate query params
Service
  -> get project tracking scope
  -> filter related articles
  -> aggregate author productivity
  -> attach author h_index
  -> normalize response
  -> build response contract
```

Suggested functions:

```ts
getProjectTrackingScope(projectId)
buildArticleScopeFilter(scope, queryParams)
getAuthorProductivityMetrics(filters)
calculateYearlyOutput(totalArticles, fromYear, toYear)
normalizeAuthorMatrixPoints(items)
buildProductivityMatrixResponse(points)
```
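
A compact sketch of how the Service steps could fit together, with data access behind a hypothetical `MatrixRepository` interface so the pipeline can be tested with a fake. Every name except the step names listed above is an assumption, and the repository is assumed to return rows already narrowed by scope, `subject_area`, `keywords`, and year range:

```typescript
interface ProjectTrackingScope {
  subjectCategoryIds: number[];
  keywordIds: number[];
}

interface MatrixFilters {
  subjectArea?: string;
  keywords: string[];
  fromYear?: number;
  toYear?: number;
  limit: number; // already validated and capped by the controller
}

interface AuthorArticleRow {
  authorId: string;
  articleId: number;
  publicationYear: number;
  hIndex: number | null; // Author.h_index
}

interface MatrixRepository {
  getProjectTrackingScope(projectId: string): Promise<ProjectTrackingScope | null>;
  getAuthorArticleRows(scope: ProjectTrackingScope, filters: MatrixFilters): Promise<AuthorArticleRow[]>;
}

interface MatrixPoint { authorId: string; yearlyOutput: number; hIndex: number; }
interface ApiResponse<T> { code: number; message: string; data: T | null; }

const success = (data: MatrixPoint[]): ApiResponse<MatrixPoint[]> => ({
  code: 200,
  message: "Fetch matrix points successfully",
  data,
});

async function getProductivityMatrix(
  repo: MatrixRepository,
  projectId: string,
  filters: MatrixFilters,
): Promise<ApiResponse<MatrixPoint[]>> {
  // 1. Resolve the project scope.
  const scope = await repo.getProjectTrackingScope(projectId);
  if (scope === null) return { code: 404, message: "Project not found", data: null };
  if (scope.subjectCategoryIds.length === 0 && scope.keywordIds.length === 0) return success([]);

  // 2. Fetch filtered rows and group distinct articles per author.
  const rows = await repo.getAuthorArticleRows(scope, filters);
  const byAuthor = new Map<string, { years: Map<number, number>; hIndex: number | null }>();
  for (const row of rows) {
    const entry = byAuthor.get(row.authorId) ?? { years: new Map<number, number>(), hIndex: row.hIndex };
    entry.years.set(row.articleId, row.publicationYear); // articleId key dedupes repeated joins
    byAuthor.set(row.authorId, entry);
  }

  // 3. Compute yearlyOutput, attach hIndex (null/NaN/negative -> 0).
  const points: MatrixPoint[] = [];
  byAuthor.forEach((entry, authorId) => {
    const years = Array.from(entry.years.values());
    let yearlyOutput: number;
    if (filters.fromYear !== undefined && filters.toYear !== undefined) {
      yearlyOutput = Math.round(years.length / (filters.toYear - filters.fromYear + 1));
    } else {
      const latest = Math.max(...years); // per-author latest year (assumption)
      yearlyOutput = years.filter((year) => year === latest).length;
    }
    const hIndex = entry.hIndex !== null && entry.hIndex > 0 ? entry.hIndex : 0;
    points.push({ authorId, yearlyOutput, hIndex });
  });

  // 4. Sort by impact, then output, and apply the limit.
  points.sort((a, b) => b.hIndex - a.hIndex || b.yearlyOutput - a.yearlyOutput);
  return success(points.slice(0, filters.limit));
}
```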

---

# 📦 Suggested Mock Data

```ts
const mockAuthorProductivityMatrix = [
  {
    authorId: "auth_01",
    yearlyOutput: 12,
    hIndex: 35
  },
  {
    authorId: "auth_02",
    yearlyOutput: 8,
    hIndex: 28
  },
  {
    authorId: "auth_03",
    yearlyOutput: 15,
    hIndex: 42
  }
];
```

---

# 🚀 Future Improvements

- Add author display name for tooltip
- Add author avatar URL
- Add total citation count
- Add bubble size based on citation count
- Add quadrants classification:
  - High Productivity / High Impact
  - High Productivity / Low Impact
  - Low Productivity / High Impact
  - Low Productivity / Low Impact
- Add `metric_type=h_index|citation_count|impact_score`
- Add Redis caching:

```txt
analytics:matrix:productivity:{project_id}:{subject_area}:{keywords}:{from_year}:{to_year}:{limit}
```

- Optimize query with indexes on:
  - `project_id`
  - `publication_year`
  - `author_id`
  - `article_id`
  - `keyword_id`
  - `subject_category_id`
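
A sketch of building the Redis key from the template above. `buildMatrixCacheKey` is a hypothetical name; trimming, de-duplicating, and sorting keywords is an added assumption so that `keywords=RAG,LLM` and `keywords=LLM,RAG` share one entry, which is safe because the keyword filter is order-independent. Missing optional parameters become empty segments:

```typescript
interface CacheKeyParams {
  projectId: string | number;
  subjectArea?: string;
  keywords?: string[];
  fromYear?: number;
  toYear?: number;
  limit: number;
}

// analytics:matrix:productivity:{project_id}:{subject_area}:{keywords}:{from_year}:{to_year}:{limit}
function buildMatrixCacheKey(p: CacheKeyParams): string {
  // Trim, drop empties, dedupe, and sort so keyword order does not create new keys.
  const keywords = Array.from(
    new Set((p.keywords ?? []).map((k) => k.trim()).filter((k) => k.length > 0)),
  ).sort();

  const segments: Array<string | number> = [
    "analytics:matrix:productivity",
    p.projectId,
    (p.subjectArea ?? "").trim(),
    keywords.join(","),
    p.fromYear ?? "", // missing optional params become empty segments
    p.toYear ?? "",
    p.limit,
  ];
  return segments.join(":");
}
```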

