Add Harvard test example to README

- Add detailed test results table with script paths
- Include Harvard test example with commands and sample output
- List covered Harvard schools

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
yangxiaoyu-crypto
2025-12-10 15:43:09 +08:00
parent a4dca81216
commit 2714c8ad5c

@@ -118,11 +118,52 @@ uv run university-agent generate \
## Tested Universities
| University | Status | Notes |
|------|------|------|
| Harvard | ✅ | Found 277 links |
| RWTH Aachen | ✅ | Found 108 links |
| KAUST | ✅ | Requires Firefox; the site is slow |
| University | Status | Results | Generated script |
|------|------|------|-----------|
| Harvard | ✅ | 277 links (8 programs, 269 faculty, 265 profile pages) | `artifacts/harvard_faculty_scraper.py` |
| RWTH Aachen | ✅ | 108 links (103 programs, 5 faculty) | `artifacts/rwth_aachen_playwright_scraper.py` |
| KAUST | ✅ | 9 links (requires Firefox) | `artifacts/kaust_faculty_scraper.py` |
### Harvard Test Example
**Generate the scraper script:**
```bash
uv run python generate_scraper.py --url "https://www.harvard.edu/" --name "Harvard"
```
**Run the scraper:**
```bash
cd artifacts
uv run python harvard_faculty_scraper.py --max-pages 30 --no-verify
```
**Result output** (`artifacts/university-scraper_results.json`):
```json
{
  "statistics": {
    "total_links": 277,
    "program_links": 8,
    "faculty_links": 269,
    "profile_pages": 265
  },
  "program_links": [
    {"url": "https://www.harvard.edu/programs/?degree_levels=graduate", "text": "Graduate Programs"},
    ...
  ],
  "faculty_links": [
    {"url": "https://www.gse.harvard.edu/directory/faculty", "text": "Faculty Directory"},
    {"url": "https://faculty.harvard.edu", "text": "Harvard Faculty"},
    ...
  ]
}
```
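As a quick sanity check on the sample output, the sketch below confirms that `total_links` equals `program_links` plus `faculty_links`. The key names come from the sample; the helper name `totals_are_consistent` is hypothetical, and the idea that the total is always the sum of the two categories is an assumption drawn from the Harvard numbers, not confirmed behaviour of the scraper.

```python
import json


def totals_are_consistent(results: dict) -> bool:
    """Return True if total_links equals program_links + faculty_links.

    Assumption: the total is the sum of the two categories, as it is in
    the Harvard sample (8 + 269 = 277).
    """
    stats = results["statistics"]
    return stats["total_links"] == stats["program_links"] + stats["faculty_links"]


# Statistics copied from the sample output; the link lists are omitted
# because the sample elides most of their entries.
sample = json.loads("""
{
  "statistics": {
    "total_links": 277,
    "program_links": 8,
    "faculty_links": 269,
    "profile_pages": 265
  }
}
""")

print(totals_are_consistent(sample))  # True
```

To check a real run, the `sample` dictionary could be replaced with the result of `json.load` on the results file written by the scraper.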
The crawl covered several Harvard schools:
- Graduate School of Design (GSD)
- Graduate School of Education (GSE)
- Faculty of Arts and Sciences (FAS)
- Graduate School of Arts and Sciences (GSAS)
- Harvard Divinity School (HDS)
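One way to see which sites a run actually reached is to count the collected links per hostname. This is a minimal sketch: `links_per_host` is a hypothetical helper, not part of the generated scraper, and the two entries are the only faculty links shown in the sample output, so the counts say nothing about the full 269-link result.

```python
from collections import Counter
from urllib.parse import urlparse


def links_per_host(links: list[dict]) -> Counter:
    """Count link entries by the hostname of their "url" field."""
    return Counter(urlparse(link["url"]).hostname for link in links)


# The two faculty links visible in the sample output.
faculty_links = [
    {"url": "https://www.gse.harvard.edu/directory/faculty", "text": "Faculty Directory"},
    {"url": "https://faculty.harvard.edu", "text": "Harvard Faculty"},
]

# Print one line per hostname, sorted alphabetically.
for host, count in sorted(links_per_host(faculty_links).items()):
    print(host, count)
```

Mapping hostnames to school names is left out on purpose, since the sample does not show which hostname belongs to which school.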
## Troubleshooting