Add university scraper system with backend, frontend, and configs

- Add src/university_scraper module with scraper, analyzer, and CLI
- Add backend FastAPI service with API endpoints and database models
- Add frontend React app with university management pages
- Add configs for Harvard, Manchester, and UCL universities
- Add artifacts with various scraper implementations
- Add Docker compose configuration for deployment
- Update .gitignore to exclude generated files

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-22 15:25:08 +08:00
parent 2714c8ad5c
commit 426cf4d2cd
75 changed files with 13527 additions and 2 deletions
--- a/对话总结.txt
+++ b/对话总结.txt
@@ -0,0 +1,32 @@
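+Project progress and outstanding issues:
+📊 Project progress summary
+✅ Completed features
+Web system architecture
+FastAPI backend + React + TypeScript + Ant Design frontend
+SQLite database storing universities, scripts, and scrape results
+Dynamic script execution mechanism (modules injected via exec())
+University of Manchester scraper
+Successfully scraped 297 master's programmes
+Created the complete school configuration manchester_complete_scraper.py
+Implemented 3 extraction methods: table, links, research_explorer
+Programmes are automatically assigned to 13 schools by keyword
+Fixed issues
+Windows asyncio event loop policy
+exec namespace issue (functions calling one another)
+Master's programme filtering logic (excludes undergraduate/doctoral programmes)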
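+⚠️ Current issues
+Issue	Impact	Cause
+Network timeouts	Page load failed for 11/12 schools	Unstable network or slow page responses
+Research Explorer pages	Used by many schools	Slow JavaScript rendering; the 60-second timeout is not enough
+Incomplete supervisor data	Only 78 supervisors retrieved (AMBS)	Pages of the other schools could not be accessed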
+📈 Data statistics
+Metric	Count
+Total master's programmes	297
+School categories	13
+Schools with supervisors retrieved	1/13
+Total supervisors	78
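+🔧 Suggested improvements
+Increase the timeout - raise it to 90-120 seconds for Research Explorer pages
+Add a retry mechanism - automatically retry 2-3 times after a failure
+Use fallback URLs - configure multiple candidate staff pages for each school
+Scrape in batches - process schools in batches to avoid too many simultaneous requests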
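
The four suggested improvements in the summary above (longer timeout, retries, fallback URLs, batching) could be combined roughly as follows. This is only a sketch and not code from this commit: `fetch_with_retry`, `scrape_in_batches`, and the `fetch` callable are hypothetical names, and `fetch` stands in for whatever coroutine the scraper actually uses to load a page. The defaults (120 seconds, 3 attempts) are taken from the ranges suggested in the summary; the backoff and concurrency values are assumptions.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List, Optional

# Hypothetical type: any coroutine function that loads a URL and returns its HTML.
Fetch = Callable[[str], Awaitable[str]]


async def fetch_with_retry(
    fetch: Fetch,
    urls: List[str],
    timeout_s: float = 120.0,   # summary suggests 90-120 s for Research Explorer
    retries: int = 3,           # summary suggests 2-3 attempts
    backoff_s: float = 2.0,     # assumed value, not from the summary
) -> Optional[str]:
    """Try each candidate URL in order; retry each one up to `retries` times."""
    for url in urls:
        for attempt in range(retries):
            try:
                return await asyncio.wait_for(fetch(url), timeout=timeout_s)
            except (asyncio.TimeoutError, OSError):
                # Wait a little longer after each failed attempt, except the last.
                if attempt < retries - 1:
                    await asyncio.sleep(backoff_s * (attempt + 1))
    return None  # every candidate URL failed


async def scrape_in_batches(
    fetch: Fetch,
    schools: Dict[str, List[str]],   # school name -> candidate staff page URLs
    max_concurrent: int = 3,         # assumed cap on simultaneous requests
    **retry_options: float,
) -> Dict[str, Optional[str]]:
    """Fetch one page per school, with at most `max_concurrent` in flight."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def scrape_one(name: str, urls: List[str]):
        async with semaphore:
            return name, await fetch_with_retry(fetch, urls, **retry_options)

    pairs = await asyncio.gather(
        *(scrape_one(name, urls) for name, urls in schools.items())
    )
    return dict(pairs)
```

A semaphore is used here instead of fixed-size batches so that a slow page does not hold up the rest of its batch. A school whose candidate URLs all fail maps to `None`, which lets the caller report partial results instead of aborting the whole run.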