AI Co-Scientist Memory Strategy

Designing a Short-, Medium-, and Long-Term Memory Architecture

Executive Summary

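The AI Co-Scientist system is a next-generation research tool that autonomously performs scientific hypothesis generation, experimental design, and data analysis. Drawing on Google's AI Co-Scientist, the Titans architecture, and recent research on LLM agent memory, this document presents a three-tier memory strategy optimized for drug discovery and life-science research.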

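1. Memory Architecture Overview

1.1 Design Principles Based on Human Cognition

Recent research suggests that an effective AI agent memory system should mirror the structure of human cognition:

Memory type	Human cognitive counterpart	AI system implementation	Primary function
Short-term memory (STM)	Working Memory	Context Window + Cache	Immediate reasoning, processing the current task
Medium-term memory (MTM)	Episodic Memory	External Vector Store	Experience-based learning, session continuity
Long-term memory (LTM)	Semantic Memory	Knowledge Graph + Parametric	Generalized knowledge, domain expertise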

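1.2 Inspiration from the Titans Architecture

Titans (Google Research, 2025) updates its memory dynamically using a "surprise metric":

Low Surprise: expected information → no long-term storage needed
High Surprise: unexpected information → stored in long-term memory with priority
Momentum: combines momentary surprise with the flow of past context
Forgetting (Weight Decay): adaptive forgetting to manage memory capacity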
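The four ideas above can be sketched as a single update rule. The following is a minimal illustration, not the Titans implementation: it assumes a linear memory matrix `M` instead of the deep memory module used in the paper, and it uses fixed constants `eta` (momentum), `theta` (step size), and `alpha` (forgetting rate), whereas Titans learns these as data-dependent gates. The function names are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def loss_and_grad(M: Matrix, k: List[float], v: List[float]) -> Tuple[float, Matrix]:
    """Squared-error 'surprise' of linear memory M on the pair (k, v), plus its gradient."""
    pred = [sum(row[j] * k[j] for j in range(len(k))) for row in M]
    err = [p - t for p, t in zip(pred, v)]
    loss = sum(e * e for e in err)
    grad = [[2.0 * e * kj for kj in k] for e in err]
    return loss, grad


def memory_step(M: Matrix, S: Matrix, k, v, eta=0.5, theta=0.1, alpha=0.01):
    """One update: momentum over past surprise, then weight decay on the memory."""
    loss, grad = loss_and_grad(M, k, v)
    # S_t = eta * S_{t-1} - theta * grad  (past surprise + momentary surprise)
    S_new = [[eta * s - theta * g for s, g in zip(s_row, g_row)]
             for s_row, g_row in zip(S, grad)]
    # M_t = (1 - alpha) * M_{t-1} + S_t   (adaptive forgetting via decay)
    M_new = [[(1.0 - alpha) * m + s for m, s in zip(m_row, s_row)]
             for m_row, s_row in zip(M, S_new)]
    return M_new, S_new, loss


M = [[0.0, 0.0]]
S = [[0.0, 0.0]]
losses = []
for _ in range(3):
    # Presenting the same association repeatedly makes it less surprising.
    M, S, loss = memory_step(M, S, k=[1.0, 0.0], v=[1.0])
    losses.append(loss)
```

Because the same key-value pair is shown three times, the recorded surprise falls at each step, which is the behavior the Low Surprise / High Surprise distinction relies on.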

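2. Short-Term Memory Strategy

2.1 Definition and Purpose

Short-term memory supports immediate reasoning and decision-making within the current research session.

Retention: current session, up to a few hours
Capacity: Context Window (128K ~ 2M tokens)

2.2 Implementation Strategy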

┌─────────────────────────────────────────────────────────┐
│                    SHORT-TERM MEMORY                     │
├─────────────────────────────────────────────────────────┤
│  ┌─────────────────┐    ┌─────────────────┐            │
│  │ Attention Cache │    │ Working Buffer  │            │
│  │  (KV Cache)     │    │                 │            │
│  │                 │    │ • Current Query │            │
│  │ • Recent tokens │    │ • Active Tools  │            │
│  │ • Sliding window│    │ • Temp Results  │            │
│  └─────────────────┘    └─────────────────┘            │
│                                                         │
│  ┌─────────────────────────────────────────┐           │
│  │         Session Context                  │           │
│  │  • Current hypothesis under evaluation   │           │
│  │  • Active experimental parameters        │           │
│  │  • Real-time tool outputs               │           │
│  │  • User feedback and constraints        │           │
│  └─────────────────────────────────────────┘           │
└─────────────────────────────────────────────────────────┘

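2.3 Science-Specific Components

A. Hypothesis Tracking Buffer

from datetime import datetime, timezone


class HypothesisTracker:
    """Tracks the state of the hypotheses currently under evaluation."""

    def __init__(self):
        self.active_hypotheses = []      # hypotheses currently being explored
        self.evaluation_scores = {}      # running evaluation score per hypothesis
        self.reasoning_chain = []        # record of reasoning steps
        self.tool_call_history = []      # history of tool calls

    def update_hypothesis(self, hypothesis, evidence, score):
        """Record new evidence for a hypothesis and apply its score delta."""
        if hypothesis not in self.active_hypotheses:
            self.active_hypotheses.append(hypothesis)
        # Accumulate the delta so evaluation_scores reflects all evidence so far
        self.evaluation_scores[hypothesis] = (
            self.evaluation_scores.get(hypothesis, 0.0) + score
        )
        self.reasoning_chain.append({
            'hypothesis': hypothesis,
            'evidence': evidence,
            'score_delta': score,
            'timestamp': datetime.now(timezone.utc).isoformat()
        })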

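B. Experimental Parameter Cache

Molecular structures (SMILES, InChI)
Protein sequences and structural information
ADMET prediction results
Docking scores and binding affinity

C. Literature Reference Stack

IDs of papers referenced in the current session (PMID, DOI)
Key quotations and data points
Source-tracking metadata

2.4 STM Management Policy

Event	Action	Priority
Context overflow imminent	Trigger summarization	Critical
New high-confidence evidence	Update the relevant caches	High
Irrelevant information detected	Selective filtering	Medium
Session end	Consolidated transfer to MTM	Standard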
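The "context overflow imminent" rule in the policy table can be illustrated with a small buffer that compacts itself once usage passes a threshold. This is a sketch under stated assumptions: tokens are counted by whitespace splitting rather than with a model tokenizer, `summarize` is a placeholder that keeps the first five words of each entry (a real system would call an LLM), and the 80% trigger ratio and the class name `ShortTermBuffer` are hypothetical.

```python
from collections import deque


def count_tokens(text: str) -> int:
    """Crude whitespace token count; stands in for a real tokenizer."""
    return len(text.split())


def summarize(texts) -> str:
    """Placeholder summarizer: keeps the first five words of each entry."""
    return " | ".join(" ".join(t.split()[:5]) for t in texts)


class ShortTermBuffer:
    def __init__(self, max_tokens: int, trigger_ratio: float = 0.8):
        self.max_tokens = max_tokens
        self.trigger_ratio = trigger_ratio
        self.entries = deque()

    def used(self) -> int:
        return sum(count_tokens(e) for e in self.entries)

    def add(self, text: str) -> None:
        self.entries.append(text)
        # Critical-priority rule: compact before the window actually overflows.
        if self.used() > self.trigger_ratio * self.max_tokens:
            self._compact()

    def _compact(self) -> None:
        """Replace the older half of the buffer with one summary entry (single pass)."""
        n_old = max(1, len(self.entries) // 2)
        old = [self.entries.popleft() for _ in range(n_old)]
        self.entries.appendleft("[summary] " + summarize(old))
```

Compaction here runs once per trigger and always keeps the most recent entries verbatim, on the assumption that recent tool outputs matter most for the next reasoning step.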

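3. Medium-Term Memory Strategy

3.1 Definition and Purpose

Medium-term memory supports episode-based learning and preserves continuity across a research project.

Retention: several days to several weeks
Capacity: 10K~100K memory chunks per project

3.2 Implementation Strategy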

┌─────────────────────────────────────────────────────────┐
│                   MEDIUM-TERM MEMORY                     │
├─────────────────────────────────────────────────────────┤
│                                                         │
│  ┌───────────────────────────────────────────┐         │
│  │         EPISODIC MEMORY STORE             │         │
│  │  ┌─────────────────────────────────┐     │         │
│  │  │ Episode 1: Drug Repurposing Study│     │         │
│  │  │  • Context: AML treatment       │     │         │
│  │  │  • Outcome: 3/5 candidates valid │     │         │
│  │  │  • Key insight: KIRA6 inhibition│     │         │
│  │  │  • Timestamp: 2025-01-10        │     │         │
│  │  └─────────────────────────────────┘     │         │
│  │  ┌─────────────────────────────────┐     │         │
│  │  │ Episode 2: Target Discovery      │     │         │
│  │  │  • Context: Liver fibrosis      │     │         │
│  │  │  • Outcome: Epigenetic targets  │     │         │
│  │  │  • Method: Multi-agent critique │     │         │
│  │  └─────────────────────────────────┘     │         │
│  └───────────────────────────────────────────┘         │
│                                                         │
│  ┌───────────────────────────────────────────┐         │
│  │         VECTOR RETRIEVAL SYSTEM           │         │
│  │  • Embedding: domain-adapted (PubMedBERT) │         │
│  │  • Index: HNSW with metadata filtering    │         │
│  │  • Retrieval: Hybrid (dense + sparse)     │         │
│  └───────────────────────────────────────────┘         │
└─────────────────────────────────────────────────────────┘
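The "Hybrid (dense + sparse)" retrieval in the diagram needs some way to merge two ranked lists. The document does not say which fusion method is intended; one common option is reciprocal rank fusion (RRF), sketched below as an assumption. The episode IDs are made up, and `k=60` is the constant conventionally used with RRF.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[List[str]], k: int = 60) -> List[str]:
    """Fuse several ranked ID lists; each list contributes 1 / (k + rank) per ID."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


dense_hits = ["ep2", "ep1", "ep3"]    # e.g. from the embedding index
sparse_hits = ["ep1", "ep4", "ep2"]   # e.g. from keyword search
fused = reciprocal_rank_fusion([dense_hits, sparse_hits])
```

RRF only uses ranks, so it avoids having to calibrate dense similarity scores against sparse scores, which is why it is a convenient default for a first version.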

3.3 Episodic Memory Schema

Structured episodes based on the principles of A-MEM (Agentic Memory, 2025):

{
  "episode_id": "uuid-string",
  "episode_type": "hypothesis_generation | experiment_result | literature_review",
  "metadata": {
    "project_id": "project-uuid",
    "timestamp": "2025-01-16T09:30:00Z",
    "duration_minutes": 45,
    "agent_roles": ["Ontologist", "Scientist_1", "Critic"]
  },
  "context": {
    "research_goal": "Identify drug repurposing candidates for AML",
    "constraints": ["FDA-approved drugs only", "Clinically relevant concentrations"],
    "prior_knowledge_refs": ["PMID:12345678", "PMID:23456789"]
  },
  "content": {
    "hypothesis": "KIRA6 inhibits IRE1α kinase activity...",
    "reasoning_chain": [...],
    "evidence_used": [...],
    "tool_calls": [...]
  },
  "outcome": {
    "success": true,
    "validation_status": "lab_validated",
    "key_findings": [...],
    "failure_analysis": null
  },
  "connections": {
    "related_episodes": ["episode-uuid-1", "episode-uuid-2"],
    "semantic_tags": ["AML", "kinase_inhibitor", "drug_repurposing"],
    "knowledge_graph_links": ["entity:KIRA6", "entity:IRE1α"]
  }
}

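3.4 Retrieval and Consolidation Strategy

Retrieval score computation (Generative Agents approach)

Retrieval_Score(m, q) = α × Recency(m) + β × Importance(m) + γ × Relevance(m, q)

Where m is a stored memory, q is the query, and α, β, γ are weights:
- Recency: time-based decay function
- Importance: importance of the episode (success/failure, citation frequency)
- Relevance: semantic similarity to the query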
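A minimal sketch of this scoring function follows. It assumes exponential decay for Recency, a precomputed importance value in [0, 1], and cosine similarity for Relevance; the `decay_per_hour` default, the equal weights, and the dictionary field names are illustrative assumptions rather than values fixed by this document.

```python
import math
from datetime import datetime, timedelta, timezone


def recency(last_access: datetime, now: datetime, decay_per_hour: float = 0.995) -> float:
    """Exponential decay in the number of hours since the memory was last accessed."""
    hours = max(0.0, (now - last_access).total_seconds() / 3600.0)
    return decay_per_hour ** hours


def cosine(a, b) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def retrieval_score(memory: dict, query_vec, now: datetime,
                    alpha: float = 1.0, beta: float = 1.0, gamma: float = 1.0) -> float:
    """alpha * Recency(m) + beta * Importance(m) + gamma * Relevance(m, q)."""
    return (alpha * recency(memory["last_access"], now)
            + beta * memory["importance"]
            + gamma * cosine(memory["embedding"], query_vec))


now = datetime(2025, 1, 16, tzinfo=timezone.utc)
fresh = {"last_access": now, "importance": 0.5, "embedding": [1.0, 0.0]}
stale = {"last_access": now - timedelta(days=30), "importance": 0.5, "embedding": [1.0, 0.0]}
score_fresh = retrieval_score(fresh, [1.0, 0.0], now)
score_stale = retrieval_score(stale, [1.0, 0.0], now)
```

The two example memories differ only in age, so the gap between their scores comes entirely from the Recency term. If the three components are not on comparable scales, they should be normalized before weighting.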

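Medium-term → long-term consolidation triggers

Condition	Consolidation type	Destination
Pattern repeated 3 or more times	Semantic abstraction	Knowledge Graph
Experimental validation completed	Validated knowledge	Parametric memory
Not referenced for 6 months or more	Archival/Pruning	Cold storage
High citation frequency	Priority consolidation	Core knowledge base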
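The trigger table maps directly onto a rule function. The sketch below assumes that "6 months" means 180 days and that "high citation frequency" means at least 10 citations; both thresholds, the `EpisodeStats` type, and the action strings are hypothetical and would need tuning per project.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class EpisodeStats:
    pattern_repeats: int          # how often the same pattern has been observed
    lab_validated: bool           # experimental validation completed
    days_since_last_access: int   # days since the episode was last referenced
    citation_count: int           # how often other episodes cite this one


def consolidation_actions(s: EpisodeStats, high_citation_threshold: int = 10) -> List[str]:
    """Return every consolidation action whose condition is met, in table order."""
    actions = []
    if s.pattern_repeats >= 3:
        actions.append("semantic_abstraction -> knowledge_graph")
    if s.lab_validated:
        actions.append("validated_knowledge -> parametric_memory")
    if s.days_since_last_access >= 180:
        actions.append("archival_or_pruning -> cold_storage")
    if s.citation_count >= high_citation_threshold:
        actions.append("priority_consolidation -> core_knowledge_base")
    return actions
```

Returning a list rather than a single action leaves it to the caller to resolve conflicts, for example an episode that is both highly cited and long unreferenced.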

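4. Long-Term Memory Strategy

4.1 Definition and Purpose

Long-term memory permanently stores domain expertise and generalized research patterns.

Retention: permanent (with periodic updates)
Capacity: millions to billions of knowledge nodes

4.2 Dual-Store Architecture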

┌─────────────────────────────────────────────────────────┐
│                    LONG-TERM MEMORY                      │
├─────────────────────────────────────────────────────────┤
│                                                         │
│  ┌─────────────────────────────────────────────────┐   │
│  │           KNOWLEDGE GRAPH (Explicit)             │   │
│  │                                                   │   │
│  │    [Drug A]──inhibits──[Target X]                │   │
│  │        │                    │                    │   │
│  │     treats              expressed_in            │   │
│  │        │                    │                    │   │
│  │   [Disease Y]          [Cell Type Z]            │   │
│  │                                                   │   │
│  │  • Ontology: Disease, Drug, Gene, Protein       │   │
│  │  • Relations: 50+ predicate types               │   │
│  │  • Source: PubMed, ChEMBL, UniProt, KEGG       │   │
│  └─────────────────────────────────────────────────┘   │
│                                                         │
│  ┌─────────────────────────────────────────────────┐   │
│  │        PARAMETRIC MEMORY (Implicit)              │   │
│  │                                                   │   │
│  │  • Fine-tuned domain embeddings                 │   │
│  │  • Learned research heuristics                  │   │
│  │  • Procedural skills (tool usage patterns)      │   │
│  │  • Cross-domain reasoning patterns              │   │
│  └─────────────────────────────────────────────────┘   │
│                                                         │
│  ┌─────────────────────────────────────────────────┐   │
│  │         SKILL LIBRARY (Procedural)               │   │
│  │                                                   │   │
│  │  • Hypothesis generation templates              │   │
│  │  • Experimental design protocols                │   │
│  │  • Data analysis workflows                      │   │
│  │  • Literature search strategies                 │   │
│  └─────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────┘

4.3 Knowledge Graph Design (Based on SciAgents)

Ontology Structure

# Drug Discovery Knowledge Graph Schema
entities:
  - Chemical:
      properties: [smiles, inchi, molecular_weight, logP, psa]
  - Protein:
      properties: [uniprot_id, sequence, structure_pdb, function]
  - Disease:
      properties: [mesh_id, icd10_code, symptoms, prevalence]
  - Gene:
      properties: [gene_id, symbol, chromosome, expression_profile]
  - Pathway:
      properties: [kegg_id, reactome_id, involved_genes]
  - ClinicalTrial:
      properties: [nct_id, phase, status, endpoints]

relations:
  - binds_to: {source: Chemical, target: Protein, properties: [affinity, ki, kd]}
  - inhibits: {source: Chemical, target: Protein, properties: [ic50, mechanism]}
  - treats: {source: Chemical, target: Disease, properties: [approval_status, efficacy]}
  - associated_with: {source: Gene, target: Disease, properties: [odds_ratio, p_value]}
  - part_of: {source: Gene, target: Pathway}
  - regulates: {source: Protein, target: Gene, properties: [effect, tissue]}
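One practical use of this schema is rejecting malformed triples before they reach the graph. The sketch below hard-codes the source and target types from the relations listed above; the function name `validate_triple` is hypothetical, and a real pipeline would presumably load the schema from the YAML file instead.

```python
# (source entity type, target entity type) for each relation in the schema
RELATION_SCHEMA = {
    "binds_to": ("Chemical", "Protein"),
    "inhibits": ("Chemical", "Protein"),
    "treats": ("Chemical", "Disease"),
    "associated_with": ("Gene", "Disease"),
    "part_of": ("Gene", "Pathway"),
    "regulates": ("Protein", "Gene"),
}


def validate_triple(source_type: str, relation: str, target_type: str) -> bool:
    """Return True if the triple's entity types match the schema for that relation."""
    if relation not in RELATION_SCHEMA:
        raise ValueError(f"unknown relation: {relation}")
    return (source_type, target_type) == RELATION_SCHEMA[relation]


ok = validate_triple("Chemical", "inhibits", "Protein")    # types match the schema
bad = validate_triple("Gene", "treats", "Disease")         # wrong source type
```

Raising on unknown relations, rather than returning False, separates extraction errors (wrong types) from schema drift (a predicate the ontology does not know yet).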

Knowledge Update Policy

Source	Update frequency	Validation process
PubMed	Daily	NER + Relation Extraction + Expert Review
ChEMBL	Quarterly	Automatic schema mapping
ClinicalTrials.gov	Weekly	Status change detection
Internal experimental results	Real-time	Human-in-the-loop validation

4.4 Skill Library: Storing Procedural Knowledge

A research skill library adapted from the Voyager and MetaGPT approaches:

class ResearchSkill:
    """Definition of a reusable research procedure."""

    def __init__(self, name, description, prerequisites, steps):
        self.name = name
        self.description = description
        self.prerequisites = prerequisites  # required inputs/tools
        self.steps = steps                  # execution steps
        self.success_rate = 0.0             # historical success rate
        self.usage_count = 0                # number of times the skill was used

# Example: drug repurposing skill
drug_repurposing_skill = ResearchSkill(
    name="drug_repurposing_workflow",
    description="Systematic approach for repurposing FDA-approved drugs for new indications",
    prerequisites=["target_disease", "gene_expression_data", "drug_database_access"],
    steps=[
        "1. Disease-gene association analysis via GWAS/literature",
        "2. Gene-drug target mapping using ChEMBL/DrugBank",
        "3. Molecular docking simulation for binding validation",
        "4. ADMET property verification for new indication",
        "5. In-silico efficacy prediction",
        "6. Prioritization and expert review"
    ]
)
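The `success_rate` and `usage_count` fields are initialized above but never updated. One way to maintain them is an incremental mean, sketched here on a standalone `SkillStats` class (a hypothetical stand-in, so the example runs on its own); the same method could be added to `ResearchSkill`.

```python
class SkillStats:
    """Tracks how often a skill was used and how often it succeeded."""

    def __init__(self):
        self.success_rate = 0.0
        self.usage_count = 0

    def record_outcome(self, success: bool) -> None:
        """Update the running success rate without storing past outcomes."""
        self.usage_count += 1
        value = 1.0 if success else 0.0
        # Incremental mean: new_mean = old_mean + (x - old_mean) / n
        self.success_rate += (value - self.success_rate) / self.usage_count


stats = SkillStats()
for outcome in (True, False, True):
    stats.record_outcome(outcome)
```

With these statistics in place, skill selection can prefer procedures with a high success rate while discounting those with very few recorded uses.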

5. Integrated Memory Management System

5.1 Memory Flow Diagram

                          ┌─────────────────┐
                          │   User Query    │
                          │  / Research Goal│
                          └────────┬────────┘
                                   │
                                   ▼
┌──────────────────────────────────────────────────────────────────┐
│                        MEMORY CONTROLLER                          │
├──────────────────────────────────────────────────────────────────┤
│                                                                   │
│  ┌─────────────┐    ┌─────────────┐    ┌─────────────┐          │
│  │   ENCODE    │    │   RETRIEVE  │    │ CONSOLIDATE │          │
│  │             │    │             │    │             │          │
│  │ • Compress  │    │ • Query STM │    │ • MTM → LTM │          │
│  │ • Chunk     │    │ • Query MTM │    │ • Abstraction│         │
│  │ • Index     │    │ • Query LTM │    │ • Pruning   │          │
│  └──────┬──────┘    └──────┬──────┘    └──────┬──────┘          │
│         │                  │                   │                  │
└─────────┼──────────────────┼───────────────────┼──────────────────┘
          │                  │                   │
          ▼                  ▼                   ▼
    ┌─────────────────────────────────────────────────┐
    │                                                 │
    │  ┌─────────┐   ┌─────────┐   ┌─────────┐      │
    │  │   STM   │◄──│   MTM   │◄──│   LTM   │      │
    │  │         │──►│         │──►│         │      │
    │  │ Context │   │Episodic │   │Knowledge│      │
    │  │ Window  │   │ Store   │   │  Graph  │      │
    │  └─────────┘   └─────────┘   └─────────┘      │
    │                                                 │
    └─────────────────────────────────────────────────┘

5.2 AgeMem Unified Management Framework

Unified memory management based on the recent AgeMem (2025) research:

class AICoScientistMemory:
    """Unified memory management system for the AI Co-Scientist."""
    
    def __init__(self, config):
        self.stm = ShortTermMemory(
            max_tokens=config.context_window,
            attention_cache_size=config.kv_cache_size
        )
        self.mtm = MediumTermMemory(
            vector_store=config.vector_db,
            embedding_model=config.embedding_model
        )
        self.ltm = LongTermMemory(
            knowledge_graph=config.kg_endpoint,
            skill_library=config.skill_path
        )
        self.controller = MemoryController()
        
    def process_query(self, query, research_context):
        """Run unified memory processing for a research query."""

        # 1. Retrieve relevant memories from every tier
        stm_context = self.stm.get_active_context()
        mtm_episodes = self.mtm.retrieve_relevant(query, k=5)
        ltm_knowledge = self.ltm.query_knowledge_graph(query)
        ltm_skills = self.ltm.get_applicable_skills(research_context)

        # 2. Integrate and rank the retrieved memories
        integrated_memory = self.controller.integrate(
            stm_context,
            mtm_episodes,
            ltm_knowledge,
            ltm_skills,
            query=query
        )
        
        return integrated_memory
        
    def store_experience(self, interaction_trace, outcome):
        """Store a research experience and consolidate it when warranted."""

        # 1. Update STM immediately
        self.stm.update(interaction_trace)

        # 2. Create an episode and store it in MTM
        episode = self._create_episode(interaction_trace, outcome)
        self.mtm.store(episode)

        # 3. Check the consolidation criteria and update LTM
        if self._should_consolidate(episode):
            self._consolidate_to_ltm(episode)
            
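    def _should_consolidate(self, episode):
        """Decide whether an episode should be consolidated into LTM."""
        # Field names follow the episode schema in Section 3.3, where
        # validation_status lives under "outcome" and uses "lab_validated".
        criteria = [
            episode.outcome.success,                                # successful outcome
            episode.outcome.validation_status == 'lab_validated',   # experimentally validated
            self._pattern_frequency(episode) >= 3,                  # repeated pattern
            episode.importance_score > 0.8                          # high importance
        ]
        return any(criteria)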

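5.3 Memory Operation Toolset

Tool	Function	When to use
`store(content, type)`	Store new information	When a new finding or result arises
`retrieve(query, k)`	Retrieve relevant information	When reasoning or a decision is needed
`update(key, content)`	Revise existing information	When information needs correction or supplementation
`summarize(span)`	Compress information	When context overflow is imminent
`forget(key)`	Remove unneeded information	When cleaning up stale or irrelevant information
`link(entity1, entity2, relation)`	Connect knowledge	When a new relationship is discovered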
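To make the tool signatures concrete, here is a minimal in-memory sketch of five of the six operations. It is an assumption-laden toy: retrieval uses keyword overlap instead of embeddings, keys are generated as `m1`, `m2`, and so on, and `summarize(span)` is omitted because it would need an LLM call. The class name `InMemoryToolset` is hypothetical.

```python
import itertools
from typing import Dict, List, Set, Tuple


class InMemoryToolset:
    def __init__(self):
        self.items: Dict[str, dict] = {}
        self.links: Set[Tuple[str, str, str]] = set()
        self._ids = itertools.count(1)

    def store(self, content: str, type: str) -> str:
        """Save new content and return its generated key (`type` mirrors the tool signature)."""
        key = f"m{next(self._ids)}"
        self.items[key] = {"content": content, "type": type}
        return key

    def retrieve(self, query: str, k: int) -> List[str]:
        """Return up to k keys ranked by the number of words shared with the query."""
        q = set(query.lower().split())
        scored = [(len(q & set(v["content"].lower().split())), key)
                  for key, v in self.items.items()]
        ranked = sorted((s for s in scored if s[0] > 0), key=lambda s: (-s[0], s[1]))
        return [key for _, key in ranked[:k]]

    def update(self, key: str, content: str) -> None:
        """Overwrite the content stored under an existing key."""
        self.items[key]["content"] = content

    def forget(self, key: str) -> None:
        """Remove an item; forgetting an unknown key is a no-op."""
        self.items.pop(key, None)

    def link(self, entity1: str, entity2: str, relation: str) -> None:
        """Record a directed relation between two entities."""
        self.links.add((entity1, entity2, relation))


tools = InMemoryToolset()
a = tools.store("KIRA6 inhibits IRE1α kinase", "finding")
b = tools.store("liver fibrosis epigenetic targets", "finding")
tools.link("KIRA6", "IRE1α", "inhibits")
```

Exposing the operations as explicit calls, rather than hiding them inside the controller, is what lets the agent decide for itself when to store, retrieve, or forget.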

6. Domain-Specific Implementation for Scientific Research

6.1 Memory Integration in the Drug Discovery Pipeline

┌─────────────────────────────────────────────────────────────────┐
│              DRUG DISCOVERY MEMORY PIPELINE                      │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│  Phase 1: Target Discovery                                       │
│  ┌─────────────────────────────────────────────────────────┐    │
│  │ STM: Current GWAS results, expression data              │    │
│  │ MTM: Previous target validation episodes                 │    │
│  │ LTM: Disease-gene associations, pathway knowledge       │    │
│  └─────────────────────────────────────────────────────────┘    │
│                           │                                      │
│                           ▼                                      │
│  Phase 2: Hit Discovery                                          │
│  ┌─────────────────────────────────────────────────────────┐    │
│  │ STM: Docking scores, virtual screening results          │    │
│  │ MTM: Similar compound screening episodes                 │    │
│  │ LTM: Structure-activity relationships, binding modes    │    │
│  └─────────────────────────────────────────────────────────┘    │
│                           │                                      │
│                           ▼                                      │
│  Phase 3: Lead Optimization                                      │
│  ┌─────────────────────────────────────────────────────────┐    │
│  │ STM: Current ADMET predictions, synthesis feasibility   │    │
│  │ MTM: Optimization cycles history                         │    │
│  │ LTM: ADMET rules, medicinal chemistry principles        │    │
│  └─────────────────────────────────────────────────────────┘    │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘

6.2 Multi-Agent Collaborative Memory Sharing

Multi-agent memory sharing in the style of Google's AI Co-Scientist:

class MultiAgentMemoryShare:
    """Memory sharing system across multiple agents."""
    
    def __init__(self):
        self.shared_memory = SharedMemoryPool()
        self.agent_private_memory = {}
        
    def setup_agents(self):
        self.agents = {
            'generation_agent': {
                'role': 'Generate initial hypotheses',
                'private_memory': HypothesisBuffer(),
                'shared_access': ['literature', 'knowledge_graph']
            },
            'reflection_agent': {
                'role': 'Evaluate and refine hypotheses',
                'private_memory': CritiqueHistory(),
                'shared_access': ['hypotheses', 'validation_criteria']
            },
            'ranking_agent': {
                'role': 'Prioritize hypotheses by feasibility',
                'private_memory': RankingMetrics(),
                'shared_access': ['hypotheses', 'experimental_constraints']
            },
            'evolution_agent': {
                'role': 'Iteratively improve top hypotheses',
                'private_memory': EvolutionTrace(),
                'shared_access': ['ranked_hypotheses', 'feedback']
            },
            'proximity_agent': {
                'role': 'Verify scientific grounding',
                'private_memory': CitationCache(),
                'shared_access': ['literature', 'hypotheses']
            },
            'meta_review_agent': {
                'role': 'Final quality assessment',
                'private_memory': ReviewCriteria(),
                'shared_access': ['all_agent_outputs']
            },
            'supervisor_agent': {
                'role': 'Orchestrate workflow',
                'private_memory': WorkflowState(),
                'shared_access': ['all']
            }
        }
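The `shared_access` lists above only declare permissions; something still has to enforce them. A minimal sketch of such a check follows, using three of the agents from the configuration. It assumes that the literal entry `'all'` acts as a wildcard, which the original configuration implies but does not state; `can_access` is a hypothetical helper.

```python
from typing import Dict, List

# Subset of the shared_access lists from the agent configuration
SHARED_ACCESS: Dict[str, List[str]] = {
    "generation_agent": ["literature", "knowledge_graph"],
    "reflection_agent": ["hypotheses", "validation_criteria"],
    "supervisor_agent": ["all"],
}


def can_access(agent: str, resource: str, acl: Dict[str, List[str]] = SHARED_ACCESS) -> bool:
    """True if the agent may read the shared resource; unknown agents get nothing."""
    allowed = acl.get(agent, [])
    # Assumption: 'all' is treated as a wildcard over every shared resource.
    return "all" in allowed or resource in allowed
```

Denying by default for unknown agents keeps a misconfigured or newly added agent from reading shared state it was never granted.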

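7. Evaluation and Performance Metrics

7.1 Memory System Evaluation Metrics

Metric	Definition	Target
Recall@K	Fraction of relevant memories retrieved in the top K	≥ 85%
Precision@K	Fraction of the top-K retrieved memories that are relevant	≥ 75%
Latency	Response time of a memory operation	< 500ms
Memory Efficiency	Utilization relative to storage	≥ 60%
Consolidation Accuracy	Quality of MTM→LTM conversion	≥ 90%
Forgetting Prevention	Retention rate of important information	≥ 95%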
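The first two metrics can be computed as follows. This sketch uses the standard definitions, with Precision@K dividing by K and Recall@K dividing by the number of relevant items; the episode IDs in the example are made up.

```python
from typing import List, Set


def precision_at_k(retrieved: List[str], relevant: Set[str], k: int) -> float:
    """Share of the top-k retrieved items that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(1 for r in retrieved[:k] if r in relevant) / k


def recall_at_k(retrieved: List[str], relevant: Set[str], k: int) -> float:
    """Share of all relevant items that appear in the top-k retrieved items."""
    if k <= 0:
        raise ValueError("k must be positive")
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & relevant) / len(relevant)


retrieved = ["e1", "e7", "e3", "e9"]
relevant = {"e1", "e3", "e5"}
p2 = precision_at_k(retrieved, relevant, 2)
r4 = recall_at_k(retrieved, relevant, 4)
```

In this example one of the top two results is relevant, and two of the three relevant episodes are found within the top four.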

7.2 Scientific Outcome Evaluation

Metric	Measurement method	Benchmark
Hypothesis Novelty	Expert scoring + Literature overlap	Top 30% novel
Hypothesis Validity	Lab validation rate	≥ 60% (3 of 5, as in Google's result)
Research Efficiency	Time to validated hypothesis	≤ 2 weeks
Knowledge Accumulation	KG growth rate	+10% monthly

8. References and Resources

Key Papers

Google Research. "Titans: Learning to Memorize at Test Time" (2025)
"A-MEM: Agentic Memory for LLM Agents" arXiv:2502.12110 (2025)
"Memory in LLM-based Multi-agent Systems: A Survey" (2025)
"SciAgents: Automating Scientific Discovery Through Multi-Agent Graph Reasoning" (2024)
Google. "Accelerating scientific breakthroughs with an AI co-scientist" (2025)
"AgeMem: Agentic Memory for LLM Agents" (2025)

Tools and Frameworks

LangMem SDK: https://langchain-ai.github.io/langmem/
Weaviate Vector DB: https://weaviate.io/
Neo4j Knowledge Graph: https://neo4j.com/

Data Sources

PubMed: life-science literature
ChEMBL: compound-target data
UniProt: protein information
KEGG/Reactome: pathway data