How to Integrate OpenAI API with Pinecone for Long-Term AI Memory

The Problem

ChatGPT and similar AI models have a critical limitation: they can’t remember conversations beyond their context window, and they lack access to your proprietary data. Every new chat session starts from scratch, forcing users to repeatedly provide context. For production AI applications—whether customer support bots, research assistants, or personalized recommendation systems—this amnesia is unacceptable. You need your AI to remember past interactions, learn from user preferences, and retrieve relevant information from massive datasets that far exceed token limits.

Pinecone solves this by providing a vector database that stores embeddings (numerical representations of text) and enables lightning-fast semantic search. This integration creates persistent memory for your AI, allowing it to retrieve contextually relevant information from millions of past conversations or documents in milliseconds.

Tech Stack & Prerequisites

  • Node.js v20+ and npm/pnpm
  • TypeScript 5+ (recommended for type safety)
  • OpenAI API Key (from platform.openai.com)
  • Pinecone Account (free tier available at pinecone.io)
  • openai npm package v4.0+
  • @pinecone-database/pinecone v2.0+
  • dotenv for environment variables
  • Basic understanding of embeddings and vector similarity
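
For the last prerequisite, it helps to know exactly what "vector similarity" computes. The cosine metric used when creating the index below scores the angle between two embedding vectors: similar texts produce vectors pointing in similar directions, with a score near 1. A minimal illustration in plain TypeScript (Pinecone does this math server-side; this is purely for intuition):

```typescript
// Cosine similarity: dot(a, b) / (|a| * |b|); ranges from -1 to 1.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

console.log(cosineSimilarity([1, 0], [1, 0])); // same direction → 1
console.log(cosineSimilarity([1, 0], [0, 1])); // orthogonal (unrelated) → 0
```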

Step-by-Step Implementation

Step 1: Setup

Initialize your project and install dependencies:

bash
mkdir openai-pinecone-memory
cd openai-pinecone-memory
npm init -y
npm install openai @pinecone-database/pinecone dotenv
npm install -D typescript @types/node tsx

Initialize TypeScript:

bash
npx tsc --init

Update tsconfig.json:

json
{
  "compilerOptions": {
    "target": "ES2022",
    "module": "commonjs",
    "lib": ["ES2022"],
    "outDir": "./dist",
    "rootDir": "./src",
    "strict": true,
    "esModuleInterop": true,
    "skipLibCheck": true,
    "forceConsistentCasingInFileNames": true,
    "moduleResolution": "node",
    "resolveJsonModule": true
  },
  "include": ["src/**/*"],
  "exclude": ["node_modules"]
}

Create project structure:

bash
mkdir src
touch src/index.ts src/config.ts src/pinecone-client.ts src/openai-client.ts src/memory.ts

Step 2: Configuration

Create a .env file for secure environment variables:

bash
# .env
OPENAI_API_KEY=sk-proj-your_openai_api_key_here
PINECONE_API_KEY=your_pinecone_api_key_here
# Only used by legacy pod-based indexes; serverless indexes (SDK v2+) need just the API key
PINECONE_ENVIRONMENT=your_pinecone_environment
PINECONE_INDEX_NAME=ai-memory

Important: Add .env to your .gitignore:

bash
echo ".env" >> .gitignore
echo "node_modules/" >> .gitignore
echo "dist/" >> .gitignore

Create a config file in src/config.ts:

typescript
// src/config.ts
import dotenv from 'dotenv';

dotenv.config();

export const config = {
  openai: {
    apiKey: process.env.OPENAI_API_KEY || '',
  },
  pinecone: {
    apiKey: process.env.PINECONE_API_KEY || '',
    environment: process.env.PINECONE_ENVIRONMENT || '',
    indexName: process.env.PINECONE_INDEX_NAME || 'ai-memory',
  },
} as const;

// Validate required environment variables.
// PINECONE_ENVIRONMENT is deliberately not required here: serverless
// indexes (used in this guide) are addressed by API key alone in SDK v2+.
const requiredEnvVars = ['OPENAI_API_KEY', 'PINECONE_API_KEY'];

for (const envVar of requiredEnvVars) {
  if (!process.env[envVar]) {
    throw new Error(`Missing required environment variable: ${envVar}`);
  }
}

Step 3: Core Logic

3.1: Initialize Pinecone Client

Create src/pinecone-client.ts:

typescript
// src/pinecone-client.ts
import { Pinecone, RecordMetadata } from '@pinecone-database/pinecone';
import { config } from './config';

export class PineconeClient {
  private client: Pinecone;
  private indexName: string;

  constructor() {
    this.client = new Pinecone({
      apiKey: config.pinecone.apiKey,
    });
    this.indexName = config.pinecone.indexName;
  }

  /**
   * Initialize Pinecone index with proper dimensions for OpenAI embeddings
   * text-embedding-3-small: 1536 dimensions
   * text-embedding-3-large: 3072 dimensions
   */
  async initializeIndex(dimension: number = 1536): Promise<void> {
    try {
      const indexList = await this.client.listIndexes();
      const indexExists = indexList.indexes?.some(
        (index) => index.name === this.indexName
      );

      if (!indexExists) {
        console.log(`Creating index: ${this.indexName}`);
        await this.client.createIndex({
          name: this.indexName,
          dimension: dimension,
          metric: 'cosine', // cosine similarity for semantic search
          spec: {
            serverless: {
              cloud: 'aws',
              region: 'us-east-1',
            },
          },
        });
        
        // Wait for index to be ready
        await this.waitForIndexReady();
        console.log('Index created successfully');
      } else {
        console.log(`Index ${this.indexName} already exists`);
      }
    } catch (error) {
      console.error('Error initializing index:', error);
      throw error;
    }
  }

  /**
   * Wait for index to be ready (can take 1-2 minutes for new indexes)
   */
  private async waitForIndexReady(): Promise<void> {
    let isReady = false;
    let attempts = 0;
    const maxAttempts = 60;

    while (!isReady && attempts < maxAttempts) {
      const indexDescription = await this.client.describeIndex(this.indexName);
      isReady = indexDescription.status?.ready ?? false;
      
      if (!isReady) {
        console.log('Waiting for index to be ready...');
        await new Promise((resolve) => setTimeout(resolve, 5000)); // Wait 5 seconds
        attempts++;
      }
    }

    if (!isReady) {
      throw new Error('Index failed to become ready in time');
    }
  }

  /**
   * Upsert vectors into Pinecone
   */
  async upsertVectors(
    vectors: Array<{
      id: string;
      values: number[];
      metadata?: RecordMetadata;
    }>
  ): Promise<void> {
    try {
      const index = this.client.index(this.indexName);
      // For large writes, split into batches (Pinecone recommends roughly
      // 100 records per upsert request) rather than one giant call
      await index.upsert(vectors);
      console.log(`Upserted ${vectors.length} vectors`);
    } catch (error) {
      console.error('Error upserting vectors:', error);
      throw error;
    }
  }

  /**
   * Query vectors for semantic search
   */
  async queryVectors(
    queryVector: number[],
    topK: number = 5,
    filter?: Record<string, any>
  ) {
    try {
      const index = this.client.index(this.indexName);
      const queryResponse = await index.query({
        vector: queryVector,
        topK,
        includeMetadata: true,
        filter,
      });

      return queryResponse.matches;
    } catch (error) {
      console.error('Error querying vectors:', error);
      throw error;
    }
  }

  /**
   * Delete vectors by ID
   */
  async deleteVectors(ids: string[]): Promise<void> {
    try {
      const index = this.client.index(this.indexName);
      await index.deleteMany(ids);
      console.log(`Deleted ${ids.length} vectors`);
    } catch (error) {
      console.error('Error deleting vectors:', error);
      throw error;
    }
  }

  /**
   * Delete all vectors (use with caution!)
   */
  async deleteAllVectors(): Promise<void> {
    try {
      const index = this.client.index(this.indexName);
      await index.deleteAll();
      console.log('Deleted all vectors from index');
    } catch (error) {
      console.error('Error deleting all vectors:', error);
      throw error;
    }
  }
}

3.2: Initialize OpenAI Client

Create src/openai-client.ts:

typescript
// src/openai-client.ts
import OpenAI from 'openai';
import { config } from './config';

export class OpenAIClient {
  private client: OpenAI;

  constructor() {
    this.client = new OpenAI({
      apiKey: config.openai.apiKey,
    });
  }

  /**
   * Generate embeddings for text using OpenAI's embedding model
   * text-embedding-3-small: Fast, cost-effective, 1536 dimensions
   * text-embedding-3-large: Higher quality, 3072 dimensions (more expensive)
   */
  async generateEmbedding(
    text: string,
    model: 'text-embedding-3-small' | 'text-embedding-3-large' = 'text-embedding-3-small'
  ): Promise<number[]> {
    try {
      const response = await this.client.embeddings.create({
        model,
        input: text,
        encoding_format: 'float',
      });

      return response.data[0].embedding;
    } catch (error) {
      console.error('Error generating embedding:', error);
      throw error;
    }
  }

  /**
   * Generate embeddings for multiple texts in batch (more efficient)
   */
  async generateEmbeddings(
    texts: string[],
    model: 'text-embedding-3-small' | 'text-embedding-3-large' = 'text-embedding-3-small'
  ): Promise<number[][]> {
    try {
      const response = await this.client.embeddings.create({
        model,
        input: texts,
        encoding_format: 'float',
      });

      return response.data.map((item) => item.embedding);
    } catch (error) {
      console.error('Error generating embeddings:', error);
      throw error;
    }
  }

  /**
   * Generate chat completion with context from vector search
   */
  async generateChatCompletion(
    messages: Array<{ role: 'system' | 'user' | 'assistant'; content: string }>,
    model: string = 'gpt-4o-mini',
    temperature: number = 0.7
  ): Promise<string> {
    try {
      const response = await this.client.chat.completions.create({
        model,
        messages,
        temperature,
        max_tokens: 1000,
      });

      return response.choices[0].message.content || '';
    } catch (error) {
      console.error('Error generating chat completion:', error);
      throw error;
    }
  }

  /**
   * Generate streaming chat completion
   */
  async *generateStreamingCompletion(
    messages: Array<{ role: 'system' | 'user' | 'assistant'; content: string }>,
    model: string = 'gpt-4o-mini'
  ): AsyncGenerator<string, void, unknown> {
    try {
      const stream = await this.client.chat.completions.create({
        model,
        messages,
        stream: true,
      });

      for await (const chunk of stream) {
        const content = chunk.choices[0]?.delta?.content || '';
        if (content) {
          yield content;
        }
      }
    } catch (error) {
      console.error('Error generating streaming completion:', error);
      throw error;
    }
  }
}
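
The streaming method above returns an `AsyncGenerator<string>`, which you consume with `for await`. The sketch below uses a stub generator in place of the real OpenAI stream so it runs without an API key; in your app, swap in `client.generateStreamingCompletion(messages)`:

```typescript
// Stub that stands in for the OpenAI token stream in this example.
async function* fakeStream(): AsyncGenerator<string> {
  for (const token of ['Vector ', 'embeddings ', 'map ', 'text ', 'to ', 'numbers.']) {
    yield token;
  }
}

// The consumption pattern is identical for the real stream:
// print tokens as they arrive, and keep the full text for storage.
async function consume(stream: AsyncGenerator<string>): Promise<string> {
  let full = '';
  for await (const token of stream) {
    process.stdout.write(token); // live output, token by token
    full += token;               // e.g. to store in memory afterwards
  }
  return full;
}

consume(fakeStream()).then((text) => console.log('\nFull response length:', text.length));
```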

3.3: Create Memory Management System

Create src/memory.ts:

typescript
// src/memory.ts
import { PineconeClient } from './pinecone-client';
import { OpenAIClient } from './openai-client';
import { RecordMetadata } from '@pinecone-database/pinecone';

export interface MemoryEntry {
  id: string;
  text: string;
  metadata: {
    userId?: string;
    conversationId?: string;
    timestamp: number;
    type: 'user' | 'assistant' | 'document';
    [key: string]: any;
  };
}

export class MemoryManager {
  private pinecone: PineconeClient;
  private openai: OpenAIClient;

  constructor() {
    this.pinecone = new PineconeClient();
    this.openai = new OpenAIClient();
  }

  /**
   * Initialize the memory system
   */
  async initialize(): Promise<void> {
    await this.pinecone.initializeIndex(1536); // text-embedding-3-small dimensions
  }

  /**
   * Store a conversation message in long-term memory
   */
  async storeMemory(entry: MemoryEntry): Promise<void> {
    try {
      // Generate embedding for the text
      const embedding = await this.openai.generateEmbedding(entry.text);

      // Store in Pinecone, copying the raw text into metadata so it can be
      // returned as context when this vector is retrieved later
      await this.pinecone.upsertVectors([
        {
          id: entry.id,
          values: embedding,
          metadata: { ...entry.metadata, text: entry.text } as RecordMetadata,
        },
      ]);

      console.log(`Stored memory: ${entry.id}`);
    } catch (error) {
      console.error('Error storing memory:', error);
      throw error;
    }
  }

  /**
   * Store multiple memories in batch (more efficient)
   */
  async storeMemories(entries: MemoryEntry[]): Promise<void> {
    try {
      // Generate embeddings for all texts in batch
      const texts = entries.map((e) => e.text);
      const embeddings = await this.openai.generateEmbeddings(texts);

      // Prepare vectors for Pinecone, copying the raw text into metadata
      // so retrieval can return the original content
      const vectors = entries.map((entry, index) => ({
        id: entry.id,
        values: embeddings[index],
        metadata: { ...entry.metadata, text: entry.text } as RecordMetadata,
      }));

      // Store in Pinecone
      await this.pinecone.upsertVectors(vectors);

      console.log(`Stored ${entries.length} memories`);
    } catch (error) {
      console.error('Error storing memories:', error);
      throw error;
    }
  }

  /**
   * Retrieve relevant memories based on semantic similarity
   */
  async retrieveRelevantMemories(
    query: string,
    topK: number = 5,
    filter?: Record<string, any>
  ): Promise<Array<{ id: string; score: number; text?: string; metadata?: any }>> {
    try {
      // Generate embedding for query
      const queryEmbedding = await this.openai.generateEmbedding(query);

      // Search Pinecone
      const matches = await this.pinecone.queryVectors(
        queryEmbedding,
        topK,
        filter
      );

      // Format results
      return matches.map((match) => ({
        id: match.id,
        score: match.score || 0,
        text: match.metadata?.text as string | undefined,
        metadata: match.metadata,
      }));
    } catch (error) {
      console.error('Error retrieving memories:', error);
      throw error;
    }
  }

  /**
   * Generate AI response with context from long-term memory
   */
  async generateResponseWithMemory(
    userMessage: string,
    userId?: string,
    conversationId?: string
  ): Promise<string> {
    try {
      // Retrieve relevant memories
      const filter: Record<string, any> = {};
      if (userId) filter.userId = userId;
      if (conversationId) filter.conversationId = conversationId;

      const relevantMemories = await this.retrieveRelevantMemories(
        userMessage,
        5,
        Object.keys(filter).length > 0 ? filter : undefined
      );

      // Build context from memories
      const context = relevantMemories
        .filter((m) => m.score > 0.7) // Only use highly relevant memories
        .map((m) => m.metadata?.text || '')
        .join('\n\n');

      // Create messages array with context
      const messages: Array<{ role: 'system' | 'user' | 'assistant'; content: string }> = [
        {
          role: 'system',
          content: `You are a helpful AI assistant with access to conversation history and relevant context. Use the following context from past conversations to inform your response, but don't explicitly mention that you're using past conversation data unless relevant.

Context from past conversations:
${context || 'No relevant past context found.'}`,
        },
        {
          role: 'user',
          content: userMessage,
        },
      ];

      // Generate response
      const response = await this.openai.generateChatCompletion(messages);

      // Store this interaction in memory
      const timestamp = Date.now();
      await this.storeMemories([
        {
          id: `${conversationId || 'default'}_user_${timestamp}`,
          text: userMessage,
          metadata: {
            userId,
            conversationId,
            timestamp,
            type: 'user',
            text: userMessage,
          },
        },
        {
          id: `${conversationId || 'default'}_assistant_${timestamp}`,
          text: response,
          metadata: {
            userId,
            conversationId,
            timestamp,
            type: 'assistant',
            text: response,
          },
        },
      ]);

      return response;
    } catch (error) {
      console.error('Error generating response with memory:', error);
      throw error;
    }
  }

  /**
   * Delete memories for a specific user or conversation
   */
  async deleteMemories(ids: string[]): Promise<void> {
    await this.pinecone.deleteVectors(ids);
  }

  /**
   * Clear all memories (use with caution!)
   */
  async clearAllMemories(): Promise<void> {
    await this.pinecone.deleteAllVectors();
  }
}

3.4: Create Main Application

Create src/index.ts:

typescript
// src/index.ts
import { MemoryManager } from './memory';

async function main() {
  try {
    console.log('Initializing AI Memory System...\n');

    // Initialize memory manager
    const memory = new MemoryManager();
    await memory.initialize();

    console.log('Memory system initialized!\n');

    // Example 1: Store some initial knowledge
    console.log('--- Example 1: Storing Initial Knowledge ---');
    await memory.storeMemories([
      {
        id: 'doc_1',
        text: 'The user prefers Python for backend development and uses FastAPI framework.',
        metadata: {
          userId: 'user_123',
          timestamp: Date.now(),
          type: 'document',
          category: 'preferences',
        },
      },
      {
        id: 'doc_2',
        text: 'The user is working on a machine learning project involving natural language processing.',
        metadata: {
          userId: 'user_123',
          timestamp: Date.now(),
          type: 'document',
          category: 'projects',
        },
      },
      {
        id: 'doc_3',
        text: 'The user has experience with vector databases and has used Pinecone before.',
        metadata: {
          userId: 'user_123',
          timestamp: Date.now(),
          type: 'document',
          category: 'experience',
        },
      },
    ]);

    console.log('Initial knowledge stored.\n');

    // Example 2: Have a conversation with memory
    console.log('--- Example 2: Conversation with Memory ---');

    const query1 = 'What backend framework should I use for my API?';
    console.log(`User: ${query1}`);
    
    const response1 = await memory.generateResponseWithMemory(
      query1,
      'user_123',
      'conv_001'
    );
    console.log(`Assistant: ${response1}\n`);

    // Second query - AI should remember context
    const query2 = 'Can you help me integrate it with my ML project?';
    console.log(`User: ${query2}`);
    
    const response2 = await memory.generateResponseWithMemory(
      query2,
      'user_123',
      'conv_001'
    );
    console.log(`Assistant: ${response2}\n`);

    // Example 3: Retrieve relevant memories
    console.log('--- Example 3: Manual Memory Retrieval ---');
    const searchQuery = 'Tell me about my technical background';
    console.log(`Searching for: "${searchQuery}"`);
    
    const memories = await memory.retrieveRelevantMemories(
      searchQuery,
      3,
      { userId: 'user_123' }
    );

    console.log('\nRelevant memories found:');
    memories.forEach((mem, idx) => {
      console.log(`${idx + 1}. [Score: ${mem.score.toFixed(3)}] ${mem.metadata?.text}`);
    });

  } catch (error) {
    console.error('Error in main:', error);
    process.exit(1);
  }
}

main();
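
One practical note: the example stores short, single-sentence memories, but real documents usually exceed a sensible embedding size. A common extension (not part of the code above; the sizes here are illustrative) is to split long text into overlapping chunks before embedding each one, so sentences near a boundary remain findable:

```typescript
// Split text into fixed-size chunks with overlap, so content near a chunk
// boundary appears in two chunks and stays retrievable by semantic search.
function chunkText(text: string, chunkSize = 500, overlap = 50): string[] {
  if (chunkSize <= overlap) throw new Error('chunkSize must exceed overlap');
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += chunkSize - overlap) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break;
  }
  return chunks;
}

const chunks = chunkText('a'.repeat(1200), 500, 50);
console.log(chunks.length);    // 3
console.log(chunks[0].length); // 500
console.log(chunks[2].length); // 300
```

Each chunk then becomes its own `MemoryEntry`, with an ID like `doc_1_chunk_0` so the parent document can be reconstructed or deleted as a group.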

Step 4: Testing

4.1: Run the Application

Add a script to package.json:

json
{
  "scripts": {
    "dev": "tsx src/index.ts",
    "build": "tsc",
    "start": "node dist/index.js"
  }
}

Run the application:

bash
npm run dev

Expected output:
Initializing AI Memory System...
Index ai-memory already exists
Memory system initialized!

--- Example 1: Storing Initial Knowledge ---
Stored 3 memories
Initial knowledge stored.

--- Example 2: Conversation with Memory ---
User: What backend framework should I use for my API?
Assistant: Based on your preferences, I'd recommend using FastAPI for your API...

User: Can you help me integrate it with my ML project?
Assistant: Absolutely! Since you're working on an NLP project...

4.2: Test Semantic Search Quality

Create a test script src/test-search.ts:

typescript
// src/test-search.ts
import { MemoryManager } from './memory';

async function testSemanticSearch() {
  const memory = new MemoryManager();
  await memory.initialize();

  // Store diverse content
  await memory.storeMemories([
    {
      id: 'fact_1',
      text: 'Paris is the capital of France and known for the Eiffel Tower.',
      metadata: { type: 'document', timestamp: Date.now() },
    },
    {
      id: 'fact_2',
      text: 'Machine learning involves training algorithms on data to make predictions.',
      metadata: { type: 'document', timestamp: Date.now() },
    },
    {
      id: 'fact_3',
      text: 'The Eiffel Tower was built in 1889 and stands 330 meters tall.',
      metadata: { type: 'document', timestamp: Date.now() },
    },
    {
      id: 'fact_4',
      text: 'Neural networks are a subset of machine learning inspired by the human brain.',
      metadata: { type: 'document', timestamp: Date.now() },
    },
  ]);

  // Test semantic search
  const queries = [
    'Tell me about landmarks in France',
    'How does AI learn from data?',
  ];

  for (const query of queries) {
    console.log(`\nQuery: "${query}"`);
    const results = await memory.retrieveRelevantMemories(query, 2);
    
    results.forEach((result, idx) => {
      console.log(`  ${idx + 1}. [${result.score.toFixed(3)}] ${result.metadata?.text}`);
    });
  }
}

testSemanticSearch();

Run: npx tsx src/test-search.ts

4.3: Test Conversation Continuity

Create src/test-conversation.ts:

typescript
// src/test-conversation.ts
import { MemoryManager } from './memory';

async function testConversation() {
  const memory = new MemoryManager();
  await memory.initialize();

  const userId = 'test_user';
  const conversationId = 'test_conv';

  // Simulate a multi-turn conversation
  const turns = [
    'My name is Alice and I love hiking.',
    'What outdoor activities would you recommend?',
    'What was my name again?', // Test if AI remembers
  ];

  for (const turn of turns) {
    console.log(`\nUser: ${turn}`);
    const response = await memory.generateResponseWithMemory(
      turn,
      userId,
      conversationId
    );
    console.log(`Assistant: ${response}`);
    
    // Small delay between turns
    await new Promise((resolve) => setTimeout(resolve, 1000));
  }
}

testConversation();

4.4: Monitor Pinecone Dashboard

Log into your Pinecone account and navigate to your index. You should see:

  • Vector count increasing as you store memories
  • Query metrics showing search performance
  • Index dimensions (1536 for text-embedding-3-small)

Common Errors & Troubleshooting

1. Error: “Index not found” or “Index not ready”

Cause: Trying to query a Pinecone index that’s still being created or doesn’t exist yet.

Fix: Always wait for index readiness and handle errors gracefully:

typescript
async function safeQueryIndex(
  memory: MemoryManager,
  query: string,
  retries = 3
): Promise<any> {
  for (let i = 0; i < retries; i++) {
    try {
      return await memory.retrieveRelevantMemories(query);
    } catch (error: any) {
      const message = String(error?.message ?? '');
      if (message.includes('not found') || message.includes('not ready')) {
        console.log(`Waiting for index... (attempt ${i + 1}/${retries})`);
        await new Promise(resolve => setTimeout(resolve, 10000));
      } else {
        throw error;
      }
    }
  }
  throw new Error('Index failed to become available');
}

2. Error: “Dimension mismatch” (e.g., expected 1536 but got 3072)

Cause: Using different OpenAI embedding models with incompatible dimensions. For example, creating an index with 1536 dimensions but then using text-embedding-3-large which outputs 3072 dimensions.

Fix: Always use consistent embedding models and dimensions:

typescript
// At initialization: pick one model and derive the dimension from it
const EMBEDDING_MODEL: 'text-embedding-3-small' | 'text-embedding-3-large' =
  'text-embedding-3-small';
const EMBEDDING_DIMENSION = EMBEDDING_MODEL === 'text-embedding-3-small' ? 1536 : 3072;

// Use everywhere
await pinecone.initializeIndex(EMBEDDING_DIMENSION);
await openai.generateEmbedding(text, EMBEDDING_MODEL);

// Or create separate indexes for different models
const indexName = EMBEDDING_MODEL === 'text-embedding-3-small' 
  ? 'ai-memory-small' 
  : 'ai-memory-large';

3. Error: Rate limits exceeded or high costs

Cause: Making too many embedding API calls or querying too frequently. OpenAI charges per token for embeddings, and costs can add up quickly with large-scale operations.

Fix: Implement batching and caching strategies:

typescript
// Batch embeddings (up to 2048 inputs per request)
const BATCH_SIZE = 100;
const batches = [];
for (let i = 0; i < texts.length; i += BATCH_SIZE) {
  batches.push(texts.slice(i, i + BATCH_SIZE));
}

for (const batch of batches) {
  const embeddings = await openai.generateEmbeddings(batch);
  // Process batch...
  await new Promise(resolve => setTimeout(resolve, 1000)); // Rate limit delay
}

// Cache embeddings to avoid regenerating
const embeddingCache = new Map<string, number[]>();

async function getCachedEmbedding(text: string): Promise<number[]> {
  if (embeddingCache.has(text)) {
    return embeddingCache.get(text)!;
  }
  const embedding = await openai.generateEmbedding(text);
  embeddingCache.set(text, embedding);
  return embedding;
}
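
One caveat about the cache above: a plain `Map` grows without bound, which matters in a long-running server. A minimal size-capped variant, evicting the oldest entry (simple FIFO eviction, shown here as one illustrative option, not the only policy):

```typescript
// A Map preserves insertion order, so its first key is always the oldest entry.
class BoundedCache<V> {
  private map = new Map<string, V>();

  constructor(private maxSize: number) {}

  get(key: string): V | undefined {
    return this.map.get(key);
  }

  set(key: string, value: V): void {
    if (!this.map.has(key) && this.map.size >= this.maxSize) {
      // At capacity: evict the oldest inserted key before adding a new one.
      const oldest = this.map.keys().next().value;
      if (oldest !== undefined) this.map.delete(oldest);
    }
    this.map.set(key, value);
  }
}

const cache = new BoundedCache<number[]>(2);
cache.set('a', [1]);
cache.set('b', [2]);
cache.set('c', [3]);              // evicts 'a'
console.log(cache.get('a'));      // undefined
console.log(cache.get('c'));      // [ 3 ]
```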

Security Checklist

  • Never commit API keys to version control – always use .env files and add them to .gitignore
  • Use environment-specific keys – separate development, staging, and production API keys
  • Implement rate limiting on user-facing endpoints to prevent API abuse and cost overruns
  • Validate and sanitize user input before generating embeddings to prevent prompt injection attacks
  • Set metadata access controls in Pinecone using namespaces to isolate user data
  • Rotate API keys regularly and monitor usage in OpenAI and Pinecone dashboards
  • Implement user authentication before allowing access to memory retrieval (prevent data leaks between users)
  • Use HTTPS only for all API communications in production
  • Monitor embedding costs by setting up billing alerts in your OpenAI account
  • Implement data retention policies to automatically delete old memories and comply with privacy regulations (GDPR, CCPA)
  • Encrypt sensitive metadata before storing in Pinecone if it contains PII
  • Log security events such as failed authentication attempts, unusual query patterns, or excessive API usage
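
For the input-validation item, a small pre-embedding sanitizer is often enough as a first line of defense: cap the length (which also bounds embedding cost) and strip control characters. The limits below are illustrative assumptions, not OpenAI requirements:

```typescript
// Illustrative guardrails before sending user text to the embeddings API.
const MAX_EMBEDDING_CHARS = 8000; // assumption: comfortably under the model's token limit

function sanitizeForEmbedding(input: string): string {
  const cleaned = input
    // Strip control characters (keeps \t, \n, \r, which are legitimate in text)
    .replace(/[\u0000-\u0008\u000B\u000C\u000E-\u001F\u007F]/g, '')
    .trim();
  return cleaned.slice(0, MAX_EMBEDDING_CHARS);
}

console.log(sanitizeForEmbedding('  hello\u0000 world  ')); // "hello world"
```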
