Expert Data Architect & AI Engineer: ClickHouse Vector Search at Scale (400M+ Rows)
Project No. 210277

Job Statistics
12 Bids
Budget: 10,000 ₪ - 25,000 ₪
Project open for: 47 days, 5 hours, 25 minutes
Bid range: 135 ₪ - 350 ₪ per hour / 25,000 ₪ fixed price
Average bid: 261 ₪ per hour / 25,000 ₪ fixed price

Job Info And Actions
Published: 11:11, 21 December 2025
Bids accepted until: 11:41, 9 February 2026
I am looking for a high-level Data Architect and AI Developer to design and implement a high-performance retrieval and analysis system. The project involves managing a massive dataset of 400 million+ records in a single table, enabling both semantic search and analytical capabilities.
Scope of Work:
DB Infrastructure: Setup and optimization of a ClickHouse cluster. Implementation of efficient schema design for high-volume data.
Vector Search Implementation: Configuring ClickHouse for vector/semantic search (using ClickHouse's vector_similarity indexes). You must ensure sub-second latency for vector queries at the specified scale (a schema and index sketch follows this list).
Data Pipeline: Developing a process for generating embeddings from raw text and ingesting them into the DB (see the pipeline sketch after the Technical Stack list).
AI Agent Layer: Creating an intelligent agent (sketched after the Requirements list) that translates natural language queries into:
Semantic searches (Vector-based).
Analytical SQL queries (Text-to-SQL).
Synthesized responses using an LLM (RAG architecture).
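To make the DB Infrastructure and Vector Search items concrete, here is a minimal sketch of what the table and ANN index could look like, assuming the clickhouse-connect Python client, a hypothetical documents table, and 1536-dimensional embeddings. The vector_similarity index is experimental and its exact argument list differs between ClickHouse releases, so treat the DDL as illustrative rather than a spec.

```python
# Hedged sketch: table name, column names, embedding dimension, and settings are
# illustrative assumptions for scoping, not requirements taken from this posting.
import clickhouse_connect

client = clickhouse_connect.get_client(host="localhost", username="default", password="")

ddl = """
CREATE TABLE IF NOT EXISTS documents
(
    id         UInt64,
    created_at DateTime DEFAULT now(),
    body       String,
    embedding  Array(Float32),   -- e.g. 1536-dim vectors from text-embedding-3-small
    -- HNSW-based ANN index; the vector_similarity argument list and required
    -- settings differ between ClickHouse releases, so verify against the target version.
    INDEX ann_idx embedding TYPE vector_similarity('hnsw', 'cosineDistance') GRANULARITY 100000000
)
ENGINE = MergeTree   -- a real 400M+ row deployment would use ReplicatedMergeTree (+ sharding)
ORDER BY (created_at, id)
"""
# The index is still experimental and must be enabled explicitly at CREATE time.
client.command(ddl, settings={"allow_experimental_vector_similarity_index": 1})

# Top-10 nearest neighbours for a query vector (placeholder zeros shown here).
query_vec = [0.0] * 1536
hits = client.query(
    "SELECT id, body, cosineDistance(embedding, {qv:Array(Float32)}) AS dist "
    "FROM documents ORDER BY dist ASC LIMIT 10",
    parameters={"qv": query_vec},
).result_rows
```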
Technical Stack:
Database: ClickHouse.
AI Frameworks: LangChain or LlamaIndex.
Languages: Python (for the AI/Embedding layer) or Node.js.
Models: OpenAI / Anthropic or local Embedding models (HuggingFace).
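A hedged sketch of the embedding and ingestion pipeline using this stack (Python, OpenAI, clickhouse-connect). The model name, batch size, and the documents table/columns are assumptions carried over from the schema sketch above; a local HuggingFace model or a LangChain/LlamaIndex wrapper could replace the direct OpenAI call.

```python
# Hedged pipeline sketch: model name, batch size, and table/column names are
# illustrative assumptions, not taken from the posting.
from openai import OpenAI
import clickhouse_connect

openai_client = OpenAI()  # reads OPENAI_API_KEY from the environment
ch = clickhouse_connect.get_client(host="localhost")

def embed_batch(texts: list[str]) -> list[list[float]]:
    """Embed a batch of raw texts; a local HuggingFace model could be swapped in here."""
    resp = openai_client.embeddings.create(model="text-embedding-3-small", input=texts)
    return [item.embedding for item in resp.data]

def ingest(rows: list[tuple[int, str]], batch_size: int = 512) -> None:
    """rows: (id, body) pairs from the raw source, inserted together with their embeddings."""
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        vectors = embed_batch([body for _, body in batch])
        ch.insert(
            "documents",
            [[doc_id, body, vec] for (doc_id, body), vec in zip(batch, vectors)],
            column_names=["id", "body", "embedding"],  # created_at falls back to its DEFAULT
        )
```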
Requirements / Qualifications:
Proven experience with ClickHouse at scale: You must understand Sharding, ReplicatedMergeTree, and memory management.
Vector DB Expertise: Deep understanding of HNSW or other vector indexing methods.
LLM Integration: Experience in building RAG (Retrieval-Augmented Generation) systems.
Performance Engineering: Ability to optimize queries that scan hundreds of millions of rows.
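Finally, a minimal, framework-free sketch of the AI Agent Layer described in the Scope of Work: one LLM call routes the question to either semantic retrieval or text-to-SQL, and a final call synthesizes the answer (RAG). Model names, prompts, and the documents schema are assumptions from the earlier sketches; in practice LangChain or LlamaIndex would typically replace this hand-rolled routing.

```python
# Minimal agent-layer sketch: routing, retrieval, and synthesis collapsed into one
# file for illustration. Prompts, model names, and the documents schema are assumptions.
from openai import OpenAI
import clickhouse_connect

llm = OpenAI()
ch = clickhouse_connect.get_client(host="localhost")

def route(question: str) -> str:
    """Ask the LLM whether the question needs semantic retrieval or analytical SQL."""
    resp = llm.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system",
             "content": "Answer with exactly one word: SEMANTIC for meaning-based document "
                        "search, SQL for aggregations/filters over structured columns."},
            {"role": "user", "content": question},
        ],
    )
    return resp.choices[0].message.content.strip().upper()

def semantic_context(question: str, k: int = 10) -> str:
    """RAG retrieval: embed the question and pull the k nearest documents."""
    qv = llm.embeddings.create(model="text-embedding-3-small", input=[question]).data[0].embedding
    rows = ch.query(
        "SELECT body FROM documents "
        "ORDER BY cosineDistance(embedding, {qv:Array(Float32)}) LIMIT {k:UInt32}",
        parameters={"qv": qv, "k": k},
    ).result_rows
    return "\n---\n".join(r[0] for r in rows)

def answer(question: str) -> str:
    if route(question) == "SQL":
        # Text-to-SQL branch: in production the generated SQL must be validated
        # (read-only user, allow-listed tables, enforced LIMIT) before execution.
        sql = llm.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "system",
                       "content": "Translate the question into one ClickHouse SELECT over "
                                  "documents(id, created_at, body). Return only SQL."},
                      {"role": "user", "content": question}],
        ).choices[0].message.content
        context = str(ch.query(sql).result_rows)
    else:
        context = semantic_context(question)
    final = llm.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "system", "content": "Answer using only the supplied context."},
                  {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"}],
    )
    return final.choices[0].message.content
```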