Expert Data Architect & AI Engineer: ClickHouse Vector Search at Scale (400M+ Rows)
Project No. 210277

Job Statistics
12 Bids
Budget: 10,000 ₪ - 25,000 ₪
Project open for: 47 days, 5 hours, 25 minutes
Bid range: 135 ₪ - 350 ₪ per hour / 25,000 ₪ fixed price
Average bid: 261 ₪ per hour / 25,000 ₪ fixed price

Job Info And Actions
Published: 11:11, 21 December 2025
Bids accepted until: 11:41, 9 February 2026
I am looking for a high-level Data Architect and AI Developer to design and implement a high-performance retrieval and analysis system. The project involves managing a massive dataset of 400 million+ records in a single table, enabling both semantic search and analytical capabilities.
Scope of Work:
DB Infrastructure: Setup and optimization of a ClickHouse cluster. Implementation of efficient schema design for high-volume data.
Vector Search Implementation: Configuring ClickHouse for vector/semantic search (using ClickHouse's vector_similarity indexes). You must ensure sub-second latency for vector queries at the specified scale (a schema and index sketch follows this list).
Data Pipeline: Developing a process for generating embeddings from raw text and ingesting them into the DB (see the pipeline sketch after the Technical Stack list).
AI Agent Layer: Creating an intelligent agent (sketched after the Requirements list) that translates natural language queries into:
Semantic searches (Vector-based).
Analytical SQL queries (Text-to-SQL).
Synthesized responses using an LLM (RAG architecture).
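To make the DB Infrastructure and Vector Search items concrete, here is a minimal sketch of what the table and ANN index could look like, assuming the clickhouse-connect Python client, a hypothetical documents table, and 1536-dimensional embeddings. The vector_similarity index is experimental and its exact argument list differs between ClickHouse releases, so treat the DDL as illustrative rather than a spec.

```python
# Hedged sketch: table name, column names, embedding dimension, and settings are
# illustrative assumptions for scoping, not requirements taken from this posting.
import clickhouse_connect

client = clickhouse_connect.get_client(host="localhost", username="default", password="")

ddl = """
CREATE TABLE IF NOT EXISTS documents
(
    id         UInt64,
    created_at DateTime DEFAULT now(),
    body       String,
    embedding  Array(Float32),   -- e.g. 1536-dim vectors from text-embedding-3-small
    -- HNSW-based ANN index; the vector_similarity argument list and required
    -- settings differ between ClickHouse releases, so verify against the target version.
    INDEX ann_idx embedding TYPE vector_similarity('hnsw', 'cosineDistance') GRANULARITY 100000000
)
ENGINE = MergeTree   -- a real 400M+ row deployment would use ReplicatedMergeTree (+ sharding)
ORDER BY (created_at, id)
"""
# The index is still experimental and must be enabled explicitly at CREATE time.
client.command(ddl, settings={"allow_experimental_vector_similarity_index": 1})

# Top-10 nearest neighbours for a query vector (placeholder zeros shown here).
query_vec = [0.0] * 1536
hits = client.query(
    "SELECT id, body, cosineDistance(embedding, {qv:Array(Float32)}) AS dist "
    "FROM documents ORDER BY dist ASC LIMIT 10",
    parameters={"qv": query_vec},
).result_rows
```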
Technical Stack:
Database: ClickHouse.
AI Frameworks: LangChain or LlamaIndex.
Languages: Python (for the AI/Embedding layer) or Node.js.
Models: OpenAI / Anthropic or local Embedding models (HuggingFace).
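A hedged sketch of the embedding and ingestion pipeline using this stack (Python, OpenAI, clickhouse-connect). The model name, batch size, and the documents table/columns are assumptions carried over from the schema sketch above; a local HuggingFace model or a LangChain/LlamaIndex wrapper could replace the direct OpenAI call.

```python
# Hedged pipeline sketch: model name, batch size, and table/column names are
# illustrative assumptions, not taken from the posting.
from openai import OpenAI
import clickhouse_connect

openai_client = OpenAI()  # reads OPENAI_API_KEY from the environment
ch = clickhouse_connect.get_client(host="localhost")

def embed_batch(texts: list[str]) -> list[list[float]]:
    """Embed a batch of raw texts; a local HuggingFace model could be swapped in here."""
    resp = openai_client.embeddings.create(model="text-embedding-3-small", input=texts)
    return [item.embedding for item in resp.data]

def ingest(rows: list[tuple[int, str]], batch_size: int = 512) -> None:
    """rows: (id, body) pairs from the raw source, inserted together with their embeddings."""
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        vectors = embed_batch([body for _, body in batch])
        ch.insert(
            "documents",
            [[doc_id, body, vec] for (doc_id, body), vec in zip(batch, vectors)],
            column_names=["id", "body", "embedding"],  # created_at falls back to its DEFAULT
        )
```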
Requirements / Qualifications:
Proven experience with ClickHouse at scale: You must understand Sharding, ReplicatedMergeTree, and memory management.
Vector DB Expertise: Deep understanding of HNSW or other vector indexing methods.
LLM Integration: Experience in building RAG (Retrieval-Augmented Generation) systems.
Performance Engineering: Ability to optimize queries that scan hundreds of millions of rows.
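Finally, a minimal, framework-free sketch of the AI Agent Layer described in the Scope of Work: one LLM call routes the question to either semantic retrieval or text-to-SQL, and a final call synthesizes the answer (RAG). Model names, prompts, and the documents schema are assumptions from the earlier sketches; in practice LangChain or LlamaIndex would typically replace this hand-rolled routing.

```python
# Minimal agent-layer sketch: routing, retrieval, and synthesis collapsed into one
# file for illustration. Prompts, model names, and the documents schema are assumptions.
from openai import OpenAI
import clickhouse_connect

llm = OpenAI()
ch = clickhouse_connect.get_client(host="localhost")

def route(question: str) -> str:
    """Ask the LLM whether the question needs semantic retrieval or analytical SQL."""
    resp = llm.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system",
             "content": "Answer with exactly one word: SEMANTIC for meaning-based document "
                        "search, SQL for aggregations/filters over structured columns."},
            {"role": "user", "content": question},
        ],
    )
    return resp.choices[0].message.content.strip().upper()

def semantic_context(question: str, k: int = 10) -> str:
    """RAG retrieval: embed the question and pull the k nearest documents."""
    qv = llm.embeddings.create(model="text-embedding-3-small", input=[question]).data[0].embedding
    rows = ch.query(
        "SELECT body FROM documents "
        "ORDER BY cosineDistance(embedding, {qv:Array(Float32)}) LIMIT {k:UInt32}",
        parameters={"qv": qv, "k": k},
    ).result_rows
    return "\n---\n".join(r[0] for r in rows)

def answer(question: str) -> str:
    if route(question) == "SQL":
        # Text-to-SQL branch: in production the generated SQL must be validated
        # (read-only user, allow-listed tables, enforced LIMIT) before execution.
        sql = llm.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "system",
                       "content": "Translate the question into one ClickHouse SELECT over "
                                  "documents(id, created_at, body). Return only SQL."},
                      {"role": "user", "content": question}],
        ).choices[0].message.content
        context = str(ch.query(sql).result_rows)
    else:
        context = semantic_context(question)
    final = llm.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "system", "content": "Answer using only the supplied context."},
                  {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"}],
    )
    return final.choices[0].message.content
```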