Retrieval 🔍

An end-to-end, blazingly fast, multilingual retrieval engine with 8k context length and an OpenAI-compatible API.
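Because the API is OpenAI-compatible, requests follow OpenAI's `POST /v1/embeddings` schema. Below is a minimal sketch using only the standard library; the base URL and model name are placeholders (assumptions), so substitute the real values from your dashboard. The official `openai` client also works by pointing `base_url` at the same endpoint.

```python
import json
import os
import urllib.request

# Hypothetical endpoint and model name -- replace with the real values
# from your dashboard. The request body mirrors OpenAI's /v1/embeddings.
API_BASE = "https://api.example.com/v1"

def build_embedding_request(texts, model="embedding-base"):
    """Build an OpenAI-compatible /embeddings request (url, headers, body)."""
    body = json.dumps({"model": model, "input": texts}).encode()
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {os.environ.get('API_KEY', '')}",
    }
    return f"{API_BASE}/embeddings", headers, body

url, headers, body = build_embedding_request(["Apa itu LLM?"])
if os.environ.get("API_KEY"):  # only fire the request when a key is set
    req = urllib.request.Request(url, data=body, headers=headers)
    with urllib.request.urlopen(req) as resp:
        embedding = json.loads(resp.read())["data"][0]["embedding"]
```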

Lower Latency

Converts your texts faster than OpenAI endpoints, averaging 50 ms per request (lower is better).
Tested in the Singapore region on single-string inputs, stress-tested with 50 requests over 30 seconds at a spawn rate of 10 per second.
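A simplified version of this kind of latency measurement can be reproduced client-side. The sketch below times repeated calls and reports the mean in milliseconds; the `call` argument is a stub standing in for a real single-string request to the embeddings endpoint, and the concurrent-user spawning of the full stress test is not modeled here.

```python
import statistics
import time

def measure_latency(call, n=50):
    """Time n sequential calls and return the mean latency in milliseconds."""
    samples = []
    for _ in range(n):
        t0 = time.perf_counter()
        call()  # in a real test: POST one string to /embeddings
        samples.append((time.perf_counter() - t0) * 1000.0)
    return statistics.mean(samples)

# A no-op stub stands in for the actual HTTP request.
mean_ms = measure_latency(lambda: None, n=10)
```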

Better Embedding and Reranker Accuracy

Higher accuracy than OpenAI embeddings. We benchmarked on a Malaysian knowledge base, mesolitica/malaysian-embedding-leaderboard; higher is better.

Developer Playground

You can play around with Retrieval 🔍 in the Dashboard:

  • Scatter plot and heatmap plot
  • Pre-sorting using the Reranker
  • File upload, supporting TXT or JSONL
  • Generate code for OpenAI Python, OpenAI NodeJS, and cURL

Prepaid pricing

Best for solo developers and teams alike.

  • Natively Multilingual
  • General knowledge
  • Programming languages
  • 8192 context length

Embedding Base

0.1 USD
per 1M tokens

Reranker Base

0.1 USD
per 1M tokens
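At 0.1 USD per 1M tokens for both models, estimating prepaid spend is simple arithmetic. A quick sketch:

```python
PRICE_PER_MILLION = 0.10  # USD, for both Embedding Base and Reranker Base

def cost_usd(tokens: int, price_per_million: float = PRICE_PER_MILLION) -> float:
    """Prepaid cost in USD for a given token count."""
    return tokens / 1_000_000 * price_per_million

# Example: embedding a 10M-token knowledge base costs 1.0 USD.
print(cost_usd(10_000_000))  # → 1.0
```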

Frequently asked questions

What is this retrieval engine?

Embedding and Reranker models are crucial pipeline components in LLMOps: they retrieve the correct knowledge base entries for a user query.
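The two models play distinct roles in a retrieve-then-rerank pipeline: embeddings give cheap vector recall over the whole knowledge base, then the reranker scores each (query, document) pair directly to re-order the shortlist. A toy sketch of that flow, with hand-made vectors and a hypothetical scoring function standing in for the two model endpoints:

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Toy vectors standing in for embedding-model output; in production
# these would come from the embeddings endpoint.
docs = {
    "doc_a": [0.9, 0.1],
    "doc_b": [0.2, 0.8],
    "doc_c": [0.7, 0.3],
}
query = [1.0, 0.0]

# Stage 1: cheap vector recall -- keep the top-k by cosine similarity.
recalled = sorted(docs, key=lambda d: cosine(query, docs[d]), reverse=True)[:2]

# Stage 2: the reranker scores each (query, document) pair and
# re-orders the shortlist; a hypothetical lookup stands in here.
def rerank_score(query_text, doc_id):
    return {"doc_a": 0.4, "doc_b": 0.1, "doc_c": 0.95}[doc_id]

reranked = sorted(recalled, key=lambda d: rerank_score("...", d), reverse=True)
```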

What is the rate limit?

Currently we enforce a hard limit of 1M tokens per minute.
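One way to stay under that limit is a client-side sliding-window budget that tracks tokens spent in the last 60 seconds and refuses to send a request that would exceed it. This is an illustrative sketch, not how the server enforces the limit:

```python
import time
from collections import deque

LIMIT_TPM = 1_000_000  # the hard limit stated above: 1M tokens per minute

class TokenBudget:
    """Client-side sliding-window budget to stay under the TPM limit."""

    def __init__(self, limit=LIMIT_TPM, window=60.0):
        self.limit = limit
        self.window = window
        self.events = deque()  # (timestamp, tokens) pairs

    def try_spend(self, tokens, now=None):
        """Record the spend and return True, or return False if over budget."""
        now = time.monotonic() if now is None else now
        # Drop spends that have aged out of the window.
        while self.events and now - self.events[0][0] >= self.window:
            self.events.popleft()
        used = sum(t for _, t in self.events)
        if used + tokens > self.limit:
            return False  # caller should wait and retry
        self.events.append((now, tokens))
        return True

budget = TokenBudget()
budget.try_spend(8192)  # well under 1M tokens in the window → allowed
```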

How do I top up?

Just go to the billing page and top up! Minimum 3 USD, maximum 1000 USD.

Interested in an Enterprise solution?

If you are interested in self-hosting within your virtual private network, either on-premise or in a private cloud, with a custom solution, email us at or