Retrieval
End-to-end multilingual Malaysian retrieval engine with an 8k context length, built for speed.
Lower Latency
Converts your texts faster than OpenAI endpoints, averaging 50 ms (lower is better). Tested in the Singapore region on single strings, stress-tested with 50 requests over 30 seconds at a spawn rate of 10 requests per second.
Better Embedding Accuracy
Better accuracy than OpenAI Embedding. We benchmarked on a Malaysian knowledge base, mesolitica/malaysian-embedding-leaderboard; higher is better.
Improve Retrieval Recall using Reranker
Re-ranking Embedding Base results with Reranker Base improves the recall score; higher is better.
Playground
You can play around with Retrieval; try it at the Nous App.
Try the API
The embedding engine is compatible with the OpenAI library; read more in the Nous LLM Router Documentation.
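Because the engine follows the OpenAI embeddings API shape, a request can be sketched with just the standard library. The base URL, model name, and API key below are illustrative placeholders, not values from this page:

```python
import json
from urllib import request

# Placeholder values -- substitute the real Nous endpoint and your own key.
BASE_URL = "https://api.example.com/v1"
API_KEY = "YOUR_API_KEY"

def build_embedding_request(texts, model="base"):
    """Build an OpenAI-style POST /embeddings request (constructed, not sent)."""
    payload = {"model": model, "input": texts}
    return request.Request(
        f"{BASE_URL}/embeddings",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_KEY}",
        },
        method="POST",
    )

req = build_embedding_request(["Apa itu MaLLaM?"])
print(req.full_url)  # https://api.example.com/v1/embeddings
```

With the official `openai` Python package the equivalent call is `OpenAI(base_url=..., api_key=...).embeddings.create(model=..., input=...)`, since the client accepts a custom `base_url`.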
Pricing
Prepaid, natively multilingual; credits are shared with MaLLaM π.
| Model name | Input / 1M tokens |
| --- | --- |
| Embedding Base | MYR 1.00 |
| Reranker Base | MYR 1.00 |
Private
Self-host Retrieval in your private network for 100% privacy, either on-premise or in a private cloud. Read more at MaLLaM π Self-hosted Enterprise.
Frequently asked questions
What is this retrieval engine?
Embedding and reranker models are a crucial part of the LLMOps pipeline: they retrieve the correct knowledge-base entries for a user's query.
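The retrieve-then-rerank flow described above can be sketched locally with toy vectors. The cosine-similarity retrieval and score-based re-sort illustrate the general technique, not this engine's internals; the keyword-overlap scorer stands in for a real reranker model:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def retrieve(query_vec, docs, top_k=2):
    """Stage 1: shortlist documents by embedding similarity."""
    scored = [(cosine(query_vec, vec), text) for text, vec in docs]
    return sorted(scored, reverse=True)[:top_k]

def rerank(query, candidates, scorer):
    """Stage 2: re-sort the shortlist with a (toy) reranker score."""
    return sorted(candidates, key=lambda t: scorer(query, t[1]), reverse=True)

# Toy corpus with hand-made 3-d "embeddings".
docs = [
    ("doc about billing", [0.9, 0.1, 0.0]),
    ("doc about retrieval", [0.1, 0.9, 0.2]),
    ("doc about latency", [0.2, 0.8, 0.1]),
]
query_vec = [0.0, 1.0, 0.1]

shortlist = retrieve(query_vec, docs, top_k=2)
# A real reranker scores (query, document) pairs; keyword overlap stands in here.
final = rerank("retrieval", shortlist, lambda q, d: q in d)
print([text for _, text in final])
# → ['doc about retrieval', 'doc about latency']
```

Retrieval with embeddings is fast but approximate; the reranker sees the query and each candidate together, so re-sorting a small shortlist recovers relevant documents the first stage mis-ranked.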
What is the rate limit?
Currently we enforce a hard limit of 100k tokens per minute.
How to topup?
Go to the billing page and top up. Minimum RM3, maximum RM99.