

JTokkit is a fast and efficient tokenizer for natural language processing tasks with OpenAI models. It provides an easy-to-use interface for tokenizing input text, for example to count the tokens that a request to the gpt-3.5-turbo model will require. The library grew out of the need for capabilities in the JVM ecosystem similar to those the tiktoken library provides for Python.


✅ Implements encoding and decoding via r50k_base, p50k_base, p50k_edit and cl100k_base

✅ Easy-to-use API

✅ Easy extensibility for custom encoding algorithms

✅ Zero dependencies

✅ Supports Java 8 and above

✅ Fast and efficient performance
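
The token-counting use case mentioned in the introduction can be sketched as a small budget check. This is a minimal, self-contained sketch rather than JTokkit's own code: the class `TokenBudget`, the method `fitsBudget`, and the word-based placeholder counter are hypothetical names introduced for illustration, and the context window and reply reserve are assumed values. The comment in `main` shows how a JTokkit `Encoding` could supply the counter once the library is on the classpath.

```java
import java.util.function.ToIntFunction;

public class TokenBudget {

    /**
     * Returns true when the prompt's token count plus the tokens reserved
     * for the reply fits into the given context window.
     */
    public static boolean fitsBudget(String prompt,
                                     ToIntFunction<String> tokenCounter,
                                     int contextWindow,
                                     int reservedForReply) {
        int promptTokens = tokenCounter.applyAsInt(prompt);
        return promptTokens + reservedForReply <= contextWindow;
    }

    /** Placeholder counter: whitespace-separated words, NOT real BPE tokens. */
    public static int countWords(String text) {
        String trimmed = text.trim();
        return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
    }

    public static void main(String[] args) {
        // With JTokkit on the classpath, the counter could instead be:
        //   Encoding enc = Encodings.newDefaultEncodingRegistry()
        //                           .getEncodingForModel(ModelType.GPT_3_5_TURBO);
        //   ToIntFunction<String> counter = enc::countTokens;
        ToIntFunction<String> counter = TokenBudget::countWords;

        String prompt = "Summarize the following article in three sentences.";
        // 4096 and 500 are illustrative values, not limits taken from this page.
        System.out.println(fitsBudget(prompt, counter, 4096, 500));
    }
}
```

Passing the counter as a `ToIntFunction<String>` keeps the budget logic independent of the tokenizer, so the placeholder can be swapped for a real encoding without touching `fitsBudget`.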


JTokkit reaches 2-3 times the throughput of a comparable tokenizer. See the benchmarks for details.


You can install JTokkit by adding the following dependency to your Maven project:

<dependency>
    <groupId>com.knuddels</groupId>
    <artifactId>jtokkit</artifactId>
    <version>1.0.0</version>
</dependency>


Alternatively, using Gradle:

dependencies {
implementation 'com.knuddels:jtokkit:1.0.0'