Introduction

JTokkit is a fast and efficient tokenizer for natural language processing tasks that use OpenAI models. It provides an easy-to-use interface for tokenizing input text, for example to count the tokens a request to the gpt-3.5-turbo model will require. The library grew out of the need for capabilities in the JVM ecosystem similar to those the tiktoken library provides for Python.
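As a rough illustration of the token-counting use case, the sketch below checks whether a prompt fits a token budget before a request is sent. The class `TokenBudget`, the method `fits`, and the limits used are hypothetical names and values chosen for this example. So that the sketch runs without any dependency, it uses a stand-in counter that counts whitespace-separated words; the comment shows how a JTokkit encoding might be plugged in instead, assuming the library is on the classpath.

```java
import java.util.function.ToIntFunction;

public class TokenBudget {

    // Stand-in counter (NOT a real tokenizer): counts whitespace-separated words.
    // Assumption: with JTokkit on the classpath, this could be replaced by e.g.
    //   Encoding enc = Encodings.newDefaultEncodingRegistry()
    //                           .getEncoding(EncodingType.CL100K_BASE);
    //   ToIntFunction<String> counter = enc::countTokens;
    static final ToIntFunction<String> WORD_COUNTER =
            s -> s.trim().isEmpty() ? 0 : s.trim().split("\\s+").length;

    // Hypothetical helper: true if the prompt's token count plus the tokens
    // reserved for the model's reply stays within the overall limit.
    static boolean fits(String prompt, ToIntFunction<String> counter,
                        int maxTokens, int reservedForReply) {
        return counter.applyAsInt(prompt) + reservedForReply <= maxTokens;
    }

    public static void main(String[] args) {
        String prompt = "Summarize the following text in one sentence.";
        // The limit and reply allowance are illustrative values only.
        System.out.println(fits(prompt, WORD_COUNTER, 4096, 500));
    }
}
```

Passing the counter as a `ToIntFunction<String>` keeps the budget check independent of any particular encoding, so the same helper works with whichever encoding matches the target model.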

Features

✅ Implements encoding and decoding via r50k_base, p50k_base, p50k_edit and cl100k_base

✅ Easy-to-use API

✅ Easy extensibility for custom encoding algorithms

✅ Zero dependencies

✅ Supports Java 8 and above

✅ Fast and efficient performance

Performance

JTokkit reaches 2-3 times the throughput of a comparable tokenizer. See the benchmarks for more details.

Installation

You can install JTokkit by adding the following dependency to your Maven project:

<dependency>
    <groupId>com.knuddels</groupId>
    <artifactId>jtokkit</artifactId>
    <version>1.0.0</version>
</dependency>

Alternatively, with Gradle:

dependencies {
    implementation 'com.knuddels:jtokkit:1.0.0'
}