Wombat - Syntax Highlighting with Rust's Bat Called from Crystal
Introduction
Hello!
Have you heard of the command-line tool bat, written in Rust?
bat is a command-line tool similar to cat that displays file contents in the terminal, but with additional features like line numbering, syntax highlighting, and paging.
bat hello.rb
On the other hand, Crystal currently lacks a powerful syntax highlighting library.
So, I thought about using bat as a library to solve this problem.
Bat Can Also Be Used as a Rust Library
In fact, Bat can also be used as a Rust library. This is possible through the PrettyPrinter struct.
use bat::PrettyPrinter;

PrettyPrinter::new()
    .input_from_bytes(b"<span style=\"color: #ff00cc\">Hello world!</span>\n")
    .language("html")
    .print()
    .unwrap();
Bat uses a library called Syntect for syntax highlighting. However, Syntect is quite complex, so I thought it would be easier to use Bat directly as a library.
From the code above, you can see that the output is written directly to the terminal by the Rust side. Originally, Bat had no function that simply returned a syntax-highlighted string.
In the open-source world, agility and the willingness to get hands-on are essential. So, after consulting with ChatGPT, I added a print_with_writer function to PrettyPrinter.
pub fn print_with_writer<W: Write>(&mut self, writer: Option<W>) -> Result<bool>
This addition allows syntax highlighting of strings, as shown below:
use bat::PrettyPrinter;

fn main() {
    let mut output_str = String::new();
    PrettyPrinter::new()
        .input_from_bytes(b"<span style=\"color: #ff00cc\">Hello world!</span>\n")
        .language("html")
        .print_with_writer(Some(&mut output_str))
        .unwrap();
    println!("{}", output_str);
}
I submitted a pull request, and it was successfully merged. Starting from bat v0.25.0, the print_with_writer function is available.
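The idea behind print_with_writer can be illustrated without bat itself. Below is a minimal, std-only sketch of the same API shape: the output target is an `Option<W>` where `W: std::fmt::Write`, and the terminal-only method simply delegates by passing `None`. The names `render_with_writer` and `render` are hypothetical, the bracket-wrapping stands in for real highlighting, and the sketch returns `std::fmt::Result` rather than bat's own `Result<bool>`; it is not bat's actual code.

```rust
use std::fmt::Write;

// Hypothetical renderer showing the "optional writer" pattern:
// with Some(writer) the output goes there, with None it goes to stdout.
fn render_with_writer<W: Write>(text: &str, writer: Option<W>) -> std::fmt::Result {
    // Stand-in for real syntax highlighting: wrap each line in brackets.
    let mut rendered = String::new();
    for line in text.lines() {
        writeln!(rendered, "[{}]", line)?;
    }
    match writer {
        Some(mut w) => w.write_str(&rendered),
        None => {
            print!("{}", rendered);
            Ok(())
        }
    }
}

// The terminal-only variant delegates with an explicitly typed None.
fn render(text: &str) -> std::fmt::Result {
    render_with_writer(text, None::<&mut String>)
}
```

Passing `Some(&mut output_str)` works because the standard library implements `fmt::Write` for `&mut W` whenever `W: fmt::Write`, so a borrowed `String` can serve as the writer.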
Creating a C Library to Call Bat from C
Rust libraries cannot be called directly from Crystal. Therefore, I decided to create a lightweight wrapper library that allows Bat to be called from the C programming language. This makes it easy to use Bat not only from Crystal but also from various other languages. This is because many programming languages provide interfaces for calling C libraries.
https://github.com/kojix2/bat-c
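For readers new to Rust FFI, a C-callable function generally looks like the sketch below: an unmangled symbol, the C calling convention, and raw C-string pointers converted with `CStr`. The function `example_count_lines` is hypothetical and not part of bat-c; it only shows the pattern of borrowing a NUL-terminated string from C and reporting errors through a sentinel value.

```rust
use std::ffi::CStr;
use std::os::raw::c_char;

/// Hypothetical C-callable function: returns the number of lines in a
/// NUL-terminated UTF-8 string, or -1 if the pointer is null or the
/// bytes are not valid UTF-8.
///
/// # Safety
/// `input` must be null or point to a valid NUL-terminated string.
// `#[unsafe(no_mangle)]` is the Rust 1.82+ spelling; edition 2021 also
// accepts the plain `#[no_mangle]` used in bat-c.
#[unsafe(no_mangle)]
pub unsafe extern "C" fn example_count_lines(input: *const c_char) -> i64 {
    if input.is_null() {
        return -1;
    }
    // Borrow the C string without copying it or taking ownership.
    let c_str = unsafe { CStr::from_ptr(input) };
    match c_str.to_str() {
        Ok(s) => s.lines().count() as i64,
        Err(_) => -1,
    }
}
```

On the C side this would be declared as `int64_t example_count_lines(const char *input);`.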
Since I cannot read or write Rust, almost all of the code was generated with the help of ChatGPT and Copilot.
I expect there will be more opportunities to create lightweight C wrappers for Rust libraries in the future, so I’ve noted down some of the things I learned.
Rust is considered a low-level programming language, but C operates at an even lower level of abstraction. This gives the wrapper author more flexibility when designing the API.
For example, when calling a low-level C library from a high-level language like Python, the method signatures are uniquely defined. Think of how bindings are generated using libffi. The uniqueness of method signatures is what makes libffi bindings possible. Of course, after that, you would design a high-level API that aligns with object-oriented principles, but at the calling level, method signatures are strictly defined.
However, calling Rust from C means calling a high-level library from a low-level language. This is similar to calling Python from C: because you are moving down in abstraction, there is no single, predetermined C-side interface, so the developer has some freedom in how to design the API. (To be precise, Python does have a C API, but imagine a scenario where that is abstracted away.)
Therefore, even when using ChatGPT, it's important to carefully consider what kind of API you want to design and clearly specify that to the AI.
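To illustrate that design freedom: instead of returning a string allocated by Rust, a wrapper could follow the `snprintf` convention, where the caller owns the buffer and the function reports how many bytes the full result needs. The function `example_greeting` and its fixed message are hypothetical; this is one possible design, not how bat-c works.

```rust
use std::os::raw::c_char;

/// Hypothetical snprintf-style export: copies a NUL-terminated message
/// into the caller's buffer (truncating if it is too small) and returns
/// the full message length, excluding the NUL terminator.
///
/// # Safety
/// `buf` must be null (with `len == 0`) or valid for `len` writable bytes.
// `#[unsafe(no_mangle)]` is the Rust 1.82+ spelling of `#[no_mangle]`.
#[unsafe(no_mangle)]
pub unsafe extern "C" fn example_greeting(buf: *mut c_char, len: usize) -> usize {
    let message = b"Hello from Rust";
    if !buf.is_null() && len > 0 {
        // Copy at most len - 1 bytes to leave room for the NUL byte.
        let n = message.len().min(len - 1);
        unsafe {
            std::ptr::copy_nonoverlapping(message.as_ptr() as *const c_char, buf, n);
            *buf.add(n) = 0;
        }
    }
    message.len()
}
```

The caller first asks for the required size with a null buffer, then allocates and calls again. This costs two calls but needs no Rust-side free function, which is the trade-off against returning an allocated string.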
- Added a function to display the version. This allows users to identify which version of bat-c they are using. In this implementation, the version is stored as a constant in static memory for the program's entire lifetime, and a pointer to it is returned.
use std::os::raw::c_char;

#[no_mangle]
pub extern "C" fn bat_c_version() -> *const c_char {
    // The trailing "\0" makes the static string a valid C string.
    static VERSION: &str = concat!(env!("CARGO_PKG_VERSION"), "\0");
    VERSION.as_ptr() as *const c_char
}
Memory allocation and deallocation for strings can easily become problematic. If the Rust library allocates memory for a string, it must also provide a function to free that memory on the Rust side.
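A common way to satisfy that rule is to hand ownership to C with `CString::into_raw` and take it back in a dedicated free function with `CString::from_raw`, so the memory is always released by Rust's allocator. The sketch below shows the pairing; `example_uppercase` and `example_string_free` are hypothetical names, not bat-c's actual API.

```rust
use std::ffi::{CStr, CString};
use std::os::raw::c_char;

/// Hypothetical export: returns a newly allocated upper-case copy of
/// `input`, or null on error. The caller owns the result and must release
/// it with `example_string_free`, never with C's free().
///
/// # Safety
/// `input` must be null or point to a valid NUL-terminated string.
// `#[unsafe(no_mangle)]` is the Rust 1.82+ spelling of `#[no_mangle]`.
#[unsafe(no_mangle)]
pub unsafe extern "C" fn example_uppercase(input: *const c_char) -> *mut c_char {
    if input.is_null() {
        return std::ptr::null_mut();
    }
    let text = match unsafe { CStr::from_ptr(input) }.to_str() {
        Ok(s) => s.to_uppercase(),
        Err(_) => return std::ptr::null_mut(),
    };
    // CString::new fails only if the string contains an interior NUL.
    match CString::new(text) {
        Ok(c) => c.into_raw(), // transfer ownership to the caller
        Err(_) => std::ptr::null_mut(),
    }
}

/// Frees a string returned by `example_uppercase`. Passing null is a no-op.
///
/// # Safety
/// `ptr` must be null or a pointer returned by `example_uppercase`
/// that has not already been freed.
#[unsafe(no_mangle)]
pub unsafe extern "C" fn example_string_free(ptr: *mut c_char) {
    if !ptr.is_null() {
        // Reclaim ownership so Rust's allocator releases the memory.
        drop(unsafe { CString::from_raw(ptr) });
    }
}
```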
- Cargo.toml Configuration
  - Specify the library types under [lib]. Both of the following are set:
    - cdylib to generate a dynamic library.
    - staticlib to generate a static library.
  - rpath = true allows the dynamic library to be located using a relative path.
  - [profile.release]
    - LTO (Link Time Optimization): lto = "fat" enables optimization across all crates at link time, which can make the binary faster and smaller.
    - codegen-units = 1: Sets the number of code generation units to 1 to maximize optimization.
    - debug and strip are set to potentially reduce file size, though the impact may be minimal.
    - Considered setting opt-level = 3 (already the release default) or opt-level = "z" (optimize for size), but left the default in place for balance.
[package]
name = "bat-c"
version = "0.0.7"
edition = "2021"
[lib]
crate-type = ["cdylib", "staticlib"]
[dependencies]
bat = "0.25.0"
[profile.dev]
rpath = true
[profile.release]
lto = "fat"
codegen-units = 1
rpath = true
debug = false
strip = true
- To automate library version updates, I introduced Renovate. Although I'm not very familiar with it, adding the following JSON file to the repository enables it to work:
renovate.json
{
"$schema": "https://docs.renovatebot.com/renovate-schema.json",
"extends": [
"config:best-practices",
"schedule:quarterly"
]
}
- I configured the workflow to automatically trigger a release when a Git tag is created.
- Since this is strictly a C library, I decided not to run cargo publish.
Calling bat-c from Crystal
At this point, calling bat-c from Crystal becomes straightforward. For this purpose, I created a library called wombat.
https://github.com/kojix2/wombat
The main challenge here is downloading and placing the library.
If bat-c were a well-developed and widely used library, packaging it for installation via a package manager would be an option. However, that is not the case this time. Therefore, I decided to simply allow the latest version of the library to be downloaded from the GitHub Releases page.
Although both static and dynamic libraries are available, I chose the static library. Rust makes it easy to produce static libraries, and unlike Ruby or Python, whose FFI layers typically load shared libraries at runtime, Crystal compiles to a native executable and can link a static library directly into it, which is a significant advantage.
In Ruby, you could freely write custom tasks in a Rakefile, but Crystal doesn’t offer that level of flexibility. The closest mechanism is the shards postinstall hook, so I configured it to run a script that downloads the static library.
I could have written Crystal code to handle the download, but unfortunately, Crystal's standard library is still limited and often struggles with redirects or proxy environments. Therefore, I created a shell script using curl for Unix-like systems and a batch file for Windows, allowing the appropriate script to run depending on the OS.
How to Use
Sample Code
require "../src/wombat"
# Output the file content with syntax highlighting by calling the Rust function
Wombat.pretty_print_file(__FILE__)
# Output the highlighted string of the input by calling the Rust function
Wombat.pretty_print(input: %{fn main() { println!("Hello, world!"); }}, language: "Rust")
# Get the highlighted string of the input
puts Wombat.pretty_string(%{puts "Hello, World!"}, language: "Crystal", theme: "TwoDark")
Running in GitHub Actions
For more details, please refer to the API Documentation.
Current Issues and Areas for Improvement
Although I was able to accomplish most of what I set out to do, there are a few concerns that remain:
- The size of the generated library is quite large.
- Ideally, the design should have considered C compatibility from the start at the Rust level. However, since the C wrapper was added afterward, the internal structure might not be as efficient.
- Due to my limited knowledge of Rust and C, the API design might not be as refined as it could be.
That said, the main goal—to call Bat as a library from Crystal with minimal maintenance—has been mostly achieved. Additionally, by establishing a method for calling Rust from Crystal, I've opened up new possibilities for future projects.
Of course, since this is a hobby project developed by an individual, there may be inconveniences or bugs. If you notice any issues, I would greatly appreciate it if you could submit an issue or a pull request.
That’s all for this article. Have a great day!
Original post in Japanese on Qiita - Wombat - RustのBatをCrystalから呼び出しシンタックスハイライティングする
Translated into English by ChatGPT.