dft is a batteries-included suite of DataFusion applications. The batteries are several features common to modern query execution engines, such as:
- Query files from S3 or HuggingFace datasets
- Support for common table formats (Delta Lake, Iceberg, Hudi)
- UDFs defined in multiple languages (WASM and soon Python)
- Popular helper functions (for example for working with JSON and Parquet data)
It provides two client interfaces to the query execution engine:
- Text User Interface (TUI): An IDE for DataFusion developers and users that provides a local database experience with utilities to analyze / benchmark queries.
- Command Line Interface (CLI): Scriptable engine for executing queries from files.
It also provides two server implementations, FlightSQL and HTTP, that use the same execution engine as the TUI and CLI. This lets users iterate quickly while developing a database and then seamlessly deploy applications built on it.
dft is inspired by datafusion-cli, but has some differences:
- The dft TUI focuses on a more complete and interactive experience for users.
- dft contains many built-in integrations, such as Delta Lake and Iceberg, that are not available in datafusion-cli.
- dft provides server implementations to make it easy to deploy DataFusion-based applications / backends.
Currently, the only supported packaging is on crates.io. If you already have Rust installed, dft can be installed by running cargo install datafusion-dft. If Rust is not installed, you can install it by following the directions here.
The commands for each of the apps are:
# TUI (enabled by default)
dft
# Execute command with CLI (enabled by default)
dft -c "SELECT 1"
# Execute SQL from file (enabled by default)
dft -f query.sql
# Start FlightSQL Server (requires `flightsql` feature)
dft serve-flight-sql
# Start HTTP Server (requires `http` feature)
dft serve-http
The CLI can also run your configured DDL prior to executing the query by adding the --run-ddl parameter.
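As a rough illustration of scripted use, the sketch below writes a query to a file and runs it through the CLI with the configured DDL loaded first. The scratch directory and the contents of query.sql are assumptions made for the example, and combining -f with --run-ddl is inferred from the description above rather than confirmed. The script only invokes dft if it is found on the PATH.

```shell
#!/bin/sh
# Sketch: run a SQL file through the dft CLI, loading configured DDL first.
# The scratch directory and the query text are illustrative assumptions.
set -eu

WORK_DIR="$(mktemp -d)"
QUERY_FILE="$WORK_DIR/query.sql"

# A trivial query so the script does not depend on any registered tables.
printf '%s\n' 'SELECT 1 AS one;' > "$QUERY_FILE"

if command -v dft >/dev/null 2>&1; then
    # -f and --run-ddl are the flags described above; using them together
    # is an assumption of this sketch.
    dft -f "$QUERY_FILE" --run-ddl
else
    echo "dft not found on PATH; install it with: cargo install datafusion-dft" >&2
fi
```

Replacing the trivial query with one against your own tables is where --run-ddl becomes useful, since those tables must be registered before the query runs.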
For the best experience with dft, it is highly recommended to define all of your DDL in ~/.config/ddl.sql so that any tables you wish to query are available at startup. Additionally, because DataFusion supports CREATE VIEW via SQL, you can also define a VIEW based on these tables.
For example, your DDL file could look like the following:
CREATE EXTERNAL TABLE users STORED AS NDJSON LOCATION 's3://bucket/users';
CREATE EXTERNAL TABLE transactions STORED AS PARQUET LOCATION 's3://bucket/transactions';
CREATE EXTERNAL TABLE listings STORED AS PARQUET LOCATION 'file://folder/listings';
CREATE OR REPLACE VIEW users_listings AS SELECT * FROM users LEFT JOIN listings USING (user_id);
This would make the tables users, transactions, and listings, along with the view users_listings, available at startup. Any of these DDL statements could also be run interactively from the SQL editor to create the tables.
Links to more detailed documentation for each of the apps and all of the features can be found below.