Serengeti is a next-generation autonomous distributed database system designed for modern applications that demand high availability, scalability, and zero-configuration management. Built on the JVM, Serengeti aims to provide enterprise-grade distributed database capabilities while keeping deployment and maintenance simple.
- Zero Configuration: Deploy and forget - no complex setup or ongoing maintenance required
- Autonomous Operation: Self-organizing, self-healing distributed architecture
- Automatic Node Discovery: Nodes automatically find each other on the same subnet
- Seamless Scaling: Add or remove nodes without downtime or manual data redistribution
- Fault Tolerance: Automatic data replication and recovery from node failures
- SQL-like Query Interface: Familiar query language for easy data manipulation
- Web-based Dashboard: Intuitive interface for monitoring and management
- High Performance: Optimized storage engine and query processing
Start Serengeti on any number of machines in a controlled network where every machine is on the same subnet. The instances automatically discover and connect to one another to form a single distributed database.
Data is replicated across the network for redundancy, and when a new node joins, it automatically receives the existing database structure and replication information. If a node fails, the system automatically detects the failure and redistributes the data to maintain availability and redundancy.
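The detect-and-redistribute behavior described above can be illustrated with a minimal sketch. It assumes heartbeat-based failure detection with a fixed timeout and a simple rotation-based replica placement over sorted node ids; the class `ClusterMembership`, the timeout, and the replication factor are hypothetical and do not describe Serengeti's actual protocol.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical sketch, not Serengeti's actual classes: heartbeat-based failure
 * detection plus replica placement recomputed over the nodes still alive.
 */
class ClusterMembership {
    private final long timeoutMillis;      // silence longer than this marks a node as failed
    private final int replicationFactor;   // copies kept of each piece of data
    private final Map<String, Long> lastSeen = new TreeMap<>(); // node id -> last heartbeat (ms), sorted by id

    ClusterMembership(long timeoutMillis, int replicationFactor) {
        this.timeoutMillis = timeoutMillis;
        this.replicationFactor = replicationFactor;
    }

    /** Record a heartbeat; an unknown node id joins the cluster implicitly. */
    void heartbeat(String nodeId, long nowMillis) {
        lastSeen.put(nodeId, nowMillis);
    }

    /** Remove nodes whose last heartbeat is older than the timeout and return their ids. */
    List<String> evictFailed(long nowMillis) {
        List<String> failed = new ArrayList<>();
        Iterator<Map.Entry<String, Long>> it = lastSeen.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> entry = it.next();
            if (nowMillis - entry.getValue() > timeoutMillis) {
                failed.add(entry.getKey());
                it.remove();
            }
        }
        return failed;
    }

    /** Pick up to replicationFactor live nodes for a key, starting at a hash-derived offset. */
    List<String> replicasFor(String key) {
        List<String> live = new ArrayList<>(lastSeen.keySet());
        List<String> replicas = new ArrayList<>();
        if (live.isEmpty()) {
            return replicas;
        }
        int start = Math.floorMod(key.hashCode(), live.size());
        int count = Math.min(replicationFactor, live.size());
        for (int i = 0; i < count; i++) {
            replicas.add(live.get((start + i) % live.size()));
        }
        return replicas;
    }

    public static void main(String[] args) {
        ClusterMembership cluster = new ClusterMembership(5_000, 2);
        cluster.heartbeat("node-a", 0);
        cluster.heartbeat("node-b", 0);
        cluster.heartbeat("node-c", 0);
        System.out.println("replicas before failure: " + cluster.replicasFor("users"));

        // node-c stops sending heartbeats; the other two keep reporting.
        cluster.heartbeat("node-a", 6_000);
        cluster.heartbeat("node-b", 6_000);
        System.out.println("failed: " + cluster.evictFailed(6_000)); // failed: [node-c]
        System.out.println("replicas after failure: " + cluster.replicasFor("users"));
    }
}
```

Because placement is recomputed from the live set, evicting a node automatically moves its replica slots onto the survivors; a real system would also have to copy the data those slots now expect.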
Serengeti's architecture consists of several key components working together to provide a robust, distributed database system:
- Serengeti Core: System initialization and component lifecycle management
- Server: Handles client connections and provides web interfaces
- Query Engine: Processes and optimizes database queries
- Storage System: Manages data persistence with an LSM storage engine
- Indexing System: Provides efficient data access through B-tree indexes
- Network System: Enables communication between nodes in the distributed system
For a detailed architectural overview, see the Architectural Diagram and System Architecture documentation.
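As a rough illustration of the LSM approach the Storage System is described as using, the sketch below shows the core write and read paths: writes land in a sorted in-memory memtable, a full memtable is frozen into an immutable sorted run, and reads consult the memtable and then the runs from newest to oldest. The class `TinyLsm` and its size threshold are hypothetical; a real engine writes runs to disk, uses a write-ahead log, and merges runs during compaction, all of which are omitted here.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.Optional;
import java.util.SortedMap;
import java.util.TreeMap;

/**
 * Hypothetical, in-memory LSM sketch (not Serengeti's storage engine).
 * Compaction, on-disk files and the write-ahead log are omitted.
 */
class TinyLsm {
    // Marker value meaning "deleted"; a real engine would use a flag in the record format.
    private static final String TOMBSTONE = "__tombstone__";

    private final int memtableLimit;
    private TreeMap<String, String> memtable = new TreeMap<>();
    // Immutable sorted runs, newest first; each stands in for an on-disk sorted file.
    private final Deque<SortedMap<String, String>> runs = new ArrayDeque<>();

    TinyLsm(int memtableLimit) {
        this.memtableLimit = memtableLimit;
    }

    void put(String key, String value) {
        memtable.put(key, value);
        if (memtable.size() >= memtableLimit) {
            flush();
        }
    }

    void delete(String key) {
        put(key, TOMBSTONE);
    }

    /** Freeze the current memtable into a new immutable run and start an empty one. */
    void flush() {
        if (memtable.isEmpty()) {
            return;
        }
        runs.addFirst(Collections.unmodifiableSortedMap(memtable));
        memtable = new TreeMap<>();
    }

    /** Look in the memtable first, then in runs from newest to oldest; the first hit wins. */
    Optional<String> get(String key) {
        String value = memtable.get(key);
        if (value == null) {
            for (SortedMap<String, String> run : runs) {
                value = run.get(key);
                if (value != null) {
                    break;
                }
            }
        }
        return (value == null || value.equals(TOMBSTONE)) ? Optional.empty() : Optional.of(value);
    }

    int runCount() {
        return runs.size();
    }

    public static void main(String[] args) {
        TinyLsm store = new TinyLsm(2);
        store.put("a", "1");
        store.put("b", "2");   // memtable reaches the limit and is flushed into run #1
        store.put("a", "3");   // newer value shadows the one in run #1
        System.out.println(store.get("a").orElse("missing")); // 3
        store.delete("b");     // tombstone; flushes run #2
        System.out.println(store.get("b").orElse("missing")); // missing
        System.out.println(store.runCount());                // 2
    }
}
```

The tombstone is what makes deletes work without rewriting older runs: it shadows earlier values until compaction eventually discards both.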
The fastest way to get started with Serengeti is using Docker:
docker pull ataiva/serengeti:latest
docker run -p 1985:1985 ataiva/serengeti:latest
Then access the dashboard at http://localhost:1985/dashboard
To install from a release package instead:
- Download the latest release from the releases page
- Unzip the package
- Run the application:
java -jar serengeti.jar
To build from source:
- Clone the repository:
git clone https://github.com/ao/serengeti.git
- Build with Maven:
mvn clean install
- Run the application:
java -jar target/serengeti-<version>.jar
Replace <version> with the current version of the project.
Once Serengeti is running, you can interact with it through:
- Dashboard: Access the administrative dashboard at http://<host>:1985/dashboard
- Interactive Console: Execute queries through the interactive console at http://<host>:1985/interactive
- REST API: Interact with the database programmatically through the REST API
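To check from code that a node is up, a minimal Java 11 sketch using the standard `java.net.http.HttpClient` can request the dashboard URL listed above. The REST API's endpoints are not documented in this section, so the sketch deliberately targets only `/dashboard`; the class `DashboardCheck` and its timeouts are hypothetical, and it assumes a hostname or IPv4 address (IPv6 literals would need brackets).

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

/**
 * Hypothetical reachability check for a running node. It only requests the
 * documented dashboard path; no REST API endpoints are assumed.
 */
class DashboardCheck {
    static final int DEFAULT_PORT = 1985;

    /** Build the dashboard URI for a host, e.g. http://localhost:1985/dashboard. */
    static URI dashboardUri(String host) {
        return URI.create("http://" + host + ":" + DEFAULT_PORT + "/dashboard");
    }

    /** Return the HTTP status code, or -1 if the node could not be reached. */
    static int probe(String host) {
        HttpClient client = HttpClient.newBuilder()
                .connectTimeout(Duration.ofSeconds(2))
                .build();
        HttpRequest request = HttpRequest.newBuilder(dashboardUri(host))
                .timeout(Duration.ofSeconds(5))
                .GET()
                .build();
        try {
            return client.send(request, HttpResponse.BodyHandlers.discarding()).statusCode();
        } catch (IOException e) {
            return -1; // connection refused, timeout, etc.
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return -1;
        }
    }

    public static void main(String[] args) {
        String host = args.length > 0 ? args[0] : "localhost";
        System.out.println(dashboardUri(host) + " -> " + probe(host));
    }
}
```

A status of 200 suggests the node's web server is answering; -1 means nothing was reachable at that host and port.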
Serengeti includes several test suites for checking reliability and performance. To run the full suite:
mvn test
For faster feedback during development, run only the fast tests:
./run_fast_tests.sh # On Linux/Mac
run_fast_tests.bat # On Windows
Or directly with Maven:
mvn test -Pfast-tests
The StorageScheduler is a critical component responsible for data durability. Dedicated scripts are provided for testing it:
# Linux/macOS
./run_storage_scheduler_tests.sh --all # Run all tests
./run_storage_scheduler_tests.sh --fast # Run only fast tests
./run_storage_scheduler_tests.sh --comprehensive # Run only comprehensive tests
# Windows
run_storage_scheduler_tests.bat --all # Run all tests
run_storage_scheduler_tests.bat --fast # Run only fast tests
run_storage_scheduler_tests.bat --comprehensive # Run only comprehensive tests
For detailed information about the testing strategy, see the StorageScheduler Testing Strategy document.
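The description above only says that the StorageScheduler is responsible for durability. One common way to build such a component is a background task that persists in-memory state at a fixed interval and once more on shutdown; the sketch below shows that pattern with `ScheduledExecutorService`. This is an assumption about the general technique, not Serengeti's implementation, and `PeriodicFlushScheduler` and its flush callback are hypothetical.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical durability scheduler: runs a persistence task at a fixed interval
 * and once more on close so the most recent changes are not lost.
 */
class PeriodicFlushScheduler implements AutoCloseable {
    private final ScheduledExecutorService executor =
            Executors.newSingleThreadScheduledExecutor();
    private final Runnable flushTask;

    PeriodicFlushScheduler(Runnable flushTask, long intervalMillis) {
        this.flushTask = flushTask;
        executor.scheduleAtFixedRate(this::safeFlush, intervalMillis, intervalMillis,
                TimeUnit.MILLISECONDS);
    }

    private void safeFlush() {
        try {
            flushTask.run();
        } catch (RuntimeException e) {
            // An exception escaping a periodic task would suppress all its later runs.
            System.err.println("flush failed: " + e);
        }
    }

    /** Stop periodic flushing, wait for an in-progress flush, then flush one final time. */
    @Override
    public void close() throws InterruptedException {
        executor.shutdown(); // periodic tasks are cancelled by default on shutdown
        executor.awaitTermination(5, TimeUnit.SECONDS);
        safeFlush();
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicInteger flushes = new AtomicInteger();
        try (PeriodicFlushScheduler scheduler =
                     new PeriodicFlushScheduler(flushes::incrementAndGet, 50)) {
            Thread.sleep(200); // simulate the database doing work
        }
        System.out.println("flushes: " + flushes.get());
    }
}
```

The final flush in `close()` is the part a durability test would most want to exercise, since it covers writes made after the last scheduled run.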
Serengeti includes comprehensive documentation to help you understand, use, and contribute to the project:
- Getting Started - Installation, configuration, and initial usage
- Basic Operations - Common database operations and queries
- Troubleshooting - Solutions for common issues
- System Architecture - Overview of the Serengeti system architecture
- Architectural Diagram - Visual representation of the system architecture
- Component Interactions - How components interact with each other
- Design Decisions - Key design decisions and trade-offs
- Serengeti Core - The main Serengeti class and system initialization
- Storage System - Overview of the storage system
- Write-Ahead Logging - Crash recovery using Write-Ahead Logging
- Query Engine - How the query engine processes queries
- Network Component - Network communication between nodes
- Server Component - The server component that handles client requests
- Indexing System - Overview of the indexing system
- LSM Compaction - LSM storage engine compaction process
- Contributing Guide - How to contribute to Serengeti
- Changelog - History of changes to the project
For a complete list of documentation, see the Documentation Index.
Requirements:
- JDK 11 or higher
- Maven 3.6+ (for building from source)
- Network environment where nodes can discover each other (same subnet)
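Because automatic discovery depends on nodes sharing a subnet, it can help to confirm that two hosts actually do before deploying. The sketch below compares two IPv4 addresses under a given prefix length (for example /24); `SubnetCheck` is a hypothetical helper and says nothing about the discovery mechanism Serengeti uses internally.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

/** Hypothetical helper: do two IPv4 addresses share a subnet for a given prefix length? */
class SubnetCheck {
    static boolean sameSubnet(String a, String b, int prefixLength) {
        if (prefixLength < 0 || prefixLength > 32) {
            throw new IllegalArgumentException("prefix length must be between 0 and 32");
        }
        // Java masks int shift distances to 5 bits, so -1 << 32 would be -1; treat /0 separately.
        int mask = prefixLength == 0 ? 0 : -1 << (32 - prefixLength);
        return (toInt(a) & mask) == (toInt(b) & mask);
    }

    /** Parse an IPv4 literal into a 32-bit int (a hostname would trigger a DNS lookup). */
    private static int toInt(String ip) {
        byte[] bytes;
        try {
            bytes = InetAddress.getByName(ip).getAddress();
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException("unresolvable address: " + ip, e);
        }
        if (bytes.length != 4) {
            throw new IllegalArgumentException("IPv4 addresses only: " + ip);
        }
        return ((bytes[0] & 0xFF) << 24) | ((bytes[1] & 0xFF) << 16)
                | ((bytes[2] & 0xFF) << 8) | (bytes[3] & 0xFF);
    }

    public static void main(String[] args) {
        System.out.println(sameSubnet("192.168.1.10", "192.168.1.200", 24)); // true
        System.out.println(sameSubnet("192.168.1.10", "192.168.2.10", 24));  // false
    }
}
```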
Serengeti is ideal for:
- Microservices Architectures: Provide a distributed data layer for microservices
- Edge Computing: Deploy database capabilities at the edge with minimal configuration
- High-Availability Systems: Ensure data availability even during node failures
- Scalable Applications: Easily scale database capacity by adding nodes
- Development and Testing: Quickly spin up a distributed database for development and testing
Community and support:
- GitHub Issues - Report bugs or request features
- GitHub Discussions - Ask questions and discuss ideas
- Contributing - Learn how to contribute to the project
Serengeti is open-source software; see the LICENSE file in the repository for the license terms.