*artifex* /ˈar.ti.feks/, [ˈärt̪ɪfɛks̠] 1. artist, actor 2. author, maker 3. craftsman 4. master of an art 5. mastermind
Data Artifex is a Python-based open-source ecosystem for elevating data into API-powered machine-actionable knowledge.
This project is in an early incubation phase.
Our vision is to:
- Foster the creation of comprehensive data documentation that is equally accessible to humans and machines
- Facilitate the rapid publication of data and associated metadata through APIs
- Unleash data-driven machine intelligence
- Reduce time spent data wrangling
- Support the adoption of standards and best practices
- Enable natural language-driven data management
This will broadly impact and modernize how we publish, discover, access, and utilize data.
A simple view of the workflow we aim to automate is illustrated below:
To achieve this, we are building a collection of open-source packages powered by metadata standards, knowledge graphs, intelligent agents, and APIs, in collaboration with data custodians, research communities, developers, and other stakeholders.
The all-too-common practice of publishing data as downloadable files or in traditional databases, with little documentation and no APIs, is a flawed approach. It leaves users spending most of their time wrangling data and prevents machines from understanding or acting intelligently on it.
We aim to address this by building open-source tools that promote metadata/API-first data management practices.
In such an environment:
- Metadata (digital documentation) always exists and surrounds the data, unlocking machine intelligence
- Users interact with the data through intuitive interfaces or natural language
- Applications, developers, and data scientists interact with APIs (a minimal sketch follows this list)
- The data and metadata are managed in the back-end by agents
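To make the API-facing side of this picture concrete, here is a minimal sketch of a metadata-first endpoint. It is not part of Data Artifex; the `catalog` structure, route paths, and field names are illustrative assumptions, and FastAPI is used only as a familiar Python example.

```python
# Hypothetical sketch of a metadata-first API endpoint (illustrative only;
# the catalog contents and routes are assumptions, not Data Artifex APIs).
from fastapi import FastAPI, HTTPException

app = FastAPI(title="Example metadata-first data service")

# In a metadata/API-first setup, the documentation record exists alongside
# the data and is served through the same API surface.
catalog = {
    "example-dataset": {
        "title": "Example household survey",
        "description": "Illustrative dataset record with machine-readable documentation.",
        "license": "https://creativecommons.org/licenses/by/4.0/",
        "variables": [{"name": "age", "type": "integer", "unit": "years"}],
    }
}

@app.get("/datasets/{dataset_id}/metadata")
def get_metadata(dataset_id: str):
    """Return the documentation (metadata) that surrounds the data."""
    record = catalog.get(dataset_id)
    if record is None:
        raise HTTPException(status_code=404, detail="Unknown dataset")
    return record
```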
We're looking towards a future where managing data is as easy as talking to a computer in everyday language. This means data custodians and non-technical users won't have to worry about the complexities of implementing APIs or managing metadata.
Note that our focus is on High-Value Datasets (HVDs): datasets with substantial potential to benefit society, contribute to humanitarian efforts, and address global challenges (socio-economic, health, environment, AI, etc.). The complexities surrounding such data make user- and machine-friendly data and APIs all the more important. This focus does not preclude using the tools with other kinds of datasets.
Our technical approach is not to reinvent the wheel but to fill gaps and provide new ways of working: enabling the best practices advocated by data custodians and research communities, and empowering computer systems with data intelligence.
We envision our open-source ecosystem as a collection of small specialized tools that can be used in isolation but, most importantly, can come together in a well-orchestrated manner to automate the data to API workflow and facilitate the creation and maintenance of metadata.
These will work hand in hand with existing data technologies, such as databases and API frameworks, as well as harness recent developments in artificial intelligence.
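As a rough illustration of how small, single-purpose steps could compose into a data-to-API workflow, the sketch below chains a file profiler with an OpenAPI stub generator. The function names and outputs are hypothetical, not Data Artifex tools or APIs.

```python
# Purely illustrative composition of small steps into a data-to-API workflow.
# Function names are hypothetical assumptions, not Data Artifex APIs.
import csv
from pathlib import Path


def profile_csv(path: Path) -> dict:
    """Inspect a CSV file and derive a minimal structural description."""
    with path.open(newline="") as f:
        header = next(csv.reader(f), [])
    return {"file": path.name, "columns": [{"name": c} for c in header]}


def to_openapi_stub(profile: dict) -> dict:
    """Turn the profile into a skeleton OpenAPI document for serving the data."""
    return {
        "openapi": "3.0.3",
        "info": {"title": f"API for {profile['file']}", "version": "0.1.0"},
        "paths": {
            "/records": {
                "get": {
                    "summary": "List records",
                    "responses": {"200": {"description": "OK"}},
                }
            }
        },
    }


if __name__ == "__main__":
    # Assumes a local example.csv exists; used only to demonstrate the chaining.
    spec = to_openapi_stub(profile_csv(Path("example.csv")))
    print(spec["info"]["title"])
```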
Standards and best practices are central to our strategy. We are actively involved in the CODATA Cross-Domain Interoperability Framework, focusing on creating guidelines for domain-agnostic standards supporting the implementation of interoperability and reusability of FAIR data.
Guided by the FAIR principles and the W3C Data on the Web Best Practices, our tools make use of specifications such as the Data Documentation Initiative (DDI), schema.org, Data Catalog Vocabulary (DCAT), Research Object Crates (RO-Crates), and Open Digital Rights Language (ODRL).
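As a small example of how such vocabularies are applied, the snippet below assembles a minimal schema.org `Dataset` description as JSON-LD. The field values are placeholders, not a prescribed Data Artifex record.

```python
# Minimal schema.org "Dataset" description serialized as JSON-LD.
# The values are placeholders; real records would be far richer.
import json

dataset_description = {
    "@context": "https://schema.org/",
    "@type": "Dataset",
    "name": "Example high-value dataset",
    "description": "A small illustrative dataset record.",
    "license": "https://creativecommons.org/licenses/by/4.0/",
    "keywords": ["example", "open data"],
    "distribution": [
        {
            "@type": "DataDownload",
            "encodingFormat": "text/csv",
            "contentUrl": "https://example.org/data/example.csv",
        }
    ],
}

print(json.dumps(dataset_description, indent=2))
```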
On the information technology side, we build upon LinkML, OpenAPI, GraphQL, JSON Schema, and semantic web standards.
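For instance, a JSON Schema can be used to validate a metadata record before publication. The sketch below uses the widely available `jsonschema` package with an ad hoc schema for illustration; it is not a Data Artifex specification.

```python
# Validate a metadata record against an ad hoc JSON Schema (illustrative only).
from jsonschema import ValidationError, validate

schema = {
    "type": "object",
    "required": ["name", "license"],
    "properties": {
        "name": {"type": "string"},
        "license": {"type": "string", "format": "uri"},
        "keywords": {"type": "array", "items": {"type": "string"}},
    },
}

record = {
    "name": "Example dataset",
    "license": "https://creativecommons.org/licenses/by/4.0/",
}

try:
    validate(instance=record, schema=schema)
    print("Record is valid")
except ValidationError as err:
    print(f"Invalid record: {err.message}")
```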
The project is made possible with the technical and financial support of Postman Open Technologies.



For more information, contact Pascal Heus ([email protected]).