*artifex* /ˈar.ti.feks/, [ˈärt̪ɪfɛks̠] 1. artist, actor 2. author, maker 3. craftsman 4. master of an art 5. mastermind
Data Artifex is a Python-based open-source ecosystem for elevating data into API-powered machine-actionable knowledge.
This project is in an early incubation phase.
Our vision is to:
- Foster the creation of comprehensive data documentation that is equally accessible to humans and machines
- Facilitate the rapid publication of data and associated metadata through APIs
- Unleash data-driven machine intelligence
- Reduce time spent data wrangling
- Support the adoption of standards and best practices
- Enable natural language-driven data management
This will broadly impact and modernize how we publish, discover, access, and utilize data.
A simple view of the workflow we aim to automate is illustrated below:
To achieve this, we are building a collection of open-source packages powered by metadata standards, knowledge graphs, intelligent agents, and APIs, in collaboration with data custodians, research communities, developers, and other stakeholders.
The all-too-common practice of publishing data as downloadable files or in traditional databases, with little documentation and no APIs, is a flawed approach. It leaves users spending most of their time wrangling data and prevents machines from understanding or acting intelligently on it.
We aim to address this by building open-source tools that promote metadata/API-first data management practices.
In such an environment:
- Metadata (digital documentation) always exists and surrounds the data, unlocking machine intelligence
- Users interact with the data through intuitive interfaces or natural language
- Applications, developers, and data scientists interact with APIs (a minimal sketch follows this list)
- The data and metadata are managed in the back-end by agents
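To make the API-facing side of this picture concrete, here is a minimal sketch of a metadata-first endpoint. It is not part of Data Artifex; the `catalog` structure, route paths, and field names are illustrative assumptions, and FastAPI is used only as a familiar Python example.

```python
# Hypothetical sketch of a metadata-first API endpoint (illustrative only;
# the catalog contents and routes are assumptions, not Data Artifex APIs).
from fastapi import FastAPI, HTTPException

app = FastAPI(title="Example metadata-first data service")

# In a metadata/API-first setup, the documentation record exists alongside
# the data and is served through the same API surface.
catalog = {
    "example-dataset": {
        "title": "Example household survey",
        "description": "Illustrative dataset record with machine-readable documentation.",
        "license": "https://creativecommons.org/licenses/by/4.0/",
        "variables": [{"name": "age", "type": "integer", "unit": "years"}],
    }
}

@app.get("/datasets/{dataset_id}/metadata")
def get_metadata(dataset_id: str):
    """Return the documentation (metadata) that surrounds the data."""
    record = catalog.get(dataset_id)
    if record is None:
        raise HTTPException(status_code=404, detail="Unknown dataset")
    return record
```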
We're looking towards a future where managing data is as easy as talking to a computer in everyday language. This means data custodians and non-technical users won't have to worry about the complexities of implementing APIs or managing metadata.
Note that our focus is on High-Value Datasets (HVDs): datasets with substantial potential to benefit society, contribute to humanitarian efforts, and address global challenges (socio-economic, health, environment, AI, etc.). The complexities surrounding such data make user- and machine-friendly data and APIs all the more important. This focus does not preclude using the tools with other kinds of datasets.
Our technical approach is not to reinvent the wheel but to fill gaps and provide new ways of working: enabling the best practices advocated by data custodians and research communities, and empowering computer systems with data intelligence.
We envision our open-source ecosystem as a collection of small specialized tools that can be used in isolation but, most importantly, can come together in a well-orchestrated manner to automate the data to API workflow and facilitate the creation and maintenance of metadata.
These will work hand in hand with existing data technologies, such as databases and API frameworks, as well as harness recent developments in artificial intelligence.
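As a rough illustration of how small, single-purpose steps could compose into a data-to-API workflow, the sketch below chains a file profiler with an OpenAPI stub generator. The function names and outputs are hypothetical, not Data Artifex tools or APIs.

```python
# Purely illustrative composition of small steps into a data-to-API workflow.
# Function names are hypothetical assumptions, not Data Artifex APIs.
import csv
from pathlib import Path


def profile_csv(path: Path) -> dict:
    """Inspect a CSV file and derive a minimal structural description."""
    with path.open(newline="") as f:
        header = next(csv.reader(f), [])
    return {"file": path.name, "columns": [{"name": c} for c in header]}


def to_openapi_stub(profile: dict) -> dict:
    """Turn the profile into a skeleton OpenAPI document for serving the data."""
    return {
        "openapi": "3.0.3",
        "info": {"title": f"API for {profile['file']}", "version": "0.1.0"},
        "paths": {
            "/records": {
                "get": {
                    "summary": "List records",
                    "responses": {"200": {"description": "OK"}},
                }
            }
        },
    }


if __name__ == "__main__":
    # Assumes a local example.csv exists; used only to demonstrate the chaining.
    spec = to_openapi_stub(profile_csv(Path("example.csv")))
    print(spec["info"]["title"])
```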
Standards and best practices are central to our strategy. We are actively involved in the CODATA Cross-Domain Interoperability Framework, focusing on creating guidelines for domain-agnostic standards supporting the implementation of interoperability and reusability of FAIR data.
Guided by the FAIR principles and the W3C Data on the Web Best Practices, our tools make use of specifications such as the Data Documentation Initiative (DDI), schema.org, Data Catalog Vocabulary (DCAT), Research Object Crates (RO-Crates), and Open Digital Rights Language (ODRL).
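As a small example of how such vocabularies are applied, the snippet below assembles a minimal schema.org `Dataset` description as JSON-LD. The field values are placeholders, not a prescribed Data Artifex record.

```python
# Minimal schema.org "Dataset" description serialized as JSON-LD.
# The values are placeholders; real records would be far richer.
import json

dataset_description = {
    "@context": "https://schema.org/",
    "@type": "Dataset",
    "name": "Example high-value dataset",
    "description": "A small illustrative dataset record.",
    "license": "https://creativecommons.org/licenses/by/4.0/",
    "keywords": ["example", "open data"],
    "distribution": [
        {
            "@type": "DataDownload",
            "encodingFormat": "text/csv",
            "contentUrl": "https://example.org/data/example.csv",
        }
    ],
}

print(json.dumps(dataset_description, indent=2))
```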
On the information technology side, we build upon LinkML, OpenAPI, GraphQL, JSON Schema, and semantic web standards.
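For instance, a JSON Schema can be used to validate a metadata record before publication. The sketch below uses the widely available `jsonschema` package with an ad hoc schema for illustration; it is not a Data Artifex specification.

```python
# Validate a metadata record against an ad hoc JSON Schema (illustrative only).
from jsonschema import ValidationError, validate

schema = {
    "type": "object",
    "required": ["name", "license"],
    "properties": {
        "name": {"type": "string"},
        "license": {"type": "string", "format": "uri"},
        "keywords": {"type": "array", "items": {"type": "string"}},
    },
}

record = {
    "name": "Example dataset",
    "license": "https://creativecommons.org/licenses/by/4.0/",
}

try:
    validate(instance=record, schema=schema)
    print("Record is valid")
except ValidationError as err:
    print(f"Invalid record: {err.message}")
```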
The project is made possible with the technical and financial support of Postman Open Technologies.



For more information, contact Pascal Heus ([email protected]).