Releases: dagster-io/dagster
0.8.1
Bugfix
- Fixed a file descriptor leak that caused `OSError: [Errno 24] Too many open files` when enough temporary files were created.
- Fixed an issue where an empty config in the Playground would unexpectedly be marked as invalid YAML.
- Removed "config" deprecation warnings for dask and celery executors.
New
- Improved performance of the Assets page.
0.8.0 "In The Zone"
Major Changes
Please see the 080_MIGRATION.md migration guide for details on updating existing code to be compatible with 0.8.0.
- **Workspace, host and user process separation, and repository definition** Dagit and other tools no longer load a single repository containing user definitions such as pipelines into the same process as the framework code. Instead, they load a "workspace" that can contain multiple repositories sourced from a variety of different external locations (e.g., Python modules and Python virtualenvs, with containers and source control repositories soon to come).

  The repositories in a workspace are loaded into their own "user" processes distinct from the "host" framework process. Dagit and other tools now communicate with user code over an IPC mechanism. This architectural change has several advantages:

  - Dagit no longer needs to be restarted when there is an update to user code.
  - Users can use repositories to organize their pipelines, but still work on all of their repositories using a single running Dagit.
  - The Dagit process can now run in a separate Python environment from user code, so pipeline dependencies do not need to be installed into the Dagit environment.
  - Each repository can be sourced from a separate Python virtualenv, so teams can manage their dependencies (or even their own Python versions) separately.
  We have introduced a new file format, `workspace.yaml`, in order to support this new architecture. The workspace yaml encodes what repositories to load and their locations, and supersedes the `repository.yaml` file and associated machinery.

  As a consequence, Dagster internals are now stricter about how pipelines are loaded. If you have written scripts or tests in which a pipeline is defined and then passed across a process boundary (e.g., using the `multiprocess_executor` or dagstermill), you may now need to wrap the pipeline in the `reconstructable` utility function for it to be reconstructed across the process boundary.

  In addition, rather than instantiating the `RepositoryDefinition` class directly, users should now prefer the `@repository` decorator. As part of this change, the `@scheduler` and `@repository_partitions` decorators have been removed, and their functionality subsumed under `@repository`.
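  A minimal sketch of the new pattern (names here are illustrative): wrap a pipeline in `reconstructable` before sending it across a process boundary, and register definitions with the `@repository` decorator.

  ```python
  from dagster import pipeline, reconstructable, repository, solid

  @solid
  def say_hello(context):
      context.log.info("hello")

  @pipeline
  def hello_pipeline():
      say_hello()

  # Wrap the pipeline so it can be rebuilt on the other side of a process
  # boundary, e.g. when using the multiprocess executor or dagstermill.
  reconstructable_pipeline = reconstructable(hello_pipeline)

  @repository
  def my_repository():
      # Pipelines, schedules, and partition sets are all returned here,
      # replacing the removed @scheduler and @repository_partitions decorators.
      return [hello_pipeline]
  ```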
- **Dagit organization** The Dagit interface has changed substantially and is now oriented around pipelines. Within the context of each pipeline in an environment, the previous "Pipelines" and "Solids" tabs have been collapsed into the "Definition" tab; a new "Overview" tab provides summary information about the pipeline, its schedules, its assets, and recent runs; and the previous "Playground" tab has been moved within the context of an individual pipeline. Related runs (e.g., runs created by re-executing subsets of previous runs) are now grouped together in the Playground for easy reference. Dagit also now includes more advanced support for display of scheduled runs that may not have executed ("schedule ticks"), as well as longitudinal views over scheduled runs and asset-oriented views of historical pipeline runs.
- **Assets** Assets are named materializations that can be generated by your pipeline solids, which support specialized views in Dagit. For example, if we represent a database table with an asset key, we can now index all of the pipelines and pipeline runs that materialize that table, and view them in a single place. To use the asset system, you must enable an asset-aware storage such as Postgres.
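  For illustration, a hedged sketch of yielding a materialization with an asset key (the table and key path are invented for this example):

  ```python
  from dagster import AssetKey, Materialization, Output, solid

  @solid
  def update_users_table(context):
      # ... write the table to the warehouse here ...
      # The asset key lets Dagit index this run and this pipeline under
      # the "warehouse.users" asset.
      yield Materialization(
          label="users_table",
          asset_key=AssetKey(["warehouse", "users"]),
      )
      yield Output(None)
  ```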
- **Run launchers** The distinction between "starting" and "launching" a run has been effaced. All pipeline runs instigated through Dagit now make use of the `RunLauncher` configured on the Dagster instance, if one is configured. Additionally, run launchers can now support termination of previously launched runs. If you have written your own run launcher, you may want to update it to support termination. Note also that as of 0.7.9, the semantics of `RunLauncher.launch_run` have changed; this method now takes the `run_id` of an existing run and should no longer attempt to create the run in the instance.
- **Flexible re-execution** Pipeline re-execution from Dagit is now fully flexible. You may re-execute arbitrary subsets of a pipeline's execution steps, and the re-execution now appears in the interface as a child run of the original execution.
- **Support for historical runs** Snapshots of pipelines and other Dagster objects are now persisted along with pipeline runs, so that historical runs can be loaded for review with the correct execution plans even when pipeline code has changed. This prepares the system to be able to diff pipeline runs and other objects against each other.
- **Step launchers and expanded support for PySpark on EMR and Databricks** We've introduced a new `StepLauncher` abstraction that uses the resource system to allow individual execution steps to be run in separate processes (and thus on separate execution substrates). This has made extensive improvements to our PySpark support possible, including the option to execute individual PySpark steps on EMR using the `EmrPySparkStepLauncher` and on Databricks using the `DatabricksPySparkStepLauncher`. The `emr_pyspark` example demonstrates how to use a step launcher.
- **Clearer names** What was previously known as the environment dictionary is now called the `run_config`, and the previous `environment_dict` argument to APIs such as `execute_pipeline` is now deprecated. We renamed this argument to focus attention on the configuration of the run being launched or executed, rather than on an ambiguous "environment". We've also renamed the `config` argument to all use definitions to be `config_schema`, which should reduce ambiguity between the configuration schema and the value being passed in some particular case. We've also consolidated and improved documentation of the valid types for a config schema.
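  A short sketch of the renames (solid and pipeline names are illustrative):

  ```python
  from dagster import execute_pipeline, pipeline, solid

  @solid(config_schema={"greeting": str})  # formerly the `config` argument
  def greet(context):
      context.log.info(context.solid_config["greeting"])

  @pipeline
  def greeting_pipeline():
      greet()

  # formerly `environment_dict=...`
  execute_pipeline(
      greeting_pipeline,
      run_config={"solids": {"greet": {"config": {"greeting": "hello"}}}},
  )
  ```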
- **Lakehouse** We're pleased to introduce Lakehouse, an experimental, alternative programming model for data applications, built on top of Dagster core. Lakehouse allows developers to define data applications in terms of data assets, such as database tables or ML models, rather than in terms of the computations that produce those assets. The `simple_lakehouse` example gives a taste of what it's like to program in Lakehouse. We'd love feedback on whether this model is helpful!
- **Airflow ingest** We've expanded the tooling available to teams with existing Airflow installations that are interested in incrementally adopting Dagster. Previously, we provided only injection tools that allowed developers to write Dagster pipelines and then compile them into Airflow DAGs for execution. We've now added ingestion tools that allow teams to move to Dagster for execution without having to rewrite all of their legacy pipelines in Dagster. In this approach, Airflow DAGs are kept in their own container/environment, compiled into Dagster pipelines, and run via the Dagster orchestrator. See the `airflow_ingest` example for details!
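  As a hedged sketch of the ingestion direction, assuming the `make_dagster_pipeline_from_airflow_dag` helper from dagster-airflow and a pre-existing DAG object (consult the `airflow_ingest` example for the exact API):

  ```python
  from dagster_airflow import make_dagster_pipeline_from_airflow_dag

  from my_airflow_project.dags import my_legacy_dag  # hypothetical existing Airflow DAG

  # Compile the legacy Airflow DAG into a Dagster pipeline so it can be
  # executed by the Dagster orchestrator without a rewrite.
  ingested_pipeline = make_dagster_pipeline_from_airflow_dag(my_legacy_dag)
  ```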
Breaking Changes
- dagster
  - The `@scheduler` and `@repository_partitions` decorators have been removed. Instances of `ScheduleDefinition` and `PartitionSetDefinition` belonging to a repository should be specified using the `@repository` decorator instead.
  - Support for the Dagster solid selection DSL, previously introduced in Dagit, is now uniform throughout the Python codebase, with the previous `solid_subset` arguments (`--solid-subset` in the CLI) being replaced by `solid_selection` (`--solid-selection`). In addition to the names of individual solids, this argument now supports selection queries like `*solid_name++` (i.e., `solid_name`, all of its ancestors, its immediate descendants, and their immediate descendants).
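    For illustration, a hedged sketch of the renamed argument (the pipeline is assumed to be defined elsewhere):

    ```python
    from dagster import execute_pipeline

    # my_pipeline: a PipelineDefinition defined elsewhere.
    # Previously: execute_pipeline(my_pipeline, solid_subset=["solid_name"])
    execute_pipeline(
        my_pipeline,
        solid_selection=["*solid_name++"],  # solid_name, its ancestors, and
                                            # two generations of descendants
    )
    ```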
  - The built-in Dagster type `Path` has been removed.
  - `PartitionSetDefinition` names, including those defined by a `PartitionScheduleDefinition`, must now be unique within a single repository.
  - Asset keys are now sanitized for non-alphanumeric characters. All characters besides alphanumerics and `_` are treated as path delimiters. Asset keys can also be specified using `AssetKey`, which accepts a list of strings as an explicit path. If you are running 0.7.10 or later and using assets, you may need to migrate your historical event log data for asset keys from previous runs to be attributed correctly. This `event_log` data migration can be invoked as follows:

    ```python
    from dagster import DagsterInstance
    from dagster.core.storage.event_log.migration import migrate_event_log_data

    migrate_event_log_data(instance=DagsterInstance.get())
    ```
  - The interface of the `Scheduler` base class has changed substantially. If you've written a custom scheduler, please get in touch!
  - The partitioned schedule decorators now generate `PartitionSetDefinition` names using the schedule name, suffixed with `_partitions`.
  - The `repository` property on `ScheduleExecutionContext` is no longer available. If you were using this property to pass to `Scheduler` instance methods, this interface has changed significantly. Please see the `Scheduler` class documentation for details.
  - The CLI option `--celery-base-priority` is no longer available for the command `dagster pipeline backfill`. Use the tags option to specify the celery priority, e.g. `dagster pipeline backfill my_pipeline --tags '{ "dagster-celery/run_priority": 3 }'`.
  - The `execute_partition_set` API has been removed.
  - The deprecated `is_optional` parameter to `Field` and `OutputDefinition` has been removed. Use `is_required` instead, as in the sketch below.
...
0.7.16
0.7.15
0.7.14
New
- Dagit now allows re-executing an arbitrary step subset via step selector syntax, regardless of whether the previous pipeline run failed or not.
- Added a search filter for the root Assets page.
- Added tooltip explanations for disabled run actions.
- The last output of the cron job command created by the scheduler is now stored in a file. A new `dagster schedule logs {schedule_name}` command will show the log file for a given schedule. This helps uncover errors like missing environment variables and import errors.
- The Dagit schedule page will now show inconsistency errors between the schedule state and the crontab that were previously only displayed by the `dagster schedule debug` command. As before, these errors can be resolved using `dagster schedule up`.
Bugfix
- Fixed an issue with config schema validation on Arrays.
- Fixed an issue with initializing `K8sRunLauncher` when configured via `dagster.yaml`.
- Fixed a race condition in the Airflow injection logic that occurred when multiple Operators tried to create `PipelineRun` entries simultaneously.
- Fixed an issue where schedules with invalid config did not log the appropriate error.
0.7.13
Breaking Changes
- The `dagster pipeline backfill` command no longer takes a `mode` flag. Instead, it uses the mode specified on the `PartitionSetDefinition`. Similarly, the runs created from the backfill also use the `solid_subset` specified on the `PartitionSetDefinition`.
Bugfix
- Fixed a bug where using solid subsets when launching pipeline runs would fail config validation.
- (dagster-gcp) Allow multiple "bq_solid_for_queries" solids to co-exist in a pipeline.
- Improved scheduler state reconciliation with the dagster-cron scheduler. The `dagster schedule debug` command will display issues related to missing cron jobs, extraneous cron jobs, and duplicate cron jobs. Running `dagster schedule up` will fix any issues.
New
- The dagster-airflow package now supports loading Airflow DAGs without depending on an initialized Airflow database.
- Improvements to the longitudinal partitioned schedule view, including live updates, run filtering, and better default states.
- Added a user warning for dagster library packages that are out of sync with the core `dagster` package.
0.7.12
Bugfix
- We now only render the subset of an execution plan that has actually executed, and persist that subset information along with the snapshot.
- `@pipeline` and `@composite_solid` now correctly capture the docstring from the function they decorate.
- Fixed a bug with using solid subsets in the Dagit playground.
0.7.11
Bugfix
- Fixed an issue with strict snapshot ID matching when loading historical snapshots, which caused errors on the Runs page when viewing historical runs.
- Fixed an issue where `dagster_celery` had introduced a spurious dependency on `dagster_k8s` (#2435).
- Fixed an issue where our Airflow, Celery, and Dask integrations required S3 or GCS storage and prevented use of filesystem storage. Filesystem storage is now also permitted, to enable use of these integrations with distributed filesystems like NFS (#2436).
0.7.10
New
- `RepositoryDefinition` now takes `schedule_defs` and `partition_set_defs` directly. The loading scheme for these definitions via `repository.yaml` under the `scheduler:` and `partitions:` keys is deprecated and expected to be removed in 0.8.0.
- Marked published modules as Python 3.8 compatible.
- The dagster-airflow package supports loading all Airflow DAGs within a directory path, file path, or Airflow DagBag.
- The dagster-airflow package supports loading all 23 DAGs in the Airflow example_dags folder and execution of 17 of them (see `make_dagster_repo_from_airflow_example_dags`).
- The dagster-celery CLI tools now allow you to pass additional arguments through to the underlying celery CLI, e.g., running `dagster-celery worker start -n my-worker -- --uid=42` will pass the `--uid` flag to celery.
- It is now possible to create a `PresetDefinition` that has no environment defined.
- Added a `dagster schedule debug` command to help debug scheduler state.
- The `SystemCronScheduler` now verifies that a cron job has been successfully added to the crontab when turning a schedule on, and shows an error message if unsuccessful.
Breaking Changes
- A `dagster instance migrate` is required for this release to support the new experimental assets view.
- Runs created prior to 0.7.8 will no longer render their execution plans as DAGs. We are only rendering execution plans that have been persisted. Logs are still available.
- `Path` is no longer valid in config schemas. Use `str` or `dagster.String` instead.
- Removed the `@pyspark_solid` decorator. Its functionality, which was experimental, is subsumed by requiring a `StepLauncher` resource (e.g. `emr_pyspark_step_launcher`) on the solid.
Dagit
- Merged "re-execute", "single-step re-execute", "resume/retry" buttons into one "re-execute" button
with three dropdown selections on the Run page.
Experimental
- Added a new `asset_key` string parameter to Materializations and created a new "Assets" tab in Dagit to view pipelines and runs associated with these keys. The API and UI of these asset-based views are likely to change, but feedback is welcome and will be used to inform these changes.
- Added an `emr_pyspark_step_launcher` that enables launching PySpark solids in EMR. The "simple_pyspark" example demonstrates how it's used; see the sketch below.
Bugfix
- Fixed an issue when running Jupyter notebooks in a Python 2 kernel through dagstermill with dagster running in Python 3.
- Improved error messages produced when dagstermill spins up an in-notebook context.
- Fixed an issue with retrieving step events from `CompositeSolidResult` objects.
0.7.9
Breaking Changes
- If you are launching runs using `DagsterInstance.launch_run`, this method now takes a run id instead of an instance of `PipelineRun`. Additionally, `DagsterInstance.create_run` and `DagsterInstance.create_empty_run` have been replaced by `DagsterInstance.get_or_create_run` and `DagsterInstance.create_run_for_pipeline` (see the sketch after this list).
- If you have implemented your own `RunLauncher`, there are two required changes:
  - `RunLauncher.launch_run` takes a pipeline run that has already been created. You should remove any calls to `instance.create_run` in this method.
  - Instead of calling `startPipelineExecution` (defined in `dagster_graphql.client.query.START_PIPELINE_EXECUTION_MUTATION`) in the run launcher, you should call `startPipelineExecutionForCreatedRun` (defined in `dagster_graphql.client.query.START_PIPELINE_EXECUTION_FOR_CREATED_RUN_MUTATION`).
  - Refer to the `RemoteDagitRunLauncher` for an example implementation.
New
- Improvements to preset and solid subselection in the playground: an inline preview of the pipeline instead of a modal when doing subselection, and the correct subselection is chosen when selecting a preset.
- Improvements to log searching: tokenization and autocompletion for searching message types and for specific steps.
- You can now view the structure of pipelines from historical runs, even if that pipeline no longer exists in the loaded repository or has changed structure.
- Historical execution plans are now viewable, even if the pipeline has changed structure.
- Added a metadata link to raw compute logs for all StepStart events in the PipelineRun view and Step view.
- Improved error handling for the scheduler. If a scheduled run has config errors, the errors are persisted to the event log for the run and can be viewed in Dagit.
Bugfix
- We no longer manually dispose of the SQLAlchemy engine in dagster-postgres.
- Made the boto3 dependency in dagster-aws more flexible (#2418).
- Fixed tooltip UI cleanup in the partitioned schedule view.
Documentation
- Brand new documentation site, available at https://docs.dagster.io.
- The tutorial has been restructured into multiple sections, and the examples in intro_tutorial have been rearranged into separate folders to reflect this.