spark-operator: Remove docs associated with sparkctl #4089

Open · wants to merge 1 commit into `master`
`content/en/docs/components/spark-operator/developer-guide.md` (14 changes: 6 additions, 8 deletions)
Dependencies will be automatically downloaded locally to the `bin` directory as needed.
To see the full list of available targets, run the following command:

```bash
$ make help

Usage:
make <target>
Development
go-clean Clean up caches and output.
go-fmt Run go fmt against code.
go-vet Run go vet against code.
go-lint Run golangci-lint linter.
go-lint-fix Run golangci-lint linter and perform fixes.
unit-test Run unit tests.
e2e-test Run the e2e tests against a Kind k8s instance that is spun up.

Build
build-operator Build Spark operator.
clean Clean binaries.
build-api-docs Build api documentation.
docker-build Build docker image with the operator.
docker-push Push docker image with the operator.
Helm
Deployment
kind-create-cluster Create a kind cluster for integration tests.
kind-load-image Load the image into the kind cluster.
kind-delete-cluster Delete the created kind cluster.
install-crd Install CRDs into the K8s cluster specified in ~/.kube/config.
uninstall-crd Uninstall CRDs from the K8s cluster specified in ~/.kube/config. Call with ignore-not-found=true to ignore resource not found errors during deletion.
deploy Deploy controller to the K8s cluster specified in ~/.kube/config.
undeploy Uninstall spark-operator.

Dependencies
kustomize Download kustomize locally if necessary.
```
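Taken together, a typical development loop might look like the following sketch (the target names come from the help output above; the exact sequence is illustrative):

```bash
# Format, vet, and lint the Go sources
$ make go-fmt go-vet go-lint

# Run the unit tests
$ make unit-test

# Create a local kind cluster, load the operator image, and deploy it
$ make kind-create-cluster
$ make kind-load-image
$ make deploy

# Tear everything down when done
$ make undeploy
$ make kind-delete-cluster
```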
`content/en/docs/components/spark-operator/overview/_index.md` (9 changes: 1 addition, 8 deletions)
The Kubernetes Operator for Apache Spark currently supports the following list of features:
- Supports automatic application re-submission for updated `SparkApplication` objects with updated specification.
- Supports automatic application restart with a configurable restart policy.
- Supports automatic retries of failed submissions with optional linear back-off.
- Supports collecting and exporting application-level metrics and driver/executor metrics to Prometheus.

## Architecture
The operator consists of:
- a *submission runner* that runs `spark-submit` for submissions received from the controller,
- a *Spark pod monitor* that watches for Spark pods and sends pod status updates to the controller,
- a [Mutating Admission Webhook](https://kubernetes.io/docs/reference/access-authn-authz/extensible-admission-controllers/) that handles customizations for Spark driver and executor pods based on the annotations on the pods added by the controller.

The following diagram shows how different components interact and work together.

<img src="architecture-diagram.png"
alt="Spark Operator Architecture Diagram"
class="mt-3 mb-3 border rounded">

Specifically, a user uses `kubectl` to create a `SparkApplication` object. The `SparkApplication` controller receives the object through a watcher from the API server, creates a submission carrying the `spark-submit` arguments, and sends the submission to the *submission runner*. The submission runner submits the application to run and creates the driver pod of the application. Upon starting, the driver pod creates the executor pods. While the application is running, the *Spark pod monitor* watches the pods of the application and sends status updates of the pods back to the controller, which then updates the status of the application accordingly.
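From the user's side, that flow can be sketched as follows (the manifest file name and the application name `spark-pi` are placeholders):

```bash
# Create the SparkApplication object; the controller picks it up via its watcher
$ kubectl apply -f spark-pi.yaml

# Inspect the status that the Spark pod monitor feeds back to the controller
$ kubectl get sparkapplication spark-pi -o yaml
```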

## The CRD Controller

When the operator decides to restart an application, it cleans up the Kubernetes resources associated with the previous run of the application.
## Mutating Admission Webhook

The operator comes with an optional mutating admission webhook for customizing Spark driver and executor pods based on certain annotations on the pods added by the CRD controller. The annotations are set by the operator based on the application specifications. All Spark pod customization needs, except for those natively supported by Spark on Kubernetes, are handled by the mutating admission webhook.

description: "Using SparkApplications"
weight: 10
---

The operator runs Spark applications specified in Kubernetes objects of the `SparkApplication` custom resource type. The most common way of using a `SparkApplication` is to store the `SparkApplication` specification in a YAML file and use the `kubectl` command to work with the `SparkApplication`. The operator automatically submits the application as configured in a `SparkApplication` to run on the Kubernetes cluster and uses the `SparkApplication` to collect and surface the status of the driver and executors to the user.

As with all other Kubernetes API objects, a `SparkApplication` needs the `apiVersion`, `kind`, and `metadata` fields. For general information about working with manifests, see [object management using kubectl](https://kubernetes.io/docs/concepts/overview/object-management-kubectl/overview/).
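As a minimal sketch, a skeleton manifest with just those required fields might look like this (the name and namespace are placeholders, and `spec` is left empty for brevity; `sparkoperator.k8s.io/v1beta2` is the API group and version served by the operator's CRD):

```shell
# Write a minimal SparkApplication skeleton to a file; the metadata values
# are placeholders and spec is intentionally left empty here.
cat > /tmp/spark-app-skeleton.yaml <<'EOF'
apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: my-spark-app
  namespace: default
spec: {}
EOF

# The file can then be managed like any other Kubernetes object, e.g.:
#   kubectl apply -f /tmp/spark-app-skeleton.yaml
cat /tmp/spark-app-skeleton.yaml
```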

Expand Down

## Creating a New SparkApplication

A `SparkApplication` can be created from a YAML file storing the `SparkApplication` specification using the `kubectl apply -f <YAML file path>` command. Once a `SparkApplication` is successfully created, the operator receives it and submits the application as configured in the specification to run on the Kubernetes cluster. Please note that the operator submits the `SparkApplication` in cluster mode only.

## Deleting a SparkApplication

A `SparkApplication` can be deleted using the `kubectl delete <name>` command. Deleting a `SparkApplication` deletes the Spark application associated with it. If the application is running when the deletion happens, the application is killed and all Kubernetes resources associated with the application are deleted or garbage collected.
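For example, to delete an application named `spark-pi` (a placeholder) in the `default` namespace, with the resource type spelled out in full:

```bash
$ kubectl delete sparkapplication spark-pi -n default
```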

## Updating a SparkApplication
