..  Licensed to the Apache Software Foundation (ASF) under one
    or more contributor license agreements.  See the NOTICE file
    distributed with this work for additional information
    regarding copyright ownership.  The ASF licenses this file
    to you under the Apache License, Version 2.0 (the
    "License"); you may not use this file except in compliance
    with the License.  You may obtain a copy of the License at

..    http://www.apache.org/licenses/LICENSE-2.0

..  Unless required by applicable law or agreed to in writing,
    software distributed under the License is distributed on an
    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
    KIND, either express or implied.  See the License for the
    specific language governing permissions and limitations
    under the License.

===============
Testing PySpark
===============

To run PySpark tests, first build Spark itself with Maven or SBT. For example:

.. code-block:: bash

    build/mvn -DskipTests clean package

.. code-block:: bash

    build/sbt -Phive clean package


After that, run the PySpark test cases with ``python/run-tests``. For example:

.. code-block:: bash

    python/run-tests --python-executable=python3

Note that if you are running tests on macOS, you may need to set the ``OBJC_DISABLE_INITIALIZE_FORK_SAFETY`` environment variable to ``YES``.

Please see the guidance on how to |building_spark|_,
`run tests for a module, or individual tests <https://spark.apache.org/developer-tools.html>`_.


Running Individual PySpark Tests
--------------------------------

You can run a specific test with ``python/run-tests``, for example:

.. code-block:: bash

    python/run-tests --testnames pyspark.sql.tests.test_arrow

Please refer to `Testing PySpark <https://spark.apache.org/developer-tools.html>`_ for more details.

``breakpoint()`` Support in PySpark Tests
-----------------------------------------

To debug a specific test, add ``breakpoint()`` to the test code and run the test with
``python/run-tests`` as usual. Execution stops at the ``breakpoint()`` line and opens an
interactive ``pdb`` debugging session.
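
As a minimal sketch of where such a call might go, the hypothetical test below pauses just before
its assertion. The helper ``normalize_names`` and the test case ``NormalizeNamesTests`` are
illustrative names, not part of PySpark, and the example uses plain ``unittest`` without a Spark
session so that it stays self-contained.

.. code-block:: python

    import unittest


    def normalize_names(names):
        """Hypothetical helper under test: trim whitespace and lowercase each name."""
        return [name.strip().lower() for name in names]


    class NormalizeNamesTests(unittest.TestCase):  # hypothetical test case
        def test_normalize(self):
            result = normalize_names(["  Alice", "BOB  "])
            # Execution pauses here and a pdb prompt opens, so `result`
            # can be inspected before the assertion runs.
            breakpoint()
            self.assertEqual(result, ["alice", "bob"])

At the ``pdb`` prompt, standard commands such as ``p result`` (print a value), ``n`` (run the next
line), and ``c`` (continue execution) are available.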


Running Tests using GitHub Actions
----------------------------------

You can run the full PySpark test suite in your own forked GitHub repository
with a few clicks by using GitHub Actions. Please refer to
`Running tests in your forked repository using GitHub Actions <https://spark.apache.org/developer-tools.html>`_ for more details.


Running Tests for Spark Connect
-------------------------------

Running Tests for Python Client
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To test changes to the Protobuf definitions, for example those at
`spark/sql/connect/common/src/main/protobuf/spark/connect <https://github.com/apache/spark/tree/master/sql/connect/common/src/main/protobuf/spark/connect>`_,
first regenerate the Python Protobuf client by running ``dev/connect-gen-protos.sh``.


Running PySpark Shell with Python Client
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The command below automatically starts a local Spark Connect server and creates a Spark Connect client connected to it.

.. code-block:: bash

    bin/pyspark --remote "local[*]"

