diff --git a/README.org b/README.org
new file mode 100644
index 0000000..3fe01dc
--- /dev/null
+++ b/README.org
@@ -0,0 +1,69 @@
+* Development dependencies
+| Software                 | Version  | Comment |
+|--------------------------+----------+--------------------------------------------------------------------------------------------|
+| Emacs                    | 29.2     | main development environment |
+| Python                   | 3.11.6   | works with PySpark >= 3.4.0 [[https://stackoverflow.com/questions/75048688/picklingerror-could-not-serialize-object-indexerror-tuple-index-out-of-range][(see discussion)]] |
+| python-pyspark           | 3.4.0    | Python API for Spark (large-scale data processing library) |
+| python-py4j              | 0.10.9.7 | enables Python programs to dynamically access Java (dependency of PySpark) |
+| python-pandas            | 2.0.2    | Python data analysis library |
+| python-pyarrow           | 15.0.0   | bindings to Apache Arrow (dependency of PySpark) |
+| python-tabulate          | 0.9.0    | needed to convert dataframes into org-table format |
+| Java Runtime Environment | 17.0.10  | newer versions do not work with PySpark 3.4.0 |
+| PYNT (Emacs package)     | >1.0     | interactive kernel for Python in Emacs; see the [[https://github.com/ebanner/pynt][repository]] for installation instructions |
+| org-export               | 64ac299  | command line tool needed for HTML export, requires Emacs (see [[https://github.com/nhoffman/org-export/tree/64ac299c041877620c2cadba83ded44f46c4e124][repository]]) |
+
+* PYNT Installation
+Install the ~codebook~ module with the pip package manager:
+#+begin_src shell
+  $ pip install git+https://github.com/ebanner/pynt
+#+end_src
+
+On Arch Linux, pip refuses to install into the system Python environment by default, so pass an extra flag:
+#+begin_src shell
+  $ pip install --break-system-packages git+https://github.com/ebanner/pynt
+#+end_src
+
+Open Emacs and install the ~pynt~ package from MELPA:
+#+begin_src emacs-lisp
+  M-x package-install RET pynt
+#+end_src
+
+PYNT may fail with the following error:
+#+begin_src text
+  ModuleNotFoundError: No module named 'notebook.services'
+#+end_src
+
+To fix it, locate the ~MappingKernelManager~ import in the installed ~codebook~ module:
+#+begin_src shell
+  $ grep -i kernelmanager /usr/lib/python3.11/site-packages/codebook/manager.py
+  from jupyter_server.services.kernels.kernelmanager import MappingKernelManager
+#+end_src
+This import is defined in the [[https://github.com/ebanner/pynt/blob/86cf9ce78d34f92bfd0764c9cbb75427ebd429e6/codebook/manager.py#L15][source code]]. If it still refers to ~notebook.services~, change that line to:
+#+begin_src python
+  from jupyter_server.services.kernels.kernelmanager import MappingKernelManager
+#+end_src
+
+* Java Runtime Installation
+The PySpark Cookbook recipes were tested in Emacs with Java Runtime Environment 17.0.10. Set it as the default:
+#+begin_src shell
+  $ export JAVA_HOME=/usr/lib/jvm/java-17-openjdk
+  $ sudo ln -s /usr/lib/jvm/java-17-openjdk /usr/lib/jvm/default
+#+end_src
+
+* Install org-export
+#+begin_src shell
+  git clone https://github.com/nhoffman/org-export.git
+  cd org-export
+  sudo install -D -m 755 org-export* /usr/local/bin
+#+end_src
+
+* Export to HTML
+#+begin_src shell
+  make index.html
+  make test_ps2org.html
+#+end_src
+
+* Development Environment
+#+CAPTION: Emacs with org-mode as a development environment
+#+NAME: fig:example
+[[./screenshots/example.png]]
diff --git a/screenshots/example.png b/screenshots/example.png
new file mode 100644
index 0000000..a2b8829
Binary files /dev/null and b/screenshots/example.png differ