Added faq page #235

Merged
merged 7 commits into from
Nov 15, 2023
Changes from 5 commits
18 changes: 9 additions & 9 deletions README.md
Expand Up @@ -35,15 +35,14 @@ wget -nc 'https://cleanlab-public.s3.amazonaws.com/CleanVision/image_files.zip'
```python
from cleanvision import Imagelab

if __name__ == '__main__':
    # Specify path to folder containing the image files in your dataset
    imagelab = Imagelab(data_path="FOLDER_WITH_IMAGES/")

    # Automatically check for a predefined list of issues within your dataset
    imagelab.find_issues()

    # Produce a neat report of the issues found in your dataset
    imagelab.report()
# Specify path to folder containing the image files in your dataset
imagelab = Imagelab(data_path="FOLDER_WITH_IMAGES/")

# Automatically check for a predefined list of issues within your dataset
imagelab.find_issues()

# Produce a neat report of the issues found in your dataset
imagelab.report()
```

2. CleanVision diagnoses many types of issues, but you can also check for only specific issues.
Expand All @@ -67,6 +66,7 @@ imagelab.report(issue_types=issue_types)
- [Additional example notebooks](https://github.com/cleanlab/cleanvision-examples)
- [Documentation](https://cleanvision.readthedocs.io/)
- [Blog Post](https://cleanlab.ai/blog/cleanvision/)
- [FAQ](https://cleanvision.readthedocs.io/en/latest/faq.html)

## *Clean* your data for better Computer *Vision*

@@ -1,7 +1,7 @@
Folder Dataset
Fsspec Dataset
==============

.. automodule:: cleanvision.dataset.folder_dataset
.. automodule:: cleanvision.dataset.fsspec_dataset
:autosummary:
:members:
:undoc-members:
2 changes: 1 addition & 1 deletion docs/source/cleanvision/dataset/index.rst
Expand Up @@ -10,7 +10,7 @@ Dataset

.. toctree::
base_dataset
folder_dataset
fsspec_dataset
hf_dataset
torch_dataset
utils
68 changes: 68 additions & 0 deletions docs/source/faq.rst
@@ -0,0 +1,68 @@
Frequently Asked Questions
==========================

Answers to frequently asked questions about the `cleanvision <https://github.com/cleanlab/cleanvision/>`_ open-source package.

1. **What kind of machine learning tasks can I use CleanVision for?**

CleanVision is independent of any particular machine learning task: it works directly on images and does not require any labels or metadata to detect issues in the dataset. The issues it detects are relevant to all kinds of machine learning tasks.

2. **Can I check for specific issues in my dataset?**


Yes, you can specify issues like ``light`` or ``blurry`` in the ``issue_types`` argument when calling ``Imagelab.find_issues``:

.. code-block:: python3

    imagelab.find_issues(issue_types={"light": {}, "blurry": {}})


3. **What dataset formats does CleanVision support?**


Apart from plain image files, CleanVision also works with Hugging Face and Torchvision datasets. You can pass the dataset object as is, together with the ``image_key`` argument:

.. code-block:: python3

    imagelab = Imagelab(hf_dataset=dataset, image_key="image")

For more detailed usage instructions and examples, check the :ref:`tutorials`.
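
As a small illustration of question 2, the ``issue_types`` argument is a dict keyed by issue name. The sketch below uses a hypothetical helper, ``build_issue_types`` (not part of CleanVision), to build that dict from a list of names. It assumes, as the FAQ example suggests, that an empty dict means "use the default settings" for that check.

```python
from typing import Any, Dict, Iterable


def build_issue_types(names: Iterable[str]) -> Dict[str, Dict[str, Any]]:
    """Map each issue name to an empty settings dict (hypothetical helper)."""
    return {name: {} for name in names}


issue_types = build_issue_types(["light", "blurry"])
print(issue_types)  # {'light': {}, 'blurry': {}}

# With CleanVision installed, the result could be passed along as in the FAQ:
# imagelab.find_issues(issue_types=issue_types)
# imagelab.report(issue_types=issue_types.keys())
```

Keeping the issue names in one list makes it easy to reuse the same selection for both ``find_issues`` and ``report``.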

Commonly encountered errors
---------------------------

- **RuntimeError: An attempt has been made to start a new process before the current process has finished its bootstrapping phase.**

.. code-block:: console

    This probably means that you are not using fork to start your
    child processes and you have forgotten to use the proper idiom
    in the main module:

        if __name__ == '__main__':
            freeze_support()
            ...

    The "freeze_support()" line can be omitted if the program
    is not going to be frozen to produce an executable.

    To fix this issue, refer to the "Safe importing of main module"
    section in https://docs.python.org/3/library/multiprocessing.html


This error occurs because the ``multiprocessing`` module starts child processes differently on macOS and Windows than on platforms that use fork. A detailed discussion of the issue can be found `here <https://github.com/cleanlab/cleanlab/issues/159>`_.
One workaround is to run CleanVision under the ``if __name__ == "__main__":`` guard, like this:

.. code-block:: python3

    from cleanvision import Imagelab

    if __name__ == "__main__":
        imagelab = Imagelab(data_path="FOLDER_WITH_IMAGES/")
        imagelab.find_issues()
        imagelab.report()

Alternatively, pass ``n_jobs=1`` to disable parallel processing:

.. code-block:: python3

    imagelab.find_issues(n_jobs=1)
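
To judge whether a given machine is likely to hit this error, it can help to check which start method ``multiprocessing`` uses by default. The sketch below relies only on the standard library. ``needs_main_guard`` is a hypothetical helper (not part of CleanVision), and it assumes, following the Python documentation, that the ``spawn`` and ``forkserver`` start methods re-import the main module while ``fork`` does not.

```python
import multiprocessing


def needs_main_guard(start_method: str) -> bool:
    """Return True if this start method re-imports the main module (hypothetical helper)."""
    # Assumption: "spawn" and "forkserver" import the main module in each child,
    # so unguarded top-level code would run again there; "fork" does not.
    return start_method in {"spawn", "forkserver"}


if __name__ == "__main__":
    method = multiprocessing.get_start_method()
    print(f"default start method: {method}")
    print(f"main guard needed: {needs_main_guard(method)}")
```

If the guard turns out to be needed, either of the two workarounds above applies.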
71 changes: 41 additions & 30 deletions docs/source/index.rst
Expand Up @@ -4,47 +4,50 @@

Documentation
=======================================

CleanVision automatically detects various issues in image datasets, such as images that are: (near) duplicates, blurry,
over/under-exposed, etc. This data-centric AI package is designed as a quick first step for any computer vision project
to find problems in your dataset, which you may want to address before applying machine learning.


Installation
============

To install the latest stable version (recommended):
------------

.. code-block:: console
.. tabs::

$ pip install cleanvision
.. tab:: pip

.. code-block:: bash

To install the bleeding-edge developer version:
pip install cleanvision

.. code-block:: console
To install the package with all optional dependencies:

$ pip install git+https://github.com/cleanlab/cleanvision.git
.. code-block:: bash

To install with HuggingFace optional dependencies
pip install "cleanvision[all]"

.. code-block:: console
.. tab:: source

$ pip install "cleanvision[huggingface]"
.. code-block:: bash

To install with Torchvision optional dependencies
pip install git+https://github.com/cleanlab/cleanvision.git

.. code-block:: console
To install the package with all optional dependencies:

$ pip install "cleanvision[pytorch]"
.. code-block:: bash

pip install "git+https://github.com/cleanlab/cleanvision.git#egg=cleanvision[all]"




Quickstart
===========
How to Use CleanVision
----------------------

1. Using CleanVision to audit your image data is as simple as running the code below:
Basic Usage
^^^^^^^^^^^
Here's how to quickly audit your image data:


.. code-block:: python3
Expand All @@ -60,8 +63,9 @@ Quickstart
# Produce a neat report of the issues found in your dataset
imagelab.report()

2. CleanVision diagnoses many types of issues, but you can also check for only specific issues:

Targeted Issue Detection
^^^^^^^^^^^^^^^^^^^^^^^^
You can also focus on specific issues:

.. code-block:: python3

Expand All @@ -72,8 +76,9 @@ Quickstart
# Produce a report with only the specified issue_types
imagelab.report(issue_types.keys())

3. Run CleanVision on a Hugging Face dataset

Integration with Hugging Face Dataset
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Easily use CleanVision with a Hugging Face dataset:

.. code-block:: python3

Expand All @@ -90,7 +95,9 @@ Quickstart

imagelab.report()

4. Run CleanVision on a Torchvision dataset
Integration with Torchvision Dataset
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
CleanVision works smoothly with Torchvision datasets too:


.. code-block:: python3
Expand All @@ -111,29 +118,32 @@ Quickstart
imagelab.report()


More on how to get started with CleanVision:
- `Example Python script <https://github.com/cleanlab/cleanvision/blob/main/docs/source/tutorials/run.py>`_
- `Example Notebooks <https://github.com/cleanlab/cleanvision-examples>`_
- `How To Contribute <https://github.com/cleanlab/cleanvision/blob/main/CONTRIBUTING.md>`_
Additional Resources
--------------------
- Get started with our `Example Notebook <https://cleanvision.readthedocs.io/en/latest/tutorials/tutorial.html>`_
- Explore more `Example Notebooks <https://github.com/cleanlab/cleanvision-examples>`_
- Learn how to contribute in the `Contribution Guide <https://github.com/cleanlab/cleanvision/blob/main/CONTRIBUTING.md>`_


.. toctree::
:hidden:
:maxdepth: 1
:caption: Getting Started

Quickstart <self>
.. _api-reference:


.. _tutorials:
.. toctree::
:hidden:
:maxdepth: 3
:caption: Tutorials
:name: _tutorials

tutorials/tutorial.ipynb
How to Use CleanVision <tutorials/tutorial.ipynb>
tutorials/torchvision_dataset.ipynb
tutorials/huggingface_dataset.ipynb
Frequently Asked Questions <faq>

.. _api-reference:
.. toctree::
:hidden:
:maxdepth: 3
Expand All @@ -153,3 +163,4 @@ More on how to get started with CleanVision:
GitHub <https://github.com/cleanlab/cleanvision.git>
PyPI <https://pypi.org/project/cleanvision/>
Cleanlab Studio <https://cleanlab.ai/studio/?utm_source=cleanvision&utm_medium=docs&utm_campaign=clostostudio>

2 changes: 1 addition & 1 deletion docs/source/tutorials/tutorial.ipynb
Expand Up @@ -5,7 +5,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# Overview"
"# How to Use CleanVision"
]
},
{