Open Source Data Pipeline 🐶

Welcome to DagsHub’s Data Pipeline contribution project for Hacktoberfest 2023!

In this Hacktoberfest challenge, DagsHub invites you to build DVC data pipelines that automate and version open-source machine learning projects.

What is DagsHub?

DagsHub is a centralized platform to host and manage machine learning projects, including code, data, models, experiments, annotations, a model registry, and more! DagsHub does the MLOps heavy lifting for its users. Every repository comes with configured S3 storage, an experiment tracking server, and an annotation workspace, all built on popular open-source tools like MLflow, DVC, Git, and Label Studio.

What's the Challenge?

DagsHub is excited to introduce the DVC Data Pipeline Contribution Challenge. In this challenge, we invite you to contribute DVC (Data Version Control) data pipelines to open-source projects on DagsHub. DVC pipelines are essential for efficiently managing, versioning, and sharing data workflows in machine learning and data science projects.

How Can You Participate?

Here's a step-by-step guide to get involved in this challenge:

  1. Choose a Project: Explore open-source projects on DagsHub and select one that interests you. It can be any project that utilizes data pipelines or would benefit from one.
  2. Create the DVC Pipeline: Fork the project under your own account, then use DVC to design and run a data pipeline that suits the project's needs. Ensure it follows best practices for data versioning, reproducibility, and scalability.
  3. Document Your Pipeline: As you build the pipeline, maintain clear and concise documentation describing its purpose, data sources, processing steps, and any dependencies. This documentation is crucial for future users and contributors and should be added to the project’s README file.
  4. Tag Your Project: Add the dvc, data-pipeline, hacktoberfest, and hacktoberfest-2023 tags to the DagsHub repository and its files.
  5. Submit Your Contribution: Open a Pull Request to the project on DagsHub.
  6. Proof of Contribution: Open a Pull Request here with the README.md, dvc.yaml and dvc.lock files and a link to the DagsHub repo.
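
As a rough illustration of step 2, the sketch below renders a minimal dvc.yaml with two stages, each declaring a command (cmd), its dependencies (deps), and its outputs (outs). The helper render_dvc_yaml, the stage names, and every script and data path are hypothetical placeholders, not part of any particular project; adapt them to the repository you fork.

```python
from typing import Dict, List


def render_dvc_yaml(stages: Dict[str, dict]) -> str:
    """Render a minimal dvc.yaml with cmd, deps, and outs for each stage."""
    lines: List[str] = ["stages:"]
    for name, spec in stages.items():
        lines.append(f"  {name}:")
        lines.append(f"    cmd: {spec['cmd']}")
        # deps and outs are optional lists of file paths
        for key in ("deps", "outs"):
            entries = spec.get(key, [])
            if entries:
                lines.append(f"    {key}:")
                lines.extend(f"      - {entry}" for entry in entries)
    return "\n".join(lines) + "\n"


# Hypothetical two-stage pipeline; all paths and script names are illustrative.
EXAMPLE_STAGES = {
    "prepare": {
        "cmd": "python src/prepare.py data/raw.csv data/prepared.csv",
        "deps": ["src/prepare.py", "data/raw.csv"],
        "outs": ["data/prepared.csv"],
    },
    "train": {
        "cmd": "python src/train.py data/prepared.csv models/model.pkl",
        "deps": ["src/train.py", "data/prepared.csv"],
        "outs": ["models/model.pkl"],
    },
}

if __name__ == "__main__":
    print(render_dvc_yaml(EXAMPLE_STAGES), end="")
```

Saving the printed text as dvc.yaml in the repository root and running dvc repro executes the stages in dependency order and writes the dvc.lock file requested in step 6.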

Why Join the Challenge?

Participating in the DagsHub DVC Data Pipeline Contribution Challenge offers numerous benefits:

  • Skill Enhancement: Sharpen your DVC skills and gain hands-on experience in creating robust data pipelines.
  • Collaborative Learning: Collaborate with open-source project maintainers and fellow contributors, expanding your network and knowledge.
  • Contribution to Open Source: Contribute to the open-source community by enhancing the data workflows of valuable projects.
  • Visibility: Showcase your expertise to a wider audience within the data science and machine learning community.