You probably have an interest in data, some coding experience with R or Python, and have taken some online Data Science courses. All of this is great preparation for a career change and for participating in Data Science Retreat. But is there anything specific you can do in the weeks before starting at Data Science Retreat?
We have one basic requirement and three suggestions for you. The requirement is that you set up your machine. As operating systems go, Mac and Linux are still much easier to use, and thus preferred. It is essential that you have installed the latest version of Python, tried it out, and have it ready to go.
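A quick way to confirm that Python is set up is to run a small check script. The following is a minimal sketch only: the minimum version `(3, 8)` and the helper name `python_is_ready` are assumptions made for illustration, not an official Data Science Retreat requirement.

```python
import platform
import sys

# Hypothetical minimum version, chosen for illustration only;
# adjust it to whatever the course material actually targets.
MIN_VERSION = (3, 8)


def python_is_ready(min_version=MIN_VERSION):
    """Return True if the running interpreter is at least min_version."""
    return sys.version_info[:2] >= tuple(min_version)


if __name__ == "__main__":
    # Report the interpreter version and the operating system in use.
    print("Python", platform.python_version(), "on", platform.system())
    if python_is_ready():
        print("Ready to go")
    else:
        print("Please install a newer Python")
```

Saving this as a file and running it with `python` from the command line also confirms that the interpreter is reachable from your terminal.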
The three suggestions we have are:
- Maths refresher for Data Science models
- Practical computer science for Data Science deployment
- Preparing for a data-driven project
Maths refresher for Data Science models:
- Linear algebra connected with computational techniques: No bullshit guide to linear algebra
- Linear algebra and statistics applied in Python: linear_algebra_for_machine_learning
- Statistics and probability on Khan Academy
- Linear algebra on Khan Academy
Practical computer science for Data Science deployment:
- Command line basics: Linux Command Line CheatSheet
- Git: Everyday Git
- Algorithms: Algorithms on Khan Academy
Preparing for a data-driven project:
Data Science Retreat asks participants to execute one major data-driven project as a group effort. The objective is to build or improve a product prototype. Teams of four will invest 1,000 hours of work into the project. This equates to between 200 and 250 hours per group member, or 25 hours per week. Teams will be selected in the first few weeks, and an interim assessment follows in Week 8 at the latest.
To get ready for the Data Science project, you may consider contributing one project proposal to the DSR pool. All proposals will be evaluated for originality, feasibility, and methodological soundness across the following four dimensions:
- Data
  - Sources: Description of one or more possible data sources, e.g. data type, quantity, volume and so on
  - Availability: Is the data available? If so, how, e.g. scraping, database, API?
  - Accessibility: Is the data actually accessible to you (us)?
  - Quality: What is the quality of the data? How much pre-processing needs to be done?
- Problem and Solution
  - What is the problem to which you want to provide a solution through Data Science?
  - Can you sketch the problem from the point of view of a business or organization?
  - What would you consider the two or three essential features of the solution?
- Use case
  - Can you sketch the problem from the point of view of the user?
  - If you provide the solution, who would be your first and/or best user?
- Approach
  - Which ML or DL approach do you recommend to provide the Data Science solution?
  - Which alternative approaches would you suggest as a comparison for your evaluation metrics and outcomes?
  - Are you aware of tools or packages that could help you develop your project?
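
For the Quality question under Data, a quick first pass over a sample of the data can give a rough sense of how much pre-processing lies ahead. The sketch below counts empty cells per column using only the Python standard library. The function name `missing_value_report` and the sample rows are made up for illustration; a real proposal would run something like this against its own data source.

```python
import csv
import io


def missing_value_report(csv_text):
    """Count empty cells per column in CSV text that has a header row.

    Returns a tuple (number_of_rows, {column_name: missing_count}).
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = {name: 0 for name in reader.fieldnames}
    rows = 0
    for row in reader:
        rows += 1
        for name in counts:
            value = row.get(name)
            # A cell counts as missing if it is absent or only whitespace.
            if value is None or value.strip() == "":
                counts[name] += 1
    return rows, counts


# Made-up sample data, for illustration only.
SAMPLE = "id,age,city\n1,34,Berlin\n2,,Hamburg\n3,29,\n"

if __name__ == "__main__":
    rows, counts = missing_value_report(SAMPLE)
    for name, missing in counts.items():
        print(f"{name}: {missing} of {rows} values missing")
```

Missing values are only one aspect of data quality, but a count like this is usually enough to back up a first answer in a proposal.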