(split off from main readme to reduce clutter)
- fully integrate with the functional competency map, and replace this syllabus with that when it is ready
- split up the syllabus and the curriculum for this package (this makes it easier to look for new and better
materials, and to make updates to the curriculum, while at the same time making sure the scope of the training package
doesn't drift too much as stuff is added or removed)
- the syllabus notes down things that need to be learned and outcomes that should be achieved
- the curriculum collates and orders the material needed to cover the syllabus
- cover deep learning in the ML module in a bit more depth
- update ASR
- notes on timeseries data
- fb prophet
- changepoint detection
- model / data drift, and how to detect it
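The changepoint bullet above can be sketched with a minimal two-sided CUSUM detector. This is a hedged illustration in plain Python, not fb prophet; the `baseline_n`, `threshold`, and `drift` parameters are invented for the example and would need tuning on real data:

```python
def cusum_changepoint(series, baseline_n=10, threshold=5.0, drift=0.5):
    """Two-sided CUSUM: return the first index where the cumulative
    deviation from a baseline mean exceeds the threshold, else None."""
    mean = sum(series[:baseline_n]) / baseline_n
    pos = neg = 0.0  # cumulative positive / negative excursions
    for i, x in enumerate(series):
        pos = max(0.0, pos + (x - mean) - drift)  # drift term absorbs noise
        neg = max(0.0, neg + (mean - x) - drift)
        if pos > threshold or neg > threshold:
            return i
    return None

# mean jumps from 0 to 3 at index 20; detection lags by a couple of points
data = [0.0] * 20 + [3.0] * 20
print(cusum_changepoint(data))  # -> 22
```

The same accumulate-and-threshold idea is one simple way to flag data drift in a monitored feature, though real drift detection usually compares distributions rather than means.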
- how to split training and test sets (and how NOT to)
- split by time (or in some cases, some other time-dependent variable)
- splitting randomly leaks label information from the test period into the training set
- remember that near-100% results are suspicious
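The split-by-time point above can be sketched in plain Python. The row structure and `time_key` name are hypothetical, just to show the shape of the idea:

```python
# Split by time: everything before the cutoff trains, everything after tests.
# A random shuffle would put future rows into the training set and leak labels.
def time_split(rows, cutoff, time_key="timestamp"):
    train = [r for r in rows if r[time_key] < cutoff]
    test = [r for r in rows if r[time_key] >= cutoff]
    return train, test

rows = [{"timestamp": t, "label": t % 2} for t in range(10)]
train, test = time_split(rows, cutoff=7)
print(len(train), len(test))  # -> 7 3
```

In practice you'd split on whatever time-dependent variable drives the leakage (session, patient, deployment date), not necessarily the raw timestamp.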
- evaluating your results "by eye"
- is the accuracy weird
- is it too good? (98-100%, though 95+ is already where you'd get suspicious) -> leaked labels, or a too-easy problem
- is it too bad? -> low-quality labels or test set?
- rules of thumb
- would the opposite finding be surprising?
- if you were told that some other model predicted the opposite (or a different) result, would it be believable?
- if you were instructed to come up with the opposite result (given your current data), would you be able to do so easily?
- https://github.com/ahmedbahaaeldin/From-0-to-Research-Scientist-resources-guide
- https://www.scribbr.com/methodology/research-design/
- deep learning specialization course (free to audit)
- google ML crash course
- microsoft ML course
- CRISP-DM (CRoss-Industry Standard Process for Data Mining)
- seems a bit buzzwordy, but the workflow is generally valid
- glossary
- rules for ML
- how to work with users
- technical debt in ML
- wizard of oz models
- see sidebar for titanic walkthrough
- 10 rules for better Jupyter notebooks
- Elements of Statistical Learning (book)
- Data Science and Machine Learning: Mathematical and Statistical Methods
- More Unicode
- http://reedbeta.com/blog/programmers-intro-to-unicode/
- see also grapheme, a library for working with what you probably think of as unicode characters
- see also wcswidth, which gives you the display length of a string, counting CJK characters twice since those are double-wide
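To make the grapheme point concrete without third-party packages, here's a stdlib-only sketch of why len() doesn't count what users see as characters (the grapheme library handles the full segmentation rules; unicodedata only covers normalization):

```python
import unicodedata

composed = "\u00e9"      # é as one precomposed code point
decomposed = "e\u0301"   # e + combining acute accent: renders identically
print(len(composed), len(decomposed))  # -> 1 2

# NFC normalization folds the pair back into the precomposed form
assert unicodedata.normalize("NFC", decomposed) == composed
```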
- see also 'words of estimative probability' for an example of how categories may be only semi-ordinal
- https://www.rawgraphs.io/gallery
- https://flourish.studio/examples/
- https://www.storytellingwithdata.com/chart-guide
- https://seaborn.pydata.org/examples/index.html
- https://plotly.com/python/plotly-fundamentals/
- https://www.reddit.com/r/dataisbeautiful/top/?t=all
- https://www.reddit.com/r/Infographics/top/?t=all
- https://informationisbeautiful.net/
- https://design.google/library/exploring-color-google-maps/
- https://blog.datawrapper.de/colors-for-data-vis-style-guides/