Welcome to the Ready 2019 Advanced Databricks Challenge. We will focus on hands-on activities that develop proficiency in advanced Databricks concepts such as data exploration using Spark, building supervised and unsupervised learning models, evaluating models, and using advanced libraries like MMLSpark. These challenges assume introductory to intermediate knowledge of Azure Databricks; if that is not the case, please spend time working through the Introduction to Databricks challenges first.
Most difficulties customers encounter in these areas come from stitching multiple services together. As such, where possible, we have tried to place key concepts in the context of a broader example.
At the end of this workshop, you should be able to:
- Use Azure Databricks to build ML models, including:
  - Supervised Learning (classification)
  - Unsupervised Learning (clustering / recommendation)
- Evaluate those models using Azure Databricks
- Understand libraries: an introduction to MMLSpark and when to use it
- Get started with Deep Learning
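To make the "evaluate those models" objective concrete, here is a minimal sketch of what evaluating a binary classifier computes. It is plain Python rather than the workshop's own notebook code. The function name `binary_classification_metrics` and the sample labels and predictions are hypothetical and chosen only for illustration. On Azure Databricks you would more typically rely on Spark ML's built-in evaluators (for example `BinaryClassificationEvaluator` or `MulticlassClassificationEvaluator`) over a DataFrame of predictions.

```python
from collections import Counter


def binary_classification_metrics(labels, predictions, positive=1):
    """Compute accuracy, precision, recall and F1 for a binary classifier."""
    if len(labels) != len(predictions):
        raise ValueError("labels and predictions must have the same length")
    if len(labels) == 0:
        raise ValueError("at least one example is required")

    # Tally each (actual is positive, predicted is positive) outcome.
    counts = Counter(
        (label == positive, pred == positive)
        for label, pred in zip(labels, predictions)
    )
    tp = counts[(True, True)]    # true positives
    fp = counts[(False, True)]   # false positives
    fn = counts[(True, False)]   # false negatives
    tn = counts[(False, False)]  # true negatives

    accuracy = (tp + tn) / len(labels)
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical held-out labels and model predictions, for illustration only.
labels = [1, 0, 1, 1, 0, 0, 1, 0]
predictions = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = binary_classification_metrics(labels, predictions)
print(metrics)
```

Accuracy alone can be misleading on imbalanced data, which is why the sketch also reports precision, recall, and F1.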
This workshop is meant for a Data Scientist on Azure who actively scripts using a common data science language like Python. Since this is only a short workshop, there are certain things you will need to read or set up before you arrive.
Firstly, you should have some previous exposure to Python. We will be using it for everything we build in the workshop, so you should be familiar with how to use it to create ML models. Additionally, this is not a class where we teach you how to choose the correct algorithm for a business scenario. We assume you have some familiarity with these concepts ahead of time.
Secondly, you should have some experience with Azure Databricks and its core concepts, including workspaces, libraries, and so on. If not, please check out the Intro to Azure Databricks workshop first.
Thirdly, you should have experience with the portal and be able to create resources (and spend money) on Azure. We will not be providing Azure passes for this workshop.
For fun, I have included an EU soccer example (.DBC) as well as a Retail Fashion example and, by popular demand, a Pandas UDF Benchmark notebook to help you get started with User Defined Functions with Pandas. Please let me know if you have any questions.
Business Case I - Azure Databricks
- Start by following the steps in the Setup Guide to provision your Azure environment and fork both the labs and the notebooks used in the challenges.
- Challenge 0 - Administration. **Please note:** you do not need to run through Administration if you are an attendee of Ready (see the note below for when to use this Databricks archive).
- Challenge 1 - Exploring Data with Spark.
- Challenge 2 - Building Supervised Learning Models.
- Challenge 3 - Evaluating Supervised Learning Models.
- Challenge 4 - Recommenders and Clustering.
- Challenge 5 - Using the MMLSpark Library.
Note: The Challenge 0 - Administration archive is there to help Microsoft FTEs facilitate this workshop with their customers after the fact.
- Use Teams for all Challenge Communications
- Create a channel within Teams per table
- Swag is given to the most helpful tables and teammates (not to those who complete challenges fastest or first)
- Q&A and Feedback