Blog post addition: Serverless dlt + dbt project #4658

Merged: 17 commits, Jan 15, 2024

euanjohnston-dev
Contributor

What are you changing in this pull request and why?

PR related to blog post ticket raised here: (#4657).

@euanjohnston-dev requested a review from a team as a code owner on December 15, 2023, 15:42

@github-actions bot added the content, developer blog, and size: medium labels on Dec 15, 2023
@runleonarun added the new contributor label on Dec 15, 2023
Contributor

@joellabes left a comment

Hey @euanjohnston-dev this is coming along nicely! Thanks for opening the issue and PR, and thanks for your patience over the holiday break.

Most of the changes I've made are typo fixes or small clarifications, but there's a couple of places I did some more significant reworking. If it doesn't feel totally in your voice, feel free to make further changes.

There's also a bunch of general comments where I think you need to provide a bit more context. The article is interesting but doesn't always make sense if you're coming into it without full context.

Let me know if you've got questions about any of the comments, either here or on the dbt Community Slack.

Comment on lines 46 to 48
The solution has pretty standard components

- An EtL pipeline. The little t stands for normalisation, such as transforming strings to dates or unpacking nested structures.
Contributor

I assume this list is incomplete since it only has one element?

Contributor Author

I've rounded this out to give a broader overview of the pipeline, and added further details later.


- An EtL pipeline. The little t stands for normalisation, such as transforming strings to dates or unpacking nested structures.

Due to the complexity of deduplication, we needed to add a human element to confirm the deduplication. This is reflected in the diagram below:
Contributor

How does the diagram reflect it? I assume you're talking about the fact that it has to send a Slack alert, but it's not explicit anywhere

Contributor

Reading further, this turns out to not be the case - the human element is actually manually fixing things in GSheets. It would be worth clarifying

Contributor Author

I have removed this element and now discuss the Google Sheets step in greater detail later.


### Production-readying the pipeline

To make our pipeline more “production ready”, we could make some improvements:
Contributor

You say "could make", but it looks like you did make these improvements, right? So I don't think this section is necessary as currently laid out.

If you wanted to talk about the Slack notification setup you created, that could make sense in a different context. E.g. you could talk about dlt's extensibility and use this as an example

Contributor Author

Good point. I've changed this around to mention the addition.

The outcome was first and foremost a visualisation highlighting the unique properties available in my specific area of search. The map shown on the left of the page gives a live overview of location, number of duplicates (bubble size) and price (bubble colour) which can amongst other features be filtered using the sliders on the right. This represents a much better decluttered solution from which to observe the actual inventory available.


<Lightbox src="/img/blog/serverless-free-tier-data-stack-with-dlt-and-dbt-core/map_screenshot.png" width="70%" title="Dashboard mapping overview" />
Contributor

This is cool!

Contributor Author

Thank you :)


Bad:

- I did have a small hiccup with the google sheets connector assuming an oauth authentication over my desired sdk but this was relatively easy to rectify.
Contributor

If it's a one-line fix in a config file, it would be nice to either post the line of code here or link to whatever documentation/Stack Overflow post you used to solve your problem, as a pointer to whoever tries to replicate your work and has the same issue

Contributor Author

This was a one-line config fix, but I wasn't able to replicate the issue. My fix was to explicitly declare the credentials object as GcpServiceAccountCredentials in the init.py file for the source. Honestly, I felt that going into detail would be overkill and would detract from the core article, but mentioning it balances out the positives. I would suggest that it either stays as is or we remove it.

Contributor Author

On reflection, I have briefly referenced this in case people run into the same issue.


```python
def dbt_run():
    # make an authenticated connection with dlt
    ...  # rest of the function is not shown in this excerpt
```
Contributor

Does this mean you're using dlt to create an authenticated connection (presumably to the warehouse?), or that you're creating an authenticated connection to dlt?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the connection is to the warehouse; sorry if that came across as a little unclear. The destination parameter specifies the type of database to which the data will be loaded.

website/blog/authors.yml (outdated review thread, resolved)
euanjohnston-dev and others added 2 commits January 9, 2024 10:48
Batch approved a number of the suggested changes. Will require further review.

Co-authored-by: Joel Labes <joel.labes@dbtlabs.com>
…re.md

I made amendments to the blog in line with your feedback, mainly to provide greater context. Happy to make further changes if you deem them necessary.

Also, I imagine we will need to update the blog dates etc. to align with when this is ready to be pushed? The dates I chose relate to the original PR.
@joellabes
Contributor

Thanks for the updates @euanjohnston-dev! I will review next week and we should be able to get this up!

@euanjohnston-dev
Contributor Author

euanjohnston-dev commented Jan 12, 2024 via email

@joellabes merged commit 8a16c31 into dbt-labs:current on Jan 15, 2024
3 checks passed
@joellabes
Contributor

It's up!

Labels:

- content: Improvements or additions to content
- developer blog: This content fits on the developer blog.
- January-2024
- new contributor: Label for first-time contributors
- size: medium: This change will take up to a week to address

3 participants