-
Notifications
You must be signed in to change notification settings - Fork 944
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Blog post addition: Serverless dlt + dbt project #4658
Conversation
Added personal info
…lt-and-dbt-core/test.txt
Moved the architecture diagram underneath (Towards a solution section)
amended repo links to new github profile names.
Hello!👋 Thanks for contributing to the dbt product documentation and opening this pull request! ✨ |
Someone is attempting to deploy a commit to the dbt-labs Team on Vercel. A member of the Team first needs to authorize it. |
The latest updates on your projects. Learn more about Vercel for Git ↗︎
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @euanjohnston-dev this is coming along nicely! Thanks for opening the issue and PR, and thanks for your patience over the holiday break.
Most of the changes I've made are typo fixes or small clarifications, but there's a couple of places I did some more significant reworking. If it doesn't feel totally in your voice, feel free to make further changes.
There's also a bunch of general comments where I think you need to provide a bit more context. The article is interesting but doesn't always make sense if you're coming into it without full context.
Let me know if you've got questions about any of the comments, either here or on the dbt Community Slack.
The solution has pretty standard components | ||
|
||
- An EtL pipeline. The little t stands for normalisation, such as transforming strings to dates or unpacking nested structures. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume this list is incomplete since it only has one element?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've rounded this out a bit to give a broader overview into the pipeline and then added additional details later.
|
||
- An EtL pipeline. The little t stands for normalisation, such as transforming strings to dates or unpacking nested structures. | ||
|
||
Due to the complexity of deduplication, we needed to add a human element to confirm the deduplication. This is reflected in the diagram below: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does the diagram reflect it? I assume you're talking about the fact that it has to send a Slack alert, but it's not explicit anywhere
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reading further, this turns out to not be the case - the human element is actually manually fixing things in GSheets. It would be worth clarifying
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have removed this element and discussed the gsheets element later in greater detail.
|
||
### Production-readying the pipeline | ||
|
||
To make our pipeline more “production ready”, we could make some improvements: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You say "could make", but it looks like you did make these improvements, right? So I don't think this section is necessary as currently laid out.
If you wanted to talk about the Slack notification setup you created, that could make sense in a different context. E.g. you could talk about dlt's extensibility and use this as an example
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point I've changed this around to mention the addition.
The outcome was first and foremost a visualisation highlighting the unique properties available in my specific area of search. The map shown on the left of the page gives a live overview of location, number of duplicates (bubble size) and price (bubble colour) which can amongst other features be filtered using the sliders on the right. This represents a much better decluttered solution from which to observe the actual inventory available. | ||
|
||
|
||
<Lightbox src="/img/blog/serverless-free-tier-data-stack-with-dlt-and-dbt-core/map_screenshot.png" width="70%" title="Dashboard mapping overview" /> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is cool!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you :)
|
||
Bad: | ||
|
||
- I did have a small hiccup with the google sheets connector assuming an oauth authentication over my desired sdk but this was relatively easy to rectify. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it's a one-line fix in a config file, it would be nice to either post the line of code here or link to whatever documentation/Stack Overflow post you used to solve your problem, as a pointer to whoever tries to replicate your work and has the same issue
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this is a one line config file issue but this wasn't an issue that was able to be replicated. My fix was to explicitly state the object as GcpServiceAccountCredentials in the init.py file for the source. Honestly speaking I felt it overkill to go into details on this and would detract from the core article but adding provided balance to the positives. I would suggest either it stays as is or we remove it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On reflection I have briefly referenced this if people were to stumble into the same issue.
website/blog/2023-12-15-serverless-free-tier-data-stack-with-dlt-and-dbt-core.md
Outdated
Show resolved
Hide resolved
|
||
```python | ||
def dbt_run(): | ||
# make an authenticated connection with dlt |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this mean you're using dlt to create an authenticated connection (presumably to the warehouse?), or that you're creating an authenticated connection to dlt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes the connection is to the warehouse sorry if that comes across a little unclear. The destination parameter specifies the type of database to which the data will be loaded.
website/blog/2023-12-15-serverless-free-tier-data-stack-with-dlt-and-dbt-core.md
Outdated
Show resolved
Hide resolved
website/blog/2023-12-15-serverless-free-tier-data-stack-with-dlt-and-dbt-core.md
Outdated
Show resolved
Hide resolved
Batch approved a number of the suggested changes. Will require further review. Co-authored-by: Joel Labes <joel.labes@dbtlabs.com>
…re.md Made amendments to the blog aligned with the commented feedback you provided along the lines of providing greater context. Happy to make further changes if you think deem them necessary. Also I imagine we will need to update the blog dates etc. aligned with when this is ready to be pushed? The dates I chose were related to the original PR.
Thanks for the updates @euanjohnston-dev! I will review next week and we should be able to get this up! |
No worries, thank you for all your help on this. Have a great weekend!
best,
Euan
…On Fri, Jan 12, 2024 at 4:40 AM Joel Labes ***@***.***> wrote:
Thanks for the updates @euanjohnston-dev
<https://github.com/euanjohnston-dev>! I will review next week and we
should be able to get this up!
—
Reply to this email directly, view it on GitHub
<#4658 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AOKSWQIOWA3XGTDMMXRFUNLYOCWD3AVCNFSM6AAAAABAWUGES6VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTQOBYGM4DKMBXHA>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
It's up! |
What are you changing in this pull request and why?
PR related to blog post ticket raised here: (#4657).