Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

docsp-31169 - change stream schema inference #164

Merged
merged 6 commits into from
Jul 13, 2023
Merged
Show file tree
Hide file tree
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 5 additions & 1 deletion source/configuration/read.txt
Original file line number Diff line number Diff line change
Expand Up @@ -330,9 +330,13 @@ Change Streams
- | Specifies whether to publish the changed document or the full
change stream document.
|
| When set to ``true``, the connector filters out messages that
| When set to ``true``:

- The connector filters out messages that
omit the ``fullDocument`` field and only publishes the value of the
field.
- If you don't specify a schema, the connector infers the schema
from the change stream document rather than from the underlying collection.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I: per the style guide re: lists, do not start a list with a fragment

Suggested change
| When set to ``true``:
- The connector filters out messages that
omit the ``fullDocument`` field and only publishes the value of the
field.
- If you don't specify a schema, the connector infers the schema
from the change stream document rather than from the underlying collection.
| The connector displays the following behavior when this property is set to ``true``:
- The connector filters out messages that
omit the ``fullDocument`` field and only publishes the value of the
field.
- If you don't specify a schema, the connector infers the schema
from the change stream document rather than from the underlying collection.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

great catch


.. note::

Expand Down
12 changes: 12 additions & 0 deletions source/read-from-mongodb.txt
Original file line number Diff line number Diff line change
Expand Up @@ -42,6 +42,18 @@ Overview

.. include:: /scala/filters.txt

.. important:: Change Stream Schema Inference

When the {+driver-short+} infers the schema of a data frame
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Q: should we rename this source constant since the product is not a driver?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

love it

read from a change stream, by default,
it will use the schema of the underlying collection rather than the schema
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

S: Remove the "by default" here. It is implicit as the next sentence explains how to change the behavior

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see what you mean, but to me it benefits from some indication that this behavior can be changed (apart from the following sentence, I mean). Especially when this isn't the same page where this option is documented, so the reader might not be in the mindset of settings they can turn on or off.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense to me!

of the change stream. To instruct the connector to use the schema of the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
When the {+driver-short+} infers the schema of a data frame
read from a change stream, by default,
it will use the schema of the underlying collection rather than the schema
of the change stream. To instruct the connector to use the schema of the
When the {+driver-short+} infers the schema of a data frame
read from a change stream, by default,
it will use the schema of the underlying collection rather than of the change stream.
To instruct the connector to use the schema of the

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Used your suggestion plus "that" to hopefully make it more clear. LMK

change stream, set the ``change.stream.publish.full.document.only`` option
to ``true``.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
of the change stream. To instruct the connector to use the schema of the
change stream, set the ``change.stream.publish.full.document.only`` option
to ``true``.
of the change stream. The connector infers the schema from the
change stream if you set the ``change.stream.publish.full.document.only`` option
to ``true``.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reworded. just for my info, what don't you like about the first version?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The phrasing of "instruct the connector" seemed needlessly complicated. Better to just have it be a declarative sentence, IMO. but it just a suggestion :)


For more information on configuring a read operation, see the
:ref:`spark-read-conf` guide.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's what i had initially. not sure why i changed


SQL Queries
-----------

Expand Down