docsp-31169 - change stream schema inference #164

Merged
merged 6 commits into from
Jul 13, 2023
Changes from 4 commits
4 changes: 2 additions & 2 deletions snooty.toml
@@ -6,8 +6,8 @@ intersphinx = ["https://www.mongodb.com/docs/manual/objects.inv"]
toc_landing_pages = ["configuration"]

[constants]
-driver-short = "Spark Connector"
-driver-long = "MongoDB {+driver-short+}"
+connector-short = "Spark Connector"
+connector-long = "MongoDB {+connector-short+}"
current-version = "10.1.1"
artifact-id-2-13 = "mongo-spark-connector_2.13"
artifact-id-2-12 = "mongo-spark-connector_2.12"
8 changes: 6 additions & 2 deletions source/configuration/read.txt
@@ -133,7 +133,7 @@ You can configure the following properties to read from MongoDB:
Partitioner Configurations
~~~~~~~~~~~~~~~~~~~~~~~~~~

-Partitioners change the read behavior for batch reads with the {+driver-short+}.
+Partitioners change the read behavior for batch reads with the {+connector-short+}.
They do not affect Structured Streaming because the data stream processing
engine produces a single stream with Structured Streaming.
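
For orientation, here is a minimal sketch of selecting a partitioner on a batch read. It is not part of this changeset; the connection option keys and the partitioner class placeholder are illustrative assumptions, so check the Read Configuration Options guide for the exact names.

.. code-block:: scala

   import org.apache.spark.sql.SparkSession

   val spark = SparkSession.builder()
     .appName("batch-read-sketch")
     .getOrCreate()

   // Batch read from MongoDB. The connection option keys and the
   // "partitioner" value below are placeholders for illustration.
   val df = spark.read
     .format("mongodb")
     .option("spark.mongodb.connection.uri", "<connection-string>")
     .option("spark.mongodb.database", "<database>")
     .option("spark.mongodb.collection", "<collection>")
     .option("partitioner", "<fully-qualified-partitioner-class>")
     .load()

   df.printSchema()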

@@ -330,9 +330,13 @@ Change Streams
- | Specifies whether to publish the changed document or the full
change stream document.
|
-| When set to ``true``, the connector filters out messages that
+| When this setting is ``true``, the connector exhibits the following behavior:

+- The connector filters out messages that
omit the ``fullDocument`` field and only publishes the value of the
field.
+- If you don't specify a schema, the connector infers the schema
+from the change stream document rather than from the underlying collection.

.. note::

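For context, a minimal sketch of a streaming read with ``change.stream.publish.full.document.only`` enabled. Only that option name is taken from this changeset; the connection option keys are illustrative assumptions.

.. code-block:: scala

   import org.apache.spark.sql.SparkSession

   val spark = SparkSession.builder().getOrCreate()

   // Streaming read of a MongoDB change stream. With the option below set to
   // "true" and no schema supplied, the connector publishes only the
   // fullDocument values and infers the schema from the change stream
   // documents rather than from the underlying collection.
   // Connection option keys are placeholders for illustration.
   val changeStreamDF = spark.readStream
     .format("mongodb")
     .option("spark.mongodb.connection.uri", "<connection-string>")
     .option("spark.mongodb.database", "<database>")
     .option("spark.mongodb.collection", "<collection>")
     .option("change.stream.publish.full.document.only", "true")
     .load()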
12 changes: 12 additions & 0 deletions source/read-from-mongodb.txt
@@ -42,6 +42,18 @@ Overview

.. include:: /scala/filters.txt

+.. important:: Inferring the Schema of a Change Stream

+   By default, when the {+connector-short+} infers the schema of a data frame
+   read from a change stream, it uses the schema of the underlying collection
+   rather than that of the change stream. If you set the
+   ``change.stream.publish.full.document.only`` option to ``true``,
+   the connector uses the schema of the change stream instead.

+   For more information on configuring a read operation, see the
+   :ref:`spark-change-stream-conf` section of the Read Configuration Options guide.

SQL Queries
-----------

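As a companion to the note above, a sketch of sidestepping inference by supplying an explicit schema to the stream reader. The field names and connection option keys are placeholders, not values from this changeset.

.. code-block:: scala

   import org.apache.spark.sql.SparkSession
   import org.apache.spark.sql.types.{StringType, StructField, StructType}

   val spark = SparkSession.builder().getOrCreate()

   // Hypothetical schema; replace the fields with those of your documents.
   val explicitSchema = StructType(Seq(
     StructField("_id", StringType, nullable = true),
     StructField("status", StringType, nullable = true)
   ))

   // When a schema is supplied, the connector does not need to infer one,
   // regardless of how change.stream.publish.full.document.only is set.
   val streamDF = spark.readStream
     .format("mongodb")
     .option("spark.mongodb.connection.uri", "<connection-string>")
     .option("spark.mongodb.database", "<database>")
     .option("spark.mongodb.collection", "<collection>")
     .schema(explicitSchema)
     .load()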
2 changes: 1 addition & 1 deletion source/structured-streaming.txt
@@ -191,7 +191,7 @@ Configuring a Write Stream to MongoDB

Configuring a Read Stream from MongoDB
--------------------------------------
-When reading a stream from a MongoDB database, the {+driver-long+} supports both
+When reading a stream from a MongoDB database, the {+connector-long+} supports both
*micro-batch processing* and
*continuous processing*. Micro-batch processing is the default processing engine, while
continuous processing is an experimental feature introduced in
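
For context, a sketch of how the two processing engines are typically selected on the write side via the trigger. The sink, trigger interval, and connection option keys are assumptions for illustration.

.. code-block:: scala

   import org.apache.spark.sql.SparkSession
   import org.apache.spark.sql.streaming.Trigger

   val spark = SparkSession.builder().getOrCreate()

   // Connection option keys are placeholders for illustration.
   val source = spark.readStream
     .format("mongodb")
     .option("spark.mongodb.connection.uri", "<connection-string>")
     .option("spark.mongodb.database", "<database>")
     .option("spark.mongodb.collection", "<collection>")
     .load()

   // Micro-batch processing is the default engine. Passing a continuous
   // trigger opts in to the experimental continuous engine instead.
   val query = source.writeStream
     .format("console")
     .trigger(Trigger.Continuous("1 second"))
     .start()

   query.awaitTermination()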