docsp-31169 - change stream schema inference (#164)
Co-authored-by: Caitlin Davey <caitlin@caitlindavey.com>
(cherry picked from commit 099521a)
mongoKart committed Jul 13, 2023
1 parent 5fa4781 commit 5f910e9
Showing 4 changed files with 21 additions and 5 deletions.
4 changes: 2 additions & 2 deletions snooty.toml
@@ -6,8 +6,8 @@ intersphinx = ["https://www.mongodb.com/docs/manual/objects.inv"]
toc_landing_pages = ["configuration"]

[constants]
-driver-short = "Spark Connector"
-driver-long = "MongoDB {+driver-short+}"
+connector-short = "Spark Connector"
+connector-long = "MongoDB {+connector-short+}"
current-version = "10.2.0"
artifact-id-2-13 = "mongo-spark-connector_2.13"
artifact-id-2-12 = "mongo-spark-connector_2.12"
8 changes: 6 additions & 2 deletions source/configuration/read.txt
@@ -133,7 +133,7 @@ You can configure the following properties to read from MongoDB:
Partitioner Configurations
~~~~~~~~~~~~~~~~~~~~~~~~~~

-Partitioners change the read behavior for batch reads with the {+driver-short+}.
+Partitioners change the read behavior for batch reads with the {+connector-short+}.
They do not affect Structured Streaming, because in that mode the data
stream processing engine produces a single stream.

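For context, a minimal PySpark sketch of a batch read that sets a partitioner explicitly. The connection details, database, and collection names are placeholders, and the ``SamplePartitioner`` class name is an assumption based on the v10 connector's package layout:

.. code-block:: python

   from pyspark.sql import SparkSession

   spark = SparkSession.builder.appName("partitioner-example").getOrCreate()

   # Partitioners apply only to batch reads like this one;
   # a streaming read ignores this option.
   df = (
       spark.read.format("mongodb")
       .option("connection.uri", "mongodb://localhost:27017")  # placeholder
       .option("database", "test")                             # placeholder
       .option("collection", "events")                         # placeholder
       .option(
           "partitioner",
           "com.mongodb.spark.sql.connector.read.partitioner.SamplePartitioner",
       )
       .load()
   )
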
@@ -330,9 +330,13 @@ Change Streams
- | Specifies whether to publish the changed document or the full
change stream document.
|
-| When set to ``true``, the connector filters out messages that
+| When this setting is ``true``, the connector exhibits the following behavior:
+
+- The connector filters out messages that
   omit the ``fullDocument`` field and only publishes the value of the
   field.
+- If you don't specify a schema, the connector infers the schema
+  from the change stream document rather than from the underlying collection.

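A minimal PySpark sketch of a streaming read with this option enabled; the connection details and names are placeholders, while the option key is the one documented above:

.. code-block:: python

   from pyspark.sql import SparkSession

   spark = SparkSession.builder.appName("change-stream-example").getOrCreate()

   # With the option set to "true", the connector publishes only the
   # fullDocument value of each event, filters out events that omit the
   # fullDocument field, and, when you don't supply a schema, infers the
   # schema from the change stream documents rather than from the
   # underlying collection.
   stream_df = (
       spark.readStream.format("mongodb")
       .option("connection.uri", "mongodb://localhost:27017")  # placeholder
       .option("database", "test")                             # placeholder
       .option("collection", "events")                         # placeholder
       .option("change.stream.publish.full.document.only", "true")
       .load()
   )
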
.. note::

12 changes: 12 additions & 0 deletions source/read-from-mongodb.txt
@@ -42,6 +42,18 @@ Overview

.. include:: /scala/filters.txt

+.. important:: Inferring the Schema of a Change Stream
+
+   When the {+connector-short+} infers the schema of a DataFrame
+   read from a change stream, it uses the schema of the
+   underlying collection by default, rather than that of the
+   change stream. If you set the ``change.stream.publish.full.document.only``
+   option to ``true``, the connector uses the schema of the
+   change stream instead.
+
+   For more information on configuring a read operation, see the
+   :ref:`spark-change-stream-conf` section of the Read Configuration Options guide.
+
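If neither inference behavior fits your data, the standard Spark reader API lets you supply a schema explicitly and skip inference altogether. A minimal sketch with hypothetical field names:

.. code-block:: python

   from pyspark.sql import SparkSession
   from pyspark.sql.types import StringType, StructField, StructType

   spark = SparkSession.builder.appName("explicit-schema-example").getOrCreate()

   # Hypothetical fields; match these to the documents you expect.
   schema = StructType([
       StructField("_id", StringType()),
       StructField("status", StringType()),
   ])

   # Supplying a schema up front means the connector doesn't infer one
   # from the collection or the change stream.
   stream_df = (
       spark.readStream.format("mongodb")
       .schema(schema)
       .option("connection.uri", "mongodb://localhost:27017")  # placeholder
       .option("database", "test")                             # placeholder
       .option("collection", "events")                         # placeholder
       .load()
   )
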
SQL Queries
-----------

2 changes: 1 addition & 1 deletion source/structured-streaming.txt
@@ -191,7 +191,7 @@ Configuring a Write Stream to MongoDB

Configuring a Read Stream from MongoDB
--------------------------------------
-When reading a stream from a MongoDB database, the {+driver-long+} supports both
+When reading a stream from a MongoDB database, the {+connector-long+} supports both
*micro-batch processing* and
*continuous processing*. Micro-batch processing is the default processing engine, while
continuous processing is an experimental feature introduced in
