Merge pull request #57 from /issues/33

feat: Log events in a table

john-gom authored Aug 12, 2024
2 parents 0e49150 + 99eca49 commit d3cc926
Showing 32 changed files with 1,366 additions and 323 deletions.
6 changes: 4 additions & 2 deletions .env
Original file line number Diff line number Diff line change
@@ -1,6 +1,7 @@
-COMPOSE_FILE=docker-compose.yml;docker/dev.yml
+COMPOSE_FILE_RUN=docker-compose.yml,docker-compose-run.yml
+COMPOSE_FILE=${COMPOSE_FILE_RUN},docker/dev.yml
 COMPOSE_PROJECT_NAME=off-query
-COMPOSE_PATH_SEPARATOR=;
+COMPOSE_PATH_SEPARATOR=,
 RESTART_POLICY=no
 TAG=latest
 QUERY_PORT=127.0.0.1:5511
@@ -11,6 +12,7 @@ POSTGRES_USER=productopener
 POSTGRES_PASSWORD=productopener
 POSTGRES_SHM_SIZE=256m
 COMMON_NET_NAME=off_shared_network
+# Note when running in a container the following settings are changed to use the internal docker network
 MONGO_URI=mongodb://localhost:27017
 REDIS_URL=redis://localhost:6379
 # Log levels are: debug, verbose, log (default), warn, error
2 changes: 1 addition & 1 deletion .github/workflows/container-deploy.yml
@@ -133,7 +133,7 @@ jobs:
 echo "DOCKER_CLIENT_TIMEOUT=120" >> .env
 echo "COMPOSE_HTTP_TIMEOUT=120" >> .env
 echo "COMPOSE_PROJECT_NAME=off-query" >> .env
-echo "COMPOSE_PATH_SEPARATOR=;" >> .env
+echo "COMPOSE_PATH_SEPARATOR=," >> .env
 echo "RESTART_POLICY=always" >> .env
 echo "COMPOSE_FILE=docker-compose.yml" >> .env
 echo "TAG=sha-${{ github.sha }}" >> .env
4 changes: 4 additions & 0 deletions Makefile
@@ -18,6 +18,10 @@ endif
 up: run_deps
 	docker compose up --build --wait
 
+# Called by other projects to start this project as a dependency
+run: run_deps
+	COMPOSE_FILE=${COMPOSE_FILE_RUN} docker compose up -d
+
 # This task starts a Postgres database in Docker and then prepares the local environment for development
 dev: run_deps
 	docker compose up --wait query_postgres
6 changes: 6 additions & 0 deletions docker-compose-run.yml
@@ -0,0 +1,6 @@
services:
query:
environment:
# Use shared-services MongoDB and REDIS
- MONGO_URI=mongodb://mongodb:27017
- REDIS_URL=redis://redis:6379
3 changes: 3 additions & 0 deletions docker-compose.yml
@@ -8,6 +8,9 @@ services:
       - POSTGRES_PASSWORD
       - POSTGRES_DB
     shm_size: ${POSTGRES_SHM_SIZE}
+    # Always expose port for viewer access
+    ports:
+      - "${POSTGRES_PORT:-5512}:5432"
     volumes:
       - dbdata:/var/lib/postgresql/data
     networks:
5 changes: 0 additions & 5 deletions docker/dev.yml
@@ -1,9 +1,4 @@
 services:
-  query_postgres:
-    # Expose port locally for testing purposes.
-    ports:
-      - "${POSTGRES_PORT:-5512}:5432"
-
   query:
     build: .
     image: openfoodfacts-query:dev
56 changes: 56 additions & 0 deletions docs/decisions/event-aggregation.md
@@ -0,0 +1,56 @@
# Product Event Aggregation

## Context and Problem Statement

We would like to be able to support a variety of queries on events that have been recorded on products over time. For example, for the producer's dashboard we want to be able to show the number of edits and distinct products updated over a month.

## Decision Drivers

* Queries should run quickly
* Database space consumed should not be excessive
* Data should be reasonably up to date, i.e. any ETL / ELT process should keep up with the rate at which events are being created

## Considered Options

* Query the raw event tables
* Create specific aggregate tables for each aggregate dimension
* Create a relational model of events against products

## Decision Outcome

Chosen option: "Create a relational model of events against products", because it offers the best compromise in terms of acceptable query performance with minimal storage space and does not require new tables to be created for every possible aggregate dimension.

### Consequences

In general we should try to map things to a relational model, but only at the most granular level of detail that makes sense, e.g. the total count of actions in one day.

It has been observed that PostgreSQL performs much better when dealing with small record sizes, so where possible text fields should be normalised so that an integer id can be stored instead.
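The normalisation idea can be illustrated with a small sketch (the `ContributorInterner` helper below is hypothetical, not part of this change): contributor names are interned to integer ids so that event rows store a small integer rather than repeated text.

```typescript
// Illustrative sketch only: intern contributor names to integer ids so that
// event rows store a small integer instead of repeated text.
// `ContributorInterner` is a hypothetical helper, not part of this change.
class ContributorInterner {
  private ids = new Map<string, number>();
  private nextId = 1;

  // Return the existing id for a name, or allocate the next one.
  idFor(name: string): number {
    let id = this.ids.get(name);
    if (id === undefined) {
      id = this.nextId++;
      this.ids.set(name, id);
    }
    return id;
  }
}

const interner = new ContributorInterner();
// Event rows now carry a contributor_id rather than the contributor text.
const eventRows = ['alice', 'bob', 'alice'].map((name) => ({
  contributor_id: interner.idFor(name),
}));
console.log(eventRows.map((r) => r.contributor_id)); // [ 1, 2, 1 ]
```

In the real schema the name-to-id mapping would live in a separate contributors table rather than in memory.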

## Pros and Cons of the Options

### Query the raw event tables

In this option the raw events are simply loaded into a table and then views are created to query this table, joining to the product table to obtain the required dimension.

* Good: Only the raw events are being stored
* Good: Import of data is as fast as possible
* Bad: Query performance is poor; even with indexing, typical queries took around 2 minutes

### Create specific aggregate tables for each aggregate dimension

With this option the raw events would be ingested and then a follow-up process would run to aggregate those events by the required dimension, e.g. for the producer's dashboard this would mean aggregating by day, action and owner, with a total update count plus a count of distinct products updated.

* Good: Queries run very quickly (sub 100ms)
* Bad: Additional tables, processes and storage need to be assigned for each new query dimension
* Bad: It is difficult to incrementally refresh tables that include distinct counts, as the new distinct count cannot be derived from the combination of new events and the existing distinct count
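The distinct-count problem can be demonstrated in a few lines (illustrative TypeScript, not code from this change): plain totals merge by addition, but per-batch distinct counts overstate the true figure whenever batches share products.

```typescript
// Illustrative only: why distinct counts resist incremental refresh.
// Plain totals merge by addition, but per-batch distinct counts overstate
// the true distinct count whenever the batches share products.
const batch1 = ['p1', 'p2', 'p3'];
const batch2 = ['p2', 'p3', 'p4'];

const distinct = (codes: string[]): number => new Set(codes).size;

// A total update count can be refreshed incrementally:
const mergedTotal = batch1.length + batch2.length; // 6 — correct

// A distinct product count cannot:
const naiveDistinct = distinct(batch1) + distinct(batch2); // 3 + 3 = 6
const trueDistinct = distinct([...batch1, ...batch2]); // 4
console.log(mergedTotal, naiveDistinct, trueDistinct);
```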

### Create a relational model of events against products

With this option the raw events would be ingested and then a follow-up process would run to just aggregate those events by action, contributor and day against the product. Different views can then be provided to query this data, joining to the product to obtain the required dimension.

With this option it was important to keep the size of the relational table as small as possible, so an enumeration was used for the action and the contributors were normalised into a separate table so that only the id needed to be stored in the event table.

* Neutral: Query performance is acceptable (sub 1s)
* Good: Queries to support different dimensions do not require additional storage or import processes
* Good: Aggregated counts are not distinct, so can be refreshed incrementally
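As a rough sketch of the chosen approach (hypothetical record shapes and key format, not this change's actual schema), raw events collapse to one row per (product, action, contributor, day) holding a plain, non-distinct count:

```typescript
// Rough sketch of the chosen model (hypothetical shapes and key format,
// not this change's actual schema): collapse raw events into one row per
// (product, action, contributor, day) holding a plain, non-distinct count.
type RawEvent = {
  code: string; // product code
  action: number; // enumeration id for the action
  contributorId: number; // normalised contributor id
  at: Date;
};

function aggregate(events: RawEvent[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const e of events) {
    const day = e.at.toISOString().slice(0, 10); // e.g. '2024-08-12'
    const key = `${e.code}|${e.action}|${e.contributorId}|${day}`;
    counts.set(key, (counts.get(key) ?? 0) + 1); // plain count: refreshable
  }
  return counts;
}

const at = new Date('2024-08-12T10:00:00Z');
const counts = aggregate([
  { code: 'p1', action: 1, contributorId: 7, at },
  { code: 'p1', action: 1, contributorId: 7, at },
  { code: 'p2', action: 1, contributorId: 7, at },
]);
console.log(counts.get('p1|1|7|2024-08-12')); // 2
```

Because the per-row counts are plain totals, a refresh only needs to add new events' counts onto the existing rows; views can then join to the product table for any dimension.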

69 changes: 69 additions & 0 deletions docs/decisions/template.md
@@ -0,0 +1,69 @@
# {short title of solved problem and solution}

## Context and Problem Statement

{Describe the context and problem statement, e.g., in free form using two to three sentences or in the form of an illustrative story.
You may want to articulate the problem in form of a question and add links to collaboration boards or issue management systems.}

<!-- This is an optional element. Feel free to remove. -->
## Decision Drivers

* {decision driver 1, e.g., a force, facing concern, …}
* {decision driver 2, e.g., a force, facing concern, …}
*<!-- numbers of drivers can vary -->

## Considered Options

* {title of option 1}
* {title of option 2}
* {title of option 3}
*<!-- numbers of options can vary -->

## Decision Outcome

Chosen option: "{title of option 1}", because
{justification. e.g., only option, which meets k.o. criterion decision driver | which resolves force {force} | … | comes out best (see below)}.

<!-- This is an optional element. Feel free to remove. -->
### Consequences

{Provide detail on the implications of making this decision and how any foreseen problems can be mitigated}

<!-- This is an optional element. Feel free to remove. -->
### Confirmation

{Describe how the implementation of/compliance with the ADR is confirmed. E.g., by a review or an ArchUnit test.
Although we classify this element as optional, it is included in most ADRs.}

<!-- This is an optional element. Feel free to remove. -->
## Pros and Cons of the Options

### {title of option 1}

<!-- This is an optional element. Feel free to remove. -->
{example | description | pointer to more information | …}

* Good: {argument a}
* Good: {argument b}
<!-- use "neutral" if the given argument weights neither for good nor bad -->
* Neutral: {argument c}
* Bad: {argument d}
*<!-- numbers of pros and cons can vary -->

### {title of other option}

{example | description | pointer to more information | …}

* Good: {argument a}
* Good: {argument b}
* Neutral: {argument c}
* Bad: {argument d}

<!-- This is an optional element. Feel free to remove. -->
## More Information

{You might want to provide additional evidence/confidence for the decision outcome here and/or
document the team agreement on the decision and/or
define when/how the decision should be realized and if/when it should be revisited.
Links to other decisions and resources might appear here as well.}
20 changes: 16 additions & 4 deletions package-lock.json


1 change: 1 addition & 0 deletions package.json
@@ -34,6 +34,7 @@
     "@nestjs/platform-express": "^9.0.0",
     "@nestjs/schedule": "^4.0.0",
     "@nestjs/terminus": "^10.1.1",
+    "dotenv": "^16.4.5",
     "fast-deep-equal": "^3.1.3",
     "id128": "^1.6.6",
     "mongodb": "^5.8.0",
36 changes: 36 additions & 0 deletions src/app.controller.spec.ts
@@ -0,0 +1,36 @@
import { createTestingModule, randomCode } from '../test/test.helper';
import { AppController } from './app.controller';
import { AppModule } from './app.module';
import sql from './db';
import { ImportService } from './domain/services/import.service';

describe('productupdate', () => {
it('should import message but not refresh products', async () => {
await createTestingModule([AppModule], async (app) => {
const importService = app.get(ImportService);
const importSpy = jest
.spyOn(importService, 'importWithFilter')
.mockImplementation();

const code1 = randomCode();
const updates = [
{
code: code1,
rev: 1,
},
];

const appController = app.get(AppController);
await appController.addProductUpdates(updates);

// Then the import is not called
expect(importSpy).not.toHaveBeenCalled();

// Update events are created
const events =
await sql`SELECT * FROM product_update_event WHERE message->>'code' = ${code1}`;

expect(events).toHaveLength(1);
});
});
});
20 changes: 19 additions & 1 deletion src/app.controller.ts
@@ -1,12 +1,16 @@
 import { Body, Controller, Get, Post, Query, All } from '@nestjs/common';
 import { ImportService } from './domain/services/import.service';
 import { QueryService } from './domain/services/query.service';
+import { RedisListener } from './domain/services/redis.listener';
+import { MessagesService } from './domain/services/messages.service';
 
 @Controller()
 export class AppController {
   constructor(
     private readonly importService: ImportService,
     private readonly queryService: QueryService,
+    private readonly redisListener: RedisListener,
+    private readonly messagesService: MessagesService,
   ) {}
 
   @Get('importfrommongo')
@@ -19,7 +23,10 @@ export class AppController {
 
   @Get('scheduledimportfrommongo')
   async scheduledImportFromMongo() {
-    await this.importService.scheduledImportFromMongo();
+    // Pause redis while doing a scheduled import
+    await this.redisListener.pauseAndRun(
+      this.importService.scheduledImportFromMongo,
+    );
   }
 
   parseBoolean(value) {
@@ -40,4 +47,15 @@ export class AppController {
   async select(@Body() body: any, @Query('obsolete') obsolete) {
     return await this.queryService.select(body, this.parseBoolean(obsolete));
   }
+
+  // Temporary code for initial import
+  messageId = 0;
+  @Post('productupdates')
+  async addProductUpdates(@Body() updates: any[]) {
+    const messages = [];
+    for (const update of updates) {
+      messages.push({ id: `0-${this.messageId++}`, message: update });
+    }
+    await this.messagesService.create(messages, true);
+  }
 }
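The synthetic stream-id scheme used by this temporary endpoint can be sketched standalone (illustrative only; sample codes are made up): each message gets id `0-<counter>` in arrival order.

```typescript
// Illustrative standalone sketch of the synthetic id scheme used by the
// temporary /productupdates endpoint: each message gets id "0-<counter>"
// in arrival order. The update payloads below are made-up samples.
let messageId = 0;
const updates = [
  { code: '0000000000001', rev: 1 },
  { code: '0000000000002', rev: 2 },
];
const messages = updates.map((update) => ({
  id: `0-${messageId++}`,
  message: update,
}));
console.log(messages.map((m) => m.id)); // [ '0-0', '0-1' ]
```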
2 changes: 2 additions & 0 deletions src/constants.ts
@@ -1 +1,3 @@
 export const SCHEMA = 'query';
+export const VIEW_USER = 'viewer';
+export const VIEW_PASSWORD = 'off';
1 change: 0 additions & 1 deletion src/db.ts
@@ -1,5 +1,4 @@
 import postgres from 'postgres';
-import { SCHEMA } from './constants';
 
 const sql = postgres({
   host: process.env.POSTGRES_HOST,
24 changes: 24 additions & 0 deletions src/domain/domain.module.spec.ts
@@ -0,0 +1,24 @@
import { createTestingModule } from '../../test/test.helper';
import { DomainModule } from './domain.module';
import { ImportService } from './services/import.service';
import { RedisListener } from './services/redis.listener';

describe('refreshProducts', () => {
it('should pause Redis while doing a scheduled reload', async () => {
await createTestingModule([DomainModule], async (app) => {
const importService = app.get(ImportService);
const redisListener = app.get(RedisListener);
jest.spyOn(importService, 'importFromMongo').mockImplementation();
const redisStopSpy = jest
.spyOn(redisListener, 'stopRedisConsumer')
.mockImplementation();
const redisStartSpy = jest
.spyOn(redisListener, 'startRedisConsumer')
.mockImplementation();

await app.get(DomainModule).refreshProducts();
expect(redisStopSpy).toHaveBeenCalledTimes(1);
expect(redisStartSpy).toHaveBeenCalledTimes(1);
});
});
});