- Discover new topics
- Register new workers (nodes)
- Ask workers about load, memory usage, and network usage
- Use the node info to decide where to send a new job
- Supervise workers
- Save configuration describing how to partition the data coming from Kafka. For example, if a message has a date field, messages can be partitioned by hour or by a directory layout such as /date/client/product
- REST API to configure a topic (e.g. MaxMessageInFlight, partitioning strategy)
- Dynamically change configuration: stop a worker and restart it with the new configuration from its current offset
- Should notify when the cluster is under heavy load
- Should launch new instances if needed
- Should shut down instances when possible
- Should be able to register a worker for a specific topic (for example, if a topic is already being sent to S3, we should be able to register another node for the same topic so that worker can save it to Cassandra)
- The master should be able to recover from a crash
- Where to save the configuration? A SQLite database on every master, duplicated?
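The hour/directory partitioning mentioned above could be sketched as follows. This is a minimal illustration, not the project's actual code: the function name and message field names (`date`, `client`, `product`) are hypothetical, with only the /date/client/product layout taken from the example.

```python
from datetime import datetime

def partition_path(message: dict) -> str:
    """Build a directory-style partition path like /date/client/product.

    Field names here are hypothetical; only the layout comes from the
    design notes. The date field gives an hour-level partition.
    """
    ts = datetime.fromisoformat(message["date"])
    date_part = ts.strftime("%Y-%m-%d/%H")  # partition by hour
    return f"/{date_part}/{message['client']}/{message['product']}"

# Example:
# partition_path({"date": "2015-03-01T14:30:00",
#                 "client": "acme", "product": "widget"})
# -> "/2015-03-01/14/acme/widget"
```

The master would store a per-topic rule like this and hand it to whichever worker consumes the topic.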
- Should be able to receive the messages from a topic and save them to S3 (future: Cassandra, Elasticsearch)
- Should receive from the master how to partition the data and how to commit
- Should be able to tell the master that the lag is getting higher, so the master can try to set up a new node
- Order files by some attribute
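The lag check a worker might run before alerting the master could look like this. A minimal sketch: the function name and the threshold are assumptions, not the project's actual API.

```python
def should_alert_master(latest_offset: int, committed_offset: int,
                        max_lag: int = 10_000) -> bool:
    """Return True when consumer lag exceeds a threshold, signalling
    the master to consider spinning up a new node.

    Lag is the distance between the newest offset in the partition and
    the worker's last committed offset. The 10,000 default is arbitrary.
    """
    lag = latest_offset - committed_offset
    return lag > max_lag
```

In practice the worker would run this periodically per partition and send the master a message (rather than a boolean) so the master can weigh lag across the whole cluster.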
- Secor
- Bifrost
- Camus
Version 0.1 -> master finds topics, registers workers, asks workers to process a topic, saves files of size X, compresses them, and sends them to S3
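The version 0.1 flow (buffer messages up to size X, compress, ship to S3) could be sketched like this. The uploader is stubbed out as a callback and every name is hypothetical; a real implementation would pass in an S3 client.

```python
import gzip
from typing import Callable, List

class FileBuffer:
    """Accumulate messages until a byte threshold, then gzip the batch
    and hand the compressed blob to an upload callback (e.g. S3)."""

    def __init__(self, max_bytes: int, upload: Callable[[bytes], None]):
        self.max_bytes = max_bytes
        self.upload = upload
        self.chunks: List[bytes] = []
        self.size = 0

    def add(self, message: bytes) -> None:
        self.chunks.append(message)
        self.size += len(message)
        if self.size >= self.max_bytes:  # reached size X: flush
            self.flush()

    def flush(self) -> None:
        if not self.chunks:
            return
        blob = gzip.compress(b"\n".join(self.chunks))
        self.upload(blob)
        self.chunks, self.size = [], 0
```

Committing the Kafka offset only after a successful flush would keep the pipeline at-least-once: a crash between flushes replays the uncommitted messages.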
- start Master
- start Worker
- start Singleton
In SBT, just run docker:publishLocal to build a local Docker image.
To launch the first node, which will be the seed node:
$ docker run -i -t --rm --name seed kuhnen/processor:0.1
To add a member to the cluster:
$ docker run --rm --name c1 --link seed:seed -i -t kuhnen/processor:0.1