Processing Big Data with Titanoboa
Thanks to its distributed nature, titanoboa is well suited to Big Data processing.
You can also fine-tune how performant and robust your Big Data processing will be through your job channel configuration: if you are using a job channel that is robust and highly available, so will be your Big Data processing.
If, on the other hand, you are using a job channel that does not persist messages, your setup will probably be more performant (but less robust). And of course you can combine these two approaches: it is perfectly possible to use multiple job channels and core systems in one titanoboa server.
Ultimately, if you use an SQS queue as a job channel, your processing can scale virtually without limit while remaining highly robust: your titanoboa servers can be located across multiple regions and availability zones!
But let's not get ahead of ourselves; let's start from the beginning:
There are two workflow step supertypes that are designed specifically for processing large(r) datasets:
- :map - based on a sequence returned by this step's workload function, many separate atomic jobs are created
- :reduce - performs a reduce function over the results returned by the jobs triggered by a map step
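
To make the two supertypes more concrete, here is a minimal sketch of the semantics described above, written in plain Python. It does not use titanoboa's API: the names `run_map_step`, `run_job`, `run_reduce_step` and `split_lines` are hypothetical, the shape of a "job" is an assumption made for illustration, and the jobs run sequentially in one process, whereas titanoboa would hand them out through a job channel.

```python
from functools import reduce


def run_map_step(workload_fn, properties):
    """Call the map step's workload function and create one
    separate job for every item of the sequence it returns."""
    items = workload_fn(properties)
    return [{"job_id": i, "properties": item} for i, item in enumerate(items)]


def run_job(job):
    """Hypothetical per-job work: count the words in the job's line."""
    return len(job["properties"]["line"].split())


def run_reduce_step(reduce_fn, results, initial):
    """Fold the results returned by the jobs that the map step triggered."""
    return reduce(reduce_fn, results, initial)


def split_lines(properties):
    """Hypothetical map workload function: one item per line of text."""
    return [{"line": line} for line in properties["text"].splitlines()]


# Map: three lines of input become three independent jobs.
jobs = run_map_step(split_lines, {"text": "a b c\nd e\nf"})

# Each job is processed on its own (sequentially here, for simplicity).
results = [run_job(job) for job in jobs]

# Reduce: combine the per-job results into a single value.
total = run_reduce_step(lambda acc, n: acc + n, results, 0)
```

The point of the sketch is the division of labour: the map step only decides how the dataset is split, each job is atomic and independent of the others, and the reduce step sees nothing but the jobs' results.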