I've been thinking that the core Nitro API could be structured to be more pipeline-oriented. Previously, we were discussing what information backends should extract and what their responsibilities should be. My understanding is that we want Nitro to be a flexible framework for all kinds of different use cases. Whatever we decide analysis backends should do is likely to be wrong for someone's use case. In the current architecture, backends are special in that they are automatically created by the Nitro class (when analysis is enabled).
What if the core Nitro API were structured more like a pipeline, where events pass through a series of extraction steps and each extractor can alter or extend the events with more information? Nitro's patched KVM could act as one source for the pipeline, but there could be other sources, like mock events for testing purposes or other mechanisms for extracting system calls from VMs. The API could let the user define a processing pipeline containing just the functionality they require. If they did not need to know, let's say, the process associated with a particular event, they would simply leave that extractor out of the pipeline.
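To make the shape of this concrete, here is a minimal sketch in Python of what the core abstractions could look like. All names here (Pipeline, MockSource, the events()/process() protocol) are hypothetical and only illustrate the idea, not existing Nitro code:

```python
class MockSource:
    """Stand-in event source, e.g. for tests; the patched KVM would
    just be another object exposing an events() generator."""

    def __init__(self, events):
        self._events = events

    def events(self):
        yield from self._events


class Pipeline:
    """Pulls events from a source and threads each one through the
    configured extraction steps in order."""

    def __init__(self, source, *steps):
        self.source = source
        self.steps = steps

    def run(self):
        for event in self.source.events():
            for step in self.steps:
                event = step.process(event)
            yield event
```

With this shape, the patched KVM is just one events() implementation among others, and the extraction steps can be exercised without a running VM.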
In this architecture, some of the processing steps could of course depend on other pipeline components having run before them. For example, a step extracting detailed user information associated with a process could depend on a step that extracted process descriptors. In a more statically typed language you could probably even represent these requirements in the type signatures of the pipeline steps, but even without that benefit I think this would make the internal architecture cleaner.
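Even without static types, the dependencies could be made explicit and checked when the pipeline is built. A hypothetical sketch, assuming each step declares requires/provides sets (these attribute names are invented for illustration):

```python
class ProcessExtractor:
    requires = set()
    provides = {"process"}

    def process(self, event):
        event["process"] = "explorer.exe"  # placeholder for a real lookup
        return event


class UserInfoExtractor:
    requires = {"process"}  # must run after something providing "process"
    provides = {"user"}

    def process(self, event):
        event["user"] = "owner-of-" + event["process"]  # placeholder
        return event


def validate(steps):
    """Raise if a step's requirements are not met by earlier steps."""
    available = set()
    for step in steps:
        missing = step.requires - available
        if missing:
            raise ValueError(
                f"{type(step).__name__} is missing {sorted(missing)}")
        available |= step.provides
```

This would catch a mis-ordered pipeline at construction time rather than with a KeyError halfway through a capture.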
This would not necessarily be a huge change, more like a cosmetic change to the user-facing API. Currently, the user of the framework initializes a Nitro object, which automatically brings in a backend if analysis is enabled. What I envision is an API where the user defines a pipeline (I want this event source, then I want to chain the events into this extractor, then feed its output to this other thing, etc.) and starts it. Each step in the pipeline processes events from the previous source and emits new events or produces side effects. At the end of the pipeline we could have sinks that dump the produced events into a database, JSON, or similar. Basically, this would put other analysis steps on a more equal footing with the backends, as the current backends would become "just another step in the analysis pipeline". If we wanted to, we could of course go even further and split the current backend code into smaller pieces that each extract an individual bit of information from events.
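Building on the Pipeline sketch above, the user-facing setup could read roughly like this; KVMSource and JSONSink are assumed names for illustration, not existing Nitro classes:

```python
import json


class JSONSink:
    """Terminal step: dumps each event as one JSON line."""

    def __init__(self, fh):
        self.fh = fh

    def process(self, event):
        self.fh.write(json.dumps(event) + "\n")
        return event


# Hypothetical usage; swap KVMSource for MockSource([...]) in tests.
# pipeline = Pipeline(
#     KVMSource("my-vm"),
#     ProcessExtractor(),
#     UserInfoExtractor(),
#     JSONSink(open("events.jsonl", "w")),
# )
# for _ in pipeline.run():
#     pass
```

The sink is just another step with a side effect, which is what puts backends and other analysis code on the same footing.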
This change wouldn't necessarily enable anything the user cannot already do. However, it would free us from worrying about questions like what the backend should do, since the user of the framework could construct their pipeline however they like and pick whatever functionality they need. Additionally, I think this architecture would be a bit cleaner, as backends would be no different from other analysis steps.
I do not have a patch demonstrating this idea right now; I opened this issue mostly to facilitate discussion. Do you think changing the user-facing API this way would be sensible?
I don't really mind changing the user API, as long as the modification is genuinely valuable.
I understand your point of view about the pipeline, and it would give more flexibility than our so-called backends.
It would make development easier for us, let us break the semantic translation layer into smaller pieces of code, and facilitate unit testing, as you stated.
However, this is a huge change in the codebase, and I would rather stabilize what we are already trying to achieve with Nitro than do a big refactoring for the sake of engineering ;)