Battery usage rate & recharge detection #127
Comments
@unruledboy For the partitioning aspect, you could use the Grouping functions as you suggest, but this has limitations, e.g., all devices must abide by the same singular timeline, such that all events must be ordered even across devices. Alternatively, you can use a partitioned query, where each ingressed event is a PartitionedStreamEvent, partitioned by the device Guid. Trill will then execute the query on each device independently, and each device may be on its own timeline, so as long as each device's sub-stream is ordered, things will still work as expected. For detecting the cycles, it depends a bit on how you want to consume your output, since the two pieces of information have very different formats. You could just have two output streams based on the same (multicast) input stream - one for the recharge cycles and one for the usage/min. E.g., to output a single event on every transition (charging -> draining or draining -> charging), you could set up a simple state machine using our Regex/Afa APIs that tracks the current cycle's start time/battery level in the accumulation register.
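To make the state-machine idea concrete independent of Trill, here is a minimal plain C# sketch (C# 9 records) that walks one device's time-ordered readings and emits a cycle on every direction change. It does not use Trill's Regex/Afa APIs. `BatteryReading`, the fields of `BatteryCycleInfo`, and `CycleDetector` are assumptions made for illustration; the `start` variable plays the role of the register.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical shapes: the thread names BatteryCycleInfo but does not show its fields.
public record BatteryReading(long Time, int Percent);
public record BatteryCycleInfo(bool Charging, long StartTime, long EndTime, int StartPercent, int EndPercent);

public static class CycleDetector
{
    // Emits one BatteryCycleInfo each time the direction flips
    // (charging -> draining or draining -> charging).
    // The cycle still in progress at the end of the input is not emitted.
    public static IEnumerable<BatteryCycleInfo> Detect(IEnumerable<BatteryReading> readings)
    {
        using var e = readings.GetEnumerator();
        if (!e.MoveNext()) yield break;

        var start = e.Current;   // first reading of the current cycle (the "register")
        var prev = start;        // most recent reading seen
        bool? charging = null;   // unknown until the level first changes

        while (e.MoveNext())
        {
            var r = e.Current;
            if (r.Percent != prev.Percent)
            {
                bool nowCharging = r.Percent > prev.Percent;
                if (charging.HasValue && charging.Value != nowCharging)
                {
                    // Direction flipped: the previous reading closes the old cycle
                    // and becomes the start of the new one.
                    yield return new BatteryCycleInfo(
                        charging.Value, start.Time, prev.Time, start.Percent, prev.Percent);
                    start = prev;
                }
                charging = nowCharging;
            }
            prev = r;
        }
    }
}
```

In a partitioned setup, this logic would run once per device, which is why only each device's own sub-stream needs to be ordered.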
@peterfreiling You are right about the two different pieces of business logic. I was thinking I could use two different queries, one for each, but then it would double the resources and potentially halve the performance, right? I think the most challenging part is how to "track" the state transition without false positive detections caused by overlapping cycles.
I'm not sure what you mean by "overlapping cycles". Each device should either be charging or draining at any given point, right? E.g., here's a quick and dirty sample that uses the pattern matching APIs to set up a simple state machine to track the battery cycle information, using BatteryCycleInfo as the register to accumulate state, and egresses each cycle information (starting/ending battery, start/end time) as soon as the device switches from charging->draining or draining->charging.
@peterfreiling this sample works like a treat! great thanks for the detailed explanation! one last thing, the output won't flush until we call I also read through all the tests for similar usage but could not find the answer. |
The OnCompleted call was just for test purposes. You have many other options to control the Flush behavior:
@peterfreiling I investigated the FlushPolicy before I asked above. I could have created a periodic punctuation (e.g., every 5 minutes) and experimented with it to verify that it behaves as I expect. I just ran that test, and it seems to process the events correctly without losing the state. For the buffer size, do you mean maxDuration (currently 100)? I tested with hundreds of inputs but got no output. BTW, I noticed if I don't use the artificial timestamp values (100, 101, 102), but instead using Thanks,
By buffer size, I was referring to Config.DataBatchSize, not maxDuration. If FlushPolicy.None is specified, then each Trill operator will not flush its output batch until it is full with DataBatchSize events.
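As a rough mental model of that behavior, the toy class below buffers events and only hands them downstream when the batch is full, unless it is configured to flush on punctuation. This is a sketch for illustration, not Trill's implementation: `OutputBatch`, `OnEvent`, and `OnPunctuation` are hypothetical names, `batchSize` stands in for Config.DataBatchSize, and `flushOnPunctuation: false` stands in for FlushPolicy.None.

```csharp
using System;
using System.Collections.Generic;

// Toy model of a single operator's output batch (NOT Trill's implementation).
public sealed class OutputBatch<T>
{
    private readonly int batchSize;            // stands in for Config.DataBatchSize
    private readonly bool flushOnPunctuation;  // false stands in for FlushPolicy.None
    private readonly Action<IReadOnlyList<T>> downstream;
    private readonly List<T> pending = new List<T>();

    public OutputBatch(int batchSize, bool flushOnPunctuation, Action<IReadOnlyList<T>> downstream)
    {
        this.batchSize = batchSize;
        this.flushOnPunctuation = flushOnPunctuation;
        this.downstream = downstream;
    }

    public void OnEvent(T item)
    {
        pending.Add(item);
        if (pending.Count >= batchSize) Flush();  // a full batch always moves downstream
    }

    public void OnPunctuation()
    {
        if (flushOnPunctuation) Flush();          // otherwise events wait for a full batch
    }

    private void Flush()
    {
        if (pending.Count == 0) return;
        downstream(pending.ToArray());            // hand over a copy, then reuse the buffer
        pending.Clear();
    }
}
```

With a batch size of 80K and no flush on punctuation, a few hundred test events would sit in `pending` indefinitely, which matches the "no output" symptom described above.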
Now that I have it working with a periodic FlushPolicy, I really appreciate all the detailed explanations and patient help. In our business, we don't just capture and process the battery percentage; we also process heaps of other types of data. So even when the incoming data structure is the same, we still have to create dedicated ingress/egress stream processors, because the patterns differ and are backed by different state machines. I was wondering what the best practice is for this scenario, as the business needs to process hundreds if not thousands of types of data streams, with tens of thousands of devices each, to produce the corresponding output events. Currently I just fire a new Task per pattern and hook its output up to the downstream handling logic (sending notifications/emails etc.). Is there a guide to this kind of parallel processing for Trill? I also noticed that Trill is a single-node solution, so I can partition the data per group of devices per node; I just want to know what the recommendation is. BTW, the default Config.DataBatchSize is 80K, which is big, and it is a singleton, which means all processors (queries) share the same setting. In our case, different business logic would need different buffer sizes, so changing it in one app instance would not be ideal, I believe? Thanks, great work! Yours,
For parallelization, it is generally left to the consumer. I think ideally you would partition the input source, e.g. by device id, and have separate (possibly otherwise identical) parallel Trill pipelines processing each partition on a separate thread/process. If the actual input source cannot be partitioned, you can have a dedicated Trill pipeline to perform that partitioning. Since each Trill pipeline will have multiple output sources/formats, you can use Multicast so that each query fork shares as much as possible with the other forks. E.g., all forks are likely able to share the ingressed input, so after ingress, you can call Multicast to send that single input stream to multiple output streams. For the Config.DataBatchSize, it is unfortunately currently configured as a global static, so all Trill pipelines in a given process will share the same batch size. You can work around this by isolating Trill instances to different processes for different batch sizes.
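One way to do the consumer-side partitioning by device id is sketched below in plain C#. It only shows how events could be routed to a fixed number of pipelines; how each pipeline is then built is left out. `DevicePartitioner`, `PartitionOf`, and `Split` are hypothetical helper names, not part of Trill.

```csharp
using System;
using System.Collections.Generic;

public static class DevicePartitioner
{
    // Maps a device Guid to one of partitionCount pipelines.
    // The same device always maps to the same index.
    public static int PartitionOf(Guid deviceId, int partitionCount)
    {
        if (partitionCount <= 0) throw new ArgumentOutOfRangeException(nameof(partitionCount));
        // Derive the index from the Guid's raw bytes so it depends only on the Guid value.
        uint h = BitConverter.ToUInt32(deviceId.ToByteArray(), 0);
        return (int)(h % (uint)partitionCount);
    }

    // Splits events into per-pipeline lists. Each device's events stay in their
    // original relative order because they all land in the same list.
    public static List<T>[] Split<T>(IEnumerable<T> events, Func<T, Guid> deviceOf, int partitionCount)
    {
        var parts = new List<T>[partitionCount];
        for (int i = 0; i < partitionCount; i++) parts[i] = new List<T>();
        foreach (var e in events) parts[PartitionOf(deviceOf(e), partitionCount)].Add(e);
        return parts;
    }
}
```

Each resulting list (or, in a live system, each per-partition queue) could then feed its own pipeline on a separate thread or process.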
@peterfreiling thank you very much for all the detailed explanations and the great working sample. It has been running solidly so far in our development environment without partitioning. However, based on your test case code, if I use PartitionStreamEvent: And use punctuation:
I read through all the samples; only one uses a partitioned stream (the rule engine one), but it does not use any punctuation to flush, and it is not using And I read through all the issues (including the closed ones); a few relate to partitioning, but none mentions the actual usage of flush for a partitioned stream.
Thanks @unruledboy, there is a bug in our partitioned pattern matching code where matches aren't detected properly. @rodrigoaatmicrosoft has a fix and will send a PR for the issue soon.
@peterfreiling that's great news that there is a fix for it; I will certainly test it out once it is available. Also, as you mentioned, if the incoming events are a streamable, we can use Multicast when the ingested values can be shared between the different pieces of business logic. But with an implementation like the one we have discussed and as in your sample, we use a pub/sub-like mechanism, so I believe we can't use Multicast.
The sample provided is compatible with Multicast, but it depends on your query logic, and how much of the query pipeline is shared between the different forks. From the IStreamable that you want to share between the multiple subqueries, you can call Multicast to get an array of IStreamables to be used in the forks. For example, if you wanted to fork immediately after ingress for two subqueries, one to detect cycles and another to track the average battery percentage, you could do something like:

// Fork the ingressed stream into two identical streams
var subQueries = ingress.Multicast(2);

// Fork 1: detect charge/drain cycles with the pattern-matching state machine
var cycleQuery = subQueries[0]
    .Detect(pattern, maxDuration: 100, allowOverlappingInstances);
var output = new List<BatteryCycleInfo>();
qc.RegisterOutput(cycleQuery)
    .Where(e => e.IsData)
    .ForEachAsync(e => output.Add(e.Payload));

// Fork 2: average battery percentage over tumbling windows of 5 time units
var averageQuery = subQueries[1]
    .TumblingWindowLifetime(tumbleDuration: 5)
    .Average(batteryPercent => batteryPercent);
var averagesOutput = new List<double>();
qc.RegisterOutput(averageQuery)
    .Where(e => e.IsData)
    .ForEachAsync(e => averagesOutput.Add(e.Payload));

// Create the process for all registered inputs and outputs
var process = qc.Restore();
@peterfreiling aha, now I get how to do Multicast with this pattern. You are right, this is a better way to deal with multiple pieces of logic sharing the same set of data.
@peterfreiling for the Multicast, I ended up with this for min/max/avg values per business logic per minute per device. The beauty of using Trill is that I can work out the aggregate values in real time, without traditional periodic scheduling logic running in the background.
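For readers less familiar with tumbling windows, here is a plain LINQ sketch of what a per-minute, per-device min/max/avg computes over a finite list of samples. It is a batch illustration, not a Trill query: `Sample`, `MinuteStats`, and `MinuteAggregates` are hypothetical names, and timestamps are assumed to be non-negative seconds.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical shapes for illustration only.
public record Sample(Guid DeviceId, long TimeSeconds, double Percent);
public record MinuteStats(Guid DeviceId, long WindowStart, double Min, double Max, double Avg);

public static class MinuteAggregates
{
    // Groups samples into 60-second tumbling windows per device
    // and computes min/max/avg for each (device, window) pair.
    public static List<MinuteStats> Compute(IEnumerable<Sample> samples) =>
        samples
            .GroupBy(s => (s.DeviceId, WindowStart: s.TimeSeconds - s.TimeSeconds % 60))
            .Select(g => new MinuteStats(
                g.Key.DeviceId,
                g.Key.WindowStart,
                g.Min(s => s.Percent),
                g.Max(s => s.Percent),
                g.Average(s => s.Percent)))
            .OrderBy(m => m.DeviceId)
            .ThenBy(m => m.WindowStart)
            .ToList();
}
```

A streaming engine produces the same per-window values incrementally as events arrive, which is what removes the need for a scheduled background job.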
Hi All,
This is an awesome project. I have been following and learning it for quite a while, and finally have the perfect candidate for actual use.
We have a desktop app that sits on lots of clients' machines, observing the battery usage of the laptops.
We know that:
The data comes in this structure:
Now, what we need to do is:
The tricky parts here are:
The picture below illustrates the typical usage of the battery of ONE laptop: