-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Data being very unrealistic and not following some business rules #6
Comments
Sorry, for the last one, indeed some entries need to be erroneous, meaning that work order does not appear in the maintenance event time slot, but much less. It should be in the percentage of the bad entries as well actually. Right now, the majority is erroneous. |
could you include the output of the queries with a |
You can check the outputs of all 5 queries in the folder https://drive.google.com/drive/folders/1ljP8JFLWBqOFYCDk65yycGvUlp_EkEfg?usp=sharing Please use your UPC account to access it. I sent them as well by email. They are in the order I wrote them in the message. Notice that for AMOS3 it is not that the output is worng, but that only a small part of the data should be wrong and without overlap between workorder and maintenance_event, and the majority should be correct, thus that workorders are inside the time slot of maintenance event. |
1/3 done |
working on the other two, I'll be pushing the changes soon sorry for the wait :) |
…s between scheduledarrival- scheduleddepartureand actualarrival-actualdeparture to something below one dayintroduces two testssee first issue in #6
also, adds tests related to #6
2/3 |
@petarjov Estoy un poco confundido, ¿Cuál esta business rule? ¿A través de qué parámetros se une un workorder con un maintenance event? De momento tengo esto. |
no tiene un foreign key explicito pero se une con aircraft registration y que el tiempo de ejecucion esta dentro del tiempo de maintenance event. es este linea roja en el diagrama. Tienes razon, quizas no esta explicitamente escrito en los business rules pero es la linea roja en el diagrama que deberia estar cumplida por los datos. Bueno un porcentaje (bad) puede estar erroneo, pero en los datos generados es al reves, casi 90% esta erroneo. |
In other words, I have these dependencies between entities at the moment. I can trace back a maintenance event from a work order because they share a workpackageid on one side and executiondate and executionplace on the other, but that's it, a link by these columns is very weak. I think I'm missing something here. Sorry I changed back to english, I edited the message and now I realize I started it in spanish |
There is no reference from maintenance event to work package actually. Where does it come from? |
alright, I'll give it a try again. Just to make sure, considering that prob_bad = 0.2, that means that from the query select
*
from
"AMOS".workorders w
where
not exists (
select
*
from
"AMOS".maintenanceevents m2
where
w.aircraftregistration = m2.aircraftregistration
and w.executiondate between starttime and starttime + duration
) I should get 20 rows out of, say, 100 being inserted from the generator right? in the same way, if prob_bad = prob_noisy = 0, then this query should return nothing? |
Exactly! |
Right now if you do count(*) 96429 out of 100000 are violating this relation. |
I may need to check my notes again, but that was the way I was "linking" a Maintenance event and a workorder, behind the curtains. |
the id was all wrong related to #6
I also updated the business rules to reference the changes individually pending workpackages and attachments wip in issue #6
this relates to #6, but we need to confirm change is correct yet.
En los data de AIMS y AMOS generados por el script hemos encontrado unos problemas que no deberían estar presentes, porque afectan los ETLs y ademas confunden mucho estudiantes cuando analizan los datos.
No creo que sera difícil arreglar estos problemas y lo limpiaría bastante los datos.
Puedes por favor mirarlo con mas urgencia porque ya hemos comenzado con las practicas.
Si ejecutas los siguientes queries, veras que están dando resultados bastante raros.
------------ AIMS
Primero en AIMS, las diferencia entre scheduled arrival y scheduled departure y también entre actual arrivail y actual departure no deberia ser tan grande. Aquí se trata de unos retrasos en nivel de minutos hasta 24 horas, pero no varios días.
select scheduledarrival, scheduleddeparture, scheduledarrival-scheduleddeparture from flights f where date_part('day', age(scheduledarrival, scheduleddeparture))>1 order by scheduledarrival-scheduleddeparture desc
select actualarrival, actualdeparture, actualarrival-actualdeparture from flights f where date_part('day', age(actualarrival, actualdeparture))>1 order by actualarrival-actualdeparture desc
------------ AMOS
En AMOS, en forecastedorders primero el executiondate debería estar dentro del periodo desde planned y deadline. Lo mismo pasa in technicallogbookorders en que executiondate debería estar dentro del periodo entre reportingDat y due.
select executiondate, deadline, planned, date_part('day', age(executiondate,deadline)), date_part('day', age(executiondate,planned)) from forecastedorders where not executiondate between planned and deadline order by case when abs(date_part('day', age(executiondate,deadline))) < abs(date_part('day', age(executiondate,planned))) then abs(date_part('day', age(executiondate,deadline))) else abs(date_part('day', age(executiondate,planned))) end desc;
select executiondate, reportingDate, due, date_part('day', age(due)), date_part('day', age(executiondate,reportingDate)) from technicallogbookorders where not executiondate between reportingDate and due order by case when abs(date_part('day', age(executiondate,due))) < abs(date_part('day', age(executiondate,reportingDate))) then abs(date_part('day', age(executiondate,due))) else abs(date_part('day', age(executiondate,reportingDate))) end desc;
Luego también en AMOS, y esto es el problema mas grave porque luego afecta tambien el ETL, uno de los business rules dice que cada workorders por un aircraft debería estar dentro de al menos un maintenance events (w.executiondate between starttime and starttime+duration).
select * from workorders w where not exists ( select * from maintenanceevents m2 where w.aircraftregistration = m2.aircraftregistration and w.executiondate between starttime and starttime+duration)
Muchas gracias!
Un saludo,
Petar
The text was updated successfully, but these errors were encountered: