Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Data being very unrealistic and not following some business rules #6

Open
petarjov opened this issue Oct 16, 2020 · 15 comments
Open

Data being very unrealistic and not following some business rules #6

petarjov opened this issue Oct 16, 2020 · 15 comments
Assignees

Comments

@petarjov
Copy link

petarjov commented Oct 16, 2020

En los data de AIMS y AMOS generados por el script hemos encontrado unos problemas que no deberían estar presentes, porque afectan los ETLs y ademas confunden mucho estudiantes cuando analizan los datos.

No creo que sera difícil arreglar estos problemas y lo limpiaría bastante los datos.

Puedes por favor mirarlo con mas urgencia porque ya hemos comenzado con las practicas.

Si ejecutas los siguientes queries, veras que están dando resultados bastante raros.

------------ AIMS
Primero en AIMS, las diferencia entre scheduled arrival y scheduled departure y también entre actual arrivail y actual departure no deberia ser tan grande. Aquí se trata de unos retrasos en nivel de minutos hasta 24 horas, pero no varios días.

select scheduledarrival, scheduleddeparture, scheduledarrival-scheduleddeparture from flights f where date_part('day', age(scheduledarrival, scheduleddeparture))>1 order by scheduledarrival-scheduleddeparture desc

select actualarrival, actualdeparture, actualarrival-actualdeparture from flights f where date_part('day', age(actualarrival, actualdeparture))>1 order by actualarrival-actualdeparture desc

------------ AMOS
En AMOS, en forecastedorders primero el executiondate debería estar dentro del periodo desde planned y deadline. Lo mismo pasa in technicallogbookorders en que executiondate debería estar dentro del periodo entre reportingDat y due.

select executiondate, deadline, planned, date_part('day', age(executiondate,deadline)), date_part('day', age(executiondate,planned)) from forecastedorders where not executiondate between planned and deadline order by case when abs(date_part('day', age(executiondate,deadline))) < abs(date_part('day', age(executiondate,planned))) then abs(date_part('day', age(executiondate,deadline))) else abs(date_part('day', age(executiondate,planned))) end desc;

select executiondate, reportingDate, due, date_part('day', age(due)), date_part('day', age(executiondate,reportingDate)) from technicallogbookorders where not executiondate between reportingDate and due order by case when abs(date_part('day', age(executiondate,due))) < abs(date_part('day', age(executiondate,reportingDate))) then abs(date_part('day', age(executiondate,due))) else abs(date_part('day', age(executiondate,reportingDate))) end desc;

Luego también en AMOS, y esto es el problema mas grave porque luego afecta tambien el ETL, uno de los business rules dice que cada workorders por un aircraft debería estar dentro de al menos un maintenance events (w.executiondate between starttime and starttime+duration).

select * from workorders w where not exists ( select * from maintenanceevents m2 where w.aircraftregistration = m2.aircraftregistration and w.executiondate between starttime and starttime+duration)

Muchas gracias!

Un saludo,
Petar

@petarjov
Copy link
Author

Sorry, for the last one, indeed some entries need to be erroneous, meaning that work order does not appear in the maintenance event time slot, but much less. It should be in the percentage of the bad entries as well actually. Right now, the majority is erroneous.

@diegoquintanav
Copy link
Owner

diegoquintanav commented Oct 16, 2020

could you include the output of the queries with a LIMIT 5? to have an idea of what you are getting

@petarjov
Copy link
Author

You can check the outputs of all 5 queries in the folder https://drive.google.com/drive/folders/1ljP8JFLWBqOFYCDk65yycGvUlp_EkEfg?usp=sharing

Please use your UPC account to access it. I sent them as well by email.

They are in the order I wrote them in the message.

Notice that for AMOS3 it is not that the output is worng, but that only a small part of the data should be wrong and without overlap between workorder and maintenance_event, and the majority should be correct, thus that workorders are inside the time slot of maintenance event.

@diegoquintanav
Copy link
Owner

1/3 done

@diegoquintanav
Copy link
Owner

working on the other two, I'll be pushing the changes soon

sorry for the wait :)

diegoquintanav added a commit that referenced this issue Oct 21, 2020
…s between scheduledarrival- scheduleddepartureand actualarrival-actualdeparture to something below one dayintroduces two testssee first issue in #6
diegoquintanav added a commit that referenced this issue Oct 21, 2020
also, adds tests

related to #6
diegoquintanav added a commit that referenced this issue Oct 21, 2020
diegoquintanav added a commit that referenced this issue Oct 21, 2020
@diegoquintanav
Copy link
Owner

2/3

@diegoquintanav diegoquintanav self-assigned this Oct 21, 2020
@diegoquintanav
Copy link
Owner

diegoquintanav commented Oct 22, 2020

Luego también en AMOS, y esto es el problema mas grave porque luego afecta tambien el ETL, uno de los business rules dice que cada workorders por un aircraft debería estar dentro de al menos un maintenance events (w.executiondate between starttime and starttime+duration).

@petarjov Estoy un poco confundido, ¿Cuál esta business rule? ¿A través de qué parámetros se une un workorder con un maintenance event? De momento tengo esto.

image

@petarjov
Copy link
Author

no tiene un foreign key explicito pero se une con aircraft registration y que el tiempo de ejecucion esta dentro del tiempo de maintenance event. es este linea roja en el diagrama.
Lo importante es que un work order partenece a un maintenence event, es decir para el mismo aircraft registration executiondate tiene que estar dentro del intervalo starttime y starttime+duration de un maintenence event.

Tienes razon, quizas no esta explicitamente escrito en los business rules pero es la linea roja en el diagrama que deberia estar cumplida por los datos. Bueno un porcentaje (bad) puede estar erroneo, pero en los datos generados es al reves, casi 90% esta erroneo.

@diegoquintanav
Copy link
Owner

In other words,

image

I have these dependencies between entities at the moment. I can trace back a maintenance event from a work order because they share a workpackageid on one side and executiondate and executionplace on the other, but that's it, a link by these columns is very weak. I think I'm missing something here.

Sorry I changed back to english, I edited the message and now I realize I started it in spanish

@petarjov
Copy link
Author

There is no reference from maintenance event to work package actually. Where does it come from?
Maintenance event can include several work orders and each work order is inside one maintenance event. However, this reference is not explicit. It is represented as I said above by the same aircraft registration and the fact that the execution date of the work order is inside the time interval of the maintenance event (startdate, startdate + duration).

@diegoquintanav
Copy link
Owner

diegoquintanav commented Oct 22, 2020

alright, I'll give it a try again. Just to make sure, considering that prob_bad = 0.2, that means that from the query

select
	*
from
	"AMOS".workorders w
where
	not exists (
	select
		*
	from
		"AMOS".maintenanceevents m2
	where
		w.aircraftregistration = m2.aircraftregistration
		and w.executiondate between starttime and starttime + duration
		)

I should get 20 rows out of, say, 100 being inserted from the generator right? in the same way, if prob_bad = prob_noisy = 0, then this query should return nothing?

@petarjov
Copy link
Author

Exactly!

@petarjov
Copy link
Author

Right now if you do count(*) 96429 out of 100000 are violating this relation.

@diegoquintanav
Copy link
Owner

There is no reference from maintenance event to work package actually. Where does it come from?

I may need to check my notes again, but that was the way I was "linking" a Maintenance event and a workorder, behind the curtains.

diegoquintanav added a commit that referenced this issue Oct 22, 2020
diegoquintanav added a commit that referenced this issue Oct 27, 2020
diegoquintanav added a commit that referenced this issue Oct 27, 2020
I also updated the business rules to reference the changes individually

pending workpackages and attachments

wip in issue #6
@diegoquintanav
Copy link
Owner

image

Baby steps

diegoquintanav added a commit that referenced this issue Nov 4, 2020
this relates to #6, but we need to confirm change is correct yet.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants