\chapter{Overall Description}
\section{Product perspective}
\subsection{System interfaces}
The interfaces of the bookkeeping system with the rest of the world are:
\begin{itemize}
\item a graphical user interface (GUI); the system offers different GUIs for different types of users,
\item a REST API, for which the system offers several bindings.
\end{itemize}
\subsection{User interfaces}
The bookkeeping system offers user interfaces for several types of users:
\begin{itemize}
\item shifters
\item physicists
\item (subsystem) run coordinators
\item etc.
\end{itemize}
The user interface will adapt to the needs of the user.
\subsection{Hardware interfaces}
The bookkeeping system has no specific hardware needs beyond the normal hardware requirements of an administrative system. The client must be able to run on the computers at Point 2.
\subsection{Software interfaces}
The bookkeeping system interacts with several systems already in place or to be developed in the future. For this interaction the bookkeeping system should have facilities such as an API and several bindings.
\subsubsection{Database}
The choice of database has not yet been made. Currently, two databases are in use:
\begin{itemize}
\item PostgreSQL
\item MySQL
\end{itemize}
The choice of database is not limited to these two. Given the highly relational nature of the data, the development team will probably propose a relational database.
\subsection{Communications interfaces}
No special communication protocol is needed: the bookkeeping system should provide a REST API over HTTP. A message queue is needed to decouple the execution of the $O^2$ workflows from any failure in the $O^2$ bookkeeping backend. Furthermore, the bookkeeping system should be able to function while disconnected from the internet.
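The decoupling requirement can be illustrated with a minimal Python sketch. It assumes a hypothetical class \texttt{BookkeepingEventBuffer} that keeps events in memory as a stand-in for the message queue; a real deployment would use an actual message broker with persistent storage, which is not modelled here. A workflow only calls \texttt{publish}, which never fails because of the backend, while \texttt{forward} delivers queued events in order and keeps undelivered ones for a later attempt.

```python
# Minimal sketch (hypothetical names): an in-memory stand-in for the
# message queue that decouples O2 workflows from bookkeeping-backend failures.
from collections import deque
from typing import Callable, Deque


class BookkeepingEventBuffer:
    """Buffers events so that workflows never block on the backend."""

    def __init__(self, send: Callable[[dict], None]) -> None:
        self._events: Deque[dict] = deque()
        self._send = send  # delivers one event to the bookkeeping backend

    def publish(self, event: dict) -> None:
        """Called by a workflow: enqueue the event and return immediately."""
        self._events.append(event)

    def forward(self) -> int:
        """Deliver queued events in order; stop at the first backend failure.

        Undelivered events stay queued for the next attempt.
        Returns the number of events delivered in this call.
        """
        delivered = 0
        while self._events:
            try:
                self._send(self._events[0])
            except ConnectionError:
                break  # backend unreachable: keep the event and retry later
            self._events.popleft()
            delivered += 1
        return delivered

    def pending(self) -> int:
        """Number of events still waiting for delivery."""
        return len(self._events)


if __name__ == "__main__":
    backend = {"up": False}
    received = []

    def send(event: dict) -> None:
        if not backend["up"]:
            raise ConnectionError("bookkeeping backend unreachable")
        received.append(event)

    buffer = BookkeepingEventBuffer(send)
    buffer.publish({"run": 1, "type": "start"})
    buffer.publish({"run": 1, "type": "end"})
    print(buffer.forward(), buffer.pending())  # 0 2 (workflow unaffected)
    backend["up"] = True
    print(buffer.forward(), buffer.pending())  # 2 0
```

Because an event is only removed once the backend has accepted it, a backend outage delays the bookkeeping records but does not stop the workflow. Surviving a restart of the buffering process itself would require the persistence of a real broker.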
\subsection{Memory}
The bookkeeping system has no special requirements concerning use of memory.
\subsection{Operations}
Each type of user has specific operations for which he or she uses the system. A shifter uses it to create log entries, a member of the Data Preparation Group (DPG) uses it to flag runs according to their quality, etc.
\subsection{Site adaptation requirements}
The bookkeeping system will be used at one particular site, namely Point 2, where ALICE is located.
\section{Product functions}
\subsection{GUI}
The GUI of the bookkeeping system should be responsive across several platforms, such as mobile devices, laptops and touch-screen devices.
\begin{enumerate}
\item An idea would be to have different looks\footnote{From: Grazia Luparello}:
\begin{itemize}
\item shifter look, which can be simple,
\item expert look,
\item another look.
\end{itemize}
\end{enumerate}
\subsection{Run Conditions and Statistics}
The bookkeeping system allows users to view run conditions and statistics in either a tabular or a detailed view.
\subsection{Data Mining}
The bookkeeping system allows users to view aggregated information (e.g. per detector).
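As an illustration of aggregation per detector, the following Python sketch sums the recorded events of every run in which a detector took part. The record type \texttt{RunSummary} and its fields are hypothetical and do not describe the actual data model.

```python
# Minimal sketch: aggregate event counts per detector.
# RunSummary and its fields are hypothetical, not the real data model.
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class RunSummary:
    run_number: int
    detectors: List[str]  # detectors that took part in the run
    events: int  # number of recorded events


def events_per_detector(runs: List[RunSummary]) -> Dict[str, int]:
    """Sum the recorded events of every run in which each detector took part."""
    totals = defaultdict(int)
    for run in runs:
        for detector in run.detectors:
            totals[detector] += run.events
    return dict(totals)


if __name__ == "__main__":
    runs = [
        RunSummary(1, ["TPC", "ITS"], 1000),
        RunSummary(2, ["TPC"], 500),
    ]
    print(events_per_detector(runs))  # {'TPC': 1500, 'ITS': 1000}
```

Other aggregates (per period, per trigger) follow the same pattern with a different grouping key.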
\subsection{Data Export}
The bookkeeping system allows users to export the data stored in the logbook to several formats (e.g. XML, ASCII, Excel).
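A minimal Python sketch of such an export, using only the standard library, is given below. It writes CSV as the plain-text (ASCII) format, which spreadsheet programs such as Excel can also open, and XML via \texttt{xml.etree.ElementTree}; writing native \texttt{.xlsx} files would need a third-party library and is left out. The field names are hypothetical.

```python
# Minimal sketch: export logbook rows to CSV (plain text that spreadsheet
# programs such as Excel can open) and to XML, using only the standard
# library. The field names used below are hypothetical.
import csv
import io
import xml.etree.ElementTree as ET
from typing import Dict, List


def to_csv(rows: List[Dict], fields: List[str]) -> str:
    """Serialise a list of dicts to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)  # values containing commas are quoted automatically
    return buffer.getvalue()


def to_xml(rows: List[Dict], root_tag: str = "logEntries", row_tag: str = "entry") -> str:
    """Serialise a list of dicts to XML; keys must be valid XML tag names."""
    root = ET.Element(root_tag)
    for row in rows:
        item = ET.SubElement(root, row_tag)
        for key, value in row.items():
            ET.SubElement(item, key).text = str(value)  # special characters are escaped
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    rows = [{"id": 1, "title": "Beam dump"}]
    print(to_csv(rows, ["id", "title"]), end="")
    print(to_xml(rows))
    # <logEntries><entry><id>1</id><title>Beam dump</title></entry></logEntries>
```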
\subsection{Log Entries and Files}
The bookkeeping system allows users to view and insert log entries with optional file attachments.
\subsection{Search and Filtering}
The bookkeeping system allows users to filter the different data stored in the logbook by several search criteria.
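Filtering by several search criteria can be sketched in Python as a function that keeps only the entries matching every criterion supplied. The entry fields (\texttt{author}, \texttt{tags}, \texttt{created}) and the function \texttt{filter\_entries} are hypothetical; in the real system such filters would more likely be translated into database queries.

```python
# Minimal sketch: filter log entries by several optional criteria.
# The entry fields (author, tags, created) are hypothetical.
from datetime import datetime
from typing import Iterable, List, Optional


def filter_entries(
    entries: Iterable[dict],
    *,
    author: Optional[str] = None,
    tag: Optional[str] = None,
    since: Optional[datetime] = None,
    until: Optional[datetime] = None,
) -> List[dict]:
    """Return the entries matching every criterion that is not None.

    The time window is half-open: since <= created < until.
    """
    result = []
    for entry in entries:
        if author is not None and entry["author"] != author:
            continue
        if tag is not None and tag not in entry["tags"]:
            continue
        if since is not None and entry["created"] < since:
            continue
        if until is not None and entry["created"] >= until:
            continue
        result.append(entry)
    return result


if __name__ == "__main__":
    entries = [
        {"author": "shifter1", "tags": ["DAQ"], "created": datetime(2018, 1, 10, 8)},
        {"author": "shifter2", "tags": ["TPC"], "created": datetime(2018, 1, 11, 9)},
    ]
    print(len(filter_entries(entries, tag="TPC")))  # 1
```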
\subsection{Access Control}
The bookkeeping system uses the CERN Authentication platform to authenticate the users. Therefore, users should use their NICE credentials.
To log out from the bookkeeping system, users should click on the ``Logout'' button in the top menu.
\section{User Characteristics}
The bookkeeping system will have several different users. A general distinction can be made between users who create input and users who only read data. Another distinction can be made between human and machine users.\footnote{In the week of 11 December, Marten Teitsma and Heiko van der Heijden interviewed several people at CERN. A summary of these interviews is given in this section.}
\subsection{User}
A user is someone who uses the system for general purposes.
\subsection{Operation room crew}
When ALICE is active, there are five people in the operator room. The run manager stays for two weeks and is in the operator room 24/7. The crew of four consists of a shift leader, who coordinates the crew, and three shifters, i.e. a DAQ shifter, a DCS shifter and a DQM shifter. A shift lasts eight hours, and a shift sequence spans six days: each shifter takes two morning shifts, two evening shifts and two night shifts. Some shifters take only one six-day shift sequence. Shifters have various degrees of experience and only need the most recent information.
\subsubsection{Shifter}
A shifter is a person who operates a part of ALICE; one of the subsystems is his or her responsibility. Currently, such a shifter looks after the ECS/DAQ, CTP or HLT subsystem. A shifter works a shift of eight hours, which can be in the daytime or at night. A shifter is an employee of a collaborating member of ALICE. The level of education varies, but is usually a Master's degree or a PhD in physics.
During training, the shifter learns that, when in doubt, a log entry should be made. The End of Shift (EOS) report is an interaction with the (subsystem) run coordinator, and a template should be enforced for it.
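Enforcing a template can be as simple as refusing to store a report whose required sections are empty. The Python sketch below illustrates this; the section names in \texttt{EOS\_TEMPLATE\_SECTIONS} and the function names are hypothetical, since the actual EOS template has not been defined.

```python
# Minimal sketch: enforce an End of Shift (EOS) report template by rejecting
# reports whose required sections are empty. Section names are hypothetical.
from typing import List

EOS_TEMPLATE_SECTIONS = ("summary", "problems", "open_issues", "handover")


def missing_sections(report: dict) -> List[str]:
    """Return the template sections that are absent or contain only whitespace."""
    return [
        name
        for name in EOS_TEMPLATE_SECTIONS
        if not str(report.get(name) or "").strip()
    ]


def submit_eos_report(report: dict) -> None:
    """Raise ValueError instead of accepting an incomplete report."""
    missing = missing_sections(report)
    if missing:
        raise ValueError("EOS report incomplete, missing: " + ", ".join(missing))
    # Storing the report in the logbook is outside the scope of this sketch.
```

Requiring an explicit ``none'' in a section, rather than allowing it to be left blank, also addresses the concern that relevant information is omitted because a shifter forgot about it.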
Shifters use the Shift Accounting Managing System (SAMS). Shifters should have restricted rights.
Shifters experience several problems with the electronic logbook:
\begin{itemize}
\item they forget to obtain permission for the logbook: to write entries, one has to be on a list,
\item shifters have to do some training which has to be marked down in SAMS, otherwise they can't do a shift,
\item there should be a coherent way to create entries (a template?),
\item there should be a specific period in which to make entries,
\item the less freedom the better.
\end{itemize}
The most problematic shifters are those who come for six days and then leave. When shifters do not do their job properly, they can be reported and lose their credits. This does not happen often, only once or twice since 2009, but of course people make mistakes. There are day-by-day summary reports. The LHC reports daily, and the PARs are reported each week. Shifters have to be able to fill in their report up to one day (24 hours) after the end of their shift.
\subsubsection{Shift leader}
The shift leader marks runs as good or bad. When the detectors do their job, the run is declared good.
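The decision rule can be written down explicitly. The Python sketch below assumes the rule is ``every participating detector performed well''; the shift leader's actual judgement may involve further criteria, which are not modelled, and the input format is hypothetical.

```python
# Minimal sketch of the rule above: a run is declared good only when every
# participating detector did its job. The input format is hypothetical.
from enum import Enum
from typing import Mapping


class RunQuality(Enum):
    GOOD = "good"
    BAD = "bad"


def run_quality(detector_ok: Mapping[str, bool]) -> RunQuality:
    """detector_ok maps each participating detector to whether it performed well.

    A run without any detector information is treated as bad.
    """
    if detector_ok and all(detector_ok.values()):
        return RunQuality.GOOD
    return RunQuality.BAD
```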
\subsubsection{Run manager}
A run manager is in the operator room at Point 2 for two weeks, 24 hours a day.
\subsection{Run coordinator}
The run coordinator is, together with the deputy run coordinator, responsible for the operation of ALICE.\footnote{The information in this section was gathered during an interview with Gracia, who was run coordinator in 2017. In 2018, Kristian Gulbrandsen will be run coordinator.} A run coordinator has a full overview of what needs to be done during a whole year. The input for these activities is given by the Physics Board. Based on this input, the run coordinators determine which parts of the machine are activated, how the triggers work and what the DAQ should do. They know what is working and what is not, and take action when things are not in line with the plan. Recurrent problems are looked into.
The current electronic logbook provides all the information needed by the run coordinator, namely:
\begin{itemize}
\item real-time information,
\item summaries (plots), which could be somewhat more flexible,
\item data-taking efficiency in relation to the time needed,
\item selected fills,
\item averages,
\item other statistical information.
\end{itemize}
The run coordinator does not use MonALISA. The automatic notifications from the subsystems are considered useful. For the End of Shift (EOS) report, a template would come in handy. An auto-save function should be implemented, because a log entry is sometimes lost due to authentication problems, e.g. when a screen has been open for too long. Moreover, fairly important information is sometimes not mentioned because a shifter has forgotten about it.
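Auto-save can be sketched as periodically writing the draft of a log entry to storage that survives an expired session, and restoring it when the author returns. The Python sketch below keeps drafts as JSON files and writes them atomically (temporary file, then rename); the class \texttt{DraftStore} and the file layout are hypothetical, and a web client might keep drafts in browser storage or on the server instead. The periodic timer that calls \texttt{save} is not shown.

```python
# Minimal sketch: auto-save drafts of log entries so that text survives an
# authentication timeout. DraftStore and the file layout are hypothetical.
import json
import os
import tempfile
from typing import Optional


class DraftStore:
    def __init__(self, directory: str) -> None:
        self._directory = directory

    def _path(self, draft_id: str) -> str:
        return os.path.join(self._directory, draft_id + ".json")

    def save(self, draft_id: str, draft: dict) -> None:
        """Write the draft atomically: temp file first, then rename over the old one."""
        fd, tmp_path = tempfile.mkstemp(dir=self._directory, suffix=".tmp")
        try:
            with os.fdopen(fd, "w", encoding="utf-8") as handle:
                json.dump(draft, handle)
            os.replace(tmp_path, self._path(draft_id))
        except BaseException:
            # Never leave a half-written temporary file behind.
            if os.path.exists(tmp_path):
                os.remove(tmp_path)
            raise

    def load(self, draft_id: str) -> Optional[dict]:
        """Return the saved draft, or None if there is none."""
        try:
            with open(self._path(draft_id), encoding="utf-8") as handle:
                return json.load(handle)
        except FileNotFoundError:
            return None

    def discard(self, draft_id: str) -> None:
        """Remove the draft once the entry has been submitted successfully."""
        try:
            os.remove(self._path(draft_id))
        except FileNotFoundError:
            pass
```

Writing to a temporary file and renaming it means a crash during saving leaves the previous draft intact rather than a truncated one.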
\subsection{Subsystem Run Coordinator}
A Subsystem Run Coordinator (SRC) is responsible for a detector or subsystem of ALICE.\footnote{The information in this section was gathered during an interview with Robert, who is subsystem coordinator of the TPC.} An SRC needs to have data about this system when called upon. All the information needed is provided by the current logbook system. The fill or run information, i.e. the configuration parameters, relates to issues and is linked to log entries. It is currently possible to look into this information over a long period.
In this area, there is a fine line between monitoring and bookkeeping. A distinction can be made based on the interaction or luminosity rate. Currently, subsystem-specific data is not stored in the bookkeeping system but in the DCS.
\subsection{DAQ Run coordinator}
The subsystem run coordinator gives quality flags.\footnote{The information in this section was gathered during an interview with Roberto Divia, who is DAQ run coordinator.} The subsystem run coordinator meets weekly with the run coordinator and the other subsystem run coordinators. During these meetings, feedback is given on the operational quality, and recurrent problems and the planning are discussed.
There is no correction from offline to online. The Data Quality Monitor (DQM) will become Quality Control and will be a very small part of $O^2$. In $O^2$ there is only one direction: from synchronous to asynchronous. From the High Level Trigger (HLT), the data goes offline. The HLT throws away 99\% of the data. The HLT is a subsystem whose operation will become a shifter's task. There will be counters concerning the HLT. The HLT creates data and comparatively little metadata.
The DAQ creates metadata, just as other subsystems do. This metadata has to be stored in the bookkeeping system. The electronic logbook is used to describe a run: problems and human errors are logged, and these log entries are used post mortem to find out what happened. The shifters, technicians and subsystems themselves create this data. On-call experts also interact with the logbook. The logbook is the basic channel of communication.
The current DAQ SRC is often called upon to curate End of Run (EOR) reasons, i.e. the reasons why a run was ended, thus stopping ALICE. He looks for these reasons in the electronic logbook and succeeds in finding the reason for 99\% of the EORs. EOR reasons are initially given by the subsystems. As with a stone thrown into a river, it is hard to find the wave that caused all the other waves. Roberto uses log files created by the subsystems to find the reasons. When an error occurs outside a run, it is not shown in the logbook, but it can be of interest when looking for EOR reasons.
\subsection{Subsystem team member}
Each subsystem is managed by a team of specialists.
\subsection{Gas technician}
CERN provides a service that delivers all kinds of gases. The technicians who install the equipment can make log entries.
\subsection{On call expert}
When ALICE is active, an expert for each subsystem is always available. This expert can be called when the people in the operator room are unable to solve a specific problem.
\subsection{Management}
The management consists of all the project leaders of $O^2$ for each detector, the spokesperson and the run coordinator; in total, 25 people are involved. The management is interested in several subjects:
\begin{itemize}
\item which activities are in progress,
\item who is doing the work (shifts),
\item what the issues are, who is trying to solve them, and which errors and issues are current,
\item the reports that are generated:
\begin{itemize}
\item used for operational meetings
\item statistics over variable length of time (static and dynamic) for irregular meetings
\end{itemize}
\end{itemize}
The management meets every week to discuss subjects of a more or less strategic nature, such as recurrent errors and new issues. The run coordinator and the subsystem run coordinators meet daily and weekly and need more operational data.
\subsection{Physics board}
The Physics Board meets regularly to discuss the experiment and what to do next.\footnote{The information in this section was gathered during an interview with Marco van Leeuwen, who is coordinator of the Physics Board.} The board oversees and coaches the data analysis; the actual analysis is done by PhD students and postdocs. The board meets once a week to discuss resource management and exchange information. Resource management concerns Monte Carlo (MC) analysis and data reconstruction.
The perspective of the data analyst is focused on the specific files he or she needs for the analysis, e.g. `which file should I take when I want 13 TeV?'. Several categories can be distinguished:
\begin{itemize}
\item energy levels
\item collisions (Pb-Pb, p-p, Pb-p and some other variants)
\item runlist concerning which detectors are used
\item muon arms
\end{itemize}
Questions such as `how many events are detected when a specific detector is working well?' are typical of the Physics Board. The same question can be asked about triggers. Triggers can change, and with them the main type of event can change.
For the Monte Carlo simulations, the same set of information is needed. MC simulations are most often anchored to a run or period with a specific detector configuration. There can be 10 or even 100 simulations for each run configuration, and these simulations must be searchable. The event generator creates a large number of collisions that can be of interest; an analyst searches for photons, jets, strange particles, etc.
JIRA is used for following up on the logbook, and also for MC simulations. TWiki is used by the ALICE DPG; in TWiki, users create their own layout.
The perspective of the Physics Board is concerned with planning and resource management, i.e. how to organise the analysis (data reconstruction and MC simulation). A top-down approach is taken. Reconstruction is driven by the time of production. Simulation is done on demand, which is possible because the demand is not that large.
\subsection{CERN administration officer}
An often forgotten actor is Human Resources, who need access to the logbook to verify whether somebody has worked as reported.\footnote{Roberto Divia}
\subsection{Data Preparation Group}
The Data Preparation Group (DPG) was created for:
\begin{itemize}
\item data preparation, i.e. reconstruction and simulation. This consists of production configuration, setup, calibration monitoring.
\item data processing, i.e. monitoring, execution, reporting, bug tracking.
\item quality assurance, i.e. validation, creating lists of good runs or data sets, and determining whether the data is of good quality.
\end{itemize}
The goal of the DPG-QA is to provide data sets to analysts.
\subsection{Developer}
The developer is a member of the team which develops the system.
\subsection{Observer}
An observer is someone who only has read rights and does not add anything. This could be, for example, a student developing the bookkeeping system.
\subsection{Administrator}
The administrator is the person who configures the system.
\section{Assumptions and Dependencies}
The bookkeeping system runs on a Linux operating system, namely CentOS version 7.0.
\section{Apportioning of Requirements}
The offline part of the bookkeeping system, i.e. the functionality currently provided by AliMonitor, has a lower priority than the logging functionality.