CMSPopularity

CMSPopularity is a community project to cover various aspects of CMS popularity via data-stream aggregation on HDFS.

Introduction

We'll use the CMSSpark package to produce and collect various metrics from HDFS. These metrics represent user activities with various CMS data-services. For a description of the data-services and the available data, please refer to the CMSSpark package.

For previous efforts to aggregate different metrics, please refer to the summer student reports.

So far we feed data into the CERN MONIT system with the following dashboards:

Data Popularity Scrutiny Plot Specification

This histogram shows dataset usage by CMS jobs. The bins of the plot are labeled by number of accesses. One access is equal to reading 100% of the events or files in the dataset. The 1-bin includes any non-zero reading < 150% of the dataset. Higher accesses are rounded to the nearest integer. The 0-bin contains datasets created during the period but not used. The 0-old bin contains datasets created before the period but not used.
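The binning rule above can be sketched in a few lines of Python. This is only an illustration, not the code used to produce the plot: the function name `access_bin`, its arguments, and the choice to round exact halves upward (so that 1.5 lands in the 2-bin, consistent with the "< 150%" cut for the 1-bin) are assumptions.

```python
import math


def access_bin(fraction_read, created_in_period):
    """Return the bin label for one dataset (hypothetical helper).

    fraction_read: amount read divided by the dataset size, so 1.0 means
        100% of the events or files were read once.
    created_in_period: True if the dataset was created during the plot period.
    """
    if fraction_read < 0:
        raise ValueError("fraction_read must be non-negative")
    if fraction_read == 0:
        # Unused datasets are split by creation time.
        return "0" if created_in_period else "0-old"
    if fraction_read < 1.5:
        # Any non-zero reading below 150% counts as one access.
        return "1"
    # Assumption: halves round upward (Python's round() rounds halves to even).
    return str(math.floor(fraction_read + 0.5))
```

For example, a dataset read 2.4 times over falls in the 2-bin, while an untouched dataset created before the period falls in the 0-old bin.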

Each bin is broken into three sub-bins that cover the last three months, the last six months, and the full time period of the plot.

Each bin is weighted by the dataset sizes in the bin. The size calculation starts with the average replica size at a site, which is the daily weighted average size of the dataset during the time it is present at a site. Then the average sizes are summed for each day of the dataset’s lifetime over all the sites where replicas are located. This sum is divided by the number of days in the period to give an overall daily weighted average of the CMS disk space taken up by the dataset during the period.
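The size calculation can be followed step by step in a small Python sketch. The input layout (a dictionary mapping a site name to the list of daily replica sizes recorded while the replica was present) and the name `dataset_weight` are assumptions made for illustration; the real inputs come from the aggregated HDFS data.

```python
def dataset_weight(daily_sizes_by_site, period_days):
    """Daily weighted average disk space used by one dataset (sketch).

    daily_sizes_by_site: hypothetical mapping site -> list of replica sizes,
        one entry per day the replica was present at that site.
    period_days: number of days in the plot period.
    """
    if period_days <= 0:
        raise ValueError("period_days must be positive")
    total = 0.0
    for sizes in daily_sizes_by_site.values():
        if not sizes:
            continue  # no replica at this site during the period
        # Step 1: average replica size while present at the site.
        average_size = sum(sizes) / len(sizes)
        # Step 2: count that average once for each day of presence.
        total += average_size * len(sizes)
    # Step 3: normalise by the length of the period.
    return total / period_days


# Example with made-up sizes in TB and a 10-day period:
# site T1_A holds the dataset for 4 days, site T2_B for 2 days.
weight = dataset_weight({"T1_A": [10, 10, 20, 20], "T2_B": [20, 20]}, 10)
# (15 * 4 + 20 * 2) / 10 = 10.0
```

A dataset that sits on disk for only part of the period therefore contributes proportionally less than one that is present every day.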

The plot is usually limited to datasets located at Tier-1 (T1) and Tier-2 (T2) sites.

Tasks:

In this project we'll pursue the following tasks:

References:

CMS popularity, CMS data-management, CMSSpark, PySpark
