Skip to content

Fake Factor calculation

In this step the fake factors are calculated. This should be run after the preselection step.

All information for the FF calculation step is defined in a configuration file in the configs/ANALYSIS/ERA/ folder using the common_settings.yaml and a more specific config file. The common_settings.yaml has to be named like that and is used for all steps of the fake factor estimation (preselection, FF calculation, FF corrections).
The FF calculation config has the following parameters:

General options for the calculation:

parameter type description
channel string tau pair decay channels ("et", "mt", "tt")
use_embedding bool True if embedded sample should be used, False if only MC sample should be used
use_center_of_mass_bins bool Changes the x-data that is entering FF and correction calculation. If set then a center of mass value is used for the x-data, calculated from events entering the corresponding bin. If not set, the bin centers are used. Default is set to True.

This will not affect FF and correction calculation that are set to "binwise" (the x-data values although displayed in plots are not used)

In target_processes the processes for which FFs should be calculated (normally for QCD, Wjets, ttbar) are defined.
Each target process needs some specifications:

parameter type description
split_categories dict names of variables for the fake factor measurement in different phase space regions
  • the FF measurement can be split based on variables in 1D or 2D (1 or 2 variables)
  • each category/variable has a list of orthogonal cuts (e.g. "njets" with "==1", ">=2")
  • "njets", "nbtag", "tau_decaymode_2" or "deltaR_ditaupair" are already possible, other variables should be added during preprocessing step accordingly
  • at least one inclusive category needs to be specified (assuming variable is written out in preselection step)
If a continous variable is used a window can be defined as ">=lower#&&#<upper" accordingly.
split_categories_binedges dict bin edge values for each split_categories variable.
The number of bin edges should always be N(variable cuts)+1
SRlike_cuts dict event selections for the signal-like region of the target process
ARlike_cuts dict event selections for the application-like region of the target process
SR_cuts dict event selections for the signal region (normally only needed for ttbar)
AR_cuts dict event selections for the application region (normally only needed for ttbar)
var_dependence string variable the FF measurement should depend on (normally pt of the hadronic tau e.g. "pt_2")
var_bins list or dict[list] bin edges for the variable specified in var_dependence.
Can either be a list representing the binning or a dictionary of lists, where keys correspond to the string representations of split categories defined in split_categories.
In the case of two split categories, the dictionary can be nested. If not all second split category elements share the first split category binning, the binning for the affected category must be specified separately. When using split binning, at least the first split category's bin edges must be fully defined.
fit_option list a list of polynomials that should be considered for the fake factor fits can be defined with this parameter (default: ["poly_1"]); besides polynominal fits, it is possible to use "smoothed" (applies a gaussian density kernel), "binwise" or "skip". First two can also be combined into e.g. "binwise#[0,]+smoothed", "binwise#[-1,-2,]+smoothed" or "binwise#[0,]#[-1,]+smoothed" where the nth-bin(s) is computed using binwise method and the rest in smoothed manner. For left bins, positive bin index is used. For last bins, the count is done using negative integers, starting at -1 (like in the example). If both options are provided, then the binwise calculation is applied for the stated left and right bins. The "skip" option simply returns 1.0 for all events; this is intended for subspaces where a variable implies no events (e.g. a jet variable when njets=0), ensuring compatibility with other fake factors in the chain.
bandwidth float if fit_option includes "smoothed" this value can be set to adjust for the bandwidth used during smoothing procedure (has no effect on the result in case of "binwise"). If not set the default value of histogram range divided by 5 is chosen.
Can either be a float value representing the bandwidth for all fake factors or a dictionary of float values, where keys correspond to the string representations of split categories defined in split_categories.
limit_kwargs dict this dictionary allows to define how the fitted function and its uncertainty are handled (also outside the measurement range); the default is that outside the range the fake factor functions stays constant but the up and down variations still increase/decrease; additionally negative fake factor values are not allowed and if present are set to 0

Event selections can be defined the same way as in the preselection step event_selection. Only the tau vs jet ID cut is special because the name should always be had_tau_id_vs_jet (or had_tau_id_vs_jet_* in tt channel), this is needed to read out the working points from the cut string and apply the correct tau vs jet ID weights.

In process_fractions specifications for the calculation of the process fractions are defined.

parameter type description
processes list sample names (from the preprocessing step) of the processes for which the fractions should be stored in the correctionlib json, the sum of fractions of the specified samples is 1.
split_categories dict see target_processes (only in 1D)
AR_cuts list see target_processes
SR_cuts list see target_processes, (optional) not needed for the fraction calculation

Note: When using split binning for process fraction calculations, the var_bins parameter can also be defined in the same manner as for target_processes.

To run the FF calculation step, execute the python script and specify the config file (relative path possible):

python ff_calculation.py --config-file PATH/CONFIG.yaml