Summary day 3

Key points

Intro to Pandas
- Lets you build list-like and table-like data structures that can hold mixed data types and whose contents can be indexed by arbitrary row and column labels.
- The main data structures are Series (1D) and DataFrames (2D). Each column of a DataFrame is a Series.
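A small illustration of the points above. This is a sketch with made-up toy data: it builds a Series and a DataFrame with string row labels, mixes column types, indexes by label with .loc, and shows that selecting one column gives back a Series.

```python
import pandas as pd

# A Series: a 1D labelled array; the index labels here are strings, not 0..n-1
ages = pd.Series([34, 27, 45], index=["ann", "bob", "cid"], name="age")

# A DataFrame: a 2D table whose columns can have different data types
df = pd.DataFrame(
    {
        "age": [34, 27, 45],              # integers
        "height_m": [1.68, 1.82, 1.75],   # floats
        "member": [True, False, True],    # booleans
        "city": ["Oslo", "Lund", "Lund"], # strings
    },
    index=["ann", "bob", "cid"],          # arbitrary row labels
)

# Label-based indexing with .loc[row_label] or .loc[row_label, column_label]
print(ages.loc["bob"])         # 27
print(df.loc["cid", "city"])   # Lund

# Selecting a single column of a DataFrame returns a Series
col = df["height_m"]
print(type(col).__name__)      # Series
print(df.dtypes)               # one dtype per column
```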
Seaborn
- Seaborn makes statistical plots easy and good-looking!
- Seaborn plotting functions take a Pandas DataFrame, often the names of the DataFrame columns to use as x and y, and optionally a hue: a categorical variable whose values determine the color in which each subset of the data is drawn.
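A minimal sketch of that calling pattern, assuming seaborn and matplotlib are installed. The DataFrame and its column names (x, y, group) are made up for illustration, and scatter.png is a hypothetical output file. The Agg backend lets the script run without a display, for example inside a batch job.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: render to files, no window needed
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up data; "group" is the categorical variable used for hue
df = pd.DataFrame(
    {
        "x": [1, 2, 3, 4, 5, 6],
        "y": [2.0, 3.9, 6.1, 1.0, 1.6, 2.2],
        "group": ["A", "A", "A", "B", "B", "B"],
    }
)

# data= takes the DataFrame, x= and y= name its columns,
# hue= draws each value of "group" in its own color and adds a legend
ax = sns.scatterplot(data=df, x="x", y="y", hue="group")
ax.set_title("y versus x, colored by group")
plt.savefig("scatter.png")
```

Swapping scatterplot for another seaborn function such as lineplot or boxplot keeps the same data/x/y/hue pattern.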
Batch mode
- The SLURM scheduler allocates resources on the compute nodes
- Batch jobs run without interaction with the user
- A batch script has two parts: the SLURM parameters (#SBATCH lines) describing the allocation, and the commands that do the actual work of the job, for instance running one or several Python scripts.
- Remember to pass any input arguments the Python script needs on its command line in the batch script.
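One way for a Python script to receive such arguments is the standard-library argparse module. This is a minimal sketch: the script name analyse.py and the options --infile and --nsteps are hypothetical. The batch script would call it after the #SBATCH lines, as shown in the comment.

```python
import argparse

# Hypothetical call from the batch script, after the #SBATCH lines:
#     python analyse.py --infile input.csv --nsteps 100


def parse_args(argv=None):
    """Read the inputs the batch script passes on the command line."""
    parser = argparse.ArgumentParser(description="Hypothetical analysis run in a batch job")
    parser.add_argument("--infile", default="input.csv", help="path to the input data")
    parser.add_argument("--nsteps", type=int, default=10, help="number of iterations")
    return parser.parse_args(argv)  # argv=None means "use sys.argv[1:]"


def main(argv=None):
    args = parse_args(argv)
    # No interactive input in batch mode: report progress with print instead;
    # by default SLURM writes it to the job's output file (slurm-<jobid>.out)
    print(f"Processing {args.infile} with {args.nsteps} steps")
    return args


if __name__ == "__main__":
    main()
```

Giving every option a default keeps the script usable for quick tests, while the batch script sets the real values.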
Big data
- Allocate resources sufficient for the size of the data
- Choose file formats suited to the data and to how it will be read
- Use data chunking: process the data in pieces that fit in memory
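A sketch of the first and last points with pandas. It estimates memory per row from a small sample, which helps when deciding how much memory to request. It then reads the data in chunks with read_csv's chunksize argument, so that only one chunk is in memory at a time. The CSV is generated in memory as a stand-in for a large file, and the chunk size of 250 is an arbitrary choice.

```python
import io

import pandas as pd

# Stand-in for a large CSV file on disk (made-up data)
csv_text = "id,value\n" + "\n".join(f"{i},{i * 0.5}" for i in range(1000))

# 1) Estimate in-memory size per row from a sample before choosing the allocation
sample = pd.read_csv(io.StringIO(csv_text), nrows=100)
bytes_per_row = sample.memory_usage(deep=True).sum() / len(sample)
print(f"~{bytes_per_row:.0f} bytes per row in memory")

# 2) Process the data chunk by chunk; only `chunksize` rows are loaded at a time
total = 0.0
nrows = 0
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=250):
    total += chunk["value"].sum()  # partial result for this chunk
    nrows += len(chunk)

mean_value = total / nrows  # combine the partial results
print(nrows, mean_value)    # 1000 249.75
```

With chunking, peak memory is roughly bytes_per_row times the chunk size rather than times the full row count, so the chunk size can be tuned to the memory in the allocation.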