Summary day 3

Keypoints

  • Intro to Pandas

    • Lets you construct list- or table-like data structures with mixed data types, the contents of which can be indexed by arbitrary row and column labels

    • The main data structures are Series (1D) and DataFrames (2D). Each column of a DataFrame is a Series

  • Seaborn
    • Seaborn makes statistical plots easy and good-looking!

    • Seaborn plotting functions take in a Pandas DataFrame, sometimes the names of variables in the DataFrame to extract as x and y, and often a hue that makes different subsets of the data appear in different colors depending on the value of the given categorical variable.

  • Batch mode
    • The SLURM scheduler handles allocations to the calculation nodes

    • Batch jobs runs without interaction with user

    • A batch script consists of a part with SLURM parameters describing the allocation and a second part describing the actual work within the job, for instance one or several Python scripts.

    • Remember to include possible input arguments to the Python script in the batch script.

  • Big data

    • allocate resources sufficient to data size

    • decide on useful file formats

    • use data-chunking as technique