NAISS SENS and Bianca

Objectives

  • We’ll get a brief overview of the kinds of sensitive data

  • … and the Bianca system

The Bianca workshop

Sensitive personal data

Apply for project

Open NAISS SENS Rounds

Bianca

  • Bianca is a great platform for computationally intensive research on sensitive personal data. It can also be useful for:

    • national and international collaboration on sensitive personal data (without a high compute need)

    • other types of sensitive data

  • Bianca is not good for:

    • storing data

    • publishing data

      • unless the dataset is very popular among Bianca users, e.g. Swegen, SIMPLER

Bianca’s design

  • Bianca was designed to:

    • make accidental data leaks difficult

    • make correct data management as easy as possible

    • emulate the HPC cluster environment that SNIC/NAISS users were familiar with

    • provide a maximum amount of resources

    • satisfy regulations

Bianca has no Internet

… but we have “solutions”

[Figure: the Bianca cluster (blue) and its virtual project clusters (green)]

  • Bianca is only accessible from within Sunet (i.e. from university networks).

  • Outside Sunet, connect through a VPN, for example the Uppsala University (UU) VPN.

    • All Swedish universities can provide VPN credentials.


  • The whole Bianca cluster (blue) contains hundreds of virtual project clusters (green), each of which is isolated from the others and from the Internet.

  • Data can be transferred to or from a virtual project cluster through the Wharf, which is a special file area that is visible from the Internet.
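As a rough sketch, a transfer through the Wharf could be started from your own computer with `sftp`. The host name `bianca-sftp.uppmax.uu.se` and the `<username>-<projid>` remote directory are assumptions that are not stated on this page, so check them against the Bianca user guide. The helper `wharf_sftp_target`, the user name `myname`, the project `sens2016999` and the file names are placeholders.

```bash
# Hypothetical helper: compose an sftp target for a project's Wharf area.
# ASSUMPTION: the host name and the remote directory layout are not given
# in this text; verify both in the Bianca user guide before use.
wharf_sftp_target() {
  local user="$1" projid="$2"
  local host="${3:-bianca-sftp.uppmax.uu.se}"  # assumed host, can be overridden
  printf '%s-%s@%s:%s-%s\n' "$user" "$projid" "$host" "$user" "$projid"
}

# Placeholder user name and project ID.
target="$(wharf_sftp_target myname sens2016999)"
echo "sftp $target"

# Inside the resulting sftp session you would then use, for example:
#   put mydata.tar.gz     # upload a file to the Wharf
#   get results.tar.gz    # download a file from the Wharf
```

The script only prints the command, so it can be inspected before anything is actually transferred.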

The log in steps

  1. When you log in to bianca.uppmax.uu.se (https://bianca.uppmax.uu.se in a web browser), your SSH or ThinLinc client first meets the blue Bianca login node.

    • The login name has the form <username>-<projid>@bianca.uppmax.uu.se

    • for example: myname-sens2016999@bianca.uppmax.uu.se

  2. After checking your two-factor authentication, this server looks for your virtual project cluster.

  3. If it’s present, then you are transferred to a login prompt on your cluster’s login node. If not, then the virtual cluster is started.

  4. Inside each virtual project cluster, by default there is just a one-core login node. When you need more memory or more CPU power, you submit a job (interactive or batch), and an idle node will be moved into your project cluster.
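The login name format from step 1 can be sketched in shell as follows. The helper name `bianca_login` is hypothetical, and `myname` and `sens2016999` are the placeholder values from the example in step 1. The `ssh` call is left commented out because it only works from within Sunet or over a VPN.

```bash
# Hypothetical helper: compose the Bianca login from a user name and a project ID.
bianca_login() {
  local user="$1" projid="$2"
  printf '%s-%s@bianca.uppmax.uu.se\n' "$user" "$projid"
}

# Placeholder values taken from the example in step 1.
login="$(bianca_login myname sens2016999)"
echo "$login"

# Uncomment to connect (needs Sunet or VPN access and two-factor authentication):
# ssh "$login"
```

Once you are on your project cluster's login node, step 4 corresponds to asking the job scheduler for resources. Assuming Slurm, which UPPMAX clusters use, a batch job would be submitted with something like `sbatch -A sens2016999 -n 2 -t 01:00:00 myjob.sh`, where the script name and the resource values are placeholders.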

Data transfers:

Software

  • Module library (almost the same as on Rackham)

  • Local Conda repository

  • Local Perl modules

  • Local R packages

  • More info in the Bianca user guide
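A minimal sketch of how the module library is typically used, assuming the standard `module` command of environment-module systems. The module name `R` is a placeholder, so pick a real name from the listing. The script first checks whether the `module` command exists, so it does nothing harmful when run outside the cluster.

```bash
# Detect whether the module system is available in this shell.
if command -v module >/dev/null 2>&1; then
  module_state="available"
else
  module_state="missing"
fi
echo "module system: $module_state"

if [ "$module_state" = "available" ]; then
  # Show the first lines of the list of installed modules.
  module avail 2>&1 | head -n 20
  # Placeholder module name: replace R with a name from the listing.
  module load R || echo "could not load the placeholder module R"
  # Show what is currently loaded.
  module list 2>&1
fi
```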

ThinLinc


Introduction course

Keypoints

  • If you handle sensitive data, apply for a NAISS-SENS project

  • SENS projects will get accounts on Bianca

  • Bianca itself has no Internet access, but there are solutions such as:

    • the Wharf

    • the transit server

    • a large collection of installed software

  • Ask support if you need additional software tools