Research Computing Systems

Research Computing Systems at the Geophysical Institute provides advanced computing, storage, and data-sharing solutions, along with research IT support, to University of Alaska research communities, collaborators, and supporters.

Services

Information for new users

RCS computing and storage resources are available upon request to University of Alaska faculty, staff, and sponsored individuals. Please see Getting Access for more information.

Our service rates for FY18 are available on our RCS Service Rates FY18 page.

If your recent work was made possible by RCS resources, please see our Citation and Acknowledgement page.


High Performance Computing

Storage

  • Digdug

    Lustre File System

    Dig Dug is a non-commodity cluster hosting 275 terabytes of fast-access scratch storage for the RCS HPC clusters. UA researchers use this resource for program compilation, data manipulation, and short-term storage of large input/output files. Dig Dug serves the HPC clusters over Gemini and InfiniBand interconnects as well as Ethernet.

  • Bigdipper

    Sun SPARC Enterprise T5440 Server

    Bigdipper hosts the long-term mass file storage for RCS user data. It provides 214 terabytes of local disk cache, manages file writing and staging requests to tape, and works in conjunction with the automated tape library to offer access to over 7 petabytes of file storage.

  • Galaga

    IBM System Storage TS3500 Tape Library

    Galaga hosts over seven petabytes of enterprise-scale, long-term data storage on the UAF campus. Its automated tape cartridge system, with more than 2,600 available tape slots, provides a large-scale file storage solution to the UA community.

  • Interested in using these resources?

    Visit the RCS User Access page to learn how.

Labs and Classrooms

RCS maintains three labs and classrooms on the UAF campus.

  • Remote Sensing Lab

    WRRB 004. This lab contains 16 student workstations running Windows 10 Pro plus an instructor's station and projector. Installed software supports GIS and Remote Sensing exercises and learning.

    WRRB 004 Schedule

    Schedule Request

  • Mac Classroom

    WRRB 009. RCS hosts 12 Mac Pro systems in room 009 of the West Ridge Research Building. These systems are available to staff, faculty, and students affiliated with a UAF class or project. Room access is available 24 hours a day, 7 days a week, upon request, and is controlled via PolarExpress card entry.

    WRRB 009 Schedule

    Schedule Request

  • Linux Lab

    WRRB 004. RCS maintains an HPC visualization lab with six workstations running Red Hat Enterprise Linux in 004 West Ridge Research Building. These machines are available to all RCS HPC users.

    Request Access

    Remote Login

Licensed / Proprietary Software

Proprietary software licenses hosted by RCS are made available to the UAF campus network. Please see the links below (log in using your UA credentials) for more information.

RCS also provides centralized copies of software and data that are valuable to RCS or the UAF community. They are made available at mirrors.rcs.alaska.edu as a shared resource and are mirrored "as is" from publicly available repositories.

Here is a quick list of the packages on mirrors.rcs.alaska.edu:

  • ArchAssault
  • CentOS
  • Ubuntu

Request Access

Our mission is to enable the success of the UA research community, its collaborators, and its supporters whose work demands advanced computing, storage and data sharing solutions. Here is how you can take advantage of those services to further your research.

  • I am faculty/staff directly affiliated with UA

    UA faculty/staff may request project accounts from RCS. Please fill out a project account application form. If you do not already have an RCS account, you will also need to fill
    out a user account application form.

  • I am not faculty/staff, but have direct UA affiliation

    Only UA faculty and staff may request use of RCS services. To use our services, you will need to identify a UA faculty/staff member willing to include you on a new or existing RCS project account. Please fill out a user account application form once you know which project account (or pending project account) you will be included under.

  • My colleague is a UA affiliate, but I am not

    You will need the following:

    • An RCS user account
    • A UA guest account, through which you will manage your login credentials

    If your UA colleague has a project account with us, you can request membership in it and be eligible for an RCS user account by filling out a user account application form.

    To obtain a UA guest account, coordinate with your UA colleague to fill out UAF OIT's account request form: https://www.alaska.edu/oit/services/account-management/forms/formMemberAffiliateAccountRequestForm.pdf

    • Guest: Complete the top section of the form, then sign the bottom
    • Guest: In the field labeled "Last 4 Digits of SSN", provide any four-digit number of your choice. You will need to remember this number when resetting your password.
    • UA sponsor: In the "ACCESS REQUESTED BY SPONSOR" section, check the "Other" box, then enter "EDIR/Authserv/LDAP"
    • UA sponsor: Enter an expiration date in the "Sponsor Specified Expiration Date" field. Leaving the field blank defaults to a 12-month expiration
    • UA sponsor: Sign and date the form as the affiliate's sponsor
    • UA sponsor: Email the completed form to RCS

Project Account Application

University of Alaska username
Note: Only select Long-Term Data Storage if you are not also requesting HPC access.

Project Account Agreement

By submitting this form, you understand and agree to the following terms:

I understand the information I submitted on this form will be used to evaluate my eligibility for an RCS resource grant. I attest the information submitted on this form is true and correct to the best of my knowledge. If I am granted access to RCS systems I will read, understand, and abide by the rules, policies, and procedures regarding proper use of RCS resources.

Eligibility of project members using RCS resources is contingent on them conducting work consistent with the project description provided herein.

Commercial software availability is contingent on funding provided by projects, academic departments, or the University of Alaska. Some software licenses may have access restrictions which limit the use of and access to the software package.

Access to RCS resources is contingent on my affiliation with the University of Alaska. Should that affiliation end, I understand that access to the resources assigned to this project may be terminated or reassigned at the discretion of RCS.

Should this resource grant proposal be accepted, I also agree to include the acknowledgement found at http://www.gi.alaska.edu/research-computing-systems/citing in any publications which result from research supported by this project grant.

User Account Application

University of Alaska username
If you do not know the project ID, please contact your principal investigator. If this submission is accompanied by a project account submission, please enter "Pending".
If you have multiple UA affiliations, please select the one that best describes your status on the project you are joining.

RCS Account Agreement

The following paragraph applies to the Principal Investigator only:

Annual Report

I understand that as a condition of continued access to RCS resources, I will submit a brief annual report on this project describing research objectives, computational methodology, current results and significance. This report shall include references to all publications (per: Credit paragraph below).

The following paragraphs apply to all users:

RCS Account Policies

I acknowledge personal responsibility for my account credentials and understand that I am responsible for their security. I will protect my account from misuse. I will not share my account or its credentials with anyone for any reason. I hereby attest that I have read, understand, and agree to be bound by the rules, policies, and procedures regarding my access to RCS resources. I agree to report to RCS any problems I encounter while using RCS systems, or any misuse of accounts or passwords by other persons which may occur and come to my knowledge. I understand that RCS will investigate each incident.

Restrictions and Auditing

I will not execute, copy or store copyrighted or proprietary software or information on RCS systems without proper authorization. I understand that I am not allowed to process sensitive or classified information on RCS resources. I understand that as a user of RCS systems, my activities are audited and that misuse of RCS resources may result in disciplinary action and/or revocation of current, or denial of future computing privileges.

Credit

I agree to include the acknowledgement found at http://www.gi.alaska.edu/research-computing-systems/citing in any publications that result from research supported by this grant. I will submit a copy of each of these publications to RCS (the copies can be included with the annual report).

Availability

I understand that RCS makes a reasonable attempt to ensure the availability and integrity of my data and software through regular on-site backups of my home directory. RCS does not maintain off-site backups of user data and I agree to assume all responsibility for the risk of loss of my data and software--regardless of the cause of that loss. I understand that RCS makes a reasonable attempt to ensure the availability of its HPC resources and that periodically, any or all of these systems may go down for scheduled or unscheduled maintenance. I agree that it is my responsibility to ensure the recoverability of my data, wherever feasible, should any of my jobs be unexpectedly restarted or lost due to a downtime.

By submitting this form, you agree to the above terms and conditions.

Rates and Policies


Service Rates

Research Computing Systems - FY18 Rates

High Performance Computing

Community Condo Model Node Prices (Chinook)

Node Type              Description                                                                    Approximate Cost
Standard Compute Node  Relion 1900: 28-core Intel E5-2690v4 processors, 128 GB RAM, 3-year warranty  ~$9,200
Standard Compute Node  Relion 1900: 28-core Intel E5-2690v4 processors, 128 GB RAM, 4-year warranty  ~$9,500
Standard Compute Node  Relion 1900: 28-core Intel E5-2690v4 processors, 128 GB RAM, 5-year warranty  ~$9,800
BigMem Compute Node    Relion 1900: 28-core Intel E5-2690v4 processors, 1.5 TB RAM, 3-year warranty  ~$26,300

Storage

High Performance Computing Scratch Storage ($CENTER1)

Tier 1: Open to the UA research community, using nodes procured for the community and unused portions of shareholder nodes.
  Computation: No cost for UA faculty, staff, and students
  $HOME quota: 10 GB
  $CENTER1 quota: 1 TB per project; additional space $0.15/GB/yr*

Tier 2: For the PI or project that requires CPUs beyond what Tier 1 can offer, or that requires priority access to HPC resources.
  Computation: Approximately $9,200/standard node for a 3-year share, $9,500 for a 4-year share, or $9,800 for a 5-year share
  $HOME quota: 10 GB
  $CENTER1 quota: 1 TB + 1 TB/node per project; additional space $0.15/GB/yr*

Tier 3: For the PI or project that requires dedicated resources.
  Computation: Contact RCS
  $HOME quota: Contact RCS
  $CENTER1 quota: Contact RCS

Online Spinning Disk Storage

Lease     $0.12/GB/yr*
Purchase  Price of hardware, 20 hours of installation labor, and $5/TB/year while under warranty

Offline Tape Archive Storage ($ARCHIVE)

Project quota  10 TB
Additional     $0.05/GB/year*

Restricted Data Services (PENDING)

File Share             No cost for UA faculty, staff, and students; 60-day purge policy
Spinning Disk Storage  $0.12/GB/year*
Archive Storage        2 TB per UA project; $0.05/GB/year*

Virtual Machines

Size        CPUs  Memory  Storage  Price/month
Subsidized  1     1 GB    0        $0
Small       1     2 GB    20 GB    $15
Medium      2     4 GB    40 GB    $30
Large       4     8 GB    80 GB    $60

All research project needs are assessed with a 1-hour consultation. Services are invoiced annually, up front. Additional labor needed to support project development or implementation may be coordinated between the PI and RCS and charged directly to an award fund/org with permission granted by the PI through a Work Authorization.

Please contact RCS for assistance to assess your project IT needs or to request a Facilities Statement if required for your proposal submission.

All services are subject to availability and rates are subject to change.

*Offered in 100 GB increments, 200 GB minimum.

Citation and Acknowledgement

Has your recent work using RCS resources resulted in publication? Please include the following text in your publication(s):

      This work was supported in part by the high-performance computing and data storage resources operated by the Research Computing Systems Group at the University of Alaska Fairbanks, Geophysical Institute.

Policies

Login Shells

The login shells supported on RCS systems are bash, csh, ksh, and tcsh. If you would like your default login shell changed, please contact uaf-rcs@alaska.edu.

Security Policies

Users of RCS systems agree to abide by published UAF policies and standards: http://www.alaska.edu/oit/services/policies-and-standards. Every user of RCS systems may rightfully expect their programs, data, and documents stored on RCS systems to be inaccessible to others, secure against arbitrary loss or alteration, and available for use at all times. To help protect system security and achieve this goal, RCS staff reserve the right to routinely examine user accounts. In the event of a suspected security incident, RCS staff may inactivate and examine the contents of user accounts without prior notification.

Account Sharing

Users of RCS systems may not share their account with anyone under any circumstances. This policy ensures every user is solely responsible for all actions from within their account. When shared access to a particular set of files on a system is desired,
UNIX group permissions should be applied. Contact uaf-rcs@alaska.edu for more information regarding changing group permissions on files and directories.
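For example, sharing a directory with a project's Unix group might look like the following sketch (the group name myproject and the directory shared_data are hypothetical):

  $ chgrp -R myproject shared_data    # assign the project's Unix group
  $ chmod -R g+rX shared_data         # let group members read files and traverse directories
  $ chmod -R o-rwx shared_data        # remove all world access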

Policy Enforcement

Abuse of RCS resources is a serious matter and is subject to immediate action. A perceived, attempted, or actual violation of standards, procedures, or guidelines pursuant with RCS policies may result in disciplinary action including the loss of system
privileges and possibly legal prosecution in the case of criminal activity. RCS employs the following mechanisms to enforce its policies:

  • Contacting the user via phone or email to ask them to correct the problem.
  • Modifying the permissions on a user's files or directories in response to a security violation.
  • Inactivating accounts or disabling access to resources to ensure availability and security of RCS systems.

User-owned Files and Directories

For home directories, RCS recommends authorizing write access to only the file/directory owner. Group and world write permissions in a home directory should always be avoided.

For the $CENTER1 and $ARCHIVE filesystems, RCS recommends using caution when opening group and world read / execute permissions, and extreme caution when opening group and world write permissions.
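For example, the following sketch enforces owner-only write access on a home directory and grants cautious group read access to a hypothetical $CENTER1 subdirectory:

  $ chmod go-w $HOME                      # remove group and world write access from home
  $ chmod -R g+rX,o-rwx $CENTER1/results  # group may read/traverse; no world access (path is hypothetical)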

Setuid and setgid permissions are prohibited on all user-owned files and directories.

User files may be scanned by RCS staff at any time for system security or maintenance purposes.

Non-printing characters, such as ASCII codes for RETURN or DELETE, are occasionally introduced by accident into file names. These characters present a low-level risk to system security and integrity and are prohibited. Techniques for renaming, deleting,
or accessing files containing non-printing characters in the filename are described at www.arsc.edu/support/howtos/nonprintingchars/index.xml

Passwords

RCS uses University of Alaska (UA) credentials for user authentication. Therefore, passwords used to access RCS systems are subject to UA password guidelines. UA passwords may be changed using ELMO (https://elmo.alaska.edu). If you suspect your password
has been compromised, contact the UAF OIT Helpdesk (helpdesk@alaska.edu, 907-450-8300) immediately.

SSH Public Keys

Sharing of private SSH keys to allow another user to access an RCS account is considered account sharing and is prohibited on RCS systems.

Users of SSH public keys are responsible for their private keys and ensuring they are protected against access from other users. Private SSH keys should be generated and stored only on trusted systems.

Tampering

Do not attempt to break passwords, tamper with system files, access files in other users' directories without permission, or otherwise abuse the privileges given to you with your RCS account. Your privileges do not extend beyond the directories, files,
and volumes which you rightfully own or to which you have been given permission to access.

System-generated E-mail

RCS provides a ~/.forward file for users on each system. When the system generates an email, the message will be forwarded to email address(es) listed in the .forward file. Users are free to update their ~/.forward file to their preferred email address.
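For example, to forward system-generated mail to a UA address (the address shown is hypothetical), place the address in ~/.forward:

  $ echo "jsmith2@alaska.edu" > ~/.forward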

Maintenance Periods

Annual maintenance schedules are in place for the various RCS systems. They are as follows:

Chinook

  • Monthly: First Wednesday for non-interrupt Operating System security and Scyld ClusterWare updates
  • Quarterly: First Wednesday of Jan, Apr, Jul, and Oct for Operating System and Scyld ClusterWare updates that may require system downtime
  • Twice per year: The month of May during the FAST and over the winter closure for Operating System and Scyld ClusterWare updates that may require system downtime

News Releases

Contact Us

  • Office Suite

    Geophysical Institute

    508 Elvey

  • Phone & Email

    Phone: 907-450-8602

    Email: uaf-rcs@alaska.edu

  • Shipping

    Research Computing Systems

    Geophysical Institute

    University of Alaska Fairbanks

    PO Box 757320

    903 Koyukuk Drive

    Fairbanks, AK 99775-7320

MATLAB Installation

For a MATLAB installation on a workstation or laptop please email uaf-rcs@alaska.edu with the following information:

  • Name
  • Contact Information (email address, phone number, location if the computer is a desktop)
  • The Operating System on the computer
  • Whether the computer will be in the field, with limited or no network access

If you are interested in using MATLAB on the Chinook HPC cluster for computationally intensive tasks, please see our MATLAB on Chinook page and our page on getting a user account.

To use MATLAB on the RCS Linux workstations, please see the Remote Login page.

Remote Login

To log into the Linux workstations remotely, you will need a user account and a Secure Shell (SSH) client program.

Linux

Linux users should use the OpenSSH client, which is already installed on your computer. Open a terminal session and run the following command to connect to one of the workstations:

ssh uausername@host

replacing uausername with your UA username (e.g. jsmith2) and host with one of the following host names:

  • einstein.alaska.edu
  • planck.alaska.edu
  • tesla.alaska.edu
  • feynman.alaska.edu
  • hawking.alaska.edu
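For example, a hypothetical user jsmith2 connecting to einstein would run:

ssh jsmith2@einstein.alaska.edu

If you connect frequently, OpenSSH can store these settings in ~/.ssh/config (a sketch; the alias is arbitrary):

Host einstein
    HostName einstein.alaska.edu
    User jsmith2

After saving this, running ssh einstein is sufficient.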

Mac

Mac users, like Linux users, should use the pre-installed OpenSSH client. See above for directions.

Unlike Linux, Mac operating systems do not come with an X Window server pre-installed. If you want to run any graphical applications, we recommend installing XQuartz on your Mac.
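With XQuartz installed and running, X11 forwarding works the same way as on Linux. For example (the username is hypothetical):

ssh -Y jsmith2@einstein.alaska.edu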

Windows

Windows users will need to download and install a third-party SSH client in order to log into the Linux Workstations. Here are a few available options:

  • PuTTY (open source, MIT license)
  • Secure Shell (proprietary, UA credentials required for download)

Installing PuTTY

RCS recommends that Windows users download and install PuTTY, a free and open-source SSH/Telnet/rlogin client.

  1. Download PuTTY from the official site.
  2. Run the PuTTY installer, and select "Next".
  3. By default, the installer will install in C:\Program Files (x86)\PuTTY under 64-bit Windows, and C:\Program Files\PuTTY under 32-bit Windows. Select "Next".
  4. The installer will prompt you for a Start Menu folder in which to create shortcuts. Select "Next".
  5. Select "Create a desktop icon for PuTTY", and select "Next".
  6. The installer will allow you to review your choices. Select "Install" after you have done so.
  7. The installer will require only a few seconds to install PuTTY on your computer. Select "Finish". As it closes, the installer will by default open PuTTY's readme file, which contains additional information on the other tools included with PuTTY.

Using PuTTY

Establishing a remote login session over SSH using PuTTY is reasonably straightforward. The following steps describe how to do this, turn on X11 forwarding, and save connection settings.

Remote Login via SSH

  1. Open PuTTY using the icon placed on your desktop by the PuTTY installer.
  2. For "Host Name", enter hostname.arsc.edu where hostname is newton, einstein, planck, tesla, feyman, hawking
  3. Select "Open" to initiate an SSH connection request.
  4. You may receive a security alert warning you that your client cannot establish the authenticity of the host you are connecting to. This warning is always displayed the first time your SSH client connects to a computer it has not connected to before. If you have never connected to one of the Linux workstations using this PuTTY installation, select "Yes".
  5. A terminal window should open. You will be prompted for a username. Enter your UA username.
  6. You will be prompted for a password. Enter your UA password and continue.
  7. On successful authentication, a command prompt will appear and allow you to execute commands on the Linux workstation.

Enabling Graphics

Some applications, especially visualization applications, require a graphical display. It is possible to tunnel graphics over an SSH connection using X11 graphics forwarding, which is supported by PuTTY.

  1. Install a local X Window server. We recommend installing the last free version of Xming, which became proprietary software in May 2007.
  2. In PuTTY, define a connection to one of the workstation host names (e.g. einstein.alaska.edu) and navigate to "Connection-SSH-X11". Check the box labeled "Enable X11 forwarding".
  3. Initiate an SSH connection request and log in as outlined in the last section.
  4. Ensure that your local X server is running. Without this, any graphical application will fail to run properly.
  5. Run xlogo, a simple graphical application. If you see a window containing a black X on a white background, you have successfully enabled X11 forwarding.

Saving Connection Settings

  1. Configure your connection settings as desired
  2. Navigate to "Category-Session"
  3. Enter a name for your session in the "Saved Sessions" input box, and select "Save". Your session should now appear as a new line in the text box to the left of "Save".
  4. To load saved settings, select the session you want to load and then select "Load".

Optionally, PuTTY's command-line flags allow you to create shortcuts that load a particular connection.

  1. Copy your PuTTY shortcut icon
  2. Right click on the copy, and select "Properties"
  3. In the "Target" field, append -load followed by the connection name in quotation marks
  4. Select "Apply", and close the window
  5. Rename the modified shortcut appropriately
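For example, if your saved session is named "einstein" (hypothetical), the modified Target field might read:

  "C:\Program Files\PuTTY\putty.exe" -load "einstein"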

Troubleshooting

When I try to connect, PuTTY opens an alert box that says "Disconnected: No supported authentication methods available".

This message means that authentication by username failed. This is most likely caused by an incorrect username, or because you do not have access to the Linux Workstations. Please ensure that you received an email from RCS User Support (uaf-rcs@alaska.edu) notifying you of your account creation, and use the username provided in that email.

My application returns the error "X connection to localhost:10.0 broken (explicit kill or server shutdown)" (or similar).

This is an indication that your local X server is not running. Check the icons on the right-hand side of your task bar for the X server icon. If it is not present, ensure that you have installed an X server locally and that it is running. Once the icon is present, try opening your program again.

I received the "Unknown Host Key" popup alert, followed by another popup stating: "Server unexpectedly closed network connection".

This indicates that the server's SSH timeout was triggered. SSH servers are often configured to kill incoming connections that do not send data for a while. While you were responding to the "Unknown Host Key" popup, the remote host's connection timeout expired and it disconnected you. You should be able to reconnect without problem.

Chinook Documentation

Are you interested in using the Chinook HPC cluster in your computational work? Please read our directions on how to obtain RCS project and user accounts.

Logging In

To log into Chinook, you will need a user account and a Secure Shell (SSH) client program.

Use the SSH client you have chosen and installed to connect to chinook.alaska.edu. When prompted for a username, either interactively or while configuring the client, you should provide your UA username. You will be prompted for a password upon opening an SSH connection. When this happens, enter your UA password.

The Chinook login nodes are intended to provide access to the cluster, to compile and modify applications and workflows, to manage data between the mounted filesystems, and to manage jobs. Any processing should be very limited and short to avoid impacting other users' activities. Batch or interactive serial or parallel processing should take place on the compute nodes. RCS reserves the right to terminate user processes or sessions to maintain the normal working order of the login nodes and the cluster.

Linux

Linux users should use the OpenSSH client, which is already installed on your computer. Open a terminal session and run the following command to connect to Chinook:

ssh uausername@chinook.alaska.edu

replacing uausername with your UA username (e.g. jsmith2).

To enable graphical displays through an X Window server, run:

ssh -Y uausername@chinook.alaska.edu

Mac

Mac users, like Linux users, should use the pre-installed OpenSSH client. See above for directions.

Unlike Linux, Mac operating systems do not come with an X Window server pre-installed. If you want to run any graphical applications on Chinook, we recommend installing XQuartz on your Mac and using:

ssh -Y uausername@chinook.alaska.edu

Windows

Windows users will need to download and install a third-party SSH client in order to log into Chinook. Here are a few available options:

  • PuTTY (open source, MIT license)
  • Secure Shell (proprietary, UA credentials required for download)

Installing PuTTY

RCS recommends that Windows users download and install PuTTY, a free and open-source SSH/Telnet/rlogin client.

  1. Download PuTTY from the official site.
  2. Run the PuTTY installer, and select "Next".
  3. By default, the installer will install in C:\Program Files (x86)\PuTTY under 64-bit Windows, and C:\Program Files\PuTTY under 32-bit Windows. Select "Next".
  4. The installer will prompt you for a Start Menu folder in which to create shortcuts. Select "Next".
  5. Select "Create a desktop icon for PuTTY", and select "Next".
  6. The installer will allow you to review your choices. Select "Install" after you have done so.
  7. The installer will require only a few seconds to install PuTTY on your computer. Select "Finish". As it closes, the installer will by default open PuTTY's readme file, which contains additional information on the other tools included with PuTTY.

Using PuTTY

Establishing a remote login session over SSH using PuTTY is reasonably straightforward. The following steps describe how to do this, turn on X11 forwarding, and save connection settings.

Remote Login via SSH

  1. Open PuTTY using the icon placed on your desktop by the PuTTY installer.
  2. For "Host Name", enter chinook.alaska.edu.
  3. Select "Open" to initiate an SSH connection request.
  4. You may receive a security alert warning you that your client cannot establish the authenticity of the host you are connecting to. This warning is always displayed the first time your SSH client connects to a computer it has not connected to before. If you have never connected to Chinook using this PuTTY installation, select "Yes".
  5. A terminal window should open. You will be prompted for a username. Enter your UA username.
  6. You will be prompted for a password. Enter your UA password and continue.
  7. On successful authentication, a command prompt will appear and allow you to execute commands on Chinook.

Enabling Graphics

Some applications on Chinook, especially visualization applications, require a graphical display. It is possible to tunnel graphics over an SSH connection using X11 graphics forwarding, which is supported by PuTTY.

  1. Install a local X Window server. We recommend installing the last free version of Xming, which became proprietary software in May 2007.
  2. In PuTTY, define a connection to chinook.alaska.edu and navigate to "Connection-SSH-X11". Check the box labeled "Enable X11 forwarding".
  3. Initiate an SSH connection request and log in as outlined in the last section.
  4. Ensure that your local X server is running. Without this, any graphical application will fail to run properly.
  5. Run xlogo, a simple graphical application. If you see a window containing a black X on a white background, you have successfully enabled X11 forwarding.

Saving Connection Settings

  1. Configure your connection settings as desired
  2. Navigate to "Category-Session"
  3. Enter a name for your session in the "Saved Sessions" input box, and select "Save". Your session should now appear as a new line in the text box to the left of "Save".
  4. To load saved settings, select the session you want to load and then select "Load".

Optionally, PuTTY's command-line flags allow you to create shortcuts that load a particular connection.

  1. Copy your PuTTY shortcut icon
  2. Right click on the copy, and select "Properties"
  3. In the "Target" field, append -load followed by the connection name in quotation marks
  4. Select "Apply", and close the window
  5. Rename the modified shortcut appropriately

Troubleshooting

When I try to connect, PuTTY opens an alert box that says "Disconnected: No supported authentication methods available".

This message means that authentication by username failed. This is most likely caused by an incorrect username, or because you do not have access to Chinook. Please ensure that you received an email from RCS User Support (uaf-rcs@alaska.edu) notifying you of your Chinook account creation, and use the username provided in that email.

My application returns the error "X connection to localhost:10.0 broken (explicit kill or server shutdown)" (or similar).

This is an indication that your local X server is not running. Check the icons on the right-hand side of your task bar for the X server icon. If it is not present, ensure that you have installed an X server locally and that it is running. Once the icon is present, try opening your program again.

I received the "Unknown Host Key" popup alert, followed by another popup stating: "Server unexpectedly closed network connection".

This indicates that the server's SSH timeout was triggered. SSH servers are often configured to kill incoming connections that do not send data for a while. While you were responding to the "Unknown Host Key" popup, the remote host's connection timeout expired and it disconnected you. You should be able to reconnect without problem.

Using VNC to Login

To run graphical applications on RCS systems remotely, the Virtual Network Computing (VNC) application is available. It provides some advantages over X Windows forwarded through SSH, such as a detachable session and better performance over slow connections. Here is the basic setup required for this approach.

***Important Note: Please follow all of these steps with each new VNC session.***

Step 1: Install VNC on your local system

There are multiple VNC viewer programs available with unique interfaces and features. The application on RCS systems is TigerVNC.

Mac users can use the built-in Apple "Screen Sharing" as a VNC client and do not have to install an additional client.

After installing the software, make sure ports 5900 and 5901 are open to allow VNC traffic through your host firewall.

Step 2: Setup port forwarding over SSH for the VNC session

On Linux or Mac systems:

  local$ ssh -L 5901:localhost:5901 username@remote.alaska.edu

On a Windows system:

Setup a SSH tunnel with PuTTY on Windows.

  1. On the left side of the PuTTY dialog box when you open PuTTY, choose Connection->SSH->Tunnels
  2. In "Source port", enter 5901
  3. In "Destination", enter remote.alaska.edu:5901
  4. Click "Add" and you should see the following in the list of forwarded ports:

    L5901 remote.alaska.edu:5901

Step 3: Connect to the remote system and start the VNC server

Log onto the remote system over SSH and specify the appropriate ports for VNC client (your local system) and server (remote system) communication.

Launch a VNC server instance on the remote system. The initial vncserver instance will prompt you for a password to protect your session. Subsequent launches of vncserver will reuse that password without prompting.

  remote$ vncserver -localhost

  You will require a password to access your desktops.
  Password:
  Verify:

  New 'remote:1 (username)' desktop is remote:1

  Creating default startup script /u1/uaf/username/.vnc/xstartup
  Starting applications specified in /u1/uaf/username/.vnc/xstartup
  Log file is /u1/uaf/username/.vnc/remote:1.log

Step 4: Open VNC on your local system

  1. Launch Apple "Screen Sharing" on a MAC.

    The Apple "Screen Sharing" connect to server dialog can be accessed with {apple key} K or Finder - Go - Connect to Server. Use "vnc://localhost:5901" as the "Server Address".

  2. Launch VNC on Windows from the menu or a launcher icon.

    On Windows, the VNC application should have installed a launcher somewhere in the menus and may also have installed an icon on the desktop or taskbar, depending on the options you chose when installing. Use the menu or icon to start VNC.

  3. Launch Linux VNC viewer from the command line

    Launch your VNC viewer program and connect to host "localhost" and port 5901. The example below shows how to launch the client using TigerVNC.

    local$ vncviewer localhost:5901

If you are using the TigerVNC GUI, enter "localhost:5901" into the "VNC server:" box, then click the "Connect" button. You will then be prompted for the password created in Step 3. If your local VNC client connects successfully, you will see your desktop on the remote system.

Your circumstances might require the use of different ports due to firewall issues or if you are running more than one VNC server session on the remote system. (Other people on the system might be running their own sessions as well and occupying the ports.) If this is the case, you may need to specify port 5902 or 5903 or ... Add 5900 to the display number to determine the correct remote port to use.

To determine whether the VNC viewer has successfully connected, check the log file noted when vncserver was started on the remote system.

After starting the server, the option exists to log out and back in again using different port forwarding parameters.

Note that some VNC viewer programs can automatically set up the SSH port forwarding through a command-line flag such as "-via" or some option in a graphical configuration menu.
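For example, TigerVNC's viewer can establish the SSH tunnel itself with the -via flag (a sketch, assuming display :1 on the remote system):

  local$ vncviewer -via username@remote.alaska.edu localhost:1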

Step 5: When finished, close the VNC session

To close your VNC session, view the open sessions on the remote system, then close the appropriate one.

  remote$ vncserver -list
  TigerVNC server sessions:
  X DISPLAY #     PROCESS ID
  :1                    252550
  remote$ vncserver -kill :1

Troubleshooting

  1. Orphaned Session

    If a previous VNC session remains open on the remote system, that old session will need to be closed prior to establishing a new connection using the same port. To identify and kill the old session, first obtain the process ID of the "Xvnc" process, then issue the kill command.

      remote$ ps -elf | grep username | grep Xvnc
      0 S username    236193      1  0  80   0 - 24842 poll_s Nov09 ?        
            00:00:10 /usr/bin/Xvnc :1 -desktop remote:1 (username) 
            -auth /u1/uaf/username/.Xauthority -geometry 1024x768 
            -rfbwait 30000 -rfbauth /u1/uaf/username/.vnc/passwd 
            -rfbport 5901 -fp catalogue:/etc/X11/fontpath.d -pn -localhost
      remote$ kill 236193
    

  2. Locked Session

    Depending on your desktop settings on the remote system, the X screensaver may kick in and lock the session after a period of inactivity. If this happens, you will be prompted for a password that does not exist. The xlock process can be killed from the command line. We recommend disabling X screen locking in the desktop settings displayed over VNC to avoid this.

  3. Reset Server Password

    To change the VNC server password, use the 'vncpasswd' command on the remote system.

  4. More Information

    Run 'vncserver --help' and 'man vncserver' for more information on how to use the application.

Available Filesystems

On account creation, a new RCS HPC user is given ownership of a subdirectory created on all of the following major filesystems. The paths to each of these subdirectories are recorded in your shell's environment variables, making it easy to use these paths on the command line.

The major filesystems available on Chinook are typically referred to by the Bash syntax used to expand the corresponding environment variable. These names are used below.
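For example, the corresponding environment variables can be used directly on the command line:

chinook00 % cd $CENTER1     # scratch space
chinook00 % cd $ARCHIVE     # long-term storage (login nodes only)
chinook00 % echo $HOME      # print your home directory path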

The following protocols for transferring files are supported on Chinook:

  • Secure Copy (SCP)
  • SSH File Transfer Protocol (SFTP)
  • rsync (client-to-client only, no daemon)

$HOME

  • The $HOME filesystem is accessible from the Chinook login and compute nodes.
  • Default $HOME quota for Tier 1 users: 10 GB
  • Default $HOME quota for Tier 2 users: 20 GB
  • The $HOME filesystem is backed up regularly.

$CENTER1

  • The $CENTER1 scratch filesystem is accessible from the Chinook login and compute nodes.
  • User directories on $CENTER1 are now /center1/PROJECTID/UAusername
  • Default $CENTER1 quotas are set to 1 TB per project. All users on a project will share this 1 TB quota.
  • Files are no longer purged on $CENTER1, but project quotas are in effect.
  • $CENTER1 is a scratch storage space and is not backed up. For long term storage please copy your files off of $CENTER1.

$ARCHIVE

  • The $ARCHIVE filesystem is accessible from the Chinook login nodes only.
  • Files stored in $ARCHIVE will be written to tape and taken offline over time. Use the "batch_stage" command to bring the files back online prior to viewing the contents of the file or copying the data off $ARCHIVE.
  • To stage directories run "batch_stage -r <DIRECTORY>" (see the example after this list). For more help and samples run "batch_stage -h".
  • If you have a legacy ARSC username, a symbolic link has been created linking your /archive/u1/uaf/ARSCusername directory to your /archive/u1/uaf/UAusername directory.
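A typical sequence stages the files and then copies them to scratch space (a sketch; the directory name mydata is hypothetical):

chinook00 % batch_stage -r $ARCHIVE/mydata
chinook00 % cp -r $ARCHIVE/mydata $CENTER1/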

Viewing Quotas

You can view your storage quotas and usage through the show_storage command. show_storage will show the quota and usage for $HOME, $ARCHIVE, and $CENTER1, if they are mounted on the machine that you are on.


chinook00 % show_storage 
Filesystem   Used_GiB    Soft_GiB   Hard_GiB      Files  Soft Files Hard Files
========== ==========  ========== ========== ==========  ========== ==========
$HOME            0.00       10.00      11.00         16     1000000    1100000

===================================$CENTER1===================================
Project      Used_GiB    Soft_GiB   Hard_GiB      Files  Soft Files Hard Files
========== ==========  ========== ========== ==========  ========== ==========
RCSCLASS         0.25     1024.00    1126.40       1471           0          0

show_storage can also be used to determine how much storage each directory in the main project directory consumes, through the use of the -d PROJECTID flag. PROJECTID is the Unix group of a specific project. Depending on your usage, this command may take some time to complete.


chinook00 % show_storage -d rcsclass
/import/c1/RCSCLASS      GiB
=================== ========
uaguest_rclass8         0.00
uaguest_rclass9         0.00
uaguest_rclass2         0.00
uaguest_rclass3         0.00
uaguest_rclass1         0.00
uaguest_rclass10        0.00
uaguest_rclass7         0.00
uaguest_rclass4         0.00
uaguest_rclass5         0.00
demos                   0.24
uaguest_rclass6         0.00

The du command can be used to display how much storage is being used by specific directories. du -h /center1/PROJECTID/path/to/directory will list the storage used by each directory under /center1/PROJECTID/path/to/directory.


chinook00 % du -h /center1/RCSCLASS/uaguest_rclass1
53K /center1/RCSCLASS/uaguest_rclass1/data
106K  /center1/RCSCLASS/uaguest_rclass1

du -sh /center1/PROJECTID/path/to/directory will sum up the total storage used by a directory.


chinook00 % du -sh /center1/RCSCLASS/uaguest_rclass1/data
53K /center1/RCSCLASS/uaguest_rclass1/data

Migrating from Pacman/Fish

If you are an existing user of either Pacman or Fish, it is important that you know what to expect when you begin using Chinook. Below is a comparison of some characteristics across the different HPC clusters currently operated by RCS:

Attribute               Chinook                                      Fish                                                              Pacman
Operating System        CentOS 6 (CentOS 7 upgrade planned)          Cray Linux Environment (CLE) 4                                    RHEL 6
Workload manager        Slurm                                        PBS/Torque (Cray)                                                 PBS/Torque
Usernames               UA usernames                                 Legacy ARSC usernames                                             Legacy ARSC usernames
Login nodes             2 (with more coming)                         2                                                                 12 + 1 high memory
Compute nodes           Intel Xeon, 24/28 cores per node             AMD Istanbul/Interlagos, 12/16 cores per node, nVidia Tesla GPUs  AMD Opteron, 16/32 cores per node
Interconnect            QLogic QDR InfiniBand (EDR upgrade planned)  Cray Gemini                                                       QLogic QDR and Voltaire SDR InfiniBand
Default compiler suite  Intel                                        PGI                                                               PGI
$CENTER                 Yes                                          Yes                                                               Yes
$ARCHIVE                Yes                                          Yes                                                               Yes
/projects               Yes                                          Yes                                                               Yes
$HOME                   Yes (only available on cluster)              Yes (only available on cluster)                                   Yes
/usr/local/unsupported  Yes (only available on cluster)              Yes (only available on cluster)                                   Yes

User-compiled software

All software previously compiled on Pacman or Fish will need to be recompiled for Chinook. This is due to differences between the hardware and Linux kernel present on Chinook and those on Pacman / Fish.

Software stack differences

Compiler toolchain modules

On Pacman and Fish, the environment modules responsible for loading a compiler and related set of core HPC libraries follow the "PrgEnv" naming convention common to many HPC facilities. Chinook's equivalent modules are called compiler toolchains, or just toolchains. See Compiler Toolchains for more information on what is available.

At this time, the "PrgEnv" modules on Chinook are deprecated, and have been replaced by toolchain modules instead. Depending on feedback, we may provide "PrgEnv"-style symlinks to the toolchain modules in the future.

Dependency loading behavior

The modules on Chinook now each specify and load a (mostly) complete module dependency tree. To illustrate, consider loading an Intel-compiled netCDF library. Here is what happens on Pacman:

$ module purge
$ module load netcdf/4.3.0.intel-2013_sp1
$ module list --terse
Currently Loaded Modulefiles:
netcdf/4.3.0.intel-2013_sp1

And an equivalent action on Chinook:

$ module purge
$ module load data/netCDF/4.4.1-pic-intel-2016b
$ module list --terse
Currently Loaded Modulefiles:
compiler/GCCcore/5.4.0
tools/binutils/2.26-GCCcore-5.4.0
compiler/icc/2016.3.210-GCC-5.4.0-2.26
compiler/ifort/2016.3.210-GCC-5.4.0-2.26
openmpi/intel/1.10.2
toolchain/pic-iompi/2016b
numlib/imkl/11.3.3.210-pic-iompi-2016b
toolchain/pic-intel/2016b
lib/zlib/1.2.8-pic-intel-2016b
tools/Szip/2.1-pic-intel-2016b
data/HDF5/1.8.17-pic-intel-2016b
tools/cURL/7.49.1-pic-intel-2016b
data/netCDF/4.4.1-pic-intel-2016b

On Pacman and Fish, you get exactly the module you requested and no more (with a few exceptions). This has advantages and disadvantages:

  • advantage: It is easy to experiment with different software builds by swapping library modules
  • disadvantage: It is not immediately obvious which libraries were used during any given software build when multiple versions of those libraries exist
  • disadvantage: It is trivial to introduce a fatal error in an application by inadvertently loading an incompatible library module or omitting a needed one

On Chinook, standardizing and loading all module dependencies results in consistency and reproducibility. When you load the Intel-compiled netCDF module on Chinook, for example, you get modules loaded for the following:

  • The netCDF library
  • Its immediate dependencies (HDF5, zlib, curl)
  • The dependencies for the dependencies (and so on, recursively)
  • The exact Intel compiler, MPI library, and Intel Math Kernel Library (MKL) used to build netCDF
  • An upgraded version of GCC to supersede the ever-present system version

This takes the guesswork out of manually piecing together a software stack module by module. Every successive dependency will modify LD_LIBRARY_PATH and other variables appropriately, so that the desired application or library dynamically links against the proper supporting libraries instead of accidentally picking up an incompatible one.

One ramification of loading a full dependency tree is that trying to load software compiled with different compiler toolchains will likely result in module conflicts - even if the tools you are trying to load provide only binaries and nothing else. This is because combining two or more different dependency trees will likely result in unintended and harmful dynamic linking due to two different builds of a core compiler or library being loaded. LD_LIBRARY_PATH ensures that the library version found first will be used to satisfy all dependencies on that particular library, causing no problems for the software packages that expect it and possibly wreaking havoc for the packages that expect a different build.

Slurm Translation Guide

One of the most immediately evident changes with Chinook is that it uses Slurm for job scheduling rather than PBS/Torque. The workflow for submitting jobs has not changed significantly, but the syntax and commands have. Below is an excerpt from SchedMD's "Rosetta Stone of Workload Managers" relevant to PBS/Torque.

For more information on Slurm, please see Using the Batch System.

Source: http://slurm.schedmd.com/rosetta.pdf, 28-Apr-2013

User Commands         PBS/Torque                                        Slurm
Job submission        qsub [script_file]                                sbatch [script_file]
Job deletion          qdel [job_id]                                     scancel [job_id]
Job status (by job)   qstat [job_id]                                    squeue [job_id]
Job status (by user)  qstat -u [user_name]                              squeue -u [user_name]
Job hold              qhold [job_id]                                    scontrol hold [job_id]
Job release           qrls [job_id]                                     scontrol release [job_id]
Queue list            qstat -Q                                          squeue
Node list             pbsnodes -l                                       sinfo -N OR scontrol show nodes
Cluster status        qstat -a                                          sinfo
GUI                   xpbsmon                                           sview

Environment           PBS/Torque                                        Slurm
Job ID                $PBS_JOBID                                        $SLURM_JOBID
Submit Directory      $PBS_O_WORKDIR                                    $SLURM_SUBMIT_DIR
Submit Host           $PBS_O_HOST                                       $SLURM_SUBMIT_HOST
Node List             $PBS_NODEFILE                                     $SLURM_JOB_NODELIST
Job Array Index       $PBS_ARRAYID                                      $SLURM_ARRAY_TASK_ID

Job Specification     PBS/Torque                                        Slurm
Script directive      #PBS                                              #SBATCH
Queue                 -q [queue]                                        -p [queue]
Node Count            -l nodes=[count]                                  -N [min[-max]]
CPU Count             -l ppn=[count] OR -l mppwidth=[PE_count]          -n [count]
Wall Clock Limit      -l walltime=[hh:mm:ss]                            -t [min] OR -t [days-hh:mm:ss]
Standard Output File  -o [file_name]                                    -o [file_name]
Standard Error File   -e [file_name]                                    -e [file_name]
Combine stdout/err    -j oe (both to stdout) OR -j eo (both to stderr)  (use -o without -e)
Copy Environment      -V                                                --export=[ALL | NONE | variables]
Event Notification    -m abe                                            --mail-type=[events]
Email Address         -M [address]                                      --mail-user=[address]
Job Name              -N [name]                                         --job-name=[name]
Job Restart           -r [y|n]                                          --requeue OR --no-requeue (NOTE: configurable default)
Working Directory     N/A                                               --workdir=[dir_name]
Resource Sharing      -l naccesspolicy=singlejob                        --exclusive OR --shared
Memory Size           -l mem=[MB]                                       --mem=[mem][M|G|T] OR --mem-per-cpu=[mem][M|G|T]
Account to Charge     -W group_list=[account]                           --account=[account]
Tasks Per Node        -l mppnppn [PEs_per_node]                         --tasks-per-node=[count]
CPUs Per Task         N/A                                               --cpus-per-task=[count]
Job Dependency        -d [job_id]                                       --depend=[state:job_id]
Job Project           N/A                                               --wckey=[name]
Job host preference   N/A                                               --nodelist=[nodes] AND/OR --exclude=[nodes]
Quality of Service    -l qos=[name]                                     --qos=[name]
Job Arrays            -t [array_spec]                                   --array=[array_spec] (Slurm version 2.6+)
Generic Resources     -l other=[resource_spec]                          --gres=[resource_spec]
Licenses              N/A                                               --licenses=[license_spec]
Begin Time            -a "YYYY-MM-DD HH:MM:SS"                          --begin=YYYY-MM-DD[THH:MM[:SS]]

Frequently Asked Questions

Will I need to copy my files from Pacman/Fish to Chinook?

$ARCHIVE and $CENTER are mounted on Chinook, so you will have access to all your existing files on Chinook. However, your home directory is new and we will not be automatically copying any Pacman/Fish home directory contents to Chinook. If you would like to transfer any files from your Pacman/Fish home directory, you may do so using scp or sftp between Pacman/Fish and Chinook.

I used the PGI compiler on Pacman/Fish. What are my options on Chinook?

Support for the PGI compiler suite will expire in FY17. If possible, please look into compiling your code using the Intel or GNU compiler suites. If not, the latest version of the PGI compilers available when support lapses will remain installed on Chinook.

I have a PBS/Torque batch script. Can I use it on Chinook?

Possibly. Slurm provides compatibility scripts for various PBS/Torque commands, including qsub. The compatibility is not perfect, and you will likely need to debug why your batch script isn't doing what you expect. That time is often better spent porting the PBS script to Slurm syntax and using sbatch instead.

Using the Batch System

The Slurm (Simple Linux Utility for Resource Management) workload manager is a software package for submitting, scheduling, and monitoring jobs on large compute clusters. Slurm is available on Chinook for submitting and monitoring user jobs.

Similar to PBS/TORQUE, Slurm accepts user jobs specified in batch scripts. More information on Slurm batch scripts may be found below.

Common Slurm commands, Slurm batch scripts, translating from PBS/TORQUE to Slurm, and running interactive jobs are discussed below. SchedMD, the company behind Slurm, has also put together a quick reference for Slurm commands.

Batch overview

The general principle behind batch processing is automating repetitive tasks. Single tasks are known as jobs, while a set of jobs is known as a batch. This distinction is mostly academic, since the terms job and batch job are now mostly synonymous, but here we'll use the terms separately.

There are three basic steps in a batch or job-oriented workflow:

  1. Copy input data from archival storage to scratch space
  2. Run computational tasks over the input data
  3. Copy output to archival storage

On Chinook the first and last steps must occur on login nodes, and the computation step on compute nodes. This is enforced by the login nodes having finite CPU ulimits set and $ARCHIVE not being present on the compute nodes.

Depending on the scale and characteristics of a particular job, different jobs may require different combinations of computational resources. Garnering these resources is a combination of:

  • Choosing which partition to submit the job to
  • Choosing what resources to request from the partition

This is done by writing batch scripts whose directives specify these resources.
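For illustration, a minimal batch script might look like the following sketch (the job name and the program ./myprog are hypothetical, and the resource requests are examples only):

#!/bin/bash
#SBATCH --partition=t1standard   # see the partition table below
#SBATCH --nodes=3                # t1standard accepts 3-71 nodes per job
#SBATCH --ntasks=72              # e.g. 24 tasks per node on 24-core nodes
#SBATCH --time=1-00:00:00        # 1 day of walltime (t1standard allows up to 4 days)
#SBATCH --job-name=myjob

srun ./myprog                    # launch the program on the allocated tasks

Submitting this script with sbatch causes the #SBATCH directives to be read automatically.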

Available partitions

Name        Node count  Max walltime  Nodes per job (min-max)  Other rules                                      Purpose
debug       2           1 hour        1-2                                                                       For debugging job scripts
t1small     71          1 day         1-2                                                                       For short, small jobs with quick turnover
t1standard  71          4 days        3-71                     Default                                          General-purpose partition
t2small     71          2 days        1-2                      Tier 2 users only; increased priority and walltime  Tier 2 version of t1small
t2standard  71          7 days        3-71                     Tier 2 users only; increased priority and walltime  Tier 2 general-purpose partition
transfer    1           1 day         1                        Shared use                                       Copy files between archival storage and scratch space

Selecting a partition is done by adding a directive to the job submission script such as #SBATCH --partition=t1standard, or on the command line: $ sbatch -p t1standard

Anyone interested in gaining access to the higher-priority Tier 2 partitions (t2small, t2standard) by subscribing to support the cluster or procuring additional compute capacity should contact uaf-rcs@alaska.edu.

Common Slurm Commands

sacct

The sacct command is used for viewing information about submitted jobs. This can be useful for monitoring job progress or diagnosing problems that occurred during job execution. By default, sacct will report the job ID, job name, partition, account, allocated CPU cores, job state, and the exit code for all of the current user's jobs that have been submitted since midnight of the current day.

sacct's output, as with most Slurm informational commands, can be customized in a large number of ways. Here are a few of the more useful options:

Command                        Result
sacct --starttime 2016-03-01   select jobs since midnight of March 1, 2016
sacct --allusers               select jobs from all users (default is only the current user)
sacct --accounts=account_list  select jobs whose account appears in a comma-separated list of accounts
sacct --format=field_names     print fields specified by a comma-separated list of field names
sacct --helpformat             print the list of fields that can be specified with --format

For more information on sacct, please visit https://slurm.schedmd.com/sacct.html.

sbatch

The sbatch command is used for submitting jobs to the cluster. Although it is possible to supply command-line arguments to sbatch, it is generally a good idea to put all or most resource requests in the batch script for reproducibility.

Sample usage:

sbatch mybatch.sh

On successful batch submission, sbatch will print out the new job's ID. sbatch may fail if the resources requested cannot be satisfied by the indicated partition.

For more information on sbatch, please visit https://slurm.schedmd.com/sbatch.html.

scontrol

The scontrol command is used for monitoring and modifying queued or running jobs. Although many scontrol subcommands apply only to cluster administration, there are some that may be useful for users:

Command                    Result
scontrol hold job_id       place a hold on the job specified by job_id
scontrol release job_id    release the hold on the job specified by job_id
scontrol show reservation  show details on active or pending reservations
scontrol show nodes        show hardware details for compute nodes

For more information on scontrol, please visit https://slurm.schedmd.com/scontrol.html.

sinfo

The sinfo command is used for viewing compute node and partition status. By default, sinfo will report each partition's name, availability, time limit, and the count, state, and list of its nodes.

sinfo's output, as with most Slurm informational commands, can be customized in a large number of ways. Here are a few of the more useful options:

Command                       Result
sinfo --partition=t1standard  show node info for the partition named 't1standard'
sinfo --summarize             group by partition, aggregating node state as A/I/O/T (Available/Idle/Other/Total)
sinfo --reservation           show Slurm reservation information
sinfo --format=format_tokens  print fields specified by format_tokens
sinfo --Format=field_names    print fields specified by comma-separated field_names

There are a large number of fields hidden by default that can be displayed using --format and --Format. Refer to sinfo's manual page for the complete list of fields.

For more information on sinfo, please visit https://slurm.schedmd.com/sinfo.html.

smap

The smap command is an ncurses-based tool useful for viewing the status of jobs, nodes, and node reservations. It aggregates data exposed by other Slurm commands, such as sinfo and squeue.

Command Result
smap -i 15 run smap, refreshing every 15 seconds

For more information on smap, please visit https://slurm.schedmd.com/smap.html.

squeue

The squeue command is used for viewing job status. By default, squeue will report the ID, partition, job name, user, state, time elapsed, nodes requested, nodes held by running jobs, and reason for being in the queue for queued jobs.

squeue's output, as with most Slurm informational commands, can be customized in a large number of ways. Here are a few of the more useful options:

Command Result
squeue --user=user_list filter by a comma-separated list of usernames
squeue --start print expected start times of pending jobs
squeue --format=format_tokens print fields specified by format_tokens
squeue --Format=field_names print fields specified by comma-separated field_names

The majority of squeue's customization is done using --format or --Format. The lowercase --format allows for controlling which fields are present, their alignments, and other contextual details such as whitespace, but comes at the cost of readability and completeness (not all fields can be specified using the provided tokens). In contrast, the capitalized --Format accepts a complete set of verbose field names, but offers less flexibility with contextual details.

As an example, the following command produces output identical to squeue --start:

squeue --format="%.18i %.9P %.8j %.8u %.2t %.19S %.6D %20Y %R" --sort=S --states=PENDING

--Format can produce equivalent (but not identical) output:

squeue --Format=jobid,partition,name,username,state,starttime,numnodes,schednodes,reasonlist --sort=S --states=PENDING

For more information on squeue, please visit https://slurm.schedmd.com/squeue.html.

sreport

The sreport command is used for generating job and cluster usage reports. Statistics will be shown for jobs run since midnight of the current day by default. Although many of sreport's reports are more useful for cluster administrators, there are some commands that may be useful to users:

Command Result
sreport cluster AccountUtilizationByUser -t Hours start=2016-03-01 report hours used since Mar 1, 2016, grouped by account
sreport cluster UserUtilizationByAccount -t Hours start=2016-03-01 Users=$USER report hours used by the current user since Mar 1, 2016
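
These reports can also be bounded with an end date or restricted to specific accounts. For example (again, "myproject" is a placeholder account name):

sreport cluster AccountUtilizationByUser -t Hours start=2016-03-01 end=2016-04-01 Accounts=myproject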

For more information on sreport, please visit https://slurm.schedmd.com/sreport.html.

srun

The srun command is used to launch a parallel job step. Typically, srun is invoked from a Slurm batch script to perform part (or all) of the job's work. srun may be used multiple times in a batch script, allowing for multiple program runs to occur in one job.
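
For example, here is a minimal sketch of a batch script with two sequential job steps; ./preprocess and ./compute are hypothetical program names standing in for your own executables:

#!/bin/bash
#SBATCH --partition=debug
#SBATCH --ntasks=24

# Each srun call launches one job step; the second begins after the first finishes
srun ./preprocess
srun ./compute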

Alternatively, srun can be run directly from the command line on a login node, in which case srun will first create a resource allocation for running the job. Use command-line keyword arguments to specify the parameters normally used in batch scripts, such as --partition, --nodes, --ntasks, and others. For example, srun --partition=debug --nodes=1 --ntasks=8 whoami will obtain an allocation consisting of 8 cores on 1 node and then run the command whoami on all of them.

Please note that srun does not inherently parallelize programs - it simply runs many independent instances of the specified program in parallel across the nodes assigned to the job. Put another way, srun will launch a program in parallel, but makes no guarantee that the program is designed to be run in parallel at any degree.

See Interactive Jobs for an example of how to use srun to allocate and run an interactive job (i.e. a job whose input and output are attached to your terminal).

A note about MPI: srun is designed to run MPI applications without the need for mpirun or mpiexec, but this ability is currently not available on Chinook. It may be made available in the future. Until then, please refer to the directions on how to run MPI applications on Chinook below.

For more information on srun, please visit https://slurm.schedmd.com/srun.html.

sview

The sview command is a graphical interface useful for viewing the status of jobs, nodes, partitions, and node reservations. It aggregates data exposed by other Slurm commands, such as sinfo, squeue, and smap, and refreshes every few seconds.

For more information on sview, please visit https://slurm.schedmd.com/sview.html.

Batch Scripts

Batch scripts are plain-text files that specify a job to be run. They consist of batch scheduler (Slurm) directives that specify the resources requested for the job, followed by the shell commands that run the program.

Here is a simple example of a batch script that will be accepted by Slurm on Chinook:

#!/bin/bash
#SBATCH --partition=debug
#SBATCH --ntasks=24
#SBATCH --tasks-per-node=24

echo "Hello world"

On submitting the batch script to Slurm using sbatch, the job's ID is printed:

$ ls
hello.slurm
$ sbatch hello.slurm
Submitted batch job 8137

Among other things, Slurm stores what the current working directory was when sbatch was run. Upon job completion (nearly immediate for a trivial job like the one specified by hello.slurm), output is written to a file in that directory.

$ ls
hello.slurm  slurm-8137.out
$ cat slurm-8137.out
Hello world

Running an MPI Application

Here is what a batch script for an MPI application might look like:

#!/bin/sh

#SBATCH --partition=t1standard
#SBATCH --ntasks=<NUMTASKS>
#SBATCH --tasks-per-node=24
#SBATCH --mail-user=<USERNAME>@alaska.edu
#SBATCH --mail-type=BEGIN
#SBATCH --mail-type=END
#SBATCH --mail-type=FAIL
#SBATCH --output=<APPLICATION>.%j

ulimit -s unlimited
ulimit -l unlimited

# Load any desired modules, usually the same as loaded to compile
. /etc/profile.d/modules.sh
module purge
module load toolchain/pic-intel/2016b
module load slurm

cd $SLURM_SUBMIT_DIR
# Generate a list of allocated nodes; will serve as a machinefile for mpirun
srun -l /bin/hostname | sort -n | awk '{print $2}' > ./nodes.$SLURM_JOB_ID
# Launch the MPI application
mpirun -np $SLURM_NTASKS -machinefile ./nodes.$SLURM_JOB_ID ./<APPLICATION>
# Clean up the machinefile
rm ./nodes.$SLURM_JOB_ID
  • <APPLICATION>: The executable to run in parallel
  • <NUMTASKS>: The number of parallel tasks requested from Slurm
  • <USERNAME>: Your Chinook username (same as your UA username)

There are many environment variables that Slurm defines at runtime for jobs. Here are the ones used in the above script:

  • $SLURM_JOB_ID: The job's numeric id
  • $SLURM_NTASKS: The value supplied as <NUMTASKS>
  • $SLURM_SUBMIT_DIR: The current working directory when "sbatch" was invoked

Interactive Jobs

Command Line Interactive Jobs

Interactive jobs are possible on Chinook using srun:


chinook:~$ srun -p debug --ntasks=24 --exclusive --pty /bin/bash

The above command will reserve one node in the debug partition and launch an interactive shell job. The --pty option executes task zero in pseudo-terminal mode; it implicitly sets --unbuffered and redirects --error and --output to /dev/null for all tasks except task zero, which may cause those tasks to exit immediately.

Displaying X Windows from Interactive Jobs

A module named "sintr" is available to create an interactive job that forwards application windows from the first compute node back to the local display. This relies on X11 forwarding over SSH, so make sure to enable graphics when connecting to a Chinook login node. You will also need to generate an SSH key pair on a Chinook login node, which can be done by running the ssh-keygen -t rsa command:


chinook00 % ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/u1/uaf/<USERNAME>/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
...

The command will prompt you for the location to save the file, using /u1/uaf/<USERNAME>/.ssh/id_rsa as the default. The rsa key pair must be saved in that file. You will also be prompted for a passphrase for the key pair which will be used when connecting to a compute node with sintr. The contents of $HOME/.ssh/id_rsa.pub must then be added to $HOME/.ssh/authorized_keys. This can be done with the following command:

cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys

The sintr command accepts the same command line arguments as sbatch. To launch a single node interactive job in the debug partition, for example, follow these steps:


chinook:~$ module load sintr
chinook:~$ sintr -p debug -N 1
Waiting for JOBID #### to start.
...

The command will wait for a node to be assigned and the job to launch. As soon as that happens, the next prompt should be on the first allocated compute node, and the DISPLAY environment variable will be set to send X windows back across the SSH connection. It is now possible to load and execute a desired windowed application. Here is an example with TotalView.


bash-4.1$ module load totalview
bash-4.1$ totalview

After exiting an application, exit the session too. This will release the allocated node(s) and end the interactive job.


bash-4.1$ exit
exit
[screen is terminating]
chinook:~$

Third-Party Software

Installing Your Own Software

Individuals and research groups may install third party applications and libraries for their own use in the following locations:

  • $HOME
  • /usr/local/unsupported

Packages built for personal use only should be installed in $HOME.

The /usr/local/unsupported directory is intended to host user-installed and maintained software packages and datasets that are shared with a group of users on the system. Users who add content to /usr/local/unsupported are fully responsible for the maintenance of the files and software versions. Please read the /usr/local/unsupported/README.RCS file for more information.

To request a new subdirectory within /usr/local/unsupported, please contact RCS with the following information:

  • The name of the requested subdirectory, which can be your project's name (e.g., UAFCLMIT) or the type of software you intend to install in the directory (e.g., "ClimateModels")
  • A general description of what you intend to install
  • A rough estimate of the amount of storage you will need (e.g., 100 MB)

Using The Software Stack

Chinook already has builds of many third-party software packages (see below for a listing). There are often multiple builds of a particular software package - different versions, different compilers used to build the software, different compile-time flags, et cetera. To avoid conflicts between these many disparate package builds, Chinook employs an environment module system you can use to load and unload different combinations of software packages into your environment.

What are Environment Modules?

The environment modules found on Chinook (often referred to simply as "modules") are Tcl script files that are used to update shell environment variables such as PATH, MANPATH, and LD_LIBRARY_PATH. These variables allow your shell to discover the particular application or library as specified by the module. Some environment modules set additional variables (such as PYTHONPATH or PERL5LIB), while others simply load a suite of other modules.

Common module commands

Command Result
module avail list all available modules
module avail pkg list all available modules beginning with the string pkg
module load pkg load a module named pkg
module swap old new attempt to replace loaded module named old with one named new
module unload pkg unload a module named pkg
module list list all currently-loaded modules
module purge unload all modules
module show pkg summarize environment changes made by module named pkg (sometimes incomplete)
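
For example, a typical sequence at the start of an interactive session or batch script might be:

module purge
module load toolchain/pic-intel/2016b
module list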

Searching for modules

Because module avail will search for the provided string only at the beginning of a module's fully-qualified name, it can be difficult to use module avail to search for modules nested in any kind of hierarchy. This is the case on Chinook - modules are categorized, then named. Here are some examples:

  • compiler/GCC/version
  • devel/CMake/version
  • math/GMP/version

To find modules for GCC using a pure module avail command, you would need to run module avail compiler/GCC. This is difficult, because you must already know that the module is in the compiler category.

To make things more complicated, module avail is also case-sensitive. Running module avail devel/cmake will not find the module named devel/CMake/version.

Better module searching

One workaround for these impediments is to combine module avail output with grep's full-text case-insensitive string matching ability. The example below additionally uses Bash file descriptor redirection syntax to redirect stderr to stdout because module avail outputs to stderr.

module avail --terse 2>&1 | grep -i pkg

replacing pkg with the string you are searching for.
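
For example, the following locates the CMake module regardless of category or capitalization, which a plain module avail cmake would not find:

module avail --terse 2>&1 | grep -i cmake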

RCS is currently evaluating Lmod as a replacement for Chinook's current environment modules framework. Lmod has many desirable features, including but not limited to a more user-friendly module avail behavior.

For more information on Chinook's module framework, please visit http://modules.sourceforge.net/index.html.


Compiler Toolchains

Compiler toolchains are modules that bundle together a set of compiler, MPI, and numerical library modules. To use a compiler toolchain, load the compiler toolchain module and all the submodules will be loaded. This will set variables such as PATH, CPATH, LIBRARY_PATH, LD_LIBRARY_PATH, and others. Other variable conventions such as CC and CXX are not automatically defined.

Since Chinook is an Intel-based HPC cluster, RCS defaults to compiling software using Intel-based compiler toolchains.

Toolchain Name Version Comprises
foss 2016b GNU Compiler Collection 5.4.0, Penguin-modified OpenMPI 1.10.2, OpenBLAS 0.2.18, FFTW 3.3.4, ScaLAPACK 2.0.2
pic-intel 2016b Intel Compiler Collection 2016.3.210 (2016 update 3), Penguin-modified OpenMPI 1.10.6, Intel Math Kernel Library (MKL) 11.3.3.210
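
For example, to compile an MPI program with the Intel-based toolchain (a sketch assuming the toolchain places OpenMPI's compiler wrappers, such as mpicc, on your PATH; hello_mpi.c is a placeholder source file):

module load toolchain/pic-intel/2016b
mpicc -O2 -o hello_mpi hello_mpi.c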

MPI Libraries

RCS defaults to compiling software against OpenMPI.

Name Version Compiled by (Intel / GCC) Notes
MPICH2 1.5
MVAPICH2 2.1
MVAPICH2-PSM 2.1
OpenMPI 1.10.2 Included in pic-intel, pic-foss compiler toolchains
OpenMPI 1.6.5
OpenMPI 1.7.5
OpenMPI 1.8.8

Maintained Software Installations

As of 2016-10-19.

Name Version Compiled by (Intel / GCC) Notes
Autoconf 2.69
Automake 1.15
Autotools 20150215
BamTools 2.4.0
BayeScan 2.1
BCFtools 1.3.1
binutils 2.26 Included in pic-foss compiler toolchain
Bison 3.0.4
Boost 1.61.0
BWA 0.7.15
bzip2 1.0.6
cairo 1.14.6
CMake 3.5.2
cURL 7.49.1
Doxygen 1.8.11
ESMF 7.0.0
expat 2.2.0
FASTX-Toolkit 0.0.14
FFTW 3.3.4 Included in pic-foss compiler toolchain
flex 2.6.0
fontconfig 2.12.1
freetype 2.6.5
g2clib 1.4.0
g2lib 1.4.0
GCC 5.4.0 Included in pic-foss compiler toolchain
GDAL 2.1.0
gettext 0.19.8
GLib 2.49.5
GMP 6.1.1
GSL 2.1
gzip 1.6
HDF 4.2.11
HDF5 1.8.17
HTSlib 1.3.1
icc 2016.3.210 Included in pic-intel compiler toolchain
idl 8.4.1
ifort 2016.3.210 Included in pic-intel compiler toolchain
imkl 11.3.3.210 Included in pic-intel compiler toolchain
JasPer 1.900.1
libffi 3.2.1
libgtextutils 0.7
libjpeg-turbo 1.5.0
libpng 1.6.24
libreadline 6.3
libtool 2.4.6
libxml2 2.9.4
M4 1.4.17
makedepend 1.0.5
MATLAB R2015a
MATLAB R2015b
MATLAB R2016a
Mothur 1.38.1.1
NASM 2.12.02
NCL 6.3.0 Binary distribution
ncurses 6.0
netCDF 4.4.1
netCDF-C++4 4.3.0
netCDF-Fortran 4.4.4
OpenBLAS 0.2.18 Included in pic-foss compiler toolchain
PCRE 8.39
Perl 5.22.1
pixman 0.34.0
pkg-config 0.29.1
Python 2.7.12
SAMtools 1.3.1
ScaLAPACK 2.0.2 Included in pic-foss compiler toolchain
Singularity 2.2
SQLite 3.13.0
Szip 2.1
Tcl 8.6.5
Tk 8.6.5
UDUNITS 2.2.20
VCFtools 0.1.14
X11 20160819
XZ 5.2.2
zlib 1.2.8

Software requests

RCS evaluates third-party software installation requests for widely-used HPC software on a case-by-case basis. Some factors that affect request eligibility are:

  • Applicability to multiple research groups
  • Complexity of the installation process
  • Software licensing

If a third-party software installation request is found to be a viable candidate for installation, RCS may elect to install the software through one of several means:

  • RPM
  • Binary (pre-built) distribution
  • Source build

If an application or library is available through standard RPM repositories (Penguin Computing, CentOS, EPEL, ...) then the RPM may be installed. Users should test the installed software to determine if it meets requirements. If the RPM version does not meet needs, please contact RCS to have alternate installation methods evaluated.
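
For example, one quick way to check whether a package is installed system-wide as an RPM, and which version is present (netcdf here is just an illustration):

rpm -qa | grep -i netcdf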

Software that is not installed as an RPM will be installed in a publicly-available location and be accessible via Linux environment modules. If the software is built from source, then RCS will default to using the Intel compiler suite.

MATLAB Usage

Policy

Due to the shared nature of the Chinook login nodes, RCS requests that MATLAB be run on the RCS Linux workstations for data processing and on the Chinook compute nodes for computationally intensive tasks. Please see the Remote Login page for more information on running MATLAB on the workstations and the MATLAB Parallel Computing section for information on running MATLAB on the compute nodes.

MATLAB sessions launched on the Chinook login nodes may be killed at RCS's discretion, since such jobs affect all users on the login nodes.

MATLAB Parallel Computing

MATLAB jobs can execute tasks in parallel either using 24 cores on a single node, which requires no additional setup, or using cores across multiple nodes, which may require some settings to be configured. Jobs on a single node can be submitted through Slurm, while jobs that use multiple nodes must be submitted through a MATLAB session.

Parallel Jobs

Parallel jobs in MATLAB run on a "parallel pool", which is a collection of MATLAB workers that run tasks on the Chinook cluster. Some useful commands and things to keep in mind are the following:

  • MATLAB workers do not have access to any graphical capabilities so data cannot be plotted in parallel
  • Parallel jobs make use of the the parfor, parfeval, and/or the spmd MATLAB directives
  • parfor is used for loops that are not interdependent, and each iteration of the loop will be run on a separate core
  • pareval is used for functions that can be run asynchronously
  • spmd stands for Single Program Multiple Data, and is for functions that operate on different data, but use the same code. This can spread each instance to its own worker, doing work in parallel. This is most often accomplished with the use of the labindex, the ID of an individual worker

For more information, please see the MATLAB Parallel Computing documentation.

Single Node

Jobs that use only a single node (up to 23 workers or threads in the debug or small queue) may be submitted directly to the Slurm batch scheduler. When submitting a job directly to Slurm, the parpool profile must be set to 'local' to use the CPUs available on the compute node. Do not use the 'local' profile when submitting jobs through MATLAB or on the login nodes or workstations, as this will use as many cores as specified and will affect all users on the system.

Submitting a MATLAB script to be run on the compute nodes requires using the batch submission system. Your MATLAB script must first set up a parpool, which launches the set of workers used for a parallel process. For example, the following MATLAB script creates a parpool to generate a set of random numbers:

%============================================================================
% parallelTest.m - Parallel Random Number - A Trivial Example
%============================================================================
% Read the "ntasks" value provided by the Slurm batch submission script
parpool('local', str2num(getenv('SLURM_NTASKS')))
N = 1e7;      % renamed from "size", which would shadow MATLAB's size function
endMatrix = [];
tic % start timing
parfor i = 1:N
    % Work that is independent of each task in the loop goes here
    endMatrix = [endMatrix; rand];
end
timing = toc; % end timing
fprintf('Timing: %8.2f seconds.\n', timing);
delete(gcp); % clean up the parallel pool of workers
exit;

To submit this to the Chinook compute nodes a batch script must be created:

#!/bin/bash
#SBATCH --partition=$PARTITION
#SBATCH --ntasks=24
#SBATCH --tasks-per-node=24
#SBATCH --time=D-HH:MM:SS
#SBATCH --job-name=$JOBNAME
module purge
module load slurm
module load matlab/R2016b
cd $SLURM_SUBMIT_DIR
matlab -nosplash -nodesktop -r "run('/path/to/parallelTest.m')"

If this file were named parallelMatlab.slurm, you would then submit it with:
chinook00 % sbatch parallelMatlab.slurm

Multiple Nodes

MATLAB jobs that use multiple nodes must be run in an interactive session. Some settings may need to be configured to allow MATLAB to submit jobs interactively to the Slurm batch scheduler.

Add the Slurm folder to MATLAB's PATH

MATLAB GUI

  • Click the Set Path button in the toolbar
  • Check if /import/usrlocal/pkg/matlab/matlab-R2016b/toolbox/local/slurm is in your path
  • If not
    • Click Add Folder
    • Type in or copy /usr/local/pkg/matlab/matlab-R2016b/toolbox/local/slurm to the Folder name field
    • Click Save, save pathdef.m to $HOME/.matlab/R2016b/pathdef.m

MATLAB Command Line

  • Run the path function
  • Check to see if /import/usrlocal/pkg/matlab/matlab-R2016b/toolbox/local/slurm is in your path
  • If not run addpath('/usr/local/pkg/matlab/matlab-R2016b/toolbox/local/slurm');

Import the MATLAB Parallel Profile

To use the MATLAB Distributed Computing Server (DCS), you will need to import the Parallel Profile for Chinook. This can be done through the MATLAB GUI or the command line.

MATLAB GUI

  • Click the Parallel drop down menu in the toolbar
  • Click Manage Cluster Profiles
  • Click Import
  • Navigate to /usr/local/pkg/matlab/slurm_scripts/
  • Select chinook.settings and click Open

You should now have debug, t1small, t1standard, t2small, and t2standard in your Cluster Profile Window.

Command Line

>> clusterProfile = parallel.importProfile('/usr/local/pkg/matlab/slurm_scripts/chinook');
>> clusterProfile

You should see a 1x5 cell array containing 'debug', 't1small', 't1standard', 't2small', and 't2standard'.

NumWorkers and NumThreads

Two other variables need to be set before running a job on the MATLAB DCS: NumWorkers and NumThreads.
NumWorkers is the number of MATLAB Workers available to the job. By default each Worker is assigned to a core on a node. Our license allows 64 Workers to be used across all users. You will need one more Worker than the number your job requires, because one Worker is dedicated to managing the others. For a two-node job you can use up to 47 Workers for computations; for a three-node job, 63 Workers.

Multiple threads can be assigned to a Worker if it benefits from multithreading. NumWorkers * NumThreads should be less than the total number of cores available; otherwise the job will use every core and may behave unexpectedly. There are two ways to set NumWorkers and NumThreads:

MATLAB GUI
  • Click on the Parallel button in the toolbar
  • Click on Manage Cluster Profiles
  • Click on the Profile you wish to edit
  • Click on Edit in the bottom right corner of the window
  • In Number of workers available to cluster NumWorkers enter the number of workers (must be less than 64)
  • In Number of computation threads to use on each worker NumThreads enter the number of threads
Command Line
myCluster = parcluster('$PARTITION');
myCluster.NumWorkers = $NumWorkers; % must be less than 64
myCluster.NumThreads = $NumThreads;

Running Jobs

Parallel jobs can be run during an interactive MATLAB session by first setting up a parallel pool on a partition. The pool will only start if there are nodes available. A parallel pool can be started with the following commands in MATLAB:
myCluster = parcluster('$PROFILE');
parpool(myCluster);
where $PROFILE is the name of one of the queues on Chinook (debug, t1/t2standard, t1/t2small). This starts a parallel pool in which work can be done. Scripts or functions run in the interactive MATLAB session that use parfor, parfeval, or spmd will execute that code in the parallel pool you have open.

When you are done with the parallel pool you will have to close it using the following command in MATLAB:
delete(gcp);

Submitting Jobs

You can also submit jobs to the queue using MATLAB. The profiles that can be submitted to in this way are debug, t1small, t1standard, t2small, and t2standard, where each profile corresponds to a partition available on Chinook. For more information about the partitions on Chinook please see our partition overview.

To submit the job you will need to call the MATLAB batch function. The example below shows how to submit a function to be run on the compute nodes:

% Leaving this option blank will use the chosen default profile
myCluster = parcluster('$PROFILE');

% Run the function and pass its arguments. 'pool' sets the number of workers.
% $NUMWORKERS must be at most one less than the total number of workers
% allocated in the profile, because MATLAB uses one additional worker to
% manage all the others.
myJob = myCluster.batch(@$FUNCTION, NumberOfOutputs, {input1, input2, ..., inputN}, 'pool', $NUMWORKERS);

% Wait for the job to finish before doing anything else
wait(myJob);

% If submitted interactively, do work on the output matrices here

% Clean up the job and the parallel pool
delete(myJob);
delete(gcp);

where $PROFILE is the partition you wish to use, $FUNCTION is the function, in a .m file, that you wish to run, and $NUMWORKERS is the number of Workers you wish to use.

Customizing sbatch Parameters for MATLAB Job Submission

For multi-node jobs, the CommunicatingSubmitFunction may need to be modified, for example to shorten the walltime of a job. The steps to do this are the following:

  • Create a location for your custom MATLAB CommunicatingSubmitFunction, for example: mkdir ~/matlab_scripts
  • Copy /usr/local/pkg/matlab/slurm_scripts/communicatingSubmitFcn.m to ~/matlab_scripts/customCommunicatingSubmitFcn.m
  • For clarity, give your personal copy a name other than communicatingSubmitFcn.m to ensure your profile calls the correct function.

  • Modify the name of the function in customCommunicatingSubmitFcn.m as well
  • Modify the following line in your customCommunicatingSubmitFcn.m:
    additionalSubmitArgs = sprintf('--partition=t1standard --ntasks=%d', props.NumberOfTasks);

    and add in --time=D-HH:MM:SS to the section in sprintf like so:

    additionalSubmitArgs = sprintf('--partition=t1standard --time=1-12:00:00 --ntasks=%d', props.NumberOfTasks);
  • After you've created your parcluster, run the following command: set(myCluster, 'CommunicatingSubmitFcn', @customCommunicatingSubmitFcn); then run your job as normal

Compiling from Source Code

Compiling C, C++, and Fortran code on Chinook is broadly similar to compiling on other CentOS systems, with some important differences. This page outlines both, focusing mostly on the differences.

Tools

Available Compilers

The default compiler suite on Chinook is the Intel Parallel Studio XE Composer Edition, providing icc, icpc, and ifort. Intel compilers are designed for best performance on Intel processors, which can be taken advantage of on Chinook.

The GNU Compiler Collection is also available and maintained on Chinook, providing gcc, g++, and gfortran. GNU compiler compatibility is ubiquitous across free and open-source software projects, which includes much scientific software.

For documentation on each of these compiler suites, please refer to the following:

Open-source Linear Algebra / FFT

The following free and open-source linear algebra and fast Fourier transform libraries have been built using GCC and are available for use:

  • OpenBLAS (includes LAPACK)
  • ScaLAPACK
  • FFTW

Intel MKL

The Intel Math Kernel Library (MKL) is available for use on Chinook. MKL offers Intel-tuned versions of all of the above open-source libraries, effectively replacing them.

For more information on linking against MKL, see Intel's MKL Linking Quick Start. Of particular note is the online MKL Link Line Advisor, which will generate appropriate link flag strings for your needs.

For more information on MKL itself, see Intel's MKL documentation.
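
For example, with the Intel compilers a quick way to link against MKL is the -mkl compiler flag (a sketch; myprog.c is a placeholder source file, and for specific library combinations the Link Line Advisor is the authoritative source):

module load toolchain/pic-intel/2016b
icc -O2 -mkl myprog.c -o myprog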

System Architecture

Physical Architecture

Chinook currently has four login nodes:

  • chinook00.alaska.edu
  • chinook01.alaska.edu
  • chinook02.alaska.edu
  • chinook03.alaska.edu

chinook.alaska.edu will point to one of the above login nodes.

All Chinook compute nodes are Penguin Computing Relion 1900 servers with either dual Intel Xeon E5-2685 v3 12-core processors (24 cores per node) or dual Intel Xeon E5-2690 v4 14-core processors (28 cores per node), and 128 GB of RAM.

Software Architecture

Chinook currently runs the CentOS 6.9 operating system (Linux kernel version 2.6).

Recent versions of the Intel and GNU compiler collections, several different MPI implementations, and core math libraries are available on Chinook. For more details, please refer to the list of third-party software maintained on Chinook.

Community Condo Model

Chinook is a community, condo model high performance computing (HPC) cluster. Common infrastructure elements such as the environmentally regulated data center, network connectivity, equipment racks, management and technical staff, and a small pool of community CPUs provide subsidized resources to PIs that they might not be able to procure individually, allowing them to focus time and energy on research rather than on owning and operating individual clusters.

Participants in the condo service share unused portions of the computational resources they add to Chinook with each other and with non-invested users - such as students or occasional users - who may or may not pay a fee for access. A queue management system gives vested PIs top priority on the shares they have purchased whenever they need the resource. RCS also reserves the option to use manual or automated job preemption to interrupt community user jobs as needed to give vested PIs access to their shares.

Tier 1: Community Nodes

This level of service is open to the UA research community using nodes procured for the community and unused portions of shareholder nodes. Users in this tier receive:

  • Unlimited total compute node CPU hours
  • Lower initial job priority
  • 10 GB $HOME quota
  • 1 TB per project shared Lustre storage quota ($CENTER1, unpurged)
  • Access to job queues with limited wall time
  • Standard user support (account questions, software requests, short job debugging and diagnosis assistance, ...)

Tier 2: Shareholder Shared Nodes

This level of service is for the PI or project that requires CPUs beyond what can be offered by Tier 1 or requires priority access to HPC resources. Users in this tier are shareholders that procure equipment or support and receive:

  • Unlimited total compute node CPU hours
  • Higher initial job priority, weighted by number of shares procured
  • 10 GB $HOME quota
  • 1 TB + 1 TB/node project shared Lustre storage quota ($CENTER1, unpurged)
  • Access to job queues with limited wall times greater than Tier 1
  • Short term reservations (contact uaf-rcs@alaska.edu to arrange)
  • Higher priority given to user support requests

Tier 3: Shareholder Dedicated Nodes

This level of service is for the PI or project that requires dedicated resources. RCS will manage and operate procured nodes for an additional service fee. Users interested in this level of service should contact RCS.

  • CPU access limited to procured nodes and infrastructure components
  • Limited Lustre (equal to Tier 1 unless additional capacity procured by PI/project) + DataDir storage (pending)
  • No priority or preemption rights to Tier 1 or Tier 2
  • Dedicated queue(s) with unlimited wall times

Purchasing Chinook Shares

Please contact RCS (uaf-rcs@alaska.edu) if you are interested in purchasing compute nodes or support and becoming a tier 2 or 3 shareholder. All node types include licenses for Scyld ClusterWare, Mellanox UFM, and Linux with a minimum 3-year service contract. The purchase price provides shares in Chinook that align with the warranty of the equipment purchased. When shares expire, the resources must be upgraded or the warranty must be renewed, otherwise the resources revert to the community pool and the project will be given a Tier 1 status.

For pricing, see the RCS Rates page.

Lifecycle Management

All compute nodes include factory support for the duration of the warranty period. During this time any hardware problems will be corrected as soon as possible. After the warranty expires, compute nodes will be supported on a best-effort basis until they suffer complete failure, are replaced, or reach a service age of 5 years. Once a node has reached end-of-life due to failure or obsolescence, it will be removed from service.