
Parallel programming

450X

Stanford University

Department of Political Science


Toby Nowacki


Overview

  1. Measuring speed
  2. Parallel programming (locally)
  3. Using Farmshare: basics
  4. Using Farmshare: shell scripts

Measuring execution speed

  • Useful for benchmarking and troubleshooting
  • Sys.time() returns the current system time
  • system.time({ expr }) wraps an expression and reports user, system, and elapsed time
test_function <- function(x){ Sys.sleep(x)}
start <- Sys.time()
test_function(10)
end <- Sys.time()
end - start
## Time difference of 10.00359 secs
system.time({test_function(10)})
## user system elapsed
## 0.017 0.001 10.022
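A single timing can be noisy. As a minimal sketch (the helper name time_runs is hypothetical and not part of the original slides), one could repeat the measurement and summarise the elapsed times. The second half shows the manual Sys.time() approach with the difference converted to plain seconds:

```r
# Hypothetical helper (not from the slides): call f() several times
# and summarise the elapsed wall-clock time of each call.
time_runs <- function(f, times = 5) {
  elapsed <- vapply(
    seq_len(times),
    function(i) system.time(f())[["elapsed"]],
    numeric(1)
  )
  c(min = min(elapsed), median = median(elapsed), max = max(elapsed))
}

# A short sleep keeps the sketch quick to run
timings <- time_runs(function() Sys.sleep(0.1), times = 3)

# Manual timing: subtract start from end, then convert to seconds
start <- Sys.time()
Sys.sleep(0.1)
end <- Sys.time()
manual_secs <- as.numeric(end - start, units = "secs")
```

Reporting the median rather than one run makes comparisons between sequential and parallel code less sensitive to background load.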

Parallel processing

(Pictures)


Parallel processing

  • In R, use the doParallel package (it builds on foreach and parallel).
  • makeCluster(n) starts a cluster of n worker processes
  • registerDoParallel(cluster) registers that cluster as the backend for foreach
  • %dopar% is the foreach operator that runs loop iterations on the registered workers
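The steps above can be put together in a small end-to-end sketch. It assumes doParallel is installed; the cap of two workers and the fallback when detectCores() returns NA are illustrative choices, not something the slides prescribe:

```r
library(doParallel)  # also attaches foreach, iterators and parallel

# detectCores() can return NA on some platforms, so fall back to 2
n_cores <- parallel::detectCores()
if (is.na(n_cores)) n_cores <- 2L

# Leave one core free and cap at two workers for this small sketch
n_workers <- min(2L, max(1L, n_cores - 1L))

cl <- makeCluster(n_workers)   # start the worker processes
registerDoParallel(cl)         # make them the foreach backend

# Each iteration runs on a worker; .combine = c collects a vector
squares <- foreach(i = 1:4, .combine = c) %dopar% {
  i^2
}

stopCluster(cl)                # always release the workers
```

Calling stopCluster() when finished matters on shared machines, since idle workers otherwise keep holding memory.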

A simple example

  • Setting up the example on one core
library(tidyverse)  # provides %>% and map()
# Mean of a sample of size n drawn with replacement from x
get_sample_dist <- function(n, x){
  sample(x, size = n, replace = TRUE) %>% mean
}
rand_seq <- runif(100000, min = 1, max = 1000)
n <- 1000
rep_size <- 10:100
one_core <- system.time({
  map(rep_size, ~ replicate(.x, get_sample_dist(n, rand_seq)))
})

A simple example

  • Setting up the same example with two parallel workers
library(doParallel)
cl <- makeCluster(2)
registerDoParallel(cl)
# .packages loads tidyverse on each worker so that %>% is available there
two_cores <- system.time({
  foreach(i = rep_size,
          .packages = "tidyverse",
          .combine = c) %dopar% {
    replicate(i, get_sample_dist(n, rand_seq))
  }
})
stopCluster(cl)

A simple example

one_core
## user system elapsed
## 0.541 0.016 0.558
two_cores
## user system elapsed
## 0.061 0.006 1.201
six_cores
## user system elapsed
## 0.052 0.006 1.479
  • Here the parallel runs are slower (elapsed 1.201 and 1.479 vs 0.558 seconds): for a task this small, the overhead of loading packages on the workers and copying data to them likely outweighs the gain.

Bootstrap example

  • Bootstrapping example (taken from doParallel vignette)
  • Parallel processing
cl <- makeCluster(6)
registerDoParallel(cl)
# Two-species subset of iris: sepal length and species
x <- iris[which(iris[, 5] != "setosa"), c(1,5)]
trials <- 10000
ptime <- system.time({
  r <- foreach(icount(trials),
               .combine=cbind) %dopar% {
    ind <- sample(100, 100, replace=TRUE)
    result1 <- glm(x[ind,2]~x[ind,1], family=binomial(logit))
    coefficients(result1)
  }
})[3]
stopCluster(cl)  # release the workers once the timing is done
ptime
## elapsed
## 5.582

Bootstrap example

  • Sequential processing
x <- iris[which(iris[, 5] != "setosa"), c(1,5)]
trials <- 10000
ptime <- system.time({
  r <- foreach(icount(trials),
               .combine=cbind) %do% {
    ind <- sample(100, 100, replace=TRUE)
    result1 <- glm(x[ind,2]~x[ind,1], family=binomial(logit))
    coefficients(result1)
  }
})[3]
ptime
## elapsed
## 17.398
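To put the two bootstrap timings side by side, here is a short worked calculation using only the elapsed times reported on these slides (5.582 seconds with six workers, 17.398 seconds sequentially). The variable names are illustrative:

```r
# Elapsed times reported on the slides, in seconds
sequential_elapsed <- 17.398
parallel_elapsed   <- 5.582
n_workers          <- 6

# Speedup: how many times faster the parallel run finished
speedup <- sequential_elapsed / parallel_elapsed

# Efficiency: speedup per worker (1 would be perfect scaling)
efficiency <- speedup / n_workers

round(speedup, 2)     # 3.12
round(efficiency, 2)  # 0.52
```

Six workers give roughly a threefold speedup here rather than a sixfold one, which is typical once communication and scheduling overhead are counted.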

Farmshare: logging in

  • Sherlock for high-performance computing, Farmshare for coursework
  • Log in from a terminal via ssh:
ssh username@rice.stanford.edu
  • Change directories with the cd command
  • List the contents of the current directory with ls

  • The same commands can be used to navigate on your local machine!


Farmshare: running R

  • (Sherlock only:) to run R on the server, first load its module:
ml R
  • (Farmshare only:) no need to load module, R works if you call it from command line.

Farmshare

  • To run a whole script remotely, it is best to define and submit a batch job:
#!/bin/sh
#SBATCH --time=20:00:00
#SBATCH --mem=20000
#SBATCH --cpus-per-task=4
#SBATCH --job-name="test_iterations"
#SBATCH --error=TestJob.%J.stderr
#SBATCH --output=TestJob.%J.out
#SBATCH --mail-user=toby.nowacki@gmail.com
#SBATCH --mail-type=ALL
# Recent Slurm releases call this option --chdir; older ones used --workdir
#SBATCH --chdir=/home/tnowacki/strategic_voting
Rscript code/iterations_v4.R
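Inside the R script, the number of workers should match what the batch job requested. A minimal sketch, assuming Slurm exports the SLURM_CPUS_PER_TASK environment variable when --cpus-per-task is set (the helper name slurm_workers is hypothetical):

```r
# Hypothetical helper (not from the slides): read the CPU count that
# Slurm granted to this task, falling back to a default elsewhere.
slurm_workers <- function(default = 1L) {
  raw <- Sys.getenv("SLURM_CPUS_PER_TASK", unset = NA)
  n <- suppressWarnings(as.integer(raw))
  if (is.na(n) || n < 1L) default else n
}

# Outside a Slurm job the variable is normally unset,
# so this returns the default
n_workers <- slurm_workers(default = 1L)
```

The result can then be passed to makeCluster(n_workers), so the same script runs unchanged on a laptop and in the batch job above with its four CPUs.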

Farmshare

  • The file needs to be submitted to the job manager, Slurm, as follows:
sbatch path_to_file/file.sh
  • Options can also be set on the command line; the help text lists all of them:
sbatch -p bigmem path_to_file/file.sh
sbatch --help
  • Check on status of queued jobs as follows:
squeue -u username
  • Cancel a job by its ID (shown by squeue) as follows:
scancel jobid