Wednesday, August 10, 2011

Software - MATLAB: Running MATLAB in HPCC

http://www.eng.auburn.edu/ens/hpcc/software_matlab.html

Running MATLAB in HPCC:

Type "matlab &" at the shell prompt and the MATLAB window will open.

  • At this point, any program you run executes on the head node, which is not desired
  • You must configure the Parallel Computing Toolbox in order to run your job on multiple processors and nodes
  • Your code must also be parallel (for example, use parfor, the parallel for loop) to take advantage of multiple processors
  • Running sequential code on multiple processors does not make the computation any faster; it is identical to running the job on a single processor

Configuring Parallel Computing Toolbox:

  • Click Parallel, then Manage Configuration

[Screenshot: parallel1]

  • Select File > New > Torque

[Screenshot: parallel2]

  • In SecureCRT, type "pwd" and press Enter. It prints your current directory, which is your home directory if you have just logged in

[Screenshot: parallel3]

  • Fill in the fields on the Scheduler tab as follows:
  • Configuration Name: Torque
  • Root Directory: /export/apps/MATLAB
  • Number of workers: 4
  • Your home directory: the path that pwd printed, something like /home/me_h3/au_user_id
  • Resource list parameter: -l nodes=1:ppn=16,walltime=50:00:00 (this requests 16 processors on 1 of the 4 compute nodes for 50 hours)
  • Additional Command: leave empty

[Screenshot: parallel4]

  • In the Jobs tab, set the maximum number of workers to 16 and the minimum number of workers to 1
  • The maximum number of workers is the number of nodes multiplied by the number of processors per node (ppn) that you requested. With nodes=1:ppn=16 the maximum is 16; had you requested nodes=2:ppn=4, it would be 8
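The worker-count rule above can be checked from the command line. The following is only a minimal sketch: max_workers is a hypothetical helper name, not part of Torque or MATLAB, and it assumes the resource list uses the nodes=N:ppn=M form shown on the Scheduler tab.

```shell
#!/bin/bash
# Hypothetical helper (not a Torque or MATLAB command): compute the worker
# count implied by a resource list such as "nodes=2:ppn=4,walltime=50:00:00".
max_workers() {
    local spec="$1" nodes ppn
    # Extract the integers that follow "nodes=" and "ppn=".
    nodes=$(printf '%s\n' "$spec" | sed -n 's/.*nodes=\([0-9][0-9]*\).*/\1/p')
    ppn=$(printf '%s\n' "$spec" | sed -n 's/.*ppn=\([0-9][0-9]*\).*/\1/p')
    # Refuse to guess when either field is missing.
    if [ -z "$nodes" ] || [ -z "$ppn" ]; then
        echo "could not parse: $spec" >&2
        return 1
    fi
    # Maximum workers = nodes * processors per node.
    echo $(( nodes * ppn ))
}

max_workers "nodes=1:ppn=16,walltime=50:00:00"   # prints 16
max_workers "nodes=2:ppn=4"                      # prints 8
```

The value printed is the number to enter as the maximum number of workers in the Jobs tab.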

[Screenshot: parallel5]

  • Leave the Task tab unchanged and click OK
  • Select the Torque radio button and click Start Validation

[Screenshot: parallel6]

[Screenshot: parallel7]

  • Once all the stages have validated successfully, close the window and make sure that your default configuration is Torque

[Screenshot: parallel8]

Important Notes

  • You are now ready to go, and Torque will be your default configuration
  • You only need to pass this validation once
  • From now on, every time you run your parallel code it will use this configuration (1 node and 16 processors)
  • If you want a different number of processors and workers, you need to validate again with the desired values
  • It is recommended that you start with 4-10 processors; if you are not satisfied, try 10-15 processors, and so on, and see whether performance improves or degrades
  • For a long simulation that takes 1 or 2 days, it is better to ask for 32-50 processors
  • For a small program that takes 1-2 hours, it is recommended to use 4-8 processors, because allocating many processors takes time and communication between processors adds overhead
  • Your parallel code is split into several pieces that are sent to different workers (one worker per processor). When a worker completes its task it sends the result back to the head node, where the results are reassembled for display, and all of this takes time
  • Tests have shown that running a program on multiple nodes takes more time than running it on a single node (nodes=1:ppn=16 is faster than nodes=2:ppn=8, although both use 16 processors)
  • A job that spans multiple nodes is slower because allocating the additional nodes takes time and communication between nodes adds overhead
  • How fast a program runs also depends heavily on how it is coded. For example, if a parfor loop assigns values to an array but has a data dependency on another, sequential loop, it will not run faster

For details please visit:

http://www.mathworks.com/products/parallel-computing/

http://www.mathworks.com/access/helpdesk/help/toolbox/distcomp/

http://www.mathworks.com/access/helpdesk/help/pdf_doc/distcomp/distcomp.pdf

http://static.msi.umn.edu/tutorial/scicomp/general/intro_parallel_prog/content.html

https://computing.llnl.gov/tutorials/parallel_comp/

  • Finally, running SEQUENTIAL code on multiple nodes and processors will not help in any way. Even if you asked for, say, 1 node and 16 processors, it will simply run on a single high-speed processor with plenty of memory. You will still get your result (perhaps one that you could not compute on your local machine)
  • If you wish to run a sequential program on the cluster, it is recommended to ask for nodes=1:ppn=1

Matlab Job Submission:

Once you have validated your Parallel Computing Toolbox configuration, you can submit a job in two different ways:

  • Graphical user interface (GUI) mode: invoke MATLAB with "matlab &" and run a parallel job the way you usually would on your local machine. You need to keep your computer and MATLAB open until the job is done, and leaving your machine running for 1-2 days is not a good idea!
  • Batch mode: submit your job using a shell script file and check the result later, perhaps after 1-2 days (meanwhile you can terminate your SecureCRT connection, go to sleep, or turn off your machine!). Batch mode is highly recommended if your simulation takes 1-2 days

The next sections go through the details of submitting jobs in each mode.

GUI mode:

  • The code in the screenshot below calls matlabpool open at the beginning to open 16 labs. The pool must be closed when the program is done, so append matlabpool close at the end of your code
  • datestr(now) prints a timestamp, which is used to measure how long the simulation took
  • Notice that the inner loop is a parfor (parallel for) loop: for every sequential iteration (j = 1 to 12) there are 16 parallel iterations (i = 1 to 16)
  • print -djpeg aplotted saves your plot as a JPEG file named aplotted

[Screenshot: gui1]

  • The job was assigned to 16 labs

[Screenshot: gui2]

  • Type "showq" in the SecureCRT terminal and you will see that your job is running on 16 processors

[Screenshot: gui3]

Batch Mode:

  • You need a script file like this one

mat.sh

[Screenshot: batch1]

  • Notice that at least 1 processor is required to submit the job, so in this case 17 processors are used in total (the 16 set in your Parallel Computing Toolbox configuration + 1 to submit)
  • Walltime 50:00:00 means your job will be killed automatically if it has not finished within 50 hours
  • #PBS -o default.out names the file that receives the job's standard output unless you specify another output file
  • #PBS -e errorfile names the error file; any error generated at the MATLAB command prompt is redirected there
  • You need to set the environment variables and path exactly as shown
  • The last line runs mywave.m and redirects the output (what would appear in the MATLAB command window) to mywave.out
  • Because you cannot see any plots in batch mode, add "print -djpeg any_name" after the plot command in your code. This saves the plot in jpg format, which you can copy from your H drive (or with WinSCP) to view the result (this command is shown in the GUI mode example; see the 3rd line from the end)

Use qsub:

  • Once you have a similar script file, make sure that it is executable. Run ls -lrt and look for something like rwx in the permissions column, where x stands for executable. To be on the safe side, you can make your script file executable with "chmod 744 file_name.sh"
  • You only need to change the source file name and the output file name (mywave.m and mywave.out in this example) in the shell script. The script is provided below so that you can copy it and then run "chmod" on it

#!/bin/bash
#PBS -l nodes=1:ppn=1,walltime=50:00:00
# The following 2 lines ensure you are notified by email when the job completes
#PBS -M au_user_id@auburn.edu
#PBS -m e
#PBS -o default.out
#PBS -e errorfile
LD_PRELOAD=$MATLAB/export/apps/MATLAB/bin/glnxa64/libguide.so
/export/apps/MATLAB/bin/matlab -nodisplay -nosplash < mywave.m > mywave.out

Write a file containing these lines (for how to write a file in the Linux vi editor, see the earlier example of writing the .rhosts file), give it a name (say mat.sh), and run "chmod 744 mat.sh"
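If you later want to run a different MATLAB file, the file-name change and the chmod step can be scripted. This is only a sketch under stated assumptions: retarget_job is a hypothetical helper name, the template is assumed to end with a "< mywave.m > mywave.out" line as in mat.sh above, and the new file name is assumed to contain no "/" or "&" characters (they would confuse sed).

```shell
#!/bin/bash
# Hypothetical helper: copy a working job script and point it at a
# different MATLAB source file. Assumes the template redirects
# "< mywave.m > mywave.out" as mat.sh does.
retarget_job() {
    local template="$1" source_m="$2" target="$3"
    local base="${source_m%.m}"    # e.g. "heat" from "heat.m"
    # Swap the input and output file names; leave every other line as-is.
    sed -e "s/mywave\.m/${base}.m/g" \
        -e "s/mywave\.out/${base}.out/g" "$template" > "$target" || return 1
    # 744: the owner may read, write and execute; everyone else may only read.
    chmod 744 "$target"
    # Confirm the execute bit before handing the file to qsub.
    [ -x "$target" ] && echo "$target is ready for qsub"
}
```

For example, "retarget_job mat.sh heat.m heat.sh" would write heat.sh, which runs heat.m and redirects its output to heat.out (heat.m is a made-up file name for illustration).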

  • Submit the job using "qsub ./mat.sh"

[Screenshot: qsub1]

[Screenshot: qsub2]

You have now submitted your job. You can watch it run with showq (initially you will see a 1-processor job; once it has connected to the 16 labs, you will see a 17-processor job running). You can terminate your session (by typing "exit") and log in again later to check whether your job has finished. You no longer need to keep the SecureCRT connection open.
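When you log back in, the files named in mat.sh give a quick hint about the job's state. The sketch below is not a scheduler feature: job_status is a hypothetical helper, and it rests on the assumption that the scheduler delivers default.out and errorfile to the submission directory only when the job ends. showq remains the authoritative check.

```shell
#!/bin/bash
# Hypothetical helper: summarize a batch job from the files named in mat.sh.
# Assumption: default.out and errorfile appear in the submission directory
# only once the job has ended, so a missing default.out is reported as
# "pending" (still queued or running).
job_status() {
    local dir="${1:-.}"
    if [ ! -e "$dir/default.out" ]; then
        echo "pending"
    elif [ -s "$dir/errorfile" ]; then
        # -s is true when the file exists and is not empty.
        echo "finished, errorfile is not empty"
    else
        echo "finished"
    fi
}
```

Run "job_status" in the directory you submitted from, then look at mywave.out for the MATLAB output and at errorfile if it is not empty.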
