Parallelize ‘graph4lg’ R package

Collin

I am calculating dPC with the graphab_metric() function of the ‘graph4lg’ R package. Is there a way to parallelize this heavy computation?

Paul Savary

Hi Collin,

Although graph4lg is an R package, its graphab_x() functions are essentially wrappers around the Java-based Graphab software: they prepare the Java command line corresponding to the arguments you provide and then run it. More specifically, they call Graphab 2.8, which the get_graphab() function downloads to your machine at first use and stores in the directory whose path is given by rappdirs::user_data_dir().

For this reason, the usual ways to parallelize R functions (for instance foreach loops) are ineffective: the heavy computation runs in the Java process, not in R.

There is, however, a way to parallelize the computations performed by these functions across several cores: asking Java to do it.
The Graphab command line accepts a -proc n argument, where n is the number of processors/cores across which the computation is parallelized.
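
To make the idea concrete, here is a minimal sketch of how an R wrapper could assemble such a command line. The helper build_graphab_args(), the jar file name, and the project path are hypothetical, and the position of -proc right after the jar is an assumption; the Graphab command line documentation gives the exact layout.

```r
# Hypothetical helper (not part of graph4lg): assemble the arguments
# that would be passed to the `java` executable.
build_graphab_args <- function(jar_path, project_xml, proc = 1L,
                               extra = character()) {
  proc <- as.integer(proc)
  stopifnot(length(proc) == 1L, !is.na(proc), proc >= 1L)
  c("-jar", jar_path,
    "-proc", as.character(proc),   # number of cores requested from Graphab
    "--project", project_xml,      # assumed layout, see lead-in
    extra)                         # remaining Graphab commands, passed through
}

# Placeholder file names, for illustration only.
args <- build_graphab_args("graphab-2.8.jar", "my_project/my_project.xml",
                           proc = 4)
cat("java", args, "\n")

# To actually launch it, one could then call: system2("java", args = args)
```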

Therefore, it is possible to add this argument to the command line built by R. I have modified the graphab_metric() function to allow this.
You can now find the graphab_metric_para() function in the development version of graph4lg hosted on GitLab: https://gitlab.com/psavary3/graph4lg/-/blob/master/R/graphab_metric_para.R?ref_type=heads

You can either download the whole package from GitLab, or just load this function manually. If you install the package, you might need to call the function as graph4lg:::graphab_metric_para(), given that I declared it as an "internal" function.
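
Both options could look roughly like the sketch below. It assumes the 'remotes' package is available for the first option, and that GitLab serves the raw file when /-/blob/ is replaced by /-/raw/ in the URL for the second. The two wrapper functions are hypothetical names, defined but not called here because they need network access. Sourcing the single file presumably still requires graph4lg to be installed and loaded, since the function may rely on other parts of the package.

```r
# Option 1 (assumes the 'remotes' package): install the development version.
install_dev_graph4lg <- function() {
  remotes::install_gitlab("psavary3/graph4lg")
}

# Option 2: load only the function by sourcing its file.
blob_url <- "https://gitlab.com/psavary3/graph4lg/-/blob/master/R/graphab_metric_para.R?ref_type=heads"

# Turn the web page URL into a raw-file URL and drop the query string.
raw_url <- sub("\\?.*$", "", sub("/-/blob/", "/-/raw/", blob_url, fixed = TRUE))

load_graphab_metric_para <- function() {
  library(graph4lg)   # assumed to be needed for the package internals
  source(raw_url)
}
```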

The "proc" argument takes the number of cores you want to run the computation on (default = 1).

Prior to using it, you can check how many cores are available on your machine using: parallel::detectCores()
It is recommended to use at most n - 1 cores, n being the total number of cores on your machine.
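
A minimal sketch of that rule in R is given below. The commented call at the end is only illustrative: apart from proc, the argument names and values are assumed to mirror those of graphab_metric(), and the project and graph names are placeholders.

```r
# detectCores() can return NA on some platforms, so fall back to 1.
n_total <- parallel::detectCores()
if (is.na(n_total)) n_total <- 1L

# Keep one core free for the rest of the system, but never go below 1.
n_proc <- max(1L, n_total - 1L)
n_proc

# Illustrative call (placeholders, not run here):
# res <- graph4lg:::graphab_metric_para(proj_name = "my_project",
#                                       graph = "my_graph",
#                                       metric = "dPC",
#                                       dist = 1000, prob = 0.05,
#                                       proc = n_proc)
```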

I admit that this is a quick fix and that ideally, all the Graphab functions of graph4lg should have this argument built in. That's for another version!

Thanks for your interest and for reaching out.
Feel free to tell me if that doesn't work for you.

Best,
Paul

Collin

Hi Paul,

Thanks for the thorough reply. Unless I am mistaken, I am now seeing that when I run graphab_metric(), it automatically uses the total number of cores available. For example, when I run on my personal computer it uses 8 cores. But I am currently running an analysis on a high performance computing cluster, and it appears to be running on 28 cores. I'm assuming your solution here would be able to modify this? I will play around with it when I have the time!

Best,

Collin

Paul Savary

Hi Collin,

When using the new graphab_metric_para() function, you should be able to specify the number of cores that either your machine or a computing cluster will use (argument proc). Your understanding is correct: if there are 28 cores available, you can choose how many of them to use.

However, I would recommend asking the manager of the computing cluster you use whether asking Java to use several cores could cause any issues. It can take a lot of RAM and disturb their queuing system. I have myself caused trouble on a computing cluster a couple of times when using multiple cores and asking each core to take a lot of RAM. So, just in case, I would recommend checking beforehand.
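
One way to stay within what the scheduler has granted is to cap proc at the job's allocation. The sketch below assumes a SLURM cluster, which exposes the allocation in the SLURM_CPUS_PER_TASK environment variable; other schedulers use different variables, and cores_for_job() is a hypothetical helper name.

```r
# Hypothetical helper: never request more cores than the job was allocated.
# Assumes SLURM; falls back to (detected cores - 1) when the variable is unset.
cores_for_job <- function(requested,
                          env = Sys.getenv("SLURM_CPUS_PER_TASK", unset = NA)) {
  allocated <- suppressWarnings(as.integer(env))
  if (is.na(allocated)) {
    allocated <- parallel::detectCores()
    if (is.na(allocated)) allocated <- 1L
    allocated <- max(1L, allocated - 1L)
  }
  as.integer(max(1L, min(as.integer(requested), allocated)))
}

# Example with an explicit allocation of 4 cores and a request for 8.
cores_for_job(8, env = "4")
```

The returned value can then be passed as the proc argument.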

I hope that helps. Feel free to ask other questions if you encounter issues. Note though that I am not a computer expert and that some answers might come from the manager of the specific facility you use.
All the best,
Paul

gvuidel

Hi Collin,
The Graphab Java program parallelizes metric calculations by default, using n-1 cores of the computer.
If you want to change this behaviour, you can use the proc parameter as suggested by Paul.
There are two other ways to set the number of cores/processors used by default by the Graphab Java program:
1 - Launch the Graphab Java GUI and set the number of cores in the menu File -> Preferences
2 - Modify the value of the key PROC in the preference file: .java/.userPrefs/org/thema/graphab/prefs.xml for Linux and MacOS or in the Windows registry: HKEY_CURRENT_USER\Software\JavaSoft\Prefs\org\thema\graphab
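
For the second option, a small R sketch can locate the preference store and show the current PROC entry. It assumes the Linux/MacOS path above is relative to the user's home directory; graphab_prefs_location() is a hypothetical helper, and on Windows it only returns the registry key without reading it.

```r
# Hypothetical helper: where Graphab's Java preferences live, per platform.
graphab_prefs_location <- function(os = Sys.info()[["sysname"]]) {
  if (identical(os, "Windows")) {
    "HKEY_CURRENT_USER\\Software\\JavaSoft\\Prefs\\org\\thema\\graphab"
  } else {
    # Assumed to be relative to the home directory.
    file.path(path.expand("~"), ".java", ".userPrefs",
              "org", "thema", "graphab", "prefs.xml")
  }
}

loc <- graphab_prefs_location()

# On Linux/MacOS, print the line(s) holding the PROC key if the file exists.
if (!identical(Sys.info()[["sysname"]], "Windows") && file.exists(loc)) {
  print(grep("PROC", readLines(loc, warn = FALSE), value = TRUE))
}
```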

Best,
Gilles