Comparison of application of Rcpp and rJava in R

Introduction

I implemented a simple algorithm that computes the distances between all pairs of points in a given set of n-dimensional points. The algorithm is implemented in both C++ and Java. To call the C++ code from R, I use the Rcpp package. To call the Java code, I use two methods. The first is a simple approach in which the input and output data are exchanged through temporary disk files and the Java program is invoked with the system() function. In the second approach, the communication goes through the rJava package.
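As a rough illustration of the file-based approach, the Java side might look like the sketch below. The class name `FileDistances`, the two command-line arguments, and the text format (one point per line with whitespace-separated coordinates on input, one distance per line on output) are assumptions made for this example and not necessarily what the actual package uses.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical entry point for the file-based method; names and file format are assumptions.
class FileDistances {

    // Parses one point per non-empty line, coordinates separated by whitespace.
    static double[][] readPoints(Path input) throws IOException {
        List<double[]> pts = new ArrayList<>();
        for (String line : Files.readAllLines(input)) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) continue;
            String[] fields = trimmed.split("\\s+");
            double[] pt = new double[fields.length];
            for (int d = 0; d < fields.length; d++) {
                pt[d] = Double.parseDouble(fields[d]);
            }
            pts.add(pt);
        }
        return pts.toArray(new double[0][]);
    }

    // Euclidean distances for all pairs i < j, ordered (0,1), (0,2), ..., (1,2), ...
    static double[] distances(double[][] pts) {
        int n = pts.length;
        double[] out = new double[n * (n - 1) / 2];
        int k = 0;
        for (int i = 0; i < n - 1; i++) {
            for (int j = i + 1; j < n; j++) {
                double sum = 0;
                for (int d = 0; d < pts[i].length; d++) {
                    double diff = pts[i][d] - pts[j][d];
                    sum += diff * diff;
                }
                out[k++] = Math.sqrt(sum);
            }
        }
        return out;
    }

    // Writes one distance per line.
    static void writeDistances(Path output, double[] dist) throws IOException {
        List<String> lines = new ArrayList<>();
        for (double v : dist) lines.add(Double.toString(v));
        Files.write(output, lines);
    }

    public static void main(String[] args) throws IOException {
        // args[0]: input file written by R, args[1]: output file read back by R
        double[][] pts = readPoints(Paths.get(args[0]));
        writeDistances(Paths.get(args[1]), distances(pts));
    }
}
```

On the R side, one would write the points to the input file, run something like `system("java FileDistances in.txt out.txt")`, and read the output file back; the file names here are placeholders.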

Code excerpts

The Java code used through the rJava package looks as follows:

public class CalculateDistances {
    public static double[] run(double[][] pts){
        double[] distances = new double[((pts.length-1)*pts.length)/2];
        int k = 0;
        for(int i = 0; i < pts.length-1; i++){
            for(int j = i+1; j < pts.length; j++){
                distances[k] = calculateDistance(pts[i], pts[j]); 
                k++;
            }
        }
        return distances;
    }

    private static double calculateDistance(double[] pt0, double[] pt1){
        double sum = 0;
        for(int d = 0; d < pt0.length; d++){
            sum += Math.pow(pt0[d]-pt1[d], 2);
        }
        return Math.sqrt(sum);
    }
}
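To make the layout of the returned array concrete: for n points the result holds n(n-1)/2 values, ordered by pair as (0,1), (0,2), ..., (1,2), and so on. The sketch below repeats the same loop structure in a hypothetical class `PairOrderDemo` and adds an illustrative helper, `pairIndex`, that maps a pair (i, j) with i < j to its position in the flat array. Neither name is part of the original code.

```java
// Hypothetical demo class; pairIndex is an illustrative helper, not part of the original code.
class PairOrderDemo {

    // Position of the pair (i, j), with i < j, in the flat result for n points.
    static int pairIndex(int i, int j, int n) {
        return i * n - i * (i + 1) / 2 + (j - i - 1);
    }

    // Same loop structure as CalculateDistances.run above.
    static double[] run(double[][] pts) {
        double[] distances = new double[((pts.length - 1) * pts.length) / 2];
        int k = 0;
        for (int i = 0; i < pts.length - 1; i++) {
            for (int j = i + 1; j < pts.length; j++) {
                double sum = 0;
                for (int d = 0; d < pts[i].length; d++) {
                    double diff = pts[i][d] - pts[j][d];
                    sum += diff * diff;
                }
                distances[k++] = Math.sqrt(sum);
            }
        }
        return distances;
    }

    public static void main(String[] args) {
        double[][] pts = { {0, 0}, {3, 4}, {6, 8} };
        double[] dist = run(pts);
        int n = pts.length;
        for (int i = 0; i < n - 1; i++) {
            for (int j = i + 1; j < n; j++) {
                System.out.println("(" + i + "," + j + ") -> " + dist[pairIndex(i, j, n)]);
            }
        }
    }
}
```

For the three points used here, the distances are 5, 10 and 5, in that order.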

The C++ code used through the Rcpp package is the following:

#include "calculate_distances.h"

#include <cmath>  // std::pow, std::sqrt

double calculateDistance(const Rcpp::NumericVector& pt0,
        const Rcpp::NumericVector& pt1){
    double sum = 0;
    for(int d = 0; d < pt0.size(); d++){
        sum += std::pow(pt0[d]-pt1[d], 2);
    }
    return std::sqrt(sum);
}

SEXP calculateDistances(SEXP matrix){
    using namespace Rcpp ;

    NumericMatrix ptsMatrix(matrix); // create Rcpp wrapper for matrix in SEXP

    // create a numeric vector of the appropriate length
    NumericVector distances( (ptsMatrix.nrow()-1)*ptsMatrix.nrow()/2 );
    int k = 0;
    for(int i = 0; i < ptsMatrix.nrow()-1; i++){
        for(int j = i+1; j < ptsMatrix.nrow(); j++){
            distances[k] = calculateDistance(
                ptsMatrix.row(i), ptsMatrix.row(j));
            k++;
        }
    }
    return distances;
}

Test

To test these methods, I ran each of them on sets of 10-dimensional points drawn randomly from a normal distribution. The number of points in a single set varied from 1 to 3000. Each run was repeated 5 times and the times were averaged to obtain more reliable results. The results (number of points vs. time taken by the method) are shown in the following plot. The error bars show the standard deviation over the 5 repetitions.

Comparison of different methods computing the distances between points: java – implementation in Java with the help of rJava, cpp – implementation in C++ with the help of Rcpp, java_file – implementation in Java with the use of temporary disk files.
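The measurement procedure can be sketched as follows. This is a minimal Java sketch, not the R script that produced the plot: the class name `TimingSketch`, the fixed seed, the chosen sizes, the use of fresh random data for every repetition, and timing with `System.nanoTime` are all assumptions, and JVM warm-up effects are ignored.

```java
import java.util.Random;

// Hypothetical timing harness; class name, seed, and sizes are assumptions.
class TimingSketch {
    static final int DIMENSIONS = 10;
    static final int REPEATS = 5;

    // n points with independent standard-normal coordinates.
    static double[][] randomPoints(int n, Random rng) {
        double[][] pts = new double[n][DIMENSIONS];
        for (double[] pt : pts) {
            for (int d = 0; d < DIMENSIONS; d++) {
                pt[d] = rng.nextGaussian();
            }
        }
        return pts;
    }

    // All pairwise Euclidean distances, pairs ordered (0,1), (0,2), ..., (n-2,n-1).
    static double[] allDistances(double[][] pts) {
        int n = pts.length;
        double[] out = new double[n * (n - 1) / 2];
        int k = 0;
        for (int i = 0; i < n - 1; i++) {
            for (int j = i + 1; j < n; j++) {
                double sum = 0;
                for (int d = 0; d < pts[i].length; d++) {
                    double diff = pts[i][d] - pts[j][d];
                    sum += diff * diff;
                }
                out[k++] = Math.sqrt(sum);
            }
        }
        return out;
    }

    static double mean(double[] xs) {
        double sum = 0;
        for (double x : xs) sum += x;
        return sum / xs.length;
    }

    // Sample standard deviation (n - 1 in the denominator).
    static double sampleSd(double[] xs) {
        double m = mean(xs);
        double ss = 0;
        for (double x : xs) ss += (x - m) * (x - m);
        return Math.sqrt(ss / (xs.length - 1));
    }

    // Wall-clock seconds for each of REPEATS runs on fresh random data of size n.
    static double[] timeRuns(int n, Random rng) {
        double[] seconds = new double[REPEATS];
        for (int r = 0; r < REPEATS; r++) {
            double[][] pts = randomPoints(n, rng);
            long start = System.nanoTime();
            double[] result = allDistances(pts);
            seconds[r] = (System.nanoTime() - start) / 1e9;
            // Use the result so the computation cannot be skipped entirely.
            if (result.length != n * (n - 1) / 2) throw new IllegalStateException();
        }
        return seconds;
    }

    public static void main(String[] args) {
        Random rng = new Random(1);
        for (int n : new int[] {100, 500, 1000}) {
            double[] t = timeRuns(n, rng);
            System.out.printf("n=%d mean=%.6f s sd=%.6f s%n", n, mean(t), sampleSd(t));
        }
    }
}
```

This only times the computation itself. The comparison in the plot also includes the cost of moving data between R and the native code, which a pure Java harness cannot capture.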

Analysis of the test results

It can be seen that the method that uses C++ is the fastest. This was expected, since C++ programs are generally more efficient than Java programs, but that is not the whole story. The Rcpp package also allows more efficient code for the communication between R and the native language, because with Rcpp whole objects are not copied when they are passed to the native code (as they are with rJava). Rcpp allows and encourages the use of its thin, efficient wrappers around R objects, which avoid unnecessary copying of their contents.
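The difference between wrapping and copying can be illustrated with a loose Java analogy. This is not how Rcpp or rJava work internally, and the class and method names are hypothetical. It only shows that a wrapper shares storage with the original data, while a copy duplicates it at a cost proportional to its size.

```java
import java.nio.DoubleBuffer;

// Loose analogy only: class and method names are hypothetical.
class WrapVersusCopy {

    // A thin wrapper: the buffer shares storage with the array.
    static DoubleBuffer view(double[] data) {
        return DoubleBuffer.wrap(data);
    }

    // A full copy: the cost grows with the length of the array.
    static double[] copy(double[] data) {
        return data.clone();
    }

    public static void main(String[] args) {
        double[] data = {1.0, 2.0, 3.0};
        DoubleBuffer wrapped = view(data);
        double[] copied = copy(data);
        data[0] = 42.0;
        System.out.println(wrapped.get(0)); // sees the change made through data
        System.out.println(copied[0]);      // unaffected by the change
    }
}
```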

On the other hand, the Java code is more readable for a layman, since it doesn’t use any nonstandard packages to implement the functionality. This is not the case with the C++ version, where the Rcpp.h header is used to access R-specific constructs.

What was unexpected in this comparison is that the method based on temporary files is faster than the rJava-based method for smaller data sets. This is probably due to expensive but hidden bookkeeping connected with the rJava package.

Source code

The source code of these methods, along with a makefile that builds the corresponding packages and installs them in the system and the R code that generates the plot, is placed here:

The program was tested with R 2.12.1 on an Ubuntu 11.04 system. The code is written in such a way that it should be fairly easy to use as a template for the R connection code when implementing other algorithms in Java or C++.
