Get the creation date of a Trello card with R

The creation date of a card is not an attribute directly available through the Trello API. You need to extract it from the object ID. Trello IDs come from MongoDB, and according to the MongoDB documentation, the first 4 bytes represent the seconds since the Unix epoch. So here is how to get all the cards of a board, with their creation dates, using R.

 

Basically, what needs to be done is (a short R sketch follows the list):

  1. extract the first 8 characters of the ID
  2. prefix them with “0x” and convert the hexadecimal value to a decimal number with strtoi() (using base = 16L)
  3. convert that number of seconds to a date, making sure to specify the Unix epoch (1970-01-01) as the origin
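A minimal sketch of those three steps (the card ID below is just an example; in practice you would apply this to the id field of the cards returned by the Trello API, e.g. the /1/boards/{id}/cards endpoint):

    # Example card ID (a MongoDB ObjectId)
    id <- "4f2ba8dbbcf3b23a51001f92"

    # 1. Extract the first 8 characters of the ID
    hex_part <- substr(id, 1, 8)

    # 2. Prefix with "0x" and convert from hexadecimal to decimal
    epoch_seconds <- strtoi(paste0("0x", hex_part), base = 16L)

    # 3. Convert the seconds to a date, specifying the Unix epoch as origin
    as.POSIXct(epoch_seconds, origin = "1970-01-01", tz = "UTC")

All of these functions are vectorized, so the same expression works on the whole id column of the cards data frame at once.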


How to get the week number of the year, starting on Sunday with R

I was looking for a way to get the week number of the year, with weeks running from Sunday to Saturday. I googled it… and had to visit 5 or 6 sites to get what I was looking for, which is way more than my patience allows for that kind of question.

The answer is:
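With base R, the %U format code gives the week of the year with Sunday as the first day of the week (00–53):

    # Week of the year, weeks starting on Sunday (00-53)
    format(Sys.Date(), "%U")

    # For an arbitrary date
    format(as.Date("2014-08-17"), "%U")

(If you ever need weeks starting on Monday instead, use %W.)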

I hope this post will be ranked #1 on Google so you won’t have to open as many pages as I did.

Query Twitter Streaming API with Pentaho Data-Integration and R

In one of my previous posts (Query Twitter Api With Pentaho PDI), many people asked for a way to use the Twitter Streaming API with Pentaho PDI. Implementing OAuth and the API calls with Kettle can be very difficult compared to the few lines of code required in many programming languages. Even though I’m pretty sure it’s possible with native steps, I decided to use R to make the call and parse the results with Kettle. This is way faster and easier. Here is how it works.

[Figure: the Kettle Twitter Streaming API transformation]

The Transformation itself is very simple.

  1. The Data Grid step provides the name of the R script to execute, along with a few parameters such as the keyword to search for and the stream timeout.
  2. The Concat Fields step concatenates all the parameters into one field (the command line).
  3. The Execute a Process step executes the R script that calls the Streaming API.
  4. The result is sent back to Kettle and parsed with the JSON Input step.
  5. Finally, the tweets are saved to a CSV file.

Once triggered, the “Execute a process” step executes the R script and uses stdout and stderr to share the data. In our case, the R script outputs a JSON string (the result of the streaming API call) to stdout.
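As a rough, hypothetical sketch of that contract (the argument order and variable names are assumptions, not the exact script from the repository), the R side boils down to reading the keyword and timeout from the command line, streaming the tweets, and printing the collected JSON to stdout:

    library(streamR)

    # Parameters passed by Kettle on the command line
    args    <- commandArgs(trailingOnly = TRUE)
    keyword <- args[1]                 # the researched keyword
    timeout <- as.numeric(args[2])     # stream timeout, in seconds

    # Reuse the OAuth signature generated beforehand (see "The R Script" below);
    # the object name "credentials" is an assumption
    load("pdi_R_twitter_oauth.Rdata")

    # Stream matching tweets into a temporary file, then echo them to stdout
    tmp <- tempfile(fileext = ".json")
    filterStream(file.name = tmp, track = keyword,
                 timeout = timeout, oauth = credentials)
    cat(readLines(tmp, warn = FALSE), sep = "\n")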

The transformation can use any other process or script able to query the streaming API and write the result to stdout. You can find the transformation and the R script in my GitHub repository.

The R Script

R is a free programming language and software environment for statistical computing. You can download it from the official website, or install it from the Ubuntu repositories with apt-get.

The script uses the streamR and ROAuth libraries. As their names suggest, streamR provides functions to access Twitter’s filter, sample, and user streams, while ROAuth provides generic functions to handle the OAuth handshake and signature.

You need your Twitter consumer key and consumer secret. If you don’t already have them, see my previous post.

You need to run the script at least once before using it with PDI. In R, you can run the script with source("/path/to/script.R"). This first run is required to generate the OAuth signature.

[Figure: Kettle Twitter Streaming API OAuth]

The handshake() function will open a page asking you to authorize “the application” to access the data. Then, back in R, you will be prompted to enter the PIN that Twitter just gave you when you authorized it.

[Figures: Twitter Streaming API authorization page and PIN prompt]

The generated OAuth signature will be saved into a file called pdi_R_twitter_oauth.Rdata. The saved data is reused for further API calls, so you don’t need to authorize the application each time. By default, the file is saved in R’s working directory, which you can get with the getwd() function.
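A sketch of that part of the script, following the usual streamR/ROAuth pattern (the object name and the consumer key placeholders are mine):

    library(ROAuth)

    # Twitter's OAuth endpoints
    requestURL <- "https://api.twitter.com/oauth/request_token"
    accessURL  <- "https://api.twitter.com/oauth/access_token"
    authURL    <- "https://api.twitter.com/oauth/authorize"

    credentials <- OAuthFactory$new(consumerKey    = "yourConsumerKey",
                                    consumerSecret = "yourConsumerSecret",
                                    requestURL     = requestURL,
                                    accessURL      = accessURL,
                                    authURL        = authURL)

    # Opens the authorization page, then asks for the PIN in the R console
    credentials$handshake()

    # Save the signature so later runs (and PDI) can simply load() it
    save(credentials, file = "pdi_R_twitter_oauth.Rdata")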


However, to make it work with Pentaho PDI, you need to find the file and move (or copy) it to your data-integration folder.
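You can do the copy from R itself, something along these lines (the destination path is just an example):

    # Copy the saved OAuth signature into the PDI folder (adjust the path)
    file.copy(file.path(getwd(), "pdi_R_twitter_oauth.Rdata"),
              "/opt/pentaho/data-integration/")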

[Figure: the Pentaho data-integration folder]

A Quick Recap

  1. Download and install R
  2. Run the script at least once to generate pdi_R_twitter_oauth.Rdata
  3. Copy pdi_R_twitter_oauth.Rdata to your Pentaho data-integration folder

The PDI transformation should now be able to use the R script to query the Twitter Streaming API! You can download it from my GitHub repository. Note that it was developed on Linux, but it should work on Windows too!


How to publish an analysis to rpubs.com with RStudio on Windows

It seems a lot of people are facing the same issue when trying to upload an analysis to rpubs.com using the Publish button in RStudio:

     Error in function (type, msg, asError = TRUE) :
     SSL certificate problem, verify that the CA cert is OK. Details:
     error:14090086:SSL routines:SSL3_GET_SERVER_CERTIFICATE:certificate verify failed
     Calls: rpubsUpload ... <Anonymous> -> .postForm -> .Call -> <Anonymous> -> fun
     Execution halted
That problem mostly affects Windows users. To get rid of it:
  1. Make sure you have the packages “bitops” and “RCurl” installed
  2. In your Rprofile file (located in C:\Program Files\R\R-3.1.1\library\base\R), add these two lines:
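The two options commonly used for this workaround are shown below: they force RStudio’s internal upload method and point RCurl at its bundled CA certificates while disabling peer verification.

    # Commonly suggested workaround for the RPubs SSL error on Windows
    options(rpubs.upload.method = "internal")
    options(RCurlOptions = list(verbose = FALSE,
                                capath = system.file("CurlSSL", "cacert.pem",
                                                     package = "RCurl"),
                                ssl.verifypeer = FALSE))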

It worked for me!

 


Install RMySQL on Ubuntu

RMySQL is the database interface and MySQL driver for R. If you don’t already have the MySQL client libraries installed and you try to install the package, the installation will fail because it cannot find the required MySQL C API libraries. You can download them manually and pass their location as installation arguments, but a much simpler way is to use apt-get:
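On Ubuntu, the package that usually provides them is libmysqlclient-dev (the exact package name can vary between releases):

    # MySQL client headers and libraries needed to build RMySQL
    sudo apt-get install libmysqlclient-dev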

Then, in your R environment you can reinstall the package!
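That is simply:

    # Back in R, install the package again
    install.packages("RMySQL")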