How to convert the first letter of each word to UPPERcase except stop words with PostgreSQL

Last week, I had to convert the first letter of each word to uppercase with PostgreSQL. PG has a very useful builtin function to do it: initcap(). However, it modifies every word of the string or column without flexibility. In my case, I had to convert all of them except some stop words, so initcap() was not enough.

Here is what I did. I’m sharing it because I’ve found a lot of very too long and complicated functions on google. I prefer to keep it simple.

It’s straightforward.

  1. Convert everything with the initcap() function
  2. Create an array of words you don’t want to be converted and for each element,  reconvert to lowercase the word with the value from the array with the replace() function.

You need to be careful to not replace part of a longer word, like “for” and “fortune“. In my example, the words were easy to identify because there were delimited by spaces. You can use a the regexp_replace() function to make sure that only the stop words are replaced if you don’t have obvious delimiters.

Get the creation date of a trello card with R

The creation date of a card is not an attribute directly available with the Trello API. You need to extract it from the object ID. Trello IDs come from MongoDB, and according to Mongo documentation, the first 4-byte value represent the seconds since the Unix epoch. So here is how to get all the cards of a board with their creation date with R.

 

Basically, what needs to be done is

  1. extract the first 8 characters of the ID
  2. prefix it with “0x” to convert it from hexadecimal to decimal with strtoi()
  3. convert the decimal to date. Make sure to specify the origin of unix epoch (1970-01-01)

References

Pentaho Spoon java.lang.RuntimeException: Unable to create the database cache

I got this error while starting Spoon on a PC that gets a lot of blue screens due to crappy hardware. In fact, that problem prevents Spoon from starting.

I don’t know why, but this is the same error at each blue screen: java.lang.RuntimeException: Unable to create the database cache

 

 

To fix the issue, you need to delete the db.cache file.

On my Windows computer, it’s located under C:\Users\patlaf\.kettle\db.cache-5.0.1-stable .

Then you can start Spoon again!

References:

How to get the week number of the year, starting on Sunday with R

I was looking for a way to get the week number of the year, from Sunday to Saturday. I googled it… and I had to visit 5 – 6 sites to get what I was looking for. This is way above my expected patience for that kind of question.

The answer is:

I hope this post will be ranked #1 in Google and you didn’t have to open as many pages like I did.

Query Twitter Streaming API with Pentaho Data-Integration and R

On one of my previous posts (Query Twitter Api With Pentaho PDI), many people asked for a way to use the Twitter Streaming API with Pentaho PDI.  Implementing OAUTH and API calls with Kettle can be very difficult compared to the few lines of code required with many different programming languages. Even if I’m pretty sure it’s possible to do it with native steps , I decided to use R to make the call and parse the results with Kettle. This is way faster and easier. Here is how it works.

Kettle Twitter Streaming API

The Transformation itself is very simple.

  1. The Data Grid step provides the R script name to execute with a few parameters like the researched keyword and the stream timeout.
  2. The Concat Fields step concatenate all parameters into 1 variable (one command line)
  3. The Execute a Process execute the R script that calls the Streaming API.
  4. The result is sent back to Kettle and parsed with the JSON Input Step
  5. Finally, the tweets are saved to a CSV file

Once triggered, the “Execute a process” step executes the R script and uses “sdtout” and “sterr” to share the data. In our case, the R script output a JSON string (the result of the streaming api) to stdout.

This transformation allows the use of any other processes/scripts able to query the streaming API and output  the result to STDOUT. You can find the Transformation and the R script in my github repository.

The R Script

R is a free software programming language and software environment for statistical computing. You can download it on the official website, or it can be found in Ubuntu repository with apt-get.

The script uses the libraries streamR and ROAuth. As their name suggests, streamR provides functions to access Twitter’s filter, sample, and user stream and ROAuth provides generic functions to handle OAuth handshake and signature.

You need to have your consumer key and consumer secret from Twitter. If you don’t already have it, see my previous post.

You need to run the script at least one time before using it with PDI. In R, you can call the script using the source(‘/path/to/script.R’) function. It is required to generate the OAUTH signature.

Kettle Twitter Streaming API OAUTH

The handshake() function will open a page to authorize “the application” to access the data. Then in R , you will be prompted to enter the key that twitter just gave you when you authorized.

Twitter Streaming API Authorize Twitter Streaming API Authorize PINThe generated OAUTH signature will be saved into a file called pdi_R_twitter_oauth.Rdata. The data saved is reused for further api calls, so you don’t need to authorize the application each time. By default, the file is saved in the working directory of R. You can get it with the function getwd().

R getwd function

However, to make it work with Pentaho PDI, you need to find the file and move (or copy) it to your data-integration folder.

Pentaho Data-Integration-Folder

A Quick Recap

  1. Download and install R
  2. Run the script at least one time to generate pdi_R_twitter_oauth.Rdata
  3. Copy pdi_R_twitter_oauth.Rdata to your Pentaho data-integration folder

The PDI Transformation should now be able to use the R script to query the Twitter Streaming API! You can download it in my github repository. Note that it has been developed to work on Linux, but it should work on Windows too !

References:

How to publish an analysis to rpubs.com with RStudio on Windows

It seems a lot of people are facing the same issue while trying to upload a research on rpubs.com using the Publish button in RStudio.

     Error in function (type, msg, asError = TRUE)  : 
     SSL certificate problem, verify that the CA cert is OK. Details:
     error:14090086:SSL      routines:SSL3_GET_SERVER_CERTIFICATE:certificate verify failed
     Calls: rpubsUpload … <Anonymous> -> .postForm -> .Call ->           <Anonymous> -> fun
     Execution halted
That problem mostly affects Windows users. To get rid of it:
  1. Make sure you have the packages “bitops” and “Rcurl” installed
  2. In your RProfile file ( C:\Program Files\R\R-3.1.1\library\base\R) add these 2 lines:

It worked for me!

 

References:

Oracle equivalent of MySQL group_concat

MySQL has a very useful function called GROUP_CONCAT() that concatenate the aggregated values of a group (GROUP BY). With Oracle 11G, you can use listagg to do the same, but Oracle 10G does not have an equivalent function.

If you are still working with Oracle 10G, this is the simplest way I found to concatenate aggregated values.

The separator used CANNOT be present in any of the concatenated values . It can be a problem if you deal with large text. I used “|” because I know it’s a character that cannot be present in my values.

Original Data:

Oracle Group Concat

 

 

 

 

 

Concatenated Data

Oracle Group Concat 2

 

 

 

References:

OBIEE 11G: Refresh user GUIDs

You most probably have to refresh the user GUIDs

  • After a migration from Dev to Prod (or whatever environment)
  • After a server crash
  • After the OBIEE partition got full
  • After having unexplained problems to signing in

This is an expected operation after a migration of the entire catalogue. I don’t really understand why the GUIDs has to be refreshed after a crash (AFAIK, they don’t change), but I had to do it so many times in this situation to be able to log in again.

The task cannot be done using the Weblogic Console neither Weblogic Enterprise Manager. It requires to manually edit 2 config files on the server, bounce the services, remove the edited lines from the config and bounce the services again.

Edit NQSConfig.INI

The file is located in the Oracle BI Server config folder:

/usr/local/OracleBI/instances/instance1/config/OracleBIServerComponent/coreapplication_obis1/NQSConfig.INI

  • Locate the parameter FMW_UPDATE_ROLE_AND_USER_REF_GUIDS and change its value to YES
  • Save and Close the file

Edit instanceconfig.xml

The file is located in the Oracle Presentation Server config  folder:

/usr/local/OracleBI/instances/instance1/config/OracleBIPresentationServicesComponent/coreapplication_obips1/instanceconfig.xml

  • Locate the XML tag <Catalog>
  • Add the following tag within the <Catalog> tag: <UpdateAccountGUIDs>UpdateAndExit</UpdateAccountGUIDs>
  • Save and Close the file

It should looks like:

Now restart all the services. BI Presentation should fail to start but should not throw errors.

You can now undo all the modifications you made to both config files and restart all the services. Et voilà ! You did refresh the user GUIDs.

 

Install RMySQL on Ubuntu

RMySQL is the database interface and MySQL driver for R. If you don’t already have MySQL libraries and try to install it, you could get an error like:

This is because the installation don’t find the required MySQL C API libraries. You can manually download them and specify the path as arguments, but a much simpler way is to use apt-get:

Then, in your R environment you can reinstall the package!

OBIEE 11G: Display hierarchy and columns from the hierachy at the same time

One of the newest feature of 11G is the possibility to expand/collapse hierarchy level (yep.. even in 2014 it seems to be a killer feature…heh!). The problem is you can’t display the hierarchy and the column used to build the hierarchy at the same time with a table. The hierarchy will be OK but the columns will display null value.

Suppose the Hierarchy Year -> Month -> Week -> Day

Add the hierarchy AND the columns used to build it like this:

Hierarchy OBIEE 11g

As you can see, Month and Week are empty. This is caused by the Hierarchy.

 

OBIEE Hierarchy Result

If you need to display the value, there is a hack. You only need to use the _whatever_ function with the columns ! The less intrusive is to append an empty string at the end.
OBIEE Hierarchy Hack

 

et voilà !

HierarchyResult3