The Spotfire Community is moving to TIBCOmmunity and this forum location has closed. During the transition, you can still search the old forums but posting has been disabled. We encourage you to pick up the discussion at the new Spotfire community on TIBCOmmunity.
December 2009 - Posts - Tip of the Week

Tip of the Week

December 2009 - Posts

  • Using Nest and Cross

    Many people are unaware of the Nest and Cross features inside TIBCO Spotfire so this tip is aimed at explaining what they are and how they can be used.

    Let’s assume we have the following survey data loaded in TIBCO Spotfire Professional:



     

    We may want to plot the total number of responses for each answer category (yes, maybe, no) for each question (A,B,C). Using the default settings on a Bar Chart, we can configure it to show this info, as shown below:


     



    This will interact with filtering, so if someone decides to filter out the Answer ‘No’ using the Filter Panel, the result will look like the following:

     

    You can clearly see ‘No’ is removed altogether from the bar chart.  What if you wanted to still include the ‘No’ category to show its there, but filtered out? If you wanted to do this, you would you would click on the ‘Settings’ button inside the X-axis Properties Dialog.


     

    If you change the category mode to ‘Nest’, then this will  show all values in the data set, regardless of whether the filter is applied. This is shown in the Bar Chart below where the ‘No’ category in the Answer column is still filtered out, but it shows up on the x-axis.

     

    Lastly, if you want to show all possible values, you would use the Cross setting.  In this scenario, lets say there is no filtering applied ,and you want to show all possible values (yes, no, maybe), even if there are no responses with that value.

    This is shown in the Bar Chart below. There were no responses for no in Question A, yes in Question B, and maybe in Question C, but the values are shown in the Bar Chart as a possible value asked during the survey

     

    .



    Below are screen shots showing all the three different settings in action when there is no filtering and then when the 'No' values are filtered out.


    No filter


    The 'No' values are filtered out of the 'Answer' Column

     

  • Viewing Data Table Source Information

    One question that I get all the time when people are looking at Spotfire analysis files is “This data does not look like the raw data, what transformations and calculations were applied?” This can be useful if you need to keep a log or record of all actions applied to the data, or if you just want to understand the steps performed in order to replicate them on similar data.
    TIBCO Spotfire captures the ‘Source Information’ for each Data Table. This ‘Source Information’ acts as a recipe of how the Data Table was generated, started with loading the raw data, merging in any columns and rows, applying any transformations, and then adding any calculated columns,  and any other manipulations on the data table , like renaming or freezing columns.

    To access the Data Table properties from within TIBCO Spotfire Professional, click on Edit > Data Table Properties.
     
    Click on a Data Table and then select the ‘Source Information’ tab.

    Publish

  • Transforming Data with Regular Expressions

    Did you know TIBCO Spotfire Professional has built in regular expression capabilities that can be used to validate and clean your data either before or after it has been loaded?

    Data quality is a big issue with all customers.  Whether the data quality is an issue with customers merging data from different sources that use different standards, or whether the data is coming straight from an instrument or a public forum, like a survey, the data needs to be ‘standardized’ before it is analyzed in TIBCO Spotfire.

    This standardization can be accomplished in TIBCO Spotfire by using the RxReplace function inside an expression (either in the Calculated Column UI, or the Custom Expression UI).

    RxReplace is a Regular Expression Replace function which allows you to search for a pattern in your data, and then replace it with another pattern.  For example, search for all dates (in the form dd/mm/yyyy) and replace them with the pattern mm/dd/yyyy)

    We will see this example and a few other common ones below.

    Removing New Line Characters

    The first example is one that happens all the time in training. We allow people to load their own data and they will typically load it from Excel.  Many Excel data cells have carriage returns in them , as shown below.



    When importing this into Spotfire it imports the carriage returns as an ASCII special character, which makes it hard to read (as shown below in both the header of a table visualization and in a properties dialog).



     

    If the columns are string based, it will not affect the analysis too much, but it can be an annoyance to look at in general. To fix it, when loading data, we can apply a ‘replace column’ transformation.

    We can use the RxReplace function, passing in the original column, searching for the carriage return (identified as  ‘\n’), and replacing it with an empty string.

     

    This will input the values in an easier to view format.



    Converting Date Formats

    Assume you are loading data from Europe that will be merged with data from US. The date formats are different. One is DD/MM/YYYY and the other is MM/DD/YYYY. What do you do? Assuming your data is in the format show below, DD/MM/YYYY, you can use the RxReplace method shown below to transform it to MM/DD/YYYY.



    Without getting into too many details on the Regular Expression syntax, the pattern we are looking for is a string that starts with 2 digits (note that the \ character needs to be escaped in Spotfire’s expression language) following by a / character, then 2 more digits, followed by another / character, and then 4 digits at the end of the string. When found, it will be replaced with the month ($2), followed by the day ($1), and then the year ($3), all separated by the / character.

    And the result is just what we want. Below you can see the original column and then the updated column after the expression has been executed.

     



             
    Converting Name Formats

    A third example would be one I got from Brian Prather.  Assume you are in a situation where one data table had a “Names” column that was formatted as <Last name, First name> and a second data table had a “Names” column that was formatted as <First name Last name>. If you need to merge these into Spotfire, you will need to perform a regular expression transformation on one of them to standardize on one format.

    Assuming we want to format into <First name Last name>, we could use the following expression using the RxReplace method:


     

    In this expression, we want to search for a string that starts with any amount of characters followed by a comma and then any amount of characters. This will match any names in the format <last name, first name>.  When found, it will be replaced with the last name ($2) followed by the first name ($1).  This is since grouping was used inside the pattern.


    And the result is just what we want. Below you can see the original column and then the updated column after the expression has been executed.




    This tip is not meant to be a tip teaching regular expression, you can spend months on that. There are plenty of good resources on the internet to help you out. This tip was meant to show you the power of using them to transform, clean, and standardize your data either before you load it in Spotfire, either as a new column in the data table, or as a replacement for an original column in the data table (as shown below).

     



    If you are intersted in learning more about regular expressions and other types of data transformations, please have a look at our Data Cleaning and Transformation techniques TIBCO Spotfire Professional webinar.

    Have any other cool Regular expressions to share? Reply to this post with them.

     

  • Showing Distinct Rows in your Data Table

    One of the most commonly asked questions I get in training is how to show distinct rows inside a Data Table.  People need the full set of rows for most of the analysis, but may also want to be able to display a table with just distinct values.

    Assume you have a simple data set like the following:

    And in at least one location, you want to show only distinct rows, as shown below:

    What you need to do is create a new Data Table , which takes your original Data Table as the source and then pivots it.

    To start, you goto File > Add Data Table.

     

     

    You will then choose the source type to be ‘Existing Data Table in my Analysis’ (shown as Step 1 in the figure above), then if you have more than one Data Table loaded, make sure you select the right Data Table (shown in Step 2 in the figure above), and then select the  ‘Pivot’ Transformation and click ‘Add…’ (shown in Step 3 in the figure above). The Pivot Data dialog will appear.

     

    The ‘Row identifier’ should be the first column in the data set, (or the key column).

    The ‘Column titles’ and ‘Values and aggregation methods’ should both be ‘None’.

    The ‘Transfer columns and aggregation methods’ should list all the remaining columns in the original Data Table, with the expression set to UniqueConcatentate for all columns.

    You can then set the ‘Transfer column naming pattern’ to  %T (which just references the Column name and no aggregation method).

    What this does is not really pivot the data, rather we use the pivot tool to transfer over all the columns in our original Data Table, but we apply a UniqueConcatenate aggregation to all the transfer columns. This will create the distinct effect we are looking for.

     
    When you are done, you can click ‘OK’ to close the Pivot Data dialog and then ‘Finish’ to close the Add Data Table dialog.

    When you are done, you will see a new data table added that has just distinct row.

    You can also add a Data Relation so that the two Data Tables are related. This will allow users to mark rows in one of the data tables , and also mark related rows in the second Data Table. This is shown below where the user marked a row in the ‘Distinct’ Data Table, and then the related rows were also marked in the ‘Original Data Table’


Syndication

Other Spotfire Blogs

©Copyright 2000-2011 TIBCO Software Inc | Privacy Policy | Terms of Use I Blog I Contact Us I Content Center