Python Importer

In this guided example, we will show you how to get started with the iolite Python 3 API, and in particular how to write an import module. We’ll be using an Agilent file as it has the simplest format, but you can extend the concepts shown here for any file format, including your own file formats.

To begin with, please download the example file and open it in a text editor (but not Excel as it changes time stamps etc). It should look something like this:

[Figure: contents of the example Agilent file]

There are four header lines (lines 1 to 4) that include the file path of the raw data file (line 1); a line describing what the file contains (intensity vs time, in CPS/Counts; line 2); the datetime stamp, along with the batch information (line 3); and the column headers (line 4).

The most basic iolite Python import module just needs to create a time array (by ‘array’ here I mean a column of data values, sometimes also called a vector of data: think of a column of data in an Excel spreadsheet), and an array of values for each channel to be loaded. The time values in the time array are in seconds since the Unix epoch (also known as Unix time). Most mass spec files record the sample start time and date in the headers (the “timestamp”), and then record the time offset for each measurement. We need to convert the timestamp into seconds since the epoch, and then add this to the offset seconds.
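As a rough sketch of that arithmetic, here it is outside iolite using only the Python standard library. The start time is the one from the example file; the offsets are made-up values for illustration only.

```python
from datetime import datetime

# Acquisition start time from the example file header. It is a naive datetime,
# so it is interpreted in the local time zone of the computer running the script.
start_time = datetime(2017, 2, 15, 17, 50, 50)

# Seconds since the Unix epoch for the start of the acquisition.
start_time_in_s = start_time.timestamp()

# Hypothetical per-measurement offsets in seconds, as found in a time column.
offsets_in_s = [0.0, 0.5, 1.0, 1.5]

# Absolute time of each measurement = start time + offset.
time_array = [start_time_in_s + offset for offset in offsets_in_s]

for t in time_array:
    print(t, datetime.fromtimestamp(t))
```

Each value in time_array is what a time array for iolite would hold: an absolute time in seconds rather than an offset from the start of the file.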

The importer should also create a File object (so that a File appears in the Files Browser), and it’s also handy to create a Sample from the file (which will appear in the Samples Browser). A good importer also lets the user know what’s happening with progress updates, and error messages if something goes wrong. We’ll include those in our importer too. You can also include debug statements (messages that help you, the programmer, work out where and when things go wrong).

An importer has three required functions:

  • accepted_files(): lets iolite know what file extensions this importer accepts (e.g. “.csv”)
  • correct_format(): checks a given file to make sure that the importer can load the file (usually by looking for a key piece of text within the file)
  • import_data(): the function that actually imports the data and adds it to your Session

You can also have any number of your own functions, but the three above are the minimum required.

iolite v4 comes with many site packages for python that you can use in your scripts. For this very simple importer, we’re just going to use Pandas, Python regular expressions and Python DateTimes.

There is also some metadata at the top of each iolite python script. This metadata tells iolite what the module is called, what type of script it is etc. Now let’s get started writing some code. I suggest opening a new file in your favourite text editor (I use the free Atom text editor, but NotePad, TextEdit, etc are all ok too).

Let’s start with our script metadata. Copy and paste this into your new file:

# A python-based importer for iolite 4 starts with some metadata
#/ Name: example agilent importer
#/ Type: Importer
#/ Authors: Joe Petrus and Bence Paul
#/ Description: An importer for loading Agilent .csv files
#/ References: None
#/ Version: 1.0
#/ Contact: support@iolite-software.com

You can put whatever values you like into these lines, as long as you keep the field names the same (e.g. Name: is a field).

Now we’ll import pandas, python regular expressions, and python datetimes:

import pandas as pd
import re
from datetime import datetime

accepted_files() function

We will start with our first function accepted_files(). Our importer will open .csv files, so this function just needs to return “.csv”:

def accepted_files():
    return ".csv"

correct_format() function

To determine if a file can be opened by our importer, we need to define the correct_format() function. This function returns True if the file meets our conditions, otherwise it returns False. Our function will check to make sure that the file ends with .csv, and then it will read through the file looking for the words “Intensity Vs Time,CPS” which is usually the second line in an Agilent csv file. We’ve also included four debug statements to let you know that iolite checked the file:

IoLog.debug("Checking within .csv file...")

and whether the “Intensity Vs Time,CPS” line was found:

IoLog.debug(".csv file matched the Agilent regex.")

We could probably remove these lines once we’re happy the function is working. Our function will check that the file name ends in .csv, and then it will use a regular expression r”Intensity Vs Time,CPS” to look for that text within the file. If it finds that text, it returns True to show that this is a file that the importer can load, or if it can’t find that text, it returns False:

def correct_format():

    IoLog.debug("correct_format for example Agilent importer called on file = %s"%(importer.fileName))

    if importer.fileName.endswith('.csv'):
        IoLog.debug("Checking within .csv file...")
        agilent_regex = r"Intensity Vs Time,CPS"
        with open(importer.fileName) as f:
            match = re.search(agilent_regex, f.read(), re.MULTILINE)
            if match:
                IoLog.debug(".csv file matched the Agilent regex.")
                return True
            else:
                IoLog.debug(".csv file did not match the Agilent regex.")

    return False
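If you would like to try the same check outside iolite first, here is a minimal sketch that takes the file name as an argument instead of reading importer.fileName. The function name looks_like_agilent_csv and the two small test files are made up for this illustration; they are not part of the iolite API.

```python
import os
import re
import tempfile

AGILENT_REGEX = r"Intensity Vs Time,CPS"

def looks_like_agilent_csv(file_name):
    """Return True if file_name ends in .csv and contains the Agilent marker text."""
    if not file_name.endswith(".csv"):
        return False
    with open(file_name) as f:
        return re.search(AGILENT_REGEX, f.read()) is not None

# Write two small, made-up files into a temporary folder and check them.
with tempfile.TemporaryDirectory() as folder:
    good = os.path.join(folder, "good.csv")
    bad = os.path.join(folder, "bad.csv")
    with open(good, "w") as f:
        f.write("example.d\nIntensity Vs Time,CPS\n")
    with open(bad, "w") as f:
        f.write("some,other,csv\n1,2,3\n")
    good_result = looks_like_agilent_csv(good)
    bad_result = looks_like_agilent_csv(bad)

print(good_result, bad_result)
```

Testing the logic this way lets you confirm the regular expression against your own files before involving iolite at all.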

Now might be a good time to save your file. You can call it anything you like, as long as the file extension is .py. We’ll move the file location later on, but this save is just to make sure we don’t lose what we’ve done so far.

import_data() function

Now that we have the two easy functions out of the way (accepted_files() and correct_format()), we can make a start on import_data(). The importer object has a few members that we’ll use: fileName, which holds the full path of the file the importer is trying to load; finished(), to let iolite know that the importer has finished; progress(100), to show how much progress we’ve made (between 0 and 100) in the progress indicator that appears while the script is running; and message(“my message here”), to show a message in the progress indicator while the script is running. The code examples show how each of these is used.

There are also calls to iolite’s main Data Manager using data. In this example we use:

data.addDataToInput()
data.calculateTotalBeam()
data.createFileSampleMetadata()
data.createImportedFileMetadata()

You can get more info about each of these functions (and others) in the Python API.

To start off with, we’ll print a debug message so that we can be sure this function was called on our file:

def import_data():
  IoLog.debug("import_data in example Agilent importer called on file = %s"%(importer.fileName))

Next we’ll find the timestamp. In this example, we’re assuming the format of the timestamp is always fixed. In our case the line containing our timestamp looks like this:

Acquired : 15/02/2017 5:50:50 PM using Batch EMS_20170215_test.b

The timestamp has the format dd/MM/yyyy h:mm:ss followed by AM or PM, which in Python’s strptime notation is %d/%m/%Y %I:%M:%S %p. Note that the hour is represented by %I because it is on a 12-hour clock (%p matches the AM/PM marker); strptime accepts the hour with or without a leading zero. There is more info about the format of python timestamps here: Python strptime . We extract the timestamp with a very simple regular expression that captures everything between “Acquired” and “using” :

time_regex = r"Acquired\W+(.*)\W+using"

For your own regular expressions, I recommend using a real-time helper, such as Regex101.com. Here is what the code that finds our timestamp looks like, along with the code for what happens if it doesn’t find a match:

  #find the date and time using a regular expression:
  time_regex = r"Acquired\W+(.*)\W+using"
  with open(importer.fileName) as f:
      file_contents = f.read()
      match_time = re.search(time_regex, file_contents, re.MULTILINE)

  if not match_time:
      IoLog.error("Couldn't find a match for the date time stamp")
      importer.message('Error during import')
      importer.progress(100)
      importer.finished()
      return

If we don’t find a match for our regular expression, we log an error message to the Messages View of iolite, and we also add a message to the Progress Indicator to let the user know (without having to check the Messages View) that something went wrong during our import. Finally, we let iolite know that the importer is finished with importer.finished() and then return (because we can’t do anything more without the timestamp).

Now we can convert our timestamp to a python datetime object. We’ll include a debug statement to check that we captured the timestamp correctly, and another debug statement to confirm that we converted the timestamp correctly. We’ll also record the number of seconds since Epoch so that we can add it to the seconds elapsed in our file later.

IoLog.debug(f"Here is our timestamp: {match_time.group(1)}")
start_time = datetime.strptime(match_time.group(1), '%d/%m/%Y %I:%M:%S %p')
start_time_in_s = start_time.timestamp()
IoLog.debug("Start time is: " + start_time.strftime('%Y-%m-%d %H:%M:%S'))
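To see what the regular expression captures and what strptime returns, here is a small standalone sketch that runs the same two steps on the header line from the example file, without any iolite calls. The variable names line and timestamp_text are just for this illustration.

```python
import re
from datetime import datetime

# The header line from the example file that contains the timestamp.
line = "Acquired : 15/02/2017 5:50:50 PM using Batch EMS_20170215_test.b"

# Capture everything between "Acquired" and "using".
time_regex = r"Acquired\W+(.*)\W+using"
match_time = re.search(time_regex, line)
timestamp_text = match_time.group(1)

# Convert the captured text to a datetime object (12-hour clock with AM/PM).
start_time = datetime.strptime(timestamp_text, "%d/%m/%Y %I:%M:%S %p")

print(timestamp_text)
print(start_time.isoformat())
```

If your own files use a different timestamp layout, this is a convenient place to experiment with the regular expression and the format string until both agree.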

Now we’ll use Pandas to read the csv file. If your file is of a different type and Pandas is not appropriate, there are many other python packages for reading files. In this case, we’re using Pandas’ read_csv() which you can read more about here:

Pandas read_csv()

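In our case, we’re skipping the first 3 lines because they are header data, and then we’re reading the column names from line 4 (header=0 refers to the first line that is not skipped). We’re also skipping the last line at the bottom of the file with skipfooter=1; skipfooter is only supported by Pandas’ Python parsing engine, which is why we pass engine='python'. This will create a DataFrame that we can then import into iolite.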

df = pd.read_csv(importer.fileName, skiprows=3, skipfooter=1, header=0, engine='python')

The columns in our DataFrame will look like this:

[Figure: example of the DataFrame columns]

The first column, called ‘Time [Sec]’, is the seconds elapsed column. We will add our start time in seconds to this column:

#add start time in seconds to time column:
df['Time [Sec]'] = df['Time [Sec]'].add(start_time_in_s)
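The reading and time-shifting steps can be tried on their own with a small in-memory file. This is only a sketch: the file contents below (including the channel names Si29 and Ca43 and the footer line) are made up to mimic the layout described above, and io.StringIO stands in for the real file on disk.

```python
import io
from datetime import datetime

import pandas as pd

# Made-up contents in the same layout as the example file: three header
# lines, one line of column names, data rows, and one footer line.
lines = [
    "example.d",
    "Intensity Vs Time,CPS",
    "Acquired : 15/02/2017 5:50:50 PM using Batch EMS_20170215_test.b",
    "Time [Sec],Si29,Ca43",
    "0.5,1000,2000",
    "1.0,1100,2100",
    "1.5,1200,2200",
    "Printed : 15/02/2017 6:00:00 PM",
]
file_like = io.StringIO("\n".join(lines))

# Same arguments as in the importer: skip 3 header lines and 1 footer line.
df = pd.read_csv(file_like, skiprows=3, skipfooter=1, header=0, engine="python")

# Shift the elapsed seconds so they become seconds since the Unix epoch.
start_time_in_s = datetime(2017, 2, 15, 17, 50, 50).timestamp()
df["Time [Sec]"] = df["Time [Sec]"].add(start_time_in_s)

print(list(df.columns))
print(len(df.index))
```

Printing the column names and row count like this is a quick way to confirm that skiprows and skipfooter are set correctly for your own files.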

Now we can add the data to iolite’s Data Manager:

for column in list(df.columns[1:]):
    data.addDataToInput(column, df['Time [Sec]'].values, df[column].values, {"machineName": "My Agilent"})

And we can finish the import by recalculating TotalBeam, which will add all our channels together to get a total signal, and it will also cause iolite to update the user interface. We’ll also create a Sample and a File object for this file (see the API for more details on these functions):

# Now calculate Total Beam:
data.calculateTotalBeam()

#Create a File and Sample object for this file to show in the Files Browser and Samples Browser
data.createFileSampleMetadata("sample_name", start_time, datetime.fromtimestamp(df['Time [Sec]'].iloc[-1]), importer.fileName)
data.createImportedFileMetadata(start_time, datetime.fromtimestamp(df['Time [Sec]'].iloc[-1]), importer.fileName, datetime.now(), len(df.index), list(df.columns[1:]))

And to finish, we’ll let the user know that the import was successful:

importer.message('Finished')
importer.progress(100)
importer.finished()

And our importer is now finished. Here is the importer in its entirety:

# A python-based importer for iolite 4 starts with some metadata
#/ Name: example agilent importer
#/ Type: Importer
#/ Authors: Joe Petrus and Bence Paul
#/ Description: An importer for loading Agilent .csv files
#/ References: None
#/ Version: 1.0
#/ Contact: support@iolite-software.com

"""
see intro.py for explanation of functions
"""

import pandas as pd
import re
from datetime import datetime

def accepted_files():
    return ".csv"

def correct_format():

    IoLog.debug("correct_format for example Agilent importer called on file = %s"%(importer.fileName))

    if importer.fileName.endswith('.csv'):
        IoLog.debug("Checking within .csv file...")
        agilent_regex = r"Intensity Vs Time,CPS"
        with open(importer.fileName) as f:
            match = re.search(agilent_regex, f.read(), re.MULTILINE)
            if match:
                IoLog.debug(".csv file matched the Agilent regex.")
                return True
            else:
                IoLog.debug(".csv file did not match the Agilent regex.")

    return False


def import_data():

    IoLog.debug("import_data in example Agilent importer called on file = %s"%(importer.fileName))

    #find the date and time using a regular expression:
    time_regex = r"Acquired\W+(.*)\W+using"
    with open(importer.fileName) as f:
        file_contents = f.read()
        match_time = re.search(time_regex, file_contents, re.MULTILINE)

    if not match_time:
        IoLog.error("Couldn't find a match for the date time stamp")
        importer.message('Error during import')
        importer.progress(100)
        importer.finished()
        return

    IoLog.debug(f"Here is our timestamp: {match_time.group(1)}")
    start_time = datetime.strptime(match_time.group(1), '%d/%m/%Y %I:%M:%S %p')
    start_time_in_s = start_time.timestamp()
    IoLog.debug("Start time is: " + start_time.strftime('%Y-%m-%d %H:%M:%S'))

    df = pd.read_csv(importer.fileName, skiprows=3, skipfooter=1, header=0, engine='python')

    #add start time in seconds to time column:
    df['Time [Sec]'] = df['Time [Sec]'].add(start_time_in_s)

    for column in list(df.columns[1:]):
        data.addDataToInput(column, df['Time [Sec]'].values, df[column].values, {"machineName": "My Agilent"})

    # Now calculate Total Beam:
    data.calculateTotalBeam()

    #Create a File and Sample object for this file to show in the Files Browser and Samples Browser
    data.createFileSampleMetadata("sample_name", start_time, datetime.fromtimestamp(df['Time [Sec]'].iloc[-1]), importer.fileName)
    data.createImportedFileMetadata(start_time, datetime.fromtimestamp(df['Time [Sec]'].iloc[-1]), importer.fileName, datetime.now(), len(df.index), list(df.columns[1:]))

    importer.message('Finished')
    importer.progress(100)
    importer.finished()

We need to put the script in a place where iolite can find it. In iolite’s Preferences, in the Paths section, you can set the folder that iolite looks for import scripts in (the “Importers” field). Click on the folder icon to select the folder that your script is in.

Now we need to go to iolite’s plugin manager and disable the built-in Agilent importer, because it will try to import the file instead of our custom importer. Go to the Tools menu, and select Plugins -> Manage. Search for the Agilent importer using the Search Field in the top left. Select the AgilentImporter and click on the Disable button where it shows the Status of the plugin. You will need to restart iolite now.

Once iolite restarts, go back to the Plugins Manager (Tools -> Plugins -> Manage) and check that the builtin Agilent importer is disabled. Now, search for your custom Agilent importer (if you kept the default name above, it should be called “example agilent importer”). The Search Field is case sensitive, so if you type “example” or “agilent” it should come up (typing “Agilent” won’t produce a match because of the capital “A”). If your importer is not in the list, check that you have specified the correct folder in Preferences.

If your importer is listed in the Plugins Manager, you can now give it a try. Click on the Import button in the Files Browser, and select the example file you downloaded at the beginning of this tutorial. It should load, and you should now be able to see the file in the Files Browser, the Sample in the Samples Browser, and your data in the Time Series View.