Project: Wunderlist/SQList

January 22, 2009

Change of project name and method:

It has been quite a while since I last decided to put something down about the projects I have been working on.  The Wunderlist, which was a java & SQL system, has been put on hold indefinately.  While it would have been easier to do the migration of our labs data to Biofortis’ Labmatrix system from a java base, it was impractical as an intermediate step.  As such, the goal of the project has been preserved, but the implementation has been changed.  The new project is SQList.

The SQList in the current incarnation is a mySQL database that is a copy of the data contained in the monsterous Excel spreadsheet, “The LIST.”  The data is migrated to the database by a C/C++ application titled, “wxSQList++.”  This system is currently used solely by myself, but will soon be migrated off my local machine to a space hosted by the NCIweb group.

The devil and some details of the mySQL schema that is used:

  • Schema’s name is currently “lab.”
    I intend on changing this to something such as TILLL (TIL Lab List), TheList, wList, or CG9.  It occurs to me that the naming really does not matter as any name choosen will not retain the original meaning beyond a year or two.  After that, the name will simply be a name.
  • The Schema tables are:
    norminv:  Listing of the well formed biomaterial entries.  This holds the majority of the data.
    incoming: A copy of the entries being held in the Incoming sheet of The List.  These entries are manually examine prior to being merged with the main body of the normal inventory.
    abinv: Listing of the entries from The List that could not be parsed well or attributed to a subject.
    patients:  Listing of our subjects.  Having this table removes the need to repeat this information for each biomaterial entry that is attributed to a subject.

    controlsubjects: Listing of subjects that have materials that have been selected to be used as controls for assays (ie: Coculture IFNγ Release ELISA)
  • Data Source:
    The wxSQList++ application is an interim tool that drops, creates, repopulates, and cleans all the tables in the schema from the information currently held in The List.  As such, the database is a snap shot of the inventory status and not a real time representation.

A bit about the wxSQList++ application:

This is a text only tool written in C\C++ that is designed for my use only.  I intend to replace it with a much more robust read and backup tool.  It’s purpose is to read the data out of The List excel spreadsheets and create a database representation of the information.

Using Java to get a string into BigDecimal format.

One of the difficulties of this C/Vial column is that there are two formats to be considered. The first describes the number of cells and looks like, “2.00E+6,” while the second describes a volume of serum per vial in the form like, “1.50 mL.”

//set up components
java.text.DecimalFormat decfmtCellConcen = new java.text.DecimalFormat(”###.00E+##”);
java.text.DecimalFromat decfmtSerumConcen = new java.text.DecimalFormat(”##.
decfmtCellConcen.setParseBigDecimal(true);
java.math.BigDecimal bigdecConcen;

//parse concentration column
try{

bigdecConcen= (BigDecimal) decfmtCellConcen .parse(spLine[12]);
}catch(Exception e){
try{
bigdecConcen = (BigDecimal) decfmtCellConcen.parss(spLine[12]);
}catch(Exception x){ myLogger.debug(x.toString()); }
}

Monkey wrenches in parsing compliments of TheLIST.

One of the best part of dealing with a bloated spreadsheet is that sometimes random crap can creep its way into columns it was never supposed to be in. An empty cell or a ‘?’ in a cell is understandable, but finding the word “blue” in the Concentration per Vial column just illustrates how screwed up the data can get. In order to get around this it seems like this string might have to be parsed twice, or simply handle the exception thrown when something unexpected shows up.

Need to get the date strings from The LIST spreadsheet into a SQL database

The object here is to return the long representation of the date such that it can be wrapped into a java.sql.Date constructor. So ideally, inside the logic of parsing the lines from The LIST we can simply do java.sql.Date( parseListDate( stringFromTheLIST ) ); and insert this into the sql database.

import java.text.SimpleDateFormat
import java.text.DateFormat
public long parseListDate( String dateStr ) Throws ParseException{
SimpleDateFormat simpDateFmt = new SimpleDateFormat("MM/dd/yyyy");
return( simpDateFmt.parse(dateStr).getTime());
}

Of course, the SQL database representation is simply an intermediate for cleaning the data prior to the final import into the proprietary labmatrix database. This will also provide a method for keeping track of which rows were unable to be imported for reasons of confusing, corrupted, or lacking data.

Projects List

May 14, 2008

  • WunderList

    The first object of this project is to Importing patient and biological sample data out of a massive and irregular spreadsheet.  Efficiently parsing 66k rows by 30 columns into a more representational model is a huge undertaking, because I will also have to provide an interface for data input and queries.

    The second sub project is to develop an efficient user interface(s) for daily data input and useful queries.  This needs to be intuitive to the users of the current system, so the useful parts of the look and feel of the old interface (Exhell) will likely be mimicked.

  • Predictive Artificial Neural Network

    Designing the middle layers of an ANN intelligently would be a futile task.  The greatest minds in immunology haven’t quite figured out which combinations of patient attributes will be predictive of response to treatment.  As such, the alternative to intelligent design is using an evolutionary process for arriving at a useful predictive model.

    This project will require methods for: collecting variables about patient & treatment, generating middle layer functions, generating weighting middle layer, model summation by thresholding, feedback to thresholding evaluation layer, model accuracy vs. training data.

  • Cryogenic Preservation of Human Lymphocytes

    Examination of various cryoprotectants and freezing processes on both transduced and untransduced lymphocytes from PBMC.  These experiments are relatively straight forward.  Count cells before preservation, and then afterwards.  However, there are many nuances to this simple method to consider:  counting method, number of counts, when to count, standards between experiments, analysis of data, and so on.