Introduction

Process your raw data, without cooking it!

Sushi Fabric is defined as
  • Sequence analysis of Useless Stuff with Highly Integrated Fabric
    or
  • Super Useful System for HIgh-throughput data Fabric
    or
  • Super Ultra Special Hyper Incredible Fabric

In other words, it has not been decided yet. Please send an email to if you have a better definition.

Modules

There are three modules:
  1. workflow_manager (gem): manages jobs and the cluster environment, i.e. submits jobs and checks running jobs
  2. sushi_fabric (gem): the superclass for specific applications; provides primitive functions that communicate with workflow_manager via dRuby
  3. SushiFabric (Ruby on Rails): provides the GUI and calls sushi_fabric functions
TODO
  • UML diagrams
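
To illustrate the dRuby link between sushi_fabric and workflow_manager, here is a minimal, self-contained sketch using Ruby's standard drb library. The front object ToyWorkflowManager and its methods submit and job_list are hypothetical stand-ins, not workflow_manager's real interface. Server and client run in one process here only so that the example is self-contained; in SushiFabric they are separate processes, talking by default over druby://localhost:12345.

```ruby
require 'drb/drb'

# Hypothetical front object standing in for the workflow_manager server.
# The real workflow_manager exposes its own methods; these are illustrative only.
class ToyWorkflowManager
  def initialize
    @jobs = {}
    @next_id = 1
  end

  # Pretend to submit a script and return a new job id.
  def submit(script_name)
    job_id = @next_id
    @next_id += 1
    @jobs[job_id] = script_name
    job_id
  end

  # Ids of all jobs submitted so far.
  def job_list
    @jobs.keys
  end
end

# Server side: publish the front object on a dRuby URI.
# Port 0 lets the OS pick a free port (the real default is druby://localhost:12345).
DRb.start_service('druby://localhost:0', ToyWorkflowManager.new)
server_uri = DRb.uri

# Client side (normally another process, e.g. sushi_fabric): get a remote reference.
manager = DRbObject.new_with_uri(server_uri)
job_id = manager.submit('job_script.sh')
puts "submitted job #{job_id}; jobs on server: #{manager.job_list.inspect}"

DRb.stop_service
```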

Dependencies

  • Ruby 1.9.3
  • Ruby on Rails (~> 3.2.9)
  • libsqlite3-dev (to build the gem)
  • libxml2-dev (to build a gem)
  • libxslt1-dev (to build a gem)
  • node.js (installed from source, http://nodejs.org/)
Note
  • Ruby 1.8.7 does not work
  • Ruby 2.1 has not been tested yet
  • For details of the gem libraries, refer to Gemfile.lock
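
Before installing, you can check the running interpreter against the 1.9.3 minimum from Ruby itself. This is a small sketch using RubyGems' Gem::Version; the helper supported_ruby? is hypothetical, and it only checks the lower bound, so newer versions such as 2.1 pass even though they have not been tested.

```ruby
require 'rubygems' # provides Gem::Version (already loaded on Ruby >= 1.9)

# Minimum Ruby version from the dependency list above.
MIN_RUBY_VERSION = Gem::Version.new('1.9.3')

# True if the given version string is at least the minimum (no upper bound).
def supported_ruby?(version = RUBY_VERSION)
  Gem::Version.new(version) >= MIN_RUBY_VERSION
end

puts "Ruby #{RUBY_VERSION}: #{supported_ruby? ? 'meets the 1.9.3 minimum' : 'too old (1.8.7 does not work)'}"
```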

Installation

Steps

  1. (Install Ruby)
  2. Install bundler (gem)
  3. Install workflow_manager (gem)
  4. Install sushi_fabric (gem)
  5. Download SushiFabric.tgz (Ruby on Rails application)
  6. Install required libraries (bundle install)
  7. Setup database (rake db:migrate)
  8. Run (rails server)

Install Ruby 1.9

Check whether Ruby 1.9.3 is installed on your computer:

$ ruby -v

If Ruby is not installed, or its version is older than 1.9, download and install Ruby from

Apple: building Ruby requires Xcode, or at least its command-line tools, available here: https://developer.apple.com/downloads/

Typical commands to install Ruby

$ wget http://cache.ruby-lang.org/pub/ruby/1.9/ruby-1.9.3-p484.tar.gz
$ tar zxvf ruby-1.9.3-p484.tar.gz
$ cd ruby-1.9.3-p484
$ ./configure
$ make
$ sudo make install

Note
  • By default, the Ruby interpreter is installed in /usr/local/bin
  • To install it into a different directory, pass the --prefix option to ./configure as follows (sudo is not needed if you own the target directory)
$ ./configure --prefix=$HOME/bin/ruby-1.9.3
$ make
$ make install

Install bundler

$ gem install bundler
Note
  • gem is the Ruby library manager; it downloads libraries from http://rubygems.org/ and installs them
  • Bundler (the bundle command) manages the specific library versions that an application requires
  • If Ruby is installed in a system directory, every gem installation may need sudo permissions. To avoid this, set the GEM_HOME environment variable to a directory you own (see below).

The gem installation directory is set with the GEM_HOME environment variable, for example (the PATH line makes commands installed by gems, such as workflow_manager, available in your shell):

$ export GEM_HOME=$HOME/.gems
$ export PATH=$GEM_HOME/bin:$PATH
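
To see where gem install will put libraries, and therefore whether you need sudo, you can ask RubyGems directly. A small sketch: Gem.dir honours GEM_HOME, and the nearest existing directory is checked for write permission because $HOME/.gems may not exist yet.

```ruby
require 'rubygems' # Gem is preloaded on Ruby >= 1.9; the require is harmless

# Directory that `gem install` writes to: $GEM_HOME if set, otherwise the default.
gem_dir = Gem.dir

# The directory may not exist yet (e.g. a fresh $HOME/.gems), so test the
# nearest existing ancestor for write permission.
probe = gem_dir
probe = File.dirname(probe) until File.exist?(probe)
needs_sudo = !File.writable?(probe)

puts "GEM_HOME   : #{ENV['GEM_HOME'].inspect}"
puts "install dir: #{gem_dir}"
puts needs_sudo ? 'gem install will need sudo (or set GEM_HOME)' : 'gem install works without sudo'
```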

Install workflow_manager

$ gem install workflow_manager

Run

$ workflow_manager
mode = development
druby://localhost:12345

Note
  • Ignore the warning:
    /usr/local/lib/ruby/1.9.1/yaml.rb:84:in `<top (required)>':
    It seems your ruby installation is missing psych (for YAML output).
    To eliminate this warning, please install libyaml and reinstall your ruby.
    
Install sushi_fabric (core code of SushiFabric)

$ gem install sushi_fabric

That's it.

Download SushiFabric (Ruby on Rails code)

$ wget http://fgcz-sushi.uzh.ch//SushiFabric_20131201.tgz

Extract the archive and install the required libraries:

$ tar zxvf SushiFabric_20131201.tgz
$ cd SushiFabric
$ bundle install

Set up database for Ruby on Rails

Go to the SushiFabric directory, then

$ bundle exec rake db:migrate

Start Ruby on Rails server

$ rails server

Note
  • Then open http://localhost:3000 in your browser.
  • If the SushiFabric top menu appears, the installation succeeded.
  • To stop the server, press Ctrl + C.

Configuration

There are two configuration files: one for workflow_manager and one for sushi_fabric.

workflow_manager configuration file
  • config/environments/development.rb
#!/usr/bin/env ruby
# encoding: utf-8

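WorkflowManager::Server.configure do |config|
  config.log_dir = 'logs'   # job scripts and their logs are saved here
  config.db_dir = 'dbs'     # Kyoto Cabinet *.kch status databases
  config.interval = 30      # job-monitoring interval in seconds
  config.resubmit = 0       # maximum number of automatic resubmissions
  # cluster backend; a custom class must implement the methods of the Cluster Class below
  config.cluster = WorkflowManager::LocalComputer.new('local_computer')
end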
Note
  • A new cluster class should inherit from WorkflowManager::Cluster and define (override) the following four methods.
  • For more details, refer to $GEM_HOME/**/workflow_manager/lib/workflow_manager/cluster.rb

Cluster Class

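  class Cluster
    # Submit script_file (whose content is script_content) to the cluster;
    # option carries additional submission options.
    def submit_job(script_file, script_content, option='')
    end
    # Return true while the job with job_id is still running.
    def job_running?(job_id)
    end
    # Return true if the job reached the end of its script, judged from log_file.
    def job_ends?(log_file)
    end
    # Return the command(s) that copy org_dir into dest_parent_dir.
    def copy_commands(org_dir, dest_parent_dir)
    end
  end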
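
The four methods can be illustrated with a toy backend that runs job scripts as local background processes. This is a sketch under several assumptions: the stand-in WorkflowManager::Cluster base class replaces the gem's real one so the example runs on its own, ToyLocalCluster and its end-of-job marker are hypothetical, and the return conventions (the PID as job id, an array of shell commands from copy_commands) are guesses rather than the gem's documented contract. Check cluster.rb for the actual expectations.

```ruby
require 'shellwords'
require 'tmpdir'

# Stand-in for the gem's base class so this sketch runs without workflow_manager.
# With the gem loaded, inherit from the real WorkflowManager::Cluster instead.
module WorkflowManager
  class Cluster
    attr_reader :name

    def initialize(name)
      @name = name
    end
  end
end

# Hypothetical cluster backend: runs each job script with sh in the background.
class ToyLocalCluster < WorkflowManager::Cluster
  # Assumed convention: echoed to the log only if the script exits successfully.
  END_MARKER = '__TOY_JOB_END__'

  def initialize(name)
    super
    @jobs = {} # job_id => watcher thread from Process.detach
  end

  # Writes the script, starts it, and returns the PID (as a String) as the job id.
  # `option` would carry scheduler flags (e.g. for qsub); this sketch ignores it.
  def submit_job(script_file, script_content, option = '')
    File.write(script_file, script_content)
    pid = Process.spawn('sh', '-c', "sh \"$0\" && echo #{END_MARKER}", script_file,
                        out: "#{script_file}.log", err: [:child, :out])
    job_id = pid.to_s
    @jobs[job_id] = Process.detach(pid)
    job_id
  end

  # True while the background process is still alive.
  def job_running?(job_id)
    watcher = @jobs[job_id]
    !watcher.nil? && watcher.alive?
  end

  # True if the log shows that the script reached its end without failing.
  def job_ends?(log_file)
    File.exist?(log_file) && File.read(log_file).include?(END_MARKER)
  end

  # Assumed to return shell command(s) that copy org_dir into dest_parent_dir.
  def copy_commands(org_dir, dest_parent_dir)
    ["cp -r #{Shellwords.escape(org_dir)} #{Shellwords.escape(dest_parent_dir)}"]
  end
end

# Usage: run a one-line script and wait for it to finish.
Dir.mktmpdir do |dir|
  cluster = ToyLocalCluster.new('toy_local')
  script = File.join(dir, 'hello.sh')
  job_id = cluster.submit_job(script, "echo hello\n")
  sleep 0.1 while cluster.job_running?(job_id)
  puts "job #{job_id} ended: #{cluster.job_ends?("#{script}.log")}"
end
```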

Default sushi_fabric parameters (config/environments/development.rb)

SushiFabric::Application.configure do
  # sushi_fabric
  config.workflow_manager = 'druby://localhost:12345'
  config.gstore_dir = File.join(Dir.pwd, 'public/gstore/projects')
  config.sushi_app_dir = Dir.pwd
  config.scratch_dir = '/tmp/scratch'
end

Note
  • To change these parameters, edit config/environments/development.rb of SushiFabric as shown above.
  • The workflow_manager and sushi_fabric configurations are completely independent.
  • A Ruby on Rails application usually already has a configuration file with this name (used in development mode); you can add the four parameters above to that file.

Usage

Top Menu

[Figure: top menu screenshot, img/wiki_up/top_menu.png]

  • Project: select a project you belong to
  • DataSet: view DataSets
  • Import DataSet: import a DataSet from a .tsv file
  • Run Application: submit a job
  • Check Jobs: show the job list
  • gStore: browse the gStore directory
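
The .tsv file read by Import DataSet can be sketched as a header line followed by one tab-separated line per sample. The helper read_dataset_tsv and this exact layout are assumptions (the DataSet format section below is still TODO); the column names and file paths are borrowed from the test_run example later on this page.

```ruby
# Hypothetical reader: one header line, then one tab-separated line per sample.
# Returns an Array of Hashes mapping column name => value.
def read_dataset_tsv(text)
  lines = text.lines.map(&:chomp).reject(&:empty?)
  return [] if lines.empty?
  header = lines.shift.split("\t")
  # split with -1 keeps trailing empty fields, so every column gets a value
  lines.map { |line| header.zip(line.split("\t", -1)).to_h }
end

tsv = "Name\tRead1 [File]\n" \
      "sample1\tp1001/data/short-ama_E1_R1.fastq.gz\n" \
      "sample2\tp1001/data/short-ama2_R1.fastq.gz\n"

read_dataset_tsv(tsv).each { |row| p row }
```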

Run Application

First, select an input DataSet and an application.
Next, set the SGE job parameters and the application parameters.
Then click the Submit button; the job is submitted to SGE via workflow_manager.

After the job has run, the following are generated automatically in a directory under the gStore directory:
  1. A new DataSet
  2. The parameter set
  3. Logs

The new DataSet can be used as input for another application.

Job monitoring

All submitted jobs are listed with their status: success, failure, or resubmitted.
Submission and finish times are also shown.

Command interface

All sushi functions are also available on the command line.

Workflow manager

Abstract

  • A console-based client-server application
  • For each job submission (g-sub, qsub), a new thread is created that checks the job every 30 seconds (the default; you can change it via the constant INTERVAL = 30 in the source code).
  • If a job fails (i.e. it does not reach the end of its script), it is resubmitted automatically, at most 2 times (this can also be changed in the source code: RESUBMIT = 2).
  • It logs when each job starts and ends and who submitted it, as well as the submitted scripts and their standard output and standard error.
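
The monitoring and resubmission behaviour described above can be sketched as one thread per submitted job. This is an illustration, not workflow_manager's code: the lambdas submit, running and succeeded are hypothetical hooks standing in for the cluster calls, and interval and max_resubmit play the roles of INTERVAL and RESUBMIT (config.interval and config.resubmit in the configuration file).

```ruby
# Hypothetical monitor: one thread per submission, polling every `interval` seconds.
# A job that finished without reaching the end of its script is resubmitted,
# at most `max_resubmit` times. The thread's value is [status, job_ids].
def monitor_job(submit:, running:, succeeded:, interval: 30, max_resubmit: 2)
  Thread.new do
    job_id = submit.call
    history = [job_id]
    resubmits = 0
    loop do
      sleep interval
      next if running.call(job_id)                          # still running: keep polling
      break [:success, history] if succeeded.call(job_id)   # reached the end of the script
      break [:failure, history] if resubmits >= max_resubmit
      resubmits += 1
      job_id = submit.call                                  # resubmit the failed job
      history << job_id
    end
  end
end

# Usage with fake hooks: the first attempt fails, the second succeeds.
attempts = 0
submit    = -> { attempts += 1 }   # returns 1, 2, ... as fake job ids
running   = ->(_id) { false }      # fake jobs finish instantly
succeeded = ->(id) { id >= 2 }
status, jobs = monitor_job(submit: submit, running: running, succeeded: succeeded,
                           interval: 0.01).value
puts "#{status} after jobs #{jobs.inspect}" # => success after jobs [1, 2]
```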

Server (daemon)

$ ruby workflow_manager.rb -h
Version: 20130228-170033
Usage:
 workflow_manager.rb [druby://host:port]

  host: you should select the server where you start workflow_manager.rb
  port: whatever that is free
  default: druby://localhost:12345

Note
  • It automatically creates a 'logs' directory, where all scripts and logs are saved.
  • Some *.kch files are also created; these are binary Kyoto Cabinet database files.
  • The client commands (wfm_*) must use the same URI.
  • If the URI is already in use, the server does not start.
  • If workflow_manager stops for any reason while a job is being monitored, that job's status will never change, because its monitoring thread is gone.
  • At the moment, there is no command to assign a new monitoring thread to an already submitted job (this would be a future feature).
  • I want a function that sends an email when a monitored job finishes (this function will be added soon).
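
Because the server and the clients must agree on the URI, a quick check that something is listening on it can save some debugging. This is a hedged sketch: wfm_port_open? is a hypothetical helper that only tests the TCP port behind a druby:// URI; it does not confirm that the listener is actually workflow_manager.

```ruby
require 'socket'
require 'uri'

# True if a TCP connection to the host:port of a druby:// URI succeeds.
# Only the port is checked; the listener is not verified to be workflow_manager.
def wfm_port_open?(uri = 'druby://localhost:12345', timeout: 2)
  parsed = URI.parse(uri)
  Socket.tcp(parsed.host, parsed.port, connect_timeout: timeout) { true }
rescue SystemCallError, SocketError, IOError
  false
end

puts wfm_port_open? ? 'something is listening on the default URI' : 'nothing listening; start workflow_manager first'
```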

Example

start:
  workflow_manager
stop:
  Ctrl + C

Client (command)

wfm_monitoring job_script.sh
wfm_job_list
wfm_status job_id
wfm_get_log job_id
wfm_get_job_script job_id

Note
  • The client commands use the URI druby://localhost:12345 by default.
  • If you start the server with a different URI, you have to pass that URI to the client command as an argument, for example:
    wfm_job_list druby://hogehoge:777

  • Alternatively, if a .wfmrc file like the following exists in your home directory or in the current directory where you run the client command, both the server and the clients use the server URI from that file, so no URI argument is needed:
user: masa
server: druby://localhost:7777
  • wfm_monitoring simply passes the script to the g-sub command, and only .sh scripts (Bourne shell or bash) are supported (this will be updated).
  • Write all the options for g-sub (qsub) in your job script.
  • Only the -o and -e options in the script are ignored, because standard output and standard error are written to the 'logs' directory.
  • For private use, the important file is statuses.kch. To edit this file directly, use the kchashmgr command; an example is below. This command works without the workflow_manager server.
$ kchashmgr
kchashmgr: the command line utility of the file hash database of Kyoto Cabinet

usage:
  kchashmgr create [-otr] [-onl|-otl|-onr] [-apow num] [-fpow num] [-ts] [-tl] [-tc] [-bnum num] path
  kchashmgr inform [-onl|-otl|-onr] [-st] path
  kchashmgr set [-onl|-otl|-onr] [-add|-rep|-app|-inci|-incd] [-sx] path key value
  kchashmgr remove [-onl|-otl|-onr] [-sx] path key
  kchashmgr get [-onl|-otl|-onr] [-rm] [-sx] [-px] [-pz] path key
  kchashmgr list [-onl|-otl|-onr] [-max num] [-rm] [-sx] [-pv] [-px] path [key]
  kchashmgr clear [-onl|-otl|-onr] path
  kchashmgr import [-onl|-otl|-onr] [-sx] path [file]
  kchashmgr copy [-onl|-otl|-onr] path file
  kchashmgr dump [-onl|-otl|-onr] path [file]
  kchashmgr load [-otr] [-onl|-otl|-onr] path [file]
  kchashmgr defrag [-onl|-otl|-onr] path
  kchashmgr setbulk [-onl|-otl|-onr] [-sx] path key value ...
  kchashmgr removebulk [-onl|-otl|-onr] [-sx] path key ...
  kchashmgr getbulk [-onl|-otl|-onr] [-sx] [-px] path key ...
  kchashmgr check [-onl|-otl|-onr] path
$ kchashmgr inform statuses.kch 
count: 11
size: 6298656
$ kchashmgr list statuses.kch 
541726
541727
$ kchashmgr get statuses.kch 541727
resubmit: 541728,fail_sample.sh,2013-03-01 09:50:38,masa
$ kchashmgr remove statuses.kch 541727
$ kchashmgr list statuses.kch 
541726
$ kchashmgr get statuses.kch 541727
kchashmgr: DB::get failed: statuses.kch: 7: no record: no record
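
The URI lookup described in the notes above can be sketched as follows. The helper resolve_wfm_uri is hypothetical, and its precedence (a druby:// argument first, then .wfmrc in the current directory, then in the home directory, then the default) is an assumption; the notes do not say which .wfmrc wins if both exist. It reads .wfmrc as YAML, which matches the "key: value" format shown above.

```ruby
require 'yaml'

# Hypothetical helper: choose the workflow_manager URI for a client command.
# Assumed precedence: druby:// argument, ./.wfmrc, ~/.wfmrc, built-in default.
def resolve_wfm_uri(argv = ARGV, dirs: [Dir.pwd, Dir.home],
                    default: 'druby://localhost:12345')
  explicit = argv.find { |arg| arg.start_with?('druby://') }
  return explicit if explicit
  dirs.each do |dir|
    rc = File.join(dir, '.wfmrc')
    next unless File.file?(rc)
    conf = YAML.safe_load(File.read(rc)) # e.g. {"user"=>"masa", "server"=>"druby://..."}
    return conf['server'] if conf.is_a?(Hash) && conf['server']
  end
  default
end

puts resolve_wfm_uri
```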

Example

Server:

$ workflow_manager
druby://localhost:12345

Client:

How to implement a new sushi application (a subclass of SushiApp class)

TODO

General Info.

  • All applications (classes) inherit from the SushiApp class (lib/sushiApp.rb)
  • At the moment, you must create a sushi application class file manually and put it in the lib directory (Feature#2548 http://fgcz-track.uzh.ch/issues/2548 )
  • A new class file is loaded automatically and instantiated when the application is selected (no need to restart the sushi server), but after you update an existing Sushi Application file, you need to restart the server
  • I recommend creating a branch of SushiFabric in your local environment and testing your Sushi Application there before you commit the file
  • Many existing application files in the lib directory of the SushiFabric repository can serve as references
  • lib/WordCountApp.rb is the smallest application and a good first reference when you create your own Sushi Application

Template of Sushi Application

#!/usr/bin/env ruby
# encoding: utf-8

require 'sushiApp'

class YourSushiApplication < SushiApp
  def initialize
    super
    @name = 'Application_Name'      # No Space
    @analysis_category = 'Category' # No Space
    @required_columns = ['Name', 'Read1']
    @required_params = []
  end
  def next_dataset
    {'Name'=>@dataset['Name'],
     'Stats [File]'=>File.join(@result_dir, @dataset['Name'].to_s + '.stats')
    }
    # [File] tag value will be outputted in gstore
  end
  def preprocess
  end
  def commands
    'echo hoge'
  end
end
if __FILE__ == $0
  usecase = YourSushiApplication.new
  usecase.project = "p1001" 
  usecase.user = 'sushi_lover'
  usecase.dataset_tsv_file = 'sample_dataset.tsv'

  # run (submit to workflow_manager)
  #usecase.run
  usecase.test_run
end
Note
  • The important methods are initialize, next_dataset, and commands
  • The preprocess method is optional; it is executed before the commands method
  • The next_dataset method must return a Hash whose keys are column names and whose values are the column values
  • The commands method must return a String
  • The instance variables @name and @analysis_category are required and must not contain spaces
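
To see how the template's pieces fit together, here is a sketch of a word-count application that runs without SushiFabric. The SushiApp class in it is a minimal stand-in (the real lib/sushiApp.rb supplies @dataset, @result_dir, test_run, run and much more), its accessors exist only for this example, and ToyWordCountApp is hypothetical. Its command is modelled on the test_run output for WordCountApp.rb shown below, but the actual lib/WordCountApp.rb may differ (it also echoes factor information, for instance).

```ruby
# Minimal stand-in for SushiApp so the sketch runs on its own.
# The real base class lives in lib/sushiApp.rb and provides far more.
class SushiApp
  attr_accessor :dataset, :result_dir # accessors added only for this example
end

# Hypothetical application following the template above.
class ToyWordCountApp < SushiApp
  def initialize
    super
    @name = 'Word_Count'          # no spaces
    @analysis_category = 'Stats'  # no spaces
    @required_columns = ['Name', 'Read1']
    @required_params = []
  end

  # One row of the new DataSet: column name => value; [File] columns go to gStore.
  def next_dataset
    {'Name' => @dataset['Name'],
     'Stats [File]' => File.join(@result_dir, @dataset['Name'].to_s + '.stats')}
  end

  # Shell command for one sample, returned as a String.
  def commands
    "gunzip -c $GSTORE_DIR/#{@dataset['Read1 [File]']} |wc > #{@dataset['Name']}.stats"
  end
end

app = ToyWordCountApp.new
app.dataset = {'Name' => 'sample1', 'Read1 [File]' => 'p1001/data/short-ama_E1_R1.fastq.gz'}
app.result_dir = 'p1001/Word_Count_result' # hypothetical result directory
puts app.commands
p app.next_dataset
```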

Test Run

  • The SushiApp#test_run method helps you check whether your application is implemented correctly. Run your application file directly:
$ cd $RAILS_ROOT/lib
$ ruby -I. your_sushi_app.rb

Example: WordCountApp.rb

masaomi@fgcz-s-034:$RAILS_ROOT/lib
$ ruby -I. WordCountApp.rb
check project name: PASSED:
    @project=p1001
check user name: PASSED:
    @user=sushi_lover
check application name: PASSED:
    @name=Word_Count
check analysis_category: PASSED:
    @analysis_category=Stats
check dataset: PASSED:
    @dataset_hash.length = 2
check required columns: PASSED:
    required columns: ["Name", "Read1"]
    dataset  columns: ["Name", "Read1 [File]", "Read2 [File]", "Species", "Dummy [Factor]"]
check required parameters: PASSED:
    parameters: ["cores", "ram", "scratch", "node", "process_mode"]
    required  : []
check next dataset: PASSED:
check output files: PASSED:
check commands: PASSED:
generated command will be:
    gunzip -c $GSTORE_DIR/p1001/data/short-ama_E1_R1.fastq.gz |wc > sample1.stats
    echo 'Factor columns: [Dummy]'
    echo 'Factors: [{"Dummy"=>"hoge"},{"Dummy"=>"bar"}]'
PASSED:
generated command will be:
    gunzip -c $GSTORE_DIR/p1001/data/short-ama2_R1.fastq.gz |wc > sample2.stats
    echo 'Factor columns: [Dummy]'
    echo 'Factors: [{"Dummy"=>"hoge"},{"Dummy"=>"bar"}]'
check workflow manager: PASSED:
All checks PASSED
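
The "check required columns" step above passes even though the dataset header says "Read1 [File]" rather than "Read1". One plausible way to implement such a check, assumed here rather than taken from SushiApp, is to strip the bracketed tag from each header before comparing; strip_tag and missing_required_columns are hypothetical helpers.

```ruby
# Remove a trailing tag such as " [File]" or " [Factor]" from a column name.
def strip_tag(column)
  column.sub(/\s*\[[^\]]*\]\s*\z/, '')
end

# Required column names that are absent from the dataset (tags ignored).
def missing_required_columns(required, dataset_columns)
  available = dataset_columns.map { |c| strip_tag(c) }
  required - available
end

# Columns taken from the test_run output above.
dataset_columns = ['Name', 'Read1 [File]', 'Read2 [File]', 'Species', 'Dummy [Factor]']
missing = missing_required_columns(['Name', 'Read1'], dataset_columns)
puts missing.empty? ? 'check required columns: PASSED' : "missing columns: #{missing.inspect}"
```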

DataSet format

TODO