Introduction¶

Process your raw data, without cooking it!

Sushi Fabric is defined as

Sequence analysis of Useless Stuff with Highly Integrated Fabric
or
Super Useful System for HIgh-throughput data Fabric
or
Super Ultra Special Hyper Incredible Fabric

In other words, the name has not been decided yet. Please send an email to masaomi.hatakeyama@fgcz.uzh.ch if you have a better definition.

Modules¶

There are three modules:

workflow_manager (gem): manages jobs and the cluster environment; submits jobs and checks running jobs
sushi_fabric (gem): superclass for specific applications; provides primitive functions that communicate with workflow_manager via dRuby
SushiFabric (Ruby on Rails): provides the GUI and calls sushi_fabric functions
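sushi_fabric and workflow_manager talk to each other through dRuby (DRb), Ruby's standard distributed-object library. The following minimal sketch shows only that mechanism: one side publishes a Ruby object at a druby:// URI and the other side calls its methods through a proxy. ToyManager and job_list are hypothetical stand-ins, not the real workflow_manager API, and both sides run in one process here for brevity.

```ruby
require 'drb/drb'

# Hypothetical stand-in for the object a workflow_manager server exposes.
class ToyManager
  def initialize
    @jobs = { '541726' => 'success', '541727' => 'fail' }
  end

  # Returns "job_id,status" strings.
  def job_list
    @jobs.map { |id, status| "#{id},#{status}" }
  end
end

# Server side: publish the object. Port 0 lets the OS choose a free port;
# workflow_manager itself defaults to druby://localhost:12345.
DRb.start_service('druby://localhost:0', ToyManager.new)
server_uri = DRb.uri

# Client side: a proxy whose method calls are forwarded to the server.
manager = DRbObject.new_with_uri(server_uri)
jobs = manager.job_list
puts jobs

DRb.stop_service
```

In a real deployment the server and client are separate processes, and the client only needs to know the URI.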

TODO

UML diagrams

Dependencies¶

Ruby 1.9.3
Ruby on Rails (~> 3.2.9)
libsqlite3-dev (to build the gem)
libxml2-dev (to build a gem)
libxslt1-dev (to build a gem)
node.js (installed from source, http://nodejs.org/)

Note

Ruby 1.8.7 does not work
Ruby 2.1 has not been tested yet
For details of the gem library versions, refer to Gemfile.lock
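If you want a script to warn about unsuitable interpreters, a small check based on the notes above could look like this. It is only a sketch: treating 1.9.3 and later 1.9.x releases as supported, and everything from 2.0 on as untested, is an interpretation of the notes, not an official compatibility matrix.

```ruby
require 'rubygems'

# Classifies a Ruby version according to the dependency notes:
# 1.9.3 is required, 1.8.7 does not work, 2.x has not been tested.
def ruby_support_status(version = RUBY_VERSION)
  v = Gem::Version.new(version)
  if v < Gem::Version.new('1.9.3')
    :unsupported
  elsif v < Gem::Version.new('2.0')
    :supported
  else
    :untested
  end
end

puts "Ruby #{RUBY_VERSION}: #{ruby_support_status}"
```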

Installation¶

Steps¶

(Install Ruby)
Install bundler (gem)
Install workflow_manager (gem)
Install sushi_fabric (gem)
Download SushiFabric.tgz (Ruby on Rails application)
Install required libraries (bundle install)
Setup database (rake db:migrate)
Run (rails server)

Install Ruby 1.9¶

Check whether Ruby 1.9.3 is installed on your computer:

$ ruby -v

If Ruby is not installed, or the installed version is older than 1.9, download and install Ruby from

https://www.ruby-lang.org/en/downloads/

On a Mac, you need Xcode or at least the command-line tools available from https://developer.apple.com/downloads/

Typical commands to install Ruby

$ wget http://cache.ruby-lang.org/pub/ruby/1.9/ruby-1.9.3-p484.tar.gz
$ tar zxvf ruby-1.9.3-p484.tar.gz
$ cd ruby-1.9.3-p484
$ ./configure
$ make
$ sudo make install

Note

By default, the Ruby interpreter is installed in /usr/local/bin
To install it in a different directory, pass the --prefix option to ./configure as follows:

$ ./configure --prefix=$HOME/bin/ruby-1.9.3
$ make
$ make install

Install bundler¶

$ gem install bundler

Note

gem is the Ruby library manager; it downloads and installs libraries from http://rubygems.org/
bundler (the bundle command) manages the specific library versions an application requires
If Ruby is installed in a system directory, gem installations may need sudo permissions. To avoid this, you can change the gem directory by setting the GEM_HOME environment variable (see below).

The gem installation directory can be set with the GEM_HOME environment variable. Also add $GEM_HOME/bin to PATH so that installed commands such as workflow_manager can be found:

export GEM_HOME=$HOME/.gems
export PATH=$GEM_HOME/bin:$PATH
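To confirm where gem install will put libraries and commands after setting GEM_HOME, you can ask RubyGems itself. Gem.dir and Gem.bindir are standard RubyGems methods; the PATH check is an extra convenience added for this sketch.

```ruby
require 'rubygems'

# Where `gem install` puts libraries and executables for this Ruby.
# If GEM_HOME is set before Ruby starts, Gem.dir follows it.
puts "GEM_HOME   : #{ENV['GEM_HOME'].inspect}"
puts "Gem.dir    : #{Gem.dir}"
puts "Gem.bindir : #{Gem.bindir}"

# Installed commands (e.g. workflow_manager) are only found if the
# gem bin directory is on PATH.
path_dirs = ENV.fetch('PATH', '').split(File::PATH_SEPARATOR)
on_path = path_dirs.map { |d| File.expand_path(d) }.include?(File.expand_path(Gem.bindir))
puts "Gem.bindir on PATH: #{on_path}"
```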

Install workflow_manager¶

$ gem install workflow_manager

Run

$ workflow_manager
mode = development
druby://localhost:12345

Note

You can ignore the following warning:

/usr/local/lib/ruby/1.9.1/yaml.rb:84:in `<top (required)>':
It seems your ruby installation is missing psych (for YAML output).
To eliminate this warning, please install libyaml and reinstall your ruby.
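Before starting the Rails application, you may want to confirm that something is listening on the workflow_manager URI. The helper below is a hypothetical sketch (druby_listening? is not part of workflow_manager): it only opens a TCP connection to the host and port of the druby:// URI, so it cannot tell whether the listener really is workflow_manager.

```ruby
require 'socket'
require 'uri'

# Returns true if a TCP connection to the host/port of a druby:// URI succeeds.
def druby_listening?(druby_uri, timeout: 2)
  uri = URI.parse(druby_uri) # e.g. host "localhost", port 12345
  Socket.tcp(uri.host, uri.port, connect_timeout: timeout) { true }
rescue StandardError
  false # connection refused, unknown host, timeout, malformed URI, ...
end

uri = ARGV.first || 'druby://localhost:12345'
puts "#{uri}: #{druby_listening?(uri) ? 'reachable' : 'not reachable'}"
```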

Install sushi_fabric (core code of SushiFabric)¶

$ gem install sushi_fabric

That's it.

Download SushiFabric (Ruby on Rails code)¶

$ wget http://fgcz-sushi.uzh.ch//SushiFabric_20131201.tgz

Extract the archive and install the required gems:

$ tar zxvf SushiFabric_20131201.tgz
$ cd SushiFabric
$ bundle install

Set up database for Ruby on Rails¶

Go to the SushiFabric directory, then

$ bundle exec rake db:migrate

Start the Ruby on Rails server:

$ rails server

Note

Then open http://localhost:3000 in your browser.
If you can see the SushiFabric top menu, the installation succeeded.
To stop the server, press Ctrl+C.

Configuration¶

There are two configuration files, one for workflow_manager and one for sushi_fabric.

workflow_manager configuration file

config/environments/development.rb

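#!/usr/bin/env ruby
# encoding: utf-8

WorkflowManager::Server.configure do |config|
  config.log_dir = 'logs'   # job scripts and their logs are saved here
  config.db_dir = 'dbs'     # database (*.kch) files are saved here
  config.interval = 30      # seconds between job status checks
  config.resubmit = 0       # how many times a failed job is resubmitted
  # cluster class used to submit and check jobs (see Cluster Class below)
  config.cluster = WorkflowManager::LocalComputer.new('local_computer')
end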

Note

To support a new cluster, write a class that inherits from WorkflowManager::Cluster and defines (overrides) the following four methods.
For details, refer to $GEM_HOME/**/workflow_manager/lib/workflow_manager/cluster.rb

Cluster Class

  class Cluster
    def submit_job(script_file, script_content, option='')
    end
    def job_running?(job_id)
    end
    def job_ends?(log_file)
    end
    def copy_commands(org_dir, dest_parent_dir)
    end
  end
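As an illustration of the four methods, here is a hypothetical cluster class that runs each job as a background sh process on the local machine. It is a sketch under assumptions: the return values (a job ID and log file from submit_job, a list of shell commands from copy_commands) and the end-marker convention are invented for this example, and the stub base class only exists so the code runs without the gem. Check cluster.rb for what workflow_manager actually expects.

```ruby
require 'shellwords'

# Stub base class so this sketch runs without the workflow_manager gem.
# If the gem is loaded, the real WorkflowManager::Cluster is used instead.
module WorkflowManager
  unless const_defined?(:Cluster, false)
    class Cluster
      def initialize(name)
        @name = name
      end
    end
  end
end

# Hypothetical cluster: each job is a local background `sh` process.
class ToyShellCluster < WorkflowManager::Cluster
  END_MARKER = 'TOY_JOB_END' # hypothetical "script reached its end" marker

  # Writes the script plus an end marker, starts it, returns [job_id, log_file].
  # stdout and stderr both go to log_file; `option` is ignored in this sketch.
  def submit_job(script_file, script_content, option = '')
    File.write(script_file, "#{script_content}\necho #{END_MARKER}\n")
    log_file = "#{script_file}.log"
    pid = Process.spawn('sh', script_file, out: log_file, err: [:child, :out])
    Process.detach(pid) # reap the child so it does not linger as a zombie
    [pid.to_s, log_file]
  end

  # A job is running while its process still exists.
  def job_running?(job_id)
    Process.kill(0, Integer(job_id))
    true
  rescue Errno::ESRCH
    false
  rescue Errno::EPERM
    true # the process exists but belongs to another user
  end

  # The job ended normally if the end marker made it into the log.
  def job_ends?(log_file)
    File.exist?(log_file) && File.read(log_file).include?(END_MARKER)
  end

  # Shell command(s) that copy results into the destination directory.
  def copy_commands(org_dir, dest_parent_dir)
    ["cp -r #{Shellwords.escape(org_dir)} #{Shellwords.escape(dest_parent_dir)}"]
  end
end
```

The end marker follows the failure definition used by workflow_manager (a job that does not reach the end of its script is treated as failed), but the real classes may detect this differently.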

Default sushi_fabric parameters (config/environments/development.rb)

SushiFabric::Application.configure do
  # sushi_fabric
  config.workflow_manager = 'druby://localhost:12345'
  config.gstore_dir = File.join(Dir.pwd, 'public/gstore/projects')
  config.sushi_app_dir = Dir.pwd
  config.scratch_dir = '/tmp/scratch'
end

Note

To change these parameters, edit config/environments/development.rb as shown above.
The workflow_manager and sushi_fabric configurations are completely independent.
A Ruby on Rails application usually already has a configuration file with this name (used in development mode); add the four parameters above to that file.

Usage¶

Top Menu (screenshot: img/wiki_up/top_menu.png)

Project: select the project you belong to
DataSet: browse DataSets
Import DataSet: import a DataSet from a .tsv file
Run Application: submit a job
Check Jobs: show the job list
gStore: browse the gStore directory

Run Application¶

First, select an input DataSet and an Application.
Second, set the SGE job parameters and the application parameters.
Then click the submit button; the job is submitted to SGE via workflow_manager.

After the job has run, the following are generated automatically in a directory under the gStore directory:

A new DataSet
Parameter set
Logs

The new DataSet can be used as input for another application.

Job monitoring¶

All submitted jobs are listed with their status: success, failure, or re-submitted.
Submission and finish times are also shown.

Command interface¶

All sushi functions are also available from the command line.

Workflow manager¶

Abstract¶

A console-based client-server application
For each job submission (g-sub, qsub), a new thread is created that checks the job every 30 seconds (by default; you can change this with the constant INTERVAL = 30 in the source code).
If a job fails (i.e. it does not reach the end of its script), it is resubmitted automatically, at most 2 times (this can also be changed in the source code: RESUBMIT = 2).
It logs when each job starts and ends, who submitted it, the submitted script, and its standard output and error.
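The monitoring behaviour described above can be sketched as one thread per job that polls at a fixed interval and resubmits a failed job a limited number of times. The running, succeeded, and submit hooks are hypothetical placeholders for cluster-specific checks; this is not workflow_manager's own code.

```ruby
# One monitoring thread per job. `running`, `succeeded` and `submit` are
# hypothetical callables: running/succeeded take a job ID and return a
# boolean; submit takes the failed job ID and returns the new job ID.
def monitor_job(job_id, interval:, resubmit:, running:, succeeded:, submit:, log: [])
  Thread.new do
    attempts = 0
    loop do
      sleep interval while running.call(job_id) # poll until the job stops
      if succeeded.call(job_id)
        log << "#{job_id}: success"
        break
      elsif attempts < resubmit
        attempts += 1
        new_id = submit.call(job_id)
        log << "#{job_id}: resubmitted as #{new_id}"
        job_id = new_id
      else
        log << "#{job_id}: failure"
        break
      end
    end
    job_id # the thread's value is the last job ID that was monitored
  end
end

# Usage with fake hooks: j1 and j2 fail, j3 succeeds.
ids = %w[j1 j2 j3]
events = []
worker = monitor_job('j1', interval: 0.01, resubmit: 2,
                     running:   ->(_id) { false },
                     succeeded: ->(id) { id == 'j3' },
                     submit:    ->(old) { ids[ids.index(old) + 1] },
                     log: events)
final_id = worker.value # waits for the thread to finish
puts events
```

A thread per job keeps the logic simple, but, as noted below, the monitoring state is lost if the server process stops.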

Server (daemon)

$ ruby workflow_manager.rb -h
Version: 20130228-170033
Usage:
 workflow_manager.rb [druby://host:port]

  host: you should select the server where you start workflow_manager.rb
  port: whatever that is free
  default: druby://localhost:12345

Note

It automatically creates a 'logs' directory, where all scripts and logs are saved.
Some *.kch files are also created. These are binary Kyoto Cabinet database files.
The client commands (wfm_*) must use the same URI as the server.
If the URI is already in use by somebody else, the server does not start.
If workflow_manager stops for some reason while a job is being monitored, that job's status never changes again, because its monitoring thread is gone.
At the moment, there is no command to assign a monitoring thread to an already submitted job (this will be a future feature).
I want a function that sends an email when a monitored job finishes (this will be added soon).

Example

start:
  workflow_manager
kill:
  Ctrl+C

Client (command)

wfm_monitoring job_script.sh
wfm_job_list
wfm_status job_id
wfm_get_log job_id
wfm_get_job_script job_id

Note

By default, the client commands connect to druby://localhost:12345.
If you started the server with a different URI, pass that URI to the client command as an argument, for example:
```
wfm_job_list druby://hogehoge:777
```

However, if a .wfmrc file like the following exists in your home directory or in the directory where you run the client command, both the server and the client use the server URI from the file, and no URI argument is needed:

user: masa
server: druby://localhost:7777
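The URI lookup described above can be sketched as follows. resolve_wfm_uri is a hypothetical helper, not part of workflow_manager, and the order in which the current directory and the home directory are searched is an assumption, since the text does not specify it.

```ruby
require 'yaml'

DEFAULT_WFM_URI = 'druby://localhost:12345'

# Picks the server URI: an explicit argument wins, then the first .wfmrc
# found in `dirs` (current directory first, by assumption), then the default.
def resolve_wfm_uri(arg = nil, dirs: [Dir.pwd, Dir.home])
  return arg if arg
  dirs.each do |dir|
    path = File.join(dir, '.wfmrc')
    next unless File.file?(path)
    conf = YAML.safe_load(File.read(path)) # e.g. {"user"=>"masa", "server"=>"druby://localhost:7777"}
    return conf['server'] if conf.is_a?(Hash) && conf['server']
  end
  DEFAULT_WFM_URI
end

puts resolve_wfm_uri(ARGV.first)
```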

wfm_monitoring simply passes the script to the g-sub command, and only .sh scripts (Bourne shell or bash) are supported (this will be updated).
Write all g-sub (qsub) options in your job script.
Only the -o and -e options in the script are ignored, because standard output and errors are written to the 'logs' directory.
For private use, the important file is statuses.kch. To edit this file directly, you can use the kchashmgr command, as in the example below. This command works without a running workflow_manager server.

$ kchashmgr
kchashmgr: the command line utility of the file hash database of Kyoto Cabinet

usage:
  kchashmgr create [-otr] [-onl|-otl|-onr] [-apow num] [-fpow num] [-ts] [-tl] [-tc] [-bnum num] path
  kchashmgr inform [-onl|-otl|-onr] [-st] path
  kchashmgr set [-onl|-otl|-onr] [-add|-rep|-app|-inci|-incd] [-sx] path key value
  kchashmgr remove [-onl|-otl|-onr] [-sx] path key
  kchashmgr get [-onl|-otl|-onr] [-rm] [-sx] [-px] [-pz] path key
  kchashmgr list [-onl|-otl|-onr] [-max num] [-rm] [-sx] [-pv] [-px] path [key]
  kchashmgr clear [-onl|-otl|-onr] path
  kchashmgr import [-onl|-otl|-onr] [-sx] path [file]
  kchashmgr copy [-onl|-otl|-onr] path file
  kchashmgr dump [-onl|-otl|-onr] path [file]
  kchashmgr load [-otr] [-onl|-otl|-onr] path [file]
  kchashmgr defrag [-onl|-otl|-onr] path
  kchashmgr setbulk [-onl|-otl|-onr] [-sx] path key value ...
  kchashmgr removebulk [-onl|-otl|-onr] [-sx] path key ...
  kchashmgr getbulk [-onl|-otl|-onr] [-sx] [-px] path key ...
  kchashmgr check [-onl|-otl|-onr] path
$ kchashmgr inform statuses.kch 
count: 11
size: 6298656
$ kchashmgr list statuses.kch 
541726
541727
$ kchashmgr get statuses.kch 541727
resubmit: 541728,fail_sample.sh,2013-03-01 09:50:38,masa
$ kchashmgr remove statuses.kch 541727
$ kchashmgr list statuses.kch 
541726
$ kchashmgr get statuses.kch 541727
kchashmgr: DB::get failed: statuses.kch: 7: no record: no record

For more details, refer to http://fallabs.com/kyotocabinet/command.html
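When processing statuses.kch values in a script rather than reading them by eye, a small parser helps. The hypothetical parse_status below infers the field layout from the single example above (status, a number, script name, timestamp, user); the meaning of the number, which here looks like the ID of the resubmitted job, is a guess.

```ruby
require 'time'

StatusRecord = Struct.new(:status, :number, :script, :time, :user)

# Parses a value such as
#   "resubmit: 541728,fail_sample.sh,2013-03-01 09:50:38,masa"
# Layout inferred from one example, not from the workflow_manager source.
def parse_status(value)
  status, rest = value.split(':', 2) # split only at the first colon
  number, script, time, user = rest.strip.split(',', 4)
  StatusRecord.new(status.strip, number, script, Time.parse(time), user)
end

rec = parse_status('resubmit: 541728,fail_sample.sh,2013-03-01 09:50:38,masa')
puts "#{rec.status} #{rec.script} by #{rec.user} at #{rec.time}"
```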

Example¶

Server:

$ workflow_manager
druby://localhost:12345

Client:

How to implement a new sushi application (a subclass of SushiApp class)¶

TODO

General Info.¶

All applications (classes) inherit from the SushiApp class (lib/sushiApp.rb)
At the moment, you must create a Sushi application class file manually and put it in the lib directory (Feature#2548 http://fgcz-track.uzh.ch/issues/2548 )
The class file is loaded and instantiated automatically when the application is selected, so adding a new application does not require restarting the sushi server; however, after you modify an existing Sushi Application file, you must restart the server
I recommend creating a branch of SushiFabric in your local environment and testing your Sushi Application before you commit the file
Many existing application files in the lib directory of the SushiFabric repository can serve as references
lib/WordCountApp.rb is the smallest application and a good starting point when you create your first Sushi Application

Template of Sushi Application¶

#!/usr/bin/env ruby
# encoding: utf-8

require 'sushiApp'

class YourSushiApplication < SushiApp
  def initialize
    super
    @name = 'Application_Name'      # No Space
    @analysis_category = 'Category' # No Space
    @required_columns = ['Name', 'Read1']
    @required_params = []
  end
  def next_dataset
    {'Name'=>@dataset['Name'],
     'Stats [File]'=>File.join(@result_dir, @dataset['Name'].to_s + '.stats')
    }
    # [File] tag value will be outputted in gstore
  end
  def preprocess
  end
  def commands
    'echo hoge'
  end
end
if __FILE__ == $0
  usecase = YourSushiApplication.new
  usecase.project = "p1001" 
  usecase.user = 'sushi_lover'
  usecase.dataset_tsv_file = 'sample_dataset.tsv'

  # run (submit to workflow_manager)
  #usecase.run
  usecase.test_run
end

Note

The important methods are initialize, next_dataset, and commands
The preprocess method is optional; if defined, it is executed before the commands method
The next_dataset method must return a Hash: each key is a column name and each value is that column's value
The commands method must return a String
The @name and @analysis_category instance variables are required and must not contain spaces
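To make these rules concrete without installing sushi_fabric, the following sketch uses a stand-in base class (StubSushiApp, hypothetical and much simpler than the real SushiApp) and a word-count-like application modelled on the template and on the command printed by the WordCountApp test run in the Test Run section. contract_problems is likewise hypothetical; use SushiApp#test_run for the real checks.

```ruby
# Stand-in base class: NOT the real SushiApp, only enough to run the sketch.
class StubSushiApp
  attr_accessor :dataset, :result_dir
  attr_reader :name, :analysis_category

  def initialize
    @dataset = {}
    @result_dir = '.'
  end
end

# Word-count-like application following the template; details are illustrative.
class WordCountSketch < StubSushiApp
  def initialize
    super
    @name = 'Word_Count'           # no space
    @analysis_category = 'Stats'   # no space
    @required_columns = ['Name', 'Read1']
    @required_params = []
  end

  # Hash: column name => value
  def next_dataset
    { 'Name' => @dataset['Name'],
      'Stats [File]' => File.join(@result_dir, "#{@dataset['Name']}.stats") }
  end

  # String: the shell command(s) to run
  def commands
    "gunzip -c $GSTORE_DIR/#{@dataset['Read1']} |wc > #{@dataset['Name']}.stats"
  end
end

# Checks the rules listed above; returns a list of problems (empty = OK).
def contract_problems(app)
  problems = []
  %w[name analysis_category].each do |attr|
    value = app.public_send(attr)
    problems << "@#{attr} is missing" if value.nil? || value.empty?
    problems << "@#{attr} contains a space" if value.to_s.include?(' ')
  end
  problems << 'next_dataset must return a Hash' unless app.next_dataset.is_a?(Hash)
  problems << 'commands must return a String' unless app.commands.is_a?(String)
  problems
end

app = WordCountSketch.new
app.dataset = { 'Name' => 'sample1', 'Read1' => 'p1001/data/short-ama_E1_R1.fastq.gz' }
puts app.commands
p contract_problems(app)
```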

Test Run

The SushiApp#test_run method checks whether your application is implemented correctly. Run your application file directly:

$ cd $RAILS_ROOT/lib
$ ruby -I. your_sushi_app.rb

Example: WordCountApp.rb

masaomi@fgcz-s-034:$RAILS_ROOT/lib
$ ruby -I. WordCountApp.rb
check project name: PASSED:
    @project=p1001
check user name: PASSED:
    @user=sushi_lover
check application name: PASSED:
    @name=Word_Count
check analysis_category: PASSED:
    @analysis_category=Stats
check dataset: PASSED:
    @dataset_hash.length = 2
check required columns: PASSED:
    required columns: ["Name", "Read1"]
    dataset  columns: ["Name", "Read1 [File]", "Read2 [File]", "Species", "Dummy [Factor]"]
check required parameters: PASSED:
    parameters: ["cores", "ram", "scratch", "node", "process_mode"]
    required  : []
check next dataset: PASSED:
check output files: PASSED:
check commands: PASSED:
generated command will be:
    gunzip -c $GSTORE_DIR/p1001/data/short-ama_E1_R1.fastq.gz |wc > sample1.stats
    echo 'Factor columns: [Dummy]'
    echo 'Factors: [{"Dummy"=>"hoge"},{"Dummy"=>"bar"}]'
PASSED:
generated command will be:
    gunzip -c $GSTORE_DIR/p1001/data/short-ama2_R1.fastq.gz |wc > sample2.stats
    echo 'Factor columns: [Dummy]'
    echo 'Factors: [{"Dummy"=>"hoge"},{"Dummy"=>"bar"}]'
check workflow manager: PASSED:
All checks PASSED
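The "check required columns" step in this output passes even though the dataset headers carry tags such as [File] and [Factor], which suggests the tags are ignored when matching. A hypothetical sketch of such a check follows; it mirrors the printed output and is not taken from SushiApp.

```ruby
# Removes a trailing tag such as " [File]" or " [Factor]" from a header.
def strip_tag(column)
  column.sub(/\s*\[[^\]]*\]\z/, '')
end

# Required columns that are absent from the (tag-stripped) header.
def missing_columns(required, header)
  required - header.map { |c| strip_tag(c) }
end

# Names of the columns tagged [Factor].
def factor_columns(header)
  header.select { |c| c.end_with?('[Factor]') }.map { |c| strip_tag(c) }
end

# Reads the header line of a tab-separated dataset file.
def tsv_header(path)
  File.open(path, &:gets).to_s.chomp.split("\t")
end

header = ['Name', 'Read1 [File]', 'Read2 [File]', 'Species', 'Dummy [Factor]']
p missing_columns(['Name', 'Read1'], header) # => []
p missing_columns(['Name', 'Read3'], header) # => ["Read3"]
p factor_columns(header)                     # => ["Dummy"]
```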

DataSet format¶

TODO

SushiFabric README
