- Introduction
- Modules
- Dependencies
- Installation
- Usage
- How to implement a new sushi application (a subclass of SushiApp class)
- DataSet format
Introduction¶
Process your raw data, without cooking it!
Sushi Fabric is defined as
- Sequence analysis of Useless Staff with Highly Integrated Fabric
or
- Super Useful System for HIgh-throughput data Fabric
or
- Super Ultra Special Hyper Incredible Fabric
In short, the definition is not decided yet. Please send an email to masaomi.hatakeyama@fgcz.uzh.ch if you have a better definition.
Modules¶
There are 3 modules.
- workflow_manager (gem): manages jobs and the cluster environment, checks running jobs, and submits jobs
- sushi_fabric (gem): superclass for specific applications; provides primitive functions that communicate with workflow_manager via DRuby
- SushiFabric (Ruby on Rails): provides the GUI and calls sushi_fabric functions
- UML diagrams
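As a rough illustration of the DRuby link between sushi_fabric and workflow_manager, the sketch below starts a toy DRb server and calls it from a client, using only Ruby's standard drb library. The ToyManager class and its submit and job_list methods are hypothetical stand-ins and are not the real workflow_manager API.

```ruby
require 'drb/drb'

# Hypothetical stand-in for the object that a server such as
# workflow_manager exposes over DRuby (not the real class).
class ToyManager
  def initialize
    @jobs = []
  end

  # Record a job script name and return a made-up job id.
  def submit(script_name)
    @jobs << script_name
    @jobs.length
  end

  def job_list
    @jobs.dup
  end
end

# Port 0 asks the OS for any free port; a real setup would use a fixed
# URI such as druby://localhost:12345.
server = DRb.start_service('druby://127.0.0.1:0', ToyManager.new)

# The client only needs the URI to obtain a proxy of the served object.
manager = DRbObject.new_with_uri(server.uri)
job_id = manager.submit('job_script.sh')
jobs = manager.job_list

DRb.stop_service
```

In a real deployment the server and the client run in separate processes and share only the URI.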
Dependencies¶
- Ruby 1.9.3
- Ruby on Rails (~> 3.2.9)
- libsqlite3-dev (to build the gem)
- libxml2-dev (to build a gem)
- libxslt1-dev (to build a gem)
- node.js (installed from source, http://nodejs.org/)
- Ruby 1.8.7 does not work
- Ruby 2.1 is not checked yet
- For more details on the gem libraries, refer to Gemfile.lock
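The version notes above can be turned into a quick check. This is a minimal sketch: the sushi_ruby_status helper is hypothetical, and treating every version other than 1.9.3 and pre-1.9 as untested is an assumption of this example.

```ruby
require 'rubygems'

# Classify a Ruby version string against the notes above:
# 1.9.3 is the documented version, anything below 1.9 does not work,
# and everything else is treated as untested (an assumption of this sketch).
def sushi_ruby_status(version)
  v = Gem::Version.new(version)
  return :unsupported if v < Gem::Version.new('1.9')
  return :supported if v.segments.first(3) == [1, 9, 3]
  :untested
end

puts "Ruby #{RUBY_VERSION}: #{sushi_ruby_status(RUBY_VERSION)}"
```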
Installation¶
Steps¶
- (Install Ruby)
- Install bundle (gem)
- Install workflow_manager (gem)
- Install sushi_fabric (gem)
- Download SushiFabric.tgz (Ruby on Rails application)
- Install required libraries (bundle install)
- Setup database (rake db:migrate)
- Run (rails server)
Install Ruby 1.9¶
Check if Ruby 1.9.3 is installed on your computer
$ ruby -v
If Ruby is NOT installed or the Ruby version is below 1.9, then you should download and install Ruby from
Apple: Needs XCode or at least the command-line tools found here: https://developer.apple.com/downloads/
Typical commands to install Ruby
$ wget http://cache.ruby-lang.org/pub/ruby/1.9/ruby-1.9.3-p484.tar.gz
$ tar zxvf ruby-1.9.3-p484.tar.gz
$ cd ruby-1.9.3-p484
$ ./configure
$ make
$ sudo make install
Note
- Usually the Ruby interpreter is installed in /usr/local/bin
- If you want to change the installation directory, pass the --prefix option to ./configure as follows
$ ./configure --prefix=$HOME/bin/ruby-1.9.3
$ make
$ make install
Install bundle¶
$ gem install bundle
Note
- gem is a Ruby library manager, and it downloads and installs a library from http://rubygems.org/
- bundle is a Ruby library version manager that manages specific library versions for an application
- All gem installations may need sudo permissions if you installed Ruby in a system directory. Alternatively, you can change the gem directory by setting the GEM_HOME environment variable (see below).
The gem library installation directory can be set with the environment variable:
$ export GEM_HOME=$HOME/.gems
Install workflow_manager¶
$ gem install workflow_manager
Run
$ workflow_manager mode = development druby://localhost:12345
Note
- Ignore the warning:
/usr/local/lib/ruby/1.9.1/yaml.rb:84:in `<top (required)>': It seems your ruby installation is missing psych (for YAML output). To eliminate this warning, please install libyaml and reinstall your ruby.
Install sushi_fabric (core code of SushiFabric)¶
$ gem install sushi_fabric
That's it.
Download SushiFabric (Ruby on Rails code)¶
$ wget http://fgcz-sushi.uzh.ch//SushiFabric_20131201.tgz
Extract the archive and run bundle install
$ tar zxvf SushiFabric_20131201.tgz
$ cd SushiFabric
$ bundle install
Set up database for Ruby on Rails¶
Go to the SushiFabric directory, then
$ bundle exec rake db:migrate
Start Ruby on Rails server
$ rails server
Note
- Then try to access http://localhost:3000
- If you can see the top menu of SushiFabric, the installation succeeded.
- To stop the server, press Ctrl + C
Configuration¶
There are two configuration files, one for workflow_manager and one for sushi_fabric
Configuration file
- config/environments/development.rb
#!/usr/bin/env ruby
# encoding: utf-8
WorkflowManager::Server.configure do |config|
  config.log_dir = 'logs'
  config.db_dir = 'dbs'
  config.interval = 30
  config.resubmit = 0
  config.cluster = WorkflowManager::LocalComputer.new('local_computer')
end
Note
- A new cluster class should inherit from the WorkflowManager::Cluster class and define (override) the following 4 methods:
- For more detail, refer to $GEM_HOME/**/workflow_manager/lib/workflow_manager/cluster.rb
Cluster Class
class Cluster
  def submit_job(script_file, script_content, option='')
  end
  def job_running?(job_id)
  end
  def job_ends?(log_file)
  end
  def copy_commands(org_dir, dest_parent_dir)
  end
end
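To make the four methods more concrete, here is a minimal sketch of a cluster class that records jobs in memory instead of submitting them. Everything in it is an assumption made for illustration: the local Cluster class stands in for WorkflowManager::Cluster so the example runs without the gem, and the return values, the log file name, and the end marker are guesses rather than the gem's documented contract. Check cluster.rb for the real expectations.

```ruby
# Stand-in for WorkflowManager::Cluster so the sketch runs without the gem;
# with the gem installed, the subclass would inherit from the real class.
class Cluster
  def initialize(name = '')
    @name = name
  end
end

# Hypothetical cluster that records jobs in memory instead of submitting them.
class DryRunCluster < Cluster
  END_MARKER = '__SCRIPT END__' # assumed marker, not taken from the gem

  def initialize(name = 'dry_run')
    super
    @running = {}
    @next_id = 0
  end

  # Assumed return value: [job_id, log_file, command].
  def submit_job(script_file, script_content, option = '')
    @next_id += 1
    job_id = @next_id.to_s
    log_file = "#{script_file}_o.log"
    command = "bash #{script_file} #{option}".strip
    @running[job_id] = true
    [job_id, log_file, command]
  end

  def job_running?(job_id)
    @running.fetch(job_id, false)
  end

  # Treat a job as finished when its log contains the end marker.
  def job_ends?(log_file)
    File.exist?(log_file) && File.read(log_file).include?(END_MARKER)
  end

  # Return shell commands (as strings) that copy results to their destination.
  def copy_commands(org_dir, dest_parent_dir)
    ["cp -r #{org_dir} #{dest_parent_dir}"]
  end
end

cluster = DryRunCluster.new
job_id, log_file, command = cluster.submit_job('job_script.sh', "echo hoge\n")
```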
Default sushi_fabric parameters (config/environments/development.rb)
SushiFabric::Application.configure do
  # sushi_fabric
  config.workflow_manager = 'druby://localhost:12345'
  config.gstore_dir = File.join(Dir.pwd, 'public/gstore/projects')
  config.sushi_app_dir = Dir.pwd
  config.scratch_dir = '/tmp/scratch'
end
Note
- If you want to change the parameters, edit config/environments/development.rb as shown above.
- The workflow_manager and sushi_fabric configurations are completely independent.
- Ruby on Rails usually already has a configuration file with the same file name (in development mode), and you can add the 4 additional parameters above to that file.
Usage¶
Top Menu (screenshot: img/wiki_up/top_menu.png)
- Project: Select a project that you belong to
- DataSet: You can see DataSets
- Import DataSet: Import DataSet from .tsv file
- Run Application: Submit a job
- Check Jobs: Show job list
- gStore: Browse the gStore directory
Run Application¶
First, select input DataSet and Application.
Second, set SGE job parameters and application parameters.
Then click the submit button, and the job is submitted to SGE via the workflow manager.
The following are generated automatically in a directory under the gStore directory:
- A new DataSet
- Parameter set
- Logs
The new DataSet can be used as input for another application.
Job monitoring¶
All submitted jobs are listed with running status: success, failure, or re-submitted.
Submitted time and finish time are also shown.
Command interface¶
All sushi functions are also available on the command line.
Workflow manager¶
Abstract¶
- A console based server-client application
- For each (g-sub, qsub) submission, a new thread is created that checks the job once every 30 seconds (the default; you can modify it by changing the constant INTERVAL = 30 in the source code).
- If a job fails (that is, it does not reach the end of the script), it is resubmitted automatically (at most 2 times; this can also be modified in the source code, RESUBMIT = 2).
- It logs when a job starts and ends and who submitted it, along with the submitted scripts and their standard output and error.
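The monitoring behaviour described above (one thread per submission, polling at a fixed interval, resubmitting a limited number of times) can be sketched as follows. This is not the workflow_manager source: monitor_job and its submit, running, and ended hooks are hypothetical names, and the toy job below is made up so the example finishes immediately.

```ruby
# Poll a job until it stops running, resubmitting it when it did not reach
# the end of its script. All callables are hypothetical hooks for this sketch.
def monitor_job(job_id, submit:, running:, ended:, interval: 30, max_resubmit: 2)
  resubmits = 0
  loop do
    sleep interval while running.call(job_id)
    return [:success, job_id, resubmits] if ended.call(job_id)
    return [:fail, job_id, resubmits] if resubmits >= max_resubmit
    job_id = submit.call # resubmit and keep monitoring the new job id
    resubmits += 1
  end
end

# Toy job source: the first two attempts fail, the third one succeeds.
attempts = 0
submit  = -> { attempts += 1 }
running = ->(_id) { false } # jobs finish immediately in this toy
ended   = ->(id) { id >= 3 }

first_id = submit.call
# One monitoring thread per submission, as in the description above.
monitor = Thread.new do
  monitor_job(first_id, submit: submit, running: running, ended: ended, interval: 0)
end
result = monitor.value
```

Because the monitoring state lives only in the thread, it is lost if the server process stops, which matches the limitation noted for the server.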
Server (daemon)
$ ruby workflow_manager.rb -h
Version: 20130228-170033
Usage: workflow_manager.rb [druby://host:port]
host: you should select the server where you start workflow_manager.rb
port: whatever that is free
default: druby://localhost:12345
Note
- It automatically creates 'logs' directory. All the scripts and logs will be saved in the directory.
- Some *.kch files will also be created. Those files are binary files that are DB files of KyotoCabinet.
- The URI should be the same when you use the client commands (wfm_*).
- If the same URI has been already used by somebody, it does not start.
- If the workflow manager stops for some reason while a monitored job is running, the job status will never change because the monitoring thread is gone.
- At the moment, there is no command to assign a thread to monitor an already submitted job (this may be a future feature).
- A function to send an email when a monitored job finishes is planned (this function will be added soon).
Example
start: workflow_manager
kill: Ctrl + C
Client (command)
wfm_monitoring job_script.sh
wfm_job_list
wfm_status job_id
wfm_get_log job_id
wfm_get_job_script job_id
Note
- The client commands use the url druby://localhost:12345 as a default
- If you start the server with a different URI, you have to add the URI after the client command as an argument, like
wfm_job_list druby://hogehoge:777
- However, if a .wfmrc file like the following exists in your home directory or in the current directory where you run the client command, both the server and the client use the server URI in the file, and no URI argument is needed.
user: masa
server: druby://localhost:7777
- wfm_monitoring just passes the script to the g-sub command, and only .sh files (Bourne shell or bash scripts) are supported (this will be updated).
- You should write all the options for g-sub (qsub) in your job script.
- Only the -o and -e options in the script are ignored, since standard output and error are written to the 'logs' directory.
- For private use, the important file is statuses.kch. If you want to edit this file directly, you can use the kchashmgr command. An example is below. This command works without the workflow_manager server.
$ kchashmgr
kchashmgr: the command line utility of the file hash database of Kyoto Cabinet
usage:
kchashmgr create [-otr] [-onl|-otl|-onr] [-apow num] [-fpow num] [-ts] [-tl] [-tc] [-bnum num] path
kchashmgr inform [-onl|-otl|-onr] [-st] path
kchashmgr set [-onl|-otl|-onr] [-add|-rep|-app|-inci|-incd] [-sx] path key value
kchashmgr remove [-onl|-otl|-onr] [-sx] path key
kchashmgr get [-onl|-otl|-onr] [-rm] [-sx] [-px] [-pz] path key
kchashmgr list [-onl|-otl|-onr] [-max num] [-rm] [-sx] [-pv] [-px] path [key]
kchashmgr clear [-onl|-otl|-onr] path
kchashmgr import [-onl|-otl|-onr] [-sx] path [file]
kchashmgr copy [-onl|-otl|-onr] path file
kchashmgr dump [-onl|-otl|-onr] path [file]
kchashmgr load [-otr] [-onl|-otl|-onr] path [file]
kchashmgr defrag [-onl|-otl|-onr] path
kchashmgr setbulk [-onl|-otl|-onr] [-sx] path key value ...
kchashmgr removebulk [-onl|-otl|-onr] [-sx] path key ...
kchashmgr getbulk [-onl|-otl|-onr] [-sx] [-px] path key ...
kchashmgr check [-onl|-otl|-onr] path
$ kchashmgr inform statuses.kch
count: 11
size: 6298656
$ kchashmgr list statuses.kch
541726
541727
$ kchashmgr get statuses.kch 541727
resubmit: 541728,fail_sample.sh,2013-03-01 09:50:38,masa
$ kchashmgr remove statuses.kch 541727
$ kchashmgr list statuses.kch
541726
$ kchashmgr get statuses.kch 541727
kchashmgr: DB::get failed: statuses.kch: 7: no record: no record
- For more detail, refer to http://fallabs.com/kyotocabinet/command.html
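The way a client command could pick its server URI, as described in the notes above, can be sketched as follows. This is only an illustration: parse_wfmrc and resolve_server_uri are hypothetical helpers, the simple "key: value" parsing is inferred from the .wfmrc example, and the precedence between the current directory and the home directory is an assumption.

```ruby
DEFAULT_URI = 'druby://localhost:12345'

# Parse "key: value" lines such as those shown in the .wfmrc example.
def parse_wfmrc(text)
  text.each_line.each_with_object({}) do |line, conf|
    key, value = line.strip.split(':', 2)
    conf[key.strip] = value.strip if key && value
  end
end

# Assumed precedence: command-line argument, then .wfmrc in the current
# directory, then .wfmrc in the home directory, then the default URI.
def resolve_server_uri(argv, dirs = [Dir.pwd, Dir.home])
  from_arg = argv.find { |arg| arg.start_with?('druby://') }
  return from_arg if from_arg
  dirs.each do |dir|
    path = File.join(dir, '.wfmrc')
    next unless File.file?(path)
    server = parse_wfmrc(File.read(path))['server']
    return server if server
  end
  DEFAULT_URI
end

config = parse_wfmrc("user: masa\nserver: druby://localhost:7777\n")
```

A client script would then call resolve_server_uri(ARGV) once at start-up and pass the result to its DRuby connection.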
Example¶
Server:
$ workflow_manager druby://localhost:12345
Client:
How to implement a new sushi application (a subclass of SushiApp class)¶
TODO
General Info.¶
- All applications (classes) inherit from the SushiApp (lib/sushiApp.rb) class
- At the moment, a sushi application class file must be written manually and placed in the lib directory (Feature#2548 http://fgcz-track.uzh.ch/issues/2548 )
- The class file is loaded automatically and an instance is created when the application is selected (no need to reboot the sushi server), but updating an existing Sushi Application file requires rebooting the server
- I recommend making a trunk (branch) of SushiFabric in your local environment and testing your Sushi Application before you commit the file
- You can refer to the many existing application files in the lib directory of the SushiFabric repository
- lib/WordCountApp.rb is the smallest application and is a good first reference when you create your Sushi Application
Template of Sushi Application¶
#!/usr/bin/env ruby
# encoding: utf-8

require 'sushiApp'

class YourSushiApplication < SushiApp
  def initialize
    super
    @name = 'Application_Name' # No Space
    @analysis_category = 'Category' # No Space
    @required_columns = ['Name', 'Read1']
    @required_params = []
  end
  def next_dataset
    {'Name'=>@dataset['Name'],
     'Stats [File]'=>File.join(@result_dir, @dataset['Name'].to_s + '.stats')
    } # [File] tag value will be outputted in gstore
  end
  def preprocess
  end
  def commands
    'echo hoge'
  end
end

if __FILE__ == $0
  usecase = YourSushiApplication.new
  usecase.project = "p1001"
  usecase.user = 'sushi_lover'
  usecase.dataset_tsv_file = 'sample_dataset.tsv'
  # run (submit to workflow_manager)
  #usecase.run
  usecase.test_run
end
Note
- The important methods are initialize, next_dataset, and commands
- The preprocess method is not required, but if defined it is executed before the commands method
- The next_dataset method must return a Hash whose keys are column names and whose values are the corresponding values
- The commands method must return a String
- The @name and @analysis_category instance variables are required and must not contain spaces
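The conventions in the notes above can be checked mechanically. The sketch below is not SushiApp#test_run: check_sushi_app and the ToyApp struct are hypothetical, and the struct stands in for a real SushiApp subclass so that the example runs without the gem.

```ruby
# Check the conventions listed above on any object that responds to
# name, analysis_category, next_dataset and commands.
# Returns a list of problems; an empty list means the checks passed.
def check_sushi_app(app)
  problems = []
  [[:name, app.name], [:analysis_category, app.analysis_category]].each do |label, value|
    problems << "@#{label} is missing" if value.to_s.empty?
    problems << "@#{label} contains a space" if value.to_s.include?(' ')
  end
  problems << 'next_dataset must return a Hash' unless app.next_dataset.is_a?(Hash)
  problems << 'commands must return a String' unless app.commands.is_a?(String)
  problems
end

# Hypothetical stand-in; a real application would subclass SushiApp.
ToyApp = Struct.new(:name, :analysis_category, :next_dataset, :commands)

good = ToyApp.new('Word_Count', 'Stats', { 'Name' => 'sample1' }, 'echo hoge')
bad  = ToyApp.new('Word Count', 'Stats', [], nil)

good_problems = check_sushi_app(good)
bad_problems  = check_sushi_app(bad)
```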
Test Run
- SushiApp#test_run method is implemented, which helps you to check if your application is correctly implemented
$ cd $RAILS_ROOT/lib
$ ruby -I. your_sushi_app.rb
Example: WordCountApp.rb
masaomi@fgcz-s-034:$RAILS_ROOT/lib $ ruby -I. WordCountApp.rb
check project name:
PASSED: @project=p1001
check user name:
PASSED: @user=sushi_lover
check application name:
PASSED: @name=Word_Count
check analysis_category:
PASSED: @analysis_category=Stats
check dataset:
PASSED: @dataset_hash.length = 2
check required columns:
PASSED:
required columns: ["Name", "Read1"]
dataset columns: ["Name", "Read1 [File]", "Read2 [File]", "Species", "Dummy [Factor]"]
check required parameters:
PASSED:
parameters: ["cores", "ram", "scratch", "node", "process_mode"]
required : []
check next dataset:
PASSED:
check output files:
PASSED:
check commands:
PASSED:
generated command will be:
gunzip -c $GSTORE_DIR/p1001/data/short-ama_E1_R1.fastq.gz |wc > sample1.stats
echo 'Factor columns: [Dummy]'
echo 'Factors: [{"Dummy"=>"hoge"},{"Dummy"=>"bar"}]'
PASSED:
generated command will be:
gunzip -c $GSTORE_DIR/p1001/data/short-ama2_R1.fastq.gz |wc > sample2.stats
echo 'Factor columns: [Dummy]'
echo 'Factors: [{"Dummy"=>"hoge"},{"Dummy"=>"bar"}]'
check workflow manager:
PASSED:
All checks PASSED
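In the output above, the required column "Read1" is accepted although the dataset column is named "Read1 [File]", which suggests that bracket tags are ignored when columns are compared. The sketch below illustrates that idea under this assumption. base_column and missing_columns are hypothetical helpers, not SushiApp methods, and the data row is a toy row whose values are made up for illustration.

```ruby
# Strip a trailing tag such as " [File]" or " [Factor]" from a column name.
def base_column(name)
  name.sub(/\s*\[[^\]]*\]\z/, '')
end

# Read the header row of a tab-separated DataSet and return the required
# columns that are missing (tags are ignored when comparing; an assumption).
def missing_columns(tsv_text, required)
  header = tsv_text.lines.first.to_s.chomp.split("\t")
  required - header.map { |col| base_column(col) }
end

# Header taken from the test output above; the data row is a made-up toy row.
tsv = "Name\tRead1 [File]\tRead2 [File]\tSpecies\tDummy [Factor]\n" \
      "sample1\tp1001/data/short-ama_E1_R1.fastq.gz\t\tunknown\thoge\n"

missing = missing_columns(tsv, ['Name', 'Read1'])
```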
DataSet format¶
TODO