bio-linux presentation, biolinux

What is a Computer?

A computer is a programmable machine.

A MACHINE is a device that uses energy to perform some activity, and

a DEVICE is a piece of equipment made for a particular purpose, especially a mechanical or electrical one.

[Diagram: DATA goes into a programmable machine (an electronic device) as input and comes out of it as output.]

So a COMPUTER is an electronic device that is used to process, store and retrieve DATA.

Here I want to emphasize that DATA is central to COMPUTERS.

Where are these Computers applied?

Information technology (IT) is "the study, design, development, implementation, support or management of computer-based information systems".

Information technology is a general term that describes any technology that helps to produce, manipulate, store, communicate, and/or disseminate information.

Information, in its most restricted technical sense, is an ordered sequence of symbols.

As a concept, however, information has many meanings.

Moreover, the concept of information is closely related to notions of constraint, communication, control, form, instruction, knowledge, meaning, mental stimulus, pattern, perception, and representation.

I want to emphasize that INFORMATION is nothing but meaningful, ordered DATA.

COMPUTERS are applied in the field of INFORMATION TECHNOLOGY.

INFORMATION = Organized DATA

What is Bioinformatics?

Biology is a natural science concerned with the study of life and living organisms, including their structure, function, growth, origin, evolution, distribution, and taxonomy.

Biology is a vast subject containing many subdivisions, topics, and disciplines.

A biologist is a scientist devoted to and producing results in biology through the study of life.

Rough Overview

Databases of early biologists

Cabinets of curiosities, such as that of Ole Worm, were centers of biological knowledge in the early modern period, bringing organisms from across the world together in one place. Before the Age of Exploration, naturalists had little idea of the sheer scale of biological diversity.

In the course of his travels, Alexander von Humboldt mapped the distribution of plants across landscapes and recorded a variety of physical conditions such as pressure and temperature.

Why is there Bioinformatics?

Huge datasets

Lots of new sequences being added
- Automated sequencers
- Genome Projects
- EST sequencing, microarray studies, proteomics

Nature of Data

Most forms of raw data make visual inspection ineffective

Patterns in datasets that can be analyzed using computers

The Human Genome Project (HGP) was an international scientific research project with a primary goal to determine the sequence of chemical base pairs which make up DNA and to identify and map the approximately 20,000–25,000 genes of the human genome from both a physical and functional standpoint.

The Human Genome Project has been called a Mega Project because of the following factors:

1. The human genome has approx. 3.3 billion base-pairs; if the cost of sequencing is US $3 per base-pair, then the approx. cost will be US $10 billion.

2. If the sequence obtained were to be stored in typed form in books, with each page containing 1000 letters and each book containing 1000 pages, then 3300 such books would be needed to store the complete information.

However, expressed in computer storage units, (3.3 billion base-pairs) x (2 bits per base-pair) = 825 megabytes of raw data, which is about the size of one music CD. If further compressed, this data can be expected to fit in less than 20 megabytes.
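The three estimates above can be re-derived with a few lines of arithmetic. This is only a sketch of the slide's own numbers; the constant names are made up for illustration, and "megabyte" is assumed to mean 1,000,000 bytes.

```python
# Figures taken from the slide; the variable names are illustrative only.
BASE_PAIRS = 3_300_000_000      # approx. size of the human genome
COST_PER_BASE_PAIR_USD = 3      # assumed sequencing cost per base-pair
LETTERS_PER_PAGE = 1000
PAGES_PER_BOOK = 1000
BITS_PER_BASE_PAIR = 2          # four bases (A, C, G, T) fit in 2 bits

# 1. Sequencing cost: roughly US $10 billion.
cost_usd = BASE_PAIRS * COST_PER_BASE_PAIR_USD

# 2. Number of printed books, one letter per base-pair.
books = BASE_PAIRS // (LETTERS_PER_PAGE * PAGES_PER_BOOK)

# 3. Raw storage: bits -> bytes -> megabytes (1 MB = 1,000,000 bytes here).
megabytes = BASE_PAIRS * BITS_PER_BASE_PAIR / 8 / 1_000_000

print(cost_usd, books, megabytes)
```

The cost comes out at 9.9 billion dollars, which the slide rounds to 10 billion.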

The first printout of the human genome to be presented as a series of books, displayed at the Wellcome Collection, London

Think – Pair – Share!

The Biologist in the Age of Information

The job of the biologist is changing...

Bioinformatics is the combination of biology and information technology.

Major Bioinformatics Tasks

- Data organization and curation
- Data analysis
- Software development

Bioinformatics is the combination of biology and information technology. The discipline encompasses any computational tools and methods used to manage, analyze and manipulate large sets of biological data. Essentially, bioinformatics has three components:

• The creation of databases allowing the storage and management of large biological data sets.
• The development of algorithms and statistics to determine relationships among members of large data sets.
• The use of these tools for the analysis and interpretation of various types of biological data, including DNA, RNA and protein sequences, protein structures, gene expression profiles, and biochemical pathways.

The term bioinformatics first came into use in the 1990s and was originally synonymous with the management and analysis of DNA, RNA and protein sequence data

Bioinformatics is largely, although not exclusively, a computer-based discipline. Computers are a must in bioinformatics for two reasons:

First, many bioinformatics problems require the same task to be repeated millions of times. Second, computers are required for their problem-solving power.

Genetics related applications

There are three types of computational problems in genetics:

1. Analysis of a single sequence to assess similarity with known genes.
2. Identification of typical features such as binding sites, or derivation of evolutionary relationships through phylogenetic trees.
3. Complete genome analysis to identify members of gene families, determine the chromosomal location of a gene, etc.

- Sequence Comparison
- Linkage Analysis
- Phylogenetic Analysis
- Genomics
- Microarrays
- Sequence assembly
- Genome annotation
- Proteomics
- Pharmacogenomics
- Drug Discovery and computer aided drug design
- Systems Biology

Implications for Biomedicine... and Bioinformatics

• Physicians will use genetic information to diagnose and treat disease.

– Virtually all medical conditions (other than trauma) have a genetic component
– Individualize drugs
– Reduce side effects
– Single Nucleotide Polymorphisms (SNPs)

• Faster drug development research
– More targets
– Faster clinical trials (selected trial populations)

• Most Biologists will analyze gene sequence information in their daily work

Bioinformatics will help with....... DNA Sequencing

- Automated sequencers > 40,000 bp per day

- 500 bp reads must be assembled into complete Sequences

- Detecting errors especially insertions and deletions

- Data flow management
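To give a feel for what "assembling reads" means, here is a toy sketch that joins two reads on their longest exact suffix/prefix overlap. The function names and the example reads are invented for illustration; real assemblers must also cope with sequencing errors, repeats and many thousands of reads, none of which this handles.

```python
def overlap(left, right, min_len=3):
    """Length of the longest suffix of `left` that equals a prefix of `right`."""
    for n in range(min(len(left), len(right)), min_len - 1, -1):
        if left.endswith(right[:n]):
            return n
    return 0


def merge_reads(left, right, min_len=3):
    """Join two reads on their overlap; return None if they do not overlap."""
    n = overlap(left, right, min_len)
    if n == 0:
        return None
    return left + right[n:]


# Two hypothetical short reads sharing the 4 bases "GTAC".
read_a = "ATGCGTAC"
read_b = "GTACCTGA"
contig = merge_reads(read_a, read_b)
print(contig)
```

Requiring a minimum overlap (`min_len`) is one simple way to avoid joining reads that only share a base or two by chance.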

Bioinformatics will help with.......

Similarity Searching Sequence Databases

- What is similar to my sequence?

- Searching gets harder as the databases get bigger - and quality changes

- Tools: BLAST and FASTA = time saving heuristics (approximate methods)

- Statistics + informed judgement of the biologist


Structure-Function Relationships

Can we predict the function of protein molecules from their sequence?

sequence > structure > function

Prediction of some simple 3-D structures (α-helix, β-sheet, membrane spanning, etc.)


Phylogenetics

Can we define evolutionary relationships between organisms by comparing DNA sequences?

- What is the molecular clock?
- Lots of methods and software, what is the "correct" analysis?

• Sequence data processing
– Base calling
– Quality determination
– Trace viewing
– Vector masking
– Repeat masking
– Assembly
• Sequence characterization
– Nucleotide composition
– Codon usage
– Gene finding
– Annotation
• Alignment
– Pairwise sequence alignment and database searching
– Genome alignment
– Multiple sequence alignment
• Image analysis and processing
• Phylogenetic analysis
• Clustering
• Group determinations
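As one small example from the list above, nucleotide composition can be computed by simply counting bases. The sketch below is illustrative only: the function names and the test sequence are made up, and ambiguity codes such as N are ignored.

```python
from collections import Counter


def nucleotide_composition(sequence):
    """Count how often each of A, C, G and T occurs in a DNA sequence."""
    counts = Counter(sequence.upper())
    return {base: counts.get(base, 0) for base in "ACGT"}


def gc_content(sequence):
    """Fraction of counted bases that are G or C (0.0 for an empty sequence)."""
    composition = nucleotide_composition(sequence)
    total = sum(composition.values())
    if total == 0:
        return 0.0
    return (composition["G"] + composition["C"]) / total


example = "atgcgc"  # hypothetical sequence
print(nucleotide_composition(example))
print(gc_content(example))
```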

Yet another approach

Bioinformatics Applications

Molecular biology is the study of biology at a molecular level. This field overlaps with other areas of biology and chemistry, particularly genetics and biochemistry. Molecular biology chiefly concerns itself with understanding the interactions between the various systems of a cell, including the interactions between DNA, RNA and protein biosynthesis, as well as learning how these interactions are regulated.

Central Dogma of molecular biology

Biological databases are libraries of life sciences information, collected from scientific experiments, published literature, high-throughput experiment technology, and computational analyses. They contain information from research areas including genomics, proteomics, metabolomics, microarray gene expression, and phylogenetics.

Information contained in biological databases includes gene function, structure, localization (both cellular and chromosomal), clinical effects of mutations as well as similarities of biological sequences and structures.

Biological databases are an important tool in assisting scientists to understand and explain a host of biological phenomena from the structure of biomolecules and their interaction, to the whole metabolism of organisms and to understanding the evolution of species. This knowledge helps facilitate the fight against diseases, assists in the development of medications and in discovering basic relationships amongst species in the history of life.

What is a Database?

Databases: Definition

A database is a set of data that has a regular structure and that is organized in such a way that a computer can easily find the desired information.

An organized body of related information.
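The definition can be made concrete with a tiny in-memory database. This sketch uses Python's built-in sqlite3 module; the table layout, the accession values (HYP0001, HYP0002) and the sequences are hypothetical and are not taken from any real biological database.

```python
import sqlite3

# An in-memory database: regularly structured data the computer can search.
connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE sequences ("
    "accession TEXT PRIMARY KEY, organism TEXT, sequence TEXT)"
)

# Hypothetical records, invented for illustration.
connection.executemany(
    "INSERT INTO sequences VALUES (?, ?, ?)",
    [
        ("HYP0001", "Homo sapiens", "ATGGCC"),
        ("HYP0002", "Mus musculus", "ATGGTC"),
    ],
)

# Because the data is organized, finding one entry is a single query.
row = connection.execute(
    "SELECT organism, sequence FROM sequences WHERE accession = ?",
    ("HYP0002",),
).fetchone()
print(row)
```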

Major Databases in bioinformatics

Protein Databases
- Primary databases, ex: SWISS-PROT, PIR
- Secondary/Composite Databases, ex: OWL, NRDB

Structural Databases
- PDB, CATH, SCOP

Nucleotide and Genome Sequences
- GenBank, DDBJ, EMBL, SGD, EBI, COG (GenBank at NCBI is in collaboration with DDBJ, EMBL)

Gene Expression Data

Other Databases, ex: GeneCards, KEGG

http://www.ncbi.nlm.nih.gov/


National Center for Biotechnology Information

The National Center for Biotechnology Information (NCBI) is part of the United States National Library of Medicine (NLM).

The NCBI houses genome sequencing data in GenBank and an index of biomedical and biotechnology research articles in PubMed.

All the databases are available online through the Entrez search engine.

The NCBI is directed by David Lipman, one of the original authors of the BLAST sequence alignment program and a widely respected figure in Bioinformatics.

Since 1992, NCBI has grown to provide other databases in addition to GenBank.

NCBI provides

Online Mendelian Inheritance in Man, the Molecular Modeling Database (3D protein structures), dbSNP a database of single-nucleotide polymorphisms, the Unique Human Gene Sequence Collection, a Gene Map of the human genome, a Taxonomy Browser, and coordinates with the National Cancer Institute to provide the Cancer Genome Anatomy Project. The NCBI assigns a unique identifier (Taxonomy ID number) to each species of organism.

GenBank

The NCBI has had responsibility for making available the GenBank DNA sequence database since 1992.

GenBank coordinates with individual laboratories and other sequence databases such as those of the European Molecular Biology Laboratory (EMBL) and the DNA Data Bank of Japan (DDBJ)

The NCBI has many software tools that are available by WWW browsing or by FTP. For example, BLAST is a sequence similarity searching program.

BLAST can do sequence comparisons against the GenBank DNA database within 15 seconds.

A Model Organism

A Science Primer

Databases and Tools

http://www.ncbi.nlm.nih.gov/Database/datamodel/index.html


Different Data Centers, the last one is BioMed DataCenter

Bioinformatics is the application of statistics and computer science to the field of molecular biology.

Informatics (academic field), a broad academic field encompassing information science, information technology, algorithms, and social science

Bioinformatics is the IT field of a Biologist

Bioinformatics = TECHNOLOGY + Biological DATA

Informatics is nothing but the study of information

Bioinformatics is nothing but the study of biological information

By now we have learned that a computer is a programmable machine and that it is applied to bioinformatics

So we need to PROGRAM bioinformatics, as we are applying statistics and computers to the field of molecular biology

What is a Program?

For a program of any size, big or small: Program = logic (algorithm) + DATA

A computer program is a sequence of instructions written to perform a specified task for a computer.

Difference between software and hardware

Software is the collection of computer programs and related data that provide the instructions telling a computer what to do. In contrast to hardware (meaning the physical devices), software is intangible, meaning it "cannot be touched".

Computer technology is applied to bioinformatics by building bioinformatics software and hardware centered on biological DATA

PROGRAM = LOGIC (algorithm) + DATA
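A small sketch of this equation in a bioinformatics setting: the DATA is a DNA sequence and the LOGIC is an algorithm that computes its reverse complement. The function name and the example sequence are chosen only for illustration.

```python
# DATA: a (hypothetical) DNA sequence.
dna = "ATGC"

# LOGIC: the algorithm that turns the data into a result.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(sequence):
    """Read the sequence backwards and swap each base for its partner."""
    return "".join(COMPLEMENT[base] for base in reversed(sequence.upper()))


# PROGRAM = LOGIC applied to DATA.
result = reverse_complement(dna)
print(result)
```

The same logic can be reused unchanged on any other sequence, which is why the data and the algorithm are worth thinking about separately.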

At least we do not need to build these software packages ourselves, because many of them come bundled with a readily available, open-source, GNU GPL licensed Linux operating system.

Topic-Bio-Linux6

Bhargavi Saragadam, MSc Human Genetics, Andhra University

Suresh Saragadam, BE CSE, Bharathidasan University

23/Aug/2010

Bio-linux

Bio-Linux 6.0 Overview

Bio-Linux 6.0 is a fully featured, powerful, configurable and easy to maintain bioinformatics workstation. Bio-Linux provides more than 500 bioinformatics programs on an Ubuntu Linux 10.04 base.

Before we jump into Bio-Linux, let us get to know the terminology:

Open-Source
FSF
GNU
GPL
FOSS

all of which address the term Software Freedom.

And let us understand

the philosophy of free software.

Open source is a development method for software that harnesses the power of distributed peer review and transparency of process. The promise of open source is better quality, higher reliability, more flexibility, lower cost, and an end to predatory vendor lock-in.

The Open Source Initiative (OSI) is a non-profit corporation formed to educate about and advocate for the benefits of open source and to build bridges among different constituencies in the open-source community.

OSI was jointly founded by Eric Raymond and Bruce Perens in late February 1998, with Raymond as its first president

The Open Source Definition

Introduction

Open source doesn't just mean access to the source code. The distribution terms of open-source software must comply with the following criteria:

1. Free Redistribution

2. Source Code

3. Derived Works

4. Integrity of The Author's Source Code

5. No Discrimination Against Persons or Groups

6. No Discrimination Against Fields of Endeavor

7. Distribution of License

8. License Must Not Be Specific to a Product

9. License Must Not Restrict Other Software

10. License Must Be Technology-Neutral

No provision of the license may be predicated on any individual technology or style of interface

The Free Software Foundation (FSF) is a non-profit corporation founded by Richard Stallman on 4 October 1985 to support the free software movement.

# The FSF advocates for free software ideals as outlined in the Free Software Definition, works for adoption of free software and free media formats, and organizes activist campaigns against threats to user freedom like Windows 7, Apple's iPhone and OS X, DRM on ebooks and movies, and software patents.

# The FSF promotes completely free software distributions of GNU/Linux, and advocates that users of the GNU/Linux operating system switch to a distribution which respects their freedom.

# The FSF drives development of the GNU operating system and maintains a list of high-priority free software projects to promote replacements for common proprietary applications.

# The FSF builds and updates resources useful for the free software community like the Free Software and Hardware Directories, and the free software jobs board. The FSF also provides licenses for free software developers to share their code, including the GNU General Public License.

The GNU General Public License (GNU GPL or simply GPL) is the most widely used free software license, originally written by Richard Stallman for the GNU project.

The GPL is the first and foremost copyleft license, which means that derived works can only be distributed under the same license terms. Under this philosophy, the GPL grants the recipients of a computer program the rights of the free software definition and uses copyleft to ensure the freedoms are preserved, even when the work is changed or added to. This is in distinction to permissive free software licenses, of which the BSD licenses are the standard examples.

The GNU Project, to develop a complete Unix-like operating system which is free software

* GNU, a computer operating system
* GNU General Public License, a free software license
* GNU Free Documentation License, a copyleft license for free documentation

The GNU operating system is a complete free software system, upward-compatible with Unix. GNU stands for “GNU's Not Unix”. Richard Stallman made the Initial Announcement of the GNU Project in September 1983.

The name “GNU” is a recursive acronym for “GNU's Not Unix!”; it is pronounced g-noo, as one syllable with no vowel sound between the g and the n.

Free and open source software (also F/OSS, FOSS) is software that is liberally licensed to grant users the right to use, study, change, and improve its design through the availability of its source code.

Free software licences and open source licenses are used by many software packages.

The licenses have important differences, which mirror the differences in the ways the two kinds of software can be used and distributed and reflect differences in the philosophy behind the two.

In the context of free and open source software, "free" is intended to refer to the freedom to copy and re-use the software, rather than to the price of the software.

What is free Software?

“Free software” is a matter of liberty, not price. To understand the concept, you should think of “free” as in “free speech”, not as in “free beer”.

Free software is a matter of the users' freedom to run, copy, distribute, study, change and improve the software.

More precisely, it refers to four kinds of freedom, for the users of the software:

* The freedom to run the program, for any purpose (freedom 0).
* The freedom to study how the program works, and adapt it to your needs (freedom 1). Access to the source code is a precondition for this.
* The freedom to redistribute copies so you can help your neighbor (freedom 2).
* The freedom to improve the program, and release your improvements to the public, so that the whole community benefits (freedom 3). Access to the source code is a precondition for this.

The Free Software Foundation: Richard M. Stallman

Stallman started the GNU project in 1983 to develop the free operating system GNU (a recursive acronym for GNU's Not Unix).

To pursue the Free Software Movement, in 1985 Stallman founded the Free Software Foundation (FSF), dedicated to promoting computer users' rights to use, study, copy, modify and redistribute computer programs.

The FSF promotes the development and use of free software and free documentation. In particular, FSF promotes the GNU operating system, used widely today in its GNU/Linux variant, based on the Linux kernel developed by Linus Torvalds.

FSF believes that free software is a matter of freedom, not price.

The Free Software Foundation of India (FSF India), the official Indian affiliate of the FSF, was formally inaugurated by Richard Stallman at the Freedom First! Conference at Thiruvananthapuram, Kerala on 20 July 2001.

FSF INDIA will be the national agency for the promotion of the use of free software, i.e. software distributed under the GNU General Public Licence (GNU GPL) or other licences approved by FSF, in all domains.

SO FREE IS ALL ABOUT SOFTWARE FREEDOM

Who is he? Before we get to know him, let us first learn his name: he is Mr. Tux.

Linux is a free Unix-type operating system originally created by Linus Torvalds with the assistance of developers around the world. Developed under the GNU General Public License, the source code for Linux is freely available to everyone.

When Linus Torvalds first developed Linux back in August of 1991, the operating system basically consisted of his kernel and some GNU tools. With the help of others Linus added more and more tools and applications. With time, individuals, university students and companies began distributing Linux with their own choice of packages bound around Linus' kernel. This is where the concept of the "distribution" was born.

Today, creating and selling Linux distributions is a multi-million dollar business. You can buy a boxed version of Linux from companies such as Red Hat, Debian, SuSE, MandrakeSoft, and many more.

You can also download Linux from any number of companies and individuals.

Linux has an official mascot, Tux, the Linux penguin, which was selected by Linus Torvalds to represent the image he associates with the operating system. Tux was created by Larry Ewing.

Apart from the fact that it's freely distributed, Linux's functionality, adaptability and robustness have made it the main alternative to proprietary Unix and Microsoft operating systems.

What is an Operating System?

Operating systems provide a software platform on top of which other programs, called application programs, can run. With the aid of the firmware and device drivers, the operating system provides the most basic level of control over all of the computer's hardware devices.

The kernel, a part of the operating system, connects the application software to the hardware of a computer.

An operating system can be divided into many different parts. One of the most important parts is the kernel, which controls low-level processes that the average user usually cannot see: it controls how memory is read and written, the order in which processes are executed, how information is received and sent by devices like the monitor, keyboard and mouse, and how information received from networks is interpreted.

The user interface is the part of the operating system that interacts with the computer user directly, allowing them to control and use programs. The user interface may be graphical with icons and a desktop, or textual, with a command line.

A kernel connects the application software to the hardware of a computer.

With the aid of the firmware and device drivers, the operating system provides the most basic level of control over all of the computer's hardware devices. It manages memory access for programs in the RAM, it determines which programs get access to which hardware resources, it sets up or resets the CPU's operating states for optimal operation at all times, and it organizes the data for long-term non-volatile storage with file systems on such media as disks, tapes, flash memory, etc.

A Linux distribution, commonly called a "distro", is a project that manages a remote collection of system software and application software packages available for download and installation through a network connection. This allows the user to adapt the operating system to his/her specific needs. Distributions are maintained by individuals, loose-knit teams, volunteer organizations, and commercial entities.

A distribution is responsible for the default configuration of the installed Linux kernel, general system security, and more generally integration of the different software packages into a coherent whole. Distributions typically use a package manager such as Synaptic, YAST, or Portage to install, remove and update all of a system's software from one central location.

Although Linux distributions are generally available without charge, several large corporations sell, support, and contribute to the development of the components of the system and of free software.

An analysis of Linux showed 75 percent of the code from December 2008 to January 2010 was developed by programmers working for corporations, leaving about 18 percent to the traditional, open source community.

Some of the major corporations that contribute include Dell, IBM, HP, Oracle, Sun Microsystems, Novell, Nokia. A number of corporations, notably Red Hat, have built their entire business around Linux distributions.

Wanna know more about TUX

Some of the famous Linux distributions for a desktop/laptop

As there are many distributions, each for a particular domain, bioinformatics has a few Linux distributions of its own, like BioBrew, Bio-Linux, PhyLIS, Vlinux, DNALinux, BioKnoppix

Many more Linux distributions for your better understanding of Linux

Which one is more famous?

My favorite is Ubuntu for desktop/laptop; I assume Red Hat for servers.

Luckily, Bio-Linux is built on my favorite, Ubuntu.

Coming to the point

Bio-Linux 6 is one of the Linux distros. It is a bioinformatics platform, an operating system, based on Ubuntu Linux 10.04.

Ubuntu - GNU/Linux - Bio-Linux 6

NEBC works to enable environmental research in the molecular age. The NEBC collects and stores environmental 'omics data from researchers in accordance with the NERC Data Policy.

The NERC Environmental Bioinformatics Centre was established in 2002 to provide bioinformatics, data management and computing support to the NERC research community using 'omics technologies.

The NEBC supports environmental researchers who are generating and using molecular data through the development and provision of tools designed to fit their needs.

Many of these tools are developed in collaboration with others, and are generally useful to anyone engaged in biological research.

The Natural Environment Research Council (NERC), established by Royal Charter in 1965, is one of seven UK Government Research Councils.

The Centre for Ecology and Hydrology (CEH) is one of the Centres and Surveys of NERC, and is the leading UK body for research, survey, and monitoring in terrestrial, and freshwater environments.

The NEBC Toolbox includes the Bio-Linux computing platform.

Operating systems provide a software platform on top of which other programs, called application programs, can run.

Your choice of operating system, therefore, determines to a great extent the applications you can run.

Bio-Linux 6 is an operating system that provides a software platform for bioinformatics, where you can run and build bioinformatics applications.

More than 500 bioinformatics applications are bundled with Bio-Linux 6.

Bio-Linux 6 is not just a GNU/Linux operating system for bioinformatics but a software platform for bioinformatics on top of GNU/Linux.

And a computer system for biologists.

What is the language of computers?

0 1 0 1 0 0 0 1 0 0 1 1 1 0 1 0 1 0 1 0 10 0 1 1 0 1 0 0 1 1 1 0 1 0 1 0 1 0 1 11 1 0 1 0 0 0 1 0 0 1 1 1 0 1 0 1 0 1 01 0 1 0 0 1 1 1 0 0 1 1 1 0 1 0 1 0 1 10 1 0 1 0 0 1 0 1 0 1 1 0 1 0 1 0 1 0 11 0 1 0 0 0 1 0 0 1 0 0 0 1 0 1 1 1 0 10 0 1 1 0 0 1 0 1 1 1 1 0 1 1 1 0 1 0 11 1 1 0 0 0 1 1 1 1 1 1 0 1 0 1 0 1 0 11 0 1 0 1 0 1 0 01 1 1 0 1 0 1 0 1 0 11 0 1 0 0 1 0 0 1 1 1 0 1 0 1 0 1 0 1 0

1

A programming language is an artificial language designed to express computations that can be performed by a computer.

Programming languages can be used to create programs that control the behavior of a machine, to express algorithms precisely, or as a mode of human communication.

Just as we have many languages for communication among ourselves, computers too have acquired many languages over time.

Apart from these, the following language packages are specific to bioinformatics.

BioJava, BioPerl, BioRuby, Biopython, Eclipse

These language-specific packages are used to build bioinformatics applications on the Bio-Linux platform

b i o informatics

Computers and statistics are applied to the field of molecular biology

in order to convert biological data into computer CODE/digital DATA

0 1 0 1 0 0 0 1 0 0 1 1 1 0 1 0 1 0 1 0 10 0 1 1 0 1 0 0 1 1 1 0 1 0 1 0 1 0 1 11 1 0 1 0 0 0 1 0 0 1 1 1 0 1 0 1 0 1 00 0 1 1 0 0 1 0 1 1 1 1 0 1 1 1 0 1 0 11 1 1 0 0 0 1 1 1 1 1 1 0 1 0 1 0 1 0 11 0 1 0 1 0 1 0 01 1 1 0 1 0 1 0 1 0 11 0 1 0 0 1 0 0 1 1 1 0 1 0 1 0 1 0 1 0

Sequence Format: DATA FORMAT of Biological DATA

Data Formats

Many bioinformatics applications and their databases have their own DATA formats.

Indeed, since any computer storage can store only bits, the computer must have some way of converting information to 0s and 1s and vice-versa.
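As a minimal sketch of that conversion, the snippet below turns a short piece of text into 0s and 1s and back again, assuming plain ASCII encoding with 8 bits per character. The function names are made up for illustration.

```python
def text_to_bits(text):
    """Encode ASCII text as groups of 8 bits, one group per character."""
    return " ".join(format(byte, "08b") for byte in text.encode("ascii"))


def bits_to_text(bits):
    """Decode space-separated 8-bit groups back into ASCII text."""
    return bytes(int(group, 2) for group in bits.split()).decode("ascii")


bits = text_to_bits("ACGT")
print(bits)
print(bits_to_text(bits))
```

Here every base costs 8 bits because it is stored as a text character; a format that knows there are only four bases could use 2 bits each.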

There are many formats for different kind of information.

Video file - has many formats, ex: mpg, mpeg, avi, dat, wmv ...
Audio file - has many formats, ex: mp3, wav, ...
Image file - has many formats, ex: jpeg, png, gif, bmp ...
Text file - has many formats, ex: plain text (Notepad), rich text (WordPad), pdf, PS ...

Just as we have different number systems in mathematics, biological data (information) can have different formats.

Likewise, sequence formats come in many forms. If you don't hold your sequence in a recognized standard format, you will not be able to analyze your sequence easily.

Sequences can be read and written in a variety of formats.

What a sequence format IS

Sequence formats are ASCII TEXT

They are the required arrangement of characters, symbols and keywords that specify things such as the sequence, ID name, comments, etc.

There are generally no hidden, unprintable 'control' characters in any sequence format.

>xyz some other comment
ttcctctttctcgactccatcttcgcggtagctgggaccgccgttcagtcgccaatatgcagctctttgtccgcgcccaggagctacacaccttcgaggtgaccggccaggaaacggtcgcccagatcaaggctcatgtagcctcactggagggcatt

All standard sequence formats can be printed out or viewed simply by displaying their file.

Sequence Database Formats

* EMBL * GenBank * SwissProt * PIR

Sequence Files

Files can hold sequences in standard recognised formats.

Multiple sequences

Some sequence formats can hold multiple sequences in one file.

Preferably, you should stay away from formats that can't cope with multiple sequences in a file.

An application may accept different Input/Output File Formats.

Identification

A sequence does not require any sort of identification, but it certainly helps!

Most sequence formats include at least one form of ID name, usually placed somewhere at the top of the sequence format.

The simple format fasta has the ID name as the first word on its title line. For example the ID name 'xyz':

>xyz some other comment
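A small sketch of how a program might pull the ID name out of a fasta title line: drop the leading '>', take the first word as the ID and treat the rest as a comment. The function name is hypothetical and this is not how any particular package does it.

```python
def parse_title_line(line):
    """Split a fasta title line into (ID name, comment)."""
    if not line.startswith(">"):
        raise ValueError("a fasta title line must start with '>'")
    parts = line[1:].strip().split(None, 1)
    id_name = parts[0] if parts else ""
    comment = parts[1] if len(parts) > 1 else ""
    return id_name, comment


id_name, comment = parse_title_line(">xyz some other comment")
print(id_name)
print(comment)
```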


IDs and Accessions

An entry in a database must have some way of being uniquely identified in that database. Most sequence databases have two such identifiers for each sequence - an ID name and an Accession number.

EMBL, GenBank and SwissProt share an Accession numbering scheme - an Accession number uniquely identifies a sequence within these three databases.

Annotation and Features

Most formats allow you to hold other description, annotation and comments, for example fasta format holds comments in the title line:

>xyz some other comment


Other formats have specific fields for holding information such as references, keywords, associated entries in other databases and feature tables

The Sequence

Nucleotide (DNA or RNA) sequences are usually stored in the IUBMB standard codes. Similarly, protein sequences are usually stored in the IUPAC standard one-letter codes.

For example, fasta format holds the sequence as anything after the '>' line until the next entry starts:
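The rule "everything after the '>' line until the next entry starts" can be sketched as a short reader for fasta text holding one or more sequences. The function name and the two example entries are invented for illustration; tools such as EMBOSS or Biopython handle many more cases than this.

```python
def read_fasta(text):
    """Return a dict mapping each ID name to its full sequence."""
    lines_by_id = {}
    current_id = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A title line starts a new entry; the ID is its first word.
            words = line[1:].split()
            current_id = words[0] if words else ""
            lines_by_id[current_id] = []
        elif current_id is not None:
            # Any other line belongs to the entry currently being read.
            lines_by_id[current_id].append(line)
    return {name: "".join(parts) for name, parts in lines_by_id.items()}


# Hypothetical two-entry file content.
example = """>xyz some other comment
ttcctctttc
tcgactccat
>abc second entry
ggcatt
"""
sequences = read_fasta(example)
print(sequences)
```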


There are exceptions to this code, for example, staden format uses non-standard ambiguity codes.

Nearly every sequence analysis package written since programs were first used to read and write sequences has invented its own format. Except for EMBOSS.

Interestingly EMBOSS has not invented its own format.

The software automatically copes with data in a variety of formats and even allows transparent retrieval of sequence data from the web.

EMBOSS breaks the historical trend towards commercial software packages. EMBOSS is available as free Open Source software.

EMBOSS is "The European Molecular Biology Open Software Suite".

EMBOSS is a free Open Source software analysis package specially developed for the needs of the molecular biology (e.g. EMBnet) user community.

The software automatically copes with data in a variety of formats and even allows transparent retrieval of sequence data from the web. Also, as extensive libraries are provided with the package, it is a platform to allow other scientists to develop and release software in true open source spirit.

EMBOSS also integrates a range of currently available packages and tools for sequence analysis into a seamless whole.


Jemboss is a graphical user interface to EMBOSS. Jemboss is developed by the EMBOSS team. The software is free and part of the EMBOSS distribution.

The uses and interfaces to EMBOSS have long grown beyond our ability to keep track of them. EMBOSS is used extensively in production environments.

EMBOSS has several important advantages:

- A properly constructed toolkit for creating robust bioinformatics applications or workflows.
- A comprehensive set of sequence analysis programs.
- All sequence and many alignment and structural formats are handled.
- Extensive programming library for common sequence analysis tasks.
- Additional programming libraries for many other areas including string handling, pattern-matching, list processing and database indexing.
- It is free of charge.
- It is an open-source project.
- It runs on practically every UNIX or GNU/Linux system you can think of and some that you can't, plus MS Windows and MacOS.
- Each application has the same style of interface, so master one and you've mastered them all.
- The consistent user interface makes life easier for GUI designers and developers.
- It integrates other popular publicly available packages.
- It is free of arbitrary size limits: there are no limits on the amount of data that can be processed. For the programmer, memory management for objects such as sequences and arrays is simplified.

EMBOSS is mature and stable. A major new version of EMBOSS is released each year.

What can I use EMBOSS for?

Within EMBOSS you will find hundreds of applications covering areas such as:

* Sequence alignment
* Rapid database searching with sequence patterns
* Protein motif identification, including domain analysis
* Nucleotide sequence pattern analysis, for example to identify CpG islands or repeats
* Codon usage analysis for small genomes
* Rapid identification of sequence patterns in large scale sequence sets
* Presentation tools for publication
* ... and much more

Popular applications include:

prophet        Gapped alignment for profiles.
infoseq        Displays some simple information about sequences.
water          Smith-Waterman local alignment.
pepstats       Protein statistics.
showfeat       Show features of a sequence.
palindrome     Looks for inverted repeats in a nucleotide sequence.
eprimer3       Picks PCR primers and hybridization oligos.
profit         Scan a sequence or database with a matrix or profile.
extractseq     Extract regions from a sequence.
marscan        Finds MAR/SAR sites in nucleic sequences.
tfscan         Scans DNA sequences for transcription factors.
patmatmotifs   Compares a protein sequence to the PROSITE motif database.
showdb         Displays information on the currently available databases.
wossname       Finds programs by keywords in their one-line documentation.
abiview        Reads ABI file and display the trace.
tranalign      Align nucleic coding regions given the aligned proteins.
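To give a feel for what a Smith-Waterman local alignment computes, here is a minimal Python sketch that returns only the best local alignment score. It is a simplification under stated assumptions: fixed match and mismatch scores and a single linear gap penalty, whereas real tools typically use a substitution matrix and separate gap-open and gap-extend penalties. The function name and default scores are illustrative:

```python
def smith_waterman_score(a: str, b: str, match=2, mismatch=-1, gap=-2) -> int:
    """Best local alignment score of a and b (linear gap penalty)."""
    previous = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        current = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            step = match if a[i - 1] == b[j - 1] else mismatch
            # A cell never drops below 0: a local alignment may restart here.
            current[j] = max(
                0,
                previous[j - 1] + step,  # align a[i-1] with b[j-1]
                previous[j] + gap,       # gap in b
                current[j - 1] + gap,    # gap in a
            )
            best = max(best, current[j])
        previous = current
    return best


print(smith_waterman_score("TTACGTTT", "GGACGGG"))
```

Only two rows of the scoring table are kept in memory, which is enough for the score; recovering the alignment itself would need the full table and a traceback.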

EMBOSS currently consists of more than 200 applications.

These may be used alone or in conjunction with one another to assist in the computational analysis of biological problems.

22 applications can be applied to the alignment of two or more sequences. In addition to applications for creating alignments, such as dot plots and local, global and multiple alignments, there are a variety of programmes for determining consensus and variation within existing alignments.

61 applications have been written with the purpose of providing analysis for nucleic acids. Composition, codon usage and repeat motifs may all be established using this portion of the software. Other analyses such as restriction mapping, primer design, translation and mutation may also be performed.
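As a simple illustration of a composition analysis, the Python sketch below counts the four DNA bases and derives the GC fraction. It is not how any EMBOSS program is implemented; the function names are assumptions, and ambiguity codes are simply ignored:

```python
from collections import Counter


def base_composition(sequence: str) -> dict:
    """Count A, C, G and T in a DNA sequence (case-insensitive)."""
    counts = Counter(sequence.upper())
    # Only the four unambiguous bases are reported; other letters are skipped.
    return {base: counts.get(base, 0) for base in "ACGT"}


def gc_fraction(sequence: str) -> float:
    """Fraction of counted bases that are G or C; 0.0 for an empty count."""
    composition = base_composition(sequence)
    total = sum(composition.values())
    return (composition["G"] + composition["C"]) / total if total else 0.0


print(base_composition("ACGTAC"))
print(gc_fraction("ACGTAC"))
```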

41 protein analysis programmes are currently available for the analysis of amino acid sequences, including secondary structure prediction, pattern recognition and composition analysis.

12 separate utilities to create and index databases are also modules within the EMBOSS suite, enabling you to build and query your own sequence repositories.

Further programs contribute to the general analysis content of the suite, offering simulation opportunities such as Michaelis-Menten kinetics and the creation of phylogenetic distance matrices.
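For reference, Michaelis-Menten kinetics relates the reaction rate v to the substrate concentration [S] by v = Vmax * [S] / (Km + [S]). A minimal Python sketch of that formula follows; the function name and the sample values are illustrative and not taken from EMBOSS:

```python
def michaelis_menten_rate(substrate: float, vmax: float, km: float) -> float:
    """Reaction rate v = Vmax * [S] / (Km + [S])."""
    if substrate < 0 or km <= 0:
        raise ValueError("substrate must be >= 0 and km must be > 0")
    return vmax * substrate / (km + substrate)


# Illustrative values: the rate is half of Vmax when [S] equals Km.
for s in (0.5, 2.0, 8.0):
    print(s, michaelis_menten_rate(s, vmax=10.0, km=2.0))
```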

EMBOSS can be installed on any GNU/Linux operating system.

Note: EMBOSS is written in the C programming language.

The Bio-Linux distribution contains many bioinformatics applications and utilities, an IDE (the Eclipse SDK) and the NEBC Tools, and it also comes packaged with EMBOSS.

Open the 'Bioinformatics Docs' icon on the desktop to see this HTML document.

An alphabetically ordered, uncategorized list of all the bioinformatics applications in Bio-Linux 6.

Want to install Bio-Linux?

http://nebc.nerc.ac.uk/


After downloading the Bio-Linux DVD image file (ISO), you can burn it to a DVD by selecting the downloaded ISO as the source. You can then do a fresh install to your hard drive or to any other system, or install Bio-Linux as part of a multi-boot setup.

Dual-boot means installing Bio-Linux alongside your existing OS. If Windows is your OS, then after installing Bio-Linux as dual-boot you get a boot option for both Windows and Bio-Linux.

Once you have burned the DVD, you can try Bio-Linux without installing it: simply set the boot order so that your system boots first from the Bio-Linux DVD.

The best option if you don't have your own system is to create a Bio-Linux USB start-up disk, which you can carry with you and use to run Bio-Linux on almost any system.

Once the USB start-up disk is created, you can use it to install Bio-Linux on a friend's system, or simply work in Bio-Linux by setting the boot order to boot first from the USB stick.

UNetbootin is a utility that lets Windows users create Linux start-up disks.

The USB stick should be no smaller than 4 GB.

If you have installed Bio-Linux from the DVD, or have a Bio-Linux USB start-up disk, you can make your own start-up disk from within Bio-Linux.

The utility for making a start-up disk is in the menu bar:

Applications / Systems Tools / Usb memory stick maker

DNALinux VD is a preconfigured virtual machine (VM) with applications targeted for bioinformatics (both DNA and protein analysis). This virtual machine runs on top of the free VMware Player.

With this distribution you just boot from the CD and you have a fully functional Linux OS distribution with open source applications targeted for the molecular biologist.

VLinux Bioinformatics workbench is a Linux distribution for Bioinformatics. It is an easy-to-use, CD-based distribution based on Knoppix 3.3 that requires no installation. It includes a variety of sequence and structure analysis packages. It is an Open Source product released under the GNU GPL License.

PhyLIS is a user-friendly, free Linux distribution for phylogenetics. Install it and you have an instant phylogenetics workstation.

BioBrew is a collection of open-source applications for life scientists and an in-house project at Bioinformatics.Org.

Software Freedom from?

gain complete freedom from software you possess

you deserve to use software that is:-

free from restriction

free to share and copy

free to learn and adapt

free to work with others

you deserve free software

there are no boundaries for Linux except Linux - sureshsaragadam

THANK YOU
