Prediction of the coding sequences of unidentified human genes. VII. The complete sequences of 100 new cDNA clones from brain which can code for large proteins …

T Nagase, K Ishikawa, D Nakajima, M Ohira, N Seki, N Miyajima, A Tanaka, H Kotani…
DNA Research, 1997, academic.oup.com
Abstract
In this series of projects to sequence human cDNA clones corresponding to relatively long transcripts, we determined the complete sequences of 100 new cDNA clones that were selected for their potential to code for large proteins in vitro. The cDNA libraries used were the fractions of size-fractionated human brain cDNA libraries with average insert sizes from 5.3 to 7.0 kb. Randomly sampled clones were single-pass sequenced from both ends to select clones not registered in the public database. Their protein-coding potential was then examined with an in vitro transcription/translation system, and the clones that generated proteins larger than 60 kDa were sequenced in full. Each clone gave a distinct open reading frame (ORF), and the length of the ORF roughly agreed with the molecular mass of the in vitro product estimated from its mobility on SDS-polyacrylamide gel electrophoresis. The average size of the cDNA clones sequenced was 6.1 kb, and the average ORF size corresponded to 1200 amino acid residues. Computer-assisted analysis of the sequences against DNA and protein-motif databases (GenBank and PROSITE) allowed the functions of at least 73% of the gene products to be anticipated, and 88% of those (the products of 64 clones) were assigned to functional categories of proteins related to cell signaling/communication, nucleic acid management, and cell structure/motility. The expression profiles of the sequenced clones in a variety of tissues, and their chromosomal locations, have been determined. According to the expression spectra, approximately 11 genes appeared to be expressed predominantly in brain. Most of the remaining genes fell into one of two classes: expression in a limited number of tissues (31 genes) or ubiquitous expression in all but a few tissues (47 genes).
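The size figures quoted in the abstract can be cross-checked with a little arithmetic. The sketch below is illustrative only and is not the authors' method: it assumes the common rule of thumb of roughly 110 Da per amino acid residue (a value not given in the abstract), and the function names are hypothetical.

```python
import math

# Assumption (not from the abstract): average residue mass of ~110 Da,
# a widely used rule of thumb for estimating protein mass from length.
AVG_RESIDUE_MASS_DA = 110.0


def estimated_mass_kda(n_residues: int) -> float:
    """Approximate protein mass in kDa for an ORF of n_residues amino acids."""
    return n_residues * AVG_RESIDUE_MASS_DA / 1000.0


def min_residues_for_mass(mass_kda: float) -> int:
    """Smallest ORF length (in residues) whose estimated mass reaches mass_kda."""
    return math.ceil(mass_kda * 1000.0 / AVG_RESIDUE_MASS_DA)


def coding_fraction(orf_residues: int, insert_kb: float) -> float:
    """Fraction of a cDNA insert occupied by the ORF (3 nucleotides per residue)."""
    return (orf_residues * 3) / (insert_kb * 1000.0)


if __name__ == "__main__":
    # Average ORF of 1200 residues, as reported in the abstract.
    print(f"Estimated mass of average ORF: {estimated_mass_kda(1200):.0f} kDa")
    # The 60 kDa selection threshold expressed as a minimum ORF length.
    print(f"Residues needed for 60 kDa: {min_residues_for_mass(60)}")
    # Average ORF (1200 residues) against the average insert (6.1 kb).
    print(f"Coding fraction of average clone: {coding_fraction(1200, 6.1):.2f}")
```

Under that assumption, the average ORF of 1200 residues corresponds to a protein of roughly 130 kDa, well above the 60 kDa selection threshold, and the coding region accounts for a little under 60% of the average 6.1 kb insert.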
Oxford University Press