A comprehensive overview of high-performance pattern recognition techniques and approaches to Computational Molecular Biology
This book surveys the developments of techniques and approaches on pattern recognition related to Computational Molecular Biology. Providing a broad coverage of the field, the authors cover fundamental and technical information on these techniques and approaches, as well as discussing their related problems. The text consists of twenty nine chapters, organized into seven parts: Pattern Recognition in Sequences, Pattern Recognition in Secondary Structures, Pattern Recognition in Tertiary Structures, Pattern Recognition in Quaternary Structures, Pattern Recognition in Microarrays, Pattern Recognition in Phylogenetic Trees, and Pattern Recognition in Biological Networks.
- Surveys the development of techniques and approaches on pattern recognition in biomolecular data
- Discusses pattern recognition in primary, secondary, tertiary and quaternary structures, as well as microarrays, phylogenetic trees and biological networks
- Includes case studies and examples to further illustrate the concepts discussed in the book
Autorentext
Mourad Elloumi, PhD, is Professor in Computer Science at the University of Tunis-El Manar, Tunisia. Dr. Elloumi is the author/co-author of more than 50 publications in international journals and conference proceedings related to Algorithmics, Computational Molecular Biology, and Knowledge Discovery and Data Mining.
Costas S. Iliopoulos, PhD, is Professor of AlgorithmDesign at King's College London, UK. Dr. Iliopoulos co-authored over 300 peer-reviewed articles in pattern matching and combinatorics of strings. He serves on the editorial board of the Journal of Discrete Algorithms, Computer Mathematics & Combinatorial Computing, and System Biology & Biomedical Technologies.
Jason T. L. Wang, PhD, is Professor of Computer Science at the New Jersey Institute of Technology, USA. Dr. Wang has published extensively on Data Mining and Computational Molecular Biology, and has been a member of program committees for over 200 conferences and workshops in these and related areas.
Albert Y. Zomaya, PhD, is the Chair Professor of High Performance Computing & Networking in the School of Information Technologies, University of Sydney, Australia. Dr. Zomaya published more than 500 scientific papers and articles and is author, co-author or editor of more than 20 books. Dr. Zomaya is Fellow of AAAS, IEEE, and IET.
Inhalt
LIST OF CONTRIBUTORS xxi
PREFACE xxvii
I PATTERN RECOGNITION IN SEQUENCES 1
1 COMBINATORIAL HAPLOTYPING PROBLEMS 3
Giuseppe Lancia
1.1 Introduction / 3
1.2 Single Individual Haplotyping / 5
1.2.1 The Minimum Error Correction Model / 8
1.2.2 Probabilistic Approaches and Alternative Models / 10
1.3 Population Haplotyping / 12
1.3.1 Clark's Rule / 14
1.3.2 Pure Parsimony / 15
1.3.3 Perfect Phylogeny / 19
1.3.4 Disease Association / 21
1.3.5 Other Models / 22
References / 23
2 ALGORITHMIC PERSPECTIVES OF THE STRING BARCODING PROBLEMS 28
Sima Behpour and Bhaskar DasGupta
2.1 Introduction / 28
2.2 Summary of Algorithmic Complexity Results for Barcoding Problems / 32
2.2.1 Average Length of Optimal Barcodes / 33
2.3 Entropy-Based Information Content Technique for Designing
Approximation Algorithms for String Barcoding Problems / 34
2.4 Techniques for Proving Inapproximability Results for String Barcoding Problems / 36
2.4.1 Reductions from Set Covering Problem / 36
2.4.2 Reduction from Graph-Coloring Problem / 38
2.5 Heuristic Algorithms for String Barcoding Problems / 39
2.5.1 Entropy-Based Method with a Different Measure for Information Content / 39
2.5.2 Balanced Partitioning Approach / 40
2.6 Conclusion / 40
Acknowledgments / 41
References / 41
3 ALIGNMENT-FREE MEASURES FOR WHOLE-GENOME COMPARISON 43
Matteo Comin and Davide Verzotto
3.1 Introduction / 43
3.2 Whole-Genome Sequence Analysis / 44
3.2.1 Background on Whole-Genome Comparison / 44
3.2.2 Alignment-Free Methods / 45
3.2.3 Average Common Subword / 46
3.2.4 KullbackLeibler Information Divergence / 47
3.3 Underlying Approach / 47
3.3.1 Irredundant Common Subwords / 48
3.3.2 Underlying Subwords / 49
3.3.3 Efficient Computation of Underlying Subwords / 50
3.3.4 Extension to Inversions and Complements / 53
3.3.5 A Distance-Like Measure Based on Underlying Subwords / 53
3.4 Experimental Results / 54
3.4.1 Genome Data sets and Reference Taxonomies / 54
3.4.2 Whole-Genome Phylogeny Reconstruction / 56
3.5 Conclusion / 61
Author's Contributions / 62
Acknowledgments / 62
References / 62
4 A MAXIMUM LIKELIHOOD FRAMEWORK FOR MULTIPLE SEQUENCE LOCAL ALIGNMENT 65
Chengpeng Bi
4.1 Introduction / 65
4.2 Multiple Sequence Local Alignment / 67
4.2.1 Overall Objective Function / 67
4.2.2 Maximum Likelihood Model / 68
4.3 Motif Finding Algorithms / 70
4.3.1 DEM Motif Algorithm / 70
4.3.2 WEM Motif Finding Algorithm / 70
4.3.3 Metropolis Motif Finding Algorithm / 72
4.3.4 Gibbs Motif Finding Algorithm / 73
4.3.5 Pseudo-Gibbs Motif Finding Algorithm / 74
4.4 Time Complexity / 75
4.5 Case Studies / 75
4.5.1 Performance Evaluation / 76
4.5.2 CRP Binding Sites / 76
4.5.3 Multiple Motifs in HelixTurnHelix Protein Structure / 78
4.6 Conclusion / 80
References / 81
5 GLOBAL SEQUENCE ALIGNMENT WITH A BOUNDED NUMBER OF GAPS 83
Carl Barton, Tomá Flouri, Costas S. Iliopoulos, and Solon P. Pissis
5.1 Introduction / 83
5.2 Definitions and Notation / 85
5.3 Problem Definition / 87
5.4 Algorithms / 88
5.5 Conclusion / 94
References / 95
II PATTERN RECOGNITION IN SECONDARY STRUCTURES 97
6 A SHORT REVIEW ON PROTEIN SECONDARY STRUCTURE PREDICTION METHODS 99
Renxiang Yan, Jiangning Song, Weiwen Cai, and Ziding Zhang