Writing a research paper thesis: Comparison of Database and File Storage

Comparison of Database and File Storage

Author: Noronjon Qalandarov

CONTENTS

ACKNOWLEDGEMENTS
SUMMARY
1. INTRODUCTION
2. AIMS AND METHODOLOGIES
3. LITERATURE REVIEW
3.1 Technologies and definitions
3.1.1 RDBMS
3.1.2 Native XML DB
3.2 DATABASE PRODUCTS
3.2.1 MySQL database
3.2.2 eXist and Sedna databases
4. DATABASE BENCHMARKING
4.1 Storing XML in file systems
4.2 Document size
4.3 Updates
4.4 Description of soil sampling and sample preparation
4.5 Determination of pendimethalin in methanol extract
5. ANALYSIS
6. DISCUSSIONS
7. CONCLUSIONS
8. REFERENCES

ACKNOWLEDGEMENTS

First of all, I thank my supervisor, Ing. Alexandr Vasilenko, for his advice and assistance during the work on this diploma thesis and for all his support throughout the practical work. Special thanks go to all members and coordinators of the European Commission Erasmus Mundus programme, especially to the coordinator of the CASIA project, Ewa Wietsma, and to PhDr. Vlastimil Černý, CSc., for granting me the opportunity to study at the Czech University of Life Sciences. I would also like to thank the Department of Information Technologies, represented by Ing. Miloš Ulman, Ph.D., for his professional skills, kindness and support. Thanks to all academic staff of the university for their contribution to my obtaining high-quality skills and knowledge.

SUMMARY

Database systems are well known for consistent storage, retrieval, and manipulation of data. At the same time, the Extensible Markup Language (XML) is generally accepted as a data description language for web-based information systems. XML is self-describing. It provides flexible information identification and is used extensively in application domains such as chemistry, biology, and e-business. With the development of web applications and the large numbers of XML documents being generated, it is necessary to work out how to manage them efficiently.
Databases are the prime storage engines for many different types of data. Traditional DBMSs are designed for regular data. However, XML data often includes irregular data such as pictures, audio and video files, which makes the storage of XML data a challenge for traditional relational DBMSs.

Keywords: XML, RDBMS, database, relational databases, storage, data and file, analysis, solutions, software, web application

1. INTRODUCTION

As the use of XML has grown, it has become generally accepted that XML is useful not only for describing new document formats for the Web but also for describing structured data. Examples of structured data include information that is typically contained in spreadsheets, program configuration files, and network protocols. XML is preferable to previous data formats because it can easily represent both tabular data (such as relational data from a database or spreadsheets) and semi-structured data (such as a Web page or business document) (Obasanjo, 2003). Popular pre-existing formats either work well for tabular data and handle semi-structured data poorly, like comma-separated value (CSV) files, or, like RTF, are too specialized for semi-structured text documents. This has led to the widespread adoption of XML as the lingua franca of information interchange.

As more and more organisations and systems employ XML within their information management and exchange strategies, classical data management issues arise concerning the efficient and effective storage, retrieval, querying, indexing and manipulation of XML. From this environment we have seen the emergence of native XML databases. These are designed for seamless storage, retrieval, and manipulation of XML data and for integration with related technologies (Noordij, 2002). However, a number of questions arise regarding Native XML Database (NXD) technology. Does it represent a paradigm shift?
More importantly, is the performance of NXD technology sufficient to provide an alternative to standard database technology, or will coexistence be the status quo?

2. AIMS AND METHODOLOGIES

This diploma thesis investigates the advantages and disadvantages of storing data and files in native XML databases and in relational databases. The main goal of the thesis is to compare the approaches of a number of different solutions. The partial goals are:

To explain the main differences between the database models;
To compare different solutions for storing data on different platforms;
To analyze the performance of the XML and RDBMS models (size, speed, access, etc.).

The methodology is divided into several parts. It is based on research and analysis of relevant information resources. In the first part, the necessary information about the database models was collected. The next step was to define the requirements of the database systems in order to characterize the system processes exactly. Within these requirements, the definitions, tables and graphs needed to carry out the practical section of the thesis properly are also prepared. The practical process and analysis draw on the results of the research study. Finally, the data storage models and file storage implemented on different development platforms are compared. Based on a synthesis of the theoretical and practical knowledge, the final conclusions are formulated.

3. LITERATURE REVIEW

3.1 Technologies and definitions

In this chapter we define all terms and technologies needed to understand the rest of this paper. We start with basic definitions (e.g. what an XML document is), continue with characteristics of XML documents (e.g. what the depth of an XML document is) and their schemas, and finish with benchmark-related definitions (e.g. what an XMLMS is). Definitions also contain examples where appropriate.
Relational data storage, implemented in powerful database systems such as MS SQL Server, Oracle and MySQL, is able to meet practically all requirements placed on server machines. Systems ranging from online stores to banking automation are built on these platforms and operate successfully. High performance, reliability and advanced administration tools provide functionality and scalability across a wide range of tasks. However, the relational concept of data representation requires the data to be reduced to a relational structure: the stored objects are decomposed and placed into one table or a group of tables whose structure is fixed and unchangeable. As a result, such systems are applicable mainly to tasks over strictly structured data. There are solutions that generalise the relational model to store poorly structured data, but as a rule they lead to sharp losses of performance and to increased development and maintenance effort for the whole system, because they complicate the storage structure, partly give up the integrity controls provided by the server, and make queries much more complex.

Using XML notation as the basis of data representation within the storage lifts the limits of rigid data structuring and yields a store for heterogeneous data. This approach is used in servers such as Tamino, MarkLogic Server, Sedna and Timber. Besides, XML has become the de facto data representation standard in information systems. However, the effective use of XML in application systems is currently constrained, in particular by restrictions on multi-user access and by slow transactional mechanisms when working with large data files.

XML has several advantages over other languages and formats for describing data when exchanging data between applications:

Platform independence.
The XML language allows data to be exchanged between systems based on different platforms. An XML document can be created and parsed as a text file by means of legacy or built-in programming languages that include no special library for working with XML.

Support by producers. Libraries for working with XML exist for all leading programming languages and popular DBMSs. Using these libraries significantly reduces the amount of code needed when developing gateways between applications.

Self-documenting. An XML document is human-readable. Besides, the presence of the data description in the document makes it possible to create automatic processing programs, for example universal modules that load data arriving from different systems into a single repository.

Hierarchy. This is a key feature of the language. Unlike, for example, the CSV format (a text file with the delimiter ;), XML makes it easy to describe complex object structures with unlimited nesting.

Object orientation. The data structure of XML combines well with the object-oriented programming model. Each tag of an XML document can be mapped to a class or class property of the processing program. Conversely, each application object of the subject domain can be described in XML as a separate tag.

Extensibility. When using the XML format, new tags can be added. This does not lead to a fatal change of the data structure; the reading and writing programs simply need to be extended with classes or functions that recognise these tags.

Safe and efficient management of large volumes of data is a challenging task, which is traditionally solved by database management systems. When storing XML data, it is necessary to provide reliability, transactionality, recoverability, high availability, security, an effective search mechanism, scalability and support for modification.
All these requirements define the necessary tools and functionality of XML data storage systems and limit the applicability of existing technologies and resources.

3.1.1 RDBMS

Relational databases are widely used. They encapsulate the storage and data processing mechanisms, offering effective methods of structured data storage and fast query execution. XML, on the other hand, is a data format used for the exchange of semi-structured data between incompatible systems or applications. The applicability of relational databases is limited, and in certain task areas the advantages of an XML representation are clearly relevant in today's systems.

Let us consider the key differences between relational and XML data. Neither XML nor the relational format is the best solution for every problem. There are various data management needs for which the relational data model is insufficient and where the use of XML improves the characteristics of the solution, reduces its complexity and sometimes makes the task feasible at all.

In a relational database, data is stored in tables consisting of rows and columns. Each column stores data of a certain type for all records of the table, and each record is represented by a row. The order of the rows in a table is not associated with any ordering of the data, unlike XML, where the inherent document order affects, for example, the result returned by XPath functions such as position(). Only the simplest relational data can be stored in a single table; a typical relational database has many tables with complex logical relationships between them. Data in different tables are linked by keys. For example, the table Customers can have a field (or column) CustomerID. The orders of a particular customer are then easily identified by the corresponding value in the CustomerID column of the table Orders.
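The key-based link between Customers and Orders can be illustrated with a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for a relational DBMS, not the MySQL setup of this thesis. Apart from the table names Customers and Orders and the column CustomerID, every name and value below (Name, OrderID, Item, the sample rows, the function orders_for) is an assumption made for illustration.

```python
import sqlite3

# In-memory database, used only for this illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (
        CustomerID INTEGER PRIMARY KEY,
        Name       TEXT NOT NULL              -- hypothetical column
    );
    CREATE TABLE Orders (
        OrderID    INTEGER PRIMARY KEY,       -- hypothetical column
        CustomerID INTEGER NOT NULL REFERENCES Customers(CustomerID),
        Item       TEXT NOT NULL              -- hypothetical column
    );
    INSERT INTO Customers VALUES (1, 'Alice'), (2, 'Bob');
    INSERT INTO Orders VALUES (10, 1, 'Book'), (11, 1, 'Pen'), (12, 2, 'Lamp');
""")


def orders_for(customer_id):
    """Return (OrderID, Item) pairs of one customer, found through the key."""
    rows = conn.execute(
        # Rows have no inherent order, so the ordering is requested explicitly.
        "SELECT OrderID, Item FROM Orders WHERE CustomerID = ? ORDER BY OrderID",
        (customer_id,),
    )
    return rows.fetchall()


print(orders_for(1))
```

The explicit ORDER BY reflects the point made above: unlike document order in XML, the order of rows carries no meaning unless the query asks for one.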
Relationships between data can be one-to-one (for example, one son has only one father), one-to-many (one son has two parents; one user has several orders), or many-to-many (one item can appear in many orders, and each order can contain different goods). Each of these relationships can be represented by storing data in two or more related tables.

3.1.2 Native XML DB

A distinction is generally made between XML-enabled databases and native XML databases. A database is called XML-enabled if the data model of its storage and processing kernel is not the XML data model. In many cases its core is the relational model, which requires a mapping between the XML data model and the relational model. All relational database systems can be considered XML-enabled databases, because they support such a mapping for XML data management.

The term native XML database is used in different ways by different groups. A native XML database has the following three characteristics:

It defines a logical model for an XML document, and data is stored and retrieved according to this model. The model must include elements, attributes, PCDATA, and document order.

The XML document is the basic unit of logical storage.

No specific physical storage model is required. This means that it can be built on a relational, hierarchical or object-oriented database.

In particular, this definition allows a transformation from the XML data model to another model of data storage and processing, which is exactly what was defined above as an XML-enabled database. Thus, a native XML database is also required to have the following two properties:

The XML data model (XML Infoset) is the fundamental logical data model used in the database and is available to database users whenever the data type is XML.

The XML data model is the basic unit of physical storage of XML data, without mapping it to a different data model.

This brief definition means that XML is not just an extended data type; it is the way the data is processed, both logically and physically.
The schema of data represented in XML corresponds to the physical storage scheme on disk. This model is best suited to efficient search of XML data.

3.2 DATABASE PRODUCTS

3.2.1 MySQL database

3.2.2 eXist and Sedna databases

4. DATABASE BENCHMARKING

4.1 Storing XML in file systems

We should not forget that most XML documents are stored in file systems. The very notion of an XML document implies storing it on disk, just as you keep any other document on your desktop. Many applications never go further than this first step and always keep XML documents in the file system. Storing XML documents in a file system is simple and natural, not only because the term "XML document" implies it, but also because the hierarchical organisation of a file system is very similar to the hierarchical organisation of a document. There is a clear parallel between the syntax of a URL or file path and simple XPath expressions, so it looks quite natural to refer to the node "/bat/baz" in the document "/foo/bar.xml". Before moving on to "real" XML databases, let us consider the limitations of storing XML documents in file systems.

XML data is internally ordered, as in this simple example:

<... Value="123.45" Currency="US Dollars" />
<... Value="4500.12" Currency="US Dollars" />
<... Value="8000.00" Currency="US Dollars" />

4.2 Document size

It makes sense to store XML documents on disk when you need to work with small, static documents on the WWW. File systems can now effectively support gigabyte files, so, knowing the path to an XML document, you can efficiently access the information stored in it. An important factor is the granularity of the information to which access is required. If you always need the complete document, this approach works quite well. However, if you need to retrieve only a small part of a large document using DOM or XPath, you incur a huge overhead, because the whole document has to be read before the part you are interested in can be extracted.
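The cost of reaching a single node can be illustrated with a minimal sketch based on Python's standard xml.etree.ElementTree module. The element names Payments and Payment are placeholders assumed for this sketch and do not come from the thesis, and both function names are hypothetical. The first function builds the complete tree before selecting a node by its position in document order; the second consumes parse events and stops once the wanted element has been seen.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical document: "Payments" and "Payment" are placeholder names.
XML_TEXT = (
    "<Payments>"
    '<Payment Value="123.45" Currency="US Dollars" />'
    '<Payment Value="4500.12" Currency="US Dollars" />'
    '<Payment Value="8000.00" Currency="US Dollars" />'
    "</Payments>"
)


def second_value_dom(xml_text):
    """Parse the whole document into a tree, then pick one node by position."""
    root = ET.fromstring(xml_text)
    # Positions in the path expression are 1-based and follow document order.
    return root.find("Payment[2]").get("Value")


def second_value_streaming(xml_text):
    """Walk "end" events and return as soon as the second Payment is seen."""
    source = io.BytesIO(xml_text.encode("utf-8"))
    seen = 0
    for _event, elem in ET.iterparse(source):
        if elem.tag == "Payment":
            seen += 1
            if seen == 2:
                return elem.get("Value")
    return None


print(second_value_dom(XML_TEXT), second_value_streaming(XML_TEXT))
```

On a tiny input such as this one the two variants behave the same. The difference appears only with large files, where the streaming variant can return before the remainder of the input has been read.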
Also, you must not forget that these documents have to be parsed every time you access them through DOM or XPath. Of course, this consideration applies only to this type of work with documents. If all you need is to publish documents on the WWW without modifying or transforming them, it is better to prepare them in XML in advance.

4.3 Updates

Another important question that arises when storing XML documents on disk is updates. If you manually maintain a small set of well-formed XML documents on a desktop or web server, updates cause no difficulties. But once you need to allow updates by many users or, even worse, develop a transactional application, additional steps are needed to perform updates. One way of solving this problem is to store the documents in a WebDAV repository, which handles locking and concurrent access for you. If you are interested in this approach, you can try a version control system such as Subversion (http://subversion.tigris.org/). Subversion can work as a WebDAV repository and provides all the features of a version control system, including recording the full modification history of your documents. For many applications this is a very important capability, and it is one that is not directly supported by the databases considered in this thesis.

4.4 Description of soil sampling and sample preparation

4.5 Determination of pendimethalin in methanol extract

5. ANALYSIS

6. DISCUSSIONS

7. CONCLUSIONS

8. REFERENCES

Coronel, C., Morris, S. A., Rob, P. Database Systems: Design, Implementation, and Management. Cengage Learning, 2011. 692 p. ISBN 9780538469685
Chaudhri, A. B., Rashid, A., Zicari, R. (eds.). XML Data Management: Native XML and XML-Enabled Database Systems. Addison-Wesley Professional, 2003. 641 p. ISBN 9780201844528
EMC Education Services. Information Storage and Management: Storing, Managing, and Protecting Digital Information in Classic, Virtualized, and Cloud Environments. John Wiley & Sons, 2012. 528 p. ISBN 9781118236963
Vrana, I. Projecting of information systems with UML. CULS Prague, 2009. 150 p. ISBN 9788021319769
http://kavayii.blogspot.cz/2010/01/xml.html

Comparison of relational and XML data storage methods. Noronjon Qalandarov, CULS, Prague, 2014

Sunday, July 21, 2019
