Toronto Metropolitan University
Exploring PSI-MI XML Collections Using DescribeX

journal contribution
posted on 2023-09-13, 16:40, authored by Reza Samavi, Mariano Consens, Shahan Khatchadourian, Thodoros Topaloglou

PSI-MI has been endorsed by the protein informatics community as a standard XML data exchange format for protein-protein interaction datasets. While many public databases support the standard, there is a degree of heterogeneity in the way different data providers interpret and instantiate the proposed XML schema. Analysis of schema instantiation in large collections of XML data is a challenging task that is unsupported by existing tools. In this study, we use DescribeX, a novel visualization technique for (semi-)structured XML formats, to quantitatively and qualitatively analyze PSI-MI XML collections at the instance level. The goal is to gain insight into schema usage and to study specific questions such as the adequacy of controlled vocabularies, the detection of common instance patterns, and the evolution of different data collections. Our analysis shows that DescribeX enhances understanding of the instance-level structure of PSI-MI data sources and is a useful tool for standards designers, software developers, and PSI-MI data providers.

Language

English

    Computer Engineering
