Penko Ivanov, Elitsa Pavlova
The Financial Times (Bulgaria)
https://doi.org/10.53656/math2023-4-2-enr
Abstract. This paper presents a framework for structuring large document stores with the help of intelligent metadata. The described landscape includes a proprietary knowledge graph that ingests millions of concepts from external, third-party data providers and accommodates internal class taxonomies; an NLP service for automated annotation of textual data; an annotation quality control mechanism; tools for knowledge graph ontology and concept management; and an extensive API layer. The authors present an approach they have tested and proven successful in one of the world's leading media companies, whose media content is a core data asset. The proposed solutions enable content analytics in its proper context and allow explicit and implicit connections between the content and other company data, such as user (media content consumer) data. The latter enables the efficient application of advanced analytical models for search and recommendations and the implementation of accurate data-driven virtual assistants.
The paper advises addressing metadata quality concerns, which the authors' extensive practice has identified as an essential prerequisite for applied analytics that delivers significant business value.
Keywords: software engineering; AI; data science; machine learning; NLP; metadata; knowledge graphs; ontology; metadata quality; business analytics