Data has become a substantial source of corporate competitive advantage as information technology has dramatically changed industry structures and markets. The data model is the foundation on which companies strategically manage and utilize their data. Existing data models, however, are written for the technical purpose of developing and operating corporate databases, which isolates them from field users. When data designers work without user engagement, data requirements are misinterpreted and data modeling consumes additional time and cost. Research on automated data modeling has therefore been actively conducted so that users can take a proactive role in data modeling and companies can leverage data with greater agility. A data modeling system needs to automate the extraction and qualification of data objects, a process otherwise performed by experts. For decades, knowledge-based and rule-based approaches have been studied for extracting and identifying data objects. However, because these approaches rely heavily on previous results, they have been unable to reflect agile business requirements in the data model. Moreover, existing systems have limited field applicability because they are semi-automated: they qualify data objects by interacting with users who have no knowledge of data models.
In this thesis, we propose relationship-oriented data modeling automation (ROM), which fully automates data modeling from textual job descriptions written freely by field users. ROM requires neither the construction of a knowledge base, which consumes considerable time and money, nor strict restrictions on the job descriptions. ROM extracts object candidates from job descriptions, constructs a network that includes contextual information, and automatically qualifies data objects using the relationship information between objects. ROM also exploits a domain corpus to resolve the ambiguity of job descriptions. The domain corpus is constructed by transforming field vocabulary into context vectors using a neural network language model. In the final data object qualification step, we use a discrete choice model that includes relational variables such as centrality and structural holes, which are computed from the relations among objects in the contextual network. To evaluate the applicability of the proposed ROM, we also developed a pilot system. Experimental results show that ROM greatly improves the performance of data object qualification over conventional automation methods.
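As a rough illustration of the qualification step only, the following minimal sketch computes two relational variables on a small network of candidate terms and feeds them into a binary logit, one simple form of discrete choice model. It is not the thesis implementation. Several things here are assumptions made for illustration: candidates are linked when they co-occur in the same description, degree centrality stands in for "centrality", Burt's constraint stands in for the structural-hole measure (lower constraint means more structural holes), and the coefficients in `beta` are invented rather than estimated. Candidate extraction and the domain corpus are omitted, and all function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
import math


def build_network(descriptions):
    """Link two candidate terms whenever they appear in the same description."""
    adj = defaultdict(set)
    for terms in descriptions:
        for a, b in combinations(sorted(set(terms)), 2):
            adj[a].add(b)
            adj[b].add(a)
    return dict(adj)


def degree_centrality(adj, node):
    """Share of the other nodes that `node` is directly connected to."""
    n = len(adj)
    return len(adj[node]) / (n - 1) if n > 1 else 0.0


def constraint(adj, node):
    """Burt's constraint on an unweighted graph (low value = more structural holes)."""
    neighbours = adj[node]
    p = 1.0 / len(neighbours)  # equal investment in every neighbour
    total = 0.0
    for j in neighbours:
        # Indirect investment in j through other neighbours q that also link to j.
        indirect = sum(p / len(adj[q]) for q in neighbours if q != j and j in adj[q])
        total += (p + indirect) ** 2
    return total


def qualification_probability(adj, node, beta=(-1.0, 4.0, -2.0)):
    """Binary logit on the two relational variables; beta is invented for illustration."""
    b0, b_cent, b_cons = beta
    utility = b0 + b_cent * degree_centrality(adj, node) + b_cons * constraint(adj, node)
    return 1.0 / (1.0 + math.exp(-utility))


# Hypothetical candidate terms, one list per job-description sentence.
descriptions = [
    ["customer", "order"],
    ["order", "product"],
    ["order", "invoice"],
    ["customer", "address"],
]
adj = build_network(descriptions)
scores = {term: qualification_probability(adj, term) for term in adj}
for term, prob in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{term:10s} {prob:.3f}")
```

In this toy network, a term that connects otherwise unlinked terms is both more central and less constrained, so it receives a higher score than a term attached to a single neighbour. In practice the coefficients would be estimated from labeled data rather than fixed by hand.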