Enterprises handle growing volumes of unstructured data (electronic data not stored in a predefined structure, such as office documents, e-mail and web content), often kept in repositories whose structure offers limited efficiency and accessibility. Moreover, the internal structure of files is usually not standardised and may be inefficient in terms of information retrieval and reusability.

According to international studies, more than 85% of business data is unstructured. The advent of web content, and the need to use the web channel proactively in the market, has further increased the need to manage unstructured content efficiently. The volume of information is growing so rapidly that it is becoming unmanageable (info glut).

The increasing need to handle business information efficiently in a highly competitive environment has driven business efforts to improve ways of storing, retrieving, analyzing and reusing unstructured data. All such efforts aim to develop a meaningful structure that can accommodate unstructured data; in other words, to convert unstructured data into semi-structured data. Semi-structured data has a higher degree of structure than unstructured data: it does not use the highly granular structure of fields in a relational database table, but neither is it kept in a loosely and ineffectively structured repository.

Traditionally, the techniques and technologies used to handle structured data (DBMS, SQL) were incompatible with those used to handle unstructured data (file servers, content management systems, collaboration tools). The term Business Intelligence stems from the structured world, while the terms Knowledge Management and Content Management stem from the unstructured world. The combined retrieval and analysis of information (e.g. for a customer) from both structured and unstructured data has traditionally been carried out manually.

However, the term business intelligence no longer refers exclusively to the structured data world. Structured and unstructured data technologies are currently converging.

The introduction of a central data repository can mitigate the negative effects of information silos. The same rule applies to structured and unstructured data.

To develop a structure for handling unstructured data, an information model needs to be developed. This model has to accommodate the needs of different user groups (customers, information users, content authors) while being structured meaningfully, e.g. per product line or per business process.

The use of DTDs (Document Type Definitions) or XML schemas to structure content internally by introducing semantic tags can greatly enhance the ability to retrieve and reuse information hidden in documents.
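
As a rough illustration of why semantic tags help retrieval, the sketch below queries a small tagged document with Python's standard xml.etree.ElementTree module. The tag and attribute names (datasheet, product, line, warranty) and the helper find_products are hypothetical, chosen only for this example; they do not come from any particular DTD or schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical semantically tagged document; all tag names are illustrative.
DOC = """\
<datasheet>
  <product line="printers">
    <name>Model A</name>
    <warranty unit="months">24</warranty>
    <summary>Compact office printer.</summary>
  </product>
  <product line="scanners">
    <name>Model B</name>
    <warranty unit="months">12</warranty>
    <summary>Flatbed scanner for archives.</summary>
  </product>
</datasheet>
"""


def find_products(xml_text, line):
    """Return the products of one product line as plain dictionaries."""
    root = ET.fromstring(xml_text)
    results = []
    # The attribute predicate selects only products tagged with this line.
    for product in root.findall(f"./product[@line='{line}']"):
        results.append({
            "name": product.findtext("name"),
            "warranty_months": int(product.findtext("warranty")),
            "summary": product.findtext("summary"),
        })
    return results


printers = find_products(DOC, "printers")
print(printers)
```

Because the warranty period carries its own tag, it can be extracted and reused directly (for instance in a product comparison) without scanning free text for the figure.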

The use of the RSS (Rich Site Summary) standard has been expanding on the Web to describe the content of sites, especially content that is updated frequently (e.g. news). RSS allows site syndication, an approach to sharing content on the web that increases its accessibility and diffusion.
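
To make the publishing side of syndication concrete, here is a minimal sketch that builds an RSS 2.0 feed with Python's standard library. The helper build_feed, the site name and the example.com URLs are assumptions made for illustration; a production feed would normally carry more elements (publication dates, unique identifiers).

```python
import xml.etree.ElementTree as ET


def build_feed(site_title, site_link, site_description, entries):
    """Return an RSS 2.0 document as a string.

    entries is an iterable of (title, link, description) tuples.
    """
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # A channel describes the site as a whole.
    ET.SubElement(channel, "title").text = site_title
    ET.SubElement(channel, "link").text = site_link
    ET.SubElement(channel, "description").text = site_description
    # Each item describes one piece of frequently updated content.
    for title, link, description in entries:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = title
        ET.SubElement(item, "link").text = link
        ET.SubElement(item, "description").text = description
    return ET.tostring(rss, encoding="unicode")


# Hypothetical site and news entry, used only to exercise the function.
feed_xml = build_feed(
    "Example News",
    "https://example.com/",
    "Hypothetical company news feed",
    [("New product line announced",
      "https://example.com/news/1",
      "Short summary of the announcement.")],
)
print(feed_xml)
```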

New tools allow the automatic gathering of filtered news: RSS news readers are expected to partially replace web browsers in the future, since they allow the expedited retrieval of information customised to the user's needs.
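
The filtering step performed by such a reader can be sketched as follows, assuming the feed has already been downloaded as text. The sample feed, the keyword list and the helper filter_items are hypothetical; real news readers also handle fetching, scheduling and other feed formats.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 feed, standing in for a downloaded document.
FEED = """\
<rss version="2.0">
  <channel>
    <title>Example News</title>
    <link>https://example.com/</link>
    <description>Hypothetical feed</description>
    <item>
      <title>Quarterly results published</title>
      <link>https://example.com/news/1</link>
      <description>Revenue figures for the last quarter.</description>
    </item>
    <item>
      <title>New content management release</title>
      <link>https://example.com/news/2</link>
      <description>Improved search over office documents.</description>
    </item>
  </channel>
</rss>
"""


def filter_items(feed_xml, keywords):
    """Return (title, link) pairs for items mentioning any keyword."""
    wanted = [k.lower() for k in keywords]
    root = ET.fromstring(feed_xml)
    matches = []
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        description = item.findtext("description", default="")
        # Case-insensitive substring match over title and description.
        text = f"{title} {description}".lower()
        if any(k in text for k in wanted):
            matches.append((title, item.findtext("link", default="")))
    return matches


hits = filter_items(FEED, ["content management", "search"])
print(hits)
```

Plain substring matching is the simplest possible filter; it is used here only to show how a feed's predictable structure makes customised retrieval straightforward.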

Copyright Pleroforea.com. All rights reserved.