Finding The Treasure In Unstructured Big Data

Have you wanted to run predictive analytics on your business but found that a vast amount of the data required is unavailable? Most of this valuable information is locked up as unstructured data in document images or hard-copy forms. Have you considered the impact that access to this data in a meaningful format would have on keeping your business competitive?

Zendeux Business Data Solutions, at the request of a number of our customers, researched how this unstructured data could be put to use. Our research and development investigation showed that there is a solution that allows businesses to take full advantage of unstructured data and obtain extraordinary business value from it.

To be successful, there must be a repeatable, standard process that converts unstructured data into an enterprise relational database (ERD) for use in applications such as predictive analytics. This process applies to all types of business data across all aspects of a business that wants a competitive advantage. It is somewhat different from the processes used to create a company's existing relational databases. While the process is somewhat more complex, our research showed that it can be both cost-effective and valuable for our customers.

This process allows for rapid analysis of unstructured documents, generic text, internet feeds, log files, and other text-based data sources. Using the process, IT staff will be able to provide abstracted, normalized data models. The data models created from the various types of unstructured data can then be combined with data residing in traditionally created relational databases to provide a 360-degree view and a comprehensive historical picture of all aspects of business operations. As a result, a CDM (canonical or common data model) can be created, allowing this integrated data to be used by multiple operational applications across an organization, beyond single-purpose business intelligence applications.

Because the data is derived from unstructured document data, the relational model needs to have full traceability back to the original source documents that can be used in an ETL process. This is an absolute requirement for data derived from legal documents as well as for data required for regulatory compliance. An auditor needs to be able to trace data back to source documents to validate its accuracy.
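One way to picture this traceability requirement is to store, alongside every extracted value, a reference to the document and page it came from plus a fingerprint of the source content. The following is a minimal sketch only, not a description of the Zendeux implementation; the names SourceRef, ExtractedValue, extract, and verify, and the sample document text, are hypothetical and chosen for illustration.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceRef:
    """Pointer back to the original source document (hypothetical structure)."""
    document_id: str
    page: int
    sha256: str  # fingerprint of the source content at extraction time


@dataclass(frozen=True)
class ExtractedValue:
    """One attribute value together with its lineage."""
    entity: str
    attribute: str
    value: str
    source: SourceRef


def fingerprint(content: bytes) -> str:
    """Return a SHA-256 hex digest of the source content."""
    return hashlib.sha256(content).hexdigest()


def extract(document_id: str, page: int, content: bytes,
            entity: str, attribute: str, value: str) -> ExtractedValue:
    """Wrap an extracted value with a reference to where it came from."""
    ref = SourceRef(document_id, page, fingerprint(content))
    return ExtractedValue(entity, attribute, value, ref)


def verify(item: ExtractedValue, content: bytes) -> bool:
    """Audit check: does the stored fingerprint still match the source?"""
    return item.source.sha256 == fingerprint(content)


if __name__ == "__main__":
    # Assumed sample content; a real pipeline would read the scanned document.
    page_text = b"DEED OF TRUST. Loan amount: 250000"
    item = extract("D001", 2, page_text, "Loan", "loan_amount", "250000")
    print(item.source.document_id, item.source.page, verify(item, page_text))
```

Carrying the reference with each value, rather than in a separate log, means a row loaded through ETL can still be traced back to its source without a lookup in another system.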

Handling unstructured data requires data mining and text analytics techniques that rapidly capture the data attributes within a document type that can be generalized to an entity type. The process should include index data profiling and analysis to capture data type, format, completeness (Mandatory, Optional), and context-based relevance. When the data is extracted, standard database modeling best practices should be used to identify database elements such as Entity Type, Subtype, and Cardinality.
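The profiling step described above can be sketched in a few lines. This is an illustrative assumption about how such profiling might look, not the tooling Zendeux uses: the sample records, the field names, and the functions infer_type, format_mask, and profile are all hypothetical, and context-based relevance is left out because it depends on domain rules.

```python
import re
from collections import Counter

# Hypothetical attribute values already extracted from three documents.
records = [
    {"doc_id": "D001", "loan_amount": "250000",
     "recording_date": "2011-03-14", "trustee": "Acme Title"},
    {"doc_id": "D002", "loan_amount": "410500",
     "recording_date": "2012-07-02", "trustee": ""},
    {"doc_id": "D003", "loan_amount": "98000",
     "recording_date": "2010-11-30"},
]


def infer_type(value):
    """Guess a simple data type from the text of a value."""
    if re.fullmatch(r"-?\d+", value):
        return "integer"
    if re.fullmatch(r"-?\d+\.\d+", value):
        return "decimal"
    if re.fullmatch(r"\d{4}-\d{2}-\d{2}", value):
        return "date"
    return "text"


def format_mask(value):
    """Describe the format: digits become 9, letters become A."""
    return re.sub(r"[A-Za-z]", "A", re.sub(r"\d", "9", value))


def profile(rows):
    """Profile each attribute: data type, format, and completeness."""
    names = sorted({name for row in rows for name in row})
    result = {}
    for name in names:
        values = [row.get(name, "") for row in rows]
        present = [v for v in values if v.strip()]
        types = Counter(infer_type(v) for v in present)
        masks = Counter(format_mask(v) for v in present)
        result[name] = {
            "data_type": types.most_common(1)[0][0] if present else "unknown",
            "format": masks.most_common(1)[0][0] if present else "",
            # Mandatory only if every document supplied a non-empty value.
            "completeness": ("Mandatory" if len(present) == len(values)
                             else "Optional"),
            "fill_rate": len(present) / len(values),
        }
    return result


if __name__ == "__main__":
    for name, stats in profile(records).items():
        print(name, stats)
```

A profile like this gives the data modeler a starting point for deciding which attributes belong to which entity type and which can be null in the resulting relational model.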

County Document Example

In the county document example above, using unstructured data modeling techniques, we identified 20 entities, 157 attributes, and 23 relationships for Deeds of Trust in LA County.

As shown in the figure below, the technologies and tools Zendeux employs provide usable data extracted from documents as well as an audit trail that allows traceability back to all original source documents. This traceability allows the entire end-to-end process to be validated and audited for accuracy. That capability helps gain project buy-in from previously skeptical business users who have not yet been exposed to this groundbreaking document conversion process.

Document Data Management Flow

Zendeux offers a free ROI analysis to show how this process can help make your company more competitive as well as more profitable.

To learn more about our data migration and integration services, please feel free to contact us directly at 805-222-5840 or visit our site at http://www.zendeux.com.