Unlocking the secrets in unstructured big data

Have you wanted to run predictive analytics on your business, only to find that much of the data you need exists only in documents rather than in a business-intelligence-friendly database? Have you considered how access to that data in a meaningful format would help keep your business competitive?

Zendeux Business Data Solutions, at the request of a number of its customers, researched how this could be accomplished. Based on that research and development work, we now provide a solution that allows our customers to take full advantage of unstructured data and obtain extraordinary business value from it.

We developed a repeatable, standard process that converts unstructured data into a relational ERD (Enterprise Relational Database) for use in applications such as predictive analytics. The process applies to all types of business data, across all aspects of a business that wants a competitive advantage. It differs somewhat from the processes a company typically already uses to create its relational databases. While it is somewhat more complex, we have found it to be both cost effective and valuable for our customers.

Our process, known as Document Data Management™ (DDM), allows rapid analysis of unstructured documents, generic text, internet feeds, log files, and other text-based data sources. As a result, we are able to provide abstracted, normalized data models. The data models created from these various types of unstructured data can then be combined with data residing in traditionally created relational databases to provide a 360-degree view and a comprehensive historical picture of all aspects of business operations. The result is a CDM (canonical or common data model) that allows this integrated data to be used by multiple operational applications across an organization, not just by single-purpose business intelligence applications.

Because the data is derived from unstructured document data, the relational model must carry full traceability back to the original source documents, and that traceability must be preserved through any ETL process that loads the data. This is an absolute requirement for data based on legal documents as well as for data required for regulatory compliance. An auditor needs to be able to trace data back to source documents to validate its accuracy.
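One way to picture this requirement is a record that carries a pointer back to the exact span of text it came from. The following is a minimal sketch, not Zendeux's implementation; the names SourceRef, ExtractedValue, extract, and verify, as well as the sample document text, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SourceRef:
    """Hypothetical pointer to a span of text in a source document."""
    document_id: str
    start: int  # character offset where the value begins
    end: int    # character offset just past the value


@dataclass(frozen=True)
class ExtractedValue:
    """A relational attribute value plus the span it was taken from."""
    entity: str
    attribute: str
    value: str
    source: SourceRef


def extract(documents: dict, document_id: str, entity: str,
            attribute: str, needle: str) -> Optional[ExtractedValue]:
    """Locate `needle` in the document and record where it was found."""
    start = documents[document_id].find(needle)
    if start < 0:
        return None
    ref = SourceRef(document_id, start, start + len(needle))
    return ExtractedValue(entity, attribute, needle, ref)


def verify(item: ExtractedValue, documents: dict) -> bool:
    """Audit check: the stored value must equal the text at its source span."""
    text = documents.get(item.source.document_id)
    if text is None:
        return False
    return text[item.source.start:item.source.end] == item.value


# Illustrative document text (assumed, not taken from a real deed).
documents = {"deed-001": "Borrower: Jane Doe. Loan amount: 250000."}

borrower = extract(documents, "deed-001", "Borrower", "name", "Jane Doe")
tampered = ExtractedValue("Borrower", "name", "John Doe", borrower.source)
```

Here verify(borrower, documents) succeeds because the stored value matches the source span, while the tampered record fails the same check. Storing offsets alongside each value is one simple design; page numbers or bounding boxes would serve the same purpose for scanned documents.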

Zendeux’s DDM (Document Data Modeling™) leverages data mining and text analytics techniques to rapidly capture data attributes within a document type that can be generalized to an entity type. The process uses index data profiling and analysis to capture data type, format, completeness (Mandatory, Optional), and context-based relevance. Once the data is extracted, we apply standard database modeling best practices to identify database elements such as Entity Type, Subtype, and Cardinality.
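The profiling step can be illustrated with a small sketch. This is an assumption about how such profiling might look in general, not a description of Zendeux's tooling: the function names, the type-inference rules, the format mask convention (digits become 9, letters become A), and the sample records are all hypothetical.

```python
import re
from collections import Counter


def infer_type(value: str) -> str:
    """Classify a raw string with a few simple, illustrative rules."""
    if re.fullmatch(r"-?\d+", value):
        return "integer"
    if re.fullmatch(r"-?\d+\.\d+", value):
        return "decimal"
    if re.fullmatch(r"\d{4}-\d{2}-\d{2}", value):
        return "date"
    return "text"


def format_mask(value: str) -> str:
    """Reduce a value to its shape: digits become 9, letters become A."""
    return re.sub(r"[A-Za-z]", "A", re.sub(r"\d", "9", value))


def profile_attributes(records: list) -> dict:
    """Profile each attribute seen across records of one document type."""
    names = {name for record in records for name in record}
    profile = {}
    for name in sorted(names):
        values = [r[name] for r in records if r.get(name)]
        types = Counter(infer_type(v) for v in values)
        masks = Counter(format_mask(v) for v in values)
        profile[name] = {
            "type": types.most_common(1)[0][0],
            "format": masks.most_common(1)[0][0],
            # Present in every record: Mandatory; otherwise Optional.
            "completeness": "Mandatory" if len(values) == len(records)
                            else "Optional",
        }
    return profile


# Hypothetical attribute values extracted from three documents of one type.
records = [
    {"apn": "1234-567-890", "loan_amount": "250000", "recorded": "2015-03-02"},
    {"apn": "2234-111-222", "loan_amount": "410000", "recorded": "2015-04-17",
     "rider": "Condominium"},
    {"apn": "3234-999-000", "loan_amount": "99500", "recorded": "2015-05-30"},
]

profile = profile_attributes(records)
```

In this sample, rider appears in only one of the three records and is therefore profiled as Optional, while the other attributes are Mandatory. A real profile would also need the context-based relevance mentioned above, which this sketch does not attempt.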

County Document Example

In the example above, using Zendeux's Document Data Modeling techniques, we identified 20 entities, 157 attributes, and 23 relationships for Deeds of Trust in LA County.

The technology and tools Zendeux employs, as shown in the figure below, provide usable data extracted from documents as well as an audit trail that allows traceability back to all original source documents. This traceability allows the entire end-to-end process to be validated and audited for accuracy. It is an important capability that helps gain project buy-in from previously skeptical business users who have not yet been exposed to this groundbreaking document conversion process.

Document Data Management Flow

We offer a free ROI analysis to show how this process can help make your company more competitive as well as more profitable.

To learn more about our data migration and integration services, please feel free to contact us directly at info@zendeux.com or visit our site www.zendeux.com.
