Unlocking the secrets in unstructured big data

Have you wanted to do predictive analytics about your business but a vast amount of the data required to do this exists only in documents, not in a business intelligence friendly database? Have you considered all the business management impacts that access to this data in a meaningful format would have on keeping your business competitive?

Zendeux Business Data Solutions, at the request of a number of its customers, researched how this could be accomplished. Based on our research and development, we now provide a solution that lets our customers take full advantage of unstructured data and obtain extraordinary business value from it.

We developed a repeatable, standard process that converts unstructured data into a relational ERD (Enterprise Relational Database) for use in such applications as predictive analytics. This process applies to all types of business data across all aspects of a business that wants to have a competitive advantage. This process is somewhat different from the processes to create relational databases that exist in a company prior to applying our process. While the process is somewhat more complex, we have found it to be both cost effective and valuable for our customers.

Our process, known as Document Data Management™ (DDM), allows rapid analysis of unstructured documents, generic text, internet feeds, log files, and other text-based data sources. As a result, we are able to provide abstracted, normalized data models. These data models, created from the various types of unstructured data, can then be combined with data residing in traditionally created relational databases to provide a 360-degree view and a comprehensive historical picture of all aspects of business operations. The result is a CDM (canonical or common data model) that allows this integrated data to be used by multiple operational applications across an organization, beyond single-purpose business intelligence applications.

Because the data is derived from unstructured document data, the relational model needs to have full traceability back to the original source documents that can be used in an ETL process. This is an absolute requirement for data based on legal documents as well as for data required for regulatory compliance. An auditor needs to be able to trace data back to source documents to validate its accuracy.
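As a minimal sketch of what such traceability could look like in a data model, each extracted value can carry a pointer back to the exact location it came from. The class and field names below (`SourceRef`, `ExtractedValue`, `document_id`, `char_start`, and so on) are hypothetical illustrations, not part of Zendeux's tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceRef:
    """Where a value came from (all field names are illustrative assumptions)."""
    document_id: str  # identifier of the original source document
    page: int         # page on which the value was found
    char_start: int   # start offset of the value within the page text
    char_end: int     # end offset (exclusive)


@dataclass(frozen=True)
class ExtractedValue:
    """One attribute value in the relational model, with its provenance."""
    entity: str
    attribute: str
    value: str
    source: SourceRef


def audit(value: ExtractedValue, page_text: str) -> bool:
    """Return True if the stored value matches the text at its recorded location."""
    src = value.source
    return page_text[src.char_start:src.char_end] == value.value


# Example: an auditor re-checks one value against the page it was extracted from.
page_text = "Loan amount: $250,000 recorded 2012-03-14"
loan = ExtractedValue(
    entity="DeedOfTrust",
    attribute="loan_amount",
    value="$250,000",
    source=SourceRef(document_id="DOC-001", page=1, char_start=13, char_end=21),
)
is_valid = audit(loan, page_text)
```

Storing the location alongside the value means an audit does not depend on re-running the extraction; the check is a direct comparison against the source text.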

Zendeux’s DDM (Document Data Modeling™) leverages data mining and text analytics techniques to rapidly capture data attributes within a document type that can be generalized to an entity type. The process uses index data profiling and analysis to capture data type, format, completeness (Mandatory, Optional), and context-based relevance. When the data is extracted, we apply standard database modeling best practices to identify database elements such as Entity Type, Subtype, and Cardinality.
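The profiling step described above can be illustrated with a small sketch. It assumes the attributes have already been extracted into one dictionary per document, and it uses deliberately simple rules: an attribute is Mandatory if every document has it, and its type is "number" if every value parses as a number. The function names and rules are assumptions for illustration, not the DDM implementation.

```python
from collections import defaultdict


def infer_type(values):
    """Classify a list of string values as 'number' or 'text' (simplified rule)."""
    def is_number(text):
        try:
            float(text.replace(",", ""))
            return True
        except ValueError:
            return False

    return "number" if all(is_number(v) for v in values) else "text"


def profile_attributes(documents):
    """Profile attributes across documents of a single document type.

    `documents` is a list of dicts mapping attribute name -> extracted string.
    Returns, per attribute, an inferred data type and its completeness.
    """
    seen = defaultdict(list)
    for doc in documents:
        for name, value in doc.items():
            if value is not None and value != "":
                seen[name].append(value)

    profile = {}
    for name, values in seen.items():
        profile[name] = {
            "data_type": infer_type(values),
            # Present in every document -> Mandatory, otherwise Optional.
            "completeness": "Mandatory" if len(values) == len(documents) else "Optional",
        }
    return profile


# Example with two hypothetical extracted documents.
docs = [
    {"loan_amount": "250,000", "trustee": "ABC Title"},
    {"loan_amount": "410,500"},
]
result = profile_attributes(docs)
```

In this example `loan_amount` is profiled as a Mandatory number and `trustee` as Optional text, which is the kind of evidence a modeler would use when deciding attribute definitions for an entity type.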

County Document Example

In the example above, using Zendeux’s Document Data Modeling techniques, we identified 20 entities, 157 attributes, and 23 relationships for Deeds of Trust in LA County.

The technology and tools Zendeux employs, as shown in the figure below, provide usable data extracted from documents as well as an audit trail that allows traceability back to all original source documents. This traceability allows the entire end-to-end process to be validated and audited for accuracy. This is an important capability that helps gain project buy-in from previously skeptical business users who have not yet been exposed to this groundbreaking document conversion process.

Document Data Management Flow

We offer a free ROI analysis to show how this process can help make your company more competitive as well as more profitable.

To learn more about our data migration and integration services, please feel free to contact us directly at info@zendeux.com or visit our site www.zendeux.com.

Talend Migration Projects Using Zendeux Fast Track Integration Services

Often, in data integration and data migration projects, there are unaccounted for hidden costs that creep up and cost businesses a significant amount of money as well as implementation delays. Our experience working with a number of Fortune 500 companies shows that one must account for all costs associated with these projects beyond standard ETL functionality to measure true ROI.

As part of any data integration project Zendeux manages, our Talend experience, while critical, is only one part of our overall data integration strategy. Our project management philosophy brings a holistic approach to project management. We take into account all aspects of successful data management.

Using a best of breed data integration tool such as Talend without attention to what data should be considered and how that data should be normalized and validated will not create the quality result needed in a data integration project. Determining what information assets should be used and how they should be tailored during the data integration project is a critically important part of the data integration project, perhaps more so than what tool is used to move the data.

To this end, our data migration and integration services are designed to save our clients money related to resourcing, project time, and indirect costs with a focus on what data should be moved and how it should be represented in the target application. Once we measure those costs, we determine the most effective approach to ensure positive ROI for their project due to cost savings and tangible business benefits.

A data consolidation project that we successfully completed for a national real estate investment company illustrates this. Their goal was to integrate disparate real estate information from companies that they had acquired in major US real estate markets.

Prior to engaging us, the client looked at the consolidation process purely as a standard data migration project. After Zendeux provided an initial analysis of the data contained in these databases, the client realized there was much more work required beyond simple one to one data migration. Our analysis showed that these databases used different metadata definitions and data formats to represent such things as property value, legal title and property description. Just migrating data as is to a single data repository would not have provided accurate data fit for its business purpose, a single source of truth in a consolidated national real estate database.
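To illustrate why a one-to-one copy is not enough, here is a minimal sketch of normalizing one field, property value, from two legacy sources into a single canonical representation. The source names (`legacy_a`, `legacy_b`) and their formats are invented for the example; the actual client systems and formats are not described in this post.

```python
from decimal import Decimal


def value_from_formatted_string(raw):
    """Hypothetical source A stores values as strings such as '$1,250,000'."""
    return Decimal(raw.replace("$", "").replace(",", "").strip())


def value_from_thousands(raw):
    """Hypothetical source B stores values as a number of thousands of dollars."""
    return Decimal(str(raw)) * 1000


# One normalization rule per source system (names are assumptions).
NORMALIZERS = {
    "legacy_a": value_from_formatted_string,
    "legacy_b": value_from_thousands,
}


def normalize_property_value(source_system, raw):
    """Convert a source-specific property value into canonical whole dollars."""
    try:
        normalizer = NORMALIZERS[source_system]
    except KeyError:
        raise ValueError(f"no normalization rule for source {source_system!r}")
    return normalizer(raw)


# The same property, represented differently in each source, maps to one value.
value_a = normalize_property_value("legacy_a", "$1,250,000")
value_b = normalize_property_value("legacy_b", 1250)
```

Raising an error for an unknown source, rather than passing the raw value through, keeps unnormalized data out of the consolidated repository.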

Before engaging Zendeux, they planned to hire one or two in-house full time resources to augment their IT team. They would be responsible for implementing a simplified data migration process. This process would consolidate the many and disparate legacy databases into a single national data repository within a SalesForce.com CRM application. The consolidated data would then be loaded into a new real estate application under development as part of this project.

Our comparative analysis of their in-house employee approach versus our fast track services showed them how we could reduce the project timeline by 29 months and save them $408,000 in SalesForce.com subscription fees alone! Overall, we achieved a total savings of $570,000. We accomplished this by eliminating the time-consuming work of repairing data problems in real time during the ETL project. Our methodology allowed them to avoid this problem.

This approach is something we offer to all of our prospective clients. The presale analysis and ROI analysis are completely free of cost. If we cannot show you an ROI, then we will at least leave you with a great cost analysis!


300% ROI in Data Migration Project Using Zendeux Data Integration Services

In almost every IT project, there are hidden costs that are not accounted for, and they creep up and cost businesses a significant amount of money. During my previous roles as an executive at several Fortune companies, I realized that one must correctly account for all costs associated with an IT project to measure true ROI.

When we established Zendeux years ago, our basic premise and philosophy in doing business was to bring tangible value to our clients.  This is mainly because I had played the role of the client to so many vendors and consulting companies that tried to sell me benefits only, leaving me to guess the value!  That is why we always begin our engagements with understanding the true business objective and tangible value while measuring the direct and indirect costs of the project.

Zendeux Fast Track Data Migration Services


To this end, our data migration and integration services are designed to save our clients money related to resourcing, project time, and indirect costs. Once we measure those costs, we then come up with the most effective approach to ensure positive ROI in their project due to cost savings and/or tangible business benefits.

I like to illustrate the success of this approach using a recent project we did for a large national real estate investment company with offices in all major US cities. Before becoming our client, they were planning to hire one or two in-house full-time resources to augment their IT team. These new employees were intended to help migrate data from seven different legacy systems to Salesforce.com and from there into their brand new in-house custom CRM system. These systems were scattered across 25 locations nationally.

To untrained eyes, hiring in-house resources certainly looks more cost effective than hiring an outside vendor to do the job. However, working together with the client, our joint analysis of their cost and timeline estimates using the in-house approach versus our fast track services showed them how we could reduce the project timeline by 29 months and save them $408,000 in Salesforce subscription fees alone! Overall, we showed a saving of about $570,000 and 300% ROI.

We gave them two fast track options, one with us doing the entire data migration project, or a hybrid model with a lower cost to work with their existing resources to complete the job in 6 months instead of 35 months!
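The figures quoted in this post can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers stated in the post; the implied monthly subscription figure additionally assumes that the subscription savings accrue evenly over the months saved, which the post does not state.

```python
# Figures quoted in the post.
in_house_months = 35
fast_track_months = 6
subscription_savings = 408_000
total_savings = 570_000  # the post says "about $570,000"

# Timeline reduction and the part of the savings not due to subscriptions.
months_saved = in_house_months - fast_track_months
other_savings = total_savings - subscription_savings

# Assumption: subscription savings are spread evenly across the months saved.
implied_monthly_subscription = subscription_savings / months_saved

print(months_saved)                            # 29
print(other_savings)                           # 162000
print(round(implied_monthly_subscription, 2))  # 14068.97
```

The 29-month reduction matches the difference between the 35-month and 6-month timelines, and roughly $162,000 of the total savings comes from sources other than subscription fees.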

We finished the job on time and on budget! By reducing the project timeline, we significantly reduced the cost of interim license fees and project resources. Our project management also significantly reduced intangible business costs by avoiding a lengthy deployment of a new CRM system.

This approach is something we offer to all of our prospective clients. The presale analysis and ROI analysis are completely free of cost. If we cannot show you an ROI, then we will at least leave you with a great cost analysis!


Zendeux and Project Management in a very large Public Utility in California

I am a project manager in a very large public utility in California and work on infrastructure and compliance projects. Having attended and achieved the Zendeux Blue Belt in Information & Data Governance frameworks, I can say without reservation that working in such a large corporate environment armed with the knowledge provided by the Blue Belt makes my work in the Cyber Security domain that much more productive and enjoyable. Stakeholders from SMEs (Subject Matter Experts) to administrators to senior C-level executives recognize and acknowledge the quality of work delivered.

If you are a project manager interested in boosting your career with dual afterburners, check out the Zendeux Blue Belt….it rocks…

Thanks Zendeux… you guys rock!!! 😀


Mits Shinohara

What Does Document Data Management Have to do with Big Data?

Document Data Management is the discipline that consists of processes, tools, and techniques that are used to define, model, discover, extract, integrate, standardize, normalize, report, and govern the data embedded within documents.  This should not be confused with Document Management which is mainly used to manage the actual documents within an organization.  Document management is more interested in the original document and preservation of the document while being able to locate or classify it as needed.

To that end, document management uses metadata to describe documents using several simple document attributes that help users classify and find them easily. This approach is similar to how books are cataloged in a library. However, Document Data Management is more interested in the content or data within the documents than in their classification or archiving.

It is said that 70 to 80 percent of the data inside an organization is actually unstructured. That means this type of data is not usually stored in tables or even spreadsheets and cannot easily be abstracted into attributes and fields. Unstructured data is deeply embedded in the text of many document types such as invoices, purchase orders, sales contracts, maintenance narratives, and many more. Sometimes these documents are actually stored within databases as long unstructured text such as notes, comments, support case narratives, doctors’ notes, and others.

With the emergence of big data and the availability of tools and techniques to store, search, and mine unstructured data, there has been a great increase in demand for discovering and extracting this type of data from documents. Companies are now trying to extract valuable data from huge volumes of call center cases to understand customer sentiment, identify product defects, detect fraud, and gain many more powerful insights.

However, so far all such efforts have been quite organic and limited to data mining rather than data management. The main reason for this limitation is that there have been no well-defined disciplines or methodologies that describe how unstructured data within documents should be managed.

Document Data Management as a discipline tries to address this void by providing the methods, techniques, processes, and in short the science of managing document data. I began working with document data in the late 1990s, when I was building my first form processing software, called FormBase, which extracted key data points from printed forms and stored them in a database. It could also identify the type of form among many different types and archive it in the correct location inside a document management system.

Today we are dealing with large volumes of fully unstructured data such as real estate county records, medical records, oil and gas maintenance records, and other exciting but challenging unstructured data to manage. But this time around, we are trying to build the discipline, methods, processes, and tools to allow us to define, model, extract, integrate, report, and govern document data. The goal is to turn unstructured document data into a structured form where it can be fully integrated into the rest of the enterprise data, which can then be used in operations and business intelligence. Imagine enhancing the information breadth, depth, and insight of an organization by adding data that was until now untapped.

In my next blog post I will write more about what each aspect of document data management entails and how organizations can incorporate this discipline into their overall data management strategy.

Why Should Project Managers Know Data Management?

It is not surprising that most data management professionals at some point in time in their career have either played the role of an IT project manager or have worked as one in an official capacity.

In IT, most technical changes, including data-related changes, are implemented through a project in one form or another. If we accept the notion that change is a constant reality in today’s businesses, then most technical changes in an organization will be implemented through a project.

Most if not all IT projects have some data impact or are impacted by data. That is, whether they are rolling out a new software solution, hardware solution, reporting, integration, migration, or something else, there are data tasks involved. These projects often require a delicate balance between complex tasks in the data management domain. For example, a software implementation project would need data architecture, migration, integration, quality, and governance. This translates to many tasks and many resources as well as dependencies.

A project manager must manage all these tasks and understand their impact on resourcing, budget, timeline, solution, and much more!  Most IT projects can easily fail to meet these demands when one of these moving parts is out of order.

Project managers who understand these data management principles can clearly understand what is required in each phase and what to look out for. They can manage expectations and impact with project sponsors and stakeholders to ensure a successful and smooth project delivery.

It is because of this that we recently developed a very specific data management training for IT project managers to train them in the fundamentals of data management as well as specific types of projects that they will be dealing with.  They can now use the same tools and techniques we use in data management to ensure a successful delivery of data solutions in their IT projects.

Last week we delivered the first of these training sessions in Indianapolis and it was attended by a number of seasoned project managers who found it to be extremely valuable (according to their survey and interactive feedback).  We are offering another one in Newport Beach, CA in September.  The goal is to have at least one per month in different cities.

If you are interested in learning more about these classes, you can go to our training page.


Till next time!

Majd Izadian

Copyright 2013, Zendeux Business Data Solutions

Data Project Management OC2

Big Data, New Hype, or New Reality?

The IT industry is not immune to the “New Shining Object Syndrome” that impacts other industries as well as consumers. Every once in a while we get a new technology or idea that grabs the attention of IT professionals and CIOs alike.

There have been many examples of these new ideas, some of which have endured the test of time and some that have not. There was once huge excitement around Object Oriented Programming (OOP), SOA (Service Oriented Architecture), BI (Business Intelligence), ASP (Application Service Provider), Master Data Management (MDM), and now Big Data! Some of these ideas, like OOP, have been absorbed into the way we write our software to the point where we no longer call it that. ASP (not the same as Active Server Pages) has evolved into SaaS (Software as a Service), and hardly anyone remembers ASP anymore.

Big Data however, has created a huge buzz lately and the excitement is very contagious and seems to have gotten the attention of the media much more than the previous trends.  The reason may be that unlike those other technical breakthroughs in the IT industry, this one has a strong business and even consumer impact.  Though many do not clearly understand what Big Data means, they have by now a vague notion that it somehow involves or impacts them.

News ranging from the NSA’s (National Security Agency) tapping into the metadata of domestic and international phone communications to Google, Amazon, and other companies profiling consumers’ every click on the web has awakened fear and excitement in ordinary consumers. For this reason alone, it is no longer easy to dismiss Big Data as mere hype or a “new shining object”; it is perhaps a new reality with which we are all forced to reckon.

Big Data and its potential are analogous to the internet in its early days, when only a few understood its immediate potential and perhaps none could forecast its future potential. Like the internet in its early days, Big Data is accessible only to a few, and like the internet, unless it becomes readily available to the common man, it will not fulfill its full potential and depth.

Therefore, if I were a betting man, I would bet on Big Data. But one should be wary of the hype, because, like the internet, Big Data could claim many .COM-style victims who are blinded by the “new shining object” without understanding its risks, its potential, or what makes it shine!

Copyright, 2013, Majd Izadian, CEO Zendeux Business Data Solutions

Big Data, Hype or Reality?

To MDM or not to MDM, Part 2

As I discussed in Part 1 of this blog, MDM is not a tool, but rather a discipline that can be implemented in an organization through a program. An MDM Program could leverage a solution (referred to as an MDM Platform) to help manage master data. So what is an MDM Platform made of? Before I answer that, I would like to point out again that an MDM Program is not single dimensional but rather has 4 dimensions, as I listed in Part 1:

  • Framework
  • Architecture
  • Process
  • Governance

Therefore, any solution for MDM should be made of components that correspond to these dimensions. Also since these dimensions are related, the components need to be designed in an integrated fashion.

An MDM platform then has the following components:

  • Data Governance – Includes data stewardship, change management, policies, standards, communication, and workflow functionality
  • Data Quality – Data profiling, validation, search before create, integration ability with 3rd party validation services
  • Standard CRUD Services – Provides data services based on standard CRUD processes with encapsulation of business rules and DQ validation and Data Governance
  • Master Data Integration Layer – Common data integration layer for ETL and access by downstream systems manageable by governance rules for data usage
  • Single Master Data Source – A single data repository for master data attributes as the source for creation and access
  • Single Master Data Architecture – A common data architecture for every master data object in alignment with business processes that ensures uniqueness and scalability.
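As one illustration of how two of the components above fit together, the sketch below shows a create service that encapsulates a data quality check and a "search before create" step, so that callers cannot introduce duplicate master records. The class, its in-memory storage, and the name-matching rule are simplified assumptions for illustration; a real platform would use far richer matching and a persistent repository.

```python
class MasterDataService:
    """In-memory stand-in for a single master data source (illustrative only)."""

    def __init__(self):
        self._records = {}
        self._next_id = 1

    @staticmethod
    def _match_key(name):
        # Simplistic matching rule: ignore case and extra whitespace.
        return " ".join(name.lower().split())

    def search(self, name):
        """Return the ids of existing records that match the given name."""
        key = self._match_key(name)
        return [
            record_id
            for record_id, record in self._records.items()
            if self._match_key(record["name"]) == key
        ]

    def create(self, name):
        """Create a master record, or return the existing id if one matches."""
        # Data quality validation: the name is mandatory.
        if not name.strip():
            raise ValueError("name is mandatory")

        # Search before create: reuse an existing record instead of duplicating.
        matches = self.search(name)
        if matches:
            return matches[0]

        record_id = self._next_id
        self._next_id += 1
        self._records[record_id] = {"name": name.strip()}
        return record_id


# Two spellings of the same customer resolve to one master record.
service = MasterDataService()
first_id = service.create("Acme Corp")
second_id = service.create("  acme   corp ")
```

Because the validation and duplicate check live inside the service, every consuming application gets the same rules without re-implementing them.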

Now, when we look at the list of MDM Platform components above, we can see that each can include a great deal of functionality that enables the management of master data. Each component’s functionality can be implemented in a variety of ways using different technologies and techniques. For example, Data Quality has a lot of functionality associated with it, and there are many vendors that use different technologies and approaches to provide different solutions. That is true for data integration and all other components as well.

Before we get into the tool options, I think we need to address the different solution implementation approaches. It is important to note at this point, though, that an effective MDM Program (and thus an MDM Platform) should have all these components in an integrated fashion to address most or all of an MDM Program’s needs. There are tools that offer all the components in an integrated fashion. There are also vendors that offer only specific components in the above list. So what are your options? In general there are three approaches to MDM Platform implementation:

OPTION A: Full Vendor MDM Tool Implementation
IMPACT: Replacement of all existing MDM Solutions and Repositories, replacing all existing data integration for Master Data. More holistic, but very expensive and may not be scalable for future technology as components are tightly coupled

OPTION B: Full in House MDM Tool Build
IMPACT: Incremental deployment, replacing only some of the existing data integration for Master Data. Less expensive in the short run but expensive in the long term, and will not scale as well as components offered by vendors for whom each component is a core competency

OPTION C: Integrated Best of Breed MDM Platform
IMPACT: Leverages existing usable components and repositories. Uses best of breed for each component but integrates them tightly through SOA. Less impact on IT and business, can be implemented incrementally, scalable as components can be replaced for future technology without impacting the solution

Having come from a business solution architecture background, I have seen it proven that an integrated solution of components is always better than tightly coupled solutions or functionalities. An integrated architecture is more flexible and scalable. It is flexible in the sense that it allows you to pick the best current or future functionality and technology to support it without breaking the whole solution. It is scalable because it allows you to extend your solution to more functionality and to integrate with other components, applications, or systems without impacting the integrity of the solution as a whole. But clearly this choice requires careful architectural analysis as well. Sometimes the existing architecture has already integrated the master data with the ERP’s transactional functionality so tightly that it would be very costly to introduce new or external components to the mix. Also, there are ERP systems whose data models are not normalized (or even normal 🙂) enough to allow easy integration of other best of breed components. Sorry, can’t name names!

So in conclusion, yes, I would say you must MDM, but MDM it right! Understand what MDM is and what it means to your organization. Then try to educate your executives or peers to understand that MDM should be an integral part of the company’s operation and culture. And make sure that you don’t make the mistake of thinking that MDM covers all your data management needs. Master data is important, but it is not all the data that you need to manage in your company.

Finally, in reality, master data management can be done without any fancy tools; the tools just make things easier and more scalable. I am a firm believer that Information Management existed long before technology did. If you can wrap your mind around that concept, then technology becomes your friend and you will think clearly when choosing the right vendor or tool for your data management needs.

But perhaps that is the subject for another blog! 🙂

(Original publish and Copyright date 2011 © Majd Izadian)

To MDM or Not to MDM, Part 1

I once told an executive of a very large company who asked me about MDM tools that, “Most MDM tools are scams!” Ok, I may have made some people unhappy with that statement, but what I meant, as I explained further to him, was that MDM is not a tool; it is a program based on disciplines. It is true that many vendors may like to introduce MDM as a tool, nicely packaged in a box, that once installed can solve all of your master data problems. Actually my beef is with that group of vendors, because in my view saying MDM is a tool oversimplifies it. The problem comes when some poor soul in some data management organization is trying very hard (and for all the right reasons) to sell their executives the idea of MDM. At that point their executives are bombarded by calls from sales guys from MDM “tool” vendors who tell them they have their magic pill ready to swallow. These solutions normally mean an entire degutting of what they already have, whether it is necessary or not, and replacing it with a fancy MDM tool. The executive who asked me that question was actually in the same dilemma. He had been approached by large vendors that were pushing hard to implement an MDM tool as a solution to their MDM problems.

It is very important to know the difference between several concepts when it comes to MDM solutions. Let’s start by introducing them.

Master Data – represents information about objects (things) that exist independent of transactions and can play different roles in a number of processes within or between organizations. (I left systems out intentionally.)

Master Data Management – includes all processes, standards, methodologies, policies, roles, and tools that are used to ensure master data quality, consistency, uniqueness, and performance for all impacted stakeholders including but not limited to business transactions, reporting, and integration.

A Master Data Management Platform* – is a solution that facilitates Master Data Management implementation by providing integrated and orchestrated functionalities, capabilities, and technologies for master data architecture, quality, governance, integration, services, and System of Record. (*Platform does not imply a tool, it implies a solution.)

A Master Data Management Tool – is a solution provided by a vendor which combines several components of a Master Data Management Platform and usually is designed for specific master data objects (Oracle PIM, SAP MDM, Siperian MDM, and many more).

Therefore, there is a clear distinction between MDM as a program, MDM as a platform, and MDM as a tool. What most companies need first is a program and not a tool. An MDM program will address more fundamental questions in the business and IT landscape of a company than just what master data is, where it should reside, or how it should be entered. It also becomes deeply rooted in the overall Information and Solution Architecture of a company as well as its processes and culture.

So, what is in the scope of an MDM Program? In general, I break the MDM scope into four parts:

  1. Framework
  2. Architecture
  3. Process
  4. Governance

Each of these is divided into its own sub-items that further describe how an MDM program should be established, which is a little beyond the scope of this blog. But the idea is that a program to manage master data in a company is multi-dimensional, where each dimension deserves thoughtful planning, execution, and ongoing operation. In my next blog, I will begin to describe the components of an MDM platform and solution. As you can guess by now, platform solution definition comes before choosing a tool! 🙂

-(originally publish and copyright date 2011, Majd Izadian)