Mastering Financial Customer Data at Multinational Scale

Your Customer Data…Consolidated or Chaotic?

In an ideal world, you know your customers. You know

  • who they are,
  • what business they transact,
  • who they transact with,
  • and their relationships.

You use that information to

  • calculate risk,
  • prevent fraud,
  • uncover new business opportunities,
  • and comply with regulatory requirements.

The problem at most financial institutions is that customer data environments are highly chaotic. Customer data is stored in numerous systems across the company. Most, if not all, of these systems have evolved over time in siloed environments according to business function. Each system has its

  • own management team,
  • technology platform,
  • data models,
  • quality issues,
  • and access policies.


This chaos prevents firms from fully achieving and maintaining a consolidated view of customers and their activity.

The Cost of Chaos

A chaotic customer data environment can be an expensive problem in a financial institution. Customer changes have to be implemented in multiple systems, with a high likelihood of error or inconsistency because of manual processes. Discrepancies in the data lead to remediation activities that are widespread and costly.

At one global bank, simply compiling and validating customer data for analysis took three months. The chaos leads to either

  1. prohibitively high time and cost of data preparation or
  2. garbage-in, garbage-out analytics.

The result of customer data chaos is an incredibly high risk profile — operational, regulatory, and reputational.

Eliminating the Chaos 1.0

Many financial services companies attempt to eliminate this chaos and consolidate their customer data.

A common approach is to implement a master data management (MDM) system. Customer data from different source systems is centralized into one place where it can be harmonized. The output is a “golden record,” or master customer record.

A lambda architecture permits data to stream into the centralized store and be processed in real time so that it is immediately mastered and ready for use. Batch processes run on the centralized store to perform periodic (daily, monthly, quarterly, etc.) calculations on the data.
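
As a rough illustration of that lambda pattern, the sketch below (plain Python, with invented record fields) keeps a real-time serving view updated as events stream in, while a batch function periodically recomputes aggregates over the full store. It is a conceptual sketch, not a description of any particular MDM product.

    from collections import defaultdict

    store = []          # centralized raw store consumed by the batch layer
    latest_view = {}    # serving view kept current by the speed layer

    def on_stream_event(record):
        """Speed layer: apply each incoming customer change immediately."""
        store.append(record)
        latest_view[record["customer_id"]] = record

    def run_batch():
        """Batch layer: periodic (daily, monthly, ...) recomputation over the full store."""
        activity = defaultdict(int)
        for rec in store:
            activity[rec["customer_id"]] += 1
        return dict(activity)

    on_stream_event({"customer_id": "C42", "name": "Acme Holdings Ltd"})
    print(latest_view["C42"]["name"])   # available immediately
    print(run_batch())                  # periodic per-customer rollup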

First-generation MDM systems centralize customer data and unify it by writing ETL scripts and matching rules.


The harmonizing often involves:

  1. Defining a common, master schema in which to store the consolidated data
  2. Writing ETL scripts to transform the data from source formats and schemas into the new common storage format
  3. Defining rule sets to deduplicate, match/cluster, and otherwise cleanse within the central MDM store

There are a number of commercial MDM solutions available that support the deterministic approach outlined above. The initial experience with these MDM systems, integrating the first five or so large systems, is often positive. Scaling MDM to master more and more systems, however, becomes a combinatorially growing challenge, as we'll explain below.

Rules-Based MDM and the Robustness-Versus-Expandability Trade-Off

The rule sets used to harmonize data are usually driven by a handful of dependent attributes: name, legal identifiers, location, and so on. Say you use six attributes to stitch together four systems, A, B, C, and D. You align those six attributes between A and B, then A and C, A and D, B and C, B and D, and C and D. With four systems and six pairings, you are already aligning 36 attribute mappings. Add a fifth system and it is 60; a sixth system, 90. The effort to master additional systems therefore grows quadratically with the number of systems, as the worked example below shows. And in most multinational financial institutions, the number of synchronized attributes is not six; it is commonly 50 to 100.
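
The arithmetic behind those numbers is easy to check: every pair of systems needs its own set of matching rules, and the number of pairs for n systems is n(n-1)/2. A quick sketch (plain Python, illustrative only):

    from itertools import combinations

    def aligned_attributes(num_systems, attrs_per_pair=6):
        """Each pair of systems is stitched on its own set of attributes."""
        pairs = len(list(combinations(range(num_systems), 2)))  # n * (n - 1) / 2
        return pairs * attrs_per_pair

    for n in (4, 5, 6, 10, 100):
        print(n, aligned_attributes(n))
    # 4 -> 36, 5 -> 60, 6 -> 90, 10 -> 270, 100 -> 29700

Swap in 50 to 100 attributes per pairing instead of six, and with 100 systems you are aligning hundreds of thousands of attribute mappings.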

And maintenance is equally burdensome. There is no guarantee that your six attributes maintain their validity or veracity over time. If any of them needs to be modified, the rules have to be redefined across all of the systems again.

The trade-off for many financial institutions is robustness versus expandability. In other words, you can master data at large scale and accept wildly complex rules, or you can keep the implementation small and highly accurate.

This is problematic for most financial institutions, which have very large-scale customer data challenges.

Customer Data Mastering at Scale

In larger financial services companies, especially multinationals, the number of systems in which customer data resides is much larger than the examples above. It is not uncommon to see financial companies with over 100 large systems.

Among those are systems that have been:

  • Duplicated in many countries to comply with data sovereignty regulations
  • Acquired through inorganic growth, with purchased companies bringing their own infrastructure for trading, CRM, HR, and back office; integrating these can take significant time and cost


When attempting to master a hundred sources containing petabytes of data, each linking and matching to the others in different ways across a multitude of attributes and systems, you can see that the matching rules required to harmonize the data become incredibly complex.

Every incremental source added to the MDM environment can require thousands of new rules. Within just a handful of systems, the complexity reaches a point where it is unmanageable. As that complexity grows, the cost of maintaining a rules-based approach scales with it, requiring more and more data stewards to make sure all of the stitching rules remain correct.

Mastering data at scale is one of the riskiest endeavors a business can undertake. Gartner reports that 85% of MDM projects fail. And MDM budgets of $10M to $20M per year are not uncommon in large multinationals. With such high stakes, choosing the right approach is critical to success.

A New Take on an Old Paradigm

What follows is a reference architecture. The approach daisy-chains together three large toolsets, each with appropriate access policies enforced, that are responsible for three separate steps in the mastering process:

  1. Raw Data Zone
  2. Common Data Zone
  3. Mastered Data Zone


Raw Data Zone

The first zone sits on a traditional data lake model: a landing area for raw data. Data is replicated from source systems to the centralized data repository (often built on Hadoop). Wherever possible, data is replicated in real time (perhaps via Kafka) so that it stays as current as possible. For source systems that do not support real-time replication, nightly batch jobs or flat-file ingestion are used.
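
For the real-time path, a replication job can simply publish each source-system change onto a topic per source. The sketch below assumes the open-source kafka-python client and uses illustrative broker, topic, and field names; an actual deployment would more likely use a change-data-capture tool feeding Kafka.

    import json
    from kafka import KafkaProducer

    producer = KafkaProducer(
        bootstrap_servers="lake-kafka:9092",   # hypothetical broker address
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    )

    def replicate(source_system, record):
        """Publish a raw source record to the Raw Data Zone, one topic per source."""
        producer.send(f"raw.{source_system}", record)

    replicate("crm_emea", {"cust_id": "42", "legal_name": "Acme Holdings Ltd"})
    producer.flush()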

Common Data Zone

Within the Common Data Zone, we take all of the data from the Raw Data Zone, with its various objects of different shapes and sizes, and conform it into outputs that look and feel the same: the same column headers, data types, and formats.

The toolset in this zone uses machine learning models to categorize the data held in the Raw Data Zone. The models are trained on what particular attributes look like: a legal entity, a registered address, a country of incorporation, a legal hierarchy, or any other field. Categorization happens without going back to the source-system owners and bogging them down with questions, saving weeks of effort.
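
As a rough idea of how "what an attribute looks like" can be learned, the sketch below trains a character n-gram classifier (scikit-learn) to label raw column values as a legal entity, registered address, or country of incorporation. It illustrates the general technique, not Tamr's implementation, and the sample values are invented.

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    # Hand-labeled sample values from sources that are already understood.
    train_values = ["Acme Holdings Ltd", "Globex GmbH",
                    "221B Baker Street, London", "10 Rue de la Paix, Paris",
                    "United Kingdom", "Germany"]
    train_labels = ["legal_entity", "legal_entity",
                    "registered_address", "registered_address",
                    "country_of_incorporation", "country_of_incorporation"]

    model = make_pipeline(
        TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),
        LogisticRegression(max_iter=1000),
    )
    model.fit(train_values, train_labels)

    # Categorize values from a newly landed raw table without querying its owners.
    print(model.predict(["Initech S.A.", "5th Avenue, New York", "France"]))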

This solution builds up a taxonomy and schema for the conformed data as raw data is processed. Unlike early-generation MDM solutions, this substantially reduces data unification time, often by months per source system, because there is:

  • No need to pre-define a schema to hold conformed data
  • No need to write ETL to transform the raw data

One multinational bank implementing this reference architecture reported being able to conform the raw data from a 10,000-table system within three days, without using up source-system experts' time defining a schema or writing ETL code. When it comes to figuring out where relevant data is located across that vast landscape, the solution is highly productive and predictable.

Mastered Data Zone

In the third zone, the conformed data is mastered. The outputs of the mastering process are clusters of records that refer to the same real-world entity. Within each cluster, a single, unified golden record of the entity is constructed. The golden customer record is then distributed to wherever it is needed:

  • Data warehouses
  • Regulatory (KYC, AML) compliance systems
  • Fraud and corruption monitoring
  • And back to operational systems, to keep data changes clean at the source

As with the Common Data Zone, machine learning models are used. These models eliminate the need to define hundreds of rules to match and deduplicate data. Tamr's solution applies a probabilistic model that uses statistical analysis and naive Bayesian modeling to learn from existing relationships between attributes, and then makes record-matching predictions based on those relationships.
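
The sketch below illustrates the general naive-Bayes style of record matching: each attribute contributes evidence according to how often it agrees on true matches versus non-matches, and the evidence is combined into a single confidence score. The weights and field names are invented for illustration; in Tamr's solution these relationships are learned from training data rather than hand-set.

    import math

    # m = P(attribute agrees | records match), u = P(attribute agrees | non-match).
    # Illustrative values; in practice these are learned from confirmed pairs.
    weights = {
        "legal_name":  {"m": 0.95, "u": 0.01},
        "country":     {"m": 0.98, "u": 0.20},
        "postal_code": {"m": 0.90, "u": 0.02},
    }

    def match_confidence(rec_a, rec_b):
        """Combine per-attribute agreement evidence into a 0-1 confidence score."""
        log_odds = 0.0   # assumes even prior odds, for brevity
        for attr, w in weights.items():
            agrees = rec_a.get(attr) == rec_b.get(attr)
            p_match = w["m"] if agrees else 1 - w["m"]
            p_nonmatch = w["u"] if agrees else 1 - w["u"]
            log_odds += math.log(p_match / p_nonmatch)
        return 1 / (1 + math.exp(-log_odds))

    THRESHOLD = 0.90   # configurable confidence threshold
    a = {"legal_name": "Acme Holdings Ltd", "country": "GB", "postal_code": "EC2A"}
    b = {"legal_name": "Acme Holdings Ltd", "country": "GB", "postal_code": "EC2A"}
    print(match_confidence(a, b) >= THRESHOLD)   # True: treat as a candidate match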

Tamr matching models require training, which usually takes just a few days per source system. Tamr presents a data steward with its predictions, and the steward can either confirm or deny them to help Tamr perfect its matching.

With the probabilistic model, Tamr looks at all of the attributes on which it has been trained and, based on how they match, indicates a confidence level that a given match is accurate. Entries that fall below a configurable confidence threshold are disregarded from further analysis and training.

As you train and correct Tamr, it becomes more accurate over time. The more data you throw at the solution, the better it gets. This is a stark contrast to the rules-based MDM approach, which tends to break as data volumes grow because the rules cannot keep up with the complexity.

Distribution

A messaging bus (e.g., Apache Kafka) is often used to distribute mastered customer data throughout the organization. If a source system wants to pick up the master copy from the platform, it subscribes to the relevant topic on the messaging bus and receives the feed of changes.

Another approach is to pipeline deltas from the MDM platform into target systems in batch.
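
For the streaming path, a subscribing system might consume the mastered feed along these lines (again assuming the kafka-python client and an illustrative topic name):

    import json
    from kafka import KafkaConsumer

    consumer = KafkaConsumer(
        "mastered.customer",                     # hypothetical topic of golden records
        bootstrap_servers="lake-kafka:9092",
        value_deserializer=lambda v: json.loads(v.decode("utf-8")),
    )

    for message in consumer:
        golden = message.value
        # Upsert the golden record into the subscribing system's own store.
        print(golden["customer_id"], golden["legal_name"])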

Real-world Results

This data mastering architecture is in production at a number of large financial institutions. Compared with traditional MDM approaches, the model-driven approach provides the following advantages:

70% fewer IT resources required:

  • Humans in the entity resolution loop are much more productive, focused on a relatively small percentage (~5%) of exceptions that the machine learning algorithms cannot resolve
  • Eliminates ETL and matching rules development
  • Reduces manual data synchronization and remediation of customer data across systems

Faster customer data unification:

  • A global retail bank mastered 35 large IT systems within 6 months—about 4 days per source system
  • New data is mastered within 24 hours of landing in the Raw Data Zone
  • A platform for mastering any category of data: customer, product, supplier, and others

Faster, more complete achievement of data-driven business initiatives:

  • KYC, AML, fraud detection, risk analysis, and others.

 

Click here to access Tamr’s detailed analysis

Optimizing Your GRC Technology Ecosystem

Most organizations rely on multiple technologies to manage GRC across the enterprise. Optimizing a GRC technology ecosystem that is aligned with a defined GRC process structure improves risk-informed business decisions and supports the achievement of strategic business objectives. This illustration outlines ways to continuously optimize your GRC technology ecosystem for

  • greater process consistency
  • and development of actionable information.

An integrated GRC technology ecosystem built on common vocabulary, taxonomy and processes enables

  • more accurate and timely reporting,
  • increased reliability of achievement of objectives
  • and greater confidence in assurance with less burden on the business.

Here are just a few of the key benefits:

Process and Technology Alignment

  • Common methods for core tasks, uniform taxonomies, and consistent vocabulary for governance, risk management and compliance across the organization
  • Risk-based actions and controls that ensure timely responses to changed circumstances
  • Standardized GRC processes based on understanding where in the organization each defined process takes place and how data is used in managing risks and requirements
  • Connected technologies as necessary to gain a complete view of the management actions, controls and information needed by each user

Governance Systems to include:

  • Strategy / Performance
  • Board Management
  • Audit & Assurance Tools

Risk Systems to include:

  • Brand & Reputation
  • Finance / Treasury Risk
  • Information / IT Risk
  • External Risk Content
  • Third Party Risk

Compliance Systems to include:

  • Policies
  • Helpline / Hotline
  • Training
  • EHS (Environment Health and Safety)
  • Fraud / Corruption
  • Global Trade
  • Privacy
  • Regulatory Change
  • AML (Anti Money Laundering) / KYC (Know Your Customer)

Enabling Systems to include:

  • Data Visualization
  • Analytics
  • Business Intelligence
  • Predictive Tools
  • External Data Sources

Protective Systems to include:

  • Information Security
  • Data Protection
  • Assets Control

Benefits and Outcomes

  • Enhanced tracking of achievement of objectives and obstacles
  • Connected reporting for board/management/external stakeholders
  • Timely understanding of impact from operational decisions
  • Actionable view of changes needed to meet regulatory requirements
  • Clear action pathways for resolution of issues and process reviews
  • Consistent risk assessments feeding into advanced analytics
  • Improved predictive capabilities to support strategic planning
  • Control testing and audit trails for response to regulators and auditors
  • Greater confidence in assurance with less burden on the business
  • Enterprise-wide, departmental and geographic control standards


Tips for Optimization

1. Process Framework

  • Identify tasks appropriate for standardization and schedule implementation across units
  • Assess vocabulary used throughout organization for inconsistencies and establish rules
  • Adjust process model periodically to continue alignment with business objectives and activities

2. Technology Ecosystem

  • Periodically review GRC technologies for gaps and duplication of systems
  • Assess appropriateness of connection of systems for data sharing and user access
  • Maintain a current road map for re-purposing and acquisition of technologies

3. Outcome Management

  • Apply standard processes for resolution of issues and remediation of identified process framework or technology ecosystem weaknesses
  • Enhance reporting capabilities with refined report structure and delivery methods/schedules
  • Ensure all users apply the process framework and understand how best to use the technology

Click here to access OCEG’s illustration in detail

Insurance Fraud Report 2019

Let’s start with some numbers. In this 2019 Insurance Fraud survey, loss ratios were 73% in the US. On average, 10% of the incurred losses were related to fraud, resulting in losses of $34 billion per year.

By actively fighting fraud we can improve these ratios and our customers’ experience. It’s time to take our anti-fraud efforts to a higher level. To effectively fight fraud, a company needs support and commitment throughout the organization, from top management to customer service. Detecting fraudulent claims is important. However, it can’t be the only priority. Insurance carriers must also focus on portfolio quality instead of quantity or volume.

It all comes down to profitable portfolio growth. Why should honest customers have to bear the risks brought in by others? In the end, our entire society suffers from fraud. We’re all paying higher premiums to cover for the dishonest. Things don’t change overnight, but an effective industry-wide fraud approach will result in healthy portfolios for insurers and fair insurance premiums for customers. You can call this honest insurance.

The Insurance Fraud Survey was conducted

  • to gain a better understanding of the current market state,
  • the challenges insurers must overcome
  • and the maturity level of the industry regarding insurance fraud.

This report is a follow up to the Insurance Fraud & Digital Transformation Survey published in 2016. Fraudsters are constantly innovating, so it is important to continuously monitor developments. Today you are reading the latest update on insurance fraud. For some topics the results of this survey are compared to those from the 2016 study.

This report explores global fraud trends in P&C insurance. This research addresses

  • challenges,
  • different approaches,
  • engagement,
  • priority,
  • maturity
  • and data sharing.

It provides insights for online presence, mobile apps, visual screening technology, telematics and predictive analytics.

Fraud-Fighting Culture

Fraudsters are getting smarter in their attempts to stay under their insurer’s radar. They are often one step ahead of the fraud investigator. As a result, money flows to the wrong people. Of course, these fraudulent claims payments have a negative effect on loss ratio and insurance premiums. Therefore, regulators in many countries around the globe created anti-fraud plans and fraud awareness campaigns. Several industry associations have also issued guidelines and proposed preventive measures to help insurers and their customers.


Engagement between Departments

Fraud affects the entire industry, and fighting it pays off. US insurers say that fraud has climbed over 60% over the last three years. Meanwhile, the total savings of proven fraud cases exceeded $116 million. Insurers are seeing an increase in fraudulent cases and believe awareness and cooperation between departments is key to stopping this costly problem.


Weapons to Fight Fraud

Companies like Google, Spotify and Uber all deliver personalized products or services. Data is the engine of it all. The more you know, the better you can serve your customers. This also holds true for the insurance industry. Knowing your customer is very important, and with lots of data, insurers now know them even better. You’d think in today’s fast digital age, fighting fraud would be an automated task.

That’s not the case. Many companies still rely on their staff instead of automated fraud solutions. 67% of the survey respondents state that their company fights fraud based on the gut feeling of their claims adjusters. There is little or no change when compared to 2016.


Data, Data, Data …

In the fight against fraud, insurance carriers face numerous challenges – many related to data. Compared to the 2016 survey results, there have been minor, yet important developments. Regulations around privacy and security have become stricter and clearer.

The General Data Protection Regulation (GDPR) is only one example of centralized rules being pushed from a governmental level. Laws like this improve clarity on what data can be used, how it may be leveraged, and for what purposes.

Indicating risks or detecting fraud is difficult when the quality of internal data is subpar. Subpar data is also a growing pain when trying to enhance the customer experience: to improve it, internal data needs to be accurate.


Benefits of Using Fraud Detection Software

Fighting fraud can be a time-consuming and error-prone process, especially when done manually. This approach often relies on the knowledge of claims adjusters. But what if that knowledge leaves the company? Bias or prejudice can also come into play when investigating fraud.

With well-organized and automated risk analysis and fraud detection, the chances of fraudsters slipping into the portfolio are diminished significantly; this is the common belief among 42% of insurers. And applications can be processed even faster. Straight-through processing and touchless claims handling improve customer experience, and thus customer satisfaction. The survey reported that 61% of insurers currently work with fraud detection software to improve real-time fraud detection.


Click here to access FRISS’ detailed Report