Friday, January 23, 2009

Customer Identity Resolution

For the past two years I have been involved in solving a challenging problem faced by many customer-facing businesses in the service industry: customer identity resolution.

Problem definition: implement online (real-time) customer matching for a large telecommunications company serving more than 10 million customers.

Customer Relationship Management (CRM) applications find accurate and timely customer identification hard for several reasons:
  • customer reps misspell customer names and addresses
  • the large number of records (anywhere from a few million to hundreds of millions) drives search response times up
  • identity misrepresentation opens the door to fraud
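To make the first point concrete: an exact string comparison fails on a single typo, while an edit-distance comparison tolerates it. The sketch below uses the classic Levenshtein distance in Java; the class name and the threshold of 2 edits are illustrative assumptions, not part of the system described in this post.

```java
public class EditDistance {

    /** Classic dynamic-programming Levenshtein distance (insert, delete, substitute all cost 1). */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a to the first j chars of b
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; // reuse the two row buffers
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Hypothetical tolerance rule: names within 2 edits are a possible match. */
    static boolean possibleMatch(String typed, String stored) {
        return levenshtein(typed.toUpperCase(), stored.toUpperCase()) <= 2;
    }

    public static void main(String[] args) {
        System.out.println("JONATHAN".equals("JONATHON"));         // false: exact match misses
        System.out.println(levenshtein("JONATHAN", "JONATHON"));   // 1
        System.out.println(possibleMatch("Jonathon", "JONATHAN")); // true
    }
}
```

Computing edit distance against every record does not scale to millions of rows, which is why the second point (search response time) pushes toward precomputed keys instead.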
The solution to this problem is usually a complex mix of business intelligence tools and custom application code. In our case, we use SAS DataFlux to generate match codes for customer names and addresses. The match codes are stored in an Oracle database together with the original customer information. Several search (lookup) algorithms, implemented in a mix of Java web services and database stored procedures, perform fast customer matching for thousands of customer reps handling customer calls in real time.
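The match-code idea can be sketched as follows: compute a fuzzy key once when a record is stored, index it, and at call time compute the same key for what the rep typed and look it up by exact match instead of scanning millions of rows. DataFlux's actual match-code algorithm is proprietary and is not reproduced here; the hypothetical `matchCode` below substitutes a much cruder key (uppercase letters only, vowels dropped after the first letter, repeated letters collapsed, truncated to 8 characters). An in-memory `HashMap` stands in for the indexed match-code column in Oracle.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MatchCodeIndex {

    /** Hypothetical stand-in for a DataFlux match code; NOT the real algorithm. */
    static String matchCode(String name) {
        String letters = name.toUpperCase().replaceAll("[^A-Z]", "");
        if (letters.isEmpty()) {
            return "";
        }
        StringBuilder code = new StringBuilder().append(letters.charAt(0));
        for (int i = 1; i < letters.length(); i++) {
            char c = letters.charAt(i);
            boolean vowel = "AEIOUY".indexOf(c) >= 0;
            boolean repeat = c == code.charAt(code.length() - 1);
            if (!vowel && !repeat) {
                code.append(c); // keep only non-repeated consonants
            }
        }
        return code.length() > 8 ? code.substring(0, 8) : code.toString();
    }

    // match code -> customer ids; plays the role of an indexed column in the database
    private final Map<String, List<Long>> index = new HashMap<>();

    void add(long customerId, String name) {
        index.computeIfAbsent(matchCode(name), k -> new ArrayList<>()).add(customerId);
    }

    /** Candidate customers whose stored match code equals the code of the typed name. */
    List<Long> lookup(String typedName) {
        return index.getOrDefault(matchCode(typedName), List.of());
    }

    public static void main(String[] args) {
        MatchCodeIndex idx = new MatchCodeIndex();
        idx.add(1L, "Jonathan Smith");
        idx.add(2L, "Maria Lopez");
        System.out.println(idx.lookup("Jonathon Smyth")); // [1]
    }
}
```

The lookup returns a candidate set rather than a final answer: a crude key like this one collides for different people, so a real system would still score the candidates (for example with edit distance) before presenting them to the rep.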

A few lessons learned from this project:
  • don't trust legacy data; use a data cleansing tool to validate and normalize it
  • no out-of-the-box tool, no matter how sophisticated, offers a complete solution
  • performance tuning is critical and will drive much of the design and implementation, as I found out when performance testing with real, live data volumes only a few weeks before going live.
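The first lesson, normalizing legacy data before generating match codes, can be illustrated with a small address standardization step. This is only a minimal sketch: a data cleansing tool does far more (parsing addresses into components, validating against postal reference data). The abbreviation table is a tiny hypothetical sample, not a complete standard.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.stream.Collectors;

public class AddressNormalizer {

    // Tiny hypothetical sample of standard abbreviations; real tools use full reference tables.
    private static final Map<String, String> ABBREVIATIONS = Map.of(
            "STREET", "ST",
            "AVENUE", "AVE",
            "ROAD", "RD",
            "NORTH", "N",
            "APARTMENT", "APT");

    static String normalize(String rawAddress) {
        String cleaned = rawAddress.toUpperCase()
                .replaceAll("[^A-Z0-9 ]", " ") // drop punctuation left by free-form entry
                .trim()
                .replaceAll("\\s+", " ");      // collapse runs of whitespace
        return Arrays.stream(cleaned.split(" "))
                .map(token -> ABBREVIATIONS.getOrDefault(token, token))
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        System.out.println(normalize("123  North Main Street, Apt. 4")); // 123 N MAIN ST APT 4
        System.out.println(normalize("123 N. Main St apt 4"));           // 123 N MAIN ST APT 4
    }
}
```

Two differently keyed versions of the same address collapse to one canonical string, so any match code computed afterwards agrees for both.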