Whether you are preparing data for analytics or reporting, performing a data migration or consolidation, or creating a unified view of customers, products, or vendors, the organization name attribute is a critical data component. It must be clean and standardized to give an accurate view of your business operations and to support sound business decisions, especially after an M&A event.
Most of us have heard about recent mergers and acquisitions among well-known brands: CVS and Aetna, T-Mobile and Sprint, BB&T and SunTrust Bank. These are massive companies that use a variety of systems and applications to support customer 360 efforts, churn analysis and prediction, and fraud detection, to name a few use cases where data quality is a critical concern.
In my experience profiling, cleaning, and mapping data, I’ve noticed four key capabilities that drive value in these use cases:
Visual Data Profiling & Full Discovery
To determine the nature and extent of any data anomalies and to validate whether the org name attribute is accurate, you must understand the full distribution of values. (Important note: a sample can leave you with a skewed understanding and may mislead your analysis through false assumptions.) This visual data quality profile is also the key to validating that your standardization work is properly and comprehensively finished.
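As a rough illustration of what a full-distribution profile means, here is a minimal Python sketch that counts every distinct raw value rather than a sample. The function name `profile_org_names` and the sample values are hypothetical, and this is not how Data Prep builds its visual profile.

```python
from collections import Counter

def profile_org_names(names):
    """Count every distinct raw value, most frequent first."""
    return Counter(names).most_common()

# Hypothetical sample values, for illustration only.
raw_names = ["Acme Corp", "ACME Corp.", "Acme Corp", " Acme Corp", "Globex LLC"]

for value, count in profile_org_names(raw_names):
    # repr() makes stray leading or trailing whitespace visible.
    print(f"{count:>3}  {value!r}")
```

Even on five rows, the counts expose three spellings of the same company, which is the kind of anomaly a sample could easily miss.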

Dynamic Column Transformations
Although it is difficult to anticipate every transformation needed to clean up and reconcile entity or master data, it helps to use an agile solution that provides a variety of dynamic transformation options. You can then apply bulk (multi-column) transformations to trim whitespace, standardize case, and split values (especially when recurring dashes or parentheses indicate regional or other identifying information that should be parsed into a new column).
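The trim, case, and split transformations described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `clean_org_name`, the choice of upper case, and the rule that a trailing "(...)" or " - ..." suffix holds the regional qualifier are all mine, not Data Prep behavior.

```python
import re

def clean_org_name(raw):
    """Return (name, qualifier) after trimming, splitting, and upper-casing."""
    # Collapse runs of whitespace and trim both ends.
    text = re.sub(r"\s+", " ", raw).strip()
    qualifier = ""
    # Split a trailing "(...)" or " - ..." suffix into its own field.
    match = re.match(r"^(.*?)\s*(?:\((.*?)\)|\s-\s(.*))$", text)
    if match:
        text = match.group(1).strip()
        qualifier = (match.group(2) or match.group(3) or "").strip()
    # Standardize on upper case (an assumed convention).
    return text.upper(), qualifier.upper()

print(clean_org_name("  acme   corp (EMEA) "))
print(clean_org_name("Acme Corp - North America"))
print(clean_org_name("Coca-Cola"))
```

The dash rule requires whitespace on both sides, so hyphenated names such as "Coca-Cola" are left intact.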
Perhaps the most effective approach is to run intelligent algorithms on the full dataset to detect all potential duplicates, and then apply the recommended fixes (at scale) to each group, or cluster, of similar org names. This step almost always produces a major boost in data accuracy: the number of unique org names decreases, and you see the standardization impact immediately.
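To make the clustering idea concrete, here is a minimal sketch that uses a simple fingerprint (key-collision) technique: names that reduce to the same normalized key land in the same cluster, and the most frequent spelling becomes the recommended fix. This is a stand-in I chose for illustration, not necessarily the algorithm Data Prep uses, and the `LEGAL_SUFFIXES` list and function names are assumptions.

```python
import re
from collections import Counter, defaultdict

# Assumed list of legal suffixes to ignore when comparing names.
LEGAL_SUFFIXES = {"inc", "incorporated", "corp", "corporation", "co", "llc", "ltd"}

def fingerprint(name):
    """Build a comparison key: lower case, no punctuation, sorted unique tokens."""
    tokens = re.sub(r"[^\w\s]", "", name.lower()).split()
    return " ".join(sorted({t for t in tokens if t not in LEGAL_SUFFIXES}))

def cluster_names(names):
    """Group raw names whose fingerprints collide."""
    clusters = defaultdict(list)
    for name in names:
        clusters[fingerprint(name)].append(name)
    return clusters

def recommend_fixes(names):
    """Map each raw name to the most frequent spelling in its cluster."""
    mapping = {}
    for members in cluster_names(names).values():
        canonical = Counter(members).most_common(1)[0][0]
        for member in members:
            mapping[member] = canonical
    return mapping

# Hypothetical sample values, for illustration only.
names = ["Acme Corp", "Acme Corp", "ACME CORP.", "acme, corp", "Globex LLC", "Globex"]
fixes = recommend_fixes(names)
print(len(set(names)), "unique names before,", len(set(fixes.values())), "after")
```

A key-collision approach only catches variants that differ in case, punctuation, token order, or suffix. Misspellings need a distance-based method, and any recommended fix should still be reviewed before it is applied.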

Point-and-Click Deduplication
Deduplicate the entire dataset so that you’re left with only the before and after versions of your org name attribute. From these you can generate a mapping file based on your standardization work, which may be useful for lookup purposes.
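A mapping file of this kind is just the set of unique before and after pairs. The sketch below shows one way to produce it in Python; the column names, the function names, and the sample rows are hypothetical.

```python
import csv
import io

def build_mapping(pairs):
    """Return the unique (original, standardized) pairs, sorted by original."""
    return sorted(set(pairs))

def write_mapping(pairs, handle):
    """Write the mapping as CSV with a header row."""
    writer = csv.writer(handle)
    writer.writerow(["org_name_original", "org_name_standardized"])
    writer.writerows(build_mapping(pairs))

# Hypothetical before/after rows, for illustration only.
rows = [
    ("ACME CORP.", "Acme Corp"),
    ("Acme Corp", "Acme Corp"),
    ("ACME CORP.", "Acme Corp"),
]

# An in-memory buffer keeps the example self-contained; a real run would
# pass a file opened with open(path, "w", newline="") instead.
buffer = io.StringIO()
write_mapping(rows, buffer)
print(buffer.getvalue())
```

Each original spelling appears once in the output, so the file can serve directly as a lookup table.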

Intuitive Steps Management & Reuse
Collaboration and reuse are popular topics in the self-service data prep space. You should be able to easily share your work, whether it’s the entire data workflow project or just certain steps, so you never have to redo work or reinvent the wheel.
Many of these steps will also come in handy for other datasets that contain a company, supplier, or vendor name attribute.
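One simple way to picture step reuse is an ordered list of transformations that can be replayed against any name column. The sketch below assumes each step is a plain Python function of one value; `apply_steps` and `NAME_STEPS` are hypothetical names, and Data Prep manages steps through its own interface rather than code like this.

```python
def apply_steps(values, steps):
    """Run each step, in order, over every value."""
    result = list(values)
    for step in steps:
        result = [step(value) for value in result]
    return result

# A saved "recipe": trim, collapse inner whitespace, then title-case.
NAME_STEPS = [str.strip, lambda value: " ".join(value.split()), str.title]

# The same recipe applied to two different (hypothetical) columns.
org_names = apply_steps(["  acme   corp "], NAME_STEPS)
vendor_names = apply_steps(["GLOBEX  llc"], NAME_STEPS)
print(org_names, vendor_names)
```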

Data Prep offers all of these key capabilities to business analysts, data scientists, and data quality specialists and is designed to tackle the various data quality issues often found (or buried) in customer and vendor/supplier data. To see each of these capabilities in action for this scenario, check out my use case vignette video: Org Name Standardization
Consider the value you can gain today on similar use cases where Data Prep provides a great advantage for profiling, combining, and transforming your raw data. The results can be significant and realized in a fraction of the time compared to the legacy, status-quo process!
About the author
DataRobot