Topic
Machine Learning and Deduplication
By: Forest Gregg
Date: Sept. 13, 2018, 6 p.m.
Machine learning and record linkage: Finding duplicates or matching records when you don't have primary keys is one of the biggest challenges in preparing data for data science. At DataMade, we have built an open-source Python machine learning library to help developers, and a product, Dedupe.io, to help everyone else. We describe the problem and how we use machine learning to scale to tens of millions of records.