Data Cleansing
- The better your database, the better your data mining.
- So, if you're going to attempt some data mining, it's best
to clean up your database first.
- This is particularly important when you've combined
several databases to get one to work with.
- Combined databases of this kind are often called data warehouses.
- One thing you want to do is remove redundant data.
- One of the standard problems is having multiple copies of
the same person due to different versions of names or
addresses.
- For instance, I'm Chris Huyck, Dr C Huyck, Mr Christian Huyck
and to some extent Mr Christian Jones.
- There is a fair amount of software out there for data
cleansing.
- Unfortunately, each tool only seems to work for a specific problem.
For example, there is a lot of software for address correction.
- You also want to make sure, as much as possible, that the
data is accurate.
- You may want to remove old data.
- Wikipedia helps again with
http://en.wikipedia.org/wiki/Data_cleansing
- Data Ladder has asked to
have a link from this page, as they're a company that
has expertise in data cleansing.
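
To make the duplicate-person problem above concrete, here is a minimal sketch in Python of one way to flag records that might be the same person. It is only an illustration, not a standard method: the TITLES list and the functions name_key and group_candidates are made up for this example, and the matching rule (same surname and same first initial) is a deliberately crude assumption.

```python
from collections import defaultdict

# Hypothetical list of honorifics to ignore; extend it for real data.
TITLES = {"dr", "mr", "mrs", "ms", "miss", "prof"}


def name_key(full_name):
    """Return a crude matching key: (surname, first initial)."""
    # Lowercase each word and drop trailing punctuation such as "Dr."
    tokens = [t.strip(".,").lower() for t in full_name.split()]
    # Throw away titles and any empty tokens
    tokens = [t for t in tokens if t and t not in TITLES]
    if not tokens:
        return ("", "")
    surname = tokens[-1]
    # With only one word left we cannot tell what the initial is
    initial = tokens[0][0] if len(tokens) > 1 else ""
    return (surname, initial)


def group_candidates(names):
    """Group names that share a key; each group is a candidate duplicate set."""
    groups = defaultdict(list)
    for name in names:
        groups[name_key(name)].append(name)
    return dict(groups)


records = ["Chris Huyck", "Dr C Huyck", "Mr Christian Huyck", "Mr Christian Jones"]
groups = group_candidates(records)
for key, members in groups.items():
    print(key, members)
```

The three Huyck variants end up in one group, while Mr Christian Jones stays on his own, since a changed surname cannot be caught by this rule. The rule would also wrongly group two different people such as a Carol Huyck and a Chris Huyck, so the groups should be treated as candidates for a person to check, not as confirmed duplicates.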