Data Cleanup

I’m working on a project to randomize name and address data for a project.  Basically I want to be able to take live data and do a one-way (non-reversible) scramble for showing to the public and for training purposes.  I’ve downloaded a lot of records from GenerateData.com, a site that I really like for the purpose, but it’s a little too ‘international’ for my needs.  So I’ve grabbed data from a number of places.

One places that I took was my iPhone contact list.  I generated a PDF of all records with just the name and address.  All I wanted was street addresses, but I very quickly ran in to a major problem after I did a Select All and pasted it in to Excel for cleanup before loading it in to a database.

The data looked like this:

Zeffiro's Pizzaria                                                    135 N. Water St

The problem was that there was an unpredictable and inconsistent number of spaces between the company name and the address.  Incidentally, if you ever find yourself in Las Cruces, NM, Zeffiro’s is fantastic pizza from a wood-fired oven.  And yes, that’s their address.

I figured out a simple Excel formula to handle the extraction:

=TRIM(RIGHT(<datacolumn>,50))

It wasn’t 100% effective, but it was pretty darn good for extracting just that part of the address.  Grab the rightmost X columns, and the TRIM() function removes both leading and trailing spaces, and Robert is your parent’s brother.

The process was a little more complicated than that, but ultimately I ended up with what I needed.

Good luck if you have a similar need!

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s