Having recently graduated and now looking for work, I decided to build a mash-up(*) to test my technical prowess.
One source of information that, to my knowledge, has not yet been mapped is the PCA directory. With just under 2,000 entries, it seemed about the right size and complexity for the job. What a great place to start!
Here is the final product: http://www.maconserv.com/pcamap/.
Of course, the best way to get the most up-to-date information would be to scrape the directory web pages whenever somebody loads the map. I don't want to maintain my own copy of their directory; I'd rather let them update it and simply "scrape" from it whenever needed. However, as I'll explain in a moment, this isn't practical.
(Wait, I take that back! The absolute BEST way would be to have access to their database, so that my map wouldn't need to load their web pages to get the data. However, I'm an outsider and don't have access to their database :/ )
Further, the Google Maps documentation states, "Geocoding is a time and resource intensive task. Whenever possible, pre-geocode known addresses...and store your results in a temporary cache of your own design."
Since the PCA directory IS a list of known addresses, it makes sense to ask Google for the latitude/longitude coordinates of each directory entry just once, store them in a file on my server, and have the map load its information from that file every time.
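A minimal sketch of that "geocode once, reuse many times" idea, written in Python rather than the PHP used for this project. The function `geocode_with_cache`, the JSON cache file, and the injected `geocode` callable are all hypothetical; `geocode` stands in for whatever code actually queries Google. Keying the cache on the normalized city/state string also means several churches in the same city cost only one lookup.

```python
import json
import os
from typing import Callable, Dict, Iterable, List, Tuple

Coord = Tuple[float, float]


def geocode_with_cache(
    places: Iterable[str],
    cache_path: str,
    geocode: Callable[[str], Coord],
) -> Dict[str, Coord]:
    """Look up each place once, reusing results saved in `cache_path`.

    `geocode` is a hypothetical stand-in for the function that calls the
    geocoding service; it is only invoked for places not yet in the cache.
    """
    cache: Dict[str, List[float]] = {}
    if os.path.exists(cache_path):
        with open(cache_path, encoding="utf-8") as f:
            cache = json.load(f)

    results: Dict[str, Coord] = {}
    for place in places:
        key = place.strip().lower()  # "Macon, GA" and "macon, ga " share one entry
        if key not in cache:
            lat, lng = geocode(place)  # only cache misses reach the service
            cache[key] = [lat, lng]
        lat, lng = cache[key]
        results[place] = (lat, lng)

    # Write the cache back once, after all lookups, so later runs can skip them.
    with open(cache_path, "w", encoding="utf-8") as f:
        json.dump(cache, f)
    return results
```

On a second run against the same cache file, only addresses that are not already stored trigger a request.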
And my final problem was a security issue: I was being blocked from accessing the PCA directory web pages from JavaScript using jQuery. The culprit is the browser's "same-origin policy", which stops a script loaded from one site from reading pages served by another. It is related to, but not the same as, "cross-site scripting" (XSS), an attack that can be a vicious little bugger when it comes to online security. On top of that, the PCA directory uses HTTP POST to tell the server which state's directory information to load, which added complexity to the process.
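Because the same-origin policy is enforced by the browser, one common workaround is to make the request from the server instead (for example, a PHP script acting as a proxy). The Python sketch below shows what such a form POST looks like. `DIRECTORY_URL` and the form-field name `state` are placeholders: the real directory form's URL and parameter names are not given here.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Hypothetical endpoint; the real PCA directory form may post elsewhere.
DIRECTORY_URL = "https://example.org/directory/search"


def build_state_request(state: str) -> Request:
    """Build an HTTP POST asking for one state's directory page.

    The field name "state" is an assumption for illustration only.
    """
    body = urlencode({"state": state}).encode("ascii")
    return Request(
        DIRECTORY_URL,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def fetch_state_page(state: str) -> str:
    """Send the request server-side, where no same-origin policy applies."""
    with urlopen(build_state_request(state), timeout=30) as resp:
        return resp.read().decode("utf-8", errors="replace")
```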
So at this point it looked like I would have to scrape the data from the PCA directory web pages myself (i.e. copy and paste from my browser, Firefox), store it in a file on my local server where I had free access to it, ask Google for the coordinates of each church entry (and save those to a file, of course!), and use the saved data to display the locations on the map. And that is basically the final process I used:
- Go to the PCA directory site and copy several states' directory information into a text file on my computer. (I wanted to do it bit-by-bit--no pun intended--to make it easier to catch problems.)
- Upload this "raw" data via FTP to my web server so that a PHP routine can clean up the data in the file.
- Run the "clean-up" PHP routine. It removes white space from around the fields (each entry has a church name, city, state, phone number, e-mail address, website address, presbytery name, and pastor name) and then separates each field with a tab character. Each church is listed on an individual line, so it is easy to scroll through the file line by line.
- Save the result of the "clean-up" routine to a new file. I'm actually simply appending each line to the new file, but that's not a huge issue.
- Run the "cleaned-up" data through a "geocoding" PHP routine. This takes the "cleaned-up" data, finds the city/state information for each church, and asks Google's HTTP geocoding service for its latitude and longitude coordinates. For about 270 locations, Google responded with 0,0 instead of valid coordinates; those will need to be obtained individually later.
- Save the new coordinate data in a new "geocoded" file. To keep it simple, I added the coordinates to the end of each line after the original fields from the "cleaned-up" file.
- Lastly, run a final PHP routine that converts the tab-separated fields into JSON (JavaScript Object Notation). A JSON file can easily be loaded into the browser with a remote call via jQuery.
- Now, I have the directory information, plus the Google-provided geocoordinates in a format that can be easily read by JavaScript. The next part is to display it!
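The clean-up, geocoding and JSON steps above can be sketched as a few small functions. This is Python rather than the PHP the project used, and it assumes the fields appear in the order listed in the clean-up step; the function names and JSON key names (including the separate `lat`/`lng` keys) are illustrative, not the actual routines.

```python
import json
from typing import Dict, List

# Field order as listed in the clean-up step; the real file is assumed to match.
FIELDS = ["name", "city", "state", "phone", "email",
          "website", "presbytery", "pastor"]


def clean_entry(raw_fields: List[str]) -> str:
    """Clean-up step: trim each field and join the fields with tabs.

    A tab inside a field is replaced by a space so it cannot be mistaken
    for a field separator later.
    """
    return "\t".join(f.strip().replace("\t", " ") for f in raw_fields)


def add_coordinates(clean_line: str, lat: float, lng: float) -> str:
    """Geocoding step: append latitude and longitude as two extra fields."""
    return f"{clean_line}\t{lat}\t{lng}"


def is_failed_lookup(lat: float, lng: float) -> bool:
    """Treat a 0,0 reply as a failed lookup to be fixed by hand later."""
    return lat == 0 and lng == 0


def to_json(geocoded_lines: List[str]) -> str:
    """Final step: turn tab-separated lines into a JSON array of objects."""
    records: List[Dict[str, object]] = []
    for line in geocoded_lines:
        parts = line.rstrip("\n").split("\t")
        record: Dict[str, object] = dict(zip(FIELDS, parts[:len(FIELDS)]))
        record["lat"] = float(parts[len(FIELDS)])
        record["lng"] = float(parts[len(FIELDS) + 1])
        records.append(record)
    return json.dumps(records)
```

Entries for which `is_failed_lookup` returns true are the ones that still need coordinates by hand; keeping them out of the JSON until fixed would stop them from all stacking up at 0,0.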
Loading it, however, is a problem. Originally, the map took about 30 seconds to load in IE and Firefox. I was able to reduce that to about 20 seconds by tweaking the code (I had been creating and destroying extra markers that weren't needed, which slowed down rendering).
Google's Chrome browser does an outstanding job with JavaScript in general, and with the Maps code in particular: it loads the map in 5 seconds!
In IE, the memory footprint was about 125 MB (wow!) after loading and rendering the map, and it grew a little more each time the map was moved or zoomed.
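One way to avoid creating and destroying markers that aren't needed is to build each church's marker once and hand back the same object on later redraws. The Python sketch below illustrates only that pattern; `MarkerCache` and the `make_marker` factory are hypothetical stand-ins for the actual Google Maps marker code, which in the browser would be JavaScript.

```python
from typing import Callable, Dict, Generic, TypeVar

M = TypeVar("M")  # whatever object the map library uses for a marker


class MarkerCache(Generic[M]):
    """Create each church's marker once; later redraws reuse the same object."""

    def __init__(self, make_marker: Callable[[str], M]) -> None:
        self._make_marker = make_marker  # hypothetical factory for one marker
        self._markers: Dict[str, M] = {}
        self.created = 0  # how many markers have actually been built

    def get(self, church_id: str) -> M:
        if church_id not in self._markers:
            self._markers[church_id] = self._make_marker(church_id)
            self.created += 1
        return self._markers[church_id]
```

However many times the map is panned or zoomed, `created` never exceeds the number of distinct churches.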
All in all, I think I learned from this experience. If you have any comments or suggestions on how I could improve it, make it faster, etc., let me know! As I mentioned, my copy of the directory is not updated when the PCA directory changes, so the two will soon drift out of sync. If you have thought of a way to address that problem, let me know that, too!
_____________
(*) Wikipedia: "a mashup is a web application that combines data from more than one source into a single integrated tool"