Portland OR, United States (Portland Metro Area)
I live within and am interested in determining what type(s) of interesting information can be gleaned from the Portland Metropolitan OSM file. The map below depicts the area encompassed by the OSM file (black dots) and each purple dot represents the unique zip codes discovered within the ways_tags and nodes_tags.
def fix_city_name(name, mapping=MAPPING):
"""Splits tag.attrib['v'] and checks each string against MAPPING.
If there's a value match, the string is changed to the new value."""
if name in mapping:
name = name.replace(name, mapping[name])
return name
def fix_zip_codes(zip_codes):
"""Expects a string. Will search the string for a consecutive 5 digits and
return the string as a zip code or leave blank if there's no match."""
zip_code = re.compile('\d{5}')
zip_code = zip_code.findall(zip_codes)
if zip_code:
return zip_code[0]
else:
return ''
def fix_street_name(name, mapping=MAPPING):
"""Splits tag.attrib['v'] and checks each string against MAPPING.
If there's a value match, the string is changed to the new value."""
name = name.strip()
x = name.split()
for y in x:
if y in mapping:
name = name.replace(y, mapping[y])
return name
SELECT value
FROM (SELECT * FROM nodes_tags UNION ALL
SELECT * FROM ways_tags) tags
WHERE key='phone'
GROUP BY value
The table below shows the various formats phone numbers come in. They should be corrected to a standard format for consistency.
SELECT COUNT(DISTINCT(users.uid))
FROM (SELECT uid FROM nodes UNION ALL
SELECT uid FROM ways) users;
1,392
SELECT tags.value, COUNT(*) as count
FROM (SELECT * FROM nodes_tags UNION ALL
SELECT * FROM ways_tags) tags
WHERE tags.key LIKE 'city'
GROUP BY tags.value
ORDER BY count DESC;
SELECT tags.value, COUNT(*) as count
FROM (SELECT * FROM nodes_tags
UNION ALL
SELECT * FROM ways_tags) tags
WHERE tags.key='postcode'
GROUP BY tags.value
ORDER BY count DESC;
SELECT contrib.user, COUNT(*) as count
FROM (SELECT user FROM nodes
UNION ALL SELECT user FROM ways) contrib
GROUP BY contrib.user
ORDER BY count DESC
LIMIT 10;
SELECT tags.value, COUNT(*) as count
FROM (SELECT * FROM nodes_tags UNION ALL
SELECT * FROM ways_tags) tags
WHERE tags.key='amenity'
GROUP BY tags.value
ORDER BY count DESC;
SELECT value, COUNT(*) as count
FROM (SELECT * FROM nodes_tags UNION ALL
SELECT * FROM ways_tags) tags
WHERE key='cuisine'
GROUP BY value
ORDER BY count DESC;
SELECT value, COUNT(*) as count
FROM (SELECT * FROM nodes_tags UNION ALL
SELECT * FROM ways_tags) tags
WHERE key='sport'
GROUP BY value
ORDER BY count DESC;
Based upon the collected data, as shown in Corrected OSM File Issues, there are a relatively small number of issues. Specifically, only 40 city names and 50 zip codes required standardization. Additionally, fewer that 240 street names were transformed from short form to long form.
As mentioned in Other Ideas About the Dataset, the Portland data is very thorough, but the more rural communities surrounding Portland would benefit from more users and data. Bringing awareness of the OSM project and its benefits in terms of data availability to potential new users seems to be an intergral component to the continued success of OSM.