Press Release
Cornell maps the world’s photos
Contact: Paul Redfern
Cell: (607) 227-1865
FOR RELEASE: April 23, 2009
ITHACA, NY – Cornell University computer scientists used a supercomputer
at the Cornell Center for Advanced Computing to download and analyze nearly 35
million Flickr photos taken by over 300,000 photographers from around the globe.
Their main goal was to develop new methods to automatically organize and label
large-scale collections of digital data. A secondary result of the research was
the generation of statistics on the world's most photographed cities and
landmarks, gleaned from the analysis of the multi-terabyte photo collection:
• The top 25 most photographed cities in the Flickr data are: (1) New York City
(2) London (3) San Francisco (4) Paris (5) Los Angeles (6) Chicago (7)
Washington, DC (8) Seattle (9) Rome (10) Amsterdam (11) Boston (12) Barcelona
(13) San Diego (14) Berlin (15) Las Vegas (16) Florence (17) Toronto (18) Milan
(19) Vancouver (20) Madrid (21) Venice (22) Philadelphia (23) Austin (24) Dublin
(25) Portland.
• The top seven most photographed landmarks are: (1) Eiffel Tower - Paris (2)
Trafalgar Square - London (3) Tate Modern museum - London (4) Big Ben - London
(5) Notre Dame - Paris (6) The Eye - London (7) Empire State Building - New York
City.
The study also identified the seven most photographed landmarks in each of the
top 25 cities. Most of these landmarks are well-known tourist attractions, but
some surprising results emerged. For example, one striking result in the Flickr
data is that the Apple Store in midtown Manhattan is the 5th-most photographed
place in New York City – and, in fact, the 28th-most photographed place in the
world.
Cornell developed techniques to automatically identify places that people find
interesting to photograph, showing results for thousands of locations at both
city and landmark scales. "We developed classification methods for
characterizing these locations from visual, textual and temporal features," says
Daniel Huttenlocher, the John P. and Rilla Neafsey Professor of Computing,
Information Science and Business and Stephen H. Weiss Fellow. "These methods
reveal that both visual and temporal features improve the ability to estimate
the location of a photo compared to using just textual tags."
Cornell's technique of finding representative images is a practical way of
summarizing large collections of images. The scalability of the method allows
for automatically mining the information latent in very large sets of images,
raising the intriguing possibility of an online travel guidebook that could
automatically identify the best sites to visit on your next vacation, as judged
by the collective wisdom of the world's photographers.
To perform the data analysis, the researchers used a mean shift procedure
and ran their application on the “Hadoop Cluster,” a 480-core Linux-based Dell
PowerEdge 2950 supercomputer at the Cornell Center for Advanced Computing
(CAC). Hadoop is a framework used to run applications on large
clusters of computers. It uses a computational paradigm called Map/Reduce to
divide applications into small segments of work, each of which can be executed
on any node of the cluster. “As the creation of digital data accelerates,” says
CAC Director David Lifka, “supercomputers and high-performance storage systems
will be essential in order to quickly store, archive, preserve, and retrieve
large-scale data collections.”
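
For readers unfamiliar with the two techniques named above, the following is a
minimal sketch, not the researchers' implementation. It assumes a flat (disc)
kernel for the mean shift step and a simple grid-cell count for the Map/Reduce
step. The function names, the bandwidth and cell parameters, and the toy
coordinates are all hypothetical, and the code runs in a single process rather
than through Hadoop's own API.

```python
from collections import defaultdict
from math import hypot


def mean_shift_mode(points, start, bandwidth, max_iter=100, tol=1e-6):
    """Move `start` toward a local density peak (assumed flat kernel)."""
    x, y = start
    for _ in range(max_iter):
        # Keep only the points inside the disc of radius `bandwidth`.
        near = [(px, py) for px, py in points
                if hypot(px - x, py - y) <= bandwidth]
        if not near:
            break
        # Shift the estimate to the mean of the points in the disc.
        nx = sum(p[0] for p in near) / len(near)
        ny = sum(p[1] for p in near) / len(near)
        moved = hypot(nx - x, ny - y)
        x, y = nx, ny
        if moved < tol:
            break
    return x, y


def map_phase(chunk, cell):
    """Map step: emit (grid cell, 1) for each location in one chunk of work."""
    return [((round(px / cell), round(py / cell)), 1) for px, py in chunk]


def reduce_phase(pairs):
    """Reduce step: sum the emitted counts for each grid cell."""
    counts = defaultdict(int)
    for key, value in pairs:
        counts[key] += value
    return dict(counts)


if __name__ == "__main__":
    # Toy coordinates standing in for photo locations (not real data).
    photos = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
              (5.0, 5.0), (5.1, 5.0)]
    print(mean_shift_mode(photos, start=(0.3, 0.3), bandwidth=1.0))
    # Each chunk could be mapped on a different node; here they run in turn.
    chunks = [photos[:3], photos[3:]]
    pairs = [pair for chunk in chunks for pair in map_phase(chunk, cell=1.0)]
    print(reduce_phase(pairs))
```

In this sketch each chunk is mapped independently, which is what allows a
framework such as Hadoop to hand the chunks to whichever nodes are available
before the reduce step combines the results.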
The results of this research were presented in April 2009 at the 18th
International World Wide Web Conference in Madrid. Details are available in
the paper entitled "Mapping the World's Photos" by Cornell Computer Science
researchers David Crandall, Lars Backstrom, Daniel Huttenlocher, and Jon
Kleinberg. Visualizations
from the project show how planetary-scale datasets can provide insight into
different kinds of human activity – in this case those based on images; on
locales, landmarks, and focal points scattered throughout the world; and on the
ways in which people are drawn to them.
This research was supported in part by the National Science Foundation (NSF) and
by funding from Google, Yahoo! and the John D. and Catherine T. MacArthur
Foundation. The Cornell Center for Advanced Computing is supported by Cornell
University, the NSF, DOD, USDA, and members of its corporate program.