In a recent article, researchers at Stanford University, the University of Michigan, Baylor College of Medicine, and Rice University teamed up on a HUGE project. Their goal was to use deep learning, Google Street View, and data from the American Community Survey to estimate the demographic makeup of neighborhoods and their voting patterns, specifically whether a precinct leaned Republican or Democratic.
In order to do this, they used 50,000,000 (yes, 50 million) pictures from Google Street View and classified the cars in them into 2,657 categories (make, model, and year) covering vehicles from 1990 to the present. That works out to an average of about 18,818 pictures per category. In the end, they had some very interesting findings. Here are some examples:
***Areas that voted Democratic were more likely to have sedans, while areas that voted Republican were more likely to have extended-cab pickup trucks. Hmmm, I wonder if Texas had an impact on that stat!
***Their model accurately predicted median household income; the percentages of Asian, Black, and White residents; and education levels.
***In Milwaukee, WI, they correctly classified voting precinct results with 85% accuracy. They got similar results in Gilbert, AZ (97% accuracy) and Birmingham, AL (83% accuracy), among others.
Why is this study important? First, it shows how much data is needed to properly train neural networks and perform deep learning. If the researchers had had far fewer images per category, say 1,000, the results might have been very different. It highlights a point that was made over and over last week at the Nvidia GPU Technology Conference: "To properly perform deep learning, you need a lot of data!" Not a little data, a lot! Herein lies the problem with many projects in fields such as medical imaging. The models are not as good as they could be simply because there is not enough data (see my last blog post). Second, it shows that very different kinds of data, such as images from Google Street View and census data, can be combined to draw interesting conclusions that could affect elections or car sales, among other things. If you get a chance, take a look at the article.
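Timnit Gebru, Jonathan Krause, Yilun Wang, Duyun Chen, Jia Deng, Erez Lieberman Aiden, Li Fei-Fei. "Using deep learning and Google Street View to estimate the demographic makeup of neighborhoods across the United States."
Proceedings of the National Academy of Sciences, Dec 2017, 114 (50): 13108-13113. DOI: 10.1073/pnas.1700035114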
Link to the paper: http://www.pnas.org/content/114/50/13108