John Pace
Data Scientist, husband, father of 3 great daughters, 5x Ironman triathlon finisher, just a normal guy who spent a lot of time in school.
Let’s explore data science, artificial intelligence, machine learning, and other topics together.

50,000,000 pics for one project?  Yep!

4/2/2018


In a recent article, researchers at Stanford University, the University of Michigan, Baylor College of Medicine, and Rice University teamed up on a HUGE project.  Their goal was to use deep learning, Google Street View, and data from the American Community Survey to estimate the demographic statistics of neighborhoods and their voter preference - whether a neighborhood leaned Republican or Democrat.

In order to do this, they used 50,000,000 - yes, 50 million - pictures from Google Street View to classify a total of 2,657 types (categories) of cars from 1990 to the present.  That's an average of 18,818 pictures per category.  In the end, they had some very interesting findings.  Here are some examples:
  • Democratic voters were more likely to drive a sedan, while Republican voters were more likely to drive an extended-cab pickup truck.  Hmmm, I wonder if Texas had an impact on that stat!
  • Their model was able to accurately estimate median household income; the percentage of Asians, Blacks, and Whites; and education levels.
  • In Milwaukee, WI, they were able to correctly classify voting precinct results with 85% accuracy.  The same approach worked in Gilbert, AZ (97% accuracy), and Birmingham, AL (83% accuracy), among others.

Why is this study important?  First, it shows the necessity for large amounts of data to properly train neural networks and perform deep learning.  If the researchers had had fewer images per category, say 1,000, the results might have been very different.  It highlights a point that was made over and over last week at the Nvidia GPU Technology Conference.  "To properly perform deep learning, you need a lot of data!"  Not a little data, a lot!!!!  Herein lies the problem with many projects, such as medical imaging.  The models are not as good as they could be simply because there is just not enough data (see my last blog post).  Second, it shows that very disparate data - images from Google Street View and census data - can be combined to draw interesting conclusions that could affect elections or the sale of cars, among other things.  If you get a chance, take a look at the article.
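
For anyone who wants to check the numbers, here is a quick Python sketch of the arithmetic behind the figures quoted above.  It only uses the totals mentioned in this post (50,000,000 images, 2,657 categories, and the hypothetical 1,000 images per category).  The variable names are my own, and it assumes the images are spread evenly across categories, which is almost certainly not true of the real dataset.

```python
# Figures quoted in this post
total_images = 50_000_000
car_categories = 2_657

# Simple mean; assumes an even spread across categories (an assumption,
# the real dataset is unlikely to be perfectly balanced)
avg_per_category = total_images / car_categories

# Hypothetical smaller dataset from the post: 1,000 images per category
hypothetical_per_category = 1_000
hypothetical_total = hypothetical_per_category * car_categories
scale_factor = avg_per_category / hypothetical_per_category

print(f"Average images per category: {avg_per_category:,.0f}")
print(f"Total images at 1,000 per category: {hypothetical_total:,}")
print(f"Actual average is about {scale_factor:.1f}x the hypothetical")
```

In other words, a 1,000-per-category version of this project would still need more than 2.6 million labeled images, and the actual study averaged roughly 19 times that per category.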

You can read the paper here: http://www.pnas.org/content/114/50/13108

Timnit Gebru, Jonathan Krause, Yilun Wang, Duyun Chen, Jia Deng, Erez Lieberman Aiden, Li Fei-Fei. Using deep learning and Google Street View to estimate the demographic makeup of neighborhoods across the United States.
Proceedings of the National Academy of Sciences Dec 2017, 114 (50) 13108-13113; DOI: 10.1073/pnas.1700035114

If you have questions and want to connect, you can message me on LinkedIn (https://www.linkedin.com/in/john-pace-phd-20b87070/) or Twitter (@pacejohn). Also, follow my company, Mark III Systems, on Twitter @markiiisystems

#deeplearning #computervision #artificialintelligence #ai #google #stanford #baylorcollegeofmedicine #universityofmichigan #riceuniversity #streetview #pnas #acs #census #neuralnetwork