Coolhunting like a Streetfighter

One of the most exciting applications of Social Media data is the automated identification, evaluation and prediction of trends. I already sketched some ideas in this blog post. Last year – and this was one of my personal highlights – I had the opportunity to speak at PyData Berlin 2014 on the topic of Street Fighting Trend Research.

In my talk I presented some general thoughts on trend research on the Internet (or “coolhunting”, as it is called nowadays). At its core, however, were three examples: how to identify research trends from the web (see this blog post), how to mine conference proposals (see this analysis of Strata abstracts) and how to identify trending locations on Foursquare (see here). All three examples are also available as IPython Notebooks on my GitHub page. And here’s the recorded version of the talk.

The PyData conference was one of the best conferences I have attended. Not only were the topics very diverse – ranging from GPU optimization to the representation of women in the PyData community – but the attendees also came from very different backgrounds: lawyers, engineers, physicists, computer scientists (of course) and statisticians. Still, with every talk and every conversation in the hallways, you could feel the wild euphoria and the incredible curiosity that connected all of us through the programming language.

The Top 7 Beautiful Data Blog Posts in 2014

2014 was a great year in data science – and also an exciting year for me personally, from a very inspirational Strata Conference in Santa Clara to a wonderful experience speaking at PyData Berlin to founding the data visualization company DataLion. But it was also a great year for blogging about data science. Here are the Beautiful Data blog posts our readers seemed to like the most:

  1. Datalicious Notebookmania – My personal list of the 7 IPython notebooks I like the most. Some of them are great for novices, some can be challenging even for advanced statisticians and data scientists.
  2. Trending Topics at Strata Conferences 2011-2014 – An analysis of the topics most frequently mentioned in Strata Conference abstracts that clearly shows the rising importance of Python, IPython and Pandas.
  3. Big Data Investment Map 2014 – I’ve been tracking and analysing the developments in Big Data investments and IPOs for quite a long time. This was the 2014 update of the network mapping the investments of VCs in Big Data companies.
  4. Analyzing VC investment strategies with Crunchbase data – This blog post explains the code used to create the network.
  5. How to create a location graph from the Foursquare API – In this post, I explain a way to make sense out of the Foursquare API and to create geospatial network visualizations from the data showing how locations in a city are connected via Foursquare checkins.
  6. Text-Mining the DLD Conference 2014 – An approach very similar to the one I used for the Strata conference was applied to the Twitter corpus referring to Hubert Burda Media’s DLD conference, showing the trending topics in tech and media.
  7. Identifying trends in the German Google n-grams corpus – This tutorial shows how to analyze big data sets such as the Google Books n-gram corpus with Hive on the Amazon Cloud.

How to create a location graph from the Foursquare API

On Monday, I’ll be speaking on “Linked Data” at the 49th German Market Research Congress 2014. My talk will include many examples of how to apply the basic approach and measures of Social Network Analysis to a range of topics: brand affinities as measured in the market-media study best for planning, the financial network between venture capital firms and start-ups, and the location graph on Foursquare.

Because I haven’t seen many examples of using the Foursquare API to generate location graphs, I would like to explain my approach in a little more depth. At first sight, the Foursquare API differs from many other Social Media APIs because it only allows you to access data about your own account. So there is no general stream (or firehose) of check-in events that could be used to calculate user journeys or the relations between different places.

Fortunately, there’s another method that is very helpful for this purpose: for any given Foursquare location, you can query the API for up to five venues that people most frequently checked into after this location. This calls for a recursive approach: download the next venues, then the next venues of those next venues, and so on, and finally transform this data into the location graph.
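The recursive download can be sketched as a breadth-first crawl. This is a minimal sketch, not the code from my notebook: `crawl_location_graph` and the `fetch_next_venues` callback are hypothetical names, and the callback stands in for whatever function actually calls the Foursquare API and returns venue IDs. It uses an explicit queue instead of literal recursion, which avoids Python's recursion limit and makes a depth limit easy to enforce. The depth limit matters because every expanded venue costs one API call, and with up to five next venues per venue the number of calls can grow up to fivefold per level.

```python
from collections import deque
from typing import Callable, Iterable, Set, Tuple

# Hypothetical callback type: takes a venue ID, returns the IDs of its "next venues".
# In practice this would wrap a Foursquare API request.
NextVenuesFetcher = Callable[[str], Iterable[str]]


def crawl_location_graph(
    seed_venue: str,
    fetch_next_venues: NextVenuesFetcher,
    max_depth: int = 3,
) -> Set[Tuple[str, str]]:
    """Follow "next venue" links breadth-first from a seed venue and collect
    directed edges (from_venue, to_venue), expanding at most max_depth levels."""
    edges: Set[Tuple[str, str]] = set()
    visited = {seed_venue}
    queue = deque([(seed_venue, 0)])
    while queue:
        venue, depth = queue.popleft()
        if depth >= max_depth:
            continue  # venues at the depth limit are kept as nodes but not expanded
        for nxt in fetch_next_venues(venue):
            edges.add((venue, nxt))
            if nxt not in visited:  # expand each venue only once (one API call each)
                visited.add(nxt)
                queue.append((nxt, depth + 1))
    return edges
```

For a quick offline check, a dictionary can play the role of the API, e.g. `crawl_location_graph("A", lambda v: toy.get(v, []), max_depth=2)` with `toy = {"A": ["B", "C"], "B": ["C", "D"], "C": ["A"], "D": []}`. The resulting edge set can then be loaded into any graph library or exported for visualization.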

I’ve written this approach down in an IPython Notebook, so all you need are your API credentials and you can start downloading your city’s location graph. For Munich it looks like this:
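A single request with your credentials might look roughly like the sketch below. It assumes the legacy Foursquare v2 endpoint `venues/{venue_id}/nextvenues` with userless authentication (`client_id`, `client_secret` and the date-based version parameter `v`) and a response layout of `response.nextVenues.items`, each item carrying `id`, `name` and `location.lat`/`location.lng`. These are assumptions, and Foursquare has since changed its API, so please check the current documentation. The default version string and all function names are hypothetical.

```python
import json
from typing import List, Tuple
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: legacy Foursquare v2 "next venues" endpoint with userless auth.
API_URL = "https://api.foursquare.com/v2/venues/{venue_id}/nextvenues"


def build_nextvenues_url(venue_id: str, client_id: str, client_secret: str,
                         version: str = "20140101") -> str:
    """Build the request URL; `version` is the date-based 'v' parameter (hypothetical default)."""
    params = urlencode({"client_id": client_id, "client_secret": client_secret, "v": version})
    return API_URL.format(venue_id=venue_id) + "?" + params


def parse_next_venues(payload: dict) -> List[Tuple[str, str, float, float]]:
    """Extract (id, name, lat, lng) for each next venue from a decoded JSON response.
    Returns an empty list if the assumed keys are missing."""
    items = payload.get("response", {}).get("nextVenues", {}).get("items", [])
    return [
        (v["id"], v["name"], v["location"]["lat"], v["location"]["lng"])
        for v in items
    ]


def fetch_next_venues(venue_id: str, client_id: str,
                      client_secret: str) -> List[Tuple[str, str, float, float]]:
    """Network call: download and parse the next venues for one venue."""
    with urlopen(build_nextvenues_url(venue_id, client_id, client_secret)) as resp:
        return parse_next_venues(json.load(resp))
```

Keeping URL building and parsing separate from the network call makes both testable without credentials. A crawler that only needs venue IDs can keep the first element of each tuple, while the coordinates are worth storing for later spatial analysis.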

Munich seen through Foursquare check-ins

The resulting network is very interesting, because the “distance” between the different locations is a fascinating mixture of

  • spatial distance: places that are nearby are more likely to be connected (think of neighborhoods)
  • temporal distance: places that can be reached in a short time are more likely to be connected (think of places that are quite far apart but can be reached in no time by highway)
  • affective/social distance: places that belong to a common lifestyle are more likely to be connected
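One way to probe the spatial component of this mixture is to compute the great-circle length of every edge. This is a minimal sketch, assuming you stored latitude and longitude for each venue during the download; the function names are hypothetical and this is not part of the original notebook. It uses the standard haversine formula with a mean Earth radius of 6371 km.

```python
import math
from typing import Dict, Iterable, Tuple

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1: float, lng1: float, lat2: float, lng2: float) -> float:
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def edge_lengths_km(
    edges: Iterable[Tuple[str, str]],
    coords: Dict[str, Tuple[float, float]],
) -> Dict[Tuple[str, str], float]:
    """Spatial length of each edge; coords maps venue ID -> (lat, lng).
    Edges with an endpoint lacking coordinates are skipped."""
    return {
        (u, v): haversine_km(*coords[u], *coords[v])
        for u, v in edges
        if u in coords and v in coords
    }
```

If most edges turn out to be short, spatial proximity dominates. Long edges between well-connected places hint at the temporal or lifestyle-related distances, which cannot be read directly from coordinates.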

Feel free to clone the code from my GitHub. I’m looking forward to seeing the network visualizations of your cities.