New Podcast on Machine Learning

This new machine learning podcast, “Talking Machines – Human Conversations on Machine Learning”, really sounds like a lot of fun (and deep insight, of course):

We start with Kevin Murphy of Google talking about his textbook that has become a standard in the field. Then we turn to Hanna Wallach of Microsoft Research NYC and UMass Amherst and hear about the founding of WiML (Women in Machine Learning). Next we discuss academia’s relationship with business with Max Welling from the University of Amsterdam, program co-chair of the 2013 NIPS conference (Neural Information Processing Systems). Finally, we sit down with three pillars of the field Yann LeCun, Yoshua Bengio, and Geoff Hinton to hear about where the field has been and where it might be headed.

Downloading the first episode from January 1st right now.

The Top 7 Beautiful Data Blog Posts in 2014

2014 was a great year in data science, and also an exciting year for me personally: from a very inspirational Strata Conference in Santa Clara, to a wonderful experience speaking at PyData Berlin, to founding the data visualization company DataLion. But it was also a great year for blogging about data science. Here are the Beautiful Data blog posts our readers seemed to like the most:

  1. Datalicious Notebookmania – My personal list of the 7 IPython notebooks I like the most. Some of them are great for novices, while others can be challenging even for advanced statisticians and data scientists.
  2. Trending Topics at Strata Conferences 2011-2014 – An analysis of the topics most frequently mentioned in Strata Conference abstracts that clearly shows the rising importance of Python, IPython and Pandas.
  3. Big Data Investment Map 2014 – I’ve been tracking and analysing the developments in Big Data investments and IPOs for quite a long time. This was the 2014 update of the network mapping the investments of VCs in Big Data companies.
  4. Analyzing VC investment strategies with Crunchbase data – This blog post explains the code used to create the network.
  5. How to create a location graph from the Foursquare API – In this post, I explain a way to make sense out of the Foursquare API and to create geospatial network visualizations from the data showing how locations in a city are connected via Foursquare checkins.
  6. Text-Mining the DLD Conference 2014 – An approach very similar to the one I used for the Strata conference, applied here to the Twitter corpus referring to the Hubert Burda Media DLD conference, showing the trending topics in tech and media.
  7. Identifying trends in the German Google n-grams corpus – This tutorial shows how to analyze big data sets such as the Google Books n-gram corpus with Hive on the Amazon cloud.

The Rise of the Chief Data Officer

Just a few months after Ogilvy & Mather created the new position of Chief Data Officer for Todd Cullen, another WPP agency is following this example. Mindshare USA just appointed Bob Ivins as the company’s first Chief Data Officer, reporting directly to CEO Colin Kinsella.

Among the reasons for this move seems to be the growing importance of passively collected data in the agencies’ data warehouses and their clients’ marketing and enterprise software. Then there’s of course the massive data wealth that’s out there in the open: billions of tweets, check-ins, posts and comments by the modern digital population.

Now that data management platforms such as x+1, BlueKai or Adobe are among the standard tools in digital marketing and audience buying, and the larger agencies even work with their own custom-built platforms, data is becoming the new competitive edge.

This trend is gaining further traction from a new development: up to now, agencies were the only ones with full access to advertising data. They were the ones who did the ROI modeling and attribution analyses, and charged their clients for this service. But more and more advertisers are demanding their campaign data back in order to do their own analyses in combination with market and media surveys, customer data and retail data.

Big Data Business Models

Agencies suddenly need to develop new offers for their clients that go beyond evaluating media plans and campaign KPIs. Especially since the advent of real-time bidding and automated optimization, media planning has lost quite a lot of importance for the agencies. I’d argue one of their new business fields will be data-driven:

  • identification of data sources and data brokerage

  • analysis of the value of their customers’ data

  • combination and refinement of data

  • real-time data management and data-driven learning

This all hints at a bright future for Data Scientists and Data Officers in advertising.