What to expect from Strata Conference 2015? An empirical outlook.

In one week, the 2015 edition of Strata Conference (or rather: Strata + Hadoop World) will open its doors to data scientists and big data practitioners from all over the world. What will be the most important big data technology trends for this year? As last year, I ran an analysis on the Strata abstract for 2015 and compared them to the previous years.

One thing immediately strikes: 2015 will be probably known as the “Spark Strata”:


If you compare mentions of the major programming languages in data science, there’s another interesting find: R seems to have a comeback and Python may be losing some of its momentum:


R is also among the rising topics if you look at the word frequencies for 2015 and 2014:


Now, let’s take a look at bigrams that have been gaining a lot of traction since the last Strata conference. From the following table, we could expect a lot more case studies than in the previous years:


This analysis has been done with IPython and Pandas. See the approach in this notebook.

Looking forward to meeting you all at Strata Conference next week! I’ll be around all three days and always in for a chat on data science.

DLD Conference – what were Twitter users discussing?

While I was taking a look at the network dynamics and relations of the Twitter conversations at the DLD conference in Munich, Salesforce and Radian6 took a more “traditional” approach and segmented the conversations in terms of topics, users and countries. While a tag cloud is able to give a first impression on the relevant content of the discussions, a semantic analysis goes much deeper and shows the relations between the terms used by the conference attendants. Here’s a look at the most important and most frequently connected words related to the Twitter hashtags “#DLD12” and “#DLD”:

The most frequently used words and related concepts have been the following:

See also: Networking at the DLD conference part 1 and part 2