Yesterday, Jörg wrote a blog post on Data Storytelling with smartphone sensor data. Here’s a practical approach to analyzing smartphone sensor data with R. In this example, I will be using the smartphone accelerometer data that Datarella provided in its Data Fiction competition. The dataset shows the acceleration along […]
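The walkthrough itself uses R, and the excerpt stops before naming the axes. What follows is only a rough Python sketch of a typical first step, and it assumes three-axis `(x, y, z)` readings. It combines the axes into one orientation-independent magnitude, then smooths that magnitude with a trailing moving average. The readings, the window size and the helper names are made up for illustration and are not taken from the Datarella dataset.

```python
import math
from statistics import mean


def magnitude(samples):
    """Euclidean norm of each (x, y, z) reading, independent of phone orientation."""
    return [math.sqrt(x * x + y * y + z * z) for x, y, z in samples]


def moving_average(values, window):
    """Trailing moving average; the first window-1 positions have no full window."""
    return [mean(values[i - window + 1 : i + 1]) for i in range(window - 1, len(values))]


# Hypothetical readings in m/s^2: phone lying still, shaken briefly, then still again.
readings = [(0.0, 0.0, 9.81)] * 5 + [(3.0, 4.0, 9.81), (-3.0, -4.0, 9.81)] + [(0.0, 0.0, 9.81)] * 5
mags = magnitude(readings)
smooth = moving_average(mags, window=3)
print([round(m, 2) for m in mags])
print([round(s, 2) for s in smooth])
```

While the phone is still, the magnitude stays near gravity (9.81). The short shake shows up as a bump in the smoothed series.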
Today, the Twitter engineering team released another very interesting Open Source R package for working with time series data: “AnomalyDetection”. This package uses the Seasonal Hybrid ESD (S-H-ESD) algorithm to identify local anomalies (= variations inside seasonal patterns) and global anomalies (= variations that cannot be explained with seasonal patterns).
As a […]
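The following is a simplified Python sketch of the idea behind S-H-ESD. It is not the package's own code, which is written in R. The sketch first removes the seasonal pattern by subtracting per-phase medians, which is a stand-in for the STL decomposition the package uses. It then runs Rosner's generalized ESD test on the residuals, with median and MAD in place of mean and standard deviation; that substitution is the "hybrid" part. Several pieces are assumptions of this sketch: the function names, the `max_anoms`/`alpha` defaults, and the Cornish-Fisher approximation of the Student-t quantile, which is there only so the sketch runs on the standard library.

```python
import statistics
from statistics import NormalDist


def t_quantile(p, df):
    """Approximate Student-t quantile via a Cornish-Fisher expansion
    (Abramowitz & Stegun 26.7.5); a stdlib-only stand-in for an exact t.ppf."""
    x = NormalDist().inv_cdf(p)
    g1 = (x**3 + x) / 4
    g2 = (5 * x**5 + 16 * x**3 + 3 * x) / 96
    g3 = (3 * x**7 + 19 * x**5 + 17 * x**3 - 15 * x) / 384
    g4 = (79 * x**9 + 776 * x**7 + 1482 * x**5 - 1920 * x**3 - 945 * x) / 92160
    return x + g1 / df + g2 / df**2 + g3 / df**3 + g4 / df**4


def seasonal_hybrid_esd(series, period, max_anoms=0.1, alpha=0.05):
    """Return sorted indices of anomalies in `series` (simplified S-H-ESD sketch)."""
    n = len(series)
    # 1. Remove seasonality: subtract the median of each position within the
    #    period (a simplification; the R package uses an STL decomposition).
    phase_medians = [statistics.median(series[j::period]) for j in range(period)]
    remaining = {i: series[i] - phase_medians[i % period] for i in range(n)}

    # 2. Generalized ESD on the residuals, using median/MAD instead of mean/std.
    candidates, num_anoms = [], 0
    for i in range(1, max(1, int(n * max_anoms)) + 1):
        values = list(remaining.values())
        med = statistics.median(values)
        mad = 1.4826 * statistics.median([abs(v - med) for v in values])
        if mad == 0:
            break
        idx = max(remaining, key=lambda j: abs(remaining[j] - med))
        r_i = abs(remaining[idx] - med) / mad
        candidates.append(idx)
        del remaining[idx]
        # Rosner's critical value lambda_i, with m = n - i + 1 points in this round.
        m = n - i + 1
        t = t_quantile(1 - alpha / (2 * m), m - 2)
        lam = (m - 1) * t / ((m - 2 + t * t) * m) ** 0.5
        if r_i > lam:
            num_anoms = i  # keep the largest i whose statistic exceeds lambda_i
    return sorted(candidates[:num_anoms])


# Hypothetical weekly pattern with small deterministic noise and two injected spikes.
base = [10, 12, 15, 14, 13, 8, 7]
data = [base[i % 7] + ((4 * i) % 11 - 5) * 0.1 for i in range(70)]
data[30] += 20
data[55] -= 15
result = seasonal_hybrid_esd(data, period=7)
print(result)
```

The two spikes stay invisible to a global threshold on the raw values only because the weekly pattern hides them. After the seasonal component is removed, they stand far out of the residual distribution.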
One of the most exciting applications of Social Media data is the automated identification, evaluation and prediction of trends. I already sketched some ideas in this blog post. Last year – and this was one of my personal highlights – I had the opportunity to speak at the PyData 2014 […]
I already mentioned the Hastie & Tibshirani course on statistical learning as one of my personal highlights in data science last year. My second highlight is also an online course, also by leading experts in their field (this time: Big Data and data mining), also based on a (freely available) book and also by […]
One thing that’s particularly great about the Internet is the Sharing Economy: so much information, know-how and content is given away for free every day. Here are three fascinating unpublished books that you can take a look at right now. And to make them even better, you can always give the authors your feedback, bugs […]
This new machine learning podcast “Talking Machines – Human Conversations on Machine Learning” really sounds like a lot of fun (and, of course, deep insight):
We start with Kevin Murphy of Google talking about his textbook that has become a standard in the field. Then we turn to Hanna Wallach of Microsoft Research NYC and UMass Amherst […]
2014 was a great year in data science – and also an exciting year for me personally, from a very inspiring Strata Conference in Santa Clara to the wonderful experience of speaking at PyData Berlin to founding the data visualization company DataLion. But it was also a great year for blogging about data […]
The crypto-currency Bitcoin, and the way it generates “trustless trust”, is one of the hottest topics in technological innovation right now. The principle that every Bitcoin transaction can be traced back through the whole transaction history to the very first block (the Genesis block) works not only for finance. The first startups such as Blockstream […]
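Here is a toy hash chain in Python that illustrates the back-tracing principle. Each block stores the SHA-256 digest of its predecessor. That lets you walk from the newest block back to the first one, and changing any earlier transaction breaks verification. This sketch is heavily simplified and does not reflect Bitcoin's actual data structures. It has no proof-of-work, no Merkle trees and no signatures, and the block layout and helper names (`append_block`, `verify`, `trace_back`) are hypothetical.

```python
import hashlib
import json
from dataclasses import dataclass, field

GENESIS_PREV = "0" * 64  # placeholder "previous hash" for the very first block


def block_digest(index, transactions, prev_hash):
    """SHA-256 over the block contents, including the previous block's digest."""
    payload = json.dumps({"index": index, "tx": transactions, "prev": prev_hash}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


@dataclass
class Block:
    index: int
    transactions: list
    prev_hash: str
    digest: str = field(init=False)

    def __post_init__(self):
        self.digest = block_digest(self.index, self.transactions, self.prev_hash)


def append_block(chain, transactions):
    prev = chain[-1].digest if chain else GENESIS_PREV
    chain.append(Block(len(chain), transactions, prev))


def verify(chain):
    """Recompute every link; tampering with any earlier block breaks the chain."""
    for i, block in enumerate(chain):
        expected_prev = chain[i - 1].digest if i else GENESIS_PREV
        if block.prev_hash != expected_prev:
            return False
        if block.digest != block_digest(block.index, block.transactions, block.prev_hash):
            return False
    return True


def trace_back(chain):
    """Follow prev_hash links from the newest block back to the genesis block."""
    by_digest = {b.digest: b for b in chain}
    block, path = chain[-1], [chain[-1].index]
    while block.prev_hash in by_digest:
        block = by_digest[block.prev_hash]
        path.append(block.index)
    return path


chain = []
append_block(chain, [{"from": "alice", "to": "bob", "amount": 5}])
append_block(chain, [{"from": "bob", "to": "carol", "amount": 2}])
append_block(chain, [{"from": "carol", "to": "dave", "amount": 1}])
print(trace_back(chain), verify(chain))  # [2, 1, 0] True
chain[1].transactions[0]["amount"] = 200  # tamper with a past transaction
print(verify(chain))  # False
```

Because every digest covers the previous one, the only way to change history is to recompute every later block. Bitcoin makes that prohibitively expensive through proof-of-work.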
What I like most about the R and Python developer and user communities is their incredible openness and generosity. One of the finest examples in the past year was the online course “Statistical Learning” taught by Stanford professors Trevor Hastie and Rob Tibshirani.
In this MOOC, they explain very clearly (even […]
On Monday, I’ll be speaking on “Linked Data” at the 49th German Market Research Congress 2014. My talk will include many examples of how to apply the basic approach and measures of Social Network Analysis to various topics, ranging from brand affinities as measured in the market-media study best for planning, the […]
If you look at the investments in Big Data companies over the last few years, one thing is obvious: this is a very dynamic and fast-growing market. I regularly update this network map of Big Data investments with a Python program (actually an IPython Notebook).
But what insights can be […]
One of the most remarkable features of this year’s Strataconf was the almost universal use of IPython notebooks in presentations and tutorials. This framework not only allows the speakers to demonstrate each step in the data science approach but also gives the audience an opportunity to do the same – either during the session […]
In my opinion, one of the most interesting Big Data companies in this network analysis of Venture Capital connections has been Domo. Not only did it receive clearly above-average funding for such a young company, but it was also one of the nodes with the best connections through Venture Capital firms and their […]
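One way to measure "best connections through Venture Capital firms" is to start from investor-startup pairs and project the two-mode network onto the startups, so that two startups are linked whenever they share an investor. You can then compute each startup's degree centrality in that projection. This is only an illustrative sketch, not necessarily the measure behind the map. The investors, startups and helper names below are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical (investor, startup) pairs; NOT the data behind the investment map.
investments = [
    ("VC A", "Startup 1"), ("VC A", "Startup 2"), ("VC A", "Startup 3"),
    ("VC B", "Startup 1"), ("VC B", "Startup 4"),
    ("VC C", "Startup 2"), ("VC C", "Startup 4"), ("VC C", "Startup 5"),
]


def project_onto_startups(pairs):
    """Link two startups if they share at least one investor (one-mode projection)."""
    portfolio = defaultdict(set)  # investor -> startups it funded
    for investor, startup in pairs:
        portfolio[investor].add(startup)
    neighbours = defaultdict(set)  # startup -> startups sharing an investor with it
    for startups in portfolio.values():
        for a, b in combinations(sorted(startups), 2):
            neighbours[a].add(b)
            neighbours[b].add(a)
    return neighbours


def degree_centrality(neighbours, nodes):
    """Share of all other startups that each startup is directly linked to."""
    n = len(nodes)
    return {s: len(neighbours[s]) / (n - 1) for s in nodes}


startups = sorted({s for _, s in investments})
centrality = degree_centrality(project_onto_startups(investments), startups)
for s, c in sorted(centrality.items(), key=lambda kv: -kv[1]):
    print(f"{s}: {c:.2f}")
```

In this toy data, "Startup 2" shares an investor with every other startup, so it comes out on top. Degree centrality is just the simplest choice here, and betweenness or weighted measures would be natural alternatives.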
As the database for the Big Data Investment Map 2014 also includes the dates of most funding rounds, it’s not hard to create a time-series plot from this data. This should answer the question of whether Big Data is already past its peak (cf. Gartner seeing Big Data reaching the “trough of […]
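Before any plotting, the dated funding rounds need to be aggregated into a time series. The Python sketch below does that by summing amounts and counting rounds per calendar year. It skips rounds without a date, since they cannot be placed on the time axis. The rounds, the amounts and the yearly granularity are assumptions for illustration. The resulting totals could be passed to any plotting library.

```python
from collections import Counter
from datetime import date

# Hypothetical funding rounds: (startup, announcement date or None, amount in USD millions).
rounds = [
    ("Startup 1", date(2011, 3, 1), 5.0),
    ("Startup 2", date(2012, 7, 15), 12.5),
    ("Startup 1", date(2013, 1, 20), 20.0),
    ("Startup 3", date(2013, 9, 2), 8.0),
    ("Startup 2", date(2014, 2, 11), 40.0),
    ("Startup 4", None, 3.0),  # undated round
]


def funding_per_year(rounds):
    """Return {year: (number of rounds, total amount)} in chronological order."""
    totals, counts = Counter(), Counter()
    for _, day, amount in rounds:
        if day is None:
            continue  # cannot be placed on the time axis
        totals[day.year] += amount
        counts[day.year] += 1
    return {year: (counts[year], totals[year]) for year in sorted(totals)}


for year, (n, total) in funding_per_year(rounds).items():
    print(f"{year}: {n} round(s), {total:.1f}M USD")
```

You could aggregate by quarter or month instead of by year by changing the grouping key, for example `(day.year, (day.month - 1) // 3 + 1)`.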
Here’s an updated version of our Big Data Investment Map. I’ve collected information about ca. 50 of the most important Big Data startups via the Crunchbase API. The funding rounds were used to create a weighted directed network with investments being the edges between the nodes (investors and/or startups). If there were multiple […]
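Here is a minimal Python sketch of how funding rounds can become a weighted directed network, with edges running from investor to startup and the invested amount as the weight. The excerpt cuts off before saying how multiple rounds between the same pair were handled, so summing them is purely an assumption of this sketch. The data and helper names are hypothetical as well. The sketch sticks to the standard library, but the same edge dictionary could be loaded into a graph library such as networkx via `DiGraph.add_edge(u, v, weight=w)`.

```python
from collections import defaultdict

# Hypothetical funding rounds: (investor, startup, amount in USD millions).
rounds = [
    ("VC A", "Startup 1", 5.0),
    ("VC A", "Startup 1", 15.0),      # a second round between the same pair
    ("VC B", "Startup 1", 10.0),
    ("VC A", "Startup 2", 7.5),
    ("Startup 1", "Startup 3", 2.0),  # a startup can also act as an investor
]


def build_weighted_digraph(rounds):
    """Directed edges investor -> startup; repeated pairs are summed (an assumption)."""
    weights = defaultdict(float)
    for investor, startup, amount in rounds:
        weights[(investor, startup)] += amount
    return dict(weights)


def weighted_in_degree(edges):
    """Total funding each node received (sum of incoming edge weights)."""
    received = defaultdict(float)
    for (_, target), w in edges.items():
        received[target] += w
    return dict(received)


edges = build_weighted_digraph(rounds)
for (src, dst), w in sorted(edges.items()):
    print(f"{src} -> {dst}: {w}")
print(weighted_in_degree(edges))
```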