For most of the data we encounter, a common check is to compare the number of rows in the table with the number of unique values in the column that identifies the subject. This is especially important when analyzing data about customers, merchandise items, students enrolled in a program, or any other business case where the subject can be uniquely identified.
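As a rough illustration of this check, here is a minimal Python sketch. The function name uniqueness_report, the customer_id column and the sample rows are all made up for this example; the idea is only to compare the row count with the count of distinct identifiers and list any identifier that repeats.

```python
from collections import Counter


def uniqueness_report(rows, key):
    """Compare the number of rows with the number of distinct values of `key`."""
    values = [row[key] for row in rows]
    counts = Counter(values)
    # Any value seen more than once means rows and unique values disagree.
    duplicates = {value: n for value, n in counts.items() if n > 1}
    return {
        "rows": len(values),
        "unique": len(counts),
        "duplicates": duplicates,
    }


# Invented sample table: customer 102 appears twice.
customers = [
    {"customer_id": 101, "name": "Ana"},
    {"customer_id": 102, "name": "Ben"},
    {"customer_id": 102, "name": "Ben"},
    {"customer_id": 103, "name": "Cy"},
]

report = uniqueness_report(customers, "customer_id")
print(report)
```

If the two counts match, every subject appears once. If they differ, the duplicates entry shows which identifiers to inspect before going further with the analysis.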
Social Media Data Analysis
Many different patterns emerge from the analysis of social media data, for example from Twitter. Here I will try to present the most evident and, hopefully, most important ones from a study by the University of Illinois in collaboration with the social media data vendor GNIP. The Global Twitter Heartbeat project originated from this study.
Let’s start with the times at which Twitter users connect and tweet about facts and events. This behavior can be modelled as a time series. Two peaks of activity emerge during the day: one around 8-9 o’clock in the morning and another around 21-22 o’clock in the evening.
The left graphic shows weekdays and the right one shows weekend days.
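To make the time series idea concrete, here is a small Python sketch that counts tweets per hour of day, separately for weekdays and weekends, and then picks the busiest hours. The timestamps are invented for illustration and are not the study's data, and the helper names hourly_counts and top_hours are hypothetical.

```python
from collections import Counter
from datetime import datetime


def hourly_counts(timestamps):
    """Count tweets per hour of day, split into weekday and weekend buckets."""
    weekday = Counter()
    weekend = Counter()
    for ts in timestamps:
        # datetime.weekday(): Monday is 0, Saturday is 5, Sunday is 6.
        bucket = weekend if ts.weekday() >= 5 else weekday
        bucket[ts.hour] += 1
    return weekday, weekend


def top_hours(counts, n=2):
    """Return the n busiest hours; ties are broken by the earlier hour."""
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [hour for hour, _ in ranked[:n]]


# Invented sample: 1-2 January 2024 are weekdays, 6 January 2024 is a Saturday.
tweets = [
    datetime(2024, 1, 1, 8, 15),
    datetime(2024, 1, 1, 8, 40),
    datetime(2024, 1, 1, 13, 5),
    datetime(2024, 1, 2, 21, 10),
    datetime(2024, 1, 2, 21, 45),
    datetime(2024, 1, 6, 10, 30),
    datetime(2024, 1, 6, 22, 5),
    datetime(2024, 1, 6, 22, 50),
]

weekday_counts, weekend_counts = hourly_counts(tweets)
print("weekday peaks:", top_hours(weekday_counts))
print("weekend peaks:", top_hours(weekend_counts))
```

On real data the timestamps would first need to be converted to each user's local time, otherwise the morning and evening peaks of different time zones blur together.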
For tweets that carry a geographic reference, several patterns emerge.
-Geographic proximity is found to play a minimal role both in who users communicate with and what they communicate about.