Data scientists working at social media companies such as Facebook, Twitter, LinkedIn, Google+, Pinterest, Tumblr or Instagram have a few things in common. They define and answer questions that can have a big impact on the product, the business, and people's lives, especially when those questions or their answers are not yet well understood, well known, or even well formed.
I'll start with the users. The strength of a social media site lies in its users. Social network analysis views relationships in terms of nodes (people) and edges (the links or connections between people). A good social media site has algorithms that make accurate friend suggestions. Such algorithms are extremely valuable because they encourage new connections, and the strength of an online social network increases dramatically as the number of edges increases. If you want to try a problem that data scientists at social media firms face in real life, visit this link on Kaggle.
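As a rough illustration of friend suggestion, here is a minimal sketch that ranks a user's non-friends by how many mutual friends they share (the "common neighbors" heuristic from link prediction). It is a simplified stand-in, not the algorithm any of the companies above actually uses; the function `suggest_friends` and the toy graph are hypothetical.

```python
from collections import Counter


def suggest_friends(graph, user, k=3):
    """Hypothetical helper: rank non-friends of `user` by mutual-friend count.

    graph: dict mapping each person to the set of people they are connected to
           (an undirected graph stored as adjacency sets).
    """
    friends = graph.get(user, set())
    counts = Counter()
    for friend in friends:
        for friend_of_friend in graph.get(friend, set()):
            # Skip the user themself and anyone they are already connected to.
            if friend_of_friend != user and friend_of_friend not in friends:
                counts[friend_of_friend] += 1
    # Most mutual friends first; break ties by name so the output is deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:k]


# Toy social graph (hypothetical data).
graph = {
    "alice": {"bob", "carol"},
    "bob": {"alice", "carol", "dave"},
    "carol": {"alice", "bob", "dave", "erin"},
    "dave": {"bob", "carol"},
    "erin": {"carol"},
}
print(suggest_friends(graph, "alice"))  # [('dave', 2), ('erin', 1)]
```

Counting mutual friends only looks two hops away, which keeps it cheap; production systems would likely combine many more signals, but this captures the intuition that closing triangles creates new edges.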
Another top priority at a social media firm is in-depth analysis of usage patterns, user engagement, and opportunities to improve product features. This information can be derived from many user-related variables: where users go, their home feed, and their own content; how content affects the user experience (flow, amount, evolution, interests); correlations with actions such as liking, sharing, clicking, or revisiting their own content; and demographics such as age, gender, and country. All of this analysis serves two main goals: new user activation and user retention.
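To make "user retention" concrete, one common way to measure it is day-N retention: the share of a signup cohort that comes back exactly N days after joining. The sketch below assumes a simple in-memory layout (signup dates and sets of active dates per user); the function `day_n_retention` and the sample data are hypothetical, and real pipelines would compute this over event logs in the warehouse.

```python
from datetime import date, timedelta


def day_n_retention(signups, activity, n):
    """Hypothetical helper: fraction of users active exactly n days after signup.

    signups:  dict user_id -> signup date
    activity: dict user_id -> set of dates on which the user was active
    """
    if not signups:
        return 0.0
    retained = sum(
        1
        for user, signed_up in signups.items()
        if signed_up + timedelta(days=n) in activity.get(user, set())
    )
    return retained / len(signups)


# Sample cohort (hypothetical data).
signups = {
    "u1": date(2024, 1, 1),
    "u2": date(2024, 1, 1),
    "u3": date(2024, 1, 2),
    "u4": date(2024, 1, 3),
}
activity = {
    "u1": {date(2024, 1, 1), date(2024, 1, 8)},  # returned on day 7
    "u2": {date(2024, 1, 1)},                    # never returned
    "u3": {date(2024, 1, 2), date(2024, 1, 9)},  # returned on day 7
    "u4": {date(2024, 1, 5)},                    # returned on day 2 only
}
print(day_n_retention(signups, activity, 7))  # 0.5
```

Some teams instead count a user as retained if they return at any point in a window (for example days 1 to 7); which definition is used is a product decision, not something this sketch settles.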
To understand evolving usage trends and users' demand for new features, data scientists usually maintain a rigorous culture around A/B experiments and around tools for in-depth analysis.
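A basic building block of A/B analysis is checking whether a difference in a rate metric between two arms is larger than chance would explain. The sketch below uses a standard two-sided, two-proportion z-test with a pooled rate; it is one textbook choice rather than any particular company's experimentation framework, and the function name and numbers are illustrative.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rate between arms A and B."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both arms share one true rate.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Illustrative numbers: 5.0% vs 5.5% click-through with 20,000 users per arm.
z, p = two_proportion_z_test(1000, 20000, 1100, 20000)
print(f"z = {z:.2f}, p = {p:.3f}")
```

With these inputs z is about 2.24 and p is about 0.025, so the lift would pass a conventional 0.05 threshold. A rigorous experimentation culture also covers things this sketch ignores, such as fixing sample sizes in advance and correcting for many simultaneous metrics.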
A substantial portion of a data scientist's effort goes into understanding and improving operations. This includes building a framework that detects anomalous changes in business metrics, both to head off problems and to spot opportunities; defining metrics and strategy for dealing with pornographic content and spam; and maintaining data infrastructure.
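A very simple form of metric anomaly detection is a trailing-window z-score: flag a day whose value sits more than a few standard deviations from the recent mean. The sketch below swaps in this basic technique for illustration only; it is not the framework the paragraph refers to, and `flag_anomalies`, its default window and threshold, and the daily-active-user numbers are all hypothetical.

```python
from statistics import mean, stdev


def flag_anomalies(values, window=7, threshold=3.0):
    """Hypothetical helper: indices whose value deviates from the trailing
    `window` values by more than `threshold` standard deviations."""
    if window < 2:
        raise ValueError("window must be at least 2 to estimate a standard deviation")
    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # Perfectly flat history: treat any change at all as anomalous.
            if values[i] != mu:
                anomalies.append(i)
            continue
        if abs(values[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


# Daily active users with a sudden drop on day 8 (hypothetical data).
dau = [1000, 1020, 990, 1010, 1005, 995, 1015, 1008, 620, 1002]
print(flag_anomalies(dau))  # [8]
```

One weakness of this design is that a flagged outlier stays in the trailing window and inflates the standard deviation for the following days; production systems typically also model weekly seasonality and trend before scoring.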
To do all of the above, a data scientist relies on an array of tools and technologies, such as a standard big data stack of S3, Hive, and Redshift, along with R and RStudio (with packages such as ggplot2 and knitr).