Category Archives for "Analytics Career"

Jan 27

Role of a Data Scientist in a Social Media Company

By ganpati | Analytics Career

Data scientists working at social media companies like Facebook, Twitter, LinkedIn, Google+, Pinterest, Tumblr or Instagram have a few things in common. They define and answer questions that can have a big impact on the product, the business, and people’s lives, especially when those questions or their answers are not yet well understood, well known, or even well formed.

Starting with the User

I’ll start with the users. The strength of a social media site lies in its users. Social network analysis views relationships in terms of nodes (people) and edges (the links or connections between people). A good social media site has algorithms that suggest friends accurately. Good friend suggestion algorithms are extremely valuable because they encourage connections, and the strength of an online social network increases dramatically as the number of edges increases. If you want to try out a problem that data scientists face in real life at social media firms, visit this link on Kaggle.
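To make the nodes-and-edges idea concrete, here is a minimal sketch of one simple way to suggest friends: rank people a user is not yet connected to by how many mutual friends they share. This is only an illustration, not how any particular site does it; the function name `suggest_friends` and the toy network are made up for this example.

```python
from collections import Counter

def suggest_friends(graph, user, top_n=3):
    """Rank people `user` is not connected to by number of mutual friends."""
    friends = graph.get(user, set())
    counts = Counter()
    for friend in friends:
        for candidate in graph.get(friend, set()):
            # Skip the user themselves and people they already know.
            if candidate != user and candidate not in friends:
                counts[candidate] += 1
    # Most mutual friends first; ties broken by name for a stable order.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:top_n]]

# Hypothetical toy network: each person (node) maps to their friends (edges).
graph = {
    "ann": {"bob", "cat"},
    "bob": {"ann", "cat", "dan"},
    "cat": {"ann", "bob", "dan", "eve"},
    "dan": {"bob", "cat"},
    "eve": {"cat"},
}
print(suggest_friends(graph, "ann"))
```

Real systems typically combine many more signals than mutual friends, but the counting step above is a common starting point.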

Another top priority at a social media firm is in-depth analysis of usage patterns, user engagement and opportunities for improving product features. The information can be derived from many user-related variables: where users go, the user’s home feed and their own content; how content impacts the user experience (flow, amount, evolution, interests); correlations with actions like liking, sharing, clicking or revisiting one’s own content; and user demographics like age, gender and country. All this information is analyzed primarily to focus on two things: new user activation and user retention.
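As a rough illustration of what a retention metric can look like, the sketch below computes "day-N retention": the share of new users who were active exactly N days after signing up. The definition, the function name `day_n_retention` and the sample data are assumptions for this example; each company defines retention in its own way.

```python
from datetime import date

def day_n_retention(signups, activity, n=7):
    """Share of signed-up users who were active exactly n days after signup."""
    if not signups:
        return 0.0
    retained = sum(
        1
        for user, signup_day in signups.items()
        if any((day - signup_day).days == n for day in activity.get(user, ()))
    )
    return retained / len(signups)

# Hypothetical data: signup date per user, and the days each user was active.
signups = {
    "u1": date(2016, 1, 1),
    "u2": date(2016, 1, 1),
    "u3": date(2016, 1, 2),
    "u4": date(2016, 1, 3),
}
activity = {
    "u1": [date(2016, 1, 8)],
    "u2": [date(2016, 1, 5)],
    "u3": [date(2016, 1, 9), date(2016, 1, 10)],
}
print(day_n_retention(signups, activity, n=7))
```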

Features: How Users Interact with the Product

[Figure: social media - affinity vs targetability]

To understand evolving trends in usage and users’ demand for new features, data scientists usually maintain a rigorous culture around A/B experiments and tools for in-depth analysis.
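For readers who have not run an A/B experiment, here is a minimal sketch of one common way to compare two variants: a two-proportion z-test on conversion counts. It is an example under simple assumptions (independent users, large samples), not a description of any company's experimentation tooling; the function name and the numbers are invented.

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled conversion rate under the null hypothesis of no difference.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("standard error is zero; rates are all 0 or all 1")
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical experiment: 1000 users per variant, 100 vs 150 conversions.
z, p = two_proportion_z_test(100, 1000, 150, 1000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A small p-value suggests the difference between variants is unlikely to be chance alone, though in practice teams also look at effect size and how long the experiment ran.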

The Real Challenge is Implementation

A substantial portion of a data scientist’s effort is invested in understanding and improving operations. This includes building a framework for detecting anomalous changes in business metrics (to prevent problems and learn about opportunities), porn and spam metrics and strategy, and data infrastructure.
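One simple way to detect anomalous changes in a business metric is to compare each day's value with the mean and standard deviation of the preceding days. The sketch below does that with a trailing window; the window size, threshold, function name `flag_anomalies` and sample numbers are all assumptions for illustration, and production frameworks usually also handle seasonality and trends.

```python
from statistics import mean, stdev

def flag_anomalies(values, window=7, threshold=3.0):
    """Return indices whose value is more than `threshold` standard
    deviations away from the mean of the previous `window` values."""
    flagged = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            continue  # flat history: a z-score is undefined, so skip
        if abs(values[i] - mu) / sigma > threshold:
            flagged.append(i)
    return flagged

# Hypothetical daily signups with one obvious spike at index 8.
daily_signups = [100, 102, 98, 101, 99, 103, 100, 101, 160, 102]
print(flag_anomalies(daily_signups))
```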

Technology and Tools

To do all of the above, a data scientist needs an array of tools and technologies, such as the standard big data stack of S3, Hive and Redshift, along with RStudio (including ggplot and knitr).

Here are a few experiences shared by users on Quora for social media companies like Facebook, LinkedIn and Pinterest.


Jan 16

Leading Data Scientists to Follow in 2016

By ganpati | Analytics Career, Getting Started

Because of the diversity in the work, specializations, publications, experience and fan following of the world’s leading data scientists, it was very difficult for us to settle on the names of leading data scientists to follow in 2016.

We finally decided to do a poll among ourselves (a group of bloggers at Analyticscosm) and publish the initial list. Since this initiative is the first of its kind at Analyticscosm, we’ll keep updating the list based on the comments we receive and on further research we carry out.
I think the list we’ve come up with is fairly representative of various industries and disciplines (from Sociometry ). So here’s our list of leading data scientists and their LinkedIn/Twitter profiles (in no particular order):

  • Andrew Ng
    Chief Scientist of Baidu; Chairman and Co-founder of Coursera; Assoc. Professor (Research) of Stanford University
    Social Media: LinkedIn; Twitter; Quora

  • DJ Patil
    U.S. Chief Data Scientist at White House Office of Science and Technology Policy; Co-coined the term “Data Scientist”
    Social Media: LinkedIn; Twitter; Quora

  • Jeff Hammerbacher
    Founder and Chief Scientist of Cloudera, Assistant Professor at the Icahn School of Medicine at Mount Sinai, and Director at Sage Bionetworks
    Social Media: LinkedIn; Twitter; Quora

  • Brian Wilt
    Director, Head of Data Science and Analytics at Jawbone
    Social Media: LinkedIn; Twitter

  • Hilary Mason
    Member of Board of Directors, Anita Borg Institute for Women in Technology; Founder, Fast Forward Labs; Data Scientist in Residence, Accel Partners
    Social Media: LinkedIn; Twitter; Quora

  • John Akred
    Founder & CTO @ Silicon Valley Data Science
    Social Media: LinkedIn; Twitter; Quora

  • Alex ‘Sandy’ Pentland
    MIT (Media Lab, Sloan Business School, Institute for Data, Systems, and Society), Cogito Corp, Thasos Group
    Co-Founder and Board of Directors, Cogito Corp
    Social Media: LinkedIn; Twitter

  • Sebastian Thrun
    Founder and CEO at Udacity; Research Professor, Stanford University
    Social Media: LinkedIn; Twitter; Quora

  • Jure Leskovec
    Chief Scientist at Pinterest; Assistant Professor at Stanford
    Social Media: LinkedIn; Twitter

  • Kevin Novak
    Head of Data Science Platform at Uber
    Social Media: LinkedIn; Twitter

  • Riley Newman
    Head of Data Science at
    Social Media: LinkedIn; Twitter; Quora

  • Yair Livne
    Director of Product Management at Quora. Yair previously completed a PhD in Economics at Stanford GSB, after working at an Israeli hedge fund. He has a B.Sc. and a master’s in math from the Hebrew University.
    Social Media: LinkedIn; Twitter; Quora

  • Michelangelo D’Agostino
    Data scientist at Civis Analytics. Reformed physicist turned data scientist. Science writer and teacher. Former member of the Obama 2012 data analytics machine
    Social Media: LinkedIn; Twitter; Quora

  • Diane Wu
    Co-founder, Trace Genomics; Previously deep learning data scientist @ MetaMind, Machine Learning @ Palantir Technologies. PhD @ Stanford (’12) majoring in Genetics and specializing in Computational Biology
    Social Media: LinkedIn; Twitter; Quora

  • Sean Gourley
    CTO and cofounder of Quid
    Social Media: LinkedIn; Twitter; Quora

Some Other Leading Data Scientists

  • Jonathan Goldman: LinkedIn; Twitter
  • George Roumeliotis: LinkedIn
  • Jace Kohlmeier: LinkedIn; Twitter
  • John Foreman: LinkedIn; Twitter
  • Usama Fayyad: LinkedIn; Twitter
  • Jack Y. Chen: LinkedIn; Twitter
  • Amy Gershkoff: LinkedIn; Twitter

Jan 16

2 Types of Data Scientists

By ganpati | Analytics Career

Data scientists are people with some mix of coding and statistical skills who also know about industry metrics and data products. I’ve categorized them into two types based on my experience:

I’ll call the first type static data scientists. They work primarily with static data. They are very similar to statisticians (hence the name) but know all the practical aspects of working with large data that fall outside the regular statistics curriculum: data cleaning, wrangling, dealing with very large data sets, visualization, deep knowledge of a particular domain, presenting observations well, and so on.

Static data scientists also code at a decent level, though they aren’t experts. They are much more comfortable with experimental design, forecasting, modeling, statistical inference, and other things typically taught in statistics departments. But unlike hardcore statisticians, their work revolves around the product and how it will evolve, not just analyzing data and computing p-values and confidence intervals.

The second type is dynamic data scientists. Though they share some statistical background with static data scientists, they are also strong coders and may have come from a software engineering background. They are more interested in using data “in production.” They often deal with transactional data and build models based on the dynamic nature of users’ day-to-day interaction with the product. Examples include recommendations for products, people you may know, ads, movies and search results, all of which keep changing dynamically.

This categorization is crude. Many data scientist roles overlap between the two types discussed above.