Category Archives for "Big Data Trends"

Jan 13

On the Way to Unlocking the Value of Big Data

By ganpati | Big Data Trends

The real value of Big Data can’t be realized until global citizens are reassured that their data won’t be misused.

Properly exploited, Big Data should be transformative: increasing efficiency, unlocking new avenues in life-saving research and creating as yet unimaginable opportunities for innovation. But as businesses gather more and more personal information about all of us, that opportunity can’t be taken until concerns about privacy and security have been overcome.

There are risks, as well as opportunities, in Big Data. Personal data is only a small proportion of big data, and there is huge potential in non-personal datasets across various industries. Still, people are largely in the dark: they don’t know how much their data is worth to companies, and they can’t see the negative consequences of a lack of transparency. They do not appreciate that companies are in business to make money, and they have no sense that they own this personal information.

The constant battle between privacy and accuracy

Companies view their personalization systems as infinitely growing repositories: the bigger the repository, the better the quality of insights. However, they can easily cross the thin red line of data privacy. Companies that want to aggregate data from various sources must therefore often comply with data privacy rules, and balancing data insights against data privacy becomes important.

Privacy is a relative term

Laws governing the collection and usage of data are country-specific, and no single law protects a global citizen’s right to privacy. Governments and regulatory agencies have drafted a wide range of data privacy rules, regulations, laws, directives and frameworks in an effort to address the concerns that data use creates. These include the EU Data Protection Directive and the APEC Privacy Framework, among others.

Anonymisation (data masking) before data is re-used is one option that needs to be considered as big data increasingly becomes a part of our lives. Clarity is needed to give big data users the confidence to drive forward an increasingly data-driven economy, and to assure individuals that their personal data will be respected.
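To make the idea of data masking concrete, here is a minimal Python sketch. The field names (customer_id, email), the key and the helper names are all hypothetical, chosen only for illustration. Note that a keyed hash like this is pseudonymisation rather than full anonymisation: whoever holds the key can still link records.

```python
import hashlib
import hmac

# Hypothetical secret key; in practice it would be stored apart from the data.
SECRET_KEY = b"replace-with-a-real-secret"


def pseudonymize(value: str, key: bytes = SECRET_KEY) -> str:
    """Replace an identifier with a keyed hash so records can still be joined."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]  # shortened for readability


def mask_email(email: str) -> str:
    """Keep the first character and the domain, hide the rest of the local part."""
    local, _, domain = email.partition("@")
    return f"{local[:1]}***@{domain}"


def mask_record(record: dict) -> dict:
    """Return a copy with direct identifiers masked; other fields are untouched."""
    masked = dict(record)
    masked["customer_id"] = pseudonymize(record["customer_id"])
    masked["email"] = mask_email(record["email"])
    return masked
```

Calling mask_record on a record leaves non-identifying fields such as a city or a purchase amount available for analysis, while the identifier and email no longer appear in clear text.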

Oct 23

Free eBook on Mining Massive Datasets

By ganpati | Big Data Trends

A new edition of Mining Massive Datasets by Jure Leskovec, Anand Rajaraman and Jeff Ullman is now available. It is used in a number of data mining courses at colleges across the US (and around the globe). Here are just a few of the topics from the book.

Map-Reduce and
Link Analysis
Recommendation Systems
Dimensionality Reduction
Mining Social-Network Graphs
Large-Scale Machine Learning

You can download the latest version of the book as a single big PDF file (511 pages, 3 MB).

Oct 21

Film Dialogue from 2,000 screenplays, Broken Down by Gender and Age

By ganpati | Big Data Trends


The prevailing theme: white men dominate movie roles.

But it’s all rhetoric and no data, which gets us nowhere in terms of having an informed discussion. How many movies are actually about men? What changes by genre, era, or box-office revenue? What circumstances generate more diversity?

In January 2016, researchers reported that men speak more often than women in Disney’s princess films. The claim was validated with double the sample size: 30 Disney films, including Pixar. The results: 22 of 30 Disney films have a male majority of dialogue. Even in films with female leads, such as Mulan, the dialogue swings male. Mushu, her protector dragon, has 50% more words of dialogue than Mulan herself.

This analysis set out to compile real data, and is framed as a census rather than a study. The authors Googled their way to 8,000 screenplays and matched each character’s lines to an actor. From there, they compiled the number of words spoken by male and female characters across roughly 2,000 films, arguably the largest undertaking of script analysis ever.
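The counting step described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the authors’ actual pipeline: the input format (pairs of character name and line text), the gender lookup table and the function names are all assumptions.

```python
from collections import Counter


def words_by_gender(lines, gender_of):
    """Sum the words spoken per gender.

    lines     -- iterable of (character, text) pairs (assumed input format)
    gender_of -- dict mapping character name to a gender label (assumed lookup)
    """
    totals = Counter()
    for character, text in lines:
        gender = gender_of.get(character, "unknown")
        totals[gender] += len(text.split())  # whitespace-separated word count
    return totals


def share(totals, gender):
    """Fraction of all counted words spoken by the given gender."""
    total = sum(totals.values())
    return totals[gender] / total if total else 0.0
```

Characters missing from the lookup are counted under "unknown" rather than dropped, so gaps in the character-to-actor matching stay visible in the totals.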

The analysis also covers aging out of Hollywood (men vs. women), dialogue by cast member and gender, and other interesting aspects of movie dialogue.
