The majority of data stored by businesses is in these relational databases. And in addition, these databases are exceptionally good at storing complicated business data sets as well as allowing for efficient information retrieval. So having a strong understanding of relational databases is essential to being an effective data scientist.
To start with you should have a grasp of filters, joins, aggregations etc. for querying the database. Here is a post that you can follow: Filters, Joins, Aggregations, and All That: A Guide to Querying in SQL
For very large data sets, hadoop, the Hadoop Distributed File System (HDFS), and MapReduce are typically used to store and analyze these large data sets. Apache Hive is an implementation of SQL on top of MapReduce which brings the power of SQL to hadoop. Apache Pig and Scalding are similar competitors.