Big Data: A Problem and Its Solution

“Can you imagine how much data a social media platform like Facebook receives every day?”

Atul kumar jha

Sep 17, 2020

Your answer might be 100 GB, 500 GB, 1,000 GB, 5,000 GB, or something else.

You may be surprised by the answer.

Facebook receives 500+ TB of data per day. Can you imagine that?

Remember, that 500+ TB figure is for a single day. This is a really huge amount of data, and data at this scale is called Big Data.

One more question: as a Facebook user, would you keep visiting the site if it deleted your photos and videos after a few days? I think the answer is no. That is why they store every bit of data, and keep it indefinitely unless some unavoidable circumstance arises.

The question is not how much data Facebook receives. The question is how Facebook manages that much data: how they process it in seconds, where they store all of it permanently, and how much it costs to manage. That is why Big Data is a problem.

If you are thinking that they simply store it on a hard disk, the cost would be very high, and even if they somehow managed the cost, how would they manage the search speed? For example:

Say a laptop has a 1 TB SATA hard disk, and copying 1 GB of data to it takes approximately 1 minute. With that in mind, imagine how long it would take to copy 500 TB. At that rate it would take about 500,000 minutes, which is close to a year, and searching through that much data on one disk would be similarly slow. Isn't it?
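The estimate above can be checked with a few lines of arithmetic. This is only a back-of-the-envelope sketch: the 1 GB per minute rate is the laptop figure from the example, decimal units (1 TB = 1,000 GB) are assumed for simplicity, and the function name `copy_time_days` is made up for illustration.

```python
# Back-of-the-envelope estimate, not a benchmark.
GB_PER_TB = 1000            # assumption: decimal units (1 TB = 1,000 GB)
COPY_RATE_GB_PER_MIN = 1    # from the example: 1 GB copies in about 1 minute


def copy_time_days(size_tb, rate_gb_per_min=COPY_RATE_GB_PER_MIN):
    """Return the number of days needed to copy size_tb terabytes."""
    minutes = size_tb * GB_PER_TB / rate_gb_per_min
    return minutes / (60 * 24)  # 1,440 minutes in a day


print(f"{copy_time_days(500):.0f} days")  # prints "347 days"
```

So one day of incoming data would keep a single laptop-class disk busy for most of a year, which is why a single disk cannot keep up.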

But Facebook doesn't work like this. Whenever we search for a photo or video, it shows up within seconds. How do they do it?

The concept they use is DISTRIBUTED STORAGE.

WHAT IS DISTRIBUTED STORAGE?

Let's take an example to understand this.

Let's assume I have 40 GB of data to copy, and I have 5 laptops: the first has 50 GB of storage and the other 4 have 10 GB each.

My question is: on which laptop would you store this 40 GB of data?

I think your answer is the first one, with 50 GB of storage, isn't it? After all, 40 GB of data doesn't fit in 10 GB of storage, but it does fit in 50 GB.

But the distributed storage concept says: split the 40 GB of data across the 4 laptops with 10 GB of storage each, and connect all the laptops over a network. This is called DISTRIBUTED STORAGE.
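The laptop example can be sketched in a few lines of Python. This is a toy illustration, not how any real system places data: the function `distribute`, the laptop names, and the simple fill-one-node-then-the-next strategy are all assumptions made for the example.

```python
def distribute(data_gb, node_capacities_gb):
    """Place data_gb across nodes in order, never exceeding a node's capacity.

    node_capacities_gb maps a (hypothetical) node name to its free space in GB.
    Returns a dict of node name -> GB stored on that node.
    """
    placement = {}
    remaining = data_gb
    for node, capacity in node_capacities_gb.items():
        if remaining == 0:
            break
        stored = min(capacity, remaining)  # fill this node as far as possible
        placement[node] = stored
        remaining -= stored
    if remaining > 0:
        raise ValueError(f"cluster is {remaining} GB short of space")
    return placement


# The four 10 GB laptops from the example (names are illustrative).
laptops = {"laptop2": 10, "laptop3": 10, "laptop4": 10, "laptop5": 10}
print(distribute(40, laptops))
# {'laptop2': 10, 'laptop3': 10, 'laptop4': 10, 'laptop5': 10}
```

No single 10 GB laptop can hold the 40 GB, but the four of them together can.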

Benefits of Distributed Storage:

1. Storage: many hard disks are connected over a network, which solves the storage problem, because it is nearly impossible to build one hard disk that can absorb 500+ TB of new data every day. So we distribute the data across many small storage units.

2. Speed: suppose we have N hard disks on one hand and 1 hard disk on the other. If searching the single hard disk takes time T, then searching N networked hard disks, each holding about 1/N of the data, runs on all the disks in parallel. Compared with the single disk, the search time drops from T to about T/N, so the search becomes up to N times faster (ignoring network overhead).

3. Cost: because the data is spread over many small storage units instead of one enormous one, the cost of hard disks is reduced.
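The speed benefit in point 2 can be sketched as follows. This is a minimal illustration of the idea, assuming each "hard disk" is just a Python list (a shard) and each thread stands in for one machine searching its own disk; the function names are hypothetical. Python threads will not actually make this toy scan N times faster, so the T/N figure is computed separately as the ideal case.

```python
from concurrent.futures import ThreadPoolExecutor


def search_shard(shard, target):
    """Linear scan of one shard; stands in for one disk searching its own data."""
    return [item for item in shard if item == target]


def parallel_search(shards, target):
    """Search all shards at the same time and merge the matches."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partials = list(pool.map(lambda shard: search_shard(shard, target), shards))
    return [match for partial in partials for match in partial]


def ideal_search_time(t_single, n_disks):
    """Ideal time when the work splits evenly: T / N (ignores network overhead)."""
    return t_single / n_disks


# 40 records split round-robin across 4 shards (one per "disk").
records = [f"photo_{i}" for i in range(40)]
shards = [records[i::4] for i in range(4)]

print(parallel_search(shards, "photo_17"))  # ['photo_17']
print(ideal_search_time(60.0, 4))           # 15.0
```

Each shard holds only a quarter of the records, so each worker scans a quarter of the data, which is where the T/N estimate comes from.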

The setup described above is called a “cluster”, and one technology used to manage this Big Data problem is HADOOP.

In the next article, I will explain the HADOOP ECOSYSTEM.

THANKS FOR READING.