DataSync in AWS

In general, everyone maintains their own data, whether as an organization or as an individual, and that data can come in any format (images, .zip, .doc, media, .pdf, .exe, .deb, .dmg, etc.). In the past, people stored data on paper and later on computers. Storage devices such as external HDDs and SANs then came into existence. Now a new era has arrived: the cloud, where services such as Google Drive, OneDrive, and AWS store users' data. Storing data in these services depends on many factors, such as account registration, storage limits, pay-as-you-go pricing, and the bandwidth available while transferring data. Everyone (organization or individual) who uses these storage services (Google Drive, OneDrive, AWS S3, etc.) expects to store data in no time. The service should be fast, reliable, and secure, and above all, data synchronization is a must.

Many cloud providers have achieved this. AWS, a well-known provider of a vast range of services, recently introduced another service at its re:Invent event: AWS DataSync. The service optimizes both transfer time and network use. AWS says it can synchronize data up to 10 times faster than typical open-source transfer tools. In short, this is migration with AWS DataSync.

Sectors such as Media & Entertainment, Oil & Gas, Life Sciences, and scientific research hold large amounts of data on premises. They need to move that data somewhere secure (traditional storage or the cloud) so that they can ultimately free up on-premises space once the migration is done. This is where AWS DataSync comes in: a fast, reliable, secure, and optimized service. DataSync is not only for migration. It also supports other use cases such as data processing, disaster recovery, and one-off transfers of large datasets.

Advantages:

  1. A single DataSync agent can saturate a 10 Gbps network link.
  2. Network bandwidth can be throttled so that DataSync does not affect other on-premises applications during a transfer.
  3. Security is provided by AWS.
  4. Online migration of active datasets and timely transfers of continuously generated data.
  5. Replication for business continuity.
  6. No charge for the service itself; you pay only for the amount of data (per GB) transferred.
  7. Incremental transfers.
  8. Transfers can run in two directions:
    • On-premises to AWS
    • AWS to on-premises
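A rough back-of-the-envelope estimate puts the 10 Gbps figure in perspective. This is a minimal sketch. It assumes decimal units (1 TB = 10^12 bytes) and an `efficiency` factor of our own choosing for how much of the line rate is actually achieved. The function name `transfer_hours` is hypothetical and not part of any AWS tool.

```python
def transfer_hours(dataset_tb: float, link_gbps: float, efficiency: float = 0.8) -> float:
    """Rough hours needed to move dataset_tb terabytes over a link_gbps link.

    Assumes decimal units and that only `efficiency` (0 < efficiency <= 1)
    of the line rate is achieved; 0.8 is an illustrative default, not an AWS figure.
    """
    if dataset_tb < 0 or link_gbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("invalid dataset size, link speed, or efficiency")
    bits = dataset_tb * 1e12 * 8                     # terabytes -> bits
    bits_per_second = link_gbps * 1e9 * efficiency   # effective line rate
    return bits / bits_per_second / 3600             # seconds -> hours


# 100 TB over a fully saturated 10 Gbps line versus a line throttled to 2 Gbps.
print(round(transfer_hours(100, 10, efficiency=1.0), 1))  # 22.2
print(round(transfer_hours(100, 2, efficiency=1.0), 1))   # 111.1
```

The throttled case shows the trade-off behind item 2. Capping DataSync's bandwidth protects other on-premises applications, but it stretches the transfer window proportionally.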

Limitations:

  1. Limited to 100 tasks per Region.
  2. Maximum of 50 million files per task.
  3. Transferring 20 million files requires 64 GiB of RAM on the agent VM.
  4. Maximum throughput of 10 Gbps per task.
  5. To raise these limits, contact AWS; pricing may differ for increased limits.
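Because of the per-task file limit, a very large file system has to be split across several tasks. The sketch below uses the limits exactly as quoted above; check the current DataSync quotas, because they may have changed. It computes how many tasks a given file count needs and flags when that number would exceed the per-Region task limit. `plan_tasks` is a hypothetical helper.

```python
# Limits as quoted in this article; current AWS DataSync quotas may differ.
MAX_FILES_PER_TASK = 50_000_000
MAX_TASKS_PER_REGION = 100


def plan_tasks(total_files: int, files_per_task: int = MAX_FILES_PER_TASK) -> int:
    """Return the number of tasks needed so that no task exceeds files_per_task files."""
    if total_files < 0 or files_per_task <= 0:
        raise ValueError("total_files must be >= 0 and files_per_task > 0")
    # Integer ceiling division, exact even for very large counts.
    tasks = -(-total_files // files_per_task)
    if tasks > MAX_TASKS_PER_REGION:
        raise ValueError(
            f"{tasks} tasks exceed the {MAX_TASKS_PER_REGION}-task limit per Region"
        )
    return tasks


print(plan_tasks(120_000_000))  # 3
```

In practice, each task would point at a different subdirectory of the source share so that the pieces do not overlap. How to divide the directory tree is left to you.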

Available Regions:

At the time of writing, AWS DataSync is available in the US East (N. Virginia), US East (Ohio), US West (Oregon), US West (N. California), Europe (Ireland), Europe (Frankfurt), Asia Pacific (Seoul), Asia Pacific (Singapore), Asia Pacific (Sydney), and Asia Pacific (Tokyo) Regions.
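When you script DataSync with an AWS SDK, Regions are referred to by their codes rather than their display names. The mapping below covers only the Regions listed above; DataSync has since launched in more Regions. The names `DATASYNC_REGIONS` and `is_listed_region` are illustrative.

```python
# Region display names from the list above, mapped to the codes the AWS SDKs expect.
DATASYNC_REGIONS = {
    "US East (N. Virginia)": "us-east-1",
    "US East (Ohio)": "us-east-2",
    "US West (Oregon)": "us-west-2",
    "US West (N. California)": "us-west-1",
    "Europe (Ireland)": "eu-west-1",
    "Europe (Frankfurt)": "eu-central-1",
    "Asia Pacific (Seoul)": "ap-northeast-2",
    "Asia Pacific (Singapore)": "ap-southeast-1",
    "Asia Pacific (Sydney)": "ap-southeast-2",
    "Asia Pacific (Tokyo)": "ap-northeast-1",
}


def is_listed_region(code: str) -> bool:
    """True if `code` is one of the Region codes in the list above."""
    return code in DATASYNC_REGIONS.values()


print(is_listed_region("eu-central-1"))  # True
```

A code from this mapping is what you would pass as `region_name` when creating an SDK client.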

Functionality:

Implementation:

  • Select the use case:
    • On-premises to AWS
    • AWS to on-premises
  • Create an agent:
    • Download the VMware image from the AWS Console and deploy it on the on-premises VMware ESXi hypervisor.
    • The activation key is obtained by connecting to the agent's on-premises IP address.
    • Once the activation key has been obtained, submit it in the AWS Console to complete the creation of the agent.
  • Create a task, starting with Step 1 of the wizard.
  • Source location: configure the on-premises location type (NFS) and the agent that will perform the transfer.
  • Destination location: choose S3 or EFS, enter a folder name of your choice, and select the IAM role you configured.
  • Tick the boxes for the following options as required:
    • Enable verification
    • Ownership
    • Permissions
    • Timestamps
    • Bandwidth (use available bandwidth, or set a limit in MiB/s)
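The same steps can also be scripted. The sketch below expresses them with the AWS SDK for Python (boto3) DataSync client calls `create_agent`, `create_location_nfs`, `create_location_s3`, and `create_task`. It also maps the check boxes onto the task `Options` structure.

It assumes the agent VM is already deployed and its activation key has been obtained. All host names, paths, names, and ARNs are hypothetical placeholders, and the helper names `task_options` and `create_nfs_to_s3_task` are our own. The client is passed in as a parameter so the sketch does not depend on a particular credential setup.

```python
MIB = 1024 * 1024  # bytes in one MiB


def task_options(verify=True, preserve_ownership=True, preserve_permissions=True,
                 preserve_timestamps=True, bandwidth_mib_s=None):
    """Map the console check boxes onto a DataSync task Options dict."""
    return {
        # Verify the whole destination against the source at the end of the run.
        "VerifyMode": "POINT_IN_TIME_CONSISTENT" if verify else "NONE",
        # Ownership: copy numeric user and group IDs.
        "Uid": "INT_VALUE" if preserve_ownership else "NONE",
        "Gid": "INT_VALUE" if preserve_ownership else "NONE",
        "PosixPermissions": "PRESERVE" if preserve_permissions else "NONE",
        # DataSync pairs Mtime=PRESERVE with Atime=BEST_EFFORT, and NONE with NONE.
        "Mtime": "PRESERVE" if preserve_timestamps else "NONE",
        "Atime": "BEST_EFFORT" if preserve_timestamps else "NONE",
        # -1 means "use available bandwidth"; otherwise a cap in bytes per second.
        "BytesPerSecond": -1 if bandwidth_mib_s is None else int(bandwidth_mib_s * MIB),
    }


def create_nfs_to_s3_task(client, activation_key, nfs_host, nfs_export,
                          bucket_arn, role_arn, bandwidth_mib_s=None):
    """Create agent, source, destination, and task; return the new task ARN."""
    # Register the deployed agent VM using the activation key obtained from it.
    agent_arn = client.create_agent(
        ActivationKey=activation_key, AgentName="onprem-agent"  # hypothetical name
    )["AgentArn"]
    # Source location: an NFS export reached through the agent.
    src_arn = client.create_location_nfs(
        ServerHostname=nfs_host,
        Subdirectory=nfs_export,
        OnPremConfig={"AgentArns": [agent_arn]},
    )["LocationArn"]
    # Destination location: an S3 bucket accessed through an IAM role.
    dst_arn = client.create_location_s3(
        S3BucketArn=bucket_arn,
        Subdirectory="/datasync",  # hypothetical folder name
        S3Config={"BucketAccessRoleArn": role_arn},
    )["LocationArn"]
    # The task ties source and destination together with the chosen options.
    return client.create_task(
        SourceLocationArn=src_arn,
        DestinationLocationArn=dst_arn,
        Name="nfs-to-s3",  # hypothetical task name
        Options=task_options(bandwidth_mib_s=bandwidth_mib_s),
    )["TaskArn"]
```

With real credentials, the client would come from `boto3.client("datasync", region_name="us-east-1")`. Using `INT_VALUE` for ownership is only one of the values the API accepts; choose whatever matches how your NFS server maps users.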

Status:

The console lists tasks with the task ID generated by AWS, the source and destination locations, and the creation timestamp. As soon as a transfer starts, the console displays the task status along with the duration, throughput (MiB/s), files per second, task ID, and start time. All transfers are incremental. Even when nothing has changed on premises, the AWS Console records each run in the DataSync history, showing that the agent checked the on-premises data for changes.
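Starting a run and watching its status can also be done programmatically. This is a minimal sketch. It assumes a boto3 DataSync client, or any object with the same methods.

It calls `start_task_execution`, then polls `describe_task_execution` until the status is `SUCCESS` or `ERROR`. It lists the recorded runs with `list_task_executions`, which is the API counterpart of the console history described above. The helper names and the 30-second polling interval are illustrative choices.

```python
import time

# Statuses after which a task execution no longer changes; others such as
# QUEUED, LAUNCHING, PREPARING, TRANSFERRING, and VERIFYING mean it is still running.
TERMINAL_STATUSES = {"SUCCESS", "ERROR"}


def run_and_wait(client, task_arn, poll_seconds=30, sleep=time.sleep):
    """Start one execution of task_arn and poll it; return the final description."""
    exec_arn = client.start_task_execution(TaskArn=task_arn)["TaskExecutionArn"]
    while True:
        desc = client.describe_task_execution(TaskExecutionArn=exec_arn)
        status = desc["Status"]
        print(f"{status}: {desc.get('FilesTransferred', 0)} files, "
              f"{desc.get('BytesTransferred', 0)} bytes")
        if status in TERMINAL_STATUSES:
            return desc
        sleep(poll_seconds)


def execution_history(client, task_arn):
    """Return (execution ARN, status) pairs recorded for a task, following NextToken."""
    history, token = [], None
    while True:
        kwargs = {"TaskArn": task_arn}
        if token:
            kwargs["NextToken"] = token
        page = client.list_task_executions(**kwargs)
        history += [(e["TaskExecutionArn"], e["Status"]) for e in page["TaskExecutions"]]
        token = page.get("NextToken")
        if not token:
            return history
```

Passing `sleep` in lets the polling loop be exercised without waiting. With a real client, the default `time.sleep` is used.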

Summary:

With DataSync, there is no need for CLI scripting. It automates the manual work, with no extra effort needed to parallelize transfers or manage how many run at a time. Security is handled entirely by AWS.

Over to you:

Now what? Every service for moving large datasets to the cloud has its own strengths. Everyone knows about AWS Snowball Edge, but DataSync now packs a powerful punch because of its timely transfers of continuously generated data. Try AWS DataSync in your own practice.