Clean Layer Case Study

GoDaddy, the domain registration and management giant, is migrating the majority of its infrastructure and data warehouse to AWS. GoDaddy chose AWS for its deep experience in delivering a highly reliable global infrastructure and its track record of technology innovation, both of which support GoDaddy's rapidly expanding business.

The Challenge

GoDaddy is the world's largest web host by market share, with over 62 million registered domains. This global growth has led to an extremely large amount of data being generated across projects and teams. Multiple GoDaddy teams wanted to streamline how they clean data before loading it into Amazon S3. This called for a generic, configuration-based framework that reduces development effort and can be reused across GoDaddy's teams.

Why Amazon Web Services

Running a cleaning process on more than 40 tables with huge amounts of data requires a scalable and cost-effective infrastructure, which is why Virtue Tech recommended that GoDaddy use AWS: the capacity it provides is a good fit for this workload. AWS offers a broad portfolio of purpose-built analytics services optimized for specific analytics use cases, so we do not have to compromise on performance, scale, or cost when using them. According to AWS, Spark on Amazon EMR runs 3x faster than standard Apache Spark 3.0, and petabyte-scale analysis can run at less than half the cost of traditional on-premises solutions.

Running Critical Applications on AWS

GoDaddy provisions an Amazon EMR (Elastic MapReduce) cluster to run a generic, configuration-based PySpark framework, which automates the entire process of loading clean data into AWS Glue tables. The framework lets the user handle incremental as well as historical data, for both partitioned and non-partitioned tables. It supports txt, csv, json and parquet input file formats, and it handles slowly changing dimensions (SCD) where the use case requires it. The framework's column mapping feature lets the user map raw column names from the source to new column names in the AWS Glue tables, so that stakeholders across GoDaddy can run SQL-like queries and analyze the data through AWS Glue and Amazon Athena. Querying is now simpler, and because the queries run on Amazon Athena they also complete much faster: reports that used to take a long time to generate are now ready quickly.
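The behaviour described above can be illustrated with a small, framework-free Python sketch. It is not the GoDaddy framework itself: the `TableConfig` fields, the function names, and the example S3 path are all hypothetical, and plain lists of dictionaries stand in for Spark DataFrames. It only shows how a per-table configuration could drive format validation, the incremental versus historical choice, and column mapping.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Input formats named in the case study.
SUPPORTED_FORMATS = {"txt", "csv", "json", "parquet"}
LOAD_MODES = {"incremental", "historical"}


@dataclass
class TableConfig:
    """Hypothetical per-table settings; every field name is an assumption."""
    source_path: str
    file_format: str
    target_table: str
    column_mapping: Dict[str, str] = field(default_factory=dict)
    load_mode: str = "incremental"
    partition_column: Optional[str] = None  # None means a non-partitioned table

    def __post_init__(self) -> None:
        if self.file_format not in SUPPORTED_FORMATS:
            raise ValueError(f"unsupported file format: {self.file_format}")
        if self.load_mode not in LOAD_MODES:
            raise ValueError(f"unknown load mode: {self.load_mode}")


def select_rows(rows: List[dict], config: TableConfig,
                last_loaded: Optional[str] = None) -> List[dict]:
    """Historical loads keep every row; incremental loads keep only rows
    whose partition value is newer than the last loaded one."""
    if (config.load_mode == "historical"
            or config.partition_column is None
            or last_loaded is None):
        return list(rows)
    return [r for r in rows if r[config.partition_column] > last_loaded]


def rename_columns(rows: List[dict], mapping: Dict[str, str]) -> List[dict]:
    """Map raw source column names to clean names; unmapped names pass through."""
    return [{mapping.get(name, name): value for name, value in row.items()}
            for row in rows]


def clean(rows: List[dict], config: TableConfig,
          last_loaded: Optional[str] = None) -> List[dict]:
    """Select the rows to load, then apply the column mapping."""
    return rename_columns(select_rows(rows, config, last_loaded),
                          config.column_mapping)


# Example: an incremental load that renames two raw columns.
config = TableConfig(
    source_path="s3://example-bucket/raw/orders/",  # placeholder path
    file_format="csv",
    target_table="clean.orders",
    column_mapping={"ord_id": "order_id", "dt": "order_date"},
    partition_column="dt",
)
raw = [{"ord_id": 1, "dt": "2021-01-01"}, {"ord_id": 2, "dt": "2021-01-02"}]
print(clean(raw, config, last_loaded="2021-01-01"))
# prints [{'order_id': 2, 'order_date': '2021-01-02'}]
```

In an actual PySpark job, the renaming step would more likely be expressed with `DataFrame.withColumnRenamed` or a `select` with aliases, and the incremental filter with a DataFrame `filter` on the partition column.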

GoDaddy Clean Layer System Configuration Diagram

The Benefits

GoDaddy’s decision to move to a new AWS-based architecture to streamline the loading of clean data across multiple teams helped the company save both time and money. With the on-premises solution, developing the framework for each team used to take more than two weeks; after migrating to AWS and streamlining the process, onboarding a team takes only a day or two, because only a system configuration file needs to be created. Running the framework on Amazon EMR not only made the process 3x faster but also cut total costs by 50-80%.
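To make the "only a configuration file" point concrete, here is a minimal sketch of what onboarding a new team might look like. The JSON layout, the key names, and the `load_team_config` helper are assumptions made for illustration; the case study does not publish the real configuration schema.

```python
import json

# Keys this sketch assumes every table entry must provide (hypothetical).
REQUIRED_KEYS = {"source_path", "file_format", "target_table"}


def load_team_config(text: str) -> list:
    """Parse a team's JSON config and return its table entries,
    failing fast if a table entry is incomplete."""
    doc = json.loads(text)
    tables = doc.get("tables", [])
    if not tables:
        raise ValueError("config must list at least one table")
    for index, table in enumerate(tables):
        missing = REQUIRED_KEYS - set(table)
        if missing:
            raise ValueError(
                f"table #{index} is missing keys: {sorted(missing)}")
    return tables


# A new team would only need to write a document like this one.
EXAMPLE_CONFIG = """
{
  "team": "example-team",
  "tables": [
    {
      "source_path": "s3://example-bucket/raw/orders/",
      "file_format": "parquet",
      "target_table": "clean.orders"
    }
  ]
}
"""

tables = load_team_config(EXAMPLE_CONFIG)
print([t["target_table"] for t in tables])  # prints ['clean.orders']
```

Validating the file up front keeps configuration mistakes from surfacing only after a cluster has been provisioned, which matters when the configuration is the only artifact each team writes.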

Share your thoughts with us at contact.us@virtuetechinc.com on how these innovations will help you.

2023 © Copyrights VirtueTech Inc | Privacy Policy | Disclaimer