Clean Layer Case Study


Category: AWS Development

Clean Layer Case Study

GoDaddy, the domain registration and management giant, is migrating the majority of its infrastructure and data warehouse to AWS. GoDaddy chose AWS for its deep experience in delivering highly reliable global infrastructure and its strong track record of technology innovation, both of which support GoDaddy's rapidly expanding business.

The Challenge

GoDaddy is the world’s largest web host by market share, with over 62 million registered domains. This global growth has generated extremely large amounts of data across projects and teams. Multiple GoDaddy teams wanted to streamline how they clean data before loading it into Amazon S3, which called for a generic, configuration-based framework that every team could reuse to reduce development effort.

Why Amazon Web Services

Running a cleaning process on more than 40 tables with huge volumes of data requires scalable, cost-effective infrastructure, so Virtue Tech recommended that GoDaddy use AWS, whose capacity was a good fit for the workload. AWS offers a broad and deep portfolio of purpose-built analytics services optimized for specific use cases, so the team did not have to compromise on performance, scale, or cost. According to AWS, Spark on Amazon EMR runs 3x faster than standard Apache Spark 3.0, and petabyte-scale analysis can run at less than half the cost of traditional on-premises solutions.

Running Critical Applications on AWS

GoDaddy provisions an Amazon EMR (Elastic MapReduce) cluster to run a generic, configuration-based PySpark framework that automates the entire process of loading the cleaned data into Amazon S3 and registering it as tables in AWS Glue. The framework lets users handle both incremental and historical data for partitioned and non-partitioned tables. It supports txt, csv, json, and parquet input files, and it handles slowly changing dimensions (SCD) depending on the use case. Its column-mapping feature maps raw source column names to new column names in the AWS Glue tables, so stakeholders across GoDaddy can query and analyze the data with SQL in Amazon Athena. Querying is now simpler, and because the queries run on Amazon Athena they also complete much faster, so reports that used to take a long time to generate are now produced quickly.
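The case study does not publish the framework's configuration format. A minimal sketch, assuming a hypothetical per-table config (the keys `input_format`, `load_type`, `watermark_column`, and `column_mapping` are illustrative, not GoDaddy's actual schema), shows how one configuration can drive parsing, column mapping, and incremental versus historical row selection. It uses plain Python on small in-memory rows rather than PySpark on EMR, and it omits parquet because reading it needs a third-party library (in PySpark, `spark.read.parquet`).

```python
import csv
import io
import json

# Hypothetical per-table configuration; the real framework's config schema is not public.
CONFIG = {
    "table": "domains",
    "input_format": "csv",            # this sketch handles txt, csv, and json
    "load_type": "incremental",       # or "historical"
    "watermark_column": "registered_on",
    "column_mapping": {"dom_nm": "domain_name", "reg_dt": "registered_on"},
}


def parse_records(raw: str, input_format: str) -> list:
    """Parse raw file contents into row dicts according to the configured format."""
    if input_format == "csv":
        return list(csv.DictReader(io.StringIO(raw)))
    if input_format == "txt":
        # Assumption: txt files are pipe-delimited with a header row.
        return list(csv.DictReader(io.StringIO(raw), delimiter="|"))
    if input_format == "json":
        # Assumption: newline-delimited JSON, one object per line.
        return [json.loads(line) for line in raw.splitlines() if line.strip()]
    raise ValueError(f"unsupported input format: {input_format}")


def clean_records(rows: list, column_mapping: dict) -> list:
    """Rename raw columns via the mapping and trim string values (blank becomes None)."""
    cleaned = []
    for row in rows:
        out = {}
        for raw_name, value in row.items():
            if isinstance(value, str):
                value = value.strip() or None
            out[column_mapping.get(raw_name, raw_name)] = value
        cleaned.append(out)
    return cleaned


def select_rows(rows: list, config: dict, last_watermark: str) -> list:
    """Historical loads keep every row; incremental loads keep rows newer than the watermark."""
    if config["load_type"] == "historical":
        return rows
    col = config["watermark_column"]
    # ISO-8601 date strings compare correctly as plain strings.
    return [r for r in rows if r.get(col) is not None and r[col] > last_watermark]


raw = "dom_nm,reg_dt\n example.com ,2021-03-01\nsample.org,2020-12-15\n"
rows = clean_records(parse_records(raw, CONFIG["input_format"]), CONFIG["column_mapping"])
print(select_rows(rows, CONFIG, last_watermark="2021-01-01"))
# [{'domain_name': 'example.com', 'registered_on': '2021-03-01'}]
```

Because every table-specific detail lives in the config dictionary, onboarding a new table means writing a new config rather than new code, which is the property the case study credits for its time savings.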

GoDaddy Clean Layer System Configuration Diagram
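The case study says the framework handles slowly changing dimensions "depending upon the use case" without naming an SCD type. As an illustration only, here is a minimal plain-Python sketch of Type 2 (history-preserving) handling, assuming each dimension row carries a business key, a tuple of tracked attributes, validity dates, and a current-row flag; these field names are hypothetical. In a PySpark framework the same logic would typically be expressed as DataFrame joins.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class DimRow:
    key: str                   # business key (hypothetical, e.g. a domain name)
    attrs: Tuple[str, ...]     # tracked attribute values
    valid_from: date
    valid_to: Optional[date]   # None while the row is the current version
    is_current: bool


def apply_scd2(dimension: List[DimRow], incoming: Dict[str, Tuple[str, ...]],
               load_date: date) -> List[DimRow]:
    """Apply an SCD Type 2 update: expire changed rows and append new current versions."""
    result: List[DimRow] = []
    current: Dict[str, DimRow] = {}
    for row in dimension:
        if row.is_current:
            current[row.key] = row
        else:
            result.append(row)  # historical versions are never modified
    for key, row in current.items():
        new_attrs = incoming.get(key)
        if new_attrs is None or new_attrs == row.attrs:
            result.append(row)  # unchanged, or not present in this load
        else:
            result.append(replace(row, valid_to=load_date, is_current=False))  # expire old version
            result.append(DimRow(key, new_attrs, load_date, None, True))      # open new version
    for key, new_attrs in incoming.items():
        if key not in current:
            result.append(DimRow(key, new_attrs, load_date, None, True))      # brand-new key
    return result


dim = [DimRow("example.com", ("basic",), date(2021, 1, 1), None, True)]
dim = apply_scd2(dim, {"example.com": ("premium",), "sample.org": ("basic",)}, date(2021, 6, 1))
for row in dim:
    print(row)
```

The old "basic" row is kept with `valid_to` set to the load date, so point-in-time queries can still see what the attribute was before the change.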

The Benefits

GoDaddy’s move to a new AWS-based architecture for loading clean data across multiple teams saved the company time and money. With the on-premises approach, developing the framework for each team took more than two weeks; on AWS, with the streamlined process, the same work takes a day or two, because only a configuration file needs to be created. Running the framework on Amazon EMR made the process 3x faster and cut total costs by 50-80%.

JIRA Case Study


The Challenge

Jira Software is a project management tool that supports any agile methodology, with agile boards, backlogs, roadmaps, and reports. Jira data accumulates over time as teams log and track their project work and issues in Jira Software. Python API calls pull this data from Jira into Amazon S3 as nested JSON files. Analyzing this semi-structured data to extract business insights was a major challenge for GoDaddy's analysts.
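Nested JSON like this has to be flattened before it fits a table. A minimal sketch of that step in plain Python follows; the issue shape (`key`, `fields`, `status.name`) mirrors the Jira REST API, but the sample values are invented, and the `_` separator and JSON-encoding of arrays are illustrative choices rather than the pipeline's documented behavior. In the described pipeline this work happens in PySpark on Amazon EMR, where nested struct fields can be selected directly and arrays expanded with `explode`.

```python
import json


def flatten(record: dict, parent_key: str = "", sep: str = "_") -> dict:
    """Flatten nested dicts into one level; lists are serialized to JSON strings."""
    flat = {}
    for key, value in record.items():
        name = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten(value, name, sep))   # recurse into nested objects
        elif isinstance(value, list):
            flat[name] = json.dumps(value)            # keep arrays as a single column
        else:
            flat[name] = value
    return flat


# A trimmed-down issue in the shape returned by the Jira REST API (values are invented).
issue = {
    "key": "PROJ-1",
    "fields": {
        "summary": "Fix login redirect",
        "status": {"name": "Done"},
        "assignee": None,
        "labels": ["auth", "web"],
    },
}
print(flatten(issue))
# {'key': 'PROJ-1', 'fields_summary': 'Fix login redirect', 'fields_status_name': 'Done', 'fields_assignee': None, 'fields_labels': '["auth", "web"]'}
```

Underscores are used instead of dots because dotted column names need quoting in most SQL engines, including Athena and Redshift.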

Why Amazon Web Services

Amazon Simple Storage Service (Amazon S3) provides secure, durable, and highly scalable storage for structured and semi-structured data and is the preferred storage service for a data lake. Amazon Redshift (through Redshift Spectrum) can also query JSON files in S3 efficiently without loading the data into native Redshift tables: external tables are created by defining the structure of the files and registering them as tables in the AWS Glue Data Catalog. The advantage of this approach is that it is simple and uses only native AWS services, which are closely integrated.

Running Critical Applications on AWS

Virtue Tech implemented an extract-load-transform (ELT) approach to deliver quick results. A Python data extraction framework calls the Jira API and loads the data into Amazon S3 as JSON files without performing any transformations; this raw extraction runs daily, scheduled through Apache Airflow. A PySpark framework running on Amazon EMR then cleans the data, converts it into a structured, queryable format, and stores it as Parquet. Once the data is cleaned and transformed, it is copied into Amazon Redshift for easy reporting and analytics. Whether a run succeeds or fails, Amazon Simple Email Service (Amazon SES) sends an email to the respective teams.
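The success/failure notification can be sketched as a wrapper that runs the pipeline stages in order and reports the outcome. This is an illustration under assumptions: the step names and the `make_ses_notifier` helper are hypothetical, while `send_email` is the real boto3 SES client call. The demo injects a stand-in notifier so it runs without AWS credentials. In Apache Airflow, task callbacks such as `on_failure_callback` could play the same role.

```python
from typing import Callable, Iterable, List, Tuple

Notifier = Callable[[str, str], None]


def run_pipeline(steps: Iterable[Tuple[str, Callable[[], None]]], notify: Notifier) -> bool:
    """Run named steps in order and report success or the first failure via notify(subject, body)."""
    completed: List[str] = []
    for name, step in steps:
        try:
            step()
        except Exception as exc:
            notify(f"Jira pipeline FAILED at step '{name}'", f"Completed: {completed}; error: {exc!r}")
            return False
        completed.append(name)
    notify("Jira pipeline succeeded", f"Completed: {completed}")
    return True


def make_ses_notifier(sender: str, recipients: List[str], region: str = "us-east-1") -> Notifier:
    """Build a notifier that sends plain-text email through Amazon SES (requires boto3)."""
    import boto3  # imported lazily so the rest of the sketch runs without boto3

    ses = boto3.client("ses", region_name=region)

    def notify(subject: str, body: str) -> None:
        ses.send_email(
            Source=sender,
            Destination={"ToAddresses": recipients},
            Message={"Subject": {"Data": subject}, "Body": {"Text": {"Data": body}}},
        )

    return notify


def failing_step() -> None:
    raise RuntimeError("simulated EMR failure")


# Local demo with a stand-in notifier; the step names are hypothetical.
sent: List[str] = []
ok = run_pipeline(
    [("extract_jira", lambda: None), ("clean_on_emr", failing_step), ("copy_to_redshift", lambda: None)],
    lambda subject, body: sent.append(subject),
)
print(ok, sent)  # False ["Jira pipeline FAILED at step 'clean_on_emr'"]
```

Passing the notifier in as a function keeps the pipeline logic testable; in production one would pass `make_ses_notifier(...)` with verified SES sender and recipient addresses.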

GoDaddy Jira Generic System Configuration Diagram

The Benefits

Semi-structured JSON data from Jira is now loaded into Amazon Redshift for easy reporting. Using Amazon Redshift and Amazon EMR, GoDaddy has greatly improved internal teams' ability to access and analyze data quickly, and the platform scales easily as the business grows. On AWS, scaling is more effective and query times have dropped from hours to seconds, giving all of GoDaddy’s business teams richer, faster analytics.

GoPay Case Study

GoDaddy is a leading cloud platform dedicated to small, independent ventures, helping take millions of ideas into the digital space and providing an all-in-one place to build a digital presence. The company’s cloud platform helps more than 20 million customers and entrepreneurs make their mark on the digital landscape through hosting, website building, and e-commerce support.

The Challenge

As e-commerce evolves, so do the needs of new entrepreneurs. To compete with the largest stores, they too need a payment gateway that supports convenient online subscription payments. For this, GoDaddy acquired GoPay, which matched the needs of its intricate web store infrastructure and could be offered to its customers. The challenge was to merge the newly acquired company into an application for GoDaddy’s customers and to re-engineer the existing infrastructure for actionable insights and strategic development.

Key Challenges:  

  • The data was not ready for Business Intelligence & Analytics.
  • The client's and the company's information systems were incompatible, forming data silos.
  • The client required adoption of standard authorization practices.
  • GoPay was hosted on a PCI-compliant AWS account.

Why AWS?

Both GoDaddy and GoPay had already been using AWS for parts of their infrastructure, as it was the best option for those services. Building on that, Virtue Tech suggested that GoDaddy leverage the large volume of transactional data generated by GoPay and build a bespoke scripting tool to extract and migrate that data into a newly created data lake. The data lake was implemented on Amazon Simple Storage Service (Amazon S3), whose virtually unlimited capacity can hold the terabytes of transactional data produced daily. Amazon S3 provides secure, durable, and scalable storage for large data collections and integrates with Amazon EMR, a managed big data platform that was used with PySpark and Amazon Athena for data ingestion and processing.
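The case study does not describe how the GoPay data lake is laid out in Amazon S3. One common convention for S3 data queried by Amazon Athena is Hive-style `key=value` partitioning by day, which lets queries that filter on the partition column scan less data and lets Athena discover partitions with `MSCK REPAIR TABLE`. A minimal sketch, assuming a hypothetical `datalake/gopay` prefix and a `created_at` ISO timestamp field on each transaction:

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, List


def group_by_partition(records: List[dict], table: str, ts_field: str = "created_at",
                       prefix: str = "datalake/gopay") -> Dict[str, List[dict]]:
    """Group records under Hive-style dt=YYYY-MM-DD prefixes (prefix and field names are hypothetical)."""
    partitions: Dict[str, List[dict]] = defaultdict(list)
    for rec in records:
        day = datetime.fromisoformat(rec[ts_field]).date().isoformat()
        partitions[f"{prefix}/{table}/dt={day}/"].append(rec)
    return dict(partitions)


txns = [
    {"id": 1, "created_at": "2021-05-01T10:00:00"},
    {"id": 2, "created_at": "2021-05-01T23:59:00"},
    {"id": 3, "created_at": "2021-05-02T00:01:00"},
]
for key, rows in group_by_partition(txns, "transactions").items():
    print(key, [r["id"] for r in rows])
# datalake/gopay/transactions/dt=2021-05-01/ [1, 2]
# datalake/gopay/transactions/dt=2021-05-02/ [3]
```

Each group would then be written as one or more files under its prefix; a daily load only adds a new `dt=` prefix and never rewrites earlier days.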

Benefits of AWS

  • Migrated and updated 120 tables in just two months.
  • Standard authorization practices were introduced, reinforcing enterprise-grade security throughout the GoDaddy infrastructure.
  • Increased data availability enabled rapid testing of new tools and technologies.
  • Removing data silos eliminated redundant data, significantly reducing data center costs.
  • Reduced costs for infrastructure maintenance by 50%, while also cutting licensing costs by integrating open-source tools.


We are a team of highly skilled professionals with 20+ years of experience, who are in lockstep with the Industry 4.0 journey and its evolution.
Email : contact.us@virtuetechinc.com

© 2020 VirtueTech