GoDaddy, the domain registration and management giant, is migrating the majority of its infrastructure and data warehouse to AWS. GoDaddy chose AWS for its deep experience in delivering highly reliable global infrastructure and its unmatched track record of technology innovation, both of which support GoDaddy's rapidly expanding business.
Jira Software is a project management tool that supports any agile methodology, with features such as agile boards, backlogs, roadmaps, and reports. Jira data accumulates over time as various teams log and track their project work and issues in Jira Software. Python API calls pull this data from Jira into Amazon S3 as nested JSON files. Analyzing this semi-structured data to extract business insights was a huge challenge for GoDaddy analysts.
Why Amazon Web Services
Amazon Simple Storage Service (Amazon S3) provides secure, durable, and highly scalable storage for structured and semi-structured data and is the preferred storage service for the data lake. Amazon Redshift can also efficiently query and retrieve data from JSON files in Amazon S3 without having to load the data into Amazon Redshift native tables. We can create Amazon Redshift external tables by defining the structure of the files and registering them as tables in the AWS Glue Data Catalog. The advantage of this approach is that it is very simple and uses only native AWS services, which are all closely integrated.
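To make the external table idea concrete, here is a minimal sketch in Python that builds the kind of CREATE EXTERNAL TABLE statement Amazon Redshift accepts for JSON files in Amazon S3. The schema name, table name, column list, and bucket path are all hypothetical and do not come from the GoDaddy project. The sketch also assumes an external schema backed by the AWS Glue Data Catalog already exists and that the inputs are trusted, since the values are interpolated directly into the SQL text.

```python
def external_table_ddl(schema, table, columns, s3_location):
    """Build a CREATE EXTERNAL TABLE statement for JSON files in S3.

    `columns` is a list of (name, type) pairs. All inputs are assumed
    to be trusted; nothing is escaped or validated here.
    """
    column_sql = ",\n  ".join(f"{name} {dtype}" for name, dtype in columns)
    return (
        f"CREATE EXTERNAL TABLE {schema}.{table} (\n"
        f"  {column_sql}\n"
        ")\n"
        "ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'\n"
        f"LOCATION '{s3_location}';"
    )


# Hypothetical names, for illustration only.
ddl = external_table_ddl(
    "jira_ext",
    "issues",
    [("id", "varchar(32)"), ("summary", "varchar(1024)")],
    "s3://example-bucket/jira/raw/",
)
print(ddl)
```

The resulting statement would be run against Amazon Redshift by whatever SQL client the team already uses. The function only produces text and does not connect to any service.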
Running Critical Applications on AWS
Virtue Tech implemented an Extract-Load-Transform (ELT) approach to deliver quick results. A Python data extraction framework calls the Jira API to extract the data and loads it into Amazon S3 as JSON files without performing any transformations. This raw extraction process is scheduled to run daily through Apache Airflow. A PySpark framework running on Amazon EMR (Amazon Elastic MapReduce) then cleans the data, converts it into a readable format, and stores it in Parquet format. Once the data is cleaned and transformed, we copy it into Amazon Redshift for easy reporting and analytics. Whether a run succeeds or fails, an email is sent to the respective teams using Amazon Simple Email Service (Amazon SES).
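The cleaning step is easier to picture with a small example. The sketch below is a plain-Python stand-in, not the PySpark job that runs on Amazon EMR. It shows one common way to turn a nested JSON record into a flat, column-like row. The sample issue and its field names are hypothetical and only loosely shaped like a Jira API response.

```python
import json


def flatten(record, parent_key="", sep="_"):
    """Flatten nested dicts into a single-level dict with joined keys."""
    flat = {}
    for key, value in record.items():
        name = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested objects, prefixing keys with the parent name.
            flat.update(flatten(value, name, sep))
        elif isinstance(value, list):
            # Keep lists as JSON strings so each record stays one flat row.
            flat[name] = json.dumps(value)
        else:
            flat[name] = value
    return flat


# Hypothetical issue, for illustration only.
raw_issue = {
    "key": "PROJ-1",
    "fields": {
        "summary": "Fix login bug",
        "status": {"name": "In Progress"},
        "labels": ["backend", "urgent"],
    },
}

row = flatten(raw_issue)
print(row)
```

Each flattened key, such as fields_status_name, maps naturally to a column in a Parquet file or a Redshift table. How lists are handled is a design choice: serializing them keeps one row per issue, while a real job might instead explode them into separate rows.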
GoDaddy Jira Generic System Configuration Diagram
Semi-structured JSON data from Jira was loaded into Amazon Redshift for easy reporting. Using Amazon Redshift and Amazon EMR, GoDaddy has greatly improved the ability of internal teams to quickly access and analyze data, allowing for easy scaling as the business grows. With AWS, not only is scaling much more effective, but query performance has also improved from hours to mere seconds. This results in richer, real-time analytics that benefit all of GoDaddy’s business teams.