Subhead: Migrating to Amazon Redshift let the Registry leapfrog the constraints of its primitive data warehouse infrastructure
Problem: A popular tech company acquired the Registry from Neustar in 2020. Even before the acquisition, the popular tech company had an initiative to move its infrastructure to the AWS cloud. Virtue Tech was brought in partway through the project to migrate the Registry's on-premises infrastructure to AWS. There had been several false starts, and the popular tech company needed experienced data architects to ensure it would meet the stringent SLA dates for completing the migration to AWS. The Virtue Tech team developed an architecture and worked with the popular tech company's management team to validate the proofs of concept. The project was divided into three phases and delivered in time to meet the final TSA deadline.
The Registry produced many compliance reports for partners and registrars looking for information. The data comes from all over the world, including China, Taiwan, Australia, Europe and, of course, the U.S. The raw data arrives through various source pipelines, but Neustar's infrastructure was very primitive: it had been built a long time ago, and it was not in the cloud. A few people maintained the infrastructure by hand. Processes, upgrades and any issues that came up were all managed reactively and manually. There were no notifications for errors or performance issues on the background jobs that generate these reports and keep the data updated.
- Several false starts before the Virtue Tech team joined the project meant the team had only a few months to design a data architecture and complete the migration to stay in compliance with the SLA migration completion date.
- Report data came from multiple countries and clients, and after the migration the reports still needed to be generated according to contractual agreements with customers.
Solution:
The data architecture that Virtue Tech developed for the project is depicted in the diagram below. The data sources are shown on the left. Some of the source data comes from a group called Narwhal, based in Melbourne, Australia. They are also now part of . The data is kept secure with encryption and decryption logic: the solution uses GPG keys that are maintained and set to expire after a predetermined period, following security best practices. The raw data lands in 50 to 60 tables. From there it moves through the process with the help of reference tables, is transformed into intermediate tables, and may be loaded into a final set of tables as well. Overall, between 150 and 250 tables are in use every day.
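The case study does not describe how key expiry is monitored. As a minimal sketch only, assuming a hypothetical 365-day key lifetime and a 30-day warning window (neither figure comes from the project), a scheduled check could classify each GPG key by its creation date so that rotation happens before a pipeline fails on an expired key:

```python
from datetime import datetime, timedelta, timezone

# Hypothetical policy values: the source only says keys expire after a
# "predetermined time". 365 days and a 30-day warning are assumptions.
KEY_LIFETIME = timedelta(days=365)
WARNING_WINDOW = timedelta(days=30)


def key_status(created: datetime, now: datetime,
               lifetime: timedelta = KEY_LIFETIME,
               warning: timedelta = WARNING_WINDOW) -> str:
    """Classify a key as 'valid', 'expiring', or 'expired' relative to `now`."""
    expires = created + lifetime
    if now >= expires:
        return "expired"
    # Inside the warning window: the key still works but should be rotated.
    if expires - now <= warning:
        return "expiring"
    return "valid"


if __name__ == "__main__":
    now = datetime(2024, 6, 1, tzinfo=timezone.utc)
    created = datetime(2023, 7, 1, tzinfo=timezone.utc)
    print(key_status(created, now))
```

In practice the creation or expiry dates would be read from the keyring rather than hard-coded; the point of the sketch is only the rotation rule.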
The data scheduler is Amazon Managed Workflows for Apache Airflow (MWAA). The scheduler makes it possible to trace each task within a daily, weekly, or monthly load. AWS also provides a service that displays job timings on a dashboard management can view to see how things are working; for example, you can check for errors and performance issues. Docker containers hold the processing logic, which can be updated every couple of months. Data can be made available for visualization in Tableau or other visualization tools.
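The case study mentions daily, weekly, and monthly loads but not when each one runs. The sketch below assumes the timing of Airflow's `@daily`, `@weekly` (midnight Sunday) and `@monthly` (first of the month) presets and simply makes the rule explicit; in MWAA each cadence would more typically be its own DAG with one of those presets as its schedule. The function name is hypothetical.

```python
from datetime import date


# Hypothetical helper: which load cadences fall on a given run date,
# mirroring Airflow's @daily / @weekly (Sunday) / @monthly (1st) presets.
def loads_due(run_date: date) -> list[str]:
    """Return the load cadences that should run on `run_date`."""
    due = ["daily"]
    if run_date.weekday() == 6:  # Monday is 0, so Sunday is 6
        due.append("weekly")
    if run_date.day == 1:
        due.append("monthly")
    return due


if __name__ == "__main__":
    for day in (date(2024, 9, 1), date(2024, 9, 2), date(2024, 10, 1)):
        print(day.isoformat(), loads_due(day))
```

Keeping the cadences separate means a slow monthly load can be traced and retried on its own without holding up the daily reports.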
Keys to success: The experience and skills of the Virtue Tech team allowed them to quickly design and prove out the architecture. The team devised a three-phased approach that enabled them to complete the project within the final deadline.
Results:
- The popular tech company team met the overall SLA deadline for migrating to the AWS cloud.
- Saved over $125,000 per year in labor costs for maintaining on-premises infrastructure.
- Cut report processing time by more than two-thirds.
- Enhanced security and flexibility with the AWS cloud.
- Added Slack and email alerting, a capability that didn't exist before the migration.
- No capital costs for the migration.
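The source does not show how the Slack alerts are wired up. One plausible approach, sketched here under assumptions, is an Airflow task `on_failure_callback` that posts to a Slack incoming webhook. The webhook URL, the message layout, and the function names below are hypothetical; only the webhook's JSON `{"text": ...}` body and the Airflow context keys used (`task_instance`, `ds`, `exception`) are standard.

```python
import json
import urllib.request

# Hypothetical placeholder: a real URL comes from a Slack app's
# Incoming Webhooks settings and belongs in a secrets store, not in code.
SLACK_WEBHOOK_URL = "https://hooks.slack.com/services/EXAMPLE/EXAMPLE/EXAMPLE"


def build_failure_alert(dag_id: str, task_id: str, run_date: str, error: str) -> dict:
    """Build a Slack incoming-webhook payload describing a failed task."""
    return {
        "text": (
            ":red_circle: Task failed\n"
            f"DAG: {dag_id}\n"
            f"Task: {task_id}\n"
            f"Run date: {run_date}\n"
            f"Error: {error}"
        )
    }


def post_to_slack(payload: dict, url: str = SLACK_WEBHOOK_URL) -> int:
    """POST the payload as JSON and return the HTTP status code."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status


def on_failure(context: dict) -> None:
    """Airflow failure callback: summarize the failed task and alert Slack."""
    ti = context["task_instance"]
    payload = build_failure_alert(ti.dag_id, ti.task_id, context["ds"],
                                  str(context.get("exception")))
    post_to_slack(payload)
```

Email alerts could be handled alongside this through Airflow's built-in `email` and `email_on_failure` task arguments.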
Outcome and moving forward: The popular tech company team met the overall SLA deadline for migrating to the AWS cloud. At a minimum, the project has saved over $125,000 per year in labor costs for maintaining the on-premises infrastructure. The performance improvements were compelling too: reports that took seven hours to run now finish in under two hours, cutting report processing time by more than two-thirds. The popular tech company is also benefiting from the enhanced security and flexibility of the AWS cloud. The data architecture designed by Virtue Tech added alerting that didn't exist before the migration, and thanks to integrations with Slack and email, errors and performance metrics are communicated instantly. Plus, there were no capital costs for migrating to AWS. Because of the project's success, the popular tech company now plans to add more data sources, including Teradata, MMX, IN and others. That will bring even more internal and external customers who want reports and insights from this framework.