Plug and play forecasting analytics using data compression and sql reduction
The end goal is to increase derisking and predictability for the end clients at lower costs.
Methodology for achieving the required ends for the end client:
- Turbodata adopted the Inmon methodology for datawarehouse development so that multiple data sources could be added onto the same datawarehouse. That is the change from one data source to another was done with ease. More details on the attached web page link: http://mnnbi.com/mnnbi_tallydataconsolidation.html
-
The benefits of the normalization of data were as follows:
- The incremental data load took minimum time and had minimum impact on the source system. The ETL team was able to commit the incremental data load to a maximum of 2GB RAM from multiple source systems. The source systems did not hang with the incremental data load working.
- Error handling was done with ease at staging layer.
- Massive data compression took place due to reduced data redundancies.
- The business logic was coded between staging and the ODS layers thereby reducing the length of the final sql code.
The attached video shows a more detailed description of the benefits listed above:
https://www.youtube.com/watch?v=k9Oe9EDRfTM&feature=youtu.be
The joins were reduced in the data mart layer(over which a reporting layer was built).
The ETL team was able to develop extremely complex reports using the datawarehouse as in the attached sample:
https://www.youtube.com/watch?v=1ySNrgI4Na0&feature=youtu.be
Due to the data compression for most projects the ETL team are able to bring the data within 1 GB. Hence the desktop version of Microsoft Power BI could be used free of cost for the end client.
Reducing the cost of predictive analytics solutions
Most of the end clients use high end predictive tools over the datawarehouse/ over the direct data extract from various source databases. With large datasets predictive analytics using in memory solutions entails high usage of RAM. The ETL team has gone around this issue in the following manner:
-
A seamless environment was created for ETL, reporting and thereafter predictive analytics on SQL/C# and .Net. The reasons for the same are attached herewith:
- Maintenance becomes easier since there is a single platform for all aspects.
- The cost comes down since the resources to be used for ETL can also be used for predictive analytics
- Error handling becomes very easy since errors can be captured before in the
Hypothesis testing
Based on the hypothesis testing, the ETL team developed ARIMA analysis and Market Basket analysis in SQL using seamless integrated set of stored procedures. That is the ARIMA analysis flowed from the datawarehouse A,B,C categorization. The ETL team thus reduced the requirement for high end R and Python developers to code over the datawarehouse thereby presenting a seamless solution to the end client on a 8GB RAM machine.
Benefits to the end client:
- The end client gets immediate and confirmed peace of mind and satisfaction through immediate deployment of predictive and forecasting analytics modules.
- No additional hardware/software requirements need to be taken
- The costs are way lower for the end client.
- Large scale deployment is possible with the given set of solutions.
Please check the attached video for the same:
https://www.youtube.com/watch?v=PQfsfWLkaP8&feature=youtu.be
A more detailed video is attached herewith:
https://www.youtube.com/watch?v=7DoZDBay-bQ
Epilogue
The ETL team has been inspired by the following management books:
- ‘Profit Beyond Measure’ by Thomas Johnson and Anders Brohms
- McKinsey Mind by Ethan Rasiel and Paul Friga.
- Blue Ocean Strategy by W. Chan Kim and Renee Mauborgne
- Better India, Better World by Narayana Murthy
|