SAYO SANU

Real-time Analytics Dashboard

Capstone Project: Carnegie Mellon University, Masters in Information Systems Management (concentration in Business Intelligence and Data Analytics)
Data architecture design for a flexible and scalable solution that provides real-time insights into the performance of critical systems at a major US pharmaceutical chain.

Project Goals: Our primary goal was to design an IT architecture that would allow executives in the office of the CIO of a $136 billion pharmaceutical chain to monitor and report on the health and performance indicators of critical retail systems (e.g. POS, Supply Chain systems) across the United States in real time.
Sample indicators to be monitored include:
(i) Receipt Print Time
(ii) Payment Authorization Time
(iii) Item Scan Time


Design Approach: The architecture design was broken up into 4 logical phases from the source data to the final output: ingestion, storage, transformation and visualization.
For this project, the visualization phase was out of scope. We researched and identified potential solutions that met client requirements for all in-scope phases (ingestion, storage and transformation). The architectural design options were evaluated using a scorecard for a final recommendation before tools and vendors were selected.

After research, the team proposed 3 architectural options optimized for different strengths.
Peridot: This architectural design was proposed as a lightweight technology stack.
Data is extracted from source systems using traditional batch ETL solutions. The data is loaded into a cold storage data lake for ad hoc queries and basic data analysis. From the data lake, the data is stored in a downstream data warehouse for a 30-day period. Subsets of the transactional data are then stored in data marts for easier and faster access on the dashboard. Pre-defined rules are applied before analytics data is displayed on the dashboard.

The downside of this design is degraded performance in delivering "near real-time" analytics, because data passes through several components before it reaches the dashboard. The traditional batch ETL process also reduces the solution's reliability and fault tolerance.

Amethyst: Our second recommendation uses a messaging queue to ingest data from the source systems. Messaging systems allow for asynchronous communication, which decouples systems and improves fault tolerance. The data is transformed using a stream processing application that can process continuous data input. All data is stored in a data lake for ad hoc queries, advanced analytics and predictive modeling. Some stream processing applications also provide advanced modeling. Data is stored in data warehouses for a shorter period for historical analysis. Subsets of the transactional data are then stored in data marts for easier and faster access on the dashboard. Pre-defined rules are applied before analytics data is displayed on the dashboard.
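
To illustrate the decoupling that a messaging queue provides, here is a minimal sketch using only Python's standard library. It is not the tooling selected for the project: the in-process queue stands in for a real messaging system, and the function names, event fields and sample values are all hypothetical.

```python
import queue
import threading


def produce(events, q):
    """Source system: publish events without knowing who consumes them."""
    for event in events:
        q.put(event)
    q.put(None)  # sentinel: no more events


def consume(q, results):
    """Stream processor: read events as they arrive, convert ms to seconds."""
    while True:
        event = q.get()
        if event is None:
            break
        results.append({"metric": event["metric"], "seconds": event["ms"] / 1000})


def run_pipeline(events):
    """Wire producer and consumer together through the queue only."""
    q = queue.Queue()
    results = []
    consumer = threading.Thread(target=consume, args=(q, results))
    consumer.start()
    produce(events, q)
    consumer.join()
    return results


# Hypothetical sample readings (illustrative values, not project data)
sample = [
    {"metric": "receipt_print_time", "ms": 850},
    {"metric": "payment_authorization_time", "ms": 1200},
]
processed = run_pipeline(sample)
```

The producer and the consumer share nothing except the queue, so either side could be replaced or scaled without changing the other.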




Andensine: Our final recommendation also uses a messaging queue to ingest data from the source systems. The data is also transformed using a stream processing application that can process continuous data input. The stream application is integrated with the analytics dashboard as a downstream system. All data is stored in a cold storage data lake for ad hoc queries, advanced analytics and predictive modeling, and more recent historical data is available in a variety of databases (NoSQL, SQL, etc.) for quicker analysis.
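
As a rough sketch of how a stream application could feed a dashboard directly, the example below keeps a rolling average of an indicator and flags when it crosses a threshold. The window size, the threshold and the scan-time readings are assumptions made up for illustration, and the class and function names are hypothetical rather than part of any product we evaluated.

```python
from collections import deque


class RollingAverage:
    """Average of the most recent `size` readings for one indicator."""

    def __init__(self, size):
        self.window = deque(maxlen=size)  # old readings drop off automatically

    def add(self, value):
        self.window.append(value)
        return sum(self.window) / len(self.window)


def check_stream(readings, size, threshold_ms):
    """Yield (rolling_average, breached) after each reading arrives."""
    avg = RollingAverage(size)
    for value in readings:
        current = avg.add(value)
        yield current, current > threshold_ms


# Hypothetical item scan times in milliseconds (illustrative only)
scan_times_ms = [100, 200, 300, 400]
status = list(check_stream(scan_times_ms, size=2, threshold_ms=250))
```

Because each result is produced as soon as a reading arrives, the dashboard does not have to wait for a batch load to show the latest value.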


Following these proposals, we evaluated each architectural option against our client's key requirements using a weighted scorecard.

We scored each architectural option based on our knowledge of information systems, our research, and counsel from our client and advisors.
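
The mechanics of a weighted scorecard can be sketched as follows. The weights and scores here are placeholders invented for illustration and are not the values used in the project; only the option names and the three criteria come from this write-up.

```python
def weighted_score(scores, weights):
    """Weighted average of per-criterion scores."""
    total_weight = sum(weights.values())
    return sum(scores[c] * w for c, w in weights.items()) / total_weight


# Hypothetical weights: a higher weight means the criterion matters more
weights = {"scalability": 3, "flexibility": 3, "total_cost_of_ownership": 2}

# Hypothetical scores on a 1-5 scale (5 is best), NOT the project's real scores
options = {
    "Peridot": {"scalability": 2, "flexibility": 2, "total_cost_of_ownership": 4},
    "Amethyst": {"scalability": 4, "flexibility": 4, "total_cost_of_ownership": 2},
    "Andensine": {"scalability": 5, "flexibility": 4, "total_cost_of_ownership": 2},
}

# Rank options from highest to lowest weighted score
ranking = sorted(
    options,
    key=lambda name: weighted_score(options[name], weights),
    reverse=True,
)
```

With weights like these, an option can rank first overall even when it scores poorly on one criterion, as long as it does well on the criteria that carry more weight.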
The Andensine and Amethyst options, with messaging queues and streaming applications, scored higher on scalability and flexibility, which were key requirements. Our client wanted to keep the solution flexible for changing business needs. Business changes may come with growth in data volume and processing requirements. Also, messaging systems provide location transparency (our source systems will not need to be concerned about the location/address of the destination systems) and can ingest a variety of data formats (structured and unstructured).

These options both scored low on total cost of ownership. We conducted a cost analysis based on our best estimates for products available on Microsoft Azure's platform.

The Andensine architecture was our final recommendation to enable real-time insights on the health and performance of critical systems in a retail and e-commerce organization.
Andensine's architecture and components allow our client to grow and take advantage of advanced technology strategies.


Please see capstone poster for more details on this project.