| Title | Rodriguez, Heberto MCS 2025 |
| Alternative Title | WHAT IS THE IMPACT OF GZIP, LZ77, AND LZW COMPRESSION ON MQTT COMMUNICATION EFFICIENCY IN IOT DEVICES? |
| Creator | Rodriguez, Heberto |
| Collection Name | Master of Computer Science |
| Description | The focus of this study is to examine the performance of well known compression algorithms GZIP, LZ77 and LZW in the context of IoT devices when transmitting textual logs, especially for important applications like alert systems for veterans and the elderly where reliable and efficient data transmission is crucial. |
| Abstract | The focus of this study is to examine the performance of the well-known compression algorithms GZIP, LZ77, and LZW in the context of IoT devices when transmitting textual logs, especially for important applications like alert systems for veterans and the elderly, where reliable and efficient data transmission is crucial. These compression algorithms will be evaluated through two distinct experiments, each independently compressing a JSON text string and subsequently sending it through Message Queuing Telemetry Transport (MQTT), a messaging protocol that allows Internet of Things (IoT) devices to communicate with each other. Phase 1 of the research will focus on benchmarking compression performance in a controlled environment. This is done by generating random JSON objects and converting them into string format. This message structure will be fed to each compression algorithm one thousand times. This phase is direct and is not affected by external factors such as network latency or API connectivity. It serves to assess how well these algorithms perform under ideal conditions. Phase 2 simulates a real-time device. This experiment introduces real-world constraints such as network latency, cellular connectivity, and external API calls. The device will obtain real-time geolocation by using Google's API and determine whether it is within certain geofence boundaries. It will also simulate IoT sensor data, such as random heart rate and blood pressure information, transmitting these health and location data at regular intervals. The device will be connected to a constant cellular internet connection. These data are then compressed and transmitted over MQTT. The receiving side of MQTT will use decompression algorithms to decompress the data acquired. Several key performance metrics, such as compression ratio, compression and decompression speed, latency, and data size, are saved to a PostgreSQL database to determine their impact on data transmission. By looking closely at how compression affects MQTT communications, this research provides valuable insights into making data transmission in IoT environments more efficient, such that critical information can be sent reliably without losing performance. The findings showed that Gzip consistently provided the best balance of compression efficiency, speed, and reliability for transmitting JSON over MQTT. This research contributes to the understanding of how compression techniques impact IoT communication and offers insights for improving data transmission in real-world communication where timely and reliable alerts are essential. |
| Subject | Computer science; Algorithms; Communication--Research; Communication |
| Digital Publisher | Digitized by Special Collections & University Archives, Stewart Library, Weber State University. |
| Date | 2025 |
| Medium | theses |
| Type | Text |
| Access Extent | 101 page pdf |
| Conversion Specifications | Adobe Acrobat |
| Language | eng |
| Rights | The author has granted Weber State University Archives a limited, non-exclusive, royalty-free license to reproduce his or her thesis, in whole or in part, in electronic or paper form and to make it available to the general public at no charge. The author retains all other rights. For further information: |
| Source | University Archives Electronic Records: Master of Computer Science. Stewart Library, Weber State University |
| OCR Text | WHAT IS THE IMPACT OF GZIP, LZ77, AND LZW COMPRESSION ON MQTT COMMUNICATION EFFICIENCY IN IOT DEVICES?
By Heberto Rodriguez
A thesis submitted to the faculty of the MSCS Graduate Program of Weber State University in partial fulfillment of the requirements for the degree of MASTER OF SCIENCE in Computer Science
Graduation April 25, 2025, Ogden, Utah
Approved: Brian Rague, Ph.D., Committee Chair; Abdulmalek Al-Gahmi, Ph.D., Committee Member; Hugo Valle, Ph.D., Committee Member
Copyright © 2025 Heberto Rodriguez. All Rights Reserved.
ACKNOWLEDGMENTS
I am profoundly thankful to my wife and children, whose unwavering love, patience, and encouragement carried me through this journey. Their sacrifices made this achievement possible, and I am forever grateful. I would like to express my deepest gratitude to my committee chair, Dr. Brian Rague, and my committee members, Dr. Abdulmalek Al-Gahmi and Dr. Hugo Valle, for their invaluable guidance, encouragement, and expertise throughout the course of this research. Their support helped shape this work and brought it to completion. I would also like to thank my extended family, friends, and colleagues for their constant support and motivation. Special thanks go to the faculty and staff of the Department of Computer Science at Weber State University for creating an enriching and supportive environment during my graduate studies.
ABSTRACT
The focus of this study is to examine the performance of the well-known compression algorithms GZIP, LZ77, and LZW in the context of IoT devices when transmitting textual logs, especially for important applications like alert systems for veterans and the elderly, where reliable and efficient data transmission is crucial. These compression algorithms will be evaluated through two distinct experiments, each independently compressing a JSON text string and subsequently sending it through Message Queuing Telemetry Transport (MQTT), a messaging protocol that allows Internet of Things (IoT) devices to communicate with each other. Phase 1 of the research will focus on benchmarking compression performance in a controlled environment. This is done by generating random JSON objects and converting them into string format. This message structure will be fed to each compression algorithm one thousand times. This phase is direct and is not affected by external factors such as network latency or API connectivity. It serves to assess how well these algorithms perform under ideal conditions. Phase 2 simulates a real-time device. This experiment introduces real-world constraints such as network latency, cellular connectivity, and external API calls. The device will obtain real-time geolocation by using Google's API and determine whether it is within certain geofence boundaries. It will also simulate IoT sensor data, such as random heart rate and blood pressure information, transmitting these health and location data at regular intervals. The device will be connected to a constant cellular internet connection. These data are then compressed and transmitted over MQTT. The receiving side of MQTT will use decompression algorithms to decompress the data acquired. Several key performance metrics, such as compression ratio, compression and decompression speed, latency, and data size, are saved to a PostgreSQL database to determine their impact on data transmission. By looking closely at how compression affects MQTT communications, this research provides valuable insights into making data transmission in IoT environments more efficient, such that critical information can be sent reliably without losing performance.
The findings showed that Gzip consistently provided the best balance of compression efficiency, speed, and reliability for transmitting JSON over MQTT. This research contributes to the understanding of how compression techniques impact IoT communication and offers insights for improving data transmission in real-world communication where timely and reliable alerts are essential.
TABLE OF CONTENTS
ACKNOWLEDGMENTS
ABSTRACT
LIST OF TABLES
LIST OF FIGURES
1 Introduction
1.1 Background
1.2 Problem Statement
1.3 Objective and Research Significance
2 Literature Review
2.1 The Role of IoT in Data Transmission
2.2 MQTT: A Lightweight Protocol for IoT
2.3 Data Compression for IoT: Necessity and Impact
2.4 Evaluating Compression Techniques for MQTT
2.5 Summary and Research Gap - summarizing all these studies
3 Methodology
3.1 The Role of IoT in Data Transmission
3.2 IoT Device Configuration (Transmitter)
3.3 EC2 Server and Broker Configuration
3.4 Additional Hardware and Tools
3.5 Network Configuration and Access Rules
3.6 Google Geolocation and Geofence Setup
3.7 Compression Algorithm Implementation
3.7.1 GZIP
3.7.2 LZ77
3.7.3 LZW
3.7.4 Raw JSON (Baseline)
3.7.5 Huffman Coding
3.8 Performance Testing Procedure
3.8.1 Phase 1
3.8.2 Phase 2
3.9 Metrics and Measurement
3.9.0.1 Compressed Data Size
3.9.0.2 Compression Ratio
3.9.0.3 Compression Time
3.9.0.4 Packet Loss
3.9.0.5 Data Integrity Check
3.9.0.6 Decompression Time
3.9.0.7 MQTT Publish Time
3.9.0.8 Throughput
3.9.0.9 Latency
3.9.0.10 CPU Usage
3.9.0.11 Memory Usage
3.9.0.12 Battery Uptime
4 Results and Discussion
4.1 Phase 1 Results
4.1.1 Compression Size
4.1.2 Phase 1 Compression Ratio
4.1.3 Phase 1 Compression vs Decompression Time
4.1.4 Phase 1 Throughput
4.1.5 Phase 1 Latency
4.1.6 Phase 1 Packet Loss
4.1.7 Phase 1 Publish Time
4.1.8 Phase 1 CPU Usage
4.1.9 Phase 1 Memory Usage
4.1.10 Phase 1 Battery Uptime
4.2 Phase 2 Results
4.2.1 Phase 2 Compression Success Across 72 Alert Packets
4.2.2 Phase 2 Compression Time
4.2.3 Phase 2 Memory Usage
4.2.4 Phase 2 Battery Uptime
4.2.5 Phase 2 CPU Usage
4.2.6 Phase 2 Throughput, Latency, Publish Time
5 Conclusion and Future Work
5.1 Conclusion
5.2 Future Work
REFERENCES
APPENDIX A: Additional Figures or Code
A.1 Comparison of GZIP vs. RAW JSON Compression
A.2 PostgreSQL Publisher Metrics Table
A.3 PostgreSQL Subscriber Metrics Table
A.4 PostgreSQL Battery Uptime Metrics Table
A.5 Main Script (Publisher)
A.6 Util Script (Publisher)
A.7 Compression Packet Tester Script (Publisher)
A.8 MQTT Publisher Connection Script
A.9 Database Connection Script (Publisher)
A.10 LZ77 Script (Publisher/Subscriber)
A.11 LZW Script (Publisher/Subscriber)
A.12 Main Script (Subscriber)
A.13 Subscriber Metrics Collector Script (Subscriber)
LIST OF TABLES
3.1 INIU Power Bank Specifications
3.2 Defined Geofences Used in the Simulation
3.3 Structure of a Data Packet Sent During Phase 1
3.4 Structure of a Data Packet Sent During Phase 2
4.1 Coefficient of Variation and Quartile Coefficient of Dispersion
4.2 Battery Uptime Summary Statistics (in Hours)
4.3 Packet Reception Rates by Compression Algorithm During Phase 2 (Out of 72 Total Packets Sent)
4.4 Coefficient of Variation and Quartile Coefficient of Dispersion – Phase 2
4.5 Battery Uptime Summary Statistics (in Hours) – Phase 2
A.1 Comparison of GZIP vs. RAW JSON Compression
A.2 Structure of phase1 compressionmetrics and phase2 compressionmetrics
A.3 Structure of phase1 subscribermetrics and phase2 subscribermetrics
A.4 Structure of phase1 batterytest and phase2 batterytest
LIST OF FIGURES
1.1 MQTT Process. The standard process of an MQTT client sending a message to the broker. Two subscriber clients receiving the message from the broker.
2.1 MQTT QOS Levels. Image for the three QOS levels utilized by MQTT.
3.1 Raspberry Pi 3 Model B+ used as the IoT data transmitter. The device is connected via hotspot to simulate a mobile environment and is responsible for generating, compressing, and transmitting data to the MQTT broker.
3.2 AWS EC2 Instances. The three instances utilized for testing: Subscriber client, Mosquitto MQTT Broker, and Postgres DB.
3.3 MQTT QOS Levels. Image for the three QOS levels utilized by MQTT.
3.4 Phase 1: Phase 1 controlled environment workflow.
3.5 Phase 2: Phase 2 real-world simulation workflow.
4.1 Average Compressed Size Across Algorithms for 20MB Payload. This graph compares the effectiveness of each compression algorithm in reducing data size. Lower values indicate better compression efficiency.
4.2 Average Compression Ratio Across Algorithms for 20MB Payload. This chart includes a RAW baseline to compare compression performance. Higher ratios indicate better compression efficiency compared to the original uncompressed data.
4.3 Average Compression and Decompression Time. Shows the full time of compression with decompression for each algorithm.
4.4 Checksum Integrity Check. A checksum calculation from the original data sent to the receiver. The receiver recalculates the checksum from the received data and compares it with the transmitted checksum. If the values match, the data is considered intact; otherwise, it indicates possible data corruption.
4.5 Average Throughput per Algorithm. This bar chart visualizes the efficiency of data transmission, calculated as compressed size divided by publish time (bytes per second). Higher throughput indicates faster, more efficient delivery.
4.6 Average Latency per Algorithm. This chart shows how long it took for each packet to travel from publisher to subscriber. Latency is crucial in evaluating real-time system responsiveness.
4.7 Phase 1 Packet Delivery. All algorithms successfully delivered packets in a controlled environment with sequential compression, confirming reliable operation without packet loss.
4.8 MQTT Publish Time Variability. This bar graph shows the range and distribution of publish times per algorithm, reflecting consistency and delay variability in network transmission.
4.9 CPU Usage During Compression. This grouped bar chart illustrates CPU usage before, during, and after compression for each algorithm, indicating processing intensity.
4.10 Average Memory Usage per Algorithm. This bar chart shows the memory (RAM) consumed during compression. Lower memory usage is favorable for resource-constrained IoT devices like Raspberry Pi.
4.11 Average Battery Usage per Algorithm. This bar chart shows the uptime or battery consumption of each algorithm's full process: compression, transmitting via MQTT, and saving metrics in Postgres.
4.12 Packet Delivery Validation per Algorithm (Phase 2). Packet Loss Across Algorithms in Phase 2: Each of the 72 packets generated represented a real-time alert. GZIP and RAW maintained high success rates, while LZ77 and LZW fell behind due to slower processing speeds.
4.13 Average Compression and Decompression Time (Phase 2). Shows the full time of compression with decompression for each algorithm captured during phase 2.
4.14 Average Throughput per Algorithm (Phase 2). Phase 2 bar chart visualizes the efficiency of data transmission, calculated as compressed size divided by publish time (bytes per second). Higher throughput indicates faster, more efficient delivery.
4.15 Average Latency per Algorithm (Phase 2). This phase 2 chart shows how long it took for each packet to travel from publisher to subscriber. Latency is crucial in evaluating real-time system responsiveness.
4.16 MQTT Publish Time Variability (Phase 2). This bar graph shows the range and distribution of publish times per algorithm, reflecting consistency and delay variability in network transmission captured during phase 2.
4.17 CPU Usage During Compression (Phase 2). This grouped bar chart of phase 2 illustrates CPU usage before, during, and after compression for each algorithm, indicating processing intensity.
4.18 Average Memory Usage per Algorithm (Phase 2). This bar chart shows the memory (RAM) consumed during compression, captured during phase 2. Lower memory usage is favorable for resource-constrained IoT devices like Raspberry Pi.
4.19 Average Battery Usage per Algorithm (Phase 2). This bar chart shows the phase 2 uptime or battery consumption of each algorithm's full process: compression, transmitting via MQTT, and saving metrics in Postgres.
CHAPTER 1 Introduction
1.1 Background
The Internet of Things (IoT) has shaped how data is processed and communicated from device to device in our daily lives. One of the earliest examples of IoT dates back to the early 1980s, when a Coca-Cola vending machine was connected to the internet and reported whether a drink was available and whether it was cold, so that anyone inquiring could decide to purchase it [1]. Since then, the use of IoT has evolved into smart devices, agricultural applications, and enterprise and office use. One important application of IoT messaging is in health care alert systems, where real-time data is critical for monitoring, alerting emergency responders, and providing periodic device status updates. This research is relevant to healthcare applications where IoT medical devices play a crucial role in monitoring vital signs and geolocation for vulnerable individuals such as veterans and the elderly. However, veterans and the elderly are not the only ones who may suffer from illnesses or disabilities that need urgent medical attention. Many others also suffer from these conditions, which can make specific tasks like driving to the hospital more difficult, or can cause memory loss so that a person loses the way back home. IoT medical devices put loved ones at ease by providing the wearer's location and a feasible way to call an emergency responder. IoT medical alert devices benefit all who need them.
Real-time data transmission has empowered IoT technology by enabling devices to communicate critical data quickly and reliably, ensuring that critical information reaches its intended receiver. This form of communication allows devices to collect and transmit data between a publisher and a subscriber. There are many ways for IoT devices to communicate with each other. Similar to the languages that allow humans to communicate with one another, IoT devices use protocols: sets of rules that set a standard for how devices communicate with each other. The protocol that is the focus of this research is Message Queuing Telemetry Transport (MQTT). Unlike traditional HTTP-based communication, MQTT is designed as a lightweight messaging protocol and is a standard for IoT messaging. As shown in Figure 1.1, MQTT uses a publish-subscribe format in which a publisher produces a message to a topic hosted by the MQTT broker. The MQTT broker is the host that handles the messages and topics. A good analogy is that the broker acts like the post office: it handles all the messages (mail) that come to it and makes sure that each one gets to the correct place. The topics in MQTT are like addresses under which the data is published; a topic acts as the mail route. If a device publishes data to a topic, say "ThesisWork", a subscriber (receiver) can subscribe to that same topic and retrieve the data that is being produced on it. The MQTT broker ensures that the messages reach their intended receiver and acts as a key component in IoT communication. The broker treats the message payload as a sequence of bytes. The most common approach is to take JSON data and serialize it as UTF-8, which converts the text-based JSON string into a binary byte stream. The broker utilized for this research is Mosquitto MQTT.
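To make the publish-subscribe flow and the JSON-to-UTF-8 serialization step concrete, the following is a minimal sketch in Python (an assumed language for illustration). The `ToyBroker` class is a hypothetical in-memory stand-in, not Mosquitto and not an MQTT client library, and the message fields are invented for the example.

```python
import json
from collections import defaultdict


class ToyBroker:
    """Hypothetical in-memory stand-in for an MQTT broker (illustration only)."""

    def __init__(self):
        # Maps a topic name to the callbacks subscribed to it.
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, payload: bytes):
        # Deliver the raw bytes to every subscriber of this exact topic.
        for callback in self._subscribers[topic]:
            callback(topic, payload)


def encode_message(message: dict) -> bytes:
    """Serialize a dict to a JSON string, then to a UTF-8 byte stream."""
    return json.dumps(message).encode("utf-8")


def decode_message(payload: bytes) -> dict:
    """Reverse of encode_message: UTF-8 bytes back to a dict."""
    return json.loads(payload.decode("utf-8"))


received = []
broker = ToyBroker()

# Two subscribers on the same topic, as in Figure 1.1.
broker.subscribe("ThesisWork", lambda t, p: received.append(decode_message(p)))
broker.subscribe("ThesisWork", lambda t, p: received.append(decode_message(p)))

# Field names below are assumptions made for this example.
alert = {"device_id": "pi-01", "type": "status", "heart_rate": 72}
broker.publish("ThesisWork", encode_message(alert))

# A message on a topic with no subscribers is delivered to no one.
broker.publish("OtherTopic", encode_message(alert))
```

After this runs, `received` holds two copies of the alert, one per subscriber, and the message published to the unsubscribed topic is not delivered. A real broker adds networking, sessions, and delivery guarantees that this sketch leaves out.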
The Mosquitto broker is open source and implements the MQTT protocol.
Figure 1.1: MQTT Process. The standard process of an MQTT client sending a message to the broker. Two subscriber clients receive the message from the broker.
1.2 Problem Statement
There is, however, a drawback: a constraint on the amount of data that can be sent in a single message. The maximum payload size that the Mosquitto MQTT broker allows for a single message is 256 MB. This limit makes it a challenge to send large, important metadata in one attempt. If a message exceeds 256 MB, it will not be accepted by the broker. This constraint requires the metadata to be broken up into smaller chunks. However, breaking the data into smaller chunks increases the number of messages sent to the broker and introduces other constraints such as bandwidth, possible data corruption, latency, and network limitations.
1.3 Objective and Research Significance
This drawback led to my thesis question: What is the impact of Gzip, LZ77, and LZW compression on MQTT communication efficiency in IoT devices? This question forms the basis of my research, which investigates whether well-known compression algorithms can improve transmission efficiency by reducing payload size, minimizing latency, and ensuring that large messages remain within the MQTT broker's allowable size limits, all without compromising data integrity. The approach of the research involves data collection and device setup. A Raspberry Pi is configured to act as an IoT device. The device generates textual logs in the form of JSON to simulate real-world usage scenarios, particularly for critical applications like alert systems for veterans and the elderly. Two experiments are conducted. The first takes place in a controlled environment and randomly generates alert messages and status updates, ensuring a diverse set of messages.
The second test simulates a real-world environment in which the textual logs are created by three events. The first event simulates heart rate and sends an alert if the heart rate is over or under a certain threshold. The second simulates blood pressure and sends an alert if it is over or under a certain threshold. The third uses Google's API to obtain actual GPS coordinates, which are used in tandem with the Haversine formula to calculate the distance between two points on the earth's surface. This is used to detect whether the device is within the radius of a known address, such as home. If the device detects that it is within the radius of a geofence, it sends an alert that it has arrived in that area. Once the device leaves, it sends an alert that it is leaving the area. The compression algorithms used in this research are Gzip, LZ77, and LZW. These algorithms are lossless data compression algorithms, meaning that they perfectly reconstruct the data in its original form after it has been compressed. To achieve efficient MQTT communication, the study applies compression after the JSON string has been encoded in UTF-8, with the aim of reducing the data size before transmission. The effectiveness of the chosen compression algorithms will be assessed by measuring key performance indicators. The aim is to see how well each technique reduces the data size while keeping the original information intact.
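The steps described above (a geofence check with the Haversine formula, UTF-8 encoding of the JSON message, and lossless compression before transmission) can be sketched as follows. This is an illustrative sketch only, not the thesis scripts: the coordinates, the 100 m radius, the heart rate thresholds, and the field names are assumptions, and the ratio is taken as original size divided by compressed size, which matches the "higher is better" reading in the figure captions.

```python
import gzip
import json
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def in_geofence(lat, lon, fence):
    """True if the point lies within the fence radius of the fence centre."""
    return haversine_m(lat, lon, fence["lat"], fence["lon"]) <= fence["radius_m"]


# Hypothetical geofence and sensor reading (values invented for this example).
home = {"name": "home", "lat": 41.1920, "lon": -111.9440, "radius_m": 100}
reading = {"lat": 41.1922, "lon": -111.9441, "heart_rate": 128,
           "systolic": 118, "diastolic": 76}

alerts = []
if in_geofence(reading["lat"], reading["lon"], home):
    alerts.append("arrived:home")
if not 50 <= reading["heart_rate"] <= 120:  # assumed thresholds
    alerts.append("heart_rate")

# Encode the JSON string as UTF-8 first, then compress the bytes.
message = {"reading": reading, "alerts": alerts}
raw = json.dumps(message).encode("utf-8")
compressed = gzip.compress(raw)

# Lossless: decompressing must reproduce the original bytes exactly.
restored = gzip.decompress(compressed)
intact = restored == raw

# Assumed definition: original size divided by compressed size.
ratio = len(raw) / len(compressed)
```

For a message this small the ratio may be at or below 1, because the gzip format adds a fixed header and trailer to every payload. The benefit of compression becomes visible as payloads grow and contain more repetition.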
By conducting experiments in controlled and real-world environments, this study provides insight into which compression technique offers the best balance across the following metrics: compressed data size; compression ratio (how much the data was reduced); compression time (how long it took to compress the data before sending); MQTT publish time (how quickly the compressed data can be transmitted); decompression time (how long it took to decompress the data upon receipt); data integrity check; latency (delay in transmission); packet loss (data that fails to reach its destination); and throughput (amount of data sent successfully per unit of time). This research provides a starting point for exploring the effectiveness of compression algorithms for efficient MQTT communication.
CHAPTER 2 Literature Review
2.1 The Role of IoT in Data Transmission
In recent years IoT has had a major boom that has reached into our lives at home, in business, and in healthcare. The number of IoT applications has surpassed the number of humans on the globe, illustrating the overall growth and adoption of IoT technology. These devices can take the form of thermometers, medical devices, garage door openers, smart devices like Alexa, transportation systems, and many more. This constant growth means that data continues to grow and expand, becoming a crucial aspect of our lives. A recent paper titled "Internet of Things: A Survey on Enabling Technologies, Protocols, and Applications" describes the basic IoT model as a three-layered model consisting of the Application Layer, the Network Layer, and the Perception Layer.
It also identifies six important elements that form the building blocks of the IoT structure: identification, sensing, communication, computation, services, and semantics. These work together to uniquely identify devices, gather and transmit data, process information locally or in the cloud, deliver smart services, and extract meaningful insights from data to enable intelligent, interconnected systems [2]. Of these six, we will focus on communication, that is, communication between devices. There are different ways IoT devices can communicate with each other, which means many machines are talking with one another in different ways. The rules that govern a given form of communication are called a protocol. The protocol utilized in this thesis is Message Queuing Telemetry Transport (MQTT), one of the most common protocols used in IoT. It has the advantage of being lightweight and simple to use. It also has a disadvantage: a limit on the size of the data that can be transmitted over it. MQTT resides in the Application Layer, establishing the form of communication used in many IoT applications. IoT can be used on many large platforms such as Google and Amazon. Cloud computing has provided another venue for IoT, where it can store large amounts of data and ease data management. The authors' contribution proved very insightful and helped provide a foundation for the research in this thesis.
2.2 MQTT: A Lightweight Protocol for IoT
Because it is lightweight and open source, Mosquitto has been a popular MQTT broker for many developers and researchers in IoT communication. The MQTT protocol functions with three key components: the broker, a subscriber, and a publisher, as shown in Figure 1.1. The publisher sends a message on a specific topic to the broker, and the broker's job is to organize where the messages go. The subscriber requests messages from the broker through a topic. The topic serves as the address where certain messages will be located.
A publisher will publish to a topic and a subscriber will read from that topic. Figure 1 shows the flow of a message being published by the client. The client can be a device, a machine, a cloud server, or even a laptop. The publisher connects to the broker and sends its topic, message (payload), and Quality of Service (QOS) level.

Figure 2.1: MQTT QOS Levels. Image for the three QOS levels utilized by MQTT.

Figure 2.1 shows the three levels of QOS. At most once (QOS 0) is the lowest level; with this configuration the client does not need a receipt to know whether the message was transmitted, and there is no waiting for an acknowledgment that the message was ever received. At least once (QOS 1) is the second level. This configuration ensures that the message is received: the publishing client retains the message until it has received an acknowledgement (PUBACK) from the receiver, resending it if necessary. This guarantees delivery, although the message may arrive more than once. Exactly once (QOS 2) is the third and highest level of service. This configuration is a four-way handshake between the publisher and the receiver. After the client publishes a message, it receives a PUBREC from the receiver acknowledging receipt of the message. The publisher then sends the receiver a PUBREL packet so that the stored state for the message can be released. Finally, the receiver sends the publisher a PUBCOMP packet, completing the process [3].

There have been many studies that use MQTT, specifically Mosquitto MQTT. The paper “Efficient Data Management in Agricultural IoT: Compression, Security, and MQTT Protocol Analysis” explores compression algorithms over MQTT. Their research ran different tests using the MQTT QOS levels.
They used a subset of 100 messages for comparison, programmed their implementation in Java, and used the Paho client with the open source MQTT broker Eclipse Mosquitto version 3.1.1. They tested three different scenarios using all three QOS levels [4]. QOS 0 is the fastest and also the best for real-time scenarios. Even though the focus of their research is not MQTT itself, their use of Mosquitto MQTT with QOS levels provides a solid foundation for using MQTT on low-powered devices.

Further support for using MQTT on low-powered devices comes from the paper “D-MQTT: design and implementation of a pub/sub broker for distributed environments”, which also uses Mosquitto MQTT. The research proposes that multiple brokers connect and work with each other. This multi-broker system, named D-MQTT, is an extension of multiple Mosquitto MQTT brokers. The Mosquitto broker is hosted in a Docker container along with their custom D-MQTT plugin. Having the broker in a Docker container allows for better control when measuring CPU and memory usage [5]. Though their research does not use a single Mosquitto broker for messaging, it takes advantage of Mosquitto's bridging feature to connect multiple brokers.

Another study used Mosquitto MQTT to better understand the inner workings of a webcam. In “Real-time animation of equipment in a remote laboratory” the study sets up a standard MQTT connection between a client, service broker, and lab server to capture real-time messages from various sensors and actuators. They considered different brokers, including RabbitMQ, ZeroMQ, ApacheMQ, and MQTT. The first three proved to be more than what their experiment required, so they went with the more lightweight Mosquitto MQTT, which is designed for lightweight applications [6]. Mosquitto MQTT works well for lightweight applications, especially low-power and simple tasks.
2.3 Data Compression for IoT: Necessity and Impact

A drawback of MQTT being lightweight is the limit on the amount of data that can be transferred through it, together with the limitations found in smaller devices. MQTT has a payload size limit of 268435455 bytes (just under 256 MiB) [7]. These payload limitations have motivated the exploration of compression techniques to reduce data size and enable more efficient MQTT communication, particularly in scenarios where bandwidth and payload constraints are critical. This is especially important for small remote devices that need to send as much data as they can while using as little power as possible [8]. Previously mentioned studies have investigated the use of compression techniques to address this challenge, demonstrating their effectiveness in significantly reducing MQTT payload sizes.

2.4 Evaluating Compression Techniques for MQTT

These studies have tested several well-known compression algorithms in IoT environments, including Gzip, LZ77, LZW, Huffman Coding, Zlib, Golomb–Rice, and variants based on these algorithms. These are lossless compression algorithms, meaning they perfectly reconstruct the data in its original form after compression. This is critical for real-time data that needs to be received exactly as it was sent from the publishing client.

In the paper “Efficient data management in agricultural IoT: Compression, security, and MQTT protocol analysis”, the authors address the data management challenges in agricultural IoT (Precision Agriculture) by analyzing the role of data compression and security using the MQTT protocol. Their aim was to find a balance between efficient message transmission and security while optimizing resource use. The hardware used is an STM32 microcontroller, a small device that introduces key challenges in limited processing power and storage. Precision Agriculture is a data-intensive technology requiring large amounts of data for accurate operation and security.
They faced an unreliable internet connection, creating the possibility of data loss whenever connectivity was lost. Their study evaluated different lossless compression algorithms, including Huffman Coding, Zlib, LZW, and Golomb-Rice. Compression also made it a lot easier to transmit data over MQTT. Their MQTT client was implemented in Java using Paho, an open source library used on the client side to connect to an MQTT broker. Their performance test measured the compression ratio and reduction percentage, and extended to testing these compression algorithms at different QOS levels. Their results showed that Huffman Coding performed best, offering the highest compression ratio [4].

Similar compression techniques have also been applied to IoT devices. The paper “IoT Sensor Data Stream Compression with Hybrid Compression Algorithms” explores a similar problem. The authors implemented Dynamic Huffman Coding with Run-Length Encoding (RLE). Their goal is to use this enhanced version of Huffman Coding to minimize bandwidth usage, computational overhead, and power consumption. The hybrid algorithm is used in real time to process large volumes of sensor data. They take 6,000 readings from an ultrasonic sensor using an Arduino. They compare their enhanced version of Huffman Coding against Static Huffman, RLE, LZW, ZLib, LZMA, and BZ2, and evaluate them on compression ratio, computational efficiency, and energy consumption. They found that Hybrid Dynamic Huffman Coding + RLE outperforms all the other traditional algorithms, with lower computational and energy costs [9].

A further study is the paper “Data compression techniques in IoT-enabled wireless body sensor networks: A systematic literature review and research trends for QoS improvement”, where QoS refers to Quality of Service.
The authors research how compression techniques can improve the power efficiency of wireless sensor networks. The paper explains how sensors gather a vast quantity of data that is mostly useless or redundant but that is still crucial for sending real-time health information. The study categorizes compression approaches into three types: communication compression, sampling compression, and data compression. It covers lossy and lossless compression algorithms, with specific attention to LZW and LZ77. The findings show that different compression techniques have strengths in certain areas. LZW is effective in compressing motion-based data such as angular velocity, acceleration, and data from gyroscopes and accelerometers. LZ77 is better suited for compressing physiological and biomedical data [10].

In the paper “Innovative Energy Savings Using Gzip IP Within IoT Devices”, author Nikos Zervas looks into using the Gzip compression technique and the Deflate decompression technique to reduce power consumption in IoT devices. He explains that new IP cores have made it possible to save power in two main areas: booting or waking up, and transmitting events. In the booting process, an “on-the-fly” decompression step reads Gzip-compressed firmware, minimizing the storage needed to boot the device. The idea is similar for transmitting data over any network topology: Gzip is applied before sending data over a wireless network, and the data is decompressed on the receiving side. In Zervas's case, which uses radio frequency subsystems, the number of bytes transmitted is reduced, which leads to overall energy savings. The study concludes that while there may be a slight delay when using Gzip compression, the energy savings more than compensate for this trade-off [11].
2.5 Summary and Research Gap

The reviewed literature highlights the importance of compression techniques and demonstrates the need for efficient data handling in IoT environments. As previously mentioned, the world of IoT is rapidly growing and data is getting larger. The fact remains that IoT devices are lightweight and low powered, which introduces constraints such as payload size, battery consumption, and low bandwidth. These previous studies have set the groundwork for my research by exploring lossless compression techniques and using an MQTT broker for IoT communication. This has led me to ask “What is the impact of Gzip, LZ77, and LZW compression on MQTT communication efficiency in IoT devices?” This research seeks to contribute to the fast-growing field of IoT and its data growth. By simulating both controlled and real-world healthcare scenarios using a Raspberry Pi and JSON-based message logs, this research evaluates how each compression technique performs when integrated into MQTT communication. Metrics such as compressed size, latency, transmission time, decompression speed, and data integrity will be measured to provide a comparative analysis and determine which algorithm offers the best balance between efficiency and reliability.

CHAPTER 3
Methodology

3.1 The Role of IoT in Data Transmission

This section outlines the experimental design used to evaluate the performance of the three compression algorithms in the context of an IoT-based alert device. The objective of the methodology is to assess how these algorithms affect the performance of sending data over MQTT. The three algorithms will be compared against each other and against the transmission of the raw JSON string, which acts as the baseline. The tests will be performed in two phases. Phase 1 acts as a controlled environment where we can evaluate how well these algorithms perform in the best possible conditions.
The second phase will simulate an actual real-time device that randomly generates simulated health alerts triggered at defined intervals. This second phase will use Google to obtain the device's real-time geolocation to trigger geofence alerts. Additionally, this section details the system architecture, including the setup and configuration of the Mosquitto MQTT broker, the use of AWS EC2 instances, PostgreSQL for data storage, and the Raspberry Pi as the publishing device. The approach taken ensures a comprehensive evaluation of each algorithm in both theoretical and practical settings.

3.2 IoT Device Configuration (Transmitter)

The device acting as the MQTT publishing client is a Raspberry Pi 3 B+ (B Plus) with a 1.4 GHz 64-bit quad-core processor and 1 GB of RAM. This device was chosen for its support of the Python programming language and its 32 GB of storage capacity. One notable limitation of the Raspberry Pi is its 1 GB of RAM, which posed challenges during compression tests with large payloads. This challenge will be covered later in this section. The Raspberry Pi does a substantial amount of work. The device generates a JSON payload, compresses it using one of the three compression algorithms (Gzip, LZ77, or LZW), and sends it over MQTT. The device is the publishing client that transmits the data to the MQTT broker. In order to do so, the broker must allow the Raspberry Pi's IP address to connect to it. This IP address configuration is done in the EC2 instance hosting the Mosquitto broker. In code, the device holds the configuration needed to connect to the Mosquitto broker and send its payload, as well as the configuration needed to connect to PostgreSQL and store the metrics it measures.

Figure 3.1: Raspberry Pi 3 Model B+ used as the IoT data transmitter.
The device is connected via hotspot to simulate a mobile environment and is responsible for generating, compressing, and transmitting data to the MQTT broker.

The Raspberry Pi needs a wireless network connection such as WiFi, or some sort of constant cellular connection such as a SIM card. Since the Raspberry Pi model used does not support a SIM card for direct cellular access, an alternative solution was implemented. The device was provided with a hotspot connection through an iPhone 13 Pro Max. The iPhone's own WiFi connection was disabled so that its cellular data was routed exclusively to the Raspberry Pi. This setup ensured a consistent connection. The iPhone and the Raspberry Pi were kept in close proximity to each other to avoid internet disconnection. The phone's IP address does change from time to time; each change requires an update to the inbound security rules for both the Mosquitto broker and the PostgreSQL EC2 instances. An update is also required in the PostgreSQL pg_hba.conf file, which directs PostgreSQL to allow access from the IP address of the device attempting to connect. Figure 3.1 highlights the size of the Raspberry Pi to provide a physical scale of the device and the relevant ports used during testing:

1. The micro-USB power input.
2. HDMI output.
3. USB ports used to connect a mouse and keyboard.

3.3 EC2 Server and Broker Configuration

As Figure 3.2 shows, three AWS EC2 instances were set up to host and run the Mosquitto MQTT broker, the MQTT subscriber, and the Postgres database instead of using a laptop. This creates a more realistic simulation of data being transmitted via MQTT. All three EC2 servers are Linux based and use the t2.micro free-tier instance type provided by AWS. Each has its own security group, which holds the rules that allow different IP addresses to connect to the server. Each EC2 instance must have its security group configured to allow inbound connections from the device's IP address.
As mentioned before, Postgres needs an extra access control entry in its pg_hba.conf file. This allows the device to connect to the Postgres instance and write to the database.

To configure the MQTT environment we start with the Mosquitto broker. The Mosquitto broker was installed and run on the EC2 instance named MosquittoBroker for simplicity. With the EC2 configuration set up in AWS, a user signed into AWS can connect to the instance. Upon connecting to the instance we are met with an Ubuntu terminal that allows us to run the commands to install Mosquitto MQTT, enable it, and, once it is up and running, start or restart the broker. Mosquitto MQTT defaults to port 1883 and runs continually on the EC2 instance.

Figure 3.2: AWS EC2 Instances. The three instances utilized for testing: Subscriber client, Mosquitto MQTT Broker, and Postgres DB.

The subscriber client, named MosquittoSubscriber in EC2, uses an Ubuntu terminal like the Mosquitto broker; however, it differs from the broker by running Python scripts that connect to the MQTT broker and to the PostgreSQL database. The subscriber client's job is to subscribe to the topic that the publishing client is sending its messages to. Once the subscriber client runs, it sets up its configuration to connect to the Mosquitto broker and listens for any messages on the topic “test_status/#”. The hash sign is a wildcard character that matches every level below the given prefix. This wildcard comes in handy when there are different layers of the message we want to subscribe to. In this example the deviceId was added at the end of the topic: “test_status/abc123”, “test_status/abc456”, and so on. This allows us to send the device id and obtain it very quickly without needing to scan the entire JSON once decompressed. The hash wildcard allows us to receive all these messages.
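To make the wildcard behavior concrete, the sketch below shows how a multi-level `#` filter accepts topics that carry a device id as their last level. It is a simplified matcher written only for illustration: the function names `topic_matches` and `device_id_from_topic` are hypothetical, the topic name is written with an underscore as an assumption, and in a real deployment the broker performs this matching itself.

```python
def topic_matches(topic_filter: str, topic: str) -> bool:
    """Return True if `topic` matches the MQTT `topic_filter`.

    Simplified illustration: supports '#' (multi-level, last position only)
    and '+' (exactly one level). '$'-prefixed system topics are not handled.
    """
    filter_levels = topic_filter.split("/")
    topic_levels = topic.split("/")
    for i, level in enumerate(filter_levels):
        if level == "#":
            # '#' is only valid as the final level; it matches the parent
            # level and everything below it.
            return i == len(filter_levels) - 1
        if i >= len(topic_levels):
            return False
        if level != "+" and level != topic_levels[i]:
            return False
    return len(filter_levels) == len(topic_levels)


def device_id_from_topic(topic: str) -> str:
    """Read the device id from the last topic level (assumed layout)."""
    return topic.rsplit("/", 1)[-1]


if __name__ == "__main__":
    for t in ("test_status/abc123", "test_status/abc456", "other/abc123"):
        print(t, topic_matches("test_status/#", t), device_id_from_topic(t))
```

Reading the id from the topic string keeps the lookup independent of the payload, which is the benefit the text describes: the subscriber does not have to decompress and parse the JSON to know which device sent the message.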
The subscriber receives the message and identifies which compression algorithm was used by parsing the topic name, storing the value in a string field. The subscriber checks this field to decide whether the message payload needs to be decompressed with Gzip, LZ77, or LZW, or simply decoded back to JSON. During decompression the client measures the time it took to decompress, the latency, the throughput, and the data integrity (by performing a checksum), and sends these metrics to PostgreSQL for storage. The subscriber client uses the same files as the compression algorithms, but instead of the compression methods it calls the decompression methods in each.

During setup of the publishing and subscribing clients, an error occurred while installing the psycopg2 library, which is used to connect to the database, due to the absence of a GCC compiler, which is required to compile certain Python packages. This issue was resolved by installing the development tools and PostgreSQL libraries using sudo yum groupinstall "Development Tools" -y and sudo yum install postgresql-devel -y, followed by installing psycopg2-binary via pip. Once verified, remote access to the database was established via SSH and the PostgreSQL CLI, allowing real-time data to be stored and later analyzed.

PostgreSQL was used as the database solution to facilitate data storage between the publisher and subscriber clients, making it easy to collect and retrieve performance metrics from the EC2 instance after each experiment. Originally, metric data was going to be stored in CSV files; however, accessing and managing multiple CSV files remotely from the AWS EC2 instance proved inefficient and less flexible. Using a PostgreSQL database allowed for structured, queryable storage that could be accessed programmatically via a Python script from another machine. This approach simplified remote access, improved data integrity, and enabled more efficient querying and analysis of experimental results.
PostgreSQL contains six tables in total. The tables are grouped by experimental phase. Each table is named the same with the exception of the phase its metrics are from. The compressionmetrics, subscribermetrics, and batterytest tables for both Phase 1 and Phase 2 share the same structure. These are presented in Tables A.2, A.3, and A.4, respectively. Security group rules on AWS were configured to allow inbound TCP access on port 5432 for PostgreSQL.

3.4 Additional Hardware and Tools

Figure 3.3: MQTT QOS Levels. Image for the three QOS levels utilized by MQTT.

Table 3.1: INIU Power Bank Specifications

| Model | BI-B5 |
| Battery Capacity | 20000mAh / 74Wh |
| Input (USB-C) | DC 5V = 3A, 9V = 2.22A |
| Output (USB-C) | DC 5V = 3A, 9V = 2.22A, 12V = 1.5A |
| Output1/2 | DC 4.5V = 5A, 5V = 4.5A, 9V = 2A, 12V = 1.5A |

Table 3.1 provides the specifications of the battery. The purpose of the power bank shown in Figure 3.3 was to provide a mobile power source for the Raspberry Pi. A standard Micro-USB cable was used to connect the power bank to the device. While there are several methods to power a mobile device, a typical power bank was chosen for simplicity, availability, and low cost. This gave the device mobility, allowing its code to run while away from a wired connection to an outlet. However, the power bank did not provide the Raspberry Pi with any useful information such as battery percentage, voltage, or power consumption. As a result, obtaining battery metrics through software was not possible.

The other integral part of giving the Raspberry Pi mobility is the source of its internet connection. An iPhone 13 Pro Max was used to provide uninterrupted wireless internet to the device. The iPhone's built-in Hotspot feature is a feasible way to provide the device with a constant connection. However, some limitations may apply depending on the iPhone's cellular data plan. This constraint was not an issue in this research.
The only problem with using the power bank and iPhone is that moving the phone too far away from the Raspberry Pi causes it to lose its internet connection. It also became cumbersome to carry the Raspberry Pi, power bank, and iPhone together. This setup nevertheless provided the resources necessary for the device to be fully mobile. Other options could be explored, such as different ways of providing the Pi with an internet connection or a smaller battery that attaches to the device; however, these are not examined further in this research. Suffice it to say that the power bank and the iPhone's Hotspot feature provided the mobility needed for the Raspberry Pi to run the algorithms in any location, send the payload over MQTT, connect to the database, and save the measurements it captured.

Figure 3.3 shows the power bank used for the research and its size. The percentage display proved an unreliable way to measure how much power the Raspberry Pi and the compression algorithms used. One algorithm, or all of them together, may not have consumed even one percent of the battery, so the display could not clearly show the power consumption of each. Since the power bank did not provide the device with any other useful information, it was used only to power the device.

3.5 Network Configuration and Access Rules

Connecting the device to the cloud servers is straightforward. Each AWS EC2 instance has its own security group, whose inbound rules control which machines can connect to it. To allow another machine to connect, its IP address and the port must be saved in the security group. The IP addresses of the Raspberry Pi and the MqttSubscriber client, together with the TCP ports (port 1883 for the Mosquitto broker and port 5432 for the Postgres database), were stored in the rules for the Postgres instance and the Mosquitto broker, as well as in the Postgres pg_hba.conf file. The EC2 instances also have their own IP addresses, which are used to connect with other instances.
A minor issue arose when connecting the Raspberry Pi to the iPhone's Hotspot: a new IP address was assigned with every new Hotspot connection. The Hotspot on the phone was turned off when the device was not being used for testing, which reset the public IP address of the phone, so every new Hotspot connection came with a new IP address. This required updating the IP address in the security group inbound rules of each instance the device connected to. To obtain the public IP address of the phone, a simple Google search for “What is my ip address” from the Raspberry Pi (which had a web browser) returned many websites providing the address. With the IP addresses in the security group inbound rules, the subscriber client can connect to the Mosquitto broker, and the Raspberry Pi can connect to the Mosquitto broker and Postgres, without any issues.

3.6 Google Geolocation and Geofence Setup

To simulate a live device's geolocation and provide its real-time data, the Google Maps API was used to drive location-based alerts. The first step was to obtain an API key from Google. The API key authorizes access to Google's geolocation services. Using this key, the Raspberry Pi is able to make requests to the Google Geolocation RESTful endpoint at https://www.googleapis.com/geolocation/v1/geolocate?key=API_KEY. Before sending the request to Google, the Raspberry Pi performs a WiFi scan to obtain the MAC addresses and signal strengths of the surrounding access points. The WiFi scan data is attached to the Geolocation request as its payload. Upon successful execution, the Google Geolocation API returns the latitude and longitude derived from the MAC addresses provided. The Raspberry Pi holds in memory the latitudes and longitudes of the geofence locations for which the device should send an alert when it arrives at or leaves that location.
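The request described above might be assembled roughly as follows. This is a sketch under stated assumptions, not the thesis code: the helper names, the sample MAC addresses, and the choice to set `considerIp` to false are all illustrative. The request is only built here, not sent, so no API key or network access is needed to follow it.

```python
import json
import urllib.request

GEOLOCATE_URL = "https://www.googleapis.com/geolocation/v1/geolocate"


def build_geolocation_request(api_key, access_points):
    """Build (but do not send) a Geolocation API request.

    `access_points` is a list of (mac_address, signal_strength_dbm) pairs,
    such as the results of a WiFi scan.
    """
    body = {
        # Assumption for this sketch: rely on the WiFi scan only,
        # without falling back to the IP address.
        "considerIp": False,
        "wifiAccessPoints": [
            {"macAddress": mac, "signalStrength": rssi}
            for mac, rssi in access_points
        ],
    }
    return urllib.request.Request(
        f"{GEOLOCATE_URL}?key={api_key}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_location(response_text):
    """Extract (latitude, longitude) from a Geolocation API JSON response."""
    payload = json.loads(response_text)
    return payload["location"]["lat"], payload["location"]["lng"]


if __name__ == "__main__":
    scan = [("00:25:9c:cf:1c:ac", -43), ("00:25:9c:cf:1c:ad", -55)]  # sample values
    request = build_geolocation_request("API_KEY", scan)
    print(request.full_url)
    print(request.data.decode("utf-8"))
```

Sending the request would be a matter of passing it to `urllib.request.urlopen` and handing the response body to `parse_location`.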
Table 3.2: Defined Geofences Used in the Simulation

| Location Name | Latitude | Longitude | Radius (m) |
| Home | 1.10000 | -1.10000 | 150 |
| Work | 1.10000 | -1.10000 | 150 |
| School | 41.19473 | -111.94146 | 150 |
| Church | 1.10000 | -1.10000 | 150 |
| Parents Home | 1.10000 | -1.10000 | 150 |

Table 3.2 shows the different geofences used. Some latitudes and longitudes were omitted for privacy reasons and replaced with dummy values; the intent is to show what the geofences look like. Real latitudes and longitudes were nevertheless given to the Raspberry Pi to calculate whether it is within the radius of any of the geofences. The radius value can be configured: the smaller the number, the more precisely a location can be pinpointed, and the larger the number, the more generalized the location. To calculate the distance, the Raspberry Pi uses the Haversine formula, taking the latitude and longitude of the device's current location provided by Google and those of each geofence it has in memory. The formula calculates the distance between two points on a sphere such as the Earth. The Haversine formula used in this implementation is based on the method described in the paper “location-based services for presence systems” [12].

First, the intermediate value a is calculated from the latitudes and longitudes:

$$a = \sin^2\left(\frac{\Delta\mathrm{lat}}{2}\right) + \cos(\mathrm{lat}_1)\,\cos(\mathrm{lat}_2)\,\sin^2\left(\frac{\Delta\mathrm{long}}{2}\right) \tag{1}$$

The angular distance c is then calculated:

$$c = 2\,\operatorname{atan2}\left(\sqrt{a},\ \sqrt{1-a}\right) \tag{2}$$

Finally, the distance between the two locations is calculated:

$$d = R \cdot c \tag{3}$$

R is the radius of the Earth, which is approximately 3,959 miles (6,371 kilometers) [13], and d is the calculated distance, which is in meters when R is expressed in meters. If the distance is less than or equal to the geofence radius, the device is considered to be inside that geofence area. The Raspberry Pi then sends an “arrived” or “left the area” alert based on the result for each location.
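Equations (1) to (3) and the radius test translate directly into a few lines of Python. The sketch below is illustrative rather than the thesis implementation: the function names and the dictionary layout of a geofence are assumptions, and R is taken as 6,371,000 m so that the result is in meters.

```python
import math

EARTH_RADIUS_M = 6_371_000  # 6,371 km expressed in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_lat = math.radians(lat2 - lat1)
    d_lon = math.radians(lon2 - lon1)
    # Equation (1)
    a = (math.sin(d_lat / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lon / 2) ** 2)
    # Equation (2)
    c = 2 * math.atan2(math.sqrt(a), math.sqrt(1 - a))
    # Equation (3)
    return EARTH_RADIUS_M * c


def inside_geofence(lat, lon, fence):
    """True when the point lies within the fence radius (inclusive)."""
    return haversine_m(lat, lon, fence["lat"], fence["lon"]) <= fence["radius_m"]


if __name__ == "__main__":
    # The School entry from Table 3.2.
    school = {"lat": 41.19473, "lon": -111.94146, "radius_m": 150}
    print(inside_geofence(41.19573, -111.94146, school))  # about 111 m north
    print(inside_geofence(41.19673, -111.94146, school))  # about 222 m north
```

An arrival or departure alert then follows from comparing the result of `inside_geofence` for the current reading with the result for the previous one.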
3.7 Compression Algorithm Implementation

The core objective of this thesis is to evaluate the performance of different data compression algorithms in the context of IoT alert systems. This thesis examines three lossless compression techniques (Gzip, LZ77, and LZW). Each algorithm was tested using the same JSON object size and evaluated on two primary factors: the time required to compress the data, and the accuracy with which the original data could be reconstructed after decompression. These algorithms are categorized as lossless because they restore the original data exactly, with no information lost during compression or decompression. Although exact restoration was expected given the lossless nature of the algorithms, it was still important to validate this through testing to ensure that no anomalies occurred during the compression-decompression process. In real-world emergency alerts, speed and data integrity are critical to saving lives. Any delay or loss of information could have life-threatening consequences. The goal is to analyze how well each algorithm performs, not only against the others but also against a raw uncompressed JSON baseline, to determine which algorithm, if any, proves to be the most efficient for use in time-sensitive IoT environments. The use of compression techniques such as Gzip, LZ77, and LZW is well documented for efficiently reducing data size while preserving data integrity [14][15].

3.7.1 GZIP

GZIP is a data compression method available in Python through the zlib library module. The module provides compression and decompression functions. Internally it combines LZ77 compression with Huffman coding. The input must be a bytes-like object; otherwise an error is raised.
This is done by taking the JSON object and encoding it in UTF-8, which turns the object into a stream of bytes that can be safely compressed by the algorithm. Underneath, the module calls its compression method, which returns a bytes object containing the compressed data. Because it uses LZ77, it keeps a history buffer (window) in which it looks for patterns or repeated sequences; it then uses Huffman coding to assign shorter binary codes to the symbols that occur most often. Decoding applies the reverse: Huffman decoding is performed first, followed by LZ77 decompression [16][17][18][19].

3.7.2 LZ77

LZ77 was created by Jacob Ziv and Abraham Lempel in 1977 and is widely used for text compression. The idea is to minimize the redundancy of frequently repeated words by emitting a (jump, length) pair as a placeholder whenever a previously seen sequence appears again. The pair works like a pointer, where the jump is how far back to go to find the earlier occurrence and the length is how long the matching string is. The algorithm uses a sliding window and a look-ahead buffer to identify sequences and find the longest match within that window. The implementation of this algorithm in this thesis returns a bitarray [20][21]. After compression is completed, the returned bitarray is converted to a bytes object using the .tobytes() method so that it can be sent through MQTT.

3.7.3 LZW

The LZW algorithm builds upon the work of Ziv and Lempel and was developed by Terry A. Welch in 1984 as a variation of their LZ78 algorithm, the dictionary-based successor to LZ77. LZW is a dictionary-based compression algorithm. It uses a single code number to represent a substring. The algorithm's compression strategy relies on a translation table, a dictionary that maps strings of characters to fixed-length codes; 12 bits is the most common code length. It stores the strings that it has encountered before and maintains a prefix property for each string stored (every prefix of a stored string is also in the table).
The algorithm also uses greedy parsing, in which the input string is examined in a single pass. The LZW implementation in the thesis returns a list of codes representing the compressed form of the JSON input, which is then serialized using Python's struct module to convert it into a binary format that can be sent through MQTT [22][23].

3.7.4 Raw JSON (Baseline)

For Raw JSON no compression algorithm was used. The JSON object was simply converted into a JSON string and encoded as UTF-8 to comply with MQTT's payload requirement.

3.7.5 Huffman Coding

Huffman Coding is used internally by Gzip and by many other algorithms. It is a lossless compression method that assigns binary codes to input symbols based on their frequency. The method constructs a binary tree from the bottom up, pairing the two least probable symbols at each step until the tree is built. Frequent symbols receive shorter codes while less frequent ones get longer codes [24][25].

3.8 Performance Testing Procedure

To evaluate the performance of the different compression algorithms, two experiments were conducted. In this thesis, these experiments are referred to as phases: Phase 1 is a controlled environment, and Phase 2 is a realistic simulation of IoT alerts. Both were designed to measure resource usage, compression efficiency, and transmission viability from the Raspberry Pi to the EC2 instance via the Mosquitto broker.

3.8.1 Phase 1

In Phase 1 a batch of 40 structured JSON packets was generated and passed to each compression algorithm. These packets included a wide range of information to simulate real-world data from an IoT alert device.

Figure 3.4: Phase 1 controlled environment workflow

The packets contained a packet id, a timestamp of when the packet was created, the device id, user information such as name and address, a location, a message, the severity of the alert, critical data, the alert type, location history, and health logs.
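As a rough sketch of one Phase 1 iteration for a single algorithm, the code below builds a packet with a subset of the fields listed above, encodes it as UTF-8 JSON, compresses it with Gzip, times the compression, and confirms the round trip with a checksum. The field spellings, sample values, and function names are assumptions made for illustration, Python's `gzip` module stands in for whichever Gzip wrapper the thesis code uses, and SHA-256 is one possible choice of checksum.

```python
import gzip
import hashlib
import json
import time


def build_packet(packet_id):
    """Build an illustrative alert packet (placeholder values throughout)."""
    return {
        "packet_id": packet_id,
        "timestamp": "2025-01-01T00:00:00Z",
        "device_id": "RPi003",
        "alert_type": "HEALTH_CHECK",
        "severity": "LOW",
        "message": "Routine status report",
        "location": {"latitude": 41.19473, "longitude": -111.94146},
        "health_logs": [
            {"heart_rate": 70 + i % 5, "blood_pressure": "120/80"}
            for i in range(50)
        ],
    }


def compress_packet(packet):
    """Return (raw_bytes, compressed_bytes, seconds_spent_compressing)."""
    raw = json.dumps(packet).encode("utf-8")  # gzip needs a bytes-like object
    start = time.perf_counter()
    compressed = gzip.compress(raw)
    elapsed = time.perf_counter() - start
    return raw, compressed, elapsed


def verify_round_trip(raw, compressed):
    """Decompress and compare SHA-256 digests as the integrity check."""
    restored = gzip.decompress(compressed)
    return hashlib.sha256(restored).digest() == hashlib.sha256(raw).digest()


if __name__ == "__main__":
    raw, compressed, elapsed = compress_packet(build_packet(1))
    ratio = len(raw) / len(compressed)
    print(f"raw={len(raw)} B, compressed={len(compressed)} B, ratio={ratio:.2f}")
    print(f"compress time={elapsed * 1000:.3f} ms, intact={verify_round_trip(raw, compressed)}")
```

The compressed bytes are what would be handed to the MQTT client as the message payload; the raw JSON baseline simply skips the `gzip.compress` call and publishes `raw` directly.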
More details of the packet used in Phase 1 can be found in Table 3.3. After a packet has been generated and sent through the first compression method, a new packet is created on each iteration of the loop. It would be quicker to generate the packets once and save them to a file; however, in a real-world scenario devices won't have pre-saved files to use. Instead, the data is generated on the spot to mimic a real device. Figure 3.4 shows the flow of Phase 1: each packet is dynamically generated and passed through the selected compression algorithm before being transmitted via MQTT. Measurements are captured during the compression process and after transmission via MQTT.

Phase 1 not only served as a controlled benchmark for comparing the compression algorithms but also revealed which algorithms handled the different payload sizes most effectively. LZ77 and LZW took an excessively long time to compress 100MB of data. Gzip handled 100MB, but the Raspberry Pi ran out of memory attempting to use Gzip to compress 200MB. Sending Raw JSON at 100MB took over 15 minutes, and it struggled with 200MB. Testing with different payload sizes proved extremely useful for finding the limits of the algorithms and of the device itself.

Table 3.3: Structure of a Data Packet Sent During Phase 1

| Field | Type | Description |
| --- | --- | --- |
| packet id | Integer | Unique identifier for the packet. |
| timestamp | String (ISO 8601) | Time when the packet was generated. |
| device id | String | Identifier for the device sending the packet (e.g., RPi003). |
| user info | Object | Contains user details including name, age, medical conditions, emergency contacts, etc. |
| alert type | String | Type of alert, such as FALL DETECTED, DEVICE STATUS, HEALTH CHECK. |
| message | String | Informational message describing the event. |
| severity | String | Severity level of the event: HIGH, MEDIUM, or LOW. |
| critical data | String | Core payload information (e.g., alert or status message). |
| location | Object | Contains latitude and longitude values. |
| location history | Array of Objects | Historical GPS data including past locations with timestamps. |
| health logs | Array of Objects | Time-stamped logs of health metrics like heart rate and blood pressure. |

3.8.2 Phase 2

The objective of Phase 2 is to simulate a real-world IoT alert device for elderly people and veterans, capturing the same metrics as Phase 1 in a less controlled environment. Data is critical to the safety of the elderly and veterans: the more information an IoT device can provide, the faster the necessary aid can reach them. Phase 2 simulates three real-life scenarios that can be life-critical for seniors and veterans, and the simulation loops between the three scenarios. Figure 3.5 shows the high-level design of Phase 2.

Figure 3.5: Phase 2 real-world simulation workflow

Phase 2 takes a similar approach to Phase 1, in that alert packets are generated to create the desired payload size. The alert packets are triggered when an alert event happens. This test uses the data size that worked for all the algorithms in Phase 1, which is 20MB. The payload of each alert event contains the user information, the device information, and the data from the triggered alert event. More details of the packet used in Phase 2 can be found in Table 3.4. These alerts are not generated in sequential order; they are based on scheduled intervals, which lets the device behave like a real IoT device whose alerts are triggered much as real-life alerts would be. The alert events simulated are heart rate, blood pressure, and geofence. There is, however, a slight difference between this flow and the Phase 1 test: this phase simulates a real-world device that needs to report from different sensors to maintain accurate monitoring of the elderly person or veteran.
Because these alerts do not trigger in sequential order as in Phase 1, the compression algorithms are put to the test to see how well each performs under the demands of real-time processing. Each algorithm operates independently in a non-blocking environment, handling alerts as they arrive. This ensures that the system mimics real-world unpredictability and concurrency, where alerts must be processed and transmitted quickly without missing the next alert. The goal of this phase is to evaluate how efficiently each algorithm manages these asynchronous, high-priority alerts while maintaining performance, integrity, and minimal resource usage. This makes it possible to detect any packets lost because an algorithm was not ready to process the next alert event.

Scheduling and Timing: In Phase 2, I set up a time tracker that records the last heart rate time, blood pressure time, and geofence time. These are used to determine which scenario to run during the Phase 2 loop. Each variable initially holds the current time and is used together with an interval value. There are three interval variables: heart rate has an interval of three minutes, blood pressure five minutes, and geofence one minute. On each pass through the loop, the stored heart rate time is subtracted from the current time to see whether the difference is equal to or greater than the interval. The same logic is applied for blood pressure and geofence. If any of these conditions is true, the corresponding scenario is run.

Heart Rate Sensor: For the heart rate sensor, I randomly assign an integer between 60 and 120. These numbers closely represent the range in which a healthy person's heart rate should fall. There are three severity levels, High, Normal, and Low, which are assigned depending on the random number generated.
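The interval check described under Scheduling and Timing can be sketched roughly as follows. This is a minimal illustration, not the thesis code: the names INTERVALS and due_scenarios are hypothetical, and the clock readings are simulated numbers in seconds so the example runs instantly (a real loop would presumably read something like time.monotonic()).

```python
# Interval per scenario in seconds, taken from the text:
# heart rate 3 minutes, blood pressure 5 minutes, geofence 1 minute.
INTERVALS = {"heart_rate": 180, "blood_pressure": 300, "geofence": 60}


def due_scenarios(last_run: dict, now: float, intervals: dict = INTERVALS) -> list:
    """Return scenarios whose interval has elapsed and reset their stored time."""
    due = []
    for name, interval in intervals.items():
        # Stored time is subtracted from the current time and compared
        # against the interval (equal to or greater than triggers a run).
        if now - last_run[name] >= interval:
            due.append(name)
            last_run[name] = now
    return due


# Simulated clock: every tracker starts at time 0.
last_run = {name: 0 for name in INTERVALS}
print(due_scenarios(last_run, 59))   # []
print(due_scenarios(last_run, 60))   # ['geofence']
print(due_scenarios(last_run, 180))  # ['heart_rate', 'geofence']
```

Resetting the stored time only when a scenario actually runs keeps each sensor on its own cadence, independent of how often the loop itself iterates.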
For the heart rate sensor, High severity is returned if the heart rate is 120 or above, Low is returned if the heart rate is 60 or below, and Normal is returned otherwise. If either High or Low is returned, a data packet is prepared to be compressed and sent through MQTT.

Blood Pressure Sensor: The blood pressure sensor has a similar design to the heart rate sensor. It is assigned a random integer between 90 and 140 that represents the systolic blood pressure. If the number is lower than 90 or above 140, an alert event is triggered, and its corresponding data packet is compressed and transmitted over MQTT.

Geolocation and Geofence: After acquiring the current location using Google's Geolocation API, the device evaluates whether it has entered or exited any predefined geofence. These geofences are stored in memory as a list of latitude and longitude pairs with configurable radius values. In Phase 2, the Raspberry Pi checks every minute whether the device is inside or outside any geofence it has saved. The Haversine formula is used to calculate the distance between the device's current location and the geofence's location. The distance is then compared against the geofence's radius, and if it is less than or equal to the radius, a packet is prepared, compressed, and sent to the MQTT broker.

Table 3.4: Structure of a Data Packet Sent During Phase 2

| Field | Type | Description |
| --- | --- | --- |
| packet id | String (UUID) | Unique identifier for the packet. |
| timestamp | String (ISO 8601) | Timestamp when the packet was generated. |
| device id | String | Identifier of the Raspberry Pi or IoT device. |
| user info | Object | Contains user metadata: name, age, medical conditions, allergies, emergency contacts, veteran status, preferred hospital. |
| alert type | String | Specifies the type of alert (e.g., HEART RATE, BLOOD PRESSURE, GEOFENCE ARRIVAL, GEOFENCE DEPARTURE). |
| device firmware version | String | Current firmware version of the device (e.g., V1.17). |
| message | String | A readable message summarizing the alert event. |
| severity | String | Severity level of the alert (HIGH, MEDIUM, LOW). |
| critical data | Mixed | Specific sensor readings or alert payloads (e.g., WiFi data or alert text). |
| location | Object | Current GPS coordinates (latitude, longitude) retrieved from Google's Geolocation API. |
| location history | Array of Objects | Sequence of previously recorded locations including timestamp and associated geofence (e.g., Home, Work). |
| health logs | Array of Objects | Series of health metrics including timestamp, heart rate, heart status, blood pressure, and BP status. |

3.9 Metrics and Measurement

To evaluate the effectiveness of compression techniques in the context of MQTT-based IoT communication, a variety of performance metrics were collected on both the publishing and subscribing clients. These metrics assess the efficiency and effectiveness of the compression algorithms (Gzip, LZ77, LZW), the impact on MQTT data transmission, and the resource consumption on a constrained device like the Raspberry Pi. A successful compression method should demonstrate high compression efficiency (for example, reduced data size and fast execution), minimal impact on MQTT communication (low latency, low packet loss, and high throughput), low CPU and memory usage, and high data integrity after decompression.

3.9.0.1 Compressed Data Size
The final size of the data packet after compression, measured in bytes. This metric shows how much smaller the data packet is after being compressed. It is captured right after each algorithm performs its compression.

3.9.0.2 Compression Ratio
Defined as the ratio of the original size to the compressed size. A higher ratio indicates better compression. For example, a ratio of 4.0 means the original data was four times larger than the compressed data.
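A small worked example may help pin down the direction of the ratio. The sketch below assumes the ratio is original size divided by compressed size, which is the reading consistent with the 4.0 example above; the helper name compression_ratio and the sample payload are hypothetical, and gzip stands in for any of the algorithms.

```python
import gzip
import json


def compression_ratio(original: bytes, compressed: bytes) -> float:
    """Original size divided by compressed size; higher means better compression."""
    return len(original) / len(compressed)


# The 4.0 case from the text: 400 original bytes against 100 compressed bytes.
print(compression_ratio(b"x" * 400, b"y" * 100))  # 4.0

# A repetitive JSON payload (placeholder data) compressed with gzip.
sample = json.dumps(
    {"health_logs": [{"heart_rate": 72, "bp_status": "NORMAL"}] * 200}
).encode("utf-8")
packed = gzip.compress(sample)
ratio = compression_ratio(sample, packed)
print(len(sample), len(packed), round(ratio, 2))
```

With this definition an uncompressed payload has a ratio of exactly 1, which is why raw JSON can serve as the baseline.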
3.9.0.3 Compression Time
The time (in milliseconds) it takes to compress the data. This metric shows the speed of each compression algorithm, which is important in real-time IoT systems.

3.9.0.4 Packet Loss
This metric compares the number of packets sent with the number successfully received. If 1,000 alert packets are generated, each compression algorithm is expected to compress every alert packet, send it over MQTT, and save the metrics in PostgreSQL. In Phase 1, all generated packets were compressed, sent, and received. Phase 2, being a real-time scenario, shows which compression algorithms struggled to keep up with the alert packets generated.

3.9.0.5 Data Integrity Check
Not to be confused with packet loss, this metric checks the integrity of the data after compression and decompression by the lossless compression algorithms. A CRC32 checksum is calculated using Python's zlib.crc32() function, and the checksum of the original data is compared against that of the decompressed data. A match between the two confirms that no corruption occurred during transmission or processing.

3.9.0.6 Decompression Time
The time required to restore the compressed data to its original form. This metric is captured on the subscriber client, where decompression takes place after the data is received.

3.9.0.7 MQTT Publish Time
The time it takes to transmit a packet from the publisher (Raspberry Pi) to the broker. It is measured from the start to the end of sending the data via MQTT.

3.9.0.8 Throughput
The rate at which compressed data is successfully transmitted over the network. It is calculated by dividing the compressed data size by the measured latency. This metric is expressed in bytes per second (Bps) and reflects the data transmission efficiency.

3.9.0.9 Latency
The time delay between publishing a packet on the sender side and receiving it on the subscriber side.
It is calculated by subtracting the original send timestamp from the time of receipt, providing a one-way latency measurement in seconds.

3.9.0.10 CPU Usage
Measures the percentage of processor time consumed during the compression operation. This was calculated by tracking the CPU time used by the process during compression and comparing it to the actual wall-clock time taken to perform the task.

3.9.0.11 Memory Usage
The amount of RAM consumed during the compression process. This was measured by tracking the change in the process's resident memory (RSS) before and after compression using the psutil library.

3.9.0.12 Battery Uptime
This power consumption metric tracks the amount of time the Raspberry Pi remains operational during the test phase, measured from the moment it begins sending packets until the packet is transmitted via MQTT. It provides insight into how each compression algorithm affects the energy efficiency of the device.

CHAPTER 4
Results and Discussion

Attempts to test with large payloads ran into issues when compressing large-scale JSON data using GZIP, LZ77, and LZW on a Raspberry Pi and an AWS EC2 instance. The initial intent was to evaluate how each algorithm handled payloads of 200MB, 100MB, 75MB, 50MB, and 20MB. However, during testing, severe memory constraints, excessive compression time, and system failures led to the decision to reconsider large-scale runs.

200MB Payload Test
The Raspberry Pi was unable to handle any of the compression methods at 200MB. The raw JSON transmission resulted in an Out-Of-Memory (OOM) kill at approximately 793MB of memory usage. GZIP also failed under similar circumstances, consuming around 489MB before being terminated by the system's OOM killer. LZW consumed approximately 807MB of memory and was also killed. LZ77 did not crash, but after more than 4 hours it still had not finished compressing, rendering it infeasible in a real-time alert setting.
Given the excessive time and memory requirements, this test was abandoned for all algorithms.

100MB Payload Test
At 100MB, the Raspberry Pi was able to compress the data and send it via MQTT, and metrics were successfully recorded in PostgreSQL. However, the EC2 instance encountered issues on the receiving end. Decoding raw JSON stalled the instance, requiring a system reboot. Because of this, the raw format was excluded from further testing. GZIP worked well on the Pi but failed during decompression on the EC2 instance due to memory limitations of the free-tier configuration. LZ77 and LZW both took over an hour to compress the payload, and neither completed the task. These tests were marked as infeasible for real-time environments and were halted accordingly.

75MB and 50MB Payload Test
By the time the 75MB and 50MB tests were conducted, earlier observations had revealed the growing impracticality of certain algorithms at higher payload sizes. RAW and GZIP continued to push the limits of available memory and did not consistently finish on either the Pi or EC2. LZ77 and LZW still performed poorly, with compression times remaining over an hour or ending in OOM termination. At this point, it became evident that payload sizes at or above 75MB would not be suitable for large-scale or real-time simulation, particularly on hardware like the Raspberry Pi.

A payload of 20MB proved to be the size at which the Raspberry Pi could successfully compress the data with each algorithm, transmit it as an MQTT message, and save the metrics in PostgreSQL. While MQTT allows a maximum payload of 256MB, resource constraints such as memory and the EC2 free-tier instance made it infeasible to test beyond 20MB. These limitations reflect the reality of low-powered IoT devices, where sending large payloads is constrained by memory and CPU.
Therefore, the findings of this thesis are not meant to assess performance at the MQTT protocol limit, but rather to evaluate how the compression algorithms perform. Although testing near the 256MB payload limit was desired, 20MB proved sufficient to reveal how well these algorithms compress the payload. During testing, LZ77 was the slowest of the compression algorithms. As a result, testing with 1,000 packets was impractical, as it would have required several days to complete. Instead, 160 packets were tested in Phase 1, 40 for each algorithm.

4.1 Phase 1 Results

The observations quickly highlighted the performance differences, making it evident that increasing the number of test packets to 1,000 would not substantially alter the conclusions. Although the desired payload size of 200MB was not reached because of hardware limitations and software-related time and complexity constraints, 20MB still provided meaningful insights and represents a sizable amount of data. The Phase 1 test results highlight the strengths and weaknesses of each compression algorithm. Each figure discussed here presents one of the performance metrics defined earlier in this thesis. These individual metrics are then analyzed collectively to determine which algorithm, if any, outperforms the baseline of raw JSON data.

4.1.1 Compression Size

Each algorithm was given randomly generated data amounting to 20MB. Figure 4.1 shows that GZIP and LZW performed better than LZ77 and raw JSON. GZIP compressed the data somewhat more than LZW, showing that it can significantly reduce the size of the payload to be transmitted over MQTT.

Figure 4.1: Average Compressed Size Across Algorithms for 20MB Payload. This graph compares the effectiveness of each compression algorithm in reducing data size. Lower values indicate better compression efficiency.
4.1.2 Phase 1 Compression Ratio

The average compression ratios in Figure 4.2 confirm the compressed sizes in Figure 4.1. The compression ratio highlights how well each algorithm performed in compressing the original data. Raw JSON has a ratio of 1 because no compression is done, so it sets the benchmark of the original data. On average, GZIP's compressed data is 9.8 times smaller than the original data. LZW performed strongly, averaging 5.71, not as good as GZIP but still effective. LZ77 was third at 3.66. The results in Figure 4.1 and Figure 4.2 show that using a compression algorithm is very effective when the aim is to send a large payload over MQTT.

Figure 4.2: Average Compression Ratio Across Algorithms for 20MB Payload. This chart includes a RAW baseline to compare compression performance. Higher ratios indicate better compression efficiency compared to the original uncompressed data.

4.1.3 Phase 1 Compression vs Decompression Time

Although compressing the data into fewer bytes is the goal, the speed of that compression is just as important. Figure 4.3 displays the time each algorithm took to compress the data and the time taken to decompress it once received by the EC2 subscriber instance. Raw JSON was expected to be the fastest because no compression is done: the JSON object is encoded into bytes to be sent over MQTT, and once the EC2 subscriber receives it, the payload is decoded back into a JSON object. It is indeed the fastest; however, the entire 20MB payload is sent over MQTT, as seen in Figure 4.1. LZ77 took approximately five and a half minutes, the longest compression time of all the algorithms. Although the decompression time for LZ77 is 3.37 seconds, this figure should be interpreted with caution. Post-decompression analysis in Figure 4.4 revealed that the integrity of the data did not match the original input for LZ77, indicating potential issues with the decompression process or the fidelity of the algorithm's implementation. The integrity check is performed only in Phase 1, as the results confirm that the compression algorithms maintain data integrity due to their lossless nature. Consequently, integrity validation is not repeated in Phase 2, since the algorithms are expected to behave consistently across both phases. LZW demonstrated strong compression capabilities and would perform well in scenarios where the payload is not so large. GZIP's decompression time is almost instant, highlighting its efficiency and suitability for real-time applications.

Figure 4.3: Average Compression and Decompression Time. Shows the full time of compression with decompression for each algorithm.

Figure 4.4: Checksum Integrity Check. A checksum is calculated from the original data and sent to the receiver. The receiver recalculates the checksum from the received data and compares it with the transmitted checksum. If the values match, the data is considered intact; otherwise, it indicates possible data corruption.

4.1.4 Phase 1 Throughput

The throughput values in Figure 4.5 show how efficiently each algorithm can transmit compressed data over the network. Measured as compressed data size divided by publish time, the raw JSON string had the highest throughput at 962,807 Bps, since the data was not compressed. Without any processing delay, the Raspberry Pi was able to send more data via MQTT at very high speeds. However, the volume of data that raw JSON puts out could lead to bandwidth constraints in real-world environments.

Figure 4.5: Average Throughput per Algorithm. This bar chart visualizes the efficiency of data transmission, calculated as compressed size divided by publish time (bytes per second). Higher throughput indicates faster, more efficient delivery.
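The checksum comparison described for Figure 4.4 and the throughput calculation used in Figure 4.5 can be sketched as below. This is an illustrative reconstruction, not the thesis code: zlib.crc32() is the function named in Section 3.9.0.5, but the helper names, the sample payload, and the use of gzip for the round trip are assumptions.

```python
import gzip
import zlib


def crc32_of(data: bytes) -> int:
    """CRC32 checksum as an unsigned 32-bit integer."""
    return zlib.crc32(data) & 0xFFFFFFFF


def is_intact(original_checksum: int, received: bytes) -> bool:
    """Receiver side: recompute the checksum and compare with the sender's."""
    return crc32_of(received) == original_checksum


def throughput_bps(compressed_size: int, publish_seconds: float) -> float:
    """Throughput in bytes per second: compressed size divided by publish time."""
    return compressed_size / publish_seconds


original = b'{"alert_type": "HEART RATE", "severity": "HIGH"}'
checksum = crc32_of(original)          # computed by the publisher
payload = gzip.compress(original)      # what would travel over MQTT
received = gzip.decompress(payload)    # subscriber-side decompression

print(is_intact(checksum, received))   # True

# Overwrite the first byte to mimic a faulty decompression result.
corrupted = b"X" + received[1:]
print(is_intact(checksum, corrupted))  # False

print(throughput_bps(1000, 2.0))       # 500.0
```

A mismatch such as the one reported for LZ77 only says that the decompressed bytes differ from the original; it does not by itself show whether the compression or the decompression step introduced the difference.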
4.1.5 Phase 1 Latency

Latency results in Figure 4.6 show how long each algorithm's packets took to travel from the publisher (Raspberry Pi) to the subscriber (EC2 instance). Raw JSON, as expected, demonstrated the lowest latency at 25.92 seconds, as no compression or decompression took place. GZIP followed at 71.67 seconds, offering a more balanced trade-off between speed and size. LZW introduced higher latency at 188.88 seconds, while LZ77 had the worst performance at 548.99 seconds. These findings suggest that while compression can save bandwidth, it may significantly delay message delivery, especially in real-time environments. For latency-sensitive systems, RAW or GZIP would be preferable.

Figure 4.6: Average Latency per Algorithm. This chart shows how long it took for each packet to travel from publisher to subscriber. Latency is crucial in evaluating real-time system responsiveness.

4.1.6 Phase 1 Packet Loss

In terms of the reliability of packet transmission, every packet transmitted from the publisher was received by the subscriber. This does simulate a real-world situation. Figure 4.7 shows that no packets were lost in transmission.

Figure 4.7: Phase 1 Packet Delivery. All algorithms successfully delivered packets in a controlled environment with sequential compression, confirming reliable operation without packet loss.

4.1.7 Phase 1 Publish Time

Figure 4.8 highlights how quickly the publisher published a message to the broker using each compression method. GZIP achieved the fastest publish time, demonstrating its efficiency even when processing compressed payloads. In contrast, raw JSON exhibited the slowest publish time due to its significantly larger payload size, which took longer to transmit over the network.

4.1.8 Phase 1 CPU Usage

Figure 4.9 reveals the CPU usage of each algorithm before, during, and after compression.
LZ77 stands out the most, having consumed an extremely high 315.13% CPU during compression. This suggests that it may be unsuitable for constrained environments. In contrast, LZW was more efficient than LZ77 at 75.86% CPU usage but still significantly higher than GZIP at 7.24% and raw JSON at 2.6%. Raw JSON performed slightly better than GZIP, but GZIP provides the better balance: it compresses the payload while keeping CPU usage low.

Figure 4.8: MQTT Publish Time Variability. This bar graph shows the range and distribution of publish times per algorithm, reflecting consistency and delay variability in network transmission.

4.1.9 Phase 1 Memory Usage

Figure 4.10 highlights the average memory (RAM) used during compression. These numbers bear a slight resemblance to the compressed sizes; however, compressed size and memory usage highlight different aspects of performance. The clear winner is GZIP, which used the least RAM, making it efficient for real-time devices. High memory usage could lead to instability or performance bottlenecks, as mentioned earlier in this section where issues arose with larger payloads. This behavior was likely influenced by the nature of the algorithms' internal processing, which involves managing additional data structures throughout compression. These trends emphasize that when selecting a compression method for resource-constrained systems like the Raspberry Pi, memory efficiency is just as critical as speed or compression effectiveness.

Figure 4.9: CPU Usage During Compression. This grouped bar chart illustrates CPU usage before, during, and after compression for each algorithm, indicating processing intensity.

4.1.10 Phase 1 Battery Uptime

Figure 4.11 and Table 4.2 show that raw JSON consistently maintained the most stable power demand.
Despite their compression complexity, GZIP and LZW demonstrated moderate consistency, with comparable means and a similar balance overall. The Coefficient of Variation (CV) and the Quartile Coefficient of Dispersion (QCD) can be used to characterize the dispersion of the battery metrics [24]. CV expresses the relative standard deviation (commonly reported as a percentage):

$$\mathrm{CV} = \frac{\text{Standard Deviation}}{\text{Mean}}$$

QCD measures the relative dispersion of the middle 50% of the dataset, where $Q_1$ is the 25th percentile and $Q_3$ is the 75th percentile:

$$\mathrm{QCD} = \frac{Q_3 - Q_1}{Q_3 + Q_1}$$

Figure 4.10: Average Memory Usage per Algorithm. This bar chart shows the memory (RAM) consumed during compression. Lower memory usage is favorable for resource-constrained IoT devices like the Raspberry Pi.

Table 4.1: Coefficient of Variation and Quartile Coefficient of Dispersion

| Algorithm | CV | QCD |
| --- | --- | --- |
| GZIP | 0.59 | 0.49 |
| LZ77 | 0.57 | 0.47 |
| LZW | 0.58 | 0.49 |
| RAW | 0.59 | 0.49 |

Figure 4.11: Average Battery Usage per Algorithm. This bar chart shows the uptime or battery consumption of each algorithm's full process: compression, transmission via MQTT, and saving metrics in Postgres.

Table 4.2: Battery Uptime Summary Statistics (in Hours)

| Statistic | GZIP | LZ77 | LZW | RAW |
| --- | --- | --- | --- | --- |
| Count | 40 | 40 | 40 | 40 |
| Mean | 3.18 | 3.31 | 3.21 | 3.16 |
| Std | 1.88 | 1.88 | 1.88 | 1.88 |
| Min | 0.03 | 0.17 | 0.06 | 0.01 |
| 25% | 1.61 | 1.74 | 1.64 | 1.60 |
| 50% | 3.17 | 3.30 | 3.20 | 3.16 |
| 75% | 4.75 | 4.88 | 4.78 | 4.74 |
| Max | 6.30 | 6.44 | 6.33 | 6.29 |

In Table 4.1, both CV and QCD indicate that LZ77 yielded the most stable battery usage across packets, with lower variation in both total range and interquartile spread. GZIP and RAW showed the highest variability, which may reflect differences in payload processing behavior or packet-level fluctuations. Overall, all algorithms operated within a fairly consistent battery usage range.

4.2 Phase 2 Results

Testing with a 20MB payload, the results captured the behavior of each algorithm under real-world demands. Phase 2 simulates heart rate, blood pressure, and geolocation alerts, each triggered at a scheduled interval. The goal of this phase is to evaluate how efficiently each algorithm manages these asynchronous, high-priority alerts while maintaining performance, integrity, and minimal resource usage. This makes it possible to detect any packets lost because an algorithm was not ready to process the next alert event, helping identify limitations in concurrency handling, thread blocking, or delays in the compression pipeline that may not have been visible in a controlled, sequential test like Phase 1.

Phase 2 builds upon the controlled environment of Phase 1 by introducing a more dynamic and realistic simulation. In this phase, alert packets were triggered asynchronously at scheduled intervals to mimic real-world IoT alert scenarios. Unlike Phase 1, which took a more uniform approach to measuring the performance of each compression algorithm, Phase 2 tests how well each algorithm can compress the data, quickly transmit it via MQTT, save the metrics to Postgres, and become available again to process the next alert event. The figures from the Phase 2 results highlight critical performance metrics, including packet delivery reliability, compression time, memory and CPU usage, and battery efficiency. The following subsections analyze each result in the context of a real-world environment. This uncovers the strengths and weaknesses of each algorithm and how reliable and practical it is to send a large amount of data over MQTT without compromising responsiveness, resource efficiency, or data integrity. By examining these results, we can determine whether any algorithm presents significant bottlenecks under asynchronous workloads or whether it is suitable for deployment in real-time, resource-constrained IoT systems.
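The packet-loss mechanism this phase is designed to expose, a worker that is still busy when the next alert fires, can be modeled with a few lines of Python. This is a deliberately simplified single-timeline sketch rather than the threaded setup used in the thesis; the function name simulate_delivery, the arrival schedule, and the processing times are all hypothetical and are not measurements from the experiments.

```python
def simulate_delivery(arrival_times, processing_seconds):
    """Count alerts handled by one worker that drops alerts arriving while busy.

    processing_seconds stands for the full cycle per packet:
    compress, publish via MQTT, and log metrics.
    """
    busy_until = 0.0
    handled = 0
    for t in arrival_times:
        if t >= busy_until:          # worker is free: take the alert
            handled += 1
            busy_until = t + processing_seconds
        # otherwise the worker is still locked and this alert is missed
    return handled


# Hypothetical schedule: one alert per minute for ten minutes.
arrivals = [60 * i for i in range(1, 11)]

fast = simulate_delivery(arrivals, processing_seconds=10)
slow = simulate_delivery(arrivals, processing_seconds=150)
print(fast, slow)  # 10 4
```

In this toy model a cycle shorter than the alert spacing never drops a packet, while a 150-second cycle handles only every third alert, which mirrors how a slow compressor can miss alerts without ever failing outright.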
4.2.1 Phase 2 Compression Success Across 72 Alert Packets

In Phase 2, each compression algorithm operated independently and asynchronously on its own thread. Once a packet was generated, the algorithm's thread began compressing the alert packet. That thread remained locked, unable to process the next packet, until the current packet was fully compressed, transmitted via MQTT, and its metrics logged in PostgreSQL. Only after completing the full process cycle could the thread handle the next incoming alert packet. This setup allowed the simulation to reveal whether any packet loss occurred under real-time conditions. Unlike Phase 1, where each compression algorithm compressed the payload packets sequentially regardless of how long it took, Phase 2 was designed to emulate a real-world alert system.

A total of 72 alert packets were generated during the simulation. As described in Section 3.8.2, alert packets were generated at rapid, fixed intervals to simulate time-sensitive emergency scenarios. Each algorithm began by compressing the first packet. However, due to differences in compression speed, not all algorithms were able to keep up. Algorithms that were still busy processing a previous packet when the next alert was triggered missed the opportunity to handle that next packet. Slower algorithms such as LZ77 and LZW fell behind: their longer compression times kept them locked, causing them to miss several alerts entirely. GZIP and raw JSON were largely uninterrupted, as the asynchronous design allowed them to process the next packet as normal. Figure 4.12 and Table 4.3 show that, of the 72 alert packets generated, GZIP performed best with a 93.05% success rate, missing only 5 packets; raw JSON was a close second with a 91.67% success rate. This showed that while LZ77 and LZW are good compression algorithms, they are not suitable for real-world alert scenarios.
If the payload were smaller or the scheduled intervals less demanding, LZW could perhaps contend with GZIP and raw JSON.

Table 4.3: Packet Reception Rates by Compression Algorithm During Phase 2 (Out of 72 Total Packets Sent)

| Algorithm | RAW | GZIP | LZW | LZ77 | Total Sent |
| --- | --- | --- | --- | --- | --- |
| Packets Received | 66 | 67 | 24 | 6 | 72 |
| Success Rate (%) | 91.67% | 93.05% | 33.33% | 8.33% | 100% |

Figure 4.12: Packet Delivery Validation per Algorithm (Phase 2). Packet Loss Across Algorithms in Phase 2: Each of the 72 packets generated represented a real-time alert. GZIP and RAW maintained high success rates, while LZ77 and LZW fell behind due to slower processing speeds.

4.2.2 Phase 2 Compression Time

Figure 4.12 revealed that LZ77 had the highest average compression time, at 2258 seconds. This clearly indicated that the algorithm struggled to keep up with the speed demanded by the real-time events. Because of its slow compression time, LZ77 delivered only 8.33% of alert packets to the subscriber. LZW performed respectably but still fell behind in compression speed. In contrast to both LZ77 and LZW, GZIP and raw JSON compressed all their alert packets in under 10 seconds, maintaining a high delivery rate. These findings indicate that compression speed is critical in a real-world alert environment. Algorithms that cannot finish compressing before the next alert arrives are very likely to cause packet loss.

Figure 4.13: Average Compression and Decompression Time (Phase 2). Shows the full time of compression with decompression for each algorithm captured during Phase 2.

4.2.3 Phase 2 Memory Usage

Figure 4.18 displays the memory usage for each algorithm in Phase 2. Memory usage remained consistent with prior observations for each algorithm. GZIP and raw JSON had higher memory footprints, and LZ77 remained low. Interestingly, the LZW algorithm reported a negative average memory usage during compression.
While this result is likely due to a measurement timing issue or system-level memory optimization (e.g., garbage collection), further investigation was not conducted because it did not affect system stability or packet transmission. The anomaly is acknowledged but does not significantly affect the overall interpretation of the results.

4.2.4 Phase 2 Battery Uptime

Figure 4.19 and Table 4.5 display the average battery uptime across the compression algorithms under the real-time alert conditions of Phase 2. Battery usage reflected each algorithm's efficiency throughout compression, MQTT transmission, and PostgreSQL metric logging. Unlike Phase 1, where all algorithms maintained relatively stable battery behavior, Phase 2 introduced more noticeable variation. Table 4.4 shows that GZIP and RAW kept low and consistent uptime values, indicating strong energy efficiency even under high alert frequency. LZ77 showed the most variability in battery demand, while GZIP and RAW remained nearly uniform in both metrics, reinforcing their suitability for energy-constrained systems. LZ77's excessive resource needs may limit its practicality in real-time IoT deployments.

Figure 4.14: Average Throughput per Algorithm (Phase 2). This bar chart visualizes the efficiency of data transmission, calculated as compressed size divided by publish time (bytes per second). Higher throughput indicates faster, more efficient delivery.

4.2.5 Phase 2 CPU Usage

Figure 4.17 shows a significant spike for LZ77, indicating intense multi-threaded CPU usage. In contrast, GZIP and RAW maintained moderate and relatively balanced CPU usage across all three measurement stages (before, during, and after compression). These results support the conclusion that excessive CPU demand from LZ77 and LZW likely delayed their ability to accept new packets, which aligns with the earlier packet loss results. While these CPU usage metrics provide a general indication of the processing intensity of each algorithm, it is important to note that the environment was not fully isolated. External background processes and thread scheduling behavior may have influenced the CPU readings. Additionally, since each compression algorithm was assigned to its own thread, the increased parallel activity may have contributed to overall CPU utilization, particularly for more demanding algorithms like LZ77 and LZW.

Figure 4.15: Average Latency per Algorithm (Phase 2). This Phase 2 chart shows how long it took each packet to travel from publisher to subscriber. Latency is crucial in evaluating real-time system responsiveness.

4.2.6 Phase 2 Throughput, Latency, Publish Time

Phase 2 gave throughput results similar to those of Phase 1, with RAW maintaining high throughput due to zero compression time and full payload transmission. GZIP showed its advantage with decent throughput on a reduced payload size. LZ77 and LZW reinforced concerns about their inefficiency under real-world loads. Latency (Figure 4.15) and publish time (Figure 4.16) showed the same behavior for each algorithm as in Phase 1. RAW maintained the fastest average transmission time but had a higher publish time due to its larger payload size. GZIP achieved the best balance, offering low latency and fast publish time, making it well suited to time-sensitive IoT applications.

Figure 4.16: MQTT Publish Time Variability (Phase 2). This bar graph shows the range and distribution of publish times per algorithm, reflecting consistency and delay variability in network transmission, captured during Phase 2.
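The dispersion statistics used to compare battery uptime in Section 4.2.4 (the coefficient of variation and the quartile coefficient of dispersion reported in Table 4.4) can be computed as in the sketch below. The helper names are hypothetical, and the use of the sample standard deviation and inclusive quartiles is an assumption; the thesis does not state which estimators were used.

```python
import statistics


def coefficient_of_variation(values):
    """CV = sample standard deviation divided by the mean."""
    return statistics.stdev(values) / statistics.mean(values)


def qcd_from_quartiles(q1, q3):
    """QCD = (Q3 - Q1) / (Q3 + Q1), computed from known quartiles."""
    return (q3 - q1) / (q3 + q1)


def quartile_coefficient_of_dispersion(values):
    """QCD computed from raw samples (inclusive quartile method is an assumption)."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return qcd_from_quartiles(q1, q3)


if __name__ == "__main__":
    # Quartiles reported for LZ77 in Table 4.5: 25% = 0.58, 75% = 0.70
    print(round(qcd_from_quartiles(0.58, 0.70), 2))  # 0.09
```

Applied to the LZ77 quartiles from Table 4.5, the quartile formula gives 0.09, which agrees with the QCD listed for LZ77 in Table 4.4.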
Table 4.4: Coefficient of Variation and Quartile Coefficient of Dispersion – Phase 2

| Algorithm | GZIP | LZ77 | LZW | RAW |
|---|---|---|---|---|
| CV / QCD | 0.00 / 0.00 | 0.13 / 0.09 | 0.08 / 0.08 | 0.00 / 0.00 |

Table 4.5: Battery Uptime Summary Statistics (in Hours) – Phase 2

| Algorithm | GZIP | LZ77 | LZW | RAW |
|---|---|---|---|---|
| Count | 67 | 6 | 24 | 66 |
| Mean | 0.01 | 0.64 | 0.12 | 0.01 |
| Std | 0.00 | 0.08 | 0.01 | 0.00 |
| Min | 0.01 | 0.54 | 0.10 | 0.01 |
| 25% | 0.01 | 0.58 | 0.11 | 0.01 |
| 50% | 0.01 | 0.63 | 0.12 | 0.01 |
| 75% | 0.01 | 0.70 | 0.13 | 0.01 |
| Max | 0.01 | 0.72 | 0.15 | 0.02 |

Figure 4.17: CPU Usage During Compression (Phase 2). This grouped bar chart illustrates CPU usage before, during, and after compression for each algorithm in Phase 2, indicating processing intensity.

Figure 4.18: Average Memory Usage per Algorithm (Phase 2). This bar chart shows the memory (RAM) consumed during compression in Phase 2. Lower memory usage is favorable for resource-constrained IoT devices such as the Raspberry Pi.

Figure 4.19: Average Battery Usage per Algorithm (Phase 2). This bar chart shows the Phase 2 uptime, or battery consumption, of each algorithm's full process: compression, transmission via MQTT, and saving metrics in PostgreSQL.

CHAPTER 5
Conclusion and Future Work

5.1 Conclusion

This thesis explored the impact of the GZIP, LZ77, and LZW compression algorithms on MQTT communication efficiency in IoT devices and alert event scenarios. To answer the research question, "What is the impact of GZIP, LZ77, and LZW compression on MQTT communication efficiency in IoT devices?", each algorithm was measured across several metrics to determine which was the most balanced and efficient. Through a two-phase testing approach, both controlled and real-time scenarios were simulated to evaluate key performance metrics, including compressed size, speed, data integrity, and transmission reliability.
By comparing these against a baseline of raw JSON transmission, the findings highlighted both the advantages and the trade-offs that each compression algorithm introduces on a 20MB payload. The results of Phase 1 and Phase 2 gave clear evidence of which algorithm performed best. They also showed that a raw JSON string worked well in many cases.

The initial expectation was that LZ77 would be a contender for best-performing algorithm; the results showed the opposite. LZ77, although a good compression algorithm, performed the worst of all the compression algorithms tested. It was poor in compression speed, CPU usage, and latency. It did compress the data to a small size, but its slow speed makes it inefficient for real-world scenarios with a large payload.

LZW performed better than LZ77 in all other measurements. It was a strong contender against GZIP in compressing the payload, and it had decent compression and decompression times. LZW performed decently in a controlled environment. It did, however, struggle to keep up with the high demands of a real-time data and alert environment, achieving a success rate of only 33.33% in the real-world scenario. Its compression time in that scenario was also poor compared with GZIP and raw JSON; had it compressed faster, it would not have missed so many alert packets. It is therefore an inefficient algorithm for a real-world alert event system.

After Phase 1 it became evident that GZIP was the top contender of the three algorithms. GZIP performed best in compressing the data size in Phase 1 testing and improved its compression in Phase 2 testing. It was on par with raw JSON in latency, compression time (Phase 2), and decompression time, and it had a success rate of 93.05%, proving that it can handle the rigorous demands of real-time scenarios. Because of its compression complexity, its latency is slightly higher than that of raw JSON.
The trade-off of a few extra seconds is outweighed by the significant gains in reduced payload size and improved compression ratio. A detailed performance comparison between GZIP and RAW is included in Appendix A, Table A.1, highlighting the balance GZIP achieves in compression ratio, memory usage, and success rate.

Based on the analyzed data, GZIP proved to be the most effective overall choice when compared with raw JSON, LZ77, and LZW. It achieved a strong balance between compression ratio, processing speed, CPU usage, and reliability under real-time conditions. While raw JSON was faster and the compression algorithms achieved better compression ratios, GZIP consistently delivered moderate-to-high performance across all critical metrics, making it the most balanced and efficient option for real-time IoT alert systems.

This thesis answers the research question by demonstrating how compression algorithms can both enhance MQTT communication efficiency in IoT environments (by reducing payload size and improving bandwidth efficiency) and challenge it (by increasing CPU load or delaying processing), particularly for devices that must operate reliably in time-sensitive scenarios.

5.2 Future Work

While this thesis provides meaningful insights into the use of GZIP, LZ77, and LZW compression algorithms for communication in IoT devices, there are several opportunities for future exploration:

• Algorithm Optimization: GZIP emerged as the most balanced algorithm in this study, but further optimizations or lightweight variations of GZIP could be explored to reduce latency while maintaining high compression ratios.
• Hardware: This study was conducted using a Raspberry Pi and an AWS EC2 instance. Testing across a broader range of hardware could reveal more about algorithm suitability.
• Payload Type Variation: While 20MB payloads were used to maintain consistency, future work could explore how different payload types (e.g., binary, image, audio, or mixed sensor data) affect compression and communication performance.
• Alternative Algorithms: Other compression techniques used by authors discussed in the literature review chapter could be evaluated and compared with the algorithms in this thesis for broader benchmarking.
• Subscriber Scalability: This study focused on a single-subscriber scenario. Future work could explore how scaling to multiple subscribers affects message delivery, broker load, and decompression performance.

APPENDIX

A.1 Comparison of GZIP vs. RAW JSON Compression

Table A.1: Comparison of GZIP vs. RAW JSON Compression

| Metric | GZIP | RAW JSON |
|---|---|---|
| Compression Ratio | High (e.g., 9.8x smaller) | 1.0 (no compression) |
| Average Compression Time (s) | Moderate (Fast in Phase 2) | None |
| Average Decompression Time (s) | Very Low | None |
| Latency (Phase 2) | Slightly higher than RAW | Lowest |
| CPU Usage (During Compression) | Low (7.24%) | Very Low (2.6%) |
| Memory Usage | Lowest among algorithms | Constant (moderate) |
| Throughput | Moderate (30,099 Bps) | Highest (939,320 Bps) |
| Packet Success Rate (Phase 2) | 93.05% | 90.00% |
| Battery Usage Consistency | Stable | Most consistent |
| Overall Verdict | Efficient and balanced | Fast, but high payload size |

A.2 PostgreSQL Publisher Metrics Table

Table A.2: Structure of phase1_compressionmetrics and phase2_compressionmetrics

| Column | Type | Description |
|---|---|---|
| id | integer (PK) | Unique row identifier |
| algorithm | varchar(25) | Compression algorithm used (gzip, lz77, lzw) |
| checksum | bigint | CRC32 checksum for verifying integrity |
| compressedsize | integer | Size of compressed data in bytes |
| compressionratio | double precision | Compression ratio (original/compressed size) |
| compressiontime | double precision | Time taken to compress (in seconds) |
| cpuusagebefore | double precision | CPU usage before compression |
| cpuusageduring | double precision | CPU usage during compression |
| cpuusageafter | double precision | CPU usage after compression |
| throughput | double precision | Compression throughput in bytes per second |
| memoryusage | double precision | Memory usage during compression |
| packet | text | Payload data (compressed) |
| packetid | integer | ID of the packet |
| mqttpublishtime | double precision | Time taken to publish to MQTT |

A.3 PostgreSQL Subscriber Metrics Table

Table A.3: Structure of phase1_subscribermetrics and phase2_subscribermetrics

| Column | Type | Description |
|---|---|---|
| id | integer (PK) | Unique row identifier |
| algorithm | varchar(25) | Decompression algorithm used |
| latency | double precision | Time between publish and receive (in seconds) |
| decompressiontime | double precision | Time taken to decompress payload |
| throughput | double precision | Decompression throughput in bytes per second |
| crc32 | bigint | CRC32 checksum for verifying integrity |
| packetsreceived | integer | Count of packets received |
| packetid | integer | ID of the packet |

A.4 PostgreSQL Battery Uptime Metrics Table

Table A.4: Structure of phase1_batterytest and phase2_batterytest

| Column | Type | Description |
|---|---|---|
| id | integer (PK) | Unique row identifier |
| algorithm | varchar(25) | Algorithm being tested |
| packetid | integer | ID of the packet |
| uptime | double precision | System uptime (in seconds) |

A.5 Main Script (Publisher)

To run Phase 2, the call to phase1_send_packets() on line 216 of the script needs to be commented out.
Listing 1: Publsiher Main Script import json import time import random import googlemaps from datetime import datetime, timedelta import uuid import math import os import threading from queue import Queue from util import util from CompressionPacketTester import CompressionTester import requests import subprocess from databaseUtility import databaseUtility packetTimeStamp = None # Generate a single health log entry def generate_health_log_entry(timestamp): heart_rate = random.randint(50, 140) 59 heart_status = "LOW" if heart_rate < 60 else "HIGH" if heart_rate > 120 else "NORMAL" systolic = random.randint(85, 180) diastolic = random.randint(50, 110) bp_status = "LOW" if systolic < 90 else "HIGH" if systolic > 140 else "NORMAL" return { "timestamp": timestamp, "heart_rate": heart_rate, "heart_status": heart_status, "blood_pressure": f"{systolic}/{diastolic}", "bp_status": bp_status } # Generate health logs without writing to a file def generate_health_logs_streaming(file_path, target_size_mb=75): start_time = datetime.utcnow() - timedelta(days=30) total_size = 0 with open(file_path, "w") as file: file.write("[") # Start JSON array first_entry = True while total_size < target_size_mb * 1024 * 1024: timestamp = (start_time + timedelta(minutes=total_size // 500)).isoformat() + "Z" entry = json.dumps(generate_health_log_entry(timestamp)) + " ," file.write(entry if first_entry else "\n" + entry) first_entry = False total_size = file.tell() file.seek(file.tell() - 1) # Remove last comma file.write("]") # Close JSON array def random_location(): return { "latitude": round(random.uniform(-90.0, 90.0), 6), "longitude": round(random.uniform(-180.0, 180.0), 6) } # Generate a single location history entry def generate_location_history_entry(timestamp): return { "timestamp": timestamp, "location": random_location() } # Generate location history without writing to a file def generate_location_history_streaming(file_path, target_size_mb=75): base_time = datetime.utcnow() - 
timedelta(days=7) total_size = 0 with open(file_path, "w") as file: 60 file.write("[") first_entry = True while total_size < target_size_mb * 1024 * 1024: timestamp = (base_time + timedelta(minutes=total_size // 50) ).isoformat() + "Z" entry = json.dumps(generate_location_history_entry(timestamp )) + "," file.write(entry if first_entry else "\n" + entry) first_entry = False total_size = file.tell() file.seek(file.tell() - 1) file.write("]") db = databaseUtility() compression_tester = CompressionTester(util, db) if not os.path.exists("location_history.json") or os.path.getsize(" location_history.json") == 0: with open("location_history.json", "w") as f: json.dump([], f) # Load pre-generated data from JSON files with open("location_history.json", "r") as file: location_history = json.load(file) if not os.path.exists("health_logs.json") or os.path.getsize(" health_logs.json") == 0: with open("health_logs.json", "w") as f: json.dump([], f) with open("health_logs.json", "r") as file: health_logs = json.load(file) def generate_random_user(): first_names = ["John", "Jane", "Michael", "Sarah", "David", "Emily", "Robert", "Laura"] last_names = ["Smith", "Johnson", "Brown", "Williams", "Jones", " Garcia", "Davis", "Martinez"] military_branches = ["Army", "Navy", "Air Force", "Marines", "Coast Guard"] user = { "first_name": random.choice(first_names), "last_name": random.choice(last_names), "age": random.randint(60, 90), "medical_conditions": random.sample( ["Hypertension", "Diabetes", "Heart Disease", "Asthma", " Arthritis", "Alzheimer’s", "Cancer"], random.randint(1, 4) ), "allergies": random.sample( ["Penicillin", "Peanuts", "Shellfish", "Latex", "Dust", " Pollen"], 61 random.randint(0, 2) ), "emergency_contacts": [ { "first_name": random.choice(first_names), "last_name": random.choice(last_names), "relation": random.choice(["Spouse", "Daughter", "Son", "Brother", "Sister"]), "phone": f"+1-{random.randint(100, 999)}-{random.randint (100, 999)}-{random.randint(1000, 9999)}" 
} for _ in range(random.randint(1, 4)) ], "military_status": "Veteran" if random.random() < 0.7 else "NonVeteran", "military_branch": random.choice(military_branches) if random. random() < 0.7 else None, "preferred_hospital": f"VA Medical Center, {random.choice([’ Ogden’, ’Salt Lake City’, ’Los Angeles’, ’New York’, ’ Houston’])} UT" } return user def random_location(): return { "latitude": round(random.uniform(-90.0, 90.0), 6), "longitude": round(random.uniform(-180.0, 180.0), 6) } def generate_packet(packet_id): packetTimeStamp = datetime.utcnow().isoformat() + "Z" packet = { "packet_id": packet_id, "timestamp": packetTimeStamp, "device_id": f"RPi{random.randint(1, 100):03}", "user_info": generate_random_user(), "alert_type": random.choice(["FALL_DETECTED", "DEVICE_STATUS", " HEALTH_CHECK", "GEOFENCE STATUS"]), "message": random.choice(["An emergency alert has been triggered ", "Device status is OK", "Routine health monitoring"]), "severity": random.choice(["HIGH", "MEDIUM", "LOW"]), "critical_data": random.choice([ "EMERGENCY ALERT! 
CALL MOM", "Device functioning normally.", "Periodic health update.", "Button has been pressed, Alarm initiating" ]), "location": random_location(), "location_history": location_history, # Embed location history directly "health_logs": health_logs # Embed health logs directly } 62 return packet, packetTimeStamp def phase1_send_packets(): # Step 1: Pre-generate all packets once print("Generating packets...") x = time.time() packets = [generate_packet(packet_id) for packet_id in range(1000)] y = time.time() print("Finished packet generation", y-x) log_size = 10 start_log = time.time() generate_health_logs_streaming("health_logs.json", log_size) generate_location_history_streaming("location_history.json", log_size) end_log = time.time() totalLogSize = log_size + log_size total_log = end_log - start_log print(f"Total time to generate {totalLogSize}MB of logs: {total_log} miliseconds") # Setting up mqtt configs start_time = time.time() for packet_id in range(40): packet, timeStamp = generate_packet(packet_id) print("Size: ", len(packet)) # for packet_id in range(1000): for algorithm, compress_func in [ ("RAW", util.prep_raw_data), ("GZIP", util.compress_with_gzip), ("LZW", util.compress_with_lzw), ("LZ77", util.compress_with_lz77) ]: print(f"Starting Phase 1 with {algorithm}...") packet_id = int(str(uuid.uuid4().int)[:8]) # Track uptime and battery consumption for this algorithm algorithm_start_time = time.time() compression_tester.compress_and_send(algorithm, packet, compress_func, packet_id, timeStamp) # Calculate uptime uptime = (time.time() - start_time) / 3600 # Convert seconds to hours db.insert_battery_test_phase1(algorithm, packet_id, uptime) # Save battery usage data after processing all packets for the current algorithm algorithm_uptime = (time.time() - algorithm_start_time) / 3600 # Convert seconds to hours print(f"Completed {algorithm} in {algorithm_uptime:.2f} hours.") 63 # Save the final spreadsheet for all algorithms print("Phase 1 completed and battery 
usage logged.") # Run Phase 1 phase1_send_packets() time.sleep(99999) # PHASE 2 # Replace with your actual Google Maps API key gmaps = googlemaps.Client(key=API_KEY) compression_queues = { "RAW": Queue(), "GZIP": Queue(), "LZW": Queue(), "LZ77": Queue() } busy_flags = { "RAW": False, "GZIP": False, "LZW": False, "LZ77": False } processed_counts = { "RAW": 0, "GZIP": 0, "LZW": 0, "LZ77": 0 } # Initialize time trackers for each task start_time = time.time() last_heart_rate_check = time.time() last_blood_pressure_check = time.time() last_geofence_check = time.time() log_size = 20 start_log = time.time() generate_health_logs_streaming("health_logs.json", log_size) generate_location_history_streaming("location_history.json", log_size) end_log = time.time() totalLogSize = log_size + log_size total_log = end_log - start_log print(f"Total time to generate {totalLogSize}MB of logs: {total_log} miliseconds") # Define intervals for each task in seconds HEART_RATE_INTERVAL = 3 * 60 # 3 minutes BLOOD_PRESSURE_INTERVAL = 5 * 60 # 5 minutes GEOFENCE_INTERVAL = 1 * 60 # 2 minutes # Compression control 64 compression_queues = {"RAW": Queue(), "GZIP": Queue(), "LZW": Queue(), " LZ77": Queue()} busy_flags = {"RAW": False, "GZIP": False, "LZW": False, "LZ77": False} processed_counts = {"RAW": 0, "GZIP": 0, "LZW": 0, "LZ77": 0} def compression_worker(algo_name, compress_func): while True: if not compression_queues[algo_name].empty() and not busy_flags[ algo_name]: packet, packetid, timestamp = compression_queues[algo_name]. 
get() busy_flags[algo_name] = True print(f"[{algo_name}] Compressing packet {packetid}") start_time = time.time() # Use your central compression handler compression_tester.compress_and_send(algo_name, packet, compress_func, packetid, timestamp) duration = time.time() - start_time log_battery_usage(algo_name, start_time, packet, duration) processed_counts[algo_name] += 1 busy_flags[algo_name] = False # Start compressor threads threading.Thread(target=compression_worker, args=("RAW", util. prep_raw_data), daemon=True).start() threading.Thread(target=compression_worker, args=("GZIP", util. compress_with_gzip), daemon=True).start() threading.Thread(target=compression_worker, args=("LZW", util. compress_with_lzw), daemon=True).start() threading.Thread(target=compression_worker, args=("LZ77", util. compress_with_lz77), daemon=True).start() # Function to log battery data def log_battery_usage(algorithm, start_time, packet, duration): uptime = (time.time() - start_time) / 3600 db.insert_battery_test_phase2(algorithm, packet["packet_id"], uptime ) # Simulated address address = random.choice(["3848 Harrison Blvd, Ogden, UT 84403", "301 S Temple, Salt Lake City, UT 84101", "77 West 1300 S, Salt Lake City, UT 84115", "640 Bountiful Blvd, Bountiful, UT 84010" ]) geofences = [ {"name": "Home", "latitude": HOME_LAT, "longitude": HOME_LON, " radius": 150}, 65 {"name": "Work", "latitude": WORK_LAT, "longitude": WORK_LON, " radius": 150}, {"name": "School", "latitude": SCHOOL_LAT, "longitude": SCHOOL_LON, "radius": 150}, {"name": "Church", "latitude": CHURCH_LAT, "longitude": CHURCH_LON, "radius": 150}, {"name": "Parents Home", "latitude": PARENTS_LAT, "longitude": PARENTS_LON, "radius": 150}, ] USER_INFO = { "first_name": "John", "last_name": "Deer", "age": 78, "medical_conditions": [ "Hypertension", "Diabetes", "Heart Disease" ], "allergies": ["Penicillin"], "emergency_contacts": [ { "first_name": "Jane", "last_name": "Deer", "relation": "Wife", "phone": "+1-555-123-4567" }, { 
"first_name": "Mary", "last_name": "Deer", "relation": "Daughter", "phone": "+1-111-123-4567" }, { "first_name": "John Jr", "last_name": "Deer", "relation": "Son", "phone": "+1-222-123-4567" }, { "first_name": "Johnny", "last_name": "Deer", "relation": "Brother", "phone": "+1-444-123-4567" } ], "military_status": "Veteran", "military_branch": "Army", "preferred_hospital": "VA Medical Center, Ogdent UT" } def check_heart_rate(): heart_rate = random.randint(40, 180) if heart_rate < 60: 66 return "LOW" elif heart_rate > 120: return "HIGH" else: return "NORMAL" def check_blood_pressure(): systolic = random.randint(40, 180) if systolic < 90: return "LOW" elif systolic > 140: return "HIGH" else: return "NORMAL" def generate_location_history(num_entries=10_000): """Generate a location history based on movement between geofences. """ print("Generating Location History") location_history = [] current_time = datetime.utcnow() max_size_bytes = 20 * 1024 * 1024 # ˜20MB # Start at a random geofence current_geofence = random.choice(geofences) i = 0 while True: # Slight variation within the geofence radius lat = current_geofence["latitude"] + random.uniform(current_geofence["radius"], current_geofence["radius"]) lng = current_geofence["longitude"] + random.uniform(current_geofence["radius"], current_geofence["radius"]) location_entry = { "latitude": round(lat, 6), "longitude": round(lng, 6), "timestamp": (current_time - timedelta(seconds=i * 10)). isoformat() + "Z", "geofence": current_geofence["name"] } location_history.append(location_entry) i += 1 # Occasionally switch geofences (simulate travel) if random.random() < 0.01: # 1% chance to switch location current_geofence = random.choice(geofences) if len(json.dumps(location_history)) >= max_size_bytes: break print("Location History Complete") return location_history 67 def get_wifi_data(max_retries=50, retry_delay=600): """ Attempts to scan for Wi-Fi networks, handling errors and retrying if needed. 
Args: max_retries (int): Number of times to retry before giving up. retry_delay (int): Seconds to wait before retrying. Returns: list: A list of dictionaries containing MAC addresses and signal strengths. """ attempts = 0 while attempts < max_retries: try: # Run iwlist to scan for networks scan_output = subprocess.run( [’sudo’, ’iwlist’, ’wlan0’, ’scanning’], capture_output=True, text=True, timeout=10 # Avoid hanging indefinitely ).stdout # Parse the output to extract MAC addresses and signal strength wifi_data = [] networks = scan_output.split(’Cell’) for network in networks: mac_address = None signal_strength = None for line in network.split(’\n’): if "Address" in line: mac_address = line.split("Address:")[1].strip() if "Signal level" in line: signal_strength = int(line.split("Signal level=" )[1].split(’ ’)[0].strip()) if mac_address and signal_strength: wifi_data.append({ ’macAddress’: mac_address, ’signalStrength’: signal_strength }) # Return data if scanning was successful if wifi_data: return wifi_data except subprocess.TimeoutExpired: 68 print(f"Wi-Fi scan timed out. Retrying in {retry_delay} seconds...") except subprocess.CalledProcessError as e: print(f"Wi-Fi scan failed: {e}. Retrying in {retry_delay} seconds...") except Exception as e: print(f"Unexpected error during Wi-Fi scan: {e}. Retrying in {retry_delay} seconds...") # Increment attempt count and wait before retrying attempts += 1 time.sleep(retry_delay) print("Max retries reached. 
Unable to scan Wi-Fi networks.") return [] # Return an empty list if all retries fail # Function to send Wi-Fi data to Google Geolocation API and get location def get_location_from_wifi(wifi_data): # API URL for Google Maps Geolocation API url = f"https://www.googleapis.com/geolocation/v1/geolocate?key={API_KEY}" # Prepare data payload with Wi-Fi access points payload = { "wifiAccessPoints": wifi_data } # Make a POST request to the Geolocation API response = requests.post(url, json=payload) lat = lng = None # Fall back to None so the return below is defined when the request fails if response.status_code == 200: location_data = response.json() lat = location_data.get("location", {}).get("lat") lng = location_data.get("location", {}).get("lng") print(f"Latitude: {lat}, Longitude: {lng}") else: print(f"Error fetching location: {response.status_code}, {response.text}") return lat, lng def generate_large_health_logs(): logs = [] start_time = datetime.utcnow() - timedelta(days=120) # 120 days of logs for i in range(100_000): # Generate 100,000 records timestamp = (start_time + timedelta(minutes=i)).isoformat() + "Z" # Simulated heart rate heart_rate = random.randint(40, 180) heart_status = "LOW" if heart_rate < 60 else "HIGH" if heart_rate > 120 else "NORMAL" # Simulated blood pressure systolic = random.randint(40, 180) diastolic = random.randint(40, 180) bp_status = "LOW" if systolic < 90 else "HIGH" if systolic > 140 else "NORMAL" logs.append({ "packet_id": str(uuid.uuid4().int), "timestamp": timestamp, "device_id": "RPI123ABC", "heart_rate": heart_rate, "heart_status": heart_status, "blood_pressure": f"{systolic}/{diastolic}", "bp_status": bp_status, }) # Break early if JSON reaches ~20MB (20 * 1024 * 1024 bytes) if len(json.dumps(logs)) >= 20 * 1024 * 1024: break return logs def send_alert(alert_type, packet, packetid, timestamp): print(f"Queuing packet {packetid} for compression - Alert: {alert_type}") for algo in compression_queues: if not busy_flags[algo]: compression_queues[algo].put((packet, packetid, timestamp)) else: print(f"{algo} is busy. 
Skipping packet {packetid}.") def is_inside_geofence(lat1, lon1, geofences): for geofence in geofences: lat2, lon2 = geofence["latitude"], geofence["longitude"] radius = geofence["radius"] # Haversine formula dlat = math.radians(lat2 - lat1) dlon = math.radians(lon2 - lon1) a = (math.sin(dlat / 2) ** 2 + math.cos(math.radians(lat1)) * math.cos(math.radians(lat2)) * math.sin(dlon / 2) ** 2) c = 2 * math.atan2(math.sqrt(a), math.sqrt(1 - a)) distance = 6371000 * c # Radius of Earth in meters print(f"Geofence: {geofence[’name’]}, Distance: {distance}, Radius: {radius}") if distance <= radius: return geofence["name"] return None previous_geofence = None while True: current_time = time.time() 70 device_id = "RPI123ABC" packetid = int(str(uuid.uuid4().int)[:8]) timestamp = datetime.utcnow().isoformat() + "Z" # Check Heart Rate if current_time - last_heart_rate_check >= HEART_RATE_INTERVAL: wifi_data = get_wifi_data() lat, lng = get_location_from_wifi(wifi_data) # Check Heart Rate heart_rate_status = check_heart_rate() print(f"Heart Rate bpm - Status: {heart_rate_status}") if (heart_rate_status == "HIGH" or heart_rate_status == "LOW"): packet = { "packet_id": packetid, "timestamp": timestamp, "device_id": device_id, "user_info": USER_INFO, "alert_type": "HEART RATE", "device_firmware_version": "V1.17", "message": f"Heart Rate - Status: {heart_rate_status}", "severity": "HIGH", "critical_data": "Heart Rate warning unhealthy level", "location": {"latitude": lat, "longitude": lng}, "health_logs": health_logs } print(len(packet)) send_alert("Heart Rate", packet, packetid, timestamp) last_heart_rate_check = current_time # Check Blood Pressure if current_time - last_blood_pressure_check >= BLOOD_PRESSURE_INTERVAL: wifi_data = get_wifi_data() lat, lng = get_location_from_wifi(wifi_data) blood_pressure_status = check_blood_pressure() print(f"Blood Pressure - Status: {blood_pressure_status}") if (blood_pressure_status == "HIGH" or blood_pressure_status == "LOW"): packet = { 
"packet_id": packetid, "timestamp": timestamp, "device_id": device_id, "user_info": USER_INFO, "alert_type": "BLOOD PRESSURE", "device_firmware_version": "V1.17", "message": f"Blood Pressure - Status: { blood_pressure_status}", "severity": "HIGH", "critical_data": "Blood Pressure warning unhealthy level ", "location": {"latitude": lat, "longitude": lng}, "health_logs": health_logs } print(len(packet)) 71 send_alert("Blood Pressure", packet, packetid, timestamp) last_blood_pressure_check = current_time # Check GeoFence try: if current_time - last_geofence_check >= GEOFENCE_INTERVAL: wifi_data = get_wifi_data() # Print the Wi-Fi data # ˜ print("Wi-Fi Data:", wifi_data) if wifi_data: # Send Wi-Fi data to Google API to get location lat, lng = get_location_from_wifi(wifi_data) current_geofence = is_inside_geofence(lat, lng, geofences) print(current_geofence) if current_geofence != previous_geofence: if current_geofence is not None: print(f"Device just arrived at: { current_geofence}") packet = { "packet_id": packetid, "timestamp": timestamp, "device_id": device_id, "user_info": USER_INFO, "alert_type": "GEOFENCE ARRIVAL", "device_firmware_version": "V1.17", "message": f"Device just entered geofence: { current_geofence}", "severity": "LOW", "critical_data": wifi_data, "location": {"latitude": lat, "longitude": lng}, "location_history": location_history } # ˜ print(len(packet)) send_alert("Geo Location (Arrival)", packet, packetid, timestamp) else: print(f"Device just left geofence: { previous_geofence}") packet = { "packet_id": packetid, "timestamp": timestamp, "device_id": device_id, "user_info": USER_INFO, "alert_type": "GEOFENCE DEPARTURE", "device_firmware_version": "V1.17", "message": f"Device just left geofence: { previous_geofence}", "severity": "HIGH", "critical_data": wifi_data, 72 "location": {"latitude": lat, "longitude": lng}, "location_history": location_history } print(len(packet)) send_alert("Geo Location", packet, packetid, timestamp) previous_geofence = 
current_geofence else: print("No Wi-Fi networks found or unable to scan.") packet = { "packet_id": packetid, "timestamp": timestamp, "device_id": device_id, "user_info": USER_INFO, "alert_type": "GEOFENCE STATUS", "message": "No Wi-Fi networks found or unable to scan.", "severity": "MEDIUM", "critical_data": wifi_data, "location": {"latitude": lat, "longitude": lng}, "location_history": location_history } print(len(packet)) send_alert("Geo Location (Departure)", packet, packetid, timestamp) last_geofence_check = current_time except Exception as e: print(e) # Report the actual error instance, not the Exception class time.sleep(1) #~ # time.sleep(60) A.6 Util Script (Publisher) Listing 2: Publisher Utility Script import json import zlib import gzip from LZ77 import LZ77_Json_Compressor from LZW import LZW_Json_Compressor from mqttPublisher import MQTTPublisher import csv import os import psutil import struct import tempfile class util: # Function to compute CRC32 checksum def compute_crc32(data): #~ print("DATA TO COMPUTE_CRC32: ", data) return zlib.crc32(data) & 0xFFFFFFFF # Returns unsigned 32-bit value def compression_ratio(compressed_data, original_data): # Calculate metrics compressed_size = len(compressed_data) original_size = len(original_data) compression_ratio = original_size / compressed_size if compressed_size != 0 else 0 return compression_ratio def prep_raw_data(data, file_path="temp_packet.json"): print("Compressing RAW data") json_bytes = json.dumps(data).encode('utf-8') print("Size: ", len(json_bytes)) print("RAW Compression complete") checksum_crc32 = util.compute_crc32(json_bytes) ratio = util.compression_ratio(json_bytes, json_bytes) return json_bytes, checksum_crc32, ratio def compress_with_gzip(data): print("Compressing GZIP data") json_string = json.dumps(data) print("Size: ", len(json_string.encode('utf-8'))) gzip_compress = gzip.compress(json_string.encode('utf-8')) checksum_crc32 = util.compute_crc32(json_string.encode('utf-8')) ratio = util.compression_ratio(gzip_compress, 
json_string.encode('utf-8')) return gzip_compress, checksum_crc32, ratio def compress_with_lz77(data): print("Compressing LZ77 data") compressor = LZ77_Json_Compressor(window_size=4096) # window_size is optional (default 4096) json_string = json.dumps(data) print("Size: ", len(json_string.encode('utf-8'))) lz77_compress = compressor.compress(json_string) lz77_compress_serialized = lz77_compress.tobytes() checksum_crc32 = util.compute_crc32(json_string.encode('utf-8')) ratio = util.compression_ratio(lz77_compress_serialized, json_string.encode('utf-8')) return lz77_compress_serialized, checksum_crc32, ratio def compress_with_lzw(data): print("Compressing LZW data") json_string = json.dumps(data) print("Size: ", len(json_string.encode('utf-8'))) lzw_compress = LZW_Json_Compressor.lzw_encoding(data) lzw_compress_serialized = struct.pack(f">{len(lzw_compress)}I", *lzw_compress) checksum_crc32 = util.compute_crc32(json_string.encode('utf-8')) ratio = util.compression_ratio(lzw_compress_serialized, json_string.encode('utf-8')) return lzw_compress_serialized, checksum_crc32, ratio def sendData(compressedPacket, packet_id, timestamp, algo): try: client_id = "" broker_ip = "" print("Sending to mqtt") # Instantiate the producer producer = MQTTPublisher(client_id, broker_ip) try: # Connect to the broker producer.connect() print("Message packet: ", len(compressedPacket)) producer.send_compressed_data("test_status", compressedPacket, packet_id, timestamp, algo) except Exception as e: print(f"Error while sending MQTT message: {e}") finally: # Disconnect from the broker producer.disconnect() except Exception as e: print("Error: ", e) A.7 Compression Packet Tester Script (Publisher) Listing 3: Publisher Compression Packet Tester Script import psutil import sys import time import json import os class CompressionTester: def __init__(self, util, database): self.util = util self.databaseUtility = database def compress_and_send(self, algorithm_name, packet, compression_function, packet_id, 
timestamp): # CPU metrics before compression proc = psutil.Process(os.getpid()) cpu_before = psutil.cpu_percent(interval=1) start_cpu = proc.cpu_times() mem_before = proc.memory_info().rss # Compression start = time.time() compressed_data, checksum_crc32, ratio = compression_function( packet) 75 end = time.time() compression_time = end - start cpu_during = psutil.cpu_percent(interval=0) cpu_after = psutil.cpu_percent(interval=1) end_cpu = proc.cpu_times() mem_after = proc.memory_info().rss memory_used = mem_after - mem_before cpu_time_used = (end_cpu.user - start_cpu.user) + (end_cpu. system - start_cpu.system) cpu_percent = (cpu_time_used / compression_time) * 100 if compression_time > 0 else 0 # Metrics: Memory and size # ˜ memory_usage = print("Compressed data mem usage: ", sys.getsizeof( compressed_data)) size_in_bytes = len(compressed_data) # Send and calculate transmission time send_start = time.time() self.util.sendData(compressed_data, packet_id, timestamp, algorithm_name) send_end = time.time() mqtt_transmission_time = send_end - send_start # Throughput calculation throughput = size_in_bytes / mqtt_transmission_time if mqtt_transmission_time > 0 else 0.0 print("Sending DB DAta") # send_to_phase1_database, send_to_phase2_database self.databaseUtility.send_to_phase2_database( algorithm_name, packet_id, json.dumps(size_in_bytes), packet, size_in_bytes, compression_time, cpu_before, cpu_time_used, cpu_after, checksum_crc32, throughput, memory_used, ratio, mqtt_transmission_time ) A.8 MQTT Publisher Connection Script Listing 4: Publisher Connection Script import paho.mqtt.client as mqtt 76 import time import sys class MQTTPublisher: def __init__(self, client_id, broker_ip, broker_port=1883, keepalive =6000): self.client_id = client_id self.broker_ip = broker_ip self.broker_port = broker_port self.keepalive = keepalive self.client = mqtt.Client(mqtt.CallbackAPIVersion.VERSION2) self.client.will_set("client_status", "Client Disconnected Unexpectedly", qos=0) def 
connect(self): if self.client.connect(self.broker_ip, self.broker_port, self. keepalive) != 0: print("Could not connect to MQTT broker!") sys.exit(-1) self.client.loop_start() def publish(self, topic, message, qos=1): result = self.client.publish(topic, message, qos) result.wait_for_publish() if result.rc == 0: print("Message sent successfully!") else: print("Failed to send message.") def start_countdown(self, topic, countdown=10): for i in range(countdown, 0, -1): message = f"Time Left: {i}" print(message) self.publish(topic, message) time.sleep(1) # Send a quit signal after countdown self.publish(topic, "quit") def send_compressed_data(self, topic, compressed_data, packet_id, timestamp, algo): if isinstance(compressed_data, bytes): full_topic = f"{topic}/{packet_id}/{timestamp}/{algo}" print(f"Sending to topic: {full_topic}") self.publish(full_topic, compressed_data, qos=0) else: print("Compressed data must be in bytes format!") def disconnect(self): time.sleep(10) self.client.loop_stop() self.client.disconnect() 77 A.9 Database Connection Script (Publisher) Listing 5: Publisher Database Script import psycopg2 import json class databaseUtility: def __init__(self): self.SERVER = "" # Use your public IP or local IP if using a VPN/SSH Tunnel self.DATABASE = "" # Use any valid database name self.USERNAME = "" # Server username self.PASSWORD = "" # SQL Server password self.PORT = ’’ self.conn_string = f"host={self.SERVER} port={self.PORT} dbname ={self.DATABASE} user={self.USERNAME} password={self. 
PASSWORD}" def send_to_phase1_database( self, algorithm_name, packetid, compressed_data, packet, size_in_bytes, compression_time, cpu_before, cpu_time_used, cpu_after, checksum_crc32, throughput, mem_used, ratio, transmission_time ): # Connect to the database try: conn = psycopg2.connect(self.conn_string) print("Connected to postgres phase 1") cursor = conn.cursor() # Extract packet details (assuming packet is a JSON string) packet_data = json.dumps(packet) # Insert data into Phase1CompressionMetrics insert_query = """ INSERT INTO Phase1CompressionMetrics ( Algorithm, CheckSum, CompressedSize, CompressionRatio, CompressionTime, CPUUsageBefore, 78 CPUUsageDuring, CPUUsageAfter, Throughput, MemoryUsage, Packet, PacketId, Mqttpublishtime ) VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s ) """ data = ( algorithm_name, checksum_crc32, size_in_bytes, ratio, compression_time, cpu_before, cpu_time_used, cpu_after, throughput, mem_used, compressed_data, packetid, transmission_time ) cursor.execute(insert_query, data) # Commit the transaction conn.commit() # Close the connection cursor.close() conn.close() except Exception as e: print("ERROR: ", e) print("Phase 1 data successfully inserted into the database.") def send_to_phase2_database( self, algorithm_name, packetid, compressed_data, packet, size_in_bytes, compression_time, cpu_before, cpu_time_used, cpu_after, checksum_crc32, throughput, mem_used, ratio, 79 transmission_time ): try: # Connect to the database conn = psycopg2.connect(self.conn_string) print("Connected to postgres phase 2") cursor = conn.cursor() # Extract packet details (assuming packet is a JSON string) packet_data = json.dumps(packet) # Insert data into Phase2CompressionMetrics insert_query = """ INSERT INTO Phase2CompressionMetrics ( Algorithm, CheckSum, CompressedSize, CompressionRatio, CompressionTime, CPUUsageBefore, CPUUsageDuring, CPUUsageAfter, Throughput, MemoryUsage, Packet, PacketId, Mqttpublishtime ) VALUES (%s, %s, %s, %s, %s, %s, 
%s, %s, %s, %s, %s, %s, %s ) """ data = ( algorithm_name, checksum_crc32, size_in_bytes, ratio, compression_time, cpu_before, cpu_time_used, cpu_after, throughput, mem_used, compressed_data, packetid, transmission_time ) cursor.execute(insert_query, data) # Commit the transaction conn.commit() # Close the connection cursor.close() 80 conn.close() except Exception as e: print("Error: ", e) def insert_battery_test_phase1(self, algorithm_name, packet_data, uptime): try: print("Inserting batter test phase 1") conn = psycopg2.connect(self.conn_string) cursor = conn.cursor() battery_data = ( algorithm_name, packet_data, uptime ) battery_query = """ INSERT INTO Phase1BatteryTest (Algorithm, PacketId, Uptime) VALUES (%s, %s, %s) """ cursor.execute(battery_query, battery_data) # Commit the transaction conn.commit() # Close the connection cursor.close() conn.close() except Exception as e: print("Error: ", e) def insert_battery_test_phase2(self, algorithm_name, packet_data, uptime): try: print("Inserting batter test phase 2") # ˜ print(self.conn_string) conn = psycopg2.connect(self.conn_string) cursor = conn.cursor() battery_data = ( algorithm_name, packet_data, uptime ) battery_query = """ INSERT INTO Phase2BatteryTest (Algorithm, PacketId, Uptime) VALUES (%s, %s, %s) """ cursor.execute(battery_query, battery_data) 81 # Commit the transaction conn.commit() # Close the connection cursor.close() conn.close() except Exception as e: print("Error: ", e) A.10 LZ77 Script (Publisher/Subscriber) Listing 6: Publisher/Subscriber LZ77 Script # # Original code by Wesam Manassra (@manassra) and Thomas A. V. Sattolo (@tsattolo) # # GitHub Repository: https://github.com/manassra/LZ77-Compressor # # Modified for JSON compression and decompression import struct import json from bitarray import bitarray class LZ77_Json_Compressor: """ An improved LZ77 compressor for JSON data that uses a rolling dictionary for faster match lookups. 
""" MAX_WINDOW_SIZE = 4096 # Set a higher window size if needed def __init__(self, window_size=4096): self.window_size = min(window_size, self.MAX_WINDOW_SIZE) self.lookahead_buffer_size = 15 # Maximum match length (4-bit stored, so max extra is 15) self.min_match = 3 # Minimum match length def compress(self, input_string, verbose=False): data = input_string.encode(’utf-8’) output = bitarray(endian=’big’) i = 0 min_match = self.min_match lookahead = self.lookahead_buffer_size window_size = self.window_size # Dictionary mapping a pattern (of length min_match) to a list of positions dict_pat = {} # Helper to update the dictionary for a given position def update_dict(pos): if pos + min_match <= len(data): pat = data[pos:pos + min_match] if pat in dict_pat: dict_pat[pat].append(pos) else: 82 dict_pat[pat] = [pos] # Process each position in the data while i < len(data): best_distance = 0 best_length = 0 # Try to get a candidate match using the current min_match pattern if i + min_match <= len(data): pat = data[i:i + min_match] candidates = dict_pat.get(pat, []) # Iterate over candidate positions that are still in our window for candidate in candidates: if candidate < i - window_size: continue # candidate too old # Extend match length starting from candidate and current i length = min_match while (length < lookahead and i + length < len(data) and candidate + length < i and data[candidate + length] == data[i + length]) : length += 1 if length > best_length: best_length = length best_distance = i - candidate if best_length == lookahead: break # Found maximum possible match # If a valid match is found (we require at least min_match bytes) if best_length >= min_match: # Output a match token: 1-bit flag set to 1 output.append(1) # Pack distance (assume 12 bits) and (match length minus min_match) (4 bits) # We subtract min_match so that the stored value is in [0, lookahead - min_match] token_length = best_length - min_match byte1 = best_distance >> 4 # Upper 8 bits of 12bit 
distance byte2 = ((best_distance & 0xF) << 4) | (token_length & 0xF) output.frombytes(bytes([byte1, byte2])) if verbose: print(f"<1, {best_distance}, {best_length}>", end=' ') # Update dictionary for all positions in the matched segment for j in range(i, i + best_length): update_dict(j) i += best_length else: # No sufficient match found; output a literal token (flag 0 plus literal byte) output.append(0) output.frombytes(bytes([data[i]])) if verbose: print(f"<0, {chr(data[i])}>", end=' ') update_dict(i) i += 1 # Optionally, prune dictionary entries that are out of the window if i % 1000 == 0: for key in list(dict_pat.keys()): dict_pat[key] = [pos for pos in dict_pat[key] if pos >= i - window_size] if not dict_pat[key]: del dict_pat[key] output.fill() return output def decompress(self, bitstream, verbose=False): """ Decompresses a bitarray created by compress(). Expected format: - Literal: flag=0, followed by 8-bit literal. - Match: flag=1, followed by 2 bytes: * First 12 bits: distance * Last 4 bits: match length offset (actual match length = stored + min_match) """ output = bytearray() pointer = 0 min_match = self.min_match while pointer < len(bitstream) - 9: # Ensure at least enough bits for a token flag = bitstream[pointer] pointer += 1 if not flag: # Literal: next 8 bits if pointer + 8 > len(bitstream): break literal = bitstream[pointer:pointer+8].tobytes()[0] output.append(literal) pointer += 8 if verbose: print(f"Literal: {chr(literal)}", end=' ') else: # Match: next 16 bits (2 bytes) if pointer + 16 > len(bitstream): break byte1 = bitstream[pointer:pointer+8].tobytes()[0] byte2 = bitstream[pointer+8:pointer+16].tobytes()[0] pointer += 16 distance = (byte1 << 4) | (byte2 >> 4) token_length = byte2 & 0xF match_length = token_length + min_match if verbose: print(f"Match: distance={distance}, length={match_length}", end=' ') # Copy the matching substring using slice extension start = len(output) - distance output.extend(output[start:start+match_length]) 
return output.decode('utf-8', errors='ignore') A.11 LZW Script (Publisher/Subscriber) Listing 7: Publisher/Subscriber LZW Script """ LZW Compression and Decompression in Python This implementation is based on the C++ code from the GeeksforGeeks article: "LZW (Lempel-Ziv-Welch) Compression technique" URL: https://www.geeksforgeeks.org/lzw-lempel-ziv-welch-compression-technique/ """ import json import struct class LZW_Json_Compressor: def lzw_encoding(input_json): """ LZW Encoding for JSON objects (handles UTF-8 bytes). """ # Ensure input is serialized and UTF-8 encoded if isinstance(input_json, dict) or isinstance(input_json, list): input_json = json.dumps(input_json) # Convert to UTF-8 bytes input_bytes = input_json.encode('utf-8') table = {bytes([i]): i for i in range(256)} # LZW dictionary works on bytes p = b"" code = 256 output_code = [] for char in input_bytes: # Process each byte separately pc = p + bytes([char]) if pc in table: p = pc else: output_code.append(table[p]) table[pc] = code code += 1 p = bytes([char]) if p: output_code.append(table[p]) return output_code def lzw_decoding(compressed_bytes): """ LZW Decoding for JSON objects. Args: compressed_bytes (bytes): Compressed binary data. Returns: dict or list: Decoded JSON object. 
""" try: # Convert bytes back to a list of integers output_code = list(struct.unpack(f">{len(compressed_bytes) // 2}H", compressed_bytes)) # Rebuild the dictionary for LZW decompression table = {i: bytes([i]) for i in range(256)} # Store as bytes for UTF-8 safety old = output_code[0] decoded_bytes = bytearray(table[old]) # Use bytearray for correct UTF-8 handling count = 256 for code in output_code[1:]: if code in table: entry = table[code] elif code == count: entry = table[old] + bytes([table[old][0]]) # Special case: new sequence else: raise ValueError(f"Invalid LZW code: {code}") decoded_bytes.extend(entry) table[count] = table[old] + bytes([entry[0]]) count += 1 old = code # Convert decompressed bytes back to a string (UTF-8 safe) decoded_string = decoded_bytes.decode(’utf-8’) # Convert JSON string back to a Python object return json.loads(decoded_string) 86 except Exception as e: print(f"Decoding error: {e}") return None def lzw_decoding1(output_code): """ LZW Decoding for JSON objects. Args: output_code (list): Encoded output codes. Returns: dict or list: Decoded JSON object. """ # print("\nDecoding") table = {i: chr(i) for i in range(256)} with single-character mappings old = output_code[0] s = table[old] c = s[0] decoded_string = s # print(s, end="") count = 256 # Initialize dictionary for code in output_code[1:]: if code in table: entry = table[code] else: entry = table[old] + c print(entry, end="") decoded_string += entry c = entry[0] table[count] = table[old] + c count += 1 old = code # Deserialize string back into a JSON object return json.loads(decoded_string) A.12 Main Script (Subscriber) Listing 8: Main Subscriber Script # Import the Paho MQTT package. import paho.mqtt.client as mqtt from metricsCollector import MetricsCollector from subscriber_util import Decompressor from datetime import datetime import psycopg2 from bitarray import bitarray received_packets = 0 87 # The callback for when the client connects to the broker. 
def on_connect(client, userdata, flags, reason_code, properties): print("Connected To Broker Thesis") if reason_code == 0: print("Made successful connection") # After establishing a connection, subscribe to the input topic. client.subscribe("test_status/#") else: print(f"Failed to connect to Broker. Error code: {reason_code}") # The callback for when a message is received from the broker. def on_message(client, userdata, msg): global received_packets #compressed_payload = msg.payload received_packets += 1 print("Received compressed payload") print("Payload Size: ", len(msg.payload)) #algo = "LZW" packet_id = None timestamp = None topic_parts = msg.topic.split("/") if len(topic_parts) >= 4: packet_id = topic_parts[1] timestamp = topic_parts[2] algo = topic_parts[3] print(f" Received Packet ID: {packet_id}") print(f" Received Timestamp: {timestamp}") print(f" Received algorithm: {algo}") try: # Attempt decompression with each algorithm decompressed_data = None decompression_time = 0 algorithm_used = None if algo == "GZIP": try: decompressed_data, decompression_time = Decompressor. decompress_gzip(msg.payload) algorithm_used = "GZIP" print(f"Decompressed data (GZip)") except Exception: pass # Ignore and continue to the next algorithm if algo == "LZ77": # Stop trying if decompressed_data is set try: received_bits = bitarray(endian=’big’) received_bits.frombytes(msg.payload) decompressed_data, decompression_time = Decompressor. decompress_lz77(received_bits) algorithm_used = "LZ77" print(f"Decompressed data (LZ77)") except Exception as e: print(e) 88 pass # Ignore and continue if algo == "LZW": # Stop trying if decompressed_data is set try: decompressed_data, decompression_time = Decompressor. decompress_lzw(msg.payload) algorithm_used = "LZW" print(f"Decompressed data (LZW)") except Exception: pass # Ignore and continue if algo == "RAW": # Stop trying if decompressed_data is set try: print("Decompressing RAW") decompressed_data, decompression_time = Decompressor. 
read_raw_data(msg.payload) algorithm_used = "RAW" print(f"Raw JSON Data") except Exception: print("No algorithms left. Could not decompress.") return # Stop execution if no decompression was successful # Extract packet metadata # print("Packet: ", packet) print("Grabbing metrics") sent_timestamp = datetime.fromisoformat(timestamp.replace("Z", " +00:00")) compressed_size = len(msg.payload) # Calculate metrics latency = MetricsCollector.calculate_latency(sent_timestamp) throughput = MetricsCollector.calculate_throughput( compressed_size, latency) computed_checksum = MetricsCollector.data_integrity_check( decompressed_data, algo) # Print metrics print(f"Decompression Time: {decompression_time} seconds") print(f"Latency: {latency} seconds") print(f"Throughput: {throughput} bytes/sec") print(f"Data Integrity: {computed_checksum}") print(f"Packets received: {received_packets}") conn_string = "host=18.227.79.51 port=5432 dbname=ThesisResults user=postgres password=Beto201!" conn = psycopg2.connect(conn_string) cursor = conn.cursor() print("Connected to Postgres") # Phase1SubscriberMetrics, Phase2SubscriberMetrics insert_query = """ INSERT INTO Phase2SubscriberMetrics ( 89 Algorithm, Latency, DecompressionTime, Throughput, Crc32, PacketsReceived, PacketId ) VALUES (%s,%s,%s,%s,%s,%s,%s) """ data = ( algorithm_used, latency, decompression_time, throughput, computed_checksum, received_packets, packet_id ) cursor.execute(insert_query, data) conn.commit() except Exception as e: print(f"Error decompressing payload: {e}") if msg.payload == b"quit": client.disconnect() if __name__ == "__main__": # Define an Id for the client to use. Id = "" # Define the Ip address of the broker. Ip = "" # Create a client. client = mqtt.Client(mqtt.CallbackAPIVersion.VERSION2) # Set the callback functions of the client for connecting and incoming messages. client.on_connect = on_connect client.on_message = on_message # Then, connect to the broker. 
client.connect(Ip, 1883, 60) # Finally, process messages until a `client.disconnect()` is called. client.loop_forever() A.13 Subscriber Metrics Collector Script (Subscriber) Listing 9: Subscriber Metrics Collector Script from datetime import datetime, timezone from subscriber_util import Decompressor import json class MetricsCollector: @staticmethod def calculate_latency(sent_timestamp): try: # Convert both timestamps to seconds since the epoch sent_timestamp_seconds = sent_timestamp.timestamp() received_timestamp_seconds = datetime.now(timezone.utc).timestamp() # Calculate latency latency = received_timestamp_seconds - sent_timestamp_seconds # Debugging output print(f"Sent Timestamp (seconds): {sent_timestamp_seconds}") print(f"Received Timestamp (seconds): {received_timestamp_seconds}") print(f"Latency: {latency:.3f} seconds") return latency except Exception as e: print(f"Error in latency calculation: {e}") @staticmethod def calculate_throughput(compressed_size, latency): if latency > 0: return compressed_size / latency return 0.0 @staticmethod def data_integrity_check(decompressed_data, algo): computed_checksum = Decompressor.compute_crc32(decompressed_data, algo) return computed_checksum |
| Format | application/pdf |
| ARK | ark:/87278/s6xbk0nz |
| Setname | wsu_smt |
| ID | 153459 |
| Reference URL | https://digital.weber.edu/ark:/87278/s6xbk0nz |



