Simplified menu-driven data analysis tool with macro-like automation
- Authors: Kazembe, Luntha
- Date: 2022-10-14
- Subjects: Data analysis , Macro instructions (Electronic computers) , Quantitative research Software , Python (Computer program language) , Scripting languages (Computer science)
- Language: English
- Type: Academic theses , Master's theses , text
- Identifier: http://hdl.handle.net/10962/362905 , vital:65373
- Description: This study seeks to improve the data analysis process for individuals and small businesses with limited resources by developing a simplified data analysis software tool that allows users to carry out data analysis effectively and efficiently. Design considerations were identified to address limitations common in such environments. These included making the tool easy to use, requiring only a basic understanding of the data analysis process, designing the tool in a manner that minimises computing resource requirements and user interaction, and implementing it in Python, which is open-source, effective and efficient in processing data. We develop a prototype simplified data analysis tool as a proof of concept. The tool has two components: core elements, which provide functionality for the data analysis process including data collection, transformations, analysis and visualizations; and automation and performance enhancements to improve the data analysis process. The automation enhancements consist of the record and playback macro feature, while the performance enhancements include multiprocessing and multi-threading abilities. The data analysis software was developed to analyse various alpha-numeric data formats by using a variety of statistical and mathematical techniques. The record and playback macro feature enhances the data analysis process by saving users time and computing resources when analysing large volumes of data or carrying out repetitive data analysis tasks. The feature has two components, namely the record component, which is used to record data analysis steps, and the playback component, which is used to execute recorded steps. The simplified data analysis tool also implements parallelization, which allows users to carry out two or more analysis tasks at a time. This improves productivity, as users can do other tasks while the tool is processing data using recorded steps in the background.
The tool was created and subsequently tested using common analysis scenarios applied to network data, log data and stock data. Results show that decision-making requirements, such as accurate information, can be satisfied using this analysis tool. Based on the functionality implemented, the tool offers analysis functionality similar to that provided by Microsoft Excel, but in a simplified manner. Moreover, a more sophisticated macro functionality is provided for the execution of repetitive tasks using the recording feature. Overall, the study found that the simplified data analysis tool is functional, usable, scalable, efficient and can carry out multiple analysis tasks simultaneously. , Thesis (MSc) -- Faculty of Science, Computer Science, 2022
- Full Text:
- Date Issued: 2022-10-14
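The record and playback idea in the abstract above can be pictured with a minimal sketch. The thesis's actual design is not given in this record, so everything named here (`STEPS`, `MacroRecorder`, the three toy steps) is a hypothetical illustration: each step is stored as a name plus its parameters while it runs, and playback re-applies the stored steps in order to a new dataset.

```python
# Hypothetical registry of analysis steps; the names and behaviour are
# illustrative assumptions, not taken from the thesis.
STEPS = {
    "drop_missing": lambda rows: [r for r in rows if r is not None],
    "scale": lambda rows, factor=1.0: [r * factor for r in rows],
    "mean": lambda rows: sum(rows) / len(rows),
}


class MacroRecorder:
    """Runs analysis steps and remembers them so they can be replayed."""

    def __init__(self):
        # Each entry is (step name, keyword parameters used for that step).
        self.steps = []

    def run(self, name, data, **params):
        """Record component: log the step, then execute it on the data."""
        self.steps.append((name, params))
        return STEPS[name](data, **params)

    def playback(self, data):
        """Playback component: re-apply every recorded step, in order."""
        for name, params in self.steps:
            data = STEPS[name](data, **params)
        return data


recorder = MacroRecorder()
cleaned = recorder.run("drop_missing", [1.0, None, 3.0])
scaled = recorder.run("scale", cleaned, factor=2.0)
first_result = recorder.run("mean", scaled)

# The same three steps are replayed on a different dataset without
# the user repeating them by hand.
second_result = recorder.playback([2.0, None, None, 4.0])
```

Storing step names and parameters rather than results is what makes the macro reusable on other inputs; a real tool would also need to persist the recorded list and handle failures part-way through playback.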
Large and multi scale mechanistic modeling of Diels-Alder reactions
- Authors: Isamura, Bienfait Kabuyaya
- Date: 2022-04-06
- Subjects: Computational chemistry , Diels-Alder reaction , Python (Computer program language) , Reaction force theory , Fullerenes , Diolefins , AMADAR (Automated workflow for Mechanistic Analysis of Diels-Alder Reactions) , ONIOM
- Language: English
- Type: Master's thesis , text
- Identifier: http://hdl.handle.net/10962/232317 , vital:49981
- Description: The [4+2] cycloaddition reaction between conjugated dienes and substituted alkenes is known as the Diels-Alder (DA) reaction, in honor of two German chemists, Otto Diels and Kurt Alder, who first reported this marvelous chemical transformation. The DA reaction is one of the most popular reactions in organic chemistry, allowing for the regio- and stereospecific establishment of six-membered rings with up to four stereogenic centers. This pericyclic reaction has found many applications in areas as diverse as natural products chemistry, polymer chemistry, and agrochemistry. Over the past decades, the mechanism of the DA reaction has been the subject of numerous studies, dealing with questions as diverse as the mechanistic pathway, the synchronicity, the use of catalysts, and the effect of solvents and salts. Fullerenes, and particularly [60]fullerene, have been found to act as good dienophiles in DA reactions, to the extent that many functionalized fullerenes with interesting applications are still synthesized by reacting C60 with dienes. However, despite the abundant literature on the mechanism of the DA reaction, some pertinent questions are still pending, including the prediction of transition state (TS) geometries and the modeling of DA reactions involving large systems, such as those of C60 fullerene. TSs are not easy to predict, mainly because many existing algorithms require that the search be initiated from a good starting point (a guess TS) that is very similar to the actual TS. This problem is even more difficult when many TSs are to be located, as may be the case in large-scale studies.
Moreover, due to the large size of the C60 molecule, the use of accurate high-level computational methods in the investigation of its reactivity towards dienes is computationally costly, implying the need to find the best compromise between accuracy and computational cost. Therefore, the present study was carried out to contribute to solving the problems of large-scale prediction of DA transition state geometries and the multi-scale modeling of C60 fullerene DA reactions. To address the first problem (large-scale prediction of TSs), we have developed a Python program named “AMADAR”, which predicts an unlimited number of DA transition states, using only the SMILES strings of the cycloadducts. AMADAR is customizable and allows for the description of intramolecular DA reactions as well as systems resulting in competing paths. In addition, the AMADAR tool contains two separate modules that perform reaction force analyses and atomic decomposition of energy derivatives from the predicted Intrinsic Reaction Coordinate (IRC) paths. The performance of AMADAR was assessed using 2000 DA cycloadducts and showed a success rate of ~95%. Most of the errors were due to basis set inconsistencies or convergence issues that we are still working on. Furthermore, a set of 150 IRC paths generated by the AMADAR program were analyzed to get insight into the (a)synchronicity of DA reactions. This investigation confirmed that the reaction force constant 𝜅 (the second derivative of the system energy with respect to the reaction coordinate) was a good indicator of synchronicity in DA reactions. A close inspection of the profile of 𝜅 has enabled us to propose an alternative classification of DA reactions based on their degree of synchronicity, in terms of (quasi)-synchronous, moderately asynchronous, asynchronous, and likely two-step DA reactions.
Natural population analyses seemed to indicate that the global maximum of the reaction force constant could be identified with the formation of all the bonds in the reaction site. Finally, the atomic resolution of energy derivatives suggested that the mechanism of the DA reaction involves two inner elementary processes associated with the formation of each C-C bond. A striking mechanistic difference between synchronous and asynchronous DA reactions emerging from this study is that, in asynchronous reactions, the driving and retarding forces are mainly caused by the fast and slow-forming bonds (elementary process) respectively, while in the case of synchronous ones both elementary processes retard and drive the process concomitantly and equivalently. Regarding the DA reaction of C60 fullerene that was considered to illustrate the problem of multiscale modeling, we have constructed 12 ONIOM2 and 10 ONIOM3 models combining five semi-empirical methods (AM1, PM3, PM3MM, PDDG, PM6) and the LDA(SVWN) functional in conjunction with the B3LYP/6-31G(d) level. Then, their accuracy and efficiency were assessed in comparison with the pure B3LYP/6-31G(d) level considering first the DA reaction between C60 and cyclopentadiene, for which experimental data are available. Further, different DFT functionals were employed in place of the B3LYP functional to describe the higher layer of the best ONIOM partition, and the results obtained were compared to experimental data. At this step, the ONIOM2(M06-2X/6-31G(d):SVWN/STO-3G) model, where the higher layer encompasses the diene and pyracyclene portion of C60, was found to provide the best tradeoff between accuracy and cost, with respect to experimental data. This model showed errors lower than 2.6 and 2.0 kcal/mol for the estimation of the activation and reaction enthalpies respectively.
We have also demonstrated, by comparing several ONIOM2(DFT/6-31G(d):SVWN/STO-3G) models, the importance of dispersion corrections in the accurate estimation of reaction and activation energies. Finally, we have considered a set of 21 dienes, including anthracene, 1,3-butadiene, 1,3-cyclopentadiene, furan, thiophene, selenothiophene, pyrrole and their mono-cyano and hydroxyl derivatives to get insight into the DA reaction of C60 using the best ONIOM2(M06-2X/6-31G(d):SVWN/STO-3G) model. For a given diene and its derivatives, the analysis of frontier molecular orbitals provides a consistent explanation for the substituent effect on the activation barrier. It revealed that electron-donating (withdrawing) groups such as –OH (–CN) lower (raise) the activation barrier of the reaction by narrowing (widening) the HOMO(diene)–LUMO(C60) gap and consequently enhancing (weakening) the interaction between the two reactants. Further, the decomposition of the activation energy into the strain and interaction components suggested that, for a given diene, electron-donating groups (here –OH) diminish the height of the activation barrier not only by favoring the attractive interaction between the diene and C60, but also by reducing the strain energy of the system; the opposite effect is observed for electron-withdrawing groups (here –CN). In contrast with some previous findings on typical DA reactions, we could not infer any general rule applicable to the entire dataset for the prediction of activation energies, because the latter do not correlate well with any of the TS polarity, the electrophilicity of the diene, or the reaction energy. , Thesis (MSc) -- Faculty of Science, Chemistry, 2022
- Full Text:
- Date Issued: 2022-04-06
Finite precision arithmetic in Polyphase Filterbank implementations
- Authors: Myburgh, Talon
- Date: 2020
- Subjects: Radio interferometers , Interferometry , Radio telescopes , Gate array circuits , Floating-point arithmetic , Python (Computer program language) , Polyphase Filterbank , Finite precision arithmetic , MeerKAT
- Language: English
- Type: text , Thesis , Masters , MSc
- Identifier: http://hdl.handle.net/10962/146187 , vital:38503
- Description: The MeerKAT is the most sensitive radio telescope in its class, and it is important that systematic effects do not limit the dynamic range of the instrument, preventing this sensitivity from being harnessed for deep integrations. During commissioning, spurious artefacts were noted in the MeerKAT passband and the root cause was attributed to systematic errors in the digital signal path. Finite precision arithmetic used by the Polyphase Filterbank (PFB) was one of the main factors contributing to the spurious responses, together with bugs in the firmware. This thesis describes a software PFB simulator that was built to mimic the MeerKAT PFB and allow investigation into the origin and mitigation of the effects seen on the telescope. This simulator was used to investigate the effects on signal integrity of various rounding techniques, overflow strategies and dual polarisation processing in the PFB. Simulating a number of different signal levels, bit-widths and algorithmic scenarios gave insight into how the periodic dips occurring in the MeerKAT passband resulted from the implementation using an inappropriate rounding strategy. It further indicated how to select the best strategy for preventing overflow while maintaining high quantization efficiency in the FFT. This practice of simulating the design behaviour in the PFB independently of the tools used to design the DSP firmware is a step towards an end-to-end simulation of the MeerKAT system (or any radio telescope using finite precision digital signal processing systems). This would be useful for design, diagnostics, signal analysis and prototyping of the overall instrument.
- Full Text:
- Date Issued: 2020
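To make the rounding question in the abstract above concrete, the sketch below compares three common ways of discarding low-order bits from a fixed-point integer. The record does not say which schemes the thesis studied or which one the MeerKAT firmware used, so the three functions and the test range are assumptions chosen for illustration. The quantity of interest is the mean rounding error: a non-zero mean is a systematic bias of the kind that can accumulate through a signal chain.

```python
def truncate(value, shift):
    """Drop the low `shift` bits (rounds towards negative infinity)."""
    return value >> shift


def round_half_up(value, shift):
    """Add half an output step, then drop the low bits (ties go upwards)."""
    return (value + (1 << (shift - 1))) >> shift


def round_half_even(value, shift):
    """Convergent rounding: ties go to the nearest even output value."""
    floor = value >> shift
    remainder = value & ((1 << shift) - 1)  # discarded bits, always >= 0
    half = 1 << (shift - 1)
    if remainder > half or (remainder == half and floor & 1):
        floor += 1
    return floor


def mean_error(rounder, shift, values):
    """Average of (rounded result - exact result) over the input values."""
    scale = 1 << shift
    return sum(rounder(v, shift) - v / scale for v in values) / len(values)


# Illustrative input: every integer in a symmetric range, 4 bits discarded.
values = range(-1024, 1024)
bias_truncate = mean_error(truncate, 4, values)
bias_half_up = mean_error(round_half_up, 4, values)
bias_half_even = mean_error(round_half_even, 4, values)
```

On this input, truncation is biased by almost half an output step, round-half-up keeps a small positive bias from the tie case, and round-half-even averages out to zero because ties are split between rounding up and rounding down.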
CubiCal: a fast radio interferometric calibration suite exploiting complex optimisation
- Authors: Kenyon, Jonathan
- Date: 2019
- Subjects: Interferometry , Radio astronomy , Python (Computer program language) , Square Kilometre Array (Project)
- Language: English
- Type: text , Thesis , Doctoral , PhD
- Identifier: http://hdl.handle.net/10962/92341 , vital:30711
- Description: The advent of the Square Kilometre Array and its precursors marks the start of an exciting era for radio interferometry. However, with new instruments producing unprecedented quantities of data, many existing calibration algorithms and implementations will be hard-pressed to keep up. Fortunately, it has recently been shown that the radio interferometric calibration problem can be expressed concisely using the ideas of complex optimisation. The resulting framework exposes properties of the calibration problem which can be exploited to accelerate traditional non-linear least squares algorithms. We extend the existing work on the topic by considering the more general problem of calibrating a Jones chain: the product of several unknown gain terms. We also derive specialised solvers for performing phase-only, delay and pointing error calibration. In doing so, we devise a method for determining update rules for arbitrary, real-valued parametrisations of a complex gain. The solvers are implemented in an optimised Python package called CubiCal. CubiCal makes use of Cython to generate fast C and C++ routines for performing computationally demanding tasks whilst leveraging multiprocessing and shared memory to take advantage of modern, parallel hardware. The package is fully compatible with the measurement set, the most common format for interferometer data, and is well integrated with Montblanc, a third-party package which implements optimised model visibility prediction. CubiCal's calibration routines are applied successfully to both simulated and real data for the field surrounding source 3C147. These tests include direction-independent and direction-dependent calibration, as well as tests of the specialised solvers. Finally, we conduct extensive performance benchmarks and verify that CubiCal convincingly outperforms its most comparable competitor.
- Full Text:
- Date Issued: 2019
De-identification of personal information for use in software testing to ensure compliance with the Protection of Personal Information Act
- Authors: Mark, Stephen John
- Date: 2018
- Subjects: Data processing , Information technology -- Security measures , Computer security -- South Africa , Data protection -- Law and legislation -- South Africa , Data encryption (Computer science) , Python (Computer program language) , SQL (Computer program language) , Protection of Personal Information Act (POPI)
- Language: English
- Type: text , Thesis , Masters , MSc
- Identifier: http://hdl.handle.net/10962/63888 , vital:28503
- Description: Encryption of Personally Identifiable Information stored in a Structured Query Language Database has been difficult for a long time. This is owing to block-cipher encryption algorithms changing the length and type of the input data when encrypted, so that the output cannot subsequently be stored in the database without altering its structure. As the enactment of the South African Protection of Personal Information Act, No 4 of 2013 (POPI), was set in motion with the appointment of the Information Regulator’s Office in December 2016, South African companies are intensely focused on implementing compliance strategies and processes. The legislation, promulgated in 2013, encompasses the processing and storage of personally identifiable information (PII), ensuring that corporations act responsibly when collecting, storing and using individuals’ personal data. The Act comprises eight broad conditions that will become legislation once the new Information Regulator’s office is fully equipped to carry out its duties. POPI requires that individuals’ data should be kept confidential from all but those who specifically have permission to access the data. This means that not all members of IT teams should have access to the data unless it has been de-identified. This study tests an implementation of the Fixed Feistel 1 algorithm from the National Institute of Standards and Technology (NIST) “Special Publication 800-38G: Recommendation for Block Cipher Modes of Operation: Methods for Format-Preserving Encryption” using the LibFFX Python library. The Python scripting language was used for the experiments. The research shows that it is indeed possible to encrypt data in a Structured Query Language Database without changing the database schema using the new Format-Preserving encryption technique from NIST800-38G. Quality Assurance software testers can then run their full set of tests on the encrypted database.
There is no reduction of encryption strength when using the FF1 encryption technique, compared to the underlying AES-128 encryption algorithm. It further shows that the utility of the data is not lost once it is encrypted.
- Full Text:
- Date Issued: 2018
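The property the abstract above relies on, that ciphertext has the same length and character type as the plaintext and so fits the existing database column, can be shown with a toy Feistel construction over decimal digit strings. This is emphatically not FF1 and does not use LibFFX: the round function here is HMAC-SHA256 rather than the AES-based function NIST specifies, the function names are hypothetical, and the code is for illustration only and must not be used to protect real data.

```python
import hashlib
import hmac

ROUNDS = 10  # must be even so both halves end with their original widths


def _round_value(key, round_no, half, modulus):
    """Toy keyed round function: HMAC-SHA256 reduced to the given modulus."""
    message = f"{round_no}:{half}".encode()
    digest = hmac.new(key, message, hashlib.sha256).digest()
    return int.from_bytes(digest, "big") % modulus


def _crypt(digits, key, decrypt):
    if len(digits) < 2 or not (digits.isascii() and digits.isdigit()):
        raise ValueError("expected a string of at least two decimal digits")
    u = len(digits) // 2          # width of the left half
    v = len(digits) - u           # width of the right half
    a, b = int(digits[:u]), int(digits[u:])
    order = range(ROUNDS - 1, -1, -1) if decrypt else range(ROUNDS)
    for i in order:
        modulus = 10 ** (u if i % 2 == 0 else v)
        if decrypt:
            # Undo one round: recover the previous left half by subtraction.
            a, b = (b - _round_value(key, i, a, modulus)) % modulus, a
        else:
            # One round: swap halves and mix with modular addition.
            a, b = b, (a + _round_value(key, i, b, modulus)) % modulus
    # Zero-pad so the output has exactly as many digits as the input.
    return f"{a:0{u}d}{b:0{v}d}"


def encrypt_digits(digits, key):
    return _crypt(digits, key, decrypt=False)


def decrypt_digits(digits, key):
    return _crypt(digits, key, decrypt=True)


key = b"demo key, not a real secret"
plaintext = "0123456789012"          # made-up 13-digit value
ciphertext = encrypt_digits(plaintext, key)
recovered = decrypt_digits(ciphertext, key)
```

Because the output is still a 13-digit string, it can be written back to a fixed-width numeric or character column unchanged, which is the schema-preserving behaviour the study tested with the real FF1 algorithm.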
NetwIOC: a framework for the automated generation of network-based IOCS for malware information sharing and defence
- Authors: Rudman, Lauren Lynne
- Date: 2018
- Subjects: Malware (Computer software) , Computer networks Security measures , Computer security , Python (Computer program language)
- Language: English
- Type: text , Thesis , Masters , MSc
- Identifier: http://hdl.handle.net/10962/60639 , vital:27809
- Description: With the substantial number of new malware variants found each day, it is useful to have an efficient way to retrieve Indicators of Compromise (IOCs) from the malware in a format suitable for sharing and detection. In the past, these indicators were manually created after inspection of binary samples and network traffic. The Cuckoo Sandbox is an existing dynamic malware analysis system which meets the requirements for the proposed framework and was extended by adding a few custom modules. This research explored a way to automate the generation of detailed network-based IOCs in a popular format which can be used for sharing. This was done through careful filtering and analysis of the PCAP file generated by the sandbox, and placing these values into the correct type of STIX objects using Python. Through several evaluations, analysis of what type of network traffic can be expected for the creation of IOCs was conducted, including a brief case study that examined the effect of analysis time on the number of IOCs created. Using the automatically generated IOCs to create defence and detection mechanisms for the network was evaluated and proved successful. A proof-of-concept sharing platform developed for the STIX IOCs is showcased at the end of the research.
- Full Text:
- Date Issued: 2018