Prof. Gerhard Trippen delivering Workshop 3 in the 2022-2023 IMI BIGDataAIHUB’s Case Competition at the University of Toronto Mississauga. Topic: Pandas Advanced Data Manipulation and Text Analytics.

Pandas Advanced Data Manipulation and Text Analytics with Prof. Gerhard Trippen - IMI BIGDataAIHUB Technical Workshops

On January 10, Professor Gerhard Trippen led the third Technical Workshop of the 2022-2023 IMI BIGDataAIHUB Case Competition. The workshop covered text-processing techniques to help competition participants analyze their text data.

 


Adding a CSV File

The workshop began with an explanation of how to add a CSV file to JupyterHub, along with some practical tips. For example, when loading a file, make sure the filename in your code matches the uploaded file exactly, as Python will not correct any spelling errors.
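The filename tip can be illustrated with a minimal sketch. The file name "reviews.csv" and its columns are hypothetical and were not part of the workshop; the example writes its own small file so that it runs anywhere.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Create a tiny CSV in a temporary folder so the example is self-contained.
# "reviews.csv" and its columns are hypothetical, not from the workshop.
workdir = Path(tempfile.mkdtemp())
csv_path = workdir / "reviews.csv"
csv_path.write_text("id,comment\n1,Great product\n2,Too expensive\n", encoding="utf-8")

# Loading works when the filename matches exactly.
df = pd.read_csv(csv_path)

# A misspelled filename is not corrected; pandas raises FileNotFoundError.
try:
    pd.read_csv(workdir / "review.csv")  # missing the final "s"
    misspelled_loaded = True
except FileNotFoundError:
    misspelled_loaded = False

# Listing the folder is a quick way to check the exact spelling.
available = sorted(p.name for p in workdir.iterdir())
print(df)
print("Files available:", available)
```

Listing the folder contents before loading is one simple way to catch a misspelled name early.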

 

String Methods

Professor Trippen then provided several examples of string methods and functions for manipulating text columns: stripping extra spaces, converting text to lower case, removing unwanted punctuation, and removing “stop words” such as “the.”
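As a rough sketch of how these steps might be chained on a pandas text column (the sample comments and the very short stop-word list are made up for illustration and are not taken from the workshop):

```python
import string

import pandas as pd

# Hypothetical sample data for illustration only.
df = pd.DataFrame(
    {"comment": ["  The product is GREAT!!  ", "Not worth the price, sadly."]}
)

# A deliberately tiny, hypothetical stop-word list.
stop_words = {"the", "is", "a", "an"}

# Translation table that deletes every ASCII punctuation character.
remove_punctuation = str.maketrans("", "", string.punctuation)

cleaned = (
    df["comment"]
    .str.strip()                        # drop leading and trailing spaces
    .str.lower()                        # convert to lower case
    .str.translate(remove_punctuation)  # remove punctuation
)

# Drop stop words by splitting each comment into words and rejoining.
df["clean"] = cleaned.apply(
    lambda text: " ".join(word for word in text.split() if word not in stop_words)
)
print(df["clean"].tolist())
```

In practice a fuller stop-word list would be needed; the four words above only show the mechanics.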

 

Regular Expressions

Professor Trippen gave a basic introduction to regular expressions (RegExes), which are patterns that describe the strings of text to match. For further practice with regular expressions, visit regex101.com. The site lets you choose a flavor, such as Python or Java, and as you try different RegExes it explains what each character in the pattern does.
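A few small patterns using Python's built-in re module give a feel for what a RegEx can match. The sample sentence and the patterns are illustrative assumptions, not examples from the workshop.

```python
import re

# Hypothetical sample text for illustration only.
text = "Order #A-1023 shipped on 2023-01-10; order #B-77 is delayed."

# \d{4}-\d{2}-\d{2}: four digits, a hyphen, two digits, a hyphen, two digits.
date_pattern = re.compile(r"\d{4}-\d{2}-\d{2}")
dates = date_pattern.findall(text)

# #([A-Z])-(\d+): a "#", one capital letter, a hyphen, then one or more digits.
# The parentheses capture the letter and the number separately.
order_pattern = re.compile(r"#([A-Z])-(\d+)")
orders = order_pattern.findall(text)

# \s+ matches one or more whitespace characters; re.sub replaces each run.
collapsed = re.sub(r"\s+", " ", "too    many   spaces")

print(dates)
print(orders)
print(collapsed)
```

Pasting any of these patterns into regex101.com with the Python flavor selected shows a piece-by-piece explanation.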

 

Cleaning up Text and Implementing a Spell Checker

Professor Trippen demonstrated how to edit text by deleting characters, fixing transposed characters, and replacing characters. Finally, he described the steps that can be taken to build a spell checker from scratch.
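One common way to combine these edit operations into a simple spell checker is sketched below. This is not necessarily the approach shown in the workshop: the toy corpus, the function names, and the added insertion step are all assumptions made for illustration.

```python
import string
from collections import Counter

# Hypothetical toy corpus; a real checker would count words in a large text.
corpus = "the data is clean the data is large the model reads the text data"
word_counts = Counter(corpus.split())


def edits1(word):
    """Return every string that is one simple edit away from `word`."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    # Delete one character.
    deletes = [left + right[1:] for left, right in splits if right]
    # Swap two adjacent characters (fixes transpositions).
    transposes = [
        left + right[1] + right[0] + right[2:]
        for left, right in splits
        if len(right) > 1
    ]
    # Replace one character with another letter.
    replaces = [
        left + c + right[1:] for left, right in splits if right for c in letters
    ]
    # Insert one letter (an addition beyond the three edits named in the text).
    inserts = [left + c + right for left, right in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)


def correct(word):
    """Return the most frequent known word within one edit, else the word itself."""
    if word in word_counts:
        return word
    candidates = [w for w in edits1(word) if w in word_counts]
    if not candidates:
        return word
    # Prefer the most frequent candidate; break ties alphabetically.
    return max(candidates, key=lambda w: (word_counts[w], w))


for typo in ["teh", "dta", "cleam", "modell"]:
    print(typo, "->", correct(typo))
```

The sketch only looks one edit away and ranks candidates by raw frequency, so it is a starting point, not a production-quality checker.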