Music Mashups
Project Managers: Sabina Han; and Aryan Thanki |
|
|
Key Idea: Using content analysis of songs—musical key, beats per minute, etc.—and audio manipulation techniques such as pitch-shifting and sound-source separation, develop an environment that assists users in creating a musical mashup.

You've listened to the example mashups through the above playlist link. Now think about a computer-assisted environment that would support the identification and blending of different songs into such mashups. As suggested in the "Key Idea" text, an extremely useful capability, which could help identify songs that are suitable to mash up, is audio content analysis. Further, in situations where songs might not have the same tempo or be in the same key, but follow the same relative chord progression, audio processing techniques such as pitch-shifting and time-stretching could be used to massage the audio from the different sources so that they fit together harmoniously. Links are provided below to software libraries that include such capabilities.

As a way of boosting access to content where music content analysis has already been applied, a project worth taking a look at is AcousticBrainz, a companion project to MusicBrainz. These projects step you into the world of Linked Open Data. A range of software technologies has been developed to support Linked Open Data; one of them is SPARQL (pronounced "sparkle"). Having stored your Knowledge Graph in a Triplestore, you can then search it using SPARQL queries. In the case of AcousticBrainz, this means access to extracted musical features such as key and beats per minute (bpm), as well as higher-level musical content such as genre. MusicBrainz stores details on more than 44 million tracks, and AcousticBrainz has computed features for some 30 million of them! So, plenty to choose from to develop compatible, harmonious mashups! You will also want to be able to cut and mix audio files together.
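To make the idea of using extracted features to find mashup candidates concrete, here is a minimal sketch of screening track pairs by tempo and key. The feature dictionaries, key names, and tolerance value are invented for illustration; real feature data would come from a source such as AcousticBrainz.

```python
# Sketch: screening candidate track pairs for mashup compatibility, using
# the kinds of features (key, bpm) that music content analysis produces.
# All values below are illustrative assumptions, not real data.

# Treat relative major/minor pairs as compatible (e.g. C major with A minor).
RELATIVE = {"C major": "A minor", "G major": "E minor", "D major": "B minor"}
RELATIVE.update({v: k for k, v in RELATIVE.items()})

def compatible(track_a, track_b, bpm_tolerance=0.08):
    """True if two tracks are close enough in tempo and key to blend
    without heavy pitch-shifting or time-stretching."""
    slower, faster = sorted([track_a["bpm"], track_b["bpm"]])
    tempo_ok = (faster - slower) / slower <= bpm_tolerance
    key_ok = (track_a["key"] == track_b["key"]
              or RELATIVE.get(track_a["key"]) == track_b["key"])
    return tempo_ok and key_ok

a = {"title": "Song A", "key": "C major", "bpm": 120}
b = {"title": "Song B", "key": "A minor", "bpm": 124}
c = {"title": "Song C", "key": "F# major", "bpm": 95}

print(compatible(a, b))  # True: close tempo, relative keys
print(compatible(a, c))  # False: tempo gap is too large
```

A fuller version would also consider tracks a pitch-shift or time-stretch away from compatibility, rather than only near-matches.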
There are plenty of libraries around that let you read and write audio formats such as WAV and MP3. The basic act of cutting, copying, and mixing is fairly straightforward to implement yourself: it essentially boils down to manipulating arrays of numbers. If implementing it yourself, however, you do need to pay attention to issues such as precision and arithmetic overflow, which can lead to audible artefacts such as clipping. In the long run, a "roll your own" approach to implementation will likely become a notable time-sink as you encounter and work through such issues. Drawing upon the subject of Digital Signal Processing (DSP), there is much more that can be done to manipulate audio than mixing, such as applying filters to change its acoustic properties. For any popular programming language you care to name, there will be a selection of DSP audio processing libraries available. These will do much of the heavy lifting for you—although it should be noted this is not a zero-cost move, as you will now need to invest time in learning how they work. Given that cutting and combining different audio tracks is also part of what is required for this project, it is recommended that the team undertaking it also review open-source audio/music editing applications. It could very well be that such software includes audio processing capabilities—quite likely built on these same general-purpose audio processing libraries. Note: While the emphasis for this project is primarily on audio processing, if a team member wanted to take on generating video to play along with the audio mashup, then that would definitely fit within the overall aims of the project. Potentially useful resources:
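As a toy illustration of the overflow/clipping point above: summing two 16-bit samples can exceed the representable range, so a mixer needs to accumulate at higher precision and clamp on output. The sample values and gains here are invented.

```python
# Sketch of mixing two mono 16-bit sample streams. Summing two int16
# samples can overflow [-32768, 32767], so we accumulate in plain Python
# ints and clamp ("hard clip") back into range on output.

INT16_MIN, INT16_MAX = -32768, 32767

def mix(samples_a, samples_b, gain_a=0.5, gain_b=0.5):
    """Mix two equal-length lists of 16-bit samples.
    Scaling each input by 0.5 before summing avoids clipping entirely;
    with unity gains the clamp below engages instead."""
    out = []
    for a, b in zip(samples_a, samples_b):
        s = int(a * gain_a + b * gain_b)
        out.append(max(INT16_MIN, min(INT16_MAX, s)))
    return out

loud = [30000, -30000, 25000]
also_loud = [20000, -20000, 25000]
print(mix(loud, also_loud))            # [25000, -25000, 25000]
print(mix(loud, also_loud, 1.0, 1.0))  # [32767, -32768, 32767] (clipped)
```

A DSP library would handle this (and resampling, filtering, etc.) for you, but the sketch shows why naive int16 addition goes wrong.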
Additional Note 1: The Eurovision Linked Open Data Digital Library demoed in class is an example of a project that leverages Linked Open Data to form the Digital Library. It also re-expresses the amassed data as more nuanced Linked Open Data, which now includes musical data such as the tempo of the various songs, along with the voting data. It is this that drives the visualisation components of the site (such as those available through the Visualizer tab), as well as providing a SPARQL endpoint for others to use. Additional Note 2: The tools developed to support Linked Open Data, such as SPARQL, are applicable to a wide range of subject material, not just the music domain. This should be borne in mind when reading the other project descriptions, as these tools could equally be used to help develop solutions to other projects listed here, such as SuppSense and Online Financial Scams. |
|
Having the [Student] Time of Your Life
Project Managers: Hans Lomboy; and Justin Poutoa |
|
|
Key Idea: Create a digital environment that enables you to better organise your time and studies at university. Studying at Waikato—like at any other university—requires accessing information spread across multiple web-based systems. Keeping on top of the competing time commitments of student life depends on having a clear view of what needs attention at any given time. While the necessary information is technically available, its distribution across disparate systems makes it both tedious to piece together and easy to overlook important details—sometimes with dire consequences! The goal of this project is to develop a student-centric digital environment that consolidates key information into one place. More than just bringing information together, the system will actively support planning and managing commitments—so you can be truly Having the [Student] Time of your Life. In terms of an implementation strategy, one potentially useful approach is user-scripting. This is a technique that lets you splice bespoke JavaScript that you control into your web browser, where it gets run when you visit other people's websites. You can achieve this by installing a browser extension such as TamperMonkey. You are then free to install whatever userscripts you see fit, which are typically keyed to spring into life when you visit a website that matches certain regular expressions. As you are able to specify the JavaScript you would like to run, you are able to access the Document Object Model (DOM) of the webpage that has been loaded. Connect this with a backend server that is under your control, and you have a way to access the data across the various university web-based systems and export the relevant data—via your backend server—into a single unified location, which forms the source for your student-centric app. Potentially useful topics and resources:
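To illustrate the "single unified location" idea, here is a minimal backend-side sketch that normalises records gathered from different systems into one deadline-ordered view. The system names, item fields, and dates are all invented for illustration.

```python
# Sketch: consolidating items scraped from several university web systems
# (e.g. posted to your backend by userscripts) into one unified,
# deadline-ordered view. Sources and fields below are illustrative.
from datetime import date

moodle_items = [{"title": "COMP235 Assignment 2", "due": date(2025, 8, 22)}]
panopto_items = [{"title": "Watch lecture recording", "due": date(2025, 8, 19)}]
timetable_items = [{"title": "COMP235 test", "due": date(2025, 8, 25)}]

def unified_view(*sources):
    """Merge items from any number of systems into one list, ordered by
    due date, so the most pressing commitment is always first."""
    merged = [item for source in sources for item in source]
    return sorted(merged, key=lambda item: item["due"])

for item in unified_view(moodle_items, panopto_items, timetable_items):
    print(item["due"], item["title"])
```

In practice each source would need its own scraper/normaliser, but the unified store and view is where the student-centric value lives.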
|
|
Uni Carparking: Find the Gap
Project Managers: Heath Carter; and Edward Wilson |
|
|
Key Idea: Create a mobile app that, in real time, helps you decide where to park at the university. Have you noticed the increase, from last year to this year, in the number of cars using the general parking areas on the university campus? This can make planning where to park when you come onto campus a lot more fraught. Gates 1 and 2b are convenient spots to park for lectures in S-block and L-block, but is it worth the risk if you're arriving with only, say, 10 minutes to spare? If no spaces are free, then the cost of heading somewhere else to park will likely mean you miss the start of the lecture. The goal of this project is to develop a software solution that helps alleviate this problem. Even if the university were to move to a system where every parking space has a sensor, so each gate entrance can display how many free spaces that area has, this would not likely resolve the problem effectively. A typical university day is characterised by high activity around the start of each hour. At such times, the number of available spaces displayed as you enter the parking area is a poor guide to the reality on the ground: it doesn't account for the cars queued ahead of you, and unless enough cars are also leaving in the next few minutes, your excursion into the parking area will prove fruitless! Taking a multi-pronged approach is probably best. In lieu of having a sensor for each carpark, there are Computer Vision techniques that can detect vehicles in an image. Alternatively, an IoT (Internet of Things) approach could be taken, where the breaking of a sensor beam detects a passing vehicle. Either of these approaches could be used to count the number of cars arriving at and leaving an area. This could then be used to model the changing patterns in utilisation of the parking areas. Add to this a mobile app that students can install. At its most basic level, this app could exploit GPS to record where and when they arrive at the university.
Analysis of such data could be used to determine whether they were successful in finding a park, or else had to head elsewhere. Further, if successfully parked on campus, as they get out of the car the app could prompt them to give feedback on how full the car park was getting. This—and any other techniques the team can devise to estimate car parking availability—can then be used to provide the travel planning part of the mobile app: when you are leaving, using a street-map based view, the app can factor in how long it will take you to get to university, and from there show you the recommended area to head for on campus, and the route to take to get there. Potentially useful resources:
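As a toy illustration of the counting approach described above, the arrival/departure events from a gate sensor can be folded into a running occupancy estimate. The capacity and event stream here are invented values.

```python
# Sketch: modelling car-park utilisation from entry/exit counts, as might
# be produced by a beam sensor or computer-vision counter at a gate.
# Capacity and the event stream are made-up illustrative values.

def occupancy_over_time(events, capacity, start=0):
    """Given a time-ordered list of "in"/"out" events, return the running
    count of occupied spaces, clamped to [0, capacity] to absorb missed
    detections."""
    counts, n = [], start
    for event in events:
        n += 1 if event == "in" else -1
        n = max(0, min(capacity, n))
        counts.append(n)
    return counts

events = ["in", "in", "in", "out", "in", "in"]
print(occupancy_over_time(events, capacity=4))  # [1, 2, 3, 2, 3, 4]
```

A real model would add timestamps and learn the hourly patterns, so the app can predict availability at your arrival time rather than just report the current count.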
|
|
Battle Fleet Sweep
Project Managers: Caleb Gilchrist; and Daniel Jensen |
|
|
Key Idea: Develop a video game that is a hybrid of the Battleships and Minesweeper games. My suggestion for a game that makes use of the respective game mechanics of Battleships and Minesweeper is to imagine a board where you have deployed your fleet on the left-hand side. Ahead, on the right-hand side of the board, is where your opponent has deployed a set of mines. During gameplay, when it is your turn, you can either send over a missile to see if you can hit one of your opponent's ships, or click on a grid cell in your area of the ocean where the mines have been deployed to reveal information about mine proximity to that cell. Following the missile firing or mine-sweeping operation, you then have to choose one of your ships to move: diagonally forward and up one square, directly forward, or diagonally forward and down one square. If that move takes you into the path of a mine, then that segment of the ship is destroyed, just as if it had been hit by a missile. Taking this as a starting premise, there will be a myriad of details to work through. A key task will be to establish whether there is anything implicit in such a description that would make it impossible to win: say your opponent is able to create a solid vertical line of mines—can the rules be set or adapted to prevent such play? Or perhaps the general premise described would be a bit boring to play, because there is a fairly mundane, obvious strategy to follow. Again, is there any adjustment of the rules, or an additional dimension to the game, that can be included to offset this? A chance-like element could be included, where a very small number of mines can be positioned on the left-hand side of the board at the beginning that can't be swept for (sabotage!); or partway into the game, radio communications become permissible—risky to use (radio silence preferred!), but if you know your campaign hasn't been going well, perhaps taking a risky move might pay off?
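The Minesweeper half of the mechanics boils down to counting mines adjacent to a chosen cell. A minimal sketch, with an invented board layout:

```python
# Sketch: revealing mine proximity for a chosen grid cell, Minesweeper
# style. The mine positions are an invented example layout.

def mine_proximity(mines, row, col):
    """Count mines in the eight cells surrounding (row, col)."""
    return sum((r, c) in mines
               for r in range(row - 1, row + 2)
               for c in range(col - 1, col + 2)
               if (r, c) != (row, col))

mines = {(0, 1), (2, 2), (1, 0)}
print(mine_proximity(mines, 1, 1))  # 3: all three mines are adjacent
print(mine_proximity(mines, 3, 3))  # 1: only the mine at (2, 2) is adjacent
```

Representing mines as a set of coordinates keeps the proximity check simple and would extend naturally to the ship-movement collision test.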
The above remarks are intended as suggestions to get you thinking. Feel free to break away entirely from these ideas. The bottom line is, I'm looking for an interesting, engaging board game idea themed around being in command of a fleet of navy vessels. |
|
Monster Mashup
Project Managers: Emma Campbell; and Rafeea Siddika |
|
|
Key Idea: Develop a digital version of the Monster Drawing Game, which makes use of Generative AI APIs to provide a computer player and, optionally, to assist someone in drawing one of the parts of the monster. I conceive of this game as being played on tablets, to give users a decent, touch-sensitive screen size to work with. The department's budget for the course will run to a couple of (inexpensive) tablets. But by picking a web-app based software development stack such as React or React Native (see above), it should be possible to make the game playable on smartphones, laptops, and desktop PCs, by giving some attention to how it operates on small screens and when input is via a computer mouse rather than touch. In thinking through how to realise such an implementation, various distinct roles emerge.
|
|
Online Financial Scams: Trust the Government ... to make a bodge of it
Project Managers: Mahaki Leach; and Tanner Rowe |
|
|
Key Idea: Empower a user's web browser with the functionality needed to alert the user if they ever visit a website that is listed on one of the many high-value—but poorly and confusingly presented—watchlists that government agencies from around the world produce relating to financial scams. This project is about engaging in practical software development, based on a research investigation undertaken at Waikato, led by Assoc Prof Nichols. The following description is text from an article that this team has written, which is currently under review for publication:
Financial fraud is on the rise around the world. A survey by Netsafe in 2023 revealed that New Zealanders potentially lost $2.05 billion to financial fraud in the preceding year. To counter financial scams, industry regulators around the world publish investor alerts describing newly identified threats within their jurisdictions. These warnings typically identify the entities involved and provide relevant background information (e.g., website URLs, email addresses, telephone numbers, postal addresses, etc.). But this is where we get to the part where government agencies around the world are really making a bodge of the whole thing. Nichols et al.'s investigation shows that there is no consistency—let alone standardisation—in how this information is published, with the majority of it written in freeform text. This makes it difficult to write software that can use the published data to automatically alert users when they visit a website that has been flagged. In cases where an agency has published the data in a machine-readable format, there is more evidence of bodging: a review of this data showed that many of the provided fields were empty, with the exception of one field into which the values of all the other fields appear to have been smushed haphazardly, with a random sprinkling of non-printable markup characters to boot. On the flip side, web browsers such as Google Chrome and Microsoft Edge have built-in features that can be triggered when a user visits one of their listed fraudulent websites. But the list these browsers use is opaque: it's not clear where the information is sourced from, and it is certainly not the same as the government-compiled lists. This was confirmed by Nichols, who accessed a variety of sites taken from government lists to see if they triggered the browser's financial scam alert. They did not!
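Before reaching for full NLP, the structured fragments buried in freeform alert text (URLs, email addresses) can be pulled out with regular expressions as a baseline. The alert text below is invented; real agency alerts vary widely, which is exactly the problem described above.

```python
# Sketch: extracting URLs and email addresses from freeform investor
# alert text with regular expressions. The alert text is fabricated for
# illustration, and these patterns are deliberately simple baselines.
import re

alert = ("Investors are warned that Example Capital Ltd, operating via "
         "www.example-capital.test and contact@example-capital.test, "
         "is not a registered provider.")

# Match http(s) URLs or bare www. hostnames, up to the next whitespace.
urls = re.findall(r"\b(?:https?://|www\.)\S+", alert)
# Match simple local@domain.tld email addresses.
emails = re.findall(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b", alert)

print(urls)    # ['www.example-capital.test']
print(emails)  # ['contact@example-capital.test']
```

Regexes won't recover entity names or addresses from prose, which is where Named Entity Recognition, discussed below, earns its keep.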
In terms of developing a practical solution, recent developments in AI have significantly improved the accuracy and flexibility of Natural Language Processing (NLP). In essence, this is a field of computing that takes text written for human consumption and turns it into a machine-readable form. Its classical form takes a sentence and labels its words with parts of speech: nouns, verbs, adjectives, and so forth. It can also be used to perform sentiment analysis of text, and—at the outset, likely the most useful capability here—Named Entity Recognition: a principled way to identify, within free text, the names of entities such as people, places, and companies. As a way to combine the developed software into a functioning web browser, consider developing either a browser extension or using a user-scripting approach such as TamperMonkey (see Having the [Student] Time of Your Life above). Note: the title I have currently given this project is a bit of a mouthful. For your consideration, how about Madoff with My Money! (or Mmm! for short). |
|
That would be a D'oh from me!
Project Managers: Irene Paul; and Aditya Sharma |
|
|
Key Idea: The aim of this project is to add to a desktop environment the ability to control windows and tabs by speaking to it naturally: bring back that tab that just closed! When I'm interacting with my desktop with the mouse, sometimes the click I've just made didn't quite go how I planned, and the wrong thing, in windowing terms, has just happened: I was off by a few pixels when I clicked, or something like a popup window suddenly appeared just as I was clicking. In my mind I'm thinking, that's not what I meant to happen; what I actually want is .... The aim of this project is to add to a desktop windowing environment the ability to control what happens to the windows by voice: undoing the last thing that happened (in the event of a mistake) would be a game changer, but the project should also look to support a range of regular desktop operations, spoken in a way that is natural for the user. I originally considered calling this project D'oh, paying homage to Homer Simpson's oft-said phrase when things go wrong for him. However, as the project also includes the idea of supporting regular desktop interactions voiced naturally—rather than through an enforced, keyword-restricted vocabulary, such as being forced to say stilted words like File➵Edit➵Copy—I felt it didn't fully embody what the project is looking to achieve. This is what led to the evolved title of That would be a D'oh from me!. Next I considered naming the software system to be developed VALET—for Voice Activated natural Language desktop windowing EnvironmenT. A bit of a tortuous acronym, I admit, but it would lead to being able to say things like Hey Valet, restore that tab I just closed, and Hey Valet, iconify all the windows that are open. But then I got to thinking that locking in the name of the computer system in this way, forcing the user to use this name, is antithetical to the "speak how you want" aspect of the project.
It might be something the Big Tech companies force you to do (Hey, Siri), continually requiring you to reinforce their chosen brand-name recognition. But that's not the game we're playing here. Sure, start with Valet as the name if you like, but include a capability to change it, if the user so wishes. Enough chit-chat. Let's talk specifics. This is a project where you would get into the nuts and bolts of a reasonably intricate piece of software: a desktop window manager, figuring out the key places to splice in the additional capabilities That would be a D'oh from me! brings to the table. The desktop window managers for Windows and for macOS are closed-source, so the obvious place to go to experiment with this project idea is GNU/Linux and the open-source desktop window managers that have been written for it. Yes, I'm afraid that is plural (window managers), so one of the steps needed in this project is to invest some time assessing the options that are "on offer", as it were. As the elements of a window manager are strongly connected to the underlying operating system, many of them are written in the C programming language. GNOME Shell/Mutter is an interesting project in this space: it is written primarily in C, but includes the ability to control some of its functionality using JavaScript through its extension/plug-in architecture. In terms of the voice activation side of this project, as mentioned at the start, I am looking for the software to support a natural way for users to speak their instructions. The tasks of both speech-to-text and grammatically understanding the text that makes up a sentence fall under the topic of Natural Language Processing (NLP). For the former, OpenAI's Whisper model has a reputation for being fast and accurate, and is notable in that it can be set up to run locally on your computer, rather than needing to transfer the audio to a cloud-based server.
For the task of grammatically recognising the words spoken in natural text, the particular NLP tool that does this is called a Part-of-Speech (PoS) Tagger. Again, there are many libraries and packages around that do this. If you would like to get a general sense of how this works in practice, you can experiment with one of the online demonstration sites, such as the following one based on the Stanford PoS Tagger, and another from CMU. The CMU one, while admittedly quite a bit older, is useful as it gets you used to the underlying PoS nomenclature for tag names that PoS taggers use. Another possible route to achieve the necessary NLP is to make use of LLM capabilities. With enough context, and the right prompting, it is certainly possible to take the recognised spoken audio, have it processed by an LLM, and get back a machine-readable format, such as JSON, that effectively expresses what windowing event/functionality should be actioned. However, a potential show-stopper for this approach is the length of time it would take the LLM to generate the JSON. If it takes 20 seconds or more to get the JSON back, then the viability of VALET as a useful augmentation of a desktop environment becomes questionable, because—given that length of time for VALET to respond—the user likely has other ways of achieving what they want in less time. There are plenty of articles posted on the topic of speeding up LLM inference. It should also be possible to exploit aspects of the envisioned project to your advantage. For example, if a user is unaware that there is a feature that allows them to minimise all their windows in one go, then speaking such a command will be of benefit. Better still if the VALET-enhanced interface highlights which desktop feature it is activating in response to the command, so the user can learn about the feature.
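To make the spoken-command-to-JSON step concrete, here is a toy stand-in for the LLM: a keyword matcher that maps a free-form utterance onto a machine-readable action. The vocabulary and the action schema are invented for illustration; a real implementation would prompt an LLM to emit JSON conforming to such a schema.

```python
# Sketch: mapping freely spoken window-management commands onto a
# machine-readable action. This keyword matcher is a toy stand-in for
# the LLM step; the action names/schema here are invented.
import json

def command_to_action(utterance):
    """Return a dict (serialisable as JSON) describing the windowing
    event/functionality to action for the given utterance."""
    text = utterance.lower()
    if "undo" in text:
        return {"action": "undo_last_window_event"}
    if "closed" in text and "tab" in text:
        return {"action": "reopen_tab", "which": "most_recent"}
    if "minimise" in text or "iconify" in text:
        return {"action": "minimise_all"}
    return {"action": "unknown", "utterance": utterance}

print(json.dumps(command_to_action("bring back that tab I just closed")))
print(json.dumps(command_to_action("iconify all the windows that are open")))
```

Whatever produces the JSON (keyword rules, PoS-driven parsing, or an LLM), keeping this machine-readable action layer separate from the window-manager plumbing makes the two halves of the project testable independently.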
Further, if VALET keeps a cache of spoken commands and the returned JSON, then when the user later says the same thing, the LLM step can be short-circuited. Or what about the value of being able to issue commands that the desktop interface doesn't directly support, such as close all my windows except Zoom? |
|
SuppSense/Truth in a Tub
Project Managers: William Malone; and Luka Milosevic |
|
|
Key Idea: Combining supplement and sense, the name signals to the user that the app provides real, science-backed information on the products they scan. Alternative project name: Truth in a Tub—a play on words, as many sports supplements come in tubs and we are providing the truth about those supplements. Overview: The app would contain a database of sports supplements that can be referenced by name or barcode. This could be updated by users, with our permission, as new supplements are produced and/or previously unknown brands or supplements are discovered. The app would then go through the ingredient list, providing information on whether the ingredients within the product are scientifically backed. Additionally, it would allow the user to see a range of the supporting (or debunking) scientific papers, sourced from Google Scholar. Examples of this may include:
These, as well as a range of other myths, could be debunked with the use of this app. |
Smoke and Mirror Projects: From the Vaults
The Smoke and Mirrors brand has been a signature component of
the Software Engineering programme at the University of Waikato from
its inception. First run in 2003, it started life as a group-based
project that 1st Year SE students, who had been learning to program
in the C++ language, undertook. In 2010 it moved to the 2nd year
level, with Java being the programming language taught, where it
has remained since.
It is one of the great pleasures in my job at Waikato
to be involved in providing the
Smoke and Mirrors experience for our SE students, and for so
many years—for all of the years the projects have run, in fact!
There even came a point when I was due to be on sabbatical in the
semester Smoke and Mirrors was scheduled to run; however,
a year in advance, the department changed the semester it ran in,
so I could continue running the projects.
I haven't been able to locate any written record of the projects run
in that first year, sadly. One from that year that does come to
mind, however, was a team developing a software-based
arbitrary-precision calculator. As part of their presentation to the
department at the end of the semester, they demonstrated their
GUI-based calculator
calculating π to ... well ... a very high precision! For the
years 2004–2009 I have been able to track down the titles of
the projects that ran, which at least hints at the variety of
projects undertaken. For more recent years, I still have the project
briefs that the students start with when the projects are launched.
With a nod to posterity, here are the projects by year,
working back from newest to oldest.
|