Trying out Aho–Corasick algorithm

Estimated reading time: 3 mins

Today I found Aho–Corasick algorithm. I was instantly hooked on wanting to try it out. Since I am spending more time with Java, most of the day was spent playing with an implementation of this algorithm and maven.

I will tell you what, I was pleasantly surprised so far. I need to spend some more time on this but it looks like a promising solution. I wanted to use something like this on the elderoost project. So, here is what my MVP looks like so far.

For this problem I wanted to use a large piece of text as input. For example, I was reading the news earlier and I found a good candidate. I used the following news article as my input:

Ontario has promised to add 15,000 new long-term care beds across of the province in the next five years and increasing that to 30,000 in a decade.

Of the 6,000 being put in place immediately, 80 of them will be in Brant County.

“Your government is moving forward quickly to expand access to long-term care beds,” said Christine Elliott, the Deputy Premier and Minister of Health and Long-Term Care in a press release. “”More long-term care beds will help take the pressure off hospitals, end hallway medicine, allow doctors and nurses to work more efficiently and provide faster health care for Ontario patients and their families.”

The 80 beds will be shared between John Noble Home and Fox Ridge Care Community (Sienna Senior Living).

The province has also promised more than 1,100 beds and spaces in hospitals and community settings.

“This is good news for the people of Brantford-Brant,” said MPP Will Bouma in the release.

which I grabbed from here so you know it is very fresh — Published Monday, October 8, 2018 5:31PM EDT.

I then ran the MVP implementation against this input, and surprise surprise. I got the expected results of two hits on this article. Not only that, I also have the precise location in the article.

[672:687]=John Noble Home
[692:716]=Fox Ridge Care Community

And you know what? This solution brought me closer to what I wanted to do and how I wanted to do it. My previous implementation was using Elasticsearch but I never liked that approach. This is why that solution is not public. It did not seem to meet the requirements that I had and it did not produce repetitive results.

I’m excited spend some more time on this.

Oh yes, almost forgot. I used AhoCorasickDoubleArrayTrie implementation.

Maven frustrations

I’ve spent a minute today trying to sort out maven. Unfortunately, as I dug deeper, I found out that there was a play of two installations. I need to untangle this mess:

  $ echo $JAVA_HOME
  $ jenv enable-plugin maven
maven plugin activated
  $ maven
zsh: command not found: maven
  $ mvn
zsh: command not found: mvn
  $ jenv exec maven
jenv: maven: command not found
  $ jenv exec mvn
jenv: mvn: command not found

Today’s random link of the day: Dictionary of Algorithms and Data Structures (DADS).

I also changed the content of structured data for this page. Went from github.io to .me domain. If I recall correctly, I left it at github.io because I had difficulties with sharing my content. I tried it today and it seems to work just fine. If it ever breaks again, I will just go back =)

See you tomorrow.

· software engineering, algorithms

We announced a new product!