May 19, 2020 / in Computer Science / by Number8

Alexis Alulema - When someone first hears about Natural Language Processing (NLP), it often doesn't strike them as overly complicated. On closer consideration, though, they realize how much work the AI community has put into the topic. The technology is even appearing in headlines like this one, featured on sciencealert.com: Google's AI has Learned to Become 'Highly Aggressive' in Stressful Situations.

As the above article suggests, this technology could become dangerous in the wrong hands or if managed incorrectly. However, I believe that once we understand the nuts and bolts of NLP, we will conclude that there is nothing 'magic' under the hood. Rather, there is a growing, rigorous body of scientific work on creating better algorithms to process data and generate impressive solutions.

Neural networks have become ubiquitous as Deep Learning grew alongside big data, cloud computing, and the appearance of Deep Learning frameworks like TensorFlow and PyTorch. These advancements are visible in areas like Computer Vision (CV), where predictions reach 95-98% accuracy. This precision is at the core of impressive advancements in face detection, artificial face generation, self-driving cars, and many other applications.

Natural Language Processing

In comparison, NLP currently reaches accuracy rates of around 80%. At first glance, it may appear that NLP is not as good as CV. However, I don't believe this is a fair evaluation, as language is ambiguous and context-bound. For example, a native speaker can often use and understand regional slang and idiomatic expressions comfortably, while a non-native speaker may have trouble understanding those expressions and might express similar ideas in a different way. These and other factors create a massive task for prediction algorithms, which must attempt to understand idiomatic expressions and much more, given the complexity of language. The best-rated algorithms, like GPT-2, require more than 150 GB of data to train; for context, that much training data means days of training neural networks on supercomputers.
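To see why idioms are such a problem for prediction algorithms, consider a minimal sketch (the tiny lexicon and scoring function here are illustrative assumptions, not taken from any real system): a bag-of-words sentiment scorer sees only individual words, so the idiom "kicked the bucket" (meaning "died") contributes no signal at all.

```python
# Illustrative, hypothetical sentiment lexicon: word -> polarity score.
LEXICON = {"great": 1, "happy": 1, "terrible": -1, "died": -1}

def bow_sentiment(text: str) -> int:
    """Sum per-word scores; idiomatic meaning is invisible at this level."""
    return sum(LEXICON.get(word, 0) for word in text.lower().split())

literal = bow_sentiment("my goldfish died")             # picks up "died"
idiom = bow_sentiment("my goldfish kicked the bucket")  # no word matches
```

The literal sentence scores negative, but the idiomatic one scores neutral, even though both mean the same thing. Closing gaps like this is exactly what pushes NLP models toward enormous training corpora and context-aware architectures.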

I invite you to dive deeper into this topic by reviewing my recently published paper, Deep Learning Methods in Natural Language Processing. In the paper, I classify the most relevant NLP Deep Learning methods, explain how to use them, and identify which situations each method suits best.

Who would benefit from this content

This paper surveys the state of the art and will serve as a starting point for researchers and developers who want to become familiar with the broad spectrum of NLP techniques and how to apply them optimally in different scenarios.

Alexis Alulema
Senior Software Architect and Machine Learning Engineer

LinkedIn: https://www.linkedin.com/in/alulema/
Twitter: @alulema
Personal Website: https://alexisalulema.com/
