Career Summary

My name is Jingshu Liu (don't be confused by the Pinyin spelling; it is simply pronounced "jing-shoe liou"). I have hands-on experience in machine learning and deep learning, and I currently work at Easiware-Dictanova as a data scientist while preparing my Ph.D. in natural language processing, advised by Emmanuel Morin. My job and research cover NLP and machine learning, focusing on cross-lingual applications and sequence modeling with transfer learning from pre-trained language models. I am also broadly interested in applied machine learning for real-life scenarios and in distributed learning systems. A PDF version can be downloaded.

Work Experience

Data scientist/Machine learning

2017 - Present
  • Built from scratch a bilingual, neural-network-based framework for mapping word and phrase embeddings, in Java with Deeplearning4j-0.91.
  • Implemented topic model pipelines using clustering on pre-trained unified phrase embeddings.
  • Provisioned sparse matrix support and other mathematical optimizations in Nd4j-0.91.
  • Designed and built an encoder-decoder framework for sequence modeling with PyTorch-1.2. Fully compatible with CPU and GPU modes; it ran on an OVH cloud server managed with OpenStack and NVIDIA GPU Cloud.
  • Incorporated pre-trained Transformer-based language models into our neural networks for real-life scenarios.
  • Fine-tuned pre-trained language models for dialogues.

Achievements:

  • Improved the bilingual multi-word and single-word lexicon induction by an average of 22 points in MAP on client data.
  • The new topic modeling system replaced the existing rule-based topic modeling system.

Environments:

  • Java
  • Python
  • PyTorch
  • Deeplearning4j
  • Keras
  • scikit-learn
  • OpenStack-Docker

Natural Language Processing Intern

2016
  • Implemented a term extraction and Aspect Based Sentiment Analysis pipeline for simplified and traditional Chinese in Java, with the UIMA architecture and ElasticSearch storage.
  • Improved Chinese language preprocessing (POS-tagging) for FNLP and added an innovative Chinese lemmatizer for reduplicated words.
  • Cleaned and visualized data using Pandas and R.

Achievements:

  • Achieved state-of-the-art results on Aspect Based Sentiment Analysis in the SemEval-2016 challenge.
  • Improved the term extraction accuracy by 50%.

Environments:

  • Java
  • UIMA
  • Deeplearning4j
  • Python
  • R
  • ElasticSearch

Natural Language Processing Intern

2015
  • Collaborated with researchers in the Duel Project on human dialogue classification.
  • Annotated sentiment analysis corpora with Brat.

Environments:

  • Java-CoreNLP
  • Perl
  • Python
  • NumPy
  • Brat

Education

Ph.D. candidate in NLP

2017 - Present (Expected to graduate in January, 2020)
Thesis title: Unsupervised cross-lingual representation modeling for variable length phrases.
  • Unsupervised bilingual phrase alignment.
  • Monolingual sequence modeling with RNN, CNN, LSTM and modern Transformer-based architectures.
  • Bilingual word embedding.
  • Data augmentation/selection for low-resource scenarios.

Results:

  • Improved state-of-the-art results on phrase synonymy by almost 33% on low-resource specialized domain corpora.
  • Achieved state-of-the-art results on bilingual word mapping.
  • Proposed a new tree-free, graph-based neural network for encoding short sequences, including single words. It outperformed state-of-the-art results on unsupervised bilingual phrase mapping by an average of 8.8 points in MAP while achieving comparable results on the single-word subset.

Master in NLP

2014-2016

Notable courses:

Machine learning; Statistics; Algorithms; First-order logic; Text mining.

BS in Applied Mathematics

2012-2014

Notable courses:

Linear algebra; Mathematical analysis; Java programming; C programming; Probability theory; HTML/CSS/PHP; MySQL.

Exchange Program

2011-2012

BS in French Language and Finance

2008-2012

Notable courses:

French; Statistics; Visual basic programming; Accounting.

Publications

Skills & Tools

Backend

  • Java8
  • Python3
  • C
  • Perl
  • PHP

Machine Learning Framework

  • Deeplearning4j
  • PyTorch
  • Keras
  • R
  • TensorFlow
  • Torch

Others

  • HTML
  • CSS
  • MySQL
  • LaTeX
  • HDFS Hadoop
  • Git
  • Unit Test
  • Agile
  • Gradle
  • Lua
  • Neo4j
  • ElasticSearch

Other Projects

  • Poem bot
    Built and trained in Java a poem bot which can generate the second part of a couplet given the first one. [code]
  • Hackathon CafData 2015
    Built from scratch in 48h a waiting time prediction system in Python, based on the data provided by la Caf in a hackathon competition.
  • Gounki game
    Implemented a Gounki game in C.
  • Sheep and wolf evolution game
    Implemented an evolution game in Java with a minimal UI.
  • Recipe website
    Built a recipe website with MySQL and PHP which was hosted on the campus network of Paris Diderot University. Students can register to find others who can teach them the recipes they want to learn.

Volunteer Experience

Languages

  • Chinese (Native)
  • English (Professional)
  • French (Professional)

Interests

  • Badminton, Basketball, Running
  • Language, History
  • Board & Video games