
Career Summary
My name is Jingshu Liu (don't be confused by the Pinyin script; it is simply pronounced "jing-shoe liou"). With hands-on experience in machine learning and deep learning, I currently work at Easiware-Dictanova as a data scientist while preparing my Ph.D. in natural language processing, advised by Emmanuel Morin. My job and research cover NLP and machine learning, focusing on cross-lingual applications and sequence modeling with transfer learning using pre-trained language models. I am also broadly interested in applied machine learning in real-life scenarios and in distributed system learning. A PDF version can be downloaded.
Work Experience
- Built from scratch a neural-network-based framework for mapping bilingual word and phrase embeddings, in Java with Deeplearning4j-0.91.
- Implemented topic model pipelines using clustering on pre-trained unified phrase embeddings.
- Provisioned sparse matrix support and other mathematical optimizations in Nd4j-0.91.
- Designed and built an encoder-decoder framework for sequence modeling with Pytorch-1.2. It is fully compatible with both CPU and GPU modes and ran on an OVH cloud server using the management tools OpenStack and NVIDIA GPU Cloud.
- Incorporated pre-trained Transformer-based language models into our neural networks for real-life scenarios.
- Fine-tuned pre-trained language models for dialogues.
Achievements:
- Improved bilingual multi-word and single-word lexicon induction by an average of 22 points in MAP on client data.
- The new topic modeling system replaced the existing rule-based topic modeling system.
Environments:
- Java
- Python
- Pytorch
- Deeplearning4j
- Keras
- Scikit Learn
- OpenStack-Docker
- Implemented a term extraction and Aspect Based Sentiment Analysis pipeline for Simplified and Traditional Chinese in Java, using the UIMA architecture and ElasticSearch storage.
- Improved Chinese language preprocessing (POS tagging) for FNLP, and added a novel Chinese lemmatizer for reduplicated words.
- Cleaned and visualized data using Pandas and R.
Achievements:
- Achieved state-of-the-art results on Aspect Based Sentiment Analysis in the Semeval2016 challenge.
- Improved the term extraction accuracy by 50%.
Environments:
- Java
- UIMA
- Deeplearning4j
- Python
- R
- ElasticSearch
- Collaborated with researchers in Duel Project on human dialogue classification.
- Annotated sentiment analysis corpora with Brat.
Environments:
- Java-Corenlp
- Perl
- Python
- Numpy
- Brat
Education
- Unsupervised bilingual phrase alignment.
- Monolingual sequence modeling with RNN, CNN, LSTM and modern Transformer-based architectures.
- Bilingual word embedding.
- Data augmentation/selection for low-resource scenarios.
Results:
- Improved state-of-the-art results on phrase synonymy by almost 33% on low-resource specialized domain corpora.
- Achieved state-of-the-art results on bilingual word mapping.
- Proposed a new tree-free graph-based neural network for encoding short sequences, including single words. It outperformed state-of-the-art results on unsupervised bilingual phrase mapping by an average of 8.8 points in MAP while maintaining comparable results on the single-word subset.
Notable courses:
Machine learning; Statistics; Algorithms; First-order logic; Text mining.
Notable courses:
Linear algebra; Mathematical analysis; Java programming; C programming; Probability theory; HTML/CSS/PHP; MySQL.
Notable courses:
French; Statistics; Visual basic programming; Accounting.
Publications
-
Alignement de termes de longueur variable en corpus comparables spécialisés
TALN2018
-
Towards a unified framework for bilingual terminology extraction of single-word and multi-word terms
Coling2018
-
Continuous phrase representation learning with wrapped context prediction
In preparation
-
A unified and unsupervised framework for bilingual phrase alignment on specialized comparable corpora
ECAI2020
-
From unified phrase representation to bilingual phrase alignment in an unsupervised manner
In preparation
Skills & Tools
Backend
-
Java8
-
Python3
-
C
-
Perl
-
PHP
Machine Learning Framework
-
Deeplearning4j
-
Pytorch
-
Keras
-
R
-
Tensorflow
-
Torch
Others
- HTML
- CSS
- MySQL
- LaTeX
- HDFS Hadoop
- Git
- Unit Test
- Agile
- Gradle
- Lua
- Neo4j
- ElasticSearch
Other Projects
-
Poem bot: Built and trained in Java a poem bot that generates the second line of a couplet given the first one. [code]
-
Hackathon CafData 2015: Built from scratch in 48 hours a waiting time prediction system in Python, based on data provided by la Caf, during a hackathon competition.
-
Gounki game: Implemented a Gounki game in C.
-
Sheep and wolf evolution game: Implemented an evolution game in Java with a minimal UI.
-
Recipe website: Built a recipe website with MySQL and PHP, hosted on the campus network of Paris Diderot University. Students could register to find others who could teach them the recipes they wanted to learn.
Volunteer Experience
-
Custom layer for Deeplearning4j: Implemented a custom layer for Deeplearning4j (before the alpha version); the pull request was merged into the main project.
-
Machine Learning Meetup: Gave a talk at the Nantes machine learning meetup in 2019.
-
Liaison manager: Responsible for communication between the Groupe Edmond de Rothschild team and the host city for the Extreme Sailing Series 2011 in Qingdao.
-
Interpreter: Interpreter for Tianhui (SARL) at the China Import and Export Fair in Guangzhou, 2010.
Language
- Chinese (Native)
- English (Professional)
- French (Professional)
Interests
- Badminton, Basketball, Running
- Language, History
- Board & video games