About Me
I am a Professor of Data Science at Gisma University of Applied Sciences in Potsdam, Germany. My research focuses on designing effective and efficient data science systems for heterogeneous datasets, covering key areas such as applied machine learning and natural language processing. I have had the privilege of publishing my work in leading data science and data mining venues, including PVLDB, SIGMOD, and CIKM. I was also honored with the ACM SIGMOD Most Reproducible Paper Award. I earned my PhD summa cum laude from TU Berlin in 2020. Prior to that, I completed my MSc in Artificial Intelligence at the University of Tehran and my BSc in Computer Engineering at the Iran University of Science and Technology.
Honors
- Summa Cum Laude for PhD Thesis
- ACM SIGMOD Most Reproducible Paper Award 2020
- Valedictorian with a GPA of 18.03/20 in MSc Class
- Valedictorian with a GPA of 17.09/20 in BSc Class
Career Timeline
Professor of Data Science at Gisma University of Applied Sciences
Feb 2021 – Present, Potsdam, Germany
- Designed and led tech programs
- Conducted and published applied research in top-tier data science conferences
- Designed and taught modern data science courses, including applied machine learning, deep learning, and natural language processing
- Served on the Academic Senate and Examination Board of the university
Data Scientist at Integration Alpha
Jul 2021 – Jul 2023, Remote Part-Time
- Built a text classification model to detect sustainability objectives in heterogeneous sustainability reports using Transformers
- Built an information extraction model to extract fine-granular details of sustainability objectives using Transformers
PhD Candidate and Research Assistant in Computer Science at TU Berlin
Feb 2017 – Feb 2021, Berlin, Germany
- Built semi-supervised data cleaning systems using a novel feature representation, active learning, and transfer learning
- Created a reproducible benchmark for state-of-the-art data-cleaning systems
Data Scientist at HomaPlus Corporation
Jan 2016 – Jan 2017, Tehran, Iran
- Built a text classification model to detect news article topics using TensorFlow
- Built a text clustering model to detect related/duplicate daily news using scikit-learn
- Built a word embedding model using the word2vec architecture
MSc in Artificial Intelligence at the University of Tehran
Sep 2013 – Sep 2015, Tehran, Iran
- Extracted content features that describe the egocentrism level of tweets
- Built popularity prediction models using the extracted content features and topic detection
BSc in Computer Engineering at Iran University of Science and Technology
Sep 2009 – Sep 2013, Tehran, Iran
Recent Research Papers and Artifacts
- Combat Greenwashing with GoalSpotter: Automatic Sustainability Objective Detection in Heterogeneous Reports
- Automatic Error Correction Using the Wikipedia Page Revision History
- Semi-Supervised Data Cleaning with Raha and Baran
- Semi-Supervised Data Cleaning
- Baran: Effective Error Correction via a Unified Context Representation and Transfer Learning
- Data Science für alle: Grundlagen der Datenprogrammierung
- Raha: A Configuration-Free Error Detection System
- REDS: Estimating the Performance of Error Detection Strategies based on Dirtiness Profiles
- CLRL: Feature Engineering for Cross-Language Record Linkage
- ED2: A Case for Active Learning in Error Detection
- Towards Automated Data Cleaning Workflows
- A Comprehensive Analysis of Tweet Content and Its Impact on Popularity
- Facebook User’s Like Behavior Can Reveal Personality
Recent Articles
- From Sci-Fi to Services: Chatbots in Customer Service
- AI and Education: A Paradigm Shift is Necessary
- Scholz’s Course for the Autumn - The Confident Chancellor
- Nine of the 50 Largest Online Shops Use Chatbots
- The Ever-Growing Importance of Open Source, Big Data, and AI
- Data Science Against Disinformation: How Artificial Intelligence Can Fact-Check Claims of Digital Election Campaigns
- Future Decision-Making Processes in the Data Science Era