Tutorial Program

23rd International Conference on Information and Knowledge Management (CIKM)
Shanghai, China. Nov 3-7, 2014

We are happy to announce that the following tutorials will be given at ACM CIKM 2014. 

Schedule | Tutorial Title | Tutorial Presenters
Tutorial 1, Nov 3, 2014 | Crowdsourcing in Information and Knowledge Management | Lei Chen, Dongwon Lee, Meihui Zhang
Tutorial 2, Nov 3, 2014 | Learning Non-IID Big Data | Longbing Cao
Tutorial 3, Nov 3, 2014 | Data Analytics in Healthcare: Problems, Challenges and Future Directions | Fei Wang, Xiang Wang
Tutorial 4, Nov 7, 2014 | E-commerce Personalization at Scale | Elizabeth F. Churchill, Atish Das Sarma, Ranjan Sinha
Tutorial 5, Nov 7, 2014 | Learning to Hash with its Application to Big Data Retrieval and Mining | Wu-Jun Li
Tutorial 6, Nov 7, 2014 | Deep Learning for Natural Language Processing: Theory and Practice | Xiaodong He, Jianfeng Gao, Li Deng

Tutorial 1: Crowdsourcing in Information and Knowledge Management
Lei Chen (HKUST, China)
Dongwon Lee (Penn State University, USA)
Meihui Zhang (National University of Singapore, Singapore)
URL: https://sites.google.com/site/cikmtutorial/

Tutorial 2: Learning Non-IID Big Data
Longbing Cao (University of Technology Sydney, Australia)
URL: http://www-staff.it.uts.edu.au/~lbcao/publication/noniidness-learning-online.pdf

Tutorial 3: Data Analytics in Healthcare: Problems, Challenges and Future Directions
Fei Wang (IBM T. J. Watson Research Center, USA)
Xiang Wang (IBM T. J. Watson Research Center, USA)

Tutorial 4: E-commerce Personalization at Scale
Elizabeth F. Churchill (eBay Research Labs, USA)
Atish Das Sarma (eBay Research Labs, USA)
Ranjan Sinha (eBay Research Labs, USA)
URL: http://labs.ebay.com/cikm2014-tutorial/index.shtml

Tutorial 1: Crowdsourcing in Information and Knowledge Management

Abstract: As a novel computation paradigm, crowdsourcing is being actively pursued in diverse academic disciplines. Within computer science, many sub-fields have embraced the concept of crowdsourcing with open arms and applied it to solve many challenging problems. Communities relevant to the CIKM conference, such as Database, Information Retrieval, and Data Mining, are no exception to this phenomenon, and many exciting new results using crowdsourcing have appeared in the recent literature. This tutorial, after a gentle introduction to the history and concept of crowdsourcing, provides the overall landscape of crowdsourcing research in the CIKM communities, with a particular focus on some of the latest crowdsourcing research in the Database field.

Presenter 1: Lei Chen, Hong Kong University of Science and Technology, China (leichen@cse.ust.hk)

Lei Chen received the BS degree in computer science and engineering from Tianjin University, Tianjin, China, in 1994, the MA degree from the Asian Institute of Technology, Bangkok, Thailand, in 1997, and the PhD degree in computer science from the University of Waterloo, Canada, in 2005. He is currently an associate professor in the Department of Computer Science and Engineering, Hong Kong University of Science and Technology. His research interests include crowdsourcing over social media, social media analysis, probabilistic and uncertain databases, and privacy-preserving data publishing. So far, he has published over 200 conference and journal papers. He received the best paper awards at DASFAA 2009 and 2010. He has served as PC track chair for SIGMOD 2014, VLDB 2014, ICDE 2012, CIKM 2012, and SIGMM 2011, and as a PC member for SIGMOD, VLDB, ICDE, SIGMM, and WWW. Currently, he serves as an associate editor for IEEE Transactions on Knowledge and Data Engineering and Distributed and Parallel Databases. He is a member of the ACM and the chairman of the ACM SIGMOD China Chapter.

Presenter 2: Dongwon Lee, Penn State University, USA (dongwon@psu.edu)

Dongwon Lee is currently an associate professor in the College of Information Sciences and Technology (a.k.a. iSchool) of the Pennsylvania State University, USA. He obtained his Ph.D. in Computer Science from UCLA in 2002. From 1995 to 1997, he also worked as a programmer at AT&T Bell Labs. Working mostly on the issues arising in the management and mining of diverse forms of data (e.g., relational records, documents, XML, and social media), he has (co-)authored over 130 scholarly articles in selective publication outlets in Databases and Data Mining. He has also served as a PC member for major venues such as SIGMOD, VLDB, ICDE, CIKM, KDD, SDM, WWW, IJCAI, WSDM, and JCDL. Further details of his research can be found at: http://pike.psu.edu/

Presenter 3: Meihui Zhang, National University of Singapore, Singapore (mhzhang@comp.nus.edu.sg)

Meihui Zhang is currently a Research Fellow at the National University of Singapore and will join the Singapore University of Technology and Design as an Assistant Professor in August. She received her Ph.D. in Computer Science from the National University of Singapore and her B.E. in Computer Science from Harbin Institute of Technology, China. Her research interests mainly focus on database issues. Her Ph.D. thesis was on database exploration and schema extraction. She also works on crowdsourcing, massive data integration and spatio-temporal databases.


Tutorial 2: Learning Non-IID Big Data

Abstract: Learning from big data is increasingly becoming a major challenge and opportunity for big business and for innovative learning theories and tools. Two of the most critical challenges of learning from big data are the uncovering of the explicit and implicit coupling relationships embedded in mixed heterogeneous data from multiple sources. Coupling and heterogeneity, the two aspects of non-IIDness, form the essence of big data and of most real-world applications; that is, the data is not independent and identically distributed (non-IID). However, most classic theoretical systems and tools in statistics, data mining, databases, knowledge management and machine learning assume the independence and identical distribution of the underlying objects, attributes and values. For this reason, non-IIDness learning in big data, which considers the couplings and heterogeneity between entities, properties, interactions and contexts, has emerged as a critical theoretical problem and a recent research focus. In this tutorial, we present a comprehensive overview of non-IIDness learning, and introduce general frameworks and algorithms for non-IID classification, clustering, ensemble clustering, text mining, and recommender systems, together with their challenges and prospects.

Presenter: Longbing Cao, University of Technology Sydney, Australia (longbing.cao@uts.edu.au)

Longbing Cao is a professor of information technology at the University of Technology Sydney (UTS), Australia. He is the Founding Director of the Advanced Analytics Institute at UTS, which is dedicated to data science and big data analytics research, education and development. He is also the Research Leader of the Data Mining Program at the Australian Capital Markets Cooperative Research Centre. He is a Senior Member of IEEE, SMC Society and Computer Society. He serves as associate editor and guest editor for many journals, as conference co-chair of KDD2015, PAKDD13 and ADMA13, as program co-chair or vice-chair of PAKDD11, ICDM10, IAT11, ADMA10 etc., and as a program committee member for around 100 conferences including KDD, ICDM and AAMAS. His primary research interests include data science and data mining, machine learning, behavior informatics, and open complex intelligent systems.

In the area of data science, he started the relevant work in 2005, and in 2007 created the first data science lab in Australia, which was probably one of the very first labs globally. He founded the Advanced Analytics Institute (AAI), a university research institute at the University of Technology Sydney. AAI is now widely recognized as the first Australian research group designed and dedicated to data science and big data analytics, and was the only group specially mentioned in the first Australian government whitepaper on big data. He and AAI were invited to talk about big data on Australian Broadcasting Corporation (ABC) TV. Since 2002, he has led teams working with many federal, state, private and international partner organisations, including Microsoft, SAS, Teradata, Australian Taxation Office, Department of Human Services, Department of Financial Services, Insurance Australia Group, and Westpac. The work he and his team have done on big data analytics has led to estimated savings of millions of dollars every year for the relevant organisations.

Tutorial 3: Data Analytics in Healthcare: Problems, Challenges and Future Directions

Abstract: Health informatics refers to the process of leveraging information technologies to improve the quality of healthcare delivery. In recent years, the application of data analytics technologies to healthcare has attracted considerable interest in both the data analysis and medical communities. In this tutorial, we will introduce healthcare data and popular analytic problems, and point out challenges and future research directions.

Presenter 1: Fei Wang, IBM T. J. Watson Research Center, USA (feiwang03@gmail.com)

Dr. Fei Wang is now a Research Staff Member in the healthcare analytics research group, IBM T. J. Watson Research Center. He received his M.S. and Ph.D. degrees from the Department of Automation, Tsinghua University in 2008. After that, he spent one year in the School of Computing and Information Science, Florida International University as a postdoc and another year in the Department of Statistical Science, Cornell University as a postdoc. His research interests include data mining, machine learning and their applications in healthcare data analytics. He has published over 100 papers at leading conferences such as SIGKDD, SIGIR, ICML, IJCAI, AAAI, SDM, and ICDM. He also serves as a referee for many distinguished journals including IEEE TPAMI, IEEE TKDE, DMKD, and ACM TKDD, and as a program committee member for many international conferences including KDD, ICDM and SDM. For more details, see his personal homepage at https://sites.google.com/site/feiwang03/.

Presenter 2: Xiang Wang, IBM T. J. Watson Research Center, USA (wangxi@us.ibm.com)

Dr. Xiang Wang is a postdoctoral researcher at IBM T. J. Watson Research Center, Healthcare Analytics Research Group. He is broadly interested in data mining, machine learning, and their applications to real-world problems, in particular healthcare informatics. He received his PhD in Computer Science (2013) from the University of California at Davis, advised by Professor Ian Davidson. Before that, he received a Master of Software Engineering (2008) and a BS in Mathematics (2004), both from Tsinghua University. He has published around 20 papers at top venues in data mining such as KDD, ICDM, and SDM, and he won the best research paper runner-up award at SDM 2013.

Tutorial 4: E-commerce Personalization at Scale

Abstract: In this tutorial, we will present an in-depth discussion on personalization in the realm of e-commerce platforms with a focus on the scale of data and the consequent challenges. We will describe use cases in large online marketplaces like eBay. The discussion will also highlight why dimensions like location, device, and social contexts matter, in addition to traditional user preferences and historical profiles. Some of the challenges in effective personalization are due to the scale (variety, volume, and velocity) of data. Specifically, an effective personalization approach would need to combine signals from multiple data sources (variety), deal with billions of events (volume), and negotiate real-time stream processing (velocity). We will highlight the use of some of the latest techniques (machine learning, data mining, cloud/Hadoop computing) and describe approaches that can benefit from both real-time and offline processes while respecting industry-level latency considerations. The talk will also present recent research work from the web community to highlight the gaps and opportunities in dealing with data at these scales in real time.

Presenter 1: Elizabeth F. Churchill, eBay Research Labs, USA (echurchill@ebay.com)

Elizabeth Churchill joined eBay in 2012 as a Director leading HCI (Human Computer Interaction) research. A psychologist by training, Elizabeth has a PhD in Cognitive Science from the University of Cambridge in the United Kingdom. Prior to joining eBay, Elizabeth led HCI research at Yahoo Labs, PARC (the Palo Alto Research Center) and FX Palo Alto Laboratory, Fuji Xerox's lab in Palo Alto.

Elizabeth is an active member of the Human Computer Interaction research community. She is the current Executive Vice President of the ACM's Special Interest Group on Computer Human Interaction (SIGCHI). In her 20-year career, she has published at all the top-tier HCI conferences and in numerous HCI-related journals, has co-edited 5 HCI-related texts, and is a regular columnist for the ACM's interactions magazine, a publication dedicated to user-centered design and the study of human-computer interaction. In 2010, she was recognized as a Distinguished Scientist by the Association for Computing Machinery (ACM). She is a Distinguished Visiting Scholar at Stanford University's Media X, the industry affiliate program to Stanford's H-STAR Institute.

Presenter 2: Atish Das Sarma, eBay Research Labs, USA (adassarma@ebay.com)

Atish Das Sarma is currently a Staff Research Scientist at eBay Research Labs. Prior to his current position, Atish was a Research Scientist at Google Research. Atish received his Ph.D. degree from Georgia Tech and his undergraduate degree from IIT-Bombay, both in computer science. Atish has published over 40 research papers and filed around 10 patents. Atish received the Best Paper Award at PODS-2008, a Facebook award in 2009 for work that was named a top-50 fbFund Finalist for most promising upcoming start-up ideas, and Google awards for excellent paper in structured data and best paper in information retrieval.

Atish's work has been featured in Techcrunch, Huffington Post, The Verge, BBC, New Scientist, NPR, MIT Technology Review, Time, and several other outlets. Atish serves on numerous international program committees (such as KDD, WWW, WSDM, VLDB, SIGMOD, CIKM, ICWSM, and others) and has given several invited talks and tutorials.

Presenter 3: Ranjan Sinha, eBay Research Labs, USA (rsinha@ebay.com)

Ranjan Sinha is currently the lead scientist for Personalization at eBay Inc. He has over a decade of experience leading innovative projects in industry and academia. He has led several site-impacting projects that significantly enhanced the shopping experience on eBay and led to multiple incremental business lifts. He received his award-winning Ph.D. degree in computer science from RMIT University under the supervision of Prof. Justin Zobel. He has over 25 research publications, including in top-tier outlets such as ACM SIGMOD, the VLDB Journal, and the Bioinformatics journal. Prior to joining eBay, Ranjan was an ARC Research Fellow and Chief Investigator at the University of Melbourne. He has taught courses, consulted for industry, reviewed papers for journals and conferences, mentored interns and colleagues in industry, and supervised honors and PhD students in academia. He has given several invited talks at venues including Google and Bell Labs.

Ranjan was awarded the Sort Benchmark medals for both JouleSort and PennySort in 2009. He was amongst the Top 12 Asia-Pacific Young Inventors and appeared in a center-spread article on Cutting-Edge Crusaders in the Wall Street Journal. He is currently a co-organizer of the popular Bay Area Search Meetup, which has over 1,300 members.

Tutorial 5: Learning to Hash with its Application to Big Data Retrieval and Mining

Abstract: There has been increasing interest in nearest neighbor (NN) search in massive (large-scale) data sets in this big data era. In many real applications, it's not necessary for an algorithm to return the exact nearest neighbors for every possible query. Hence, in recent years approximate nearest neighbor (ANN) search algorithms with improved speed and memory saving have received more and more attention from researchers. Due to its low storage cost and fast query speed, hashing has been widely adopted for ANN search in large-scale datasets. The essential idea of hashing is to map the data points from the original feature space into binary codes in the hashcode space with similarities between pairs of data points preserved. Recently, most hashing methods adopt machine learning models to learn effective hashing functions from data, which results in a new research topic called Learning to Hash. Because NN search plays a fundamental role in many information retrieval and data mining models, learning to hash has recently become a very hot research topic with wide applications in many big data areas. This tutorial will provide a systematic introduction of learning to hash, including the motivation, models, learning algorithms, and applications in big data retrieval and mining.
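
The mapping from feature vectors to binary codes described in this abstract can be illustrated with a minimal sketch. The code below uses random hyperplane (sign random projection) hashing, which is a data-independent baseline and not one of the learned methods covered by the tutorial; a learning-to-hash method would fit the projection directions to the data instead of drawing them at random. All function names and parameters here are hypothetical and chosen only for illustration.

```python
import random


def make_hyperplanes(num_bits, dim, seed=0):
    """Draw num_bits random Gaussian directions (assumption: data-independent)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def hash_code(point, hyperplanes):
    """Map a real-valued vector to a binary code: one sign bit per hyperplane."""
    return tuple(
        1 if sum(w * x for w, x in zip(plane, point)) >= 0 else 0
        for plane in hyperplanes
    )


def hamming(a, b):
    """Number of positions at which two binary codes differ."""
    return sum(x != y for x, y in zip(a, b))


def ann_search(query, codes, hyperplanes):
    """Return the index of the stored code closest to the query in Hamming distance."""
    q = hash_code(query, hyperplanes)
    return min(range(len(codes)), key=lambda i: hamming(q, codes[i]))


if __name__ == "__main__":
    planes = make_hyperplanes(num_bits=16, dim=3, seed=1)
    points = [[1.0, 2.0, 3.0], [-1.0, 0.5, -2.0], [0.9, 2.1, 2.8]]
    codes = [hash_code(p, planes) for p in points]
    # Only the short binary codes are needed at query time, not the raw vectors.
    print(ann_search([1.0, 2.0, 2.9], codes, planes))
```

Vectors separated by a small angle tend to fall on the same side of most hyperplanes, so their codes tend to be close in Hamming distance; this is the sense in which similarity is preserved in the hashcode space.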

Presenter 1: Wu-Jun Li, Nanjing University, China (liwujun@nju.edu.cn)

Dr. Wu-Jun Li is currently an associate professor in the Department of Computer Science and Technology at Nanjing University, P. R. China. From 2010 to 2013, he was a faculty member of the Department of Computer Science and Engineering at Shanghai Jiao Tong University, P. R. China. He received his PhD degree from the Department of Computer Science and Engineering at Hong Kong University of Science and Technology in 2010. Before that, he received his M.Eng. degree and B.Sc. degree from the Department of Computer Science and Technology, Nanjing University in 2006 and 2003, respectively. His main research interests include machine learning and pattern recognition, especially statistical relational learning and big data machine learning (big learning). In these areas he has published over 30 peer-reviewed papers in leading journals such as TKDE and at top conferences such as AAAI, AISTATS, CVPR, ICML, IJCAI, NIPS, and SIGIR. He has served as a PC member of ICML'14, IJCAI'13/'11, NIPS'14, SDM'14, UAI'14, etc.

Tutorial 6: Deep Learning for Natural Language Processing: Theory and Practice

Abstract: Deep learning techniques have enjoyed tremendous success in the speech and language processing community in recent years, establishing new state-of-the-art performance in speech recognition, language modeling, and some natural language processing tasks. The focus of this tutorial is on deep learning approaches to problems in language or text processing, with particular emphasis on important applications including information retrieval, language understanding, knowledge representation, question answering, machine translation, and semantic modeling.

In this tutorial, we first survey the latest deep learning technology, presenting both theoretical and practical perspectives that are most relevant to our topic. We plan to cover common methods of deep neural networks and more advanced methods of recurrent, recursive, stacking, and convolutional networks. Next, we review general problems and tasks in text/language processing, and underline the distinct properties that differentiate language processing from other tasks such as speech and image object recognition. More importantly, we highlight the general issues of language processing, and elaborate on how new deep learning technologies have been proposed to fundamentally address these issues. We then place particular emphasis on several important applications: 1) information retrieval from text, 2) language understanding/semantic parsing, 3) question answering, and 4) machine translation. For each of these tasks we discuss which deep learning architectures are suitable given the nature of the task, and how learning can be performed efficiently and effectively using end-to-end optimization strategies. Beyond providing a systematic tutorial of the general theory, we also present hands-on experience in building state-of-the-art IR/LU/QA/MT systems. In the tutorial, we will share our practice with concrete examples drawn from our first-hand experience with major research benchmarks and some industrial-scale applications on which we have been working extensively in recent years.

Presenter 1: Xiaodong He, Microsoft Research, USA (xiaohe@microsoft.com)

Xiaodong He is a Researcher at Microsoft Research, Redmond, WA, USA. He is also an Affiliate Professor in the Department of Electrical Engineering at the University of Washington, Seattle, WA, USA. His research interests include deep learning, information retrieval, natural language understanding, machine translation, and speech recognition. Dr. He has published a book and more than 60 technical papers in these areas, and has given tutorials at several international conferences in these fields. In benchmark evaluations, he and his colleagues have developed entries that obtained No. 1 place in the 2008 NIST Machine Translation Evaluation (NIST MT) and the 2011 International Workshop on Spoken Language Translation Evaluation (IWSLT), both in Chinese-English translation. He serves as Associate Editor of IEEE Signal Processing Magazine and IEEE Signal Processing Letters, and as Guest Editor of several IEEE journals. He served on the organizing committee of ICASSP2013 as the Chair of Special Sessions, and as general Co-Chair of the Workshop on Speech and Language at NIPS 2008. He is a senior member of IEEE and a member of ACL.

Presenter 2: Jianfeng Gao, Microsoft Research, USA (jfgao@microsoft.com)

Jianfeng Gao is a Principal Researcher at Microsoft Research, Redmond, WA, USA. His research interests include Web search and mining, information retrieval, natural language processing and statistical machine learning. Dr. Gao has published more than 100 technical papers in these areas. He gave a tutorial on statistical translations and web search at the 2011 ACL/SIGIR Summer School. In benchmark evaluations, he and his colleagues have developed entries that obtained No. 1 place in the 2008 NIST Machine Translation Evaluation (NIST MT) in Chinese-English translation. He was Associate Editor of ACM Trans on Asian Language Information Processing (2007 to 2010), and was a Member of the editorial board of Computational Linguistics (2006 to 2008). He also served as tutorial co-chair for CIKM2013, and as area chair for ACL2012, EMNLP 2010, ACL-IJCNLP 2009, etc.

Presenter 3: Li Deng, Microsoft Research, USA (deng@microsoft.com)

Li Deng is a Principal Researcher at Microsoft Research, Redmond, WA, USA. In the general areas of audio/speech/language technology and science, machine learning, signal/information processing, and computer science, he has published over 300 refereed papers in leading journals and conferences and 4 books. He is a Fellow of the Acoustical Society of America, a Fellow of the IEEE, and a Fellow of the International Speech Communication Association. He served on the Board of Governors of the IEEE Signal Processing Society (2008-2010). More recently, he served as Editor-in-Chief for the IEEE Signal Processing Magazine (2009-2011), which earned the highest impact factor in 2010 and 2011 among all IEEE publications and for which he received the 2012 IEEE SPS Meritorious Service Award. He recently served as General Chair of the IEEE ICASSP-2013, and currently serves as Editor-in-Chief for the IEEE Transactions on Audio, Speech and Language Processing. In 2009, in collaboration with Geoff Hinton, he initiated deep learning research at Microsoft. Since then, his technical work on, and leadership in, industry-scale deep learning with colleagues and academic collaborators have created significant impact in speech recognition and in other areas of signal/information processing, including text processing.