From Big Data Coursework for Computational Medicine
Revision as of 08:28, 8 September 2017 by Pathak (Talk | contribs)


2017 Big Data Coursework for Computational Medicine (BDC4CM)

July 10 - July 13, 2017

New York City, New York

Download 2017 BDC4CM Final Schedule

Final Workshop Slides and Materials

  • Big Data Tools and Methods for Computational Medicine (Jyotishman Pathak) - PDF
  • Standardization and Normalization of Big Data (Christopher Chute) - PDF
  • Multiple Hypothesis Testing and Dependent Data (Claudia Neuhauser) - PDF
  • Knowledge Discovery and Data Mining From Big Data (Vipin Kumar) - PDF
  • Natural Language Processing Using Big Data (Stephen Johnson) - PDF
  • AI and the Future of our society (Claudia Neuhauser) - PDF
  • Big Data Visualization (David Pieczkiewicz) - PDF
  • Ethical and Legal Issues in Handling Big Data (Susan Wolf) - PDF
  • Datathon (Fei Wang) - ZIP folder

Request for Application for NIH funded BD2K Fellowship

Precision medicine’s promise to deliver the right treatment at the right time relies on our ability to extract information from high-dimensional data sets that combine traditional clinical data in electronic health records with data generated by high-throughput technologies. To meet this challenge, new approaches for data representation, integration, analysis, visualization, and sharing need to be developed collaboratively by quantitative scientists, biomedical researchers, clinicians, and bioethicists.

We are seeking fellowship applications for a joint Weill Cornell Medicine, Johns Hopkins Medicine, and University of Minnesota week-long Big Data Coursework for Computational Medicine (BDC4CM) funded by the U.S. National Institutes of Health (NIH). BDC4CM will emphasize how to navigate the interface between research and practice by offering participants in-depth lectures, case studies and hands-on training from leading researchers in academia and industry.



  • Data and knowledge representation standards
  • Information extraction and natural language processing
  • Visualization analytics
  • Data mining and predictive modeling
  • Privacy and ethics
  • Applications in comparative effectiveness research and population health research and improvement


  • Receive tailored, in-depth instruction, hands-on laboratory modules, and case studies
  • Survey the most relevant research domains for big data in healthcare
  • Solve a real-world, health-data-driven problem in a Datathon
  • Interact with distinguished scholars and world-renowned experts from academia and industry



Final schedule is here! Download PDF


Required Readings

Data Visualization

  • Chapter 3 (“Data visualization”) from Grolemund G, Wickham H. (2017) R for Data Science. O’Reilly. URL
  • West VL, Borland D, Hammond WE. (2015) “Innovative information visualization of electronic health record data: a systematic review.” Journal of the American Medical Informatics Association 22(2): 330-339. PMID: 25336597. URL
  • Rougier NP, Droettboom M, Bourne PE. (2014) “Ten simple rules for better figures.” PLoS Computational Biology 10(9): e1003833. PMID: 25210732. URL
  • Slides, Datasets and other materials

Ethical and Legal Issues in Handling Big Data

  • IS Kohane. Ten things we have to do to achieve precision medicine. Science 2015;349:37-38. URL
  • Precision Medicine Initiative (PMI) Working Group. The Precision Medicine Initiative Cohort Program: building a research foundation for 21st century medicine. Sept 2015. (Read pages -5, 43-45, 46-73, 78-87) URL
  • JE McEwen et al. Evolving approaches to the ethical management of genomic data. Trends Genet 2013;29(6):375-82. URL
  • SM Wolf et al. Managing incidental findings and research results in genomic research involving biobanks and archived data sets. Genet Med 2012;14(4):361-84. URL
  • BJ Evans. Barbarians at the gate: consumer-driven health data commons and the transformation of citizen science. Am J Law Med 2016;42(4):forthcoming. URL




Christopher G. Chute, M.D., Dr.P.H.

Dr. Chute

Dr. Christopher Chute is the Bloomberg Distinguished Professor of Health Informatics; Professor of Medicine, School of Medicine; Professor, Health Policy & Management; Professor of Health Informatics, Bloomberg School of Public Health and School of Nursing, Johns Hopkins University; and Chief Health Informatics Research Officer at the Johns Hopkins University Health System.

Previously, Dr. Chute was the founding PI of the eMERGE project at Mayo Clinic, which has pioneered techniques for high-throughput phenotyping from the EMR. He was also PI of the SHARP (Strategic Health IT Advanced Research Projects) on Secondary Data Use, and Co-PI on the SE MN Beacon Community for HIT (Health Information Technology) standards-based data exchange, both awards from HHS/ONC. Additionally, Dr. Chute is involved in national and international efforts to define clinical phenotypes and their associated HIT standards. These roles include, most pertinently, Chair of the ICD (International Classification of Diseases) Revision process at WHO for ICD-11, with an emphasis on scientific consensus of clinical phenotype. Related efforts include serving as Chair of the ISO Technical Committee on Health Informatics (TC215), service on the HL7 Advisory Council, and initial membership on the US HIT Standards Committee at HHS/ONC, appointed by Secretary Sebelius.

Dr. Chute received his DrPH from Harvard University in 1990, an MPH from Harvard University in 1982, and his M.D. from Brown University in 1982.

Scientifically, Dr. Chute is PI of the Pharmacogenomics Research Network phenotyping vocabulary resource, and is co-PI on the NIH National Center for Biomedical Computing effort within the National Center for Biomedical Ontology. In 2002 he received the President’s Award from the American Medical Informatics Association. He has been President-Elect of the American College of Medical Informatics (ACMI) since 2015.

Stephen Johnson, Ph.D.

Dr. Johnson

Dr. Stephen B. Johnson is currently Professor of Healthcare Policy and Research and Director of Graduate Programs with the Division of Health Informatics, Department of Healthcare Policy and Research, Weill Cornell Medical College. He is also Director of Informatics Core at the Weill Cornell Medical College’s Clinical and Translational Science Center. Previously, he was Professor of Biomedical Informatics at Columbia University College of Physicians and Surgeons.

Dr. Johnson received his Ph.D. from New York University in 1987, and his B.A. degree from McGill University (Canada) in 1982.

His research explores the use of information systems to promote communication and collaboration in patient care and biomedical research. The goal is to develop models that increase our understanding of interactions between information systems and biomedical organizational structures, patterns of work flow, and the specialized languages that professionals employ. This work is a fusion between technical and social disciplines, drawing from computer science, information technology, cognition, linguistics, behavioral and organizational science. Methods include natural language processing, linguistic analysis, content analysis, data modeling, work flow analysis and social network analysis. Applications include electronic health records, research databases, clinical research systems and systems to promote scientific collaboration.

Dr. Johnson has received many grant awards, including PCORI NYC-CDRN, Phase II, awarded by the Patient-Centered Outcomes Research Institute; RENYC: Rare Epilepsies in New York City: Epidemiology and Health Outcomes, awarded by the Centers for Disease Control & Prevention; and, most recently, the New York City Clinical Data Research Network Key Personnel and the Clinical and Translational Science Center (UL1) - Cooperative Agreement, awarded by the National Center for Advancing Translational Sciences. Previously, he worked on the projects An Information Fusion Approach to Longitudinal Health Records, awarded by the United States Department of Health & Human Services, and A Model Integrated Data Management System for Autism Research, awarded by the National Institute of Mental Health.

Vipin Kumar, Ph.D.

Dr. Kumar

Dr. Vipin Kumar is a Regents Professor at the University of Minnesota, where he holds the William Norris Endowed Chair in the Department of Computer Science and Engineering. He received the B.E. degree in Electronics & Communication Engineering from Indian Institute of Technology Roorkee (formerly, University of Roorkee), India, in 1977, the M.E. degree in Electronics Engineering from Philips International Institute, Eindhoven, Netherlands, in 1979, and the Ph.D. degree in Computer Science from University of Maryland, College Park, in 1982.

Dr. Kumar's current research interests include data mining, high-performance computing, and their applications in Climate/Ecosystems and Biomedical domains. He is the Lead PI of a 5-year, $10 million project, "Understanding Climate Change - A Data Driven Approach", funded by the NSF's Expeditions in Computing program, which is aimed at pushing the boundaries of computer science research. He also served as the Head of the Computer Science and Engineering Department from 2005 to 2015 and the Director of the Army High Performance Computing Research Center (AHPCRC) from 1998 to 2005. His research has resulted in the development of the concept of the isoefficiency metric for evaluating the scalability of parallel algorithms, as well as highly efficient parallel algorithms and software for sparse matrix factorization (PSPASES) and graph partitioning (METIS, ParMetis, hMetis). He has authored over 300 research articles, and has coedited or coauthored 10 books, including two textbooks, “Introduction to Parallel Computing” and “Introduction to Data Mining”, that are used worldwide and have been translated into many languages.

Dr. Kumar has served as chair/co-chair for many international conferences and workshops in the area of data mining and parallel computing, including 2015 IEEE International Conference on Big Data, IEEE International Conference on Data Mining (2002), and International Parallel and Distributed Processing Symposium (2001). He co-founded SIAM International Conference on Data Mining and served as a founding co-editor-in-chief of Journal of Statistical Analysis and Data Mining (an official journal of the American Statistical Association). He received the Distinguished Alumnus Award from the Indian Institute of Technology (IIT) Roorkee (2013), the Distinguished Alumnus Award from the Computer Science Department, University of Maryland College Park (2009), and IEEE Computer Society's Technical Achievement Award (2005). His foundational research in data mining and its applications to scientific data was honored by the ACM SIGKDD 2012 Innovation Award, which is the highest award for technical excellence in the field of Knowledge Discovery and Data Mining (KDD).

Claudia Neuhauser, Ph.D.

Dr. Neuhauser

Claudia Neuhauser, PhD, is the director of Research Computing in the Office of the Vice President for Research, overseeing the University of Minnesota Informatics Institute (UMII) and the Minnesota Supercomputing Institute (MSI). UMII fosters and accelerates data-intensive research across all disciplines in the University and develops partnerships with industry. MSI provides high-performance computing resources to the University.

Dr. Neuhauser’s research is at the interface of mathematics and biology, and focuses on the analysis of ecological and evolutionary models and the development of statistical methods in biomedical applications. She has been the Director of Graduate Studies of the Biomedical Informatics and Computational Biology graduate program since 2008. Between 2008 and 2013, she served as Vice Chancellor for Academic Affairs at the University of Minnesota Rochester (UMR). Prior to moving to UMR, she was Professor and Head in the department of Ecology, Evolution and Behavior at the University of Minnesota Twin Cities. She is a Distinguished McKnight University Professor, Howard Hughes Medical Institute Professor, and Morse-Alumni Distinguished Teaching Professor. She held faculty positions at the University of Southern California, the University of Wisconsin Madison, and the University of California Davis. She received her Diplom in mathematics from the Universität Heidelberg (Germany) in 1988, and a Ph.D. in mathematics from Cornell University in 1990. She is a fellow of the American Association for the Advancement of Science (AAAS) and a fellow of the American Mathematical Society (AMS).

Jyotishman Pathak, Ph.D.

Dr. Pathak

Dr. Pathak is the Frances and John L. Loeb Professor of Medical Informatics and Chief of the Division of Health Informatics, Department of Healthcare Policy & Research, at Weill Cornell Medicine. Prior to joining Weill Cornell in October 2015, he was a Professor of Biomedical Informatics at Mayo Clinic in Rochester, Minnesota, where his research focused on biomedical knowledge representation and semantic information integration. He has been a key contributor to two major NIH/HHS-funded initiatives, the Electronic Medical Records and Genomics (eMERGE) and Strategic Health IT Advanced Research Projects (SHARP) projects, which have pioneered techniques for high-throughput phenotyping from electronic health records.

Dr. Pathak’s research interests and expertise lie in developing and applying informatics methods for data mining and phenotype extraction from electronic health records (EHRs), and their applications in pharmacogenomics, comparative effectiveness research, and population health research, particularly focusing on major depressive disorders.

David S. Pieczkiewicz, Ph.D.

Dr. Pieczkiewicz

David S. Pieczkiewicz, PhD is Director of Graduate Studies for the Health Informatics Graduate Program, and Clinical Assistant Professor in the Institute for Health Informatics. In addition to a PhD in health informatics (University of Minnesota), he earned a BA with honors in anthropology (Case Western Reserve University) and an MA in biological anthropology (University of Kansas), specializing in medical anthropology and the simulation of infectious disease epidemics. His work in simulation eventually led him to the problem of visualizing the results of his research, and from there, into health informatics. After earning his doctorate, Dr. Pieczkiewicz worked for three years at the Marshfield Clinic Research Foundation in Wisconsin, where he was the first postdoctoral fellow at the Foundation's Biomedical Informatics Research Center, and a National Library of Medicine Postdoctoral Fellow with the University of Wisconsin, Madison. In addition to working on projects in data visualization, he expanded his research into the usability of data visualizations and other health information technology.

In Fall 2010, Dr. Pieczkiewicz returned to the University of Minnesota as a faculty member in the new Institute for Health Informatics. That year also marked a milestone for the Institute, which received over five million dollars as part of the American Recovery and Reinvestment Act of 2009 to train students in informatics and health information technology. During his first two years, he spearheaded a comprehensive renewal of the Health Informatics Graduate Program's entire curriculum, originating courses in databases, clinical informatics, and software engineering, redesigning the existing survey courses in informatics, and bringing all courses into the online realm. Since then, he has also created and taught courses in programming, analytics and data science, and data visualization. His courses consistently receive very high ratings for the quality of their instruction and materials.

Dr. Pieczkiewicz has served as Director of Graduate Studies since 2013, and has advised and mentored dozens of master's-level and PhD students, as well as coached and co-authored numerous student papers and conference presentations. In April 2015, he was the recipient of the 2014-15 Outstanding Advising and Mentoring Award from the Graduate and Professional Student Association and the University Provost.

Currently, Dr. Pieczkiewicz has academic and research interests in data visualization and information design, human-computer interaction and usability, analytics and data science, and epidemiology. His particular interests are in data visualization and its impact on decision making among clinicians, patients, and researchers. He has collaborated frequently with faculty in the School of Nursing, most prominently incorporating data visualization and usability studies with research on the Omaha System, a major standardized terminology in nursing. In 2013, he was a recipient of the Early Career Informatics Methodologist Award, presented at the First International Conference on Research Methods for Standardized Terminologies. Most recently, he contributed a chapter on exploratory data analysis to a textbook on health care data analytics.

While his work has moved him away from his original field of study, Dr. Pieczkiewicz remains an anthropologist at heart. He takes a holistic approach to informatics, and firmly believes that humans are the most important part of health information technology.

Fei Wang, Ph.D.

Dr. Wang

Fei Wang is an Assistant Professor in the Division of Health Informatics, Department of Healthcare Policy and Research, Weill Cornell Medicine. His major research interest is data analytics and its applications in health informatics. His papers have received over 4,200 citations so far, with an h-index of 35. He won best research paper runner-up at ICDM 2016, best short paper at ICHI 2016, best student paper at ICDM 2015, a best research paper nomination at ICDM 2010, and a Marco Ramoni best paper nomination at AMIA TBI 2014, and his papers were best paper finalists at SDM 2011 and 2015. Dr. Wang is also the winner of the MJFF Parkinson's Progression Markers Initiative data challenge on subtyping Parkinson's disease. Dr. Wang is an action editor of the journal Data Mining and Knowledge Discovery, an associate editor of Journal of Health Informatics Research and Smart Health, and an editorial board member of Pattern Recognition and International Journal of Big Data and Analytics in Healthcare. Dr. Wang is the vice chair of the KDD working group in AMIA.

Susan M. Wolf, J.D.

Dr. Wolf

Professor Susan M. Wolf is the McKnight Presidential Professor of Law, Medicine & Public Policy; Faegre Baker Daniels Professor of Law; and Professor of Medicine at the University of Minnesota. She is Chair of the University’s Consortium on Law and Values in Health, Environment & the Life Sciences.

Professor Wolf is an elected member of the National Academy of Medicine, a Fellow of the American Association for the Advancement of Science (AAAS), a member of the American Law Institute (ALI), and a Fellow of The Hastings Center. She has received numerous grants to fund her research from the National Institutes of Health (NIH), the National Science Foundation (NSF), the Robert Wood Johnson Foundation, and others. She has published widely, including in Science, JAMA, New England Journal of Medicine, Genetics in Medicine, and the Journal of Law, Medicine & Ethics. She teaches in the areas of health law, law and science, and bioethics.

Professor Wolf received an A.B. degree summa cum laude from Princeton University and a J.D. degree from Yale Law School, with graduate work at Harvard University. She clerked in the U.S. District Court for the Southern District of New York and then practiced with the law firm of Paul, Weiss, Rifkind, Wharton & Garrison from 1981 to 1984. In 1984-85, Professor Wolf was a National Endowment for the Humanities (NEH) Fellow at The Hastings Center in New York, a senior bioethics research institute. She then became the Center’s Associate for Law. She also taught law and medicine at New York University School of Law as an adjunct associate professor from 1987 to 1992. In 1992-93, she was a Fellow at Harvard University in the Program in Ethics and the Professions, before joining the University of Minnesota faculty.

In 2011, she was appointed by the Secretary of Health & Human Services to the National Science Advisory Board for Biosecurity (NSABB). She is a past chair of the AALS Section on Law, Medicine and Health Care and a past board member of the American Society of Bioethics & Humanities (ASBH). She has served on committees for the Institute of Medicine (IOM), AMA, and others. Professor Wolf has served on the Editorial Board for the Journal of Law, Medicine & Ethics; American Journal of Bioethics (AJOB); Journal of Urban Health; and Journal of Women’s Health and Law; and as Faculty Advisor to the Minnesota Journal of Law, Science & Technology. She has lectured widely, in the United States and abroad.


Important Dates

  • Application Deadline
    • March 31, 2017
  • Selection Notification
    • April 14, 2017
  • Deadline for Acceptance Reply
    • April 30, 2017
  • Course
    • July 10 - 13, 2017



  • Faculty, scientists, post-doctoral fellows and researchers with a PhD, MD, or equivalent in computer science, biomedical informatics, bioinformatics, statistics, health information technology or a related degree
  • Graduate students in good standing and currently enrolled in a PhD, MD, or equivalent program in computer science, biomedical informatics, bioinformatics, statistics, health information technology or a related degree
  • Must be a U.S. citizen, U.S. permanent resident, or a non-U.S. citizen with a valid temporary U.S. visa

We anticipate accepting 20 applicants with a variety of academic backgrounds and experiences.



This grant has been awarded by the U.S. National Institutes of Health (R25 EB201381) and will provide 20 competitively awarded fellowships including:

  1. Travel stipend
  2. Registration Fees
  3. Breakfasts, Lunches, and Break Refreshments
  • Travel arrangements to and from New York City are the sole responsibility of the individual awardee.

Applicant Requirements

All applicants must

  • Complete the BDC4CM Application, including contact information and required documents
  • Have a contact send a letter of recommendation in a separate email

Documents to be emailed by applicant

All documents may be submitted on the online application. However, should you prefer, you may choose to submit them via email to Maritza Montalvo with your name in the subject line.

  • If you are unable to attach your documents to the online application, please submit them via email (above). All three documents should be sent in one email as three separate PDF files.
    • 1. CV/Resume
    • 2. List of Experiences (courses, research, work)
      • Describe your level of preparation in the field of Big Data in Computational Medicine
    • 3. One-page Essay:
      • Explain your career goals, previous career preparation and why participation in this course will be beneficial to your career advancement.

Document to be emailed by contact

1. Letter of Recommendation:

The letter of recommendation should be emailed separately by its author with your name in the subject line. The expected author depends on the applicant type:

Applicant Type       | Author of Letter
Student or postdoc   | Faculty Advisor
University Faculty   | Department Head or Dean
Industry             | Supervisor



Classes will be held at Weill Cornell Medicine, located in New York City's Upper East Side.

Welcome to NYC!

New York City, the most populous city in the U.S., is arguably the most diverse, energetic, and entertaining place to visit, meet, and stay. The city is served by three major airports: JFK, LaGuardia, and Newark Liberty, as well as two major railroad stations: Grand Central Station (NYC subways, NYC buses, and Metro-North) and Penn Station (MTA subways, MTA buses, Amtrak, Greyhound, and more). Weill Cornell Medicine can be reached from any of these hubs by taxi, subway, rental car, or on foot. Attractions include the Empire State Building, Statue of Liberty, Times Square, Central Park, Metropolitan Museum of Art, Rockefeller Center, National September 11 Memorial, the Museum of Modern Art, and more.

New York City



Questions? Please email Maritza Montalvo.

Past BDC4CM Sessions

Thank you to those who attended the 2015, 2016, and 2017 BDC4CM Fellowships! The materials can be accessed from Open Educational Resources: OER Commons

Group Photo - BDC4CM 2017

BDC4CM 2017 group photo

Group Photo - BDC4CM 2016

BDC4CM 2016 group photo

Group Photo - BDC4CM 2015

Group photo
