The Auto Arborist Dataset: A Large-Scale Benchmark for Multiview Urban Forest Monitoring Under Domain Shift,
Sara Beery,
Guanhang Wu,
Trevor Edwards,
Filip Pavetic,
Bo Majewski,
Shreyasee Mukherjee,
Stanley Chan,
John Morgan,
Vivek Rathod,
Jonathan Huang.
In
IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2022),
pp. 21294--21307, 2022.
PERF-Net: Pose Empowered RGB-Flow Net,
Yinxiao Li,
Zhichao Lu,
Xuehan Xiong,
Jonathan Huang.
In
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV 2022),
pp. 513--522, 2022.
Progressive neural architecture search,
Chenxi Liu,
Barret Zoph,
Jonathon Shlens,
Wei Hua,
Li-Jia Li,
Fei-Fei Li,
Alan Yuille,
Jonathan Huang,
Kevin Murphy.
In
European Conference on Computer Vision (ECCV 2018),
2018.
Speed/accuracy trade-offs for modern convolutional object detectors,
Jonathan Huang,
Vivek Rathod,
Chen Sun,
Menglong Zhu,
Anoop Korattikara,
Alireza Fathi,
Ian Fischer,
Zbigniew Wojna,
Yang Song,
Sergio Guadarrama,
Kevin Murphy.
In
The 30th IEEE Conference on Computer Vision and Pattern Recognition (CVPR),
2017.
Generation and Comprehension of Unambiguous Object Descriptions,
Junhua Mao,
Jonathan Huang,
Alexander Toshev,
Oana Camburu,
Alan Yuille,
Kevin Murphy.
In The 29th IEEE Conference on Computer Vision and Pattern Recognition (CVPR),
Las Vegas, Nevada, June, 2016.
Detecting events and key actors in multi-person videos,
Vignesh Ramanathan,
Jonathan Huang,
Sami Abu-El-Haija,
Alexander Gorban,
Kevin Murphy,
Li Fei-Fei.
In The 29th IEEE Conference on Computer Vision and Pattern Recognition (CVPR),
Las Vegas, Nevada, June, 2016.
Im2Calories: towards an automated mobile vision food diary,
Austin Myers,
Nick Johnston,
Vivek Rathod,
Anoop Korattikara,
Alex Gorban,
Nathan Silberman,
Sergio Guadarrama,
George Papandreou,
Jonathan Huang,
Kevin Murphy.
In International Conference on Computer Vision (ICCV),
Santiago, Chile, December, 2015.
Deep Knowledge Tracing,
Chris Piech,
Jonathan Spencer,
Jonathan Huang,
Surya Ganguli,
Mehran Sahami,
Leonidas Guibas,
Jascha Sohl-Dickstein.
In
Neural Information Processing Systems (NIPS),
Montreal, Canada, December, 2015.
Learning Program Embeddings to Propagate Feedback,
Chris Piech,
Jonathan Huang,
Andy Nguyen,
Mike Phulsuksombati,
Mehran Sahami,
Leonidas Guibas.
In
International Conference on Machine Learning (ICML 2015),
Lille, France, July, 2015.
What's Cookin'? Interpreting Cooking Videos using Text, Speech and Vision,
Jonathan Malmaud,
Jonathan Huang,
Vivek Rathod,
Nick Johnston,
Andrew Rabinovich,
Kevin Murphy.
In
Proceedings of the Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics (HLT-NAACL),
Denver, Colorado, 2015.
Multiple Orderings of Events in Disease Progression,
Alexandra Young,
Neil Oxtoby,
Jonathan Huang,
Razvan Marinescu,
Pankaj Daga,
David Cash,
Nick Fox,
Sebastien Ourselin,
Daniel Alexander.
In
Information Processing in Medical Imaging (IPMI),
Isle of Skye, Scotland, 2015.
Tuned Models of Peer Assessment in MOOCs,
Chris Piech,
Jonathan Huang,
Zhenghao Chen,
Chuong Do,
Andrew Ng,
Daphne Koller.
In
Proceedings of the 6th International Conference on Educational Data Mining (EDM 2013),
Memphis, TN, July, 2013.
A database of vocal tract resonance trajectories for research in speech processing,
Li Deng,
Xiaodong Cui,
Robert Pruvenok,
Jonathan Huang,
Safiyy Momen,
Yanyi Chen,
Abeer Alwan.
In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2006),
pp. 60--63, Toulouse, France, 2006.
Abstract:
Generalization to novel domains is a fundamental challenge for computer vision. Near-perfect accuracy on benchmarks is common, but these models do not work as expected when deployed outside of the training distribution. To build computer vision systems that truly solve real-world problems at global scale, we need benchmarks that fully capture real-world complexity, including geographic domain shift, long-tailed distributions, and data noise.
We propose urban forest monitoring as an ideal testbed for studying and improving upon these computer vision challenges, while working towards filling a crucial environmental and societal need. Urban forests provide significant benefits to urban societies. However, planning and maintaining these forests is expensive. One particularly costly aspect of urban forest management is monitoring the existing trees in a city: e.g., tracking tree locations, species, and health. Monitoring efforts are currently based on tree censuses built by human experts, costing cities millions of dollars per census and thus collected infrequently.
Previous investigations into automating urban forest monitoring focused on small datasets from single cities, covering only common categories. To address these shortcomings, we introduce a new large-scale dataset that joins public tree censuses from 23 cities with a large collection of street-level and aerial imagery. Our Auto Arborist dataset contains over 2.5M trees and 344 genera and is >2 orders of magnitude larger than the closest dataset in the literature. We introduce baseline results on our dataset across modalities as well as metrics for the detailed analysis of generalization with respect to geographic distribution shifts, vital for such a system to be deployed at scale.
BibTeX Citation:
@inproceedings{beery2022auto,
author = {Sara Beery and Guanhang Wu and Trevor Edwards and Filip Pavetic and Bo Majewski and Shreyasee Mukherjee and Stanley Chan and John Morgan and Vivek Rathod and Jonathan Huang},
booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2022)},
pages = {21294--21307},
title = {The Auto Arborist Dataset: A Large-Scale Benchmark for Multiview Urban Forest Monitoring Under Domain Shift},
year = {2022}
}
Abstract:
This paper introduces temporally local metrics for Multi-Object Tracking. These metrics are obtained by restricting existing metrics based on track matching to a finite temporal horizon, and provide new insight into the ability of trackers to maintain identity over time. Moreover, the horizon parameter offers a novel, meaningful mechanism by which to define the relative importance of detection and association, a common dilemma in applications where imperfect association is tolerable. It is shown that the historical Average Tracking Accuracy (ATA) metric exhibits superior sensitivity to association, enabling its proposed local variant, ALTA, to capture a wide range of characteristics. In particular, ALTA is better equipped to identify advances in association independent of detection. The paper further presents an error decomposition for ATA that reveals the impact of four distinct error types and is equally applicable to ALTA. The diagnostic capabilities of ALTA are demonstrated on the MOT 2017 and Waymo Open Dataset benchmarks.
BibTeX Citation:
@article{valmadre2021local,
author = {Jack Valmadre and Alex Bewley and Jonathan Huang and Chen Sun and Cristian Sminchisescu and Cordelia Schmid},
journal = {arXiv preprint arXiv:2104.02631},
title = {Local Metrics for Multi-Object Tracking},
year = {2021}
}
Abstract:
In recent years, many works in the video action recognition literature have shown that two stream models (combining spatial and temporal input streams) are necessary for achieving state of the art performance. In this paper we show the benefits of including yet another stream based on human pose estimated from each frame -- specifically by rendering pose on input RGB frames. At first blush, this additional stream may seem redundant given that human pose is fully determined by RGB pixel values -- however we show (perhaps surprisingly) that this simple and flexible addition can provide complementary gains. Using this insight, we then propose a new model, which we dub PERF-Net (short for Pose Empowered RGB-Flow Net), which combines this new pose stream with the standard RGB and flow based input streams via distillation techniques and show that our model outperforms the state-of-the-art by a large margin in a number of human action recognition datasets while not requiring flow or pose to be explicitly computed at inference time.
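As a rough illustration of the distillation idea (a hypothetical sketch, not the released PERF-Net code; the loss weights and temperature below are assumptions), a student network that only sees RGB can be trained to match the predictions of pose and flow teacher streams, so neither pose nor flow needs to be computed at inference time:

# Hypothetical sketch of stream distillation in the spirit described above.
import torch
import torch.nn.functional as F

def distillation_loss(rgb_logits, teacher_logits_list, labels, temperature=2.0, alpha=0.5):
    # Supervised cross-entropy on ground-truth action labels.
    ce = F.cross_entropy(rgb_logits, labels)
    # Soften and average the teacher (pose / flow) predictions.
    teacher_probs = torch.stack(
        [F.softmax(t / temperature, dim=-1) for t in teacher_logits_list]).mean(0)
    # KL term pulls the RGB student toward the teacher ensemble.
    kl = F.kl_div(F.log_softmax(rgb_logits / temperature, dim=-1),
                  teacher_probs, reduction="batchmean") * temperature ** 2
    return alpha * ce + (1 - alpha) * kl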
BibTeX Citation:
@inproceedings{li2022perf,
author = {Li, Yinxiao and Lu, Zhichao and Xiong, Xuehan and Huang, Jonathan},
booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV 2022)},
pages = {513--522},
title = {Perf-net: Pose empowered rgb-flow net},
year = {2022}
}
Abstract:
Instance segmentation models today are very accurate when trained on large annotated datasets, but collecting mask annotations at scale is prohibitively expensive. We address the partially supervised instance segmentation problem in which one can train on (significantly cheaper) bounding boxes for all categories but use masks only for a subset of categories. In this work, we focus on a popular family of models which apply differentiable cropping to a feature map and predict a mask based on the resulting crop. Under this family, we study Mask R-CNN and discover that instead of its default strategy of training the mask-head with a combination of proposals and groundtruth boxes, training the mask-head with only groundtruth boxes dramatically improves its performance on novel classes. This training strategy also allows us to take advantage of alternative mask-head architectures, which we exploit by replacing the typical mask-head of 2-4 layers with significantly deeper off-the-shelf architectures (e.g. ResNet, Hourglass models). While many of these architectures perform similarly when trained in fully supervised mode, our main finding is that they can generalize to novel classes in dramatically different ways. We call this ability of mask-heads to generalize to unseen classes the strong mask generalization effect and show that without any specialty modules or losses, we can achieve state-of-the-art results in the partially supervised COCO instance segmentation benchmark. Finally, we demonstrate that our effect is general, holding across underlying detection methodologies (including anchor-based, anchor-free or no detector at all) and across different backbone networks. Code and pre-trained models are available at https://git.io/deepmac.
BibTeX Citation:
@article{birodkar2021surprising,
author = {Vighnesh Birodkar and Zhichao Lu and Siyang Li and Vivek Rathod and Jonathan Huang},
journal = {International Conference on Computer Vision (ICCV 2021)},
title = {The surprising impact of mask-head architecture on novel class segmentation},
year = {2021}
}
Abstract:
In static monitoring cameras, useful contextual information can stretch far beyond the few seconds typical video understanding models might see: subjects may exhibit similar behavior over multiple days, and background objects remain static. Due to power and storage constraints, sampling frequencies are low, often no faster than one frame per second, and sometimes are irregular due to the use of a motion trigger. In order to perform well in this setting, models must be robust to irregular sampling rates. In this paper we propose a method that leverages temporal context from the unlabeled frames of a novel camera to improve performance at that camera. Specifically, we propose an attention-based approach that allows our model, Context R-CNN, to index into a long term memory bank constructed on a per-camera basis and aggregate contextual features from other frames to boost object detection performance on the current frame.
We apply Context R-CNN to two settings: (1) species detection using camera traps, and (2) vehicle detection in traffic cameras, showing in both settings that Context R-CNN leads to performance gains over strong baselines. Moreover, we show that increasing the contextual time horizon leads to improved results. When applied to camera trap data from the Snapshot Serengeti dataset, Context R-CNN with context from up to a month of images outperforms a single-frame baseline by 17.9% mAP, and outperforms S3D (a 3d convolution based baseline) by 11.2% mAP.
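To make the attention step concrete, here is a minimal sketch (an illustrative simplification under assumed shapes, not the actual Context R-CNN implementation): per-box features from the current frame attend over a memory bank of features collected from other frames of the same camera, and the softmax-weighted context is concatenated back onto the current features.

# Illustrative attention over a per-camera long-term memory bank (hypothetical shapes).
import numpy as np

def attend_to_memory(query_feats, memory_feats):
    # query_feats: [num_boxes, d] box features from the current frame.
    # memory_feats: [memory_size, d] box features from other frames of this camera.
    d = query_feats.shape[1]
    scores = query_feats @ memory_feats.T / np.sqrt(d)          # [num_boxes, memory_size]
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)               # softmax over the memory
    context = weights @ memory_feats                            # [num_boxes, d]
    return np.concatenate([query_feats, context], axis=1)       # fuse context with current features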
BibTeX Citation:
@article{beery2020contextrcnn,
author = {Sara Beery and Guanhang Wu and Vivek Rathod and Ronny Votel and Jonathan Huang},
journal = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2020)},
title = {Context R-CNN: Long Term Temporal Context for Per-Camera Object Detection},
year = {2020}
}
Abstract:
Traditionally, multi-object tracking and object detection are performed by separate systems, with most prior work focusing exclusively on one of these aspects over the other. Tracking systems clearly benefit from having access to accurate detections; conversely, there is ample evidence in the literature that detectors can benefit from tracking which, for example, can help to smooth predictions over time. In this paper we focus on the tracking-by-detection paradigm for autonomous driving, where both tasks are mission critical. We propose a conceptually simple and efficient joint model of detection and tracking, called RetinaTrack, which modifies the popular single-stage RetinaNet approach such that it is amenable to instance-level embedding training. We show, via evaluations on the Waymo Open Dataset, that we outperform a recent state of the art tracking algorithm while requiring significantly less computation. We believe that our simple yet effective approach can serve as a strong baseline for future work in this area.
BibTeX Citation:
@article{lu2020retinatrack,
author = {Zhichao Lu and Vivek Rathod and Ronny Votel and Jonathan Huang},
journal = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2020)},
title = {RetinaTrack: Online Single Stage Joint Detection and Tracking},
year = {2020}
}
Abstract:
In this paper, we propose a new generative model for multi-agent trajectory data, focusing on the case of multiplayer sports games. Our model leverages graph neural networks (GNNs) and variational recurrent neural networks (VRNNs) to achieve a permutation equivariant model suitable for sports. On two challenging datasets (basketball and soccer), we show that we are able to produce more accurate forecasts than previous methods. We assess accuracy using various metrics, such as log-likelihood and "best of N" loss, based on N different samples of the future. We also measure the distribution of statistics of interest, such as player location or velocity, and show that the distribution induced by our generative model better matches the empirical distribution of the test set. Finally, we show that our model can perform conditional prediction, which lets us answer counterfactual questions such as "how will the players move if A passes the ball to B instead of C?"
BibTeX Citation:
@article{yeh2019diverse,
author = {Raymond Yeh and Alexander Schwing and Jonathan Huang and Kevin Murphy},
journal = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2019)},
title = {Diverse Generation for Multi-Agent Sports Games},
year = {2019}
}
Abstract:
This paper presents a weakly-supervised approach to object instance segmentation. Starting with known or predicted object bounding boxes, we learn object masks by playing a game of cut-and-paste in an adversarial learning setup. A mask generator takes a detection box and Faster R-CNN features, and constructs a segmentation mask that is used to cut-and-paste the object into a new image location. The discriminator tries to distinguish between real objects, and those cut and pasted via the generator, giving a learning signal that leads to improved object masks. We verify our method experimentally using Cityscapes, COCO, and aerial image datasets, learning to segment objects without ever having seen a mask in training. Our method exceeds the performance of existing weakly supervised methods, without requiring hand-tuned segment proposals, and reaches 90% of supervised performance.
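The compositing at the heart of the adversarial game can be sketched in a few lines (a toy stand-in: the actual generator operates on Faster R-CNN features, and the blend below uses a soft mask predicted for the box crop):

# Toy cut-and-paste compositing; the discriminator then tries to spot the paste.
import numpy as np

def cut_and_paste(image, box, mask, paste_location):
    # image: [H, W, 3]; box: (y0, x0, y1, x1); mask: [h, w] soft matte for the crop.
    y0, x0, y1, x1 = box
    crop = image[y0:y1, x0:x1].astype(float)
    h, w = y1 - y0, x1 - x0
    py, px = paste_location
    out = image.astype(float).copy()
    background = out[py:py + h, px:px + w]
    out[py:py + h, px:px + w] = mask[..., None] * crop + (1 - mask[..., None]) * background
    return out.astype(image.dtype)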
BibTeX Citation:
@article{remez2018learning,
author = {Tal Remez and Jonathan Huang and Matthew Brown},
journal = {European Conference on Computer Vision (ECCV 2018)},
title = {Learning to Segment via Cut-and-Paste},
year = {2018}
}
BibTeX Citation:
@article{xie2017rethinking,
author = {Saining Xie and Chen Sun and Jonathan Huang and Zhuowen Tu and Kevin Murphy},
journal = {European Conference on Computer Vision (ECCV 2018)},
title = {Rethinking Spatiotemporal Feature Learning For Video Understanding},
year = {2018}
}
Abstract:
We propose a new method for learning the structure of convolutional neural networks (CNNs) that is more efficient than recent state-of-the-art methods based on reinforcement learning and evolutionary algorithms. Our approach uses a sequential model-based optimization (SMBO) strategy, in which we search for structures in order of increasing complexity, while simultaneously learning a surrogate model to guide the search through structure space. Direct comparison under the same search space shows that our method is up to 5 times more efficient than the RL method of Zoph et al. (2018) in terms of number of models evaluated, and 8 times faster in terms of total compute. The structures we discover in this way achieve state of the art classification accuracies on CIFAR-10 and ImageNet.
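A minimal sketch of the sequential model-based search loop (hypothetical function names: expand enumerates cells with one extra block, surrogate predicts accuracy cheaply, train_and_eval pays the full training cost):

# Sketch of progressive, SMBO-style architecture search as described above.
def progressive_search(initial_cells, expand, surrogate, train_and_eval, levels=5, k=10):
    candidates = list(initial_cells)
    history = []                                    # (cell, accuracy) pairs seen so far
    for _ in range(levels):
        expanded = [c2 for c in candidates for c2 in expand(c)]   # grow by one block
        expanded.sort(key=surrogate.predict, reverse=True)        # rank with the surrogate
        candidates = expanded[:k]                                 # keep only the top-k
        for cell in candidates:
            history.append((cell, train_and_eval(cell)))          # expensive, so done rarely
        surrogate.fit(history)                                    # refine the surrogate
    return max(history, key=lambda pair: pair[1])[0]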
BibTeX Citation:
@article{liu2017progressive,
author = {Chenxi Liu and Barret Zoph and Jonathon Shlens and Wei Hua and Li-Jia Li and Fei-Fei Li and Alan Yuille and Jonathan Huang and Kevin Murphy},
journal = {European Conference on Computer Vision (ECCV 2018)},
title = {Progressive neural architecture search},
year = {2018}
}
Abstract:
Consider how easy it is for people to imagine what a "purple hippo" would look like, even though they do not exist. If we instead said "purple hippo with wings", they could just as easily create a different internal mental representation, to represent this more specific concept. To assess whether the person has correctly understood the concept, we can ask them to draw a few sketches, to illustrate their thoughts. We call the ability to map text descriptions of concepts to latent representations and then to images (or vice versa) visually grounded semantic imagination. We propose a latent variable model for images and attributes, based on variational auto-encoders, which can perform this task. Our method uses a novel training objective, and a novel product-of-experts inference network, which can handle partially specified (abstract) concepts in a principled and efficient way. We also propose a set of easy-to-compute evaluation metrics that capture our intuitive notions of what it means to have good imagination, namely correctness, coverage, and compositionality (the 3 C's). Finally, we perform a detailed comparison (in terms of the 3 C's) of our method with two existing joint image-attribute VAE methods (the JMVAE method of (Suzuki et al., 2017) and the bi-VCCA method of (Wang et al., 2016)) by applying them to two simple datasets based on MNIST, where it is easy to objectively evaluate performance in a controlled way.
BibTeX Citation:
@article{vedantam2017generative,
author = {Ramakrishna Vedantam and Ian Fischer and Jonathan Huang and Kevin Murphy},
journal = {International Conference on Learning Representations (ICLR 2018)},
title = {Generative Models of Visually Grounded Imagination},
year = {2018}
}
Abstract:
This paper proposes a deep learning architecture based on Residual Network that dynamically adjusts the number of executed layers for the regions of the image. This architecture is end-to-end trainable, deterministic and problem-agnostic. It is therefore applicable without any modifications to a wide range of computer vision problems such as image classification, object detection and image segmentation. We present experimental results showing that this model improves the computational efficiency of Residual Networks on the challenging ImageNet classification and COCO object detection datasets. Additionally, we evaluate the computation time maps on the visual saliency dataset cat2000 and find that they correlate surprisingly well with human eye fixation positions.
BibTeX Citation:
@article{figurnov2016spatial,
author = {Michael Figurnov and Maxwell Collins and Yukun Zhu and Li Zhang and Jonathan Huang and Dmitry Vetrov and Ruslan Salakhutdinov},
journal = {The 30th IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
title = {Spatially Adaptive Computation Time for Residual Networks},
year = {2017}
}
Abstract:
In this paper, we study the trade-off between accuracy and speed when building an object detection system based on convolutional neural networks. We consider three main families of detectors --- Faster R-CNN, R-FCN and SSD --- which we view as "meta-architectures". Each of these can be combined with different kinds of feature extractors, such as VGG, Inception or ResNet. In addition, we can vary other parameters, such as the image resolution, and the number of box proposals. We develop a unified framework (in Tensorflow) that enables us to perform a fair comparison between all of these variants. We analyze the performance of many different previously published model combinations, as well as some novel ones, and thus identify a set of models which achieve different points on the speed-accuracy tradeoff curve, ranging from fast models, suitable for use on a mobile phone, to a much slower model that achieves a new state of the art on the COCO detection challenge.
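For readers who want to reproduce this kind of curve for their own models, a generic timing harness looks roughly like this (a sketch under the assumption that each detector exposes a detect(image) callable and that COCO-style mAP has been computed separately; this is not tied to the paper's TensorFlow framework):

# Generic speed/accuracy measurement sketch.
import time

def speed_accuracy_points(models, images, map_scores):
    # models: dict name -> detect(image) callable; map_scores: dict name -> mAP.
    points = []
    for name, detect in models.items():
        detect(images[0])                               # warm-up run (graph/JIT compilation)
        start = time.perf_counter()
        for image in images:
            detect(image)
        latency_ms = 1000 * (time.perf_counter() - start) / len(images)
        points.append((name, latency_ms, map_scores[name]))
    return sorted(points, key=lambda p: p[1])           # sorted by latency for plotting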
BibTeX Citation:
@article{huang2016speed,
author = {Jonathan Huang and Vivek Rathod and Chen Sun and Menglong Zhu and Anoop Korattikara and Alireza Fathi and Ian Fischer and Zbigniew Wojna and Yang Song and Sergio Guadarrama and Kevin Murphy},
journal = {The 30th IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
title = {Speed/accuracy trade-offs for modern convolutional object detectors},
year = {2017}
}
Abstract:
We present a generative model of images based on layering, in which image layers are individually generated, then composited from front to back. We are thus able to factor the appearance of an image into the appearance of individual objects within the image --- and additionally for each individual object, we can factor content from pose. Unlike prior work on layered models, we learn a shape prior for each object/layer, allowing the model to tease out which object is in front by looking for a consistent shape, without needing access to motion cues or any labeled data. We show that ordinary stochastic gradient variational bayes (SGVB), which optimizes our fully differentiable lower-bound on the log-likelihood, is sufficient to learn an interpretable representation of images. Finally we present experiments demonstrating the effectiveness of the model for inferring foreground and background objects in images.
BibTeX Citation:
@inproceedings{huang2015efficient,
address = {Las Vegas, Nevada},
author = {Jonathan Huang and Kevin Murphy},
booktitle = {The 29th IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
title = {Efficient inference in occlusion-aware generative models of images},
year = {2016}
}
Abstract:
We propose a method that can generate an unambiguous description (known as a referring expression) of a specific object or region in an image, and which can also comprehend or interpret such an expression to infer which object is being described. We show that our method outperforms previous methods that generate descriptions of objects without taking into account other potentially ambiguous objects in the scene. Our model is inspired by recent successes of deep learning methods for image captioning, but while image captioning is difficult to evaluate, our task allows for easy objective evaluation. We also present a new large-scale dataset for referring expressions, based on MS-COCO, which we plan to share publicly within two weeks.
BibTeX Citation:
@inproceedings{mao2015generation,
address = {Las Vegas, Nevada},
author = {Junhua Mao and Jonathan Huang and Alexander Toshev and Oana Camburu and Alan Yuille and Kevin Murphy},
booktitle = {The 29th IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
title = {Generation and Comprehension of Unambiguous Object Descriptions},
year = {2016}
}
Abstract:
Multi-person event recognition is a challenging task, often with many people active in the scene but only a small subset contributing to an actual event. In this paper, we propose a model which learns to detect events in such videos while automatically "attending" to the people responsible for the event. Our model does not use explicit annotations regarding who or where those people are during training and testing. In particular, we track people in videos and use a recurrent neural network (RNN) to represent the track features. We learn time-varying attention weights to combine these features at each time-instant. The attended features are then processed using another RNN for event detection/classification. Since most video datasets with multiple people are restricted to a small number of videos, we also collected a new basketball dataset comprising 257 basketball games with 14K event annotations corresponding to 11 event classes. Our model outperforms state-of-the-art methods for both event classification and detection on this new dataset. Additionally, we show that the attention mechanism is able to consistently localize the relevant players.
BibTeX Citation:
@inproceedings{ramanathan2015detecting,
address = {Las Vegas, Nevada},
author = {Vignesh Ramanathan and Jonathan Huang and Sami Abu-El-Haija and Alexander Gorban and Kevin Murphy and Li Fei-Fei},
booktitle = {The 29th IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
title = {Detecting events and key actors in multi-person videos},
year = {2016}
}
Abstract:
We present a system which can recognize the contents of your meal from a single image, and then predict its nutritional contents, such as calories. The simplest version assumes that the user is eating at a restaurant for which we know the menu. In this case, we can collect images offline to train a multi-label classifier. At run time, we apply the classifier (running on your phone) to predict which foods are present in your meal, and we look up the corresponding nutritional facts. We apply this method to a new dataset of images from 23 different restaurants, using a CNN-based classifier, significantly outperforming previous work. The more challenging setting works outside of restaurants. In this case, we need to estimate the size of the foods, as well as their labels. This requires solving segmentation and depth/volume estimation from a single image. We present CNN-based approaches to these problems, with promising preliminary results.
BibTeX Citation:
@inproceedings{myers2015im2calories,
address = {Santiago, Chile},
author = {Austin Myers and Nick Johnston and Vivek Rathod and Anoop Korattikara and Alex Gorban and Nathan Silberman and Sergio Guadarrama and George Papandreou and Jonathan Huang and Kevin Murphy},
booktitle = {International Conference on Computer Vision (ICCV)},
month = {December},
title = {Im2Calories: towards an automated mobile vision food diary},
year = {2015}
}
Abstract:
Knowledge tracing---where a machine models the knowledge of a student as they interact with coursework---is a well established problem in computer supported education. Though effectively modeling student knowledge would have high educational impact, the task has many inherent challenges. In this paper we explore the utility of using Recurrent Neural Networks (RNNs) to model student learning. The RNN family of models have important advantages over previous methods in that they do not require the explicit encoding of human domain knowledge, and can capture more complex representations of student knowledge. Using neural networks results in substantial improvements in prediction performance on a range of knowledge tracing datasets. Moreover the learned model can be used for intelligent curriculum design and allows straightforward interpretation and discovery of structure in student tasks. These results suggest a promising new line of research for knowledge tracing and an exemplary application task for RNNs.
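A minimal sketch of the model class studied here (hypothetical hyperparameters; the input at each step is a one-hot encoding of the (exercise, correctness) pair, and the output is the predicted probability of answering each exercise correctly next):

# Sketch of a Deep-Knowledge-Tracing-style recurrent model (illustrative, not the paper's code).
import torch
import torch.nn as nn

class DKT(nn.Module):
    def __init__(self, num_skills, hidden=200):
        super().__init__()
        # One-hot over 2 * num_skills: which exercise was attempted and whether it was correct.
        self.rnn = nn.LSTM(input_size=2 * num_skills, hidden_size=hidden, batch_first=True)
        self.out = nn.Linear(hidden, num_skills)

    def forward(self, interactions):                    # [batch, time, 2 * num_skills]
        hidden_states, _ = self.rnn(interactions)
        return torch.sigmoid(self.out(hidden_states))   # P(correct) per skill at every step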
BibTeX Citation:
@inproceedings{piech2015deep,
address = {Montreal, Canada},
author = {Chris Piech and Jonathan Spencer and Jonathan Huang and Surya Ganguli and Mehran Sahami and Leonidas Guibas and Jascha Sohl-Dickstein},
booktitle = {Neural Information Processing Systems (NIPS)},
month = {December},
title = {Deep Knowledge Tracing},
year = {2015}
}
Abstract:
Providing feedback, both assessing final work and giving hints to stuck students, is difficult for open-ended assignments in massive online classes which can range from thousands to millions of students. We introduce a neural network method to encode programs as a linear mapping from an embedded precondition space to an embedded postcondition space and propose an algorithm for feedback at scale using these linear maps as features. We apply our algorithm to assessments from the Code.org Hour of Code and Stanford University's CS1 course, where we propagate human comments on student assignments to orders of magnitude more submissions.
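One way to picture the representation (a hypothetical toy, not the paper's implementation): each program is summarized by a matrix mapping an embedded precondition to an embedded postcondition, and human feedback is propagated to the unannotated programs whose matrices are closest.

# Toy nearest-neighbour feedback propagation over program-embedding matrices.
import numpy as np

def propagate_feedback(program_matrices, annotated):
    # program_matrices: dict id -> [d, d] linear map learned for that program.
    # annotated: dict id -> human feedback for the few hand-graded submissions.
    feedback = {}
    for pid, A in program_matrices.items():
        if pid in annotated:
            feedback[pid] = annotated[pid]
            continue
        nearest = min(annotated, key=lambda q: np.linalg.norm(A - program_matrices[q]))
        feedback[pid] = annotated[nearest]      # reuse the closest annotated program's comment
    return feedback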
BibTeX Citation:
@inproceedings{piechetal15b,
address = {Lille, France},
author = {Chris Piech and Jonathan Huang and Andy Nguyen and Mike Phulsuksombati and Mehran Sahami and Leonidas Guibas},
booktitle = {International Conference on Machine Learning (ICML 2015)},
month = {July},
title = {Learning Program Embeddings to Propagate Feedback},
year = {2015}
}
Abstract:
We present a novel method for aligning a sequence of instructions to a video of someone carrying out a task. In particular, we focus on the cooking domain, where the instructions correspond to the recipe. Our technique relies on an HMM to align the recipe steps to the (automatically generated) speech transcript. We then refine this alignment using a state-of-the-art visual food detector, based on a deep convolutional neural network. We show that our technique outperforms simpler techniques based on keyword spotting. It also enables interesting applications, such as automatically illustrating recipes with keyframes, and searching within a video for events of interest.
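The alignment step can be illustrated with a simple monotonic dynamic program (a toy stand-in for the paper's HMM; step_score[i][t] is assumed to score how well recipe step i matches transcript segment t):

# Toy monotonic alignment of recipe steps to transcript segments (illustrative only).
def align_steps(step_score):
    n_steps, n_segs = len(step_score), len(step_score[0])
    NEG = float("-inf")
    best = [[NEG] * n_segs for _ in range(n_steps)]
    back = [[0] * n_segs for _ in range(n_steps)]
    best[0][0] = step_score[0][0]                      # alignments start at the first step
    for t in range(1, n_segs):
        for i in range(n_steps):
            stay = best[i][t - 1]                            # remain on the same recipe step
            advance = best[i - 1][t - 1] if i > 0 else NEG   # move on to the next step
            back[i][t] = i if stay >= advance else i - 1
            best[i][t] = max(stay, advance) + step_score[i][t]
    i = max(range(n_steps), key=lambda s: best[s][n_segs - 1])
    path = [0] * n_segs                                # recipe step assigned to each segment
    for t in range(n_segs - 1, -1, -1):
        path[t] = i
        i = back[i][t]
    return path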
BibTeX Citation:
@inproceedings{malmaudetal15,
address = {Denver, Colorado},
author = {Jonathan Malmaud and Jonathan Huang and Vivek Rathod and Nick Johnston and Andrew Rabinovich and Kevin Murphy},
booktitle = {Proceedings of the Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics (HLT-NAACL)},
title = {What's Cookin'? Interpreting Cooking Videos using Text, Speech and Vision},
volume = {15},
year = {2015}
}
Abstract:
The event-based model constructs a discrete picture of disease progression from cross-sectional data sets, with each event corresponding to a new biomarker becoming abnormal. However, it relies on the assumption that all subjects follow a single event sequence. This is a major simplification for sporadic disease data sets, which are highly heterogeneous, include distinct subgroups, and contain significant proportions of outliers. In this work we relax this assumption by considering two extensions to the event-based model: a generalised Mallows model, which allows subjects to deviate from the main event sequence, and a Dirichlet process mixture of generalised Mallows models, which models clusters of subjects that follow different event sequences, each of which has a corresponding variance. We develop a Gibbs sampling technique to infer the parameters of the two models from multi-modal biomarker data sets. We apply our technique to data from the Alzheimer's Disease Neuroimaging Initiative to determine the sequence in which brain regions become abnormal in sporadic Alzheimer's disease, as well as the heterogeneity of that sequence in the cohort. We find that the generalised Mallows model estimates a larger variation in the event sequence across subjects than the original event-based model. Fitting a Dirichlet process model detects three subgroups of the population with different event sequences. The Gibbs sampler additionally provides an estimate of the uncertainty in each of the model parameters, for example an individual's latent disease stage and cluster assignment. The distributions and mixtures of sequences that this new family of models introduces offer better characterisation of disease progression of heterogeneous populations, new insight into disease mechanisms, and have the potential for enhanced disease stratification and differential diagnosis.
BibTeX Citation:
@inproceedings{youngetal15,
address = {Isle of Skye, Scotland},
author = {Alexandra Young and Neil Oxtoby and Jonathan Huang and Razvan Marinescu and Pankaj Daga and David Cash and Nick Fox and Sebastien Ourselin and Daniel Alexander},
booktitle = {Information Processing in Medical Imaging (IPMI)},
title = {Multiple Orderings of Events in Disease Progression},
year = {2015}
}
BibTeX Citation:
@inproceedings{piechetal15,
address = {Vancouver, Canada},
author = {Chris Piech and Mehran Sahami and Jonathan Huang and Leonidas Guibas},
booktitle = {ACM Conference on Learning at Scale (LAS'15)},
title = {Autonomously Generating Hints by Inferring Problem Solving Policies},
year = {2015}
}
Abstract:
Massive open online courses (MOOCs), one of the latest internet revolutions, have engendered hope that constant iterative improvement and economies of scale may cure the ``cost disease'' of higher education. While scalable in many ways, providing feedback for homework submissions (particularly open-ended ones) remains a challenge in the online classroom. In courses where the student-teacher ratio can be ten thousand to one or worse, it is impossible for instructors to personally give feedback to students or to understand the multitude of student approaches and pitfalls. Organizing and making sense of massive collections of homework solutions is thus a critical web problem. Despite the challenges, the dense solution space sampling in highly structured homeworks for some MOOCs suggests an elegant solution to providing quality feedback to students on a massive scale.
We outline a method for decomposing online homework submissions into a vocabulary of ``code phrases'', and based on this vocabulary, we architect a queryable index that allows for fast searches into the massive dataset of student homework submissions. To demonstrate the utility of our homework search engine we index over a million code submissions from users worldwide in Stanford's Machine Learning MOOC and (a) semi-automatically learn shared structure amongst homework submissions and (b) generate specific feedback for student mistakes.
Codewebs is a tool that leverages the redundancy of densely sampled, highly structured homeworks in order to force-multiply teacher effort. Giving articulate, instant feedback is a crucial component of the online learning process and thus by building a homework search engine we hope to take a step towards higher quality free education.
BibTeX Citation:
@inproceedings{nguyen14,
address = {Seoul, Korea},
author = {Andy Nguyen and Christopher Piech and Jonathan Huang and Leonidas Guibas},
booktitle = {The 23rd International World Wide Web Conference (WWW'14)},
title = {Codewebs: Scalable Homework Search for Massive Open Online Programming Courses},
year = {2014}
}
Abstract:
Discussion forums, employed by MOOC providers as the primary mode of interaction among instructors and students, have emerged as one of the important components of online courses. We empirically study contribution behavior in these online collaborative learning forums using data from 44 MOOCs hosted on Coursera, focusing primarily on the highest-volume contributors - "superposters" - in a forum. We explore who these superposters are and study their engagement patterns across the MOOC platform, with a focus on the following question - to what extent is superposting a positive phenomenon for the forum? Specifically, while superposters clearly contribute heavily to the forum in terms of quantity, how do these contributions rate in terms of quality, and does this prolific posting behavior negatively impact contribution from the large remainder of students in the class? We analyze these questions across the courses in our dataset, and find that superposters display above-average engagement across Coursera, enrolling in more courses and obtaining better grades than the average forum participant; additionally, students who are superposters in one course are significantly more likely to be superposters in other courses they take. In terms of utility, our analysis indicates that while being neither the fastest nor the most upvoted, superposters' responses are speedier and receive more upvotes than the average forum user's posts; a manual assessment of quality on a subset of this content supports this conclusion that a large fraction of superposter contributions indeed constitute useful content. Finally, we find that superposters' prolific contribution behavior does not `drown out the silent majority' - high superposter activity correlates positively and significantly with higher overall activity and forum health, as measured by total contribution volume, higher average perceived utility in terms of received votes, and a smaller fraction of orphaned threads.
BibTeX Citation:
@inproceedings{huang14,
address = {Atlanta, Georgia},
author = {Jonathan Huang and Anirban Dasgupta and Arpita Ghosh and Jane Manning and Marc Sanders},
booktitle = {ACM Conference on Learning at Scale (LAS'14)},
title = {Superposter behavior in MOOC forums},
year = {2014}
}
Abstract:
In massive open online courses (MOOCs), peer grading serves as a critical tool for scaling the grading of complex, open-ended assignments to courses with tens or hundreds of thousands of students. But despite promising initial trials, it does not always deliver accurate results compared to human experts. In this paper, we develop algorithms for estimating and correcting for grader biases and reliabilities, showing significant improvement in peer grading accuracy on real data with 63,199 peer grades from Coursera's HCI course offerings --- the largest peer grading networks analysed to date. We relate grader biases and reliabilities to other student factors such as student engagement, performance as well as commenting style. We also show that our model can lead to more intelligent assignment of graders to gradees.
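The flavor of the correction can be conveyed with a much simpler iterative estimate than the paper's probabilistic models (a hypothetical sketch: alternately re-estimate each submission's grade and each grader's bias):

# Toy bias-corrected peer grading (illustrative simplification of the paper's models).
import numpy as np

def correct_grades(grades, num_iters=20):
    # grades: list of (grader_id, submission_id, score) triples.
    graders = {g for g, _, _ in grades}
    submissions = {s for _, s, _ in grades}
    bias = {g: 0.0 for g in graders}
    truth = {s: np.mean([sc for _, s2, sc in grades if s2 == s]) for s in submissions}
    for _ in range(num_iters):
        for g in graders:      # a grader's bias is how much they over- or under-score on average
            bias[g] = float(np.mean([sc - truth[s] for g2, s, sc in grades if g2 == g]))
        for s in submissions:  # a submission's grade is the mean of its bias-corrected scores
            truth[s] = float(np.mean([sc - bias[g] for g, s2, sc in grades if s2 == s]))
    return truth, bias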
BibTeX Citation:
@inproceedings{piech13,
address = {Memphis, TN},
author = {Chris Piech and Jonathan Huang and Zhenghao Chen and Chuong Do and Andrew Ng and Daphne Koller},
booktitle = {Proceedings of the 6th International Conference on Educational Data Mining (EDM 2013)},
month = {July},
title = {Tuned Models of Peer Assessment in MOOCs},
year = {2013}
}
Abstract:
In the first offering of Stanford's Machine Learning Massive Open-Access Online Course (MOOC) there were over a million programming submissions to 42 assignments --- a dense sampling of the range of possible solutions. In this paper we map out the syntax and functional similarity of the submissions in order to explore the variation in solutions. While there was a massive number of submissions, there is a much smaller set of unique approaches. This redundancy in student solutions can be leveraged to ``force multiply'' teacher feedback.
BibTeX Citation:
@misc{huangetal13a,
address = {Memphis, Tennessee},
author = {Jonathan Huang and Chris Piech and Andy Nguyen and Leonidas Guibas},
howpublished = {Artificial Intelligence in Education (AIED) Workshop on MOOCs (MOOCshop)},
month = {July},
title = {Syntactic and Functional Variability of a Million Code Submissions in a Machine Learning MOOC},
year = {2013}
}
Abstract:
Accurate and detailed models of neurodegenerative disease progression such as Alzheimer's (AD) are crucially important for reliable early diagnosis and the determination of effective treatments. We introduce the ALPACA (Alzheimer's disease Probabilistic Cascades) model, a generative model linking latent Alzheimer's progression dynamics to observable biomarker data. In contrast with previous works which model disease progression as a fixed event ordering, we explicitly model the variability over such orderings among patients which is more realistic, particularly for highly detailed progression models. We describe efficient learning algorithms for ALPACA and discuss promising experimental results on a real cohort of Alzheimer's patients from the Alzheimer's Disease Neuroimaging Initiative.
BibTeX Citation:
@inproceedings{huangetal12c,
address = {South Lake Tahoe, CA},
author = {Jonathan Huang and Daniel Alexander},
booktitle = {Neural Information Processing Systems (NIPS)},
month = {December},
title = {Probabilistic Event Cascades for Alzheimer's disease},
year = {2012}
}
Abstract:
Distributions over rankings are used to model data in a multitude of real world settings such as preference analysis and political elections. Modeling such distributions presents several computational challenges, however, due to the factorial size of the set of rankings over an item set. Some of these challenges are quite familiar to the artificial intelligence community, such as how to compactly represent a distribution over a combinatorially large space, and how to efficiently perform probabilistic inference with these representations. With respect to ranking, however, there is the additional challenge of what we refer to as human task complexity: users are rarely willing to provide a full ranking over a long list of candidates, instead often preferring to provide partial ranking information. Simultaneously addressing all of these challenges (i.e., designing a compactly representable model which is amenable to efficient inference and can be learned using partial ranking data) is a difficult task, but is necessary if we would like to scale to problems with nontrivial size. In this paper, we show that the recently proposed riffled independence assumptions cleanly and efficiently address each of the above challenges. In particular, we establish a tight mathematical connection between the concepts of riffled independence and of partial rankings. This correspondence not only allows us to develop efficient and exact algorithms for performing inference tasks using riffled independence based representations with partial rankings, but, somewhat surprisingly, also shows that efficient inference is not possible for riffle independent models (in a certain sense) with observations which do not take the form of partial rankings. Finally, using our inference algorithm, we introduce the first method for learning riffled independence based models from partially ranked data.
BibTeX Citation:
@article{huangetal12b,
author = {Jonathan Huang and Ashish Kapoor and Carlos Guestrin},
journal = {Journal of Artificial Intelligence Research (JAIR)},
pages = {491-532},
title = {Riffled Independence for Efficient Inference with Partial Ranking},
volume = {44},
year = {2012}
}
Abstract:
Representing distributions over permutations can be a daunting task due to the fact that the number of permutations of n objects scales factorially in n. One recent way that has been used to reduce storage complexity has been to exploit probabilistic independence, but as we argue, full independence assumptions impose strong sparsity constraints on distributions and are unsuitable for modeling rankings. We identify a novel class of independence structures, called riffled independence, encompassing a more expressive family of distributions while retaining many of the properties necessary for performing efficient inference and reducing sample complexity. In riffled independence, one draws two permutations independently, then performs the riffle shuffle, common in card games, to combine the two permutations to form a single permutation. Within the context of ranking, riffled independence corresponds to ranking disjoint sets of objects independently, then interleaving those rankings. In this paper, we provide a formal introduction to riffled independence and propose an automated method for discovering sets of items which are riffle independent from a training set of rankings. We show that our clustering-like algorithms can be used to discover meaningful latent coalitions from real preference ranking datasets and to learn the structure of hierarchically decomposable models based on riffled independence.
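The generative picture can be made concrete in a few lines (a toy sketch: for illustration the interleaving is drawn uniformly, whereas the model learns an interleaving distribution):

# Toy riffled-independence sampler: rank two disjoint item sets independently,
# then interleave the two rankings, like riffle-shuffling a deck of cards.
import random

def sample_riffled_ranking(items_a, items_b):
    ranking_a = random.sample(items_a, len(items_a))   # independent ranking of set A
    ranking_b = random.sample(items_b, len(items_b))   # independent ranking of set B
    slots = ['A'] * len(ranking_a) + ['B'] * len(ranking_b)
    random.shuffle(slots)                              # uniform interleaving, for illustration
    ia, ib = iter(ranking_a), iter(ranking_b)
    return [next(ia) if s == 'A' else next(ib) for s in slots]

# e.g. rank vegetables and meats separately, then interleave into one preference list:
# sample_riffled_ranking(["corn", "peas", "carrots"], ["beef", "pork"])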
BibTeX Citation:
@article{huangetal12,
author = {Jonathan Huang and Carlos Guestrin},
journal = {Electronic Journal of Statistics (EJS)},
note = {Also available at \url{http://arxiv.org/abs/1006.1328}},
pages = {199-230},
title = {Uncovering the Riffled Independence Structure of Rankings},
volume = {6},
year = {2012}
}
Abstract:
Probabilistic reasoning and learning with permutation data arises as a fundamental problem in myriad applications such as modeling preference rankings over objects (such as webpages), tracking multiple moving objects, reconstructing the temporal ordering of events from multiple imperfect accounts, and more. Since the number of permutations scales factorially with the number of objects being ranked or tracked, however, it is not feasible to represent and reason with arbitrary probability distributions on permutations. Consequently, many approaches to probabilistic reasoning problems on the group of permutations have been ad hoc or unscalable, and/or have relied on rigid and unrealistic assumptions. For example, common factorized probability distribution representations, such as graphical models, are inefficient due to the mutual exclusivity constraints that are typically associated with permutations.
This thesis addresses problems of scalability for probabilistic reasoning with permutations by exploiting a number of methods for decomposing complex distributions over permutations into simpler, smaller component parts. In particular, we explore two general and complementary approaches for decomposing distributions over permutations: (1) \emph{additive decompositions} and (2) \emph{multiplicative decompositions}. Our additive approach is based on the idea of projecting a distribution onto a group theoretic generalization of the Fourier basis. Our multiplicative approach assumes a factored form for the underlying probability distribution based on a generalization of the notion of probabilistic independence which we call \emph{riffled independence}.
We show that both probabilistic decompositions lead to compact representations for distributions over permutations and that one can formulate efficient probabilistic inference algorithms by taking advantage of the combinatorial structure of each representation. An underlying theme throughout is the idea that both kinds of structural decompositions can be employed in tandem to relax the apparent intractability of probabilistic reasoning over the space of permutations.
From the theoretical side, we address a number of problems in understanding the consequences of our approximations. For example, we present results in this thesis which illuminate the nature of error propagation in the Fourier domain and propose methods for mitigating their effects.
Finally, we apply our decompositions to multiple application domains. For example, we show how the methods in the thesis can be used to solve challenging camera tracking scenarios as well as to reveal latent voting patterns and structure in Irish political elections and food preference surveys.
To summarize, the main contributions of this thesis can be categorized into the following three broad categories:
- Principled and compactly representable approximations of probability distributions over the group of permutations,
- Flexible probabilistic reasoning and learning algorithms which exploit the structure of these compact representations for running time efficiency, and
- Theoretical analyses of approximation quality as well as algorithmic and sample complexity.
BibTeX Citation:
@phdthesis{huangthesis11,
author = {Jonathan Huang},
school = {Carnegie Mellon University},
title = {Probabilistic Reasoning and Learning on Permutations: Exploiting Structural Decompositions of the Symmetric Group},
year = {2011}
}
Abstract:
Distributions over rankings are used to model data in various settings such as preference analysis and political elections. The factorial size of the space of rankings, however, typically forces one to make structural assumptions, such as smoothness, sparsity, or probabilistic independence about these underlying distributions. We approach the modeling problem from the computational principle that one should make structural assumptions which allow for efficient calculation of typical probabilistic queries. For ranking models, ``typical'' queries predominantly take the form of partial ranking queries (e.g., given a user's top-$k$ favorite movies, what are his preferences over remaining movies?). In this paper, we argue that riffled independence factorizations proposed in recent literature are a natural structural assumption for ranking distributions, allowing for particularly efficient processing of partial ranking queries.
BibTeX Citation:
@inproceedings{huangetal11b,
address = {Barcelona, Spain},
author = {Jonathan Huang and Ashish Kapoor and Carlos Guestrin},
booktitle = {Conference on Uncertainty in Artificial Intelligence},
month = {July},
title = {Efficient Probabilistic Inference with Partial Ranking Queries},
year = {2011}
}
Abstract:
We compare two recently proposed approaches for representing probability distributions over the space of permutations in the context of multi-target tracking. We show that these two representations, the Fourier approximation and the information form approximation can both be viewed as low dimensional projections of a true distribution, but with respect to different metrics. We identify the strengths and weaknesses of each approximation, and propose an algorithm for converting between the two forms, allowing for a \emph{hybrid} approach that draws on the strengths of both representations. We show experimental evidence that there are situations where hybrid algorithms are favorable.
BibTeX Citation:
@inproceedings{jiangetal11,
address = {Athens, Greece},
author = {Xiaoye Jiang and Jonathan Huang and Leonidas Guibas},
booktitle = {The European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML 2011)},
month = {September},
title = {Fourier-Information Duality in the Identity Management Problem},
year = {2011}
}
Abstract:
Riffled independence is a generalized notion of probabilistic independence that has been shown to be naturally applicable to ranked data. In the riffled independence model, one assigns rankings to two disjoint sets of items independently, then in a second stage, interleaves (or riffles) the two rankings together to form a full ranking, as if by shuffling a deck of cards. Because of this interleaving stage, it is much more difficult to detect riffled independence than ordinary independence. In this paper, we provide the first automated method for discovering sets of items which are riffle independent from a training set of rankings. We show that our clustering-like algorithms can be used to discover meaningful latent coalitions from real preference ranking datasets and to learn the structure of hierarchically decomposable models based on riffled independence.
BibTeX Citation:
@inproceedings{huangetal10,
address = {Haifa, Israel},
author = {Jonathan Huang and Carlos Guestrin},
booktitle = {International Conference on Machine Learning (ICML 2010)},
month = {June},
title = {Learning Hierarchical Riffle Independent Groupings from Rankings},
year = {2010}
}
Abstract:
In this paper, we extend the Hilbert space embedding approach to handle conditional distributions. We derive a kernel estimate for the conditional embedding, and show its connection to ordinary embeddings. Conditional embeddings largely extend our ability to manipulate distributions in Hilbert spaces, and as an example, we derive a nonparametric method for modeling dynamical systems where the belief state of the system is maintained as a conditional embedding. Our method is very general in terms of both the domains and the types of distributions that it can handle, and we demonstrate the effectiveness of our method in various dynamical systems. We expect that conditional embeddings will have wider applications beyond modeling dynamical systems.
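For reference, the conditional embedding has the following standard form (stated here as a sketch in the usual kernel-embedding notation; $\lambda$ is a regularization parameter and $m$ the sample size):

$$\mu_{Y \mid x} = \mathcal{C}_{YX}\,\mathcal{C}_{XX}^{-1}\,\varphi(x), \qquad \hat{\mu}_{Y \mid x} = \sum_{i=1}^{m} \beta_i(x)\,\phi(y_i), \quad \beta(x) = (K + \lambda m I)^{-1} \mathbf{k}_x,$$

where $K_{ij} = k(x_i, x_j)$ and $(\mathbf{k}_x)_i = k(x_i, x)$; expectations of RKHS functions are then evaluated as $\mathbb{E}[f(Y) \mid x] \approx \langle f, \hat{\mu}_{Y \mid x} \rangle$.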
BibTeX Citation:
@inproceedings{songetal09,
address = {Montreal, Canada},
author = {Le Song and Jonathan Huang and Alex Smola and Kenji Fukumizu},
booktitle = {International Conference on Machine Learning (ICML 2009)},
month = {June},
title = {Hilbert Space Embeddings of Conditional Distributions with Applications to Dynamical Systems},
year = {2009}
}
Abstract:
Permutations are ubiquitous in many real-world problems, such as voting, ranking, and data association. Representing uncertainty over permutations is challenging, since there are $n!$ possibilities, and typical compact and factorized probability distribution representations, such as graphical models, cannot capture the mutual exclusivity constraints associated with permutations. In this paper, we use the ``low-frequency'' terms of a Fourier decomposition to represent distributions over permutations compactly. We present \emph{Kronecker conditioning}, a novel approach for maintaining and updating these distributions directly in the Fourier domain, allowing for polynomial time bandlimited approximations. Low order Fourier-based approximations, however, may lead to functions that do not correspond to valid distributions. To address this problem, we present a quadratic program defined directly in the Fourier domain for projecting the approximation onto a relaxation of the polytope of legal marginal distributions. We demonstrate the effectiveness of our approach on a real camera-based multi-person tracking scenario.
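Concretely, the compact representation stores low-order Fourier coefficients of the distribution (a brief sketch in the notation standard for the symmetric group):

$$\hat{f}(\rho) = \sum_{\sigma \in S_n} f(\sigma)\,\rho(\sigma),$$

where $\rho$ ranges over the irreducible representations of $S_n$. Keeping only the low-order coefficients (for example, those determining the first-order marginals $P(\sigma(i) = j)$) gives a bandlimited approximation whose storage is polynomial in $n$ rather than $n!$; Kronecker conditioning performs Bayesian conditioning directly on these coefficients.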
BibTeX Citation:
@article{huangetal09b,
author = {Jonathan Huang and Carlos Guestrin and Leonidas Guibas},
journal = {Journal of Machine Learning Research (JMLR)},
month = {May},
pages = {997-1070},
title = {Fourier Theoretic Probabilistic Inference over Permutations},
volume = {10},
year = {2009}
}
Abstract:
Representing distributions over permutations can be a daunting task due to the fact that the number of permutations of $n$ objects scales factorially in $n$. One recent way that has been used to reduce storage complexity has been to exploit probabilistic independence, but as we argue, full independence assumptions impose strong sparsity constraints on distributions and are unsuitable for modeling rankings. We identify a novel class of independence structures, called \emph{riffled independence}, encompassing a more expressive family of distributions while retaining many of the properties necessary for performing efficient inference and reducing sample complexity. In riffled independence, one draws two permutations independently, then performs the \emph{riffle shuffle}, common in card games, to combine the two permutations to form a single permutation. In ranking, riffled independence corresponds to ranking disjoint sets of objects independently, then interleaving those rankings. We provide a formal introduction and present algorithms for using riffled independence within Fourier-theoretic frameworks which have been explored by a number of recent papers.
BibTeX Citation:
@inproceedings{huangetal09c,
address = {Vancouver, Canada},
author = {Jonathan Huang and Carlos Guestrin},
booktitle = {Advances in Neural Information Processing Systems (NIPS)},
month = {December},
note = {(accepted for spotlight presentation)},
title = {Riffled Independence for Ranked Data},
year = {2009}
}
Abstract:
Permutations are ubiquitous in many real world problems, such as voting, rankings and data association. Representing uncertainty over permutations is challenging, since there are $n!$ possibilities. Recent Fourier-based approaches can be used to provide a compact representation over low-frequency components of the distribution. Though polynomial, the complexity of these representations grows very rapidly, especially if we want to maintain reasonable estimates for peaked distributions. In this paper, we first characterize the notion of probabilistic independence for distributions over permutations. We then present a method for factoring distributions into independent components in the Fourier domain, and use our algorithms to decompose large problems into much smaller ones. We demonstrate that our method provides very significant improvements in terms of running time, on real tracking data.
BibTeX Citation:
@inproceedings{huangetal09a,
address = {Clearwater Beach, Florida},
author = {Jonathan Huang and Carlos Guestrin and Xiaoye Jiang and Leonidas Guibas},
booktitle = {Artificial Intelligence and Statistics (AISTATS)},
month = {April},
note = {(accepted for oral presentation)},
title = {Exploiting Probabilistic Independence for Permutations},
year = {2009}
}
Abstract:
Permutations are ubiquitous in many real world problems, such as voting, rankings and data association. Representing uncertainty over permutations is challenging, since there are $n!$ possibilities, and typical compact representations such as graphical models cannot efficiently capture the mutual exclusivity constraints associated with permutations. In this paper, we use the ``low-frequency'' terms of a Fourier decomposition to represent such distributions compactly. We present \emph{Kronecker conditioning}, a general and efficient approach for maintaining these distributions directly in the Fourier domain. Low order Fourier-based approximations can lead to functions that do not correspond to valid distributions. To address this problem, we present an efficient quadratic program defined directly in the Fourier domain to project the approximation onto a relaxed form of the marginal polytope. We demonstrate the effectiveness of our approach on a real camera-based multi-person tracking setting.
BibTeX Citation:
@inproceedings{huangetal07,
address = {Vancouver, Canada},
author = {Jonathan Huang and Carlos Guestrin and Leonidas Guibas},
booktitle = {Advances in Neural Information Processing Systems (NIPS)},
month = {December},
note = {(accepted for oral presentation, honorable mention for best student paper)},
title = {Efficient Inference for Distributions on Permutations},
year = {2007}
}
Abstract:
While vocal tract resonances (VTRs, or formants that are defined as such resonances) are known to play a critical role in human speech perception and in computer speech processing, there has been a lack of standard databases needed for the quantitative evaluation of automatic VTR extraction techniques. We report in this paper on our recent effort to create a publicly available database of the first three VTR frequency trajectories. The database contains a representative subset of the TIMIT corpus with respect to speaker, gender, dialect and phonetic context, with a total of 538 sentences. A Matlab-based labeling tool is developed, with high-resolution wideband spectrograms displayed to assist in visual identification of VTR frequency values which are then recorded via mouse clicks and local spline interpolation. Special attention is paid to VTR values during consonant-to-vowel (CV) and vowel-to-consonant (VC) transitions, and to speech segments with vocal tract anti-resonances. Using this database, we quantitatively assess two common automatic VTR tracking techniques in terms of their average tracking errors analyzed within each of the six major broad phonetic classes as well as during CV and VC transitions. The potential use of the VTR database for research in several areas of speech processing is discussed.
BibTeX Citation:
@inproceedings{deng06,
address = {Toulouse, France},
author = {Li Deng and Xiaodong Cui and Robert Pruvenok and Jonathan Huang and Safiyy Momen and Yanyi Chen and Abeer Alwan},
booktitle = {Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2006)},
pages = {60--63},
title = {A database of vocal tract resonance trajectories for research in speech processing},
year = {2006}
}