Iowa State data miners top all American teams, finish fifth in international competition

AMES, Iowa – A team of six Iowa State University doctoral students in statistics recently placed fifth in the 2013 international Data-Mining Cup. The Iowa Staters were the top-placing American team.

“This is quite exciting,” said Wen Zhou, the team’s leader. “All the team members contributed a lot and sacrificed their own time for sleeping.”

Zhou said team members worked up to 12 hours per day on the data-mining problem in the final 10 days before their solution was due.

“It was quite a challenge,” he said. “We only had one month to do this.”

Zhou and his teammates – Cory Lanker, Fangfang Liu, Jia Liu, Ian Mouzon and Wei Zhang – had to sort through a dataset from a German retailer covering 50,000 customer online shopping sessions and nearly 500,000 transactions. The data included demographic information about individual shoppers and their online shopping habits.

The 99 teams in the contest had to mine all that information for clues about whether shoppers would make a purchase. Teams then had to build an algorithm to accurately predict whether each of another 5,111 test sessions would result in a purchase.

The Iowa Staters used their expertise in data analytics, machine learning and statistical learning to develop a solution that correctly predicted purchases 97 percent of the time. The team made 154 errors across the 5,111 test sessions. That’s 10 more errors than the winning team, from the Technical University of Dortmund in Germany, made.

“This was a challenging task and very close to the kinds of things that companies now pay big money to have analytics companies do for them,” said Stephen Vardeman, University Professor of statistics and industrial engineering, who taught a course in machine learning and suggested some of his students should enter the contest. “Statistics has always been about finding, quantifying and making use of patterns in data. But its traditional applications have been to wringing all information possible out of scarce data. The news these days in analytics is that now we’re looking at lots of data and trying to intelligently do the same.”

Zhou – a native of Tianjin, China, who’s working as an intern for Monsanto Co. this summer – said the team is satisfied with its work.

“The very moment I knew the results, I was very curious about how the first team did,” he said. “I realized we were pretty close, so we’re very happy.”