This repository contains anonymized versions of six datasets taken from IBM's DataStage™ production systems. These datasets were used for frequent subgraph mining in the paper "Refactoring ETL Flows in The Wild". If ...