- Perform exploratory and statistical analysis on petabytes of data.
- Develop PoCs for machine learning processes.
- Carry out ad-hoc data report extractions.
- Serve as escalation point of contact for customers on data analytics and security settings.
- Provide technical support and guidance to customers.
- Manage customer communications regarding analytics and security settings.
- Collaborate with internal organizations on projects and initiatives.
- Identify and document process improvements.
- Identify expansion opportunities, future use cases, and implementation rollouts with customers.
- Acted as data specialist on a Decision Science project for the R&D area, aimed at fine-tuning the locations of product tests and market-share areas. Responsible for solution design and data modeling using environmental (weather/soil) and market data; a PoC machine learning approach was built in the AWS Cloud, with a Power BI dashboard as the final product delivering clustering recommendations for the best regions.
- Developed management and operational dashboards for several areas within the R&D organization: HR, HSE, Field Operations, and Project Management.
- Built ETL processes over structured and unstructured data coming from APIs and legacy system databases.
- Ran ad-hoc queries combining advanced SQL scripts and Python solutions for numerous areas. Worked together with statisticians to define best practices for deploying models in production.
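The region-clustering PoC above can be sketched with a minimal k-means in pure Python. The feature names and sample values below are hypothetical; the actual project ran in the AWS Cloud with results surfaced in Power BI.

```python
import math
import random

def kmeans(points, k, iters=50, seed=0):
    """Minimal k-means: group regions by numeric features."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        # Assign each point to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k), key=lambda c: math.dist(p, centers[c]))
            clusters[i].append(p)
        # Recompute each center as the mean of its cluster.
        for i, cl in enumerate(clusters):
            if cl:
                centers[i] = tuple(sum(v) / len(cl) for v in zip(*cl))
    return centers, clusters

# Hypothetical (rainfall_mm, avg_temp_C) per candidate test region.
regions = [(1200, 24), (1150, 23), (300, 31), (350, 30), (800, 18), (780, 19)]
centers, clusters = kmeans(regions, k=3)
```

In practice a library implementation (e.g. scikit-learn's `KMeans`) with proper feature scaling would replace this sketch.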
- Developed pipelines ingesting files generated by legacy systems and relational databases (Oracle, SQL Server, Apache Hive/Impala, and BigQuery).
- Performed data ingestion using Apache Sqoop and Apache NiFi, with job management and scheduling handled by shell scripts orchestrated through the Jenkins automation server.
- Designed and developed automated downstream/upstream monitoring with outage and anomaly alerts using Python and Apache Spark.
- Built web scraping processes using Python, Selenium, and BeautifulSoup to extract information related to legal proceedings.
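The monitoring-and-alerting work above can be sketched as a simple trailing z-score check in plain Python; the window size, threshold, and sample counts are illustrative assumptions, and the production version ran on Apache Spark.

```python
import statistics

def detect_anomalies(values, window=5, threshold=3.0):
    """Flag points that deviate strongly from the trailing window mean."""
    alerts = []
    for i in range(window, len(values)):
        hist = values[i - window:i]
        mean = statistics.mean(hist)
        stdev = statistics.pstdev(hist) or 1e-9  # guard against zero spread
        z = (values[i] - mean) / stdev
        if abs(z) > threshold:
            alerts.append((i, values[i], round(z, 2)))
    return alerts

# Hypothetical hourly record counts from an ingestion pipeline;
# the sudden drop to 5 should trigger an outage alert.
counts = [100, 102, 98, 101, 99, 100, 5, 103]
alerts = detect_anomalies(counts)
```

A real pipeline would emit these alerts to a notification channel rather than return them.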
- Mapped processes in the billing area, resulting in automated processes inherent to the department.
- Built ETL processes in SAS Enterprise Guide 7.1 over mainframe extracts, TXT, CSV, spreadsheet, and positional (fixed-width) files, using PROC SQL and dataset manipulation with macros and SAS statements.
- Structured data pipelines to generate metrics and indexes for operational and management levels.
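Parsing the positional (fixed-width) files mentioned above can be sketched as follows; the field names, offsets, and sample record are hypothetical (the original work used SAS, not Python).

```python
# Hypothetical positional record layout: (field name, start, end) as 0-based slices.
LAYOUT = [
    ("account_id", 0, 8),
    ("due_date",   8, 16),   # YYYYMMDD
    ("amount",     16, 26),  # zero-padded value in cents
]

def parse_record(line):
    """Slice one fixed-width line into a dict of typed fields."""
    row = {name: line[a:b].strip() for name, a, b in LAYOUT}
    row["amount"] = int(row["amount"]) / 100  # cents -> currency units
    return row

record = "00012345" + "20240131" + "0000012999"
parsed = parse_record(record)
```

The same layout-table idea maps directly to a SAS `INPUT` statement with column ranges.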
- Assigned to an analytics deployment project, responsible for table analysis and for building views for information extraction and data joins in MySQL.
- Built and automated ETL processes with data sources from Hadoop, prediction model files, legacy system files, and a MySQL database, using Oracle Data Integrator as the integration tool.
- Created tables in the Cloudera Hadoop distribution using Hive and Impala; handled file movement between Hadoop clusters, directory creation, and bash scripting in an Oracle Linux environment.
- Performed dimensional modeling using a star schema for fact and dimension tables. Built panels and dashboards using the Oracle Analytics Cloud data visualization tool.
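The star-schema modeling above can be sketched with an in-memory SQLite database: one fact table keyed to two dimension tables, aggregated by a join. Table and column names are illustrative, not from the original project.

```python
import sqlite3

# Minimal star schema: fact_sales references dim_region and dim_date.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE dim_region (region_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE dim_date   (date_id INTEGER PRIMARY KEY, year INTEGER);
    CREATE TABLE fact_sales (
        region_id INTEGER REFERENCES dim_region(region_id),
        date_id   INTEGER REFERENCES dim_date(date_id),
        amount    REAL
    );
    INSERT INTO dim_region VALUES (1, 'South'), (2, 'North');
    INSERT INTO dim_date   VALUES (10, 2023), (11, 2024);
    INSERT INTO fact_sales VALUES (1, 10, 100.0), (1, 11, 150.0), (2, 11, 80.0);
""")

# Typical star-schema query: join the fact to its dimensions and aggregate.
rows = con.execute("""
    SELECT r.name, d.year, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_region r ON r.region_id = f.region_id
    JOIN dim_date   d ON d.date_id   = f.date_id
    GROUP BY r.name, d.year
    ORDER BY r.name, d.year
""").fetchall()
```

The same shape (surrogate-keyed dimensions, narrow fact table) carries over to Hive/Impala tables feeding a dashboard tool.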