Data Scientist

Job Number: 174842
Primary Location: Prague, Czech Republic
Schedule: Full-time
Organization: Global Delivery Center (GDC)

Primary Responsibilities:
As a Data Scientist, you will perform analysis and be responsible for the implementation and support of large-scale data and analytics for our clients. You will work in a team whose data science efforts range from exploration and investigation to the design and development of analytic systems. Your technical leadership in extracting meaning from large-scale, unstructured data is coupled with your ability to work with engineering teams to integrate the underlying systems as Think Big provides Big Data solutions to clients.

Secondary Responsibilities:
Additional responsibilities will include providing big data solutions for our clients, including analytical consulting, statistical modeling and quantitative solutions. You will mentor sophisticated organizations on large-scale data and analytics and work closely with client teams to deliver results. You will help translate business cases into clear research projects, be they exploratory or confirmatory, to help our clients use data to drive their businesses. You will collaborate and communicate across geographically distributed teams and with external clients.

Job Qualifications:
• Coursework in mathematics, statistics, machine learning and data mining
• Proficiency in R or other math packages (Matlab, SAS, etc.)
• Experience with Java and Python
• Excellent programming skills in object-oriented languages
• Adept at learning and applying new technologies
• Excellent verbal and written communication skills
• Strong team player capable of working in a demanding start-up environment


Preferred Knowledge, Skills and Abilities:
· Core programming, text file manipulation, and statistics with NumPy, pandas, scikit-learn or other approved modules
· Data frames, data manipulation, and objects
· Command line, pipes, and remote terminals
· Pushing and pulling versions and code branches from the approved version control system at Think Big
· Loading and parsing data in Spark; using the SQL context in Spark; proficiency in GraphX; developing models with Spark (ML or MLlib)
· Exporting, importing, aggregating, and filtering data in one of the relational stores: SQL, Hive, Pig, or approved other technology
· Cleaning, manipulating, and formatting data stored in all of these non-relational stores: flat files and RESTful APIs
· Writing jobs to read, filter, manipulate, and aggregate data stored in Hadoop with one of the APIs: Spark, Java MR, Hadoop Streaming w/ Python, or approved other API
· Generating data profiles including measures of central tendency, measures of deviation, and correlations in R, Python or other "non-big-data" technologies. Generation of basic charts (e.g. histograms, scatter plots, line charts) for data analysis purposes
· Generating data profiles including measures of central tendency, measures of deviation, and correlations over Hadoop & Spark or other approved big-data technology. Generation of basic charts (e.g. histograms, scatter plots, line charts) for data analysis purposes
· Designing, developing and implementing dashboards and reports using R Shiny, IPython Notebooks, Zeppelin or other approved open-source visualization technology
· Calculating and interpreting ANOVA models, ANCOVA models, hypothesis tests, and confidence intervals
· Creating and interpreting at least one type of each of these statistical models: GLM, CART, ensembles
· Creating and interpreting one of these models: k-means, hierarchical agglomerative clustering, or approved other clustering model
· Mapping data to lower dimensionality space using PCA, SVD, NMF, or other approved technique
· Modeling outcomes using kernels, nearest neighbors, LASSO, or other approved technique
· Bagging, boosting, and stacking models to generate meta-models
· Able to write technical reports for projects and/or internal collateral for training or internal assets
· Able to write non-technical documents that describe our offer (or solutions) for a non-technical audience. This can include a delivery presentation for a non-technical audience, a conference presentation or marketing material
· Able to deliver presentations during client meetings, conferences or sales events to explain our offers, positioning and solutions to technical and non-technical audiences
· Successfully completed an analytics agenda including activities captured in Descriptive Statistics, Exploratory Visualizations, and at least 2 activities from Basic Modeling
· Gave an internal talk at Think Big on an approved data science topic
· Successfully executed data science responsibilities and delivered results on a client project
· At least one year working in quantitative roles
· Able to estimate time needed to complete assigned tasks and deliver in that time period

Job Abilities:
Must be able to sit for long periods of time working on computers. Must be able to interact and communicate with the client in meetings. Must be able to write programming code in applicable languages. Must be able to write project documentation in English.

Bachelor's Degree in Computer Science or related field of study or equivalent work experience. Employer will accept any suitable combination of education, training, or experience.

We offer:

• Relocation bonus (if coming to Prague from abroad)
• World-class technical training within Think Big Academy including Think Big BootCamp (intensive training for all new hires)
• 25 days of holiday a year (instead of the standard 20 under Czech law)
• Language courses
• Private medical care
• Meal vouchers in the amount of 100 CZK/day (Teradata contributes 75 CZK/day)
• Company contribution to the pension fund (up to 3% of monthly gross salary)
• Life insurance
• Company assistance in case of sickness (25% of your gross base salary)
• Employee Referral Program (4,000 USD)
• Card for sports
• Employee Stock Purchase Program
• Contribution of 300 USD toward a mobile phone of your choice (every 2 years)
